High Availability Considerations
Single Points of Failure:
- Database services (PostgreSQL, MariaDB) run only on the manager node
- Authentication (Authentik) is centralized in a single deployment for security
- Reverse proxy (Traefik) runs as a single instance and relies on its restart policy for recovery
Resilience Measures:
- Management agent deployed in global mode (one task per node) for management access
- Multiple replica services where applicable
- Automatic restart policies on all services
- Centralized storage keeps service data off individual nodes, so a node failure does not lose data
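
The global agent deployment and restart policies listed above correspond to settings in a service's `deploy` section. A minimal sketch, assuming a Compose v3 stack file deployed with `docker stack deploy`; the service names, images, and the delay and attempt values are placeholders, not taken from this cluster's configuration:

```yaml
services:
  agent:                           # hypothetical management agent
    image: example/agent:latest    # placeholder image
    deploy:
      mode: global                 # one task on every node in the swarm
      restart_policy:
        condition: any             # restart whenever the task exits
        delay: 5s                  # assumed wait between restart attempts
  app:                             # hypothetical single-instance service
    image: example/app:latest      # placeholder image
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure      # restart only on a non-zero exit code
        delay: 5s                  # assumed value
        max_attempts: 3            # assumed value
```

A restart policy recovers a crashed task on a healthy node; it does not remove the single point of failure when the node itself goes down.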
Load Balancing Approach
Traefik Load Balancing:
- Round-robin distribution for multi-replica services
- Health checks ensure traffic only reaches healthy instances
- SSL termination at proxy level
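
The routing, health check, and TLS behavior above is typically configured through service labels. The following is a sketch only, assuming Traefik v2 or later with the Docker provider in Swarm mode; the router and service name `tracker`, the hostname, the port, the `/health` path, and the `websecure` entry point are all hypothetical:

```yaml
services:
  tracker:                                   # hypothetical service name
    image: example/tracker:latest            # placeholder image
    deploy:
      replicas: 3
      labels:                                # in Swarm mode, Traefik reads labels from deploy.labels
        - "traefik.enable=true"
        - "traefik.http.routers.tracker.rule=Host(`tracker.example.com`)"
        - "traefik.http.routers.tracker.entrypoints=websecure"   # assumed entry point name
        - "traefik.http.routers.tracker.tls=true"                # TLS terminates at the proxy
        - "traefik.http.services.tracker.loadbalancer.server.port=8080"
        - "traefik.http.services.tracker.loadbalancer.healthcheck.path=/health"
        - "traefik.http.services.tracker.loadbalancer.healthcheck.interval=10s"
```

No label selects the balancing strategy here because round-robin is Traefik's default for the servers behind a service.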
Service Examples:
# Taylor's Tracker: 3 replicas across worker nodes
replicas: 3
placement:
  constraints: [node.role == worker]
# Critical services: Single replica on manager
replicas: 1
placement:
  constraints: [node.hostname == p0]
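
For context, in a Swarm stack file these keys sit under each service's `deploy` section. A minimal sketch of where the two fragments would live; the service keys and images are assumptions made for illustration, and only the replica counts and constraints come from the examples above:

```yaml
services:
  taylors-tracker:                  # hypothetical service key
    image: example/tracker:latest   # placeholder image
    deploy:
      replicas: 3
      placement:
        constraints:
          - node.role == worker     # schedule only on worker nodes
  database:                         # hypothetical critical service
    image: example/database:latest  # placeholder image
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.hostname == p0     # pin to the node with hostname p0
```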
Resource Allocation Patterns
Memory Distribution Strategy:
- Manager Node: Database and authentication workloads
- Worker Nodes: Application services with lower memory requirements
- No explicit resource limits: Each service can use whatever memory its node has free, allocated on demand
CPU Utilization:
- Manager Node: 16 cores handle orchestration and intensive services
- Worker Nodes: 4 cores each for distributed application processing
- Load balancing: Spreading replicas across multiple worker nodes distributes CPU load among them
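
The cluster sets no limits today. If a service ever needs to be capped, Swarm supports per-service limits and reservations under `deploy.resources`. A sketch for contrast with the current unconstrained setup, with a hypothetical service and purely illustrative values:

```yaml
services:
  app:                            # hypothetical service
    image: example/app:latest     # placeholder image
    deploy:
      resources:
        limits:                   # hard ceiling enforced on each task
          cpus: "1.0"             # illustrative value
          memory: 512M            # illustrative value
        reservations:             # capacity the scheduler sets aside per task
          cpus: "0.25"            # illustrative value
          memory: 128M            # illustrative value
```

Reservations affect scheduling: a task is placed only on a node with that much unreserved capacity. Limits apply at runtime.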