High Availability Considerations

Single Points of Failure:

  • Database services (PostgreSQL, MariaDB) run only on the manager node
  • Authentication (Authentik) runs as a single centralized instance for security
  • Reverse proxy (Traefik) runs as a single instance; restart policies shorten outages but do not remove them

Resilience Measures:

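  • Management agent deployed in global mode (one task per node) for management access
  • Multiple replicas for services that can run more than one instance
  • Automatic restart policies on all services
  • Centralized storage keeps persistent data off individual worker nodes, so a worker failure does not lose data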
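
As a minimal sketch, the global agent and restart behaviour described above could be expressed in a Compose stack file roughly as follows. The service name `agent` and the image `example/agent:latest` are hypothetical placeholders, not the cluster's actual values.

```yaml
services:
  agent:                          # hypothetical management agent service
    image: example/agent:latest   # placeholder image name
    deploy:
      mode: global                # Swarm runs exactly one task on every node
      restart_policy:
        condition: any            # restart the task whenever it exits, for any reason
        delay: 5s                 # wait between restart attempts
```

Global mode also covers nodes that join later: Swarm schedules an agent task on each new node automatically.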

Load Balancing Approach

Traefik Load Balancing:

  • Round-robin distribution for multi-replica services
  • Health checks route traffic only to instances that pass them
  • TLS (SSL) termination at the proxy
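
A sketch of how a multi-replica service might be exposed through Traefik in Swarm mode, assuming Traefik v2 or later with the Docker provider in swarm mode. In Swarm, Traefik reads labels from `deploy.labels`, not from container labels. The service name `tracker`, the image, the hostname, the entry point `websecure`, port `8080` and the `/health` path are all hypothetical.

```yaml
services:
  tracker:                                   # hypothetical name for Taylor's Tracker
    image: example/tracker:latest            # placeholder image
    deploy:
      replicas: 3
      labels:                                # Swarm mode: Traefik reads service-level labels
        - "traefik.enable=true"
        - "traefik.http.routers.tracker.rule=Host(`tracker.example.com`)"
        - "traefik.http.routers.tracker.entrypoints=websecure"   # assumed HTTPS entry point name
        - "traefik.http.routers.tracker.tls=true"                # terminate TLS at Traefik
        - "traefik.http.services.tracker.loadbalancer.server.port=8080"
        - "traefik.http.services.tracker.loadbalancer.healthcheck.path=/health"
        - "traefik.http.services.tracker.loadbalancer.healthcheck.interval=10s"
```

Traefik balances requests across the three replicas and removes any replica whose health check fails from rotation until it recovers.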

Service Examples:

# Taylor's Tracker: 3 replicas spread across worker nodes
deploy:
  replicas: 3
  placement:
    constraints: [node.role == worker]

# Critical services: a single replica pinned to the manager node (hostname p0)
deploy:
  replicas: 1
  placement:
    constraints: [node.hostname == p0]

Resource Allocation Patterns

Memory Distribution Strategy:

  • Manager Node: Database and authentication workloads
  • Worker Nodes: Application services with lower memory requirements
  • No explicit resource limits: services can use memory as demand requires, at the cost of no protection against one service exhausting a node's memory
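
The current setup defines none of the following. If a service ever needed to be capped, a sketch of the relevant Compose settings might look like this. The service name and the memory values are arbitrary illustrations, not measured requirements.

```yaml
services:
  tracker:                  # hypothetical service name
    deploy:
      resources:
        reservations:
          memory: 128M      # the scheduler only places the task on a node with this much unreserved memory
        limits:
          memory: 512M      # the container is OOM-killed if it exceeds this
```

Reservations affect scheduling only, while limits are enforced at runtime. Setting only a reservation would keep the flexible "no limits" behaviour while still guiding placement.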

CPU Utilization:

  • Manager Node: 16 cores handle orchestration and intensive services
  • Worker Nodes: 4 cores each for distributed application processing
  • Load distribution: multi-replica services spread their CPU load across the worker nodes
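
Swarm does not guarantee by default that replicas land on different workers. As a minimal sketch, assuming Compose file format 3.8 or later, the placement can be made explicit with `max_replicas_per_node`. The service name is hypothetical.

```yaml
services:
  tracker:                          # hypothetical name for Taylor's Tracker
    deploy:
      replicas: 3
      placement:
        constraints: [node.role == worker]
        max_replicas_per_node: 1    # at most one task of this service per worker
```

This requires at least three available workers. With fewer, the surplus tasks remain pending instead of doubling up on one node.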