Full notes: GKE Compute Classes - Detailed Explanation β
Key Concepts
What Are Compute Classes?
Compute Classes are a GKE Custom Resource (CRD) that define priority-ordered sets of node pools with autoscaling settings. A ComputeClass spec declares priority-based node pool selection, fallback behavior when preferred resources are unavailable, and active migration to automatically move workloads to higher-priority nodes when capacity returns. When a pod selects a ComputeClass via nodeSelector with cloud.google.com/compute-class, GKE walks the priority list top-to-bottom, scheduling on the highest-priority pool available and falling back as needed.
Cluster Implementation (citadel-dev Example)
The citadel-2g-dev-tokyo-01 cluster defines three custom ComputeClasses:
| ComputeClass | Workload Type | Priority 1 (Cheapest) | Priority 2 (Fallback) | Priority 3 (Last Resort) |
|---|---|---|---|---|
citadel-default-cc | Shared | t2d-32 Spot (2 pools) | t2d-32 On-demand | n2d-32 Spot |
citadel-mercari-api-cc | Mercari API | t2d-16 Spot (2 pools) | t2d-32 On-demand | n2d-32 Spot |
citadel-mercari-eaas-cc | Elasticsearch | t2d-32 Spot | t2d-32 On-demand | n2d-64 Spot |
Each node pool carries a label (cloud.google.com/compute-class: <name>) and a NoSchedule taint matching the compute class name, ensuring only pods that tolerate the taint land on those nodes. The API and EaaS pools also carry additional taints like node-pool-id and machine-series for further isolation. All three classes set activeMigration.optimizeRulePriority: true to auto-migrate pods back to cheaper pools when spot capacity recovers.
How Workloads Use Compute Classes
Method 1 β Explicit Node Selector: Workloads set nodeSelector: cloud.google.com/compute-class: <name> and a matching NoSchedule toleration in their pod spec. Example: Elasticsearch workloads explicitly select citadel-mercari-eaas-cc.
Method 2 β Automatic Assignment via KubeMod: KubeMod ModRules automatically inject compute class selectors and tolerations into pods that lack them. For citadel-default-cc, four ModRules cover cases: no affinity at all, affinity but no nodeAffinity, nodeAffinity but no required rules, and only availability-type affinity. For citadel-mercari-api-cc, two ModRules match pods with nodeAffinity or nodeSelector targeting d-mercari-api.
Cost Optimization Strategy
The priority system encodes a cost ladder: Spot T2D VMs first (cheapest), On-demand T2D second (reliable fallback), Spot N2D third (different availability pool as a last resort). Active migration ensures workloads drift back to cheaper tiers automatically without manual intervention.
Architecture
citadel-2g-dev-tokyo-01
|
+--------------------+--------------------+
| | |
citadel-default-cc citadel-mercari-api-cc citadel-mercari-eaas-cc
(Shared Pools) (Dedicated API) (Elasticsearch)
| | |
P1: t2d Spot P1: t2d-16 Spot P1: t2d-32 Spot
P2: t2d On-demand P2: t2d-32 OD P2: t2d-32 OD
P3: n2d Spot P3: n2d-32 Spot P3: n2d-64 Spot
Quick Reference
| Component | Location |
|---|---|
| Node Pool Definitions | microservices-terraform/.../terragrunt.hcl |
| ComputeClass CRDs | microservices-kubernetes/.../ComputeClass/ |
| KubeMod ModRules | microservices-kubernetes/.../ModRule/ |
| ES Compute Class Config | kit/pkg/elasticsearch/elasticsearch-spec.cue |
Pod assignment flow:
| Method | How |
|---|---|
| Explicit | nodeSelector: cloud.google.com/compute-class: <name> + matching toleration |
| Automatic | KubeMod ModRules inject selector/toleration for pods without affinity rules |
Key Takeaways
- Compute Classes are a cost optimization tool β they encode a preference order from cheapest to most reliable node pools
- Each node pool gets a label and a NoSchedule taint matching the compute class name, ensuring pods only land on intended pools
- Active migration means automatic cost savings without manual intervention when spot capacity fluctuates
- KubeMod handles the βdefaultβ case so teams donβt need to add compute class config to every workload
- Dedicated pools (API, EaaS) carry extra taints for workload isolation beyond just the compute class
- The microservices-ci repo has no compute class configs β CI/CD pipelines apply the Terraform and K8s manifests that contain the definitions