Kubernetes Batch Workloads with Kueue: Queues, Quota, and Fairness
Use Kueue to queue Kubernetes batch workloads before they hit the scheduler, so shared cluster quota is admitted deliberately instead of all at once.
Kubernetes Jobs are blunt instruments. Create one, and Kubernetes tries to run it as soon as there is anywhere to put the Pods. That is fine when one team runs the occasional batch task on a quiet cluster. It is much less fine when half the company drops training jobs, ETL runs, and CI batches onto the same fleet at 09:00.
That is the gap Kueue fills. It is a Kubernetes-native job queueing system — a job-level manager that decides when a job should be admitted to start and when it should stop. The useful mental model is that it adds an admission layer in front of the scheduler, so Kubernetes batch workloads wait for quota before they start consuming nodes.
For platform teams, that matters for two reasons. First, it stops one team’s launch from turning into everyone else’s eviction event. Second, it gives you a place to express policy: who gets access to scarce capacity, which hardware a workload can use, and whether idle quota can be shared or borrowed safely.
Why plain Kubernetes Jobs struggle on a shared cluster
Native Kubernetes Jobs are deliberately simple. A Job says, in effect, “run this until it completes”, and the control plane creates Pods to make that happen. Kubernetes does support suspending a Job, but plain Jobs still do not give you a shared queue, quota accounting, or fairness between teams.
Once the cluster becomes shared infrastructure, the obvious failure modes show up:
- every team submits work at once after a deploy, data drop, or retraining window
- long-running jobs occupy nodes that short jobs need immediately
- GPU-hungry workloads crowd out smaller CPU batches
- retries and evictions turn a busy period into a slow, expensive one
None of that is a scheduler bug. Kubernetes is doing what it was asked to do. The missing piece is admission control for batch work before the scheduler starts fighting over Pods.
What Kueue changes in the control flow
Kueue does not replace the Kubernetes scheduler. It sits in front of it.
Kueue runs a two-phase admission cycle. A workload first lands in a LocalQueue, which points at a ClusterQueue. Kueue then reserves quota for that workload and, if configured, waits for any admission checks to pass before admitting it.
In practice, the flow looks like this:
- A team submits a Job to a namespace queue.
- Kueue creates a matching Workload object and keeps the Job from starting immediately.
- The target ClusterQueue checks available quota and resource flavours.
- Only when quota is available does Kueue admit the workload and let the scheduler place the Pods.
That small change has a big effect. Jobs stop competing for capacity before there is a sensible place for them to run.
The Kueue objects that matter
Kueue’s model is straightforward once you separate queueing from scheduling.
- LocalQueue is namespace-scoped. It is the queue a team submits work into.
- ClusterQueue is cluster-scoped. It governs a pool of resources such as CPU, memory, Pods, and accelerators, and it is where quota and fair sharing rules live.
- ResourceFlavor describes a variation of a resource and can map it to nodes through labels, taints, and tolerations.
- Workload is Kueue’s internal record of the admitted-or-waiting batch job.
A minimal setup for GPU jobs might look like this:
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: gpu
spec:
  nodeLabels:
    accelerator: nvidia
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: batch-gpu
spec:
  namespaceSelector: {}
  resourceGroups:
    - coveredResources: ['cpu', 'memory', 'nvidia.com/gpu']
      flavors:
        - name: gpu
          resources:
            - name: cpu
              nominalQuota: 40
            - name: memory
              nominalQuota: 160Gi
            - name: nvidia.com/gpu
              nominalQuota: 8
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  name: batch
  namespace: training
spec:
  clusterQueue: batch-gpu
And the Job submitted to that queue looks like this:
apiVersion: batch/v1
kind: Job
metadata:
  name: train-model
  namespace: training
  labels:
    kueue.x-k8s.io/queue-name: batch
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: trainer
          image: ghcr.io/example/trainer:latest
          resources:
            requests:
              cpu: '4'
              memory: 16Gi
              nvidia.com/gpu: '1'
            # Extended resources such as GPUs must also appear under limits,
            # with the same value as the request.
            limits:
              nvidia.com/gpu: '1'
Two details are worth calling out.
First, Kueue uses the kueue.x-k8s.io/queue-name label to select the LocalQueue. Second, you do not need to hand-set suspend: true on every Job; Kueue can manage Job suspension through its webhook and start the workload when admission succeeds.
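To make the second point concrete, this is roughly how the train-model Job would read back from the cluster while it waits in the queue. It is an illustrative fragment rather than real command output, and it is not meant to be applied: the only thing it shows is that spec.suspend is set by Kueue, not by whoever wrote the manifest.

```yaml
# Illustrative fragment of the queued Job as stored in the cluster.
# The Pod template, status, and API server defaults are omitted.
apiVersion: batch/v1
kind: Job
metadata:
  name: train-model
  namespace: training
  labels:
    kueue.x-k8s.io/queue-name: batch
spec:
  suspend: true   # set by Kueue's webhook; switched to false once the workload is admitted
```

While suspend is true, the Job controller creates no Pods, so nothing reaches the scheduler until Kueue admits the workload.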
A concrete example: taming a shared GPU queue
Imagine two namespaces sharing an eight-GPU pool. Team A submits eight one-GPU training jobs. Team B submits a two-GPU evaluation run five minutes later.
Without queueing, all of those Pods rush straight at the cluster. Some will schedule, some will sit pending, and the outcome depends on timing, retries, and whatever else is competing for nodes at the time. Operationally, it feels like a race.
With Kueue, both teams still submit normal Jobs, but the cluster now behaves like a service with rules. The ClusterQueue knows there are only eight GPUs to hand out. It admits workloads when quota is genuinely available instead of letting every Pod pile into the scheduler and discover the shortage the hard way.
That is the practical value of ClusterQueue: it turns “first one to create Pods wins” into a deliberate quota decision. If you later decide that some queues may borrow idle quota, or that certain jobs should prefer one hardware flavour over another, you add that policy to the queueing layer rather than hoping conventions survive a busy day.
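As a sketch of Team B's side of that scenario, the manifests below add a second LocalQueue that points at the same batch-gpu ClusterQueue, plus a two-GPU Job. The evaluation namespace, the evaluate-model name, the image, and the CPU and memory figures are assumptions made for illustration. The choice to model the run as a single two-GPU Pod is also an assumption.

```yaml
# Hypothetical second tenant: namespace, names, and image are illustrative.
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  name: batch
  namespace: evaluation
spec:
  clusterQueue: batch-gpu   # same quota pool as the training namespace
---
apiVersion: batch/v1
kind: Job
metadata:
  name: evaluate-model
  namespace: evaluation
  labels:
    kueue.x-k8s.io/queue-name: batch
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: evaluator
          image: ghcr.io/example/evaluator:latest
          resources:
            requests:
              cpu: '8'
              memory: 32Gi
              nvidia.com/gpu: '2'
            limits:
              nvidia.com/gpu: '2'   # must match the GPU request
```

Assume Team A's eight jobs each request 4 CPUs, 16Gi of memory, and 1 GPU, and are admitted first. Together they use 32 of 40 CPUs, 128Gi of 160Gi, and all 8 GPUs. Team B's Job fits on CPU and memory but not on GPUs, so with no borrowing or preemption configured it stays suspended until two of Team A's jobs finish and release their quota.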
Why this is better than “just add more nodes”
When teams first hit this problem, the instinct is often to add more worker nodes. Sometimes that is the right answer. Often it is only half an answer.
If every job starts at once, more nodes only mean a larger stampede. Kueue is useful precisely because it works with the rest of the Kubernetes stack instead of pretending to replace it. Its own overview is explicit: Kueue does not replace existing Kubernetes components. Autoscaling, pod-to-node scheduling, and job lifecycle stay with Cluster Autoscaler, kube-scheduler, and kube-controller-manager respectively.
That gives you a saner operating model:
- short jobs do not disappear behind long ones that already consumed the fleet
- shared GPU pools are used intentionally rather than opportunistically
- quota becomes visible policy instead of social convention
- autoscaling reacts to admitted demand instead of a wall of immediately pending work
- retries and evictions drop because jobs are not admitted before there is room
On a shared batch platform, that usually means fewer half-finished runs and less time spent explaining why a queue of apparently healthy Pods still did not get anywhere.
What Kueue is not
Kueue is not a replacement for the scheduler, and it is not a node autoscaler.
- The scheduler still decides where Pods land once a workload is admitted.
- Cluster Autoscaler or Karpenter still handles node supply.
- Kueue decides whether a batch workload should enter the cluster yet.
That separation is exactly why it is useful. You usually want all three, not one tool pretending to do the work of the others.
Kueue also does not fix bad resource requests or bad manifests. If a Job asks for the wrong shape of CPU, memory, or GPU, it can still wait forever or fail immediately after admission. Queueing makes scheduling more orderly; it does not make a broken workload healthy.
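A simple case of the "wait forever" outcome is a request larger than the queue can ever grant. The fragment below is illustrative only. It assumes a ClusterQueue with a nominal quota of 8 GPUs and no other queue to borrow from.

```yaml
# Fragment of a container spec inside a Job's Pod template; not a complete manifest.
# 16 GPUs exceeds an assumed nominalQuota of 8, so the quota check can never pass.
resources:
  requests:
    cpu: '4'
    memory: 16Gi
    nvidia.com/gpu: '16'
  limits:
    nvidia.com/gpu: '16'
```

Under those assumptions the Job stays suspended indefinitely. The fix is in the manifest or the quota, not in the queueing layer.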
Where Kueue fits best for Kubernetes batch workloads
Kueue is most useful when the cluster is shared and batch work is genuinely competitive:
- ML training or evaluation on a GPU pool
- data processing jobs with uneven runtime
- CI workloads that arrive in waves
- platform teams that need fair access to limited capacity across namespaces
It is much less interesting for a single team with exclusive ownership of a cluster. If nobody else is competing for quota, admission control is mostly ceremony.
The practical takeaway
If your Kubernetes batch workloads all start the moment they are created, the platform is treating them as immediate work, not queued work. Kueue changes that. It gives you a clean admission layer, queue-aware quota, and a better fairness story for clusters where “run now” is often the wrong default.
For platform teams, that is the real payoff: fewer surprise evictions, fewer overloaded nodes, and a batch system that behaves more like a managed service than a race condition.