Kubernetes Batch Workloads with Kueue: Queues, Quota, and Fairness

Use Kueue to queue Kubernetes batch workloads before they hit the scheduler, so work is admitted against shared cluster quota deliberately instead of all at once.

July 1, 2026 • Platform Engineering • 9 min read

Kubernetes Jobs are blunt instruments. Create one, and Kubernetes tries to run it as soon as there is anywhere to put the Pods. That is fine when one team runs the occasional batch task on a quiet cluster. It is much less fine when half the company drops training jobs, ETL runs, and CI batches onto the same fleet at 09:00.

That is the gap Kueue fills. It is a Kubernetes-native job queueing system — a job-level manager that decides when a job should be admitted to start and when it should stop. The useful mental model is that it adds an admission layer in front of the scheduler, so Kubernetes batch workloads wait for quota before they start consuming nodes.

For platform teams, that matters for two reasons. First, it stops one team’s launch from turning into everyone else’s eviction event. Second, it gives you a place to express policy: who gets access to scarce capacity, which hardware a workload can use, and whether idle quota can be shared or borrowed safely.

Why plain Kubernetes Jobs struggle on a shared cluster

Native Kubernetes Jobs are deliberately simple. A Job says, in effect, “run this until it completes”, and the control plane creates Pods to make that happen. Kubernetes does support suspending a Job, but plain Jobs still do not give you a shared queue, quota accounting, or fairness between teams.
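
For reference, the built-in suspension mentioned above is a single field on the Job spec. The sketch below is a minimal illustration, and the Job name and image are hypothetical. It shows what plain Kubernetes offers on its own: one Job held back by hand, with no queue or quota behind it.

```yaml
# Plain Kubernetes, no Kueue involved. The name and image are hypothetical.
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-etl
spec:
  suspend: true          # the Job controller creates no Pods while this is true
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: etl
          image: ghcr.io/example/etl:latest
          resources:
            requests:
              cpu: '2'
              memory: 4Gi
```

Setting suspend back to false starts the Job. Something still has to decide when to do that, and for whom, which is the part plain Jobs leave open.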

Once the cluster becomes shared infrastructure, the obvious failure modes show up:

every team submits work at once after a deploy, data drop, or retraining window
long-running jobs occupy nodes that short jobs need immediately
GPU-hungry workloads crowd out smaller CPU batches
retries and evictions turn a busy period into a slow, expensive one

None of that is a scheduler bug. Kubernetes is doing what it was asked to do. The missing piece is admission control for batch work, applied before Pods are created and start competing for nodes.

What Kueue changes in the control flow

Kueue does not replace the Kubernetes scheduler. It sits in front of it.

Kueue runs a two-phase admission cycle. A workload first lands in a LocalQueue, which points at a ClusterQueue. Kueue then reserves quota for that workload and, if configured, waits for any admission checks to pass before admitting it.

In practice, the flow looks like this:

A team submits a Job to a namespace queue.
Kueue creates a matching Workload object and keeps the Job suspended, so no Pods are created yet.
Kueue checks the target ClusterQueue for available quota and a resource flavour that fits the request.
Only when quota is available does Kueue admit the workload and let the scheduler place the Pods.

That small change has a big effect. Jobs no longer compete for capacity before there is anywhere sensible for them to run.
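
The two phases described above surface as conditions on the Workload object. The following is an illustrative, hand-written sketch rather than captured output. The object name is hypothetical, the Pod template is omitted, and the field names are an assumption based on the Workload API as commonly documented, so they may differ between Kueue versions.

```yaml
# Illustrative only: abbreviated and hand-written, not captured from a cluster.
apiVersion: kueue.x-k8s.io/v1beta2
kind: Workload
metadata:
  name: job-train-model-1a2b3   # hypothetical generated name
  namespace: training
spec:
  queueName: batch              # the LocalQueue the Job was submitted to
  podSets:
    - name: main
      count: 1
      # template: copied from the Job's Pod template (omitted here)
status:
  conditions:
    - type: QuotaReserved       # phase one: quota set aside in the ClusterQueue
      status: "True"
    - type: Admitted            # phase two: any admission checks have also passed
      status: "True"
```

A Workload that is still waiting would not yet have both conditions set to true, which makes the conditions a natural first place to look when a job appears stuck.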

The Kueue objects that matter

Kueue’s model is straightforward once you separate queueing from scheduling.

LocalQueue is namespace-scoped. It is the queue a team submits work into.
ClusterQueue is cluster-scoped. It governs a pool of resources such as CPU, memory, Pods, and accelerators, and it is where quota and fair sharing rules live.
ResourceFlavor describes a variation of a resource and can map it to nodes through labels, taints, and tolerations.
Workload is the object Kueue creates to represent a batch job, whether it is still waiting or already admitted.
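
To show the taint and toleration side of ResourceFlavor, here is a hedged sketch of a flavour for a tainted GPU node group. The flavour name, the taint, and the toleration are assumptions chosen for illustration, not values from any particular cluster.

```yaml
# Sketch only: the flavour name, taint, and toleration are illustrative assumptions.
apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: gpu-spot
spec:
  nodeLabels:                 # admitted Pods are steered to nodes with these labels
    accelerator: nvidia
  nodeTaints:                 # taints on these nodes; a workload must tolerate them
    - key: spot               # to be assigned this flavour
      value: "true"
      effect: NoSchedule
  tolerations:                # extra tolerations added to Pods admitted with this flavour
    - key: dedicated
      operator: Equal
      value: gpu
      effect: NoSchedule
```

The distinction matters in practice: nodeTaints filter which workloads may use the flavour, while tolerations are handed to whatever is admitted.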

A minimal setup for GPU jobs might look like this:

apiVersion: kueue.x-k8s.io/v1beta2
kind: ResourceFlavor
metadata:
  name: gpu
spec:
  nodeLabels:
    accelerator: nvidia
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: batch-gpu
spec:
  namespaceSelector: {}
  resourceGroups:
    - coveredResources: ['cpu', 'memory', 'nvidia.com/gpu']
      flavors:
        - name: gpu
          resources:
            - name: cpu
              nominalQuota: 40
            - name: memory
              nominalQuota: 160Gi
            - name: nvidia.com/gpu
              nominalQuota: 8
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  name: batch
  namespace: training
spec:
  clusterQueue: batch-gpu

And the Job submitted to that queue looks like this:

apiVersion: batch/v1
kind: Job
metadata:
  name: train-model
  namespace: training
  labels:
    kueue.x-k8s.io/queue-name: batch
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: trainer
          image: ghcr.io/example/trainer:latest
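          resources:
            requests:
              cpu: '4'
              memory: 16Gi
            limits:
              # Extended resources such as GPUs must be set under limits;
              # the request defaults to the same value.
              nvidia.com/gpu: '1'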

Two details are worth calling out.

First, Kueue uses the kueue.x-k8s.io/queue-name label to select the LocalQueue. Second, you do not need to hand-set suspend: true on every Job; Kueue can manage Job suspension through its webhook and start the workload when admission succeeds.
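
To make the second point concrete, this is roughly what changes on the Job over its life under Kueue. It is a hand-written fragment showing only the fields Kueue is expected to touch, assuming the ResourceFlavor and Job from the examples above. It is not captured output.

```yaml
# Hand-written fragment, not captured output. Only the fields of interest are shown.
# While the Job waits in the queue:
spec:
  suspend: true                # set by Kueue's webhook at creation; no Pods exist yet
---
# After Kueue admits the workload:
spec:
  suspend: false               # Kueue resumes the Job so its Pods can be created
  template:
    spec:
      nodeSelector:
        accelerator: nvidia    # injected from the admitted ResourceFlavor's nodeLabels
```

The injected node selector is how the flavour chosen at admission time carries through to the scheduler's placement decision.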

A concrete example: taming a shared GPU queue

Imagine two namespaces sharing an eight-GPU pool. Team A submits eight one-GPU training jobs. Team B submits a two-GPU evaluation run five minutes later.

Without queueing, all of those Pods rush straight at the cluster. Some will schedule, some will sit pending, and the outcome depends on timing, retries, and whatever else is competing for nodes at the time. Operationally, it feels like a race.

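With Kueue, both teams still submit normal Jobs, but the cluster now behaves like a service with rules. The ClusterQueue knows there are only eight GPUs to hand out, and Kueue admits workloads only while quota remains, instead of letting every Pod pile into the scheduler and discover the shortage the hard way. With a single shared queue and no preemption configured, Team A's eight one-GPU jobs take all eight GPUs, and Team B's two-GPU run waits, with no Pods created, until two of Team A's jobs finish and release their quota.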

That is the practical value of ClusterQueue: it turns “first one to create Pods wins” into a deliberate quota decision. If you later decide that some queues may borrow idle quota, or that certain jobs should prefer one hardware flavour over another, you add that policy to the queueing layer rather than hoping conventions survive a busy day.
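
One way to guarantee Team B some capacity in the scenario above is to split the pool into two ClusterQueues instead of one. The sketch below rests on several assumptions: the namespaces team-a and team-b, the queue names, and the 6/2 split are all hypothetical, and it reuses the gpu ResourceFlavor from the earlier example. Borrowing idle quota between the two queues would need additional cohort configuration, which is not shown here.

```yaml
# Hypothetical split of the eight-GPU pool: six GPUs for Team A, two for Team B.
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: team-a-gpu
spec:
  namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: team-a   # only this namespace may use the queue
  resourceGroups:
    - coveredResources: ['cpu', 'memory', 'nvidia.com/gpu']
      flavors:
        - name: gpu
          resources:
            - name: cpu
              nominalQuota: 30
            - name: memory
              nominalQuota: 120Gi
            - name: nvidia.com/gpu
              nominalQuota: 6
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: ClusterQueue
metadata:
  name: team-b-gpu
spec:
  namespaceSelector:
    matchLabels:
      kubernetes.io/metadata.name: team-b
  resourceGroups:
    - coveredResources: ['cpu', 'memory', 'nvidia.com/gpu']
      flavors:
        - name: gpu
          resources:
            - name: cpu
              nominalQuota: 10
            - name: memory
              nominalQuota: 40Gi
            - name: nvidia.com/gpu
              nominalQuota: 2
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  name: batch
  namespace: team-a
spec:
  clusterQueue: team-a-gpu
---
apiVersion: kueue.x-k8s.io/v1beta2
kind: LocalQueue
metadata:
  name: batch
  namespace: team-b
spec:
  clusterQueue: team-b-gpu
```

With this split, six of Team A's eight one-GPU jobs are admitted and two wait. Team B's two-GPU run fits its own GPU quota and can be admitted as soon as it arrives, provided its CPU and memory requests also fit.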

Why this is better than “just add more nodes”

When teams first hit this problem, the instinct is often to add more worker nodes. Sometimes that is the right answer. Often it is only half an answer.

If every job starts at once, more nodes only mean a larger stampede. Kueue is useful precisely because it works with the rest of the Kubernetes stack instead of pretending to replace it. Its own overview is explicit: Kueue does not replace existing Kubernetes components. Autoscaling, pod-to-node scheduling, and job lifecycle stay with Cluster Autoscaler, kube-scheduler, and kube-controller-manager respectively.

That gives you a saner operating model:

short jobs do not disappear behind long ones that already consumed the fleet
shared GPU pools are used intentionally rather than opportunistically
quota becomes visible policy instead of social convention
autoscaling reacts to admitted demand instead of a wall of immediately pending work
retries and evictions tend to drop because jobs are not admitted before there is quota for them

On a shared batch platform, that usually means fewer half-finished runs and less time spent explaining why a queue of apparently healthy Pods still did not get anywhere.

What Kueue is not

Kueue is not a replacement for the scheduler, and it is not a node autoscaler.

The scheduler still decides where Pods land once a workload is admitted.
Cluster Autoscaler or Karpenter still handles node supply.
Kueue decides whether a batch workload should enter the cluster yet.

That separation is exactly why it is useful. You usually want all three, not one tool pretending to do the work of the others.

Kueue also does not fix bad resource requests or bad manifests. If a Job asks for the wrong shape of CPU, memory, or GPU, it can still wait forever or fail immediately after admission. Queueing makes scheduling more orderly; it does not make a broken workload healthy.
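
As an example of the "wait forever" case, consider a request that is larger than the queue can ever grant. This is a hypothetical fragment of a container spec, assuming the eight-GPU batch-gpu ClusterQueue from the earlier example with no borrowing configured.

```yaml
# Hypothetical container resources for a Job submitted to the eight-GPU queue.
resources:
  requests:
    cpu: '4'
    memory: 16Gi
  limits:
    nvidia.com/gpu: '16'   # exceeds the queue's nominalQuota of 8, so quota can never be reserved
```

Kueue will hold this workload as pending indefinitely. The queue is behaving correctly; the request is what needs fixing.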

Where Kueue fits best for Kubernetes batch workloads

Kueue is most useful when the cluster is shared and batch work is genuinely competitive:

ML training or evaluation on a GPU pool
data processing jobs with uneven runtime
CI workloads that arrive in waves
platform teams that need fair access to limited capacity across namespaces

It is much less interesting for a single team with exclusive ownership of a cluster. If nobody else is competing for quota, admission control is mostly ceremony.

The practical takeaway

If your Kubernetes batch workloads all start the moment they are created, the platform is treating them as immediate work, not queued work. Kueue changes that. It gives you a clean admission layer, queue-aware quota, and a better fairness story for clusters where “run now” is often the wrong default.

For platform teams, that is the real payoff: fewer surprise evictions, fewer overloaded nodes, and a batch system that behaves more like a managed service than a race condition.