Question 1

What are Kubernetes resource requests and limits?

Accepted Answer

Resource requests define the amount of CPU and memory that Kubernetes reserves for a container; a pod's effective request is the sum across its containers. The Kubernetes scheduler uses requests to find a node with sufficient unreserved resources. If no node can satisfy the request, the pod remains in Pending state. Resource limits define the maximum amount of CPU and memory a container can use. If a container exceeds its CPU limit, it gets throttled (slowed down) but continues running. If a container exceeds its memory limit, it gets killed with an OOMKilled (Out Of Memory) status and restarted according to the pod's restart policy. Setting requests too low lets the scheduler pack too many pods onto a node, which causes contention and evictions under load, while setting requests too high reserves capacity that goes unused, wasting cluster resources and increasing costs. Limits set far above actual usage do not reserve anything, but they weaken the protection against a runaway container.
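
The scheduling check described above can be sketched in a few lines. This is a simplified illustration, not the actual scheduler logic: the `Resources` type, the `fits_on_node` helper, and the node figures are all hypothetical, and the real scheduler applies many more filters than CPU and memory.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    """CPU in millicores and memory in MiB (hypothetical helper type)."""
    cpu_millicores: int
    memory_mib: int


def fits_on_node(pod_request: Resources,
                 node_allocatable: Resources,
                 already_requested: Resources) -> bool:
    """Return True if the node's unreserved capacity covers the pod's requests.

    Only requests are compared; limits and actual usage play no part here.
    """
    free_cpu = node_allocatable.cpu_millicores - already_requested.cpu_millicores
    free_mem = node_allocatable.memory_mib - already_requested.memory_mib
    return (pod_request.cpu_millicores <= free_cpu
            and pod_request.memory_mib <= free_mem)


# Assumed example node: 4 cores / 16 GiB allocatable, 3.6 cores / 8 GiB already requested.
node = Resources(cpu_millicores=4000, memory_mib=16384)
reserved = Resources(cpu_millicores=3600, memory_mib=8192)

print(fits_on_node(Resources(250, 512), node, reserved))  # True
print(fits_on_node(Resources(500, 512), node, reserved))  # False: only 400m CPU is unreserved
```

In this sketch the second pod would stay Pending even if the node were idle at that moment, because the decision is based on reserved requests rather than live usage.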

Question 2

How does CPU throttling work in Kubernetes?

Accepted Answer

CPU throttling in Kubernetes is enforced by the Linux kernel CFS (Completely Fair Scheduler) when a container attempts to use more CPU than its limit allows. CFS operates on a quota and period system, where each container gets a CPU time quota per scheduling period (typically 100 milliseconds). If a container with a 500 millicore limit exhausts its 50ms quota within a 100ms period, it is throttled for the rest of that period regardless of available CPU on the node. Because the quota counts CPU time across all threads, a multi-threaded container can exhaust it well before the period ends. This can cause latency spikes even when the node has spare CPU capacity. The container_cpu_cfs_throttled_seconds_total metric, exposed by cAdvisor, shows how much time a container has spent throttled. Some teams intentionally omit CPU limits to prevent throttling, relying on requests for scheduling while allowing pods to burst freely, though this requires careful capacity planning to avoid noisy neighbor problems.
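
The quota arithmetic above can be worked through with a short sketch. It assumes a deliberately simplified model in which every thread of the container is busy for the whole period; the function names are hypothetical and real workloads are burstier than this.

```python
def cfs_quota_ms(limit_millicores: int, period_ms: float = 100.0) -> float:
    """CPU time the container may consume per period, e.g. 500m -> 50 ms per 100 ms."""
    return limit_millicores / 1000.0 * period_ms


def throttled_wall_ms(limit_millicores: int,
                      busy_threads: int,
                      period_ms: float = 100.0) -> float:
    """Wall-clock time per period spent throttled, assuming all threads stay busy.

    With N busy threads the quota is consumed N times faster than wall time,
    so the container runs for quota / N and is throttled for the remainder.
    """
    if busy_threads < 1:
        raise ValueError("busy_threads must be at least 1")
    runtime_wall_ms = cfs_quota_ms(limit_millicores, period_ms) / busy_threads
    return max(0.0, period_ms - runtime_wall_ms)


print(cfs_quota_ms(500))           # 50.0
print(throttled_wall_ms(500, 1))   # 50.0
print(throttled_wall_ms(500, 4))   # 87.5
print(throttled_wall_ms(2000, 1))  # 0.0
```

The four-thread case shows why throttling can be severe for multi-threaded services: the same 50ms quota is used up after 12.5ms of wall-clock time.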

Question 3

What is resource overcommitment and when is it appropriate?

Accepted Answer

Resource overcommitment occurs when the total resource limits across all pods on a node exceed the node capacity. This is possible because the scheduler places pods based on requests, and limits represent the maximum a pod might use, not what it typically uses. The overcommit ratio (limits divided by requests) indicates how aggressively resources are overcommitted. A ratio of 2.0 means pods can potentially use twice what they requested. Moderate overcommitment of 1.5 to 2.0 is common and generally safe for CPU because throttling handles contention gracefully. Memory overcommitment is riskier because memory cannot be throttled: a container that exceeds its limit is OOMKilled, and when the node itself runs short of memory, pods can be evicted or killed even if they stayed within their limits. Production workloads should keep memory overcommit ratios below 1.5. Development and testing environments can tolerate higher overcommit ratios since the consequences of occasional OOMKilled events are less severe.
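
As a rough illustration, the overcommit ratio can be computed per resource from the pods on a node. The pod names and figures below are made up for the example, and `overcommit_ratio` is a hypothetical helper rather than anything Kubernetes provides.

```python
def overcommit_ratio(total_limits: float, total_requests: float) -> float:
    """Limits divided by requests, following the definition in the text."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    return total_limits / total_requests


# Hypothetical pods on one node (CPU in millicores, memory in MiB).
pods = [
    {"name": "api",    "cpu_req": 250, "cpu_lim": 500,  "mem_req": 512,  "mem_lim": 640},
    {"name": "worker", "cpu_req": 500, "cpu_lim": 1000, "mem_req": 1024, "mem_lim": 1280},
    {"name": "cache",  "cpu_req": 250, "cpu_lim": 500,  "mem_req": 256,  "mem_lim": 320},
]

cpu_ratio = overcommit_ratio(sum(p["cpu_lim"] for p in pods),
                             sum(p["cpu_req"] for p in pods))
mem_ratio = overcommit_ratio(sum(p["mem_lim"] for p in pods),
                             sum(p["mem_req"] for p in pods))

print(cpu_ratio)  # 2.0
print(mem_ratio)  # 1.25
```

With these numbers the CPU ratio sits at the upper end of the moderate range, while the memory ratio stays under the 1.5 guideline for production.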

Question 4

What monitoring tools should I use to optimize resource allocation?

Accepted Answer

A comprehensive resource monitoring stack includes several components working together. Metrics Server provides near-real-time CPU and memory metrics used by kubectl top and the Horizontal Pod Autoscaler. Prometheus scrapes detailed time-series metrics from containers, nodes, and custom application endpoints, storing historical data for trend analysis. Grafana dashboards visualize resource utilization patterns and help identify over-provisioned or under-provisioned pods. The Vertical Pod Autoscaler (VPA) analyzes historical usage and recommends or automatically adjusts resource requests and limits. Kubernetes Resource Report and Kubecost provide cost visibility by correlating resource usage with cloud provider pricing. Alert on key metrics such as CPU throttling, memory pressure, OOMKilled events, and pod pending duration to catch resource issues before they affect application performance.
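
To make the idea of over-provisioned and under-provisioned pods concrete, here is a minimal sketch that compares a high percentile of observed usage against the request. The 95th percentile and the 0.5 / 0.9 thresholds are assumptions chosen for illustration, not values taken from any of the tools above, and the usage samples are invented.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def classify(usage_samples: list[float], request: float,
             low: float = 0.5, high: float = 0.9) -> str:
    """Label a workload by how its 95th-percentile usage compares to its request."""
    utilization = percentile(usage_samples, 95) / request
    if utilization < low:
        return "over-provisioned"
    if utilization > high:
        return "under-provisioned"
    return "ok"


# Invented CPU usage samples in millicores, e.g. exported from a metrics store.
cpu_usage = [100, 120, 110, 130, 125, 115, 105, 140, 135, 128]

print(classify(cpu_usage, request=500))  # over-provisioned
print(classify(cpu_usage, request=200))  # ok
print(classify(cpu_usage, request=150))  # under-provisioned
```

A percentile is used rather than the mean so that short bursts are reflected in the result; the right percentile and thresholds depend on how much headroom a workload needs.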
