Question 1

What components make up total API latency?

Accepted Answer

Total API latency comprises several distinct components. Most of them add up sequentially, but some can overlap: independent downstream calls or database queries issued concurrently contribute only as much as the slowest one. DNS resolution (roughly 1-50ms, or near zero when the result is cached) translates the domain name to an IP address. The TCP handshake adds 1 RTT before any data can be sent. The TLS handshake then adds 1 RTT with TLS 1.3 or 2 RTTs with TLS 1.2 to establish an encrypted connection. Network round-trip time (RTT) ranges from about 1ms for same-datacenter calls to 200ms or more for cross-continent requests, and every request/response exchange costs at least one RTT. Server processing includes request parsing, business logic execution, and response serialization. Database queries add latency that depends on query complexity and data volume. Data transfer time depends on payload size and available bandwidth. In multi-hop architectures where APIs call other APIs, each hop adds another RTT plus its own processing time. Connection pooling and keep-alive connections skip the DNS lookup and the TCP and TLS handshakes for subsequent requests on an established connection.
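The components above can be combined into a simple additive model. This is a minimal, illustrative sketch rather than a measurement tool: the `LatencyBudget` class and its field names are hypothetical, every input is an assumed value in milliseconds, and the model ignores TCP slow start, packet loss, and queueing. It assumes a 1-RTT TCP handshake before TLS, and it treats concurrent database queries as costing only their maximum while everything else adds up sequentially.

```python
from dataclasses import dataclass, field


@dataclass
class LatencyBudget:
    """Hypothetical additive latency model; all times in milliseconds."""
    rtt_ms: float                  # client <-> server round-trip time
    dns_ms: float = 0.0            # DNS lookup (0 when cached)
    tls_round_trips: int = 1       # 1 for TLS 1.3, 2 for TLS 1.2
    server_ms: float = 0.0         # parsing, business logic, serialization
    parallel_db_ms: list[float] = field(default_factory=list)  # queries run concurrently
    payload_bytes: int = 0         # response body size
    bandwidth_mbps: float = 100.0  # available bandwidth in megabits per second

    def transfer_ms(self) -> float:
        # bytes -> bits, then divide by bits per millisecond
        return self.payload_bytes * 8 * 1000 / (self.bandwidth_mbps * 1_000_000)

    def total_ms(self, new_connection: bool) -> float:
        # Concurrent queries overlap, so only the slowest one adds to the total.
        db_ms = max(self.parallel_db_ms, default=0.0)
        # One RTT for the request/response exchange itself.
        total = self.rtt_ms + self.server_ms + db_ms + self.transfer_ms()
        if new_connection:
            # DNS lookup, TCP handshake (1 RTT), then TLS handshake (1 or 2 RTTs).
            total += self.dns_ms + self.rtt_ms + self.tls_round_trips * self.rtt_ms
        return total


cross_region = LatencyBudget(rtt_ms=200, dns_ms=50, tls_round_trips=2, server_ms=20,
                             parallel_db_ms=[15, 40], payload_bytes=50_000)
print(cross_region.total_ms(new_connection=True))   # 914.0
print(cross_region.total_ms(new_connection=False))  # 264.0
```

With these assumed inputs, 650ms of the 914ms cold-connection total is DNS plus handshakes, which is exactly the overhead that keep-alive and connection pooling remove on later requests.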

Question 2

What is the difference between P50, P95, P99, and P99.9 latency percentiles?

Accepted Answer

Latency percentiles describe the distribution of response times across all requests. P50 (the median) is the latency within which 50 percent of requests complete, representing the typical user experience. P95 is the latency within which 95 percent of requests complete; it captures the experience of most users, including occasional slow requests. P99 is the latency within which 99 percent of requests complete, so 1 in 100 requests is slower; for a service handling millions of requests per day, that means tens of thousands of slow responses daily. P99.9 is exceeded only by the slowest 0.1 percent of requests and matters for critical services where even rare slow responses have business impact. In practice, P99 latency is often 2-5x higher than P50, sometimes more, because of garbage collection pauses, cache misses, database lock contention, and network jitter. SLAs are typically defined at P95 or P99 rather than average latency because averages hide the tail latency that real users experience.
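To see how an average can hide tail latency, the sketch below computes nearest-rank percentiles over a synthetic set of 1,000 latency samples. The `percentile` helper and the sample distribution are assumptions for illustration; monitoring systems often estimate percentiles from histograms or use interpolating definitions (such as Python's `statistics.quantiles`), which can give slightly different values.

```python
import math
import statistics


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Synthetic latencies in ms: mostly fast, with a small slow tail.
samples = [20.0] * 980 + [300.0] * 15 + [2000.0] * 5

print(f"mean  {statistics.mean(samples):.1f} ms")
for p in (50, 95, 99, 99.9):
    print(f"P{p:<5} {percentile(samples, p):.0f} ms")
```

Here the mean is about 34ms and both P50 and P95 are 20ms, yet P99 is 300ms and P99.9 is 2,000ms, because 20 of the 1,000 requests are slow. Only the percentiles reveal that tail.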

Question 3

How does TLS handshake overhead impact API latency?

Accepted Answer

The TLS handshake adds significant overhead to the first request on a new connection, on top of the 1-RTT TCP handshake that precedes it. A full TLS 1.2 handshake requires 2 round trips between client and server, adding 2x RTT to latency. TLS 1.3 reduced the initial handshake to a single round trip. For resumed sessions, TLS 1.3 supports 0-RTT early data, which lets a returning client send its request in the first flight and removes the wait for the TLS handshake; because early data can be replayed, it should be limited to idempotent requests. The overhead matters most for short-lived connections and high-RTT paths: with a 200ms RTT, a TLS 1.2 handshake alone adds 400ms, which can dominate total latency. HTTP/2 multiplexes many requests over one persistent connection, so the handshake cost is paid once and shared, and HTTP/3 runs over QUIC, which combines the transport and TLS 1.3 handshakes into a single round trip. In practice, connection pooling and keep-alive settings ensure that most production requests use established connections, limiting TLS overhead to new connections and to reconnections after idle timeouts (TLS 1.3 removed renegotiation entirely).
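The round-trip arithmetic above can be tabulated for different connection setups. This sketch assumes a fixed RTT, no packet loss, no TCP Fast Open, and zero server time unless one is passed in; the setup labels and the `time_to_first_response_ms` helper are hypothetical names chosen for illustration.

```python
# Round trips spent on connection setup before the server can answer a request.
# Assumes no TCP Fast Open and no packet loss.
SETUP_ROUND_TRIPS = {
    "tcp + tls1.2": 1 + 2,        # TCP handshake, then full TLS 1.2 handshake
    "tcp + tls1.3": 1 + 1,        # TCP handshake, then TLS 1.3 handshake
    "tcp + tls1.3 0-rtt": 1 + 0,  # resumed session: request rides with ClientHello
    "quic (http/3)": 1,           # QUIC merges the transport and TLS 1.3 handshakes
    "quic 0-rtt": 0,              # resumed QUIC session sending early data
    "reused connection": 0,       # keep-alive or pooled connection
}


def time_to_first_response_ms(setup: str, rtt_ms: float, server_ms: float = 0.0) -> float:
    """Setup round trips + one request/response round trip + server time."""
    return (SETUP_ROUND_TRIPS[setup] + 1) * rtt_ms + server_ms


if __name__ == "__main__":
    for setup in SETUP_ROUND_TRIPS:
        print(f"{setup:<20} {time_to_first_response_ms(setup, rtt_ms=200):>6.0f} ms")
```

With a 200ms RTT the output ranges from 800ms for a fresh TCP + TLS 1.2 connection down to 200ms for a reused connection, which illustrates why connection reuse has such a large effect on high-RTT paths.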

Question 4

How do I optimize API latency for high-throughput services?

Accepted Answer

High-throughput API optimization requires addressing bottlenecks at every layer. At the network layer, use HTTP/2 or HTTP/3 for multiplexed connections, enable compression (gzip or brotli) for text payloads, and minimize payload sizes by returning only necessary fields. At the application layer, implement efficient serialization formats like Protocol Buffers or MessagePack instead of JSON for internal APIs, use asynchronous processing for non-critical operations, and optimize hot code paths identified through profiling. At the database layer, add appropriate indexes, implement query result caching with Redis, and use connection pooling to eliminate connection establishment overhead. Architecture-level optimizations include deploying services geographically close to their consumers, using CDNs for cacheable responses, and implementing circuit breakers to prevent cascading failures from slow dependencies.
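Two of the network-layer suggestions, returning only necessary fields and compressing text payloads, are easy to measure with the Python standard library. The records below are synthetic and the `encoded_size` helper is hypothetical; real savings depend on how repetitive the actual payloads are, so treat this as a way to measure your own responses rather than a benchmark.

```python
import gzip
import json

# Synthetic records standing in for a real API response (hypothetical fields).
records = [
    {
        "id": i,
        "name": f"user-{i}",
        "email": f"user-{i}@example.com",
        "bio": "Lorem ipsum dolor sit amet. " * 10,
        "preferences": {"theme": "dark", "locale": "en-US"},
    }
    for i in range(200)
]


def encoded_size(payload, fields=None, compress=False):
    """Size in bytes of the JSON body, optionally trimmed to `fields` and gzip-compressed."""
    if fields is not None:
        # Keep only the fields the client actually asked for.
        payload = [{key: record[key] for key in fields} for record in payload]
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return len(gzip.compress(body)) if compress else len(body)


full = encoded_size(records)
trimmed = encoded_size(records, fields=("id", "name"))
full_gz = encoded_size(records, compress=True)
trimmed_gz = encoded_size(records, fields=("id", "name"), compress=True)

for label, size in [("full", full), ("trimmed", trimmed),
                    ("full + gzip", full_gz), ("trimmed + gzip", trimmed_gz)]:
    print(f"{label:<15} {size:>7,} bytes")
```

Exact byte counts depend on the data, but both techniques shrink the body, and they combine because compression is applied after trimming. Smaller bodies shorten the data transfer component of latency, especially on constrained links.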

API Latency Calculator

Formula

Worked Examples

Example 1: Cross-Region REST API Call

Example 2: Same-Datacenter Microservice Call
