API Latency Calculator
Calculate expected API latency from network RTT, processing time, and payload size. Enter your values for instant results with a step-by-step breakdown of the calculation.
Formula
Total Latency = DNS + TLS_Overhead + RTT + Processing + DB_Query + Transfer_Time + Hop_Overhead
Total latency sums all sequential components: DNS lookup, TLS handshake overhead (2x RTT for TLS 1.2), network round-trip time, server processing, database query time, data transfer time (payload_size / bandwidth), and additional overhead per service hop. Percentiles are estimated using multipliers over the baseline: P95 = 1.8x, P99 = 2.5x, P99.9 = 4.0x.
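The formula above can be sketched as a small function. The TLS multiplier (2x RTT for TLS 1.2), 5 ms DNS lookup, and percentile multipliers follow the description above; the function models a single hop with TLS on a fresh connection.

```python
def api_latency_ms(rtt_ms, processing_ms, db_ms, payload_kb, bandwidth_mbps,
                   tls=True, dns_ms=5.0):
    """Estimate total API latency by summing the sequential components.

    Assumes TLS 1.2 (handshake = 2 x RTT) and a 5 ms DNS lookup by default.
    """
    tls_ms = 2 * rtt_ms if tls else 0.0
    # Transfer time: payload in kilobits over bandwidth in kbit/s, in ms.
    transfer_ms = (payload_kb * 8) / (bandwidth_mbps * 1000) * 1000
    total = dns_ms + tls_ms + rtt_ms + processing_ms + db_ms + transfer_ms
    return {
        "total": total,
        "ttfb": dns_ms + tls_ms + rtt_ms + processing_ms,
        "p95": total * 1.8,
        "p99": total * 2.5,
        "p999": total * 4.0,
    }

# Example 1 below: 60 ms RTT, 100 ms processing, 30 ms DB query,
# 50 KB payload on 100 Mbps, with TLS.
est = api_latency_ms(60, 100, 30, 50, 100)
```

Running it on the Example 1 inputs reproduces the worked numbers: 319 ms total, 285 ms TTFB, and a P99 estimate near 798 ms.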
Worked Examples
Example 1: Cross-Region REST API Call
Problem: An API request from US East to US West has 60ms RTT, 100ms server processing, 30ms database query, 50 KB payload on 100 Mbps bandwidth, with TLS handshake, single hop, and 100 concurrent requests.
Solution:
DNS lookup = 5ms
TLS overhead = 60 x 2 = 120ms
Network RTT = 60ms
Server processing = 100ms
DB query = 30ms
Transfer time = (50 x 8) / (100 x 1000) x 1000 = 4ms
Total latency = 5 + 120 + 60 + 100 + 30 + 4 = 319ms
P95 = 319 x 1.8 = 574ms
P99 = 319 x 2.5 = 798ms
TTFB = 5 + 120 + 60 + 100 = 285ms
Max RPS = 1000/319 x 100 = 313
Result: 319ms total latency | 285ms TTFB | P99 at 798ms | 313 max RPS | Network is 59% of total
Example 2: Same-Datacenter Microservice Call
Problem: Internal API call with 1ms RTT, 15ms processing, 5ms DB query, 5 KB payload on 10 Gbps network, no TLS, 3 hops, 500 concurrent requests.
Solution:
DNS lookup = 5ms
TLS overhead = 0ms (no TLS)
Network RTT = 1ms
Server processing = 15ms
DB query = 5ms
Transfer time = (5 x 8) / (10000 x 1000) x 1000 = 0.004ms
Single hop = 5 + 0 + 1 + 15 + 5 + 0 = 26ms
Additional hops = 2 x (1ms RTT + 10ms hop processing) = 22ms
Total = 26 + 22 = 48ms
Max RPS = 1000/48 x 500 = 10,416
Result: 48ms total (3 hops) | 21ms TTFB | P99 at 120ms | 10,416 max RPS | Server processing dominates
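The multi-hop arithmetic above can be sketched as follows. The 10 ms of processing added per extra hop matches this example's model and is not a universal constant.

```python
def multi_hop_total_ms(single_hop_ms, hops, hop_rtt_ms, hop_processing_ms=10.0):
    """Add per-hop overhead (one extra RTT plus processing time)
    for each hop beyond the first, per the model in Example 2."""
    extra_hops = hops - 1
    return single_hop_ms + extra_hops * (hop_rtt_ms + hop_processing_ms)

# Example 2: single-hop latency of 26 ms, 3 hops, 1 ms internal RTT.
total = multi_hop_total_ms(26, 3, 1)    # 26 + 2 x 11 = 48 ms
max_rps = 1000 / total * 500            # at 500 concurrent requests
```

With 3 hops the total lands at 48 ms, which at 500 concurrent requests supports roughly 10,416 requests per second.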
Frequently Asked Questions
What components make up total API latency?
Total API latency comprises several distinct components that, for a single request, accumulate largely sequentially. DNS resolution (1-50ms) translates the domain name to an IP address. TLS handshake (1-3 RTTs for initial connection) establishes an encrypted connection. Network round-trip time (RTT) varies from 1ms for same-datacenter calls to 200+ms for cross-continent requests. Server processing includes request parsing, business logic execution, and response serialization. Database queries add their own latency depending on query complexity and data volume. Data transfer time depends on payload size and available bandwidth. For multi-hop architectures where APIs call other APIs, each hop adds another RTT plus processing time. Connection pooling and keep-alive connections eliminate DNS and TLS overhead for subsequent requests on established connections.
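The last point is easy to quantify: on a pooled (keep-alive) connection, the DNS and TLS terms simply drop out of the sum. A minimal comparison, reusing the figures from Example 1:

```python
# Component figures from Example 1 (milliseconds).
rtt, processing, db, transfer = 60, 100, 30, 4
dns, tls = 5, 2 * rtt   # TLS 1.2 handshake = 2 round trips

cold = dns + tls + rtt + processing + db + transfer   # fresh connection
warm = rtt + processing + db + transfer               # pooled connection

# cold = 319 ms, warm = 194 ms: connection reuse saves 125 ms per request.
```

For high-RTT links the savings grow with the RTT, since the TLS term is proportional to it.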
What is the difference between P50, P95, P99, and P99.9 latency percentiles?
Latency percentiles describe the distribution of response times across all requests. P50 (median) is the latency that 50 percent of requests complete within, representing the typical user experience. P95 means 95 percent of requests complete within this time, capturing the experience of most users including occasional slow requests. P99 captures the experience of 1 in 100 requests, which for a service handling millions of requests affects thousands of users daily. P99.9 represents the worst 0.1 percent of requests and is important for critical services where even rare slow responses have business impact. In practice, P99 latency is often 2-5x higher than P50 due to garbage collection pauses, cache misses, database lock contention, and network jitter. SLAs are typically defined at P95 or P99 rather than average latency because averages hide problematic tail latency that affects real users.
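The multipliers this calculator uses are rough estimates; with real measurements you would compute percentiles directly from observed latencies. A nearest-rank sketch using only the standard library:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest value such that at least
    p percent of samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

# 100 synthetic latency samples of 1..100 ms for a deterministic demo.
latencies_ms = list(range(1, 101))
p50 = percentile(latencies_ms, 50)   # 50
p95 = percentile(latencies_ms, 95)   # 95
p99 = percentile(latencies_ms, 99)   # 99
```

Production systems usually track these with streaming sketches (e.g. t-digest or HDR histograms) rather than sorting raw samples, but the definition is the same.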
How does TLS handshake overhead impact API latency?
The TLS handshake adds significant overhead to the first request on a new connection. A full TLS 1.2 handshake requires 2 round trips between client and server, adding 2x RTT to latency. TLS 1.3 improved this to a single round trip for the initial handshake. For resumed connections, TLS 1.3 supports 0-RTT resumption, eliminating handshake latency entirely for returning clients. The overhead is most impactful for short-lived connections and high-RTT scenarios. With 200ms RTT, a TLS 1.2 handshake adds 400ms, which can dominate total latency. HTTP/2 and HTTP/3 multiplexing help by maintaining persistent connections that reuse the initial TLS session. In practice, connection pooling and keep-alive settings ensure that most requests in production use established connections, limiting TLS overhead to the initial connection or periodic renegotiation.
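The handshake cost described above is just round trips multiplied by RTT, which makes the TLS 1.2 vs 1.3 difference easy to tabulate. This is a sketch: real handshakes also spend a few milliseconds on cryptographic work, which is ignored here.

```python
# Handshake round trips per TLS mode (0-RTT applies to resumed sessions).
HANDSHAKE_RTTS = {"tls1.2": 2, "tls1.3": 1, "tls1.3-0rtt": 0}

def tls_overhead_ms(mode, rtt_ms):
    """TLS handshake latency modeled as round trips x RTT."""
    return HANDSHAKE_RTTS[mode] * rtt_ms

# On a 200 ms RTT link: TLS 1.2 adds 400 ms, TLS 1.3 adds 200 ms,
# and 0-RTT resumption adds nothing.
slow_link = {mode: tls_overhead_ms(mode, 200) for mode in HANDSHAKE_RTTS}
```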
How do I optimize API latency for high-throughput services?
High-throughput API optimization requires addressing bottlenecks at every layer. At the network layer, use HTTP/2 or HTTP/3 for multiplexed connections, enable compression (gzip or brotli) for text payloads, and minimize payload sizes by returning only necessary fields. At the application layer, implement efficient serialization formats like Protocol Buffers or MessagePack instead of JSON for internal APIs, use asynchronous processing for non-critical operations, and optimize hot code paths identified through profiling. At the database layer, add appropriate indexes, implement query result caching with Redis, and use connection pooling to eliminate connection establishment overhead. Architecture-level optimizations include deploying services geographically close to their consumers, using CDNs for cacheable responses, and implementing circuit breakers to prevent cascading failures from slow dependencies.
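As a quick illustration of the compression point, gzip shrinks repetitive JSON dramatically. This uses a synthetic payload, on the assumption that real API responses with repeated keys compress similarly well; actual ratios depend on the data.

```python
import gzip
import json

# Synthetic, highly repetitive JSON payload (assumed representative of
# list endpoints that repeat the same keys on every item).
payload = json.dumps(
    [{"id": i, "status": "active", "region": "us-east-1"} for i in range(500)]
).encode()

compressed = gzip.compress(payload)
ratio = len(compressed) / len(payload)
# Repetitive JSON commonly compresses to a small fraction of its size,
# cutting transfer time at the cost of a little CPU on each side.
```

The CPU cost of compression is usually a few milliseconds at most, so it pays off whenever transfer time exceeds that.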
What is the impact of payload size on API response time?
Payload size affects latency through transfer time and serialization overhead. Transfer time is calculated as payload size divided by available bandwidth, which is negligible for small payloads on fast networks but significant for large responses on slower connections. A 1 MB JSON response takes 80ms to transfer on a 100 Mbps connection but 8 seconds on a 1 Mbps mobile connection. Serialization and deserialization of large payloads also consume CPU time on both server and client. JSON parsing of a 1 MB payload can take 10-50ms depending on the parser and hardware. Pagination is essential for endpoints that could return large datasets, typically limiting responses to 50-100 items per page. Compression reduces transfer time by 60-80 percent for text-based formats but adds 1-5ms of CPU overhead for compression and decompression. GraphQL helps by allowing clients to request only the specific fields they need, reducing unnecessary data transfer.
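The transfer-time figures above come directly from payload size over bandwidth (using decimal units, 1 MB = 1000 KB, and ignoring protocol overhead and TCP slow start):

```python
def transfer_ms(payload_kb, bandwidth_mbps):
    """Time to move payload_kb kilobytes over a bandwidth_mbps link,
    in milliseconds. Ignores protocol overhead and TCP slow start."""
    return payload_kb * 8 / (bandwidth_mbps * 1000) * 1000

fast = transfer_ms(1000, 100)   # 1 MB on 100 Mbps -> 80 ms
slow = transfer_ms(1000, 1)     # 1 MB on 1 Mbps   -> 8000 ms (8 s)
```

Real transfers are somewhat slower than this floor because of TCP slow start and packet overhead, but the proportions hold.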
How does database query latency contribute to API response time?
Database queries often represent 30-60 percent of total server-side processing time in data-driven APIs. Simple indexed lookups typically complete in 1-5ms, while complex joins, aggregations, or full-table scans can take hundreds of milliseconds or even seconds. Connection acquisition from the pool adds 0.1-5ms depending on pool utilization. Network latency between the application server and database server adds another 0.5-5ms for same-datacenter deployments. The most impactful optimizations are proper indexing (which can reduce query time by 100-1000x), query result caching for frequently accessed data, and avoiding the N+1 query problem where an API endpoint executes one query per item in a list instead of a single batch query. Read replicas can distribute query load and reduce latency for read-heavy workloads by serving queries from a replica geographically closer to the application server.
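The N+1 cost is easy to model: every per-item query pays the app-to-database round trip again. A back-of-the-envelope sketch, where the 1 ms network hop and 2 ms indexed query time are assumed figures for a same-datacenter setup:

```python
DB_RTT_MS = 1.0    # app <-> database network round trip (assumed)
QUERY_MS = 2.0     # simple indexed lookup (assumed)

def n_plus_one_ms(items):
    """One list query, then one follow-up query per item."""
    return (items + 1) * (DB_RTT_MS + QUERY_MS)

def batched_ms():
    """One list query plus a single batched lookup (e.g. WHERE id IN (...))."""
    return 2 * (DB_RTT_MS + QUERY_MS)

# For a 50-item page: 153 ms with N+1 queries vs 6 ms batched,
# before any caching is applied.
```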