API Latency Calculator

Calculate expected API response latency from network RTT, server processing time, database queries, payload size, and TLS overhead. Estimate percentile latencies and maximum throughput.

Last updated: December 2025

Calculator

Total API Latency: 289.0 ms (TTFB: 255.0 ms | Under load: 317.6 ms)
P50: 289.0 ms | P95: 520.2 ms | P99: 722.5 ms | P99.9: 1156.0 ms
Latency Breakdown: 55.0% Net | 45.0% Server
Max RPS: 346 | Throughput: 16.9 MB/s | TLS Overhead: 100.0 ms
Tip: Use HTTP/2 with connection pooling to eliminate TLS overhead on subsequent requests. Deploy services closer to consumers to reduce RTT.
Understand the Math

Formula

Total Latency = DNS + TLS_Overhead + RTT + Processing + DB_Query + Transfer_Time + Hop_Overhead

Total latency sums all sequential components: DNS lookup, TLS handshake overhead (2x RTT for TLS 1.2), network round-trip time, server processing, database query time, data transfer time (payload_size / bandwidth), and additional overhead per service hop. Percentiles are estimated using multipliers over the baseline: P95 = 1.8x, P99 = 2.5x, P99.9 = 4.0x.
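
The formula can be sketched as a small Python function. The function and parameter names are hypothetical, and two defaults are assumptions inferred from the worked examples rather than stated in the formula: a 5 ms DNS lookup, and a cost of RTT + 10 ms for each hop beyond the first.

```python
def total_latency_ms(rtt_ms, processing_ms, db_ms, payload_kb, bandwidth_mbps,
                     tls=True, hops=1, dns_ms=5.0, hop_processing_ms=10.0):
    """Estimate total latency in ms by summing the sequential components."""
    # Full TLS 1.2 handshake costs two round trips; no TLS costs nothing.
    tls_ms = 2 * rtt_ms if tls else 0.0
    # KB -> kilobits, Mbps -> kilobits per second, then seconds -> ms.
    transfer_ms = (payload_kb * 8) / (bandwidth_mbps * 1000) * 1000
    # Assumed cost of every hop after the first: one RTT plus fixed processing.
    hop_ms = (hops - 1) * (rtt_ms + hop_processing_ms)
    return dns_ms + tls_ms + rtt_ms + processing_ms + db_ms + transfer_ms + hop_ms


def estimate_percentiles(baseline_ms):
    """Apply the fixed multipliers quoted above to the baseline latency."""
    multipliers = {"P50": 1.0, "P95": 1.8, "P99": 2.5, "P99.9": 4.0}
    return {name: baseline_ms * m for name, m in multipliers.items()}


if __name__ == "__main__":
    total = total_latency_ms(rtt_ms=60, processing_ms=100, db_ms=30,
                             payload_kb=50, bandwidth_mbps=100)
    print(total, estimate_percentiles(total))
```

The multipliers are a fixed rule of thumb; measured percentiles from production traffic can differ substantially.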

Last reviewed: December 2025

Worked Examples

Example 1: Cross-Region REST API Call

An API request from US East to US West has 60ms RTT, 100ms server processing, 30ms database query, 50 KB payload on 100 Mbps bandwidth, with TLS handshake, single hop, and 100 concurrent requests.
Solution:
DNS lookup = 5ms
TLS overhead = 60 x 2 = 120ms
Network RTT = 60ms
Server processing = 100ms
DB query = 30ms
Transfer time = (50 x 8) / (100 x 1000) x 1000 = 4ms
Total latency = 5 + 120 + 60 + 100 + 30 + 4 = 319ms
P95 = 319 x 1.8 = 574ms
P99 = 319 x 2.5 = 798ms
TTFB = 5 + 120 + 60 + 100 = 285ms
Max RPS = 1000/319 x 100 = 313
Result: 319ms total latency | 285ms TTFB | P99 at 798ms | 313 max RPS | Network is 59% of total

Example 2: Same-Datacenter Microservice Call

Internal API call with 1ms RTT, 15ms processing, 5ms DB query, 5 KB payload on 10 Gbps network, no TLS, 3 hops, 500 concurrent requests.
Solution:
DNS lookup = 5ms
TLS overhead = 0ms (no TLS)
Network RTT = 1ms
Server processing = 15ms
DB query = 5ms
Transfer time = (5 x 8) / (10000 x 1000) x 1000 = 0.004ms
Single hop = 5 + 0 + 1 + 15 + 5 + 0 = 26ms
Additional hops = 2 x (1 + 10) = 22ms
Total = 26 + 22 = 48ms
Max RPS = 1000/48 x 500 = 10,416
Result: 48ms total (3 hops) | 21ms TTFB | P99 at 120ms | 10,416 max RPS | Server processing dominates
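
The throughput and network-share figures in both examples can be reproduced with a short sketch. The function names are hypothetical, and counting DNS, TLS, RTT, and transfer time as "network" is an assumption inferred from the 59% figure in Example 1.

```python
def max_rps(latency_ms, concurrency):
    """Upper bound on requests per second with `concurrency` requests in flight."""
    return 1000.0 / latency_ms * concurrency


def network_share(dns_ms, tls_ms, rtt_ms, transfer_ms, total_ms):
    """Fraction of total latency spent in the (assumed) network components."""
    return (dns_ms + tls_ms + rtt_ms + transfer_ms) / total_ms


if __name__ == "__main__":
    print(int(max_rps(319, 100)))                           # Example 1 -> 313
    print(int(max_rps(48, 500)))                            # Example 2 -> 10416
    print(round(network_share(5, 120, 60, 4, 319) * 100))   # Example 1 -> 59
```

This bound assumes every in-flight request takes exactly the estimated latency and that nothing else (CPU, connection limits, queueing) caps throughput first.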
Expert Insights

Background & Theory

The API Latency Calculator applies the following established principles and formulas. Computers represent all information using binary, a base-2 number system consisting solely of the digits 0 and 1, each called a bit. Because long binary strings are unwieldy, programmers routinely use octal (base 8) and hexadecimal (base 16) as compact shorthand. Converting between bases follows a consistent algorithm: divide the source number repeatedly by the target base, collecting remainders in reverse order. Hexadecimal digits A through F represent the values 10 through 15, allowing a single character to encode four binary bits, making it the preferred notation for memory addresses, color codes, and bytecode.

Bitwise operations manipulate individual bits within integers. AND produces a 1 only when both input bits are 1, making it useful for masking. OR produces a 1 when either bit is 1 and is used for combining flags. XOR flips bits that differ, enabling simple toggle logic and efficient swap algorithms. NOT inverts every bit (one's complement), while left and right shifts multiply or divide by powers of two in constant time.

Data storage units ascend in binary multiples of 1024: 8 bits form one byte, 1024 bytes form one kibibyte (KiB), 1024 KiB form one mebibyte (MiB), and so forth. Hard-drive manufacturers historically use decimal prefixes (1 KB = 1000 bytes), creating the persistent confusion between binary and decimal interpretations of the same label. The IEC standardized the binary prefixes KiB, MiB, GiB, and TiB in 1998 to resolve this ambiguity.

Network bandwidth is measured in bits per second (bps), most commonly megabits per second (Mbps) or gigabits per second (Gbps). A 100 Mbps connection transfers 100 million bits every second, equating to roughly 12.5 megabytes per second. IP subnet masks define network boundaries; CIDR notation appends a prefix length (e.g., /24) to an address, indicating how many leading bits are fixed. A /24 subnet contains 256 addresses with 254 usable hosts.

Algorithm efficiency is described using Big-O notation, which characterises the worst-case growth of time or space relative to input size. O(1) is constant, O(log n) is logarithmic (binary search), O(n) is linear, and O(n²) is quadratic. Cryptographic hash functions like SHA-256 produce a fixed 256-bit (32-byte) digest regardless of input length. File compression algorithms exploit statistical redundancy to reduce storage footprint, and compression ratio equals the original file size divided by the compressed size.
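
The repeated-division algorithm and two of the unit facts from the background text can be illustrated with a minimal sketch. The function name `to_base` is hypothetical, and the sketch only handles non-negative integers and bases 2 through 16.

```python
def to_base(n, base):
    """Convert a non-negative integer to a digit string by repeated division."""
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = "0123456789ABCDEF"
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)   # remainder is the next lowest digit
        out.append(digits[remainder])
    return "".join(reversed(out))        # remainders are read in reverse order


if __name__ == "__main__":
    print(to_base(255, 16))      # FF
    print(to_base(10, 2))        # 1010
    print(100 / 8)               # 100 Mbps expressed in MB/s: 12.5
    print(2 ** (32 - 24))        # addresses in a /24 subnet: 256
```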

History

The ideas behind the API Latency Calculator rest on a long line of developments in computing. The conceptual foundation of modern computing traces back to Charles Babbage, whose Analytical Engine design of 1837 introduced the idea of a general-purpose mechanical computer with separate storage and processing units, which he called the Store and the Mill. Ada Lovelace wrote what many consider the first algorithm intended for machine execution while annotating a translation of Luigi Menabrea's account of Babbage's work, also recognising the machine's potential to manipulate symbols beyond mere numbers. George Boole published "The Laws of Thought" in 1854, formalising a two-valued algebra of logic that would later map perfectly to electrical circuits. It remained largely a mathematical curiosity until Claude Shannon's landmark 1937 master's thesis demonstrated that Boolean algebra could describe switching circuits, laying the theoretical groundwork for all digital electronics. Shannon's 1948 paper "A Mathematical Theory of Communication" defined the bit as the fundamental unit of information and established information theory as a rigorous discipline.

The transistor, invented at Bell Labs in late 1947 by Bardeen, Brattain, and Shockley and announced publicly in 1948, eventually replaced vacuum tubes and enabled miniaturisation at scale. ENIAC, completed in 1945, was one of the first general-purpose electronic computers, occupying 1800 square feet and consuming 150 kilowatts of power while performing roughly 5000 additions per second. The ASCII standard was published in 1963, assigning 7-bit codes to 128 characters and enabling interoperability between computers from different manufacturers. Through the 1970s, the microprocessor consolidated an entire CPU onto a single chip; Intel's 4004 in 1971 marked the beginning of this trend. The Apple II, launched in 1977, and the IBM PC, launched in 1981, brought computing to homes and offices, triggering a mass-market software industry.

Tim Berners-Lee proposed the World Wide Web in 1989 and launched the first website in 1991 at CERN, transforming the internet from an academic and military network into a global information infrastructure. Mobile computing accelerated through the 2000s with smartphones integrating powerful processors, wireless networking, and GPS into pocket-sized devices, extending computation into every facet of daily life and cementing TCP/IP as the universal communications fabric.

Educational Note: This calculator is provided for educational and informational purposes. Results are based on the formulas and inputs provided. Always verify important calculations independently. NovaCalculator processes calculator inputs client-side; optional analytics follow visitor consent settings. © 2024–2026 NovaCalculator.

Frequently Asked Questions

What components make up total API latency?

Total API latency is the sum of several distinct components, most of which occur one after another on the request path. DNS resolution (1-50ms) translates the domain name to an IP address. The TLS handshake (1-3 RTTs for the initial connection) establishes an encrypted connection. Network round-trip time (RTT) varies from 1ms for same-datacenter calls to 200+ms for cross-continent requests. Server processing includes request parsing, business logic execution, and response serialization. Database queries add their own latency depending on query complexity and data volume. Data transfer time depends on payload size and available bandwidth. In multi-hop architectures where APIs call other APIs, each hop adds another RTT plus processing time. Connection pooling and keep-alive remove the DNS and TLS cost from subsequent requests, because those requests reuse an already established connection.
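
The effect of connection reuse can be sketched by spreading the one-time setup cost over the requests that share a connection. The function name is hypothetical, and the figures in the usage lines are taken from Example 1 (5 ms DNS + 120 ms TLS setup; 60 + 100 + 30 + 4 = 194 ms per request).

```python
def average_latency_ms(per_request_ms, setup_ms, requests_per_connection):
    """Average latency when one connection's setup cost is shared by N requests."""
    if requests_per_connection < 1:
        raise ValueError("requests_per_connection must be at least 1")
    return per_request_ms + setup_ms / requests_per_connection


if __name__ == "__main__":
    setup_ms = 5 + 120                 # DNS + TLS 1.2 handshake, paid once
    per_request_ms = 60 + 100 + 30 + 4 # RTT + processing + DB + transfer
    print(average_latency_ms(per_request_ms, setup_ms, 1))    # 319.0
    print(average_latency_ms(per_request_ms, setup_ms, 100))  # 195.25
```

Only the first request on a connection actually sees the full 319 ms; the average is a convenient summary, not what any single request experiences.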

What is the difference between P50, P95, P99, and P99.9 latency percentiles?

Latency percentiles describe the distribution of response times across all requests. P50 (median) is the latency that 50 percent of requests complete within, representing the typical user experience. P95 is the latency that 95 percent of requests complete within, so it covers most users, including those who hit occasional slow requests. P99 is the latency that 99 percent of requests complete within; the slowest 1 in 100 requests exceed it, which for a service handling millions of requests means thousands of slow responses every day. P99.9 marks the boundary of the slowest 0.1 percent of requests and is important for critical services where even rare slow responses have business impact. In practice, P99 latency is often 2-5x higher than P50 due to garbage collection pauses, cache misses, database lock contention, and network jitter. SLAs are typically defined at P95 or P99 rather than average latency because averages hide the tail latency that affects real users.
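
When measured samples are available, percentiles can be computed directly instead of estimated. The sketch below uses the nearest-rank definition, which is one of several common conventions; the function name and the sample data are made up for illustration.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[rank - 1]


if __name__ == "__main__":
    # Hypothetical data: 98 fast requests and 2 very slow ones.
    latencies_ms = [100] * 98 + [2000] * 2
    print(sum(latencies_ms) / len(latencies_ms))  # mean: 138.0
    print(percentile(latencies_ms, 50))           # 100
    print(percentile(latencies_ms, 95))           # 100
    print(percentile(latencies_ms, 99))           # 2000
```

The mean of 138 ms suggests a mildly slow service, while P99 shows that some requests take 2 seconds, which is why tail percentiles are preferred for SLAs.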

How does TLS handshake overhead impact API latency?

The TLS handshake adds significant overhead to the first request on a new connection. A full TLS 1.2 handshake requires 2 round trips between client and server, adding 2x RTT to latency. TLS 1.3 reduces this to a single round trip for the initial handshake. For resumed sessions, TLS 1.3 also supports 0-RTT resumption, which lets a returning client send data in its first flight and so removes the handshake wait; because 0-RTT data can be replayed, it is generally reserved for idempotent requests. The overhead matters most for short-lived connections and high-RTT scenarios. With 200ms RTT, a TLS 1.2 handshake adds 400ms, which can dominate total latency. HTTP/2 and HTTP/3 help by multiplexing many requests over a persistent connection that reuses the initial TLS session. In practice, connection pooling and keep-alive settings ensure that most requests in production use established connections, limiting TLS overhead to the initial connection and occasional reconnects.
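
The round-trip counts above translate into a simple estimate. This is a sketch with a hypothetical function name; it models only the handshake round trips described here and ignores TCP setup, certificate validation, and CPU cost.

```python
def tls_handshake_ms(rtt_ms, version="1.2", zero_rtt=False):
    """Estimate handshake delay from the number of round trips it needs."""
    round_trips = {"1.2": 2, "1.3": 1}
    if version not in round_trips:
        raise ValueError("version must be '1.2' or '1.3'")
    if version == "1.3" and zero_rtt:
        return 0.0          # resumed TLS 1.3 session sending early data
    return float(round_trips[version] * rtt_ms)


if __name__ == "__main__":
    print(tls_handshake_ms(200, "1.2"))                  # 400.0
    print(tls_handshake_ms(200, "1.3"))                  # 200.0
    print(tls_handshake_ms(200, "1.3", zero_rtt=True))   # 0.0
```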

How do I optimize API latency for high-throughput services?

High-throughput API optimization requires addressing bottlenecks at every layer. At the network layer, use HTTP/2 or HTTP/3 for multiplexed connections, enable compression (gzip or brotli) for text payloads, and minimize payload sizes by returning only necessary fields. At the application layer, implement efficient serialization formats like Protocol Buffers or MessagePack instead of JSON for internal APIs, use asynchronous processing for non-critical operations, and optimize hot code paths identified through profiling. At the database layer, add appropriate indexes, implement query result caching with Redis, and use connection pooling to eliminate connection establishment overhead. Architecture-level optimizations include deploying services geographically close to their consumers, using CDNs for cacheable responses, and implementing circuit breakers to prevent cascading failures from slow dependencies.

What is the impact of payload size on API response time?

Payload size affects latency through transfer time and serialization overhead. Transfer time is calculated as payload size divided by available bandwidth, which is negligible for small payloads on fast networks but significant for large responses on slower connections. A 1 MB JSON response takes 80ms to transfer on a 100 Mbps connection but 8 seconds on a 1 Mbps mobile connection. Serialization and deserialization of large payloads also consume CPU time on both server and client. JSON parsing of a 1 MB payload can take 10-50ms depending on the parser and hardware. Pagination is essential for endpoints that could return large datasets, typically limiting responses to 50-100 items per page. Compression reduces transfer time by 60-80 percent for text-based formats but adds 1-5ms of CPU overhead for compression and decompression. GraphQL helps by allowing clients to request only the specific fields they need, reducing unnecessary data transfer.
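
The transfer-time figures above can be checked with a short sketch. The function names are hypothetical; it takes 1 MB as 1,000,000 bytes, and the 70 percent size reduction and 3 ms CPU cost in the compression example are illustrative values picked from the ranges quoted above.

```python
def transfer_time_ms(payload_bytes, bandwidth_mbps):
    """Time to push the payload through the link, ignoring latency and protocol overhead."""
    return payload_bytes * 8 * 1000 / (bandwidth_mbps * 1_000_000)


def compressed_transfer_time_ms(payload_bytes, bandwidth_mbps,
                                size_reduction=0.7, cpu_ms=3.0):
    """Transfer time after compression, plus an assumed CPU cost to (de)compress."""
    compressed_bytes = payload_bytes * (1 - size_reduction)
    return transfer_time_ms(compressed_bytes, bandwidth_mbps) + cpu_ms


if __name__ == "__main__":
    one_mb = 1_000_000
    print(transfer_time_ms(one_mb, 100))   # 80.0 ms on 100 Mbps
    print(transfer_time_ms(one_mb, 1))     # 8000.0 ms on 1 Mbps
    print(round(compressed_transfer_time_ms(one_mb, 1)))  # about 2403 ms
```

On the fast link the compression CPU cost is a noticeable share of the total, while on the slow link it is negligible next to the seconds saved.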

How does database query latency contribute to API response time?

Database queries often represent 30-60 percent of total server-side processing time in data-driven APIs. Simple indexed lookups typically complete in 1-5ms, while complex joins, aggregations, or full-table scans can take hundreds of milliseconds or even seconds. Connection acquisition from the pool adds 0.1-5ms depending on pool utilization. Network latency between the application server and database server adds another 0.5-5ms for same-datacenter deployments. The most impactful optimizations are proper indexing (which can reduce query time by 100-1000x), query result caching for frequently accessed data, and avoiding the N+1 query problem where an API endpoint executes one query per item in a list instead of a single batch query. Read replicas can distribute query load and reduce latency for read-heavy workloads by serving queries from a replica geographically closer to the application server.
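
The N+1 pattern and its batched alternative can be demonstrated with Python's built-in `sqlite3` module. The schema, data, and function names are invented for illustration; with a networked database each extra query would also pay the application-to-database round trip described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts VALUES (1, 1, 'Notes'), (2, 2, 'Compilers'), (3, 1, 'Engines');
""")


def posts_n_plus_1(conn):
    """One query for the list, then one more query per row."""
    queries = 1
    posts = conn.execute(
        "SELECT title, author_id FROM posts ORDER BY id").fetchall()
    result = []
    for title, author_id in posts:
        name = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)).fetchone()[0]
        queries += 1
        result.append((title, name))
    return result, queries


def posts_batched(conn):
    """Same result from a single JOIN query."""
    rows = conn.execute(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id").fetchall()
    return rows, 1


if __name__ == "__main__":
    print(posts_n_plus_1(conn))   # same rows, 4 queries
    print(posts_batched(conn))    # same rows, 1 query
```

With 3 posts the difference is 4 queries versus 1; with a 100-item page it becomes 101 versus 1, each paying its own round trip.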

Reviewed by Daniel Agrici, Founder & Lead Developer