Prompt Cost Estimator

Estimate the cost of a prompt from its system message, user input, and expected output length. Enter values for instant results with step-by-step formulas.

Reviewed by Daniel Agrici, Founder & Lead Developer

Formula

Cost = (Input Tokens / 1M × Input Rate) + (Output Tokens / 1M × Output Rate)

Input tokens include both your system prompt and user message. Output tokens are the model's response. Total cost per call is the sum of input and output costs at the model's per-million-token rates.
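The formula translates directly into a few lines of Python. The sketch below is illustrative only: the function name `estimate_cost` and its parameter names are assumptions, not part of any provider's SDK, and both rates are passed in as USD per million tokens.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Return the estimated cost in USD of a single call.

    input_rate and output_rate are prices in USD per 1M tokens.
    """
    input_cost = input_tokens / 1_000_000 * input_rate
    output_cost = output_tokens / 1_000_000 * output_rate
    return input_cost + output_cost


# Illustrative call: 333 input tokens, 200 output tokens,
# at $2.50 and $10.00 per 1M tokens respectively.
cost = estimate_cost(333, 200, 2.50, 10.00)
print(f"${cost:.4f} per call")
```

Keeping the rates as arguments rather than constants makes it easy to re-run the estimate when a provider changes its pricing.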

Worked Examples

Example 1: Customer Support Bot Prompt

Problem: System prompt: 200 words. Average user message: 50 words. Expected output: 150 words. Model: GPT-4o. Estimate cost per call and for 1,000 calls.

Solution (assuming roughly 1.33 tokens per word):
Input: (200 + 50) × 1.33 = 332.5 ≈ 333 tokens
Output: 150 × 1.33 = 199.5 ≈ 200 tokens
Input cost: 333/1M × $2.50 = $0.000833
Output cost: 200/1M × $10.00 = $0.002000
Total per call: $0.002833

Result: $0.0028/call | $2.83 for 1,000 calls
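The word-to-token step in Example 1 can be sketched as follows. This is a rough approximation, not a tokenizer: the helper name `words_to_tokens` is hypothetical, the 1.33 tokens-per-word ratio is the one assumed in the example, and `Decimal` is used only so that 332.5 rounds half-up to 333 as in the worked solution.

```python
from decimal import Decimal, ROUND_HALF_UP


def words_to_tokens(words: int, tokens_per_word: str) -> int:
    """Approximate a token count from a word count.

    tokens_per_word is given as a string (e.g. "1.33") so the
    multiplication is exact and .5 values round up predictably.
    """
    tokens = Decimal(words) * Decimal(tokens_per_word)
    return int(tokens.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


# Example 1: 200-word system prompt, 50-word user message, 150-word reply.
input_tokens = words_to_tokens(200 + 50, "1.33")
output_tokens = words_to_tokens(150, "1.33")

# Rates from Example 1, in USD per 1M tokens.
cost_per_call = input_tokens / 1_000_000 * 2.50 + output_tokens / 1_000_000 * 10.00
print(f"${cost_per_call:.4f} per call, ${cost_per_call * 1000:.2f} per 1,000 calls")
# prints "$0.0028 per call, $2.83 per 1,000 calls"
```

For production budgeting, counting tokens with the provider's own tokenizer will be more accurate than any fixed ratio.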

Example 2: Document Analysis Pipeline

Problem: System prompt: 500 words. User message (document): 2,000 words. Expected summary: 300 words. Model: Claude 3.5 Sonnet.

Solution:
Input: (500 + 2,000) × 1.35 = 3,375 tokens
Output: 300 × 1.35 = 405 tokens
Input cost: 3,375/1M × $3.00 = $0.010125
Output cost: 405/1M × $15.00 = $0.006075
Total: $0.016200

Result: $0.0162/call | $16.20 for 1,000 calls

Frequently Asked Questions

What is a system prompt and how does it affect cost?

A system prompt is the initial instruction that sets the AI model's behavior, personality, and constraints. It is sent with every API call and counted as input tokens. Long system prompts (e.g., 2,000+ words) significantly increase costs because those tokens are billed on every single request. Optimizing your system prompt length is one of the easiest ways to reduce API costs.
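To see how much of a bill comes from the system prompt alone, the per-request charge can be multiplied out over a number of calls. The sketch below is a minimal illustration; the function name `system_prompt_cost` and the sample figures (a 2,660-token prompt, a $2.50 per 1M input rate, 1,000 calls) are assumptions chosen for the example.

```python
def system_prompt_cost(prompt_tokens: int, input_rate: float, calls: int) -> float:
    """Cost in USD of resending the same system prompt on every call.

    input_rate is the price in USD per 1M input tokens.
    """
    return prompt_tokens / 1_000_000 * input_rate * calls


# Hypothetical: a 2,000-word prompt at about 1.33 tokens per word
# is roughly 2,660 tokens.
long_prompt = system_prompt_cost(2_660, 2.50, 1_000)
short_prompt = system_prompt_cost(1_330, 2.50, 1_000)
print(f"long: ${long_prompt:.2f}, short: ${short_prompt:.2f}")
```

Because the cost is linear in prompt length, halving the system prompt halves this portion of the input bill.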

Does prompt caching reduce costs?

Yes. Both Anthropic and OpenAI offer prompt caching for repeated prefixes (like system prompts). Cached input tokens can cost 50-90% less than uncached tokens. If your system prompt stays the same across requests, prompt caching can substantially reduce your input token costs. The exact savings depend on the provider and cache hit rate.
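The effect of caching on input cost can be approximated by blending the cached and uncached rates according to the cache hit rate. The following is a simplified sketch: `cached_input_cost`, `discount`, and `hit_rate` are hypothetical names, the discount is whatever fraction your provider currently publishes, and the model ignores any cache-write surcharge or minimum prefix length a provider may apply.

```python
def cached_input_cost(prefix_tokens: int, other_tokens: int,
                      input_rate: float, discount: float,
                      hit_rate: float) -> float:
    """Average input cost in USD per call with prompt caching.

    prefix_tokens: tokens in the repeated, cacheable prefix (e.g. system prompt)
    other_tokens:  input tokens that change on every call
    input_rate:    USD per 1M uncached input tokens
    discount:      fraction saved on cached tokens (0.5 means 50% cheaper)
    hit_rate:      fraction of calls that hit the cache (0.0 to 1.0)
    """
    cached_rate = input_rate * (1 - discount)
    # Expected per-token rate for the prefix across hits and misses.
    blended_rate = hit_rate * cached_rate + (1 - hit_rate) * input_rate
    prefix_cost = prefix_tokens / 1_000_000 * blended_rate
    other_cost = other_tokens / 1_000_000 * input_rate
    return prefix_cost + other_cost


# Hypothetical: 2,000-token system prompt, 100-token user message,
# $3.00 per 1M input tokens, 90% discount, 80% hit rate.
print(f"${cached_input_cost(2_000, 100, 3.00, 0.90, 0.80):.6f} per call")
```

Setting `hit_rate` to 0 recovers the uncached input cost, which gives a quick baseline for comparison.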

What is the max_tokens parameter and how does it affect cost?

The max_tokens parameter sets the maximum number of tokens the model can generate in its response. You are only charged for tokens actually generated, not the maximum you set. However, setting a reasonable limit prevents unexpectedly long and expensive responses. For a customer support bot expecting two-sentence answers, setting max_tokens to 200 provides adequate room while preventing runaway costs from verbose responses that could otherwise reach thousands of tokens.
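Since billing follows the tokens actually generated, `max_tokens` acts as a ceiling on output cost rather than a fixed charge. A minimal sketch of that relationship is shown below; the helper names are hypothetical and the $10.00 per 1M output rate is taken from Example 1.

```python
def billed_output_tokens(generated: int, max_tokens: int) -> int:
    """Tokens billed: what the model would produce, capped at max_tokens."""
    return min(generated, max_tokens)


def output_cost_ceiling(max_tokens: int, output_rate: float) -> float:
    """Worst-case output cost in USD for one call (rate is per 1M tokens)."""
    return max_tokens / 1_000_000 * output_rate


# A short reply is billed at its real length, not at the cap.
print(billed_output_tokens(120, 200))

# A runaway reply is cut off at the cap.
print(billed_output_tokens(3_000, 200))

# Worst-case output cost per call with max_tokens=200.
print(f"${output_cost_ceiling(200, 10.00):.4f}")
```

Multiplying the ceiling by expected call volume gives an upper bound on output spend for budgeting.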

Should I use one large prompt or multiple smaller prompts?

Breaking a complex task into multiple smaller prompts can sometimes be cheaper and produce better results, especially when the system prompt is large and only some subtasks need the full context. However, each additional API call adds latency and a minimum token overhead. For tasks where the entire context is needed, a single prompt is typically more efficient. Use chain-of-thought prompting within a single call for complex reasoning, and reserve multi-step workflows for tasks that naturally decompose into independent subtasks.
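One way to compare the two approaches is to total the cost of every call in each design. The sketch below uses invented token counts purely for illustration; `pipeline_cost` is a hypothetical helper, and the rates are those from Example 2. In this particular setup the split design costs more because the system prompt is resent on each call, but different token counts can reverse the result.

```python
def pipeline_cost(calls: list[tuple[int, int]],
                  input_rate: float, output_rate: float) -> float:
    """Total cost in USD of a list of (input_tokens, output_tokens) calls.

    Rates are in USD per 1M tokens.
    """
    return sum(
        inp / 1_000_000 * input_rate + out / 1_000_000 * output_rate
        for inp, out in calls
    )


# Hypothetical single call: full context in, one long answer out.
single = pipeline_cost([(3_000, 600)], 3.00, 15.00)

# Hypothetical split: three calls, each resending a 500-token system
# prompt plus 800 tokens of subtask context, each producing 200 tokens.
split = pipeline_cost([(1_300, 200)] * 3, 3.00, 15.00)

print(f"single: ${single:.4f}, split: ${split:.4f}")
```

Cost is only one factor here; latency and output quality for each design still need to be measured separately.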
