05 cost analysis

id: cost-analysis

title: "Cost Analysis & Token Efficiency ROI"

version: 1.0

last_updated: 2026-02-11

priority: P0

keywords: ["cost", "ROI", "token-efficiency", "savings"]

5) Cost Analysis: LLM Token Efficiency

5.1 Token Cost Comparison

Assumptions:

  • Claude Sonnet 4.5: $3/MTok input, $15/MTok output
  • Average query: 150 output tokens
  • Active user: 20 AI queries/day (600 queries per 30-day month)

Scenario 1: Naive 50-State Context Dump

Per query:

  • Input: 5,500 tokens (full context)
  • Output: 150 tokens
  • Cost: $0.0188 per query (5,500 × $3/MTok + 150 × $15/MTok = $0.0165 + $0.00225 = $0.01875, rounded up)

Per user per month:

  • 600 queries × $0.0188 = $11.28/user/month

Scenario 2: Token-Optimized Retrieval

Per query:

  • Input: 400 tokens (optimized context)
  • Output: 150 tokens
  • Cost: $0.00345 per query (400 × $3/MTok + 150 × $15/MTok = $0.0012 + $0.00225)

Per user per month:

  • 600 queries × $0.00345 = $2.07/user/month

Savings: $11.28 − $2.07 = $9.21/user/month, an 82% cost reduction

At scale:

  • 1,000 users: $9,210/month savings
  • 10,000 users: $92,100/month savings
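
The figures in this section follow from a linear cost model, which can be sketched as below. The names `Pricing`, `queryCost`, and `monthlyCost` are illustrative and not taken from any existing codebase; the 30-day month is an assumption implied by the 600 queries/month figure.

```swift
// Illustrative cost model. Names are hypothetical; prices come from the assumptions above.
struct Pricing {
    let inputPerMTok: Double   // USD per million input tokens
    let outputPerMTok: Double  // USD per million output tokens
}

/// Cost in USD of a single query.
func queryCost(inputTokens: Int, outputTokens: Int, pricing: Pricing) -> Double {
    let input = Double(inputTokens) / 1_000_000 * pricing.inputPerMTok
    let output = Double(outputTokens) / 1_000_000 * pricing.outputPerMTok
    return input + output
}

/// Monthly cost per user, assuming a 30-day month.
func monthlyCost(perQuery: Double, queriesPerDay: Int, days: Int = 30) -> Double {
    return perQuery * Double(queriesPerDay * days)
}

let sonnet = Pricing(inputPerMTok: 3, outputPerMTok: 15)
let naive = queryCost(inputTokens: 5_500, outputTokens: 150, pricing: sonnet)
let optimized = queryCost(inputTokens: 400, outputTokens: 150, pricing: sonnet)
let naiveMonthly = monthlyCost(perQuery: naive, queriesPerDay: 20)
let optimizedMonthly = monthlyCost(perQuery: optimized, queriesPerDay: 20)
let reduction = 1 - optimizedMonthly / naiveMonthly

print(naive, optimized, naiveMonthly, optimizedMonthly, reduction)
```

Without intermediate rounding, the naive scenario comes to $11.25/user/month rather than $11.28; the gap comes from rounding $0.01875 up to $0.0188 before multiplying by 600. The reduction is about 82% either way.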

5.2 Pinecone Vector DB Costs

Starter tier: $70/month

  • 1M-vector capacity, which comfortably covers the ~3,000 embedded chunks
  • Unlimited queries
  • 1 pod

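Break-even: 8 active users compared with the naive approach ($70 ÷ $9.21 savings per user ≈ 7.6, rounded up)

ROI: Positive from the break-even point onward; at 10 users, savings of $92.10/month already exceed the $70 Pinecone fee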
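
The break-even point is the fixed monthly fee divided by the per-user savings, rounded up to a whole user. A minimal sketch, assuming the $9.21 savings figure from 5.1; `breakEvenUsers` is a hypothetical helper name:

```swift
/// Smallest number of users whose combined savings cover a fixed monthly cost.
func breakEvenUsers(fixedMonthlyCost: Double, savingsPerUser: Double) -> Int {
    precondition(savingsPerUser > 0, "savings per user must be positive")
    // Round up: a fractional user cannot cover the remainder.
    return Int((fixedMonthlyCost / savingsPerUser).rounded(.up))
}

let starterTierBreakEven = breakEvenUsers(fixedMonthlyCost: 70, savingsPerUser: 9.21)
print(starterTierBreakEven)  // 8
```

The same helper can be reused if the Pinecone tier or the per-user savings change.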

5.3 Token Budget Management

Per-query token budget: 600 tokens max

Budget allocation:

  • System prompt: 100 tokens (fixed)
  • Location context: 50-80 tokens
  • Retrieved knowledge: 200-300 tokens
  • Query-specific data: 100-150 tokens
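  • Buffer: 50 tokens

The low ends of these ranges total 500 tokens and the high ends total 680, so not every component can reach its maximum in the same query; lower-priority context is trimmed first.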

Enforcement:


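class TokenBudgetManager {
    let maxInputTokens = 600
    var currentTokens = 0

    /// Returns true if `text` should be included, and counts its tokens.
    /// Priority 1 (critical) is always included, even when over budget;
    /// priorities 2-4 are skipped when they do not fit.
    func addContext(_ text: String, priority: Int) -> Bool {
        let tokens = estimateTokens(text)
        if currentTokens + tokens > maxInputTokens && priority > 1 {
            return false  // Over budget: skip non-critical context
        }
        currentTokens += tokens
        return true
    }

    func estimateTokens(_ text: String) -> Int {
        // Rough estimate: 1 token ≈ 4 chars, rounded up so short strings never count as 0
        return (text.count + 3) / 4
    }
}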

Priority levels:

  1. Critical (always include): User location, query-specific data
  2. High (include if space): Retrieved plant/pollinator profiles
  3. Medium (include if space): Regional characteristics
  4. Low (skip if tight): General gardening tips, historical notes
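
One way to apply these priority levels is to sort candidate fragments by priority before spending the budget, so that a low-priority fragment never crowds out a higher one. The sketch below is an assumption about how that could look, not the app's actual pipeline; `ContextFragment` and `packContext` are hypothetical names, and the 4-characters-per-token estimate is the same rough heuristic used above.

```swift
// Hypothetical fragment type; priority 1 = critical ... 4 = low, as in the list above.
struct ContextFragment {
    let text: String
    let priority: Int
}

/// Rough estimate: 1 token ≈ 4 characters, rounded up.
func estimateTokens(_ text: String) -> Int {
    return (text.count + 3) / 4
}

/// Packs fragments in priority order. Critical fragments are always kept;
/// all others are kept only while they fit in the remaining budget.
func packContext(_ fragments: [ContextFragment], budget: Int) -> [ContextFragment] {
    var used = 0
    var kept: [ContextFragment] = []
    // Sort by (priority, original index) so ties keep their original order.
    let ordered = fragments.enumerated().sorted {
        ($0.element.priority, $0.offset) < ($1.element.priority, $1.offset)
    }.map { $0.element }
    for fragment in ordered {
        let cost = estimateTokens(fragment.text)
        if fragment.priority == 1 || used + cost <= budget {
            kept.append(fragment)
            used += cost
        }
    }
    return kept
}

let fragments = [
    ContextFragment(text: String(repeating: "a", count: 400), priority: 4), // ~100 tokens
    ContextFragment(text: String(repeating: "b", count: 80), priority: 1),  // ~20 tokens
    ContextFragment(text: String(repeating: "c", count: 360), priority: 2), // ~90 tokens
]
let packed = packContext(fragments, budget: 120)
print(packed.map { $0.priority })  // [1, 2]
```

With a 120-token budget, the critical and high-priority fragments fit (110 tokens) and the low-priority fragment is dropped, regardless of the order in which the fragments were supplied.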

5.4 Caching Strategy for Cost Reduction

A cache hit incurs no OpenAI embedding cost.

Cache layers:

  1. Local SQLite cache: Pre-computed AI context fragments
     • Hit rate target: 70%+
     • Saves: API calls, latency, tokens
  2. Redis cache (optional for cloud sync):
     • Shared contexts across users in same region
     • Hit rate target: 40%+
  3. Pinecone metadata cache:
     • Store common queries with pre-computed results
     • Hit rate target: 20%+

Example cache keys:


location:30308:context          → "GA, zone 7b, southeast..."
plant:tomato:region:southeast   → "Tomato planting: Feb 18..."
pollinator:region:southeast:spring → "Active pollinators Mar-May..."
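
The key shapes above can be generated by small builder functions, with a read-through lookup in front of the expensive path. The sketch below uses an in-memory dictionary as a stand-in for the SQLite layer; `CacheKey` and `ContextCache` are hypothetical names, and the cached string is the truncated sample value from the example keys.

```swift
// Hypothetical key builders matching the key shapes shown above.
enum CacheKey {
    static func location(zip: String) -> String {
        return "location:\(zip):context"
    }
    static func plant(_ name: String, region: String) -> String {
        return "plant:\(name):region:\(region)"
    }
    static func pollinator(region: String, season: String) -> String {
        return "pollinator:region:\(region):\(season)"
    }
}

/// In-memory stand-in for the local cache layer; tracks hits and misses.
final class ContextCache {
    private var store: [String: String] = [:]
    private(set) var hits = 0
    private(set) var misses = 0

    /// Returns the cached value, or calls `compute` once and stores the result.
    func value(for key: String, compute: () -> String) -> String {
        if let cached = store[key] {
            hits += 1
            return cached
        }
        misses += 1
        let fresh = compute()
        store[key] = fresh
        return fresh
    }

    /// Fraction of lookups served from the cache.
    var hitRate: Double {
        let total = hits + misses
        return total == 0 ? 0 : Double(hits) / Double(total)
    }
}

let cache = ContextCache()
let key = CacheKey.plant("tomato", region: "southeast")
var computeCalls = 0
for _ in 0..<4 {
    _ = cache.value(for: key) {
        computeCalls += 1  // stands in for an embedding + retrieval round trip
        return "Tomato planting: Feb 18..."
    }
}
print(key, cache.hitRate, computeCalls)
```

Four lookups of the same key trigger one computation and three hits, a 75% hit rate. Tracking hits and misses this way gives a direct measurement against the hit rate targets listed above.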

Cost savings:

  • 70% cache hit rate = 70% fewer OpenAI embedding calls
  • Embedding cost: $0.02/1M tokens
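  • 1,000 users × 20 queries/day × 300 tokens = 6M tokens/day, or $0.12/day in embedding cost
  • Savings: $0.12/day × 70% × 30 days = $2.52/month (minor next to LLM token costs; the larger gains from caching are fewer API calls and lower latency)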
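
The embedding savings estimate can be reproduced with a short helper. This is a sketch under the figures stated above; `embeddingSavings` is a hypothetical name and the 30-day month is an assumption.

```swift
/// Monthly embedding spend avoided by cache hits (assumes a 30-day month).
func embeddingSavings(users: Int, queriesPerDay: Int, tokensPerQuery: Int,
                      pricePerMTok: Double, hitRate: Double, days: Int = 30) -> Double {
    let tokensPerDay = Double(users * queriesPerDay * tokensPerQuery)
    let dailyCost = tokensPerDay / 1_000_000 * pricePerMTok
    return dailyCost * hitRate * Double(days)
}

let saved = embeddingSavings(users: 1_000, queriesPerDay: 20, tokensPerQuery: 300,
                             pricePerMTok: 0.02, hitRate: 0.70)
print(saved)
```

Because the result scales linearly with every input, a tenfold increase in users raises the savings tenfold, which is still small next to the LLM token costs in 5.1.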

5.5 Incremental Rollout Cost Management

Phase 1-3 (Weeks 1-10): Foundation + AI Engine

  • Users: 100 (alpha testers)
  • Monthly cost: $207 (AI) + $70 (Pinecone) = $277

Phase 4-6 (Weeks 11-18): Caching + Deep Dive

  • Users: 1,000 (beta)
  • Monthly cost: $2,070 (AI) + $70 (Pinecone) = $2,140

Phase 7-8 (Weeks 19-24): Optimization + National Launch

  • Users: 10,000 (general availability)
  • Monthly cost: $20,700 (AI) + $140 (Pinecone, upgraded tier) = $20,840

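Compared with the naive approach at 10K users:

  • Naive: 10,000 × $11.28 = $112,800/month
  • Optimized: $20,840/month (including Pinecone)
  • Savings: $91,960/month (the $92,100 token savings from 5.1 minus $140 for Pinecone)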
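
The phase totals can be recomputed from the per-user figures in 5.1 plus the Pinecone fee for each phase. The sketch below is illustrative; `Phase` and the two cost functions are hypothetical names, and it assumes the naive approach needs no vector database.

```swift
// Hypothetical model of the rollout phases; per-user figures come from section 5.1.
struct Phase {
    let name: String
    let users: Int
    let pineconeMonthly: Double  // USD per month for the vector DB tier
}

let optimizedPerUser = 2.07  // USD per user per month, optimized retrieval
let naivePerUser = 11.28     // USD per user per month, naive context dump

/// AI spend plus vector DB fee for one month of a phase.
func optimizedMonthlyCost(_ phase: Phase) -> Double {
    return Double(phase.users) * optimizedPerUser + phase.pineconeMonthly
}

/// The naive approach sends the full context and is assumed to need no vector DB.
func naiveMonthlyCost(_ phase: Phase) -> Double {
    return Double(phase.users) * naivePerUser
}

let phases = [
    Phase(name: "Phase 1-3 (alpha)", users: 100, pineconeMonthly: 70),
    Phase(name: "Phase 4-6 (beta)", users: 1_000, pineconeMonthly: 70),
    Phase(name: "Phase 7-8 (GA)", users: 10_000, pineconeMonthly: 140),
]

for phase in phases {
    let optimized = optimizedMonthlyCost(phase)
    let savings = naiveMonthlyCost(phase) - optimized
    print(phase.name, optimized.rounded(), savings.rounded())
}
```

Keeping the phases in a single array makes it easy to re-run the comparison when user counts or tier pricing change.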
