05 cost analysis

Posted April 21, 2026

Updated April 22, 2026

By Peter Westerman

id: cost-analysis

title: "Cost Analysis & Token Efficiency ROI"

version: 1.0

last_updated: 2026-02-11

priority: P0

keywords: ["cost", "ROI", "token-efficiency", "savings"]

5) Cost Analysis: LLM Token Efficiency

5.1 Token Cost Comparison

Assumptions:

Claude Sonnet 4.5: $3/MTok input, $15/MTok output
Average query: 150 output tokens
Active user: 20 AI queries/day (600 queries over a 30-day month)

Scenario 1: Naive 50-State Context Dump

Per query:

Input: 5,500 tokens (full context)
Output: 150 tokens
Cost: $0.0188 per query (5,500 × $3/MTok + 150 × $15/MTok = $0.01875, rounded)

Per user per month:

600 queries × $0.0188 = $11.28/user/month

Scenario 2: Token-Optimized Retrieval

Per query:

Input: 400 tokens (optimized context)
Output: 150 tokens
Cost: $0.00345 per query (400 × $3/MTok + 150 × $15/MTok)

Per user per month:

600 queries × $0.00345 = $2.07/user/month

Savings: $11.28 − $2.07 = $9.21/user/month, an 82% cost reduction

At scale:

1,000 users: $9,210/month savings
10,000 users: $92,100/month savings
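
The per-query and per-user figures above can be reproduced with a small cost model. This is a minimal sketch, assuming the prices and token counts listed in this section; the type and function names (`TokenPricing`, `costPerQuery`, `monthlyCostPerUser`) are hypothetical and not part of any SDK.

```swift
import Foundation

// Prices from the assumptions above, in USD per million tokens.
struct TokenPricing {
    let inputPerMTok: Double
    let outputPerMTok: Double
}

// Cost of one query in USD: input and output tokens are priced separately.
func costPerQuery(inputTokens: Int, outputTokens: Int, pricing: TokenPricing) -> Double {
    let inputCost = Double(inputTokens) / 1_000_000 * pricing.inputPerMTok
    let outputCost = Double(outputTokens) / 1_000_000 * pricing.outputPerMTok
    return inputCost + outputCost
}

// Monthly cost per user, assuming a 30-day month (20 queries/day = 600 queries).
func monthlyCostPerUser(perQuery: Double, queriesPerDay: Int, daysPerMonth: Int = 30) -> Double {
    return perQuery * Double(queriesPerDay * daysPerMonth)
}

let sonnet = TokenPricing(inputPerMTok: 3.0, outputPerMTok: 15.0)
let naive = costPerQuery(inputTokens: 5_500, outputTokens: 150, pricing: sonnet)
let optimized = costPerQuery(inputTokens: 400, outputTokens: 150, pricing: sonnet)

print(String(format: "naive: $%.5f, optimized: $%.5f", naive, optimized))
```

Without intermediate rounding, the naive scenario comes to $0.01875 per query and $11.25/user/month; the $11.28 figure results from rounding the per-query cost to $0.0188 before multiplying by 600.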

5.2 Pinecone Vector DB Costs

Starter tier: $70/month

1M vectors (enough for ~3,000 embedded chunks)
Unlimited queries
1 pod

Break-even: 8 active users ($70 ÷ $9.21 savings per user ≈ 7.6, vs naive approach)

ROI: Positive immediately at more than 10 users
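
The break-even point is the fixed monthly fee divided by the per-user savings, rounded up to a whole user. A minimal sketch, assuming the $70 fee and $9.21 savings quoted above; `breakEvenUsers` is a hypothetical helper name.

```swift
// Smallest number of users whose combined savings cover a fixed monthly fee.
func breakEvenUsers(fixedMonthlyCost: Double, savingsPerUser: Double) -> Int {
    precondition(savingsPerUser > 0, "savings per user must be positive")
    return Int((fixedMonthlyCost / savingsPerUser).rounded(.up))
}

// 70 / 9.21 is roughly 7.6, so the fee is covered from the 8th user onward.
let users = breakEvenUsers(fixedMonthlyCost: 70.0, savingsPerUser: 9.21)
print(users)
```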

5.3 Token Budget Management

Per-query token budget: 600 tokens max

Budget allocation:

System prompt: 100 tokens (fixed)
Location context: 50-80 tokens
Retrieved knowledge: 200-300 tokens
Query-specific data: 100-150 tokens
Buffer: 50 tokens

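Enforcement (the upper bounds above sum to 680 tokens, so lower-priority context is skipped once the 600-token cap would be exceeded):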


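/// Tracks input tokens for one query and enforces the per-query budget.
final class TokenBudgetManager {
    let maxInputTokens = 600
    private(set) var currentTokens = 0

    /// Priority at or above which context is always included.
    /// Assumed scale: 0 = low, 1 = medium, 2 = high, 3 = critical.
    let criticalPriority = 3

    /// Counts `text` against the budget and returns true if it should be included.
    /// Non-critical context is skipped when it would push the total over budget.
    func addContext(_ text: String, priority: Int) -> Bool {
        let tokens = estimateTokens(text)
        if currentTokens + tokens > maxInputTokens && priority < criticalPriority {
            return false  // Skip lower-priority context that does not fit
        }
        currentTokens += tokens
        return true
    }

    /// Rough estimate: 1 token ≈ 4 characters, rounded up so short strings are not counted as 0.
    func estimateTokens(_ text: String) -> Int {
        return (text.count + 3) / 4
    }
}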

Priority levels:

Critical (always include): User location, query-specific data
High (include if space): Retrieved plant/pollinator profiles
Medium (include if space): Regional characteristics
Low (skip if tight): General gardening tips, historical notes
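
One way to apply these levels is to sort candidate fragments by priority before filling the budget, so that critical context is placed first and low-priority context is the first to be dropped. The following is a sketch under stated assumptions: the `ContextPriority` enum, its raw values, `ContextFragment`, and `packContext` are hypothetical names introduced for illustration, and the token estimate is the same rough 4-characters-per-token heuristic.

```swift
// Hypothetical priority levels mirroring the list above; raw values are an assumption.
enum ContextPriority: Int, Comparable {
    case low = 0, medium, high, critical

    static func < (lhs: ContextPriority, rhs: ContextPriority) -> Bool {
        return lhs.rawValue < rhs.rawValue
    }
}

struct ContextFragment {
    let text: String
    let priority: ContextPriority
}

// Rough estimate: 1 token ≈ 4 characters, rounded up.
func estimateTokens(_ text: String) -> Int {
    return (text.count + 3) / 4
}

// Packs fragments from highest to lowest priority. Critical fragments are
// always kept; all others are kept only while they still fit in the budget.
func packContext(_ fragments: [ContextFragment], maxTokens: Int) -> [ContextFragment] {
    var used = 0
    var kept: [ContextFragment] = []
    for fragment in fragments.sorted(by: { $0.priority > $1.priority }) {
        let tokens = estimateTokens(fragment.text)
        if fragment.priority == .critical || used + tokens <= maxTokens {
            kept.append(fragment)
            used += tokens
        }
    }
    return kept
}
```

Sorting first means a large low-priority fragment can never crowd out a smaller high-priority one, which a first-come, first-served budget check does not guarantee.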

5.4 Caching Strategy for Cost Reduction

Cache hit = zero OpenAI embedding cost

Cache layers:

Local SQLite cache: Pre-computed AI context fragments

Hit rate target: 70%+
Saves: API calls, latency, tokens

Redis cache (optional for cloud sync):

Shared contexts across users in same region
Hit rate target: 40%+

Pinecone metadata cache:

Store common queries with pre-computed results
Hit rate target: 20%+

Example cache keys:


location:30308:context          → "GA, zone 7b, southeast..."
plant:tomato:region:southeast   → "Tomato planting: Feb 18..."
pollinator:region:southeast:spring → "Active pollinators Mar-May..."
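
Keys in this format can be generated by small helpers so that every layer agrees on the naming. The sketch below is illustrative only: `CacheKey` and `ContextCache` are hypothetical names, and the dictionary is an in-memory stand-in for the local SQLite layer rather than a real persistence implementation.

```swift
// Hypothetical helpers that build keys in the format shown above.
enum CacheKey {
    static func location(zip: String) -> String {
        return "location:\(zip):context"
    }

    static func plant(_ name: String, region: String) -> String {
        return "plant:\(name.lowercased()):region:\(region.lowercased())"
    }

    static func pollinators(region: String, season: String) -> String {
        return "pollinator:region:\(region.lowercased()):\(season.lowercased())"
    }
}

// In-memory stand-in for the local cache layer (assumption: a real build
// would back this with SQLite). Tracks hits and misses to measure hit rate.
final class ContextCache {
    private var storage: [String: String] = [:]
    private(set) var hits = 0
    private(set) var misses = 0

    // Returns the cached fragment, or computes and stores it on a miss.
    func fragment(forKey key: String, compute: () -> String) -> String {
        if let cached = storage[key] {
            hits += 1
            return cached
        }
        misses += 1
        let value = compute()
        storage[key] = value
        return value
    }

    // Fraction of lookups served from the cache; 0 before any lookup.
    var hitRate: Double {
        let total = hits + misses
        return total == 0 ? 0 : Double(hits) / Double(total)
    }
}
```

Tracking `hitRate` per layer makes it possible to compare observed behavior against the targets listed above.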

Cost savings:

70% cache hit rate = 70% fewer OpenAI embedding calls
Embedding cost: $0.02/1M tokens
1,000 users × 20 queries/day × 300 tokens = 6M tokens/day = $0.12/day
Savings: $0.12/day × 70% × 30 days = $2.52/month (small but adds up)
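
The embedding savings follow from daily token volume, the embedding price, the cache hit rate, and the number of days in the month. A minimal sketch, assuming the figures above and a 30-day month; `monthlyEmbeddingSavings` is a hypothetical helper name.

```swift
// Monthly embedding spend avoided by cache hits, in USD.
func monthlyEmbeddingSavings(users: Int, queriesPerDay: Int, tokensPerQuery: Int,
                             pricePerMTok: Double, hitRate: Double,
                             daysPerMonth: Int = 30) -> Double {
    let tokensPerDay = Double(users * queriesPerDay * tokensPerQuery)
    let dailyCost = tokensPerDay / 1_000_000 * pricePerMTok
    // Only the share of calls served from cache counts as savings.
    return dailyCost * hitRate * Double(daysPerMonth)
}

let embeddingSavings = monthlyEmbeddingSavings(
    users: 1_000, queriesPerDay: 20, tokensPerQuery: 300,
    pricePerMTok: 0.02, hitRate: 0.70)
print(embeddingSavings)
```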

5.5 Incremental Rollout Cost Management

Phase 1-3 (Weeks 1-10): Foundation + AI Engine

Users: 100 (alpha testers)
Monthly cost: $207 (AI) + $70 (Pinecone) = $277

Phase 4-6 (Weeks 11-18): Caching + Deep Dive

Users: 1,000 (beta)
Monthly cost: $2,070 (AI) + $70 (Pinecone) = $2,140

Phase 7-8 (Weeks 19-24): Optimization + National Launch

Users: 10,000 (GA)
Monthly cost: $20,700 (AI) + $140 (Pinecone, upgraded tier) = $20,840

Versus the naive approach at 10K users:

$112,800/month (naive: 10,000 × $11.28)
$20,840/month (optimized)
Savings: $91,960/month
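
The phase totals above combine a per-user AI cost with a fixed vector database fee. The sketch below recomputes them from the Scenario 1 and Scenario 2 per-user figures; `RolloutPhase` and `monthlyCost(of:aiCostPerUser:)` are hypothetical names used only for illustration.

```swift
struct RolloutPhase {
    let name: String
    let users: Int
    let vectorDBMonthly: Double  // Pinecone fee for the phase, USD
}

// AI spend scales with users; the vector DB fee is fixed per phase.
func monthlyCost(of phase: RolloutPhase, aiCostPerUser: Double) -> Double {
    return Double(phase.users) * aiCostPerUser + phase.vectorDBMonthly
}

let optimizedPerUser = 2.07   // Scenario 2, USD/user/month
let naivePerUser = 11.28      // Scenario 1, USD/user/month

let phases = [
    RolloutPhase(name: "Alpha (Phases 1-3)", users: 100, vectorDBMonthly: 70),
    RolloutPhase(name: "Beta (Phases 4-6)", users: 1_000, vectorDBMonthly: 70),
    RolloutPhase(name: "GA (Phases 7-8)", users: 10_000, vectorDBMonthly: 140),
]

for phase in phases {
    let cost = monthlyCost(of: phase, aiCostPerUser: optimizedPerUser)
    print("\(phase.name): $\(Int(cost.rounded()))/month")
}

// Naive AI spend at launch scale, with no vector DB fee, versus the optimized total.
let naiveAtLaunch = Double(phases[2].users) * naivePerUser
let savingsAtLaunch = naiveAtLaunch - monthlyCost(of: phases[2], aiCostPerUser: optimizedPerUser)
print("Savings at launch: $\(Int(savingsAtLaunch.rounded()))/month")
```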
