Reduce TTS Costs with Smart Caching Strategies

Posted on February 23, 2026
By Speeko Team
optimization, caching, cost-reduction, architecture

Speeko is already 90% cheaper than competitors. Add caching, and you can cut what remains by another 80%, depending on how often your requests repeat.

The Caching Case

In many applications, most TTS requests are repeats: common phrases (welcome messages, error prompts, category announcements) are generated thousands of times. Generate each one once, then serve it from cache.

Cache Key Design

Hash the input parameters that affect output:

const crypto = require('crypto');

function cacheKey(text, voice, format, speed) {
  const input = JSON.stringify({ text, voice, format, speed });
  return crypto.createHash('sha256').update(input).digest('hex');
}

Different inputs → different keys. Same inputs → same key → cache hit.
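
Keys only match when the inputs are byte-for-byte identical, so it can help to normalize them before hashing. The sketch below is one way to do that. `normalizedCacheKey` and its defaults (`mp3`, speed `1`) are assumptions for illustration, not documented Speeko defaults.

```javascript
const crypto = require('crypto');

// Hypothetical helper: normalize inputs so trivially different requests
// (extra whitespace, omitted defaults) map to the same key.
function normalizedCacheKey({ text, voice, format = 'mp3', speed = 1 }) {
  const input = JSON.stringify({
    text: text.trim().replace(/\s+/g, ' '), // collapse runs of whitespace
    voice,
    format: format.toLowerCase(),
    speed: Number(speed),                   // "1.0" and 1 hash the same
  });
  return crypto.createHash('sha256').update(input).digest('hex');
}
```

Only normalize differences that cannot change the audio. Letter case and punctuation in the text are left alone here because they may affect pronunciation.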

Cache Tiers

L1: Application memory (LRU)

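// lru-cache v10+ uses a named export; on v7-v9 use `const LRU = require('lru-cache')`.
const { LRUCache } = require('lru-cache');
// Keep at most 1,000 entries, each for up to one hour.
const memCache = new LRUCache({ max: 1000, ttl: 1000 * 60 * 60 });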

Nanosecond access. Small capacity. Lost on restart.
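
To make the "small capacity" trade-off concrete, here is a minimal sketch of how LRU eviction works, built on a plain `Map`. `TinyLRU` is a hypothetical teaching class, not a replacement for a maintained library: it has no TTL and no size-in-bytes limit.

```javascript
// Minimal LRU built on Map, which iterates keys in insertion order.
class TinyLRU {
  constructor(max) {
    this.max = max;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key);      // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.max) {
      // The first key in iteration order is the least recently used.
      this.map.delete(this.map.keys().next().value);
    }
  }
}
```

Audio buffers vary a lot in size, so in practice a limit on total bytes is often a better fit than a limit on entry count.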

L2: Redis

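const redis = require('redis');
const client = redis.createClient();
// node-redis v4+ needs an explicit connect() before any command runs.
const ready = client.connect();

// Audio is assumed to be stored in Redis as a base64 string.
async function getAudio(key) {
  await ready;
  const cached = await client.get(key);
  if (cached) return Buffer.from(cached, 'base64');
  return null;
}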

Millisecond access. Network-attached. Shared across instances.
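
The read path needs a matching write path. A minimal sketch, assuming a connected node-redis v4 client is passed in as `client` and that audio is stored as base64 text; `setAudio` and `getAudioFrom` are hypothetical names.

```javascript
// Store audio as base64 with a TTL so stale entries expire on their own.
async function setAudio(client, key, audioBuffer, ttlSeconds = 3600) {
  await client.set(key, audioBuffer.toString('base64'), { EX: ttlSeconds });
}

// Read it back, decoding to a Buffer; returns null on a miss.
async function getAudioFrom(client, key) {
  const cached = await client.get(key);
  return cached ? Buffer.from(cached, 'base64') : null;
}
```

Base64 inflates each value by roughly a third. If Redis memory is tight, storing raw binary is an option, but it requires configuring the client to return Buffers.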

L3: Object storage + CDN

Store generated audio in S3/R2/GCS, serve via CloudFront/Cloudflare. Globally distributed, cheap, durable.
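
Because the object name is derived from the cache key, the audio behind a given URL never changes, which lets the CDN and browsers cache it aggressively. The sketch below builds the object path and upload headers. `objectInfo` and the format table are assumptions for illustration; pass the result to whichever storage SDK you use.

```javascript
// Hypothetical helper: map a cache key to an object path and upload headers.
const CONTENT_TYPES = { mp3: 'audio/mpeg', ogg: 'audio/ogg' };

function objectInfo(key, format = 'mp3') {
  const contentType = CONTENT_TYPES[format];
  if (!contentType) throw new Error(`Unsupported format: ${format}`);
  return {
    path: `tts/${key}.${format}`,
    headers: {
      'Content-Type': contentType,
      // Content-addressed objects never change, so long-lived caching is safe.
      'Cache-Control': 'public, max-age=31536000, immutable',
    },
  };
}
```

If the audio stored under a key could ever be overwritten in place, use a shorter `max-age` instead of `immutable`.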

Complete Flow

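// s3Exists, s3Upload and speeko.tts are placeholders for your storage and TTS clients.
// Uses cacheKey, memCache and the Redis client from the sections above; fetch is global in Node 18+.
async function generateOrFetch(text, voice, format = 'mp3', speed = 1) {
  // Include every parameter that affects the audio in the key.
  const key = cacheKey(text, voice, format, speed);

  // L1: in-process memory
  const memHit = memCache.get(key);
  if (memHit) return memHit;

  // L2: Redis (values stored as base64 strings)
  const redisHit = await client.get(key);
  if (redisHit) {
    const audio = Buffer.from(redisHit, 'base64');
    memCache.set(key, audio);
    return audio;
  }

  // L3: object storage behind the CDN
  const s3Url = `https://cdn.example.com/tts/${key}.${format}`;
  if (await s3Exists(s3Url)) {
    const res = await fetch(s3Url);
    const audio = Buffer.from(await res.arrayBuffer());
    await client.set(key, audio.toString('base64'), { EX: 3600 });
    memCache.set(key, audio);
    return audio;
  }

  // Miss in every tier: generate once, then fill all tiers.
  // Pass format and speed to the TTS call in whatever form your client expects.
  const audio = await speeko.tts(text, voice);
  await s3Upload(s3Url, audio);
  await client.set(key, audio.toString('base64'), { EX: 3600 });
  memCache.set(key, audio);
  return audio;
}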

Cache Invalidation

When should you invalidate?

  • Voice model upgrades (rare, announced by Speeko)
  • Brand voice change (deliberate)
  • Content errors in source text

Otherwise: cache forever.
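
One low-effort way to invalidate everything at once is to put a version string into the key, so that bumping it makes every old entry unreachable. This is a sketch under that assumption; `CACHE_VERSION` and `versionedCacheKey` are hypothetical names.

```javascript
const crypto = require('crypto');

// Hypothetical: bump this when the voice model or brand voice changes.
const CACHE_VERSION = 'v1';

function versionedCacheKey(text, voice, format, speed, version = CACHE_VERSION) {
  // The version is hashed with the other inputs, so a new version
  // produces a new key for every phrase.
  const input = JSON.stringify({ version, text, voice, format, speed });
  return crypto.createHash('sha256').update(input).digest('hex');
}
```

Old entries are never deleted by this approach; they simply stop being requested and can age out or be cleaned up later. Content errors need no version bump, since corrected text already hashes to a different key.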

Cost Example

IVR with 10,000 calls/day, 5 prompts per call, 100 characters each:

  • Without caching: 10,000 × 5 × 100 = 5M characters/day × $0.03/1K = $150/day
  • With caching: 500 characters one-time generation = $0.015 once

Monthly savings: about $4,500 (30 days).
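
To rerun this estimate with your own traffic, a small calculator is enough. The function names are hypothetical, and the model assumes every call plays the same fixed set of prompts, which is the best case for caching.

```javascript
// Daily TTS spend without caching, from call volume and per-1K-character price.
function dailyCostUncached({ callsPerDay, promptsPerCall, charsPerPrompt, pricePer1K }) {
  const chars = callsPerDay * promptsPerCall * charsPerPrompt;
  return (chars / 1000) * pricePer1K;
}

// One-time spend with caching: each distinct prompt is generated once.
function oneTimeCostCached({ distinctPrompts, charsPerPrompt, pricePer1K }) {
  return ((distinctPrompts * charsPerPrompt) / 1000) * pricePer1K;
}

// Inputs from the IVR example: 10,000 calls/day, 5 prompts, 100 characters each.
const ivr = { callsPerDay: 10000, promptsPerCall: 5, charsPerPrompt: 100, pricePer1K: 0.03 };
console.log('Uncached per day: $' + dailyCostUncached(ivr).toFixed(2));
console.log('Cached, one time: $' +
  oneTimeCostCached({ distinctPrompts: 5, charsPerPrompt: 100, pricePer1K: 0.03 }).toFixed(3));
```

Real savings will be lower when some prompts contain per-caller content, since those requests miss the cache.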

Anti-Patterns

  • Cache keys that omit voice, format, or speed (serves the wrong audio)
  • Unbounded cache (memory exhaustion)
  • Caching one-off user inputs (cache pollution)
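
One way to keep one-off inputs out of the cache is an admission filter: only store audio for a key after it has been requested more than once. A minimal sketch, with the hypothetical `AdmissionFilter` class and its thresholds chosen for illustration:

```javascript
// Hypothetical admission filter: a key becomes cacheable once it has been
// requested `threshold` times, so one-off inputs never enter the cache.
class AdmissionFilter {
  constructor(threshold = 2, maxTracked = 10000) {
    this.threshold = threshold;
    this.maxTracked = maxTracked;
    this.counts = new Map();
  }

  // Record a request; returns true when the key is worth caching.
  shouldCache(key) {
    const n = (this.counts.get(key) || 0) + 1;
    this.counts.delete(key);   // re-insert so recently seen keys stay tracked
    this.counts.set(key, n);
    if (this.counts.size > this.maxTracked) {
      // Drop the oldest tracked key so the filter itself stays bounded.
      this.counts.delete(this.counts.keys().next().value);
    }
    return n >= this.threshold;
  }
}
```

Known fixed prompts, such as IVR menus, can skip the filter and be cached on first generation.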

Start optimizing.