Reduce TTS Costs with Smart Caching Strategies

Speeko is already 90% cheaper than competitors. Add caching, and you reduce costs by another 80%.

The Caching Case

Most TTS requests are repeats. Common phrases (welcome messages, error prompts, category announcements) are generated thousands of times. Generate once, serve forever.

Cache Key Design

Hash the input parameters that affect output:

const crypto = require('crypto');

function cacheKey(text, voice, format, speed) {
  const input = JSON.stringify({ text, voice, format, speed });
  return crypto.createHash('sha256').update(input).digest('hex');
}

Different inputs → different keys. Same inputs → same key → cache hit.
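
To make the key behaviour concrete, here is a small sketch that repeats the function above and compares a few keys. The voice names 'nova' and 'aria' are placeholders chosen for illustration, not real Speeko voices.

```javascript
const crypto = require('crypto');

// Same function as above, repeated so this snippet runs on its own.
function cacheKey(text, voice, format, speed) {
  const input = JSON.stringify({ text, voice, format, speed });
  return crypto.createHash('sha256').update(input).digest('hex');
}

// 'nova' and 'aria' are placeholder voice names (assumptions).
const a = cacheKey('Welcome back!', 'nova', 'mp3', 1.0);
const b = cacheKey('Welcome back!', 'nova', 'mp3', 1.0);
const c = cacheKey('Welcome back!', 'aria', 'mp3', 1.0);

console.log(a === b); // true: identical inputs give an identical key
console.log(a === c); // false: a different voice gives a different key
console.log(a.length); // 64: SHA-256 rendered as hex
```

The object literal always lists its properties in the same order, so JSON.stringify produces the same string for the same arguments.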

Cache Tiers

L1: Application memory (LRU)

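// lru-cache v10+ exports the class by name; on v7 to v9 use `const LRUCache = require('lru-cache')`
const { LRUCache } = require('lru-cache');
// Up to 1,000 entries, each expiring after one hour
const cache = new LRUCache({ max: 1000, ttl: 1000 * 60 * 60 });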

Fastest tier: lookups stay in-process with no network hop. Small capacity. Lost on restart.

L2: Redis

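const redis = require('redis');
const client = redis.createClient();
// node-redis v4+: call `await client.connect()` once at startup before any command

async function getAudio(key) {
  const cached = await client.get(key);
  // Values are stored base64-encoded, so decode back to a Buffer
  if (cached) return Buffer.from(cached, 'base64');
  return null;
}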

Millisecond access. Network-attached. Shared across instances.

L3: Object storage + CDN

Store generated audio in S3/R2/GCS, serve via CloudFront/Cloudflare. Globally distributed, cheap, durable.
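
One way to check whether a clip has already been published is a HEAD request against its CDN URL. The sketch below is a hypothetical helper, not part of any SDK; it assumes Node.js 18+ (global fetch) and that your CDN answers HEAD requests with a 2xx status for existing objects.

```javascript
// Hypothetical helper: reports whether an object is reachable at a CDN URL.
// `fetchImpl` defaults to the global fetch available in Node.js 18+.
async function s3Exists(url, fetchImpl = fetch) {
  try {
    const res = await fetchImpl(url, { method: 'HEAD' });
    return res.ok; // true for any 2xx status
  } catch (err) {
    // Assumption: treat network errors as a miss and let the caller regenerate
    return false;
  }
}
```

Treating errors as a miss trades a possible duplicate generation for availability; if regeneration is expensive for you, rethrowing may be the better choice.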

Complete Flow

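// Uses `cacheKey`, `cache` (LRU) and `client` (connected Redis client) from the sections above.
// `s3Exists`, `s3Upload` and `speeko.tts` stand in for your own storage and TTS clients.
async function generateOrFetch(text, voice) {
  // Format and speed are fixed in this flow; pass them to cacheKey if they vary
  const key = cacheKey(text, voice);

  // L1: application memory
  const memHit = cache.get(key);
  if (memHit) return memHit;

  // L2: Redis (audio stored as base64 strings)
  const redisHit = await client.get(key);
  if (redisHit) {
    const audio = Buffer.from(redisHit, 'base64');
    cache.set(key, audio);
    return audio;
  }

  // L3: object storage behind the CDN
  const s3Url = `https://cdn.example.com/tts/${key}.mp3`;
  if (await s3Exists(s3Url)) {
    const audio = Buffer.from(await fetch(s3Url).then(r => r.arrayBuffer()));
    await client.set(key, audio.toString('base64'), { EX: 3600 });
    cache.set(key, audio);
    return audio;
  }

  // Miss in every tier: generate once, then fill all tiers
  const audio = await speeko.tts(text, voice);
  await s3Upload(s3Url, audio);
  await client.set(key, audio.toString('base64'), { EX: 3600 });
  cache.set(key, audio);
  return audio;
}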

Cache Invalidation

When should you invalidate?

Voice model upgrades (rare, announced by Speeko)
Brand voice change (deliberate)
Content errors in source text

Otherwise: cache forever.
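
A common way to invalidate without deleting anything is to fold a version label into the cache key. The sketch below assumes a label you maintain yourself (MODEL_VERSION is a hypothetical name); bumping it after a model upgrade or brand voice change makes every old key unreachable.

```javascript
const crypto = require('crypto');

// MODEL_VERSION is a hypothetical label you control; bump it when Speeko
// announces a voice model upgrade or when your brand voice changes.
const MODEL_VERSION = 'v1';

function versionedCacheKey(text, voice, format, speed, version = MODEL_VERSION) {
  const input = JSON.stringify({ version, text, voice, format, speed });
  return crypto.createHash('sha256').update(input).digest('hex');
}

// 'nova' is a placeholder voice name.
const before = versionedCacheKey('Welcome back!', 'nova', 'mp3', 1.0, 'v1');
const after = versionedCacheKey('Welcome back!', 'nova', 'mp3', 1.0, 'v2');
console.log(before === after); // false: the bump forces a fresh generation
```

Old entries are never read again, so they can age out through the Redis TTL and an object storage lifecycle rule. A content error in one source text is better handled by deleting that single key.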

Cost Example

IVR with 10,000 calls/day, 5 prompts per call, 100 characters each:

Without caching: 5M characters/day × $0.03/1K = $150/day
With caching: 500 characters one-time generation = $0.015 once

Monthly savings: about $4,500.
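
The short script below recomputes the figures from the stated inputs (10,000 calls, 5 prompts, 100 characters, $0.03 per 1K characters) so you can plug in your own volumes. It assumes a 30-day month and that every call reuses the same 5 prompts.

```javascript
// Inputs taken from the IVR example above
const callsPerDay = 10_000;
const promptsPerCall = 5;
const charsPerPrompt = 100;
const pricePerThousandChars = 0.03; // USD

// Without caching, every prompt is synthesized on every call
const charsPerDay = callsPerDay * promptsPerCall * charsPerPrompt;
const dailyCostUncached = (charsPerDay / 1000) * pricePerThousandChars;

// With caching, each distinct prompt is synthesized once (assumes the same 5 prompts)
const uniqueChars = promptsPerCall * charsPerPrompt;
const oneTimeCostCached = (uniqueChars / 1000) * pricePerThousandChars;

// Assumption: 30-day month
const monthlySavings = dailyCostUncached * 30 - oneTimeCostCached;

console.log(charsPerDay);                  // 5000000
console.log(dailyCostUncached.toFixed(2)); // "150.00"
console.log(oneTimeCostCached.toFixed(3)); // "0.015"
console.log(Math.round(monthlySavings));   // 4500
```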

Anti-Patterns

Caching keys that don't include voice/format (serves wrong audio)
Unbounded cache (memory exhaustion)
Caching one-off user inputs (cache pollution)
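
One possible guard against cache pollution is an admission filter that only caches a key once it has been requested more than once. The class below is a hypothetical sketch (the name, threshold and tracking limit are assumptions), and its own bookkeeping is bounded so it does not become an unbounded cache itself.

```javascript
// Hypothetical admission filter: a key becomes cacheable once it has been
// requested at least `threshold` times, so one-off inputs never enter the cache.
class AdmissionFilter {
  constructor(threshold = 2, maxTracked = 10000) {
    this.threshold = threshold;
    this.maxTracked = maxTracked;
    this.counts = new Map(); // Map keeps insertion order, oldest key first
  }

  shouldCache(key) {
    const count = (this.counts.get(key) || 0) + 1;
    this.counts.delete(key); // re-insert to mark the key as most recently seen
    this.counts.set(key, count);
    if (this.counts.size > this.maxTracked) {
      // Drop the least recently seen key so the counter stays bounded
      const oldest = this.counts.keys().next().value;
      this.counts.delete(oldest);
    }
    return count >= this.threshold;
  }
}

const filter = new AdmissionFilter();
console.log(filter.shouldCache('abc')); // false: first sighting, do not cache yet
console.log(filter.shouldCache('abc')); // true: repeated request, worth caching
```

Fixed prompts such as welcome messages can skip the filter entirely and be cached on first generation.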

Start optimizing.
