Reduce TTS Costs with Smart Caching Strategies
Speeko is already 90% cheaper than competitors. Add caching, and you reduce costs by another 80%.
The Caching Case
Most TTS requests are repeats. Common phrases (welcome messages, error prompts, category announcements) are generated thousands of times. Generate once, serve forever.
Cache Key Design
Hash the input parameters that affect output:
const crypto = require('crypto');
function cacheKey(text, voice, format, speed) {
  // Serialize every parameter that changes the audio, then hash the result.
  const input = JSON.stringify({ text, voice, format, speed });
  return crypto.createHash('sha256').update(input).digest('hex');
}
Different inputs → different keys. Same inputs → same key → cache hit.
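As a quick check of this property, the sketch below repeats the `cacheKey` function so that it runs on its own, then compares keys for a few inputs. The sample text and the voice name `nova` are placeholders, not real Speeko voices.

```javascript
const crypto = require('crypto');

// Same function as above, repeated so this snippet runs on its own.
function cacheKey(text, voice, format, speed) {
  const input = JSON.stringify({ text, voice, format, speed });
  return crypto.createHash('sha256').update(input).digest('hex');
}

// 'nova' is a made-up voice name used only for illustration.
const a = cacheKey('Welcome back!', 'nova', 'mp3', 1.0);
const b = cacheKey('Welcome back!', 'nova', 'mp3', 1.0);
const c = cacheKey('Welcome back!', 'nova', 'mp3', 1.25);

console.log(a === b);  // true: identical inputs give the same key
console.log(a === c);  // false: a different speed gives a different key
console.log(a.length); // 64: hex-encoded SHA-256 digest
```

Note that the key depends on the exact text, so "Welcome back!" and "welcome back!" are cached separately unless you normalize the text before hashing.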
Cache Tiers
L1: Application memory (LRU)
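// Newer major versions of lru-cache export the class as { LRUCache } instead.
const LRU = require('lru-cache');
// Keep at most 1,000 entries, each for up to one hour.
const cache = new LRU({ max: 1000, ttl: 1000 * 60 * 60 });
Nanosecond access. Small capacity. Lost on restart.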
L2: Redis
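const redis = require('redis');
const client = redis.createClient();
// node-redis v4+ requires `await client.connect()` before the first command.
async function getAudio(key) {
  // Audio is stored as a base64 string; decode it back to bytes on read.
  const cached = await client.get(key);
  if (cached) return Buffer.from(cached, 'base64');
  return null;
}
Millisecond access. Network-attached. Shared across instances.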
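`getAudio` decodes base64, so whatever writes to Redis has to encode the same way. The round trip can be checked without a Redis server; in this sketch the three sample bytes stand in for real audio data.

```javascript
// Three placeholder bytes standing in for generated audio.
const audio = Buffer.from([0x49, 0x44, 0x33]);

// What a writer would store in Redis.
const stored = audio.toString('base64');

// What getAudio does on the way back out.
const restored = Buffer.from(stored, 'base64');

console.log(stored);                 // SUQz
console.log(restored.equals(audio)); // true
```

Base64 adds roughly a third to the stored size. That is the price of keeping values as plain strings.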
L3: Object storage + CDN
Store generated audio in S3/R2/GCS, serve via CloudFront/Cloudflare. Globally distributed, cheap, durable.
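Because each key is a hash of the inputs, a stored object never changes once written, so it can be served with long-lived cache headers. A minimal sketch of deriving the storage path and CDN URL from a cache key follows. The `tts/` prefix, the `cdn.example.com` host, the one-year `Cache-Control` value, and the `objectLocation` helper itself are illustrative assumptions, not part of any Speeko or cloud-provider API.

```javascript
// Hypothetical helper: maps a cache key to an object path, CDN URL, and upload headers.
const CONTENT_TYPES = { mp3: 'audio/mpeg', wav: 'audio/wav', ogg: 'audio/ogg' };

function objectLocation(key, format = 'mp3', cdnBase = 'https://cdn.example.com') {
  const objectKey = `tts/${key}.${format}`;
  return {
    objectKey,
    url: `${cdnBase}/${objectKey}`,
    headers: {
      'Content-Type': CONTENT_TYPES[format] || 'application/octet-stream',
      // The content for a given key never changes, so CDNs and clients may keep it.
      'Cache-Control': 'public, max-age=31536000, immutable',
    },
  };
}

const loc = objectLocation('3f9a', 'mp3');
console.log(loc.url); // https://cdn.example.com/tts/3f9a.mp3
```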
Complete Flow
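// Uses `cache` (L1), `client` and `getAudio` (L2) from the tiers above.
// s3Exists, s3Upload, and speeko.tts are placeholder helpers.
async function generateOrFetch(text, voice) {
  // Format and speed are fixed in this flow; pass them to cacheKey if callers can vary them.
  const key = cacheKey(text, voice);

  // L1: in-process memory
  const memHit = cache.get(key);
  if (memHit) return memHit;

  // L2: Redis (decoded from base64 by getAudio)
  const redisHit = await getAudio(key);
  if (redisHit) {
    cache.set(key, redisHit);
    return redisHit;
  }

  // L3: object storage behind the CDN
  const s3Url = `https://cdn.example.com/tts/${key}.mp3`;
  let audio;
  if (await s3Exists(s3Url)) {
    const res = await fetch(s3Url);
    audio = Buffer.from(await res.arrayBuffer());
  } else {
    // Miss on every tier: generate once and persist.
    audio = await speeko.tts(text, voice);
    await s3Upload(s3Url, audio);
  }

  // Backfill the faster tiers so the next request stops earlier.
  await client.set(key, audio.toString('base64'), { EX: 3600 });
  cache.set(key, audio);
  return audio;
}
Cache Invalidation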
When should you invalidate?
- Voice model upgrades (rare, announced by Speeko)
- Brand voice change (deliberate)
- Content errors in source text
Otherwise: cache forever.
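A brand voice change or a text correction alters the key inputs, so those requests miss the cache on their own. A model upgrade does not change any input, so one option is to fold a version label into the key and bump it when an upgrade is announced. The sketch below assumes a `version` value that you maintain yourself; the function name, the field, and the labels `v1` and `v2` are hypothetical.

```javascript
const crypto = require('crypto');

// Variant of cacheKey with an extra, self-managed version label.
function versionedCacheKey(text, voice, format, speed, version) {
  const input = JSON.stringify({ version, text, voice, format, speed });
  return crypto.createHash('sha256').update(input).digest('hex');
}

const before = versionedCacheKey('Press 1 for sales.', 'nova', 'mp3', 1.0, 'v1');
const after = versionedCacheKey('Press 1 for sales.', 'nova', 'mp3', 1.0, 'v2');

console.log(before === after); // false: audio from the old model is no longer looked up
```

Nothing has to be deleted at upgrade time. Old entries simply stop being requested and can be expired or cleaned up later.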
Cost Example
IVR with 10,000 calls/day, 5 prompts per call, 100 characters each, where every call plays the same 5 prompts:
- Without caching: 5M characters/day × $0.03/1K = $150/day
- With caching: 500 characters one-time generation = $0.015 once
Monthly savings: about $4,500.
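The arithmetic can be reproduced directly from the stated inputs. This sketch assumes a 30-day month and that all calls share the same 5 prompts; the variable names are illustrative.

```javascript
// Inputs from the IVR example above.
const callsPerDay = 10000;
const promptsPerCall = 5;
const charsPerPrompt = 100;
const pricePer1kChars = 0.03; // USD

const cost = (chars) => (chars / 1000) * pricePer1kChars;

// Without caching, every prompt on every call is synthesized again.
const charsPerDay = callsPerDay * promptsPerCall * charsPerPrompt;
const dailyUncached = cost(charsPerDay);

// With caching, each distinct prompt is synthesized once.
const oneTimeCached = cost(promptsPerCall * charsPerPrompt);

console.log(charsPerDay);                     // 5000000
console.log(dailyUncached.toFixed(2));        // 150.00
console.log((dailyUncached * 30).toFixed(2)); // 4500.00
console.log(oneTimeCached.toFixed(3));        // 0.015
```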
Anti-Patterns
- Cache keys that omit voice/format (serves wrong audio)
- Unbounded cache (memory exhaustion)
- Caching one-off user inputs (cache pollution)
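To keep one-off inputs out of the cache, one simple admission rule is to store a key only after it has been requested more than once. The sketch below illustrates that idea and is not a Speeko feature. The `shouldCache` helper, the threshold of 2, and the cap on tracked keys are assumptions to tune for your traffic.

```javascript
// Counts how often each key has been requested. The counter is bounded
// so that it cannot itself grow without limit.
const MAX_TRACKED = 10000;
const seen = new Map();

function shouldCache(key, threshold = 2) {
  const count = (seen.get(key) || 0) + 1;
  // Re-insert so the Map's insertion order tracks recency.
  seen.delete(key);
  seen.set(key, count);
  if (seen.size > MAX_TRACKED) {
    // Drop the least recently requested key.
    seen.delete(seen.keys().next().value);
  }
  return count >= threshold;
}

console.log(shouldCache('welcome')); // false: first request, generate but do not store
console.log(shouldCache('welcome')); // true: repeated request, worth caching
console.log(shouldCache('one-off')); // false
```

The trade-off is one extra generation per cached phrase, which is small next to the cost of filling the cache with text nobody requests twice.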