TTS API for Game Development: Dynamic NPC Dialogue Without Voice Actors
62% of indie studios cite voice acting costs as the primary reason they limit or eliminate NPC dialogue. That stat is from the 2023 GDC survey, and nothing has changed the underlying economics since — voice actors for a mid-size RPG can run $50,000–$200,000 depending on union rates, session fees, and how many unique characters you need.
A TTS API doesn't replace voice acting for hero characters. But it solves the 200-NPC problem that every RPG developer has.
What TTS Actually Solves in Games
Three distinct problems:
Procedural dialogue: Your game generates text dynamically — quest updates, item descriptions, world events. You can't pre-record something that doesn't exist yet. TTS generates audio at runtime from any string.
Long-tail NPC coverage: Your main characters have full voice acting. The innkeeper in the third village? The merchant who says four lines? TTS covers the 80% of characters who'd otherwise be silent.
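Accessibility narration: Spoken narration for menus, inventory, and quest logs, for players who cannot rely on on-screen text. WCAG 2.1 Level AA was written for web content, but it is increasingly used as a reference point for game accessibility. TTS via API gives you programmatic control over what gets narrated.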
Runtime Generation Pattern
Call the TTS API when the NPC speaks, cache the result, play it:
import hashlib
import os
from pathlib import Path

import requests

# Read the key from the environment rather than hardcoding it.
SPEEKO_KEY = os.environ.get("SPEEKO_KEY", "your-key")
CACHE_DIR = Path("audio_cache")
CACHE_DIR.mkdir(exist_ok=True)

def get_npc_voice(text: str, voice_id: str = "en-US-neural-2") -> Path:
    """Generate or retrieve cached audio for NPC dialogue."""
    # MD5 only names the cache file here; it is not used for security.
    cache_key = hashlib.md5(f"{voice_id}:{text}".encode()).hexdigest()
    cache_path = CACHE_DIR / f"{cache_key}.mp3"
    if cache_path.exists():
        return cache_path
    response = requests.post(
        "https://api.speekoapp.com/v1/tts",
        headers={"X-API-Key": SPEEKO_KEY, "Content-Type": "application/json"},
        json={"text": text, "voice": voice_id, "format": "mp3"},
        timeout=30,  # fail instead of hanging on a stalled connection
    )
    response.raise_for_status()
    cache_path.write_bytes(response.content)
    return cache_path

# Usage
audio_path = get_npc_voice(
    "The bridge washed out three days ago. You'll have to go around.",
    voice_id="en-US-neural-3",
)

The cache key is a hash of voice + text, so the same line with the same voice always hits disk. The network call happens only on first encounter. For a 200-NPC game where each NPC has 10 dialogue lines, you're generating 2,000 audio files. At an average of about 150 characters per line, that's 300,000 characters, or roughly $9 total at $0.03 per 1,000 characters.
Voice Assignment Strategy
Different NPCs should sound different. You're not limited to one voice — assign voices from a pool based on character archetype:
VOICE_POOL = {
    "elder_male": "en-US-neural-1",
    "young_female": "en-US-neural-4",
    "merchant": "en-GB-neural-2",
    "guard": "en-US-neural-5",
    "child": "en-US-neural-6",
}

def get_character_voice(character_archetype: str, text: str) -> Path:
    voice_id = VOICE_POOL.get(character_archetype, "en-US-neural-1")
    return get_npc_voice(text, voice_id)

You can push this further with SSML: add <prosody rate="slow"> for an elder character, <prosody pitch="+2st"> for an excited child. The Speeko API accepts SSML natively.
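To make the SSML idea concrete, here is a minimal sketch that wraps a dialogue line in a prosody element chosen by archetype. The PROSODY_BY_ARCHETYPE table and the to_ssml helper are hypothetical names introduced for illustration, and the settings are placeholders to tune by ear. How the request should carry SSML (which JSON field, whether a flag is needed) is not specified here, so check the Speeko documentation before wiring this into get_npc_voice.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical per-archetype prosody settings (illustrative values only).
PROSODY_BY_ARCHETYPE = {
    "elder_male": {"rate": "slow"},
    "child": {"pitch": "+2st"},
}


def to_ssml(text: str, archetype: str) -> str:
    """Wrap a dialogue line in SSML, adding prosody when the archetype has any."""
    # Escape &, < and > so dialogue text cannot break the markup.
    body = escape(text)
    attrs = PROSODY_BY_ARCHETYPE.get(archetype, {})
    if attrs:
        attr_str = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs.items())
        body = f"<prosody {attr_str}>{body}</prosody>"
    return f"<speak>{body}</speak>"


print(to_ssml("Can we go see the dragon?", "child"))
```

If the SSML string is what gets hashed for the cache key, changing a prosody setting naturally produces a new cache entry instead of reusing stale audio.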
Pre-Build for Shipped Content
Runtime generation works for dynamic or procedural content. For dialogue that's fixed at ship time, pre-generate everything during your build pipeline:
import json

def prebuild_dialogue(script_file: str):
    """Pre-generate all voiced dialogue from a script JSON."""
    with open(script_file) as f:
        script = json.load(f)
    for character_id, lines in script.items():
        archetype = lines["archetype"]
        for i, line in enumerate(lines["dialogue"]):
            output_path = Path(f"assets/audio/npc/{character_id}_{i:03d}.mp3")
            if not output_path.exists():
                # Create assets/audio/npc/ on the first run.
                output_path.parent.mkdir(parents=True, exist_ok=True)
                audio = get_character_voice(archetype, line["text"])
                output_path.write_bytes(audio.read_bytes())
                print(f"Generated: {output_path}")

prebuild_dialogue("scripts/npc_dialogue.json")

Run this as a build step. Ship the MP3s as game assets. Zero API calls at runtime, zero latency, works offline. Because the existence check skips files already on disk, delete a character's MP3 after editing its line so the next build regenerates it.
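The pre-build step reads a specific JSON shape: a mapping from character ID to an object with an "archetype" string and a "dialogue" list of objects that each carry a "text" field. The sketch below shows one sample file in that shape plus a small check to run before spending API calls. The sample content and the validate_script helper are illustrative assumptions, not part of any Speeko tooling.

```python
import json

# Hypothetical sample of scripts/npc_dialogue.json, in the shape the build step reads.
SAMPLE_SCRIPT = """
{
  "innkeeper_03": {
    "archetype": "merchant",
    "dialogue": [
      {"text": "Rooms are two silver a night."},
      {"text": "Stew's hot if you're hungry."}
    ]
  }
}
"""


def validate_script(script: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the script is usable."""
    problems = []
    for character_id, entry in script.items():
        if not isinstance(entry, dict):
            problems.append(f"{character_id}: entry must be an object")
            continue
        if not isinstance(entry.get("archetype"), str):
            problems.append(f"{character_id}: missing 'archetype'")
        dialogue = entry.get("dialogue")
        if not isinstance(dialogue, list):
            problems.append(f"{character_id}: missing 'dialogue' list")
            continue
        for i, line in enumerate(dialogue):
            text = line.get("text") if isinstance(line, dict) else None
            if not isinstance(text, str) or not text.strip():
                problems.append(f"{character_id}[{i}]: missing or empty 'text'")
    return problems


print(validate_script(json.loads(SAMPLE_SCRIPT)))
```

Failing the build when the returned list is non-empty catches typos in the script before they turn into a KeyError halfway through generation.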
Procedural Narration
Some games generate text at runtime — roguelikes, open-world event systems, AI dungeon masters. For those, runtime TTS with caching is the right pattern. But add a queue so you're not blocking gameplay on API calls:
import queue
import threading

dialogue_queue = queue.Queue()

def dialogue_worker():
    while True:
        character, text, callback = dialogue_queue.get()
        try:
            audio_path = get_character_voice(character, text)
            callback(audio_path)
        except Exception as exc:
            # A failed request should skip one line, not kill the worker thread.
            print(f"Dialogue generation failed: {exc}")
        finally:
            dialogue_queue.task_done()

# Start worker thread at game init
threading.Thread(target=dialogue_worker, daemon=True).start()

# Queue dialogue without blocking
def npc_speaks(character: str, text: str):
    def on_ready(path):
        play_audio(path)  # your game's audio system

    dialogue_queue.put((character, text, on_ready))

The main game thread never blocks. Audio generates in the background and plays when ready. Note that on_ready runs on the worker thread, so play_audio must be safe to call from there.
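Many audio systems expect to be called only from the main thread, while the callback in the queue pattern runs on the worker. One possible variation, sketched below, has the worker push finished clips onto a second queue that the game loop drains once per frame. DialoguePipeline and its method names are hypothetical, and the synthesize argument stands in for get_character_voice so the sketch runs on its own.

```python
import queue
import threading
from typing import Callable


class DialoguePipeline:
    """Background synthesis with results handed back to the main thread."""

    def __init__(self, synthesize: Callable[[str, str], str]):
        self._synthesize = synthesize
        self._requests: queue.Queue = queue.Queue()
        self._ready: queue.Queue = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self) -> None:
        while True:
            character, text = self._requests.get()
            try:
                self._ready.put((character, self._synthesize(character, text)))
            except Exception as exc:
                # Skip the failed line and keep the worker alive.
                print(f"Dialogue generation failed: {exc}")
            finally:
                self._requests.task_done()

    def request(self, character: str, text: str) -> None:
        """Queue a line for synthesis; returns immediately."""
        self._requests.put((character, text))

    def wait_idle(self) -> None:
        """Block until every queued request is processed (useful in tests)."""
        self._requests.join()

    def drain_ready(self) -> list:
        """Call once per frame on the main thread; returns finished clips."""
        ready = []
        while True:
            try:
                ready.append(self._ready.get_nowait())
            except queue.Empty:
                return ready
```

In the game loop, each frame would call drain_ready() and pass the returned paths to the audio system, so playback always starts on the main thread.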
Cost Reality Check
A typical indie RPG with 500 unique NPCs, each with 8 lines of dialogue, averaging 120 characters per line:
- 500 × 8 × 120 = 480,000 characters
- At $0.03/1K chars = $14.40 total
That's the full voice budget for 500 characters. A union voice actor charges $200–500 per hour. Even at the lowest indie rate, $14 doesn't buy you one character.
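The arithmetic is simple enough to keep as a small helper so you can plug in your own cast size. This is a sketch that assumes the flat $0.03 per 1,000 characters rate used in this article; estimate_cost and chars_covered are illustrative names, and real pricing should be confirmed before budgeting.

```python
import math

PRICE_PER_1K_CHARS = 0.03  # USD; assumed flat rate, confirm against current pricing


def estimate_cost(npc_count: int, lines_per_npc: int, avg_chars_per_line: int,
                  price_per_1k: float = PRICE_PER_1K_CHARS) -> float:
    """Cost in USD of generating every line once (cached lines are not billed again)."""
    total_chars = npc_count * lines_per_npc * avg_chars_per_line
    return total_chars / 1000 * price_per_1k


def chars_covered(budget_usd: float, price_per_1k: float = PRICE_PER_1K_CHARS) -> int:
    """Whole number of characters a given budget pays for."""
    return math.floor(budget_usd / price_per_1k * 1000)


print(f"${estimate_cost(500, 8, 120):.2f}")   # $14.40
print(f"${estimate_cost(200, 10, 150):.2f}")  # $9.00
```

The estimate covers one pass over the script. Rewritten lines and voice changes produce new cache keys, so iteration during development adds to the total.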
The tradeoff: TTS voices don't have the emotional range of a skilled actor. For main characters, hire humans. For the long tail of minor NPCs that would otherwise be silent, TTS is the correct call.
What to Avoid
Don't use TTS for cutscene dialogue or major character moments. Players notice. The uncanny valley in voice acting is real, and a poorly chosen TTS voice in a dramatic scene pulls them out of the story.
Use it where players aren't paying close attention: ambient NPC chatter, merchant idle lines, tutorial prompts, menu narration. The bar is lower there, and TTS clears it comfortably.
Getting Started
Speeko's API costs nothing to start: the free $5 credit covers about 167,000 characters at $0.03 per 1,000, enough to voice your entire first act. Test a few voices, find the right archetype matches, build your cache layer, and integrate it into your build pipeline before worrying about runtime generation.