SSML Guide: Advanced Voice Control for Developers
Plain text gets you 80% of the way to natural audio. SSML gets you the remaining 20%.
What Is SSML
Speech Synthesis Markup Language is a W3C standard for marking up text with pronunciation hints. Speeko supports the core SSML spec plus provider-specific extensions.
Essential SSML Tags
break
Pause for a specific duration:
<speak>
First sentence. <break time="500ms"/> Second sentence.
</speak>
emphasis
Stress specific words:
<speak>
This is <emphasis level="strong">very important</emphasis>.
</speak>
Levels: reduced, moderate, strong.
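When emphasis tags are generated from code rather than written by hand, a small wrapper can catch a mistyped level before the request is sent. The sketch below is only an illustration: `emphasize` and `EMPHASIS_LEVELS` are hypothetical names, and the accepted levels are simply the three listed in this guide.

```python
from xml.sax.saxutils import escape

# Levels listed in this guide; this tuple is an assumption, not an API contract.
EMPHASIS_LEVELS = ("reduced", "moderate", "strong")


def emphasize(text: str, level: str = "moderate") -> str:
    """Wrap text in an <emphasis> tag, rejecting levels not listed above."""
    if level not in EMPHASIS_LEVELS:
        raise ValueError(f"unknown emphasis level: {level!r}")
    # Escape &, < and > so arbitrary text cannot break the markup.
    return f'<emphasis level="{level}">{escape(text)}</emphasis>'


ssml = f"<speak>This is {emphasize('very important', 'strong')}.</speak>"
print(ssml)
```

Escaping the text matters as soon as the words come from user input, since a stray `&` or `<` would otherwise make the SSML invalid.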
prosody
Control rate, pitch, and volume:
<speak>
<prosody rate="slow" pitch="low">Mysterious whispering</prosody>
<prosody rate="fast" pitch="high">Excited announcement</prosody>
</speak>
say-as
Force interpretation of text:
<speak>
Call us at <say-as interpret-as="telephone">1-800-555-0199</say-as>.
Your <say-as interpret-as="ordinal">1</say-as> order has shipped.
Meeting on <say-as interpret-as="date" format="mdy">4/15/2026</say-as>.
</speak>
phoneme
Override pronunciation for specific words:
<speak>
The <phoneme alphabet="ipa" ph="təˈmɑːtəʊ">tomato</phoneme> is ripe.
</speak>
sub
Substitute a word's reading:
<speak>
<sub alias="World Wide Web">WWW</sub> is everywhere.
</speak>
Practical Patterns
Dynamic pauses based on punctuation
Insert a break after punctuation: 300ms after periods, 150ms after commas, and 250ms after em-dashes. Treat these as starting values and tune them per voice.
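One way to apply these pauses automatically is a small preprocessing step that turns plain text into SSML. The following is a minimal sketch, not Speeko's own implementation: `add_pauses` and `PAUSE_MS` are hypothetical names, and the rule that a mark only counts when followed by whitespace or the end of the text is an added assumption so that numbers such as 1,234.56 are left alone.

```python
import re
from xml.sax.saxutils import escape

# Starting values from this guide, in milliseconds; tune per voice.
# "\u2014" is the em-dash character.
PAUSE_MS = {".": 300, ",": 150, "\u2014": 250}


def add_pauses(text: str, pause_ms: dict = PAUSE_MS) -> str:
    """Return SSML with a <break> after each listed punctuation mark."""
    marks = "|".join(re.escape(mark) for mark in pause_ms)
    # Only match a mark followed by whitespace or end of text (assumption),
    # so decimal points and thousands separators are not treated as pauses.
    pattern = re.compile(rf"({marks})(?=\s|$)")

    def with_break(match):
        mark = match.group(1)
        return f'{mark}<break time="{pause_ms[mark]}ms"/>'

    # Escape first so the inserted tags are the only markup in the output.
    return f"<speak>{pattern.sub(with_break, escape(text))}</speak>"


print(add_pauses("Wait, then go. Done."))
```

Passing a different `pause_ms` mapping per voice keeps the tuning data separate from the insertion logic.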
Numbers and currency
<say-as interpret-as="currency" language="en-US">$1,234.56</say-as>
Foreign words
We went to a <lang xml:lang="fr-FR">café</lang>.
API Usage
{
  "text": "<speak>Hello <emphasis>world</emphasis>.</speak>",
  "voice": "af_heart",
  "input_type": "ssml"
}
Common Mistakes
- Forgetting the <speak> root element → treated as plain text
- Over-emphasizing → sounds unnatural
- Using prosody rate outside 0.5x-2.0x → clipped to range
- Phoneme IPA without validation → pronunciation errors
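Two of these mistakes can be caught mechanically before a request is sent. The sketch below uses Python's standard XML parser; `lint_ssml` is a hypothetical helper, and it assumes numeric prosody rates are written as percentages (so 0.5x to 2.0x corresponds to 50% to 200%), which may not match how Speeko expects them. Named rates such as "slow" are not checked.

```python
import re
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{uri}speak' down to 'speak'."""
    return tag.rsplit("}", 1)[-1]


def lint_ssml(ssml: str) -> list:
    """Return warnings for a missing <speak> root and out-of-range rates."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as err:
        # Plain text with no tags also ends up here.
        return [f"not well-formed XML: {err}"]

    warnings = []
    if local_name(root.tag) != "speak":
        warnings.append("root element is not <speak>")

    for node in root.iter():
        if local_name(node.tag) != "prosody":
            continue
        rate = node.get("rate", "").strip()
        # Assumption: numeric rates are percentages, 100% being normal speed.
        match = re.fullmatch(r"(\d+(?:\.\d+)?)%", rate)
        if match and not 50 <= float(match.group(1)) <= 200:
            warnings.append(f'prosody rate "{rate}" is outside 50%-200%')
    return warnings


print(lint_ssml('<speak><prosody rate="300%">Too fast</prosody></speak>'))
```

Over-emphasis and incorrect IPA are judgment calls about how the audio sounds, so they still need a listening pass.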
Testing
Iterate in Speeko's playground before deploying. Save prompts that work.
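Once a prompt sounds right in the playground, it still has to reach the API as a correctly escaped JSON string. The sketch below builds the request body with the three fields shown under API Usage. `build_request` is a hypothetical helper; the endpoint and HTTP client are left out because this guide does not specify them.

```python
import json
import re


def build_request(ssml: str, voice: str = "af_heart") -> str:
    """Serialize a saved SSML prompt into a JSON request body."""
    # Guard against the most common mistake: a missing <speak> root.
    if not re.match(r"\s*<speak[\s>]", ssml):
        raise ValueError("SSML must start with a <speak> root element")
    body = {"text": ssml, "voice": voice, "input_type": "ssml"}
    # json.dumps escapes the double quotes inside SSML attributes.
    return json.dumps(body, ensure_ascii=False)


print(build_request('<speak>Hello <break time="500ms"/> world.</speak>'))
```

Letting the JSON library do the escaping avoids hand-written backslashes around attribute values such as `time="500ms"`.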