Google Cloud Text to Speech Pricing: What You'll Actually Pay
Google Cloud TTS has one of the most generous free tiers in the market. But navigating the pricing tiers — Standard, WaveNet, Neural2, Studio — is more complicated than it looks. The "free" label obscures real friction points, and the cost model beyond the free tier has hidden complexity. Here's exactly what you'll pay, and when alternatives are worth considering.
Google Cloud TTS Pricing Tiers (2026)
| Voice Type | Free Tier | Pay-as-you-go |
|---|---|---|
| Standard voices | 4M chars/month | $0.004/1K chars |
| WaveNet voices | 1M chars/month | $0.016/1K chars |
| Neural2 voices | 1M chars/month | $0.016/1K chars |
| Studio voices | 100 chars/month | $0.160/1K chars |
The free tier resets monthly. After the free limit, you're billed per thousand characters. The free tier reset is both a benefit (it renews) and a cost (unused characters disappear at month-end).
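The billing rule above is simple enough to express directly. The sketch below is a minimal estimator built only from the table's free allowances and per-1K rates. It assumes each voice type has its own monthly allowance with no carryover, and it ignores taxes and anything else a real GCP invoice might include. `TIERS`, `monthly_cost`, and `yearly_cost` are illustrative names, not part of any Google API.

```python
# Free allowance (characters/month) and USD price per 1,000 characters, from the table above.
# Assumption: each voice type has its own allowance, reset every month with no carryover.
TIERS = {
    "standard": (4_000_000, 0.004),
    "wavenet": (1_000_000, 0.016),
    "neural2": (1_000_000, 0.016),
    "studio": (100, 0.160),
}


def monthly_cost(voice_type: str, chars: int) -> float:
    """Estimated USD cost for one month of usage of a single voice type."""
    free_chars, price_per_1k = TIERS[voice_type]
    billable = max(0, chars - free_chars)
    return round(billable / 1000 * price_per_1k, 2)


def yearly_cost(voice_type: str, monthly_chars: list[int]) -> float:
    """Each month is billed on its own, so unused free characters never roll over."""
    return round(sum(monthly_cost(voice_type, c) for c in monthly_chars), 2)


if __name__ == "__main__":
    print(monthly_cost("neural2", 1_500_000))  # 8.0
    # A quiet month cannot subsidize a busy one:
    print(yearly_cost("neural2", [400_000, 1_600_000]))  # 9.6
```

The second call shows the month-end reset in action: 2M characters split unevenly across two months costs $9.60, while the same 2M spread evenly (1M per month) would cost nothing.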
The Catch: Billing Account Required
Google Cloud TTS requires a billing account — a valid credit card — even to use the free tier. You can stay under the free limit and never be charged, but you must add payment information and accept Google's billing terms.
This is a meaningful friction point for:
- Developers who want to prototype without committing financial information
- Teams where credit card authorization requires finance department approval
- Individual developers evaluating before requesting a purchase order
You cannot make your first Google Cloud TTS API call without a billing account. This is different from Speeko, where you get $5 free credit with no card required.
WaveNet vs Neural2 vs Studio: Which to Use
Standard voices ($0.004/1K chars): Concatenative synthesis, the technology from 15 years ago. Noticeably robotic. Not recommended for user-facing applications. The 4M free characters/month are useful for internal tools or pipeline testing where voice quality doesn't matter.
WaveNet voices ($0.016/1K chars): Google's first neural voice technology. Natural-sounding, broad language support. 1M free characters/month. The right choice for most production applications.
Neural2 voices ($0.016/1K chars): Google's newer neural technology, trained with a larger dataset. Slightly more natural than WaveNet, particularly for longer texts. Same price as WaveNet. Newer voices are Neural2 by default.
Studio voices ($0.160/1K chars, only 100 free chars/month): Premium quality, specifically designed for long-form content like audiobooks and podcasts. 10× more expensive than WaveNet/Neural2. 100 free characters is barely a sentence, so this is essentially not a real free tier. At $160/1M chars, Studio is cost-prohibitive for high-volume use cases.
For production use: choose Neural2 (or WaveNet). Studio is for specialized high-quality audio where cost isn't the primary concern.
Real-World Monthly Costs (WaveNet/Neural2)
These calculations account for the 1M free character tier:
| Monthly Usage | Free Tier Applied | Billable Chars | Monthly Cost |
|---|---|---|---|
| 500K chars | Covered by free | 0 | $0 |
| 1M chars | Covered by free | 0 | $0 |
| 1.5M chars | 1M free | 500K billable | $8.00 |
| 2M chars | 1M free | 1M billable | $16.00 |
| 5M chars | 1M free | 4M billable | $64.00 |
| 10M chars | 1M free | 9M billable | $144.00 |
| 50M chars | 1M free | 49M billable | $784.00 |
Note that the free tier is worth a flat $16/month once usage passes 1M characters, so its share of the bill shrinks as volume grows. At 2M chars it covers half of the $32 list-price bill; at 50M chars it saves $16 on $800, about 2%.
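To make the shrinking-share point concrete, the sketch below computes what fraction of the list-price bill the 1M-character allowance covers. It uses only the WaveNet/Neural2 figures from the table and assumes a single voice type per month; `free_tier_share` is an illustrative name.

```python
FREE_CHARS = 1_000_000   # WaveNet/Neural2 monthly allowance, from the table above
PRICE_PER_1K = 0.016     # USD per 1,000 characters


def free_tier_share(chars: int) -> float:
    """Fraction of the list-price bill (cost if nothing were free) covered by the allowance."""
    gross = chars / 1000 * PRICE_PER_1K
    saved = min(chars, FREE_CHARS) / 1000 * PRICE_PER_1K
    return saved / gross if gross else 0.0


for chars in (2_000_000, 10_000_000, 50_000_000):
    print(f"{chars:>11,} chars: free tier covers {free_tier_share(chars):.0%} of the bill")
```

The loop prints 50%, 10%, and 2% for the three volumes, matching the figures in the paragraph above.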
The Setup Process
Getting Google Cloud TTS working is a multi-step process:
- Create a Google Cloud account
- Create a GCP project
- Enable the Cloud Text-to-Speech API (in the API library)
- Set up billing (enter credit card)
- Create a service account with the appropriate IAM role
- Download the JSON key file
- Set the `GOOGLE_APPLICATION_CREDENTIALS` environment variable to the path of the JSON key file
- Install the SDK: `pip install google-cloud-texttospeech`
- Write code using the SDK
This typically takes 30-60 minutes for a developer unfamiliar with GCP. Here's what the code looks like:
```python
from google.cloud import texttospeech

# Requires the GOOGLE_APPLICATION_CREDENTIALS environment variable to be set
client = texttospeech.TextToSpeechClient()

synthesis_input = texttospeech.SynthesisInput(text="Your text here")

voice = texttospeech.VoiceSelectionParams(
    language_code="en-US",
    name="en-US-Neural2-C",  # Neural2 female voice
    ssml_gender=texttospeech.SsmlVoiceGender.FEMALE,
)

audio_config = texttospeech.AudioConfig(
    audio_encoding=texttospeech.AudioEncoding.MP3
)

response = client.synthesize_speech(
    input=synthesis_input,
    voice=voice,
    audio_config=audio_config,
)

# response.audio_content holds the encoded MP3 bytes
with open("output.mp3", "wb") as f:
    f.write(response.audio_content)
```

Compare to Speeko:
```bash
curl -X POST https://api.speekoapp.com/v1/tts \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Your text here", "voice": "af_sarah", "format": "mp3"}' \
  --output output.mp3
```

One is nine steps, including an SDK install. The other is one command. If you're not already on GCP, the operational cost of Google Cloud TTS setup is material.
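If you do go the GCP route, the service account and environment variable steps are the easiest to get subtly wrong. A small preflight check can fail fast with a readable message before the SDK is ever called. This is a minimal sketch assuming the key-file approach described in the setup list; `check_gcp_credentials` is a hypothetical helper, and it only inspects the local file, so it cannot tell whether the API is enabled, billing is attached, or the account has the right IAM role.

```python
import json
import os
from collections.abc import Mapping


def check_gcp_credentials(env: Mapping[str, str] = os.environ) -> str:
    """Fail fast if GOOGLE_APPLICATION_CREDENTIALS does not point at a readable
    service account key; return the account email on success."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise RuntimeError(f"Key file not found: {path}")
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    # Service account key files downloaded from GCP declare "type": "service_account".
    if key.get("type") != "service_account":
        raise RuntimeError(f"{path} is not a service account key (type={key.get('type')!r})")
    return key.get("client_email", "<no client_email in key file>")
```

Call `check_gcp_credentials()` once at startup; the returned email is a quick way to confirm which service account the SDK will pick up.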
Voice Selection on Google Cloud TTS
Google Cloud TTS has a large voice catalog — 380+ voices across 50+ languages. Finding the right voice requires:
- Browsing the Google Cloud TTS voice list
- Using the demo interface or making test API calls
- Understanding the naming convention: `{language}-{type}-{letter}`, e.g. `en-US-Neural2-C`
The voice selection process is more involved than simpler APIs. For teams that need broad multilingual coverage, this catalog depth is valuable. For teams that need 2-4 English voices, it's unnecessary complexity.
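Because the type segment of a voice name determines which pricing tier you are billed under, it can be worth checking names programmatically. The sketch below splits a name that follows the `{language}-{type}-{letter}` convention and looks up the rate from the pricing table. The helper names are hypothetical, the example names are illustrative, and voice families whose names don't follow this three-part pattern would be mis-parsed. To browse the real catalog, the Python SDK's `client.list_voices(language_code="en-US")` returns the available voices.

```python
from typing import NamedTuple, Optional

# Per-1K-character prices from the pricing table above, keyed by lowercase type.
# Google's voice names write the type as e.g. "Wavenet", so lookups are case-insensitive.
PRICE_PER_1K = {"standard": 0.004, "wavenet": 0.016, "neural2": 0.016, "studio": 0.160}


class VoiceName(NamedTuple):
    language: str    # e.g. "en-US"
    voice_type: str  # e.g. "Neural2"
    letter: str      # e.g. "C"


def parse_voice_name(name: str) -> VoiceName:
    """Split a {language}-{type}-{letter} name; raises ValueError if it has too few parts."""
    language, voice_type, letter = name.rsplit("-", 2)
    return VoiceName(language, voice_type, letter)


def price_per_1k(name: str) -> Optional[float]:
    """Billing rate implied by the voice name, or None for types not in the table."""
    return PRICE_PER_1K.get(parse_voice_name(name).voice_type.lower())
```

For example, `parse_voice_name("en-US-Neural2-C")` yields language `en-US`, type `Neural2`, letter `C`, and `price_per_1k` maps that to $0.016.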
Speeko's four voices are curated for the most common content automation use cases:
- `am_michael`: US English male
- `af_sarah`: US English female
- `bm_george`: British English male
- `bf_emma`: British English female
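With only four voices, a fixed lookup table is enough to catch a mistyped `voice` value before a request is sent. A minimal sketch; `SPEEKO_VOICES` and `pick_voice` are hypothetical helper names, and the accent and gender labels simply restate the list above rather than anything returned by the Speeko API.

```python
# The four Speeko voice IDs listed above, with the accent/gender they are described as.
# Hypothetical helper table; not fetched from the Speeko API.
SPEEKO_VOICES = {
    "am_michael": ("US", "male"),
    "af_sarah": ("US", "female"),
    "bm_george": ("British", "male"),
    "bf_emma": ("British", "female"),
}


def pick_voice(accent: str, gender: str) -> str:
    """Return the voice ID for an accent/gender pair, failing early on unsupported combinations."""
    for voice_id, (voice_accent, voice_gender) in SPEEKO_VOICES.items():
        if (voice_accent, voice_gender) == (accent, gender):
            return voice_id
    raise ValueError(f"No Speeko voice for accent={accent!r}, gender={gender!r}")
```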
Google Cloud TTS vs Speeko: Direct Comparison
| Feature | Google Cloud TTS | Speeko |
|---|---|---|
| Price (neural voices) | $0.016/1K chars | $0.030/1K chars |
| Free tier | 1M Neural2 chars/month (billing account required) | $5 one-time credit |
| Video support | No | $0.045/sec |
| Credits expire? | Monthly reset (unused lost) | Never |
| Billing account to start | Required | Not required |
| Setup complexity | High (9+ steps) | Low (signup + API key) |
| SSML support | Full | Limited |
| Voice catalog | 380+ voices, 50+ languages | 4 voices, English |
| Language support | Broad multilingual | English |
Google Cloud TTS is cheaper per character and offers broader language/voice coverage. Speeko is simpler to integrate, requires no billing account to evaluate, and includes video narration support.
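The table compares unit prices; what you actually pay also depends on volume and how long you use the service. The sketch below accumulates list-price costs for constant monthly usage using only the table's figures: Google's 1M Neural2 characters free each month versus Speeko's one-time $5 credit at $0.030/1K. It ignores taxes and Speeko's video pricing, and `cumulative_cost` is an illustrative name.

```python
GOOGLE_FREE_CHARS = 1_000_000  # Neural2 allowance, resets monthly
GOOGLE_PER_1K = 0.016          # USD per 1,000 characters after the allowance
SPEEKO_PER_1K = 0.030          # USD per 1,000 characters
SPEEKO_CREDIT = 5.00           # one-time credit, never expires


def cumulative_cost(monthly_chars: int, months: int) -> tuple[float, float]:
    """(Google, Speeko) total USD after `months` of constant usage at list prices."""
    google = months * max(0, monthly_chars - GOOGLE_FREE_CHARS) / 1000 * GOOGLE_PER_1K
    speeko = max(0.0, months * monthly_chars / 1000 * SPEEKO_PER_1K - SPEEKO_CREDIT)
    return round(google, 2), round(speeko, 2)


for chars in (100_000, 1_000_000, 5_000_000):
    g, s = cumulative_cost(chars, months=12)
    print(f"{chars:>9,} chars/month, 12 months: Google ${g:,.2f} vs Speeko ${s:,.2f}")
```

Under these assumptions Google Cloud TTS costs less at every steady volume; a one-off evaluation under roughly 167K characters costs nothing on either service, and the difference is whether a billing account is needed to get there.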
Migrating from Google Cloud TTS to Speeko
If you're currently on Google Cloud TTS and want simpler integration:
```python
# Before (Google Cloud TTS)
from google.cloud import texttospeech


def google_synthesize(text: str, output_path: str):
    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(text=text),
        voice=texttospeech.VoiceSelectionParams(
            language_code="en-US",
            name="en-US-Neural2-C",
        ),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    with open(output_path, "wb") as f:
        f.write(response.audio_content)


# After (Speeko)
import os

import requests


def speeko_synthesize(text: str, output_path: str):
    response = requests.post(
        "https://api.speekoapp.com/v1/tts",
        headers={"Authorization": f"Bearer {os.environ['SPEEKO_API_KEY']}"},
        json={"text": text, "voice": "af_sarah", "format": "mp3"},
        timeout=60,
    )
    # Raise on 4xx/5xx instead of writing an error body to the output file
    response.raise_for_status()
    with open(output_path, "wb") as f:
        f.write(response.content)
```

Migration removes the SDK dependency, the service account file, and the GCP project dependency. The function interface can be kept identical for a drop-in replacement.
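Because both functions take `(text, output_path)`, the choice of backend can sit behind a single configuration value, which lets you cut over gradually and roll back without changing call sites. A minimal sketch; `TTS_PROVIDER` and `select_synthesizer` are hypothetical names, not part of either API.

```python
import os
from typing import Callable, Dict, Mapping

# Both migration functions share this shape: (text, output_path) -> None.
Synthesizer = Callable[[str, str], None]


def select_synthesizer(
    backends: Dict[str, Synthesizer], env: Mapping[str, str] = os.environ
) -> Synthesizer:
    """Return the backend named by the hypothetical TTS_PROVIDER variable (default "speeko")."""
    provider = env.get("TTS_PROVIDER", "speeko")
    try:
        return backends[provider]
    except KeyError:
        raise ValueError(
            f"Unknown TTS_PROVIDER {provider!r}; expected one of {sorted(backends)}"
        ) from None
```

Wire it up once, e.g. `synthesize = select_synthesizer({"google": google_synthesize, "speeko": speeko_synthesize})`, then call `synthesize(text, output_path)` everywhere; setting `TTS_PROVIDER=google` switches back without a code change.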
When Google Cloud TTS Makes Sense
- You're already running on GCP and want unified billing and monitoring
- Your volume is under 1M Neural2 chars/month and the free tier covers it indefinitely
- You need broad multilingual support (50+ languages) rather than English only
- You need full SSML support including advanced prosody control
- Cost at high volume is the primary concern and you can absorb the setup complexity
When to Choose Speeko Instead
- You want to evaluate TTS without entering a credit card
- You need video narration alongside TTS in a single API
- Your stack isn't on GCP and you don't want cross-cloud dependencies
- You need English-only voices and don't need 380+ options
- Your usage is variable and you want no-overhead billing (no monthly commitment, no budget alerts needed)
Getting Started
Try Speeko free: $5 credit, no credit card, API key in under 2 minutes. At $0.030/1K chars, the $5 covers about 167,000 characters, enough for a thorough quality evaluation before you decide.