Text to Speech for News Apps: Auto-Generate Audio Articles on Publish
Reuters added audio to its articles in 2023. By 2025, audio article completion rates were running 2.3× higher than text-read completion rates across major news sites. The pattern is consistent: people who start listening finish. People who start reading often don't.
Adding TTS to a news or media app isn't complicated. Here's how to do it properly — auto-generation on publish, CDN caching, and a cost breakdown for real publishing volumes.
The Architecture
The goal: every article gets an audio version the moment it publishes. No manual steps, no delay.
Article publishes
↓
CMS webhook fires
↓
TTS service generates MP3
↓
Upload to CDN (S3/CloudFront, GCS, etc.)
↓
Store audio URL in article metadata
↓
Frontend serves audio player

The TTS service is the only moving part you're building. The rest is infrastructure you already have.
Webhook Handler
When your CMS publishes an article, it fires a webhook. Your handler catches it and kicks off audio generation:
import os

import boto3
import requests
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()
s3 = boto3.client("s3")
BUCKET = "your-audio-bucket"
SPEEKO_KEY = os.environ["SPEEKO_API_KEY"]


@app.post("/webhook/article-published")
async def handle_publish(payload: dict, background: BackgroundTasks):
    article_id = payload["article_id"]
    text = payload["body_text"]  # plain text, no HTML
    background.add_task(generate_and_store, article_id, text)
    return {"status": "queued"}


# Plain def, not async def: requests and boto3 block, and FastAPI runs
# synchronous background tasks in a thread pool instead of on the event loop.
def generate_and_store(article_id: str, text: str):
    response = requests.post(
        "https://api.speekoapp.com/v1/tts",
        headers={"X-API-Key": SPEEKO_KEY, "Content-Type": "application/json"},
        json={"text": text, "voice": "en-US-neural-1", "format": "mp3"},
        timeout=60,  # arbitrary ceiling so a stalled request can't hang forever
    )
    if response.status_code != 200:
        # log and alert; don't silently fail
        print(f"TTS failed for {article_id}: {response.status_code}")
        return
    key = f"articles/{article_id}/audio.mp3"
    s3.put_object(
        Bucket=BUCKET,
        Key=key,
        Body=response.content,
        ContentType="audio/mpeg",
        CacheControl="public, max-age=31536000",  # 1 year; content doesn't change
    )
    # update article record with audio URL
    # (update_article_audio_url is your own persistence helper, not shown here)
    audio_url = f"https://cdn.example.com/{key}"
    update_article_audio_url(article_id, audio_url)

Run this as a background task so the webhook returns immediately; your CMS doesn't need to wait for TTS generation.
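The handler above logs a failed TTS call and gives up. If transient failures are common, one option is to retry with exponential backoff before alerting. Here is a minimal sketch; `call_with_retries`, its defaults, and the injectable `sleep` parameter are illustrative names, not part of any Speeko or FastAPI API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds, waiting base_delay * 2**n between tries."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception as exc:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            delay = base_delay * (2 ** attempt)
            print(f"attempt {attempt + 1} failed ({exc}); retrying in {delay}s")
            sleep(delay)
    raise ValueError("attempts must be at least 1")
```

To use it with the handler, the wrapped function would make the TTS request and call `response.raise_for_status()` so that a non-200 response counts as a failure. Whether a given error is worth retrying depends on the API, so treat the catch-all `except` as a placeholder.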
Stripping HTML Before Sending
Send plain text to the TTS API, not HTML. Sending raw article HTML will result in the voice reading out <p>, <strong>, and every anchor tag. Strip it first:
from bs4 import BeautifulSoup


def extract_text(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    # Remove elements that shouldn't be read aloud
    for tag in soup.find_all(["figure", "aside", "nav", "footer"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)

Also strip:
- Pull quotes (they duplicate content already in the article body)
- Author bylines and datelines
- "Read more" links
What you're sending to TTS should be exactly what you'd read aloud to someone. Nothing more.
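The items in the list above can be partly handled after extraction, if you keep the article as a list of paragraph strings rather than one joined string. The sketch below is a rough heuristic, not a complete solution. The `BYLINE` and `READ_MORE` patterns are assumptions about house style, `clean_paragraphs` is a hypothetical helper, and datelines are not handled because their format varies too much between outlets.

```python
import re

# Hypothetical patterns; tune them to your own house style.
BYLINE = re.compile(r"^By\s+[A-Z][\w.'-]*(\s+[A-Z][\w.'-]*)*$")
READ_MORE = re.compile(r"^(Read more|Related)\b", re.IGNORECASE)


def clean_paragraphs(paragraphs: list[str]) -> str:
    """Drop bylines, 'Read more' lines, and pull quotes, then join the rest."""
    kept = []
    for i, para in enumerate(paragraphs):
        text = para.strip()
        if not text or BYLINE.match(text) or READ_MORE.match(text):
            continue
        # A pull quote repeats text that already appears inside a longer paragraph.
        unquoted = text.strip("\"“” ")
        is_pull_quote = bool(unquoted) and any(
            unquoted in other and len(other) > len(text)
            for j, other in enumerate(paragraphs)
            if j != i
        )
        if is_pull_quote:
            continue
        kept.append(text)
    return " ".join(kept)
```

If your CMS marks these elements with dedicated tags or classes, removing them in `extract_text` alongside `figure` and `aside` is more reliable than pattern matching on text.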
CDN Configuration
Once the MP3 is on S3/CloudFront, a few cache settings matter:
Cache-Control: public, max-age=31536000
Content-Type: audio/mpeg

A one-year TTL is appropriate because audio for a published article won't change. If you do update an article significantly, generate a new audio file and update the URL. Don't try to invalidate the CDN cache for audio; it's not worth the complexity.
For a news site with 1,000 article reads per day across 500 audio-enabled articles, serving from a CDN costs next to nothing. The TTS API cost is one-time per article.
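One way to get "new file, new URL" without touching the CDN is to put a short content hash in the object key, so regenerated audio never collides with a cached copy. This is a sketch under assumptions: the `audio-<hash>.mp3` naming and the `versioned_audio_key` helper are illustrative, not something the pipeline above requires.

```python
import hashlib


def versioned_audio_key(article_id: str, text: str) -> str:
    """Build an object key that changes whenever the spoken text changes."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return f"articles/{article_id}/audio-{digest}.mp3"
```

With keys like this, the one-year `Cache-Control` header stays safe: the old URL keeps serving the old audio until the article record points at the new key.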
Cost at Publishing Scale
Speeko charges $0.03 per 1,000 characters. An average news article runs 800–1,200 words, roughly 5,000–7,500 characters.
At $0.03/1K:
- 1 article: ~$0.18
- 100 articles/day: ~$18/day, ~$540/month
- 20 articles/day (mid-size outlet): ~$108/month
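Compare: ReadSpeaker's enterprise licensing for newsrooms starts at $800–$2,000/month for similar volumes. At roughly $0.18 per article, pay-per-use stays below the $800 end of that range up to about 148 articles per day, and below the $2,000 end up to about 370 articles per day.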
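To run the same arithmetic for your own volumes, a small calculator is enough. It uses the $0.03 per 1,000 characters rate quoted above. The 6,000-character default and the 30-day month are assumptions chosen to match the figures in this section, and the function names are illustrative.

```python
PRICE_PER_1K_CHARS = 0.03  # USD, the rate quoted in this section


def article_cost(chars: int) -> float:
    """One-time TTS cost for a single article."""
    return chars / 1000 * PRICE_PER_1K_CHARS


def monthly_cost(articles_per_day: float, chars_per_article: int = 6000,
                 days: int = 30) -> float:
    """Monthly TTS spend at a steady publishing rate."""
    return articles_per_day * article_cost(chars_per_article) * days


def break_even_articles_per_day(flat_monthly_fee: float,
                                chars_per_article: int = 6000,
                                days: int = 30) -> float:
    """Daily volume at which pay-per-use matches a flat monthly fee."""
    return flat_monthly_fee / (article_cost(chars_per_article) * days)
```

For example, `monthly_cost(20)` reproduces the mid-size outlet figure, and `break_even_articles_per_day(800)` gives the volume at which a flat $800/month plan starts to look competitive.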
The Audio Player
Keep the player simple. An HTML5 <audio> element with minimal controls is enough:
<div class="article-audio" role="region" aria-label="Listen to this article">
<audio controls preload="metadata">
<source src="{{ article.audio_url }}" type="audio/mpeg">
Your browser doesn't support audio playback.
</audio>
<p>🔊 Listen to this article — {{ estimated_duration }}</p>
</div>

Estimated duration: divide the character count by 14 (average characters per second at a normal TTS pace). A 6,000-character article comes to about 430 seconds, or roughly 7 minutes.
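The duration estimate can be computed when the audio URL is stored and passed to the template as `estimated_duration`. A minimal sketch, assuming the 14 characters per second rule of thumb above; the "N min" label format and the one-minute floor are arbitrary choices.

```python
CHARS_PER_SECOND = 14  # rule of thumb from the text above


def estimated_duration(char_count: int) -> str:
    """Return a short label such as '7 min' for the audio player."""
    seconds = round(char_count / CHARS_PER_SECOND)
    minutes = max(1, round(seconds / 60))  # never show "0 min"
    return f"{minutes} min"
```

If your TTS API returns the actual audio length, prefer that over the estimate.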
Don't autoplay. Ever. Users hate it, browsers block it on most platforms anyway, and it burns data for people on mobile who didn't ask for it.
Handling Regeneration
Articles get corrected. A factual error gets fixed, a headline changes. You need a way to regenerate audio when content updates significantly.
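Simple approach: store the text you sent to TTS (or at least its length) alongside the audio URL. A hash can only tell you whether the text changed at all, not by how much, so it isn't enough on its own. On each article update, compare the new text with the stored one and regenerate if the length changed by more than a threshold (10% here).
def should_regenerate(old_text: str, new_text: str, threshold: float = 0.1) -> bool:
    # Relative change in length: a cheap proxy for how much was rewritten
    longer = max(len(old_text), len(new_text))
    if longer == 0:
        return False  # both texts empty: nothing to regenerate
    diff = abs(len(new_text) - len(old_text))
    return (diff / longer) > threshold

This won't catch every significant change (a same-length rewrite or a one-word factual correction slips through), but it catches substantial additions and cuts. Typo fixes don't trigger regeneration. Good enough for most newsrooms.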
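The length comparison above misses rewrites that leave the article about the same size. If that matters, a content-based comparison using the standard library's `difflib` is one alternative. This is a sketch, not a recommendation for every newsroom: it needs the full previous text stored, it compares word by word, and the function names are illustrative.

```python
from difflib import SequenceMatcher


def changed_fraction(old_text: str, new_text: str) -> float:
    """Fraction of the text that differs, from 0.0 (identical) to 1.0."""
    old_words, new_words = old_text.split(), new_text.split()
    # autojunk=False: the default heuristic discounts very common tokens in
    # long sequences, which would distort the ratio for article-length text.
    matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)
    return 1.0 - matcher.ratio()


def should_regenerate_by_content(old_text: str, new_text: str,
                                 threshold: float = 0.1) -> bool:
    """Regenerate when more than `threshold` of the words changed."""
    return changed_fraction(old_text, new_text) > threshold
```

A single corrected word in a long article still stays under the 10% threshold here, so corrections that must always be re-voiced need an explicit editorial flag rather than a similarity score.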
Getting Started
Speeko's free $5 credit covers roughly 167,000 characters — about 25–30 full articles. Enough to test the full pipeline before committing to production.
For the async webhook handler in production, see the TTS webhook and async callbacks guide for error handling, retries, and monitoring patterns.