Text to Speech for News Apps: Auto-Generate Audio Articles on Publish

Reuters added audio to its articles in 2023. By 2025, audio article completion rates were running 2.3× higher than text-read completion rates across major news sites. The pattern is consistent: people who start listening finish. People who start reading often don't.

Adding TTS to a news or media app isn't complicated. Here's how to do it properly — auto-generation on publish, CDN caching, and a cost breakdown for real publishing volumes.

The Architecture

The goal: every article gets an audio version the moment it publishes. No manual steps, no delay.

Article publishes
       ↓
CMS webhook fires
       ↓
TTS service generates MP3
       ↓
Upload to CDN (S3/CloudFront, GCS, etc.)
       ↓
Store audio URL in article metadata
       ↓
Frontend serves audio player

The TTS service is the only moving part you're building. The rest is infrastructure you already have.

Webhook Handler

When your CMS publishes an article, it fires a webhook. Your handler catches it and kicks off audio generation:

import os

from fastapi import FastAPI, BackgroundTasks
import requests
import boto3

app = FastAPI()
s3 = boto3.client('s3')
BUCKET = "your-audio-bucket"
SPEEKO_KEY = os.environ["SPEEKO_API_KEY"]

@app.post("/webhook/article-published")
async def handle_publish(payload: dict, background: BackgroundTasks):
    article_id = payload["article_id"]
    text = payload["body_text"]  # plain text, no HTML
    background.add_task(generate_and_store, article_id, text)
    return {"status": "queued"}

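# Plain def rather than async def: FastAPI runs sync background tasks in a
# threadpool, so the blocking requests call doesn't stall the event loop.
def generate_and_store(article_id: str, text: str):
    response = requests.post(
        "https://api.speekoapp.com/v1/tts",
        headers={"X-API-Key": SPEEKO_KEY, "Content-Type": "application/json"},
        json={"text": text, "voice": "en-US-neural-1", "format": "mp3"},
        timeout=60,  # don't hang forever if the TTS service stalls
    )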

    if response.status_code != 200:
        # log and alert — don't silently fail
        print(f"TTS failed for {article_id}: {response.status_code}")
        return

    key = f"articles/{article_id}/audio.mp3"
    s3.put_object(
        Bucket=BUCKET,
        Key=key,
        Body=response.content,
        ContentType="audio/mpeg",
        CacheControl="public, max-age=31536000"  # 1 year — content doesn't change
    )

    # update article record with audio URL
    # (update_article_audio_url is your own CMS/database call, not shown)
    audio_url = f"https://cdn.example.com/{key}"
    update_article_audio_url(article_id, audio_url)

Run this as a background task so the webhook returns immediately — your CMS doesn't need to wait for TTS generation.
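
The handler above gives up after a single failed TTS call. One way to make it more forgiving is to retry transient failures with exponential backoff before alerting. The sketch below is a generic helper, not part of any Speeko SDK; `with_retries`, its parameters, and the injectable `sleep` are names invented for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def with_retries(
    call: Callable[[], Optional[T]],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Run `call` until it returns a non-None result or attempts run out.

    `call` should return None (or raise) on failure. The wait between
    attempts doubles each time: base_delay, 2x, 4x, ...
    """
    for attempt in range(attempts):
        try:
            result = call()
            if result is not None:
                return result
        except Exception as exc:  # assumption: narrow this to network errors in real code
            print(f"attempt {attempt + 1} failed: {exc}")
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return None
```

Inside `generate_and_store`, the `requests.post` call could be wrapped in a small function that returns the response on HTTP 200 and `None` otherwise, then passed to `with_retries`. If it still returns `None`, log and alert as before.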

Stripping HTML Before Sending

Send plain text to the TTS API, not HTML. Sending raw article HTML will result in the voice reading out <p>, <strong>, and every anchor tag. Strip it first:

from bs4 import BeautifulSoup

def extract_text(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    # Remove elements that shouldn't be read aloud
    for tag in soup.find_all(["figure", "aside", "nav", "footer"]):
        tag.decompose()
    return soup.get_text(separator=" ", strip=True)

Also strip:

Pull quotes (they duplicate content already in the article body)
Author bylines and datelines
"Read more" links

What you're sending to TTS should be exactly what you'd read aloud to someone. Nothing more.
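
Pull quotes, bylines, and "Read more" links are usually marked by CSS classes rather than by tag name, so filtering them means matching on class. Here is a minimal, dependency-free sketch of that idea using the standard library's `html.parser` instead of BeautifulSoup. The class names in `SKIP_CLASSES` are hypothetical; substitute whatever your CMS templates actually emit. It assumes skipped elements have explicit closing tags.

```python
from html.parser import HTMLParser

# Hypothetical class names: replace with the ones your templates use.
SKIP_CLASSES = {"pull-quote", "byline", "dateline", "read-more"}
SKIP_TAGS = {"figure", "aside", "nav", "footer", "script", "style"}
# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

class SpokenTextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside an element we don't read aloud

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self.skip_depth or tag in SKIP_TAGS or classes & SKIP_CLASSES:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.parts.append(data.strip())

def extract_spoken_text(html: str) -> str:
    parser = SpokenTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

The same filtering can be done in the BeautifulSoup version by decomposing elements that match those classes before calling `get_text`.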

CDN Configuration

Once the MP3 is on S3/CloudFront, a few cache settings matter:

Cache-Control: public, max-age=31536000
Content-Type: audio/mpeg

A one-year TTL is appropriate because audio for a published article won't change. If you do update an article significantly, generate a new audio file under a new key and update the URL. Don't try to invalidate the CDN cache for audio; it's not worth the complexity.
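
One way to get a fresh URL on every regeneration is to put a short content hash in the object key, so changed text always lands at a new path and the year-long cache entry for the old file is simply never requested again. This is a sketch under that assumption; `audio_key` is a hypothetical helper, and the path layout mirrors the `articles/{article_id}/...` keys used earlier.

```python
import hashlib

def audio_key(article_id: str, text: str) -> str:
    """Build an S3 key that changes whenever the narrated text changes."""
    # 12 hex characters of SHA-256 is plenty to tell apart revisions of one article.
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return f"articles/{article_id}/audio-{digest}.mp3"
```

Using this in place of a fixed `audio.mp3` key means the stored audio URL changes exactly when the text sent to TTS changes.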

For a news site with 1,000 article reads per day across 500 audio-enabled articles, serving from the CDN costs next to nothing. The TTS API cost is one-time per article.

Cost at Publishing Scale

Speeko charges $0.03 per 1,000 characters. An average news article runs 800–1,200 words, roughly 5,000–7,500 characters.

At $0.03/1K, taking ~6,000 characters as a typical article:

1 article: ~$0.18
20 articles/day (mid-size outlet): ~$3.60/day, ~$108/month
100 articles/day: ~$18/day, ~$540/month
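
To plug in your own volumes, the arithmetic above can be sketched as a small function. It assumes the $0.03 per 1,000 characters rate quoted above, a 30-day month, and a 6,000-character article by default; `monthly_tts_cost` is a name made up for this example.

```python
PRICE_PER_1K_CHARS = 0.03  # USD, the rate quoted above

def monthly_tts_cost(articles_per_day: int,
                     chars_per_article: int = 6000,
                     days: int = 30) -> float:
    """Estimated monthly TTS spend in USD, rounded to cents."""
    per_article = chars_per_article / 1000 * PRICE_PER_1K_CHARS
    return round(articles_per_day * days * per_article, 2)

print(monthly_tts_cost(20))   # 108.0
print(monthly_tts_cost(100))  # 540.0
```

Passing `chars_per_article=7500` gives the upper end of the range for longer pieces.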

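Compare: ReadSpeaker's enterprise licensing for newsrooms starts at $800–$2,000/month for similar volumes. At ~$0.18 per article, pay-per-use stays below that range until roughly 150 articles per day (about $800/month) and only passes $2,000/month at around 370 articles per day.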

The Audio Player

Keep the player simple. An HTML5 <audio> element with minimal controls is enough:

<div class="article-audio" role="region" aria-label="Listen to this article">
  <audio controls preload="metadata">
    <source src="{{ article.audio_url }}" type="audio/mpeg">
    Your browser doesn't support audio playback.
  </audio>
  <p>🔊 Listen to this article — {{ estimated_duration }}</p>
</div>

Estimated duration: divide character count by 14 (average characters per second at normal TTS pace). A 6,000-character article is about 7 minutes.
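
A minimal sketch of that estimate, using the 14 characters per second figure above. `estimated_duration` is a hypothetical helper whose output would fill the `estimated_duration` template variable; the real pace depends on the voice and speed settings you choose.

```python
CHARS_PER_SECOND = 14  # rough pace quoted above; tune for your chosen voice

def estimated_duration(text: str) -> str:
    """Return a short label such as '7 min' for the audio player."""
    seconds = len(text) / CHARS_PER_SECOND
    minutes = max(1, round(seconds / 60))  # never show "0 min"
    return f"{minutes} min"
```

Compute this once at generation time and store it with the audio URL, so the frontend doesn't need the full text to render the label.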

Don't autoplay. Ever. Users hate it, browsers block it on most platforms anyway, and it burns data for people on mobile who didn't ask for it.

Handling Regeneration

Articles get corrected. A factual error gets fixed, a headline changes. You need a way to regenerate audio when content updates significantly.

Simple approach: store the text you sent to TTS alongside the audio URL. On each article update, compare the new text to the stored copy. If the length has changed by more than a threshold (10% in the example below), regenerate.

def should_regenerate(old_text: str, new_text: str, threshold: float = 0.1) -> bool:
    # Relative change in length, not a true character-level diff
    longer = max(len(old_text), len(new_text))
    if longer == 0:
        return False  # both empty: nothing to regenerate
    diff = abs(len(new_text) - len(old_text))
    return (diff / longer) > threshold

This won't catch every significant change (a correction that leaves the length nearly unchanged slips through), but it catches substantial rewrites and expansions. Typo fixes don't trigger regeneration. Good enough for most newsrooms.

Getting Started

Speeko's free $5 credit covers roughly 167,000 characters — about 25–30 full articles. Enough to test the full pipeline before committing to production.

For the async webhook handler in production, see the TTS webhook and async callbacks guide for error handling, retries, and monitoring patterns.
