Voice APIs for Web Applications: Browser Voice Integration, WebRTC, and JavaScript Voice Libraries
The modern web browser is a powerful voice platform. With Web Audio API, MediaRecorder, Speech Recognition, and WebRTC, you can build sophisticated voice features entirely in JavaScript—no native apps required. Add external voice services like Speeko for TTS synthesis, and you have a complete voice stack for web applications.
This guide covers the full spectrum of voice APIs available to web developers, when to use each, and how to combine them effectively.
The Web Voice API Landscape
1. Web Speech API (Built-in Browser)
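Speech synthesis is available in all major browsers. Speech recognition is narrower: Chromium-based browsers and Safari expose it (usually under the webkit prefix), while Firefox does not enable it by default.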
TTS (Text-to-Speech):
// Browser native TTS - No API key required
const utterance = new SpeechSynthesisUtterance("Hello world");
utterance.rate = 1.0;
utterance.pitch = 1.0;
utterance.volume = 0.8;
window.speechSynthesis.speak(utterance);
Pros:
- Zero setup, no API calls
- Supported in all major browsers (Chrome, Edge, Safari, Firefox)
- Good for fallback scenarios
Cons:
- Limited voice quality (robotic)
- No way to export the synthesized audio to a file
- Limited language/voice selection
- Inconsistent voice quality across browsers
- Utterances are queued: a new one waits for the current one to finish unless you call speechSynthesis.cancel() first
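Because the available voices differ by browser and operating system, it can help to choose one explicitly instead of relying on the default. The following is a minimal sketch, not a definitive implementation: `pickVoice` is a hypothetical helper name, and it only relies on the `lang` and `default` properties that `SpeechSynthesisVoice` objects expose.

```javascript
// Hypothetical helper: choose the best available voice for a BCP 47 language tag.
// `voices` is the array returned by speechSynthesis.getVoices().
function pickVoice(voices, lang) {
  const wanted = lang.toLowerCase();
  const base = wanted.split('-')[0];
  // Some platforms report tags with underscores (en_US), so normalize first.
  const norm = (v) => v.lang.replace('_', '-').toLowerCase();

  const exact = voices.filter((v) => norm(v) === wanted);
  const sameLanguage = voices.filter((v) => norm(v).split('-')[0] === base);
  const pool = exact.length ? exact : sameLanguage;
  if (pool.length === 0) return null;
  // Prefer the platform default within the pool, otherwise take the first match.
  return pool.find((v) => v.default) || pool[0];
}
```

In a browser you would pass `window.speechSynthesis.getVoices()` as the first argument and assign the result to `utterance.voice`. Some browsers populate the voice list asynchronously, so an empty list may simply mean the `voiceschanged` event has not fired yet.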
Speech Recognition:
// Browser native speech-to-text
const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();
recognition.onresult = (event) => {
let transcript = "";
for (let i = event.resultIndex; i < event.results.length; i++) {
transcript += event.results[i][0].transcript;
}
console.log("You said:", transcript);
};
recognition.start();
Pros:
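- Free, no API key
- No extra library to download
- Good for accessibility
Cons:
- Accuracy varies
- Limited language support
- Not reliably offline: Chrome sends the audio to a server-side recognition service
- Partial (streaming) results are off by default; set interimResults = true to receive them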
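Partial results can be surfaced while the user is still speaking by enabling `interimResults` on the recognition object. The sketch below separates confirmed text from in-progress text; `collectTranscripts` is a hypothetical helper name, and the browser wiring is guarded so it is skipped where the API is missing.

```javascript
// Hypothetical helper: split a recognition result list into final and interim text.
// `results` mirrors event.results: each entry is array-like, holds alternatives,
// and carries an `isFinal` flag.
function collectTranscripts(results, startIndex = 0) {
  let finalText = '';
  let interimText = '';
  for (let i = startIndex; i < results.length; i++) {
    const best = results[i][0].transcript; // top-ranked alternative
    if (results[i].isFinal) finalText += best;
    else interimText += best;
  }
  return { finalText, interimText };
}

// Browser-only wiring, skipped where the API is missing.
const Recognition =
  typeof window !== 'undefined' &&
  (window.SpeechRecognition || window.webkitSpeechRecognition);
if (Recognition) {
  const recognition = new Recognition();
  recognition.interimResults = true; // deliver partial hypotheses while the user speaks
  recognition.continuous = true;     // keep listening after the first phrase
  recognition.onresult = (event) => {
    const { finalText, interimText } = collectTranscripts(event.results, event.resultIndex);
    console.log('final:', finalText, '| interim:', interimText);
  };
  recognition.onerror = (event) => console.error('Recognition error:', event.error);
  // Call recognition.start() from a click handler so the permission prompt follows a user action.
}
```

Showing interim text in a lighter style and replacing it once the final text arrives tends to make dictation feel more responsive.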
2. Web Audio API (Advanced Playback)
For professional audio playback, effects, and analysis:
// Web Audio API for advanced playback control
const audioContext = new (window.AudioContext || window.webkitAudioContext)();
async function playAudioWithEffects(audioUrl) {
const response = await fetch(audioUrl);
const arrayBuffer = await response.arrayBuffer();
const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
const source = audioContext.createBufferSource();
source.buffer = audioBuffer;
// Add effects chain
const gainNode = audioContext.createGain();
const biquadFilter = audioContext.createBiquadFilter();
source.connect(biquadFilter);
biquadFilter.connect(gainNode);
gainNode.connect(audioContext.destination);
// Fade in effect
gainNode.gain.setValueAtTime(0, audioContext.currentTime);
gainNode.gain.linearRampToValueAtTime(1, audioContext.currentTime + 0.5);
source.start(0);
return { source, gainNode, biquadFilter };
}
When to use:
- Audio visualization
- Real-time effects (equalization, reverb)
- Advanced audio analysis
- Game audio engines
- Music production web apps
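For the analysis and visualization cases, a common first step is a level meter. The sketch below computes an RMS level from time-domain samples and converts it to decibels; the function names (`rmsLevel`, `toDecibels`, `meter`) are illustrative, and `meter` assumes a browser with an `AnalyserNode` available.

```javascript
// Root-mean-square level of a block of time-domain samples in the range [-1, 1].
function rmsLevel(samples) {
  if (samples.length === 0) return 0;
  let sumOfSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumOfSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumOfSquares / samples.length);
}

// Convert a linear level to decibels relative to full scale (0 dBFS = amplitude 1).
function toDecibels(level) {
  return level > 0 ? 20 * Math.log10(level) : -Infinity;
}

// Hypothetical browser wiring: sample the level of any AudioNode once per frame.
function meter(audioContext, sourceNode, onLevel) {
  const analyser = audioContext.createAnalyser();
  analyser.fftSize = 2048;
  sourceNode.connect(analyser);
  const buffer = new Float32Array(analyser.fftSize);
  const tick = () => {
    analyser.getFloatTimeDomainData(buffer);
    onLevel(toDecibels(rmsLevel(buffer)));
    requestAnimationFrame(tick);
  };
  tick();
}
```

With the playback example above, `meter(audioContext, gainNode, updateBar)` would report the post-fade level, where `updateBar` is whatever callback draws your UI.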
3. MediaRecorder API (Recording)
Capture user voice directly in browser:
// Record user voice
let mediaRecorder;
const audioChunks = [];
navigator.mediaDevices.getUserMedia({ audio: true })
.then(stream => {
mediaRecorder = new MediaRecorder(stream);
mediaRecorder.ondataavailable = (event) => {
audioChunks.push(event.data);
};
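mediaRecorder.onstop = () => {
// Use the recorder's actual container type; not every browser records WebM
const audioBlob = new Blob(audioChunks, { type: mediaRecorder.mimeType });
audioChunks.length = 0; // reset so the next recording starts clean
// Release the microphone so the browser's recording indicator turns off
stream.getTracks().forEach(track => track.stop());
// Send to server
uploadAudio(audioBlob).catch(err => console.error("Upload failed:", err));
};
mediaRecorder.start();
})
.catch(err => console.error("Mic access denied:", err));
function stopRecording() {
// Guard against calls before the recorder exists or after it has stopped
if (mediaRecorder && mediaRecorder.state !== 'inactive') {
mediaRecorder.stop();
}
}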
async function uploadAudio(blob) {
const formData = new FormData();
formData.append('audio', blob);
const response = await fetch('/api/upload-voice', {
method: 'POST',
body: formData
});
return response.json();
}
Use cases:
- Voice messaging apps
- Transcription services
- Voice note-taking
- Voice feedback collection
- Quality assurance voice recording
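Recording formats differ between browsers, so it is worth probing what the recorder supports before constructing it. The sketch below uses `MediaRecorder.isTypeSupported`; the helper names and the candidate list are assumptions for illustration, not a required set.

```javascript
// Hypothetical helper: return the first container/codec the recorder supports.
// `isSupported` defaults to MediaRecorder.isTypeSupported and is injectable for testing.
function pickMimeType(
  candidates = ['audio/webm;codecs=opus', 'audio/webm', 'audio/mp4', 'audio/ogg;codecs=opus'],
  isSupported = (type) =>
    typeof MediaRecorder !== 'undefined' && MediaRecorder.isTypeSupported(type)
) {
  return candidates.find((type) => isSupported(type)) || '';
}

// Map a MIME type to a file extension for the upload's filename.
function extensionFor(mimeType) {
  const container = mimeType.split(';')[0].trim();
  const known = { 'audio/webm': 'webm', 'audio/mp4': 'm4a', 'audio/ogg': 'ogg' };
  return known[container] || 'bin';
}

// Browser usage: an empty string means "let the browser choose its default".
function createRecorder(stream) {
  const mimeType = pickMimeType();
  return mimeType ? new MediaRecorder(stream, { mimeType }) : new MediaRecorder(stream);
}
```

On the upload side, `formData.append('audio', blob, 'note.' + extensionFor(blob.type))` gives the server a filename that matches the actual container.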
4. WebRTC (Real-time Voice Communication)
For peer-to-peer or real-time voice:
// Simplified WebRTC example
class VoiceCall {
constructor() {
this.peerConnection = null;
this.localStream = null;
this.remoteStream = null;
}
async initiate(signalingServer) {
// Get local microphone
this.localStream = await navigator.mediaDevices.getUserMedia({ audio: true });
// Create peer connection
this.peerConnection = new RTCPeerConnection({
iceServers: [{ urls: ['stun:stun.l.google.com:19302'] }]
});
// Add local stream
this.localStream.getTracks().forEach(track => {
this.peerConnection.addTrack(track, this.localStream);
});
// Handle incoming stream
this.peerConnection.ontrack = (event) => {
this.remoteStream = event.streams[0];
// Play remote audio
const audioElement = new Audio();
audioElement.srcObject = this.remoteStream;
audioElement.play();
};
// Handle ICE candidates
this.peerConnection.onicecandidate = (event) => {
if (event.candidate) {
signalingServer.send({
type: 'ice-candidate',
candidate: event.candidate
});
}
};
// Create and send offer
const offer = await this.peerConnection.createOffer();
await this.peerConnection.setLocalDescription(offer);
signalingServer.send({ type: 'offer', offer });
}
}
When to use:
- Video conferencing
- Real-time voice chat
- Live collaboration tools
- P2P voice calls
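The example above covers only the calling side. On the answering side, ICE candidates can arrive over the signaling channel before the offer has been applied, and `addIceCandidate` fails when no remote description is set. One way to handle this, sketched below under the assumption that messages use the same shapes as the caller example, is to buffer candidates until the remote description is in place. `IceCandidateQueue` and `handleSignal` are hypothetical names.

```javascript
// Buffers ICE candidates until the remote description has been applied.
// `applyCandidate` is injected; in a browser it would be
// (c) => peerConnection.addIceCandidate(c).
class IceCandidateQueue {
  constructor(applyCandidate) {
    this.applyCandidate = applyCandidate;
    this.pending = [];
    this.ready = false;
  }

  // Call for every 'ice-candidate' message from the signaling channel.
  add(candidate) {
    if (this.ready) this.applyCandidate(candidate);
    else this.pending.push(candidate);
  }

  // Call right after setRemoteDescription() resolves.
  markRemoteDescriptionSet() {
    this.ready = true;
    for (const candidate of this.pending) this.applyCandidate(candidate);
    this.pending = [];
  }
}

// Hypothetical answering-side handler; the message shapes match the caller example
// ({ type: 'offer', offer } and { type: 'ice-candidate', candidate }).
async function handleSignal(peerConnection, queue, message, send) {
  if (message.type === 'offer') {
    await peerConnection.setRemoteDescription(message.offer);
    queue.markRemoteDescriptionSet();
    const answer = await peerConnection.createAnswer();
    await peerConnection.setLocalDescription(answer);
    send({ type: 'answer', answer });
  } else if (message.type === 'ice-candidate') {
    queue.add(message.candidate);
  }
}
```

The caller needs the mirror image: apply the `answer` message with `setRemoteDescription`, then flush its own queue.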
External Voice Services for Web
Browser native voice has limits. For production applications, use external APIs:
Speeko TTS API (Recommended)
Best for web applications needing professional voice synthesis:
// speeko-voice.js - Production-ready integration
export class SpeekoVoice {
constructor(apiKey) {
this.apiKey = apiKey;
this.baseUrl = 'https://api.speekoapp.com/api/v1';
this.cache = new Map(); // Browser cache
}
async synthesize(text, options = {}) {
const {
voiceId = 'alloy',
language = 'en',
speed = 1.0
} = options;
// Check the IndexedDB cache first; it stores the audio URL only
const cachedUrl = await this._getCached(text, voiceId);
if (cachedUrl) {
return { audioUrl: cachedUrl, duration: 0, cached: true };
}
try {
const response = await fetch(`${this.baseUrl}/tts`, {
method: 'POST',
headers: {
'X-API-Key': this.apiKey,
'Content-Type': 'application/json'
},
body: JSON.stringify({
text,
voice_id: voiceId,
language,
format: 'mp3'
})
});
if (!response.ok) {
throw new Error(`Synthesis failed: ${response.status}`);
}
const data = await response.json();
// Cache result
await this._cache(text, voiceId, data.audio_url);
return {
audioUrl: data.audio_url,
duration: data.duration || 0,
cached: false
};
} catch (error) {
console.error('Speeko synthesis error:', error);
// Fallback to browser TTS
return this._fallbackToBrowserTTS(text);
}
}
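async play(text, options = {}) {
const { audioUrl } = await this.synthesize(text, options);
// A null URL means the browser TTS fallback already handled the text
if (!audioUrl) return;
return new Promise((resolve, reject) => {
const audio = new Audio(audioUrl);
audio.onended = resolve;
audio.onerror = () => reject(new Error('Audio playback failed'));
// Reject instead of hanging forever if playback cannot start
audio.play().catch(reject);
});
}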
async _getCached(text, voiceId) {
try {
const db = await this._openDB();
const store = db.transaction('voice_cache', 'readonly').objectStore('voice_cache');
const value = await this._request(store.get(`${text}:${voiceId}`));
return value ?? null;
} catch {
return null;
}
}
async _cache(text, voiceId, audioUrl) {
try {
const db = await this._openDB();
const store = db.transaction('voice_cache', 'readwrite').objectStore('voice_cache');
await this._request(store.put(audioUrl, `${text}:${voiceId}`));
} catch (error) {
console.warn('Cache error:', error);
}
}
// Wrap an IDBRequest in a promise (raw IndexedDB is event-based)
_request(request) {
return new Promise((resolve, reject) => {
request.onsuccess = () => resolve(request.result);
request.onerror = () => reject(request.error);
});
}
_openDB() {
return new Promise((resolve, reject) => {
const request = indexedDB.open('SpeekoVoice', 1);
request.onerror = () => reject(request.error);
request.onsuccess = () => resolve(request.result);
// Object stores can only be created during a version upgrade
request.onupgradeneeded = () => {
const db = request.result;
if (!db.objectStoreNames.contains('voice_cache')) {
db.createObjectStore('voice_cache');
}
};
});
}
_fallbackToBrowserTTS(text) {
const utterance = new SpeechSynthesisUtterance(text);
window.speechSynthesis.speak(utterance);
return { audioUrl: null, duration: 0, cached: false };
}
async getAvailableVoices() {
try {
const response = await fetch(`${this.baseUrl}/voices`, {
headers: { 'X-API-Key': this.apiKey }
});
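if (!response.ok) {
throw new Error(`Voice list request failed: ${response.status}`);
}
// await here so a JSON parse failure is caught by the catch block below
return await response.json();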
} catch (error) {
console.error('Failed to fetch voices:', error);
return { voices: [] };
}
}
}
Comparison Table: Voice APIs for Web
| Service | Cost | Quality | Setup | Latency | Best For |
|---|---|---|---|---|---|
| Browser Web Speech | Free | Basic | None | 0-2s | Accessibility, fallback |
| Speeko | $0.03/1K chars | Excellent | API key | 300-500ms | Production TTS |
| Google Cloud TTS | $16/1M chars | Excellent | GCP account | 500ms-1s | Enterprise |
| ElevenLabs | $0.30/1K chars | Premium | API key | 1-3s | High-end voices |
| Amazon Polly | $0.02/1K chars | Good | AWS account | 500ms | AWS ecosystem |
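The table mixes per-1K and per-1M pricing, so a quick calculation makes the rows easier to compare. The sketch below normalizes the listed prices to USD per 1,000 characters and estimates a monthly bill; the prices are copied from the table as given, and the workload figures are made up for illustration.

```javascript
// List prices copied from the comparison table, normalized to USD per 1,000 characters.
const PRICE_PER_1K_CHARS = {
  speeko: 0.03,
  googleCloudTts: 16 / 1000, // table lists $16 per 1M characters
  elevenLabs: 0.30,
  amazonPolly: 0.02,
};

// Estimated monthly spend for a given number of synthesized characters.
function estimateMonthlyCost(service, charsPerMonth) {
  const rate = PRICE_PER_1K_CHARS[service];
  if (rate === undefined) throw new Error(`Unknown service: ${service}`);
  return (charsPerMonth / 1000) * rate;
}

// Example workload: 2,000 requests per day, 150 characters each, 30 days.
const charsPerMonth = 2000 * 150 * 30; // 9,000,000 characters
for (const service of Object.keys(PRICE_PER_1K_CHARS)) {
  console.log(service, `$${estimateMonthlyCost(service, charsPerMonth).toFixed(2)}`);
}
```

Repeated phrases served from a cache are not billed again, so the real figure depends heavily on how much of the text is unique.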
Real-World Implementation: Interactive Fitness App
Here's a complete example combining multiple APIs:
// fitness-app.js - Voice-guided workout
class VoiceWorkout {
constructor(speeko_api_key) {
this.voice = new SpeekoVoice(speeko_api_key);
this.audioContext = new AudioContext();
this.currentExerciseIndex = 0;
this.isPlaying = false;
}
async startWorkout(workout) {
this.workout = workout;
// Greeting
await this.voice.play("Let's get started!");
// Main loop
for (let i = 0; i < workout.exercises.length; i++) {
this.currentExerciseIndex = i;
await this.playExercise(workout.exercises[i]);
}
// Completion
await this.voice.play("Great job! You completed the workout.");
}
async playExercise(exercise) {
const { name, reps, duration } = exercise;
// Preparation
const prepText = `Next: ${name}. Get ready in 5 seconds.`;
await this.voice.play(prepText, { voiceId: 'alloy' });
// Countdown
for (let i = 5; i > 0; i--) {
await this.delay(1000);
await this.voice.play(i.toString(), { voiceId: 'echo' });
}
// Go!
await this.voice.play("Go!", { voiceId: 'nova' });
// During exercise: periodic encouragement
if (duration) {
// Await so the rest period starts only after the timed exercise ends
await this.playEncouragementDuring(duration * 1000);
} else if (reps) {
for (let i = 1; i <= reps; i++) {
await this.delay(3000);
const remaining = reps - i;
if (remaining > 0) {
await this.voice.play(`${remaining} to go!`);
}
}
}
// Rest
await this.voice.play("Rest for 30 seconds.");
await this.delay(30000);
}
async playEncouragementDuring(durationMs) {
const encouragements = [
"You got this!",
"Keep going!",
"Halfway there!",
"Almost done!"
];
const interval = durationMs / encouragements.length;
for (let i = 0; i < encouragements.length; i++) {
await this.delay(interval);
await this.voice.play(encouragements[i]);
}
}
delay(ms) {
return new Promise(resolve => setTimeout(resolve, ms));
}
}
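// React component (JSX); import.meta.env is how Vite exposes VITE_* variables
export function FitnessApp() {
const voice = new VoiceWorkout(import.meta.env.VITE_SPEEKO_KEY);
// startWorkout() reads workout.exercises, so wrap the list in an object
const workout = {
exercises: [
{ name: "Push-ups", reps: 10 },
{ name: "Squats", reps: 15 },
{ name: "Burpees", duration: 30 },
{ name: "Plank", duration: 60 }
]
};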
return (
<div className="workout-container">
<button onClick={() => voice.startWorkout(workout)}>
🎙️ Start Voice-Guided Workout
</button>
</div>
);
}
React Hook for Voice Integration
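// hooks/useVoice.js
import { useState, useRef, useEffect } from 'react';
// Assumes speeko-voice.js sits one directory up and exports the SpeekoVoice class
import { SpeekoVoice } from '../speeko-voice.js';
export function useVoice(apiKey) {
const voiceService = useRef(null);
const audioRef = useRef(null); // currently playing audio element, if any
const [isPlaying, setIsPlaying] = useState(false);
const [isSynthesizing, setIsSynthesizing] = useState(false);
const [error, setError] = useState(null);
useEffect(() => {
voiceService.current = new SpeekoVoice(apiKey);
}, [apiKey]);
const speak = async (text, options = {}) => {
try {
setError(null);
setIsSynthesizing(true);
const { audioUrl } = await voiceService.current.synthesize(text, options);
// A null URL means SpeekoVoice already fell back to browser TTS
if (!audioUrl) return;
setIsPlaying(true);
const audio = new Audio(audioUrl);
audioRef.current = audio;
audio.onended = () => setIsPlaying(false);
audio.onerror = () => setIsPlaying(false);
await audio.play();
} catch (err) {
// Reset the flag so the UI does not stay stuck on "Playing..."
setIsPlaying(false);
setError(err.message);
console.error('Voice error:', err);
} finally {
setIsSynthesizing(false);
}
};
const stop = () => {
// Pause and rewind the current audio, then clear the playing flag
if (audioRef.current) {
audioRef.current.pause();
audioRef.current.currentTime = 0;
}
// Also stop a browser TTS fallback that may be speaking
window.speechSynthesis?.cancel();
setIsPlaying(false);
};
return {
speak,
stop,
isPlaying,
isSynthesizing,
error
};
}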
// Usage
export function NewsReader() {
const { speak, isPlaying } = useVoice(import.meta.env.VITE_SPEEKO_KEY);
return (
<article>
<h1>Breaking News</h1>
<p>Major developments in tech industry...</p>
<button
onClick={() => speak("Breaking News: Major developments in tech industry...")}
disabled={isPlaying}
>
{isPlaying ? 'Playing...' : '🔊 Read Article'}
</button>
</article>
);
}
Performance Best Practices
- Lazy Load Speeko
// Only load voice API when needed
const loadVoiceService = async (apiKey) => {
const module = await import('./speeko-voice.js');
return new module.SpeekoVoice(apiKey);
};
- Implement Timeout
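// Method on SpeekoVoice
async synthesizeWithTimeout(text, timeoutMs = 5000) {
let timer;
const timeout = new Promise((_, reject) => {
timer = setTimeout(() => reject(new Error('Synthesis timeout')), timeoutMs);
});
try {
return await Promise.race([this.synthesize(text), timeout]);
} finally {
clearTimeout(timer); // avoid a stray timer after the race settles
}
}
- Cache Aggressively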
- Cache in memory (Map) for session
- Cache in IndexedDB for persistence
- Cache in Service Worker for offline
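For the in-memory layer, an unbounded Map grows for as long as the tab stays open. The sketch below shows one possible bounded variant, a small least-recently-used cache, plus a key builder that includes every option affecting the audio. `LruCache` and `cacheKey` are hypothetical names, and the default option values mirror the ones used by the `synthesize` method earlier in this guide.

```javascript
// Minimal least-recently-used cache built on Map, which iterates in insertion order.
class LruCache {
  constructor(maxEntries = 100) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert so this key becomes the most recently used.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      this.map.delete(this.map.keys().next().value);
    }
  }
}

// Hypothetical key builder: include every option that changes the audio.
function cacheKey(text, { voiceId = 'alloy', language = 'en', speed = 1.0 } = {}) {
  return JSON.stringify([text, voiceId, language, speed]);
}
```

Serializing the parts as a JSON array avoids collisions that a plain `text:voiceId` string can produce when the text itself contains a colon.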
- Batch Requests
async batchSynthesize(texts) {
return Promise.all(texts.map(text => this.synthesize(text)));
}
Accessibility Considerations
Always provide text alternatives:
// Good: Voice + visible captions
<div className="audio-player">
<button onClick={() => voice.speak(caption)}>
🔊 {isPlaying ? 'Playing...' : 'Listen'}
</button>
<p className="caption" role="status">
{caption}
</p>
</div>
Browser Support
Check support before using APIs:
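const browserSupport = {
audioContext: Boolean(window.AudioContext || window.webkitAudioContext),
speechSynthesis: 'speechSynthesis' in window,
speechRecognition: Boolean(window.SpeechRecognition || window.webkitSpeechRecognition),
mediaRecorder: 'MediaRecorder' in window,
microphone: Boolean(navigator.mediaDevices?.getUserMedia),
webrtc: 'RTCPeerConnection' in window,
indexedDB: 'indexedDB' in window
};
console.log('Browser voice capabilities:', browserSupport);
Conclusion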
Modern web applications have powerful voice capabilities:
- Browser APIs for accessibility and fallback
- Speeko TTS for professional, reliable voice synthesis
- Web Audio API for advanced playback and effects
- WebRTC for real-time communication
- MediaRecorder for voice capture
Combine these intelligently to create engaging, accessible voice experiences entirely in the browser—without native apps.
Build voice-powered web apps today.
Speeko's JavaScript integration is simple, fast, and production-ready. Start with $10 in free credits.