Voice APIs for Web Applications: Browser Voice Integration, WebRTC, and JavaScript Voice Libraries

Posted on May 2, 2026
By Speeko Team
Tags: web-development, voice-api, javascript, webrtc, text-to-speech, speech-recognition

The modern web browser is a powerful voice platform. With the Web Speech API, Web Audio API, MediaRecorder, and WebRTC, you can build sophisticated voice features entirely in JavaScript, with no native apps required. Add an external voice service such as Speeko for TTS synthesis, and you have a complete voice stack for web applications.

This guide covers the full spectrum of voice APIs available to web developers, when to use each, and how to combine them effectively.

The Web Voice API Landscape

1. Web Speech API (Built-in Browser)

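The Web Speech API has two halves. Speech synthesis (text-to-speech) is available in all major browsers. Speech recognition (speech-to-text) is available mainly in Chromium-based browsers and Safari, usually behind the webkit prefix.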

TTS (Text-to-Speech):

// Browser native TTS - No API key required
const utterance = new SpeechSynthesisUtterance("Hello world");
utterance.rate = 1.0;
utterance.pitch = 1.0;
utterance.volume = 0.8;

window.speechSynthesis.speak(utterance);

Pros:

  • Zero setup, no API calls
  • Speech synthesis is supported in all major browsers (Chrome, Edge, Safari, Firefox)
  • Good for fallback scenarios

Cons:

  • Limited voice quality (often robotic), and it varies across browsers and operating systems
  • No way to export the synthesized audio to a file
  • Limited language/voice selection
  • Utterances are queued: a new one waits until the current one finishes unless you call speechSynthesis.cancel() first
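Voice selection is limited, but you can still choose the closest match from whatever the browser offers. The sketch below is one possible approach, not a standard helper: `pickVoice` is a hypothetical name, and it only relies on the `lang` and `default` properties that `speechSynthesis.getVoices()` entries expose.

```javascript
// Pick the closest available voice for a BCP 47 language tag such as "en-GB".
// `voices` is the array returned by speechSynthesis.getVoices().
function pickVoice(voices, lang) {
  // Some platforms report tags with underscores ("en_US"), so normalize first
  const normalize = tag => tag.toLowerCase().replace('_', '-');
  const wanted = normalize(lang);
  const base = wanted.split('-')[0];

  // 1. Exact match, e.g. "en-GB"
  const exact = voices.find(v => normalize(v.lang) === wanted);
  if (exact) return exact;

  // 2. Same base language, e.g. "en-US" when "en-GB" is missing
  const sameLanguage = voices.find(v => normalize(v.lang).split('-')[0] === base);
  if (sameLanguage) return sameLanguage;

  // 3. Fall back to the browser default, or null if there are no voices
  return voices.find(v => v.default) || null;
}

// Browser usage (getVoices() can return [] until "voiceschanged" fires):
// speechSynthesis.addEventListener('voiceschanged', () => {
//   const utterance = new SpeechSynthesisUtterance('Hello world');
//   const voice = pickVoice(speechSynthesis.getVoices(), 'en-GB');
//   if (voice) utterance.voice = voice;
//   speechSynthesis.speak(utterance);
// });
```

Keeping the matching logic separate from the browser objects makes it easy to unit test with plain arrays.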

Speech Recognition:

// Browser native speech-to-text
const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();

recognition.onresult = (event) => {
  let transcript = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    transcript += event.results[i][0].transcript;
  }
  console.log("You said:", transcript);
};

recognition.start();

Pros:

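  • Free, no API key
  • Good for accessibility

Cons:

  • Accuracy varies
  • Limited language support
  • Often needs a network connection: in Chrome, audio is sent to a server-based recognition engine, so it does not work offline
  • Not available in every browser (Firefox does not enable it by default)
  • Partial results are off by default; set recognition.interimResults = true to receive them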
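When `recognition.interimResults` is set to `true`, the `onresult` handler receives a mix of tentative and confirmed results, and each result carries an `isFinal` flag. The helper below is a minimal sketch of how the two could be separated for display. `splitTranscript` is a hypothetical name, not part of the Web Speech API.

```javascript
// Split a SpeechRecognition result list into confirmed and tentative text.
// `results` mirrors event.results: an array-like of results, each with an
// isFinal flag and ranked alternatives at numeric indexes.
function splitTranscript(results, startIndex = 0) {
  let finalText = '';
  let interimText = '';

  for (let i = startIndex; i < results.length; i++) {
    const best = results[i][0].transcript; // highest-ranked alternative
    if (results[i].isFinal) {
      finalText += best;
    } else {
      interimText += best;
    }
  }

  return { finalText, interimText };
}

// Browser usage:
// recognition.interimResults = true;
// recognition.onresult = (event) => {
//   const { finalText, interimText } = splitTranscript(event.results, event.resultIndex);
//   // Show interimText in grey, append finalText to the committed transcript
// };
```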

2. Web Audio API (Advanced Playback)

For professional audio playback, effects, and analysis:

// Web Audio API for advanced playback control
const audioContext = new (window.AudioContext || window.webkitAudioContext)();

async function playAudioWithEffects(audioUrl) {
  const response = await fetch(audioUrl);
  const arrayBuffer = await response.arrayBuffer();
  const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);
  
  const source = audioContext.createBufferSource();
  source.buffer = audioBuffer;
  
  // Add effects chain
  const gainNode = audioContext.createGain();
  const biquadFilter = audioContext.createBiquadFilter();
  
  source.connect(biquadFilter);
  biquadFilter.connect(gainNode);
  gainNode.connect(audioContext.destination);
  
  // Fade in effect
  gainNode.gain.setValueAtTime(0, audioContext.currentTime);
  gainNode.gain.linearRampToValueAtTime(1, audioContext.currentTime + 0.5);
  
  source.start(0);
  
  return { source, gainNode, biquadFilter };
}

When to use:

  • Audio visualization
  • Real-time effects (equalization, reverb)
  • Advanced audio analysis
  • Game audio engines
  • Music production web apps
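For visualization and analysis, a common starting point is a volume meter driven by an `AnalyserNode`. The sketch below shows only the arithmetic: the root-mean-square level of a block of time-domain samples and its conversion to decibels relative to full scale. The function names are illustrative.

```javascript
// Root-mean-square level of a block of samples in the range [-1, 1].
function rmsLevel(samples) {
  if (samples.length === 0) return 0;
  let sumOfSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumOfSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumOfSquares / samples.length);
}

// Convert a linear level to decibels relative to full scale (1.0 -> 0 dB).
function toDecibels(level) {
  return level > 0 ? 20 * Math.log10(level) : -Infinity;
}

// Browser usage with an AnalyserNode:
// const analyser = audioContext.createAnalyser();
// source.connect(analyser);
// const data = new Float32Array(analyser.fftSize);
// function draw() {
//   analyser.getFloatTimeDomainData(data);
//   meter.value = rmsLevel(data);
//   requestAnimationFrame(draw);
// }
```

Here `meter` stands in for whatever element displays the level.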

3. MediaRecorder API (Recording)

Capture user voice directly in browser:

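// Record user voice
let mediaRecorder;
let audioChunks = [];

navigator.mediaDevices.getUserMedia({ audio: true })
  .then(stream => {
    mediaRecorder = new MediaRecorder(stream);
    audioChunks = []; // start each recording with an empty buffer
    
    mediaRecorder.ondataavailable = (event) => {
      if (event.data.size > 0) audioChunks.push(event.data);
    };
    
    mediaRecorder.onstop = () => {
      // Use the recorder's actual container type (typically WebM in Chrome, MP4 in Safari)
      const audioBlob = new Blob(audioChunks, { type: mediaRecorder.mimeType });
      
      // Release the microphone so the browser's recording indicator turns off
      stream.getTracks().forEach(track => track.stop());
      
      // Send to server
      uploadAudio(audioBlob).catch(err => console.error("Upload failed:", err));
    };
    
    mediaRecorder.start();
  })
  .catch(err => console.error("Could not start recording:", err));

function stopRecording() {
  // Guard against calls before recording has started or after it has stopped
  if (mediaRecorder && mediaRecorder.state !== 'inactive') {
    mediaRecorder.stop();
  }
}

async function uploadAudio(blob) {
  const formData = new FormData();
  formData.append('audio', blob);
  
  const response = await fetch('/api/upload-voice', {
    method: 'POST',
    body: formData
  });
  
  if (!response.ok) {
    throw new Error(`Upload failed: ${response.status}`);
  }
  
  return response.json();
}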

Use cases:

  • Voice messaging apps
  • Transcription services
  • Voice note-taking
  • Voice feedback collection
  • Quality assurance voice recording
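Browsers differ in which recording formats they support, so it can help to negotiate the format up front rather than rely on the default. The sketch below assumes you pass in a support check such as `MediaRecorder.isTypeSupported`. `pickMimeType` and the candidate list are illustrative choices, not a recommendation from any spec.

```javascript
// Return the first MIME type the recorder supports, or '' to let the
// browser choose its own default.
function pickMimeType(candidates, isSupported) {
  return candidates.find(type => isSupported(type)) || '';
}

// Browser usage:
// const mimeType = pickMimeType(
//   ['audio/webm;codecs=opus', 'audio/mp4', 'audio/ogg;codecs=opus'],
//   type => MediaRecorder.isTypeSupported(type)
// );
// const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);
```

Passing the support check as a parameter keeps the function testable outside a browser.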

4. WebRTC (Real-time Voice Communication)

For peer-to-peer or real-time voice:

// Simplified WebRTC example
class VoiceCall {
  constructor() {
    this.peerConnection = null;
    this.localStream = null;
    this.remoteStream = null;
  }

  async initiate(signalingServer) {
    // Get local microphone
    this.localStream = await navigator.mediaDevices.getUserMedia({ audio: true });
    
    // Create peer connection
    this.peerConnection = new RTCPeerConnection({
      iceServers: [{ urls: ['stun:stun.l.google.com:19302'] }]
    });

    // Add local stream
    this.localStream.getTracks().forEach(track => {
      this.peerConnection.addTrack(track, this.localStream);
    });

    // Handle incoming stream
    this.peerConnection.ontrack = (event) => {
      this.remoteStream = event.streams[0];
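      // Play remote audio (play() can be rejected by autoplay policies)
      const audioElement = new Audio();
      audioElement.srcObject = this.remoteStream;
      audioElement.play().catch(err => console.error('Remote audio blocked:', err));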
    };

    // Handle ICE candidates
    this.peerConnection.onicecandidate = (event) => {
      if (event.candidate) {
        signalingServer.send({
          type: 'ice-candidate',
          candidate: event.candidate
        });
      }
    };

    // Create and send offer
    const offer = await this.peerConnection.createOffer();
    await this.peerConnection.setLocalDescription(offer);
    signalingServer.send({ type: 'offer', offer });
  }
}

When to use:

  • Video conferencing
  • Real-time voice chat
  • Live collaboration tools
  • P2P voice calls
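The example above stops at sending the offer. One detail that often trips up the rest of the handshake: ICE candidates can arrive over the signaling channel before the remote description has been applied, and `addIceCandidate` fails in that state. A small buffer is one way to handle this. The class name and method names below are hypothetical.

```javascript
// Holds ICE candidates until the remote description has been set,
// then applies them in arrival order.
class IceCandidateBuffer {
  constructor(addCandidate) {
    this.addCandidate = addCandidate; // e.g. c => peerConnection.addIceCandidate(c)
    this.pending = [];
    this.remoteDescriptionSet = false;
  }

  // Call for every candidate received from the signaling channel
  add(candidate) {
    if (this.remoteDescriptionSet) {
      this.addCandidate(candidate);
    } else {
      this.pending.push(candidate);
    }
  }

  // Call once setRemoteDescription() has resolved
  markRemoteDescriptionSet() {
    this.remoteDescriptionSet = true;
    for (const candidate of this.pending.splice(0)) {
      this.addCandidate(candidate);
    }
  }
}

// Browser usage:
// const buffer = new IceCandidateBuffer(c => peerConnection.addIceCandidate(c));
// onSignal('ice-candidate', msg => buffer.add(msg.candidate));
// onSignal('answer', async msg => {
//   await peerConnection.setRemoteDescription(msg.answer);
//   buffer.markRemoteDescriptionSet();
// });
```

`onSignal` stands in for whatever message handler your signaling channel provides.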

External Voice Services for Web

Browser native voice has limits. For production applications, use external APIs:

Speeko TTS API (Recommended)

Best for web applications needing professional voice synthesis:

// speeko-voice.js - Production-ready integration
// Exported so other modules can load it with import or import()
export class SpeekoVoice {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.baseUrl = 'https://api.speekoapp.com/api/v1';
    this.cache = new Map(); // Browser cache
  }

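  async synthesize(text, options = {}) {
    // Note: `speed` is accepted here but is not sent in the request below.
    // Add it to the body under whatever field name the API documents.
    const {
      voiceId = 'alloy',
      language = 'en',
      speed = 1.0
    } = options;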

    // Check browser cache (IndexedDB for durability)
    const cached = await this._getCached(text, voiceId);
    if (cached) {
      return cached;
    }

    try {
      const response = await fetch(`${this.baseUrl}/tts`, {
        method: 'POST',
        headers: {
          'X-API-Key': this.apiKey,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({
          text,
          voice_id: voiceId,
          language,
          format: 'mp3'
        })
      });

      if (!response.ok) {
        throw new Error(`Synthesis failed: ${response.status}`);
      }

      const data = await response.json();
      
      // Cache result
      await this._cache(text, voiceId, data.audio_url);

      return {
        audioUrl: data.audio_url,
        duration: data.duration || 0,
        cached: false
      };
    } catch (error) {
      console.error('Speeko synthesis error:', error);
      // Fallback to browser TTS
      return this._fallbackToBrowserTTS(text);
    }
  }

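  async play(text, options = {}) {
    const { audioUrl } = await this.synthesize(text, options);
    
    // No URL means synthesize() fell back to browser TTS,
    // which has already queued the text for speech
    if (!audioUrl) return;
    
    return new Promise((resolve, reject) => {
      const audio = new Audio(audioUrl);
      audio.onended = resolve;
      audio.onerror = () => reject(new Error('Audio playback failed'));
      // Reject instead of hanging forever if playback cannot start
      audio.play().catch(reject);
    });
  }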

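  async _getCached(text, voiceId) {
    try {
      const db = await this._openDB();
      const key = `${text}:${voiceId}`;
      // Raw IndexedDB is request-based, so wrap the read in a Promise
      const audioUrl = await new Promise((resolve, reject) => {
        const request = db
          .transaction('voice_cache', 'readonly')
          .objectStore('voice_cache')
          .get(key);
        request.onsuccess = () => resolve(request.result);
        request.onerror = () => reject(request.error);
      });
      // Return the same shape as synthesize() so callers can destructure it
      return audioUrl ? { audioUrl, duration: 0, cached: true } : null;
    } catch {
      return null;
    }
  }

  async _cache(text, voiceId, audioUrl) {
    try {
      const db = await this._openDB();
      const key = `${text}:${voiceId}`;
      await new Promise((resolve, reject) => {
        const tx = db.transaction('voice_cache', 'readwrite');
        tx.objectStore('voice_cache').put(audioUrl, key);
        tx.oncomplete = resolve;
        tx.onerror = () => reject(tx.error);
      });
    } catch (error) {
      console.warn('Cache error:', error);
    }
  }

  _openDB() {
    return new Promise((resolve, reject) => {
      const request = indexedDB.open('SpeekoVoice', 1);
      
      // Object stores can only be created during an upgrade
      request.onupgradeneeded = () => {
        const db = request.result;
        if (!db.objectStoreNames.contains('voice_cache')) {
          db.createObjectStore('voice_cache');
        }
      };
      
      request.onsuccess = () => resolve(request.result);
      request.onerror = () => reject(request.error);
    });
  }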

  _fallbackToBrowserTTS(text) {
    const utterance = new SpeechSynthesisUtterance(text);
    window.speechSynthesis.speak(utterance);
    return { audioUrl: null, duration: 0, cached: false };
  }

  async getAvailableVoices() {
    try {
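      const response = await fetch(`${this.baseUrl}/voices`, {
        headers: { 'X-API-Key': this.apiKey }
      });
      if (!response.ok) {
        throw new Error(`Voice list request failed: ${response.status}`);
      }
      // await here so JSON parsing errors are caught below
      return await response.json();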
    } catch (error) {
      console.error('Failed to fetch voices:', error);
      return { voices: [] };
    }
  }
}
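The cache key in the class above is built from the text and voice only, so two requests that differ in language or speed would share an entry, and a text containing ":" could collide with another text and voice pair. A possible alternative is sketched below. `cacheKey` is a hypothetical helper, and collapsing whitespace assumes that extra spaces do not change the synthesized audio.

```javascript
// Build a collision-resistant cache key from every option that affects the audio.
function cacheKey(text, { voiceId = 'alloy', language = 'en', speed = 1.0 } = {}) {
  // Assumption: runs of whitespace do not change the spoken result
  const normalized = text.trim().replace(/\s+/g, ' ');
  // JSON encoding keeps field boundaries unambiguous, unlike joining with ":"
  return JSON.stringify([normalized, voiceId, language, speed]);
}
```

The defaults mirror the ones used in `synthesize()`, so a call with no options and a call with the defaults spelled out map to the same entry.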

Comparison Table: Voice APIs for Web

Service            | Cost                     | Quality   | Setup       | Latency   | Best For
Browser Web Speech | Free                     | Basic     | None        | 0-2s      | Accessibility, fallback
Speeko             | $0.03/1K chars           | Excellent | API key     | 300-500ms | Production TTS
Google Cloud TTS   | $0.016/1K chars ($16/1M) | Excellent | GCP account | 500ms-1s  | Enterprise
ElevenLabs         | $0.30/1K chars           | Premium   | API key     | 1-3s      | High-end voices
Amazon Polly       | $0.02/1K chars           | Good      | AWS account | 500ms     | AWS ecosystem
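To turn the per-character prices above into a budget figure, a rough estimate can be computed from request volume and cache hit rate. The sketch below uses the prices exactly as listed in the table. The function name, the traffic numbers, and the assumption that cached requests cost nothing are illustrative.

```javascript
// Prices per 1,000 characters, copied from the comparison table
const PRICE_PER_1K_CHARS = {
  speeko: 0.03,
  googleCloudTts: 0.016,
  elevenLabs: 0.30,
  amazonPolly: 0.02
};

// Estimate monthly spend in dollars, rounded to the nearest cent.
// cacheHitRate is the fraction of requests served from cache (0 to 1).
function estimateMonthlyCost(charsPerRequest, requestsPerMonth, pricePer1K, cacheHitRate = 0) {
  const billableChars = charsPerRequest * requestsPerMonth * (1 - cacheHitRate);
  return Math.round((billableChars / 1000) * pricePer1K * 100) / 100;
}

// Example: 10,000 requests of 200 characters each, half served from cache
const estimate = estimateMonthlyCost(200, 10000, PRICE_PER_1K_CHARS.speeko, 0.5);
console.log(`Estimated monthly cost: $${estimate.toFixed(2)}`);
```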

Real-World Implementation: Interactive Fitness App

Here's a complete example combining multiple APIs:

// fitness-app.js - Voice-guided workout
class VoiceWorkout {
  constructor(speeko_api_key) {
    this.voice = new SpeekoVoice(speeko_api_key);
    this.audioContext = new AudioContext();
    this.currentExerciseIndex = 0;
    this.isPlaying = false;
  }

  async startWorkout(workout) {
    this.workout = workout;
    
    // Greeting
    await this.voice.play("Let's get started!");
    
    // Main loop
    for (let i = 0; i < workout.exercises.length; i++) {
      this.currentExerciseIndex = i;
      await this.playExercise(workout.exercises[i]);
    }
    
    // Completion
    await this.voice.play("Great job! You completed the workout.");
  }

  async playExercise(exercise) {
    const { name, reps, duration } = exercise;
    
    // Preparation
    const prepText = `Next: ${name}. Get ready in 5 seconds.`;
    await this.voice.play(prepText, { voiceId: 'alloy' });
    
    // Countdown
    for (let i = 5; i > 0; i--) {
      await this.delay(1000);
      await this.voice.play(i.toString(), { voiceId: 'echo' });
    }
    
    // Go!
    await this.voice.play("Go!", { voiceId: 'nova' });
    
    // During exercise: periodic encouragement
    if (duration) {
      // Await so the rest period starts only after the exercise time has passed
      await this.playEncouragementDuring(duration * 1000);
    } else if (reps) {
      for (let i = 1; i <= reps; i++) {
        await this.delay(3000);
        const remaining = reps - i;
        if (remaining > 0) {
          await this.voice.play(`${remaining} to go!`);
        }
      }
    }
    
    // Rest
    await this.voice.play("Rest for 30 seconds.");
    await this.delay(30000);
  }

  async playEncouragementDuring(durationMs) {
    const encouragements = [
      "You got this!",
      "Keep going!",
      "Halfway there!",
      "Almost done!"
    ];
    
    const interval = durationMs / encouragements.length;
    
    for (let i = 0; i < encouragements.length; i++) {
      await this.delay(interval);
      await this.voice.play(encouragements[i]);
    }
  }

  delay(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

// React component (JSX)
import { useMemo } from 'react';

export function FitnessApp() {
  // Create the workout controller once, not on every render
  const voice = useMemo(
    () => new VoiceWorkout(import.meta.env.VITE_SPEEKO_KEY),
    []
  );

  // startWorkout() reads workout.exercises, so wrap the list in an object
  const workout = {
    exercises: [
      { name: "Push-ups", reps: 10 },
      { name: "Squats", reps: 15 },
      { name: "Burpees", duration: 30 },
      { name: "Plank", duration: 60 }
    ]
  };

  return (
    <div className="workout-container">
      <button onClick={() => voice.startWorkout(workout)}>
        🎙️ Start Voice-Guided Workout
      </button>
    </div>
  );
}
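In the workout class, encouragement phrases are spaced by dividing the duration evenly, and each `await` on playback pushes later phrases further back. One alternative, sketched here under the assumption that each phrase should land at a fixed fraction of the exercise, is to compute absolute cue times first and then wait relative to the start time. `ENCOURAGEMENT_CUES` and `scheduleCues` are hypothetical names, and the fractions are illustrative.

```javascript
// Each cue fires at a fixed fraction of the exercise duration
const ENCOURAGEMENT_CUES = [
  { at: 0.25, text: 'You got this!' },
  { at: 0.5, text: 'Halfway there!' },
  { at: 0.75, text: 'Keep going!' },
  { at: 0.9, text: 'Almost done!' }
];

// Convert fractional cues into absolute offsets in milliseconds
function scheduleCues(durationMs, cues = ENCOURAGEMENT_CUES) {
  return cues.map(({ at, text }) => ({
    atMs: Math.round(durationMs * at),
    text
  }));
}

// Usage inside VoiceWorkout, waiting relative to the start time so that
// playback time does not accumulate as drift:
// const startedAt = performance.now();
// for (const cue of scheduleCues(duration * 1000)) {
//   const elapsed = performance.now() - startedAt;
//   await this.delay(Math.max(0, cue.atMs - elapsed));
//   await this.voice.play(cue.text);
// }
```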

React Hook for Voice Integration

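// hooks/useVoice.js
import { useState, useRef, useEffect } from 'react';
// Adjust this path to wherever speeko-voice.js lives in your project
import { SpeekoVoice } from '../speeko-voice.js';

export function useVoice(apiKey) {
  const voiceService = useRef(null);
  const audioRef = useRef(null); // currently playing Audio element, if any
  const [isPlaying, setIsPlaying] = useState(false);
  const [isSynthesizing, setIsSynthesizing] = useState(false);
  const [error, setError] = useState(null);

  useEffect(() => {
    voiceService.current = new SpeekoVoice(apiKey);
  }, [apiKey]);

  const speak = async (text, options = {}) => {
    try {
      setError(null);
      setIsSynthesizing(true);
      
      const { audioUrl } = await voiceService.current.synthesize(text, options);
      
      // No URL means synthesize() fell back to browser TTS, which plays itself
      if (!audioUrl) return;
      
      const audio = new Audio(audioUrl);
      audioRef.current = audio;
      
      audio.onended = () => setIsPlaying(false);
      audio.onerror = () => setIsPlaying(false);
      
      setIsPlaying(true);
      await audio.play();
    } catch (err) {
      // Reset the flag so the button is not stuck in the "Playing..." state
      setIsPlaying(false);
      setError(err.message);
      console.error('Voice error:', err);
    } finally {
      setIsSynthesizing(false);
    }
  };

  const stop = () => {
    // Pause and rewind the current audio, and cancel any fallback speech
    if (audioRef.current) {
      audioRef.current.pause();
      audioRef.current.currentTime = 0;
    }
    window.speechSynthesis?.cancel();
    setIsPlaying(false);
  };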

  return {
    speak,
    stop,
    isPlaying,
    isSynthesizing,
    error
  };
}

// Usage
export function NewsReader() {
  const { speak, isPlaying } = useVoice(import.meta.env.VITE_SPEEKO_KEY);

  return (
    <article>
      <h1>Breaking News</h1>
      <p>Major developments in tech industry...</p>
      
      <button 
        onClick={() => speak("Breaking News: Major developments in tech industry...")}
        disabled={isPlaying}
      >
        {isPlaying ? 'Playing...' : '🔊 Read Article'}
      </button>
    </article>
  );
}

Performance Best Practices

  1. Lazy Load Speeko
// Only load the voice module when it is first needed
const loadVoiceService = async (apiKey) => {
  const module = await import('./speeko-voice.js');
  return new module.SpeekoVoice(apiKey);
};
  2. Implement Timeout
// Method on SpeekoVoice
async synthesizeWithTimeout(text, timeoutMs = 5000) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('Synthesis timeout')), timeoutMs);
  });
  try {
    return await Promise.race([this.synthesize(text), timeout]);
  } finally {
    clearTimeout(timer); // do not leave the timer running after a fast response
  }
}
  3. Cache Aggressively
  • Cache in memory (Map) for session
  • Cache in IndexedDB for persistence
  • Cache in Service Worker for offline
  4. Batch Requests
// Method on SpeekoVoice: synthesize several texts in parallel
async batchSynthesize(texts) {
  return Promise.all(texts.map(text => this.synthesize(text)));
}
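For the in-memory layer, an unbounded `Map` grows for as long as the page stays open. A small least-recently-used cache is one way to cap it. The sketch below relies on the fact that a `Map` iterates in insertion order. `LruCache` and its default size are illustrative, not part of the Speeko integration.

```javascript
// Minimal least-recently-used cache built on Map's insertion order
class LruCache {
  constructor(maxEntries = 100) {
    this.maxEntries = maxEntries;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert so this key becomes the most recently used
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // The first key in iteration order is the least recently used
      const oldestKey = this.map.keys().next().value;
      this.map.delete(oldestKey);
    }
  }
}

// Usage: check memory first, then IndexedDB, then the network
// const memoryCache = new LruCache(50);
// const hit = memoryCache.get(key);
// if (!hit) memoryCache.set(key, await voice.synthesize(text));
```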

Accessibility Considerations

Always provide text alternatives:

// Good: Voice + visible captions
<div className="audio-player">
  <button onClick={() => voice.speak(caption)}>
    🔊 {isPlaying ? 'Playing...' : 'Listen'}
  </button>
  <p className="caption" role="status">
    {caption}
  </p>
</div>

Browser Support

Check support before using APIs:

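// Each entry is a boolean, so the object is safe to log or branch on
const browserSupport = {
  audioContext: Boolean(window.AudioContext || window.webkitAudioContext),
  speechSynthesis: 'speechSynthesis' in window,
  speechRecognition: Boolean(window.SpeechRecognition || window.webkitSpeechRecognition),
  mediaRecorder: 'MediaRecorder' in window,
  microphone: Boolean(navigator.mediaDevices?.getUserMedia),
  webrtc: 'RTCPeerConnection' in window,
  indexedDB: 'indexedDB' in window
};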

console.log('Browser voice capabilities:', browserSupport);
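Once you know what the browser supports, the result can drive which speech path the app takes. The sketch below is one possible policy, assuming a remote TTS service is preferred when it is reachable and browser synthesis is the fallback. `chooseTtsStrategy` and the strategy names are hypothetical.

```javascript
// Decide how to speak text given the current capabilities.
// Returns 'remote', 'browser', or 'text-only'.
function chooseTtsStrategy({ hasApiKey, online, speechSynthesis }) {
  if (hasApiKey && online) return 'remote';   // external TTS service
  if (speechSynthesis) return 'browser';      // built-in Web Speech fallback
  return 'text-only';                         // show captions only
}

// Browser usage:
// const strategy = chooseTtsStrategy({
//   hasApiKey: Boolean(apiKey),
//   online: navigator.onLine,
//   speechSynthesis: 'speechSynthesis' in window
// });
```

The text-only branch ties in with the accessibility advice above: captions should always be available, whichever path is taken.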

Conclusion

Modern web applications have powerful voice capabilities:

  • Browser APIs for accessibility and fallback
  • Speeko TTS for professional, reliable voice synthesis
  • Web Audio API for advanced playback and effects
  • WebRTC for real-time communication
  • MediaRecorder for voice capture

Combine these to create engaging, accessible voice experiences entirely in the browser, without native apps.


Build voice-powered web apps today.

Speeko's JavaScript integration is simple, fast, and production-ready. Start with $10 in free credits.
