Text to Speech API Java Integration: HttpClient, Spring Boot, and Async Batch Processing

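Speeko doesn't ship an official Java SDK yet, and you don't need one. Java's built-in HttpClient (java.net.http, added in Java 11) handles REST calls cleanly, and its async patterns are straightforward. The examples below also use text blocks and String.formatted, which need Java 15 or later, so Java 17 LTS is a safe baseline. Here's everything you need to generate TTS audio from Java, including a Spring Boot service and batch processing with CompletableFuture.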

Basic HTTP Call with Java 11 HttpClient

The simplest case: one text string in, one MP3 back.

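import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class SpeekoClient {
    private static final String API_KEY = System.getenv("SPEEKO_API_KEY");
    private static final String BASE_URL = "https://api.speekoapp.com/v1/tts";
    // HttpClient is thread-safe and pools connections: build it once, not per call.
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    public static void synthesize(String text, Path outputPath)
            throws IOException, InterruptedException {
        if (API_KEY == null || API_KEY.isBlank()) {
            throw new IllegalStateException("SPEEKO_API_KEY is not set");
        }

        // jsonString() adds the surrounding quotes and escapes the text,
        // so quotes, backslashes, or newlines can't break the JSON body.
        String body = """
            {
                "text": %s,
                "voice": "en-US-neural-1",
                "format": "mp3"
            }
            """.formatted(jsonString(text));

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(BASE_URL))
            .header("X-API-Key", API_KEY)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<byte[]> response = CLIENT.send(
            request, HttpResponse.BodyHandlers.ofByteArray()
        );

        if (response.statusCode() != 200) {
            // An error response body isn't audio; include it in the message for debugging.
            throw new IOException("TTS failed: HTTP " + response.statusCode() + ": "
                + new String(response.body(), StandardCharsets.UTF_8));
        }

        Files.write(outputPath, response.body());
    }

    // Minimal JSON string encoder; a JSON library such as Jackson does the same job.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }
}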

Call it:

SpeekoClient.synthesize(
    "Welcome to the onboarding flow.",
    Path.of("welcome.mp3")
);

That's it for single requests. For anything at scale, you want async.
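Even a single request doesn't have to block the calling thread. HttpClient.sendAsync returns a CompletableFuture that completes when the response arrives. A minimal sketch, reusing the endpoint, header, and JSON shape from synthesize() above; SpeekoAsyncClient, buildRequest, and synthesizeAsync are illustrative names, not part of any Speeko SDK, and the caller is assumed to pass an already-escaped JSON body.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.CompletableFuture;

final class SpeekoAsyncClient {
    private static final URI TTS_URI = URI.create("https://api.speekoapp.com/v1/tts");
    // One shared client: HttpClient is thread-safe and reuses connections.
    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    // Builds the POST request; jsonBody must already be valid, escaped JSON.
    static HttpRequest buildRequest(String jsonBody, String apiKey) {
        return HttpRequest.newBuilder(TTS_URI)
            .header("X-API-Key", apiKey)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
            .build();
    }

    // Returns immediately. The future completes with the MP3 bytes, or
    // completes exceptionally on a non-200 status or a network error.
    static CompletableFuture<byte[]> synthesizeAsync(String jsonBody, String apiKey) {
        return CLIENT.sendAsync(buildRequest(jsonBody, apiKey),
                                HttpResponse.BodyHandlers.ofByteArray())
            .thenApply(response -> {
                if (response.statusCode() != 200) {
                    throw new IllegalStateException("TTS failed: HTTP " + response.statusCode());
                }
                return response.body();
            });
    }
}
```

A call like synthesizeAsync(body, key).thenAccept(bytes -> ...) returns right away; call join() only where you genuinely need to wait, such as at the end of a main method.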

Async Batch Processing with CompletableFuture

Generating audio for 50 product descriptions serially takes 50× the latency. Run them concurrently:

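import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class TtsBatchProcessor implements AutoCloseable {
    private static final URI TTS_URI = URI.create("https://api.speekoapp.com/v1/tts");

    // At most 10 requests in flight at once.
    private final ExecutorService executor = Executors.newFixedThreadPool(10);
    // Keep the bounded pool away from the HttpClient: send() blocks a pool thread
    // while the client may need an executor thread to complete the response, so
    // sharing one 10-thread pool risks stalling the batch. The default client
    // manages its own threads.
    private final HttpClient client = HttpClient.newHttpClient();

    public Map<String, byte[]> generateBatch(Map<String, String> scripts) {
        List<CompletableFuture<Map.Entry<String, byte[]>>> futures = scripts.entrySet()
            .stream()
            .map(entry -> CompletableFuture.supplyAsync(() -> {
                try {
                    return Map.entry(entry.getKey(), callApi(entry.getValue()));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    throw new RuntimeException("Interrupted: " + entry.getKey(), e);
                } catch (IOException e) {
                    throw new RuntimeException("Failed for: " + entry.getKey(), e);
                }
            }, executor))
            .collect(Collectors.toList());

        // join() rethrows the first failure it hits as a CompletionException.
        return futures.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    private byte[] callApi(String text) throws IOException, InterruptedException {
        // Same request construction as above, with the text JSON-escaped.
        // (A text block must start on a new line after the opening """.)
        String body = """
            {"text": %s, "voice": "en-US-neural-1", "format": "mp3"}
            """.formatted(jsonString(text));

        HttpRequest request = HttpRequest.newBuilder(TTS_URI)
            .header("X-API-Key", System.getenv("SPEEKO_API_KEY"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<byte[]> response = client.send(
            request, HttpResponse.BodyHandlers.ofByteArray()
        );
        // Without this check, an error body would be returned as "audio".
        if (response.statusCode() != 200) {
            throw new IOException("TTS failed: HTTP " + response.statusCode());
        }
        return response.body();
    }

    // Minimal JSON string encoder (quotes, backslashes, control characters).
    private static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }

    @Override
    public void close() {
        executor.shutdown(); // finish in-flight requests, accept no new ones
    }
}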

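Serially, 50 product descriptions at ~400 ms each take about 20 seconds. With 10 threads they run in five waves of 10, so roughly 2 seconds total, assuming each request takes about the same time and the API doesn't throttle you.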
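The estimate treats the batch as waves: with p threads, n equal-length requests need ceil(n / p) rounds, each costing one request's latency. A tiny helper (a hypothetical name, not part of any library) makes that arithmetic explicit:

```java
final class BatchTiming {
    // Wall-clock estimate for n equal-latency requests on a pool of p threads:
    // ceil(n / p) sequential waves, each taking one request's latency.
    static long estimateMillis(int requests, int threads, long latencyMillis) {
        if (requests < 0 || threads <= 0) {
            throw new IllegalArgumentException("requests >= 0 and threads > 0 required");
        }
        long waves = (requests + threads - 1L) / threads; // integer ceiling division
        return waves * latencyMillis;
    }
}
```

estimateMillis(50, 10, 400) returns 2000, versus 20000 with a single thread, and 55 items would need a sixth wave (2400). Real latencies vary per request, so treat the result as a rough estimate.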

One thing to watch: Speeko's rate limits apply per API key, not per connection, so adding threads or connections doesn't raise the ceiling. If you push concurrency higher, watch the X-RateLimit-Remaining response header.
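A minimal sketch of reading that header and backing off. Only the header name comes from this article; the sketch assumes its value is a plain integer and that the API answers HTTP 429 when the limit is hit, so confirm both against Speeko's documentation. RateLimitAwareSender, the retry budget, and the delays are hypothetical.

```java
import java.io.IOException;
import java.net.http.HttpClient;
import java.net.http.HttpHeaders;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.OptionalLong;

final class RateLimitAwareSender {
    private static final int MAX_ATTEMPTS = 4;      // hypothetical retry budget
    private static final long BASE_DELAY_MS = 500;  // hypothetical first backoff

    // Reads X-RateLimit-Remaining, assuming it holds a non-negative integer.
    static OptionalLong remaining(HttpHeaders headers) {
        return headers.firstValue("X-RateLimit-Remaining")
            .map(String::trim)
            .filter(v -> v.matches("\\d{1,18}")) // digits only, small enough for a long
            .map(v -> OptionalLong.of(Long.parseLong(v)))
            .orElse(OptionalLong.empty());
    }

    // Exponential backoff: 500 ms, 1 s, 2 s, ...
    static long backoffMillis(int attempt) {
        return BASE_DELAY_MS << attempt;
    }

    // Sends the request, retrying on HTTP 429 (assumed to mean "rate limited").
    // Returns the last response; the caller still checks for 200.
    static HttpResponse<byte[]> send(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        for (int attempt = 0; ; attempt++) {
            HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
            if (response.statusCode() != 429 || attempt + 1 >= MAX_ATTEMPTS) {
                return response;
            }
            Thread.sleep(backoffMillis(attempt));
        }
    }
}
```

In a batch processor you could send each request through this helper and check remaining(response.headers()) afterward, pausing new submissions as the count nears zero.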

Spring Boot Service with Caching

For a web application, you don't want to regenerate audio on every request. Cache aggressively.

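import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class TtsService {
    private static final URI TTS_URI = URI.create("https://api.speekoapp.com/v1/tts");
    private final HttpClient client = HttpClient.newHttpClient();
    // Spring Boot 3 auto-configures a Jackson ObjectMapper bean; it handles JSON escaping.
    private final ObjectMapper objectMapper;

    public TtsService(ObjectMapper objectMapper) {
        this.objectMapper = objectMapper;
    }

    // Voice first, then text: as long as voice IDs contain no ':', two different
    // (voice, text) pairs can't produce the same cache key.
    @Cacheable(value = "tts-audio", key = "#voice + ':' + #text")
    public byte[] generateAudio(String text, String voice) {
        try {
            String body = objectMapper.writeValueAsString(
                Map.of("text", text, "voice", voice, "format", "mp3"));

            HttpRequest request = HttpRequest.newBuilder(TTS_URI)
                .header("X-API-Key", System.getenv("SPEEKO_API_KEY"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

            HttpResponse<byte[]> response = client.send(
                request, HttpResponse.BodyHandlers.ofByteArray()
            );
            // Throwing matters here: @Cacheable caches whatever is returned, so
            // returning an error body would keep serving it until it is evicted.
            if (response.statusCode() != 200) {
                throw new IllegalStateException("TTS failed: HTTP " + response.statusCode());
            }
            return response.body();
        } catch (IOException e) {
            throw new RuntimeException("TTS generation failed", e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("TTS generation interrupted", e);
        }
    }
}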

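Back Spring's cache with Redis so every instance shares it. This assumes spring-boot-starter-cache and spring-boot-starter-data-redis are on the classpath; the spring.data.redis.* keys below are the Spring Boot 3 names (Spring Boot 2.x used spring.redis.*):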

# application.yml
spring:
  cache:
    type: redis
  data:
    redis:
      host: localhost
      port: 6379

// In your main application class
@SpringBootApplication
@EnableCaching  // org.springframework.cache.annotation.EnableCaching
public class Application { ... }

The first call generates audio and caches it. Every later call for the same text and voice is served from Redis, a network round trip of typically around a millisecond instead of a full TTS request. For a content site with a finite article corpus, TTS costs drop to near zero after the initial generation run.
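@Cacheable implements cache-aside: look up the key, and only on a miss call the method and store its result. The framework-free sketch below shows the same behavior with a ConcurrentHashMap, and why a structured key (text and voice as separate fields) avoids collisions that string concatenation can produce. AudioCacheDemo is a hypothetical class for illustration; an in-process map isn't shared across instances the way Redis is.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiFunction;

final class AudioCacheDemo {
    // A record key keeps text and voice separate, so ("a-b", "c") and
    // ("a", "b-c") can't collide the way text + "-" + voice can.
    record Key(String text, String voice) {}

    private final Map<Key, byte[]> cache = new ConcurrentHashMap<>();
    private final BiFunction<String, String, byte[]> generator; // stands in for the TTS call
    final AtomicInteger apiCalls = new AtomicInteger();         // counts cache misses

    AudioCacheDemo(BiFunction<String, String, byte[]> generator) {
        this.generator = Objects.requireNonNull(generator);
    }

    byte[] generateAudio(String text, String voice) {
        // computeIfAbsent invokes the generator only when the key is missing.
        return cache.computeIfAbsent(new Key(text, voice), k -> {
            apiCalls.incrementAndGet();
            return generator.apply(k.text(), k.voice());
        });
    }
}
```

computeIfAbsent applies the generator at most once per key, even under concurrent calls; other threads asking for that key wait. Spring's @Cacheable doesn't synchronize by default, and @Cacheable(sync = true) requests similar behavior from caches that support it.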

Controller Endpoint

Wire it into a REST endpoint:

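import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/audio")
public class AudioController {
    // Hypothetical cap: every uncached request spends API credit on caller-supplied text.
    private static final int MAX_CHARS = 2_000;

    private final TtsService ttsService;

    public AudioController(TtsService ttsService) {
        this.ttsService = ttsService;
    }

    @GetMapping(value = "/generate", produces = "audio/mpeg")
    public ResponseEntity<byte[]> generateAudio(@RequestParam String text) {
        if (text.isBlank() || text.length() > MAX_CHARS) {
            return ResponseEntity.badRequest().build();
        }
        byte[] audio = ttsService.generateAudio(text, "en-US-neural-1");
        return ResponseEntity.ok()
            .header("Content-Disposition", "inline; filename=\"speech.mp3\"")
            .body(audio);
    }
}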
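The endpoint takes the text as a query parameter, so callers have to URL-encode it. A minimal client sketch, assuming the app is reachable at a base URL such as http://localhost:8080 (hypothetical); AudioEndpointClient is an illustrative name. Very long texts can exceed URL length limits in servers and proxies, so keep GET payloads short.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

final class AudioEndpointClient {
    // Builds <baseUrl>/api/audio/generate?text=... with the text URL-encoded as UTF-8.
    // URLEncoder produces form encoding ('+' for spaces), which servlet containers
    // decode back to spaces for request parameters.
    static URI forText(String baseUrl, String text) {
        String encoded = URLEncoder.encode(text, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/audio/generate?text=" + encoded);
    }

    // Streams the MP3 straight to a file. On a non-200 status the file holds
    // the error body instead of audio, so treat it as invalid after the exception.
    static void download(HttpClient client, URI uri, Path target)
            throws IOException, InterruptedException {
        HttpResponse<Path> response = client.send(
            HttpRequest.newBuilder(uri).GET().build(),
            HttpResponse.BodyHandlers.ofFile(target));
        if (response.statusCode() != 200) {
            throw new IOException("Audio request failed: HTTP " + response.statusCode());
        }
    }
}
```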

Cost Math for Java Applications

Speeko charges $0.03 per 1,000 characters. A typical product description runs 300–500 characters. Generating audio for a 10,000-product catalog costs $90–$150 one time, then nothing until the copy changes.

Compare that to ElevenLabs at $0.30 per 1K characters: the same catalog would cost $900–$1,500. If the full catalog is regenerated every quarter, the gap grows to $360–$600 a year versus $3,600–$6,000.
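The arithmetic above is items × characters per item ÷ 1,000 × rate. A small BigDecimal helper (hypothetical, using the per-1K rates quoted in this article) reproduces the figures without floating-point rounding on money:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class TtsCostEstimate {
    // Per-1,000-character rates as quoted in this article.
    static final BigDecimal SPEEKO_PER_1K = new BigDecimal("0.03");
    static final BigDecimal ELEVENLABS_PER_1K = new BigDecimal("0.30");

    // Dollars = items * charsPerItem / 1000 * ratePer1k, rounded to cents.
    static BigDecimal cost(long items, long charsPerItem, BigDecimal ratePer1k) {
        BigDecimal chars = BigDecimal.valueOf(items).multiply(BigDecimal.valueOf(charsPerItem));
        return chars.multiply(ratePer1k)
            .divide(BigDecimal.valueOf(1000), 2, RoundingMode.HALF_UP);
    }

    // Whole characters a given credit buys at a given rate (rounded down).
    static long charactersFor(BigDecimal credit, BigDecimal ratePer1k) {
        return credit.multiply(BigDecimal.valueOf(1000))
            .divide(ratePer1k, 0, RoundingMode.DOWN)
            .longValueExact();
    }
}
```

cost(10_000, 300, SPEEKO_PER_1K) gives 90.00 and cost(10_000, 500, ELEVENLABS_PER_1K) gives 1500.00; charactersFor(new BigDecimal("5"), SPEEKO_PER_1K) gives 166666, the roughly 167,000 characters a $5 credit buys.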

Next Steps

For async job handling at scale (queueing thousands of TTS jobs overnight and processing results via webhook), see the async TTS job queue guide. To cut API calls further with SSML batching, see the SSML advanced guide.

Get started with a free $5 credit at Speeko: enough for about 167,000 characters, or roughly 330 to 550 product descriptions at the 300–500 character lengths above.
