Building Voice Assistants with Laravel: Speech-to-Text and Beyond
Voice as Interface
Voice interfaces are becoming ubiquitous. Adding voice capabilities to your Laravel application opens up new ways to interact with it: hands-free operation, improved accessibility, and natural conversation.
Speech-to-Text with Whisper
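use Illuminate\Support\Facades\Http;

class SpeechToText
{
    /**
     * Transcribe an audio file with OpenAI's Whisper model and return the text.
     */
    public function transcribe(string $audioPath): string
    {
        // The API requires a bearer token, just like the speech endpoint
        $response = Http::withToken(config('services.openai.key'))
            // The file name's extension tells the API the format; WebM is assumed here
            ->attach('file', file_get_contents($audioPath), 'audio.webm')
            ->post('https://api.openai.com/v1/audio/transcriptions', [
                'model' => 'whisper-1',
            ])
            ->throw(); // Raise an exception on 4xx/5xx instead of returning null

        return $response->json('text');
    }
}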
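Before sending a recording to the transcription endpoint, it can help to reject files the API is unlikely to accept. The following is a minimal sketch in plain PHP. The function name `audioRejectionReason()` is hypothetical, and the 25 MB limit and extension list are assumptions that should be checked against the current API documentation.

```php
<?php

// Assumed limits; verify against the current API documentation.
const MAX_AUDIO_BYTES = 25 * 1024 * 1024;
const ALLOWED_AUDIO_EXTENSIONS = ['flac', 'mp3', 'mp4', 'mpeg', 'mpga', 'm4a', 'ogg', 'wav', 'webm'];

/**
 * Returns null when the file looks acceptable, or a short reason when it does not.
 */
function audioRejectionReason(string $filename, int $sizeBytes): ?string
{
    // Compare extensions case-insensitively, so "clip.MP3" is treated like "clip.mp3"
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    if (!in_array($extension, ALLOWED_AUDIO_EXTENSIONS, true)) {
        return "Unsupported audio format: '{$extension}'";
    }
    if ($sizeBytes <= 0) {
        return 'Audio file is empty';
    }
    if ($sizeBytes > MAX_AUDIO_BYTES) {
        return 'Audio file exceeds the size limit';
    }

    return null;
}
```

In a controller, a non-null result could be turned into a 422 response before `transcribe()` is ever called, which avoids paying for a request that would fail anyway.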
Text-to-Speech
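use Illuminate\Support\Facades\Http;
use Illuminate\Support\Facades\Storage;
use Illuminate\Support\Str;

class TextToSpeech
{
    /**
     * Synthesize speech for the given text and return the stored file path.
     */
    public function synthesize(string $text, string $voice = 'alloy'): string
    {
        $response = Http::withToken(config('services.openai.key'))
            ->post('https://api.openai.com/v1/audio/speech', [
                'model' => 'tts-1',
                'input' => $text,
                'voice' => $voice,
            ])
            ->throw(); // Avoid saving an error payload as an .mp3 file

        // The response body is the raw audio (MP3 by default)
        $path = 'audio/' . Str::uuid() . '.mp3';
        Storage::put($path, $response->body());

        return $path;
    }
}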
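Speech endpoints typically cap the length of the `input` text, so long chatbot replies may need to be split before synthesis. The sketch below breaks text at sentence boundaries. The function name `splitForSynthesis()` is hypothetical, the 4096-character default is an assumption to verify against the API documentation, and the code relies on the mbstring extension (which Laravel already requires).

```php
<?php

/**
 * Splits text into pieces of at most $maxChars characters, preferring sentence boundaries.
 *
 * @return string[]
 */
function splitForSynthesis(string $text, int $maxChars = 4096): array
{
    if ($maxChars < 1) {
        throw new InvalidArgumentException('maxChars must be at least 1');
    }

    // Split on whitespace that follows ".", "!" or "?"
    $sentences = preg_split('/(?<=[.!?])\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    $chunks = [];
    $current = '';

    foreach ($sentences as $sentence) {
        // A single sentence longer than the limit is hard-split by character count
        while (mb_strlen($sentence) > $maxChars) {
            if ($current !== '') {
                $chunks[] = $current;
                $current = '';
            }
            $chunks[] = mb_substr($sentence, 0, $maxChars);
            $sentence = mb_substr($sentence, $maxChars);
        }

        // Append the sentence to the current chunk if it still fits
        $candidate = $current === '' ? $sentence : $current . ' ' . $sentence;
        if (mb_strlen($candidate) <= $maxChars) {
            $current = $candidate;
        } else {
            $chunks[] = $current;
            $current = $sentence;
        }
    }

    if ($current !== '') {
        $chunks[] = $current;
    }

    return $chunks;
}
```

Each returned piece could then be passed to `synthesize()` in turn, with the resulting audio files played back in order.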
Voice Conversation Loop
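class VoiceAssistant
{
    // $chatbot can be any service exposing respond(string): string;
    // its implementation is outside the scope of this article
    public function __construct(
        private SpeechToText $stt,
        private $chatbot,
        private TextToSpeech $tts,
    ) {}

    public function processVoiceInput(string $audioPath): array
    {
        // Transcribe the user's audio
        $text = $this->stt->transcribe($audioPath);

        // Process with chatbot
        $response = $this->chatbot->respond($text);

        // Synthesize the reply and get the stored audio path
        $audioResponse = $this->tts->synthesize($response);

        return [
            'transcription' => $text,
            'response_text' => $response,
            'response_audio' => $audioResponse,
        ];
    }
}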
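Because the loop makes three external calls, it is convenient to be able to exercise it without hitting any API. One way to do that, sketched here under assumptions, is to depend on small interfaces and swap in in-memory fakes. The interface names, the `VoicePipeline` class, and the empty-transcription guard are all hypothetical additions, not part of the classes above; the constructor syntax needs PHP 8.0 or later.

```php
<?php

// Hypothetical contracts that the article's classes could implement
interface Transcriber { public function transcribe(string $audioPath): string; }
interface Responder { public function respond(string $text): string; }
interface Synthesizer { public function synthesize(string $text): string; }

final class VoicePipeline
{
    public function __construct(
        private Transcriber $stt,
        private Responder $chatbot,
        private Synthesizer $tts,
    ) {}

    public function process(string $audioPath): array
    {
        $text = trim($this->stt->transcribe($audioPath));

        // Skip the chatbot and synthesis calls when nothing was transcribed
        if ($text === '') {
            return ['transcription' => '', 'response_text' => null, 'response_audio' => null];
        }

        $reply = $this->chatbot->respond($text);

        return [
            'transcription' => $text,
            'response_text' => $reply,
            'response_audio' => $this->tts->synthesize($reply),
        ];
    }
}

// In-memory fakes stand in for the real services
$pipeline = new VoicePipeline(
    new class implements Transcriber {
        public function transcribe(string $audioPath): string { return ' hello '; }
    },
    new class implements Responder {
        public function respond(string $text): string { return strtoupper($text) . '!'; }
    },
    new class implements Synthesizer {
        public function synthesize(string $text): string { return 'audio/fake.mp3'; }
    },
);

$result = $pipeline->process('recording.webm');
```

The same pattern works in a Laravel test by binding fakes to the interfaces in the service container.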
Real-Time Processing
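use Illuminate\Http\Request;
use Symfony\Component\HttpFoundation\StreamedResponse;

class RealtimeVoice
{
    // transcribeChunk() and chunkAudio() are application-specific helpers
    public function __construct(private $stt) {}

    public function streamTranscription(Request $request): StreamedResponse
    {
        return response()->stream(function () use ($request) {
            // Note: getContent() reads the full request body before chunking begins
            $audioStream = $request->getContent();
            // Process in chunks, emitting each partial result as a Server-Sent Event
            foreach ($this->chunkAudio($audioStream) as $chunk) {
                $partial = $this->stt->transcribeChunk($chunk);
                echo "data: " . json_encode(['partial' => $partial]) . "\n\n";
                // ob_flush() raises a notice when no output buffer is active
                if (ob_get_level() > 0) {
                    ob_flush();
                }
                flush();
            }
        }, 200, ['Content-Type' => 'text/event-stream', 'Cache-Control' => 'no-cache']);
    }
}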
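The streaming method above relies on a `chunkAudio()` helper and on correctly framed Server-Sent Events. A minimal sketch of both follows, written as standalone functions. The 32 KB default chunk size and the `sseFrame()` name are assumptions. Slicing by byte offset is only safe for raw formats such as PCM; compressed containers like WebM generally cannot be decoded from arbitrary slices.

```php
<?php

/**
 * Yields consecutive slices of at most $chunkBytes bytes from the audio data.
 */
function chunkAudio(string $audio, int $chunkBytes = 32768): Generator
{
    if ($chunkBytes < 1) {
        throw new InvalidArgumentException('chunkBytes must be at least 1');
    }

    $length = strlen($audio);
    for ($offset = 0; $offset < $length; $offset += $chunkBytes) {
        // The final slice may be shorter than $chunkBytes
        yield substr($audio, $offset, $chunkBytes);
    }
}

/**
 * Formats one Server-Sent Events message: a "data:" line followed by a blank line.
 */
function sseFrame(array $payload): string
{
    return 'data: ' . json_encode($payload) . "\n\n";
}
```

Using a generator keeps only one chunk in memory at a time on the output side, though the full request body is still loaded up front by `getContent()`.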
Conclusion
Voice interfaces add powerful capabilities to applications. Start with basic transcription and synthesis, then build toward real-time conversation. Keep accessibility in mind throughout, and always provide a text fallback for users who cannot or prefer not to use voice.