AudioConverter (src/services/speech/AudioConverter.ts)
Purpose: Handle WebM-to-MP3 conversion using FFmpeg
Key Methods:
convertToMp3(webmPath: string): Promise<string> - Convert WebM to MP3
cleanup(mp3Path: string): Promise<void> - Clean up temporary MP3 files
Features:
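A minimal sketch of AudioConverter follows. It assumes an ffmpeg binary on PATH that includes the libmp3lame encoder. The output naming (same path with an .mp3 extension), the 64k bitrate and the mp3PathFor helper are illustrative assumptions, not part of the plan.

```typescript
import { spawn } from "node:child_process";
import { unlink } from "node:fs/promises";

// Sketch of AudioConverter. Assumes an `ffmpeg` binary on PATH built with
// libmp3lame; the 64k bitrate and the naming scheme are illustrative.
export class AudioConverter {
  private readonly ffmpegPath: string;

  constructor(ffmpegPath = "ffmpeg") {
    this.ffmpegPath = ffmpegPath;
  }

  // Hypothetical helper: same path, with .webm swapped for .mp3.
  mp3PathFor(webmPath: string): string {
    return webmPath.replace(/\.webm$/i, "") + ".mp3";
  }

  // Convert one WebM/Opus chunk to MP3 and resolve with the MP3 path.
  convertToMp3(webmPath: string): Promise<string> {
    const mp3Path = this.mp3PathFor(webmPath);
    const args = ["-y", "-i", webmPath, "-vn", "-c:a", "libmp3lame", "-b:a", "64k", mp3Path];
    return new Promise((resolve, reject) => {
      const proc = spawn(this.ffmpegPath, args, { stdio: ["ignore", "ignore", "pipe"] });
      let stderr = "";
      proc.stderr?.on("data", (chunk: unknown) => {
        stderr += String(chunk);
      });
      proc.on("error", reject); // e.g. ffmpeg not found on PATH
      proc.on("close", (code) => {
        if (code === 0) resolve(mp3Path);
        else reject(new Error(`ffmpeg exited with code ${code}: ${stderr.slice(-500)}`));
      });
    });
  }

  // Delete a temporary MP3; a file that is already gone is not an error.
  async cleanup(mp3Path: string): Promise<void> {
    try {
      await unlink(mp3Path);
    } catch (err) {
      if ((err as { code?: string }).code !== "ENOENT") throw err;
    }
  }
}
```

Because cleanup treats a missing file as success, it is safe to call from a finally block even when conversion failed partway.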
TranscriptionClient (src/services/speech/TranscriptionClient.ts)
Purpose: Handle OpenAI Whisper API communication
Key Methods:
transcribe(filePath: string, language?: string): Promise<string> - Transcribe an audio file
getApiKey(): string | null - Get the OpenAI API key from context
getBaseUrl(): string - Get the OpenAI base URL
Features:
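The following sketch shows TranscriptionClient calling OpenAI's transcription endpoint (POST {baseUrl}/audio/transcriptions with model whisper-1). It assumes Node 18+ for the global fetch, FormData and Blob. The plan reads the API key "from context"; passing the key and base URL to the constructor is a simplifying assumption. buildForm is a hypothetical helper, split out so the request body can be checked without a network call.

```typescript
import { readFile } from "node:fs/promises";
import { basename } from "node:path";

// Sketch of TranscriptionClient for OpenAI's transcription endpoint.
// Assumes Node 18+ (global fetch, FormData, Blob). Taking the key and base
// URL as constructor arguments is an assumption; the plan reads them "from context".
export class TranscriptionClient {
  private readonly apiKey: string | null;
  private readonly baseUrl: string;

  constructor(apiKey: string | null, baseUrl = "https://api.openai.com/v1") {
    this.apiKey = apiKey;
    this.baseUrl = baseUrl;
  }

  getApiKey(): string | null {
    return this.apiKey;
  }

  // Strip trailing slashes so `${base}/audio/transcriptions` is well-formed.
  getBaseUrl(): string {
    return this.baseUrl.replace(/\/+$/, "");
  }

  // Hypothetical helper: build the multipart body (language is optional).
  buildForm(audio: Uint8Array, fileName: string, language?: string): FormData {
    const form = new FormData();
    form.append("file", new Blob([new Uint8Array(audio)], { type: "audio/mpeg" }), fileName);
    form.append("model", "whisper-1");
    if (language) form.append("language", language);
    return form;
  }

  async transcribe(filePath: string, language?: string): Promise<string> {
    const key = this.getApiKey();
    if (!key) throw new Error("OpenAI API key is not configured");
    const audio = await readFile(filePath);
    const res = await fetch(`${this.getBaseUrl()}/audio/transcriptions`, {
      method: "POST",
      headers: { Authorization: `Bearer ${key}` },
      body: this.buildForm(audio, basename(filePath), language),
    });
    if (!res.ok) {
      throw new Error(`Transcription failed (${res.status}): ${await res.text()}`);
    }
    const data = (await res.json()) as { text: string };
    return data.text;
  }
}
```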
ChunkProcessor (src/services/speech/ChunkProcessor.ts)
Purpose: Handle chunk file detection and processing coordination
Key Methods:
startWatching(directory: string): void - Start watching for chunks
stopWatching(): void - Stop watching
processChunk(chunkPath: string): Promise<string> - Process a single chunk
Events:
chunkReady - Emitted when a chunk is ready for processing
chunkProcessed - Emitted when chunk processing is complete
error - Emitted on processing errors
StreamingManager (src/services/speech/StreamingManager.ts)
Purpose: Handle text deduplication and streaming state
Key Methods:
addChunkText(text: string): string - Add chunk text with deduplication
getSessionText(): string - Get the current session text
reset(): void - Reset session state
Features:
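The plan does not say how deduplication works. The sketch below uses one common heuristic, which is an assumption rather than the plan's rule. It drops the longest run of words at the start of a new chunk that repeats the end of the session text, comparing case-insensitively and ignoring basic punctuation, and it checks at most a hypothetical cap of 8 words. addChunkText returns only the newly added text, which is what a progressive update would emit.

```typescript
// Sketch of StreamingManager. The deduplication rule is an assumption: drop
// the longest run of leading words in a new chunk that repeats the tail of
// the session text, compared case-insensitively and ignoring basic
// punctuation, up to maxOverlap words.
export class StreamingManager {
  private words: string[] = [];
  private readonly maxOverlap: number;

  constructor(maxOverlap = 8) {
    this.maxOverlap = maxOverlap;
  }

  // Normalise a word for comparison only; stored words keep their form.
  private static norm(word: string): string {
    return word.toLowerCase().replace(/[.,!?;:"()]/g, "");
  }

  // Append a chunk's text and return only the part that was new.
  addChunkText(text: string): string {
    const incoming = text.trim().split(/\s+/).filter((w) => w.length > 0);
    const limit = Math.min(this.maxOverlap, this.words.length, incoming.length);
    let overlap = 0;
    // Try the longest candidate overlap first.
    for (let k = limit; k > 0; k--) {
      const tail = this.words.slice(-k).map(StreamingManager.norm);
      const head = incoming.slice(0, k).map(StreamingManager.norm);
      if (tail.every((w, i) => w === head[i])) {
        overlap = k;
        break;
      }
    }
    const fresh = incoming.slice(overlap);
    this.words.push(...fresh);
    return fresh.join(" ");
  }

  getSessionText(): string {
    return this.words.join(" ");
  }

  reset(): void {
    this.words = [];
  }
}
```

A single repeated word is a weak signal, because speech does contain genuine repeats such as "that that". Requiring an overlap of at least two words may be a safer default.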
Instead of polling the output directory, let FFmpeg report when each segment is complete:
ffmpeg -f avfoundation -i :default \
-c:a libopus -b:a 32k -application voip -ar 16000 -ac 1 \
-f segment -segment_time 3 -reset_timestamps 1 \
-segment_list /tmp/segments.txt -segment_list_flags +live \
/tmp/chunk_%03d.webm
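As the plan notes, -segment_list tracks completed segments, so a watcher only needs to diff the list against the chunks it has already handed off. The sketch below assumes FFmpeg's flat list format, with one entry per line (-segment_list_type flat forces it). newlyCompletedSegments is a hypothetical helper that a caller would run whenever the list file changes, for example from fs.watch.

```typescript
import { resolve } from "node:path";

// Hypothetical helper: given the current contents of the -segment_list file,
// return chunks not seen before, as absolute paths. Assumes the flat list
// format (one entry per line). Only newline-terminated lines are trusted, so
// an entry FFmpeg is still writing is picked up on the next read.
export function newlyCompletedSegments(
  listText: string,
  seen: Set<string>,
  dir = "/tmp",
): string[] {
  const lines = listText.split("\n");
  lines.pop(); // trailing "" or a partially written last line
  const fresh: string[] = [];
  for (const line of lines) {
    const entry = line.trim();
    if (entry === "") continue;
    const full = resolve(dir, entry); // absolute entries are kept as-is
    if (!seen.has(full)) {
      seen.add(full);
      fresh.push(full);
    }
  }
  return fresh;
}
```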
Key Changes:
Use -segment_list to track completed segments.
1. AudioRecorder starts FFmpeg with segment completion logging
2. ChunkProcessor watches FFmpeg stderr for completion events
3. On "Closing chunk_001.webm" → emit chunkReady event
4. AudioConverter converts WebM → MP3
5. TranscriptionClient transcribes MP3
6. StreamingManager deduplicates and emits progressive updates
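Steps 2 and 3 can be sketched as a small detector. It buffers FFmpeg's stderr, which may arrive split mid-line and uses \r for progress updates, and emits chunkReady for each completion line. The default pattern follows the plan's "Closing chunk_001.webm" example and has not been checked against FFmpeg's actual log output, so it is a constructor parameter. StderrChunkDetector is a hypothetical name.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stderr-based detector for steps 2-3. The default pattern
// follows the plan's "Closing chunk_001.webm" example; check it against
// real FFmpeg output (the log level affects what is printed).
export class StderrChunkDetector extends EventEmitter {
  private pending = "";
  private readonly pattern: RegExp;

  constructor(pattern: RegExp = /Closing '?([^'\s]+\.webm)'?/) {
    super();
    this.pattern = pattern;
  }

  // Feed raw stderr text; data can arrive split at arbitrary points.
  feed(data: string): void {
    this.pending += data;
    // FFmpeg ends progress lines with \r, so treat \r and \n as line breaks.
    const lines = this.pending.split(/\r\n|\r|\n/);
    this.pending = lines.pop() ?? ""; // keep any incomplete last line
    for (const line of lines) {
      const match = this.pattern.exec(line);
      if (match) this.emit("chunkReady", match[1]);
    }
  }
}
```

In use, the child process's stderr would be decoded as text with stderr.setEncoding("utf8"), and each data event would be passed to feed.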
Turn SpeechService from a monolithic class into an orchestrator:
New Structure:
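import { EventEmitter } from 'events'
// AudioConverter, TranscriptionClient, ChunkProcessor and StreamingManager come from the new modules in src/services/speech/
export class SpeechService extends EventEmitter {
  private audioConverter: AudioConverter
  private transcriptionClient: TranscriptionClient
  private chunkProcessor: ChunkProcessor
  private streamingManager: StreamingManager
  // Orchestrate the modules instead of doing everything in one class
}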
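The sketch below shows SpeechService as an orchestrator. It assumes the modules are injected through the constructor, which also lets tests replace them with fakes, and types them by minimal structural interfaces. Each chunk runs convert → transcribe → deduplicate in arrival order through a promise chain, so StreamingManager always receives text in sequence. The onChunkReady method and the transcript event name and payload are assumptions.

```typescript
import { EventEmitter } from "node:events";

// Minimal structural views of the modules; only what the orchestrator calls.
interface Converter {
  convertToMp3(webmPath: string): Promise<string>;
  cleanup(mp3Path: string): Promise<void>;
}
interface Transcriber {
  transcribe(filePath: string, language?: string): Promise<string>;
}
interface Streamer {
  addChunkText(text: string): string;
  getSessionText(): string;
}

// Sketch: SpeechService only sequences the modules. The "transcript" event
// name and its payload are assumptions.
export class SpeechService extends EventEmitter {
  private queue: Promise<void> = Promise.resolve();
  private readonly audioConverter: Converter;
  private readonly transcriptionClient: Transcriber;
  private readonly streamingManager: Streamer;
  private readonly language?: string;

  constructor(converter: Converter, client: Transcriber, streamer: Streamer, language?: string) {
    super();
    this.audioConverter = converter;
    this.transcriptionClient = client;
    this.streamingManager = streamer;
    this.language = language;
  }

  // Call from ChunkProcessor's chunkReady handler. Chunks are chained so they
  // finish in arrival order; a failed chunk does not block later ones.
  onChunkReady(webmPath: string): Promise<void> {
    const run = this.queue.then(() => this.processChunk(webmPath));
    this.queue = run.catch(() => undefined);
    return run;
  }

  private async processChunk(webmPath: string): Promise<void> {
    let mp3Path: string | undefined;
    try {
      mp3Path = await this.audioConverter.convertToMp3(webmPath);
      const text = await this.transcriptionClient.transcribe(mp3Path, this.language);
      const delta = this.streamingManager.addChunkText(text);
      this.emit("transcript", { delta, text: this.streamingManager.getSessionText() });
    } catch (err) {
      // EventEmitter throws if "error" has no listener; the queue absorbs that.
      this.emit("error", err);
    } finally {
      if (mp3Path) await this.audioConverter.cleanup(mp3Path).catch(() => undefined);
    }
  }
}
```

The chaining matters because transcription latency varies. Without it, a short chunk could finish before one recorded earlier, and the deduplicated transcript would come out of order.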
Benefits:
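Because chunks are handed off only after FFmpeg reports them complete, this approach avoids the race of reading a segment that is still being written. Splitting SpeechService into single-purpose modules also makes the system more reliable and easier to maintain.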