Background Transcription
Background transcription processes audio chunks in real time while recording continues, providing immediate feedback and eliminating the wait at the end of long recordings.
What is Background Transcription?
When VAD chunking is enabled, Voicepad transcribes each chunk in the background as soon as it's extracted:
graph TB
subgraph "Recording Thread"
A[Record Audio] --> B[Audio Queue]
end
subgraph "Chunk Worker Thread"
B --> C[VAD Analysis]
C --> D{Chunk Ready?}
D -->|No| B
D -->|Yes| E[Extract Chunk]
E --> F[Transcribe Chunk]
F --> G[Append to Markdown]
G --> H[Accumulate Audio]
end
A --> I[Continue Recording...]
H --> J[Next Chunk...]
Key principle: Recording and transcription happen simultaneously, not sequentially.
How It Works
1. Chunk Detection
When VAD detects a natural speech boundary:
- Chunk audio is extracted from the buffer
- Chunk is added to the transcription queue
- Recording continues uninterrupted
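As a sketch, the handoff can look like this (the queue and callback names are hypothetical; the real handoff lives inside Voicepad's chunk worker):

```python
import queue

# Hypothetical name: chunks awaiting background transcription
transcription_queue = queue.Queue()

def on_chunk_boundary(buffer, start, end):
    """Illustrative callback fired when VAD detects a speech boundary."""
    chunk_audio = buffer[start:end]       # extract the chunk from the buffer
    transcription_queue.put(chunk_audio)  # hand off to the background worker
    # nothing here blocks: recording continues uninterrupted
```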
2. Background Transcription Thread
A dedicated worker thread:
- ✅ Monitors queue for new chunks
- ✅ Transcribes each chunk using Whisper model
- ✅ Appends result to markdown file (thread-safe)
- ✅ Accumulates audio data in memory
- ✅ Continues until recording stops
3. Model Caching
The first chunk loads the Whisper model into memory. Subsequent chunks reuse the cached model:
# First chunk: Load model (~2-5 seconds)
Chunk 1: [Load Model] → [Transcribe 60s audio in 3s]
# Subsequent chunks: Reuse cached model (~0 seconds overhead)
Chunk 2: [Transcribe 60s audio in 3s]
Chunk 3: [Transcribe 60s audio in 3s]
Result: Only the first chunk has loading overhead!
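A minimal sketch of the caching pattern (the module-level cache and `get_model` helper are illustrative, not Voicepad's actual names):

```python
_model_cache = {}  # illustrative module-level cache

def get_model(name, loader):
    """Load the Whisper model once, then reuse it for every later chunk."""
    if name not in _model_cache:
        _model_cache[name] = loader(name)  # slow path: first chunk only (~2-5 s)
    return _model_cache[name]
```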
4. Thread-Safe Writing
More than one thread may write to the markdown file at once (for example, chunk results and status updates). A lock ensures file writes are sequential and never corrupted:
markdown_lock = threading.Lock()  # shared by everything that writes the file

with markdown_lock:
    append_to_file(chunk_transcription)
Real-Time Markdown Updates
Live Progress
Open the markdown file while recording to watch transcription appear:
# Transcription: meeting_20260218_033000.wav
**Status:** Recording in progress...
---
## Chunk 1 (0:00 - 1:15)
Welcome everyone to today's meeting. We'll be discussing...
## Chunk 2 (1:15 - 2:45)
[This appears while you're still recording chunk 3]
First on the agenda is the project timeline...
Live Viewing
Use a markdown viewer with auto-refresh to see transcription update live. VS Code's markdown preview auto-updates when the file changes.
Completion Status
When recording stops, the status updates:
**Status:** Recording complete ✓
---
**Recording Complete**
- Total chunks: 5
- Total duration: 8:25
Audio File Output
Memory Accumulation
Chunk audio data is kept in memory (not saved individually):
accumulated_chunks = [
chunk_1_audio, # 60 seconds
chunk_2_audio, # 58 seconds
chunk_3_audio, # 62 seconds
]
Single Merged File
When recording stops, all chunks merge into one WAV file:
merged_audio = concatenate(accumulated_chunks)
save_to_file("recording_20260218_033000.wav")
Result: One continuous audio file, no chunk files on disk.
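Using only the standard-library `wave` module, the merge-and-save step might look like this (the function name, sample rate, and raw-PCM chunk format are assumptions, not Voicepad's actual implementation):

```python
import wave

def save_merged_wav(path, chunks, sample_rate=16000, sample_width=2, channels=1):
    """Write accumulated chunks (raw 16-bit PCM bytes) as one continuous WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        for chunk in chunks:        # chunks are written back to back:
            wav.writeframes(chunk)  # one file on disk, no per-chunk files
```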
Performance
Transcription Speed
Speed depends on device and model:
GPU (CUDA):
- Tiny model: 20-40x real-time (60s → 1.5-3s)
- Medium model: 10-20x real-time (60s → 3-6s)
- Large model: 5-10x real-time (60s → 6-12s)
CPU:
- Tiny model: 3-8x real-time (60s → 7.5-20s)
- Medium model: 1-3x real-time (60s → 20-60s)
- Large model: 0.5-1x real-time (60s → 60-120s)
CPU with Large Models
On CPU, large models may transcribe slower than real-time. This means chunks can pile up faster than they're processed, causing delays.
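A quick way to reason about this: divide chunk duration by the speed factor, and check whether the factor drops below 1.0x. The helper names below are illustrative:

```python
def transcription_lag(chunk_seconds, speed_factor):
    """Wall-clock seconds to transcribe one chunk at `speed_factor` x real-time."""
    return chunk_seconds / speed_factor

def falls_behind(speed_factor):
    """Below 1.0x real-time, chunks arrive faster than they can be processed."""
    return speed_factor < 1.0
```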
Optimal Settings
For smooth background transcription:
# GPU users (fast transcription)
vad_min_chunk_duration: 60.0
transcription_model: medium
transcription_device: cuda
# CPU users (slower transcription)
vad_min_chunk_duration: 120.0 # Larger chunks, less frequent
transcription_model: tiny # Faster model
transcription_device: cpu
Error Handling
Transcription Failures
If a chunk fails to transcribe:
- ✅ Error logged with details
- ✅ Worker continues processing next chunk
- ✅ Recording is not interrupted
- ✅ Failed chunk's audio is still included in the merged audio file
The markdown will show:
## Chunk 3 (2:30 - 3:45)
[Transcription failed for this chunk]
Worker Resilience
The chunk worker is designed to never crash:
for chunk in pending_chunks:  # worker loop (names illustrative)
    try:
        transcribe_chunk(chunk)
    except Exception as e:
        log_error(e)
        continue  # process the next chunk
Even if transcription fails completely, you still get the audio file!
Monitoring Progress
Log Output
Watch the terminal for real-time progress:
[INFO] Chunk 1 ready (62.3s), transcribing...
[INFO] Chunk 1 transcribed successfully
[INFO] Chunk 2 ready (59.8s), transcribing...
[INFO] Chunk 2 transcribed successfully
Chunk Worker Status
On stop, see summary:
[INFO] Chunk worker stopped. Processed 5 chunks, accumulated 5 total chunks
[INFO] Saving 5 accumulated chunks...
[INFO] Recording saved to: data/recordings/meeting_20260218_033000.wav
[INFO] Total duration: 8:25.6 seconds
Comparison: With vs Without Background Transcription
Traditional (No VAD)
Timeline: [Record 10 min] → [Stop] → [Transcribe 2 min] → [Done]
Total time: 12 minutes
Wait time: 2 minutes
Output timing:
- Audio: Available immediately
- Transcription: Available after 2 minutes
Background Transcription (VAD Enabled)
Timeline: [Record 10 min with live transcription] → [Stop] → [Done]
Total time: 10 minutes
Wait time: ~0 seconds (finalization only)
Output timing:
- Audio: Available after finalization (~2 seconds)
- Transcription: 80% available during recording, 100% within 5 seconds of stop
Time saved: 2 minutes of waiting!
Use Cases
✅ Ideal For
- Long meetings - See transcription while meeting continues
- Interviews - Review what was said while still recording
- Lectures - Watch notes appear in real-time
- Monitoring - Know if transcription quality is good before recording ends
⚠️ Considerations
- Resource usage - Transcription uses CPU/GPU while recording
- Battery impact - Laptop users may see faster battery drain
- Disk I/O - Markdown file updates frequently
Technical Details
Threading Model
# Main thread
recorder.start_recording() # Starts:
→ recording_thread # Captures audio
→ chunk_worker_thread # Processes chunks
# On stop
recorder.stop_recording() # Waits for:
→ recording_thread.join() # Audio capture stops
→ chunk_worker.finalize() # Final chunk processed
→ save_accumulated_audio() # Merge and save
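The same lifecycle as a runnable sketch (class and attribute names are illustrative; the real recorder carries more state):

```python
import threading

class RecorderSketch:
    """Minimal start/stop threading skeleton."""

    def __init__(self, record_loop, chunk_loop):
        self.stop_event = threading.Event()
        self.recording_thread = threading.Thread(target=record_loop, args=(self.stop_event,))
        self.chunk_worker_thread = threading.Thread(target=chunk_loop, args=(self.stop_event,))

    def start_recording(self):
        self.recording_thread.start()      # captures audio
        self.chunk_worker_thread.start()   # processes chunks in the background

    def stop_recording(self):
        self.stop_event.set()
        self.recording_thread.join()       # audio capture stops first
        self.chunk_worker_thread.join()    # worker drains, finalizes, saves
```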
Queue Management
Audio frames flow through a queue:
audio_queue = Queue()
# Recording thread adds frames
audio_queue.put(audio_frame)
# Chunk worker consumes frames
frame = audio_queue.get(timeout=0.5)
chunker.add_audio(frame)
When recording stops:
- Chunk worker drains remaining frames
- Finalizes last chunk
- Transcribes final chunk
- Updates markdown completion
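The drain-and-finalize step might be sketched like this (the `chunker` interface shown is an assumption, not Voicepad's actual API):

```python
import queue

def drain_and_finalize(audio_queue, chunker):
    """On stop: consume remaining frames, then finalize the last chunk."""
    while True:
        try:
            frame = audio_queue.get_nowait()  # drain whatever the recorder left behind
        except queue.Empty:
            break
        chunker.add_audio(frame)
    chunker.finalize()  # extract, transcribe, and append the final chunk
```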
See Also
- VAD Chunking - How chunks are created
- Transcription Models - Choosing the right model
- Long Recordings Guide - Best practices
- GPU Acceleration - Faster transcription