Voicepad
Voice recording with GPU-accelerated transcription and smart audio chunking.
What is Voicepad?
Voicepad is a command-line tool for recording audio and transcribing it with OpenAI's Whisper models. Its VAD (Voice Activity Detection) chunking automatically splits long recordings at natural speech boundaries, so finished chunks can be transcribed in the background while recording continues.
Key Features:
- 🎙️ High-Quality Recording - 16kHz mono WAV files from any audio input device
- ✂️ Smart Chunking - VAD-based splitting at natural speech boundaries
- ⚡ Background Transcription - Real-time processing while you record
- 🚀 GPU Acceleration - 4-5x faster transcription with CUDA support
- 📝 Markdown Output - Clean, formatted transcriptions with metadata
- ⚙️ Flexible Configuration - YAML-based settings for all options
Quick Start
# Install
pip install voicepad
# First recording
voicepad record start
# With smart chunking (for long recordings)
voicepad record start --vad --min-chunk-duration 60
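If you want to start recordings from a script, the commands above can be assembled programmatically. The following is a minimal sketch, not part of Voicepad itself: `build_record_command` is a hypothetical helper, and it only uses the `--vad` and `--min-chunk-duration` flags shown above.

```python
import shlex
from typing import List, Optional


def build_record_command(
    vad: bool = False,
    min_chunk_duration: Optional[int] = None,
) -> List[str]:
    """Build the argument list for `voicepad record start`.

    Hypothetical helper for illustration; only flags shown in the
    Quick Start are used.
    """
    cmd = ["voicepad", "record", "start"]
    if vad:
        cmd.append("--vad")
    if min_chunk_duration is not None:
        cmd += ["--min-chunk-duration", str(min_chunk_duration)]
    return cmd


if __name__ == "__main__":
    cmd = build_record_command(vad=True, min_chunk_duration=60)
    # Print the shell-quoted command; to actually run it, pass `cmd`
    # to subprocess.run(cmd, check=True) with voicepad installed.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when it is passed to `subprocess.run`.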
New user? Start with the Getting Started Guide.
Documentation
For Users
- Getting Started - Installation and first recording
- Configuration - Customize all settings
- Features - Smart chunking, background transcription, models
- CLI Reference - Complete command documentation
- Guides - How-to guides for common tasks
For Developers
- voicepad-core - Python library API
- Architecture - Technical design
- GPU Acceleration - CUDA setup
Configuration Reference
- Paths - Output directories
- Audio - Input device settings
- Transcription - Model and device options
- VAD Settings - Chunking parameters
Feature Highlights
VAD Chunking
Split long recordings at natural speech boundaries for background processing:
vad_enabled: true
vad_min_chunk_duration: 60.0
vad_threshold: 0.5
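To illustrate how these two settings could interact, here is a simplified sketch of threshold-based chunking. It is not Voicepad's actual implementation: the function name, the per-frame speech probabilities, and the `frame_seconds` parameter are assumptions made for the example. The idea is that a chunk is closed at the first frame whose speech probability falls below the threshold once the minimum chunk duration has elapsed.

```python
from typing import List, Tuple


def split_at_speech_boundaries(
    speech_probs: List[float],
    frame_seconds: float,
    min_chunk_duration: float = 60.0,
    threshold: float = 0.5,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for each chunk.

    speech_probs holds one speech probability per audio frame
    (hypothetical input; a real VAD model would produce these).
    """
    chunks: List[Tuple[float, float]] = []
    chunk_start = 0  # index of the first frame in the current chunk

    for i, prob in enumerate(speech_probs):
        elapsed = (i - chunk_start) * frame_seconds
        is_silence = prob < threshold
        # Only cut at a non-speech frame, and only once the chunk is long enough.
        if is_silence and elapsed >= min_chunk_duration:
            chunks.append((chunk_start * frame_seconds, i * frame_seconds))
            chunk_start = i

    # Whatever remains after the last cut becomes the final chunk.
    total = len(speech_probs)
    if chunk_start < total:
        chunks.append((chunk_start * frame_seconds, total * frame_seconds))
    return chunks
```

In this sketch, a larger `min_chunk_duration` yields fewer, longer chunks, and a higher `threshold` causes more frames to count as silence and therefore as possible cut points.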
Learn more: VAD Chunking Guide
Background Transcription
Watch the transcription update in real time while you record:
[Recording...] → [Chunk 1 transcribed] → [Chunk 2 transcribed] → [Recording...]
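The flow above is a producer-consumer pattern: the recorder hands finished chunks to a worker that transcribes them while recording continues. The sketch below shows that pattern with the standard library only. It is an illustration under assumptions, not Voicepad's code: `run_pipeline` and `fake_transcribe` are hypothetical names, and the placeholder function stands in for a real Whisper call.

```python
import queue
import threading
from typing import Callable, Iterable, List, Optional


def fake_transcribe(chunk: str) -> str:
    # Placeholder for a real transcription call (assumption for the sketch).
    return f"[transcript of {chunk}]"


def run_pipeline(
    chunks: Iterable[str],
    transcribe: Callable[[str], str] = fake_transcribe,
) -> List[str]:
    """Transcribe chunks on a background thread as they are produced."""
    pending: "queue.Queue[Optional[str]]" = queue.Queue()
    results: List[str] = []

    def worker() -> None:
        while True:
            chunk = pending.get()
            if chunk is None:  # sentinel: recording has finished
                break
            results.append(transcribe(chunk))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()

    # Stand-in for the recording loop: each finished chunk is queued
    # immediately, so transcription overlaps with further recording.
    for chunk in chunks:
        pending.put(chunk)

    pending.put(None)  # tell the worker to stop
    thread.join()
    return results
```

A single worker keeps the transcripts in recording order, which matters when the results are appended to one output document.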
Learn more: Background Transcription
Multiple Whisper Models
Choose the right balance of speed vs. accuracy:
- tiny - Fastest, good for quick notes
- medium - Balanced, recommended default
- large-v3 - Best accuracy, slower
Learn more: Transcription Models
Resources
- GitHub Repository - Source code and issues
- Zensical Documentation - Documentation framework
- Writing Documentation - Contribution guide