Voicepad

Voice recording with GPU-accelerated transcription and smart audio chunking.

What is Voicepad?

Voicepad is a command-line tool for recording audio and transcribing it with OpenAI's Whisper models. Its VAD (Voice Activity Detection) chunking automatically splits long recordings at natural speech boundaries, so finished chunks can be transcribed in the background while recording continues.

Key Features:

🎙️ High-Quality Recording - 16kHz mono WAV files from any audio input device
✂️ Smart Chunking - VAD-based splitting at natural speech boundaries
⚡ Background Transcription - Real-time processing while you record
🚀 GPU Acceleration - 4-5x faster transcription with CUDA support
📝 Markdown Output - Clean, formatted transcriptions with metadata
⚙️ Flexible Configuration - YAML-based settings for all options

Quick Start

# Install
pip install voicepad

# First recording
voicepad record start

# With smart chunking (for long recordings)
voicepad record start --vad --min-chunk-duration 60

New user? Start with the Getting Started Guide.

Documentation

For Users

Getting Started - Installation and first recording
Configuration - Customize all settings
Features - Smart chunking, background transcription, models
CLI Reference - Complete command documentation
Guides - How-to guides for common tasks

For Developers

voicepad-core - Python library API
Architecture - Technical design
GPU Acceleration - CUDA setup

Configuration Reference

Paths - Output directories
Audio - Input device settings
Transcription - Model and device options
VAD Settings - Chunking parameters

Feature Highlights

VAD Chunking

Split long recordings at natural speech boundaries for background processing:

vad_enabled: true
vad_min_chunk_duration: 60.0
vad_threshold: 0.5
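
To show how these three settings might interact, here is a minimal Python sketch. It is not Voicepad's actual implementation: the function name `chunk_boundaries`, the `frame_seconds` parameter, and the per-frame speech probabilities are all assumptions made for illustration. It assumes a VAD model has already produced one speech probability per audio frame, and it cuts a chunk at the first frame below the threshold once the minimum chunk duration has elapsed.

```python
from typing import List, Tuple


def chunk_boundaries(
    speech_probs: List[float],
    frame_seconds: float,
    threshold: float = 0.5,            # mirrors vad_threshold
    min_chunk_duration: float = 60.0,  # mirrors vad_min_chunk_duration
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for each chunk.

    Hypothetical helper: speech_probs holds one speech probability per
    frame, as a VAD model might produce. This is an illustrative sketch,
    not Voicepad's API.
    """
    chunks: List[Tuple[float, float]] = []
    chunk_start = 0  # index of the first frame in the current chunk

    for i, prob in enumerate(speech_probs):
        elapsed = (i - chunk_start) * frame_seconds
        is_silence = prob < threshold
        # Only cut during silence, and only once the chunk is long enough.
        if is_silence and elapsed >= min_chunk_duration:
            chunks.append((chunk_start * frame_seconds, i * frame_seconds))
            chunk_start = i

    # Whatever remains when the recording stops becomes the final chunk.
    if chunk_start < len(speech_probs):
        chunks.append(
            (chunk_start * frame_seconds, len(speech_probs) * frame_seconds)
        )
    return chunks


if __name__ == "__main__":
    # Toy input: 1-second frames and a 3-second minimum chunk duration.
    probs = [0.9, 0.1, 0.9, 0.9, 0.2, 0.9, 0.9, 0.9, 0.9, 0.1]
    print(chunk_boundaries(probs, frame_seconds=1.0, min_chunk_duration=3.0))
    # prints [(0.0, 4.0), (4.0, 9.0), (9.0, 10.0)]
```

In this sketch the pause at second 1 is ignored because the chunk is still shorter than the minimum duration. A larger minimum duration therefore gives fewer, longer chunks.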

Learn more: VAD Chunking Guide

Background Transcription

See the transcription update in real time while you record:

[Recording...] → [Chunk 1 transcribed] → [Chunk 2 transcribed] → [Recording...]
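
The flow above resembles a producer/consumer pattern: the recorder hands off finished chunks while a worker transcribes them. The following Python sketch illustrates that pattern with the standard library only. It is an assumption about the general approach, not Voicepad's code: `fake_transcribe`, `transcription_worker`, and `record_and_transcribe` are hypothetical names, and the transcription step is a stand-in for a real Whisper call.

```python
import queue
import threading
from typing import List


def fake_transcribe(chunk: str) -> str:
    """Hypothetical stand-in for running a Whisper model on one chunk."""
    return f"[transcribed {chunk}]"


def transcription_worker(chunks: queue.Queue, results: List[str]) -> None:
    """Consume chunks until the None sentinel arrives."""
    while True:
        chunk = chunks.get()
        if chunk is None:  # sentinel: recording has finished
            break
        results.append(fake_transcribe(chunk))


def record_and_transcribe(recorded_chunks: List[str]) -> List[str]:
    """Feed chunks to a background worker while 'recording' continues."""
    chunks: queue.Queue = queue.Queue()
    results: List[str] = []
    worker = threading.Thread(
        target=transcription_worker, args=(chunks, results)
    )
    worker.start()

    # Stands in for the live recording loop, which would emit a chunk
    # each time the VAD finds a suitable speech boundary.
    for chunk in recorded_chunks:
        chunks.put(chunk)

    chunks.put(None)  # tell the worker there are no more chunks
    worker.join()
    return results


if __name__ == "__main__":
    print(record_and_transcribe(["chunk 1", "chunk 2"]))
    # prints ['[transcribed chunk 1]', '[transcribed chunk 2]']
```

A single worker reading from a FIFO queue keeps the transcribed chunks in recording order, which matters when they are assembled into one output document.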

Learn more: Background Transcription

Multiple Whisper Models

Choose the right balance of speed vs. accuracy:

tiny - Fastest, good for quick notes
medium - Balanced, recommended default
large-v3 - Best accuracy, slower

Learn more: Transcription Models

Resources

GitHub Repository - Source code and issues
Zensical Documentation - Documentation framework
Writing Documentation - Contribution guide