Voicepad

Voice recording with GPU-accelerated transcription and smart audio chunking.

What is Voicepad?

Voicepad is a command-line tool for recording audio and transcribing it with OpenAI's Whisper models. Its VAD (Voice Activity Detection) chunking automatically splits long recordings at natural speech boundaries, so each chunk can be transcribed in the background while recording continues.

Key Features:

  • 🎙️ High-Quality Recording - 16kHz mono WAV files from any audio input device
  • ✂️ Smart Chunking - VAD-based splitting at natural speech boundaries
  • Background Transcription - Real-time processing while you record
  • 🚀 GPU Acceleration - 4-5x faster transcription with CUDA support
  • 📝 Markdown Output - Clean, formatted transcriptions with metadata
  • ⚙️ Flexible Configuration - YAML-based settings for all options

Quick Start

# Install
pip install voicepad

# First recording
voicepad record start

# With smart chunking (for long recordings)
voicepad record start --vad --min-chunk-duration 60

New user? Start with the Getting Started Guide.

Documentation

For Users

For Developers

Configuration Reference

Feature Highlights

VAD Chunking

Split long recordings at natural speech boundaries for background processing:

vad_enabled: true
vad_min_chunk_duration: 60.0
vad_threshold: 0.5
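
To illustrate how these three settings could interact, here is a minimal Python sketch. It is not Voicepad's actual implementation: the function `split_points`, the per-frame `speech_probs` input, and the reading of `vad_threshold` as a speech-probability cutoff are all assumptions made for illustration. The idea is to cut only when the current chunk is already at least `min_chunk_duration` seconds long and the current frame looks like silence.

```python
from typing import List


def split_points(
    speech_probs: List[float],
    frame_seconds: float,
    min_chunk_duration: float = 60.0,
    threshold: float = 0.5,
) -> List[float]:
    """Return cut times in seconds (hypothetical helper, not Voicepad's API).

    A cut is placed at the first frame whose speech probability falls
    below `threshold` once the current chunk has reached
    `min_chunk_duration` seconds.
    """
    cuts: List[float] = []
    chunk_start = 0.0
    for index, prob in enumerate(speech_probs):
        now = index * frame_seconds
        long_enough = (now - chunk_start) >= min_chunk_duration
        is_silence = prob < threshold  # assumed meaning of vad_threshold
        if long_enough and is_silence:
            cuts.append(now)
            chunk_start = now  # the next chunk starts at the cut
    return cuts
```

Under these assumptions, a larger minimum duration gives fewer, longer chunks, while a higher threshold treats more frames as silence and so offers more candidate cut points.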

Learn more: VAD Chunking Guide

Background Transcription

See the transcription update in real time while recording:

[Recording...] → [Chunk 1 transcribed] → [Chunk 2 transcribed] → [Recording...]
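
The flow above resembles a producer/consumer pipeline. The sketch below shows one way such a pipeline could be built with Python's standard `queue` and `threading` modules. It is an illustration only, not Voicepad's code: `transcribe_in_background` and the `transcribe` callable are hypothetical stand-ins for the recorder and the Whisper call.

```python
import queue
import threading
from typing import Callable, Iterable, List


def transcribe_in_background(
    chunks: Iterable[bytes],
    transcribe: Callable[[bytes], str],
) -> List[str]:
    """Feed chunks to a worker thread and collect transcripts in order.

    Hypothetical helper: `chunks` stands in for audio chunks emitted by
    the recorder, and `transcribe` for the speech-to-text call.
    """
    pending: "queue.Queue" = queue.Queue()
    results: List[str] = []

    def worker() -> None:
        while True:
            chunk = pending.get()
            if chunk is None:  # sentinel: recording has finished
                break
            results.append(transcribe(chunk))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()

    # The producer keeps "recording" while the worker transcribes
    # whatever is already queued.
    for chunk in chunks:
        pending.put(chunk)

    pending.put(None)  # tell the worker to stop
    thread.join()
    return results
```

A single worker keeps the transcripts in recording order. Using several workers would need explicit ordering, for example by tagging each chunk with its index.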

Learn more: Background Transcription

Multiple Whisper Models

Choose the right balance of speed vs. accuracy:

  • tiny - Fastest, good for quick notes
  • medium - Balanced, recommended default
  • large-v3 - Best accuracy, slower

Learn more: Transcription Models

Resources