Overview
The whisper command-line tool provides a simple interface for transcribing audio files and for translating their speech into English. It accepts any audio format that ffmpeg can decode and offers many options for customizing decoding and output.
Basic Usage
Command Syntax
Multiple Files
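You can pass several audio files in one command; Whisper transcribes them one after another and writes a separate set of output files for each.
Model Selection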
Choose a model size to trade speed against accuracy. The available models are tiny, base, small, medium, large, and turbo, plus the English-only variants tiny.en, base.en, small.en, and medium.en.
The default model is turbo, which offers fast transcription with good accuracy for English and multilingual content.
Language Options
Automatic Language Detection
By default, Whisper detects the spoken language automatically.
Specify Language
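If you know the language, specifying it explicitly skips detection and avoids misidentification. You can pass either the language name (e.g., Japanese, Spanish) or the language code (e.g., ja, es) to --language.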
Translation to English
Use --task translate to translate non-English speech directly into English text.
Output Options
Output Directory
Use --output_dir (or -o) to choose where the transcription files are saved; by default, they are written to the current directory.
Output Format
Choose a specific output format with --output_format (or -f):
- txt: plain text
- vtt: WebVTT subtitles
- srt: SubRip subtitles
- tsv: tab-separated values with timestamps
- json: JSON with detailed segment information
- all: generate all of the formats above (default)
Advanced Options
Word-Level Timestamps
Pass --word_timestamps True to extract timestamps for individual words rather than only for whole segments.
Device Selection
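Use --device to choose between CPU and GPU (CUDA) processing; by default, Whisper uses a CUDA GPU when one is available and otherwise runs on the CPU.
Initial Prompt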
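Use --initial_prompt to provide context or custom vocabulary (names, jargon, preferred spellings) for the first window of audio, which can improve accuracy.
Temperature and Sampling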
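- Greedy Decoding: use temperature 0 (the default) for deterministic output.
- Sampling: use a temperature above 0; --best_of sets how many candidates are sampled (default: 5).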
Compression and Quality Thresholds
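- --compression_ratio_threshold: treat a decoding as failed and retry it when the text is overly repetitive, that is, when its compression ratio exceeds this value (default: 2.4)
- --logprob_threshold: treat a decoding as failed and retry it when its average log probability is below this value (default: -1.0)
- --no_speech_threshold: treat a segment as silent when its no-speech probability exceeds this value and its average log probability is also below --logprob_threshold (default: 0.6)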