# Transcription Overview
## How transcription works
Voice Recorder Forge uses OpenAI Whisper to convert audio recordings to text. Whisper is a state-of-the-art speech recognition model trained on a large and diverse dataset, giving it strong performance across accents, background noise conditions, and languages.
Transcription happens automatically:
- Dictation: runs immediately after you press the hotkey to stop recording
- Meeting recording: runs after you click Stop Meeting
You do not need to manually trigger transcription.
## The transcription pipeline

1. Audio is captured and buffered in memory during recording.
2. When recording stops, the audio is written to a temporary file.
3. The file is sent to the OpenAI Whisper API (`whisper-1` model).
4. The API returns a transcription in `verbose_json` format, which includes:
   - Full text transcript
   - Detected language
   - Total duration
   - Segments with start/end timestamps, token IDs, and confidence scores
5. For dictation, the transcript text is inserted at the cursor (or clipboard on macOS).
6. For meetings, the transcript is saved to the local database.
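The first few steps can be sketched in Python. This is an illustrative sketch, not Voice Recorder Forge's actual code: the `prepare_transcription` helper, the `.wav` suffix, and the parameter names for the temp-file step are assumptions, though the request parameters themselves (`whisper-1`, `verbose_json`) match the pipeline described above.

```python
import os
import tempfile

def prepare_transcription(audio_bytes: bytes) -> dict:
    """Flush buffered audio to a temp file and build the Whisper request."""
    # Steps 1-2: the in-memory buffer is written to a temporary file.
    fd, path = tempfile.mkstemp(suffix=".wav")
    with os.fdopen(fd, "wb") as f:
        f.write(audio_bytes)
    # Step 3: parameters for the OpenAI transcription endpoint.
    return {
        "model": "whisper-1",
        "file": path,
        "response_format": "verbose_json",
    }

# With the official openai SDK, the upload itself would look like:
#   from openai import OpenAI
#   client = OpenAI()
#   params = prepare_transcription(buffered_audio)
#   with open(params["file"], "rb") as f:
#       result = client.audio.transcriptions.create(
#           model=params["model"],
#           file=f,
#           response_format=params["response_format"],
#       )
```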
## Transcription options
| Option | Description | Default |
|--------|-------------|---------|
| Language | Hint to Whisper about the spoken language | en (English) |
| Temperature | Controls output randomness (0 = deterministic) | 0 |
| Response format | Format of the API response | verbose_json |
| Prompt | Optional context string to guide the model | None |
The language setting can be changed in Settings → Transcription Language. See Supported Languages for all available options.
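As a sketch of how the defaults in the table map onto request parameters, the helper below assembles an options dictionary. The function name and structure are hypothetical; the parameter names and defaults come from the table above.

```python
def transcription_options(language="en", temperature=0.0,
                          response_format="verbose_json", prompt=None):
    """Build Whisper request options using the documented defaults."""
    opts = {
        "language": language,
        "temperature": temperature,
        "response_format": response_format,
    }
    # The prompt defaults to None and is simply omitted from the request.
    if prompt is not None:
        opts["prompt"] = prompt
    return opts
```

Passing a `prompt` string (for example, names or jargon likely to appear in the audio) can nudge the model toward the correct spelling of uncommon terms.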
## Automatic language detection
If you leave the language setting blank or set it to Auto, Whisper will attempt to detect the language from the audio itself. Detection is generally accurate for common languages but less reliable for short recordings or mixed-language audio.
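At the API level, auto-detection corresponds to omitting the `language` parameter; the detected language then comes back in the `verbose_json` response. The sketch below reads that field from a parsed response. The `sample` dictionary is a trimmed, illustrative response shape, not captured API output.

```python
def detected_language(response: dict) -> str:
    # verbose_json responses carry a top-level "language" field with
    # Whisper's detection result; fall back if it is absent.
    return response.get("language", "unknown")

# A trimmed-down verbose_json response (illustrative values):
sample = {"language": "english", "duration": 3.8, "text": "Hello there."}
```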
## Confidence and timestamps
The `verbose_json` response includes segment-level data: each segment carries a confidence score and precise start/end timestamps. This metadata is stored in the database for meeting transcripts and can be used for future features such as speaker-aware playback or timeline navigation.
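A minimal sketch of reading that segment metadata from a parsed response follows. The field names (`start`, `end`, `text`, `avg_logprob`) follow the `verbose_json` segment schema, where `avg_logprob` serves as the per-segment confidence score; the `flatten_segments` helper and the sample values are illustrative.

```python
def flatten_segments(response: dict):
    """Extract (start, end, text, confidence) tuples from a verbose_json response."""
    return [
        (seg["start"], seg["end"], seg["text"], seg["avg_logprob"])
        for seg in response.get("segments", [])
    ]

# Trimmed, illustrative response:
sample = {
    "text": "Hello world.",
    "segments": [
        {"start": 0.0, "end": 1.4, "text": "Hello world.", "avg_logprob": -0.21},
    ],
}
```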
## API usage and costs
Transcription uses the OpenAI API, which is billed per minute of audio. Whisper pricing is available on the OpenAI pricing page. You are responsible for any API costs incurred.
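For a rough sense of spend, per-minute billing makes cost estimation a one-line calculation. The rate below is an assumption for illustration only; always check the OpenAI pricing page for current figures.

```python
def estimate_cost_usd(duration_seconds: float,
                      price_per_minute: float = 0.006) -> float:
    # 0.006 USD/min is a commonly cited Whisper rate; treat it as a
    # placeholder and confirm against the OpenAI pricing page.
    return round(duration_seconds / 60 * price_per_minute, 4)
```

For example, a 10-minute recording at that placeholder rate would cost about $0.06.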