Turn interviews, lectures and calls into text: an hour of audio from $0.36, with timestamps. Whisper large-v3 — the most accurate open speech model.
Transcription on NeuralBox runs on Whisper large-v3 — OpenAI's most accurate open speech-recognition model. It handles fast speech, technical terms, accents and recordings with background noise. The language is detected automatically — dozens of languages are supported.
You get the full text plus timestamped segments: perfect for subtitles, jumping to the right spot in a recording, or quoting with second-level precision. If a file turns out to contain no speech, your tokens are refunded automatically.
No subscriptions and no "$10+/month" plans of the kind dedicated transcription services sell: upload a file and pay for its duration. An hour of audio costs from $0.36 with Fast Whisper or $0.60 with WhisperX (maximum accuracy). The Fast Whisper rate matches the official OpenAI Whisper API ($0.006/min), and you also get a web UI, timestamps and one balance shared with 300+ AI models. Files up to 200 MB are accepted, payment is by card (Stripe) or crypto, and tokens never expire.
Upload an audio file — the text is ready in a couple of minutes.
| Parameter | Value |
|---|---|
| Models | WhisperX (large-v3), Fast Whisper (large-v3) |
| File size | up to 200 MB |
| Formats | MP3, WAV, M4A, OGG and other audio formats |
| Languages | dozens, with automatic detection |
| Output | full text + timestamped segments |
| Minimum charge | 250–500 tokens (≈ $0.01–0.02) for short files |
| API access | yes — the NeuralBox API |
Price by recording length:

| Recording | Fast Whisper | WhisperX |
|---|---|---|
| 10 minutes | 1,800 tokens ≈ $0.06 | 3,000 tokens ≈ $0.10 |
| 30 minutes | 5,400 tokens ≈ $0.18 | 9,000 tokens ≈ $0.30 |
| 1 hour | 10,800 tokens ≈ $0.36 | 18,000 tokens ≈ $0.60 |
| 3 hours | 32,400 tokens ≈ $1.08 | 54,000 tokens ≈ $1.80 |
Billing is per second of audio: 3 tokens per second with Fast Whisper and 5 tokens per second with WhisperX. Prices are shown at the Basic plan rate ($5 = 150,000 tokens); larger plans cut the token cost by up to 40%. For comparison, the official OpenAI Whisper API charges $0.006/min (API only, no UI), and transcription services sell subscriptions from $10–20/mo.
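The per-second billing above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the per-second rates are inferred from the pricing table, the model keys `fast-whisper` and `whisperx` are hypothetical labels, and rounding partial seconds up is an assumption rather than documented behavior. The minimum charge is passed in as a parameter because the page gives only a 250–500 token range.

```python
import math

# Per-second token rates inferred from the pricing table
# (10,800 and 18,000 tokens per hour). Keys are hypothetical labels.
TOKENS_PER_SECOND = {"fast-whisper": 3, "whisperx": 5}

# Basic plan: $5 buys 150,000 tokens.
BASIC_USD_PER_TOKEN = 5 / 150_000


def transcription_cost(duration_seconds, model, min_tokens=0,
                       usd_per_token=BASIC_USD_PER_TOKEN):
    """Return (tokens, usd) for a recording billed per second of audio."""
    # Assumption: partial seconds are rounded up.
    billed_seconds = math.ceil(duration_seconds)
    tokens = max(billed_seconds * TOKENS_PER_SECOND[model], min_tokens)
    return tokens, round(tokens * usd_per_token, 4)


def hours_for_budget(usd, model, usd_per_token=BASIC_USD_PER_TOKEN):
    """Hours of audio that a budget covers at the per-second rate."""
    tokens = usd / usd_per_token
    return tokens / TOKENS_PER_SECOND[model] / 3600


print(transcription_cost(3600, "fast-whisper"))        # (10800, 0.36)
print(transcription_cost(600, "whisperx"))             # (3000, 0.1)
print(round(hours_for_budget(5, "fast-whisper"), 2))   # 13.89
```

The last line reproduces the "almost 14 hours for $5" figure at the Fast Whisper rate.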
Transcription runs on Whisper large-v3, the most accurate open speech-recognition model. On clean speech, errors are rare; on noisy recordings with crosstalk, WhisperX gives the better result of the two models.
An hour of audio costs from $0.36 with Fast Whisper and $0.60 with WhisperX (at the Basic plan rate; up to 40% cheaper on larger plans). You pay only for the recording's duration, and there is no subscription.
Pay by card via Stripe or with cryptocurrency. You buy tokens once and spend them as you go: no subscription, no recurring charges.
Tokens never expire. They stay on your balance until you spend them, whether you transcribe today or six months from now.
Dozens of languages are supported, including English and other European and Asian languages. The language is detected automatically, so there is no need to specify it.
Timestamps are included: the output contains time-aligned segments. For the most precise timestamps, for example for subtitles, use WhisperX.
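As an illustration of the subtitle use case, timestamped segments can be turned into an SRT file with a short script. This is a minimal sketch: the segment shape (`start` and `end` in seconds, plus `text`) is an assumption modeled on common Whisper output, not a documented NeuralBox format, and both function names are hypothetical.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT text from segments shaped like
    {"start": float, "end": float, "text": str} (assumed shape)."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{timing}\n{seg['text'].strip()}")
    # SRT cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


example = [
    {"start": 0.0, "end": 2.4, "text": " Hello and welcome."},
    {"start": 2.4, "end": 5.0, "text": " Let's get started."},
]
print(segments_to_srt(example))
```

Saving the returned string to a `.srt` file next to the video is usually enough for a player to pick up the subtitles.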
Fast Whisper and WhisperX both run on large-v3. Fast Whisper is about 40% cheaper and faster, which makes it the right choice for most recordings. WhisperX aligns timestamps more precisely and handles difficult audio better.
Only audio files can be uploaded. To transcribe a video, extract its audio track with any converter and upload that file. The 200 MB limit is enough for multi-hour recordings.
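One way to extract the audio track is ffmpeg. The sketch below builds the command and estimates how many hours fit under the 200 MB limit at a given bitrate. It assumes ffmpeg is installed and can encode the chosen output format; the file names, the 64 kbps default and the helper names are illustrative, not recommendations from NeuralBox.

```python
import subprocess

MAX_UPLOAD_MB = 200  # upload limit stated on this page


def ffmpeg_extract_command(video_path, audio_path, bitrate_kbps=64):
    """Build an ffmpeg command that drops the video stream and keeps mono audio."""
    return [
        "ffmpeg", "-i", video_path,
        "-vn",                        # no video stream in the output
        "-ac", "1",                   # downmix to mono
        "-b:a", f"{bitrate_kbps}k",   # target audio bitrate
        audio_path,
    ]


def extract_audio(video_path, audio_path, bitrate_kbps=64):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(
        ffmpeg_extract_command(video_path, audio_path, bitrate_kbps),
        check=True,
    )


def max_hours_at_bitrate(bitrate_kbps, limit_mb=MAX_UPLOAD_MB):
    """Rough hours of audio that fit in limit_mb at a constant bitrate
    (taking 1 MB as 10**6 bytes)."""
    bytes_per_hour = bitrate_kbps * 1000 / 8 * 3600
    return limit_mb * 1_000_000 / bytes_per_hour


print(" ".join(ffmpeg_extract_command("lecture.mp4", "lecture.mp3")))
print(round(max_hours_at_bitrate(64), 1))   # 6.9
```

At 64 kbps mono, roughly seven hours of speech fit in one upload, which is consistent with the multi-hour claim above.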
If a file contains no speech, your tokens are refunded automatically, so you don't pay for silence or instrumental music.
Transcription is also available via the NeuralBox API, which is handy for automated processing of calls and content. Docs are at neuralbox.io/api.
$5 buys almost 14 hours of transcription with Fast Whisper. No subscription, pay per recording.
Get started