AI audio transcription powered by Whisper

Turn interviews, lectures and calls into text: an hour of audio from $0.36, with timestamps. Whisper large-v3 — the most accurate open speech model.

From $0.36/hour · No subscription · Whisper large-v3 · Timestamps · Files up to 200 MB

Transcription on NeuralBox runs on Whisper large-v3 — OpenAI's most accurate open speech-recognition model. It handles fast speech, technical terms, accents and recordings with background noise. The language is detected automatically — dozens of languages are supported.

You get the full text plus timestamped segments: perfect for subtitles, jumping to the right spot in a recording, or quoting with second-level precision. If a file turns out to contain no speech, your tokens are refunded automatically.

Unlike dedicated transcription services, there are no subscriptions or "$10+/month" plans: upload a file and pay for its duration. An hour of audio costs from $0.36 (Fast Whisper) or $0.60 (WhisperX, maximum accuracy). The Fast Whisper rate matches the official OpenAI Whisper API ($0.006/min, which is $0.36/hour), and you also get a web UI, timestamps and one balance shared with 300+ AI models. Files up to 200 MB, payment by card (Stripe) or crypto, tokens never expire.

Recognition models

Fast Whisper (large-v3)
3 tokens/sec ≈ $0.36 per hour
Fast and economical: the same large-v3 base model in an accelerated build. The right choice for most recordings.
WhisperX (large-v3)
5 tokens/sec ≈ $0.60 per hour
Maximum accuracy with precise word-level timestamps — for subtitles and difficult audio.
Timestamps & segments
included
The text comes split into time-aligned segments — the foundation for subtitles and navigating recordings.
Automatic language detection
included
No need to specify the language — the model detects it on its own. Dozens of languages.

What transcription can do

🎯
large-v3 accuracy
The top Whisper model: handles fast speech, accents, jargon and background noise.
⏱️
Timestamps
Every text segment is time-aligned — jump to the right spot in a recording in one click.
🌍
Dozens of languages
English, European and Asian languages and many more — with automatic detection.
📦
Large files
Up to 200 MB — multi-hour voice-recorder sessions and full podcast episodes.
🛡️
Refund for silence
If the file turns out to contain no speech, your tokens come back automatically.
⚙️
API
Transcription via API — automate processing of calls and content.

What people use it for

Upload an audio file — the text is ready in a couple of minutes.

Interviews & podcasts
Transcribe an hour-long interview in minutes instead of an evening of manual work — with timestamp references
Lectures & webinars
Notes from a recording: transcribe the lecture, then ask a chat model to turn the text into structured notes
Calls & meetings
Zoom recording → text → summary with action items. Pairs well with NeuralBox chat
Subtitles
WhisperX's timestamped segments are a ready-made base for video subtitles
Voice memos
Thoughts and notes dictated on the go — turned into clean text
Call centers via API
Automatic call transcription for quality control and analytics

Parameters & limits

Models: WhisperX (large-v3), Fast Whisper (large-v3)
File size: up to 200 MB
Formats: MP3, WAV, M4A, OGG and other audio formats
Languages: dozens, with automatic detection
Output: full text + timestamped segments
Minimum charge: 250–500 tokens (≈ $0.01–0.02) for short files
API access: yes, via the NeuralBox API

What transcription costs

Recording | Fast Whisper | WhisperX
10 minutes | 1,800 tokens ≈ $0.06 | 3,000 tokens ≈ $0.10
30 minutes | 5,400 tokens ≈ $0.18 | 9,000 tokens ≈ $0.30
1 hour | 10,800 tokens ≈ $0.36 | 18,000 tokens ≈ $0.60
3 hours | 32,400 tokens ≈ $1.08 | 54,000 tokens ≈ $1.80

Billing is per second of audio duration. Dollar figures use the Basic plan rate ($5 = 150,000 tokens, or 30,000 tokens per dollar); larger plans cut token cost by up to 40%. For comparison: the official OpenAI Whisper API charges $0.006/min (API only, no UI), and transcription services sell subscriptions from $10–20/mo.

The $5 Basic plan buys 150,000 tokens — almost 14 hours of Fast Whisper transcription. Tokens never expire.
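The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration, not NeuralBox code: the model keys, the function name and the `min_tokens` parameter are made up for the example, and since the page does not say which minimum charge (250 or 500 tokens) applies to which model, the minimum is left as an optional argument.

```python
# Rates quoted on this page; names and structure are illustrative assumptions.
TOKENS_PER_SECOND = {"fast-whisper": 3, "whisperx": 5}  # hypothetical model keys
BASIC_TOKENS_PER_DOLLAR = 150_000 / 5  # Basic plan: $5 buys 150,000 tokens


def transcription_cost(duration_seconds: int, model: str, min_tokens: int = 0) -> tuple[int, float]:
    """Return (tokens, dollars at the Basic plan rate) for one recording."""
    # Billing is per second of audio; min_tokens models the minimum charge for short files.
    tokens = max(duration_seconds * TOKENS_PER_SECOND[model], min_tokens)
    return tokens, tokens / BASIC_TOKENS_PER_DOLLAR


tokens, dollars = transcription_cost(3600, "fast-whisper")
print(f"{tokens:,} tokens ≈ ${dollars:.2f}")  # 10,800 tokens ≈ $0.36
```

Dividing 150,000 tokens by the 10,800 tokens an hour of Fast Whisper costs gives about 13.9, which is where "almost 14 hours" comes from.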

How to transcribe a recording

Sign in
Register with Google, Telegram or email — under a minute.
Top up your balance
From $2 — by card (Stripe) or crypto. No subscription, tokens never expire.
Upload the file
Open the Audio tab, pick a model and upload your recording. Timestamped text arrives in a couple of minutes.

FAQ

How accurate is the transcription?

It runs on Whisper large-v3 — the most accurate open speech-recognition model. On clean speech, errors are rare; on noisy recordings with crosstalk, WhisperX delivers better quality.

How much does an hour cost?

From $0.36 with Fast Whisper and $0.60 with WhisperX (at the Basic plan rate; up to 40% cheaper on larger plans). You pay only for the recording's duration — there's no subscription.

How do I pay?

By card via Stripe or with cryptocurrency. You buy tokens once and spend them as you go — no subscription, no recurring charges.

Do tokens expire?

No. Tokens stay on your balance until you spend them — transcribe today or six months from now.

Which languages are supported?

Dozens of languages, including English, European and Asian ones. The language is detected automatically — no need to specify it.

Are there timestamps?

Yes, the output includes time-aligned segments. For the most precise timestamps (subtitles), use WhisperX.
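As a rough illustration of how time-aligned segments turn into subtitles, the Python sketch below converts a list of segments into SubRip (SRT) text. The segment shape (`start` and `end` in seconds plus `text`) is an assumption made for the example; the actual field names in the NeuralBox output may differ.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text from segments shaped like {"start": s, "end": s, "text": str}."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        span = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{span}\n{seg['text'].strip()}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


# Hypothetical segments, only to show the expected input shape.
example = [
    {"start": 0.0, "end": 2.5, "text": "Hello."},
    {"start": 2.5, "end": 5.0, "text": " Welcome back."},
]
print(segments_to_srt(example))
```

Saving the result with an `.srt` extension gives a file most video players and editors can load.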

What's the difference between WhisperX and Fast Whisper?

Both run on large-v3. Fast Whisper costs 40% less (3 vs. 5 tokens per second of audio) and runs faster, which suits most recordings. WhisperX aligns timestamps more precisely and handles difficult audio better.

Can I transcribe a video?

Uploads must be audio files. Extract the audio track from the video with any converter, then transcribe it. Files up to 200 MB cover multi-hour recordings.
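One common way to extract the audio track is ffmpeg. The Python sketch below builds such a command and estimates how many hours of audio fit under the 200 MB limit at a given bitrate. The file names and the 64 kbps mono setting are arbitrary choices for the example, the limit is read as decimal megabytes (the page does not specify), and the estimate ignores container overhead.

```python
MAX_UPLOAD_BYTES = 200 * 1000 * 1000  # 200 MB limit, assumed to mean decimal megabytes


def ffmpeg_extract_command(video_path: str, audio_path: str, bitrate_kbps: int = 64) -> list[str]:
    """Build an ffmpeg command that drops the video stream and encodes mono audio."""
    return ["ffmpeg", "-i", video_path, "-vn", "-ac", "1", "-b:a", f"{bitrate_kbps}k", audio_path]


def max_hours_under_limit(bitrate_kbps: int) -> float:
    """Approximate hours of constant-bitrate audio that fit under the upload limit."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return MAX_UPLOAD_BYTES / bytes_per_second / 3600


cmd = ffmpeg_extract_command("talk.mp4", "talk.mp3")  # hypothetical file names
print(" ".join(cmd))
print(f"{max_hours_under_limit(64):.1f} h at 64 kbps")
# To run the extraction (requires ffmpeg on PATH):
# import subprocess; subprocess.run(cmd, check=True)
```

At 64 kbps the limit works out to roughly 6.9 hours, and at 128 kbps to about 3.5 hours, so lowering the bitrate is the simplest way to fit a long recording into one upload.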

What if the file has no speech?

Your tokens are automatically refunded — you don't pay for silence or instrumental music.

Is there an API?

Yes, transcription is available via the NeuralBox API — handy for automated processing of calls and content. Docs at neuralbox.io/api.

Transcribe your first recording

$5 = almost 14 hours of Fast Whisper transcription. No subscription, pay per recording.
