AI audio transcription powered by Whisper

Turn interviews, lectures and calls into timestamped text, from $0.36 per hour of audio. Runs on Whisper large-v3, OpenAI's most accurate open speech model.

From $0.36/hour · No subscription · Whisper large-v3 · Timestamps · Files up to 200 MB

Transcription on NeuralBox runs on Whisper large-v3 — OpenAI's most accurate open speech-recognition model. It handles fast speech, technical terms, accents and recordings with background noise. The language is detected automatically — dozens of languages are supported.

You get the full text plus timestamped segments: perfect for subtitles, jumping to the right spot in a recording, or quoting with second-level precision. If a file turns out to contain no speech, your tokens are refunded automatically.

There are no subscriptions and no "$10+/month" plans of the kind dedicated transcription services sell: upload a file and pay for its duration. An hour of audio costs from $0.36 (Fast Whisper) or $0.60 (WhisperX, maximum accuracy). The Fast Whisper price matches the official OpenAI Whisper API ($0.006/min), and you also get a web UI, timestamps and one balance shared with 300+ AI models. Files up to 200 MB, payment by card (Stripe) or crypto, tokens never expire.

Recognition models

Fast Whisper (large-v3)

3 tokens/sec ≈ $0.36 per hour

Fast and economical: the same large-v3 base model in an accelerated build. The right choice for most recordings.

WhisperX (large-v3)

5 tokens/sec ≈ $0.60 per hour

Maximum accuracy with precise word-level timestamps — for subtitles and difficult audio.

Timestamps & segments

included

The text comes split into time-aligned segments — the foundation for subtitles and navigating recordings.

Automatic language detection

included

No need to specify the language — the model detects it on its own. Dozens of languages.

What transcription can do

🎯

large-v3 accuracy

The top Whisper model: handles fast speech, accents, jargon and background noise.

⏱️

Timestamps

Every text segment is time-aligned — jump to the right spot in a recording in one click.

🌍

Dozens of languages

English, European and Asian languages and many more — with automatic detection.

📦

Large files

Up to 200 MB — multi-hour voice-recorder sessions and full podcast episodes.

🛡️

Refund for silence

If the file turns out to contain no speech, your tokens come back automatically.

⚙️

API

Transcription via API — automate processing of calls and content.

What people use it for

Upload an audio file — the text is ready in a couple of minutes.

Interviews & podcasts

Transcribe an hour-long interview in minutes instead of an evening of manual work — with timestamp references

Lectures & webinars

Notes from a recording: transcribe the lecture, then ask a chat model to turn the text into structured notes

Calls & meetings

Zoom recording → text → summary with action items. Pairs well with NeuralBox chat

Subtitles

WhisperX's timestamped segments are a ready-made base for video subtitles

Voice memos

Thoughts and notes dictated on the go — turned into clean text

Call centers via API

Automatic call transcription for quality control and analytics

Parameters & limits

Models	WhisperX (large-v3), Fast Whisper (large-v3)
File size	up to 200 MB
Formats	MP3, WAV, M4A, OGG and other audio formats
Languages	dozens, with automatic detection
Output	full text + timestamped segments
Minimum charge	250–500 tokens (≈ $0.01–0.02) for short files
API access	yes — the NeuralBox API

What transcription costs

Recording	Fast Whisper	WhisperX
10 minutes	1,800 tokens ≈ $0.06	3,000 tokens ≈ $0.10
30 minutes	5,400 tokens ≈ $0.18	9,000 tokens ≈ $0.30
1 hour	10,800 tokens ≈ $0.36	18,000 tokens ≈ $0.60
3 hours	32,400 tokens ≈ $1.08	54,000 tokens ≈ $1.80

Billing is per second of audio duration. Token prices are shown at the Basic plan rate ($5 = 150,000 tokens); larger plans cut the cost per token by up to 40%. For comparison: the official OpenAI Whisper API charges $0.006/min (API only, no UI), and transcription services sell subscriptions from $10–20/mo.
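
To check the table for a recording of any length, the arithmetic can be sketched in a few lines of Python. This is an illustration, not NeuralBox code: the model keys and the function name are made up for the example, rounding partial seconds up is an assumption (the page only says billing is per second), and the minimum charge for short files is not modeled.

```python
import math
from typing import Tuple

# Rates from the pricing table (tokens per second of audio).
# The dictionary keys are illustrative labels, not official identifiers.
TOKENS_PER_SECOND = {"fast-whisper": 3, "whisperx": 5}

# Basic plan: $5 buys 150,000 tokens.
BASIC_USD_PER_TOKEN = 5 / 150_000


def estimate_cost(duration_seconds: float, model: str = "fast-whisper",
                  usd_per_token: float = BASIC_USD_PER_TOKEN) -> Tuple[int, float]:
    """Return (tokens, dollars) for a recording of the given length."""
    # Assumption: a partial second is billed as a full second.
    billed_seconds = math.ceil(duration_seconds)
    tokens = billed_seconds * TOKENS_PER_SECOND[model]
    return tokens, tokens * usd_per_token
```

For example, `estimate_cost(90 * 60, "whisperx")` gives 27,000 tokens, or about $0.90 at the Basic plan rate.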

The $5 Basic plan buys 150,000 tokens — almost 14 hours of Fast Whisper transcription. Tokens never expire.

How to transcribe a recording

Top up your balance

From $2 — by card (Stripe) or crypto. No subscription, tokens never expire.

Upload the file

Open the Audio tab, pick a model and upload your recording. Timestamped text arrives in a couple of minutes.

Transcribe a recording

FAQ

How accurate is the transcription?

It runs on Whisper large-v3, OpenAI's most accurate open speech-recognition model. On clean speech, errors are rare; on noisy recordings with crosstalk, WhisperX gives better results than Fast Whisper.

How much does an hour cost?

From $0.36 with Fast Whisper and $0.60 with WhisperX (at the Basic plan rate; up to 40% cheaper on larger plans). You pay only for the recording's duration — there's no subscription.

How do I pay?

By card via Stripe or with cryptocurrency. You buy tokens once and spend them as you go — no subscription, no recurring charges.

Do tokens expire?

No. Tokens stay on your balance until you spend them — transcribe today or six months from now.

Which languages are supported?

Dozens of languages, including English, European and Asian ones. The language is detected automatically — no need to specify it.

Are there timestamps?

Yes, the output includes time-aligned segments. For the most precise timestamps (subtitles), use WhisperX.
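
As a rough sketch of how timestamped segments become subtitles, the Python below writes them out in the SRT format. The segment shape (a list of dicts with `start` and `end` in seconds plus `text`) is an assumption made for the example; the actual NeuralBox output may be structured differently.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the HH:MM:SS,mmm timestamp SRT uses."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from segments.

    Assumed (hypothetical) segment shape:
    {"start": float, "end": float, "text": str}
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Each cue ends with a newline; joining with "\n" leaves a blank line between cues.
    return "\n".join(blocks)
```

Saving the returned string as a `.srt` file next to the video is usually enough for a player to pick it up.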

What's the difference between WhisperX and Fast Whisper?

Both run on large-v3. Fast Whisper is faster and about 40% cheaper, which makes it the right choice for most recordings. WhisperX aligns timestamps more precisely and handles difficult audio better.

Can I transcribe a video?

Uploads are audio only. Extract the audio track from the video with any converter, then transcribe it. The 200 MB limit is enough for multi-hour recordings.
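
One way to do the extraction is with ffmpeg, sketched here in Python. The helper names are invented for the example, the mono 64 kbps MP3 settings are just a reasonable choice for speech, and the size estimate assumes 1 MB = 1,000,000 bytes, which may not match how the upload limit is actually measured. It also assumes an ffmpeg build that includes the `libmp3lame` encoder.

```python
# Assumption: the 200 MB limit is counted in decimal megabytes.
MAX_UPLOAD_BYTES = 200 * 1000 * 1000


def build_extract_command(video_path: str, audio_path: str,
                          bitrate_kbps: int = 64) -> list:
    """Return ffmpeg arguments that drop the video and encode mono MP3."""
    return [
        "ffmpeg", "-i", video_path,
        "-vn",                       # skip the video stream
        "-ac", "1",                  # downmix to mono, enough for speech
        "-c:a", "libmp3lame",        # MP3 encoder
        "-b:a", f"{bitrate_kbps}k",  # constant audio bitrate
        audio_path,
    ]


def max_hours_at_bitrate(bitrate_kbps: int,
                         limit_bytes: int = MAX_UPLOAD_BYTES) -> float:
    """Approximate hours of audio that fit under the size limit."""
    seconds = limit_bytes * 8 / (bitrate_kbps * 1000)
    return seconds / 3600
```

Pass the list to `subprocess.run(..., check=True)` to run the conversion. At 64 kbps the estimate comes to roughly 6.9 hours per 200 MB file, ignoring container overhead.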

What if the file has no speech?

Your tokens are automatically refunded — you don't pay for silence or instrumental music.

Is there an API?

Yes, transcription is available via the NeuralBox API — handy for automated processing of calls and content. Docs at neuralbox.io/api.

Transcribe your first recording

$5 = almost 14 hours of transcription. No subscription, pay per recording.

Get started