Most speech recognition apps have no trouble transcribing a native speaker recorded with a professional microphone in a quiet room. That isn’t a challenge.
So to test them more thoroughly, I created a “nightmare” recording of two non-native speakers with loud city background noise.
How did they fare?
Let’s find out.

Otter was one of the most frequently mentioned solutions when we asked for suggestions on Twitter and in the Ahrefs community. And for good reason. It’s easy to set up, has an intuitive interface, and offers clear pricing.
Unique features
What sets it apart from the rest is the app’s ability to record online meetings and transcribe them, simply by pasting the meeting URL. But you can also import a video/audio file or record audio right in the app.
Additionally, you can connect your calendar so you never miss a meeting.
Transcript quality
I got decent results, but there was a lot to edit too.
It didn’t get some names right. But I can’t blame any tool for not picking up “Ahrefs” or “Tim Soulo” 100% of the time.

One thing I noticed is that after it notified me the transcripts were ready, it might still be doing something in the background (adjusting time stamps, tagging speakers, etc.). Like a student still scribbling on a test paper while handing it to the teacher.
Pricing
You can start for free and upgrade to a paid plan later. You can import up to three files and record 290 minutes of meetings before you need to upgrade (as of April 2023).

Setting up an account was a no-brainer. I found the interface easy to navigate as well. One personal remark is that it felt a little too “cold” to use, since I saw things like “Place Order,” “Billing,” and “Invoice” way too often.
You might get the impression that it was designed by an accounting team (as opposed to Descript, which comes next in this roundup).
Unique features
Besides auto-generated transcripts, Rev offers live captions for Zoom meetings. You also have the option to place an order for human transcriptions.
Transcript quality
Poor audio with city noise was a bit too much for Rev. Some words were missing, while others were misrecognized. As a result, some paragraphs didn’t make much sense, while others were fine.

Pricing
You can transcribe the first audio file (up to 45 minutes) for free. I got an invoice for $1.25 with a discount that brought the total to $0.00. Thanks, accounting team. 😉
Rev also has a 14-day trial of its paid plan. But that was hard to find. To locate it, you need to go to the footer of the homepage and look for it under “Services.”


Descript welcomed me by name (which was a nice touch). The main thing you should know is that it’s standalone software rather than a web service. It’s much more than a speech-to-text converter. It’s basically a video editing tool. And there’s definitely a learning curve. But luckily, onboarding is extremely fun and engaging.

Unique features
As I mentioned, Descript is more of a video editing tool that’s good at transcribing. I’d call it “Canva for video/captions.” You can add B-rolls, effects, animations, and more.
You can simply drag and drop and basically produce an entire video with its help. But if you just need a transcript or captions for a video or audio file, you can do that too.
Transcript quality
My sample audio produced fairly muddy results. At times, it had difficulty recognizing abbreviations (e.g., SEO). I also had a problem with removing filler words like “uh” and “um.”
I found that if I didn’t choose the option to remove them, they, um, just stayed there, even though I didn’t need them most of the time. But if I did choose to remove them, it occasionally ate up parts of other words, causing even more trouble.
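We don’t know how Descript removes filler words; its cuts may well happen at the audio level. Still, a similar failure is easy to reproduce on plain text. The hypothetical sketch below is not Descript’s implementation. It contrasts naive substring deletion, which also strips “um” out of words like “summary,” with whole-word matching using regex word boundaries. The filler list and function names are assumptions for illustration.

```python
import re

# Hypothetical filler list for illustration; not Descript's actual list.
FILLERS = ("uh", "um")

def remove_fillers_naive(text: str) -> str:
    """Substring deletion: also removes 'um'/'uh' from inside other words."""
    for filler in FILLERS:
        text = text.replace(filler, "")
    return text

# A filler as a whole word (\b on both sides), plus an optional trailing
# comma and any whitespace after it, matched case-insensitively.
_FILLER_RE = re.compile(r"\b(?:" + "|".join(FILLERS) + r")\b,?\s*", re.IGNORECASE)

def remove_fillers(text: str) -> str:
    """Whole-word deletion: words like 'summary' or 'humidity' stay intact."""
    return _FILLER_RE.sub("", text).strip()
```

With the naive version, “in the summary” turns into “in the smary.” The word-boundary version leaves it alone and only drops the standalone “um, ”.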
Also, it couldn’t recognize parts that a human would have no problem understanding from context alone, e.g., “Jack of all trades” became “jackal, trades.”
On the bright side, I believe you can still understand what the text is about.

Pricing
You can start with basic features for free and upgrade if needed.

MacWhisper is a transcription tool powered by Whisper, an automatic speech recognition (ASR) system developed by OpenAI, the same company that brought us ChatGPT.
As OpenAI states on its website:
Whisper is trained on 680,000 hours of multilingual and multitask supervised data collected from the web.
Whisper isn’t something you can simply “run” as is. What’s more, it’s fairly complicated to set up if you do want to run it yourself. GitHub, Python, you get the gist.
Luckily, there are tools like MacWhisper that take this off your shoulders and let you use the power of AI through a simple user interface.
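For the curious, here is roughly what “running it yourself” involves. This is a minimal sketch, not how MacWhisper works internally. It assumes the open-source `openai-whisper` Python package is installed (via pip) and that ffmpeg is on your PATH. The helper names `transcribe`, `format_segments`, and `format_timestamp` are hypothetical. The `model_name` argument reflects the same trade-off MacWhisper exposes: smaller models like “small” run faster, and larger ones like “large” are slower but more accurate.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as MM:SS.ss."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes):02d}:{secs:05.2f}"

def format_segments(segments) -> str:
    """Turn Whisper-style segments (dicts with start, end, text) into time-stamped lines."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} --> {end}] {seg['text'].strip()}")
    return "\n".join(lines)

def transcribe(path: str, model_name: str = "small") -> str:
    """Transcribe an audio file with a local Whisper model (assumes openai-whisper + ffmpeg)."""
    import whisper  # imported lazily so the formatting helpers work without it

    model = whisper.load_model(model_name)  # e.g. "tiny", "base", "small", "medium", "large"
    result = model.transcribe(path)  # returns a dict with "text" and "segments"
    return format_segments(result["segments"])
```

Calling `print(transcribe("interview.mp3", model_name="large"))` would print one time-stamped line per segment. Expect a multi-gigabyte model download the first time you use the large model.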
Unique features
Just plain speech-to-text recognition with time stamps. Unfortunately, it doesn’t auto-tag the speakers.
Transcript quality
When you run the tool, you have to choose a “model” to work with. Basically, the lighter the model, the faster it’ll run. But larger models produce better results. Also, in MacWhisper, these larger (better but slower) models are only available in the paid version.
I decided to start with the free “small” model, which was described as having “normal speed with good accuracy.”
It was OK, but no better than the competitors. I figured it would work fine with high-quality audio, but not with the terrible examples I fed it.
“AI is overrated,” I thought. But before closing the Mac and switching back to my beloved Windows PC, I decided to give the “large” model a try.
And you know what? AI isn’t overrated. I found the results to be much better than anything else.
The transcript was really, really good. It even got things like “Ahrefs” and “SaaS” right! Though still not 100% of the time.

Pricing
You can run smaller models for free. For a large model, you’ll need to purchase a license.

This tool is the easiest to use. Simply drag and drop your file, and it’s ready. It takes some time to process, though.
Unique features
Nothing other than downloading a transcript.
Transcript quality
My first impression was that the results were good because, visually, it delivered confident-looking text.

But after proofreading, I realized that it simply left out the parts it failed to recognize, sometimes several words in a row.
Pricing
It’s free to use.

Premiere Pro isn’t exactly a “transcription tool” but rather video editing software. I’m including it because I assume some companies may already have it in their arsenal (like we do).
To get to the transcription feature in Premiere Pro, just go to the “Captions and Graphics” workspace and click “Create transcription.”

Unique features
If we take only speech recognition into account, what it does well is create precise time stamps, auto-tag the speakers and, if needed, automatically add an editable captions track to a video project.
Transcript quality
Let’s be straightforward: I found the noisy audio transcript to be a failure. I couldn’t comprehend what people were talking about in the first place.

Still, I think this feature can be really helpful if you’re creating captions from high-quality audio. I’ve used it myself several times and had nothing to complain about when the recording quality was good.
Pricing
You need an Adobe Creative Cloud subscription to use Premiere Pro.

While signing up and uploading files is pretty simple, you have to spend some time answering questions about yourself and your company before you can finally get to the tool itself. And no, you can’t skip typing in your company name, your role, and your company size.
But once you get through this, the interface is clean and intuitive.
Unique features
You can generate a transcript or captions for video or audio. There’s also an option to request a manual review of the transcript. Additionally, you can generate subtitles in a different language, so you get transcription and translation in one click.

Transcript quality
Happy Scribe did a good job transcribing the audio. It had no problem with terms like “SEO” and “SaaS” (clearly the weakest point for many tools). It could also auto-tag the speakers, which might be helpful in certain situations.

Pricing
I could test one file for free. After that, I would need to buy credits, which are used up for each minute of video or audio transcribed.

Sonix is a tool for automated transcription, translation, and integration with meeting apps.
Unique features
Besides meeting integration, which is pretty much a given for most tools, AI summary generation is an interesting feature (in beta as of April 2023). But I already got impressive results from it.

You also get some extra tools for working with video captions: a timeline view and an option to split captions into multiple lines. You can also import an existing transcript, and Sonix will sync it with the audio.
Transcript quality
Sonix has a custom vocabulary feature. I found that it helped a bit with names like “Tim Soulo” and “Ahrefs,” but it didn’t work 100% of the time. It mostly did well. But at times, it mistook SEO for CEO and returned the word “Excel” seemingly out of nowhere.
The transcript made sense overall but required a lot of edits to be perfect.

Pricing
Sonix has a free trial for 25 minutes of transcription. After that, you need to purchase pay-as-you-go credits or get a subscription.

Notta is yet another transcription service that works for both real-time meetings and existing recordings.
Unique features
Besides transcription, Notta focuses on streamlining certain workflows and offers features such as calendar sync and a scheduler (in beta as of April 2023).
Transcript quality
Background noise and poor audio quality weren’t deal-breakers for Notta. The transcription results turned out mostly OK but still had some problems.

Sentence structure was sometimes a bit weird, certain words went missing, and my favorite “Jack of all trades” part wasn’t that neat this time.

Another thing worth noting is that, for some reason, it failed to distinguish the two speakers, and the whole interview was tagged as “Speaker 1.”
Pricing
You can start with a free basic subscription and try a three-day trial of the paid plan, Notta Pro.
Final thoughts
As you can see, there are plenty of tools to choose from. However, it seems OpenAI stirred things up a bit by releasing a free ASR (automatic speech recognition) system, which I found to be considerably more capable than the others.
But pure speech recognition quality is just one factor. Maybe you need to record your Zoom meetings (Otter), work with captions in a large video project (Premiere Pro), or quickly create a Canva-style video (Descript).
Also, I should stress that I was pushing these tools to their limits by giving them a worst-case-scenario recording. For more typical uses, the differences in outcome might be much less noticeable.
It’s great to see so many options out there, and I hope this review helps you find the one that’s right for you.
Got questions? Ping me on Twitter.