Why 100% Accuracy is Not Available With Speech Recognition Software Alone

August 15, 2013 10:40 am

As much as technology has evolved in the last few years, and even with the advancements in speech recognition software, transcription still requires human review and intervention to ensure close to 100% accuracy. Although lower accuracy may be an acceptable tradeoff for immediate results, most people who use transcription professionally need publication-ready transcripts with minimal review time in order to justify the expense. To get around this hurdle, TranscribeMe uses a hybrid approach that offers, in one package, faster turnaround and lower prices than human transcription alone, as well as higher accuracy than speech processing alone.

Will speech-processing software ever reach human transcriber accuracy? Realistically, this will not happen for at least another ten years, and here’s why:

Speech patterns and accents

Different regions, and different people within those regions, have different ways of speaking, which makes training a computer to recognize accents and speech patterns very difficult, even when it is tested with various sample groups. Moreover, some people slur or blend their words when speaking very quickly, which can cause errors in the transcription of the audio. People may also stutter or pause to think, which means that the software may include fillers such as um, ah, eh, and hmm, along with stutters and other sounds that should have been omitted from a clean transcript.
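To make the clean-transcript point concrete, here is a minimal sketch of how fillers and simple stutters might be stripped from raw recognizer output. The filler list comes from the examples above; the function name and the "repeated word" definition of a stutter are assumptions made for illustration, not a description of how any particular product works.

```python
import re

# Filler sounds taken from the examples in the paragraph above.
FILLERS = ("um", "ah", "eh", "hmm")

# A filler as a whole word, plus an optional trailing comma and whitespace.
_FILLER_RE = re.compile(r"\b(?:" + "|".join(FILLERS) + r")\b,?\s*", re.IGNORECASE)

# Assumption: a "stutter" is the same word repeated back to back ("I I").
_REPEAT_RE = re.compile(r"\b(\w+)(?:\s+\1\b)+", re.IGNORECASE)


def clean_transcript(raw: str) -> str:
    """Return the text with fillers and immediate word repeats removed."""
    text = _FILLER_RE.sub("", raw)
    text = _REPEAT_RE.sub(r"\1", text)
    # Collapse any double spaces left behind by the removals.
    return re.sub(r"\s{2,}", " ", text).strip()


print(clean_transcript("um I I think we should hmm start now"))
```

A rule this simple also collapses legitimate repeats (as in "I knew that that was wrong"), which is one reason a human pass is still needed after automated cleanup.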

Grammar and punctuation

Speech recognition software also typically requires that you verbalize punctuation rather than having it inserted automatically (for example, by saying “comma”, “period”, or “colon” instead of implying it through the tone of your voice). This makes professional speeches, interviews, etc. more difficult to transcribe, because they require human review to add the appropriate punctuation and/or to fix any grammar mistakes.
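As a rough illustration of verbalized punctuation, the sketch below replaces dictated words such as "comma" and "period" with the symbols they stand for. The mapping and the function name are hypothetical and are not taken from any real dictation product.

```python
import re

# Hypothetical mapping of dictated punctuation words to symbols.
SPOKEN_PUNCTUATION = {
    "comma": ",",
    "period": ".",
    "colon": ":",
}

# Match a punctuation word together with the whitespace in front of it.
_SPOKEN_RE = re.compile(
    r"\s*\b(" + "|".join(SPOKEN_PUNCTUATION) + r")\b", re.IGNORECASE
)


def apply_spoken_punctuation(dictated: str) -> str:
    """Replace each dictated punctuation word with its symbol."""
    return _SPOKEN_RE.sub(
        lambda match: SPOKEN_PUNCTUATION[match.group(1).lower()], dictated
    )


print(apply_spoken_punctuation("dear team comma the meeting is at noon period"))
```

The same rule would turn "the Victorian period" into "the Victorian.", which shows why punctuation that is implied by context or tone, rather than dictated, still needs a human reviewer.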

Homonyms and unusual words

Speech processing software can only recognize words and phrases that it has specifically been trained to recognize. As such, whenever slang or made-up terms that are not in the program’s vocabulary are used, the machine will not recognize them. This applies to brand names, last names, and other unusual words, such as acronyms or highly technical vocabulary. Another possible problem is homonyms, or words that sound the same but differ in spelling and meaning, such as there/their/they’re and air/heir. A computer cannot tell which word to use without understanding the context of the sentence, which requires extensive programming and further advancements in the technology.
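One way to picture how a hybrid workflow could deal with these two problems is to flag risky words for a human reviewer instead of trusting the machine's choice. The sketch below is only an illustration under stated assumptions: the tiny vocabulary, the homonym groups, and the function name are all hypothetical.

```python
# Hypothetical vocabulary that the recognizer is assumed to know.
VOCABULARY = {
    "the", "is", "to", "over", "throne",
    "air", "heir", "there", "their", "they're",
}

# Sound-alike groups taken from the examples in the paragraph above.
SOUND_ALIKES = [{"there", "their", "they're"}, {"air", "heir"}]


def flag_for_review(words):
    """Return (position, word, reason) for each word a human should check."""
    flagged = []
    for position, word in enumerate(words):
        lowered = word.lower()
        if lowered not in VOCABULARY:
            # Brand names, surnames, slang, acronyms, and so on.
            flagged.append((position, word, "unknown word"))
        elif any(lowered in group for group in SOUND_ALIKES):
            # The recognizer may have picked the wrong spelling.
            flagged.append((position, word, "sound-alike"))
    return flagged


for item in flag_for_review("the air to the throne is TranscribeMe".split()):
    print(item)
```

Here "air" is flagged because only context can tell a reviewer that "heir" was meant, and "TranscribeMe" is flagged because it is outside the assumed vocabulary.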

Ambient noise, overlapping speech, and number of speakers

When multiple speakers are present, they frequently interrupt each other or speak at the same time, which can be challenging to transcribe for even the most experienced human transcribers. Because computers require clear speech, it is nearly impossible for them to deduce the words accurately and then separate the text by speaker. A human transcriber may at least be able to figure out who spoke and what they said based on the sound of the speaker’s voice, as well as the previous context. Moreover, ambient noise, such as music, other people talking, a tea kettle whistling, and even wind, will affect the accuracy of the transcription, because the computer matches short segments of sound to words and these other sounds can cause inaccuracies. Although ambient noise can also be an issue for human transcribers, they can at least try to figure out what is being said.

To sum up, speech recognition software, while a valuable tool in transcription, is still a long way from achieving close to 100% accuracy for the majority of the public, especially without some form of human review. Thus, for professional and enterprise clients, a hybrid approach such as TranscribeMe’s is the best option for generating speedy transcripts with high accuracy at low cost. Please get in touch with our sales team if you’re interested in transcription services, or leave us a comment below.

2 Responses to “Why 100% Accuracy is Not Available With Speech Recognition Software Alone”

Sergei Lee says:

December 16, 2013 at 5:22 pm

I’m just trying to figure out what speech recognition is. If some of you can answer the following questions, it would make the matter clear for me. Thanks in advance.

1. Which language is the easiest or one of the easiest to transcribe?

2. Is English one of the most difficult languages to transcribe?

daniel F says:

March 1, 2015 at 1:40 am

Yes, that’s true. I was trying to say “add” and the system recognized “at”. No matter how many times I tried, the system could never recognize my accent. I know they sound a bit different, and I said each one just as it is, saying “at” with the t sound and “add” with the nearly silent sound of a mute “d”, but both ways, the words written were “at at”.
