Better STT Results: Why you need a good mike for top Automatic Transcriptions

By Lina Scarborough

TL;DR: 

  • For optimal Speech-To-Text results, record your audio in .wav format.
  • Use dual lapel mikes attached to each speaker’s shirt collar, with the speakers seated 1–1.5 m apart to prevent static interference.
  • If you use a stationary mike, such as the Saramonic, the device should be placed on a surface or mounted to a tripod.
Using an external Microphone.
A simple external mike makes a surprising improvement in automatic STT transcriptions by improving audio quality.

Thomas Edison shouted “Mary had a little lamb…!” into a mouthpiece way back in 1877. When he turned the hand crank, a creaky, crackling voice played the words back: his invention worked. It was the first recording of a human voice in history that could be played back, not just recorded.

Fast forward to today: recording audio is standard, and the omnipresent smartphone is often used to record meetings and interviews. Yet many people still use old-fashioned methods to transcribe these recordings into text, a step that is necessary in many cases, for instance to write journalistic articles or to produce meeting minutes. Transcribing audio files by hand is an extremely tedious task that can easily take several hours for a single hour of audio.

With the tremendous advances in Automatic Speech Recognition (ASR) systems, it is now possible to transcribe speech into text automatically with surprisingly high accuracy. We have created a software solution – called Interscriber – that is based on state-of-the-art technologies and provides an easy-to-use interface for automatic transcription of interviews and meetings on your mobile and computer.

One crucial aspect in using our system – and this applies to any ASR System in general – is that the transcription quality highly depends on the incoming audio. As we have shown in a recent scientific study together with the Zurich University of Applied Sciences (ZHAW), read-aloud text is much easier to transcribe than spontaneous speech.

Logo of Interscriber.
Interscriber reduces workload at least fourfold compared to manual transcription.

On the other hand, the quality of the audio recording itself significantly influences the transcription quality: the better the recording, the fewer errors in the transcript. It turns out that improving the recording is one of the easiest and most controllable ways to improve the transcription, simply by choosing a proper microphone.

If you record with a smartphone’s built-in mic, for instance through an app like Voice Memo, you will most likely encounter one or more of the following issues:

  • Background rustling sounds
  • Interruptions from incoming notifications or calls (message pings or vibrations)
  • Background chatter or wind noise

People who are serious about getting professional, top-standard audio will need to invest in an external microphone.

How to Determine What Mike You Need

Your mike doesn’t have to be the most expensive or bulkiest one. We’ll first look at general criteria for good microphones for interviews and meetings, before delving into specialized usage.

Key Requirements for any mike

  • Compatible with your phone – ideally a Lightning connector for iOS/iPhone and a USB-C port for Android, but a basic Audio-Line-In port is also sufficient
  • Quality criterion #1: Frequency response – The range from the lowest to the highest frequency it can pick up (80 Hz to 15 kHz is good for a vocal mic)
  • Quality criterion #2: Sampling rate – How many samples are taken per second (audio CDs have a sample rate of 44.1 kHz).
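The two quality criteria are linked: a digital recording can only represent frequencies up to half its sampling rate (the Nyquist limit). The following is a minimal Python sketch of that check, using the figures from the list above as inputs. The function names are made up for illustration and are not part of any recording tool.

```python
def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency a recording at this sampling rate can represent."""
    return sample_rate_hz / 2.0


def covers_mic_range(sample_rate_hz: float, mic_max_hz: float) -> bool:
    """True if the sampling rate is high enough to keep the mike's top frequency."""
    return nyquist_limit_hz(sample_rate_hz) >= mic_max_hz


if __name__ == "__main__":
    cd_rate_hz = 44_100        # audio CD sampling rate
    vocal_mic_top_hz = 15_000  # upper end of the 80 Hz to 15 kHz vocal range

    print(nyquist_limit_hz(cd_rate_hz))                    # 22050.0
    print(covers_mic_range(cd_rate_hz, vocal_mic_top_hz))  # True
    # A 16 kHz recording tops out at 8 kHz, below the mike's upper limit.
    print(covers_mic_range(16_000, vocal_mic_top_hz))      # False
```

In other words, a CD-quality sampling rate comfortably covers what a typical vocal mike can pick up, while a much lower rate would discard part of it.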

But different microphones, of course, serve different purposes.

Different connectors side by side.
Lightning connector and USB-C port vs. Audio-Line-In cable.

Requirements for specialized mikes

The requirements depend on the purpose or setting: a monologue (uni-directional) vs. a discussion among several speakers (omni-directional).

  • Uni-directional = for one speaker
    • Also called directional or shotgun mikes, uni-directional mikes capture sound coming primarily from one direction.
  • Omni-directional = for many speakers
    • Omni-directional mikes capture sound coming from all directions, which is what you want in a group setting or a boardroom meeting.
Diagram of uni- and omni-directional microphones.
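To put rough numbers on the difference, here is a small Python sketch based on the textbook first-order pickup formula a + (1 - a)·cos(angle). It uses a cardioid (a = 0.5) as a stand-in for a uni-directional mike; that is an assumption for illustration, since real shotgun mikes have narrower patterns and none of the tested models were measured this way. The function and parameter names are hypothetical.

```python
import math


def sensitivity(angle_deg: float, omni_share: float) -> float:
    """Relative pickup of a first-order pattern: a + (1 - a) * cos(angle).

    omni_share = 1.0 gives an omni-directional pattern,
    omni_share = 0.5 gives a cardioid (one common uni-directional pattern).
    """
    a = omni_share
    return abs(a + (1.0 - a) * math.cos(math.radians(angle_deg)))


if __name__ == "__main__":
    # 0 deg = speaker directly in front of the mike, 180 deg = directly behind.
    for angle in (0, 90, 180):
        omni = sensitivity(angle, omni_share=1.0)
        cardioid = sensitivity(angle, omni_share=0.5)
        print(f"{angle:3d} deg  omni={omni:.2f}  cardioid={cardioid:.2f}")
```

The omni-directional pattern picks up every direction equally, while the cardioid drops to about half at the side and to almost nothing from behind, which is why it suits a single speaker.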

The Proof is in the Pudding

We singled out and tested 6 mike options for comparison. Ranked from best performing downwards, the mikes are:

Mike 1 (best performing), pros:
  • Most crisp sound quality
  • Best value for money
  • Compatible with smartphones, tablets and other audio-video recorders
  • Omnidirectional pickup pattern
Mike 1, cons:
  • Long cables attached to speakers can be a fuss to set up
  • May not work well with small tape recorders

Mike 2, pros:
  • Compatible with both Android and iOS
  • Volume control and LCD level meter monitoring
  • Comes with aluminium handgrip
Mike 2, cons:
  • Expensive, especially for students or freelancers
  • Aluminium body causes knocking and rumbling sounds in the recording if the surface it is placed on occasionally moves, e.g. a wobbly table

Mike 3, pros:
  • Compatible with most Apple devices
  • Ensures clean sound with no signal overload
  • Powered by the connected iOS device
  • Mid-side (M-S) stereo microphone
Mike 3, cons:
  • Does not support Android devices

Mike 4, pros:
  • Small, attachable T-shaped microphone
  • Records in WMA format in high quality
  • During recording, the LC display shows the recording levels for the right and left channels
Mike 4, cons:
  • The cable shielding is not the best, which means the mike can add a lot of distortion

Mike 5, pros:
  • Stereo mini-jack headphone
  • Compatible with both Android and iOS
  • Easy set-up: just plug in and go
Mike 5, cons:
  • With some background noise, the mike may produce a high-pitched whine around voices

Lastly, record in .wav format since this allows for higher quality recordings than .mp3 or .mp4.
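A .wav file typically stores uncompressed PCM samples, so its header tells you exactly what was recorded. The sketch below uses Python's standard `wave` module to read the channel count, sampling rate, bit depth and duration of a recording. The helper names and the demo file name are placeholders, and the script writes one second of silence first so that it runs without a real recording.

```python
import os
import tempfile
import wave


def describe_wav(path: str) -> dict:
    """Read basic recording properties from a PCM .wav file header."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bit_depth": wav.getsampwidth() * 8,  # bytes per sample -> bits
            "duration_s": frames / rate,
        }


def write_silence(path: str, seconds: int = 1, sample_rate_hz: int = 44_100) -> None:
    """Write a mono 16-bit silent .wav so the example runs without a real recording."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00\x00" * sample_rate_hz * seconds)


if __name__ == "__main__":
    # Placeholder file; point describe_wav at your own recording instead.
    demo_path = os.path.join(tempfile.gettempdir(), "demo_recording.wav")
    write_silence(demo_path)
    print(describe_wav(demo_path))
```

Running `describe_wav` on your own file before uploading is a quick way to confirm that the recording really was saved at the sampling rate and bit depth you intended.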

Happy recording!