As a young news reporter I remember holding a mini-cassette recorder in my palm during interviews. When the interview was finished I would rewind the tape to the top. Hit PLAY. Then STOP. Then scribble down the words. PLAY. STOP. SCRIBBLE. REWIND. PLAY. STOP. SCRIBBLE. Those cassette recorders have been replaced by digital recorders and iPhones, but that tedious and time-consuming process of transcribing audio to text hasn’t changed.
Nowadays, Automated Speech Recognition (ASR) can deliver astonishingly accurate transcripts. But the enduring problem with audio transcription software is that it makes mistakes. We humans speak imperfectly: we swallow words, stumble and interrupt one another; many of us speak with heavy accents. Poor audio quality (distant microphones, background noise, music, room echo) can make it hard (or impossible) for AI-powered audio-to-text converters to decipher content. Many proper names and technical terms won’t be recognized. That means the output of transcription software is inherently flawed. The challenge is to push automated transcription into new frontiers so that it solves workflow problems without creating a bigger one: unreliable content.
In 2013, while I was ABC News London Correspondent, I began teaching a university course in 21st Century Journalism for American and Canadian students spending a semester in London. A friend took me to Mozfest, an annual coding conference in London that showcases cutting-edge media innovation. That’s where I was introduced to Mark Boas and Laurian Gridinoc (and later Mark Panaghiston), a brilliant team of developers who had devised some astonishing tech: it took manual transcripts and automatically glued the text to the original audio/video, letting you search and manipulate the a/v by moving text on the screen. “Wouldn’t this work with automated transcription software?” I asked. I felt like I was looking at the future. It was a true light bulb moment of revelation that began a journey for all of us into entrepreneurship, invention and the world of startups.
Fast-forward a year to December 2014. Mark, Mark, Laurian and I holed up in an Airbnb in Florence for a couple of weeks to scope out a prototype that would eventually become Trint. We spent hours on research calls on Skype. We spoke to journalists, tech managers and editors at news organizations in North America and Europe, including CNN, The Guardian, CBC Radio, ABC News and The New York Times.
I remember a senior vice president at CNN offering us some pithy advice. “You are focused on our single biggest technological challenge,” he said. He explained that CNN has more than 100,000 hours of recorded content streaming into its servers every week (most of it recorded talk in the form of interviews, speeches and news conferences), and none of it is searchable without labor-intensive and costly manual audio-to-text transcription. He added that they would never touch automated speech-to-text until they knew it wouldn’t burn them.
That became Trint’s rallying cry.
We solved the puzzle by merging two pieces of software into one: marrying a text editor to an audio/video player. That means with Trint users get the speed and affordability of online transcription software, and they can easily search, verify and, if necessary, correct the machine-generated output. When you play a Trint in your browser, you can follow the transcript karaoke-like and fix any transcription errors directly in the browser. No more ping-ponging between your audio player and a Word document.
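The karaoke-style sync described above depends on word-level timestamps from the speech recognition engine: each word in the transcript knows when it is spoken, so the editor can highlight the current word during playback and jump the player when you click on text. This is not Trint's actual code, just a minimal sketch of that lookup logic with hypothetical timestamp data:

```python
from bisect import bisect_right

# Hypothetical ASR output: each transcript word carries a start time in
# seconds. Real engines also supply end times and confidence scores.
words = [
    {"text": "Hello", "start": 0.0},
    {"text": "and", "start": 0.6},
    {"text": "welcome", "start": 0.9},
    {"text": "to", "start": 1.5},
    {"text": "the", "start": 1.7},
    {"text": "show", "start": 1.9},
]

starts = [w["start"] for w in words]

def word_at(playback_time: float) -> int:
    """Index of the word being spoken at playback_time (the karaoke highlight)."""
    return max(bisect_right(starts, playback_time) - 1, 0)

def seek_time(word_index: int) -> float:
    """Clicking a word in the transcript jumps the player to its start time."""
    return words[word_index]["start"]
```

With this alignment in place, editing becomes verification: at playback time 1.0s, `word_at` returns index 2 ("welcome"), and clicking "show" (`seek_time(5)`) rewinds the player to 1.9s so you can hear exactly what was said.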
This is disruptive technology created by a truly brilliant team of developers. It is about making it easy to transcribe audio to text and get the most out of it in less time for much less money. Over the next year you’ll see us roll out an array of features that will transform Trint from an audio to text converter into an end-to-end publishing platform.
JEFF KOFMAN
CEO and Co-Founder