Every quarter I generate a transcript of some or all of Apple’s conference call with analysts. I’ve used any number of methods to generate this transcript, from brute force to double-teaming in a Google Doc to using a tool to time-shift the audio, making it easier to transcribe. The most recent manifestation of this involved me splitting the job with Serenity Caldwell and using Audio Hijack to time-shift the call.
This time, though, I tried something else. (In part, it was out of desperation: I wanted a transcript of the questions and answers because it helps me write post-call stories, but I was also pretty under the weather and couldn't bear to type the entire thing out.)
The other week, in the aftermath of my complaints about the flaws in speech-to-text transcription services, I got an email about a new speech-to-text transcription service called Trint. What makes Trint different is probably not its text-conversion engine but the web app the service has built around that engine. When you upload an MP3 file to Trint, it converts the audio to text and puts the result in a web-based editor that's synced directly with the timestamps of the audio file.
In other words, if I click on a word in a Trint transcription file, it plays the recorded audio from that word. This makes it very easy to follow along in the Trint editor and clean up the transcript as I go. You can even set the editor to play back audio at slower than normal speed (or faster!), which can allow you to really get in a groove. And there are keyboard shortcuts to pause and jump back a few seconds, which are key features if you’re trying to get through a transcript quickly and you missed a couple of words.
So on Tuesday afternoon, here's what I did: I recorded the Apple conference call using Audio Hijack and, after a few minutes, clicked the Split button, which stops recording to one MP3 file and starts recording to a new one. Then I uploaded the previous file to Trint, and a minute or so later I began editing the transcript generated by Trint's speech-to-text engine. Rather than transcribing the call from scratch, I was now editing a flawed machine-generated transcript, which requires far less typing and is therefore a much faster process.
When I reached the end of the first audio file, I'd click Split again, upload the new MP3 file, and keep transcribing. Repeat until the call is over, and you end up with a full transcript that's much easier to create and is finished not long after the call itself ends. (Trint lets you export in various file formats; I exported mine as a Word file and then ran a bunch of search-and-replace operations to get it formatted the way I wanted.)
The result, posted here, isn’t perfect—it’s got a bunch of typos I should have caught that I’m going to chalk up to my illness more than my choice of transcription methods. Still, I’d much rather edit a transcript than type it all myself, especially if it can happen in almost real-time.
More broadly, Trint’s approach—to make it easy to compare the audio clips to the transcript as you’re verifying and editing it—is exactly the right one. If you ever find yourself needing to make a transcript, it’s worth a look. Trint offers monthly membership plans, but there’s also a $15/hour pay-as-you-go plan. It was certainly worth it for me.
Originally published here.