英语轻松读发新版了,欢迎下载、更新

Whisper is an AI-powered jet engine for writing – Six Colors

2025-05-30 15:30:00 英文原文

作者:May 30, 2025 8:30 AM PT

About four weeks out from a manuscript deadline—already a month or two behind schedule—I broke my arm.

Well, technically, someone else broke it. A Muay Thai fighter, during a sparring session in a martial arts class I definitely should not have been in. The next morning, I had to call my editor and fess up: instead of hammering away at the keyboard, I’d been getting my forearm snapped like a dry twig. Now I wasn’t going to be hammering away at anything except painkillers and regret.

Did he have any suggestions?

After some colorful swearing—we’re both Australian—it turned out he did. Half an hour later, I’d bought a copy of Dragon Dictate, or whatever it was called back then. It’s gone through a few names and versions over the years. But back then, it was my only option.

I don’t think I would’ve stuck with it if I hadn’t had 120,000 words due and no other way to deliver them. The learning curve was steep. And the Mac-flavored software which I was using was legitimately regarded as inferior to the Windows version. But I was desperate. So I persisted.

And I finished the book.

Surprisingly, I even got a bit of a productivity bump out of it.1 Even with all the bugs and weirdness of early speech recognition software, it was still better than my typing, and I stuck with it.

I’ve been using dictation software in one form or another for well over a decade now. I’ve seen a lot of development, not all of it good. For a while there, I dreaded the release of new Dragon updates, because they always seemed to break more than they fixed.

I also found that while Dragon was great for fiction, where you can get into a storytelling flow, it was less helpful for writing magazine features (back when magazines were a thing that existed). Maybe magazine copy demands more precision. My agent tells a great story about Robert E. Howard, Conan the Barbarian’s creator, standing at a mantlepiece over the fireplace in his home, where he’d rigged up an early standing desk arrangement, roaring and gesticulating as he told himself the story while she typed it up.

That’s how I dictate novels. It’s fun, and good cardio, but it doesn’t work so well when you’re trying to finely craft a heartbreaking work of narrative genius for the super-picky subeditors at The New Yorker.

Anyway, long story short, I’ve always been a dictation nerd. Constantly hunting down new software. Always hoping that the next version will shave a few more seconds off the process or give me a bit more accuracy.

Recently, I switched from Dragon, which had been baked into Microsoft Word, to MacWhisper Pro, an LLM-based app for macOS. I was already trying out a writing experiment, switching from apocalyptic novels to, er, spy romances, so I figured it was a good time to try some experimental dictation, too.

I was stunned by the results.

From a roar to a Whisper

Macwhisper Pro
MacWhisper Pro

AI-powered dictation—at least for me—has turned out to be significantly faster and more accurate than even the best, most expensive versions of the previous generation of software.

I think of it as taking a leap from a Mechanical Turk to a probability engine.

Older systems like early versions of Dragon Dictate relied on pattern matching and statistical models like Hidden Markov Models (HMMs). You had to train the software to your specific voice, accent, and vocabulary. Over time, it would “learn” your patterns and improve.

But the actual recognition process was pretty rigid: matching sound waves to a limited set of templates, then mapping those to words using your trained vocabulary. These systems did run some probability calculations, trying to work out the most likely sequence of words from the sounds you made, but their contextual awareness was limited. They focused on individual words or short phrases. Not the broader meaning of the sounds.

Modern AI-powered tools like Whisper (the engine behind MacWhisper Pro) are a different beast. They use large neural networks—often Transformer architectures—trained on hundreds of thousands of hours of diverse audio and text. (Some of it stolen from me. You’re welcome.) They don’t need training like Dragon did. They just work, straight out of the box, for a wide range of accents, languages, and speaking styles.

As best I understand it, these models predict the most likely sequence of words from the audio input, based on the sound, but also the context of the whole sentence or even the paragraph. That allows them to handle ambiguity, background noise, and weird phrasing far better than older systems ever could.

They’re not just listening—they’re calculating. Continuously.

Which is why I think of the old systems as Mechanical Turks. They were rule-bound, brittle, and prone to failure outside their narrow training. Today’s probability engines use immense context to serve up the next most likely word. And no, it’s still not “thinking,” but the change in how it feels to use is dramatic.

For me, there are two immediate and critical differences between the old and new systems.

First, when I was writing using Dragon, I had to dictate everything—including the punctuation. Over the course of a 100,000-word novel, that adds up to thousands, maybe tens of thousands, of spoken commands: quotation marks, question marks, line breaks, paragraph breaks, ellipses, em dashes, and so on.

Every one of those was an opportunity for the software to mishear what I said and transcribe something else. A lot of the time I saved not hunting and pecking at the keyboard, I lost fixing the errors the program introduced by misinterpreting spoken punctuation.

Second, and even more powerful, is the freedom to correct yourself as you speak and just let the AI clean it up. It’s not unusual for me to get halfway through a sentence, realize I’ve butchered it, and say something like:
 “Ugh, that’s terrible—delete that, let’s try again.” 
Then I start over just like I would if I were dictating to a human taking notes.

Between those two improvements—out of what are probably a dozen major UX differences between the old-school dictation models and these newer, Whisper-based tools—the boost to my daily productivity has been astounding. What Tim Cook would call ‘blowaway.’

I used to aim for 1,500 words a day (the Antony Johnston-approved benchmark for a solid writing day), and by the end of it, I’d be wiped out. With these new tools, I regularly hit between 4,000 and 5,000 words daily, and the cognitive load feels much lighter.

One obvious sign of this: I’m taking fewer naps in the afternoon. I’m just not as wrecked by the day as I used to be.

I’ve been using MacWhisper Pro for about six months now, so I feel confident saying the changes I’ve seen are deep and structural. This isn’t a novelty bump. It’s a genuine shift in how I work and how much I can produce without burning out.

It means I’m likely to be more productive in the next twelve months than I have been in the last four or five years.

Bots off my words

I’m really looking forward to writing more books.

I’m not, however, looking forward to the holy war that feels like it’s coming.

Because there’s a second step to getting a good, clean copy out of a dictation rig like MacWhisper Pro: You have to feed the transcript to an AI like Claude or ChatGPT and ask it to clean it up for you.

The prompt I used for this piece, for instance, was: “I recorded this blog post using a speech recognition AI, so it rambles around a bit and is full of transcription errors and artifacts. Can you clean it up while keeping as close to my intended tone and content as possible?”

It’s not generative writing. It’s not even close. But for a lot of writers, it’s too much. The fear and loathing of AI is already so profound that any touch of the bots on your copy is anathema.

I wrote tech columns for ten years before I wrote novels, so I guess I’m less given to fear and loathing of our silicon friends. But I understand why my fellow writers feel that way. Part of the reason these models are so good at accurately interpreting not just what I said, but what I meant, is that they have consumed every word I ever published. They did it without my permission, and the billionaires who own and run these companies say they can’t possibly afford to pay for any of it.

So I understand the fear and loathing.

But to me, these things feel like jet engines. They’re incredibly fast and powerful, and they do amazing things. It’s just best not to think about where they came from.

[John Birmingham is the author of numerous novels, including the Axis of Time series, The Cruel Stars, and Zero Day Code. You can sign up for his newsletter or read his works in progress on his Patreon.]

If you appreciate articles like this one, support us by becoming a Six Colors subscriber. Subscribers get access to an exclusive podcast, members-only stories, and a special community.

关于《Whisper is an AI-powered jet engine for writing – Six Colors》的评论


暂无评论

发表评论

摘要

An author broke their arm while training in Muay Thai, forcing them to switch from typing to dictation software to meet manuscript deadlines. Initially using Dragon Dictate due to necessity, they found the transition challenging but ultimately productive. Over time, advancements in speech recognition technology led to more accurate and faster tools like MacWhisper Pro, an AI-based application that significantly boosted their writing productivity. The author now sees modern AI as a game-changer for dictation, offering substantial improvements over previous software. However, they anticipate resistance from other writers regarding the use of AI in post-processing written content, despite its efficiency and accuracy.

相关新闻