AI Voice Narration for Authors: An Honest Look at ElevenLabs (and Why Amazon’s Version Falls Short)

I’ve wanted to get into audiobooks for a while. Last year, I finally bought a microphone and one of those mini sound booths that sits on top of your desk. So classy. The problem was that I was living with someone who also worked from home, and a quiet house was basically a myth. Good recording windows existed, but they were short and unpredictable, and I couldn’t exactly sit hunched over a desktop booth waiting for them.

So I started looking at alternatives. That’s how I ended up at ElevenLabs, which let me create an AI voice clone of myself. I trained it on my own voice recordings to narrate my short story On the Fringes. Here’s what that experience was actually like.

How the Voice Cloning Process Works

The training process took about two hours. I recorded audio of myself reading one of my own short stories, plus bits of two different novels, to give the system a range of tones and genres to work with. More audio means a more accurate clone, so I wanted to give it enough to really capture my voice.

My super-classy homemade soundbooth. I used a duvet and my yoga mat to cover a lot of openings.

I have to say, I was genuinely impressed by what came back. It didn’t sound like me in everyday conversation, but it did sound like my voice in a sound booth—clear and polished. ElevenLabs also lets you go back through the narration afterward, listen to everything, and re-record the moments that didn’t sound right, such as a mispronounced word or a line that needed a different tone. That editing pass is what separates this from just pressing a button and hoping for the best.

What ElevenLabs Gets Right

The editing feature is honestly what makes ElevenLabs stand out to me. You’re not just generating a narration and hoping it sounds okay. You can go through the whole thing, catch the moments that didn’t come out right, and fix them. That step matters. A lot of AI-narrated audiobooks on Amazon right now clearly skipped it entirely, and you can tell.

The other thing I appreciated was that it’s still my voice. I recorded the source audio, trained the clone, and went back to review everything afterward. That feels different to me from just selecting a random AI voice from a dropdown menu, and yet publishing platforms still classify both the same way. Honestly, I think there should be a third category for this. It’s not a human performance in the traditional sense, but it’s not a generic AI voice either. It’s somewhere in between, and that distinction feels worth making.

Where Amazon’s AI Narration Falls Short

Amazon has its own AI audiobook feature (still in beta as of this writing), and I gave it a try. I won’t be using it again anytime soon.

The two things that really got me: first, you can’t use your own voice clone. You just pick from whatever AI voices they’ve made available. And second—and this is the one that genuinely baffled me—there’s no way to go back and fix anything. If the AI mispronounces a word, misreads the tone of an emotional moment, or butchers an acronym, you just have to live with it. There’s no editing pass. No corrections. You accept it as is, or you start over. For a company with Amazon’s resources, I was honestly shocked that feature wasn’t built in from the start.

Did It Actually Save Time?

Hmm… Maybe up front, but I’m not sure it saved me much time overall.

Generating the narration is faster than sitting in a recording booth for hours, but you still have to go back and listen to everything carefully, catch the problems, and re-record the sections that didn’t come out right. By the time I’d done all of that, I wasn’t convinced I’d come out significantly ahead compared to just recording it myself from the start.

That said, I think time is the wrong metric for a lot of authors. After all, you could sit in front of the screen for eight hours one day and accomplish nothing, then spend two hours the next day and knock everything off your to-do list. Measuring authors by time is like measuring software developers by time: it tells you very little. (The same goes for words written or lines of code written.) What we should be looking at is outcomes and their quality.

In the end, if you don’t have a usable recording environment, AI voice narration might not save a ton of time, but it is what makes the project possible at all.

Who This Is Actually For

AI voice narration makes the most sense if:

  • You don’t have access to a quiet space or a sound booth
  • You’re trying to record around noise you can’t control (kids, construction, city traffic, roommates, dogs)
  • You want an audiobook, but can’t afford a professional voice actor
  • You want to use your own voice, but can’t commit to hours of recording sessions

If you have reasonable recording conditions and a microphone, doing it yourself might be just as fast and give you more control over the final product. That’s roughly where I landed after this experiment. Now that my recording environment has changed, I’d probably record the next one myself.

But I don’t regret the experience. And I’d encourage readers not to dismiss AI-narrated audiobooks without context. That voice might be a clone of the author’s voice, built specifically to sound as authentic as possible, used because their life doesn’t provide a good sound setup.

The Bottom Line

ElevenLabs is the better tool by a significant margin because it gives you editing control and lets you work from your own voice. Amazon’s version, at least in its current state, isn’t there yet.

If you’re an author weighing this option: go in with realistic expectations about total time investment, review everything before you publish, and treat the editing pass as non-negotiable. The difference between a careful audiobook and a careless one is audible.

If you’d like to hear the final results of my experiment, you can find links to the audiobook version of On the Fringes below.

As always, thank you for reading.
-Eliza
