Bimodal Reading in Balabolka, Microsoft Edge's Read Aloud, and Audible's Immersion Reading

TL;DR:

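Option                        | Voice | Control | Visual  | Price
Balabolka with paid TTS voice | *     | ****    | ****    | *
Microsoft Edge’s Read Aloud   | **    | **      | **      | /
Audible’s Immersion Reading   | ****  | ? (***) | ? (***) | ****
Generic audiobook with ebook  | ***   | *       | *       | ?

More stars is better for Voice, Control, and Visual. For Price, more stars means more expensive, and “/” means there is nothing to pay beyond the ebook. A “?” marks a rating I’m unsure of.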

Introduction

I’ve been using Balabolka for bimodal reading for a few years now. That means reading the highlighted text while simultaneously listening to a Text-To-Speech (TTS) voice read it, with the audio and text synchronized. For me, the main advantage is that it stabilizes and focuses my attention, which makes reading more immersive. Having the current word highlighted is like running a pen along the lines of a physical book, while the audio delivers the same input in a second form.
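Under the hood, the synchronization comes down to a lookup: given the current playback time, which word is being spoken? Below is a minimal Python sketch. It assumes the TTS engine reports a start time for every word. The `TIMINGS` data and the function names are hypothetical, and this is not how Balabolka is actually implemented.

```python
from bisect import bisect_right

# Hypothetical word timings for one sentence: (start time in seconds, word).
# A real TTS engine would report these as it speaks; these numbers are made up.
TIMINGS = [
    (0.00, "Reading"),
    (0.45, "the"),
    (0.60, "highlighted"),
    (1.20, "text"),
    (1.60, "while"),
    (1.95, "listening"),
]


def word_at(timings, t):
    """Index of the word being spoken at playback time t (None before the first word)."""
    starts = [start for start, _ in timings]
    i = bisect_right(starts, t) - 1  # last word that started at or before t
    return i if i >= 0 else None


def render(timings, t):
    """The sentence as displayed at time t, with the current word in [brackets]."""
    current = word_at(timings, t)
    return " ".join(f"[{w}]" if i == current else w for i, (_, w) in enumerate(timings))


print(render(TIMINGS, 1.3))  # Reading the highlighted [text] while listening
```

Because the start times are sorted, a binary search finds the current word without scanning the whole sentence on every audio frame.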

What is bimodal reading?

According to Bimodal Reading: Benefits of a Talking Computer for Average and Less Skilled Readers:

Quote
Bimodal reading is presenting written material in auditory and visual modes simultaneously— a bimodal presentation of text material.

Advantages of Bimodal Presentation

Quote

Various studies have investigated individuals' task performance when they were given auditory, visual, or bimodal stimuli. By measuring participants' reaction time for a decision regarding the presented stimuli, researchers have commonly found a facilitative effect of the bimodal condition when stimuli presented to each sensory channel were the same or functionally related. Kinchla (1974) referred to this effect as the redundant signals effect (RSE). The RSE has shown to be a rather robust phenomenon in work with […] more complex stimuli, such as letters and words. That is, subjects typically respond more accurately or quickly to redundant bimodal stimuli than to unimodal stimulus presentations.

[E]nhanced recall due to bimodal redundancy has been documented in various research paradigms. Penney (1989) […] showed evidence of a bimodal memory advantage compared to recall of information in single-mode presentations. Since then, others have shown that short-term retention is improved when an item (e.g., word or digit string) is presented to visual and auditory channels simultaneously. […] Collectively, these studies suggest that individuals remember more of what is presented when information is delivered bimodally.

Table comparison

Here’s the same table from TL;DR:

Option                        | Voice | Control | Visual  | Price
Balabolka with paid TTS voice | *     | ****    | ****    | *
Microsoft Edge’s Read Aloud   | **    | **      | **      | /
Audible’s Immersion Reading   | ****  | ? (***) | ? (***) | ****
Generic audiobook with ebook  | ***   | *       | *       | ?

Voice: Audible > Generic audiobook > Microsoft Edge > Balabolka

Control: Balabolka > Audible > Microsoft Edge > Generic audiobook

Visual: Balabolka > Audible > Microsoft Edge > Generic audiobook

Price (most expensive first): Audible > Generic audiobook > Balabolka > Microsoft Edge

I. Balabolka with a paid TTS voice

Balabolka’s single Voice star assumes a paid TTS voice such as Salli or Ava. The free SAPI5 voices that come with Windows are absolutely horrendous and therefore unusable. Without at least a decent TTS voice, I wouldn’t consider bimodal reading at all.

Balabolka earns four stars for control because it offers amazing customizability: shortcuts to control speed, pitch, “skip to previous or next sentence”, etc. I’ve also made a vim-like AutoHotkey (AHK) script to control the interface. This isn’t possible in Edge, which offers very limited control.
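To illustrate the kind of keyboard-driven control described above, here is a hypothetical Python sketch of a vim-style key map over a TTS player. The `Player` class and the bindings are made up for illustration: j/k move between sentences, h/l change speed, and space pauses. They are neither Balabolka’s own shortcuts nor the actual AHK script.

```python
from dataclasses import dataclass


@dataclass
class Player:
    """Hypothetical TTS player: a list of sentences, a cursor, and a speed setting."""
    sentences: list[str]
    index: int = 0
    rate: int = 0        # hypothetical speed offset; 0 = normal
    paused: bool = False

    def next_sentence(self):
        self.index = min(self.index + 1, len(self.sentences) - 1)

    def prev_sentence(self):
        self.index = max(self.index - 1, 0)

    def faster(self):
        self.rate += 1

    def slower(self):
        self.rate -= 1

    def toggle_pause(self):
        self.paused = not self.paused


# Vim-style bindings (hypothetical): j/k move between sentences, l/h change speed.
KEYMAP = {
    "j": Player.next_sentence,
    "k": Player.prev_sentence,
    "l": Player.faster,
    "h": Player.slower,
    " ": Player.toggle_pause,
}


def handle(player, key):
    """Run the action bound to `key`; unbound keys are ignored."""
    action = KEYMAP.get(key)
    if action is not None:
        action(player)


p = Player(["First sentence.", "Second sentence.", "Third sentence."])
for key in "jjkl":
    handle(p, key)
print(p.sentences[p.index], p.rate)  # Second sentence. 1
```

Keeping the bindings in one dictionary makes them easy to remap without touching the player logic.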

The Price star represents a one-time purchase of a TTS voice.

For a more detailed Balabolka description, please see Incremental Reading + Text-To-Speech (TTS) = 2x Concentration and Engagement

II. Microsoft Edge’s Read Aloud

2022-08-03 Update: I now use the browser add-on reader-view, which offers free Microsoft AI voices with a much better interface, customization, and controls.

Here’s a short introduction video on Microsoft Edge’s Read Aloud: How to make the web more accessible with Immersive Reader in Microsoft Edge!

Advantages

1. AI-powered voices

In my opinion, the biggest advantage of Read Aloud is its AI-powered voices. It currently offers various localized voices with different accents (for en-US, it offers Aria, Guy and Jenny), and it can read languages other than English as well. These AI-generated voices are tremendously better than any traditional Text-To-Speech (TTS) voice. Although Salli is already pretty good, it’s simply no match for Jenny. The pacing, pronunciation, and intonation are what make it sound “lifelike and natural-sounding”, as advertised. Once I’d tried these AI voices, it was hard to go back to traditional TTS voices. Now that these AI-powered voices have opened my eyes (ears), I notice how coarse, static, and low-information traditional TTS voices are.

2. Free voices

Edge’s Read Aloud is the cheapest option: you only need to pay for the ebook.

3. PDF support

./images/screenshot-2022-03-24-09-50-59.png

Some books aren’t available in any ebook format (epub/mobi/azw3), only as scanned PDFs. This is especially common with older titles and research papers. With PDF support, I can now open any PDF in Edge and enjoy the AI-powered voice with real-time highlighting. In my experience, Edge simply provides the best PDF reading experience, and the speech produced from the OCR’d text is very accurate.

In the past, for any PDF, I’d follow Free Tesseract OCR to Convert PDF into Editable Text for Incremental Reading. That meant a tedious upfront cost before I could start bimodal reading: converting the PDF to text with Tesseract OCR, then pasting the text into Balabolka. Edge reduces this process to a few clicks, which makes the workflow much more fluent and convenient.
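For reference, the old OCR step can be sketched in Python by calling two command-line tools. This assumes Poppler’s `pdftoppm` and `tesseract` are installed and on PATH. The folder names, file names, and the script name `pdf_ocr.py` are hypothetical. This is a sketch, not the exact procedure from the linked post.

```python
import subprocess
import sys
from pathlib import Path


def rasterize_cmd(pdf: Path, prefix: Path, dpi: int = 300) -> list[str]:
    """pdftoppm (Poppler) renders every page to <prefix>-<page>.png."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def ocr_cmd(image: Path) -> list[str]:
    """Using 'stdout' as the output base makes tesseract print the text instead of writing a file."""
    return ["tesseract", str(image), "stdout"]


def pdf_to_text(pdf: Path, workdir: Path) -> str:
    """Scanned PDF -> one PNG per page -> OCR text, pages joined in order."""
    workdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(rasterize_cmd(pdf, workdir / "page"), check=True)
    pages = []
    # pdftoppm zero-pads page numbers, so sorting the file names keeps page order.
    for image in sorted(workdir.glob("page-*.png")):
        result = subprocess.run(ocr_cmd(image), check=True, capture_output=True, text=True)
        pages.append(result.stdout)
    return "\n".join(pages)


if __name__ == "__main__" and len(sys.argv) > 1:
    source = Path(sys.argv[1])
    text = pdf_to_text(source, source.parent / "ocr-pages")
    source.with_suffix(".txt").write_text(text, encoding="utf-8")
```

Run it as `python pdf_ocr.py scanned.pdf` and the text lands in `scanned.txt`, ready to paste into a TTS program.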

Disadvantages

1. Substantial delay of voice output

There’s a 1-2 second delay when going between paragraphs. For every paragraph, Edge communicates with Microsoft’s server, and sending the request and receiving the response take time. It’s like playing a video game with a high ping. A 1-2 second delay may not sound like much, but it often breaks the flow of speech.

./images/ping.png
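Here is a back-of-the-envelope Python sketch of why a short delay adds up. It assumes each paragraph waits for its own round trip before it plays. The chapter size and the 1.5-second delay (the middle of the 1-2 second range above) are hypothetical numbers.

```python
def listening_time(paragraph_durations_s, delay_s):
    """Sequential model: each paragraph is requested, the reply arrives after
    delay_s seconds, then it plays. Returns (speech, silence, silent fraction)."""
    if not paragraph_durations_s:
        return 0.0, 0.0, 0.0
    speech = sum(paragraph_durations_s)
    silence = delay_s * len(paragraph_durations_s)
    return speech, silence, silence / (speech + silence)


# Hypothetical chapter: 40 paragraphs of about 20 seconds each, 1.5 s delay per paragraph.
speech, silence, fraction = listening_time([20.0] * 40, 1.5)
print(f"{speech:.0f} s of speech, {silence:.0f} s of waiting ({fraction:.0%} of the session)")
# 800 s of speech, 60 s of waiting (7% of the session)
```

A minute of dead air per chapter is small in total. The real cost is that it arrives as dozens of separate interruptions.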

2. Inferior screen scrolling

Even though both Balabolka and Edge’s Read Aloud automatically scroll the screen as they highlight the next line of text, the scrolling is much smoother in Balabolka. In Edge it’s jagged and jumps abruptly from one sentence to the next. Also, Balabolka always keeps the currently highlighted text in the center of the screen, whereas Edge’s Read Aloud doesn’t (image below). As a result, Edge is less comfortable on the eyes and harder to follow along with than Balabolka.

./images/screenshot-2022-03-23-20-16-33.png
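Centering the highlighted line is a small calculation. Smooth scrolling then means moving toward that target gradually instead of jumping there. The Python sketch below works under those assumptions. The pixel values are made up, and neither Balabolka nor Edge necessarily works this way.

```python
def centered_scroll_top(line_top, line_height, viewport_height, content_height):
    """Scroll offset (in pixels) that puts the highlighted line in the middle of
    the viewport, clamped so the view never scrolls past the document's ends."""
    target = line_top + line_height / 2 - viewport_height / 2
    max_top = max(0, content_height - viewport_height)
    return min(max(target, 0), max_top)


def smooth_step(current, target, factor=0.25):
    """One animation frame: cover a fixed fraction of the remaining distance,
    so the view glides toward the target instead of jumping there."""
    return current + (target - current) * factor


# Hypothetical numbers: 800 px viewport, 5000 px document, 20 px lines.
target = centered_scroll_top(2000, 20, 800, 5000)  # 1610.0
frames = [0.0]
for _ in range(5):
    frames.append(smooth_step(frames[-1], target))
print(target, [round(f) for f in frames])
```

Near the top or bottom of a document, the clamping means the highlighted line can't be centered. That is an unavoidable exception.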

3. Inferior voice control

In Balabolka, you can skip between sentences: if your mind wandered during the previous sentence, you can always go back one sentence. In Read Aloud, however, you can only jump between paragraphs. In other words, you can’t go back to the previous sentence. Your only option is to go back one paragraph and wait until it reaches the target sentence. Of course, you could use the mouse instead, but I find that tiring. The only keyboard shortcut Microsoft provides is Ctrl+Shift+U to start or stop Read Aloud. So I had to write an AHK script that operates three buttons on the Read Aloud bar: previous paragraph, next paragraph, and voice speed.

./images/screenshot-2022-03-23-20-18-44-2.png
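The cost of paragraph-only navigation can be made concrete with a small Python model. It assumes a document is a list of paragraphs, each of which is a list of sentences. The `DOC` contents and the function names are hypothetical.

```python
# A document as paragraphs of sentences; a position is (paragraph, sentence).
DOC = [
    ["P1 S1.", "P1 S2.", "P1 S3."],
    ["P2 S1.", "P2 S2.", "P2 S3.", "P2 S4.", "P2 S5."],
]


def prev_sentence(doc, pos):
    """Sentence-level rewind (what Balabolka allows): step back exactly one sentence."""
    p, s = pos
    if s > 0:
        return (p, s - 1)
    if p > 0:
        return (p - 1, len(doc[p - 1]) - 1)
    return pos


def replay_before_target(doc, pos):
    """Paragraph-level rewind only: to hear the previous sentence again you restart its
    paragraph, so every sentence before it in that paragraph is replayed first."""
    _, s = prev_sentence(doc, pos)
    return s


print(replay_before_target(DOC, (1, 4)))  # 3: P2 S1-S3 play before P2 S4
```

With sentence navigation, rewinding always costs zero extra sentences. With paragraph navigation, the cost grows the further into a long paragraph you are.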

4. Only available in Edge on Windows

I tried Edge on Linux, and the Read Aloud function isn’t available:

./images/screenshot-2022-03-24-09-16-35.png

As you can see, Immersive Reader is available but there’s no Read Aloud function.

5. Privacy concerns

Since you’re sending data to a third-party provider (and it’s Microsoft, no less), there are potential privacy concerns. Keep this in mind before deciding to use this feature.

III. Audible’s Immersion Reading

Warning
I haven’t used Audible’s Immersion Reading. What follows is purely speculation based on some online research, hence the “?” in the table. Take it with a grain of salt.

Quote
Immersion Reading […] allows you to read a Kindle eBook and listen to its professionally narrated Audible companion Audiobook – all at the same time. Not only that, but you get the benefit of real-time highlighting, making Immersion Reading a valuable tool to boost reading comprehension and overall retention of content. (source: What is Immersion Reading?)

Here’s a short video of Jeff Bezos introducing Whispersync for Voice and Immersion Reading.

Advantage

1. Premium bimodal reading experience

As you can see in the video above, the biggest advantage is the voice quality. Having a real narrator read the content to you is unrivaled. It’s simply a joy to immerse yourself in the narrator’s voice while reading the text as it is highlighted in real time. Amazon’s “Whispersync for Voice” and “Immersion Reading” technologies are what make this bimodal reading possible.

Disadvantages

1. Difficult integration into SuperMemo’s Incremental Reading

With Balabolka and Read Aloud, it’s easier to integrate the workflow into SuperMemo, because they’ll read anything you feed them. For example, I have an AHK script that sends a SuperMemo Article (Articles are simply local HTML files) to Balabolka or Edge and then starts bimodal reading. It’s a one-click operation.

However, Audible is different. First, you need to navigate the Audible Windows app: find the book title, search the content, and then start bimodal reading. I suspect you can’t do this entirely with the keyboard. Second, you have no control over the text in Audible, so you can’t modify it. In Incremental Reading, we constantly modify the original text, e.g., to add context, remove redundant text, and extract portions of it. That makes it significantly harder to incorporate Audible into the Incremental Reading workflow.

If I had the option to use Audible’s Immersion Reading, I’d use it for something completely new, e.g., a new book chapter. I’d then rely on Read Aloud to listen to the modified and shortened Extracts.
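The AHK script itself isn’t shown here. As a hypothetical stand-in, this Python sketch covers one step such a pipeline might include. It pulls readable text out of an Article’s HTML with the standard-library `html.parser`, e.g., for a TTS program that takes plain text. Edge renders the HTML itself, so whether you need this step depends on the reader. The class and function names are made up.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect readable text from an HTML page, skipping <script> and <style>."""

    BLOCK_TAGS = {"p", "div", "br", "li", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")  # start block-level elements on a new line

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Plain text with one line per block and runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


sample = "<h1>Extract</h1><p>Some   text &amp; context.</p><script>track()</script><p>More.</p>"
print(html_to_text(sample))
```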

2. Expensive

You pay a premium for this ultimate bimodal reading experience. You purchase the Kindle ebook along with its companion audiobook, and Audible itself is a monthly subscription. Even though there’s a discount when you buy both the ebook and the audiobook, the cost can add up quickly if you’re a voracious reader.

3. Immersion Reading is not available for every book

“Not all audiobooks are Whispersync for Voice and Immersion Reading-compatible with Kindle eBooks.” (source)

IV. Generic audiobook with ebook

./images/screenshot-2022-03-24-11-38-56.png

Advantages

1. Cheaper or even free

Not all audiobooks are paid. You’d be surprised at how many free audiobooks are available at your local library or online. For example, LibriVox provides “free public domain audiobooks read by volunteers from around the world.” These free audiobooks almost always come with a free accompanying ebook. The Internet Archive and OverDrive also provide a lot of free ebooks.

If a free title isn’t available, you can shop around at, say, Kobo or audiobooks.com for better deals, since you’re not locked in to Kindle ebooks and Amazon’s Audible.

2. Real narration by a real human

If you can’t stand computer-generated TTS voices (Balabolka or Edge) and don’t want to use Audible, this is the only other way to get a real narrator.

Disadvantages

1. Worst visual

With a generic audiobook and an ebook, the speech and text aren’t synchronized, so the visual experience is far inferior. There’s no real-time text highlighting, and tracking the text on the screen can be visually tiring.

2. Worst control

You can’t skip to the “previous sentence” or even the “previous paragraph” of the audio, because it’s just a set of audio files. Skipping between portions of the text is a trial-and-error process: sometimes you land too far back, other times not far enough. It gets annoying very quickly.
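To make the trial-and-error concrete: without synchronization data, the best guess for where a passage sits in the audio is its proportional position in the text. Below is a minimal Python sketch with hypothetical numbers and made-up function names. Narrators speed up, slow down, and pause, so the estimate drifts. That drift is why you still end up skipping back and forth.

```python
def estimate_timestamp(word_index, total_words, duration_s):
    """Rough audio position of a word, assuming the narrator speaks at a constant rate."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return duration_s * word_index / total_words


def hms(seconds):
    """Format seconds as H:MM:SS for a player's seek field."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Hypothetical chapter: 9,000 words narrated in one hour; jump to the word at index 4,500.
print(hms(estimate_timestamp(4500, 9000, 3600)))  # 0:30:00
```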

3. Very limited options for free titles

This is the opposite of Audible, where you pay a premium for professional narration. Free options like LibriVox are read by volunteers. Most of the time they’re just as good, but there’s no guarantee of quality. Moreover, free audiobooks are mostly limited to public-domain books, so I’d recommend trying your luck by searching your local library.

Which one should you use?

As with all things in life, it’s a trade-off. It depends on your priorities, your preferences, and the reading material.

If you want maximum control over the text, I recommend Balabolka. Maybe you’re reading something dense and technical and want to dissect and navigate different portions of the text easily, with reading as the primary focus and listening as an aid.

If you want a better TTS voice and are willing to give up some control over the text, I recommend Microsoft Edge’s Read Aloud. This is what I did. Since discovering its AI-powered voices, I’ve been very impressed by the voice quality, and I can live with the inferior automatic scrolling and voice control. And since it’ll read anything, just like Balabolka, integrating it into SuperMemo’s Incremental Reading is just as easy.

If money isn’t a concern, I recommend Audible’s Immersion Reading. It provides the best bimodal reading experience. If you’re reading fiction, this is the way to go. Have you tried listening to Harry Potter narrated by Stephen Fry? It’s an absolute joy. However, audiobooks have the worst SuperMemo integration.

Conclusion

If you use other technologies to do bimodal reading, please let me know. I’d love to hear from you.