Incremental Reading + Text-To-Speech (TTS) = 2x Concentration and Engagement

Photo by Sincerely Media on Unsplash

Use the free Text-To-Speech (TTS) program Balabolka when doing Incremental Reading in SuperMemo. Reading and listening simultaneously is more engaging than using either modality alone.

I read about the cognitive theory of multimedia learning a few years ago. Then a few weeks ago it struck me that I could use TTS alongside Incremental Reading. In this article I’ll share what I’ve found.

Brief introduction to Balabolka

The most important feature of Balabolka is that the current spoken word is highlighted. In other words, the text and the audio are synchronized: it provides real-time highlighting of the text as you hear the audio.


I believe it’s better to have redundant inputs through two sensory channels (visual and auditory) rather than through just one. This is known as Bimodal Reading.


From the graph (source), you can see that words will enter both the auditory (ears) and visual (eyes) channels. In the auditory channel, words are presented as narration, detected by the ears. Then, the learner mentally organizes the words into a verbal model. When you read, you will inevitably subvocalize (inner voice in your mind). With TTS, you don’t have to generate the sound yourself. You just need to hear and follow the TTS voice.

Benefits of TTS

1. Better concentration

A. Guided reading

Speech is synchronized with the words, and words are highlighted as they are read. These two features aid concentration. One promise of any speed-reading method or software is to reduce the effort of eye-tracking, and this program delivers exactly that benefit. With Balabolka, the word being spoken is highlighted, so you’re guided by the blue highlighter, and words already read remain blue. Other customizations include line spacing and auto-scrolling. This is like using a pen to guide yourself when reading. All of this reduces the chance of getting lost. Without TTS, you’d have to track the text word by word while subvocalizing at the same time. With Balabolka, both are done for you.

B. Anchored by two sensory channels

When reading without TTS, you are using only one input channel (eyes). With TTS, you have two input channels (eyes and ears) that help focus your attention, because you hear and read the same words simultaneously. The natural course of action is to follow along. This increases engagement for two reasons: increased sensory stimulation from using two sensory channels, and automatic highlighting.

You learn better and remember more with better concentration. Working memory is of paramount importance in learning. To maximize your learning, you need laser-like focus. Guided reading and being anchored by two sensory channels reduce unnecessary cognitive load, i.e., the burden of tracking words and subvocalizing.

2. Less mind-wandering

Better concentration implies less mind-wandering. Part of it is just that with intense focus comes the flow state and you’ll be lost in the content.

When your mind has lost its focus on the content and started wandering, you’re tempted to do something else. A minor distraction would be hitting “Next”; a major one would be giving up on SuperMemo to browse your news feed or watch YouTube videos.

However, with TTS, at least in my experience, two things act as anchors that keep me on track. The first is the auditory channel, which helps me keep following the text. The second is the more stimulating visual (automatic highlighting) instead of a plain wall of text in SuperMemo. This is like increased dependability via redundancy. Both my eyes and ears are anchored to the content. Without TTS, when I get bored, my tendency is to press “Next”. With TTS, my tendency is to follow along and see it through to the end. Once it starts talking, I don’t quite want to stop it midway.

Another reason for less mind-wandering has to do with silencing the inner voice, the subvocalization. Let me explain. (This is purely my speculation, without any scientific basis whatsoever.) When you want to procrastinate by, say, going on Instagram or getting a snack, you first need to have some sort of inner voice (“This is so boring and difficult. I wonder what’s new on my Instagram feed.”).

However, when I’m hearing and reading the same content at the same time, it feels like “stuffing” my sensory channels and thus stifling that inner voice, so it has no place to surface. The inner voice is so preoccupied with listening to the TTS that it has no room for other thoughts, particularly thoughts about procrastination like “This is so difficult and boring, let me do other stuff.” It has no choice but to listen actively to the TTS or, failing that, to hear it passively. You can quickly switch from focusing on the TTS voice to your procrastination voice, but you can’t have two inner voices at the same time.

3. Higher baseline reading speed

Your reading speed is largely determined by your subvocalization. The faster you can generate that inner voice, the faster your reading speed will be (comprehension, of course, is another matter). Your usual speed of subvocalization is a comfortable one, which means there is room for optimization without losing much, if any, comprehension.

For example, I have the Salli IVONA Voice. The maximal speed of Salli is 444 words/min. I’ve set the speech rate to 3 (1.3x), which is a bit faster than my usual comfortable reading speed. Of course, this is fluid and I change it throughout, but my baseline reading speed has increased, so I can read faster. Without the TTS voice to guide me, my reading would definitely be slower and of lower quality.
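To make the numbers concrete, here is a minimal sketch of the arithmetic in Python. Only the 1.3x multiplier and the 444 words/min ceiling come from my setup. The 180 words/min base rate and the 2,000-word article are made-up values for illustration, not measurements of Salli or of my own reading.

```python
MAX_WPM = 444          # Salli's maximal speed, as stated above
RATE_MULTIPLIER = 1.3  # speech rate 3, as stated above


def effective_wpm(base_wpm: float, multiplier: float = RATE_MULTIPLIER) -> float:
    """Scale a base speaking rate, never exceeding the voice's ceiling."""
    return min(base_wpm * multiplier, MAX_WPM)


def listening_minutes(word_count: int, wpm: float) -> float:
    """Minutes needed to listen to word_count words at wpm words per minute."""
    return word_count / wpm


if __name__ == "__main__":
    base = 180.0   # hypothetical base rate, an assumption for illustration
    words = 2000   # hypothetical article length
    fast = effective_wpm(base)
    print(f"Rate: {fast:.0f} words/min")
    print(f"At 1.0x: {listening_minutes(words, base):.1f} min")
    print(f"At 1.3x: {listening_minutes(words, fast):.1f} min")
```

The cap matters only when the scaled rate would exceed what the voice can do; below it, the time saved is simply the original time divided by the multiplier.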


What Does TTS Have to Do with Incremental Reading?

Incremental Reading is about the methodology and philosophy of Incrementalism (for more, see Philosophy of incremental Learning). How you do the actual reading, however, is another matter and has room for optimization. This is why I use TTS to enhance the reading experience.

TTS supplements Incremental Reading. It enhances the reading experience without derailing the core principles of Incremental Reading: spacing your reading, mixing learning materials, and active recall (turning extracts into cloze/Q&A items). TTS comes in between reading and extracting.


My Implementation: Balabolka

Balabolka is a freeware program. (I have no affiliation with Balabolka.)

I have SuperMemo on the left and Balabolka on the right (as in the picture above). I simply use an AutoHotkey (AHK) script to copy all the content from SuperMemo to Balabolka. For important content, I switch back to SuperMemo (Alt+Tab) and make an extract. I have also set some Balabolka shortcuts to control my listening (reading) flow and speed: f for next sentence, d for previous sentence, w for 10% slower, e for 10% faster, a to start listening, and jk to stop listening.
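As a rough mental model of what those shortcuts do, here is a small Python sketch. It is not Balabolka’s code or configuration format. The Player class, the sentence splitter, and the choice to treat “10% slower/faster” as multiplying the rate by 0.9 or 1.1 are all my own assumptions for illustration.

```python
import re
from dataclasses import dataclass
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


@dataclass
class Player:
    """Hypothetical model of the listening state, not Balabolka's internals."""
    sentences: List[str]
    index: int = 0
    rate: float = 1.0
    playing: bool = False

    def press(self, key: str) -> None:
        if key == "f":      # next sentence
            self.index = min(self.index + 1, len(self.sentences) - 1)
        elif key == "d":    # previous sentence
            self.index = max(self.index - 1, 0)
        elif key == "w":    # 10% slower (assumed multiplicative)
            self.rate *= 0.9
        elif key == "e":    # 10% faster (assumed multiplicative)
            self.rate *= 1.1
        elif key == "a":    # start listening
            self.playing = True
        elif key == "jk":   # stop listening
            self.playing = False


if __name__ == "__main__":
    player = Player(split_sentences("One. Two! Three?"))
    for key in ["a", "f", "e"]:
        player.press(key)
    print(player.sentences[player.index], round(player.rate, 2), player.playing)
```

The point of the model is that six keys cover everything I do while listening: move by sentence, nudge the speed, and start or stop.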


I’ve only tried Balabolka. I’m aware that better TTS synthesis is available, like Microsoft Azure or Google Cloud Text-to-Speech. I’ve tried these high-quality TTS voices, and they are indeed quite a dramatic improvement over the default Microsoft Zira voice. However, they require an Internet connection, and I have no idea how to integrate them with SuperMemo or Balabolka. In the end, I opted for the paid Salli IVONA Voice. It’s a good trade-off between quality and integration.

One annoyance

If I find something worth extracting, I have to stop the TTS voice, go back to SuperMemo, find the corresponding sentence or paragraph, and then extract it. I’ve created a function in SuperMemoVim to automate this process. It would be a dream to integrate Balabolka directly into SuperMemo, along with Microsoft’s or Google’s superior TTS voices.
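The “find that corresponding sentence or paragraph” step can be sketched in a few lines of Python. This is not the SuperMemoVim function, and it does not talk to SuperMemo or Balabolka at all. It only assumes you have the article as plain text and the sentence you just heard as a string; the function names are hypothetical.

```python
import re
from typing import Optional


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line breaks and case don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_paragraph(source: str, heard: str) -> Optional[str]:
    """Return the first paragraph of source containing the heard sentence.

    Paragraphs are assumed to be separated by blank lines.
    Returns None when nothing matches.
    """
    needle = normalize(heard)
    if not needle:
        return None
    for paragraph in re.split(r"\n\s*\n", source):
        if needle in normalize(paragraph):
            return paragraph.strip()
    return None


if __name__ == "__main__":
    article = (
        "You learn better with better concentration.\n\n"
        "Working memory is of\nparamount importance in learning."
    )
    print(find_paragraph(article, "working memory is of paramount importance"))
```

Normalizing both sides first is what makes the lookup tolerant of line wrapping and capitalization differences between the two windows.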

Closing Remarks

This is quite a breakthrough for me in terms of optimizing Incremental Reading (or reading in general). Even if you don’t do Incremental Reading, I believe using TTS alongside the text is better than using either one alone.