You may consider this Part II of How I Deal with PDFs. This article assumes your PDF belongs to type 3 (genuine and copyable content). Materials that are heavy in symbols and equations come with some caveats, so I’ve created this separate article.
Your PDF might be light or heavy in symbols and equations. The biggest problem with symbols and equations is that you can’t copy them into Word (or straight into SuperMemo, for that matter); they won’t display properly. The stakes are high: you may end up studying the wrong equations or getting unnecessarily confused. For both types, you should refer back to the original PDF for accuracy. The backbone should be the original PDF, not the content you copied into SuperMemo. During Incremental Reading, read from the PDF and copy only the important parts into SuperMemo for extracting.
Light in Symbols and Equations
Material that is only light in symbols and equations, relatively speaking, can be handled as described in “3. Extract Content From a Genuine PDF”. The copied content in SuperMemo is mainly for extracting important parts AFTER you’ve studied them in the PDF. In other words, you read from the PDF, not from the copied content in SuperMemo. When you come across something important, find the matching passage in SuperMemo and extract it.
Do It Only When You Need It
With such materials, there is another way to incorporate PDFs into Incremental Reading. Instead of keeping copied content in SuperMemo, you may create an empty dummy Article for each chapter. When this dummy Article comes up in the Outstanding Queue, open the source PDF, snap it to the left half of the screen, and snap Balabolka (for TTS) to the right half.
I use an AHK script that copies the genuine text, removes the line breaks in Word, and pastes the result into Balabolka so I can start reading and listening. I refer back to the PDF for symbols and equations. For any important part, I copy it from Balabolka, paste it into the dummy Article, and extract it.
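As a rough illustration of just the line-break removal step (the part my AHK script hands off to Word), here is a minimal Python sketch. The helper name `unwrap_pdf_text` is hypothetical. It assumes paragraphs in the copied text are separated by blank lines, which is common but not guaranteed for text copied out of a PDF, and it leaves end-of-line hyphenation alone.

```python
import re


def unwrap_pdf_text(text: str) -> str:
    """Join hard-wrapped lines inside each paragraph.

    Assumption: paragraphs are separated by at least one blank line.
    Hyphenated words split across lines are NOT rejoined.
    """
    # Normalise Windows line endings first.
    text = text.replace("\r\n", "\n")
    # Split on blank lines (which may contain stray spaces) into paragraphs.
    paragraphs = re.split(r"\n\s*\n", text)
    unwrapped = []
    for para in paragraphs:
        # Collapse every run of whitespace, including single newlines, to one space.
        joined = " ".join(para.split())
        if joined:
            unwrapped.append(joined)
    # Put the paragraphs back together with one blank line between them.
    return "\n\n".join(unwrapped)


if __name__ == "__main__":
    copied = ("Incremental reading lets you\nprocess many articles\nin parallel.\n\n"
              "This is a second\nparagraph.")
    print(unwrap_pdf_text(copied))
```

Running it prints the two sample paragraphs, each on a single line, separated by one blank line, which is the shape Balabolka reads most smoothly.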
I’m aware that this AHK script is not the most elegant, since it has to switch back and forth between four programs, but it is what I have and use. I would love to hear from you if you have a better solution. The benefit of this approach is that it doesn’t require any pre-processing. If you don’t need TTS, you can simply put the PDF and SuperMemo side by side.
Heavy in Symbols and Equations
Suppose your PDF is heavy in symbols and equations. Then there’s pretty much nothing you can do to streamline it. First, using TTS would (probably) have the opposite effect: distracting you rather than anchoring your focus. Second, there’s no way to quickly import it into SuperMemo. As far as I know, SuperMemo’s HTML content window doesn’t support MathJax, and I couldn’t figure out how to copy and paste equations into SuperMemo easily and elegantly. SuperMemo user Alesso has a great solution demonstrated here, but I still find typing out equations too time-consuming. Instead, my strategy is simply to use the Windows Snipping Tool and paste each clip as an image component.
From personal experience, it takes more time to fix and clean up the copied content than to just read from the PDF and copy as you go. As with light materials, I use a dummy Article as a chapter placeholder in SuperMemo. When it comes up in the Outstanding Queue, I open the original PDF (Win+S for a quick search) and do the reading in the PDF. I copy and paste the important parts into SuperMemo. For equations and symbols, I clip them (Win+Shift+S) and paste them into SuperMemo as images.
Removal of In-text Citations
Your source material may be heavy in in-text citations. I remove them all; sometimes there are so many, and they are so long, that they become annoying.
I use a regular expression to remove any ()-style citations; make sure the pattern includes the whitespace before the opening parenthesis. Since Microsoft Word doesn’t support regular expressions, I use Visual Studio Code’s Search and Replace. In Visual Studio Code, remember to turn on Word Wrap (Alt + Z) so long lines don’t run off the screen. Any text editor with regex support would do the job.
At this point you need manual processing, i.e., scan the matches and check that each one really is a citation. Sometimes parentheses hold extra information instead. If a match is something I want to keep, I change its ( to [ so the regex no longer matches it.
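If you prefer a script to an editor, here is a minimal Python sketch of the same find, review, and replace workflow. The pattern `CITATION` is a hypothetical example, not necessarily the one I use: it assumes author-year citations such as (Smith, 2010) and removes a parenthetical, together with the whitespace before it, only when it contains a four-digit year. Parentheses you have already changed to [ ] are left alone. Review the output of `find_citations` before calling `remove_citations`, since something like (1999 prices) would also match.

```python
import re

# Hypothetical pattern (an assumption; adjust it to your material):
#   \s                 the whitespace before the citation, so no double space remains
#   \(                 an opening parenthesis
#   [^()]*             any text without nested parentheses ...
#   \b\d{4}[a-z]?\b    ... containing a four-digit year such as 2010 or 2010a
#   [^()]*\)           the rest of the citation up to the closing parenthesis
CITATION = re.compile(r"\s\([^()]*\b\d{4}[a-z]?\b[^()]*\)")


def find_citations(text: str) -> list[str]:
    """Return every match so each one can be reviewed before deletion."""
    return [m.group(0) for m in CITATION.finditer(text)]


def remove_citations(text: str) -> str:
    """Delete every match, including its leading whitespace."""
    return CITATION.sub("", text)


if __name__ == "__main__":
    sample = ("Spacing improves retention (Cepeda et al., 2006; Smith, 2010a). "
              "Keep [see Smith, 2010] and (see Chapter 3).")
    for match in find_citations(sample):
        print("match:", repr(match))
    print(remove_citations(sample))
```

On the sample, only the author-year citation is removed; the bracketed note and (see Chapter 3) survive because one has no parentheses and the other has no year.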
Make sure you finish modifying the whole document before importing it into SuperMemo. Otherwise you have to do it chapter by chapter (if you import chapter by chapter). This is also the time to fix systematic OCR errors: for example, FineReader commonly misreads “The” as “Tire”, so I replace all “Tire” with “The” at this point.
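The same blanket replacement can be scripted. Below is a minimal Python sketch; `OCR_FIXES` and `fix_ocr_errors` are hypothetical names, and the only entry is the “Tire” to “The” example above. It replaces whole words only, so “Tired” is untouched, but a genuine “Tire” would still be changed, so skim the document afterwards.

```python
import re

# Hypothetical table of systematic OCR misreadings: wrong -> right.
# "Tire" -> "The" is the FineReader example; add your own entries.
OCR_FIXES = {"Tire": "The"}


def fix_ocr_errors(text: str, fixes: dict[str, str] = OCR_FIXES) -> str:
    """Replace each misread word with its correction, whole words only."""
    for wrong, right in fixes.items():
        # \b on both sides stops "Tire" from matching inside "Tired".
        pattern = rf"\b{re.escape(wrong)}\b"
        # A function replacement inserts `right` literally (no backslash escapes).
        text = re.sub(pattern, lambda _match, r=right: r, text)
    return text


if __name__ == "__main__":
    print(fix_ocr_errors("Tire reader was Tired after Tire long day."))
```

This prints “The reader was Tired after The long day.”; keeping the fixes in one table means every chapter gets the same corrections before import.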
With your document prettified, it’s ready to be imported into SuperMemo. I just copy and paste it with PureText to remove the formatting; alternatively, you can rinse it through Notepad. Then you will need to replace all <BR> with <P> for proper line spacing. (See Formatting with HTML Source Codes)
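If you’d rather make this replacement outside SuperMemo’s HTML source editor, here is a minimal Python sketch. The names `BR_TAG` and `br_to_p` are hypothetical, and the sketch assumes the pasted HTML separates lines with plain line-break tags written as <BR>, <br> or <br />; each one is swapped for <P> and nothing else in the markup is changed.

```python
import re

# Matches <BR>, <br>, <br/> and <br /> (case-insensitive, optional spaces).
BR_TAG = re.compile(r"<\s*br\s*/?\s*>", re.IGNORECASE)


def br_to_p(html: str) -> str:
    """Turn every line-break tag into a paragraph tag for proper spacing."""
    return BR_TAG.sub("<P>", html)


if __name__ == "__main__":
    print(br_to_p("First line<BR>Second line<br />Third line"))
```

This prints “First line<P>Second line<P>Third line”. Tags such as <b> or <body> are not matched, because the pattern requires “br” followed only by optional spaces, an optional slash, and “>”.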
When to Prepare PDFs for Incremental Reading?
Copying and pasting, clipping and inserting images, cleaning up copied content, removing citations, and checking for errors all take a lot of time. You can decide whether to process the whole document at once or do it incrementally. Personally, I do it all at once, and I assign my least productive time to it, e.g., after a long day.
The Official Way to Deal with PDFs
This is the official way to deal with PDFs. To be honest, I think using image components is an inferior method:
1. Extracting portions upfront compartmentalizes the material and isolates it from its context. You will probably lose that context and have a hard time understanding each portion and connecting it with the whole PDF.
2. You can’t extract text from an image component, so you have to type out any important portion yourself.
I guess this is the only solution if your PDF is non-copyable and you don’t have any OCR software. After a quick search, I found that Google’s Tesseract OCR is open source and worth a try. I tried FreeOCR, but it crashed frequently. Capture2Text might also come in handy.
Incremental PDF from SuperMemo Assistant
Here is the PDF add-on. SuperMemo Assistant supports “PDF incremental reading”, which could come in handy. If memory serves, support for SuperMemo 18 is still in progress, so be sure to look out for the new release.