The fifth book of Spice and Wolf (Ookami to Koushinryou) by Isuna Hasekura in EPUB format.
As I explained when I first started this conversion project, I decided to do my own version of Spice and Wolf in EPUB format. Converting PDF to EPUB is not as simple as clicking a button. Yes, there are all-purpose converters like Calibre, but the PDF that I got is actually a set of image files bundled together as a PDF. When Calibre “converts” that PDF to EPUB, it does not detect the text inside the page images and treats the file like a comic book. The result is not something nice to read in any mobile reader. So there is only one option left: manual work, hours of it for just one book. But the result is good and worth it.
By “manual work”, I don’t mean that I re-type the entire thing. That would take forever, and I don’t have the time (or patience) for it. What I do is use OCR software to extract all the text from the image files as plain text, then edit it by hand to fix page numbers, hyphenation issues (thousands of them), quotes, and assorted conversion and formatting errors. I’m a perfectionist when it comes to books I like, so I want the best possible EPUB file.
In case anyone’s interested, here are some of the steps I need to do:
1. Cleaning up page numbers. There are usually around 200 occurrences per book, but they can all be removed in one step using find-and-replace.
2. Cleaning up hyphenation. There are usually around 500-800 occurrences that I have to check manually, one by one. For example, find-and-replace can’t differentiate between “keen-eyed” and “sil-ver”: “keen-eyed” needs to be left untouched, while “sil-ver” needs to be changed to “silver”.
3. Restoring the “new line” characters that became tab characters during OCR. There are usually around 100-200 occurrences, and these also need manual inspection.
4. Restoring missing “new line” characters (lost during OCR). There are usually around 70-80 occurrences, but they’re a pain to check.
5. Restoring the missing apostrophes. There are usually around 100 occurrences.
6. Fixing recurring OCR mistakes (for example, the name “Holo” is quite often extracted as “Hole”).
7. Fixing missing spaces. For some reason, OCR software sometimes misses the space between characters. There are usually around 20-30 occurrences, but it takes a long time to find them all.
8. Fixing lines that were incorrectly split into multiple lines. There are usually around 150 occurrences; thankfully, I managed to create a formula to do this in one operation.
9. Fixing wrong words (OCR mistakes). This is the most random, non-patterned part, and it can take forever. If the mistake produces a non-English word, I can find it using a spell-checker. But if the mistake ends up being a different (but valid) English word, I have no means of detecting it without manual inspection.
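For anyone who wants to script part of this, here is a minimal Python sketch of steps 1, 2 and 8. It is only an illustration under my own assumptions, not the actual find-and-replace patterns or formula used for these books: the function names are made up, page numbers are assumed to sit alone on their own line, a hyphenated word is joined only when the joined form appears in a word list you supply, and a line break is treated as wrong only when the next line starts with a lowercase letter and the previous line does not end a sentence.

```python
import re


def strip_page_numbers(text):
    """Step 1: remove lines that contain nothing but digits.

    Assumption: each page number sits alone on its own line.
    """
    return re.sub(r"(?m)^[ \t]*\d+[ \t]*\n", "", text)


def join_hyphenated(text, vocabulary):
    """Step 2: join "sil-ver" into "silver", leave "keen-eyed" alone.

    Assumption: a hyphenated pair is an OCR line-break artifact only if
    the joined form is in `vocabulary` (a set of lowercase words).
    Everything else is kept as-is and returned for manual review.
    """
    needs_review = []

    def replace(match):
        joined = match.group(1) + match.group(2)
        if joined.lower() in vocabulary:
            return joined
        needs_review.append(match.group(0))
        return match.group(0)

    fixed = re.sub(r"\b([A-Za-z]+)-([A-Za-z]+)\b", replace, text)
    return fixed, needs_review


def join_broken_lines(text):
    """Step 8: merge a line into the previous one when it looks split.

    Assumption: the break is wrong if the next line starts with a
    lowercase letter and the previous line does not end with sentence
    punctuation or a closing quote.
    """
    return re.sub(r'(?<![.!?"”])\n(?=[a-z])', " ", text)


# Made-up sample text, only to show how the pieces chain together.
sample = (
    "Holo held the sil-ver coin and\n"
    "smiled at the keen-eyed merchant.\n"
    "12\n"
    "Lawrence sighed.\n"
)
cleaned = strip_page_numbers(sample)
cleaned, needs_review = join_hyphenated(cleaned, {"silver"})
cleaned = join_broken_lines(cleaned)
```

The word-list check will still guess wrong whenever both forms are valid words, so the `needs_review` list and a final read-through remain necessary. It narrows the manual checking; it does not replace it.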
Lastly, in 2016 I decided to make MOBI format available for Kindle users.
Fair warning: please respect the writer’s work. If you like this light novel and it is available to purchase legally in your country, please buy a legal copy of the book.
Why EPUB format? I prefer EPUB for my e-books because it’s more comfortable to read on smartphones, e-readers and tablets. EPUB is widely supported across brands: you can open EPUB books on a Nook reader, Kobo reader, Sony Reader, iPhone, iPod, iPad, Android smartphone or Android tablet. The text automatically adjusts to whatever screen size you have, and you can change the font size to get the best reading experience.
Since I did the editing manually, I had to check the words and sentences by hand, so please let me know if I missed something. Thank you.