There are two common misconceptions in the blogosphere these days:
- that a PDF is an ebook;
- that a PDF can seamlessly become an ebook (MOBI for Kindle or EPUB for all other eReaders).
Let’s start with the first one: a PDF is not an ebook.
You might say, “Well, when I look at a PDF on my computer, it looks exactly like a book. It has page numbers and page headings and everything. I can even read it on my iPad.”
But the fact that you can read it on your computer or your tablet does not make it an ebook. It does indeed look like a print book. And that’s where the problem lies.
How an Ebook Redefines Moveable Type
An ebook is a text that can adjust itself to the device you use to read it and to settings you choose.
A PDF (Portable Document Format) cannot adjust itself. It is a fixed snapshot of a document that is always meant to look the same: each page always starts with a certain word and ends with a certain word. It has a hidden layer of code that tells the PDF reader how to display a static page. Imagine an invisible grid with horizontal and vertical coordinates that place each element in a precise location.
These hidden instructions are why a PDF is not an ebook and why a PDF cannot be seamlessly changed into either of the standard ebook formats: MOBI and EPUB.
Ebooks are about flexibility — about reflow. An ebook morphs depending on what your settings are and what device you are reading on. You can change the font size, read a book in one-column view or in two columns, read the same copy of the book on your iPhone, iPad, computer, or even on your Kindle. The book needs to adapt easily to each of those devices and to how you want to read it.
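To make reflow concrete, here is a minimal Python sketch. It is not how any particular eReader works; it only re-wraps the same paragraph for two screen widths using the standard textwrap module. The `reflow` helper and the two widths are assumptions made for illustration.

```python
import textwrap


def reflow(text: str, width: int) -> list[str]:
    """Re-wrap one paragraph so it fits a screen `width` characters wide."""
    # Collapse any existing line breaks so the old layout is forgotten,
    # then let textwrap pick new break points for the new width.
    return textwrap.wrap(" ".join(text.split()), width=width)


paragraph = (
    "An ebook morphs depending on what your settings are "
    "and what device you are reading on."
)

phone_lines = reflow(paragraph, width=20)   # hypothetical narrow screen
tablet_lines = reflow(paragraph, width=60)  # hypothetical wide screen

for line in phone_lines:
    print(line)
```

The words are identical in both versions; only the line breaks change. A PDF, by contrast, stores the break points themselves.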
When we make an ebook, we work on creating something that resembles a print book but that is malleable, transforming itself to meet the needs of the reader and the reading device.
We have to think about all the different platforms that people read on (Kindle, Nook, iBooks, etc.) and the different ways readers customize their screens. I know people, for example, who have failing eyesight and who thought they would have to give up reading. Ebooks (true ebooks!) have changed their lives, because they can make the font as big as they need or even adjust the text color and contrast.
But what happens when you enlarge the font on a PDF? You can’t. You can only zoom in on the whole page, and if you zoom in too far, you can no longer see the whole page on your screen.
You may want to say, “Well, yes, that’s true. But if I want to play with font size and read a PDF on a Kindle, I can put it through a free converter that will make an ebook for Kindle.” (I have in fact seen this as part of the sales pitch on websites selling PDFs as “ebooks.”)
That is indeed true and, occasionally, it might turn out all right. But usually it needs some help. And often a lot of help.
Why a PDF Can’t Magically Become an Ebook
This brings us to point number 2. Remember: the PDF has a hidden layer of information that gives precise instructions on how to make a page look like the printed page of the book. These instructions are useless to an eReader, because an eReader is not trying to build a static page.
In fact, the program you use to make an ebook out of a PDF will not know what to do with most of those instructions. Computer programs are useful tools, but it is very hard to create one that can catch all the complexities involved in translating a PDF into an ebook.
Here are some examples of conversion oddities that we regularly encounter.
(1) Mysterious Text
The other day I started creating an ebook out of a PDF. I had put my document through the first step of the process and was going through the text to see just how much needed to be fixed when I came upon what seemed to be a random list of geographical places:
Having no idea why the list was where it was, I looked at the PDF as a reference. It turns out that the list had originally been words labeling different parts of a map. The PDF maker had layered the names of places on top of the image of a map. When I extracted the text from the PDF to create the ebook (and thus temporarily removed images), the names that had been layered onto the map were left all on their lonesome, referring to nothing and not knowing how to position themselves on the screen.
(2) The Ebook vs. the Typesetter
Think about what typesetters do: they do not let words fall randomly on a page to create a book. They are, after all, setting the type. Typesetters have to think about things like widows (the last word or last few words of a paragraph printed alone at the beginning of a new page or column) and orphans (the first line of a paragraph starting at the very end of a page or column), and when they are confronted with widows and orphans, they manipulate the page to get rid of them. And this becomes part of the PDF’s code.
When translated into ebooks, the mechanisms used to avoid widows and orphans turn into strange spaces between paragraphs or into paragraphs that start on the same line the previous paragraph ended:
Be good enough, honourable deputies, to take our request seriously, and do not reject it without at least hearing the reasons that we have to advance in its support. First, if you shut off as much as possible all access to natural light, and thereby create a need for artificial light, what industry in France will not ultimately be encouraged?
(3) Ran-dom Hyphens
Ever read an ebook and found a word with a ran-dom hyphen in the middle? It’s not just a typo.
Sometimes a word at the end of a line of text is too long and is continued on the next line. To let you know that the word is interrupted, the typesetter (or word processor) adds a soft hyphen at the end of the line — or sometimes the typesetter (or program) makes a mistake and adds a regular (hard) hyphen instead.
A soft hyphen is meant to appear in a word only when the word is broken at the end of a line. Systems converting PDFs into ebooks don’t always know what to do with soft hyphens and will often just turn them into hard hyphens. Thus, even if the word is not split between two lines, it still has a hyphen in it.
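A cleanup pass for this problem might look something like the Python sketch below. It assumes the extracted text still contains its original line breaks and that soft hyphens survive as the Unicode character U+00AD. The `clean_hyphens` function is a hypothetical helper, not part of any conversion tool.

```python
import re

SOFT_HYPHEN = "\u00ad"  # Unicode SOFT HYPHEN


def clean_hyphens(extracted: str) -> str:
    """Remove line-break hyphenation from text extracted from a PDF."""
    # Drop soft hyphens, together with the line break that may follow them.
    text = re.sub(SOFT_HYPHEN + r"\n?", "", extracted)
    # Rejoin words split by a hard hyphen at the end of a line:
    # "ran-\ndom" becomes "random". Hyphens inside a line are left alone.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    return text


print(clean_hyphens("a ran-\ndom hyphen"))
```

A word such as “well-known” that happens to break at its real hyphen would be joined incorrectly by this rule, and once the line break has been lost there is no automatic way to tell the two cases apart. That is one reason a human pass is still needed.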
(4) Ligatures
Printed books and their PDFs often contain ligatures. A ligature is a single glyph that combines two or more letters that would otherwise look awkward next to each other. For example, f and i are often joined to avoid the awkward collision of the tip of the f and the dot of the i.
But an ebook converter won’t necessarily know what to do with ligatures and instead you’ll get a hyphen in the middle of a word, or a strange vertical line, or even a question mark to replace the ligature that once represented two letters.
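When the ligature does survive extraction as a Unicode ligature character, putting the letters back is simple. The Python sketch below covers only the five common Latin f-ligatures, and the `expand_ligatures` helper is a hypothetical name chosen for illustration. It cannot help when the converter has already replaced the ligature with a hyphen, a line, or a question mark.

```python
# Unicode code points for the common Latin f-ligatures and their letters.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}

# str.translate needs a table keyed by code point; maketrans builds it.
_LIGATURE_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Replace each ligature character with the plain letters it stands for."""
    return text.translate(_LIGATURE_TABLE)


print(expand_ligatures("\ufb01nal o\ufb03ce"))
```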
These are only a few examples of why many ebook makers prefer to manually go through a text that is being converted from PDF to ebook: there are too many little details for a converting program to catch, and so the human eye is still vital in the process.
Think about how far we are from Gutenberg’s moveable type: a PDF may remind us of the beautiful, static pages Gutenberg offered the Western world, but the ebook redefines the notion of “moveable” type. With a well-made (and often “handmade”) ebook, each reader can move the type to create their own version of the text.