In engineering, everything is a tradeoff made to achieve a stated goal.
What are the stated design goals of PDF?
1. It should be easily sent to printers
2. It should render the same way on any machine (regardless of installed fonts, OS, graphics adapter, locale, etc.).
3. It should stay small, even for large documents (hundreds of pages).
You see how there is no goal “It should be easy to extract meaningful information from a document”?
PDF documents (and the programs that create them) are concerned only with how the content looks, not with whether it makes sense semantically.
For example, if you have 5 paragraphs on a page, there is no guarantee that they are stored in the file in the order you read them. The only thing that matters is how the page looks.
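To make this concrete, here is a tiny hand-written content stream (a hypothetical example, not taken from a real file) that draws three paragraphs out of order, followed by a Python sketch that tries to recover reading order by sorting text runs on their position. The regex only understands this one restricted form of text operators (`BT`, `Tf`, `Td`, `Tj`, `ET`); it is an illustration of the problem, not a PDF parser.

```python
import re

# Hypothetical, simplified PDF content stream. Each BT ... ET block draws one
# line of text: Td sets its (x, y) position and Tj paints the string.
# PDF's y axis points up, so a larger y means higher on the page.
CONTENT_STREAM = """
BT /F1 12 Tf 72 600 Td (Third paragraph) Tj ET
BT /F1 12 Tf 72 700 Td (First paragraph) Tj ET
BT /F1 12 Tf 72 650 Td (Second paragraph) Tj ET
"""

# Matches only the restricted form used above. Real content streams are far
# more varied (TJ arrays, Tm matrices, escaped strings, custom font encodings).
TEXT_RUN = re.compile(
    r"BT\s+/\S+\s+[\d.]+\s+Tf\s+([\d.]+)\s+([\d.]+)\s+Td\s+\((.*?)\)\s+Tj\s+ET"
)


def text_runs(stream):
    """Return (x, y, text) tuples in the order they appear in the file."""
    return [(float(x), float(y), s) for x, y, s in TEXT_RUN.findall(stream)]


def reading_order(runs):
    """Naive heuristic: top to bottom (descending y), then left to right."""
    return sorted(runs, key=lambda r: (-r[1], r[0]))


runs = text_runs(CONTENT_STREAM)
print("File order:   ", [t for _, _, t in runs])
print("Reading order:", [t for _, _, t in reading_order(runs)])
```

The file order comes out as Third, First, Second, while the sorted order is First, Second, Third. This naive sort already breaks on multi-column layouts, headers and footers, or rotated text, and each of those needs its own special case in a real reader.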
For this reason, getting meaningful text out of a PDF is almost as hard as getting it out of a picture. Programs that do read PDFs well manage it only because their authors have coded hundreds and hundreds of real-world PDF hacks into their readers.