ELI5: Why are PDF files “madness inside”?

I made a passing comment asking how hard it would be to write a Discord bot that converts a PDF file to another file format (for our TTRPG game), and one of the players said, “Hell, because PDFs are madness inside.”

Can someone explain to me why pdfs are so weird?

Edit: a typo

12 Answers

Anonymous 0 Comments

A PDF is like a printed book. It has all the information in it and is easy to read. It is not easy to *change*.

If you want to alter and reprint a book, you need the file that created it — such as a Word document.

Anonymous 0 Comments

With PDFs you don’t have a single way to structure your content. It’s a WYSIWYG world: the format only pins down how the page looks, not how the content is organized inside the file.

If you have two identical-looking PDFs, one may be nicely structured internally, containing all the raw text and graphics in a sequence closely matching the order in which they are displayed. The other can have its content strewn all over the file, placed with absolute positioning and using only images and glyphs for the text. It all depends on who made the particular PDF and how.
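To make that concrete, here is a toy sketch in Python. It does not parse a real PDF; it just models a page as a list of `(x, y, text)` fragments, which is roughly what absolutely positioned text amounts to. The `Fragment` type, both function names, and the sample coordinates are made up for illustration. The two lists below would look identical on the page, but only one of them is stored in reading order.

```python
from typing import List, Tuple

# (x, y, text); y grows upward from the bottom of the page, as in PDF page space.
Fragment = Tuple[float, float, str]


def naive_text(fragments: List[Fragment]) -> str:
    """Join fragments in the order they happen to be stored."""
    return " ".join(text for _, _, text in fragments)


def reading_order_text(fragments: List[Fragment], line_tolerance: float = 2.0) -> str:
    """Rebuild reading order from positions: top to bottom, then left to right."""
    lines: List[List[Fragment]] = []
    # Visit fragments from the top of the page downward.
    for frag in sorted(fragments, key=lambda f: -f[1]):
        # Fragments whose y is close to the current line's y belong to that line.
        if lines and abs(lines[-1][0][1] - frag[1]) <= line_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    # Within each line, order fragments left to right.
    return "\n".join(
        " ".join(text for _, _, text in sorted(line, key=lambda f: f[0]))
        for line in lines
    )


# Same visual result, different internal order (hypothetical sample data).
tidy = [(72, 700, "Roll"), (100, 700, "for"), (120, 700, "initiative"),
        (72, 680, "Good"), (105, 680, "luck")]
messy = [(105, 680, "luck"), (120, 700, "initiative"), (72, 700, "Roll"),
         (72, 680, "Good"), (100, 700, "for")]

print(naive_text(tidy))
print(naive_text(messy))
print(reading_order_text(messy))
```

Reading the messy list in stored order gives word salad, while sorting by position recovers the same two lines as the tidy list. Real pages are harder than this (columns, tables, rotated text), which is a big part of the “madness.”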

For converting: if your PDFs all have the same nice structure and you only want their text, then it’s straightforward. If you find you’re having a lot of trouble converting a pile of them, consider other tools, such as OCR, and using ML to extract images and positional data if needed.
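One way to decide when to reach for OCR is a rough triage pass over whatever text your extractor returns. The sketch below assumes you already have one string per page from some extraction tool; the function names `needs_ocr` and `pages_to_ocr` and both thresholds are hypothetical choices for illustration, not tuned values.

```python
from typing import List


def needs_ocr(page_text: str, min_chars: int = 20, min_alnum_ratio: float = 0.5) -> bool:
    """Guess whether a page's extracted text is unusable and should go to OCR."""
    # Drop all whitespace so spacing does not affect the counts.
    stripped = "".join(page_text.split())
    # Almost no text usually means the page is a scanned image.
    if len(stripped) < min_chars:
        return True
    # Mostly symbols usually means the glyphs have no usable text mapping.
    alnum = sum(ch.isalnum() for ch in stripped)
    return alnum / len(stripped) < min_alnum_ratio


def pages_to_ocr(pages: List[str]) -> List[int]:
    """Return the zero-based indexes of pages that look like they need OCR."""
    return [i for i, text in enumerate(pages) if needs_ocr(text)]


# Hypothetical extraction results: clean text, an image-only page, garbage glyphs.
sample_pages = [
    "The goblin attacks with a rusty dagger, dealing 1d4 piercing damage.",
    "",
    "\u25a1\u25a1\u25a1 \u25a1\u25a1\u25a1\u25a1 \u25a1\u25a1 \u25a1\u25a1\u25a1\u25a1\u25a1 \u25a1\u25a1\u25a1\u25a1 \u25a1\u25a1\u25a1",
]

print(pages_to_ocr(sample_pages))
```

For a bot, this lets the easy PDFs go through the fast text path and sends only the flagged pages to the slower OCR step.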