ELI5: Why are PDF files “madness inside”?

I made a passing comment asking how hard it would be to write a Discord bot that converts a PDF file to another file format (for our TTRPG game), and one of the players said, “Hell, because PDFs are madness inside.”

Can someone explain to me why PDFs are so weird?

Edit: a typo

12 Answers

Anonymous

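With PDFs you don’t have a single way to structure your content. It’s a WYSIWYG world: the format only cares about how the page looks, not about how the content behind it is organized.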

If you have two identical-looking PDFs, one may be nicely structured internally, containing all the raw text and graphics in a sequence closely matching the order in which they are displayed. The other can have its content strewn all over the inside of the file, placed with absolute positioning and using only images and glyphs for the text. It all depends on who made the particular PDF and how.
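To make that concrete, here is a rough sketch of why two identical-looking pages can be so different to read back. It uses a made-up toy model, not real PDF parsing: each piece of text is a hypothetical `(x, y, text)` tuple, and `reading_order` and `line_tolerance` are names invented for this example. The point is only that a converter has to rebuild the reading order from positions, because the order inside the file can't be trusted.

```python
# Toy model (an assumption for illustration): each fragment is (x, y, text),
# with y growing upward as in PDF page coordinates. Real PDFs keep this
# information in content streams, not in tidy tuples like these.
Fragment = tuple[float, float, str]


def reading_order(fragments: list[Fragment], line_tolerance: float = 2.0) -> str:
    """Rebuild text by grouping fragments into lines by y, then sorting by x."""
    lines: list[tuple[float, list[Fragment]]] = []
    # Visit fragments from the top of the page to the bottom.
    for frag in sorted(fragments, key=lambda f: -f[1]):
        if lines and abs(lines[-1][0] - frag[1]) <= line_tolerance:
            lines[-1][1].append(frag)  # same visual line as the previous one
        else:
            lines.append((frag[1], [frag]))  # start a new visual line
    # Within each line, read left to right.
    return "\n".join(
        " ".join(text for _, _, text in sorted(line, key=lambda f: f[0]))
        for _, line in lines
    )


# The same page stored two ways: in display order, and shuffled.
tidy = [
    (72, 700, "Roll"), (100, 700, "for"), (120, 700, "initiative"),
    (72, 686, "Good"), (104, 686, "luck"),
]
scrambled = [tidy[3], tidy[2], tidy[0], tidy[4], tidy[1]]

print(reading_order(tidy))
print(reading_order(scrambled))
```

Both calls print the same two lines even though the fragments are stored in a different order. Real files are much messier (columns, rotated text, one glyph per fragment), which is where the “madness” comes from.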

For converting: if your PDFs all share the same nice structure and you only want their text, then it’s straightforward. If you find you’re having a lot of trouble converting a pile of them, consider other tools, such as OCR, or ML to extract images and positional data if needed.
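One way a bot could decide between the two routes is sketched below. It assumes you already have the text some extraction tool returned for a page; `needs_ocr`, the character threshold, and the ratio are all hypothetical choices made up for this example, not values from any particular library.

```python
def needs_ocr(page_text: str, min_chars: int = 20,
              min_readable_ratio: float = 0.9) -> bool:
    """Guess whether extracted text is too sparse or garbled to be usable.

    The thresholds are arbitrary assumptions; tune them on your own files.
    """
    stripped = "".join(page_text.split())  # drop all whitespace
    if len(stripped) < min_chars:
        return True  # almost no text: the page is probably just an image
    # Count characters that look like ordinary letters, digits, or punctuation.
    readable = sum(ch.isalnum() or ch in ".,;:!?'\"()-" for ch in stripped)
    return readable / len(stripped) < min_readable_ratio


print(needs_ocr("The goblin has 7 hit points and a rusty dagger."))
print(needs_ocr(""))
```

A page that comes back empty or full of junk characters gets flagged for OCR, while a page with normal-looking text goes through the simple path.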
