Why does copy/pasting from PDFs sometimes have a problem with the ‘ti’ combination of letters?


It seems that when you copy and paste, a question mark in a box sometimes replaces letters, particularly ‘t’ or ‘ti’, as if from some strange Unicode error. Why, in my experience, does it only affect these characters?


4 Answers

Anonymous

Some fonts have special glyphs called “ligatures”: combinations of letters drawn as a single shape so they look good together. Take “ti”: in some fonts the crossbar of the t and the dot of the i can collide and look awkward, so the font includes a specially designed glyph for the t and i together. Unicode has dedicated code points for a few ligatures such as “ﬁ” and “ﬂ”, but not for “ti”, so a “ti” ligature has no single standard character of its own to paste as.
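To make the ligature idea concrete, here is a minimal Python sketch of what happens with a ligature that Unicode does encode. It uses the standard library's `unicodedata.normalize`; the helper name `expand_ligatures` is just an illustration. Note that NFKC normalization also rewrites other compatibility characters (full-width letters, superscripts and so on), so treat it as a blunt tool rather than a ligature-only fix.

```python
import unicodedata

def expand_ligatures(text: str) -> str:
    """Replace Unicode compatibility ligatures (e.g. U+FB01) with plain letters.

    NFKC normalization also changes other compatibility characters,
    so this is a sketch, not a ligature-only cleaner.
    """
    return unicodedata.normalize("NFKC", text)

# "definition" where "fi" was pasted as the single ligature character U+FB01
pasted = "de\ufb01nition"

print(unicodedata.name("\ufb01"))   # official Unicode name of the ligature
print(len(pasted))                  # one character shorter than the plain word
print(expand_ligatures(pasted))     # plain letters again
```

This only helps when the pasted text contains a real ligature character. It cannot recover letters that have already been replaced by a box or question mark.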

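Copying and pasting from PDFs is a dicey prospect at best. A PDF doesn’t store text the way a word processor does; it stores instructions for drawing specific glyphs from a font at specific positions on the page. To copy text out, the viewer has to translate those glyphs back into characters, usually via a lookup table embedded alongside the font (a scanned PDF is just an image, and there the viewer really does depend on text recognition). If that table is missing or has no entry for an unusual glyph like a “ti” ligature, the viewer can’t tell which letters it stands for, and you get a box or question mark instead.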
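For PDFs that contain real text rather than scanned images, one common way this goes wrong can be pictured as a lookup table with a hole in it. The sketch below is a toy model, not how any particular PDF viewer is implemented: the glyph IDs, the `to_unicode` table and the `extract_text` function are all made up for illustration.

```python
REPLACEMENT = "\ufffd"  # Unicode replacement character, often drawn as a box or "?"

# Hypothetical glyph IDs for an imaginary font. Glyph 6 is a "ti" ligature,
# and this table has no entry for it.
to_unicode = {1: "a", 2: "c", 3: "o", 4: "n", 5: "fi"}

def extract_text(glyph_ids, mapping):
    """Translate glyph IDs back to characters, as a copy operation would."""
    return "".join(mapping.get(glyph, REPLACEMENT) for glyph in glyph_ids)

# The page shows the word "action", drawn as a, c, [ti ligature], o, n.
page_glyphs = [1, 2, 6, 3, 4]

print(extract_text(page_glyphs, to_unicode))       # the "ti" comes out as U+FFFD

# With a complete table, the same glyphs copy out correctly.
fixed_table = dict(to_unicode)
fixed_table[6] = "ti"
print(extract_text(page_glyphs, fixed_table))
```

In this model, ordinary letters survive because they have entries in the table, while the ligature is the one glyph without an obvious character to map to, which matches the pattern of only certain letter pairs breaking.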
