Why is barcode scanning so quick and accurate, but OCR even for digital text is kinda bad?


Because OCR has to deal not only with different fonts, but also with different languages. To complicate matters further, if you’re doing OCR through a camera, you might be holding it at just enough of an angle to distort the text, have a speck on the lens, have poor lighting, and so on. Barcode scanners produce their own light and only have to deal with one style of thing (and even then they can still mess up and fail to scan).

For barcodes, you’re essentially plotting 2 values (black or white) over a line across the bars. Then you “read” the information encoded in this string of values. That’s an extremely simple task, and there’s a convention to the system.
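To make that concrete, here is a rough sketch of the "plot black or white along a line" step in Python. The function name, the 0-255 grayscale input, and the threshold of 128 are all my own assumptions for illustration; a real scanner does more (adaptive thresholds, width normalisation, symbology lookup), so treat this as a toy, not how any particular scanner works.

```python
def scan_line_to_runs(pixels, threshold=128):
    """Turn one row of grayscale pixels into (colour, width) runs.

    colour is 1 for a dark bar and 0 for a light gap.
    Both the 0-255 input range and the fixed threshold are assumptions.
    """
    # Step 1: reduce every pixel to one of two values (black or white).
    bits = [1 if p < threshold else 0 for p in pixels]

    # Step 2: collapse consecutive identical values into runs,
    # because a barcode stores its data in the bar and gap widths.
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1][1] += 1
        else:
            runs.append([bit, 1])
    return [(colour, width) for colour, width in runs]


# Two light pixels, three dark, one light, one dark.
print(scan_line_to_runs([250, 240, 10, 20, 15, 230, 5]))
```

Once you have the list of widths, decoding is basically a table lookup defined by the barcode standard, which is why the whole thing is so fast.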

OCR, on the other hand, requires more complex algorithms (i.e. rules) to recognise patterns in images. Couple that with noise and a large number of different styles and fonts, and the task becomes much more difficult.

There is a massive difference in complexity between the two. Barcodes are one-dimensional variations (the thickness of a given line), while OCR must establish the edges of each purported character, its relationship to other characters, font variations, bad handwriting, etc. In the true ELI5 spirit, I’d say that barcodes are checkers while OCR is chess.

OK: Barcodes and QR codes have simple, predefined shapes and rules. Because of these strict rules, they are either read correctly or they aren’t read at all (i.e. you can’t read half a barcode, because that’s not a barcode).
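One example of such a rule is the check digit. I'm picking EAN-13 (the 13-digit retail barcode) as my own example here, since the answer above doesn't name a specific format. The last digit is computed from the first twelve, so a misread almost always fails the check and the scanner simply rejects it instead of returning a wrong number. A minimal sketch, with a function name I made up:

```python
def ean13_is_valid(code: str) -> bool:
    """Return True if `code` is 13 ASCII digits with a correct EAN-13 check digit."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(c) for c in code]
    # The first 12 digits are weighted 1, 3, 1, 3, ... from the left.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    # The check digit brings the weighted sum up to a multiple of 10.
    return (10 - total % 10) % 10 == digits[12]


print(ean13_is_valid("4006381333931"))  # True: well-formed code
print(ean13_is_valid("4006381383931"))  # False: one digit misread
```

OCR has nothing like this built in: if it reads "rn" as "m", the result is still perfectly valid text, so the error slips through.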

Language is _totally_ different. There are different fonts, text sizes, things that look like letters but maybe aren’t, things that aren’t letters but look like them. Is that an I or a pillar? If you want some clever on-the-fly stuff, you’ve got to account for what they look like at weird angles and in different lighting.

(Also, in general: OCR scanned from a book is pretty good and has been for a long time so 😛 )

You’re basically asking “why are computers good at numbers and not good at languages”: the answer is just that’s how they work.

In addition to all the other replies: barcodes usually encode a very small amount of information (QR codes usually encode only a URL), while OCR is usually done on much longer text.