For Latin-script languages, it is understandable that one might use OCR to load hardcopies into a database, but how are dictionaries for non-Latin-script languages (East Asian, Indian, and African languages, for example) converted to a database?
Some examples: [https://kosha.sanskrit.today/word/sa/prabhRti?q=%E0%A4%AA%E0%A5%8D%E0%A4%B0%E0%A4%AD%E0%A5%83%E0%A4%A4%E0%A4%BF](https://kosha.sanskrit.today/word/sa/prabhRti?q=%E0%A4%AA%E0%A5%8D%E0%A4%B0%E0%A4%AD%E0%A5%83%E0%A4%A4%E0%A4%BF)
[https://glosbe.com/](https://glosbe.com/)
OCR can in principle work on any script; systems already exist for most scripts, and even where they don’t, you can train a new one from scratch. (Although note that if you don’t have ANY other digital text in the language in question, it’s hard to correct the inevitable mistakes. You end up needing human error correction.)
But in practice, many online dictionary projects did something much easier than train a new OCR model: they just hired people to type it back in. Literate people are more plentiful and cheaper than people who know how to train a new OCR system. (And you need literate people anyway, to do the aforementioned error correction.)
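As a side note on why “just type it back in” works for any script: once a typist enters an entry, it is stored as ordinary Unicode, so a Devanagari headword is no different to a database than Latin text. A minimal Python sketch using the URL-encoded headword from the Sanskrit link above (the specific site and its URL scheme are just the example given; the decoding itself is standard):

```python
# The URL-encoded query string from the kosha.sanskrit.today example
# decodes to plain Unicode text -- exactly what a typist would enter.
from urllib.parse import unquote
import unicodedata

encoded = "%E0%A4%AA%E0%A5%8D%E0%A4%B0%E0%A4%AD%E0%A5%83%E0%A4%A4%E0%A4%BF"
word = unquote(encoded)
print(word)  # प्रभृति ("prabhRti" in Harvard-Kyoto transliteration)

# Each character is an ordinary Unicode code point; a database stores
# these the same way it stores Latin letters.
for ch in word:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
```

This is also why hand-keying scales: the hard part is finding literate speakers, not any script-specific technology.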
Also, as an educated guess, the majority of dictionaries for the world’s languages have been written in the last 40 years, when we’ve had computers. (Of the 7k languages in the world, most don’t have a very long tradition of books and literacy. A lot of languages got their first dictionaries, bible translations, textbooks, etc. only recently.) So when the dictionaries were made by computer in the first place, we don’t need to get them back into computers (except when someone loses the source files… which tbh is dismayingly often).