What is latent Dirichlet allocation?


I am a graduate psychology student studying extremist literature. My advisor recommended LDA and sent me an article describing it, but both he and the article are too verbose for me to quite grasp it. I then tried a Google search, but the terminology is throwing me off. I would greatly appreciate it if someone could ELI5!


It’s a mathematical method for analyzing the distribution of something. In your case, probably words in some text?

You group certain keywords by topic, and then analyze your text by which keywords appear at which frequency, to find out which topics the text contains and to what degree.

For example, to find out whether a text has antisemitism as a topic, you take a dictionary of words related to it and look at how they are distributed in your document.
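The keyword-counting idea above can be sketched in a few lines of Python. This is only a rough illustration of the dictionary approach as described in this answer: the topic names, the keyword sets, and the helper `topic_shares` are all made up for the example. Note that LDA proper infers the word groupings from the documents themselves rather than starting from a hand-built dictionary.

```python
import re
from collections import Counter

# Hypothetical topic dictionaries, invented for this example.
TOPIC_KEYWORDS = {
    "sports": {"ball", "team", "score", "game"},
    "cooking": {"oven", "recipe", "flour", "bake"},
}

def topic_shares(text, topic_keywords=TOPIC_KEYWORDS):
    """For each topic, return the fraction of the text's words that are keywords of that topic."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    if total == 0:
        return {topic: 0.0 for topic in topic_keywords}
    return {
        topic: sum(counts[w] for w in keywords) / total
        for topic, keywords in topic_keywords.items()
    }

shares = topic_shares("The team took the ball and the score went up as the game ended")
print(shares)
```

A higher share for a topic means more of the text's words come from that topic's keyword list.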

Your observation is a bunch of ants carrying shit in and out of an ant hill.

LDA says:

Make guesses about what they’re doing with the stuff.

Create the actual statistical model that would be needed for them to be doing the stuff you think they are doing.

Compare these statistical models to the observation. Do they match?
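The guess, model, compare loop above might look something like this as a toy sketch. The observed words, the two candidate word distributions, and the function `log_likelihood` are assumptions invented for illustration. Real LDA software does not compare a couple of hand-written guesses; it searches over possible models with methods such as Gibbs sampling or variational inference. The "do they match?" step, though, is the same idea: score each guess by how probable it makes the observation.

```python
import math

# The observation: words we actually saw (hypothetical data).
observed = ["ball", "team", "recipe", "ball", "game"]

# Two hypothetical guesses, each a probability distribution over words.
guesses = {
    "mostly_sports": {"ball": 0.4, "team": 0.3, "game": 0.2, "recipe": 0.1},
    "mostly_cooking": {"ball": 0.1, "team": 0.1, "game": 0.1, "recipe": 0.7},
}

def log_likelihood(words, word_probs):
    """Log-probability of seeing these words if the guess were true."""
    return sum(math.log(word_probs[w]) for w in words)

scores = {name: log_likelihood(observed, probs) for name, probs in guesses.items()}
best_guess = max(scores, key=scores.get)
print(best_guess, scores)
```

The guess with the higher (less negative) score is the one that matches the observation better.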

Imagine 3 groups of friends that you can join based on similar clothes … similar shoes, shirts, and pants … BUT … with more diversity of styles and such, new groups can form (you are not limited to the existing 3).

Basically, LDA is the probability of joining an existing group, plus some probability of forming a new group. This is sometimes called nonparametric Bayes, although it would be more appropriately named potentially-infinitely parametric Bayes 😉
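The join-or-start-a-group rule in this answer matches what is usually called the Chinese restaurant process, which underlies nonparametric extensions of topic models (standard LDA fixes the number of topics in advance). A minimal sketch, assuming a concentration parameter `alpha` and a hypothetical helper `group_probabilities`, neither of which comes from the answer:

```python
def group_probabilities(group_sizes, alpha=1.0):
    """Chinese-restaurant-process rule: bigger groups are more attractive,
    and alpha controls how likely a brand-new group is."""
    n = sum(group_sizes)
    denom = n + alpha
    join = [size / denom for size in group_sizes]  # one probability per existing group
    new = alpha / denom                            # probability of forming a new group
    return join, new

# Three existing groups of friends with 5, 3, and 2 members.
join, new = group_probabilities([5, 3, 2], alpha=1.0)
print(join, new)
```

Raising `alpha` makes new groups more likely, so the number of groups can keep growing as more people arrive.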