What’s a logistic regression?



A regression that tries to predict the probability of an event, e.g. a person finding a job or defaulting on their debt.

The dependent variable (Y) is 0 or 1, depending on whether the event actually happened.

You do not want to use linear regression on such data, since its predictions can be above 1 or below 0, i.e. above 100% or below 0% probability. That makes no sense as a probability and amounts to claiming the event is guaranteed to happen (or not).
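To see the problem concretely, here is a minimal sketch that fits an ordinary least-squares line to 0/1 outcomes. The `hours` and `passed` values are made up purely for illustration, and `linear_prediction` is a hypothetical helper name.

```python
# Made-up illustration data: hours studied and pass (1) / fail (0).
hours = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

# Closed-form least-squares fit of passed = intercept + slope * hours.
n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(passed) / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, passed))
s_xx = sum((x - mean_x) ** 2 for x in hours)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x


def linear_prediction(h):
    """Straight-line 'probability' of passing after h hours of study."""
    return intercept + slope * h


# At the ends of the range the line leaves the interval [0, 1].
print(linear_prediction(0))  # below 0 for this data
print(linear_prediction(9))  # above 1 for this data
```

With this particular toy data the line dips below 0 at zero hours and climbs above 1 at nine hours, which is exactly the behaviour you want to avoid.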

The logistic function (the inverse of the logit) turns any number into a probability strictly between 0% and 100%. This is a much better prediction, since you always allow for either outcome to happen.
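A minimal sketch of that squashing step, assuming the standard definitions: the logistic (sigmoid) function maps any real number to a probability, and the logit is its inverse, mapping a probability back to log-odds. The function names `sigmoid` and `logit` are just conventional choices.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z to a value between 0 and 1."""
    # Two branches so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p):
    """Inverse of sigmoid: maps a probability 0 < p < 1 to log-odds."""
    return math.log(p / (1.0 - p))


for z in (-5.0, 0.0, 5.0):
    print(z, sigmoid(z))
```

Mathematically the result is never exactly 0 or 1, although floating-point rounding can produce those values for very large positive or negative inputs.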

We have an intuition that if a person studies 0 hours for a test, they are likely (but not certain) to fail it, and if they study 100 hours for the test, they are likely (but not certain) to pass it. If we have data on a hundred students, how long they studied, and whether they passed, how can we use that data to quantify this intuition and determine the likelihood that someone who studies a certain amount of time will pass? And what if there are multiple dimensions to the data, and we want to know whether someone is more likely to pass if they are an A student with no off-campus job who only studies one hour, vs. a C student with an off-campus job who studies five hours?
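One way to answer the single-variable version of that question is to fit a weight and a bias by gradient descent on the log loss. The sketch below is only an illustration: the ten data points are made up, and the learning rate and step count are arbitrary assumptions rather than tuned values.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Made-up illustration data: hours studied and pass (1) / fail (0).
hours = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def fit(xs, ys, lr=0.1, steps=5000):
    """Fit p(pass) = sigmoid(w * x + b) by gradient descent on mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # prediction minus actual outcome
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


w, b = fit(hours, passed)


def pass_probability(h):
    """Estimated probability of passing after h hours of study."""
    return sigmoid(w * h + b)


for h in (0, 2, 5, 7, 9):
    print(h, round(pass_probability(h), 3))
```

The estimated probability rises with hours studied but stays between 0 and 1. In practice you would more likely use a library routine than hand-written gradient descent.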

Logistic regression is a process of reducing data with one or more dimensions to a single value between 0 and 1 (or between 0% and 100%) using a particular exponential function, the logistic function 1 / (1 + e^(-z)), where z is a weighted sum of the inputs, so that you can get a best estimate for never-before-seen examples.
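For the multi-dimensional case, the same idea can be sketched as a weighted sum of the inputs passed through the logistic function. Everything numeric here is an assumption for illustration: the feature encoding, the `weights` and the `bias` are hand-picked, not fitted to any data, so the resulting numbers only show the mechanics.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(features, weights, bias):
    """Collapse several inputs into one probability between 0 and 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical, hand-picked coefficients (NOT fitted to real data).
# Feature order: [grade points (A=4, C=2), has off-campus job (0/1), hours studied]
weights = [1.0, -0.8, 0.4]
bias = -3.0

a_student = [4.0, 0.0, 1.0]  # A student, no job, studies one hour
c_student = [2.0, 1.0, 5.0]  # C student, has a job, studies five hours

p_a = predict_probability(a_student, weights, bias)
p_c = predict_probability(c_student, weights, bias)
print(p_a, p_c)
```

Which student comes out ahead depends entirely on the coefficients, and in a real analysis those would be estimated from the data rather than chosen by hand.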