What does Goodhart’s law mean?

It goes “When a measure becomes a target, it ceases to be a good measure.” How does that work in practice?

Anonymous

A classic example: lines of code written.

As a software engineer, I spend a bunch of time writing code. On a bad day, if I’m unfocused, I’ll write very little code. On a good day, I’ll write a bunch of code. If you ignore my other responsibilities, you can look at how much code I write on any given day and measure how productive I’m being. Lines of code written are a fairly decent measure of my personal productivity.

OK, if lines of code written are a measure of productivity, and you want to reward productivity, you should reward people for writing more code, right? Your measure becomes a target. Well, that’s where it all goes awry.

First off, different people have different styles. I try to write code as simply as possible and tend towards a fairly terse style, so I’ll almost always write less code than my colleagues to complete the same task. So now my performance seems “bad” for no particular reason. Second, and more perversely, if your objective is to write as much code as possible, it’s _really_ easy to write the same functionality using more code, at the cost of it being really low-quality code. You know the sort of person who just rambles on and speaks a lot without really saying anything? Code written like that.
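A small, hypothetical Python sketch of the padding problem: both functions below compute the same thing (the sum of the even numbers in a list), but the second is deliberately bloated, so a lines-of-code metric would rate its author as several times more “productive”. The function names are made up for illustration.

```python
def sum_of_evens(numbers):
    # Terse version: one line does the whole job.
    return sum(n for n in numbers if n % 2 == 0)


def sum_of_evens_padded(numbers):
    # Padded version: identical behaviour, deliberately spread over many lines.
    total = 0
    index = 0
    count = len(numbers)
    while index < count:
        current = numbers[index]
        remainder = current % 2
        is_even = remainder == 0
        if is_even:
            total = total + current
        index = index + 1
    result = total
    return result


print(sum_of_evens([1, 2, 3, 4]), sum_of_evens_padded([1, 2, 3, 4]))  # 6 6
```

Same output, roughly ten times the line count, and the longer one is harder to read and maintain.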

So, by rewarding people for writing more code, the net result is that you get lower quality code that’ll be harder to maintain in the long term. In becoming a target, lines of code stopped being a good measurement of productivity.

Anonymous

Speed limits should indicate that going any faster is dangerous. And so you should always drive slower. Instead, people use them as a target for how fast they can drive, so typically there ends up being some slight leeway above the limit for how fast you can drive without being prosecuted. Which means that the “limit” is no longer a useful measure of how fast you can safely drive.

Somewhat similarly, an amber light technically means “stop unless it is unsafe to do so”, and a red means “stop”. But that measure of safety has become a target: “if I can get through this amber light before it turns red, I’m OK”. So people accelerate into a junction when they should be slowing down, with potential negative effects for cross traffic and pedestrians.

Anonymous

A good measure objectively represents the state of a process, while a target carries intent and is no longer objective.

Realistically this is kind of pedantic, because very few people measure for measurement’s sake, except for dedicated scientists who seek the truth regardless of outcome. Doing science with a target is highly susceptible to confirmation bias: being convinced of something before it has been proven with evidence.

Anonymous

Another example is the Bradford score. It is supposed to show the effect of an employee’s absences on the company.

Score = Instances × Instances × Days (total days absent)

So 2 days off in 2 instances = 2 × 2 × 2 = 8

vs

Taking the whole week off, 5 days in 1 instance = 1 × 1 × 5 = 5

If I’m not sure I’m entirely well again after taking Monday off, and might well be off again on Friday, it’s less punishment to take the whole week off because of the lower score.

Yet the employer loses 5 days work, not 2.

Which works against the principle, especially as I won’t be telling them on the Monday; I’ll be ringing in every day (in the UK we self-certify for up to a week, no doctor’s note needed).
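The arithmetic above can be checked with a short Python sketch. It assumes the score is exactly as described in this answer (number of instances, squared, times total days absent); the function name and the input format (a list of absence lengths in days) are illustrative choices.

```python
def bradford_score(absences):
    """absences: length in days of each separate absence (one entry per instance)."""
    instances = len(absences)
    total_days = sum(absences)
    # Instances × Instances × Days
    return instances * instances * total_days


split = bradford_score([1, 1])    # Monday off, then Friday off: 2 days lost
whole_week = bradford_score([5])  # one five-day absence: 5 days lost
print(split, whole_week)  # 8 5
```

The employee who loses the company more days gets the better score, which is exactly the perverse incentive described above.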

Anonymous

A real-world example would likely be checkout times for cashiers at supermarkets, fast food places and the like.

A good while back, someone realized that good cashiers were generally quick. They were efficient and got things done quickly because they were good at their job. So the Measure of a good cashier could be how fast they can check people out correctly.

So… someone decided it would be a good idea to grade cashiers on how quickly they got stuff done, as a way to reward good cashiers and indirectly punish ‘bad’ ones. The Measure became a Target if you wanted to get promoted or get decent raises.

The issue quickly became that cashiers worked to meet that target and left other important tasks to the side, because they feared being punished if they took too long to get someone out of their lane. They also became unnecessarily stressed when things outside their control caused the metric to suffer, like an older person taking time to pay or a customer who has lost their card. They might rush customers instead of providing a good customer experience, or skip scanning some items, costing the business money in shrink.

Anonymous

Schools do standardized testing every year. Good, fine.

Schools decide to give raises and promotions to teachers whose students do better in standardized testing.

Teachers are incentivized to get the dummies kicked out of class, to spend time with the ones deemed salvageable and ignore the others, to cheat on the standardized testing, to ignore anything not on the standardized tests, to avoid any ESL students, to avoid underperforming districts, and so on.

Yeah, now standardized testing may be a net negative.

Anonymous

Ever hear the term “teaching to the test”?

If the goal is to teach someone how to do multiplication, it makes sense to have a test at the end of the class to see if they can do it. However, imagine if the teacher was paid based on how well the class did on the test (as a measure of how well they taught the class). If the teacher knows the questions will be what is 6×7, 11×12, and 3×4, would it make more sense for the teacher to teach all of multiplication, or just those questions? The teacher’s goal is now not teaching multiplication; it’s getting the kids to pass the class.

Now the classes will likely get a lot of 100%s, but the students might not actually know multiplication.
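A toy Python model of “teaching to the test”, purely illustrative: each “student” is a function that answers a × b. One memorized only the three known test answers; the other actually learned multiplication. The extra questions in `NEW_QUESTIONS` and all the names here are made up for the example.

```python
TEST_QUESTIONS = [(6, 7), (11, 12), (3, 4)]
NEW_QUESTIONS = [(7, 8), (9, 6), (12, 5)]  # hypothetical questions not on the test


def taught_to_the_test(a, b):
    # Memorized only the answers that appear on the test.
    memorized = {(6, 7): 42, (11, 12): 132, (3, 4): 12}
    return memorized.get((a, b))  # None for anything off the test


def taught_multiplication(a, b):
    # Actually learned how to multiply.
    return a * b


def percent_correct(student, questions):
    correct = sum(1 for a, b in questions if student(a, b) == a * b)
    return 100 * correct // len(questions)


for student in (taught_to_the_test, taught_multiplication):
    print(student.__name__,
          percent_correct(student, TEST_QUESTIONS),
          percent_correct(student, NEW_QUESTIONS))
# taught_to_the_test 100 0
# taught_multiplication 100 100
```

On the test itself the two students are indistinguishable; the difference only shows up on questions the test never asked.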

Anonymous

Most of the top responses here only get this half right. As many others have said, the problem is that people alter their behavior in response to quantified targets, which can produce unintended consequences. One of the most important consequences is that, because they are now targets, the metrics that those targets are based on *no longer actually measure what we think they measure.*

A classic example (which fans of The Wire will be familiar with) is the use of test scores to evaluate teacher and student performance. Let’s say you want to improve schools by rewarding good teachers and firing bad ones. How do you identify good teachers? One way is by looking at their students’ standardized test scores. Presumably, students of good teachers will on average perform better on standardized tests than students of bad teachers. At the outset, student test scores are a plausible measure of teacher quality.*

What happens when you tell teachers and schools that they will be rewarded or fired based on student test scores? They will do whatever they can to improve those test scores. In many cases, they will start ‘teaching to the test’, that is, sacrificing other goals in order to produce high test scores, rather than being good teachers. If this becomes widespread, students’ test scores will no longer measure the overall quality of teaching; they will measure teachers’ single-minded focus on teaching to the test. *The metric no longer measures what we think it measures*.

* Many people would debate this premise, but for the sake of the example I am stipulating that it is at least plausible.

Anonymous

Exams in schools are basically the absolute perfect example of this.

So what is the thing that we actually want to measure in school? We want to measure how much students learn. This is very hard to measure directly; we can’t look into people’s minds and figure it out.

So instead we invent exams and tests. This is our measure. There is a difference between learning a subject and performing well on tests for that subject, but the two are strongly correlated, so the test looks like a good measure at first glance: students who learned more (for whatever reason) should perform better on tests.

But then we reward students for being good at tests (and teachers for their students performing well on tests, etc.). So now our students try to get good results on exams. The tests and exams have become a target.

All of the small differences between performing well on tests and actually learning become much more pronounced, because people look for ways to get the desired results on these tests, and many of those ways have no direct effect on actually learning the subject. The tests have become a much worse measure of how much students learn.

Anonymous

Here’s an example to help explain it:

My wife was a teacher. At some point, her school decided that too many kids were failing. They decided that every time a teacher gave a failing grade, they’d have to write out an explanation of why they did that, what steps they’ve taken to help the kid learn the material, what contacts they had with the parent, how everybody responded to their attempts, and their plans going forward to see that the kid learned the material.

Instead, teachers would just change the grades to D-, which is a passing grade. So the kids were passing, but they still didn’t know the material, and were not prepared for the next grade. Next year, of course, the kids would still receive passing grades, and would still not have learned the material from the previous year, let alone the current one.

The measure (kids failing) became a target (don’t fail kids), and it ceased to be a good measure (it no longer measured whether the kids had learned the material.)

The reason the school decided to do that was another level of Goodhart’s law. The administration was being penalized based on the number of kids who failed, so they were trying to reduce the number of kids failing. At another time, they simply declared that the lowest grade a teacher could give was a D. My wife had a few kids who would simply never do any work at all, would turn in blank tests, and who knew very well that they’d still pass, even if they skipped half of their classes.

This is part of why she retired from teaching.
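A rough Python sketch of the “lowest grade is a D” rule: the failure rate (the measure) drops to zero, while the underlying scores, and what the kids actually learned, don’t change at all. The letter-grade cut-offs and the class scores are hypothetical, chosen only to illustrate the effect.

```python
def letter_grade(score):
    # Hypothetical cut-offs; real schools use their own scales.
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "F"


def apply_floor(grade, floor="D"):
    # Grades ordered best to worst; anything worse than the floor is raised to it.
    order = "ABCDF"
    return floor if order.index(grade) > order.index(floor) else grade


def failure_rate(grades):
    return grades.count("F") / len(grades)


scores = [95, 72, 55, 40, 0]  # made-up class; the 0 is a blank test
raw = [letter_grade(s) for s in scores]
floored = [apply_floor(g) for g in raw]
print(raw, failure_rate(raw))          # ['A', 'C', 'F', 'F', 'F'] 0.6
print(floored, failure_rate(floored))  # ['A', 'C', 'D', 'D', 'D'] 0.0
```

The same three students still don’t know the material; the number that was supposed to reveal that has simply been defined away.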