ELI5: How are precision calibration tools themselves calibrated?


Feels like a chicken-and-egg scenario. Let’s say I get my torque wrench from work sent off to be calibrated, and that’s calibrated with something that itself needs to be calibrated, and so on and so forth. How’s that figured out?


A way of manufacturing a precision device is created, then other devices using a similar mechanism can be calibrated. A device is made that derives its calibration from some universal reference. Another device is then calibrated against this. If the original device is inaccurate, the error can sometimes be calculated very accurately, or the average of a number of readings from a number of devices is taken.

The key to calibration is traceability. If I buy a calibrated tool, the company that calibrated it used their local standard. In the paperwork they provide me, they include provenance showing when their local standard was last calibrated, and what it was calibrated to. In the US, these chains of provenance typically lead to a standard approved by the National Institute of Standards and Technology, a part of the US government.

It used to be that we had physical objects that were calibrated by definition. Whenever we wanted to calibrate an instrument to the highest degree, so that it could in turn calibrate other instruments, we would check it against this object, which by definition was correct.

The issue with this was that these objects changed slightly over time, so the definition changed with them. For example, when the master kilogram lost a tiny bit of mass over the years, the definition of the kilogram effectively drifted, and all the calibrated instruments were suddenly out of calibration.

In order to fix this, we now use the results of carefully chosen experiments. You can measure a physical property, such as the speed of light, and you know the result by definition; if you get a different result, it is because your instrument is out of calibration and needs to be adjusted.

With something like a torque wrench, you can calibrate it with an ordinary scale/balance. Multiply the force applied to the wrench by the length of the lever arm to get the torque, then compare that to the setting on the wrench. Don’t forget to account for the torque from the weight of the wrench itself.
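A rough sketch of that arithmetic in Python (all numbers are made up, and 9.81 m/s^2 for local gravity is an assumption):

# Checking a torque wrench against a scale/balance reading at the handle.
g = 9.81                 # local gravitational acceleration, m/s^2 (assumed)
scale_reading_kg = 5.0   # mass equivalent shown on the scale
lever_arm_m = 0.40       # distance from the square drive to the point of pull
wrench_mass_kg = 1.2     # mass of the wrench itself
wrench_cg_m = 0.20       # distance from the drive to the wrench's centre of gravity

applied = scale_reading_kg * g * lever_arm_m    # N*m from the applied force
self_weight = wrench_mass_kg * g * wrench_cg_m  # N*m from the wrench's own weight
# Whether the self-weight term adds or subtracts depends on how the wrench is oriented.
print(f"Torque at the drive: {applied + self_weight:.2f} N*m")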

At the lab I work at, we have a guy from the manufacturer(s) come in and calibrate our tools, every 6 months or a year depending on the tool.

In my college electronics classes, measurement tools were calibrated with other precision tools. It’s called metrology. You have to use a tool that’s a certain amount more accurate than the one you’re calibrating, and so on and so forth. In college we had little handheld old-school analog meters. We also had the machine to calibrate them, which took up an entire table and was heavy as hell.

Same problem here… how are programming languages (C++, Java, etc.) programmed in the first place?

The SI system is what defines meters, seconds, kilograms, Newtons etc. Your wrench may use foot-pounds-force, but feet and pounds-force are nowadays defined as certain numbers of meters and Newtons.

The SI system is defined in such a way that scientists can carry out experiments to get the length of a meter etc precisely.

For example, a particular atom in a particular state will give off radio waves with a specified number of waves per second. By counting waves, you can measure a second. And this is not an approximation, the second is defined as the time it takes for a certain number of waves.

Another example: Light and radio waves travel through vacuum at a fixed speed, which is specified in the SI standard. Using an accurate clock (see previous paragraph), you can measure how far light goes in a certain fraction of a second, and that is a meter.
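If you want the actual numbers, the defining constants are exact and the bookkeeping is trivial (a Python sketch):

# The SI second: 9,192,631,770 periods of the cesium-133 hyperfine transition.
# The SI meter: how far light travels in vacuum in 1/299,792,458 of a second.
CS_TRANSITION_HZ = 9_192_631_770   # exact by definition
SPEED_OF_LIGHT = 299_792_458       # m/s, exact by definition

time_for_one_metre = 1 / SPEED_OF_LIGHT                      # seconds
cesium_periods_per_metre = CS_TRANSITION_HZ * time_for_one_metre
print(f"Light covers 1 m in {time_for_one_metre:.3e} s "
      f"(~{cesium_periods_per_metre:.1f} cesium periods)")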

Of course, all the above is completely impractical for day to day use.

So there are a small number of labs worldwide, typically one per country, which specialise in measurement. They will have a number of standards, such as 1kg lumps of metal or metal sticks with two marks precisely 1m apart. Those standards will have been checked by the experiments above, or against standards that were calibrated against those experiments. (E.g. There are only a handful of labs that have done the kilogram experiment).

In turn, those standards will be used to calibrate other standards or measuring devices, which will be used to calibrate other standards or measuring devices, and this repeats many times until one of those calibrated devices is used to calibrate your wrench.

Each time you calibrate something you end up with less accuracy than you started with. But your wrench probably doesn’t need to be accurate to one part per million, even one part per thousand is probably overkill.
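As a toy illustration of that accuracy loss (Python; the per-step uncertainty values are invented, and real labs build their uncertainty budgets more carefully):

import math

# Invented relative uncertainties contributed at each step of the chain,
# from the national standard down to the shop-floor torque wrench.
steps = [
    ("national standard",  1e-7),
    ("accredited cal lab", 1e-5),
    ("in-house reference", 1e-4),
    ("torque wrench",      2e-3),
]

total = 0.0
for name, u in steps:
    total = math.sqrt(total**2 + u**2)   # combine in quadrature
    print(f"{name:18s} cumulative relative uncertainty ~ {total:.1e}")
# The wrench dominates its own error budget; the chain above it barely matters.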

Super captivating BBC documentary about measurements and weight standards. They know how to tell a story. https://www.youtube.com/watch?v=XofuloR6x74

A calibration standard is, in general, calibrated to a better quality standard at a higher laboratory with better comparison equipment. However, at some point, there has to be a top laboratory with a reference standard which is the end of the chain.

Historically, this was done with special specimens that were carefully built and kept in very carefully controlled conditions. For example, for many years, a laboratory in Paris kept a stick with two marks engraved 1 meter apart, and this was the reference meter. Another laboratory might get a stick and put two marks on it – but it would then have to be shipped to Paris and measured against the reference meter stick. The lab would then keep a record of the exact length.

These days, measures have been redefined in terms of something fundamental that you can measure with a scientific experiment. The official meter is no longer the length of a stick in Paris; instead there is an equation relating the meter to the result of a scientific experiment. For example, top calibration labs don’t use sticks as their top reference any more. Instead, they have an apparatus that performs a laser interferometry experiment, which measures the time it takes light to travel a certain distance. The lab can put a stick in the apparatus, and it will give the exact length based on the equation and the result of the experiment.

Similarly, the second used to be defined as a fraction of the length of the day. A calibration laboratory would do an experiment to measure the height of the sun, comparing a clock to the moment the sun reached its highest point, marking noon. These days, the second is defined as a fixed number of periods of a specific transition of a cesium atom. This transition frequency can be measured by microwave spectroscopy, so you can compare a clock to the transition and adjust the clock as needed. In fact, you can go out and buy an atomic clock, which is just a good-quality clock packaged with a spectroscopy apparatus and an auto-adjust system that continuously checks the clock against the transition and steers it as needed.

For complex machines, you calibrate using an asset that is known to be in good working order.

If you know what output you’re supposed to get from that asset, then you know what your machine should be outputting.

But all in all, there is a reason precision tools, and their technicians, are expensive as Fuck.

Calibration of microphones/speakers is one I’d like to hear about. How do you break out of that self referential loop?

There is an ultimate definition for each unit. In the past that might have been a physical object, but more recently we’ve been switching to physical constants. To calibrate your torque wrench you ultimately need reference standards for mass, distance, and time.

For time, we fixed the value of a particular resonant frequency of the cesium atom, so a specific count of oscillations (9,192,631,770 of them) equals one second.

For distance, we fixed the value of the speed of light, so a meter is the distance light travels in 1/299,792,458 of a second. This works because we already established the second, and interferometry can be used to get very precise distance measurements.

And for mass, we fixed the value of the Planck constant, which relates energy to frequency and, because of mass-energy equivalence, also relates mass to frequency. In more practical terms, we create a sphere of ultra-pure silicon with a known crystal structure, isotopically enriched so we know the atomic mass, and count the number of atoms in it extremely precisely.
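To get a feel for the numbers involved (a Python sketch with approximate constants, not the real atom-counting procedure):

# Roughly how many atoms are in 1 kg of isotopically enriched silicon-28?
AVOGADRO = 6.02214076e23      # 1/mol, an exactly defined value since 2019
MOLAR_MASS_SI28 = 27.977      # g/mol, approximate

atoms_per_kg = 1000.0 / MOLAR_MASS_SI28 * AVOGADRO
print(f"about {atoms_per_kg:.3e} atoms in a 1 kg Si-28 sphere")
# The real method counts atoms via the crystal's lattice spacing and the
# sphere's precisely measured volume, rather than weighing anything.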

So once we have these standards:
force = mass * acceleration
newtons = kilograms * meters / (seconds * seconds)

Torque is force applied tangentially at a distance, so newtons * meters.

If you want that in inch-pounds, those units are defined in terms of SI units too.
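Putting it together as a quick sketch (Python; the mass and lever arm are invented, the conversion factors are the exact defined ones):

# Torque from first principles, then converted to inch-pounds.
mass_kg = 2.0       # example calibration mass (made up)
g = 9.80665         # standard gravity, m/s^2 (defined conventional value)
lever_m = 0.25      # example lever arm (made up)

torque_nm = mass_kg * g * lever_m                 # newton-meters

# Exact definitions: 1 lbf = 4.4482216152605 N, 1 inch = 0.0254 m
NM_PER_INCH_POUND = 4.4482216152605 * 0.0254
print(f"{torque_nm:.3f} N*m = {torque_nm / NM_PER_INCH_POUND:.2f} in*lbf")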

Any device used for measuring is calibrated to some extent. Calibration is basically just comparing scales.

The tool used for comparison / calibration is itself manufactured to a higher accuracy level and is therefore more exact. Say you want to measure distance to 1 km accuracy; the tool you use for calibration should then be accurate to at least 0.1 km. This is why the super-accurate devices are used in very controlled environments.

Oh shit! There are never questions like this that I’m qualified to answer, but this one I can!

As stated by another redditor, there is what’s considered NIST traceability.

What that means is that there is an unbroken chain of traceability back to the “standard” of measurement from which all other measurements are derived. This is agreed upon at an international level.

An oversimplification: imagine that somewhere there’s a vault with a perfect block that measures exactly 100 cm in length. (An example, not how it’s actually done.)

It’s protected, and everything that length is measured in is derived from it: inches, meters, feet, kilometers, and so on.

Every few years, very high accuracy secondary measuring “standards” are compared against the master standard.

This establishes the first level of traceability.

Each level of measurement down the line from that increases the “uncertainty” of measurement to account for variations in accuracy, human error, etc.

If you have ever seen a zombie or vampire movie, imagine that patient zero is the “master standard” and every zombie or vampire derived from that is a “little less perfect” than that singular top level unit.

For use as calibration standards, there’s a guideline called the rule of 4, which stipulates that when you calibrate something, the standard you compare it against should be at least 4 times as accurate as the unit under test.

i.e. if you are checking a ruler that is accurate to 0.1 cm, the standard you compare against should be accurate to at least 0.025 cm.

This helps retain that accuracy down the line for long periods of time.
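That 4:1 check is easy to express as a quick sketch (Python; the accuracies are just the ruler example from above):

def meets_rule_of_4(uut_accuracy, standard_accuracy):
    # True if the reference standard is at least 4x as accurate (i.e. its
    # error band is at most a quarter of the unit under test's).
    return uut_accuracy / standard_accuracy >= 4.0

print(meets_rule_of_4(0.1, 0.025))   # True  - ruler to 0.1 cm vs standard to 0.025 cm
print(meets_rule_of_4(0.1, 0.05))    # False - a 2:1 ratio is not enough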

Testing torque is pretty easy. You apply a weight that pulls on the arm of the wrench at a specific distance from the centre of its axis. And then you read the scale manually to see what it says.

“What’s on the scale if I apply 1 kilogram?”

“What’s on the scale if I apply 2 kilograms?”

“What’s on the scale if I apply 3 kilograms?”

“What’s on the scale if I apply 4 kilograms?”

“What’s on the scale if I apply 5 kilograms?”

After a while, you get a table of numbers that you can use to establish a) how much the value on the scale deviates from the actual load, b) whether the tool is better or worse at certain parts of the range, and c) whether there has been an obvious change from the previous periodic calibration.
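Here’s roughly what building that table looks like (Python; the lever arm, readings, and local gravity are invented/assumed):

g = 9.81          # assumed local gravity, m/s^2
lever_arm_m = 0.5 # distance from the wrench axis to where the weight hangs

# (applied mass in kg, value indicated on the wrench in N*m) - invented readings
readings = [(1, 4.8), (2, 9.9), (3, 14.6), (4, 19.8), (5, 24.4)]

for mass_kg, indicated_nm in readings:
    true_nm = mass_kg * g * lever_arm_m
    deviation_pct = (indicated_nm - true_nm) / true_nm * 100
    print(f"{mass_kg} kg -> true {true_nm:5.2f} N*m, "
          f"indicated {indicated_nm:5.2f} N*m, off by {deviation_pct:+.1f}%")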

It’s possible that the tool is fit for purpose for the actual use case despite being overall pretty crappy, but that’s beyond the scope of this explanation.

For the testing location to be fit for purpose, you need a) a pretty sturdy rig for the test itself, because you need to be reasonably certain that the setup adds as few unwanted force directions as possible (those would make the test useless), b) a verified digital level, c) a verified set of weights, d) knowledge of your local gravity (because that changes slightly in the decimals even within the same city), and e) a controlled climate (you want to be able to reproduce the same circumstances, within reason, again and again and again).

The weights are pretty essential in the whole thing, so you send them to an external institute annually or biannually or so. They, in essence, put them one by one on a scale to find out if their weight is within an acceptable margin; for some users, it may be more than enough that their 1000 g weight is ±1 g. For others, the requirement may be ±0.01 g.

Their scale is *also* in a controlled climate, used only for the purpose of verifying the weight of… weights. Its reliability is verified with a *reference weight* every now and then (say, monthly?), and sometimes THAT is sent to another test institute for cooperative verification of both institutes. Occasionally, they borrow a national reference weight, or perhaps an international reference weight, so that they can compare against what other countries have agreed, at an international *treaty* level, to be a certain weight.

So that’s how it works. You test everything with reference loads. And occasionally, you let someone else verify the reference loads, effectively borrowing the credibility of THEIR reference for your own calibrations. They, in turn, borrow the credibility of someone else’s reference load.

Remember how I said that a weight is rated? E.g. 1000 g ±0.1 g?

What that says, basically, is that since the weight is not guaranteed to be better than one ten-thousandth of its full weight, you can never offer a better rating on a calibration done with that weight than 0.01% of the tool’s full-scale reading.

In reality, you also need to factor in the reliability of the digital scale, the reliability of the instrument that was used to establish the local gravity and so on and on and on. But that is kind of out of scope for the explanation.

But the point I was trying to make is that all of the references have an established reliability, inherited from the initial reference once all the steps in the chain are taken into account.

If you can trace how a weight’s value was established and within what tolerance, you pretty much just have to decide whether your reference has a good enough tolerance for its purpose.

Also see how to hand-scrape plane surfaces. There are tricks by which you can craft something completely flat starting from imperfect parts. Tricks like these are how toolmakers ‘pull themselves up by their own hair’.
That specific trick involves scraping 3 surfaces against each other in a way that eventually makes all 3 perfectly flat.

The good news is that, along this chain, there are definitely “accuracy multipliers” and forms of natural calibration that are often “good enough” to meet accuracy requirements.

For example: I could measure one foot-pound on a torque wrench with a balanced two-foot bar and a one-pound weight. Both the weight and distance have to be pretty accurate.

But suppose it’s a ten-pound weight and a 20-foot bar, and a mechanism (1:100 gearing) to reduce that torque by a factor of 100. Still one foot-pound, but any inaccuracy on the weight and distance is divided by 100!
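A quick worked comparison (Python; the absolute tolerances are made up, and gearing friction/losses are ignored) shows why the bigger setup helps when the same absolute weight and length errors apply to both:

# Same 1 ft*lb target, two setups, same absolute tolerances on weight and length.
weight_tol_lb = 0.01     # assumed absolute error on the weight
length_tol_ft = 0.01     # assumed absolute error on the lever arm

# Setup 1: 1 lb weight on a 1 ft lever arm -> 1 ft*lb directly.
rel_err_direct = weight_tol_lb / 1.0 + length_tol_ft / 1.0

# Setup 2: 10 lb on a 10 ft arm = 100 ft*lb, then geared down to 1 ft*lb.
# The same absolute tolerances are now a much smaller fraction of each quantity.
rel_err_geared = weight_tol_lb / 10.0 + length_tol_ft / 10.0

print(f"direct: ~{rel_err_direct:.1%}, geared down: ~{rel_err_geared:.1%}")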

Some calibration (when extreme accuracy isn’t needed) is easy. Ice water is always 0 °C and boiling water is 100 °C (at sea-level pressure), so that’s probably the most common reference for thermometer calibration.
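A minimal two-point correction sketch (Python; the raw readings are invented, and it assumes the references really are at 0 °C and 100 °C):

# Thermometer reads 1.5 at the ice point and 98.0 at the boiling point (invented).
read_ice, read_boil = 1.5, 98.0
true_ice, true_boil = 0.0, 100.0      # reference temperatures in deg C

def corrected(reading):
    # Linear two-point correction pinned to the ice and boiling references.
    scale = (true_boil - true_ice) / (read_boil - read_ice)
    return true_ice + (reading - read_ice) * scale

print(corrected(50.0))   # a raw mid-range reading mapped onto the corrected scale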

It’s important to remember that all measurements have an associated uncertainty. This includes the fundamental definitions for the seven base units such as length, time, mass, temperature, etc.

These base units are only redefined when we find a better method that results in reduced uncertainty, easier implementation, etc. We just went through this with mass, the last base unit defined by a physical artifact. For a lot of reasons we don’t want base units defined by physical artifacts, which can be lost or damaged. Work had been ongoing to redefine the kilogram for many years, and just recently a method was adopted that has better uncertainty and can be realized by various labs around the world.

At all but the national research labs (NIST, NRC, etc.), physical standards are still used – in fact, even NIST and NRC use physical standards for most of their day-to-day work. Weights and measures inspectors, for example, use various grades of stainless steel and cast iron standards depending upon the level of traceability required. High-precision standards are used to test precious-metal scales, while lower-precision (but still calibrated and traceable) cast iron standards are used to test and calibrate larger freight and vehicle scales.

Canada’s K50/74 prototype kilograms are physical artifacts that are still the primary reference standards for the country. All other mass standards are compared to these standards. It is only when the primary standards need to be tested (they are never adjusted) that the Kibble balance (the new definition) would be used. Previously, the prototypes were physically carried to Sèvres, France for comparison with the International Prototype Kilogram (colloquially, le grand K).

https://en.m.wikipedia.org/wiki/International_Prototype_of_the_Kilogram

I happen to know of a textbook that covers this exact topic: [Foundations of Mechanical Accuracy](https://archive.org/details/FoundationsOfMechanicalAccuracy). It’s actually quite complex when you get down to it, but the other comments have it essentially right. You need a “Master” calibration tool that is more precise than all the rest of your tools which you can measure against. The book goes into detail on how you can create some of these master tools.

For instance, how do you create a perfectly flat plane from scratch (or as near perfect as can be)? If you already have a master flat plane to measure against, it’s easy — all you do is push your plane against the master and see where they aren’t touching evenly (dye can be used to make this more clear). Once you know where they aren’t touching flat, you can sand your plate down until it does. But how do you make a master plane without a master to reference?

The trick is to make three different flat plates and compare them to each other. Call them A, B, and C. Put A and B together, then sand them down repeatedly until they lie flat against each other, even when rotated 90/180 degrees. They’ll be *mostly* flat, but you can’t be sure that one doesn’t have a depression and the other a bulge. So what you do next is sand down C until it meshes with A. Since B and C both mesh with A, they’ll both have the same bulge or depression. Now you can mesh them with *each other*, and sand both down to get rid of that bulge/depression. If you keep repeating this process alternating between A, B, and C, eventually all three plates will lie flat against each other, and you can be confident that they’re all near-perfectly flat.

Each kind of master requires different tricks like this, but they all boil down to the same idea – gradually calibrate multiple different master versions against each other until they all agree with each other.
