How does monitoring software know the clock speed, data transfer rate, etc. of components (CPU/GPU/disk/…) at any given time?


5 Answers

Anonymous

Computer clock ticks are generated by piezoelectric crystals: small crystal chips that oscillate at a very precise, known frequency when a DC voltage (from your power supply) is applied to them. The wave coming off the crystal is then shaped into whatever form is needed to drive the discrete clock ticks of your CPU and the other devices on your motherboard, and there are various frequency-multiplication techniques by which the low-frequency wave from such a crystal can be scaled up to much higher frequencies. This is usually all defined by settings in your BIOS, so your CPU knows what frequency it's being fed. It therefore knows how many incoming clock ticks there are per second, and that's how it keeps time.

Interestingly, clocks in schools and factories do much the same thing, but directly from the AC signal of the power grid. You can tell your power isn't ideal when those clocks start lagging: if you're on a 60 Hz system and your clocks consistently lose a few seconds, you're really getting 59-point-something Hz.

Anonymous

Sometimes, the monitor will show what the component itself is reporting. The component knows it is drawing a certain amount of power, or performing operations at a certain rate, and its driver allows the monitor to request and interpret this data.

The monitor may also request data from the motherboard, which can control voltage or report bandwidth.

Other times, the monitor will measure this with the help of the operating system, as it is doing its own monitoring.

Some utilities measure by testing the component directly, such as timing how long it takes to complete a certain task. This is different from merely asking for a report, and takes more effort.
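To make the "ask the driver for a report" path concrete: on Linux, the kernel's cpufreq driver publishes each core's current clock through sysfs, and a monitor simply reads that file. A minimal sketch, assuming a Linux system where that interface exists (paths and availability vary by hardware and driver):

```python
# Minimal sketch: read the CPU clock the kernel's cpufreq driver reports.
# Assumes Linux with the cpufreq sysfs interface; paths vary by system.
from pathlib import Path

def cpu_freq_mhz(cpu: int = 0) -> float:
    # scaling_cur_freq is reported in kHz by the cpufreq driver
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_cur_freq")
    return int(path.read_text()) / 1000.0

print(f"CPU0 is currently at about {cpu_freq_mhz(0):.0f} MHz")
```

Nothing is measured here; the monitor is just relaying a number the driver already keeps up to date.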

Anonymous

[This video](https://www.youtube.com/watch?v=U612mx16j7U) gives a really simple and visual explanation about this question.

Anonymous

It’s essentially counters and clocks.

Counters are exactly what they sound like: for example, every time a packet is transmitted through a network card, the corresponding counter gets one added to it. To know a rate, you need an amount of change and the time frame over which that change happened (just as we express speed in miles per hour or kilometers per hour).

These counters are usually (but not always) maintained by the system kernel to prevent tampering. The kernel allows higher-level programs to read these counters. Monitoring software will periodically read these counters, and also record the exact time when the reading was taken. With multiple readings, you then have multiple data points with which to establish a rate.

For example, let's imagine we are monitoring network traffic and reading the counters once every second. If a reading at 9:00:01 shows the counter at 1005 packets sent, and another at 9:00:02 shows 1015, then we've established that the average rate over that second was 10 packets per second.
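Here's a minimal sketch of that exact calculation, assuming Linux, where the kernel exposes per-interface packet counters in /proc/net/dev (the interface name eth0 is just a placeholder):

```python
# Minimal sketch of the counter-sampling idea: read a kernel-maintained
# transmit-packet counter twice and divide the delta by the elapsed time.
# Assumes Linux; "eth0" is a placeholder interface name.
import time

def tx_packets(iface: str) -> int:
    with open("/proc/net/dev") as f:
        for line in f:
            if line.strip().startswith(iface + ":"):
                # After "iface:" come 8 receive columns, then the transmit
                # columns; transmitted packets is the 10th field (index 9).
                fields = line.split(":", 1)[1].split()
                return int(fields[9])
    raise ValueError(f"interface {iface!r} not found")

iface = "eth0"
first, t0 = tx_packets(iface), time.monotonic()
time.sleep(1.0)
second, t1 = tx_packets(iface), time.monotonic()
print(f"{(second - first) / (t1 - t0):.1f} packets/second sent on {iface}")
```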

Almost every metric is produced with counters. For example, figuring out how busy a CPU is comes down to figuring out how much time it spends working versus how much time it sits idle. This is trickier than it sounds, because the CPU doesn't inherently know how much time passes between one calculation and the next. To figure it out, an external time source is used: the CPU reads the time, does a set number of calculations, then reads the time again from this external source to see how much time has passed. The CPU then knows how many calculations it can do within a certain time. Once it has established that rate, it uses counters. There are a few ways to build these external time sources, but one of the most common is a crystal that oscillates at an exact, reliable, known frequency when an electric current is applied to it.
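That calibration step ("read the time, do a fixed amount of work, read the time again") can be sketched in a few lines. This is purely illustrative of the idea, not how any real kernel does it; Python's perf_counter() stands in for the external crystal-driven time source:

```python
# Illustrative sketch of rate calibration: time a fixed number of operations
# against an external clock, then derive operations-per-second.
# Real systems calibrate against hardware timers, not a Python loop.
import time

OPS = 5_000_000
start = time.perf_counter()
x = 0
for _ in range(OPS):
    x += 1  # the fixed "calculation" being counted
elapsed = time.perf_counter() - start
print(f"~{OPS / elapsed:,.0f} operations per second")
```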

When the CPU works on calculations for a Firefox tab, for example, it increments a counter after each calculation it does. Those counts are then compared against the rate it gathered earlier to figure out how much time it spent on those calculations. That way, it can determine, for a given time period, what percentage of the available CPU calculation time went to that Firefox tab. When a monitoring tool says one process is using 50% of the CPU, what that really means is 50% of the CPU's time, since the number of calculations a CPU can do over a set period is the limited resource we're using to do our computing. To put some numbers to this, here's an example:

Let's say a CPU powers on and, using the rate-establishing method mentioned earlier, figures out that it can do 5,000,000 calculations per second (really slow by today's standards). After boot, a Firefox tab is loaded, and a piece of monitoring software grabs the current count of how many calculations the CPU has spent on that tab once every second. Between the first and second data points, it sees an increase of 1,000,000 on the counter. We now know the tab consumed 1,000,000 of the CPU's 5,000,000 available calculations over that second, so the monitoring software can report that the Firefox tab is using 20% of the CPU's time.
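Real tools do the same arithmetic for the whole CPU: sample the kernel's time counters twice and compare the busy delta to the total delta. A minimal sketch, assuming the Linux /proc/stat layout (aggregate user/nice/system/idle/… times, in clock ticks):

```python
# Minimal sketch: compute overall CPU utilization from two samples of the
# kernel's aggregate time counters in /proc/stat (values are in clock ticks).
# Assumes Linux; per-process accounting works similarly via /proc/<pid>/stat.
import time

def cpu_times() -> tuple[int, int]:
    with open("/proc/stat") as f:
        # First line: "cpu  user nice system idle iowait irq softirq steal ..."
        fields = [int(v) for v in f.readline().split()[1:]]
    idle = fields[3] + fields[4]  # idle + iowait count as not-busy
    return sum(fields), idle

total0, idle0 = cpu_times()
time.sleep(1.0)
total1, idle1 = cpu_times()
busy = (total1 - total0) - (idle1 - idle0)
print(f"CPU was {100 * busy / (total1 - total0):.1f}% busy over that second")
```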

Astute readers will notice the sheer number of assumptions baked into these calculations. For example, we assume the CPU hasn't changed the number of calculations it can do since it powered on, and that the clock signal it receives oscillates at the same frequency all the time without drifting. These things do happen and sometimes cause problems. There have been lots of technical advancements aimed at solving them; I definitely don't know all of them.

One particular issue I had the displeasure of troubleshooting at a workplace many years ago was figuring out why some computers would crash, then crash the very same way after being reset, but would stop crashing after being fully powered off and back on. These were Ubuntu Linux servers, and we recorded their kernel logs at the time of the crash to gather diagnostic data. What we found was that the CPU would sometimes establish its calculation rate erroneously, and it would remember that bad rate until it was powered off completely.

I might have gotten a few things here wrong that some Comp Sci friends will help me out with. I don’t claim to know much, I can only share what I’ve learned through a few years of experience in running computers.

Hope this helps!

Anonymous

You have to be careful. I worked for a company that was replacing older servers, and the new servers were producing results more slowly than the old ones. It turned out the server engineering team had turned on energy-saver mode, so the processors were running slower than the older ones. We were paying a fortune for per-processor SQL Server licenses, and they were trying to save money on the electricity bill.
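A quick sanity check for that kind of throttling, as a hedged sketch assuming Linux (where /proc/cpuinfo reports each core's current clock), is to compare what the cores are actually running at against their advertised speed:

```python
# Minimal sketch: list the current clock of each core from /proc/cpuinfo.
# Assumes Linux; an energy-saver profile often shows up as cores sitting
# well below their advertised frequency even under load.
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.lower().startswith("cpu mhz"):
            print(f"core running at {float(line.split(':')[1]):7.1f} MHz")
```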

Also, if you are using VMWare, it can be very difficult to know the speeds you are actually getting, especially if you have different-sized nodes in your VMWare cluster.