Nvidia’s RT and Tensor Cores.


Hello, I’m someone who likes hardware quite a lot and always tries to stay informed about it. These past couple of years, with Nvidia’s new technologies, I’ve kind of been struggling to fully understand how these new types of cores work.

I understand the basic concept: RT cores are for Ray Tracing, and Tensor cores are for Deep Learning Super Sampling. But I want to understand what makes these cores better at their job compared to normal CUDA cores or AMD’s Stream Processors (I know they’re quite different, but I understand that they act similarly).

I’ve tried to read up on it but have come across things like:

* “4 x 4 matrices”
* “4 x 4 FP 16/FP 32 matrices”

And I have no idea what that means. I think it’s a way of doing calculations and math, but I’m not sure. That’s specific to Tensor cores though, not RT cores, but honestly I’m a lot more interested in Tensor cores because I’ve been seeing how the technology has evolved in DLSS 2.0, and it has come a HUGE way from DLSS 1.0, probably outperforming most types of AA available right now. (Although I know it’s an upscaling tool rather than AA, or I think that’s what it was.)

So basically, could someone explain, in a simpler way for someone who doesn’t understand much “computer math”, **WHY** these cores are best at what they specifically do and **HOW** they do it?

Thanks a ton! Hope this explains well what I wanted to know ^^.


3 Answers

Anonymous 0 Comments

First off, FP16, FP32, etc. all refer to floating-point numbers. Floating point is one way to store decimals/fractions on computers. Think of it like scientific notation, but adjusted slightly to work in binary. The number after FP refers to the size of the number, so FP32 is a 32-bit floating-point number. The more bits you have, the more precise the number.
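As a quick illustration (my own sketch in Python using NumPy, not anything from Nvidia's docs), here's how the same value loses precision as you shrink the number of bits:

```python
import numpy as np

# The same value stored at different floating-point precisions.
x = 3.14159265358979

print(np.float16(x))  # ~3.14            (16 bits: roughly 3 decimal digits of precision)
print(np.float32(x))  # ~3.1415927       (32 bits: roughly 7 decimal digits)
print(np.float64(x))  # 3.14159265358979 (64 bits: roughly 15-16 decimal digits)
```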

I am not super familiar with Nvidia’s architectures, but I did glance through the architecture page about Ampere. So most of what I’ll say is generic, but should relate back to what you’ve read about Ampere.

Computers are pretty limited in how they do math. Every operation is limited to a certain number of digits. Think of it like doing arithmetic by hand, but where each number can only be 3 digits long. When asked to add 1100 + 900, you have to break it into 2 parts. First we do 100 + 900: we get 000 and carry the 1. Then we take the leading 1 from 1100 and add the carried 1, which gives 2. Concatenate the answers from each part and we get 2000. Because we were limited to 3-digit numbers, the operation took twice as long. If we had 6 digits to work with, we could do the whole thing in one step.
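Here's the same idea as a rough Python sketch (my own illustration, the `add_3digit` helper is made up): pretend the machine can only add 3-digit chunks at a time, so a bigger addition turns into two smaller ones plus a carry.

```python
# Hypothetical "machine" that can only add numbers up to 3 digits (0-999).
def add_3digit(a, b, carry_in=0):
    total = a + b + carry_in
    return total % 1000, total // 1000   # (3-digit result, carry out)

# Adding 1100 + 900 has to be split into two steps:
low, carry = add_3digit(100, 900)        # low part:  100 + 900 = 000, carry 1
high, _    = add_3digit(1, 0, carry)     # high part: 1 + 0 + carried 1 = 2

print(high * 1000 + low)                 # 2000 -- two operations instead of one
```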

That’s what Nvidia has done. They added a lot more FP64 units, so 64-bit floating-point math can be done directly instead of being pieced together from 32-bit operations. And because there are more of them, more operations can run simultaneously. They also added more 16- and 32-bit arithmetic units, again so more operations can be performed at the same time.

In addition, they’ve added hardware support for matrix math. Matrices are basically tables of numbers. They’re often used to solve systems of algebraic equations (see matrix theory), which are very common in AI. The concept is similar to what I mentioned before: what used to take multiple operations can now be done in one.
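To make that concrete (this is my own sketch in Python, not Nvidia's actual hardware interface): the "4 x 4 FP16/FP32 matrices" you read about refer to exactly this kind of operation, D = A x B + C on small 4 x 4 matrices with FP16 inputs and an FP32 accumulator. Done with ordinary scalar instructions it's a pile of separate multiplies and adds; a Tensor core does the whole thing as one hardware operation.

```python
import numpy as np

# Two 4x4 input matrices in half precision (FP16), accumulator in FP32.
A = np.random.rand(4, 4).astype(np.float16)
B = np.random.rand(4, 4).astype(np.float16)
C = np.zeros((4, 4), dtype=np.float32)

# The "slow" way: 4*4*4 = 64 separate multiplies plus adds on scalar units.
D_scalar = C.copy()
for i in range(4):
    for j in range(4):
        for k in range(4):
            D_scalar[i, j] += np.float32(A[i, k]) * np.float32(B[k, j])

# Conceptually, a Tensor core performs this entire D = A*B + C as one operation.
D_fused = A.astype(np.float32) @ B.astype(np.float32) + C

print(np.allclose(D_scalar, D_fused))  # True -- same result, far fewer steps
```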
