Nvidia’s RT and Tensor Cores.

Hello, I’m someone who likes hardware quite a lot and always tries to stay informed about it. These past couple of years, with Nvidia’s new technologies, I’ve been struggling to fully understand how these new types of cores work.

I understand the basic concept: RT cores work for ray tracing, and Tensor cores for Deep Learning Super Sampling. But I want to understand what makes these cores better at their job compared to normal CUDA cores or AMD’s Stream Processors (I know they’re quite different, but I understand that they act similarly).

I’ve tried to read up on it, but I keep running into things like:

* “4 x 4 matrices”
* “4 x 4 FP 16/FP 32 matrices”

And I have no idea what that means. I think it’s a way of doing calculations and math, but I’m not sure. That’s specific to Tensor cores, though, not RT cores. To be honest, I’m a lot more interested in Tensor cores, because I’ve been watching how DLSS has evolved: DLSS 2.0 has come a HUGE way from DLSS 1.0, and probably outperforms most types of AA available right now. (Although I know it’s an upscaling tool rather than AA, or I think that’s what it was.)

So basically, could someone explain, in a simple way to someone who doesn’t understand much “computer math”, **WHY** these cores are best at what they specifically do and **HOW** they do it?

Thanks a ton! Hope this explains clearly what I wanted to know ^^.

3 Answers

Anonymous

[Matrices](https://en.wikipedia.org/wiki/Matrix_(mathematics)) are a mathematical tool that can be used for a bunch of stuff.

They are everywhere in machine learning/AI, and quite common in 3D rendering (most transforms in 3D space can be represented using a 4×4 matrix).
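To make that concrete, here’s a tiny sketch in Python (numpy is just a math library; the specific matrix and point are made up for illustration). A single 4×4 matrix can “move” a 3D point, and the same trick covers rotation and scaling:

```python
import numpy as np

# A 4x4 translation matrix: moves any point by (2, 3, 4).
# The extra 4th coordinate (the trailing 1 on the point) is the trick
# that lets one 4x4 matrix also encode rotations and scaling.
transform = np.array([
    [1, 0, 0, 2],
    [0, 1, 0, 3],
    [0, 0, 1, 4],
    [0, 0, 0, 1],
], dtype=np.float32)

point = np.array([1, 1, 1, 1], dtype=np.float32)  # the 3D point (1, 1, 1)

print(transform @ point)  # [3. 4. 5. 1.] -- the point, moved by (2, 3, 4)
```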

FP16 and FP32 are types of numbers. FP means [floating point](https://en.wikipedia.org/wiki/IEEE_754), which is how decimal numbers are represented (most of the time) by computers. The 16 or 32 is the size of these numbers: FP16 is a 16-bit floating point number, and FP32 is a 32-bit floating point number.

So a 4×4 FP32 matrix is a matrix made of 16 (4×4) 32-bit floating point numbers.
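If you want to see what that means in practice, here’s a quick numpy sketch (just an illustration, nothing Nvidia-specific): the same 4×4 matrix takes half the memory in FP16, at the cost of precision:

```python
import numpy as np

fp16_matrix = np.zeros((4, 4), dtype=np.float16)  # 16 numbers, 2 bytes each
fp32_matrix = np.zeros((4, 4), dtype=np.float32)  # 16 numbers, 4 bytes each

print(fp16_matrix.nbytes)  # 32 (bytes)
print(fp32_matrix.nbytes)  # 64 (bytes)

# Fewer bits also means fewer digits of precision:
print(np.float16(1 / 3))  # 0.3333
print(np.float32(1 / 3))  # 0.33333334
```

Smaller numbers are less precise but much cheaper to store and crunch, which is why ML hardware leans so hard on FP16.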

> understand WHY these cores are best at what they specifically do and HOW they do it?

Doing ML requires doing a **shitton** of matrix math. Tensor cores are basically specialized circuits that do matrix operations. Using dedicated circuits is faster than using general purpose processors^[1], which is why tensor cores make DLSS and other ML-based techniques much faster.
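To give a feel for what “matrix operations” means here: the first generation of tensor cores (in Volta) are each wired to compute one fused operation, D = A×B + C on 4×4 matrices, in a single step, multiplying FP16 inputs and accumulating the sums in FP32. Here’s a plain-numpy sketch of that one operation (just illustrating the math, not how you’d actually program the hardware):

```python
import numpy as np

# The single operation a tensor core is built around: D = A @ B + C,
# where A and B are 4x4 FP16 matrices and the sums accumulate in FP32.
A = np.random.rand(4, 4).astype(np.float16)
B = np.random.rand(4, 4).astype(np.float16)
C = np.zeros((4, 4), dtype=np.float32)

# Emulate "FP16 multiply, FP32 accumulate" in numpy. A general purpose
# core would grind through the 64 multiplies and 64 adds hiding inside
# this one line, one instruction at a time.
D = A.astype(np.float32) @ B.astype(np.float32) + C
print(D)
```

A DLSS-style neural network is millions of these little multiply-accumulates, so chewing through a whole 4×4 chunk per step instead of one number per step is where the speedup comes from.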

**************

[1] There are several reasons why this is the case. One is that specialized circuits can do complex operations “in one go”: they don’t have to read, decode, execute, and store intermediate results and instructions. Another is that it frees the general purpose circuits to do other stuff.
