Patented AI engineer here.
AI is primarily a series of multiplications and additions across large tables (matrices). The largest models (like GPT-4) reportedly have about 1 trillion parameters, which are the weights learned during the training phase. These trillion parameters compress the model’s knowledge about the world, which was obtained from training on vast datasets consisting of text and images from the internet. These datasets are on the order of petabytes, so it would be impossible to search through them quickly for every answer. The model doesn't have to: once training is done, it only uses the weights.
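To make "multiplications and additions across tables" concrete, here is a minimal sketch in plain Python. The names `matvec`, `toy_weights` and `toy_inputs` are made up for illustration, and a real model does this with billions of weights on a GPU rather than with a handful of numbers in a loop.

```python
def matvec(weights, inputs):
    """Multiply a table of weights (a list of rows) by a list of inputs."""
    outputs = []
    for row in weights:
        total = 0.0
        for w, x in zip(row, inputs):
            total += w * x  # one multiplication and one addition
        outputs.append(total)
    return outputs


# Hypothetical toy "layer": 3 numbers in, 2 numbers out.
toy_weights = [[1.0, 2.0, 3.0],
               [0.5, 0.0, -1.0]]
toy_inputs = [1.0, 2.0, 3.0]
toy_outputs = matvec(toy_weights, toy_inputs)
print(toy_outputs)
```

Each output is just a weighted sum of the inputs. A large model stacks many of these steps, with far bigger tables.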
Now, imagine the size of 1 trillion numbers. If each number is stored in FP16 precision (2 bytes), the total storage required is about 2 terabytes. This 2 terabytes represents the compressed form of the knowledge the model has learned.
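The storage arithmetic above can be checked in a few lines. This is only a sketch: `model_size_bytes` is a hypothetical helper, and it assumes 1 terabyte means 10^12 bytes and that nothing besides the raw weights is stored.

```python
def model_size_bytes(num_parameters, bytes_per_parameter):
    """Raw storage needed for the weights alone."""
    return num_parameters * bytes_per_parameter


params = 10**12   # about 1 trillion parameters
fp16_bytes = 2    # FP16 uses 2 bytes per number
size_tb = model_size_bytes(params, fp16_bytes) / 10**12  # bytes -> terabytes
print(size_tb)
```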
To generate a response, the model needs to perform a large number of multiplications and additions for each word it generates. Modern hardware is built for exactly this. For example, one of NVIDIA’s H200 GPUs is rated at roughly 4 petaflops (quadrillion operations per second) at peak, using low-precision math. In a setup with 20 such GPUs, you get a combined 80 petaflops.
Considering that each word generation might involve several trillion operations, this setup allows the model to generate words quickly. With 80 petaflops, you can theoretically perform 80 quadrillion operations per second. Dividing this by the trillions of operations needed per word, such a system could in theory generate thousands of words per second, even tens of thousands, which shows how much computational power is available today.
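The division above is easy to play with yourself. This is a rough sketch, not a benchmark: `words_per_second` is a hypothetical helper, the figure of 2 trillion operations per word is an assumption (about 2 operations per parameter for a 1 trillion parameter model), and the per-GPU rating is left as a parameter because peak numbers vary a lot with precision.

```python
def words_per_second(petaflops_per_gpu, num_gpus, ops_per_word):
    """Theoretical upper bound: total operations per second / operations per word."""
    total_ops_per_second = petaflops_per_gpu * 10**15 * num_gpus
    return total_ops_per_second / ops_per_word


ops_per_word = 2 * 10**12  # assumed: ~2 operations per parameter, 1 trillion parameters

# Try a few per-GPU peak ratings (in petaflops) with 20 GPUs.
for rating in (1, 4, 8):
    print(rating, "petaflops per GPU:", words_per_second(rating, 20, ops_per_word), "words/s")
```

Whichever rating you plug in, the theoretical result stays in the thousands of words per second or higher. Real systems land well below that peak.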