A “neural network” works by having many, many layers of arrays of numbers. These arrays can have tens of thousands, perhaps even *millions* of entries, each of which captures some tiny part of the structure of the data the model was trained on. (ChatGPT uses text as its data, for example.) By pushing the input through those layers, multiplying and combining numbers in bazillions of small steps, the system produces a probability distribution over what the next “token” (chunk of data) should be.
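To make that a bit more concrete, here is a toy sketch in Python. Everything in it is made up for illustration (the tiny five-word vocabulary, the layer sizes, the `next_token_probs` name); ChatGPT’s real architecture is enormously more complicated, but the basic “numbers in, numbers through layers, probabilities out” shape is the same:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "cat", "sat", "on", "mat"]   # a made-up 5-token vocabulary
layer_sizes = [5, 16, 16, 5]                 # tiny; a real model is vastly bigger

# The "arrays of numbers": one weight matrix per layer.
# A real model has millions or billions of these entries.
weights = [rng.normal(size=(m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def next_token_probs(token):
    x = np.eye(len(vocab))[vocab.index(token)]   # turn the input token into numbers (one-hot)
    for w in weights[:-1]:
        x = np.tanh(x @ w)                       # multiply by a layer of numbers, squash, repeat
    return softmax(x @ weights[-1])              # a probability for every possible next token

print(dict(zip(vocab, next_token_probs("cat").round(3))))
```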
Thing is, it’s not really feasible for a human to keep in mind literally millions of distinct matrix multiplication steps all at the same time. But you would need to hold the whole thing in your head, all at once, just to *attempt* to figure out why the model maps certain inputs to certain outputs. So although we know exactly how the model adjusts its parameters each time it is corrected after making a bad prediction (aka every time it is “trained”), we do not know what those parameters collectively end up representing after *trillions* of tiny adjustments.
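And here is roughly what one of those tiny adjustments looks like, again as a sketch only: plain gradient descent on a single small array of numbers (real training uses fancier optimizers and far bigger models, and the `training_step` name and sizes here are invented for the example). The point is that each individual nudge is simple and fully understood; what nobody can read off is what all the numbers *mean* once trillions of nudges have piled up.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=(5, 5))   # one small array of numbers; a real model has millions

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def training_step(w, input_vec, correct_index, lr=0.1):
    probs = softmax(input_vec @ w)       # the model's guess for the next token
    target = np.zeros_like(probs)
    target[correct_index] = 1.0          # what the next token actually was
    # How much each number contributed to the mistake (gradient of the cross-entropy loss).
    grad = np.outer(input_vec, probs - target)
    return w - lr * grad                 # nudge every number a tiny bit toward a better guess

x = np.eye(5)[1]                             # the input token, as numbers
w = training_step(w, x, correct_index=2)     # one of trillions of tiny adjustments
```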