These neural networks are so named because their design is inspired by the brains of living things, like humans and animals. They are basically incredibly complicated wiring with billions of connections between the various parts.

But nobody really “built” these brains directly. They are trained: the network is given examples of what is considered good information, and its wiring is tweaked over time until it produces the kind of output we want. Training can easily take weeks.

But after weeks of time and letting the computer tweak the wiring of the brain… how is a human supposed to open up that box of wiring and make any sense of it? Even the computer has no idea why it’s doing most of the wiring adjustments it did beyond “doing it this way produced a better result during training than the other things I tried”. After repeating that process an absurd number of times, you’re left with something that you can’t really explain *how* it works, but clearly it *does* work.

Imagine you have a fence that has 3 numbered holes in it. You want your dog to go through hole 2.

So, you train your dog by giving it a doggy snack after it goes through hole 2, and eventually it learns to do that.

Well, imagine now you have thousands and thousands of fences with numbered holes. Eventually, your dog starts going to the number you want it to, but you realize that it doesn’t even really need the treats. It has figured out some way to determine what hole you want it to go through with you only saying “go”.

You don’t really know why it’s happening, but you go with it. But then, your dog starts to use those same cues to do other things. Occasionally you have to whip out the doggy snacks, but you are adding more and more complex holes to go through. Now, instead of a fence with 3 holes, your dog can pick 1 hole out of 1,000. And then 20,000. And you can’t figure it out, but it is what you want.

So you continue to test, and provide the occasional doggy snack, but overall the dog is doing most of the work based off of what it has found to work in the past. And it keeps getting better and better.

This is why we say we don’t understand exactly why AI works: it is making “decisions”, just like your dog in the story above, but we don’t know exactly what reinforced those decisions.

EDIT: NSTickels did a better job explaining this than I was able to above.

This is an important distinction: it’s not that we don’t understand the principles of how they work — they were designed, after all. The issue is that the system is so complex, consisting of billions of paths that form a ‘neural network’, that nobody can tell exactly which path a given input takes to end up in a given output.

This is also a very real problem: a recommendation, for example, cannot be validated when it’s not possible to know which data points weighted it and in which ways. There are performance reasons for not tracking this, but it’s quite possible it is also an intentional decision, to avoid confirming that copyrighted or otherwise disallowed material was unlawfully used to train the model.

There are some more excitable people and charlatans who claim that there is some kind of an ‘emergent consciousness’ inside LLMs that’s making decisions independently, but that’s just nonsense. All we have is an unimaginably large number of data points.

I work with AI and people saying “we don’t know how these things work” is a pet peeve of mine. These things are entirely procedural and we do know how they work, and with a sufficient grasp of linear algebra and enough patience you could build these algorithms with nothing more than pen & paper if you really wanted to. A more nuanced way to say it is that we can’t logically explain why they work.

There’s an axiom in classical statistical analysis that says *“correlation doesn’t equal causation”*, and coming up with a logical explanation for why a relationship between two things is causative is an important step in any project. What the fields of machine learning & AI do is essentially chuck that axiom out the window. What matters to AI is finding correlations that make good predictions, so if there’s a correlation between two things and it’s reliable enough to make good predictions, who cares if it has no logical explanation?

A “neural network” works by having many, many layers of arrays of numbers. These arrays of numbers can have tens of thousands, perhaps even *millions* of entries, each of which captures some tiny part of the structure of the input data. (ChatGPT uses text as its input data, for example.) By tweaking and adjusting the input data in bazillions of ways, the system is able to generate a probability range for what the next “token” (chunk of data) should be.
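To make the "layers of arrays of numbers" idea concrete, here is a minimal sketch in plain Python. Everything in it is made up for illustration: the three-word vocabulary, the weight values, and the names `layer`, `relu`, `softmax`, and `next_token_probs` are assumptions, not anything taken from ChatGPT or any real model. It only shows the general shape of the computation: multiply, add, repeat, then turn the final scores into probabilities.

```python
import math

def layer(inputs, weights, biases):
    """One dense layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Simple non-linearity: negative values become zero."""
    return [max(0.0, v) for v in values]

def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy model: 2 input numbers, 2 hidden nodes, 3 possible "tokens".
VOCAB = ["cat", "dog", "fish"]
W1 = [[0.5, -0.2], [0.1, 0.8]]
B1 = [0.0, 0.1]
W2 = [[1.0, 0.3], [-0.5, 0.9], [0.2, 0.2]]
B2 = [0.0, 0.0, 0.0]

def next_token_probs(x):
    hidden = relu(layer(x, W1, B1))
    return softmax(layer(hidden, W2, B2))

probs = next_token_probs([1.0, 2.0])
best_token = VOCAB[probs.index(max(probs))]
```

With only 10 weights you can still follow every number by hand. The point of the answers here is that the same arithmetic stops being followable once the arrays have millions of entries.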

Thing is, it’s not really feasible for a human to keep in mind literally millions of distinct matrix multiplication steps all at the same time. You would need to keep the whole thing in your head, all at once, just to *attempt* to figure out why the model maps certain inputs to certain outputs. Hence, although we know how the model adjusts its parameters each time it is corrected after making a bad prediction (aka every time it is “trained”), we do not know what those adjustments add up to after *trillions* of tiny updates from training.

OP, I’m going to modify TitansFrontRow’s analogy to better describe a neural network and why even the designers don’t know how it did it…

Imagine that 3-holes-in-the-fence example. You ask the dog to run through a hole, and he does. Well now let’s imagine that you have 3 holes on your back fence, 3 holes on your fence to the neighbor on the right, and 3 holes on your fence to the neighbor on the left. Each of your neighbors has multiple holes in each of their fences in different directions, as do their neighbors, and their neighbors, and their neighbors. One day, you are walking your dog and you are on the opposite side of your block and you tell your dog “go home!” and he takes off running through a hole in one fence. Your wife is at home in the backyard and 30 seconds later she calls you to tell you the dog is in the backyard. You don’t know which route your dog took to get through all of the neighbors’ yards and which fences he ran through, you just know you told him to go home, and he got home.

That’s how neural networks work. There are many “layers”, each of which can contain a varying number of “nodes”. In this analogy, each yard is a layer and each hole in a fence is a node. But an LLM has dozens to over a hundred layers, each with thousands to tens of thousands of nodes. And each node in layer N has a connection to every node in layer N-1 and to every node in layer N+1, so there are far more than hundreds of billions of paths a single input could take to return a single output. And at the end, you have no idea which path your input took through those millions of nodes to give you the output you got; you just know that’s what you got.
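As a rough back-of-the-envelope sketch of how fast the number of routes grows: if every node connects to every node in the next layer, a route picks one node per layer, so the count is just the layer sizes multiplied together. The function name `count_paths` and the layer sizes below are made up for illustration.

```python
from math import prod

def count_paths(layer_widths):
    """Routes through a fully connected stack, picking one node per layer."""
    return prod(layer_widths)

# Three yards with 3 holes each, as in the fence story.
fence_routes = count_paths([3, 3, 3])

# A hypothetical, still tiny, network: 10 layers of 100 nodes each.
small_net_routes = count_paths([100] * 10)
```

Even the "tiny" second example gives a 1 followed by 20 zeros. One caveat with the dog analogy: in an actual network the signal travels along all of these connections at once rather than choosing a single route, which makes untangling who contributed what harder, not easier.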

The learning algorithms that power neural networks are procedures for finding complex mathematical functions that correctly predict the relationship between connected data. So if you have a bunch of pictures of letters and a bunch of labels (annotations of the letter it’s a picture of), a learning algorithm can automatically find a mathematical function that correctly predicts, from the pixel values, what the label should be, by iteratively refining the function/network to reduce error on the data.

The techniques used to refine the function (the learning algorithm) are well understood (and not very complicated). However, the functions that are produced by these procedures are extremely complex and don’t come with any guide to interpreting them. They work well in practice, but figuring out how they work is very hard and often just not possible with any level of fidelity.
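To show how uncomplicated the refining procedure itself can be, here is a minimal sketch of gradient descent fitting a straight line. It is a deliberately tiny stand-in, not how any particular system is trained: the data, the learning rate, the step count, and the name `train` are all assumptions chosen for illustration.

```python
def train(data, steps=2000, lr=0.05):
    """Fit y = m*x + b by repeatedly nudging m and b to reduce squared error."""
    m, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # How the average squared error changes if m or b changes slightly.
        grad_m = sum(2 * (m * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (m * x + b - y) for x, y in data) / n
        # Step a little in the direction that lowers the error.
        m -= lr * grad_m
        b -= lr * grad_b
    return m, b

# Made-up examples that follow the hidden rule y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
m, b = train(data)
```

The loop never "knows" the rule is y = 2x + 1; it just keeps whichever adjustments shrink the error. With two numbers you can read the answer straight off. With billions of them, the same loop produces something nobody can read.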

Think of it like evolution. Evolution is very simple: try random variations on what you’ve got, keep the versions that work for the next generation, and discard the ones that don’t. Anyone can understand evolution. The *products* of evolution are staggeringly complex and nobody fully understands them. Trying to work out what that simple procedure had done is the entire field of biology.
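The "try random variations, keep what works" loop from the evolution comparison fits in a few lines. This is only a toy sketch: the target values, mutation size, and the names `evolve` and `fitness` are invented for illustration, and it is a different technique from the gradient-based training neural networks normally use.

```python
import random

def fitness(genes):
    """Higher is better: closeness to a hypothetical ideal of (3, -1)."""
    return -((genes[0] - 3) ** 2 + (genes[1] + 1) ** 2)

def evolve(start, generations=500, seed=0):
    """Mutate at random and keep the child only when it scores better."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    best = start
    for _ in range(generations):
        child = [g + rng.gauss(0, 0.1) for g in best]
        if fitness(child) > fitness(best):
            best = child
    return best

start = [0.0, 0.0]
result = evolve(start)
```

The procedure is a dozen lines, yet nothing in it records *why* any particular mutation helped.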

It’s not so much that they don’t know how it works, but rather that they don’t know what the output will be. When you use engineering software like Ansys to work something out, you roughly know what the output will be. But with AI, the magic happens because you don’t know what it will output. If you ask an AI “What is the most aerodynamic shape for a car that can hold 5 passengers?”, you don’t know what the answer will be, which is why we don’t really “understand” it; it’s more akin to discovery.

So, some AI terms to explain first. Nodes are basically just equations that take inputs and give outputs. Weights are numbers that you multiply the inputs by. Biases are numbers that you add to the equation at a node. To simplify further, a node is y = m1x1 + m2x2 + b with as many mx pairs as it needs; the weights are the ms and the biases are the bs.
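In code, that node equation is a one-liner. A minimal sketch, with the function name `node` and the example numbers made up for illustration:

```python
def node(inputs, weights, bias):
    """y = m1*x1 + m2*x2 + ... + b, for as many (m, x) pairs as are given."""
    return sum(m * x for m, x in zip(weights, inputs)) + bias

# Two inputs (2 and 3), weights 0.5 and -1, bias 4:
# 0.5*2 + (-1)*3 + 4 = 2
y = node([2.0, 3.0], [0.5, -1.0], 4.0)
```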

Now to explain, let’s build some simple AIs. Start with 2 inputs to the system, a and c, 1 layer of nodes combining them, and then a layer to output them. So this system is basically y = (md)((ma)a+(mc)c+b1)+b2. In this system we have 3 weights (md, ma, and mc) and 2 biases (b1 and b2). With this AI we can very easily understand exactly how it’s working.
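Written out as code, that smallest system looks like the sketch below. The function name `tiny_ai` and the example values are made up; the variable names follow the formula y = (md)((ma)a+(mc)c+b1)+b2.

```python
def tiny_ai(a, c, ma, mc, md, b1, b2):
    """Two inputs -> one combining node -> one output node."""
    hidden = ma * a + mc * c + b1   # first layer: combine a and c
    return md * hidden + b2         # output layer: scale and shift

# Example values: hidden = 2*1 + 3*2 + 1 = 9, output = 0.5*9 - 1 = 3.5
y = tiny_ai(a=1.0, c=2.0, ma=2.0, mc=3.0, md=0.5, b1=1.0, b2=-1.0)
```

Here you can still say exactly how much a or c contributes: changing a by 1 changes the output by md*ma.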

Now let’s add an input, so we have 3 inputs: a, c, and d, let’s say. We are still going to combine them all, so now we have 3 nodes in our first layer for ac, ad, and cd, but still one node in our output layer for combining those three. I’m going to skip writing this out because it would be long, but simply by adding 1 input we now have 9 weights and 4 biases. This is still understandable but much harder to follow.

Now let’s add another layer between our initial combination and our output, so instead of ac, ad, and cd going to the output, we combine them to be (ac)(ad), (ad)(cd), and (ac)(cd), and then those go to our output. Now we have 15 weights and 7 biases. This is even harder to trace back to see how much a, c, or d is contributing.
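The weight and bias counts in these three toy systems can be checked with a short sketch. It describes each network as a list of layers, where every node is just the list of things feeding into it: one weight per incoming connection and one bias per node. The function name `count_parameters` and the node labels are made up for illustration.

```python
def count_parameters(layers):
    """Return (weights, biases) for a network given as layers of nodes,
    where each node is the list of inputs feeding into it."""
    weights = sum(len(node_inputs) for layer in layers for node_inputs in layer)
    biases = sum(len(layer) for layer in layers)
    return weights, biases

# 2 inputs -> 1 combining node -> 1 output node
two_inputs = [[["a", "c"]], [["ac"]]]

# 3 inputs -> 3 pairwise nodes -> 1 output node
three_inputs = [
    [["a", "c"], ["a", "d"], ["c", "d"]],
    [["ac", "ad", "cd"]],
]

# Same, with an extra layer of pairwise combinations in the middle
extra_layer = [
    [["a", "c"], ["a", "d"], ["c", "d"]],
    [["ac", "ad"], ["ad", "cd"], ["ac", "cd"]],
    [["acad", "adcd", "accd"]],
]

counts = [count_parameters(net) for net in (two_inputs, three_inputs, extra_layer)]
```

The counts come out as (3, 2), (9, 4), and (15, 7), matching the numbers in the walkthrough.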

Real AI, and especially the fancier ones like the ones you’re talking about, are going to be using millions to billions of inputs and dozens to hundreds of layers. You can theoretically still see all the weights and biases they are using, but actually following them back to see how any individual input affected the output is functionally impossible.
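To get a feel for the scale, here is a rough sketch that counts weights and biases when every node connects to every node in the next layer. That full connectivity is an assumption for this example (the toy systems above only combined inputs in pairs), and the layer sizes and the name `dense_parameter_count` are made up.

```python
def dense_parameter_count(layer_widths):
    """(weights, biases) for fully connected layers of the given sizes.
    The first entry is the number of inputs."""
    pairs = list(zip(layer_widths, layer_widths[1:]))
    weights = sum(n_in * n_out for n_in, n_out in pairs)
    biases = sum(layer_widths[1:])
    return weights, biases

# A hypothetical modest network: 1,000 inputs, two layers of 1,000 nodes, 10 outputs.
modest = dense_parameter_count([1000, 1000, 1000, 10])
```

That modest example already has about two million weights, and every one of them is visible if you print it. Seeing the numbers was never the problem; reading meaning out of them is.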

## Latest Answers