How does an artificial neural network function?

To build a neural network, you’d need some neurons, right?

Well, the neurons we use are artificial neurons. They are mathematical machines that add up everything that comes in at the back and send the result out to everything connected to the front.

But when sending things out the front, they ‘weight’ each connection: in other words, they multiply the signal going along it by some value. This allows each neuron to behave differently.
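Here is a minimal Python sketch of the neuron just described. The function name `neuron` and all the numbers are made up for illustration; it follows the description literally, summing whatever comes in and then weighting each outgoing connection separately.

```python
def neuron(incoming, out_weights):
    """Hypothetical artificial neuron as described above:
    add up everything arriving at the back, then send the total out of the front,
    multiplied by a separate weight for each outgoing connection."""
    total = sum(incoming)
    return [total * w for w in out_weights]

# Three signals come in (total 6.0); the neuron feeds two downstream neurons
# through connections weighted 0.5 and -2.0.
outputs = neuron([1.0, 2.0, 3.0], [0.5, -2.0])
print(outputs)  # [3.0, -12.0]
```

Many textbooks attach the weights to the incoming connections instead; either way, each signal travelling between two neurons gets multiplied by a weight.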

If you connect them together, you get a neural network. If you have more than one ‘layer’ of neurons, you get a deep neural network, which can do more complicated things. If you have a loop in your network, you have a recurrent neural network, which can ‘remember’ things.
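To make the ‘remember’ idea a little more concrete, here is a rough sketch of a single neuron with a loop, assuming the simplest possible setup: its previous output is weighted and added to each new input. The names `run_recurrent`, `w_in` and `w_loop`, and their values, are hypothetical.

```python
def run_recurrent(inputs, w_in=1.0, w_loop=0.5):
    """A single neuron with a loop: its previous output is fed back in
    (multiplied by w_loop) alongside each new input (multiplied by w_in)."""
    state = 0.0
    history = []
    for x in inputs:
        state = w_in * x + w_loop * state  # the new output depends on the old one
        history.append(state)
    return history

print(run_recurrent([1.0, 0.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25, 0.125]
```

The input 1.0 only arrives at the first step, but it keeps echoing (halving each time) in the later outputs; that lingering trace is the kind of ‘memory’ a loop gives you.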

To actually get it to do what you want, you take a network but leave its weights unset. You then use an error function that takes a set of weights for the network, tries them, and produces an ‘error value’, basically a score for how badly the network performs (lower is better).

You can then use calculus (the slope of the error with respect to each weight) to nudge the weights, step by step, toward a set that produces a low error value. It turns out that if you choose the right shape of network for the job, you get a pretty good neural network.
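As an illustration, here is a minimal gradient-descent sketch in plain Python. It assumes a one-neuron network with two weights, a tiny made-up set of training examples, and mean squared error as the error function; the learning rate and number of steps are arbitrary choices, not anything prescribed above.

```python
# Toy training examples: two inputs each, with targets made by the rule y = 2*x1 - 1*x2.
# The rule is hidden from the network; training should rediscover weights close to (2, -1).
examples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0), ((2.0, 1.0), 3.0)]

def predict(weights, inputs):
    """One neuron: weighted sum of its inputs."""
    return sum(w * x for w, x in zip(weights, inputs))

def error(weights):
    """Mean squared error over the examples: the network's score (lower is better)."""
    return sum((predict(weights, xs) - y) ** 2 for xs, y in examples) / len(examples)

def gradient(weights):
    """Derivative of error() with respect to each weight (the calculus step)."""
    grads = [0.0] * len(weights)
    for xs, y in examples:
        diff = predict(weights, xs) - y
        for i, x in enumerate(xs):
            grads[i] += 2 * diff * x / len(examples)
    return grads

weights = [0.0, 0.0]   # start with no useful weights
learning_rate = 0.1    # assumed step size
for _ in range(1000):
    # Nudge every weight a small step "downhill", in the direction that lowers the error.
    weights = [w - learning_rate * g for w, g in zip(weights, gradient(weights))]

print([round(w, 3) for w in weights], error(weights))
```

The weights end up very close to (2, -1), the rule the examples were built from. Real networks often have millions of weights and use libraries to work out the derivatives automatically, but the loop is the same idea.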

Our brains have lots of little connections inside them. Those connections can move around and make other connections too!

Those connections eventually link up to our other organs, like our eyes, ears, skin and nose, and they can mix and match as we age. Those connections in turn help our organs react to the world. It’s how we know what smells good or bad, and how we learn to walk by controlling our muscles. Things like that.

We understand on a small scale how these connections are made, but much less about how they work on a big scale. For example, we know how neurons are physically structured and how they transmit signals, but we don’t really understand how many thousands of connections together produce complex behavior like talking or reading. We know there are many groups of connections that each do one or several things, and some groups can even take over some of the duties of other groups. We even know what some of those groups are.

ANNs try to emulate some of that behavior. We make lots of little connections that are eventually linked to inputs and outputs, which act like our organs. Then we ask the network to do something, like learn to make a digital mannequin walk or identify pictures of cats.

At first they do really badly and act pretty much randomly. Then we tweak the connections and have them try again. A lot of research goes into exactly how to tweak the connections, but in the end it’s basically the same thing: making the network perform ‘better’ at a task than it did before.

Just like with living brains, we don’t really know ‘how’ the network knows what to do, only that it got better at doing it by learning from its mistakes.

A lot of research is going into figuring out how those connections work on a larger scale, and we may be able to use that knowledge to understand living brains better too.

Data is first converted into numerical inputs. For an image, this could be the colour of each pixel; for text, it could be a number for each word. For example, the sentence “over here over there” might be converted into 0 1 0 2 (one number per unique word).
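A minimal sketch of that word-to-number step, assuming words are split on spaces and numbered in order of first appearance (the function name `encode` is made up):

```python
def encode(sentence):
    """Give each unique word a number in order of first appearance,
    then replace every word with its number."""
    vocab = {}
    for word in sentence.split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return [vocab[word] for word in sentence.split()], vocab

codes, vocab = encode("over here over there")
print(codes)  # [0, 1, 0, 2]
print(vocab)  # {'over': 0, 'here': 1, 'there': 2}
```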

Now you have a bunch of input numbers, so you can perform mathematical operations on them. For example, you could multiply two inputs together, add them, or add half of the first to twice the other. We call this bunch of mathematical operations a “layer”.
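The same idea as a short Python sketch, assuming a layer is just a list of weighted sums, one per output. The function `layer` and the weight values are hypothetical; the first row is the “half of the first plus twice the other” example. Multiplying two inputs together is not a weighted sum, so this sketch leaves that operation out; the most common layer types are built from weighted sums like these.

```python
def layer(inputs, weight_rows):
    """One layer: each row of weights produces one output by
    multiplying the inputs by those weights and adding them up."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weight_rows]

# Two inputs, three outputs.
weights = [
    [0.5, 2.0],  # half of the first plus twice the second
    [1.0, 1.0],  # plain addition
    [0.0, 1.0],  # ignore the first input entirely
]
print(layer([4.0, 3.0], weights))  # [8.0, 7.0, 3.0]
```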

These processed outputs are fed into another layer, which does the same kind of thing with its own set of operations. This happens again and again until you get a final output (say 1.7). Sometimes there can be multiple outputs, but I’m going to stick with a single-output classification example.
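Chaining layers can be sketched by feeding each layer’s list of outputs into the next one. The weights below are invented, and the input is the encoded sentence 0 1 0 2 from the example above, so the final number (1.125) means nothing in particular; it only shows the mechanics.

```python
def layer(inputs, weight_rows):
    """One layer: a weighted sum of the inputs for each row of weights."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weight_rows]

def network(inputs, layers):
    """Feed the output of each layer into the next one."""
    values = inputs
    for weight_rows in layers:
        values = layer(values, weight_rows)
    return values

# Hypothetical weights: 4 inputs -> 2 values -> 1 final output.
layers = [
    [[0.5, 0.5, 0.5, 0.5], [0.0, 0.25, 0.0, 0.25]],
    [[1.0, -0.5]],
]
print(network([0, 1, 0, 2], layers))  # [1.125]
```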

We compare these outputs with validated test data whose answers we already know. For example, if we fed Romeo and Juliet into the same bunch of layers and got an output of 1.7, we would label all outputs near 1.7 as “Shakespeare text”, while an output near 1.2 might mean “nursery rhymes”.
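One simple way to turn the final number into a label is to pick whichever known output it is closest to. This nearest-value rule is an assumption for illustration, using the 1.7 and 1.2 values from the example above; many real classifiers instead produce one output per category.

```python
# Reference outputs seen for texts whose category we already know.
reference = {"Shakespeare text": 1.7, "nursery rhymes": 1.2}

def classify(output):
    """Pick the label whose reference output is closest to this output."""
    return min(reference, key=lambda label: abs(reference[label] - output))

print(classify(1.65))  # Shakespeare text
print(classify(1.3))   # nursery rhymes
```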

Now we semi-randomly adjust the mathematical operations, keeping the changes that make the neural network more accurate. For example, maybe adding turns out to be more useful than multiplying, or maybe one of the inputs is useless (e.g. the word “the” is too common to be useful).
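A minimal sketch of this semi-random adjustment, using a simple technique called random hill climbing: make a small random change to one weight and keep it only if the error goes down. The examples, step size and iteration count are hypothetical; in practice most networks are trained with the calculus-based approach mentioned in the first answer, but this mirrors the description here.

```python
import random

# Hypothetical labelled examples: (inputs, target output).
examples = [([1.0, 0.0], 1.7), ([0.0, 1.0], 1.2), ([1.0, 1.0], 2.9)]

def predict(weights, inputs):
    """Weighted sum of the inputs."""
    return sum(w * x for w, x in zip(weights, inputs))

def error(weights):
    """Sum of squared differences from the targets (lower is better)."""
    return sum((predict(weights, x) - y) ** 2 for x, y in examples)

random.seed(0)  # fixed seed so the run is repeatable
weights = [0.0, 0.0]
best = error(weights)
for _ in range(5000):
    # Try a small random change to one weight...
    candidate = list(weights)
    i = random.randrange(len(candidate))
    candidate[i] += random.gauss(0, 0.1)
    # ...and keep it only if the network got more accurate.
    e = error(candidate)
    if e < best:
        weights, best = candidate, e

print(weights, best)
```

The weights drift toward roughly 1.7 and 1.2, the values that fit all three examples.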

Over many rounds, these semi-random adjustments keep the most useful data and find the “best” way (or at least a good way) to process it to get accurate results.

After the neural network has been trained and tested on good data, you can use it on new data. As long as you use the same set of mathematical operations and the same method to convert data into numerical input, you will probably get a useful output.
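Reusing the same conversion method matters because each word has to keep the number it had during training. A minimal sketch, assuming the training vocabulary came from “over here over there” and that unseen words get a hypothetical reserved code (`UNKNOWN` and `encode_with` are made-up names):

```python
# Vocabulary built during training, reused unchanged for new data.
vocab = {"over": 0, "here": 1, "there": 2}
UNKNOWN = len(vocab)  # hypothetical reserved code for words never seen in training

def encode_with(vocab, sentence):
    """Encode new text with the training vocabulary so each word keeps its number."""
    return [vocab.get(word, UNKNOWN) for word in sentence.split()]

print(encode_with(vocab, "there over here"))  # [2, 0, 1]
print(encode_with(vocab, "over yonder"))      # [0, 3]
```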