A neural network is a form of machine learning that is referred to as deep learning. It is among the most advanced methods of machine learning, and fully understanding how it works can take years of dedicated study.
Neural networks are computer systems designed to mimic the paths of communication within the human brain. In your body, you have billions of interconnected neurons running up through your spine and into your brain. They are attached by root-like structures that pass messages from one neuron to the next, up the chain, until the message reaches your brain.
While there is no way to fully replicate this with a computer yet, we take the principal idea and apply it to computer neural networks, replicating the way a human brain learns: recognizing patterns and inferring new information from what it discovers.
In the case of neural networks, as with all our machine learning models, information is processed as numerical data. By feeding the network numerical values, we give it the power to use algorithms to make predictions.
Just as with the neurons in the brain, data starts at the top and works its way down, passing first through nodes. The neural network uses nodes to communicate through each layer. A neural network is made up of three parts: the input, hidden, and output layers.
In the picture below, we have a visual representation of a neural network, with each circle representing an individual node. On the left side, we have the input layer; this is where our data goes in. After the data passes through the input layer, it gets filtered through several hidden layers, where it is sorted by different characteristics and features. The hidden layers are where the 'magic' happens: they look for patterns within the data set, often patterns we probably wouldn't recognize if we sorted the data manually. Each node has a weight, which helps determine the significance of the feature being sorted.
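To make the flow through the layers concrete, here is a minimal sketch of one data point passing through a tiny network with three input nodes, four hidden nodes, and two output nodes. The weights and input values are made up purely for illustration; a real network would learn its weights from data.

```python
import numpy as np

# A made-up data point with 3 features entering the input layer.
inputs = np.array([0.5, -1.2, 3.0])

# Randomly initialized weights (illustrative only):
# one set going into the hidden layer, one going into the output layer.
rng = np.random.default_rng(42)
w_hidden = rng.normal(size=(3, 4))   # 3 inputs -> 4 hidden nodes
w_output = rng.normal(size=(4, 2))   # 4 hidden nodes -> 2 outputs

# Each hidden node computes a weighted sum of the inputs,
# then applies an activation (here, ReLU: negative sums become 0).
hidden = np.maximum(0, inputs @ w_hidden)

# The output layer combines the hidden-node signals the same way.
output = hidden @ w_output

print(hidden.shape, output.shape)
```

The point of the sketch is the shape of the computation: every layer is just weighted sums of the layer before it, passed through an activation function.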
The best use of these neural networks is a task that would be easy for a human but extremely difficult for a computer. Our human brain is a powerful tool for inductive reasoning; it's our advantage over advanced computers that can crunch huge amounts of data in a matter of seconds. We model neural networks after human thinking because we are attempting to teach a computer how to 'reason' like a human, which is quite a challenge.
Neural networks can take a huge amount of computing power. The first reason neural networks are a challenge to process is the volume of data required to make an accurate model. If you want the model to learn how to sort photographs, there are many subtle differences between photos that the model will need to learn to complete the task effectively. That leads to the next challenge: the number of variables required for a neural network to work properly. The more data you use and the more variables you analyze, the more hidden nodes the network needs. At any given time, several hundred or even thousands of features are being analyzed and classified by the model. Take self-driving cars as an example: a self-driving car can have more than 150 nodes for sorting, so the computing power required to make split-second decisions while analyzing thousands of inputs at a time is quite large.
In the instance of sorting photos, neural networks can be very useful, and the methods that data scientists use are improving rapidly. If I showed you a picture of a dog and a picture of a cat, you could easily tell me which one was a cat and which one was a dog. But for a computer, this takes sophisticated neural networks and a large volume of data to teach the model.
A common issue with neural networks is overfitting. The model can predict the values for the training data, but when it's exposed to unknown data, it is fitted too specifically to the old data and cannot make generalized predictions for the new data.
Say that you have a math test coming up and you want to study. You can memorize all the formulas that you think will appear on the test and hope that when the test day comes, you will be able to just plug the new information into what you've already memorized. Or you can study more deeply, learning how each formula works so that you can produce good results even when the conditions change. An overfitted model is like memorizing the formulas for a test. It will do well if the new data is similar, but when there is a variation, it won't know how to adapt. You can usually tell if your model is overfitted if it performs well with training data but does poorly with test data.
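The train-well-but-test-poorly symptom is easy to demonstrate even without a neural network. In this sketch, a very flexible model (a degree-7 polynomial) memorizes eight noisy training points almost perfectly, while a simpler model captures the general shape; the data and degrees are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a simple underlying curve: some for training,
# some held back as "unknown" test data.
x_train = np.linspace(0, 1, 8)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0.05, 0.95, 8)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, x_test.size)

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on data (x, y)."""
    return float(np.mean((np.polyval(model, x) - y) ** 2))

# A degree-7 polynomial can pass through all 8 training points (memorizing),
# while a degree-3 polynomial only captures the general trend.
overfit = np.polyfit(x_train, y_train, 7)
simple = np.polyfit(x_train, y_train, 3)

print("degree 7: train", mse(overfit, x_train, y_train),
      "test", mse(overfit, x_test, y_test))
print("degree 3: train", mse(simple, x_train, y_train),
      "test", mse(simple, x_test, y_test))
```

The overfitted model's training error is essentially zero, yet its error on the held-out points is much larger, which is exactly the warning sign described above.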
When we are checking the performance of our model, we can measure it using the cost value. The cost value is the difference between the predicted value and the actual value of our model.
One of the challenges with neural networks is that it is very hard to trace the relationship between specific inputs and the output. The hidden layers are called hidden for a reason: they are difficult to interpret or make sense of.
The simplest type of neural network is called a perceptron. It derives its simplicity from the fact that it has only one layer through which data passes. The input layer leads to one classifying layer, and the resulting prediction is a binary classification. Recall that when we refer to a classification technique as binary, that means it only sorts between two classes, represented by 0 and 1.
The perceptron was first developed by Frank Rosenblatt. It's a good idea to familiarize yourself with the perceptron if you'd like to learn more about neural networks: larger models use the same basic process, just with more layers and more possible outputs. When data is received, the perceptron multiplies each input by the weight it is given. Then the sum of all these values is plugged into the activation function. The activation function tells us which category the input falls into; in other words, it predicts the output.
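That whole process, multiply each input by its weight, sum, then apply a step activation, fits in a few lines. The weights and bias below are chosen by hand for illustration (they make the perceptron act like a logical AND gate); a trained perceptron would learn them from data.

```python
def perceptron(inputs, weights, bias):
    # Multiply each input by its weight and sum the results.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Step activation: predict class 1 if the sum is >= 0, else class 0.
    return 1 if total >= 0 else 0

# Hand-picked weights that make this perceptron behave like an AND gate.
weights = [1.0, 1.0]
bias = -1.5

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", perceptron([a, b], weights, bias))
```

Only the (1, 1) input pushes the weighted sum past zero, so only that case is classified as 1.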
If you were to look at the perceptron on a graph, its line would appear like this:
f(x) = 0 if x < 0
f(x) = 1 if x ≥ 0
The graph of the perceptron appears like a step, with two values, one on either side of zero. These two sides of the step are the different classes that the model will predict based on the inputs. As you might be able to tell from the graph, it's a bit crude because there is very little separation along the line between classes: even a small change in some input variables will cause the predicted output to flip to a different class. Because it is a step function, it won't perform as well outside of the original dataset that you use for training. An alternative to the perceptron is a model called a sigmoid neuron. The principal advantage of the sigmoid neuron is that it is not binary. Unlike the perceptron, which classifies data into one of two categories, the sigmoid function produces a probability rather than a hard classification. The image below shows the curve of a sigmoid neuron.
Notice the shape of the curve around zero, where the perceptron's step makes it so difficult to classify data with just marginal differences. With the sigmoid neuron, the data is predicted by the probability that it falls into a given class. As you can see, the line curves smoothly around zero, which means the probability that a data point falls into a given class increases gradually as the input grows, rather than jumping from one class to the other.
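The contrast with the perceptron's step is easy to see numerically. The sigmoid function below is the standard one; the sample inputs are chosen just to show the smooth transition.

```python
import math

def sigmoid(x):
    # Maps any weighted sum to a probability between 0 and 1.
    return 1 / (1 + math.exp(-x))

# Near zero the output hovers around 0.5, so small input changes
# shift the probability only slightly, unlike the perceptron's hard jump.
for x in [-4, -1, 0, 1, 4]:
    print(x, round(sigmoid(x), 3))
```

An input of 0 gives exactly 0.5, while inputs far from zero push the probability close to 0 or 1; a downstream rule (say, "predict class 1 if the probability exceeds 0.5") can still turn this into a classification when one is needed.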