# Introduction to Neural Networks — Part 1

Neural Networks have become a huge hit in the recent Machine Learning craze because, in many cases, they perform significantly better than traditional Machine Learning algorithms. The art and science of **Deep Learning** is built on the foundation of Neural Networks and how they work. Hence, demystifying Neural Networks is the first step in demystifying Deep Learning. Let’s dive in!

### What is a Neural Network?

How do we define a Neural Network? It is essentially a naive implementation of how our brains might work. It’s not a very accurate representation, but it tries to replicate some of the methods our brain uses to learn from its mistakes. Let’s look at how our brains work from a simplified perspective and then compare that with a Neural Network.

The brain is essentially a bunch of neurons connected to each other in a huge interconnected network. There are a lot of neurons and even more connections. These neurons pass small amounts of electrical charge to each other as a way to transmit information. Another important feature of these neural connections is that the connection between two neurons can vary between **strong** and **weak**. A strong connection allows more charge to flow between the neurons, while a weak one allows less. A neural pathway which frequently transmits charge will eventually become a **strong pathway**.

Now, the brain takes input from external sources. Let’s say, for example, we touch a hot pan. The nerves from our hand transmit this information to certain neurons in our brain. There is a pathway from these neurons to the neurons which control our hand, and in this case our brain has **learnt** that the best option is to move our hand away from the pan ASAP. Hence this particular **pathway**, from the neurons taking input from the hand to the neurons controlling the hand, will be **strong**.

Neural pathways become **stronger** upon frequent usage, and our brain essentially tries to use pathways which have proven to give us better results over time. So, as we humans live our lives and judge whether our actions are good or bad, we are training our brain to make sure we don’t repeat our previous mistakes and keep doing the things which we think resulted in good outcomes. This is a highly simplified explanation and doesn’t fully portray what’s going on, but hopefully it helps you understand the basic concept.

### Functionality of a Neural Network

Now let’s understand how a Neural Network is represented. A Neural Network consists of many **Nodes** (Neurons) arranged in **layers**. Each layer can have *any number* of nodes and a neural network can have *any number* of layers. Let’s have a closer look at a couple of layers.

Now as you can see, there are many interconnections between the two layers. These interconnections exist between **each node** in the first layer and **each and every node** in the second layer. These connections are also called the **weights** between the two layers.

Now let’s see how exactly these weights function.

Here we take the example of what’s going on with a **single node** in the network. We consider all the values from the **previous layer** connecting to **one node in the next layer**.

**Y** is the *final value* of the node.

**W** represents the *weights* between the nodes in the previous layer and the output node.

**X** represents the *values of the nodes* of the previous layer.

**B** represents the *bias*, which is an additional value present for each neuron. Bias is essentially a weight without an input term. It’s useful for having an *extra bit of adjustability* which is not dependent on the previous layer.

**H** is the *intermediate node value*, computed as **H = W*X + B**. This is not the final value of the node.

**f( )** is called an *Activation Function* and it is something we can choose. We will go through its importance later.

So finally, if the intermediate value **H** works out to 0.57, the output value of this node will be **f(0.57)**.
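The single-node calculation above can be sketched in a few lines of NumPy. The specific numbers below are hypothetical, chosen so that the intermediate value works out to 0.57 as in the example:

```python
import numpy as np

# Hypothetical values: a previous layer of three nodes feeding one node
x = np.array([0.5, 0.1, 0.9])   # X: node values from the previous layer
w = np.array([0.4, 0.3, 0.2])   # W: weights into this single node
b = 0.16                        # B: bias for this node

h = np.dot(w, x) + b            # H = W*X + B, the intermediate node value
y = np.maximum(0.0, h)          # f(H), using ReLU as an example activation

print(h)  # ≈ 0.57
```

Since 0.57 is positive, ReLU leaves it unchanged, so here f(0.57) = 0.57.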

Now let’s look at the calculations between two complete layers:

The weights in this case have been colour coded for easier understanding. We can represent the entire calculation as a matrix multiplication. If we represent the weights corresponding to each input node as column vectors and arrange them horizontally, we form a matrix; this is called the **weight matrix**. Now we can multiply the weight matrix with the input vector and then add the bias vector to get the intermediate node values.

We can summarize the entire calculation as **Y = f(W*X + B)**. Here, Y is the output vector, X is the input vector, W represents the weight matrix between the two layers and B is the bias vector.
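The layer equation **Y = f(W*X + B)** can be sketched directly in NumPy. The matrix and vector values below are made up purely for illustration, with ReLU standing in as the activation function:

```python
import numpy as np

def relu(z):
    # Example activation function: 0 for negatives, identity otherwise
    return np.maximum(0.0, z)

# Hypothetical layer: 3 input nodes -> 2 output nodes
X = np.array([1.0, 2.0, 3.0])           # input vector, shape (3,)
W = np.array([[0.1, 0.2, 0.3],          # weight matrix, shape (2, 3)
              [0.4, 0.5, 0.6]])
B = np.array([0.1, -0.2])               # bias vector, shape (2,)

Y = relu(W @ X + B)                     # Y = f(W*X + B)
print(Y)                                # [1.5 3. ]
```

Each entry of Y is one output node's value: a dot product of that node's weights with the inputs, plus its bias, passed through the activation.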

We can determine the size of the weight matrix by looking at the number of input nodes and output nodes. An M×N weight matrix means that it sits between two layers, with the **first layer** having **N nodes** and the **second layer** having **M nodes**.
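As a quick sanity check (a sketch with arbitrary sizes, not from the article), the shape rule can be verified in NumPy:

```python
import numpy as np

# Hypothetical sizes: first layer N = 4 nodes, second layer M = 3 nodes
N, M = 4, 3
W = np.zeros((M, N))        # an M x N weight matrix
X = np.zeros(N)             # input vector from the first layer
B = np.zeros(M)             # bias vector for the second layer

Y = W @ X + B               # one value per node in the second layer
print(W.shape, Y.shape)     # (3, 4) (3,)
```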

Now let’s look at a complete neural network.

This is a small neural network of four layers. The input layer is where we feed our **external stimulus**, or basically the **data** our neural network has to **learn from**. The output layer is where we get the target value; it represents what exactly our neural network is trying to *predict* or *learn*. All layers in between are called **hidden layers**. When we feed the inputs into the first layer, the values of the nodes are calculated layer by layer, using the matrix multiplications and activation functions, till we get the final values at the output layer. That is how we get an **output** from a neural network.

So, simply put, a neural network is a series of matrix multiplications and activation functions. When we feed in a vector containing the input data, it is multiplied by a sequence of weight matrices and passed through activation functions until it reaches the output layer, which contains the **predictions** of the neural network for that particular input.
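That whole forward pass can be sketched as a short loop. Everything here (layer sizes, random weights, ReLU as the activation) is a hypothetical example, not the article's network:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, layers):
    """Run an input vector through a list of (W, b) layers, in order."""
    for W, b in layers:
        x = relu(W @ x + b)     # Y = f(W*X + B), repeated layer by layer
    return x

# Hypothetical network with layer sizes 3 -> 4 -> 4 -> 2
rng = np.random.default_rng(0)
layers = [
    (rng.standard_normal((4, 3)), rng.standard_normal(4)),
    (rng.standard_normal((4, 4)), rng.standard_normal(4)),
    (rng.standard_normal((2, 4)), rng.standard_normal(2)),
]

output = forward(np.array([1.0, 0.5, -0.5]), layers)
print(output.shape)  # (2,)
```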

### Role of Activation Function

Even though our neural network may have a very complex configuration of weights, it will not be able to solve most problems without the activation function. The reason for this lies in the concept of **Non-Linearity**.

Let’s revise what linearity and non-linearity mean.

Consider the equation **Y = W1\*X1 + W2\*X2**. This represents a **linear relationship** between Y and X1, X2. Regardless of what values W1 and W2 have, at the end of the day a change in X1 or X2 will result in a **linear** change in Y. Now, if we look at real-world data, we realize this is actually not desirable, because data often has **non-linear** relationships between the input and output variables.

Imagine a typical dataset which shows a non-linear relationship between X and Y. If we try to fit a linear relationship on the data, we end up with a straight line, which is not a very accurate representation of the data. However, if our relationship can be **non-linear**, we can get a curve which fits the data much better.

Now let’s compare the neural network equation **with and without the activation function.**

Without the activation function, the equation reduces to **Y = W*X + B**, in which there is a **linear relationship** between the input and the output. However, in the case of the equation **with the activation function**, **Y = f(W*X + B)**, the relationship between input and output can be non-linear, IF the activation function is **itself non-linear**. Hence, all we have to do is use some non-linear function as the activation function for each neuron, and our neural network is now **capable** of fitting non-linear data.
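To see why the activation matters, here is a small NumPy demonstration (my own sketch, with made-up random weights) that two layers *without* an activation collapse into a single equivalent linear layer, since W2(W1·x + b1) + b2 = (W2·W1)·x + (W2·b1 + b2):

```python
import numpy as np

rng = np.random.default_rng(1)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)
x = rng.standard_normal(3)

# Two stacked layers with no activation function...
two_layer = W2 @ (W1 @ x + b1) + b2

# ...are exactly equivalent to one combined linear layer
W_eq = W2 @ W1
b_eq = W2 @ b1 + b2
one_layer = W_eq @ x + b_eq

print(np.allclose(two_layer, one_layer))  # True
```

So without non-linear activations, any number of layers is no more expressive than a single linear layer.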

Let’s look at a couple of popular activation functions:

**ReLU:** ReLU stands for Rectified Linear Unit. It is essentially the identity function (y = x) when x ≥ 0 and becomes 0 when x < 0. This is a very widely used activation function because it’s a non-linear function and it is very simple.
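ReLU is a one-liner in NumPy:

```python
import numpy as np

def relu(x):
    # Rectified Linear Unit: x when x >= 0, else 0 (works elementwise)
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))
# negatives become 0, non-negatives pass through unchanged
```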

**Sigmoid:** Sigmoid is essentially a function bounded between 0 and 1. It approaches 0 for values which are very negative and 1 for values which are very positive. Hence this function *squishes* values which are very high or very low into the range between 0 and 1. This is sometimes useful in neural networks to ensure values aren’t extremely high or low. This function is usually used at the last layer when we need values which are binary (0 or 1).
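Sigmoid is also a one-liner, using the standard formula 1 / (1 + e^(-x)):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real value into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-10.0, 0.0, 10.0])))
# very negative inputs give values near 0, zero gives exactly 0.5,
# very positive inputs give values near 1
```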

This concludes this part of the tutorial. The next part will explain in detail how exactly we can use our data to train our neural network. Thank you for reading!