TensorFlow 2: Model Building with tf.keras
TensorFlow, the popular deep learning framework developed by Google, has released its second major version, TensorFlow 2. One of its main features is a more tightly integrated and robust implementation of the Keras API (tf.keras), which lets us quickly and easily build and train neural networks for different tasks. In this guide, we will go through the various features of tf.keras and how we can use it to build our neural networks.
Let’s first look at the basic components of deep learning models, and then we’ll see how to recreate them using tf.keras. Every neural network has the following three parts:
- Input Layer
- Hidden Layers
- Output Layer
Input and Output Layers:
The input and output layers sit at the two ends of a neural network. The input layer receives the data, and the output layer contains the values we get once the input data has been processed by all the layers of the network.
Hidden Layers:
Between the input layer and the output layer, a neural network has one or more hidden layers. These layers contain weights and perform computations on the data as it passes through the network toward the output. Each hidden layer has a few properties that we will go through now.
Number of neurons
The number of neurons (nodes) in each layer is something that we, as the designers of the neural network, decide. This choice determines how many weights the layer has: a dense layer with n inputs and m neurons has n × m weights plus m biases.
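As a quick check of this relationship, the sketch below builds a single Dense layer with 4 neurons for inputs with 2 values (sizes chosen only for illustration) and inspects its weights:

```python
import tensorflow as tf

# A dense layer with 4 neurons that will receive 2 input values per example.
layer = tf.keras.layers.Dense(4)
layer.build((None, 2))  # create the weights for inputs of shape (batch, 2)

kernel, bias = layer.weights
print(kernel.shape)          # (2, 4): one weight per (input, neuron) pair
print(bias.shape)            # (4,): one bias per neuron
print(layer.count_params())  # 2 * 4 + 4 = 12
```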
Activation function
Each layer typically applies an activation function, and it is up to us to choose which one. Generally, we use the same activation function for all the hidden layers, most often ReLU, and choose the output layer’s activation according to the required output format: for example, sigmoid for binary classification or softmax for multi-class classification.
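A small sketch of how these activation functions behave on a few sample values, using the built-in functions in tf.keras.activations; the input values and layer sizes are arbitrary:

```python
import tensorflow as tf

x = tf.constant([[-2.0, 0.0, 3.0]])

# ReLU, the usual choice for hidden layers, replaces negative values with 0.
relu_out = tf.keras.activations.relu(x).numpy()
print(relu_out)  # [[0. 0. 3.]]

# Sigmoid squashes each value into (0, 1), which suits a binary-classification output.
sigmoid_out = tf.keras.activations.sigmoid(x).numpy()

# Softmax turns a vector of scores into probabilities that sum to 1 (multi-class output).
softmax_out = tf.keras.activations.softmax(x).numpy()

# Inside a layer, the activation is chosen with the `activation` argument.
hidden = tf.keras.layers.Dense(8, activation="relu")
output = tf.keras.layers.Dense(1, activation="sigmoid")
```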
Regularization
To prevent overfitting, we sometimes add regularization, such as weight regularization (for example, an L2 penalty on the weights) or dropout. We have to set the parameters of these, such as the penalty strength or the dropout rate, when building the neural network.
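A minimal sketch of both kinds of regularization in tf.keras, assuming a small binary classifier with 10 input features; the penalty strength (0.01) and dropout rate (0.5) are illustrative values, not recommendations:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(10,))
x = tf.keras.layers.Dense(
    16,
    activation="relu",
    kernel_regularizer=tf.keras.regularizers.l2(0.01),  # penalty: 0.01 * sum(w ** 2)
)(inputs)
x = tf.keras.layers.Dropout(0.5)(x)  # zeroes 50% of values at random, during training only
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inputs, outputs)

batch = tf.ones((4, 10))
# With training=False, dropout is disabled, so repeated predictions match exactly.
p1 = model(batch, training=False)
p2 = model(batch, training=False)
print(bool(tf.reduce_all(p1 == p2)))  # True

# The weight penalty is collected in model.losses and added to the training loss.
print(model.losses)
```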
Layer-specific parameters
Each type of layer also has its own parameters that we have to set. For example, a convolutional layer requires a kernel size, a number of filters, and so on.
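For instance, a Conv2D layer takes the number of filters and the kernel size as its main arguments. The sketch below assumes a single 28x28 grayscale image purely for illustration:

```python
import tensorflow as tf

# A convolutional layer with 8 filters, each of size 3x3.
conv = tf.keras.layers.Conv2D(filters=8, kernel_size=(3, 3), activation="relu")

# A batch containing one 28x28 single-channel image (all zeros, shape only matters here).
images = tf.zeros((1, 28, 28, 1))
features = conv(images)

print(features.shape)       # (1, 26, 26, 8): 28 - 3 + 1 = 26 with no padding
print(conv.count_params())  # 3 * 3 * 1 * 8 weights + 8 biases = 80
```

With the default padding="valid", the 3x3 kernel is never centered on border pixels, so each spatial dimension shrinks by 2.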
Tensors and Operations
A deep learning model consists of many layers connected in a particular fashion, from the input layer to the output layer at the end of the network. Each layer performs a certain computation on the data it receives from the previous layers and passes the resulting values on to the next layers. This entire process can be represented with tensors and operations, the most basic elements TensorFlow uses to define neural networks.
Tensors are multi-dimensional arrays containing numerical values of a certain data type (int, float, etc.). All numeric data in a neural network is represented with tensors, and each tensor has a shape that gives the size of each of its dimensions. Some tensors start without values and receive them only when we feed data in. The input layer’s tensor is the main example: it is initially an empty tensor of a certain shape, it is filled with the values we pass to the model, and these values then flow through the rest of the network. In TensorFlow 1, such tensors were called placeholder tensors; in TensorFlow 2, the symbolic tensor created by tf.keras.Input plays this role.
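A short sketch of both kinds of tensors: a constant tensor that already holds values, and a symbolic input tensor created with tf.keras.Input that only describes the shape of the data to come (the shapes are arbitrary):

```python
import tensorflow as tf

# A 2-D tensor (a 2x3 matrix) of 32-bit floats.
t = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(t.shape, t.dtype.name)  # (2, 3) float32

# A symbolic input tensor: it has a shape but no values yet.
# None is the batch dimension, left open so any number of examples can be fed in.
inputs = tf.keras.Input(shape=(3,))
print(inputs.shape)  # (None, 3)
```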
In a simple three-layer neural network, the input values, the weight matrices between layers, and the values computed at each layer are all tensors. When we initialize the network, the weight tensors are filled with randomly initialized values, and these values are updated as we train the model.
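To illustrate initialization and updating, the sketch below creates a one-neuron Dense layer, records its randomly initialized kernel, and applies a single gradient-descent step with tf.GradientTape. This manual step is only a low-level illustration of what happens during training; the data and learning rate are arbitrary:

```python
import tensorflow as tf

layer = tf.keras.layers.Dense(1)  # no activation, so the gradient is never zeroed out
x = tf.constant([[1.0, 2.0]])
target = tf.constant([[1.0]])

layer(x)                                # the first call creates randomly initialized weights
kernel_before = layer.get_weights()[0]  # NumPy copy of the initial kernel

# One gradient-descent step on a squared-error loss.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.square(layer(x) - target))
grads = tape.gradient(loss, layer.trainable_variables)
optimizer.apply_gradients(zip(grads, layer.trainable_variables))

kernel_after = layer.get_weights()[0]
print(kernel_before)
print(kernel_after)  # the values have changed after the update
```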
Tensors only hold numerical values, but a neural network also has to compute on these numbers to produce its final output. We define these computations as operations. Operations are simple: they take tensors as input, perform a mathematical computation on them, and output a tensor. Operations are the functions that connect all the tensors in a neural network, forming a chain of computation from the input tensor to the output tensor.
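A few basic TensorFlow operations on small, arbitrary tensors; each one takes tensors in and produces a new tensor:

```python
import tensorflow as tf

a = tf.constant([[1.0, 2.0]])    # shape (1, 2)
b = tf.constant([[3.0], [4.0]])  # shape (2, 1)

product = tf.matmul(a, b)         # matrix multiplication: 1*3 + 2*4 -> [[11.]]
shifted = tf.add(product, -20.0)  # element-wise addition -> [[-9.]]
activated = tf.nn.relu(shifted)   # element-wise ReLU -> [[0.]]
```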
In such a network, each layer is an operation that takes two input tensors: the layer’s weight tensor and the tensor containing the values of the previous layer. The operation performs the necessary mathematical computation, which here is a matrix multiplication of the two input tensors (usually with a bias term added) followed by an activation function. The result is another tensor, which the operation outputs. This output tensor contains the values of the current layer and is, in turn, an input tensor to the next layer.
So, in a neural network, several tensors hold the parameters (weights) of the network, and a series of operations takes the tensor provided by the input layer and performs computations until the end of the network. Now the name “TensorFlow” might make more sense: deep learning models are essentially a flow of tensors through operations from input to output.
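To make this concrete, the sketch below compares a tf.keras Dense layer with the same computation written out as explicit operations. Dense adds a bias vector after the matrix multiplication, so the manual version does too; the batch size and layer width are arbitrary:

```python
import tensorflow as tf

x = tf.random.normal((5, 3))  # 5 examples, 3 values each from the previous layer
layer = tf.keras.layers.Dense(4, activation="relu")
keras_out = layer(x)          # the layer's built-in operation

# The same computation with explicit operations:
# matrix multiplication with the weight tensor, plus bias, then the activation.
kernel, bias = layer.get_weights()
manual_out = tf.nn.relu(tf.matmul(x, kernel) + bias)

print(float(tf.reduce_max(tf.abs(keras_out - manual_out))))  # close to 0
```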
Now we know how a deep learning model is represented in terms of tensors and operations. Luckily, we don’t have to deal with these directly when using tf.keras; with the Keras API, we can define the model layer by layer. A layer holds its weights as internal tensors. Each layer takes as input the tensor passed from the previous layer, and an internal operation performs a computation on this input tensor and the layer’s weight tensors. The resulting tensor is the layer’s output and is passed to the next layer.
Note: Not all layers contain internal weight tensors. Some layers simply perform a mathematical operation or transformation on their input tensors, for example Reshape, Add, and Concatenate.
So, to build a neural network with tf.keras, we define each layer and how it is connected to the other layers. Every layer except the input layer takes its input from one or more previous layers. Let’s now look at how we define a layer in tf.keras.
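A quick sketch applying three such layers to small constant tensors; none of them owns any weights:

```python
import tensorflow as tf

a = tf.constant([[1.0, 2.0, 3.0, 4.0]])
b = tf.constant([[10.0, 20.0, 30.0, 40.0]])

add = tf.keras.layers.Add()
reshape = tf.keras.layers.Reshape((2, 2))  # target shape excludes the batch dimension
concat = tf.keras.layers.Concatenate()     # joins along the last axis by default

summed = add([a, b])     # [[11. 22. 33. 44.]]
square = reshape(a)      # shape (1, 2, 2)
joined = concat([a, b])  # shape (1, 8)

print(len(add.weights), len(reshape.weights), len(concat.weights))  # 0 0 0
```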
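In tf.keras, each layer is created as an instance of a layer class: for example, an input layer is created with Input and a dense layer with Dense. A dense layer is the standard fully connected neural network layer, in which every neuron is connected to every value of the previous layer.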
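A minimal sketch of initializing these two layers on their own, assuming inputs with 2 features and a dense layer with 4 neurons (sizes chosen for illustration):

```python
from tensorflow.keras.layers import Input, Dense

# An input layer for examples with 2 features (the batch size is left unspecified).
input_layer = Input(shape=(2,))

# A dense (fully connected) layer with 4 neurons and ReLU activation.
dense_layer = Dense(4, activation="relu")
```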
Creating layers like this only initializes them; it does not connect them to other layers. We’ll now look into how to construct a model from the layers we create with tf.keras.
Keras model building
With tf.keras, there are 2 methods of building models:
- Sequential model
- Functional API
Sequential model
If we have a simple model where each layer is connected in sequence from the input layer to the output layer, we can use the Sequential model. In this model, there is a single chain of layers from the input to the output, and no layer has multiple inputs. The model must also have a single input and a single output.
Let’s see how to create this model using Sequential:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
model = Sequential() # Here we initialize the Sequential model
# This is how we add a layer to the Sequential model. Only the first layer requires an input_shape parameter
model.add(Dense(4, activation = 'relu', input_shape = (2,)))
model.add(Dense(4, activation = 'relu'))
model.add(Dense(4, activation = 'relu'))
model.add(Dense(1, activation = 'sigmoid'))
#At the final layer we select the appropriate activation function, which in this case is sigmoid.
This is how we build a model with the Sequential API in tf.keras. It is simple and straightforward and can be used to quickly build models with a linear structure.
In the case of Sequential models, the process is simple:
- Initialize the sequential model.
- Add the layers in the correct order using model.add().
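The same model can also be written by passing a list of layers to Sequential, with an Input object declaring the input shape instead of the input_shape argument. The sketch below rebuilds the four-layer model this way and checks its parameter count and output shape on a batch of zeros:

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential([
    Input(shape=(2,)),  # declares the input shape up front
    Dense(4, activation="relu"),
    Dense(4, activation="relu"),
    Dense(4, activation="relu"),
    Dense(1, activation="sigmoid"),
])

# Parameters: (2*4 + 4) + (4*4 + 4) + (4*4 + 4) + (4*1 + 1) = 12 + 20 + 20 + 5 = 57
print(model.count_params())  # 57

preds = model.predict(np.zeros((3, 2)), verbose=0)
print(preds.shape)  # (3, 1): one sigmoid output per example
```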
Functional API
Sometimes we need more complex neural network architectures, which may have multiple inputs or outputs, or multiple pathways from the input to the output. In such cases, we can use the Functional API, which is more flexible for creating deep learning models.
To use the Functional API, we define each layer separately and, while defining it, specify which previous layer(s) it is connected to. We do this by calling the new layer on the output tensor of the previous layer, as in Dense(4, activation = 'relu')(x).
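A minimal sketch of the Functional API version of the Sequential model above, assuming the same layer sizes (2 inputs, three hidden Dense layers of 4 neurons, one sigmoid output):

```python
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inputs = Input(shape=(2,))
# Calling a layer on a tensor connects it to that tensor. The variable x is
# reassigned at each step, so it always refers to the latest layer's output.
x = Dense(4, activation="relu")(inputs)
x = Dense(4, activation="relu")(x)
x = Dense(4, activation="relu")(x)
outputs = Dense(1, activation="sigmoid")(x)

# The model is defined by its input and output tensors.
model = Model(inputs=inputs, outputs=outputs)
```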
When chaining layers with the Functional API, we can reuse a single variable: while defining the dense layers, we overwrite the variable x with each new dense layer’s output, and each new layer is connected to the previous one (the old value of x). Only the Python variable is overwritten, not the layers themselves, so the whole chain remains part of the model. This makes it easy to chain layers together.
Let’s look at a more complicated example of the Functional API: a model that has two inputs, each followed by its own branch of dense layers.
Later in the model, the outputs of the two branches are added together and fed into further layers. Adding is a very common way to merge branches in deep learning models, but we need to ensure that the tensors being added have the same shape. In this case, both branches end in a dense layer with 3 neurons, so their outputs have the same shape, (batch size, 3), and are compatible. Let’s now look at how we can code this model using the tf.keras Functional API. (Add is a layer in the tf.keras API.)
from tensorflow.keras.layers import Input, Dense, Add
from tensorflow.keras.models import Model
# Two separate inputs, each with 2 features
input_1 = Input((2,))
input_2 = Input((2,))
# Branch 1: two dense layers on input_1
dense_branch_1 = Dense(3, activation = 'relu')(input_1)
dense_branch_1 = Dense(3, activation = 'relu')(dense_branch_1)
# Branch 2: two dense layers on input_2
dense_branch_2 = Dense(3, activation = 'relu')(input_2)
dense_branch_2 = Dense(3, activation = 'relu')(dense_branch_2)
# Merge the branches by element-wise addition (both outputs have shape (None, 3))
add_layer = Add()([dense_branch_1, dense_branch_2])
# Main path: three more dense layers, then a sigmoid output
dense_main = Dense(3, activation = 'relu')(add_layer)
dense_main = Dense(3, activation = 'relu')(dense_main)
dense_main = Dense(3, activation = 'relu')(dense_main)
output_layer = Dense(1, activation = 'sigmoid')(dense_main)
# The model takes a list of inputs and produces a single output
model = Model([input_1, input_2], output_layer)
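To try the two-input model, we pass one array per input, as a list in the same order as in Model([input_1, input_2], ...). So that the sketch runs on its own, it rebuilds the same architecture compactly with a hypothetical helper, branch(), and loops instead of repeated lines; the random data only stands in for real inputs:

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense, Add
from tensorflow.keras.models import Model


def branch(x):
    """Hypothetical helper: two 3-neuron ReLU layers, as in each input branch."""
    for _ in range(2):
        x = Dense(3, activation="relu")(x)
    return x


input_1 = Input((2,))
input_2 = Input((2,))
# Both branches end with 3 neurons, so Add() receives two (None, 3) tensors.
x = Add()([branch(input_1), branch(input_2)])
for _ in range(3):  # the three dense layers of the main path
    x = Dense(3, activation="relu")(x)
output_layer = Dense(1, activation="sigmoid")(x)
model = Model([input_1, input_2], output_layer)

# One batch of 5 examples per input, listed in the same order as the model's inputs.
batch_1 = np.random.rand(5, 2).astype("float32")
batch_2 = np.random.rand(5, 2).astype("float32")
predictions = model.predict([batch_1, batch_2], verbose=0)
print(predictions.shape)     # (5, 1)
print(model.count_params())  # 82, the same as the step-by-step version
```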
Keras model compiling
After we have built a model with either the Sequential or the Functional API, we need to compile it before we can train it on data. Compiling a model defines three things:
- The loss function that will be computed at the output layer(s)
- The optimizer which will be used to train the model
- The metrics which will be used to evaluate the model
When using tf.keras, this process is done with a single line. Once we have made the model, we need to write the following line of code to compile it:
model.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
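The string 'adam' creates an Adam optimizer with its default settings. To choose settings such as the learning rate, we can pass an optimizer object instead. The sketch below does this for a small model and evaluates the untrained model on random data; the learning rate, layer sizes, and data are illustrative:

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# An optimizer object instead of the string 'adam', so its learning rate can be set.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)

# Random binary-classification data, only to show what evaluate() returns.
x = np.random.rand(8, 2).astype("float32")
y = np.random.randint(0, 2, size=(8, 1)).astype("float32")
loss, accuracy = model.evaluate(x, y, verbose=0)
print(loss, accuracy)
```

evaluate() returns the loss followed by the compiled metrics, in the order they were passed.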
If we have multiple output layers and need a different loss function for each, we can pass a list to the loss parameter, in the same order as the list of outputs we passed when creating the model.
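A minimal sketch of this, assuming a hypothetical model with two outputs: a sigmoid output trained with binary cross-entropy and a linear output trained with mean squared error. The layer sizes and random data are illustrative:

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inputs = Input(shape=(4,))
hidden = Dense(8, activation="relu")(inputs)
class_output = Dense(1, activation="sigmoid")(hidden)  # binary classification head
value_output = Dense(1)(hidden)                        # regression head
model = Model(inputs, [class_output, value_output])

# One loss per output, in the same order as the outputs list above.
model.compile(optimizer="adam", loss=["binary_crossentropy", "mse"])

x = np.random.rand(16, 4).astype("float32")
y_class = np.random.randint(0, 2, size=(16, 1)).astype("float32")
y_value = np.random.rand(16, 1).astype("float32")
history = model.fit(x, [y_class, y_value], epochs=1, verbose=0)
print(history.history["loss"])  # total loss for the single epoch
```

By default, the total training loss is the sum of the per-output losses; compile() also accepts a loss_weights argument to weight them differently.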
This covers the basics of using the tf.keras API in TensorFlow 2 to build and compile models. In the next few tutorials in this TensorFlow 2 series, we will cover model training, data pipelines, and evaluation. Thanks for reading!