When it comes to implementing deep learning models, there are plenty of open source frameworks to choose from. One popular framework, Torch, started as a scientific computing library for the Lua programming language; with the integration of CUDA C (for GPU computing) and machine learning libraries, it soon became a standard ML framework. DeepMind used Torch7 before moving to Google’s TensorFlow framework. Facebook is one of the major users of and contributors to Torch.
PyTorch’s creators say that their guiding philosophy is to be imperative: computation runs immediately, as each statement executes. This fits right into the Python way of programming, as we don’t have to wait for the whole program to be written before finding out whether it works. We can run a part of the code and inspect it in real time.
PyTorch is a Python-based library built to provide flexibility as a deep learning development platform. Its workflow is as close as you can get to Python’s scientific computing library, NumPy.
Why would we use PyTorch to build deep learning models?
- Easy to use API – It is as simple as Python can be.
- Python support – As mentioned above, PyTorch integrates smoothly with the Python data science stack. It is so similar to NumPy that you might not even notice the difference.
- Dynamic computation graphs – Instead of predefined graphs with specific functionalities, PyTorch provides a framework for us to build computational graphs as we go, and even change them during runtime. This is valuable for situations where we don’t know how much memory is going to be required for creating a neural network.
A few other advantages of using PyTorch are its multi-GPU support, custom data loaders and simplified preprocessors.
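To give a feel for what a custom data loader looks like, here is a minimal sketch. The `SquaresDataset` class and its toy data are made up for illustration; only `Dataset` and `DataLoader` come from PyTorch itself.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: item i is the pair (i, i squared)."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        # Number of samples the loader can draw from.
        return self.size

    def __getitem__(self, index):
        # Build one (feature, target) pair on demand.
        x = torch.tensor([float(index)])
        return x, x ** 2


# The loader groups samples into batches; the last batch may be smaller.
loader = DataLoader(SquaresDataset(10), batch_size=4, shuffle=False)
batch_shapes = [tuple(features.shape) for features, targets in loader]
print(batch_shapes)
```

Any object that implements `__len__` and `__getitem__` in this way can be handed to `DataLoader`, which is what makes loading your own data format straightforward.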
Since its public release in January 2017, many researchers have adopted it as a go-to library because it makes building novel and even extremely complex graphs easy. Having said that, it will still take some time before PyTorch is adopted by the majority of data science practitioners, due to its new and “under construction” status.
PyTorch uses an imperative / eager paradigm. That is, each line of code required to build a graph defines a component of that graph. We can perform computations on these components independently, even before the graph is built completely. This is called the “define-by-run” methodology.
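A small sketch of what define-by-run means in practice is shown below. The tensor values are arbitrary and chosen only so the results are easy to check by hand.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Each statement executes as soon as it is reached, so intermediate
# results hold real values that can be printed or inspected right away.
doubled = x * 2
print(doubled)

total = doubled.sum()
print(total.item())

# The part of the graph recorded so far is already usable for gradients.
total.backward()
print(x.grad)
```

Nothing here had to be compiled or declared ahead of time: the graph is simply whatever operations ran.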
Installing PyTorch is pretty easy. You can follow the steps mentioned in the official docs and run the command as per your system specifications.
The main elements we should get to know when starting out with PyTorch are:
- PyTorch Tensors
- Mathematical Operations
- Autograd module
- Optim module
- nn module
Tensors are nothing but multidimensional arrays. Tensors in PyTorch are similar to NumPy’s ndarrays, with the addition that tensors can also be used on a GPU. PyTorch supports various tensor types.
You can define a simple one-dimensional tensor as below:

    # import pytorch
    import torch

    # define a tensor holding a single value
    torch.FloatTensor([2])

     2
    [torch.FloatTensor of size 1]
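The NumPy similarity and the GPU support mentioned above can be sketched as follows. The array values are arbitrary, and the code falls back to the CPU when no CUDA device is available.

```python
import numpy as np
import torch

array = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)

# from_numpy wraps the existing buffer, so the tensor and the array
# share memory: a change to one is visible through the other.
tensor = torch.from_numpy(array)
array[0, 0] = 10.0
print(tensor)

# Converting back to NumPy is just as direct for a CPU tensor.
back = tensor.numpy()

# Move the data to a GPU if one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
on_device = tensor.to(device)
print(on_device.device)
```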
As with NumPy, it is crucial that a scientific computing library has efficient implementations of mathematical functions. PyTorch gives you a similar interface, with more than 200 mathematical operations you can use.
Below is an example of a simple addition operation in PyTorch:
    a = torch.FloatTensor([2])
    b = torch.FloatTensor([3])
    a + b

     5
    [torch.FloatTensor of size 1]
Doesn’t this look like a quintessential Python approach? We can also perform various matrix operations on the PyTorch tensors we define. For example, we’ll transpose a two-dimensional matrix:

    matrix = torch.randn(3, 3)
    matrix

     0.7162  1.0152  1.1525
    -0.3503 -0.9452 -1.0861
    -0.1093 -0.0927 -0.0476
    [torch.FloatTensor of size 3x3]

    matrix.t()

     0.7162 -0.3503 -0.1093
     1.0152 -0.9452 -0.0927
     1.1525 -1.0861 -0.0476
    [torch.FloatTensor of size 3x3]
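A few more of the everyday operations are sketched below, using small fixed matrices (chosen for illustration) so the results can be verified by hand.

```python
import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.tensor([[0.0, 1.0], [1.0, 0.0]])

# Element-wise multiplication: each entry of a times the matching entry of b.
elementwise = a * b

# Matrix multiplication: here b swaps the two columns of a.
product = a @ b

# Reduction along a dimension: sum of each row.
row_sums = a.sum(dim=1)

print(elementwise)
print(product)
print(row_sums)
```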
PyTorch uses a technique called automatic differentiation. That is, a recorder keeps track of the operations we perform during the forward pass and then replays them backward to compute our gradients. This technique is especially powerful when building neural networks, because the information needed for differentiation is captured during the forward pass itself, so we never have to derive or code the gradients by hand.
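    from torch.autograd import Variable

    # train_x and train_y are tensors holding the training data, defined elsewhere.
    # Since PyTorch 0.4, plain tensors track gradients and Variable is a legacy wrapper.
    x = Variable(train_x)
    y = Variable(train_y, requires_grad=False)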
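To see the recorder at work, here is a minimal sketch with scalar values picked so the gradient is easy to check by hand. It uses plain tensors with `requires_grad=True`, which is how recent PyTorch versions expose autograd.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)  # value we want a gradient for
x = torch.tensor(2.0)                      # plain input, no gradient needed

# Forward pass: autograd records each operation applied to w.
y = w * x ** 2 + w ** 2

# Backward pass: replay the recording in reverse to get dy/dw.
y.backward()

# By hand: dy/dw = x^2 + 2w = 4 + 6 = 10
print(y.item(), w.grad.item())
```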
torch.optim is a module that implements various optimization algorithms used for building neural networks. Most of the commonly used methods are already supported, so we don’t have to build them from scratch (unless we want to!).
Below is the code for using an Adam optimizer:
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
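An optimizer is used in a repeating three-step pattern: clear the old gradients, compute new ones, then update. The sketch below applies that pattern to a made-up toy problem, minimizing (p - 5)^2 for a single parameter p; the learning rate and step count are arbitrary choices.

```python
import torch

# A single trainable parameter, starting at zero.
param = torch.zeros(1, requires_grad=True)
optimizer = torch.optim.Adam([param], lr=0.1)

losses = []
for step in range(200):
    optimizer.zero_grad()                  # clear gradients from the last step
    loss = ((param - 5.0) ** 2).sum()      # toy objective with its minimum at 5
    loss.backward()                        # compute d(loss)/d(param)
    optimizer.step()                       # let Adam update param in place
    losses.append(loss.item())

print(losses[0], losses[-1], param.item())
```

Swapping in another algorithm such as `torch.optim.SGD` only changes the line that constructs the optimizer; the loop stays the same.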
PyTorch autograd makes it easy to define computational graphs and take gradients, but raw autograd can be a bit too low-level for defining complex neural networks. This is where the nn module can help.
The nn package defines a set of modules. We can think of each module as a neural network layer that produces output from input and may have some trainable weights.
You can think of the nn module as the Keras of PyTorch!
    import torch

    # define model (input_num_units, hidden_num_units and output_num_units are set elsewhere)
    model = torch.nn.Sequential(
        torch.nn.Linear(input_num_units, hidden_num_units),
        torch.nn.ReLU(),
        torch.nn.Linear(hidden_num_units, output_num_units),
    )
    loss_fn = torch.nn.CrossEntropyLoss()
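Putting the pieces together, a training loop for this kind of model might look like the sketch below. The layer sizes, the synthetic data, the learning rate and the epoch count are all assumptions made up for illustration, not values from a real dataset.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes, chosen only for this example.
input_num_units, hidden_num_units, output_num_units = 4, 16, 3

# Synthetic data: the label is the index of the largest of the first
# three features, so there is a real pattern for the network to learn.
features = torch.randn(60, input_num_units)
labels = features[:, :output_num_units].argmax(dim=1)

model = torch.nn.Sequential(
    torch.nn.Linear(input_num_units, hidden_num_units),
    torch.nn.ReLU(),
    torch.nn.Linear(hidden_num_units, output_num_units),
)
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

losses = []
for epoch in range(100):
    optimizer.zero_grad()
    logits = model(features)         # forward pass over the whole toy set
    loss = loss_fn(logits, labels)   # expects raw scores and integer labels
    loss.backward()                  # autograd fills in the gradients
    optimizer.step()                 # the optimizer updates the weights
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Note that `CrossEntropyLoss` takes the raw outputs of the last linear layer, so the model does not need a softmax layer of its own.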
Now that you know the basic components of PyTorch, you can easily build your own neural network from scratch.
Building Neural Networks Dynamically
So far, we have been implementing AI models in a static way, i.e. the model is constructed with the attributes of the data built into it. Once such a model is built, it can only be reused as it is; changing the behavior of the network requires rebuilding the model from scratch. This is how popular ML frameworks like TensorFlow, Theano, Caffe and CNTK work.
But there has been growing interest in constructing neural networks dynamically, i.e. at runtime. This should not be confused with Dynamic Neural Networks. PyTorch and a few other frameworks like DyNet offer this feature. Of course, building Dynamic Neural Networks such as Recursive Neural Networks becomes much easier in these frameworks. Constructing NNs dynamically has a clear advantage: one can use an instance of a network to learn a particular structure of the input and deploy multiple such instances to form a complete model.
For example, in Natural Language Processing, the question “What is the color of the object right of the cat?” involves several levels of understanding: identifying the cat, moving right, and identifying the color of that object. Different questions involve different numbers of levels, and a network built dynamically is well suited to such a task.
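As a rough sketch of the idea (not a question answering model), the hypothetical module below reuses one shared layer once per input step, so the depth of the network is decided by each input at runtime. The class name, sizes and random inputs are assumptions for illustration.

```python
import torch


class DynamicDepthNet(torch.nn.Module):
    """Hypothetical module: applies one shared layer once per input step."""

    def __init__(self, num_units):
        super().__init__()
        self.num_units = num_units
        # The layer sees the running state joined with the current step.
        self.cell = torch.nn.Linear(2 * num_units, num_units)

    def forward(self, steps):
        # steps has shape (sequence_length, num_units); the length may vary.
        state = torch.zeros(self.num_units)
        for step in steps:  # an ordinary Python loop builds the graph
            state = torch.tanh(self.cell(torch.cat([state, step])))
        return state


net = DynamicDepthNet(num_units=4)
short_out = net(torch.randn(3, 4))   # graph with three applications of cell
long_out = net(torch.randn(7, 4))    # graph with seven applications of cell

# Gradients flow back through however many steps were actually run.
long_out.sum().backward()
print(short_out.shape, long_out.shape)
```

The same `net` instance handled both inputs without being rebuilt, which is the property the static frameworks described above lack.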
PyTorch computes gradients through these dynamically built networks using a technique called Reverse-Mode Auto-Differentiation, the details of which are beyond the scope of this post. The appeal of PyTorch is that it aims to perform optimization tasks quickly and keep models memory efficient compared to other ML libraries that provide the same features.
The dynamic view of neural networks is the future. As one of the first ML frameworks to offer this feature, PyTorch is likely to draw a lot of researchers. What are your thoughts about Dynamic Neural Networks? Give PyTorch a try and share your experience with us.