
PyTorch quickstart ☄️: Some of my PyTorch notes

Hello everyone! Today I wanted to share some of my PyTorch notes in case you need a quickstart. Enjoy!

PyTorch is an open-source Python-based deep learning library that has been the most widely used deep learning library for research since 2019 by a wide margin.

It is popular because it's user-friendly and efficient, yet flexible enough for advanced users to customize and optimize models; it strikes a good balance between ease of use and powerful features.

At its core, PyTorch combines three things:

  • A tensor library that extends NumPy with GPU acceleration.

  • An automatic differentiation engine (Autograd) that simplifies gradient computation and model optimization.

  • A comprehensive deep learning library offering modular building blocks for designing and training a wide range of deep learning models, for both researchers and developers.

Installation

To install PyTorch (the default build detects your CPU/GPU setup automatically):

pip install torch

To pin the specific version used in these notes:

pip install torch==2.4.1

Explicit CUDA/GPU version: on https://pytorch.org, select your OS and desired CUDA version, and then modify the generated command to include your torch version
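
For example, the generated command might look like this (the cu121 tag here is illustrative; use the CUDA tag the site generates for your setup):

pip install torch==2.4.1 --index-url https://download.pytorch.org/whl/cu121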

Verify installation:

import torch
print(torch.__version__)

NVIDIA GPU recognition:

import torch
print(torch.cuda.is_available())

Apple Silicon GPU recognition:

import torch
print(torch.backends.mps.is_available())
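
Once a supported GPU is available, tensors (and models) can be moved onto it with .to(). A quick sketch:

import torch

# pick the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tensor = torch.tensor([1.0, 2.0, 3.0]).to(device)
print(tensor.device)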

Tensors

Tensors are a mathematical concept that extends scalars, vectors, and matrices to higher dimensions, with their "rank" indicating the number of dimensions (for example, a scalar is rank 0, a vector is rank 1, a matrix is rank 2). In computing, tensors act as multi-dimensional data containers, efficiently managed by libraries like PyTorch. PyTorch tensors are similar to NumPy arrays but offer key advantages for deep learning: an automatic differentiation engine for gradient computation and GPU support to accelerate neural network training, all while maintaining a familiar NumPy-like API.

We can create objects of PyTorch's Tensor class using the torch.tensor function as follows:

import torch

# create a 0D tensor (scalar) from a Python integer
tensor0d = torch.tensor(1)

# create a 1D tensor (vector) from a Python list
tensor1d = torch.tensor([1, 2, 3])

# create a 2D tensor from a nested Python list
tensor2d = torch.tensor([[1, 2, 3], [4, 5, 6]])

# create a 3D tensor from a nested Python list
tensor3d = torch.tensor([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

To check the data type of a tensor in PyTorch, we use the .dtype attribute:

tensor1d = torch.tensor([1, 2, 3])
print(tensor1d.dtype)

And it should print something like:

torch.int64
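
If you need a different precision (for example, 32-bit floats, the standard for training), you can convert the tensor with .to():

floatvec = tensor1d.to(torch.float32)
print(floatvec.dtype)  # prints: torch.float32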

To check the shape of a tensor, we can use the .shape attribute:

print(tensor2d.shape)

It would print something like:

torch.Size([2, 3])

Which means the tensor has 2 rows and 3 columns. To reshape it into a 3-by-2 tensor, we can use the .reshape() method or, more commonly, the .view() method:

tensor2d.reshape(3, 2)
tensor2d.view(3, 2)

Next, we can use .T to transpose a tensor, which means flipping it across its diagonal:

tensor2d.T

Lastly, the common way to multiply two matrices in PyTorch is the .matmul method:

tensor2d.matmul(tensor2d.T)

However, we can also adopt the @ operator, which accomplishes the same thing more compactly:

tensor2d @ tensor2d.T
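
For the tensor2d defined above, both versions produce:

tensor([[14, 32],
        [32, 77]])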

Autograd

PyTorch's autograd engine automatically computes gradients using computational graphs.

A computational graph is:

  • A directed graph that visualizes mathematical expressions.

  • In deep learning, it maps out the steps a neural network takes to produce an output.

  • Crucial for backpropagation, the main training method for neural networks, as it allows us to calculate necessary gradients.

PyTorch builds these graphs internally whenever a tensor in the computation has requires_grad=True.

These graphs enable gradient calculation, crucial for backpropagation (neural network training).

Backpropagation uses the chain rule to calculate how much each parameter contributes to the loss.

PyTorch's autograd engine handles this automatically via two entry points (a short sketch follows the list):

  • torch.autograd.grad() (manual, for specific tensors)

  • .backward() (automatic, computes gradients for all parameters).
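
A minimal sketch of both approaches, using a single logistic regression unit (the tensor values here are just illustrative):

import torch
import torch.nn.functional as F
from torch.autograd import grad

y = torch.tensor([1.0])                       # true label
x1 = torch.tensor([1.1])                      # input feature
w1 = torch.tensor([2.2], requires_grad=True)  # weight parameter
b = torch.tensor([0.0], requires_grad=True)   # bias parameter

z = x1 * w1 + b        # net input
a = torch.sigmoid(z)   # activation
loss = F.binary_cross_entropy(a, y)

# manual: gradient of the loss with respect to one specific tensor
grad_w1 = grad(loss, w1, retain_graph=True)
print(grad_w1)

# automatic: populates .grad on every tensor with requires_grad=True
loss.backward()
print(w1.grad, b.grad)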

Neural Networks

PyTorch makes it easy to define custom neural networks by subclassing torch.nn.Module.

We use __init__() to define the layers and forward() to define how data flows through the network.

Example:

class NeuralNetwork(torch.nn.Module):
    def __init__(self, num_inputs, num_outputs):
        super().__init__()
        self.layers = torch.nn.Sequential(
            torch.nn.Linear(num_inputs, 30),
            torch.nn.ReLU(),
            torch.nn.Linear(30, 20),
            torch.nn.ReLU(),
            torch.nn.Linear(20, num_outputs),
        )

    def forward(self, x):
        return self.layers(x)

To instantiate the model and print its structure:

model = NeuralNetwork(50, 3)
print(model)
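
This prints the layer structure, something like:

NeuralNetwork(
  (layers): Sequential(
    (0): Linear(in_features=50, out_features=30, bias=True)
    (1): ReLU()
    (2): Linear(in_features=30, out_features=20, bias=True)
    (3): ReLU()
    (4): Linear(in_features=20, out_features=3, bias=True)
  )
)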

Model parameters:

# Count trainable parameters
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

# Access weights
print(model.layers[0].weight.shape)  # torch.Size([30, 50])

Set a random seed before instantiating the model to make the initial weights reproducible:

torch.manual_seed(123)

Forward pass (the model returns raw scores, i.e., logits):

X = torch.rand((1, 50))
out = model(X)

Use torch.no_grad() to skip tracking gradients:

with torch.no_grad():
    out = model(X)

Apply softmax to get class probabilities:

probs = torch.softmax(out, dim=1)

Data Loading

Custom dataset class:

from torch.utils.data import Dataset

class ToyDataset(Dataset):
    def __init__(self, X, y):
        self.X, self.y = X, y

    def __getitem__(self, i):
        # return one (features, label) pair
        return self.X[i], self.y[i]

    def __len__(self):
        # total number of examples
        return len(self.y)
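
To make the DataLoader example below runnable, assume a small toy dataset like this (the values are illustrative: five examples with 2 features each and binary labels):

import torch

X_train = torch.tensor([
    [-1.2, 3.1],
    [-0.9, 2.9],
    [-0.5, 2.6],
    [2.3, -1.1],
    [2.7, -1.5],
])
y_train = torch.tensor([0, 0, 0, 1, 1])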

Create DataLoaders:

from torch.utils.data import DataLoader

train_ds = ToyDataset(X_train, y_train)
train_loader = DataLoader(train_ds, batch_size=2, shuffle=True, num_workers=0)

Iterate through batches:

for x, y in train_loader:
    print(x, y)

  • num_workers=0: data loads in the main process (slower but safer).

  • num_workers>0: data loads in background worker processes, which is faster for large datasets but may cause issues with small datasets or inside Jupyter notebooks.

Training

Training the neural network:

import torch.nn.functional as F

model = NeuralNetwork(2, 2)  # 2 input features, 2 output classes
optimizer = torch.optim.SGD(model.parameters(), lr=0.5)

for epoch in range(3):
    model.train()
    for features, labels in train_loader:
        loss = F.cross_entropy(model(features), labels)
        optimizer.zero_grad()  # reset gradients from the previous step
        loss.backward()
        optimizer.step()

  • F.cross_entropy applies softmax internally, so the model outputs raw logits

  • zero_grad() prevents gradient buildup

  • train() sets model to training mode

  • After 3 epochs, the loss is ≈ 0 (the model fits the toy training data)

Evaluation

Making predictions:

model.eval()
with torch.no_grad():
    outputs = model(X_train)
    predictions = torch.argmax(outputs, dim=1)

  • eval() sets evaluation mode (disables training-only behaviors such as dropout)

  • no_grad() saves memory

  • argmax gives predicted class

  • Use softmax for probabilities

Accuracy (fraction of correct predictions):

(predictions == y_train).float().mean()

Reusable accuracy function:

def compute_accuracy(model, loader):
    model.eval()
    correct = 0
    for X, y in loader:
        with torch.no_grad():
            preds = model(X).argmax(dim=1)
        correct += (preds == y).sum()
    return correct.item() / len(loader.dataset)
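
Usage, with the train_loader from earlier:

print(compute_accuracy(model, train_loader))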

Saving and Loading

Save the trained weights:

torch.save(model.state_dict(), "model.pth")

Load them back into a freshly instantiated model (the architecture must match):

model = NeuralNetwork(2, 2)
model.load_state_dict(torch.load("model.pth", weights_only=True))

Distributed Training

Distributed training with PyTorch's DistributedDataParallel (DDP):

  • Speeds up training by splitting work across GPUs/machines.

  • Essential for large models and repeated training runs.

How DDP works (a minimal launch sketch follows the list):

  • Each GPU gets a copy of the model.

  • Data is split between GPUs (via DistributedSampler).

  • Gradients are synchronized across GPUs after each backward pass.
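
A minimal sketch, assuming a single machine with multiple NVIDIA GPUs and a script launched via torchrun (e.g. torchrun --nproc_per_node=2 train.py); NeuralNetwork and train_ds are the class and dataset defined earlier:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # each process gets its own model replica on its own GPU
    model = NeuralNetwork(2, 2).to(local_rank)
    model = DDP(model, device_ids=[local_rank])

    # DistributedSampler hands each process a different shard of the data
    train_loader = DataLoader(train_ds, batch_size=2,
                              sampler=DistributedSampler(train_ds))

    # ... run the training loop from above; DDP synchronizes gradients
    # across GPUs automatically during loss.backward()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()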

Conclusion

PyTorch is a flexible and powerful deep learning framework built around three key components: tensors, autograd, and neural network tools. It supports GPU acceleration, making it efficient for training large models. With tools like Dataset, DataLoader, and DistributedDataParallel, PyTorch simplifies everything from loading data to scaling training across multiple GPUs. Whether you're starting on a CPU or scaling to clusters, PyTorch makes it straightforward to build and train deep learning models efficiently.

Drop any questions, feedback, or requests in the comments! 🖊️

Posted on: 19/7/2025

AmiraBkh

Computer science student | Data analytics, data science, and machine learning enthusiast
