Building Complex AI Algorithms from Scratch
- 1) Let's talk about basic SGD in PyTorch
- 2) A simple Logistic Regression Class
- 3) Experimenting with MNIST
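All of the code below assumes the usual fastai star import from the course notebooks, which also brings tensor, torch, F (torch.nn.functional) and plt into the namespace:

from fastai.vision.all import *   # provides tensor, torch, F and plt used throughout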
def f(x): return x**2
xt = tensor(3.).requires_grad_()
Notice the special method requires_grad_? That's the magical incantation we use to tell PyTorch that we want to calculate gradients with respect to that variable at that value. It is essentially tagging the variable, so PyTorch will remember to keep track of how to compute gradients of the other, direct calculations on it that you will ask for.
yt = f(xt)
yt.backward()
The "backward" here refers to backpropagation, which is the name given to the process of calculating the derivative of each layer (in the case of a neural network).
xt.grad # The grad attribute accumulates the gradient at x: each backward() call adds to it unless you zero it out with the grad.zero_() method.
xt.grad.zero_()
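Here is a minimal sketch of that accumulation behaviour, re-using the f and tensor defined above: calling backward() a second time without zeroing adds the new gradient on top of the stored one.

xt = tensor(3.).requires_grad_()
f(xt).backward()
print(xt.grad)      # tensor(6.)  -- the derivative of x**2 at x=3 is 2*x = 6
f(xt).backward()
print(xt.grad)      # tensor(12.) -- the second call added another 6 on top
xt.grad.zero_()     # reset before moving on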
Now we can try with a vector:
def f(x): return (x**2).sum()
def calc_grad(x, f):
    y = f(x)
    y.backward()
    print(x.grad)
    x.grad.zero_()
calc_grad(tensor([3.,4.,5.],requires_grad=True),f)
As we can see, we got the right results: the gradient of $\sum_i x_i^2$ with respect to $x$ is $2x$, i.e. [6., 8., 10.]. Two things to notice: a tensor can only require gradients if it has a floating-point dtype, and backward() can implicitly create the gradient only for scalar outputs, so the function f must return a scalar. That is why we applied .sum().
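Both constraints are easy to demonstrate. The sketch below assumes a recent PyTorch, where each case raises a RuntimeError (the exact messages may differ between versions):

try: tensor([3, 4, 5], requires_grad=True)                        # integer dtype cannot require gradients
except RuntimeError as e: print(e)
try: (tensor([3., 4., 5.], requires_grad=True)**2).backward()     # non-scalar output, no gradient argument
except RuntimeError as e: print(e)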
Let's Take Another End-to-End Example of SGD
time = torch.arange(20).float()
time
speed = torch.randn(20)*3 + 0.75*(time-9.5)**2 + 1
plt.scatter(time,speed);
def f(t, params):
    a,b,c = params
    return a*(t**2) + b*t + c
params = torch.randn(3).requires_grad_()
orig_params = params.clone() # keep a copy of the starting parameters; the clone is still tracked by autograd
orig_params
preds = f(time, params)
Let's create a little function to see how close our predictions are to our targets, and take a look:
def show_preds(preds, ax=None):
    if ax is None: ax = plt.subplots()[1]
    ax.scatter(time, speed)
    ax.scatter(time, preds.detach().numpy(), color='red')
    ax.set_ylim(-300,100)
show_preds(preds)
loss = F.mse_loss(preds,speed)
loss
Our goal now is to improve this loss. The next step is to calculate the gradients.
loss.backward() # calling .backward() on the loss computes the gradient of the loss w.r.t. every tensor in its history that requires grad
params.grad # the gradient of the loss w.r.t. params -> (a, b, c)
lr = 1e-5
params.grad * lr
params.data -= params.grad.data*lr
params.grad = None
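As a side note on the .data trick: a common alternative, sketched below as a replacement for (not an addition to) the two update lines above, is to perform the step inside a torch.no_grad() block so autograd does not record the parameter update itself.

with torch.no_grad():
    params -= params.grad * lr   # equivalent to the .data update above; use one or the other
    params.grad = None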
Understanding this bit depends on remembering recent history. To calculate the gradients we call backward on the loss. But this loss was itself calculated by mse_loss, which in turn took preds as an input, which was calculated using f taking as an input params, which was the object on which we originally called requires_grad_, which is the original call that now allows us to call backward on loss. This chain of function calls represents the mathematical composition of functions, which enables PyTorch to use calculus's chain rule under the hood to calculate these gradients.
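We can actually inspect this chain: every tensor produced from a tracked tensor carries a grad_fn attribute pointing at the operation that created it, and next_functions links each grad_fn back towards params. A small sketch (the exact class names, such as MseLossBackward0, depend on the PyTorch version):

preds = f(time, params)
loss = F.mse_loss(preds, speed)
print(loss.grad_fn)                  # the mse step, e.g. <MseLossBackward0 ...>
print(loss.grad_fn.next_functions)   # the operations that produced its inputs
print(preds.grad_fn)                 # the final add in a*(t**2) + b*t + c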
preds = f(time,params)
F.mse_loss(preds,speed)
show_preds(preds)
def apply_step(params, prn=True):
    preds = f(time, params)
    loss = F.mse_loss(preds, speed)
    loss.backward()
    params.data -= params.grad.data*1e-5
    params.grad = None
    if prn: print(loss.item()) # in this case our metric is the same as the loss function
    return preds
for i in range(10): apply_step(params)
We just decided to stop after 10 epochs arbitrarily. In practice, we would watch the training and validation losses and our metrics to decide when to stop.
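One hedged sketch of such a stopping rule in this toy setting (we only have a training loss here, and tol and max_steps are made-up values): keep stepping until the loss stops improving by more than a small tolerance.

params = torch.randn(3).requires_grad_()
prev_loss, tol, max_steps = float('inf'), 1e-3, 1000
for i in range(max_steps):
    preds = apply_step(params, prn=False)
    loss = F.mse_loss(preds, speed).item()
    if prev_loss - loss < tol: break   # improvement too small -- stop
    prev_loss = loss
print(f"stopped after {i+1} steps, loss = {loss:.2f}")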
params = torch.randn(3).requires_grad_()
_,axs = plt.subplots(1,8,figsize=(24,3))
for ax in axs: show_preds(apply_step(params, False), ax)
plt.tight_layout()
Over these 8 epochs, from left to right, we can see how our predictions improve with a constant lr=1e-5.
class LogisticRegression():
    def __init__(self): pass

    def get_params(self, size) -> torch.Tensor:
        return torch.randn(size).requires_grad_()

    def sigmoid(self, x):
        return 1/(1 + torch.exp(-x))

    # Despite standing in for a loss here, this is not a mean squared error: it measures
    # how far each prediction is from the correct label, so it is named mnist_loss.
    def mnist_loss(self, predictions, targets) -> torch.Tensor:
        return torch.where(targets==1, 1-predictions, predictions).mean()

    def fit(self, x, y, epochs, bs, lr, trim=False):
        n_mini_batches = len(x)//bs
        if not trim and len(x) % bs: n_mini_batches += 1   # keep a smaller final batch for the leftovers
        self.weights = self.get_params((x.shape[1], 1))
        self.bias = self.get_params(1)
        for e in range(epochs):
            for i in range(n_mini_batches):
                xb, yb = x[i*bs:(i+1)*bs], y[i*bs:(i+1)*bs]   # slicing past the end just returns the remainder
                preds = self.sigmoid(xb@self.weights + self.bias)
                loss = self.mnist_loss(preds, yb)
                loss.backward()
                self.weights.data -= self.weights.grad.data*lr
                self.bias.data -= self.bias.grad.data*lr
                self.weights.grad = None
                self.bias.grad = None
            with torch.no_grad():
                acc = ((self.predict(x) >= 0.5) == y).float().mean().item()
            print(f"Epoch_{e}_accuracy = ", acc)

    def predict(self, x):
        return self.sigmoid(x@self.weights + self.bias)
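The next few cells assume the fastai MNIST_SAMPLE data and the stacked digit tensors (train_3_tens, train_7_tens, valid_3_tens, valid_7_tens) built earlier in the fastai course. If you are starting here, a sketch of that setup (images stacked and scaled to [0, 1]) looks like this:

path = untar_data(URLs.MNIST_SAMPLE)   # small MNIST subset containing only 3s and 7s
train_3_tens = torch.stack([tensor(Image.open(o)) for o in (path/'train'/'3').ls()]).float()/255
train_7_tens = torch.stack([tensor(Image.open(o)) for o in (path/'train'/'7').ls()]).float()/255
valid_3_tens = torch.stack([tensor(Image.open(o)) for o in (path/'valid'/'3').ls()]).float()/255
valid_7_tens = torch.stack([tensor(Image.open(o)) for o in (path/'valid'/'7').ls()]).float()/255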
(path/'train').ls()
(path/'valid').ls()
train_x = torch.cat([train_3_tens, train_7_tens]).view(-1, 28*28)
valid_x = torch.cat([valid_3_tens, valid_7_tens]).view(-1, 28*28)
train_y = tensor([1]*len(train_3_tens) + [0]*len(train_7_tens)).unsqueeze(1)
valid_y = tensor([1]*len(valid_3_tens) + [0]*len(valid_7_tens)).unsqueeze(1)
train_x.shape, valid_x.shape
m = LogisticRegression()
m.fit(x=train_x,y=train_y,bs=100,lr=1e-5,epochs=10)
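Since fit stores the learned weights and bias on the instance (as written above), we can sanity-check the model on the held-out digits with predict:

valid_preds = m.predict(valid_x)
((valid_preds >= 0.5) == valid_y).float().mean().item()   # validation accuracy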
Putting It All Together - Using fastai DataLoaders!
train_dset = list(zip(train_x,train_y))
valid_dset = list(zip(valid_x,valid_y))
- DataLoader - an iterable of mini-batches, where each mini-batch is a tuple of the form $(\text{tensor}(x_1,x_2,\dots,x_b),\ \text{tensor}(y_1,y_2,\dots,y_b))$, as sketched below.
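As a sketch (256 is an arbitrary batch size; first is the fastcore helper that grabs the first element of an iterable), wrapping the datasets above with fastai's DataLoader looks like this:

dl = DataLoader(train_dset, batch_size=256, shuffle=True)
valid_dl = DataLoader(valid_dset, batch_size=256)
xb, yb = first(dl)
xb.shape, yb.shape   # expect (torch.Size([256, 784]), torch.Size([256, 1]))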