Goal

Understand the workings of an NN (heavily modified/reproduced from the fastbook notebook)

Concrete Problem and Plan

Recognize 3s and 7s from the MNIST dataset using the following methods:

  • Start with a baseline
  • Write a linear NN from scratch
  • Write a linear NN using PyTorch functions (nn)
  • Write a linear NN using fastai functions
  • Write a non-linear NN from scratch
  • Write a non-linear NN using PyTorch
  • Write a non-linear NN using fastai
  • Write a deeper (18-layer) non-linear NN using fastai
  • Compare all approaches on epochs, learning rate, and accuracy

Results upfront

Method                  Epochs  Accuracy  Comment
Baseline                NA      96.6%
Own Linear NN           20      96.5%     Uses slightly different initial values than the rest
Linear NN with own SGD  20      98%
Linear NN with SGD      20      98%
Linear NN with fastai   20      98%
Own Non-Linear NN       20      98%
CNN 18 layer            1       99.6%

Gathering the Tensors

Stack tensors with list comprehension

def stack_tensors(paths):
    # open each image, stack into one [N, 28, 28] tensor, scale pixels to 0-1
    lcomp_tensors = [tensor(Image.open(o)) for o in paths]
    print(len(lcomp_tensors))
    return torch.stack(lcomp_tensors).float()/255
stacked_threes_tr = stack_tensors(threes_tr)
stacked_threes_vd = stack_tensors(threes_vd)
stacked_sevens_tr = stack_tensors(sevens_tr)
stacked_sevens_vd = stack_tensors(sevens_vd)

6131
1010
6265
1028

stacked_threes_tr.shape
torch.Size([6131, 28, 28])
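
For context: the path lists fed to stack_tensors aren't built in this section. Presumably (following fastbook) they come from fastai's MNIST_SAMPLE download, roughly:

# Assumed setup (fastbook-style), not shown in this section
path = untar_data(URLs.MNIST_SAMPLE)
threes_tr = (path/'train'/'3').ls().sorted()
sevens_tr = (path/'train'/'7').ls().sorted()
threes_vd = (path/'valid'/'3').ls().sorted()
sevens_vd = (path/'valid'/'7').ls().sorted()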

Baseline

  • Get the per-pixel mean of the 3 and 7 tensors across all training images
  • For each picture, check whether the distance (L1 norm or L2 norm) to the mean 3 or to the mean 7 is smaller.

Mean of stacked tensors

mean_3_2d = stacked_threes_tr.mean(0)   # per-pixel mean over all training 3s
mean_7_2d = stacked_sevens_tr.mean(0)
show_image(mean_3_2d), show_image(stacked_threes_tr[10])
(<AxesSubplot:>, <AxesSubplot:>)

Distance measurement

def l1_norm(a,b): return (a-b).abs().mean((-1,-2))
def l2_norm(a,b): return ((a-b)**2).mean((-1,-2)).sqrt()
# PyTorch equivalents: F.l1_loss(a, b) and F.mse_loss(a, b).sqrt()
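
A quick illustrative check (not a cell from the original notebook): a single training 3 should be closer to the mean 3 than to the mean 7.

# Sanity check: distance from one 3 to each mean; the first number should be smaller
sample = stacked_threes_tr[10]
l2_norm(sample, mean_3_2d), l2_norm(sample, mean_7_2d)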

Is a tensor a 3 or 7?

def is_3(stacked_tensor,mean_3,mean_7): 
    return l2_norm(stacked_tensor, mean_3)<l2_norm(stacked_tensor,mean_7)

Checking the accuracy

accuracy_3s_1 = is_3(stacked_threes_vd,mean_3_2d,mean_7_2d).float().mean()
accuracy_3s_2 = 1-is_3(stacked_sevens_vd,mean_3_2d,mean_7_2d).float().mean()

print("Accuracy of prediction is: ",(accuracy_3s_1+accuracy_3s_2)/2)
## no need for a separate is_7 function: anything that isn't a 3 is a 7
Accuracy of prediction is:  tensor(0.9661)

Result: 96.6% accuracy.


Write your own Linear NN

[Diagram: init → predict → loss → gradient → step → repeat (back to predict) or stop]

Steps

  1. Initialize parameters (w, b)
  2. Predict with training vectors (X@w + b)
  3. Calculate the loss (a quantity that changes smoothly with small changes in the parameters)
  4. Calculate the gradient at the current parameter values (params.grad)
  5. Take a step based on the gradient (w -= gradient(w) * lr)
  6. Repeat n times!
  7. Calculate validation accuracy (fraction of predictions matching targets)
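
Tying these seven steps together, here is a compact sketch of the whole loop (assembled from the functions defined in the following sections, not a cell from the original notebook):

# Steps 1-7 in one place; init_params, mnist_loss, validate_epoch,
# linear1, dl, dl_vd and lr are all defined below
w, b = init_params()                       # step 1: initialize
for epoch in range(20):                    # step 6: repeat
    for xb, yb in dl:
        loss = mnist_loss(xb@w + b, yb)    # steps 2-3: predict, loss
        loss.backward()                    # step 4: gradient
        for p in (w, b):
            p.data -= p.grad * lr          # step 5: step
            p.grad.zero_()
    print(validate_epoch(dl_vd, linear1))  # step 7: validation accuracy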

Squeeze into tuple

dset= list(zip(train_x,train_y))
valid_dset = list(zip(valid_x,valid_y))
x,y = dset[0]

x.shape,y,len(dset),type(dset[0])
(torch.Size([784]), tensor([1]), 12396, tuple)
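
train_x and train_y are not constructed in this section; presumably (fastbook-style) they come from flattening and concatenating the stacked tensors. A sketch consistent with the shapes printed above:

# Assumed construction: 3s are labelled 1, 7s are labelled 0
train_x = torch.cat([stacked_threes_tr, stacked_sevens_tr]).view(-1, 28*28)
train_y = tensor([1]*len(threes_tr) + [0]*len(sevens_tr)).unsqueeze(1)
valid_x = torch.cat([stacked_threes_vd, stacked_sevens_vd]).view(-1, 28*28)
valid_y = tensor([1]*len(threes_vd) + [0]*len(sevens_vd)).unsqueeze(1)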

Using Data Loaders to load into batches

dl = DataLoader(dset,batch_size=256,shuffle=False)
dl_vd = DataLoader(valid_dset,batch_size=256,shuffle=False)
# shuffle=True on the training set broke training here. One plausible culprit:
# linear1's predictions come out with shape [256] while the targets are [256,1],
# so the loss broadcasts to [256,256]; with shuffle=False each batch is nearly
# label-pure, which masks the mismatch. (Shuffling the validation set is
# unnecessary either way.)

xb,yb = first(dl_vd)
xb.shape, yb.shape
(torch.Size([256, 784]), torch.Size([256, 1]))

Step 1

def init_params():
    # random normal init; requires_grad_() makes autograd track these tensors
    w = torch.randn(28*28).requires_grad_()
    b = torch.randn(1).requires_grad_()
    return (w,b)

Step 2

def linear1(tens):
    return tens@w+b   # w and b are the globals initialized below

Step 3

def mnist_loss(prediction,target):
    prediction = prediction.sigmoid()   # squash raw scores into (0,1)
    return torch.where(target==1,1-prediction,prediction).mean()
## Using sum instead of mean would scale the loss (and gradients) by the
## batch size, effectively multiplying the learning rate
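
A small illustrative check of how the loss behaves (not from the original notebook):

# Confident-and-right predictions give a small loss, confident-and-wrong a large one
preds = tensor([2.0, -2.0, 2.0])   # raw scores; sigmoid gives ~0.88, ~0.12, ~0.88
targs = tensor([1, 0, 0])
mnist_loss(preds, targs)           # (0.12 + 0.12 + 0.88) / 3 ≈ 0.37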

Step 4

def calc_grad(xb,yb,model):
    pred = model(xb)
    loss = mnist_loss(pred,yb)
    loss.backward()   # populates .grad on every parameter that requires grad
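
One subtlety worth demonstrating (illustrative snippet, assuming w, b and a batch xb, yb are already set up): backward() adds to any existing gradients rather than replacing them, which is why the training loop below zeroes them after every step.

# Gradients accumulate across backward() calls
calc_grad(xb, yb, linear1)
print(w.grad.mean(), b.grad)
calc_grad(xb, yb, linear1)
print(w.grad.mean(), b.grad)  # roughly doubled: the second call added on top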

Step 1-5

def train_epoch(dl,model,params):
    for xb,yb in dl:
        calc_grad(xb,yb,model)
        for p in params:
            p.data -= p.grad*lr   # updating .data keeps the step out of autograd;
                                  # assigning to p directly raises a leaf-variable error
            p.grad.zero_()        # otherwise gradients accumulate across batches

Step 7

def batch_accuracy(xb, yb):
    preds = xb.sigmoid()
    correct = (preds>0.5) == yb   # above 0.5 counts as "is a 3"
    return correct.float().mean()

def validate_epoch(dl,model):
    accs = [batch_accuracy(model(xb),yb) for xb,yb in dl]
    return round(torch.stack(accs).mean().item(), 4)

Setting parameters

lr=1
torch.manual_seed(0)
w,b = init_params()
params = w,b
print(w.shape, b.shape, b)
torch.Size([784]) torch.Size([1]) tensor([-0.0198], requires_grad=True)
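
A sanity check before training (illustrative, not from the original notebook): with random parameters the accuracy should hover near chance.

# Untrained random weights should score roughly 50% on one validation batch
xb, yb = first(dl_vd)
batch_accuracy(linear1(xb), yb)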

Training

for i in range(20):
    train_epoch(dl, linear1, params)
    print(validate_epoch(dl_vd, linear1), end=' ')
0.4932 0.8298 0.8219 0.8996 0.9232 0.9344 0.9432 0.9489 0.9537 0.9551 0.957 0.9585 0.9614 0.9619 0.9633 0.9643 0.9648 0.9648 0.9657 0.9657 

Result: 96.5% with 20 epochs


Doing it with our own BasicOptim

SGD

class BasicOptim:
    def __init__(self,params,lr): self.params,self.lr = list(params),lr

    def step(self):
        for p in self.params: p.data -= p.grad*self.lr

    def zero_grad(self):
        for p in self.params: p.grad = None   # p.grad.zero_() also works

def train_epoch(dl,model):   # redefined: the parameter update now lives in opt
    for xb,yb in dl:
        calc_grad(xb,yb,model)
        opt.step()
        opt.zero_grad()

def train_model(dl,model, no_epochs):
    for i in range(no_epochs):
        train_epoch(dl,model)
        print(validate_epoch(dl_vd,model), end=' ')
    

Setting parameters

lr=1
torch.manual_seed(0)
linear2 = nn.Linear(28*28,1)
opt = BasicOptim(linear2.parameters(),lr)
train_model(dl, linear2,20)
0.4932 0.8843 0.814 0.9087 0.9336 0.9463 0.9555 0.9614 0.9663 0.9673 0.9697 0.9712 0.9741 0.9751 0.9761 0.977 0.9775 0.9775 0.9785 0.9785 

Result: 98% with 20 epochs


Linear NN with built-in SGD

def train_model(dl,model, no_epochs):
    for i in range(no_epochs):
        train_epoch(dl,model)
        print(validate_epoch(dl_vd,model), end=' ')
    

Setting parameters

lr=1
torch.manual_seed(0)
linear2 = nn.Linear(28*28,1)
opt = SGD(linear2.parameters(),lr)   # fastai's built-in SGD, a drop-in for BasicOptim
train_model(dl, linear2,20)
0.4932 0.8843 0.814 0.9087 0.9336 0.9463 0.9555 0.9614 0.9663 0.9673 0.9697 0.9712 0.9741 0.9751 0.9761 0.977 0.9775 0.9775 0.9785 0.9785 

Result: 98% with 20 epochs


Linear NN with fastai

Everything in 3 lines

dls = DataLoaders(dl,dl_vd)

learn = Learner(dls, nn.Linear(28*28,1),opt_func=SGD, loss_func=mnist_loss,metrics=batch_accuracy)

learn.fit(10,lr=lr) ## fit() runs the train/validate loop we wrote by hand above
epoch train_loss valid_loss batch_accuracy time
0 0.637098 0.503163 0.495584 00:00
1 0.439934 0.228752 0.797350 00:00
2 0.164458 0.165872 0.850343 00:00
3 0.073856 0.101805 0.916094 00:00
4 0.040425 0.075480 0.933759 00:00
5 0.027307 0.060971 0.948479 00:00
6 0.021858 0.051846 0.957311 00:00
7 0.019390 0.045741 0.962709 00:00
8 0.018109 0.041433 0.965653 00:00
9 0.017324 0.038244 0.967125 00:00

Result: 97% with 10 epochs


Adding a nonlinearity

Non-linear NN with ReLU (Rectified Linear Unit)

def simple_net(xb):
    res = xb@w1 + b1             # first linear layer
    res = res.max(tensor(0.0))   # ReLU: clamp negatives to zero
    res = res@w2 + b2            # second linear layer
    return res

Initializing parameters

def init_params2(size): return torch.randn(size).requires_grad_()
w1 = init_params2((28*28,30))
b1 = init_params2(1)
w2 = init_params2(30)
b2 = init_params2(1)
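The training run for this hand-written net isn't shown here. Presumably the earlier three-argument train_epoch(dl, model, params) was reused over all four parameter tensors, roughly (a sketch; the lr value is an assumption):

# Hypothetical training call reusing the Step 1-5 train_epoch
params = w1, b1, w2, b2
lr = 0.1   # assumed value, not from the original notebook
for i in range(20):
    train_epoch(dl, simple_net, params)
    print(validate_epoch(dl_vd, simple_net), end=' ')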

Non-linear NN with PyTorch

simple_net2 = nn.Sequential(
    nn.Linear(28*28,30),
    nn.ReLU(),
    nn.Linear(30,1)
)
learn2 = Learner(dls, simple_net2,opt_func=SGD, loss_func=mnist_loss, metrics=batch_accuracy)

learn2.fit(20,0.1)
epoch train_loss valid_loss batch_accuracy time
0 0.333909 0.405487 0.504416 00:00
1 0.153576 0.240681 0.790481 00:00
2 0.083833 0.117808 0.913150 00:00
3 0.054202 0.078306 0.941609 00:00
4 0.040586 0.060718 0.957311 00:00
5 0.033723 0.050975 0.964181 00:00
6 0.029842 0.044882 0.966143 00:00
7 0.027354 0.040743 0.968597 00:00
8 0.025581 0.037743 0.969578 00:00
9 0.024215 0.035456 0.970069 00:00
10 0.023111 0.033643 0.972031 00:00
11 0.022192 0.032156 0.972522 00:00
12 0.021408 0.030907 0.974975 00:00
13 0.020730 0.029837 0.975957 00:00
14 0.020136 0.028907 0.975957 00:00
15 0.019609 0.028090 0.977429 00:00
16 0.019138 0.027364 0.977920 00:00
17 0.018714 0.026714 0.977920 00:00
18 0.018328 0.026129 0.978410 00:00
19 0.017974 0.025598 0.978901 00:00

Result: 98% with 20 epochs

An 18-layer CNN with fastai

18-layer model (resnet18)

dls = ImageDataLoaders.from_folder(path)
learn = cnn_learner(dls, resnet18, pretrained=False,
                    loss_func=F.cross_entropy, metrics=accuracy)
learn.fit_one_cycle(1, 0.1)
epoch train_loss valid_loss accuracy time
0 0.068580 0.012664 0.995584 00:08

Result: 99.6% with 1 epoch

Results

Method                  Epochs  Accuracy  Comment
Baseline                NA      96.6%
Own Linear NN           20      96.5%     Uses different initial values than the rest
Linear NN with own SGD  20      98%
Linear NN with SGD      20      98%
Linear NN with fastai   20      98%
Own Non-Linear NN       20      98%
CNN 18 layer            1       99.6%