Goal

Understand the workings of an NN (heavily modified/reproduced from the fastbook notebook)

Concrete Problem and Plan

Recognize 3s and 7s from the MNIST dataset using the following methods:

  • Start with a baseline
  • Write a linear NN from scratch
  • Write a linear NN using PyTorch functions (nn)
  • Write a linear NN using fastai functions
  • Write a non-linear NN from scratch
  • Write a non-linear NN using PyTorch
  • Write a non-linear NN using fastai
  • Write a deeper (18-layer) non-linear NN using fastai
  • Compare all approaches on epochs, learning rate, and accuracy

Results upfront

Method                  Epochs  Accuracy  Comment
Baseline                NA      96.6%
Own Linear NN           20      96.5%     Uses slightly different initial values than the rest
Linear NN with own SGD  20      98%
Linear NN with SGD      20      98%
Linear NN with fastai   20      98%
Own Non-Linear NN       20      98%
CNN 18 layer            1       99.6%

Gathering the Tensors

Stack tensors with list comprehension

def stack_tensors(paths):
    # open each image, stack into one [N, 28, 28] tensor, scale pixels to 0-1
    lcomp_tensors = [tensor(Image.open(o)) for o in paths]
    print(len(lcomp_tensors))
    return torch.stack(lcomp_tensors).float()/255
stacked_threes_tr = stack_tensors(threes_tr)
stacked_threes_vd = stack_tensors(threes_vd)
stacked_sevens_tr = stack_tensors(sevens_tr)
stacked_sevens_vd = stack_tensors(sevens_vd)

6131
1010
6265
1028

stacked_threes_tr.shape
torch.Size([6131, 28, 28])
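
For context: the path lists fed to stack_tensors aren't built in this section. Presumably (following fastbook) they come from fastai's MNIST_SAMPLE download, roughly:

# Assumed setup (fastbook-style), not shown in this section
path = untar_data(URLs.MNIST_SAMPLE)
threes_tr = (path/'train'/'3').ls().sorted()
sevens_tr = (path/'train'/'7').ls().sorted()
threes_vd = (path/'valid'/'3').ls().sorted()
sevens_vd = (path/'valid'/'7').ls().sorted()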

Baseline

  • Get the per-pixel mean of the 3 and 7 tensors across all training images
  • For each picture, check whether the distance (L1 norm or L2 norm) to the mean 3 or to the mean 7 is smaller.

Mean of stacked tensors

mean_3_2d = stacked_threes_tr.mean(0)   # per-pixel mean over all training 3s
mean_7_2d = stacked_sevens_tr.mean(0)
show_image(mean_3_2d), show_image(stacked_threes_tr[10])
(<AxesSubplot:>, <AxesSubplot:>)

Distance measurement

def l1_norm(a,b): return (a-b).abs().mean((-1,-2))
def l2_norm(a,b): return ((a-b)**2).mean((-1,-2)).sqrt()
# PyTorch equivalents: F.l1_loss(a, b) and F.mse_loss(a, b).sqrt()
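
A quick illustrative check (not a cell from the original notebook): a single training 3 should be closer to the mean 3 than to the mean 7.

# Sanity check: distance from one 3 to each mean; the first number should be smaller
sample = stacked_threes_tr[10]
l2_norm(sample, mean_3_2d), l2_norm(sample, mean_7_2d)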

Is a tensor a 3 or 7?

def is_3(stacked_tensor,mean_3,mean_7): 
    return l2_norm(stacked_tensor, mean_3)<l2_norm(stacked_tensor,mean_7)

Checking the accuracy

accuracy_3s_1 = is_3(stacked_threes_vd,mean_3_2d,mean_7_2d).float().mean()
accuracy_3s_2 = 1-is_3(stacked_sevens_vd,mean_3_2d,mean_7_2d).float().mean()

print("Accuracy of prediction is: ",(accuracy_3s_1+accuracy_3s_2)/2)
## no need for a separate is_7 function: anything that isn't a 3 is a 7
Accuracy of prediction is:  tensor(0.9661)

Result: 96.6% accuracy.


Write your own Linear NN

[Diagram: init → predict → loss → gradient → step → repeat (back to predict) or stop]

Steps

  1. Initialize parameters (w, b)
  2. Predict with training vectors (X@w + b)
  3. Calculate the loss (a quantity that changes smoothly with small changes in the parameters)
  4. Calculate the gradient at the current parameter values (params.grad)
  5. Take a step based on the gradient (w -= gradient(w) * lr)
  6. Repeat n times!
  7. Calculate validation accuracy (fraction of predictions matching targets)
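
Tying these seven steps together, here is a compact sketch of the whole loop (assembled from the functions defined in the following sections, not a cell from the original notebook):

# Steps 1-7 in one place; init_params, mnist_loss, validate_epoch,
# linear1, dl, dl_vd and lr are all defined below
w, b = init_params()                       # step 1: initialize
for epoch in range(20):                    # step 6: repeat
    for xb, yb in dl:
        loss = mnist_loss(xb@w + b, yb)    # steps 2-3: predict, loss
        loss.backward()                    # step 4: gradient
        for p in (w, b):
            p.data -= p.grad * lr          # step 5: step
            p.grad.zero_()
    print(validate_epoch(dl_vd, linear1))  # step 7: validation accuracy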

Squeeze into tuple

dset= list(zip(train_x,train_y))
valid_dset = list(zip(valid_x,valid_y))
x,y = dset[0]

x.shape,y,len(dset),type(dset[0])
(torch.Size([784]), tensor([1]), 12396, tuple)
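
train_x and train_y are not constructed in this section; presumably (fastbook-style) they come from flattening and concatenating the stacked tensors. A sketch consistent with the shapes printed above:

# Assumed construction: 3s are labelled 1, 7s are labelled 0
train_x = torch.cat([stacked_threes_tr, stacked_sevens_tr]).view(-1, 28*28)
train_y = tensor([1]*len(threes_tr) + [0]*len(sevens_tr)).unsqueeze(1)
valid_x = torch.cat([stacked_threes_vd, stacked_sevens_vd]).view(-1, 28*28)
valid_y = tensor([1]*len(threes_vd) + [0]*len(sevens_vd)).unsqueeze(1)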

Using Data Loaders to load into batches

dl = DataLoader(dset,batch_size=256,shuffle=False)
dl_vd = DataLoader(valid_dset,batch_size=256,shuffle=False)
# shuffle=True on the training set broke training here. One plausible culprit:
# linear1's predictions come out with shape [256] while the targets are [256,1],
# so the loss broadcasts to [256,256]; with shuffle=False each batch is nearly
# label-pure, which masks the mismatch. (Shuffling the validation set is
# unnecessary either way.)

xb,yb = first(dl_vd)
xb.shape, yb.shape
(torch.Size([256, 784]), torch.Size([256, 1]))

Step 1

def init_params():
    # random normal init; requires_grad_() makes autograd track these tensors
    w = torch.randn(28*28).requires_grad_()
    b = torch.randn(1).requires_grad_()
    return (w,b)

Step 2

def linear1(tens):
    return tens@w+b   # w and b are the globals initialized below

Step 3

def mnist_loss(prediction,target):
    prediction = prediction.sigmoid()   # squash raw scores into (0,1)
    return torch.where(target==1,1-prediction,prediction).mean()
## Using sum instead of mean would scale the loss (and gradients) by the
## batch size, effectively multiplying the learning rate
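
A small illustrative check of how the loss behaves (not from the original notebook):

# Confident-and-right predictions give a small loss, confident-and-wrong a large one
preds = tensor([2.0, -2.0, 2.0])   # raw scores; sigmoid gives ~0.88, ~0.12, ~0.88
targs = tensor([1, 0, 0])
mnist_loss(preds, targs)           # (0.12 + 0.12 + 0.88) / 3 ≈ 0.37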

Step 4

def calc_grad(xb,yb,model):
    pred = model(xb)
    loss = mnist_loss(pred,yb)
    loss.backward()   # populates .grad on every parameter that requires grad
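
One subtlety worth demonstrating (illustrative snippet, assuming w, b and a batch xb, yb are already set up): backward() adds to any existing gradients rather than replacing them, which is why the training loop below zeroes them after every step.

# Gradients accumulate across backward() calls
calc_grad(xb, yb, linear1)
print(w.grad.mean(), b.grad)
calc_grad(xb, yb, linear1)
print(w.grad.mean(), b.grad)  # roughly doubled: the second call added on top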

Step 1-5

def train_epoch(dl,model,params):
    for xb,yb in dl:
        calc_grad(xb,yb,model)
        for p in params:
            p.data -= p.grad*lr   # updating .data keeps the step out of autograd;
                                  # assigning to p directly raises a leaf-variable error
            p.grad.zero_()        # otherwise gradients accumulate across batches

Step 7

def batch_accuracy(xb, yb):
    preds = xb.sigmoid()
    correct = (preds>0.5) == yb   # above 0.5 counts as "is a 3"
    return correct.float().mean()

def validate_epoch(dl,model):
    accs = [batch_accuracy(model(xb),yb) for xb,yb in dl]
    return round(torch.stack(accs).mean().item(), 4)

Setting parameters

lr=1
torch.manual_seed(0)
w,b = init_params()
params = w,b
print(w.shape, b.shape, b)
torch.Size([784]) torch.Size([1]) tensor([-0.0198], requires_grad=True)
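
A sanity check before training (illustrative, not from the original notebook): with random parameters the accuracy should hover near chance.

# Untrained random weights should score roughly 50% on one validation batch
xb, yb = first(dl_vd)
batch_accuracy(linear1(xb), yb)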

Training

for i in range(20):
    train_epoch(dl, linear1, params)
    print(validate_epoch(dl_vd, linear1), end=' ')
0.4932 0.8298 0.8219 0.8996 0.9232 0.9344 0.9432 0.9489 0.9537 0.9551 0.957 0.9585 0.9614 0.9619 0.9633 0.9643 0.9648 0.9648 0.9657 0.9657 

Result: 96.5% with 20 epochs


Doing it with our own BasicOptim

SGD

class BasicOptim:
    def __init__(self,params,lr): self.params,self.lr = list(params),lr

    def step(self):
        for p in self.params: p.data -= p.grad*self.lr

    def zero_grad(self):
        for p in self.params: p.grad = None   # p.grad.zero_() also works

def train_epoch(dl,model):   # redefined: the parameter update now lives in opt
    for xb,yb in dl:
        calc_grad(xb,yb,model)
        opt.step()
        opt.zero_grad()

def train_model(dl,model, no_epochs):
    for i in range(no_epochs):
        train_epoch(dl,model)
        print(validate_epoch(dl_vd,model), end=' ')
    

Setting parameters

lr=1
torch.manual_seed(0)
linear2 = nn.Linear(28*28,1)
opt = BasicOptim(linear2.parameters(),lr)
train_model(dl, linear2,20)
0.4932 0.8843 0.814 0.9087 0.9336 0.9463 0.9555 0.9614 0.9663 0.9673 0.9697 0.9712 0.9741 0.9751 0.9761 0.977 0.9775 0.9775 0.9785 0.9785 

Result: 98% with 20 epochs


Linear NN with built-in SGD

def train_model(dl,model, no_epochs):
    for i in range(no_epochs):
        train_epoch(dl,model)
        print(validate_epoch(dl_vd,model), end=' ')
    

Setting parameters

lr=1
torch.manual_seed(0)
linear2 = nn.Linear(28*28,1)
opt = SGD(linear2.parameters(),lr)   # fastai's built-in SGD, a drop-in for BasicOptim
train_model(dl, linear2,20)
0.4932 0.8843 0.814 0.9087 0.9336 0.9463 0.9555 0.9614 0.9663 0.9673 0.9697 0.9712 0.9741 0.9751 0.9761 0.977 0.9775 0.9775 0.9785 0.9785 

Result: 98% with 20 epochs


Linear NN with fastai

Everything in 3 lines

dls = DataLoaders(dl,dl_vd)

learn = Learner(dls, nn.Linear(28*28,1),opt_func=SGD, loss_func=mnist_loss,metrics=batch_accuracy)

learn.fit(10,lr=lr) ## fit() runs the train/validate loop we wrote by hand above
epoch train_loss valid_loss batch_accuracy time
0 0.637098 0.503163 0.495584 00:00
1 0.439934 0.228752 0.797350 00:00
2 0.164458 0.165872 0.850343 00:00
3 0.073856 0.101805 0.916094 00:00
4 0.040425 0.075480 0.933759 00:00
5 0.027307 0.060971 0.948479 00:00
6 0.021858 0.051846 0.957311 00:00
7 0.019390 0.045741 0.962709 00:00
8 0.018109 0.041433 0.965653 00:00
9 0.017324 0.038244 0.967125 00:00

Result: 97% with 10 epochs


Adding a nonlinearity

Non-linear NN with ReLU (Rectified Linear Unit)

def simple_net(xb):
    res = xb@w1 + b1             # first linear layer
    res = res.max(tensor(0.0))   # ReLU: clamp negatives to zero
    res = res@w2 + b2            # second linear layer
    return res

Initializing parameters

def init_params2(size): return torch.randn(size).requires_grad_()
w1 = init_params2((28*28,30))
b1 = init_params2(1)
w2 = init_params2(30)
b2 = init_params2(1)
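The training run for this hand-written net isn't shown here. Presumably the earlier three-argument train_epoch(dl, model, params) was reused over all four parameter tensors, roughly (a sketch; the lr value is an assumption):

# Hypothetical training call reusing the Step 1-5 train_epoch
params = w1, b1, w2, b2
lr = 0.1   # assumed value, not from the original notebook
for i in range(20):
    train_epoch(dl, simple_net, params)
    print(validate_epoch(dl_vd, simple_net), end=' ')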

Non-linear NN with PyTorch

simple_net2 = nn.Sequential(
    nn.Linear(28*28,30),
    nn.ReLU(),
    nn.Linear(30,1)
)
learn2 = Learner(dls, simple_net2,opt_func=SGD, loss_func=mnist_loss, metrics=batch_accuracy)

learn2.fit(20,0.1)
epoch train_loss valid_loss batch_accuracy time
0 0.333909 0.405487 0.504416 00:00
1 0.153576 0.240681 0.790481 00:00
2 0.083833 0.117808 0.913150 00:00
3 0.054202 0.078306 0.941609 00:00
4 0.040586 0.060718 0.957311 00:00
5 0.033723 0.050975 0.964181 00:00
6 0.029842 0.044882 0.966143 00:00
7 0.027354 0.040743 0.968597 00:00
8 0.025581 0.037743 0.969578 00:00
9 0.024215 0.035456 0.970069 00:00
10 0.023111 0.033643 0.972031 00:00
11 0.022192 0.032156 0.972522 00:00
12 0.021408 0.030907 0.974975 00:00
13 0.020730 0.029837 0.975957 00:00
14 0.020136 0.028907 0.975957 00:00
15 0.019609 0.028090 0.977429 00:00
16 0.019138 0.027364 0.977920 00:00
17 0.018714 0.026714 0.977920 00:00
18 0.018328 0.026129 0.978410 00:00
19 0.017974 0.025598 0.978901 00:00

Result: 98% with 20 epochs

An 18-layer CNN with fastai

18-layer model (resnet18)

dls = ImageDataLoaders.from_folder(path)
learn = cnn_learner(dls, resnet18, pretrained=False,
                    loss_func=F.cross_entropy, metrics=accuracy)
learn.fit_one_cycle(1, 0.1)
epoch train_loss valid_loss accuracy time
0 0.068580 0.012664 0.995584 00:08

Result: 99.6% with 1 epoch

Results

Method                  Epochs  Accuracy  Comment
Baseline                NA      96.6%
Own Linear NN           20      96.5%     Uses different initial values than the rest
Linear NN with own SGD  20      98%
Linear NN with SGD      20      98%
Linear NN with fastai   20      98%
Own Non-Linear NN       20      98%
CNN 18 layer            1       99.6%