Deep Learning from the Foundations Lesson 3 Part 1


In the beginning we recap some lesson 2 concepts (callbacks, variance, dunder special methods):

Just our imports:

#collapse
%load_ext autoreload
%autoreload 2

%matplotlib inline
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload

#collapse
import torch
import matplotlib.pyplot as plt

Callbacks

Callbacks as GUI events

#collapse
import ipywidgets as widgets

#collapse_show
def f(o): print('hi')

From the ipywidget docs:

  • the button widget is used to handle mouse clicks. The on_click method of the Button can be used to register function to be called when the button is clicked

#collapse_show
w = widgets.Button(description='Click me')

#collapse
w

Now that we have created this button, we can pass it a function that will execute when the button is pushed. This is a callback using a function pointer!

#collapse_show
w.on_click(f)

NB: When callbacks are used in this way they are often called "events".

Did you know that you can create interactive apps in Jupyter with these widgets? Plotly, for example, builds its interactive figures on top of them.
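
As a small aside, here is a minimal sketch of wiring a widget event to an output area using plain ipywidgets (not plotly); the names slider, out and on_change are made up purely for illustration:

#collapse_show
# A slider plus an output area: whenever the slider value changes,
# the handler clears the output and prints the new square.
slider = widgets.IntSlider(min=0, max=10, value=5, description='n')
out = widgets.Output()

def on_change(change):
    with out:
        out.clear_output()
        print(f"n squared is {change['new']**2}")

slider.observe(on_change, names='value')
widgets.VBox([slider, out])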

Creating your own callback

#collapse
from time import sleep

We create a dummy calculation function to show this concept:

#collapse_show
def slow_calculation(cb=None):
    res = 0
    for i in range(5):
        res += i*i
        sleep(1)
        if cb: cb(i)
    return res

#collapse_show
def show_progress(epoch):
    print(f"Awesome! We've finished epoch {epoch}!")

We can now pass this show_progress function as a callback!

#collapse_show
slow_calculation(show_progress)
Awesome! We've finished epoch 0!
Awesome! We've finished epoch 1!
Awesome! We've finished epoch 2!
Awesome! We've finished epoch 3!
Awesome! We've finished epoch 4!
30

Lambdas and partials

We can also define the function inline with a lambda and use it right away!

#collapse_show
slow_calculation(lambda o: print(f"Awesome! We've finished epoch {o}!"))
Awesome! We've finished epoch 0!
Awesome! We've finished epoch 1!
Awesome! We've finished epoch 2!
Awesome! We've finished epoch 3!
Awesome! We've finished epoch 4!
30

#collapse_show
def show_progress(exclamation, epoch):
    print(f"{exclamation}! We've finished epoch {epoch}!")

The function above can't be passed directly, because it takes two arguments while slow_calculation expects a callback taking one. We can use a lambda to fix this:

#collapse_show
slow_calculation(lambda o: show_progress("OK I guess", o))
OK I guess! We've finished epoch 0!
OK I guess! We've finished epoch 1!
OK I guess! We've finished epoch 2!
OK I guess! We've finished epoch 3!
OK I guess! We've finished epoch 4!
30

It's better to do it like this, with a function that returns an inner function closing over the exclamation:

#collapse_show
def make_show_progress(exclamation):
    # Leading "_" is generally understood to be "private"
    def _inner(epoch): print(f"{exclamation}! We've finished epoch {epoch}!")
    return _inner
As a side experiment, let's also define log_softmax (using the log-sum-exp trick) so we can use it inside a callback in a moment:

#collapse_show
def logsumexp(x):
    # expects a 2D input of shape (batch, classes); shown for reference,
    # the built-in x.logsumexp is used below instead
    m = x.max(-1)[0]
    print('m shape : {}'.format(m.shape))
    return m + (x - m[:,None]).exp().sum(-1).log()

def log_softmax(x):
    return x - x.logsumexp(-1, keepdim=True)

sm_pred = torch.FloatTensor([10., 11.])
a = log_softmax(sm_pred)
print(a)
tensor([-1.3133, -0.3133])
Now a progress loop that builds a dummy "loss" tensor each epoch and passes both the epoch index and that tensor to the callback:

#collapse_show
def progress(epochs, cb=None):
    for i in range(epochs):
        res = torch.FloatTensor([i, i+1])
        if cb: cb(i, res)
    return res

def show_random_loss():
    def _inner(epoch, loss):
        print("Loss", loss)
        a = log_softmax(loss)
        print("log-soft", a)
        print(f"The loss in epoch {epoch} is : {a}")
    return _inner

progress(10, show_random_loss())
Loss tensor([0., 1.])
log-soft tensor([-1.3133, -0.3133])
The loss in epoch 0 is : tensor([-1.3133, -0.3133])
Loss tensor([1., 2.])
log-soft tensor([-1.3133, -0.3133])
The loss in epoch 1 is : tensor([-1.3133, -0.3133])
Loss tensor([2., 3.])
log-soft tensor([-1.3133, -0.3133])
The loss in epoch 2 is : tensor([-1.3133, -0.3133])
Loss tensor([3., 4.])
log-soft tensor([-1.3133, -0.3133])
The loss in epoch 3 is : tensor([-1.3133, -0.3133])
Loss tensor([4., 5.])
log-soft tensor([-1.3133, -0.3133])
The loss in epoch 4 is : tensor([-1.3133, -0.3133])
Loss tensor([5., 6.])
log-soft tensor([-1.3133, -0.3133])
The loss in epoch 5 is : tensor([-1.3133, -0.3133])
Loss tensor([6., 7.])
log-soft tensor([-1.3133, -0.3133])
The loss in epoch 6 is : tensor([-1.3133, -0.3133])
Loss tensor([7., 8.])
log-soft tensor([-1.3133, -0.3133])
The loss in epoch 7 is : tensor([-1.3133, -0.3133])
Loss tensor([8., 9.])
log-soft tensor([-1.3133, -0.3133])
The loss in epoch 8 is : tensor([-1.3133, -0.3133])
Loss tensor([ 9., 10.])
log-soft tensor([-1.3133, -0.3133])
The loss in epoch 9 is : tensor([-1.3133, -0.3133])
tensor([ 9., 10.])
Note that the callback returned by show_random_loss expects two arguments (epoch and loss), while slow_calculation calls its callback with the epoch index only, so the two can't be combined directly. Back to make_show_progress:

#collapse_show
slow_calculation(make_show_progress("Nice!"))
Nice!! We've finished epoch 0!
Nice!! We've finished epoch 1!
Nice!! We've finished epoch 2!
Nice!! We've finished epoch 3!
Nice!! We've finished epoch 4!
30

Obviously we can also do it like this, with f2 holding the closure:

#collapse_show
f2 = make_show_progress("Terrific")

#collapse
slow_calculation(f2)
Terrific! We've finished epoch 0!
Terrific! We've finished epoch 1!
Terrific! We've finished epoch 2!
Terrific! We've finished epoch 3!
Terrific! We've finished epoch 4!
30

#collapse
slow_calculation(make_show_progress("Amazing"))
Amazing! We've finished epoch 0!
Amazing! We've finished epoch 1!
Amazing! We've finished epoch 2!
Amazing! We've finished epoch 3!
Amazing! We've finished epoch 4!
30

#collapse
from functools import partial

We can also use partial to fix the exclamation argument in advance, so the resulting function only needs the epoch argument. As before, we can also store it in f2.

#collapse_show
slow_calculation(partial(show_progress, "OK I guess"))
OK I guess! We've finished epoch 0!
OK I guess! We've finished epoch 1!
OK I guess! We've finished epoch 2!
OK I guess! We've finished epoch 3!
OK I guess! We've finished epoch 4!
30

#collapse
f2 = partial(show_progress, "OK I guess")
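
As a small aside, the partial object itself remembers what it wraps, which can be handy for debugging: f2.func is the original show_progress, and f2.args is ('OK I guess',).

#collapse_show
# functools.partial exposes the wrapped function and the frozen arguments
f2.func, f2.args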

Callbacks as callable classes

Most of the time we want to use a class, so we store the exclamation in an instance attribute and take the epoch as the argument of __call__:

#collapse_show
class ProgressShowingCallback():
    def __init__(self, exclamation="Awesome"): self.exclamation = exclamation
    def __call__(self, epoch): print(f"{self.exclamation}! We've finished epoch {epoch}!")

#collapse_show
cb = ProgressShowingCallback("Just super")

__call__ is a magic (dunder) method; it is invoked when you take an object and treat it like a function:

#collapse_show
cb("hi")
Just super! We've finished epoch hi!

#collapse
slow_calculation(cb)
Just super! We've finished epoch 0!
Just super! We've finished epoch 1!
Just super! We've finished epoch 2!
Just super! We've finished epoch 3!
Just super! We've finished epoch 4!
30

Multiple callback funcs; *args and **kwargs

All positional arguments end up in a tuple (args) and all keyword arguments are stored in a dict (kwargs). This is handy when wrapping other classes/objects: **kwargs can simply be passed through to the wrapped class, for example (see the sketch after the next cell).

#collapse_show
def f(*args, **kwargs): print(f"args: {args}; kwargs: {kwargs}")

#collapse_show
f(3, 'a', thing1="hello")
args: (3, 'a'); kwargs: {'thing1': 'hello'}
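
The pass-through use mentioned above might look like this; Base and Wrapped are made-up names, a sketch rather than a fastai class:

#collapse_show
class Base():
    def __init__(self, a=1, b=2): self.a,self.b = a,b

class Wrapped(Base):
    def __init__(self, c=3, **kwargs):
        # anything we don't handle ourselves is forwarded to Base
        super().__init__(**kwargs)
        self.c = c

w = Wrapped(c=5, b=10)
w.a, w.b, w.c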

NB: We've been guilty of over-using kwargs in fastai - it's very convenient for the developer, but is annoying for the end-user unless care is taken to ensure docs show all kwargs too. kwargs can also hide bugs (because it might not tell you about a typo in a param name). In R there's a very similar issue (R uses ... for the same thing), and matplotlib uses kwargs a lot too.
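
Here is a tiny sketch of that failure mode; fit here is a made-up function, not fastai's:

#collapse_show
# Because **kwargs silently swallows unknown names, the misspelled `lrate`
# is ignored and the default lr is used -- no error is raised.
def fit(epochs, lr=0.1, **kwargs): print(f"epochs={epochs}, lr={lr}, ignored={kwargs}")

fit(3, lrate=0.01)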

Let's go back to our function from the start, adding a callback before and after the calculation. This is a good use for args and kwargs, due to their flexibility:

#collapse_show
def slow_calculation(cb=None):
    res = 0
    for i in range(5):
        if cb: cb.before_calc(i)
        res += i*i
        sleep(1)
        if cb: cb.after_calc(i, val=res)
    return res

#collapse_show
class PrintStepCallback():
    def __init__(self): pass
    # without *args and **kwargs these methods would break when extra arguments (like val=res) are passed
    def before_calc(self, *args, **kwargs): print(f"About to start") 
    def after_calc (self, *args, **kwargs): print(f"Done step")

#collapse_show
slow_calculation(PrintStepCallback())
About to start
Done step
About to start
Done step
About to start
Done step
About to start
Done step
About to start
Done step
30

We can now use the epoch and val arguments to print our details:

#collapse_show
class PrintStatusCallback():
    def __init__(self): pass
    def before_calc(self, epoch, **kwargs): print(f"About to start: {epoch}")
    def after_calc (self, epoch, val, **kwargs): print(f"After {epoch}: {val}")

#collapse_show
slow_calculation(PrintStatusCallback())
About to start: 0
After 0: 0
About to start: 1
After 1: 1
About to start: 2
After 2: 5
About to start: 3
After 3: 14
About to start: 4
After 4: 30
30

Modifying behavior

We can now use this to implement early stopping.

#collapse_show
def slow_calculation(cb=None):
    res = 0
    for i in range(5):
        if cb and hasattr(cb,'before_calc'): cb.before_calc(i)
        res += i*i
        sleep(1)
        if cb and hasattr(cb,'after_calc'):
            if cb.after_calc(i, res):
                print("stopping early")
                break
    return res

#collapse_show
class PrintAfterCallback():
    def after_calc (self, epoch, val):
        print(f"After {epoch}: {val}")
        if val>10: return True

#collapse_show
slow_calculation(PrintAfterCallback())
After 0: 0
After 1: 1
After 2: 5
After 3: 14
stopping early
14

We can now implement this as a class; that allows our callback to modify values stored inside the class.

#collapse_show
class SlowCalculator():
    def __init__(self, cb=None): self.cb,self.res = cb,0
    
    def callback(self, cb_name, *args):
        if not self.cb: return
        cb = getattr(self.cb,cb_name, None)
        if cb: return cb(self, *args)

    def calc(self):
        for i in range(5):
            self.callback('before_calc', i)
            self.res += i*i
            sleep(1)
            if self.callback('after_calc', i):
                print("stopping early")
                break

Using the dunder method __call__ instead of a named callback method, we can do it like this:

#collapse_show
class SlowCalculator():
    def __init__(self, cb=None): self.cb,self.res = cb,0
    
    def __call__(self, cb_name, *args):
        if not self.cb: return
        cb = getattr(self.cb,cb_name, None)
        if cb: return cb(self, *args)

    def calc(self):
        for i in range(5):
            self('before_calc', i)
            self.res += i*i
            sleep(1)
            if self('after_calc', i):
                print("stopping early")
                break

#collapse_show
class ModifyingCallback():
    def after_calc (self, calc, epoch):
        print(f"After {epoch}: {calc.res}")
        if calc.res>10: return True
        if calc.res<3: calc.res = calc.res*2

#collapse_show
calculator = SlowCalculator(ModifyingCallback())
calculator.calc()
calculator.res
After 0: 0
After 1: 1
After 2: 6
After 3: 15
stopping early
15

__dunder__ thingies

Anything that looks like __this__ is, in some way, special. Python, or some library, can define some functions that they will call at certain documented times. For instance, when your class is setting up a new object, python will call __init__. These are defined as part of the python data model.

For instance, if python sees +, then it will call the special method __add__. If you try to display an object in Jupyter (or lots of other places in Python) it will call __repr__.

#collapse_show
class SloppyAdder():
    def __init__(self,o): self.o=o
    def __add__(self,b): return SloppyAdder(self.o + b.o + 0.01)
    def __repr__(self): return str(self.o)

#collapse_show
a = SloppyAdder(1)
b = SloppyAdder(2)
a+b
3.01

Special methods you should probably know about (see the Python data model docs) are listed below; a small sketch using a few of them follows the list:

  • __getitem__
  • __getattr__
  • __setattr__
  • __del__
  • __init__
  • __new__
  • __enter__
  • __exit__
  • __len__
  • __repr__
  • __str__
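
As promised, here is a minimal sketch exercising a few of these; the Bag class is made up purely for illustration:

#collapse_show
class Bag():
    def __init__(self, items): self.items = list(items)
    def __len__(self): return len(self.items)               # len(b)
    def __getitem__(self, i): return self.items[i]          # b[i], and makes the object iterable
    def __getattr__(self, name): return f"<no attribute {name}>"  # only called for *missing* attributes
    def __repr__(self): return f"Bag({self.items})"

b = Bag([3, 1, 4])
len(b), b[0], b.missing_thing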

Variance and stuff

Variance

Variance is the average of how far away each data point is from the mean. E.g.:

#collapse
t = torch.tensor([1.,2.,4.,18])

#collapse
m = t.mean(); m
tensor(6.2500)

#collapse
(t-m).mean()
tensor(0.)

Oops. We can't do that. Because by definition the positives and negatives cancel out. So we can fix that in one of (at least) two ways:

#collapse
(t-m).pow(2).mean()
tensor(47.1875)

#collapse
(t-m).abs().mean()
tensor(5.8750)

But the first of these is now a totally different scale, since we squared. So let's undo that at the end.

#collapse
(t-m).pow(2).mean().sqrt()
tensor(6.8693)

They're still different. Why?

Note that we have one outlier (18). In the version where we square everything, it makes that much bigger than everything else.

(t-m).pow(2).mean() is referred to as variance. It's a measure of how spread out the data is, and is particularly sensitive to outliers.

When we take the sqrt of the variance, we get the standard deviation. Since it's on the same kind of scale as the original data, it's generally more interpretable. However, since sqrt(1)==1, it doesn't much matter which we use when talking about unit variance for initializing neural nets.

(t-m).abs().mean() is referred to as the mean absolute deviation. It isn't used nearly as much as it deserves to be, because mathematicians don't like how awkward it is to work with. But that shouldn't stop us, because we have computers and stuff.

Here's a useful thing to note about variance:

#collapse_show
(t-m).pow(2).mean(), (t*t).mean() - (m*m)
(tensor(47.1875), tensor(47.1875))

You can see why these are equal if you want to work thru the algebra. Or not.

But, what's important here is that the latter is generally much easier to work with. In particular, you only have to track two things: the sum of the data, and the sum of squares of the data. Whereas in the first form you actually have to go thru all the data twice (once to calculate the mean, once to calculate the differences).
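
For example, here is a sketch of computing the variance of t in a single pass, tracking only a count, a running sum, and a running sum of squares:

#collapse_show
# One pass over the data; the variance falls out as E[X^2] - E[X]^2.
n, s, s2 = 0, 0., 0.
for x in t:
    n  += 1
    s  += x
    s2 += x*x
mean = s/n
var = s2/n - mean*mean
mean, var  # should match the 6.2500 and 47.1875 computed above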

Let's go steal the LaTeX from Wikipedia:

$$\operatorname{E}\left[X^2 \right] - \operatorname{E}[X]^2$$

Covariance and correlation

Here's how Wikipedia defines covariance:

$$\operatorname{cov}(X,Y) = \operatorname{E}{\big[(X - \operatorname{E}[X])(Y - \operatorname{E}[Y])\big]}$$

#collapse
t
tensor([ 1.,  2.,  4., 18.])

Let's see that in code. So now we need two vectors.

#collapse
# `u` is twice `t`, plus a bit of randomness
u = t*2
u *= torch.randn_like(t)/10+0.95

plt.scatter(t, u);

#collapse
prod = (t-t.mean())*(u-u.mean()); prod
tensor([ 55.9552,  41.3348,   9.6900, 290.1150])

#collapse
prod.mean()
tensor(99.2737)

#collapse
v = torch.randn_like(t)
plt.scatter(t, v);

#collapse
((t-t.mean())*(v-v.mean())).mean()
tensor(1.0830)

It's generally more conveniently defined like so:

$$\operatorname{E}\left[X Y\right] - \operatorname{E}\left[X\right] \operatorname{E}\left[Y\right]$$

#collapse
cov = (t*v).mean() - t.mean()*v.mean(); cov
tensor(1.0830)

From now on, you're not allowed to look at an equation (or especially type it in LaTeX) without also typing it in Python and actually calculating some values. Ideally, you should also plot some values.

Finally, here is the Pearson correlation coefficient:

$$\rho_{X,Y}= \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}$$

#collapse_show
cov / (t.std() * v.std())
tensor(0.1165)

It's just a scaled version of the same thing. Question: Why is it scaled by standard deviation, and not by variance or mean or something else?

Softmax

Here's our final logsoftmax definition:

#collapse_show
def log_softmax(x): return x - x.exp().sum(-1,keepdim=True).log()

which is:

$$\hbox{logsoftmax(x)}_{i} = x_{i} - \log \sum_{j} e^{x_{j}}$$

And our cross entropy loss is: $$-\log(p_{i})$$
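
As a quick sketch (reusing the log_softmax just defined; sm_pred and target are made-up example values), the cross entropy loss is simply the negative log-softmax value at the index of the correct class:

#collapse_show
sm_pred = torch.FloatTensor([10., 11.])  # raw activations for two classes
target = 1                               # index of the correct class
log_p = log_softmax(sm_pred)             # same tensor([-1.3133, -0.3133]) as earlier
-log_p[target]                           # the cross entropy loss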

Softmax is only a good idea when each item in our data (e.g. each image) contains exactly one of the things we're classifying: the exponential makes the highest activation dominate, and the probabilities are forced to sum to 1. When a class might be missing, or might appear more than once, a per-class binomial (sigmoid) is the better choice: $e^{x}/(1+e^{x})$.
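
To see the difference, here is a small sketch comparing softmax with independent per-class sigmoids; acts is just a made-up set of activations:

#collapse_show
acts = torch.FloatTensor([0.5, -1.0, 2.0])
softmax_p = acts.exp() / acts.exp().sum()  # forced to sum to 1: exactly one "winner"
binomial_p = acts.sigmoid()                # each class judged independently
softmax_p, softmax_p.sum(), binomial_p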

Browsing source code

  • Jump to tag/symbol by name (with completions)
  • Jump to current tag
  • Jump to library tags
  • Go back
  • Search
  • Outlining / folding