Callbacks and __dunder__ Deep Dive (Lesson 3 Part 1)
- Deep Learning from the Foundations Lesson 3 Part 1
- Callbacks
- __dunder__ thingies
- Variance and stuff
- Softmax
- Browsing source code
We start by recapping some lesson 2 concepts (callbacks, variance, and __dunder__ special methods):
Just our imports:
#collapse
%load_ext autoreload
%autoreload 2
%matplotlib inline
#collapse
import torch
import matplotlib.pyplot as plt
(This corresponds to the lesson 10 video in the full course numbering.)
Imports:
#collapse
import ipywidgets as widgets
#collapse_show
def f(o): print('hi')
From the ipywidgets docs:
- the button widget is used to handle mouse clicks. The on_click method of the Button can be used to register a function to be called when the button is clicked
#collapse_show
w = widgets.Button(description='Click me')
#collapse
w
Now that we've created this button, we can pass it a function that will execute when the button is clicked. That's a callback using a function pointer!
#collapse_show
w.on_click(f)
NB: When callbacks are used in this way they are often called "events".
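As a quick aside (a small sketch of my own, reusing the w button from above): on_click simply registers the function, so we can register more than one handler and every registered handler fires on each click; a handler can also be removed again.
#collapse_show
def g(o): print('hello again')
w.on_click(g)                 # now both f and g run on every click
w.on_click(g, remove=True)    # and g is removed again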
Did you know that you can create interactive apps in Jupyter with these widgets? See plotly for an example.
#collapse
from time import sleep
We create a dummy calculation function to show this concept:
#collapse_show
def slow_calculation(cb=None):
    res = 0
    for i in range(5):
        res += i*i
        sleep(1)
        if cb: cb(i)
    return res
#collapse_show
def show_progress(epoch):
    print(f"Awesome! We've finished epoch {epoch}!")
We can now pass this show_progress function as a callback!
#collapse_show
slow_calculation(show_progress)
We can also define the function as a lambda and use it right away!
#collapse_show
slow_calculation(lambda o: print(f"Awesome! We've finished epoch {o}!"))
#collapse_show
def show_progress(exclamation, epoch):
    print(f"{exclamation}! We've finished epoch {epoch}!")
The above function can't be passed directly, because it takes two arguments while the callback only receives one. We can use a lambda to fix this:
#collapse_show
slow_calculation(lambda o: show_progress("OK I guess", o))
It's better to do it like this, with a function that builds the callback and captures the exclamation in a closure:
#collapse_show
import torch.nn
def make_show_progress(exclamation):
    # Leading "_" is generally understood to be "private"
    def _inner(epoch): print(f"{exclamation}! We've finished epoch {epoch}!")
    return _inner
def logsumexp(x):
    # Note: expects a 2-D input (batch x classes) because of the m[:,None] broadcast
    m = x.max(-1)[0]
    print('m shape : {}'.format(m.shape))
    return m + (x-m[:,None]).exp().sum(-1).log()

def log_softmax(x):
    return x - x.logsumexp(-1,keepdim=True)

sm_pred = torch.FloatTensor([10.,11.])
a = log_softmax(sm_pred)
print(a)
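# Quick check (my own addition, not from the lesson): why subtract the max in
# logsumexp? Numerical stability. With large activations the naive version
# overflows to inf, while the max-shifted version stays finite.
big = torch.FloatTensor([[1000., 1001.]])
naive = big.exp().sum(-1).log()   # exp(1000) overflows -> inf
stable = logsumexp(big)           # max is factored out first -> finite
print(naive, stable)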
def progress(epochs, cb=None):
    for i in range(epochs):
        res = torch.FloatTensor([i,i+1])
        if cb: cb(i,res)
    return res
def show_random_loss():
    def _inner(epoch, loss):
        print("Loss", loss)
        a = log_softmax(loss)
        print("log-soft", a)
        print(f"The loss in epoch {epoch} is : {a}")
    return _inner

progress(10, show_random_loss())
# Note: show_random_loss() can't be used with slow_calculation, because
# slow_calculation passes only the epoch to its callback, not a loss value.
#collapse_show
slow_calculation(make_show_progress("Nice!"))
Obviously we can also do it like this, with f2 holding the closure:
#collapse_show
f2 = make_show_progress("Terrific")
#collapse
slow_calculation(f2)
#collapse
slow_calculation(make_show_progress("Amazing"))
#collapse
from functools import partial
We can also use partial to fix the exclamation argument to a given value. This means we only need to pass the epoch argument now. We can also store the result in f2 again, like before.
#collapse_show
slow_calculation(partial(show_progress, "OK I guess"))
#collapse
f2 = partial(show_progress, "OK I guess")
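Just to check that it behaves exactly like the closure version (a quick sanity check of my own):
#collapse_show
slow_calculation(f2)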
Most of the time we want to use a class instead, so we store the exclamation in an attribute and receive the epoch in __call__:
#collapse_show
class ProgressShowingCallback():
    def __init__(self, exclamation="Awesome"): self.exclamation = exclamation
    def __call__(self, epoch): print(f"{self.exclamation}! We've finished epoch {epoch}!")
#collapse_show
cb = ProgressShowingCallback("Just super")
__call__ is a magic name; it will be called when you take an object and treat it like a function:
#collapse_show
cb("hi")
#collapse
slow_calculation(cb)
All the positional arguments end up in a tuple (args) and all the keyword arguments (kwargs) end up in a dict. This is often used when wrapping other classes/objects: **kwargs can be passed along to the superclass or wrapped object, for example.
#collapse_show
def f(*args, **kwargs): print(f"args: {args}; kwargs: {kwargs}")
#collapse_show
f(3, 'a', thing1="hello")
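As a sketch of the wrapping idea mentioned above (class names made up for illustration): a subclass can accept **kwargs and simply pass them on to its parent, so the parent's arguments don't have to be repeated.
#collapse_show
class Logger():
    def __init__(self, prefix="LOG", verbose=False): self.prefix,self.verbose = prefix,verbose

class TimedLogger(Logger):
    # anything we don't handle ourselves gets forwarded to Logger
    def __init__(self, clock="utc", **kwargs):
        super().__init__(**kwargs)
        self.clock = clock

tl = TimedLogger(clock="local", prefix="DBG")
tl.prefix, tl.clock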
NB: We've been guilty of over-using kwargs in fastai - it's very convenient for the developer, but is annoying for the end-user unless care is taken to ensure docs show all kwargs too. kwargs can also hide bugs (because it might not tell you about a typo in a param name). In R there's a very similar issue (R uses ... for the same thing), and matplotlib uses kwargs a lot too.
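Here's a tiny illustration of the typo problem (a made-up example of my own): the misspelled argument silently disappears into kwargs instead of raising an error.
#collapse_show
def train(lr=0.1, **kwargs): print(f"lr={lr}, ignored={kwargs}")

train(lr=0.3)    # works as intended
train(lrr=0.3)   # typo: swallowed by kwargs, lr silently stays at 0.1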
Let's go back to our function from the start, adding a callback before and after the calculation. This is a good use for args and kwargs, due to their flexibility:
#collapse_show
def slow_calculation(cb=None):
    res = 0
    for i in range(5):
        if cb: cb.before_calc(i)
        res += i*i
        sleep(1)
        if cb: cb.after_calc(i, val=res)
    return res
#collapse_show
class PrintStepCallback():
    def __init__(self): pass
    # The *args and **kwargs are needed: slow_calculation passes arguments
    # (the epoch, and val=res) that this callback simply ignores.
    def before_calc(self, *args, **kwargs): print(f"About to start")
    def after_calc (self, *args, **kwargs): print(f"Done step")
#collapse_show
slow_calculation(PrintStepCallback())
We can now use the epoch and val arguments to print our details:
#collapse_show
class PrintStatusCallback():
    def __init__(self): pass
    def before_calc(self, epoch, **kwargs): print(f"About to start: {epoch}")
    def after_calc (self, epoch, val, **kwargs): print(f"After {epoch}: {val}")
#collapse_show
slow_calculation(PrintStatusCallback())
We can now use this to implement early stopping.
#collapse_show
def slow_calculation(cb=None):
    res = 0
    for i in range(5):
        if cb and hasattr(cb,'before_calc'): cb.before_calc(i)
        res += i*i
        sleep(1)
        if cb and hasattr(cb,'after_calc'):
            if cb.after_calc(i, res):
                print("stopping early")
                break
    return res
#collapse_show
class PrintAfterCallback():
    def after_calc (self, epoch, val):
        print(f"After {epoch}: {val}")
        if val>10: return True
#collapse_show
slow_calculation(PrintAfterCallback())
We can now implement the calculation itself as a class; this allows our callback to modify values inside the calculator.
#collapse_show
class SlowCalculator():
    def __init__(self, cb=None): self.cb,self.res = cb,0
    def callback(self, cb_name, *args):
        if not self.cb: return
        cb = getattr(self.cb,cb_name, None)
        if cb: return cb(self, *args)
    def calc(self):
        for i in range(5):
            self.callback('before_calc', i)
            self.res += i*i
            sleep(1)
            if self.callback('after_calc', i):
                print("stopping early")
                break
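To see it in action (a quick sketch with a throwaway callback; note that the callback now receives the calculator itself as its first argument, so it can read calc.res):
#collapse_show
class PrintResCallback():
    def after_calc(self, calc, epoch): print(f"After {epoch}: res={calc.res}")

calculator = SlowCalculator(PrintResCallback())
calculator.calc()
calculator.res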
Using the dunder method __call__, we can do it like this:
#collapse_show
class SlowCalculator():
    def __init__(self, cb=None): self.cb,self.res = cb,0
    def __call__(self, cb_name, *args):
        if not self.cb: return
        cb = getattr(self.cb,cb_name, None)
        if cb: return cb(self, *args)
    def calc(self):
        for i in range(5):
            self('before_calc', i)
            self.res += i*i
            sleep(1)
            if self('after_calc', i):
                print("stopping early")
                break
#collapse_show
class ModifyingCallback():
    def after_calc (self, calc, epoch):
        print(f"After {epoch}: {calc.res}")
        if calc.res>10: return True
        if calc.res<3: calc.res = calc.res*2
#collapse_show
calculator = SlowCalculator(ModifyingCallback())
calculator.calc()
calculator.res
Anything that looks like __this__ is, in some way, special. Python, or some library, can define some functions that they will call at certain documented times. For instance, when your class is setting up a new object, Python will call __init__. These are defined as part of the Python data model.
For instance, if Python sees +, then it will call the special method __add__. If you try to display an object in Jupyter (or lots of other places in Python) it will call __repr__.
#collapse_show
class SloppyAdder():
    def __init__(self,o): self.o=o
    def __add__(self,b): return SloppyAdder(self.o + b.o + 0.01)
    def __repr__(self): return str(self.o)
#collapse_show
a = SloppyAdder(1)
b = SloppyAdder(2)
a+b
Special methods you should probably know about (see the Python data model docs) are:
- __getitem__
- __getattr__
- __setattr__
- __del__
- __init__
- __new__
- __enter__
- __exit__
- __len__
- __repr__
- __str__
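A tiny example tying a few of these together (my own sketch, not from the lesson): __len__ and __getitem__ are enough to make an object act like a sequence, and __repr__ controls how it's displayed.
#collapse_show
class Squares():
    def __init__(self, n): self.n = n
    def __len__(self): return self.n
    def __getitem__(self, i):
        if not 0 <= i < self.n: raise IndexError   # lets iteration stop
        return i*i
    def __repr__(self): return f"Squares({self.n})"

s = Squares(5)
s, len(s), s[3], list(s)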
Variance is the average of how far away each data point is from the mean. E.g.:
#collapse
t = torch.tensor([1.,2.,4.,18])
#collapse
m = t.mean(); m
#collapse
(t-m).mean()
Oops. We can't do that. Because by definition the positives and negatives cancel out. So we can fix that in one of (at least) two ways:
#collapse
(t-m).pow(2).mean()
#collapse
(t-m).abs().mean()
But the first of these is now a totally different scale, since we squared. So let's undo that at the end.
#collapse
(t-m).pow(2).mean().sqrt()
They're still different. Why?
Note that we have one outlier (18). In the version where we square everything, it makes that much bigger than everything else.
(t-m).pow(2).mean() is referred to as variance. It's a measure of how spread out the data is, and is particularly sensitive to outliers.
When we take the sqrt of the variance, we get the standard deviation. Since it's on the same kind of scale as the original data, it's generally more interpretable. However, since sqrt(1)==1, it doesn't much matter which we use when talking about unit variance for initializing neural nets.
(t-m).abs().mean() is referred to as the mean absolute deviation. It isn't used nearly as much as it deserves to be, because mathematicians don't like how awkward it is to work with. But that shouldn't stop us, because we have computers and stuff.
Here's a useful thing to note about variance:
#collapse_show
(t-m).pow(2).mean(), (t*t).mean() - (m*m)
You can see why these are equal if you want to work thru the algebra. Or not.
But, what's important here is that the latter is generally much easier to work with. In particular, you only have to track two things: the sum of the data, and the sum of squares of the data. Whereas in the first form you actually have to go thru all the data twice (once to calculate the mean, once to calculate the differences).
Let's go steal the LaTeX from Wikipedia:
$$\operatorname{E}\left[X^2 \right] - \operatorname{E}[X]^2$$
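A quick illustration of why the second form is convenient (my own sketch): the variance can be computed in a single pass over the data by tracking just the running sum and the running sum of squares.
#collapse_show
tot, tot2, n = 0., 0., 0
for x in t:                          # one pass over the data
    tot, tot2, n = tot+x, tot2+x*x, n+1
tot2/n - (tot/n)**2                  # E[X^2] - E[X]^2, same as (t-m).pow(2).mean()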
Here's how Wikipedia defines covariance:
$$\operatorname{cov}(X,Y) = \operatorname{E}{\big[(X - \operatorname{E}[X])(Y - \operatorname{E}[Y])\big]}$$
#collapse
t
Let's see that in code. So now we need two vectors.
#collapse
# `u` is roughly twice `t`, with a bit of randomness mixed in
u = t*2
u *= torch.randn_like(t)/10+0.95
plt.scatter(t, u);
#collapse
prod = (t-t.mean())*(u-u.mean()); prod
#collapse
prod.mean()
#collapse
v = torch.randn_like(t)
plt.scatter(t, v);
#collapse
((t-t.mean())*(v-v.mean())).mean()
It's generally more conveniently defined like so:
$$\operatorname{E}\left[X Y\right] - \operatorname{E}\left[X\right] \operatorname{E}\left[Y\right]$$
#collapse
cov = (t*v).mean() - t.mean()*v.mean(); cov
From now on, you're not allowed to look at an equation (or especially type it in LaTeX) without also typing it in Python and actually calculating some values. Ideally, you should also plot some values.
Finally, here is the Pearson correlation coefficient:
$$\rho_{X,Y}= \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y}$$
#collapse_show
cov / (t.std() * v.std())
It's just a scaled version of the same thing. Question: Why is it scaled by standard deviation, and not by variance or mean or something else?
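One way to see the answer (a small check of my own): covariance changes when we rescale a variable, but after dividing by the standard deviations the result is scale-invariant, which is exactly what we want from a correlation measure.
#collapse_show
t2 = t*100                            # rescale one of the variables
cov2 = (t2*v).mean() - t2.mean()*v.mean()
cov2, cov2 / (t2.std() * v.std())     # covariance grows 100x, correlation doesn't change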
Here's our final log_softmax definition:
#collapse_show
def log_softmax(x): return x - x.exp().sum(-1,keepdim=True).log()
which is:
$$\hbox{logsoftmax}(x)_{i} = x_{i} - \log \sum_{j} e^{x_{j}}$$
And our cross entropy loss is:
$$-\log(p_{i})$$
Softmax is only a good idea when our data (e.g. an image) contains exactly one (and at least one) example of the thing we're classifying, because the e function makes the highest activation dominate and the outputs always sum to 1, so softmax always picks a "winner". When there may be several classes present, or none at all, the per-class binomial is the better choice: $e^{x}/(1+e^{x})$.
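To make that concrete (a small sketch, not from the original notes): softmax always sums to 1, so it confidently picks a "winner" even when every activation is low, while the per-class binomial (sigmoid) can report that nothing is present.
#collapse_show
acts = torch.FloatTensor([-4., -3., -3.5])   # weak evidence for every class
softmax = acts.exp() / acts.exp().sum()
binomial = acts.exp() / (1 + acts.exp())     # same as torch.sigmoid(acts)
softmax, binomial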
Browsing source code - things your editor should let you do:
- Jump to tag/symbol by name (with completions)
- Jump to current tag
- Jump to library tags
- Go back
- Search
- Outlining / folding