July 1, 2020
Partial Derivatives & Recursive Descent
@anthonycorletti

Writing partial derivatives is a great way to understand some of the underlying features of machine learning and neural network libraries.

In this post I'll explain how partial derivatives are a necessary building block for understanding machine learning and neural networks, and how to write some Python code to help bring partial derivatives and recursive descent to life!

Simply put, a partial derivative is a derivative taken with respect to one variable while holding all the other variables constant.

For example, take the following function:

f(x) = x^2

Its derivative with respect to x is:

\frac{df(x)}{dx} = 2x

But what about two variables, x and y?

f(x, y) = x^2 + y^3

To find the partial derivative of the above function with respect to one variable, let's say x, we treat y as a constant, so we get the following:

\frac{\partial f(x, y)}{\partial x} = 2x + 0 = 2x

And the partial derivative with respect to y:

\frac{\partial f(x, y)}{\partial y} = 0 + 3y^2 = 3y^2

That's it. That's all there is to it.

Just remember to treat all other variables as if they are constants.
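
To sanity-check these results numerically, here's a minimal sketch that approximates both partials with finite differences; the point (2.0, 3.0) and the step size h are just illustrative choices:

def f(x, y):
    return x**2 + y**3

h = 1e-6  # a small step for the finite-difference approximation
x, y = 2.0, 3.0  # an arbitrary point at which to check the partials

df_dx = (f(x + h, y) - f(x, y)) / h  # should be close to 2x = 4
df_dy = (f(x, y + h) - f(x, y)) / h  # should be close to 3y^2 = 27

print(df_dx, df_dy)  # roughly 4.0 and 27.0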

Now what's recursive descent? Well, imagine plotting one of these functions in some n dimensions, where we feed inputs to our function and map those inputs to whatever the function returns.

Recursive descent (more commonly known as gradient descent) means repeatedly feeding in inputs that return smaller and smaller outputs, letting the partial derivatives tell us which way is downhill, in order to find the lowest valley or valleys in the function's space.

In this way, a neural network can be the function with many parameters that are adjusted over time in order to return desired or correct outputs.
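
To make that concrete, here's a tiny sketch of the descent loop in Python for f(x) = x^2, whose derivative we found above; the starting point and step size are arbitrary choices for illustration:

def df_dx(x):
    return 2 * x  # the derivative of f(x) = x^2

x = 10.0  # an arbitrary starting point
step_size = 0.1  # how far to move each iteration, a.k.a. the learning rate

for _ in range(100):
    x = x - step_size * df_dx(x)  # step downhill along the derivative

print(x)  # very close to 0.0, the minimum of f(x) = x^2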

That's enough math for now. Let's actually write some code that defines expressions and evaluates derivatives.

I'm using Python 3.8 for this if you'd like to follow along.

To start, we'll stub out all the classes we'll be working with.

Namely, we'll need a base class, which I'm calling Expression (Expr), a class for variables (Var), one for constants (Const), and classes for operations – let's just focus on adding and multiplying for now (Add, Mult).

class Expr:
    pass

class Add(Expr):
    pass

class Mult(Expr):
    pass

class Var(Expr):
    pass

class Const(Expr):
    pass

We'll be building our expressions with a binary tree structure, such that each add and multiply will have to evaluate a left and right sub-structure of expressions, variables, and/or constants.
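
For example, once these classes can take arguments, the expression 3 * (y + x) will be built as Mult(Const(3), Add(Var("y"), Var("x"))): a Mult node holding a Const on its left and an Add node, with children of its own, on its right.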

Now let's fill in our __init__ methods.

from numbers import Number

class Expr:
    pass

class Add(Expr):
    def __init__(self, left: Expr, right: Expr):
        self.left = left
        self.right = right

class Mult(Expr):
    def __init__(self, left: Expr, right: Expr):
        self.left = left
        self.right = right

class Var(Expr):
    def __init__(self, name: str):
        self.name = name

class Const(Expr):
    def __init__(self, val: Number):
        self.val = val

Hm, already noticing some refactoring we can do. Let's combine Add and Mult into one class, a binary operator class (BinOp).

class BinOp(Expr):
    def __init__(self, left: Expr, right: Expr, operator: str):
        self.left = left
        self.right = right
        self.operator = operator
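
With that, Add(left, right) simply becomes BinOp(left, right, "+"), and Mult(left, right) becomes BinOp(left, right, "*").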

Now we want to be able to view and evaluate our expressions, so let's add our __str__ and eval methods and run a few examples.

from numbers import Number
from typing import Dict

class Expr:
    pass


class BinOp(Expr):
    def __init__(self, left: Expr, right: Expr, operator: str):
        self.left = left
        self.right = right
        self.operator = operator

    def __str__(self) -> str:
        return f"({self.left}{self.operator}{self.right})"

    def eval(self, input: Dict[str, Number]) -> Number:
        left_eval = self.left.eval(input)
        right_eval = self.right.eval(input)
        # lean on python's builtin eval to compute e.g. "3*4" -> 12
        return eval(f"{left_eval}{self.operator}{right_eval}")


class Const(Expr):
    def __init__(self, val: Number):
        self.val = val

    def __str__(self) -> str:
        return str(self.val)

    def eval(self, input: Dict[str, Number]) -> Number:
        return self.val


class Var(Expr):
    def __init__(self, name: str):
        self.name = name

    def __str__(self) -> str:
        return self.name

    def eval(self, input: Dict[str, Number]) -> Number:
        return input[self.name]

Running a few examples, we should see this working ...

Note that I'm defining our input as a dict so that we can pass in n unique parameters as {"x": 1, "y": 1, ...}

# 3 * (y + x)
e1 = BinOp(Const(3), BinOp(Var("y"), Var("x"), "+"), "*")
print(f"e1: {e1}")

# (3 * y) + x
e2 = BinOp(BinOp(Const(3), Var("y"), "*"), Var("x"), "+")
print(f"e2: {e2}")

input = {"x": 2, "y": 4}

print(f"e1.eval(input): {e1.eval(input)}")
print(f"e2.eval(input): {e2.eval(input)}")

# e1: (3*(y+x))
# e2: ((3*y)+x)
# e1.eval(input): 18
# e2.eval(input): 14

Nice! So we can print and evaluate expressions. Now let's experiment with taking partial derivatives.

We'll need to define a deriv method for each Expr, and the only calculus we need is the sum rule and the product rule:
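
\frac{\partial}{\partial v}(f + g) = \frac{\partial f}{\partial v} + \frac{\partial g}{\partial v}

\frac{\partial}{\partial v}(f \cdot g) = \frac{\partial f}{\partial v} \cdot g + f \cdot \frac{\partial g}{\partial v}

These two rules map directly onto the "+" and "*" branches of BinOp.deriv below.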

With that, the final state of our code looks like the following.

from numbers import Number
from typing import Dict


class Expr:
    pass


class BinOp(Expr):
    def __init__(self, left: Expr, right: Expr, operator: str):
        self.left = left
        self.right = right
        self.operator = operator

    def __str__(self) -> str:
        return f"({self.left}{self.operator}{self.right})"

    def eval(self, input: Dict[str, Number]) -> Number:
        left_eval = self.left.eval(input)
        right_eval = self.right.eval(input)
        # lean on python's builtin eval to compute e.g. "3*4" -> 12
        return eval(f"{left_eval}{self.operator}{right_eval}")

    def deriv(self, var: str) -> Expr:
        # sum rule: the derivative of a sum is the sum of the derivatives
        if self.operator == "+":
            return BinOp(self.left.deriv(var), self.right.deriv(var), "+")
        # product rule: (f * g)' = f' * g + f * g'
        elif self.operator == "*":
            return BinOp(BinOp(self.left.deriv(var), self.right, "*"),
                         BinOp(self.left, self.right.deriv(var), "*"), "+")


class Const(Expr):
    def __init__(self, val: Number):
        self.val = val

    def __str__(self) -> str:
        return str(self.val)

    def eval(self, input: Dict[str, Number]) -> Number:
        return self.val

    def deriv(self, var: str) -> Expr:
        # the derivative of a constant is always zero
        return Const(0)


class Var(Expr):
    def __init__(self, name: str):
        self.name = name

    def __str__(self) -> str:
        return self.name

    def eval(self, input: Dict[str, Number]) -> Number:
        return input[self.name]

    def deriv(self, var: str) -> Expr:
        # the derivative of a variable with respect to itself is 1;
        # every other variable is treated as a constant
        if self.name == var:
            return Const(1)
        else:
            return Const(0)

And for our examples ...

# x^2 + x*y + y^3
a = BinOp(Var("x"), Var("x"), "*")
b = BinOp(Var("x"), Var("y"), "*")
c = BinOp(Var("y"), BinOp(Var("y"), Var("y"), "*"), "*")

e5 = BinOp(a, BinOp(b, c, "+"), "+")
e5x = e5.deriv("x")
e5y = e5.deriv("y")

print(f"e5: {e5}")
print(f"e5x: {e5x}")
print(f"e5y: {e5y}")

input = {"x": 1, "y": 1}
print(f"e5.eval(input): {e5.eval(input)}")
print(f"e5x.eval(input): {e5x.eval(input)}")
print(f"e5y.eval(input): {e5y.eval(input)}")

# e5: ((x*x)+((x*y)+(y*(y*y))))
# e5x: (((1*x)+(x*1))+(((1*y)+(x*0))+((0*(y*y))+(y*((0*y)+(y*0))))))
# e5y: (((0*x)+(x*0))+(((0*y)+(x*1))+((1*(y*y))+(y*((1*y)+(y*1))))))
# e5.eval(input): 3
# e5x.eval(input): 3
# e5y.eval(input): 4
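
As a sanity check, e5x should simplify to 2x + y (which is 3 at x = 1, y = 1) and e5y to x + 3y^2 (which is 4), matching the output above. If you happen to have sympy installed, you can verify the partials independently:

import sympy

x, y = sympy.symbols("x y")
f = x**2 + x*y + y**3

print(sympy.diff(f, x))  # 2*x + y
print(sympy.diff(f, y))  # x + 3*y**2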

Now we can map input parameters to outputs to find local and global minima and maxima of functions. We could probably cut down on the number of parentheses ... but we'll get to that next time.

These types of software patterns are some of the tools machine learning and neural network libraries use under the hood to power all kinds of things like image recognition, structure identification, natural language processing, optical character recognition, and more!