Hyper-parameters in Action! Introducing DeepReplay

Photo by Immo Wegmann on Unsplash

Originally posted on Towards Data Science.

Introduction

In my previous post, I invited you to wonder what exactly is going on under the hood when you train a neural network. Then I investigated the role of activation functions, illustrating the effect they have on the feature space using plots and animations.

Now, I invite you to play an active role on the investigation!

It turns out these plots and animations drew quite some attention. So I decided to organize my code and structure it into a proper Python package, so you can plot and animate your own Deep Learning models!

How do they look like, you ask? Well, if you haven’t checked the original post yet, here it is a quick peek at it:

This is what animating with DeepReplay looks like :-)

So, without further ado, I present you… DeepReplay!

DeepReplay

The package is called DeepReplay because this is exactly what it allows you to do: REPLAY the process of training your Deep Learning Model, plotting and animating several aspects of it.

The process is simple enough, consisting of five steps:

  1. It all starts with creating an instance of a callback!
  2. Then, business as usual: build and train your model.
  3. Next, load the collected data into Replay.
  4. Finally, create a figure and attach the visualizations to it.
  5. Plot and/or animate it!

Let’s go through each one of these steps!

1. Creating an instance of a callback

The callback should be an instance of ReplayData.

[gist id=”61394f6733e33ec72522a58614d1425a” /]

The callback takes, as arguments, the model inputs (X and y), as well as the filename and group name where you want to store the collected training data.

Two things to keep in mind:

  • For toy datasets, it is fine to use the same X and y as in your model fitting. These are the examples that will be plot —so, you can choose a random subset of your dataset to keep computation times reasonable, if you are using a bigger dataset.
  • The data is stored in a HDF5 file, and you can use the same file several times over, but never the same group! If you try running it twice using the same group name, you will get an error.

2. Build and train your model

Like I said, business as usual, nothing to see here… just don’t forget to add your callback instance to the list of callbacks when fitting!

[gist id=”86591c9796731c21f920e01ed2376b23″ /]

3. Load collected data into Replay

So, the part that gives the whole thing its name… time to replay it!

It should be straightforward enough: create an instance of Replay, providing the filename and the group name you chose in Step 1.

[gist id=”019637d6d041fdbd269db9a78a2311b6″ /]

4. Create a figure and attach visualizations to it

This is the step where things get interesting, actually. Just use Matplotlib to create a figure, as simple as the one in the example, or as complex as subplot2grid allows you to make it, and start attaching visualizations from your Replay object to the figure.

[gist id=”ba49bdca40a2abaa68af39922e78a556″ /]

The example above builds a feature space based on the output of the layer named, suggestively, hidden.

But there are five types of visualizations available:

  • Feature Space: plot representing the twisted and turned feature space, corresponding to the output of a hidden layer (only 2-unit hidden layers supported for now), including grid lines for 2-dimensional inputs;
  • Decision Boundary: plot of a 2-D grid representing the original feature space, together with the decision boundary (only 2-dimensional inputs supported for now);
  • Probability Histogram: two histograms of the resulting classification probabilities for the inputs, one for each class, corresponding to the model output (only binary classification supported for now);
  • Loss and Metric: line plot for both the loss and a chosen metric, computed over all the inputs you passed as arguments to the callback;
  • Loss Histogram: histogram of the losses computed over all the inputs you passed as arguments to the callback (only binary cross-entropy loss supported for now).

5. Plot and/or animate it!

For this example, with a single visualization, you can use its plot and animate methods directly. These methods will return, respectively, a figure and an animation, which you can then save to a file.

[gist id=”83ef91da63de149f5a58f6e428ab37f3″ /]

If you decide to go with multiple simultaneous visualizations, there are two helper methods that return composed plots and animations, respectively: compose_plots and compose_animations.

To illustrate these methods, here is a gist that comes from the “canonicalexample I used in my original post. There are four visualizations and five plots (Probability Histogram has two plots, for negative and positive cases).

The animated GIF at the beginning of this post is actually the result of this composed animation!

[gist id=”6ad78608f5ae7ebe2c31f84f9b001625″ /]

Limitations

At this point, you probably noticed that the two coolest visualizations, Feature Space and Decision Boundary, are limited to two dimensions.

I plan on adding support for visualizations in three dimensions also, but most of datasets and models have either more inputs or hidden layers with many more units.

So, these are the options you have:

  • 2D inputs, 2-unit hidden layer: Feature Space with optional grid (check the Activation Functions example);
  • 3D+ inputs, 2-unit hidden layer: Feature Space, but no grid;
  • 2D inputs, hidden layer with 3+ units: Decision Boundary with optional grid (check the Circles example);
  • nothing is two dimensional: well… there is always a workaround, right?

Working around multidimensionality

What do we want to achieve? Since we can only do 2-dimensional plots, we want 2-dimensional outputs — simple enough.

How to get 2-dimensional outputs? Adding an extra hidden layer with two units, of course! OK, I know this is suboptimal, as it is actually modifying the model (did I mention this is a workaround?!). We can then use the outputs of this extra layer for plotting.

You can check either the Moons or the UCI Spambase notebooks, for examples on adding an extra hidden layer and plotting it.

NOTE: The following part is a bit more advanced, it delves deeper into the reasoning behind adding the extra hidden layer and what it represents. Proceed at your own risk :-)

What are we doing with the model, anyway? By adding an extra hidden layer, we can think of our model as having two components: an encoder and a decoder. Let’s dive just a bit deeper into those:

  • Encoder: the encoder goes from the inputs all the way to our extra hidden layer. Let’s consider its 2-dimensional output as features and call them f1 and f2.
  • Decoder: the decoder, in this case, is just a plain and simple logistic regression, which takes two inputs, say, f1 and f2, and outputs a classification probability.

Let me try to make it more clear with a network diagram:

Encoder / Decoder after adding an extra hidden layer

What do we have here? A 9-dimensional input, an original hidden layer with 5 units, an extra hidden layer with two units, its corresponding two outputs (features) and a single unit output layer.

So, what happens with the inputs along the way? Let’s see:

  1. Inputs (x1 through x9) are fed into the encoder part of the model.
  2. The original hidden layer twists and turns the inputs. The outputs of the hidden layer can also be thought of as features (these would be the outputs of units h1 through h5 in the diagram), but these are assumed to be n-dimensional and therefore not suited for plotting. So far, business as usual.
  3. Then comes the extra hidden layer. Its weights matrix has shape (n, 2) (in the diagram, n = 5 and we can count 10 arrows between h and e nodes). If we assume a linear activation function, this layer is actually performing an affine transformation, mapping points from a n-dimensional to a 2-dimensional feature space. These are our features, f1 and f2, the output of the encoder part.
  4. Since we assumed a linear activation function for the extra hidden layer, f1 and f2 are going to be directly fed to the decoder (output layer), that is, to a single unit with a sigmoid activation function. This is a plain and simple logistic regression.

What does it all mean? It means that our model is also learning a latent space with two latent factors (f1 and f2) now! Fancy, uh?! Don’t get intimidated by the fanciness of these terms, though… it basically means the model learned to best compress the information to only two features, given the task at hand — a binary classification.

This is the basic underlying principle of auto-encoders, the major difference being the fact that the auto-encoder’s task is to reconstruct its inputs, not classify them in any way.

Final Thoughts

I hope this post enticed you to try DeepReplay out :-)

If you come up with nice and cool visualizations for different datasets, or using different network architectures or hyper-parameters, please share it on the comments section. I am considering starting a Gallery page, if there is enough interest in it.

For more information about the DeepReplay package, like installation, documentation, examples and notebooks (which you can play with using Google Colab), please go to my GitHub repository:

Have fun animating your models! :-)

If you have any thoughts, comments or questions, please leave a comment below or contact me on Twitter.