## Hyper-parameters in Action! Introducing DeepReplay

Originally posted on Towards Data Science.

### Introduction

In my previous post, I invited you to wonder **what exactly is going on under the hood** when you train a neural network. Then I investigated the role of **activation functions**, illustrating the effect they have on the *feature space* using **plots** and **animations**.

Now, I invite **you** to play an active role in the investigation!

It turns out these **plots** and **animations** drew quite some attention. So I decided to organize my code and structure it into a proper **Python package**, so **you** can plot and animate **your own Deep Learning models**!

What do they look like, you ask? Well, if you haven’t checked the original post yet, here is a quick peek:

This is what animating with DeepReplay looks like :-)

So, without further ado, I present to you… **DeepReplay**!

### DeepReplay

The package is called **DeepReplay** because this is exactly what it allows you to do: **REPLAY** the process of training your Deep Learning Model, **plotting** and **animating** several aspects of it.

The process is simple enough, consisting of **five steps**:

- It all starts with creating an instance of a **callback**!
- Then, business as usual: build and train your model.
- Next, load the collected data into **Replay**.
- Finally, create a figure and **attach the visualizations** to it.
- **Plot** and/or **animate** it!

Let’s go through each one of these steps!

#### 1. Creating an instance of a callback

The callback should be an instance of **ReplayData**.

[gist id=”61394f6733e33ec72522a58614d1425a” /]

The **callback** takes, as arguments, the model inputs (*X* and *y*), as well as the **filename** and **group name** where you want to store the collected training data.

Two things to keep in mind:

- For toy datasets, it is fine to use the same *X* and *y* as in your model fitting. These are the examples that will be plotted, so you can choose a random subset of your dataset to keep computation times reasonable if you are using a bigger dataset.
- The data is stored in an HDF5 file, and you can use the *same file* **several times over**, but **never** the *same group*! If you try running it twice using the same group name, you will get an **error**.
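To make the first point concrete, here is a minimal sketch of picking a random subset of examples to hand over to the callback. It uses only the standard library, and both the helper name `sample_subset` and the toy data are assumptions made for illustration; they are not part of DeepReplay.

```python
import random


def sample_subset(X, y, n_samples, seed=42):
    """Return a random subset of (X, y), keeping inputs and labels aligned.

    Hypothetical helper for illustration, not part of DeepReplay.
    """
    if n_samples >= len(X):
        return list(X), list(y)
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    idx = sorted(rng.sample(range(len(X)), n_samples))
    return [X[i] for i in idx], [y[i] for i in idx]


# Toy dataset: 1,000 two-dimensional points with binary labels
X = [(i / 1000.0, (i % 7) / 7.0) for i in range(1000)]
y = [i % 2 for i in range(1000)]

# Keep only 100 examples for plotting
X_plot, y_plot = sample_subset(X, y, n_samples=100)
print(len(X_plot), len(y_plot))
```

Sampling indices, rather than sampling *X* and *y* separately, is what keeps every input paired with its own label.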

#### 2. Build and train your model

Like I said, business as usual, nothing to see here… just don’t forget to **add your callback instance** to the list of callbacks when fitting!
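If the idea of a callback is new to you, the sketch below shows the general pattern in plain Python: a training loop that notifies every object in a list of callbacks at the end of each epoch. This is neither Keras nor DeepReplay code; the `RecordingCallback` class, the `fit` function and the toy linear model are all hypothetical and only meant to show why the callback must be passed to the fitting step.

```python
class RecordingCallback:
    """Hypothetical stand-in for a data-collecting callback (not ReplayData)."""

    def __init__(self):
        self.history = []

    def on_epoch_end(self, epoch, weight, loss):
        # Store a snapshot of the training state after each epoch
        self.history.append({"epoch": epoch, "weight": weight, "loss": loss})


def fit(xs, ys, epochs, lr=0.1, callbacks=()):
    """Fit y = w * x by gradient descent, notifying callbacks every epoch."""
    w = 0.0
    n = len(xs)
    for epoch in range(epochs):
        # Gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
        for cb in callbacks:
            cb.on_epoch_end(epoch, w, loss)
    return w


recorder = RecordingCallback()
w = fit([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], epochs=20, callbacks=[recorder])
print(len(recorder.history), round(w, 3))
```

If the callback is left out of the list, training still works, but nothing gets recorded and there is nothing to replay later.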

#### 3. Load collected data into Replay

So, the part that gives the whole thing its name… time to **replay** it!

It should be straightforward enough: create an instance of **Replay**, providing the **filename** and the *group name* you chose in **Step 1**.

#### 4. Create a figure and attach visualizations to it

This is the step where things get interesting, actually. Just use **Matplotlib** to create a figure, as simple as the one in the example, or as complex as *subplot2grid* allows you to make it, and start **attaching visualizations** from your **Replay** object to the figure.

The example above builds a **feature space** based on the output of the layer named, suggestively, *hidden*.

But there are **five** types of visualizations available:

- **Feature Space**: plot representing the *twisted and turned feature space*, corresponding to the output of a **hidden** layer (only 2-unit hidden layers supported for now), including **grid lines** for 2-dimensional inputs;
- **Decision Boundary**: plot of a 2-D grid representing the *original feature space*, together with the *decision boundary* (only 2-dimensional inputs supported for now);
- **Probability Histogram**: **two** histograms of the resulting **classification probabilities** for the inputs, one for each class, corresponding to the model output (only binary classification supported for now);
- **Loss and Metric**: line plot for both the **loss** and a **chosen metric**, computed over all the inputs you passed as arguments to the callback;
- **Loss Histogram**: histogram of the **losses** computed over all the inputs you passed as arguments to the callback (only binary cross-entropy loss supported for now).

#### 5. Plot and/or animate it!

For this example, with a **single visualization**, you can use its **plot** and **animate** methods directly. These methods will return, respectively, a figure and an animation, which you can then save to a file.

If you decide to go with **multiple simultaneous visualizations**, there are two **helper methods** that return composed plots and animations, respectively: **compose_plots** and **compose_animations**.

To illustrate these methods, here is a **gist** that comes from the “**canonical**” example I used in my original post. There are **four** visualizations and **five** plots (*Probability Histogram* has **two plots**, for negative and positive cases).

The **animated GIF** at the beginning of this post is actually the result of **this** composed animation!

#### Limitations

At this point, you probably noticed that the two **coolest** visualizations, ***Feature Space*** and ***Decision Boundary***, are limited to **two dimensions**.

I plan on adding support for visualizations in **three dimensions** also, but most datasets and models have either **more inputs** or hidden layers with **many more units**.

So, these are the options you have:

- 2D inputs, 2-unit hidden layer: *Feature Space* with optional grid (check the Activation Functions example);
- 3D+ inputs, 2-unit hidden layer: *Feature Space*, but no grid;
- 2D inputs, hidden layer with 3+ units: *Decision Boundary* with optional grid (check the Circles example);
- nothing is two dimensional: well… there is always a **workaround**, right?
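The options above can be written down as a small lookup. The function below is a hypothetical helper, not part of DeepReplay, and it encodes my reading of the list plus the constraints stated in the visualization descriptions (a 2-unit hidden layer for *Feature Space*, 2-dimensional inputs for *Decision Boundary*). In particular, returning both plots for the 2D-inputs, 2-unit case is an assumption drawn from those constraints.

```python
def available_visualizations(n_inputs, n_hidden_units):
    """List the 2-D plots that fit a given input size and hidden layer size.

    Hypothetical helper encoding the options above; not part of DeepReplay.
    """
    options = []
    if n_hidden_units == 2:
        # Feature Space needs a 2-unit hidden layer; grid lines need 2-D inputs
        grid = "optional grid" if n_inputs == 2 else "no grid"
        options.append(f"Feature Space ({grid})")
    if n_inputs == 2:
        # Decision Boundary is drawn over the original 2-D input space
        options.append("Decision Boundary (optional grid)")
    return options


for shape in [(2, 2), (9, 2), (2, 10), (9, 5)]:
    print(shape, available_visualizations(*shape))
```

An empty list is the "nothing is two dimensional" case, which is where the workaround comes in.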

### Working around multidimensionality

**What do we want to achieve?** Since we can only do 2-dimensional plots, we want **2-dimensional outputs**. Simple enough.

**How to get 2-dimensional outputs?** Adding an **extra hidden layer** with **two units**, of course! OK, I know this is **suboptimal**, as it is actually modifying the model (*did I mention this is a workaround?!*). We can then use the outputs of this extra layer for plotting.

You can check either the *Moons* or the *UCI Spambase* notebook for examples of adding an extra hidden layer and plotting it.

**NOTE**: The following part is a bit more advanced: it delves deeper into the reasoning behind adding the extra hidden layer and what it represents. Proceed at your own risk :-)

**What are we doing with the model, anyway?** By adding an extra hidden layer, we can think of our model as having **two components**: an **encoder** and a **decoder**. Let’s dive *just a bit* deeper into those:

- **Encoder**: the encoder goes from the inputs all the way to our **extra hidden layer**. Let’s consider its 2-dimensional output as **features** and call them *f1* and *f2*.
- **Decoder**: the decoder, in this case, is just a plain and simple **logistic regression**, which takes two inputs, say, *f1* and *f2*, and outputs a classification probability.

Let me try to make it clearer with a network diagram:

Encoder / Decoder after adding an extra hidden layer

What do we have here? A 9-dimensional input, an original hidden layer with 5 units, an extra hidden layer with two units, its corresponding two outputs (features) and a single unit output layer.

So, **what happens with the inputs** along the way? Let’s see:

- Inputs (*x1* through *x9*) are fed into the *encoder* part of the model.
- The **original** hidden layer *twists and turns* the inputs. The **outputs** of the hidden layer can also be thought of as **features** (these would be the outputs of units *h1* through *h5* in the diagram), but these are assumed to be **n-dimensional** and therefore not suited for plotting. So far, business as usual.
- Then comes the **extra** hidden layer. Its **weights matrix** has shape **(n, 2)** (in the diagram, *n = 5* and we can count **10** arrows between *h* and *e* nodes). If we assume a *linear activation function*, this layer is actually performing an **affine transformation**, mapping points from an **n-dimensional** to a **2-dimensional** feature space. These are our features, *f1* and *f2*, the output of the *encoder* part.
- Since we assumed a *linear activation function* for the extra hidden layer, *f1* and *f2* are going to be directly fed to the *decoder* (output layer), that is, to a single unit with a *sigmoid activation function*. This is a plain and simple **logistic regression**.
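The last two steps can be traced by hand for a single example. The sketch below is plain Python with made-up numbers: the hidden outputs, weights and biases are all assumptions chosen for illustration, not values from a trained model. It applies the affine transformation of the extra layer (a weights matrix of shape (5, 2) plus two biases) and then the logistic regression of the decoder.

```python
import math


def affine(h, W, b):
    """Linear-activation layer: computes h @ W + b for a single example."""
    return [sum(h[i] * W[i][j] for i in range(len(h))) + b[j]
            for j in range(len(b))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Outputs of the original hidden layer (h1 through h5) for one example
h = [0.5, -1.0, 0.25, 2.0, 0.0]

# Extra hidden layer: weights matrix of shape (n, 2) = (5, 2), plus 2 biases
W_extra = [[0.2, -0.1],
           [0.4, 0.3],
           [-0.5, 0.8],
           [0.1, 0.1],
           [0.7, -0.6]]
b_extra = [0.05, -0.05]

# Encoder output: the two features, f1 and f2
f1, f2 = affine(h, W_extra, b_extra)

# Decoder: logistic regression on (f1, f2), a single sigmoid unit
w_out = [1.5, -2.0]
b_out = 0.1
prob = sigmoid(w_out[0] * f1 + w_out[1] * f2 + b_out)

# One weight per arrow between the h and e nodes: 5 * 2 = 10
n_arrows = len(W_extra) * len(W_extra[0])
print(f1, f2, prob, n_arrows)
```

The pair (*f1*, *f2*) is exactly what ends up on the 2-dimensional plot, and the decoder only ever sees those two numbers.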

**What does it all mean?** It means that our model is also learning a **latent space** with *two latent factors* (*f1* and *f2*) now! Fancy, uh?! Don’t get intimidated by the fanciness of these terms, though… it basically means the model learned to **best compress the information** to only two features, **given the task at hand**: a binary classification.

This is the basic underlying principle of **auto-encoders**, the major difference being the fact that the auto-encoder’s task is to **reconstruct its inputs**, not classify them in any way.

### Final Thoughts

I hope this post enticed you to try **DeepReplay** out :-)

If you come up with **nice and cool** visualizations for different datasets, or using different network architectures or hyper-parameters, please **share** them in the **comments** section. I am considering starting a **Gallery** page, if there is enough interest in it.

For more information about the **DeepReplay** package, like installation, documentation, examples and **notebooks** (which you can play with using **Google Colab**), please go to my GitHub repository:

**Have fun animating your models! :-)**

*If you have any thoughts, comments or questions, please leave a comment below or contact me on **Twitter**.*