This article gives you an overview of how to automate hyperparameter tuning for a deep learning model.

Article Outline

  • Introduction
  • About Dataset
  • Loading Dataset
  • Data Preprocessing
  • Setting Model Configuration
  • Model Tuning Strategy
  • Identifying the best model parameters
  • Retraining with best parameters
  • Retrieving mean and standard deviation of CV score
  • Tutorial Code

Introduction

Currently, deep learning is used to solve a variety of problems, such as image recognition, object detection, text classification, speech recognition (natural language processing), sequence prediction, neural style transfer, text generation, image reconstruction and many more.

It is the technology behind self-driving cars, the speech recognition used in Siri, Alexa and Google Assistant, photo tagging on Facebook, song recommendations on Spotify and product recommendation engines. Researchers are now even using deep learning to understand complex patterns in data, for example detecting glaucoma in diabetic patients, disaster management (earthquake and flood prediction), new material development, fake news detection, robotics and biomechanics. To better understand the practical applications of deep learning, I recommend watching the YouTube series “The Age of A.I.”.

There are many tools available to train a deep neural network. For research work, researchers use programming languages and libraries/packages to implement such complex models, as this provides more flexibility and one can modify the model as the work requires. Nowadays training a deep neural network is very easy, thanks to François Chollet for developing the Keras deep learning library. Using Keras, one can implement a deep neural network model with a few lines of code.

The problem starts when, as a researcher, you need to find the set of hyperparameters that gives you the most accurate model/solution. Manually trying each set of parameters can be very time-consuming and exhausting. Here, the KerasRegressor class, which wraps a Keras model so it can be used through scikit-learn’s API, comes in handy for automating the tuning process.

In this article, we will learn, step by step, how to tune a Keras deep learning regression model and identify the best set of hyperparameters. The same approach can be applied to classification models.

About Dataset

I have a Transportation Engineering (Civil Engineering domain) background. During my civil engineering Diploma, B.Tech and M.Tech, I performed the concrete characteristic compressive strength test in a laboratory setting. Thus, I thought it would be interesting to model concrete’s compressive strength using a deep learning model.

Hence, in this article, we are going to use the concrete dataset [1] obtained from the UCI Machine Learning Repository.

The dataset includes the following variables, which are the ingredients for making durable, high-strength concrete.

I1: Cement (C1): kg in a m3 mixture
I2: Blast Furnace Slag (C2): kg in a m3 mixture
I3: Fly Ash (C3): kg in a m3 mixture
I4: Water (C4): kg in a m3 mixture
I5: Superplasticizer (C5): kg in a m3 mixture
I6: Coarse Aggregate (C6): kg in a m3 mixture
I7: Fine Aggregate (C7): kg in a m3 mixture
I8: Age: Day (1~365)
O1: Concrete compressive strength: MPa

Where I: input; O: output; C: component; m3: cubic metre; MPa: megapascal.

Before proceeding to the data analysis part, let’s get familiar with the different inputs of the concrete dataset.

Concrete

Concrete is composed of three basic components: water, aggregate (rock, sand or gravel) and cement. Cement acts as a binding agent when mixed with water and aggregates.

Compressive Strength

Compressive strength is one of the vital parameters that determine concrete’s performance as a construction material. A concrete mix is designed to achieve the required performance and durability for a given construction work/project. The compressive strength of concrete is determined in laboratories in order to maintain the desired quality of concrete during casting. It is calculated by dividing the failure load by the area over which the load is applied, usually after 28 days (I8: Age) of curing, though researchers also report strength after 7, 14 and 21 days of curing. The strength of concrete is controlled through the proportions of cement (C1), fine (C7) and coarse (C6) aggregates, water, and various admixtures. The characteristic compressive strength of concrete, fc/fck, is usually reported in MPa (O1). For normal construction, the characteristic compressive strength can vary from 10 to 60 MPa, while for certain structures the requirement can go beyond 600 MPa.
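
As a quick illustration, here is a minimal sketch of that calculation for a standard 150 mm cube specimen; the failure load value below is hypothetical:

# Sketch: compressive strength = failure load / loaded area
failure_load_kn = 675.0                           # hypothetical failure load, kN
area_mm2 = 150 * 150                              # loaded face of a 150 mm cube
strength_mpa = failure_load_kn * 1000 / area_mm2  # N/mm2 is the same as MPa
print(strength_mpa)                               # 30.0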

Admixture

Nowadays, researchers use different admixtures to obtain desired properties; fly ash (C3) is one of them. Fly ash acts as an admixture in concrete mixes; it is a pozzolanic substance containing aluminous and siliceous material which, when mixed with lime and water, forms a compound similar to cement. Fly ash is mixed into concrete as an admixture to improve workability and to reduce permeability and bleeding.

Similarly, ground granulated blast furnace slag (C2), a mineral admixture, is added to concrete to improve its properties, such as workability, strength and durability.

Superplasticizers

Superplasticizers (high-range water reducers) are used in concrete mixes for making high-strength, durable concrete. Superplasticizers (C5) are water-soluble organic substances that reduce the amount of water required to achieve a given consistency of concrete, reduce the water-cement ratio, reduce cement content and increase slump. The use of superplasticizers can reduce the water requirement by up to 30% without losing workability.

Aim

The aim of the modelling is to predict the characteristic compressive strength of concrete (a regression problem) from the given input components (cement, blast furnace slag, fly ash, water, superplasticizers, coarse and fine aggregates, and age).

Here, we will try to find the set of hyperparameters that minimizes the loss function as far as possible. In other words, we will look for the parameter set that provides the most accurate solution.

Loading relevant libraries

The very first step is to load the relevant Python libraries.

import numpy as np                #for array manipulation
import pandas as pd               #data manipulation
from sklearn import preprocessing #scaling
import keras
from keras.layers import Dense    #for Dense layers
from keras.layers import BatchNormalization #for batch normalization
from keras.layers import Dropout            #for random dropout
from keras.models import Sequential #for sequential implementation
from keras.optimizers import Adam   #for adam optimizer
from keras import regularizers      #for l2 regularization
from keras.wrappers.scikit_learn import KerasRegressor 
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import KFold 
from sklearn.model_selection import cross_val_score

Loading dataset

The next step is to load the data from an Excel sheet in your local storage and perform basic exploratory data analysis.

concrete = pd.read_excel('Concrete_Data.xlsx')
concrete.head()
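
Beyond head(), a couple of quick exploratory checks can confirm the data is complete; this is a minimal sketch, assuming only the concrete DataFrame loaded above:

print(concrete.shape)         # expect (1030, 9): 8 input columns + 1 target
print(concrete.isna().sum())  # confirm there are no missing values
print(concrete.describe())    # summary statistics for each column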

Defining input and target data

The next step is to assign the input columns (components) to the train_inputs variable and the output/target column to the train_targets variable. We need to convert the data to NumPy arrays using the .values attribute before feeding it into the neural network model. The dataset includes 1030 observations; after dropping the target column, the input array has 8 columns.

train_inputs = concrete.drop("Comp_str", axis = 1).values
train_targets = concrete["Comp_str"].values
print(train_inputs.shape)
print(train_targets.shape)
(Output: (1030, 8) and (1030,))

Data Preprocessing

Standardization of datasets is a common requirement for many machine learning estimators; they may behave badly if the individual features do not look more or less like standard normally distributed data: Gaussian with zero mean and unit variance. So, the next step is to scale the data so that it has zero mean and unit variance.

train_inputs =  preprocessing.scale(train_inputs)
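
To verify the scaling, you can check that each column now has (approximately) zero mean and unit variance; the commented lines sketch an equivalent, pipeline-friendly alternative using scikit-learn’s StandardScaler:

# Sanity check: each feature should now have ~0 mean and ~1 std
print(train_inputs.mean(axis = 0).round(6))
print(train_inputs.std(axis = 0).round(6))
# An equivalent alternative:
# from sklearn.preprocessing import StandardScaler
# train_inputs = StandardScaler().fit_transform(train_inputs)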

Setting Model Configuration

To perform hyperparameter tuning, the first step is to define a function that contains the model layout of your deep neural network. Here is the step-by-step guide for defining the function, named create_model.

Step 1: The very first step is to define a function create_model with the default arguments learning_rate = 0.01 and activation = 'relu'. Don’t worry, these are just defaults; later we will tweak them for tuning purposes.

def create_model(learning_rate = 0.01, activation = 'relu'):

Step 2: The next step is to set our optimizer. Here we have selected the Adam optimizer, initialized with our default learning rate value.

# Use Adam optimizer with the given learning rate
opt = Adam(lr = learning_rate)

Step 3: The first layer always needs an input shape. Here, the input shape is the number of columns in the training dataset. We extract the number of input columns using the .shape attribute and indexing its second value.

n_cols = train_inputs.shape[1]
input_shape = (n_cols, )

Step 4: The next step is to define the sequential layout of your model. Here, we use two dense layers of 128 hidden neurons each. The activation is set to the default argument, i.e. “relu”, and we also set an l2 regularizer (applied to the layer outputs via activity_regularizer) to penalize large activations and improve representation learning. To make the representation learning more robust, we add Dropout layers that randomly drop 50% of the units.

# Create the regression model
    model = Sequential()
    model.add(Dense(128,
                    activation = activation,
                    input_shape = input_shape,
                    activity_regularizer = regularizers.l2(1e-5)))
    model.add(Dropout(0.50))
    model.add(Dense(128,
                    activation = activation, 
                    activity_regularizer = regularizers.l2(1e-5)))
    model.add(Dropout(0.50))
    model.add(Dense(1, activation = activation))

Step 5: The next step is to compile the model. For compilation, we need an optimizer and a loss function. Here we opted for the Adam optimizer and, as this is a regression task, the “mean_absolute_error” loss function. We chose MAE as it is more robust to outliers than MSE. To keep track of other errors, we set two additional metrics: mean squared error (mse) and mean absolute percentage error (mape).

# Compile the model
    model.compile(optimizer = opt,
                  loss = "mean_absolute_error",
                  metrics=['mse', "mape"])
    return model

Here is the overall blueprint of model configuration:

n_cols = train_inputs.shape[1]
input_shape = (n_cols, )
# Creates a model given an activation and learning rate
def create_model(learning_rate = 0.01, activation = 'relu'):
  
    # Create an Adam optimizer with the given learning rate
    opt = Adam(lr=learning_rate)
  
    # Create the regression model
    model = Sequential()
    model.add(Dense(128, 
                    activation = activation,
                    input_shape = input_shape,
                    activity_regularizer = regularizers.l2(1e-5)))
    model.add(Dropout(0.50))
    model.add(Dense(128,
                    activation = activation, 
                    activity_regularizer = regularizers.l2(1e-5)))
    model.add(Dropout(0.50))
    model.add(Dense(1, activation = activation))
    # Compile the model
    model.compile(optimizer = opt,
                  loss = "mean_absolute_error",
                  metrics=['mse', "mape"])
    return model
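
Before wiring this function into the tuner, a quick sanity check is to build one model with the defaults and inspect its layout:

# Build a model with the default arguments and print its layout
test_model = create_model()
test_model.summary()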

Defining Model Tuning Strategy

The next step is to set the layout for hyperparameter tuning.

Step 1: The first step is to create a model object using KerasRegressor from keras.wrappers.scikit_learn by passing the create_model function. We set verbose = 0 to stop the model training logs from being shown. Similarly, one can use KerasClassifier for tuning a classification model.

# Create a KerasRegressor
model = KerasRegressor(build_fn = create_model,
                       verbose = 0)

Step 2: The next step is to define the hyperparameter search space. Here, we will try the following common hyperparameters:

activation function: relu and tanh

batch size: 16, 32 and 64

epochs: 50 and 100

learning rate: 0.01, 0.001 and 0.0001

# Define the parameters to try out
params = {'activation': ["relu", "tanh"],
          'batch_size': [16, 32, 64], 
          'epochs': [50, 100],
          'learning_rate': [0.01, 0.001, 0.0001]}
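
Note that this grid defines 2 × 3 × 2 × 3 = 36 combinations, and a randomized search samples only n_iter of them (10 by default in scikit-learn), which is what makes it cheaper than an exhaustive grid search:

# The grid above defines 36 combinations; RandomizedSearchCV
# will sample only n_iter of them (10 by default)
n_combinations = 1
for values in params.values():
    n_combinations *= len(values)
print(n_combinations)  # 36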

Step 3: Next, we perform a randomized cross-validation search across the parameter space using the RandomizedSearchCV function. We selected randomized search as it works faster than a grid search. Here, we will perform a 10-fold cross-validation search. For smaller datasets, creating a separate validation set costs training data; in such scenarios, cross-validation can be a better model training approach.

random_search = RandomizedSearchCV(model,
                                   param_distributions = params,
                                   cv = KFold(10))

Step 4: Next, we fit the model to our train_inputs and train_targets.

random_search_results = random_search.fit(train_inputs, train_targets)

Here is the blueprint of the overall model tuning layout.

# Create a KerasRegressor object
model = KerasRegressor(build_fn = create_model,
                       verbose = 0)
# Define the hyperparameter space
params = {'activation': ["relu", "tanh"],
          'batch_size': [16, 32, 64], 
          'epochs': [50, 100],
          'learning_rate': [0.01, 0.001, 0.0001]}
# Create a randomize search cv object 
random_search = RandomizedSearchCV(model,
                                   param_distributions = params,
                                   cv = KFold(10))
random_search_results = random_search.fit(train_inputs, train_targets)

Identifying the best model parameters

The model with the best parameters achieved a Mean Absolute Error (MAE) of approximately 6.197. The best performance is achieved with a learning rate of 0.001, 100 epochs, a batch_size of 16 and the relu activation function.

print("Best Score: ",
      random_search_results.best_score_,
      "and Best Params: ",
      random_search_results.best_params_)
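
To look beyond the single best candidate, the full search results can be inspected as a pandas DataFrame via scikit-learn’s cv_results_ attribute:

# Inspect every sampled candidate, not just the best one
results_df = pd.DataFrame(random_search_results.cv_results_)
print(results_df[['params', 'mean_test_score', 'std_test_score']])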

Why a negative score?

The actual MAE is simply the positive version of the number we’re getting.

The unified scoring API always maximizes the score, so scores which need to be minimized are negated in order for the unified scoring API to work correctly. The score that is returned is therefore negated when it is a score that should be minimized and left positive if it is a score that should be maximized. You can read more about this here.
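
So, to recover the actual MAE, simply flip the sign:

# The reported score is the negated MAE, so flip the sign to recover it
best_mae = -random_search_results.best_score_
print('Best MAE:', best_mae)  # ~6.197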

Re-evaluating Model with the Best Parameter Set

The next task is to refit the model with the best parameters, i.e., a learning rate of 0.001, 100 epochs, a batch_size of 16 and the relu activation function. Here, we compute the mean and standard deviation of the 10-fold cross-validation score to see the variation in the output loss.

n_cols = train_inputs.shape[1]
input_shape = (n_cols, )
# Redefine the model function with the best parameters as defaults
def create_model(learning_rate = 0.001, activation='relu'):
  
    # Set Adam optimizer with the given learning rate
    opt = Adam(lr = learning_rate)
  
    # Create the regression model
    model = Sequential()
    model.add(Dense(128,
                    activation = activation,
                    input_shape = input_shape,
                    activity_regularizer = regularizers.l2(1e-5)))
    model.add(Dropout(0.50))
    model.add(Dense(128,
                    activation = activation, 
                    activity_regularizer = regularizers.l2(1e-5)))
    model.add(Dropout(0.50))
    model.add(Dense(1, activation = activation))
    # Compile the model
    model.compile(optimizer = opt,
                  loss = "mean_absolute_error",
                  metrics = ['mse', "mape"])
    return model

Retrieving mean K-Fold score and standard deviation

Here, we again use KerasRegressor to create the model object and calculate the cross-validation score for each fold using the cross_val_score() function.

The results reveal that, with the best parameters, the 10-fold CV model achieved a mean Mean Absolute Error (MAE) of approximately 6.269 with a standard deviation of 1.799.

# Create a KerasRegressor
model = KerasRegressor(build_fn = create_model,
                       epochs = 100, 
                       batch_size = 16,
                       verbose = 0)
# Calculate the cross-validation score for each fold
kfolds = cross_val_score(model,
                         train_inputs,
                         train_targets,
                         cv = 10)
# Print the mean score (the negated MAE, as noted earlier)
print('The mean score was:', kfolds.mean())
# Print the score's standard deviation
print('With a standard deviation of:', kfolds.std())

Now you have an approximate idea of the best set of parameters that will give you the most accurate solution. Next, you can use this set of hyperparameters to train a model and test it on an unseen dataset to see whether the model generalizes.
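
Here is a minimal sketch of such a final evaluation; the 80/20 split and random_state below are illustrative choices, not part of the original workflow:

from sklearn.model_selection import train_test_split

# Hold out 20% of the data as a final, untouched test set (illustrative split)
X_train, X_test, y_train, y_test = train_test_split(
    train_inputs, train_targets, test_size = 0.2, random_state = 42)

# The redefined create_model now holds the best parameters as defaults
final_model = create_model()
final_model.fit(X_train, y_train, epochs = 100, batch_size = 16, verbose = 0)

# evaluate() returns [loss, mse, mape]; the loss here is the MAE
test_metrics = final_model.evaluate(X_test, y_test, verbose = 0)
print('Held-out MAE:', test_metrics[0])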

Note

Keras-based hyperparameter search is very resource- and time-intensive. This model training took more than an hour on my local machine (i7, 16 GB RAM), even with an NVIDIA GPU. The hyperparameter tuning froze my PC several times. So, be patient while tuning such large, computationally expensive models. It is better to perform such tasks using cloud-based services.

I hope you learned something new from this blog.

Click here for the code

Image Credit: Photo by Muukii on Unsplash

References

[1] I-Cheng Yeh, “Modeling of the strength of high-performance concrete using artificial neural networks,” Cement and Concrete Research, Vol. 28, No. 12, pp. 1797–1808 (1998).