March 22, 2017

Outline

  • Introduction to Keras
  • Model Customization
    • Callbacks
    • Data Generator
    • Some Well-known Models
    • Multi-Task

Introduction to Keras

Keras: Deep Learning Library for Theano and TensorFlow

  • https://keras.io/
  • Easy and fast prototyping
  • Supports CNN and RNN
  • Runs on CPU and GPU

Features

  • High-level modeling
    • Only the loss and the model structure need to be specified
    • Gradients are generated automatically; stochastic optimizers are built in
    • Many built-in deep learning building blocks, such as activation functions, Dropout, BatchNormalization, LSTM, …
  • Works with both dense and sparse data structures (see the sketch below)
    • Dense: numpy array
    • Sparse: scipy sparse matrix
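
For instance (a minimal sketch; shapes and density are illustrative), either form can be fed to a model:

import numpy
from scipy import sparse

X_dense = numpy.random.rand(4, 1024).astype("float32")  # dense: numpy array
X_sparse = sparse.rand(4, 1024, density = 0.01).tocsr()  # sparse: scipy CSR matrix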

Environment

  • Install Python 3
  • Install Keras via pip
  • Check the Keras version
# Python 2 (Keras 1.1.2 here):
# pip install keras
python -c "import keras; print keras.__version__"
## Using Theano backend.
## 1.1.2
# Python 3 (Keras 1.2.1 here):
# pip install keras
python3 -c "import keras; print(keras.__version__)"
## Using Theano backend.
## 1.2.1

Configure Keras

  • Use the Theano backend
  • Modify ~/.keras/keras.json
cat ~/.keras/keras.json
## {
##     "image_dim_ordering": "tf", 
##     "epsilon": 1e-07, 
##     "floatx": "float32", 
##     "backend": "theano"
## }

Sequential Model

from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
    Dense(11, input_dim=21),
    Activation('relu'),
    Dense(5),
    Activation('softmax'),
])
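
The same model can also be built incrementally with add():

model = Sequential()
model.add(Dense(11, input_dim=21))
model.add(Activation('relu'))
model.add(Dense(5))
model.add(Activation('softmax'))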

MNIST example

from keras.datasets import mnist
 
# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print(X_train.shape)
from matplotlib import pyplot as plt
fig, ax = plt.subplots(nrows = 1, ncols = 1)
ax.imshow(X_train[0])
fig.savefig("X_train_1.png")
## Using Theano backend.
## (60000, 28, 28)

Toy Example

from keras.datasets import mnist
from keras.utils import np_utils
# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Reshape to (n, 1, 28, 28) and scale pixel values to [0, 1]
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28)
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
# One-hot encode the 10 digit classes
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
# A small fully-connected classifier: flatten, one hidden layer, softmax output
model = Sequential([
  Flatten(input_shape = (1, 28, 28)),
  Dense(11, activation = "relu"),
  Dense(10, activation = "softmax")
])

Training Toy

model.compile(loss = "categorical_crossentropy", optimizer = "adam", metrics = ["accuracy"])
model.fit(X_train, Y_train, batch_size=32, nb_epoch=10, verbose=1)
score = model.evaluate(X_test, Y_test, verbose=0)
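
With metrics = ["accuracy"] compiled in, evaluate returns the loss followed by the accuracy, so score unpacks as:

loss, accuracy = score
print("test loss: %.4f, test accuracy: %.4f" % (loss, accuracy))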

Execution

Summary of Toy Example

  • Writing a Keras model is easy
  • State-of-the-art SGD-based optimizers are built in
  • The training tooling is comprehensive

Model Customization

Logistic Regression

from keras.models import Sequential
from keras.layers import Dense, Activation, Input, Dropout
from keras.models import Model

input = Input(shape = (1024,), name = "input")
ctr_output = Dense(1, activation = "sigmoid", name = "ctr_output")(input)
model = Model(input = [input], output = [ctr_output])
## Using Theano backend.

Customize Objective Function

import theano
import keras.backend as K

# Binary cross-entropy that skips examples whose label is NaN, so
# partially labeled data can be trained with a single loss function
def binary_crossentropy_skip_nan(y_true, y_pred):
  isnan = theano.tensor.isnan
  switch = K.switch
  return K.sum(switch(isnan(y_true), 0, theano.tensor.nnet.binary_crossentropy(y_pred, y_true)))
model.compile(optimizer = "adam", loss = binary_crossentropy_skip_nan)
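
The NaN-skipping loss makes it possible to leave some examples unlabeled (a tiny illustrative label vector; NaN entries contribute zero loss):

import numpy
y = numpy.array([1., 0., numpy.nan, 1.], dtype = "float32")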

Sparse Input

  • Use scipy sparse matrices
  • Export the sparse matrix (e.g. from R) to disk in MatrixMarket (.mtx) format
  • Read it back with scipy.io.mmread, then convert to CSR format: mmread(path).tocsr(), as in the sketch below
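
For example (the file name is illustrative):

from scipy.io import mmread
# read the MatrixMarket file exported from R, then convert to CSR
X = mmread("X.mtx").tocsr()
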
from keras.models import Sequential
from keras.layers import Dense, Activation, Input, Dropout
from keras.models import Model

# sparse = True lets the model consume scipy sparse matrices directly
input = Input(shape = (1024,), name = "input", sparse = True)
ctr_output = Dense(1, activation = "sigmoid", name = "ctr_output")(input)
model = Model(input = [input], output = [ctr_output])

Callbacks

Validation with AUC

  • Keras supports the training-validation-testing procedure
  • The Callback class provides hooks that run after each batch/epoch, e.g. to compute a validation AUC
import numpy
from sklearn import metrics
from scipy.stats import linregress
from keras.callbacks import Callback

class LogAUC(Callback):
  def __init__(self, name, X, y, index = None, sparse = False):
    self.name = name
    self.X = X
    self.y = y
    self.index = index
    self.sparse = sparse
  # ...
## Using Theano backend.

Methods of Callback

  • on_epoch_begin registers the metric name, so it is reported alongside the metrics specified in model.compile
  def on_epoch_begin(self, epoch, logs = None):
    logs = logs or {}
    if self.name not in self.params['metrics']:
      self.params['metrics'].append(self.name)

Methods of Callback

  • on_epoch_end calculates the AUC on the monitored dataset
  def on_epoch_end(self, epoch, logs = None):
    logs = logs or {}
    logs[self.name] = 0
    pred = self.model.predict(self.X, verbose = 0)
    # normalize a single-output prediction to a list, then score one output
    if type(pred) is not list:
      pred = [pred]
    y_true = self.y[self.index]
    y_pred = pred[self.index].ravel()
    index = numpy.isnan(y_true) == False  # skip unlabeled (NaN) examples
    auc = metrics.roc_auc_score(y_true[index], y_pred[index])
    logs[self.name] = auc

Add Callbacks to Training

callbacks = []
callbacks.append(
  LogAUC(LogAUC_named[key] + "auc", input_data[key], output_data[key], index = 0)
)
# ...
model.fit(input_data, output_data, nb_epoch = NB_EPOCH, batch_size = BATCH_SIZE, 
  verbose = 1, 
  validation_data = validation_data, callbacks = callbacks, shuffle = False)

Result

EarlyStopping

  • Stop training when the monitored loss stops improving on the validation dataset for several epochs
from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor = "val_loss", patience = 5, mode = "min")
callbacks.append(early_stopping)

Model Checkpoint

  • Save the best model so far to disk
from keras.callbacks import ModelCheckpoint
model_checkpoint = ModelCheckpoint(check_path, monitor = "val_ctr_output_loss", mode = "min", save_best_only = True)
callbacks.append(model_checkpoint)

Analyze the History

  • Acquire the training history for further analysis
  • model.fit returns a History object; its .history attribute is a plain dict of per-epoch metric values
import json, sys
history = model.fit(...)
json.dump(history.history, sys.stdout, sort_keys = True)

Data Generator

Fit Generator

  • model.fit_generator trains on batches yielded by a Python generator (signature from the Keras documentation)
fit_generator(self, generator, steps_per_epoch, epochs=1, verbose=1, 
  callbacks=None, validation_data=None, validation_steps=None, 
  class_weight=None, max_q_size=10, workers=1, pickle_safe=False, initial_epoch=0)

Producing Generator

  • A function which yields batched data
  • The effective batch size is the size of each yielded batch; see the usage sketch below
def batch_generator_input_output(X, y, batch_size):
  if len(X) != 1 :
    raise RuntimeError("len(X) != 1")
  X = X[0]
  # number of batches = ceil(n / batch_size)
  number_of_batch = X.shape[0] // batch_size
  if X.shape[0] % batch_size != 0 :
    number_of_batch = number_of_batch + 1
  while True :
    for i in range(number_of_batch):
      index_batch = list(range(i * batch_size, min(X.shape[0], (i + 1) * batch_size)))
      # densify only the rows of the current batch
      X_batch = X[index_batch,:].toarray()
      y_batch = [y_ele[index_batch] for y_ele in y]
      yield (X_batch, y_batch)
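
Wiring the generator into training, following the fit_generator signature above (the arrays X and y and the epoch count are illustrative; X is a one-element list holding a sparse matrix):

batch_size = 128
steps = (X[0].shape[0] + batch_size - 1) // batch_size  # ceil(n / batch_size)
model.fit_generator(batch_generator_input_output(X, y, batch_size),
  steps_per_epoch = steps, epochs = 10, verbose = 1)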

Similar for Prediction

  • Yields covariates only
def batch_generator_input(X, batch_size):
  if len(X) != 1 :
    raise RuntimeError("len(X) != 1")
  X = X[0]
  number_of_batch = X.shape[0] // batch_size
  if X.shape[0] % batch_size != 0 :
    number_of_batch = number_of_batch + 1
  while True :
    for i in range(number_of_batch):
      index_batch = list(range(i * batch_size, min(X.shape[0], (i + 1) * batch_size)))
      X_batch = X[index_batch,:].toarray()
      yield X_batch
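
Prediction pairs with predict_generator (assuming the same Keras version as the fit_generator signature above; steps as computed before):

pred = model.predict_generator(batch_generator_input(X, batch_size), steps)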

Some Well-known Models

Implementation Becomes Easy

  • Only the loss function itself is required; gradients are derived automatically
  • Rich built-in layers and APIs

NN with Dropout

# the {{...}} markers are template placeholders, filled in before training
input = Input(shape = (input_dim,), name = "input", sparse = sparse)
x1 = Dense(int({{Layer1NNode}}), activation = "relu", W_regularizer = l2({{WRegularization01}}), activity_regularizer = activity_l2({{ActivityRegularization01}}), name = "layer1")(input)
x1 = Dropout({{DropoutP}}, name = "dropout")(x1)
ctr_output = Dense(1, activation = "sigmoid", W_regularizer = l2({{WRegularization12}}), activity_regularizer = activity_l2({{ActivityRegularization12}}) , name = "ctr_output")(x1)
model = Model(input = [input], output = [ctr_output])
weight_named = {
  "ctr" : 1.0,
  "wr" : 0.1,
  "wp" : 0.1
}
learning_rate = {
  "lr" : {{LearningRate}}
}
[model, weight_named, learning_rate]

Factorization Machine

https://github.com/fchollet/keras/issues/4959

# customized layer: the call() method of a factorization machine layer
    def call(self, x, mask=None):
        # second-order pairwise interactions via the FM identity (see below),
        # computed in O(n * k) rather than the naive O(n^2) double sum
        output = K.sum(K.square(K.dot(x, self.V)) - K.dot(K.square(x), K.square(self.V)), 2)/2
        # first-order (linear) term plus optional bias
        output += K.dot(x, self.W)
        if self.bias:
            output += self.b
        return self.activation(output)
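
The second-order term relies on the standard factorization machine identity (Rendle), which reduces the pairwise sum to linear time in the number of features:

\[
\sum_{i<j} \langle v_i, v_j \rangle x_i x_j
= \frac{1}{2} \sum_{f} \left[ \Big( \sum_i v_{i,f}\, x_i \Big)^2 - \sum_i v_{i,f}^2\, x_i^2 \right]
\]

This is exactly K.square(K.dot(x, V)) - K.dot(K.square(x), K.square(V)), summed over the factor dimension and halved.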

LSTM

https://keras.io/layers/recurrent/

keras.layers.recurrent.LSTM(units, activation='tanh', recurrent_activation='hard_sigmoid', 
  use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', 
  bias_initializer='zeros', unit_forget_bias=True, kernel_regularizer=None, recurrent_regularizer=None, 
  bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, 
  bias_constraint=None, dropout=0.0, recurrent_dropout=0.0)
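
A minimal usage sketch (layer sizes and input shape are illustrative; each sample is a sequence of shape (timesteps, features)):

from keras.models import Sequential
from keras.layers import Dense, LSTM

model = Sequential()
model.add(LSTM(32, input_shape = (10, 16)))  # 10 timesteps, 16 features per step
model.add(Dense(1, activation = "sigmoid"))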

Multi-Task

Model Functional API

from keras.layers import Dense, Activation, Input, Dropout, merge
from keras.models import Model
from keras.regularizers import l2, activity_l2

input = Input(shape = (1024,), name = "input")
x1 = Dense(100, activation = "relu", W_regularizer = l2(0.1), name = "layer1")(input)
x1 = Dropout(0.5, name = "dropout")(x1)
# first head: linear price regression
price_output = Dense(1, activation = "linear", W_regularizer = l2(0.1), name = "wp_output")(x1)
# feed the predicted price back in as a feature for the CTR head
x2 = merge([x1, price_output], mode = "concat")
ctr_output = Dense(1, activation = "sigmoid", W_regularizer = l2(0.1), name = "ctr_output")(x2)
model = Model(input = [input], output = [ctr_output, price_output])
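
A minimal sketch of compiling and fitting the two-headed model (the optimizer choice and the arrays X, y_ctr, y_price are illustrative): one loss per named output, plus per-output loss weights.

model.compile(optimizer = "adam",
  loss = {"ctr_output": "binary_crossentropy", "wp_output": "mse"},
  loss_weights = {"ctr_output": 1.0, "wp_output": 0.1})
model.fit([X], {"ctr_output": y_ctr, "wp_output": y_price},
  nb_epoch = 10, batch_size = 128)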

Data Structure

  • Input: a list of numpy arrays or scipy sparse matrices
  • Output: a list of numpy arrays or scipy sparse matrices
  • Loss: a list of functions or strings
  • Loss weights: a dictionary of weights
  • Validation / testing datasets: the same structure

Q&A