March 22, 2017

Outline

  • Introduction to Keras
  • Model Customization
    • Callbacks
    • Data Generator
    • Some Well-known Models
    • Multi-Task

Introduction to Keras

Keras: Deep Learning Library for Theano and TensorFlow

  • https://keras.io/
  • Easy and fast prototyping
  • Supports CNN and RNN
  • Runs on CPU and GPU

Features

  • High-level modeling
    • Only the loss and the model structure need to be specified
    • Gradients are generated automatically; stochastic optimizers are built in
    • Many built-in deep learning building blocks, such as activation functions, Dropout, BatchNormalization, LSTM, …
  • Works with both dense and sparse data structures (see the sketch below)
    • Dense: numpy array
    • Sparse: scipy sparse matrix
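
For instance (a minimal sketch; shapes and density are illustrative), either form can be fed to a model:

import numpy
from scipy import sparse

X_dense = numpy.random.rand(4, 1024).astype("float32")  # dense: numpy array
X_sparse = sparse.rand(4, 1024, density = 0.01).tocsr()  # sparse: scipy CSR matrix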

Environment

  • Install Python 3
  • Install Keras via pip
  • Check the Keras version
# Python 2 (Keras 1.1.2 here):
# pip install keras
python -c "import keras; print keras.__version__"
## Using Theano backend.
## 1.1.2
# Python 3 (Keras 1.2.1 here):
# pip install keras
python3 -c "import keras; print(keras.__version__)"
## Using Theano backend.
## 1.2.1

Configure Keras

  • Use the Theano backend
  • Modify ~/.keras/keras.json
cat ~/.keras/keras.json
## {
##     "image_dim_ordering": "tf", 
##     "epsilon": 1e-07, 
##     "floatx": "float32", 
##     "backend": "theano"
## }

Sequential Model

from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([
    Dense(11, input_dim=21),
    Activation('relu'),
    Dense(5),
    Activation('softmax'),
])
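
The same model can also be built incrementally with add():

model = Sequential()
model.add(Dense(11, input_dim=21))
model.add(Activation('relu'))
model.add(Dense(5))
model.add(Activation('softmax'))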

MNIST example

from keras.datasets import mnist
 
# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print(X_train.shape)
from matplotlib import pyplot as plt
fig, ax = plt.subplots(nrows = 1, ncols = 1)
ax.imshow(X_train[0])
fig.savefig("X_train_1.png")
## Using Theano backend.
## (60000, 28, 28)

Toy Example

from keras.datasets import mnist
from keras.utils import np_utils
# Load pre-shuffled MNIST data into train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# Reshape to (n, 1, 28, 28) and scale pixel values to [0, 1]
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28)
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
# One-hot encode the 10 digit classes
Y_train = np_utils.to_categorical(y_train, 10)
Y_test = np_utils.to_categorical(y_test, 10)
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
# A small fully-connected classifier: flatten, one hidden layer, softmax output
model = Sequential([
  Flatten(input_shape = (1, 28, 28)),
  Dense(11, activation = "relu"),
  Dense(10, activation = "softmax")
])

Training Toy

model.compile(loss = "categorical_crossentropy", optimizer = "adam", metrics = ["accuracy"])
model.fit(X_train, Y_train, batch_size=32, nb_epoch=10, verbose=1)
score = model.evaluate(X_test, Y_test, verbose=0)
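
With metrics = ["accuracy"] compiled in, evaluate returns the loss followed by the accuracy, so score unpacks as:

loss, accuracy = score
print("test loss: %.4f, test accuracy: %.4f" % (loss, accuracy))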

Execution

Summary of Toy Example

  • Writing a Keras model is easy
  • State-of-the-art SGD-based optimizers are built in
  • The training tooling is comprehensive

Model Customization

Logistic Regression

from keras.models import Sequential
from keras.layers import Dense, Activation, Input, Dropout
from keras.models import Model

input = Input(shape = (1024,), name = "input")
ctr_output = Dense(1, activation = "sigmoid", name = "ctr_output")(input)
model = Model(input = [input], output = [ctr_output])
## Using Theano backend.

Customize Objective Function

import theano
import keras.backend as K

# Binary cross-entropy that skips examples whose label is NaN, so
# partially labeled data can be trained with a single loss function
def binary_crossentropy_skip_nan(y_true, y_pred):
  isnan = theano.tensor.isnan
  switch = K.switch
  return K.sum(switch(isnan(y_true), 0, theano.tensor.nnet.binary_crossentropy(y_pred, y_true)))
model.compile(optimizer = "adam", loss = binary_crossentropy_skip_nan)
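
The NaN-skipping loss makes it possible to leave some examples unlabeled (a tiny illustrative label vector; NaN entries contribute zero loss):

import numpy
y = numpy.array([1., 0., numpy.nan, 1.], dtype = "float32")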

Sparse Input

  • Use scipy sparse matrices
  • Export the sparse matrix (e.g. from R) to disk in MatrixMarket (.mtx) format
  • Read it back with scipy.io.mmread, then convert to CSR format: mmread(path).tocsr(), as in the sketch below
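
For example (the file name is illustrative):

from scipy.io import mmread
# read the MatrixMarket file exported from R, then convert to CSR
X = mmread("X.mtx").tocsr()
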
from keras.models import Sequential
from keras.layers import Dense, Activation, Input, Dropout
from keras.models import Model

# sparse = True lets the model consume scipy sparse matrices directly
input = Input(shape = (1024,), name = "input", sparse = True)
ctr_output = Dense(1, activation = "sigmoid", name = "ctr_output")(input)
model = Model(input = [input], output = [ctr_output])

Callbacks

Validation with AUC

  • Keras supports the training-validation-testing procedure
  • The Callback class provides hooks that run after each batch/epoch, e.g. to compute a validation AUC
import numpy
from sklearn import metrics
from scipy.stats import linregress
from keras.callbacks import Callback

class LogAUC(Callback):
  def __init__(self, name, X, y, index = None, sparse = False):
    self.name = name
    self.X = X
    self.y = y
    self.index = index
    self.sparse = sparse
  # ...
## Using Theano backend.

Methods of Callback

  • on_epoch_begin registers the metric name, so it is reported alongside the metrics specified in model.compile
  def on_epoch_begin(self, epoch, logs = None):
    logs = logs or {}
    if self.name not in self.params['metrics']:
      self.params['metrics'].append(self.name)

Methods of Callback

  • on_epoch_end calculates the AUC on the monitored dataset
  def on_epoch_end(self, epoch, logs = None):
    logs = logs or {}
    logs[self.name] = 0
    pred = self.model.predict(self.X, verbose = 0)
    # normalize a single-output prediction to a list, then score one output
    if type(pred) is not list:
      pred = [pred]
    y_true = self.y[self.index]
    y_pred = pred[self.index].ravel()
    index = numpy.isnan(y_true) == False  # skip unlabeled (NaN) examples
    auc = metrics.roc_auc_score(y_true[index], y_pred[index])
    logs[self.name] = auc

Add Callbacks to Training

callbacks = []
callbacks.append(
  LogAUC(LogAUC_named[key] + "auc", input_data[key], output_data[key], index = 0)
)
# ...
model.fit(input_data, output_data, nb_epoch = NB_EPOCH, batch_size = BATCH_SIZE, 
  verbose = 1, 
  validation_data = validation_data, callbacks = callbacks, shuffle = False)

Result

EarlyStopping

  • Stop training when the monitored loss stops improving on the validation dataset for several epochs
from keras.callbacks import EarlyStopping
early_stopping = EarlyStopping(monitor = "val_loss", patience = 5, mode = "min")
callbacks.append(early_stopping)

Model Checkpoint

  • Save the best model so far to disk
from keras.callbacks import ModelCheckpoint
model_checkpoint = ModelCheckpoint(check_path, monitor = "val_ctr_output_loss", mode = "min", save_best_only = True)
callbacks.append(model_checkpoint)

Analyze the History

  • Acquire the training history for further analysis
  • model.fit returns a History object; its .history attribute is a plain dict of per-epoch metric values
import json, sys
history = model.fit(...)
json.dump(history.history, sys.stdout, sort_keys = True)

Data Generator

Fit Generator

  • model.fit_generator trains on batches yielded by a Python generator (signature from the Keras documentation)
fit_generator(self, generator, steps_per_epoch, epochs=1, verbose=1, 
  callbacks=None, validation_data=None, validation_steps=None, 
  class_weight=None, max_q_size=10, workers=1, pickle_safe=False, initial_epoch=0)

Producing Generator

  • A function which yields batched data
  • The effective batch size is the size of each yielded batch; see the usage sketch below
def batch_generator_input_output(X, y, batch_size):
  if len(X) != 1 :
    raise RuntimeError("len(X) != 1")
  X = X[0]
  # number of batches = ceil(n / batch_size)
  number_of_batch = X.shape[0] // batch_size
  if X.shape[0] % batch_size != 0 :
    number_of_batch = number_of_batch + 1
  while True :
    for i in range(number_of_batch):
      index_batch = list(range(i * batch_size, min(X.shape[0], (i + 1) * batch_size)))
      # densify only the rows of the current batch
      X_batch = X[index_batch,:].toarray()
      y_batch = [y_ele[index_batch] for y_ele in y]
      yield (X_batch, y_batch)
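
Wiring the generator into training, following the fit_generator signature above (the arrays X and y and the epoch count are illustrative; X is a one-element list holding a sparse matrix):

batch_size = 128
steps = (X[0].shape[0] + batch_size - 1) // batch_size  # ceil(n / batch_size)
model.fit_generator(batch_generator_input_output(X, y, batch_size),
  steps_per_epoch = steps, epochs = 10, verbose = 1)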

Similar for Prediction

  • Yields covariates only
def batch_generator_input(X, batch_size):
  if len(X) != 1 :
    raise RuntimeError("len(X) != 1")
  X = X[0]
  number_of_batch = X.shape[0] // batch_size
  if X.shape[0] % batch_size != 0 :
    number_of_batch = number_of_batch + 1
  while True :
    for i in range(number_of_batch):
      index_batch = list(range(i * batch_size, min(X.shape[0], (i + 1) * batch_size)))
      X_batch = X[index_batch,:].toarray()
      yield X_batch
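
Prediction pairs with predict_generator (assuming the same Keras version as the fit_generator signature above; steps as computed before):

pred = model.predict_generator(batch_generator_input(X, batch_size), steps)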

Some Well-known Models

Implementation Becomes Easy

  • Only the loss function itself is required; gradients are derived automatically
  • Rich built-in layers and APIs

NN with Dropout

# the {{...}} markers are template placeholders, filled in before training
input = Input(shape = (input_dim,), name = "input", sparse = sparse)
x1 = Dense(int({{Layer1NNode}}), activation = "relu", W_regularizer = l2({{WRegularization01}}), activity_regularizer = activity_l2({{ActivityRegularization01}}), name = "layer1")(input)
x1 = Dropout({{DropoutP}}, name = "dropout")(x1)
ctr_output = Dense(1, activation = "sigmoid", W_regularizer = l2({{WRegularization12}}), activity_regularizer = activity_l2({{ActivityRegularization12}}) , name = "ctr_output")(x1)
model = Model(input = [input], output = [ctr_output])
weight_named = {
  "ctr" : 1.0,
  "wr" : 0.1,
  "wp" : 0.1
}
learning_rate = {
  "lr" : {{LearningRate}}
}
[model, weight_named, learning_rate]

Factorization Machine

https://github.com/fchollet/keras/issues/4959

# customized layer: the call() method of a factorization machine layer
    def call(self, x, mask=None):
        # second-order pairwise interactions via the FM identity (see below),
        # computed in O(n * k) rather than the naive O(n^2) double sum
        output = K.sum(K.square(K.dot(x, self.V)) - K.dot(K.square(x), K.square(self.V)), 2)/2
        # first-order (linear) term plus optional bias
        output += K.dot(x, self.W)
        if self.bias:
            output += self.b
        return self.activation(output)
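
The second-order term relies on the standard factorization machine identity (Rendle), which reduces the pairwise sum to linear time in the number of features:

\[
\sum_{i<j} \langle v_i, v_j \rangle x_i x_j
= \frac{1}{2} \sum_{f} \left[ \Big( \sum_i v_{i,f}\, x_i \Big)^2 - \sum_i v_{i,f}^2\, x_i^2 \right]
\]

This is exactly K.square(K.dot(x, V)) - K.dot(K.square(x), K.square(V)), summed over the factor dimension and halved.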

LSTM

https://keras.io/layers/recurrent/

keras.layers.recurrent.LSTM(units, activation='tanh', recurrent_activation='hard_sigmoid', 
  use_bias=True, kernel_initializer='glorot_uniform', recurrent_initializer='orthogonal', 
  bias_initializer='zeros', unit_forget_bias=True, kernel_regularizer=None, recurrent_regularizer=None, 
  bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, recurrent_constraint=None, 
  bias_constraint=None, dropout=0.0, recurrent_dropout=0.0)
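
A minimal usage sketch (layer sizes and input shape are illustrative; each sample is a sequence of shape (timesteps, features)):

from keras.models import Sequential
from keras.layers import Dense, LSTM

model = Sequential()
model.add(LSTM(32, input_shape = (10, 16)))  # 10 timesteps, 16 features per step
model.add(Dense(1, activation = "sigmoid"))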

Multi-Task

Model Functional API

from keras.layers import Dense, Activation, Input, Dropout, merge
from keras.models import Model
from keras.regularizers import l2, activity_l2

input = Input(shape = (1024,), name = "input")
x1 = Dense(100, activation = "relu", W_regularizer = l2(0.1), name = "layer1")(input)
x1 = Dropout(0.5, name = "dropout")(x1)
# first head: linear price regression
price_output = Dense(1, activation = "linear", W_regularizer = l2(0.1), name = "wp_output")(x1)
# feed the predicted price back in as a feature for the CTR head
x2 = merge([x1, price_output], mode = "concat")
ctr_output = Dense(1, activation = "sigmoid", W_regularizer = l2(0.1), name = "ctr_output")(x2)
model = Model(input = [input], output = [ctr_output, price_output])
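
A minimal sketch of compiling and fitting the two-headed model (the optimizer choice and the arrays X, y_ctr, y_price are illustrative): one loss per named output, plus per-output loss weights.

model.compile(optimizer = "adam",
  loss = {"ctr_output": "binary_crossentropy", "wp_output": "mse"},
  loss_weights = {"ctr_output": 1.0, "wp_output": 0.1})
model.fit([X], {"ctr_output": y_ctr, "wp_output": y_price},
  nb_epoch = 10, batch_size = 128)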

Data Structure

  • Input: a list of numpy arrays or scipy sparse matrices
  • Output: a list of numpy arrays or scipy sparse matrices
  • Loss: a list of functions or strings
  • Loss weights: a dictionary of weights
  • Validation / testing datasets: the same structure

Q&A