Deep learning and the use of deep neural networks are now established as a key tool for practical machine learning. Neural networks have an equivalence with many existing statistical and machine learning approaches and I would like to explore one of these views in this post. In particular, I’ll look at the view of deep neural networks as recursive generalised linear models (RGLMs). Generalised linear models form one of the cornerstones of probabilistic modelling and are used in almost every field of experimental science, so this connection is an extremely useful one to have in mind.
The basic linear regression model is a linear mapping from P-dimensional input features (or covariates) x to a set of targets (or responses) y, using a set of weights (or regression coefficients) \({\beta}\) and a bias (offset) \({\beta_0}\). The outputs can also be multivariate, but I’ll assume they are scalar here. The full probabilistic model assumes that the outputs are corrupted by Gaussian noise of unknown variance \({\sigma^2}\):
\({\eta = \beta^\top x + \beta_0}\)
\({y = \eta+\epsilon \qquad \epsilon \sim \mathcal{N}(0,\sigma^2)}\)
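To make this concrete, here is a minimal simulation of the model above in NumPy; the variable names and dimensions are illustrative choices of mine, not part of any particular library:

```python
import numpy as np

# A minimal sketch of the linear regression model: eta = beta^T x + beta_0,
# y = eta + epsilon with Gaussian noise. All names and sizes are illustrative.
rng = np.random.default_rng(0)

P, N = 3, 100                               # feature dimension and number of examples
beta = rng.normal(size=P)                   # regression coefficients (weights)
beta_0 = 0.5                                # bias (offset)
sigma = 0.1                                 # noise standard deviation

X = rng.normal(size=(N, P))                 # input features (covariates)
eta = X @ beta + beta_0                     # systematic component
y = eta + rng.normal(scale=sigma, size=N)   # random component: Gaussian noise

# Maximum likelihood for this model is ordinary least squares; appending a
# column of ones absorbs the bias into the weight vector.
X_aug = np.hstack([X, np.ones((N, 1))])
beta_hat, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print(beta_hat)                             # approximately [beta, beta_0]
```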
In this formulation, \({\eta}\) is the systematic component of the model and \({\epsilon}\) is the random component. Generalised linear models (GLMs)[2] allow us to extend this formulation to problems where the distribution on the targets is not Gaussian but some other distribution (typically a distribution in the exponential family). In this case, we can write the generalised regression problem, combining the coefficients and bias for more compact notation, as:
\({\eta = \beta^\top x, \qquad \beta=[\hat \beta, \beta_0], x = [\hat{x}, 1]}\)
\({\mathbb{E}[y] = \mu = g^{-1}(\eta)}\)
where g(·) is the link function that allows us to move from natural parameters \({\eta}\) to mean parameters \({\mu}\). If the inverse link function used in the definition of \({\mu}\) above were the logistic sigmoid, then the mean parameters correspond to the probabilities of y being a 1 or 0 under the Bernoulli distribution.
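As a small sketch of that case, assuming a Bernoulli target with the logistic-sigmoid inverse link (i.e. logistic regression), the generalised regression step looks as follows; the function and variable names are illustrative:

```python
import numpy as np

def inverse_link_sigmoid(eta):
    """Map the natural parameter eta to the mean parameter mu in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))

rng = np.random.default_rng(1)
beta = np.array([1.0, -2.0, 0.5])   # last entry plays the role of the bias beta_0
x = np.array([0.3, 1.2, 1.0])       # input with a trailing 1 appended for the bias

eta = beta @ x                      # linear predictor eta = beta^T x
mu = inverse_link_sigmoid(eta)      # E[y] = mu = g^{-1}(eta)
y = rng.binomial(1, mu)             # y ~ Bernoulli(mu)
print(eta, mu, y)
```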
There are many link functions that allow us to make other distributional assumptions for the target (response) y. In deep learning, the link function is referred to as the activation function, and I list in the table below the names for these functions used in the two fields. From this table we can see that many of the popular approaches for specifying neural networks have counterparts in statistics and related literatures under (sometimes) very different names, such as multinomial regression in statistics and softmax classification in deep learning, or the rectifier in deep learning and tobit models in statistics.
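For concreteness, here is a sketch of a few of these inverse link (activation) functions written out in NumPy, following the pairings mentioned above:

```python
import numpy as np

def identity(eta):                  # linear regression: the identity (inverse) link
    return eta

def sigmoid(eta):                   # logistic / Bernoulli regression
    return 1.0 / (1.0 + np.exp(-eta))

def softmax(eta):                   # multinomial regression ("softmax classification")
    e = np.exp(eta - np.max(eta))   # subtract the max for numerical stability
    return e / e.sum()

def rectifier(eta):                 # the ReLU of deep learning, related to tobit-style censoring
    return np.maximum(0.0, eta)
```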
Constructing a recursive GLM or deep feedforward neural network using the linear predictor as the basic building block.
GLMs have a simple form: they use a linear combination of the input using weights \({\beta}\), and pass this result through a simple non-linear function. In deep learning, this basic building block is called a layer. It is easy to see that such a building block can be easily repeated to form more complex, hierarchical and non-linear regression functions. This recursive application of the basic regression building block is why models in deep learning are described as having multiple layers and are described as deep.
If an arbitrary regression function \({h_l}\), for layer l, with linear predictor \({\eta_l}\) and inverse link or activation function \({f_l}\), is specified as:
\({h_l(x) = f_l(\eta_l)}\)
then we can easily specify a recursive GLM by iteratively applying or composing this basic building block:
\({\mathbb{E}[y] = \mu_L = h_L \circ \ldots \circ h_1 \circ h_0(x)}\)
This composition is exactly the specification of an L-layer deep neural network model. There is no mystery in such a construction (and hence in feedforward neural networks) and the utility of such a model is easy to see, since it allows us to extend the power of our regressors far beyond what is possible using only linear predictors.
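A minimal sketch of this composition, with illustrative layer sizes and activation functions, makes the correspondence between a recursive GLM and an L-layer feedforward network explicit:

```python
import numpy as np

def layer(x, W, b, f):
    """One building block h_l(x) = f_l(eta_l), with eta_l = W x + b."""
    return f(W @ x + b)

def relu(eta):
    return np.maximum(0.0, eta)

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))

rng = np.random.default_rng(2)
sizes = [5, 4, 3, 1]                        # input dim, two hidden layers, scalar output
params = [(rng.normal(size=(m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
activations = [relu, relu, sigmoid]         # the final inverse link gives E[y] = mu_L

x = rng.normal(size=sizes[0])
h = x
for (W, b), f in zip(params, activations):  # compose h_L ∘ ... ∘ h_1 ∘ h_0
    h = layer(h, W, b, f)
print(h)                                    # the mean parameter mu_L
```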
This form also shows that recursive GLMs and neural networks are one way of performing basis function regression. What such a formulation adds is a specific mechanism by which to specify the basis functions: by application of recursive linear predictors.