1 Multinomial Logistic Regression

Multinomial Logistic Regression is a classification method that generalizes logistic regression to multiclass problems, i.e. with more than two possible discrete outcomes.

Multinomial logistic regression is known by a variety of other names, including polytomous LR, multiclass LR, softmax regression, multinomial logit (mod), the maximum entropy (MaxEnt) classifier, and the conditional maximum entropy model.

In multinomial logistic regression the objective is to model the probability \(\pi_j\) to belong to a specific group \(j=1, 2, \ldots, g\) using covariates \(\boldsymbol{x}\).

Usually we define a reference group, suppose that we choose the group number 1, in this case, we model the logit of \(\pi_j\) as:

\[ \text{logit}(\pi_j) = \log \left( \frac{\pi_j}{\pi_1} \right) = \boldsymbol{x}^\top \boldsymbol{\beta}_j, \quad \text{for} \, j=2, \ldots, g \]

After fitting the model we will obtain \(g-1\) estimated vectors \(\hat{\boldsymbol{\beta}}_j\), not \(g\) vector. Using thesde vectors we can express the estimated probability \(\hat{\pi}_j\) as:

\[ \hat{\pi}_1 = \frac{1}{1+\sum_{j=2}^{j=g} \exp(\boldsymbol{x}^\top \hat{\boldsymbol{\beta}}_j)}, \, \text{only for} \, j=1 \]

\[ \hat{\pi}_j = \frac{\exp(\boldsymbol{x}^\top \hat{\boldsymbol{\beta}}_j)}{1+\sum_{j=2}^{j=g} \exp(\boldsymbol{x}^\top \hat{\boldsymbol{\beta}}_j)}, \, \text{for} \, j=2,\ldots,g \]

2 Titanic dataset

Here we are goingo to use the titanic_train and titanic_test datasets from titanic package.

library(titanic)
d1 <- titanic_train
d1 <- subset(d1, d1$Embarked != "") # delete rows where Embarked didn't have any value

We are going to use Pclass as my dependent variable.

tabla <- prop.table(table(d1$Pclass))
barplot(tabla, las=1, col="white")

The objective is to model

\[P(Pclass) = f(Sex, \, Embarked, \, SibSp, \, Parch)\]

where

  • Sex: sex.
  • Embarked: Port of Embarkation.
  • SibSp: Number of Siblings (hermanos).
  • Parch: Number of Parents (padres).

Some part of the data is given below:

d1[1:10, c("Pclass", "Sex", "Embarked", "SibSp", "Parch")]
##    Pclass    Sex Embarked SibSp Parch
## 1       3   male        S     1     0
## 2       1 female        C     1     0
## 3       3 female        S     0     0
## 4       1 female        S     1     0
## 5       3   male        S     0     0
## 6       3   male        Q     0     0
## 7       1   male        S     0     0
## 8       3   male        S     3     1
## 9       3 female        S     0     2
## 10      2 female        C     1     0

To explore the data structure we can use:

str(d1[, c("Pclass", "Sex", "Embarked", "SibSp", "Parch")])
## 'data.frame':    889 obs. of  5 variables:
##  $ Pclass  : int  3 1 3 1 3 3 1 3 3 2 ...
##  $ Sex     : chr  "male" "female" "female" "female" ...
##  $ Embarked: chr  "S" "C" "S" "S" ...
##  $ SibSp   : int  1 1 0 1 0 0 0 3 0 1 ...
##  $ Parch   : int  0 0 0 0 0 0 0 1 2 0 ...
table(d1$Pclass)
## 
##   1   2   3 
## 214 184 491
table(d1$Sex)
## 
## female   male 
##    312    577
table(d1$Embarked)
## 
##   C   Q   S 
## 168  77 644
table(d1$Sex)
## 
## female   male 
##    312    577

Some variables must be convert properly.

d1$Pclass <- factor(d1$Pclass, levels=c("1", "2", "3"))
d1$Sex <- factor(d1$Sex, levels=c("female", "male"))
d1$Embarked <- factor(d1$Embarked)

3 Fitting the model

There are many R packages to fit this model but here we are going to use nnet package.

library(nnet)
formula <- "Pclass ~ Sex + Embarked + SibSp + Parch"
mod <- multinom(formula, data=d1)
## # weights:  21 (12 variable)
## initial  value 976.666325 
## iter  10 value 812.978209
## final  value 806.250967 
## converged

The summary can be found using:

output <- summary(mod)
print(output)
## Call:
## multinom(formula = formula, data = d1)
## 
## Coefficients:
##   (Intercept)    Sexmale EmbarkedQ EmbarkedS       SibSp      Parch
## 2  -1.5642365 -0.0698696  2.004449  1.876297 -0.07211907 0.04548861
## 3  -0.8142428  0.7559771  3.978519  1.164160  0.19712980 0.11427709
## 
## Std. Errors:
##   (Intercept)   Sexmale EmbarkedQ EmbarkedS     SibSp     Parch
## 2   0.3006577 0.2238023 0.9523768 0.2922098 0.1343888 0.1464731
## 3   0.2159193 0.1920332 0.7399698 0.1972837 0.1058155 0.1258582
## 
## Residual Deviance: 1612.502 
## AIC: 1636.502

From the last summary we can write the expression as:

\[\begin{aligned} \log \bigg(\frac{P(Pclass = 2)}{P(Pclass = 1)}\bigg) = &- 1.5642365 - 0.0698696\cdot Sexmale + 2.004449\cdot EmbarkedQ \\&+ 1.876297\cdot EmbarkedS - 0.07211907\cdot SibSp + 0.04548861\cdot Parch \end{aligned}\] \[\begin{aligned} \log \bigg(\frac{P(Pclass = 3)}{P(Pclass = 1)}\bigg) = &- 0.8142428 + 0.7559771\cdot Sexmale + 3.978519\cdot EmbarkedQ \\&+ 1.164160\cdot EmbarkedS + 0.19712980\cdot SibSp + 0.11427709\cdot Parch \end{aligned}\]

We can obtain the estimated probability for each class as:

p_hat <- fitted(mod)
head(round(p_hat, digits=2))
##      1    2    3
## 1 0.17 0.20 0.63
## 2 0.58 0.11 0.31
## 3 0.26 0.36 0.37
## 4 0.25 0.32 0.43
## 5 0.19 0.24 0.57
## 6 0.02 0.03 0.95

What would be the Pclass for a female, embarked at Q with 2 iblings and 2 parents.

newdata <- data.frame(Sex="female", Embarked="Q", SibSp=2, Parch=2)
predict(object=mod, newdata=newdata, type="probs")
##          1          2          3 
## 0.02145909 0.03159820 0.94694271
predict(object=mod, newdata=newdata, type="class")
## [1] 3
## Levels: 1 2 3

The confusion matrix can be obtained as:

confusion <- table(Predicted=predict(object=mod, type="class"),
                   True=d1$Pclass)
confusion
##          True
## Predicted   1   2   3
##         1  66  11  56
##         2   0   0   0
##         3 148 173 435

Upps, something is wrong!!!

4 Homework

To do the homework you have a limited time, press the button below to start.

30:00
  1. Calculate the accuracy for the last confusion table.
  2. Create a new model with other covariates to improve the classification model.
  3. Obtain the new confusion matrix.
  4. Consult other R packages to fit multinomial regression models.