Logistic regression is the appropriate regression analysis to conduct when the dependent variable is dichotomous (binary). Like all regression analyses, the logistic regression is a predictive analysis. Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval or ratio-level independent variables. Type of questions that a binary logistic regression can examine: How does the probability of getting lung cancer (yes vs. no) change for every additional pound a person is overweight and for every pack of cigarettes smoked per day? Do body weight, calorie intake, fat intake, and age have an influence on the probability of having a heart attack (yes vs. no)?
# load the package
library(VGAM)
## Warning: package 'VGAM' was built under R version 3.4.4
## Loading required package: stats4
## Loading required package: splines
# load data
data(iris)
# fit model
fit <- vglm(Species~., family=multinomial, data=iris)
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control
## $wzepsilon): 2 diagonal elements of the working weights variable 'wz' have
## been replaced by 1.819e-12
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control
## $wzepsilon): 13 diagonal elements of the working weights variable 'wz' have
## been replaced by 1.819e-12
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control
## $wzepsilon): 22 diagonal elements of the working weights variable 'wz' have
## been replaced by 1.819e-12
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control
## $wzepsilon): 34 diagonal elements of the working weights variable 'wz' have
## been replaced by 1.819e-12
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control
## $wzepsilon): 39 diagonal elements of the working weights variable 'wz' have
## been replaced by 1.819e-12
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control
## $wzepsilon): 41 diagonal elements of the working weights variable 'wz' have
## been replaced by 1.819e-12
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control
## $wzepsilon): 47 diagonal elements of the working weights variable 'wz' have
## been replaced by 1.819e-12
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control
## $wzepsilon): 50 diagonal elements of the working weights variable 'wz' have
## been replaced by 1.819e-12
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control
## $wzepsilon): 54 diagonal elements of the working weights variable 'wz' have
## been replaced by 1.819e-12
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control
## $wzepsilon): 59 diagonal elements of the working weights variable 'wz' have
## been replaced by 1.819e-12
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control
## $wzepsilon): 63 diagonal elements of the working weights variable 'wz' have
## been replaced by 1.819e-12
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control
## $wzepsilon): 78 diagonal elements of the working weights variable 'wz' have
## been replaced by 1.819e-12
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control
## $wzepsilon): 91 diagonal elements of the working weights variable 'wz' have
## been replaced by 1.819e-12
## Warning in checkwz(wz, M = M, trace = trace, wzepsilon = control
## $wzepsilon): 101 diagonal elements of the working weights variable 'wz'
## have been replaced by 1.819e-12
## Warning in slot(family, "linkinv")(eta, extra = extra): fitted
## probabilities numerically 0 or 1 occurred
## Warning in tfun(mu = mu, y = y, w = w, res = FALSE, eta = eta, extra =
## extra): fitted values close to 0 or 1
## Warning in slot(family, "linkinv")(eta, extra = extra): fitted
## probabilities numerically 0 or 1 occurred
## Warning in tfun(mu = mu, y = y, w = w, res = FALSE, eta = eta, extra =
## extra): fitted values close to 0 or 1
## Warning in slot(family, "linkinv")(eta, extra = extra): fitted
## probabilities numerically 0 or 1 occurred
## Warning in tfun(mu = mu, y = y, w = w, res = FALSE, eta = eta, extra =
## extra): fitted values close to 0 or 1
## Warning in slot(family, "linkinv")(eta, extra = extra): fitted
## probabilities numerically 0 or 1 occurred
## Warning in tfun(mu = mu, y = y, w = w, res = FALSE, eta = eta, extra =
## extra): fitted values close to 0 or 1
## Warning in slot(family, "linkinv")(eta, extra = extra): fitted
## probabilities numerically 0 or 1 occurred
## Warning in tfun(mu = mu, y = y, w = w, res = FALSE, eta = eta, extra =
## extra): fitted values close to 0 or 1
## Warning in slot(family, "linkinv")(eta, extra = extra): fitted
## probabilities numerically 0 or 1 occurred
## Warning in tfun(mu = mu, y = y, w = w, res = FALSE, eta = eta, extra =
## extra): fitted values close to 0 or 1
## Warning in vglm.fitter(x = x, y = y, w = w, offset = offset, Xm2 =
## Xm2, : some quantities such as z, residuals, SEs may be inaccurate due to
## convergence at a half-step
# summarize the fit
summary(fit)
##
## Call:
## vglm(formula = Species ~ ., family = multinomial, data = iris)
##
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## log(mu[,1]/mu[,3]) -0.0003362 7.294e-10 2.102e-09 9.960e-07 0.0003164
## log(mu[,2]/mu[,3]) -1.9700374 -3.420e-04 -4.358e-06 4.635e-04 2.5601905
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 35.361 25704.949 NA NA
## (Intercept):2 42.638 25.708 1.659 0.0972 .
## Sepal.Length:1 9.637 7631.535 NA NA
## Sepal.Length:2 2.465 2.394 1.030 0.3032
## Sepal.Width:1 12.359 3557.648 NA NA
## Sepal.Width:2 6.681 4.480 1.491 0.1359
## Petal.Length:1 -23.214 5435.364 -0.004 0.9966
## Petal.Length:2 -9.429 4.737 NA NA
## Petal.Width:1 -34.102 8576.875 -0.004 0.9968
## Petal.Width:2 -18.286 9.743 NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of linear predictors: 2
##
## Names of linear predictors: log(mu[,1]/mu[,3]), log(mu[,2]/mu[,3])
##
## Residual deviance: 11.8985 on 290 degrees of freedom
##
## Log-likelihood: -5.9493 on 290 degrees of freedom
##
## Number of iterations: 20
##
## Warning: Hauck-Donner effect detected in the following estimate(s):
## '(Intercept):1', 'Sepal.Length:1', 'Sepal.Width:1', 'Petal.Length:2', 'Petal.Width:2'
##
## Reference group is level 3 of the response
# make predictions
probabilities <- predict(fit, iris[,1:4], type="response")
predictions <- apply(probabilities, 1, which.max)
predictions[which(predictions=="1")] <- levels(iris$Species)[1]
predictions[which(predictions=="2")] <- levels(iris$Species)[2]
predictions[which(predictions=="3")] <- levels(iris$Species)[3]
# summarize accuracy
table(predictions, iris$Species)
##
## predictions setosa versicolor virginica
## setosa 50 0 0
## versicolor 0 49 1
## virginica 0 1 49
LDA is a classification method that finds a linear combination of data attributes that best separate the data into classes. Logistic regression is a classification algorithm traditionally limited to only two-class classification problems. If you have more than two classes then Linear Discriminant Analysis is the preferred linear classification technique. It consists of statistical properties of your data, calculated for each class. For a single input variable (x) this is the mean and the variance of the variable for each class. For multiple variables, this is the same properties calculated over the multivariate Gaussian, namely the means and the covariance matrix.
# load the package
library(MASS)
data(iris)
# fit model
fit <- lda(Species~., data=iris)
# summarize the fit
summary(fit)
## Length Class Mode
## prior 3 -none- numeric
## counts 3 -none- numeric
## means 12 -none- numeric
## scaling 8 -none- numeric
## lev 3 -none- character
## svd 2 -none- numeric
## N 1 -none- numeric
## call 3 -none- call
## terms 3 terms call
## xlevels 0 -none- list
# make predictions
predictions <- predict(fit, iris[,1:4])$class
# summarize accuracy
table(predictions, iris$Species)
##
## predictions setosa versicolor virginica
## setosa 50 0 0
## versicolor 0 48 1
## virginica 0 2 49
#pair-wise scatterplots colored by class
pairs(Species~., data=iris, col=iris$Species)