The data are given as follows.
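#Read the data file "data.csv" into R
#stringsAsFactors = TRUE stores Group as a factor (since R 4.0.0 strings are no longer converted by default)
data1 <- read.csv("data.csv", stringsAsFactors = TRUE)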
#Present the data
data1
## Group Hour Motivation
## 1 A 4 30
## 2 A 5 25
## 3 A 4 20
## 4 A 4 40
## 5 A 3 20
## 6 A 4 30
## 7 A 3 40
## 8 A 4 20
## 9 A 5 30
## 10 A 6 40
## 11 A 4 50
## 12 A 5 60
## 13 A 6 40
## 14 A 7 50
## 15 A 4 50
## 16 A 9 30
## 17 A 8 20
## 18 A 4 40
## 19 A 3 80
## 20 A 3 30
## 21 A 4 70
## 22 A 3 50
## 23 A 5 60
## 24 A 6 70
## 25 A 4 50
## 26 A 3 50
## 27 A 9 30
## 28 A 4 80
## 29 A 3 40
## 30 A 2 90
## 31 B 7 60
## 32 B 7 60
## 33 B 8 60
## 34 B 9 70
## 35 B 9 70
## 36 B 7 70
## 37 B 6 70
## 38 B 7 70
## 39 B 8 70
## 40 B 9 70
## 41 B 2 80
## 42 B 3 80
## 43 B 3 80
## 44 B 3 80
## 45 B 2 80
## 46 B 7 75
## 47 B 8 75
## 48 B 9 75
## 49 B 9 40
## 50 B 9 70
## 51 B 9 70
## 52 B 9 60
## 53 B 8 70
## 54 B 8 70
## 55 B 2 70
## 56 B 3 80
## 57 B 3 80
## 58 B 9 60
## 59 B 9 60
## 60 B 8 75
## 61 C 13 90
## 62 C 13 99
## 63 C 14 99
## 64 C 15 99
## 65 C 14 98
## 66 C 14 99
## 67 C 15 99
## 68 C 15 99
## 69 C 15 98
## 70 C 15 95
## 71 C 16 93
## 72 C 16 97
## 73 C 16 98
## 74 C 16 98
## 75 C 14 98
## 76 C 16 98
## 77 C 14 99
## 78 C 15 92
## 79 C 15 95
## 80 C 15 97
## 81 C 6 70
## 82 C 7 70
## 83 C 6 71
## 84 C 8 99
## 85 C 8 92
## 86 C 7 91
## 87 C 15 89
## 88 C 14 89
## 89 C 15 75
## 90 C 16 79
In the data above, Hour and Motivation are metric independent variables, whereas Group is a non-metric (categorical) dependent variable. Multinomial logistic regression (MLR) will be used to build the classification equations, with Hour and Motivation used to predict group membership.
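For reference, the model can be written out as below. This is a sketch of the standard multinomial logit form, assuming the last group (C) is the reference category, which is the default of the multinomial family in VGAM. The symbols $\eta_j$, $\beta_{0j}$, $\beta_{1j}$ and $\beta_{2j}$ are notation introduced here and do not appear in the original.

```latex
\eta_j = \log\frac{P(\text{Group}=j)}{P(\text{Group}=C)}
       = \beta_{0j} + \beta_{1j}\,\text{Hour} + \beta_{2j}\,\text{Motivation},
       \qquad j \in \{A, B\}

P(\text{Group}=j) = \frac{e^{\eta_j}}{1 + e^{\eta_A} + e^{\eta_B}},
\qquad
P(\text{Group}=C) = \frac{1}{1 + e^{\eta_A} + e^{\eta_B}}
```

An observation is classified into the group with the largest of these three probabilities.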
Next, plot the data above.
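library(ggplot2) #Load package ggplot2 for plotting
#Scatter plot of Motivation against Hour, colored by Group
#(qplot is deprecated since ggplot2 3.4.0, so ggplot is used here)
ggplot(data1, aes(x = Hour, y = Motivation, color = Group)) + geom_point()
#The same plot in base R; as.factor turns the group labels into valid color codes
plot(data1[ , c(2,3)], col = as.factor(data1$Group))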
Perform MLR in R.
library(stats4) #Load package stats4
library(splines) #Load package splines
#Package VGAM depends on packages stats4 and splines; library(VGAM) also loads them automatically.
library(VGAM) #Load package VGAM
#Perform MLR; by default the last level of Group is the reference category
fit.MLR <- vglm(Group ~ Hour + Motivation, family = multinomial, data = data1)
summary(fit.MLR)
##
## Call:
## vglm(formula = Group ~ Hour + Motivation, family = multinomial,
## data = data1)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## log(mu[,1]/mu[,3]) -2.552 -0.05001 -7.996e-05 0.04161 2.309
## log(mu[,2]/mu[,3]) -3.609 -0.14875 -4.471e-02 0.41364 1.707
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept):1 32.13235 6.69199 4.802 1.57e-06 ***
## (Intercept):2 17.69522 5.24161 3.376 0.000736 ***
## Hour:1 -1.30432 0.32954 -3.958 7.56e-05 ***
## Hour:2 -0.44061 0.18115 -2.432 0.015005 *
## Motivation:1 -0.33261 0.07706 -4.316 1.59e-05 ***
## Motivation:2 -0.17476 0.06329 -2.761 0.005758 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of linear predictors: 2
##
## Names of linear predictors: log(mu[,1]/mu[,3]), log(mu[,2]/mu[,3])
##
## Dispersion Parameter for multinomial family: 1
##
## Residual deviance: 59.2849 on 174 degrees of freedom
##
## Log-likelihood: -29.6424 on 174 degrees of freedom
##
## Number of iterations: 8
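The coefficients in the summary are on the log-odds scale, each relative to the third group (C). A minimal sketch of converting them to odds ratios is given below. The values are copied from the output above, and the names coef.MLR and odds.ratio are introduced here only for illustration. With the fitted model in memory, exp(coef(fit.MLR)) should give the same numbers, including the intercepts.

```r
# Slope coefficients copied from the summary(fit.MLR) output
# ":1" compares group A with group C, ":2" compares group B with group C
coef.MLR <- c("Hour:1" = -1.30432, "Hour:2" = -0.44061,
              "Motivation:1" = -0.33261, "Motivation:2" = -0.17476)

# Odds ratio: multiplicative change in the odds (versus group C)
# for a one-unit increase in the predictor
odds.ratio <- exp(coef.MLR)
round(odds.ratio, 3)
```

All four coefficients are negative, so every odds ratio is below 1: higher values of Hour and Motivation shift the odds away from groups A and B and towards group C.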
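Note that the reference (base) category is the third level of Group, that is Group C, as the linear predictors log(mu[,1]/mu[,3]) and log(mu[,2]/mu[,3]) show. The following result is based on SPSS.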
#  is command to present SPSS picture
Note that the MLR results from R and SPSS above are the same.
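Before running the classification, it may help to see how the predicted probabilities follow from the fitted equations. The sketch below applies the coefficients reported in the summary to the first observation in the data (Hour = 4, Motivation = 30). The names b1, b2, x, eta and p are hypothetical and not part of the original code, and the rounded coefficients mean the result will only approximately match what predict() returns.

```r
# Coefficients copied from the summary output: intercept, Hour, Motivation
b1 <- c(32.13235, -1.30432, -0.33261)  # log-odds of group A versus group C
b2 <- c(17.69522, -0.44061, -0.17476)  # log-odds of group B versus group C

# First observation: Hour = 4, Motivation = 30 (leading 1 multiplies the intercept)
x <- c(1, 4, 30)

# Linear predictors; the reference group C has linear predictor 0
eta <- c(A = sum(b1 * x), B = sum(b2 * x), C = 0)

# Convert the linear predictors to probabilities that sum to 1
p <- exp(eta) / sum(exp(eta))
p

# The predicted group is the one with the largest probability
names(which.max(p))
```

For this observation the largest probability belongs to group A, which is also its actual group.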
#Perform classification
#Predicted probability of each group for every observation
probabilities.MLR <- predict(fit.MLR, data1[,2:3], type="response")
#Column index of the most probable group for each observation
predictions <- apply(probabilities.MLR, 1, which.max)
#Replace each index with its group label; factor() also covers the case where Group was read as character
predictions <- levels(factor(data1$Group))[predictions]
# Summarize accuracy
table(data1$Group, predictions)
## predictions
## A B C
## A 25 5 0
## B 4 26 0
## C 0 3 27
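The overall accuracy can be obtained from this table by dividing the diagonal (correct classifications) by the total count. A small sketch follows, with the counts typed in from the table above. The names conf.mat and accuracy are introduced here for illustration. With the objects from the code above, mean(data1$Group == predictions) would be the direct equivalent.

```r
# Confusion matrix copied from the table above (rows: actual group, columns: predicted group)
conf.mat <- matrix(c(25,  5,  0,
                      4, 26,  0,
                      0,  3, 27),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(actual = c("A", "B", "C"),
                                   predicted = c("A", "B", "C")))

# Overall accuracy: correctly classified observations divided by all observations
accuracy <- sum(diag(conf.mat)) / sum(conf.mat)
accuracy

# Proportion of each actual group that is classified correctly
diag(conf.mat) / rowSums(conf.mat)
```

Here 78 of the 90 observations lie on the diagonal, which is an accuracy of about 86.7%.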
The classification table below is based on MLR using SPSS.
#  is command to present SPSS picture