Load all the libraries or functions that you will use to for the rest of the assignment. It is helpful to define your libraries and functions at the top of a report, so that others can know what they need for the report to compile correctly.
The data for this project has already been loaded. Here you will be distinguishing between the uses for let, allow, and permit. You should pick two of these verbs to examine and subset the dataset for only those columns. You can use the droplevels() function to help drop the empty level after you subset.
# Load all libraries
library(Rling)
library(visreg)
## Warning: package 'visreg' was built under R version 3.5.2
library(car)
## Warning: package 'car' was built under R version 3.5.2
## Loading required package: carData
library(rms)
## Warning: package 'rms' was built under R version 3.5.2
## Loading required package: Hmisc
## Warning: package 'Hmisc' was built under R version 3.5.2
## Loading required package: lattice
## Loading required package: survival
## Warning: package 'survival' was built under R version 3.5.2
## Loading required package: Formula
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.5.2
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
##
## format.pval, units
## Loading required package: SparseM
##
## Attaching package: 'SparseM'
## The following object is masked from 'package:base':
##
## backsolve
##
## Attaching package: 'rms'
## The following objects are masked from 'package:car':
##
## Predict, vif
data(let)
let <- subset(let, Verb != "allow")
head(let)
## Year Reg Verb Neg Permitter Imper
## 3 1990 SPOK let No Anim No
## 5 1997 MAG permit No Anim Yes
## 8 2007 SPOK let No Anim Yes
## 9 2005 MAG permit No Anim No
## 11 2007 MAG let No Anim No
## 12 2005 MAG let Yes Anim No
The data is from the COCA: Corpus of Contemporary American English investigating the verb choice of let, allow, and permit. These are permissive constructions that are often paired with the word to. Predict the verb choice in the Verb column with the following independent variables:
table(droplevels(let)$Verb)
##
## let permit
## 187 164
Interpretation: The sample size is large enough for predoctors, let=187, permit=164 The split between choosen verbs (let, permit) is enough to predict
rms package.LogReg = lrm(Verb ~ Reg + Permitter+ Imper, data = let)
LogReg
## Logistic Regression Model
##
## lrm(formula = Verb ~ Reg + Permitter + Imper, data = let)
##
## Model Likelihood Discrimination Rank Discrim.
## Ratio Test Indexes Indexes
## Obs 351 LR chi2 204.92 R2 0.591 C 0.882
## let 187 d.f. 4 g 2.557 Dxy 0.765
## permit 164 Pr(> chi2) <0.0001 gr 12.892 gamma 0.840
## max |deriv| 4e-08 gp 0.385 tau-a 0.382
## Brier 0.130
##
## Coef S.E. Wald Z Pr(>|Z|)
## Intercept -0.3366 0.2317 -1.45 0.1462
## Reg=SPOK 0.3153 0.3096 1.02 0.3085
## Permitter=Inanim 2.6323 0.4176 6.30 <0.0001
## Permitter=Undef 1.6039 0.5886 2.72 0.0064
## Imper=Yes -3.4614 0.6208 -5.58 <0.0001
##
Interpretation: After running the logistic regression using lms: - the overall model is predictive of verb choice and it is significant - the fitted regression line has 59% variance since R2 is 0.591 which means that not all the data points fall on the fitted regression line.
table(droplevels(let)$Verb, let$Permitter)
##
## Anim Inanim Undef
## let 175 8 4
## permit 62 86 16
table(droplevels(let)$Verb, let$Reg)
##
## MAG SPOK
## let 72 115
## permit 100 64
table(droplevels(let)$Verb, let$Imper)
##
## No Yes
## let 83 104
## permit 161 3
Imper*Reg, but remember you will need to do a glm model.anova function to answer if the addition of the interaction was significant.visreg library and funtion to visualize the interaction.model1 = glm(Verb ~ Reg + Permitter + Imper,
family = binomial,
data = let)
model2 = glm(Verb ~ Permitter + Imper*Reg,
family = binomial,
data = let)
anova(model1, model2, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: Verb ~ Reg + Permitter + Imper
## Model 2: Verb ~ Permitter + Imper * Reg
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 346 280.16
## 2 345 276.29 1 3.8693 0.04918 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(model2)
##
## Call:
## glm(formula = Verb ~ Permitter + Imper * Reg, family = binomial,
## data = let)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.1737 -0.4172 -0.1557 0.4448 2.9728
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.4087 0.2357 -1.734 0.08294 .
## PermitterInanim 2.6723 0.4202 6.360 2.01e-10 ***
## PermitterUndef 1.6249 0.5906 2.752 0.00593 **
## ImperYes -1.9892 0.7753 -2.566 0.01029 *
## RegSPOK 0.4637 0.3188 1.455 0.14576
## ImperYes:RegSPOK -2.4725 1.2881 -1.919 0.05492 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 485.08 on 350 degrees of freedom
## Residual deviance: 276.29 on 345 degrees of freedom
## AIC: 288.29
##
## Number of Fisher Scoring iterations: 7
visreg(model2, "Imper", by = "Reg")
table(droplevels(let)$Verb, let$Reg, let$Imper)
## , , = No
##
##
## MAG SPOK
## let 50 33
## permit 98 63
##
## , , = Yes
##
##
## MAG SPOK
## let 22 82
## permit 2 1
car library and the influencePlot() to create a picture of the outliers.influencePlot(model2)
## StudRes Hat CookD
## 36 3.136508 0.01204819 0.168699187
## 59 2.334111 0.04166667 0.083175803
## 88 0.732927 0.05752692 0.003198719
## 106 0.732927 0.05752692 0.003198719
Interpretation: - The visual above shows that there are outliers in the dataset. - The biggest ones are record 36, 59 and 188.
vif values of the original model (not the interaction model) and determine if you meet the assumption of additivity (meaning no multicollinearity).rms::vif(model1)
## RegSPOK PermitterInanim PermitterUndef ImperYes
## 1.080052 1.069528 1.027143 1.051163
Interpretation: - The vif original model values (not the interaction model) shows that we meet the assumption of additivity (meaning no multicollinearity). - Variables in the model have a vif around 1 which is less than 5 or 10 so there is not much multicollinearity, since there is no need to remove any variable from the model.