Load all the libraries or functions that you will use for the rest of the assignment. It is helpful to define your libraries and functions at the top of a report, so that others know what they need for the report to compile correctly.
The data for this project has already been loaded. Here you will be distinguishing between the uses of let, allow, and permit. You should pick two of these verbs to examine and subset the dataset to only the rows containing those verbs. You can use the droplevels() function to drop the empty level after you subset.
The data come from COCA, the Corpus of Contemporary American English, and concern the choice among the verbs let, allow, and permit. These are permissive constructions that are often paired with the word to. Predict the verb choice in the Verb column with the following independent variables: Reg (register), Permitter, and Imper (imperative).
##
## let allow
## 187 167
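The subsetting step described above can be sketched as follows. This is a minimal illustration on a made-up data frame: the name `toy` and its six rows are hypothetical stand-ins, not the corpus data.

```r
# Hypothetical toy data; the real corpus data frame is not reproduced here.
toy <- data.frame(
  Verb  = factor(c("let", "allow", "permit", "let", "allow", "permit"),
                 levels = c("let", "allow", "permit")),
  Imper = factor(c("Yes", "No", "No", "Yes", "No", "No"))
)

# Keep only the rows for the two verbs under study.
two_verbs <- subset(toy, Verb %in% c("let", "allow"))

# The unused "permit" level is still attached to the factor at this point.
table(two_verbs$Verb)

# droplevels() removes factor levels that no longer occur in the data.
two_verbs <- droplevels(two_verbs)
table(two_verbs$Verb)
```

Dropping the empty level matters because a binary logistic regression expects an outcome factor with exactly two levels.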
Run the logistic regression using the rms package.
## Logistic Regression Model
##
## lrm(formula = Verb ~ Reg + Permitter + Imper, data = let)
##
##                        Model Likelihood    Discrimination    Rank Discrim.
##                              Ratio Test           Indexes          Indexes
## Obs           354    LR chi2     201.17    R2       0.579    C       0.876
##  let          187    d.f.             4    g        2.417    Dxy     0.753
##  allow        167    Pr(> chi2) <0.0001    gr      11.214    gamma   0.825
## max |deriv| 6e-10                          gp       0.377    tau-a   0.376
##                                            Brier    0.132
##
##                  Coef    S.E.   Wald Z Pr(>|Z|)
## Intercept        -0.1935 0.2280 -0.85  0.3961
## Reg=SPOK          0.0530 0.3072  0.17  0.8629
## Permitter=Inanim  2.6069 0.4106  6.35  <0.0001
## Permitter=Undef   1.2888 0.6166  2.09  0.0366
## Imper=Yes        -3.1051 0.5464 -5.68  <0.0001
##
Explain each coefficient - are they significant? What do they imply if they are significant (i.e., which verb does it predict)?
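We have 3 categories for ‘Permitter’ (the semantic class of the permitter) and 2 categories each for ‘Reg’ and ‘Imper’. The reference levels are Anim, MAG, and No, and because ‘allow’ is the second outcome level, positive coefficients favor ‘allow’ and negative coefficients favor ‘let’. Reg=SPOK is not significant (p = 0.86). Permitter=Inanim (p < 0.0001) and Permitter=Undef (p = 0.037) are significant and positive, so inanimate and undefined permitters predict ‘allow’ relative to animate ones. Imper=Yes is significant and negative (p < 0.0001), so imperatives predict ‘let’. The tables below show the same pattern in the raw counts.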
Permitter:
##
## Anim Inanim Undef
## let 175 8 4
## allow 64 91 12
Reg:
##
## MAG SPOK
## let 72 115
## allow 103 64
Imper:
##
## No Yes
## let 83 104
## allow 163 4
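To make the size of these effects easier to read, the coefficients printed in the lrm output can be converted to odds ratios and predicted probabilities. The sketch below copies the estimates by hand and assumes, as the level order let/allow in the output suggests, that the model describes the probability of ‘allow’. The object names are illustrative.

```r
# Coefficients copied from the lrm output above (log-odds of 'allow').
coefs <- c("Intercept"        = -0.1935,
           "Reg=SPOK"         =  0.0530,
           "Permitter=Inanim" =  2.6069,
           "Permitter=Undef"  =  1.2888,
           "Imper=Yes"        = -3.1051)

# Odds ratios: values above 1 favor 'allow', values below 1 favor 'let'.
odds_ratios <- exp(coefs[-1])
round(odds_ratios, 3)

# Predicted probability of 'allow' for a few predictor combinations.
# Baseline = magazine register, animate permitter, non-imperative.
p_baseline <- plogis(coefs[["Intercept"]])
p_inanim   <- plogis(coefs[["Intercept"]] + coefs[["Permitter=Inanim"]])
p_imper    <- plogis(coefs[["Intercept"]] + coefs[["Imper=Yes"]])
round(c(baseline = p_baseline, inanimate = p_inanim, imperative = p_imper), 3)
```

The odds ratio for Permitter=Inanim is well above 1 and the one for Imper=Yes is close to 0, which matches the direction of the counts in the tables.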
Add the interaction Imper*Reg, but remember you will need to fit a glm model. Use the anova function to answer whether the addition of the interaction was significant.
Use the visreg library and function to visualize the interaction.
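model1 = glm(Verb ~ Reg + Permitter + Imper,
             family = binomial,
             data = let)
model2 = glm(Verb ~ Permitter + Imper*Reg,
             family = binomial,
             data = let)
# Likelihood-ratio test: does adding Imper:Reg improve the fit?
anova(model1, model2, test = "Chisq")
# Coefficient table for the interaction model (output shown below)
summary(model2)
##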
## Call:
## glm(formula = Verb ~ Permitter + Imper * Reg, family = binomial,
## data = let)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.19626 -0.57802 -0.00013 0.43343 1.93484
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.3432 0.2343 -1.465 0.1430
## PermitterInanim 2.6611 0.4142 6.425 1.32e-10 ***
## PermitterUndef 1.4209 0.6195 2.294 0.0218 *
## ImperYes -1.3615 0.5919 -2.300 0.0214 *
## RegSPOK 0.3667 0.3216 1.140 0.2543
## ImperYes:RegSPOK -17.2280 720.3052 -0.024 0.9809
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 489.62 on 353 degrees of freedom
## Residual deviance: 275.27 on 348 degrees of freedom
## AIC: 287.27
##
## Number of Fisher Scoring iterations: 17
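The addition of the interaction is significant according to the p-value of the anova (likelihood-ratio) test. The interaction coefficient itself, however, is uninformative in the summary of the interaction model: its Wald p-value is 0.98 because the estimate (-17.23) comes with an enormous standard error (720.31). That pattern, together with the 17 Fisher scoring iterations, points to separation in the data, so the Wald test for this term should not be relied on.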
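The likelihood-ratio comparison can be reproduced by hand from figures already printed in the two model outputs. The sketch below assumes the lrm and glm fits of the main-effects model are the same model, so that its residual deviance equals the null deviance minus the reported LR chi2. The variable names are illustrative.

```r
# Figures copied from the printed outputs.
null_deviance  <- 489.62   # glm summary, 353 degrees of freedom
lr_chi2_model1 <- 201.17   # lrm output for the main-effects model, 4 d.f.
dev_model2     <- 275.27   # residual deviance of the interaction model, 348 d.f.

# Residual deviance of the main-effects model (353 - 4 = 349 d.f.).
dev_model1 <- null_deviance - lr_chi2_model1

# Likelihood-ratio statistic for adding the single interaction term.
lr_stat <- dev_model1 - dev_model2
df_diff <- 349 - 348

# Upper-tail chi-squared probability.
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)
c(lr_stat = lr_stat, df = df_diff, p_value = p_value)
```

Unlike the Wald z test on the coefficient, this comparison does not depend on the inflated standard error, which is why the two tests can disagree so sharply.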
It is difficult to interpret the interaction from the visualization. The table below shows it more clearly: among spoken imperatives, all 82 observations use ‘let’ and none use ‘allow’, so the model always predicts ‘let’ there, whereas spoken non-imperatives favor ‘allow’ (64 versus 33).
## , , = No
##
##
## MAG SPOK
## let 50 33
## allow 99 64
##
## , , = Yes
##
##
## MAG SPOK
## let 22 82
## allow 4 0
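The empty cell in this table is what makes the interaction estimate unstable. As a quick check, the counts can be re-entered by hand and turned into the proportion of ‘allow’ per combination of register and imperative. The object names are illustrative.

```r
# Counts copied from the three-way table above.
# Filled in column-major order: Verb varies fastest, then Reg, then Imper.
counts <- array(
  c(50, 99, 33, 64,    # Imper = No:  (let, allow) for MAG, then SPOK
    22,  4, 82,  0),   # Imper = Yes: (let, allow) for MAG, then SPOK
  dim = c(2, 2, 2),
  dimnames = list(Verb  = c("let", "allow"),
                  Reg   = c("MAG", "SPOK"),
                  Imper = c("No", "Yes"))
)

# Proportion of 'allow' in each Reg x Imper cell.
cell_totals <- apply(counts, c(2, 3), sum)
prop_allow  <- counts["allow", , ] / cell_totals
round(prop_allow, 3)

# Locate any empty cells; an empty cell in a saturated Imper x Reg
# design means the corresponding coefficient has no finite estimate.
zero_cells <- which(counts == 0, arr.ind = TRUE)
zero_cells
```

A proportion of exactly 0 for spoken imperatives means the data cannot bound the interaction coefficient from below, which is consistent with the very large negative estimate and standard error in the summary.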
Use the car library and the influencePlot() function to create a picture of the outliers.
There seem to be some outliers in the data: the plot shows observations with large hat-values and studentized residuals above +2 or below -2.
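The quantities shown in an influence plot can also be listed numerically. The sketch below uses base R functions instead of the car package and runs on simulated data, since the corpus data are not reproduced here. The data frame `sim`, the simulation settings, and the hat-value cutoff of 2p/n (a common rule of thumb, not something stated in the assignment) are all assumptions.

```r
set.seed(1)
n <- 200

# Simulated stand-in data with two binary predictors.
sim <- data.frame(
  Reg   = factor(sample(c("MAG", "SPOK"), n, replace = TRUE)),
  Imper = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
eta <- -0.2 + 0.5 * (sim$Reg == "SPOK") - 1.5 * (sim$Imper == "Yes")
sim$Verb <- factor(ifelse(runif(n) < plogis(eta), "allow", "let"),
                   levels = c("let", "allow"))

fit <- glm(Verb ~ Reg + Imper, family = binomial, data = sim)

# Leverage, studentized residuals, and Cook's distance for each observation.
h <- hatvalues(fit)
r <- rstudent(fit)
d <- cooks.distance(fit)

# Flag observations beyond the +/-2 residual band or with high leverage.
p <- length(coef(fit))
flagged <- abs(r) > 2 | h > 2 * p / n

head(data.frame(hat = h, rstudent = r, cook = d)[flagged, ])
```

Listing the flagged rows makes it possible to inspect which observations drive the plot rather than reading them off the picture.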
Calculate the vif values of the original model (not the interaction model) and determine if you meet the assumption of additivity (meaning no multicollinearity).
##         Reg=SPOK Permitter=Inanim  Permitter=Undef        Imper=Yes
##         1.090869         1.044777         1.067404         1.055026
There doesn’t seem to be multicollinearity: all the values are close to 1, far below the cutoff of 5 used here. The model therefore meets the assumption of additivity.
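For intuition about where such values come from, a variance inflation factor can be computed from the covariance matrix of the coefficient estimates: drop the intercept, rescale to a correlation matrix, invert it, and read off the diagonal. The sketch below does this in base R on simulated data. The helper `vif_from_vcov` is hypothetical, and the claim that this mirrors the calculation behind the values above is an assumption rather than something confirmed by the source.

```r
set.seed(2)
n <- 300

# Simulated stand-in data with independently sampled predictors.
sim <- data.frame(
  Reg       = factor(sample(c("MAG", "SPOK"), n, replace = TRUE)),
  Permitter = factor(sample(c("Anim", "Inanim", "Undef"), n, replace = TRUE,
                            prob = c(0.6, 0.3, 0.1))),
  Imper     = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
eta <- -0.2 + 2 * (sim$Permitter == "Inanim") +
  1 * (sim$Permitter == "Undef") - 1.5 * (sim$Imper == "Yes")
sim$Verb <- factor(ifelse(runif(n) < plogis(eta), "allow", "let"),
                   levels = c("let", "allow"))

fit <- glm(Verb ~ Reg + Permitter + Imper, family = binomial, data = sim)

# Hypothetical helper: VIFs from the coefficient covariance matrix.
vif_from_vcov <- function(model) {
  v <- vcov(model)[-1, -1, drop = FALSE]   # drop the intercept row and column
  out <- diag(solve(cov2cor(v)))           # diagonal of the inverse correlation matrix
  setNames(out, rownames(v))
}

vifs <- vif_from_vcov(fit)
round(vifs, 3)
```

Because the simulated predictors are sampled independently, the resulting values should sit close to 1, the same situation the report describes for the real model.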