Load the Libraries + Functions

Load all the libraries or functions that you will use to for the rest of the assignment. It is helpful to define your libraries and functions at the top of a report, so that others can know what they need for the report to compile correctly.

The data for this project has already been loaded. Here you will be distinguishing between the uses for let, allow, and permit. You should pick two of these verbs to examine and subset the dataset for only those columns. You can use the droplevels() function to help drop the empty level after you subset.

# Load all libraries 
library(Rling)
library(visreg)
## Warning: package 'visreg' was built under R version 3.5.2
library(car)
## Warning: package 'car' was built under R version 3.5.2
## Loading required package: carData
library(rms)
## Warning: package 'rms' was built under R version 3.5.2
## Loading required package: Hmisc
## Warning: package 'Hmisc' was built under R version 3.5.2
## Loading required package: lattice
## Loading required package: survival
## Warning: package 'survival' was built under R version 3.5.2
## Loading required package: Formula
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.5.2
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
## 
##     format.pval, units
## Loading required package: SparseM
## 
## Attaching package: 'SparseM'
## The following object is masked from 'package:base':
## 
##     backsolve
## 
## Attaching package: 'rms'
## The following objects are masked from 'package:car':
## 
##     Predict, vif
data(let)
let <- subset(let, Verb != "allow")
head(let)
##    Year  Reg   Verb Neg Permitter Imper
## 3  1990 SPOK    let  No      Anim    No
## 5  1997  MAG permit  No      Anim   Yes
## 8  2007 SPOK    let  No      Anim   Yes
## 9  2005  MAG permit  No      Anim    No
## 11 2007  MAG    let  No      Anim    No
## 12 2005  MAG    let Yes      Anim    No

Description of the Data

The data is from the COCA: Corpus of Contemporary American English investigating the verb choice of let, allow, and permit. These are permissive constructions that are often paired with the word to. Predict the verb choice in the Verb column with the following independent variables:

Sample Size Requirements

table(droplevels(let)$Verb)
## 
##    let permit 
##    187    164

Interpretation: The sample size is large enough for predoctors, let=187, permit=164 The split between choosen verbs (let, permit) is enough to predict

Running a Binary Logistic Regression

LogReg = lrm(Verb ~ Reg + Permitter+ Imper, data = let)
LogReg
## Logistic Regression Model
##  
##  lrm(formula = Verb ~ Reg + Permitter + Imper, data = let)
##  
##                        Model Likelihood     Discrimination    Rank Discrim.    
##                           Ratio Test           Indexes           Indexes       
##  Obs           351    LR chi2     204.92    R2       0.591    C       0.882    
##   let          187    d.f.             4    g        2.557    Dxy     0.765    
##   permit       164    Pr(> chi2) <0.0001    gr      12.892    gamma   0.840    
##  max |deriv| 4e-08                          gp       0.385    tau-a   0.382    
##                                             Brier    0.130                     
##  
##                   Coef    S.E.   Wald Z Pr(>|Z|)
##  Intercept        -0.3366 0.2317 -1.45  0.1462  
##  Reg=SPOK          0.3153 0.3096  1.02  0.3085  
##  Permitter=Inanim  2.6323 0.4176  6.30  <0.0001 
##  Permitter=Undef   1.6039 0.5886  2.72  0.0064  
##  Imper=Yes        -3.4614 0.6208 -5.58  <0.0001 
## 

Interpretation: After running the logistic regression using lms: - the overall model is predictive of verb choice and it is significant - the fitted regression line has 59% variance since R2 is 0.591 which means that not all the data points fall on the fitted regression line.

Coefficients

table(droplevels(let)$Verb, let$Permitter)
##         
##          Anim Inanim Undef
##   let     175      8     4
##   permit   62     86    16
table(droplevels(let)$Verb, let$Reg)
##         
##          MAG SPOK
##   let     72  115
##   permit 100   64
table(droplevels(let)$Verb, let$Imper)
##         
##           No Yes
##   let     83 104
##   permit 161   3

Interactions

model1 = glm(Verb ~ Reg + Permitter + Imper,
             family = binomial,
             data = let)
model2 = glm(Verb ~ Permitter + Imper*Reg,
             family = binomial,
             data = let)
anova(model1, model2, test = "Chisq")
## Analysis of Deviance Table
## 
## Model 1: Verb ~ Reg + Permitter + Imper
## Model 2: Verb ~ Permitter + Imper * Reg
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)  
## 1       346     280.16                       
## 2       345     276.29  1   3.8693  0.04918 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(model2)
## 
## Call:
## glm(formula = Verb ~ Permitter + Imper * Reg, family = binomial, 
##     data = let)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1737  -0.4172  -0.1557   0.4448   2.9728  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -0.4087     0.2357  -1.734  0.08294 .  
## PermitterInanim    2.6723     0.4202   6.360 2.01e-10 ***
## PermitterUndef     1.6249     0.5906   2.752  0.00593 ** 
## ImperYes          -1.9892     0.7753  -2.566  0.01029 *  
## RegSPOK            0.4637     0.3188   1.455  0.14576    
## ImperYes:RegSPOK  -2.4725     1.2881  -1.919  0.05492 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 485.08  on 350  degrees of freedom
## Residual deviance: 276.29  on 345  degrees of freedom
## AIC: 288.29
## 
## Number of Fisher Scoring iterations: 7
visreg(model2, "Imper", by = "Reg")

table(droplevels(let)$Verb, let$Reg, let$Imper)
## , ,  = No
## 
##         
##          MAG SPOK
##   let     50   33
##   permit  98   63
## 
## , ,  = Yes
## 
##         
##          MAG SPOK
##   let     22   82
##   permit   2    1

Outliers

influencePlot(model2)

##      StudRes        Hat       CookD
## 36  3.136508 0.01204819 0.168699187
## 59  2.334111 0.04166667 0.083175803
## 88  0.732927 0.05752692 0.003198719
## 106 0.732927 0.05752692 0.003198719

Interpretation: - The visual above shows that there are outliers in the dataset. - The biggest ones are record 36, 59 and 188.

Assumptions

rms::vif(model1)
##         RegSPOK PermitterInanim  PermitterUndef        ImperYes 
##        1.080052        1.069528        1.027143        1.051163

Interpretation: - The vif original model values (not the interaction model) shows that we meet the assumption of additivity (meaning no multicollinearity). - Variables in the model have a vif around 1 which is less than 5 or 10 so there is not much multicollinearity, since there is no need to remove any variable from the model.