Package and sample dataset

Data set has two categorical variables (cat_1 and cat_2), with 3 leves in cat_1 (A, B, C) and 2 levels in cat_2 (a, b).

Data has two continuous variables (cont_x and cont_x).

An interaction was forced on the data.

#the necessary package for glht
library(multcomp)
## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: TH.data
## Loading required package: MASS
## 
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
## 
##     geyser
##Creating a simple dataset
cat1v<-c("A","B","C")
cat2v<-c("a","b")
cat3v<-c(rep(cat2v,each=5))
data<-data.frame("cont_y"=c(rnorm(5,10,1),rnorm(5,15,1),rnorm(20,20,1)),
                 "cont_x"=rnorm(30,50,3),
                 "cat_1"=rep(cat1v, each=10),
                 "cat_2"=rep(cat3v, 3))
print(data)
##       cont_y   cont_x cat_1 cat_2
## 1  10.037987 49.53592     A     a
## 2  10.780778 50.43588     A     a
## 3   8.993202 51.63332     A     a
## 4  10.295491 55.66041     A     a
## 5  11.913381 48.12947     A     a
## 6  15.735750 44.64367     A     b
## 7  17.272409 53.97580     A     b
## 8  13.216592 54.28147     A     b
## 9  15.592313 51.83406     A     b
## 10 14.610264 48.44897     A     b
## 11 19.595790 48.31685     B     a
## 12 19.409661 50.05037     B     a
## 13 20.132650 47.57749     B     a
## 14 21.144225 44.03148     B     a
## 15 19.089872 49.75332     B     a
## 16 20.186379 46.39267     B     b
## 17 18.970095 49.63521     B     b
## 18 20.294997 45.30636     B     b
## 19 19.469384 50.82990     B     b
## 20 18.925418 48.58998     B     b
## 21 21.325058 51.19675     C     a
## 22 19.145550 45.77869     C     a
## 23 21.010435 48.84306     C     a
## 24 18.919804 47.67432     C     a
## 25 20.340733 45.86633     C     a
## 26 19.307198 43.15493     C     b
## 27 19.965195 51.02644     C     b
## 28 22.340914 44.82557     C     b
## 29 20.368386 50.81993     C     b
## 30 19.566510 49.77552     C     b

The model of interest

Fitting a basic general linear model, though glht should handle generalized and mixed models as well.

##Fit the model of interest
##In our case, is there an interaction of cat_1 and cat_2 on cont_y? cont_x is a co-variate
model<-glm(data$cont_y ~ data$cont_x + data$cat_1*data$cat_2, family=gaussian)
summary.glm(model)
## 
## Call:
## glm(formula = data$cont_y ~ data$cont_x + data$cat_1 * data$cat_2, 
##     family = gaussian)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.79415  -0.60888   0.01989   0.46272   2.23863  
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             14.25438    3.78716   3.764  0.00101 ** 
## data$cont_x             -0.07538    0.07353  -1.025  0.31598    
## data$cat_1B              9.23411    0.72372  12.759 6.44e-12 ***
## data$cat_1C              9.50240    0.72547  13.098 3.78e-12 ***
## data$cat_2b              4.84797    0.68684   7.058 3.43e-07 ***
## data$cat_1B:data$cat_2b -5.13770    0.97142  -5.289 2.28e-05 ***
## data$cat_1C:data$cat_2b -4.68297    0.97093  -4.823 7.24e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1.176743)
## 
##     Null deviance: 428.422  on 29  degrees of freedom
## Residual deviance:  27.065  on 23  degrees of freedom
## AIC: 98.048
## 
## Number of Fisher Scoring iterations: 2
##Investigating the interaction

#Group the data by groups of interest by creating a new variable
data$groups<-interaction(data$cat_1, data$cat_2)

Post-hoc comparisons

First we create new columns that combine explanatory variables in ways of interest. In our case we want to compare data that belongs to “A” and “a” to every other group. We use a “holm” correction for multiple independent t-tests (this is basically a sequential Bonferroni correction where P-value get’s stricter as groups are added).

#Create a post-hoc model
model_posthoc<-with(data, glm(cont_y ~ groups, family=gaussian))

##Determine the post-hoc comparisons of interest
summary(glht(model_posthoc, 
             linfct = mcp(groups=
                     #Is the difference between these groups different from zero?
                     c("(A.a)-(B.a)=0",
                       "(A.a)-(B.b)=0",
                       "(A.a)-(C.a)=0",
                       "(A.a)-(C.b)=0"))),test = adjusted("holm"))
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: User-defined Contrasts
## 
## 
## Fit: glm(formula = cont_y ~ groups, family = gaussian)
## 
## Linear Hypotheses:
##                    Estimate Std. Error z value Pr(>|z|)    
## (A.a) - (B.a) == 0  -9.4703     0.6868  -13.79   <2e-16 ***
## (A.a) - (B.b) == 0  -9.1651     0.6868  -13.35   <2e-16 ***
## (A.a) - (C.a) == 0  -9.7441     0.6868  -14.19   <2e-16 ***
## (A.a) - (C.b) == 0  -9.9055     0.6868  -14.42   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- holm method)