Data set has two categorical variables (cat_1 and cat_2), with 3 leves in cat_1 (A, B, C) and 2 levels in cat_2 (a, b).
Data has two continuous variables (cont_x and cont_x).
An interaction was forced on the data.
#the necessary package for glht
library(multcomp)
## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: TH.data
## Loading required package: MASS
##
## Attaching package: 'TH.data'
## The following object is masked from 'package:MASS':
##
## geyser
##Creating a simple dataset
cat1v<-c("A","B","C")
cat2v<-c("a","b")
cat3v<-c(rep(cat2v,each=5))
data<-data.frame("cont_y"=c(rnorm(5,10,1),rnorm(5,15,1),rnorm(20,20,1)),
"cont_x"=rnorm(30,50,3),
"cat_1"=rep(cat1v, each=10),
"cat_2"=rep(cat3v, 3))
print(data)
## cont_y cont_x cat_1 cat_2
## 1 10.037987 49.53592 A a
## 2 10.780778 50.43588 A a
## 3 8.993202 51.63332 A a
## 4 10.295491 55.66041 A a
## 5 11.913381 48.12947 A a
## 6 15.735750 44.64367 A b
## 7 17.272409 53.97580 A b
## 8 13.216592 54.28147 A b
## 9 15.592313 51.83406 A b
## 10 14.610264 48.44897 A b
## 11 19.595790 48.31685 B a
## 12 19.409661 50.05037 B a
## 13 20.132650 47.57749 B a
## 14 21.144225 44.03148 B a
## 15 19.089872 49.75332 B a
## 16 20.186379 46.39267 B b
## 17 18.970095 49.63521 B b
## 18 20.294997 45.30636 B b
## 19 19.469384 50.82990 B b
## 20 18.925418 48.58998 B b
## 21 21.325058 51.19675 C a
## 22 19.145550 45.77869 C a
## 23 21.010435 48.84306 C a
## 24 18.919804 47.67432 C a
## 25 20.340733 45.86633 C a
## 26 19.307198 43.15493 C b
## 27 19.965195 51.02644 C b
## 28 22.340914 44.82557 C b
## 29 20.368386 50.81993 C b
## 30 19.566510 49.77552 C b
Fitting a basic general linear model, though glht should handle generalized and mixed models as well.
##Fit the model of interest
##In our case, is there an interaction of cat_1 and cat_2 on cont_y? cont_x is a co-variate
model<-glm(data$cont_y ~ data$cont_x + data$cat_1*data$cat_2, family=gaussian)
summary.glm(model)
##
## Call:
## glm(formula = data$cont_y ~ data$cont_x + data$cat_1 * data$cat_2,
## family = gaussian)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.79415 -0.60888 0.01989 0.46272 2.23863
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.25438 3.78716 3.764 0.00101 **
## data$cont_x -0.07538 0.07353 -1.025 0.31598
## data$cat_1B 9.23411 0.72372 12.759 6.44e-12 ***
## data$cat_1C 9.50240 0.72547 13.098 3.78e-12 ***
## data$cat_2b 4.84797 0.68684 7.058 3.43e-07 ***
## data$cat_1B:data$cat_2b -5.13770 0.97142 -5.289 2.28e-05 ***
## data$cat_1C:data$cat_2b -4.68297 0.97093 -4.823 7.24e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 1.176743)
##
## Null deviance: 428.422 on 29 degrees of freedom
## Residual deviance: 27.065 on 23 degrees of freedom
## AIC: 98.048
##
## Number of Fisher Scoring iterations: 2
##Investigating the interaction
#Group the data by groups of interest by creating a new variable
data$groups<-interaction(data$cat_1, data$cat_2)
First we create new columns that combine explanatory variables in ways of interest. In our case we want to compare data that belongs to “A” and “a” to every other group. We use a “holm” correction for multiple independent t-tests (this is basically a sequential Bonferroni correction where P-value get’s stricter as groups are added).
#Create a post-hoc model
model_posthoc<-with(data, glm(cont_y ~ groups, family=gaussian))
##Determine the post-hoc comparisons of interest
summary(glht(model_posthoc,
linfct = mcp(groups=
#Is the difference between these groups different from zero?
c("(A.a)-(B.a)=0",
"(A.a)-(B.b)=0",
"(A.a)-(C.a)=0",
"(A.a)-(C.b)=0"))),test = adjusted("holm"))
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: User-defined Contrasts
##
##
## Fit: glm(formula = cont_y ~ groups, family = gaussian)
##
## Linear Hypotheses:
## Estimate Std. Error z value Pr(>|z|)
## (A.a) - (B.a) == 0 -9.4703 0.6868 -13.79 <2e-16 ***
## (A.a) - (B.b) == 0 -9.1651 0.6868 -13.35 <2e-16 ***
## (A.a) - (C.a) == 0 -9.7441 0.6868 -14.19 <2e-16 ***
## (A.a) - (C.b) == 0 -9.9055 0.6868 -14.42 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- holm method)