Package moonBook is avaiable on CRAN and github. Package moonBook2 is available only on github. Please install moonBook2 package using the following R code.
install.packages("devtools")
devtools::install_github("cardiomoon/moonBook")
devtools::install_github("cardiomoon/moonBook2")In this vignette, I will show you how to visualize the interaction between two predictor variables.
In ANCOVA model, there are one categorical variable with one continuous variable(covariate). With mtcars data, we can make a ANCOVA model in which the mpg(mile per gallon) is a response variable and the wt(weight) and the number of cylinders(cyl) are explanatory variables. The number of cylinders are recorded as number. To make a ANCOVA model, we make a new categorical variable cyl1.
mtcars$engine=factor(mtcars$cyl)You can easily calculate the mean and SD of mpg by engine.
require(moonBook)
require(moonBook2)
require(ggplot2)
summarySE(mtcars,"mpg","engine") engine n mpg sd se ci
1 4 11 26.66364 4.509828 1.3597642 3.029743
2 6 7 19.74286 1.453567 0.5493967 1.344325
3 8 14 15.10000 2.560048 0.6842016 1.478128
The summarySE() function is made by slight modification of summarySE() function in “R graphics cookbook” by Winston Chang. You can make plot comparing the average of three groups.
str(mtcars)'data.frame': 32 obs. of 12 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp : num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat : num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec : num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear : num 4 4 4 3 3 3 3 4 4 4 ...
$ carb : num 4 4 1 1 2 1 4 2 2 4 ...
$ engine: Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
ggCatepillar(mpg~engine,data=mtcars,interactive=TRUE)fit=lm(mpg~wt+engine,data=mtcars)
summary(fit)
Call:
lm(formula = mpg ~ wt + engine, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.5890 -1.2357 -0.5159 1.3845 5.7915
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 33.9908 1.8878 18.006 < 2e-16 ***
wt -3.2056 0.7539 -4.252 0.000213 ***
engine6 -4.2556 1.3861 -3.070 0.004718 **
engine8 -6.0709 1.6523 -3.674 0.000999 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.557 on 28 degrees of freedom
Multiple R-squared: 0.8374, Adjusted R-squared: 0.82
F-statistic: 48.08 on 3 and 28 DF, p-value: 3.594e-11
In summary, we can obtain regression equation for engine 4,6 and 8.
engine4 : mpg = -3.2056*wt+33.9908
engine6 : mpg = -3.2056*wt+33.9908-4.2556
engine8 : mpg = -3.2056*wt+33.9908-6.0709
We can easly visualize this model with ggAncova() function.
ggAncova(fit,interactive=TRUE)ggAncova(mpg~wt+engine,data=mtcars,interactive=TRUE)If you are not familiar with the formula, You can enter data and variables in order. If you enter two continuous variables as explanatory variables, the last one is converted into categorical variable and make the ANCOVA plot.
ggAncova(mtcars,mpg,wt,cyl,interactive=TRUE)In radial data in moonBook package, the degree of atherosclerosis are measured as the normalized total atheroma volume(NTAV). With this data we can make an ANCOVA model in which age and sex are used as explanatory variables.
ggAncova(NTAV~age+sex,data=radial,interactive=TRUE)As you can see, the NTAV is increasing with age in both sex, but men have larger NTAV thae women.
One of the most interesting research findings are those involving interactions among predictor variables. If you are interested in th impact of horse power and automoble weight on mileage, you can fit a regression model with the following formula.
fit2=lm(mpg~hp*wt,data=mtcars)
summary(fit2)
Call:
lm(formula = mpg ~ hp * wt, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.0632 -1.6491 -0.7362 1.4211 4.5513
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 49.80842 3.60516 13.816 5.01e-14 ***
hp -0.12010 0.02470 -4.863 4.04e-05 ***
wt -8.21662 1.26971 -6.471 5.20e-07 ***
hp:wt 0.02785 0.00742 3.753 0.000811 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.153 on 28 degrees of freedom
Multiple R-squared: 0.8848, Adjusted R-squared: 0.8724
F-statistic: 71.66 on 3 and 28 DF, p-value: 2.981e-13
In formula, A*B is interpreted as A + B + A:B. A:B is the interaction of A and B. The relationship between mileage(mpg) and horse power(hp) varies by car weight(wt). The quantile of weight are as follows.
summary(mtcars$wt) Min. 1st Qu. Median Mean 3rd Qu. Max.
1.513 2.581 3.325 3.217 3.610 5.424
The minimal value of weight is 1.513 and th maimal value of weight is 5.424. The regression equation between mpg and wt varies with hp. If the coefficients were rounded to the decimal point two digits, the regression equation is as follows
mpg=49.81 - 0.12 * hp- 8.22 * wt + 0.03 * hp * wt
If the weight is 3.2(mean), the regression equation will be:
mpg=49.81-0.12 * hp - 8.22 * 3.2 + 0.03 * hp * 3.2
mpg=23.51 - 0.03 * hp
You can calculate the regression slope and y intercept with the mean of weight(3.2) and one standard deviation below and above the maen(2.2,and 4.2 respectively) as follows:
A<-c(2.2,3.2,4.2)
coef<-fit2$coef
labels=as.character(A)
intercept=coef[1]+coef[2]*A
slope=coef[3]+coef[4]*A
intercept[1] 49.54420 49.42410 49.30399
slope[1] -8.155358 -8.127510 -8.099662
With ggEffect() function, you can visualize this interaction easily.
ggEffect(mpg~hp*wt,data=mtcars,interactive=TRUE)By default, three regression lines are displayed at c(0.10,0.5,0.9) percentiles. You can get other regression lines by changing the probs parameter.
ggEffect(mpg~hp*wt,data=mtcars,probs=c(0.2,0.4,0.6,0.8),interactive=TRUE)You can get regression lines with the desired value. If you wanted to get regression lines with weight 2.2,3.2 and 4.2, try this :
ggEffect(mpg~hp*wt,data=mtcars,xvalue=c(2.2,3.2,4.2),interactive=TRUE)You can use the ggEffect() function with categorical variables. You can see the impact of intweraction between aging and smoking status on the progression of atherosclerosis, try this:
ggEffect(NTAV~age*smoking,data=radial,interactive=TRUE)You can clearly see that the atheroscerosis progresses rapidly among smokers.
Another interesting examples comes from acs data(moonBook package). In acs data, 857 patients with acute coronary syndrome(acs) were enrolled. If you are interested the age of disease onset and diabetes and smoking status, you can summarize the onset of age grouped by smoking and DM status as follows.
summarySE(acs,"age",c("DM","smoking")) DM smoking n age sd se ci
1 No Ex-smoker 135 66.54074 11.79853 1.0154562 2.008395
2 No Never 195 66.92821 11.05166 0.7914257 1.560903
3 No Smoker 223 58.61435 11.60437 0.7770861 1.531409
4 Yes Ex-smoker 69 64.18841 10.76797 1.2963118 2.586750
5 Yes Never 137 65.91241 10.09038 0.8620795 1.704815
6 Yes Smoker 98 58.10204 10.45904 1.0565227 2.096905
You can visualize this as follows:
ggEffect(age~DM*smoking,data=acs,interactive=TRUE)ggEffect(age~DM*smoking,data=acs,x=2,interactive=TRUE)fit=aov(age~DM*smoking,data=acs)
summary(fit) Df Sum Sq Mean Sq F value Pr(>F)
DM 1 45 45 0.364 0.546
smoking 2 12378 6189 50.359 <2e-16 ***
DM:smoking 2 95 48 0.387 0.679
Residuals 851 104586 123
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In the two-way ANOVA, the impact of smoking status is significant, whereas the impacts of DM status and the interaction between DM and smoking are insignificant. You can perform multiple comparison by computinig Tukey Honest Significant Differences.
result=TukeyHSD(fit)
result Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = age ~ DM * smoking, data = acs)
$DM
diff lwr upr p adj
Yes-No -0.4777232 -2.031286 1.07584 0.5463034
$smoking
diff lwr upr p adj
Never-Ex-smoker 0.799488 -1.515954 3.114930 0.6965132
Smoker-Ex-smoker -7.302890 -9.633383 -4.972397 0.0000000
Smoker-Never -8.102378 -10.139752 -6.065004 0.0000000
$`DM:smoking`
diff lwr upr p adj
Yes:Ex-smoker-No:Ex-smoker -2.3523349 -7.038172 2.333502 0.7063073
No:Never-No:Ex-smoker 0.3874644 -3.157700 3.932628 0.9996051
Yes:Never-No:Ex-smoker -0.6283320 -4.468240 3.211576 0.9972215
No:Smoker-No:Ex-smoker -7.9263910 -11.379305 -4.473477 0.0000000
Yes:Smoker-No:Ex-smoker -8.4386999 -12.640753 -4.236646 0.0000002
No:Never-Yes:Ex-smoker 2.7397993 -1.695507 7.175106 0.4895283
Yes:Never-Yes:Ex-smoker 1.7240030 -2.950251 6.398257 0.8994286
No:Smoker-Yes:Ex-smoker -5.5740560 -9.935979 -1.212133 0.0037712
Yes:Smoker-Yes:Ex-smoker -6.0863650 -11.062409 -1.110321 0.0066345
Yes:Never-No:Never -1.0157964 -4.545636 2.514043 0.9634988
No:Smoker-No:Never -8.3138554 -11.418286 -5.209425 0.0000000
Yes:Smoker-No:Never -8.8261643 -12.746895 -4.905434 0.0000000
No:Smoker-Yes:Never -7.2980590 -10.735237 -3.860881 0.0000000
Yes:Smoker-Yes:Never -7.8103679 -11.999501 -3.621235 0.0000019
Yes:Smoker-No:Smoker -0.5123090 -4.349828 3.325210 0.9989544
You can visualize the result with ggHSD() function. Because the result of HSD test in thsi case is a list of length 3, you can select the second list.
ggHSD(result,2,interactive=TRUE)Followings are othe examples of ggEffect()
ggEffect(age~Dx*sex,data=acs,interactive=TRUE)As you can see, male patient with acs are younger than female patient with acs
This plot is identical with the following plot
ggCatepillar(age~Dx+sex,data=acs,interactive=TRUE)The static plots(default: interactive=FALSE) are standard ggplots. You can save the plots with ggsave() function. You can make a static plot interactive by ggiraph() function. Interactive plots are htmlwidgets. You can save the plot with htmlwidgets::saveWidget() function.
p<-ggCatepillar(age~Dx+sex,data=acs)
ggsave("ggCatepillar.pdf",p)
p1<-ggiraph(code=print(p))
require(htmlwidgets)
saveWidget(p1,"ggCatepillar.html")Alternatively, you can make a interactive plot by setting the parameter interactive=TRUE. In this case, you can save as html file with saveWidget() function.
p2<-ggEffect(NTAV~age*smoking,data=radial,interactive=TRUE)
saveWidget(p2,"ggEffect.html")