3D Visualization of multiple regression model(1)

Keon-Woong Moon

2017-06-15

To reproduce this document, you have to install R package ggiraphExtra from github.
install.packages("devtools")
devtools::install_github("cardiomoon/ggiraphExtra")

This documnet is the vignette part 1. You can find the vignette part 2 at:

http://rpubs.com/cardiomoon/284985

Loading required packages

require(ggplot2)
require(plyr)
require(reshape2)
require(ggiraph)
require(rgl)
require(ggiraphExtra)
require(moonBook)   # for use of data radial

Linear gegression

One-way analysis of variance with one covariate(ANCOVA)

The radial data contains demographic data and laboratory data of 115 patients performing IVUS(intravascular ultrasound) examination of a radial artery after tansradial coronary angiography. The NTAV(normalized total atheroma volume measured by intravascular ultrasound(IVUS) in cubic mm) is a quantitative measurement of atherosclerosis. Suppose you want to predict the amount of atherosclerosis(NTAV) from age and status of smoking.

fit=lm(NTAV~age+smoking,data=radial)
summary(fit)

Call:
lm(formula = NTAV ~ age + smoking, data = radial)

Residuals:
    Min      1Q  Median      3Q     Max 
-52.799 -14.154  -3.824   9.303  91.393 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)  
(Intercept)        26.5082    17.9231   1.479   0.1420  
age                 0.5515     0.2323   2.374   0.0193 *
smokingnon-smoker   3.8101     8.7821   0.434   0.6652  
smokingsmoker      15.8976     9.4225   1.687   0.0944 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 23.47 on 111 degrees of freedom
Multiple R-squared:  0.0787,    Adjusted R-squared:  0.0538 
F-statistic: 3.161 on 3 and 111 DF,  p-value: 0.02749

ggPredict() : 2D visualization with interaction

You can visualize this model with ggPredict(). This function uses ggiraph::geom_point_interactive() and ggiraph::geom_path_interactive() functions to make a interactive plot. You can identify the points and see the regression equation with your mouse. In ANCOVA model, the slope of regression lines are all the same. You can see three parallel lines in this model.

ggPredict(fit,interactive=TRUE)

ggPredict3d() : 3D visualization

You can make 3D plot for this model with ggPredict3d() function. This function uses rgl::plot3d() function to make 3d plot. You can use your mouse to manipulate the plot. The default is that if you click and hold with the left mouse button, you can rotate the plot by dragging it. The right mouse button(or the mouse wheel) is used to resize it, and the middle button changes the perspective in the point of view.

ggPredict3d(fit,radius=2)

Alternatively, you can make the facetted plot according to categorical variable. Set the overlay parameter FALSE and you can get a facetted plot.

ggPredict3d(fit,radius=2,overlay=FALSE)

You can make a HTML file(Write HTML and Javascript to display a scene in a web browser) with writeWebGL() function.

writeWebGL("test.html")

You man make a snapshot and save as a *.png file with snapshot3d() function.

snapshot3d("test1.png")

One-way ANCOVA with interaction

You can make model with interaction with same predictor variables.

fit1=lm(NTAV~age*smoking,data=radial)
summary(fit1)

Call:
lm(formula = NTAV ~ age * smoking, data = radial)

Residuals:
   Min     1Q Median     3Q    Max 
-51.60 -15.38  -2.80   8.20  87.05 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)            44.46124   58.73248   0.757    0.451
age                     0.28890    0.85036   0.340    0.735
smokingnon-smoker       1.06313   62.02355   0.017    0.986
smokingsmoker         -29.57296   63.48579  -0.466    0.642
age:smokingnon-smoker   0.02888    0.90286   0.032    0.975
age:smokingsmoker       0.72427    0.93927   0.771    0.442

Residual standard error: 23.47 on 109 degrees of freedom
Multiple R-squared:  0.09553,   Adjusted R-squared:  0.05405 
F-statistic: 2.303 on 5 and 109 DF,  p-value: 0.04956

In one-way ANCOVA with interaction model, the slope of regression lines are different. The regression equations according to smoking status are:

For ex-smoker, y=0.29*x+44.46
For non-smoker, y=0.32*x+45.52
For smoker, y=1.01*x+14.89

2D visualization

Because the slopes of regression lines are different in one-way ANCOVA with interaction model, you can see the regression lines cross each other.

ggPredict(fit1,interactive=TRUE)

3D visualization

You can make 3D plot for this model with ggPredict3d() function.

ggPredict3d(fit1,radius=2)

One-way ANOVA with two covariates

Suppose you want to predict the amount of atherosclerosis(NTAV) from age and body weight and the status of smoking.

fit2=lm(NTAV~age+weight+smoking,data=radial)
summary(fit2)

Call:
lm(formula = NTAV ~ age + weight + smoking, data = radial)

Residuals:
    Min      1Q  Median      3Q     Max 
-54.210 -12.984  -3.611   7.997  87.451 

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)   
(Intercept)       -29.6078    25.1687  -1.176  0.24199   
age                 0.7000     0.2292   3.054  0.00283 **
weight              0.7489     0.2442   3.067  0.00272 **
smokingnon-smoker   4.1419     8.4681   0.489  0.62573   
smokingsmoker      13.2881     9.1246   1.456  0.14816   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 22.63 on 110 degrees of freedom
Multiple R-squared:  0.1513,    Adjusted R-squared:  0.1204 
F-statistic: 4.901 on 4 and 110 DF,  p-value: 0.001127

In this model, the regression equations according to the smoking status are:

For ex-smoker, NTAV=0.7*age+0.75*weight+-29.61
For non-smoker,  NTAV=0.7*age+0.75*weight+-25.47
For smoker,NTAV=0.7*age+0.75*weight+-16.32

2D visualization

Because the slopes of regression lines are all the same, you can see the regression planes are all parallel.

ggPredict(fit2,interactive=TRUE)

3D visualization

You can make 3D plot for this model with ggPredict3d() function.

ggPredict3d(fit2,radius=2)

You can make overlayed plot with the following R code.

ggPredict3d(fit2,radius=2,overlay=TRUE)

One-way ANOVA with two covariates inth interaction

You can make model with interaction with same predictor variables.

fit3=lm(NTAV~(age+weight)*smoking,data=radial)
summary(fit3)

Call:
lm(formula = NTAV ~ (age + weight) * smoking, data = radial)

Residuals:
    Min      1Q  Median      3Q     Max 
-53.515 -13.451  -4.229   7.816  83.795 

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)              -73.9307   130.3128  -0.567    0.572
age                        0.9075     1.0301   0.881    0.380
weight                     1.2398     1.2260   1.011    0.314
smokingnon-smoker         72.4766   134.1433   0.540    0.590
smokingsmoker             30.1693   135.4396   0.223    0.824
age:smokingnon-smoker     -0.4406     1.0747  -0.410    0.683
age:smokingsmoker          0.1472     1.1012   0.134    0.894
weight:smokingnon-smoker  -0.6346     1.2683  -0.500    0.618
weight:smokingsmoker      -0.3964     1.2925  -0.307    0.760

Residual standard error: 22.88 on 106 degrees of freedom
Multiple R-squared:  0.1644,    Adjusted R-squared:  0.1014 
F-statistic: 2.607 on 8 and 106 DF,  p-value: 0.0121

2D visualization

Because the slopes of regression lines are different, you can see the regression lines are not parallel among groups.

ggPredict(fit3,interactive=TRUE)

3D visualization

You can make 3D plot for this model with ggPredict3d() function.

ggPredict3d(fit3,radius=2)

You can make overlayed plot with the following R code. In this plot you can see the regression planes cross each other.

ggPredict3d(fit3,radius=2,overlay=TRUE)

Two-way factorial ANOVA with one covariate with completely crossing

You can visualize the two-way factorial ANOVA model.

fit4=lm(NTAV~age*DM*smoking,data=radial)
summary(fit4)

Call:
lm(formula = NTAV ~ age * DM * smoking, data = radial)

Residuals:
    Min      1Q  Median      3Q     Max 
-52.869 -12.669  -2.989  10.970  80.747 

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)
(Intercept)               3.931e+01  6.283e+01   0.626    0.533
age                       3.643e-01  9.439e-01   0.386    0.700
DM                        1.357e+02  3.109e+02   0.436    0.663
smokingnon-smoker        -5.997e-01  6.686e+01  -0.009    0.993
smokingsmoker            -1.198e+01  6.766e+01  -0.177    0.860
age:DM                   -1.785e+00  4.111e+00  -0.434    0.665
age:smokingnon-smoker    -4.112e-03  1.007e+00  -0.004    0.997
age:smokingsmoker         5.599e-01  1.034e+00   0.541    0.589
DM:smokingnon-smoker     -1.071e+02  3.138e+02  -0.341    0.733
DM:smokingsmoker         -2.679e+02  3.183e+02  -0.842    0.402
age:DM:smokingnon-smoker  1.523e+00  4.160e+00   0.366    0.715
age:DM:smokingsmoker      3.585e+00  4.256e+00   0.842    0.402

Residual standard error: 22.63 on 103 degrees of freedom
Multiple R-squared:  0.2054,    Adjusted R-squared:  0.1205 
F-statistic:  2.42 on 11 and 103 DF,  p-value: 0.01014
ggPredict(fit4,interactive=TRUE)

3D visualization

You can make 3D plot for this model with ggPredict3d() function.

ggPredict3d(fit4,radius=2)

You can make facetted plot with the following R code. In this plot you can see the regression planes cross each other.

ggPredict3d(fit4,radius=2,overlay=TRUE)