3D Visualization of multiple regression model(2)

Keon-Woong Moon

2017-06-15

To reproduce this document, you need to install the R package ggiraphExtra from GitHub.
install.packages("devtools")
devtools::install_github("cardiomoon/ggiraphExtra")

This document is part 2 of the vignette. You can find part 1 at:

http://rpubs.com/cardiomoon/284987

Loading required packages

require(ggplot2)
require(plyr)
require(reshape2)
require(ggiraph)
require(rgl)
require(ggiraphExtra)
require(TH.data)   # for use of data GBSG2
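
Because require() only warns when a package is absent, it can help to check availability before running the rest of the document. The following is a minimal sketch using base R only; the variable names pkgs, installed and missing_pkgs are illustrative and not part of the original vignette.

```r
# Packages loaded in this vignette.
pkgs <- c("ggplot2", "plyr", "reshape2", "ggiraph", "rgl",
          "ggiraphExtra", "TH.data")

# requireNamespace() returns FALSE (instead of stopping) when a package
# cannot be loaded, so it is safe to call for every entry.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are available.")
}
```

The sketch only reports what is missing; it does not install anything.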

Logistic Regression

Multiple logistic regression model with one continuous and one categorical variable with interaction

You can use the glm() function to fit a logistic regression model. The GBSG2 data in the TH.data package contain data from the German Breast Cancer Study Group 2. Suppose you want to predict the outcome cens (the censoring indicator, where 1 marks an observed event) from the number of positive nodes (pnodes) and hormonal therapy (horTh).

require(TH.data) # for use data GBSG2
fit=glm(cens~pnodes*horTh,data=GBSG2,family=binomial)
summary(fit)

Call:
glm(formula = cens ~ pnodes * horTh, family = binomial, data = GBSG2)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.7892  -1.0208  -0.7573   1.2288   1.6667  

Coefficients:
                Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -0.55368    0.13942  -3.971 7.15e-05 ***
pnodes           0.08672    0.02172   3.993 6.53e-05 ***
horThyes        -0.69833    0.25394  -2.750  0.00596 ** 
pnodes:horThyes  0.06306    0.03899   1.617  0.10582    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 939.68  on 685  degrees of freedom
Residual deviance: 887.69  on 682  degrees of freedom
AIC: 895.69

Number of Fisher Scoring iterations: 4
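
To make the coefficient table concrete, the sketch below turns the printed estimates into predicted probabilities with the inverse logit. The coefficients are copied from the summary above; the helper prob_event() and the choice of 5 positive nodes are hypothetical and used only for illustration.

```r
# Coefficients copied from the summary(fit) output above.
b <- c(intercept = -0.55368, pnodes = 0.08672,
       horThyes = -0.69833, interaction = 0.06306)

# Hypothetical helper: predicted probability that cens = 1 for a given
# number of positive nodes and hormonal therapy status (0 = no, 1 = yes).
prob_event <- function(pnodes, horTh) {
  eta <- b[["intercept"]] + b[["pnodes"]] * pnodes +
    b[["horThyes"]] * horTh + b[["interaction"]] * pnodes * horTh
  plogis(eta)  # inverse logit: 1 / (1 + exp(-eta))
}

# Illustrative patient with 5 positive nodes, without and with therapy.
p_no  <- prob_event(5, 0)
p_yes <- prob_event(5, 1)
round(c(no = p_no, yes = p_yes), 3)
```

With the fitted model object available, predict(fit, newdata, type = "response") should give the same values up to the rounding of the printed coefficients.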

ggPredict() : 2D visualization with interaction

You can visualize this model with ggPredict(). This function uses ggiraph::geom_point_interactive() and ggiraph::geom_path_interactive() to make an interactive plot. You can identify the points and see the regression equation by hovering with your mouse. Unlike an ANCOVA model, in which the slopes of the regression lines are all the same, this model includes the pnodes:horTh interaction, so the plot shows two fitted curves, one for each level of horTh, that are not constrained to be parallel.

ggPredict(fit,interactive=TRUE)
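
Because the model contains the pnodes:horTh interaction, the two fitted curves have different slopes on the logit scale and therefore meet at some value of pnodes. As a rough sketch based only on the printed estimates (the variable names are illustrative), the crossing point is where the horTh effect and the interaction effect cancel:

```r
# Estimates copied from the summary(fit) output.
b_horTh <- -0.69833   # horThyes
b_inter <-  0.06306   # pnodes:horThyes

# The two linear predictors are equal when
#   b_horTh + b_inter * pnodes = 0
cross_pnodes <- -b_horTh / b_inter
round(cross_pnodes, 2)
```

This gives a crossing at roughly 11 positive nodes. The interaction term is not statistically significant in the summary (p = 0.106), so the location of the crossing is quite uncertain.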

ggPredict3d() : 3D visualization

You can make a 3D plot of this model with the ggPredict3d() function, which uses rgl::plot3d() to draw the plot. You can manipulate the plot with your mouse. By default, clicking and dragging with the left mouse button rotates the plot, the right mouse button (or the mouse wheel) zooms in and out, and the middle button changes the perspective (field of view).

ggPredict3d(fit,radius=0.5)

Multiple logistic regression model with two continuous variables with interaction

Suppose you want to predict cens from the number of positive nodes and the patient's age.

fit1=glm(cens~pnodes*age,data=GBSG2,family=binomial)
summary(fit1)

Call:
glm(formula = cens ~ pnodes * age, family = binomial, data = GBSG2)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.1166  -0.9791  -0.8979   1.2464   1.5204  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.4992499  0.6092464  -0.819    0.413
pnodes       0.0738909  0.0910865   0.811    0.417
age         -0.0053115  0.0112587  -0.472    0.637
pnodes:age   0.0006119  0.0016679   0.367    0.714

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 939.68  on 685  degrees of freedom
Residual deviance: 895.69  on 682  degrees of freedom
AIC: 903.69

Number of Fisher Scoring iterations: 4

2D visualization with interaction

You can visualize this model with ggPredict().

ggPredict(fit1,colorn=100,interactive=TRUE)

ggPredict3d() : 3D visualization

You can make a 3D plot of this model with the ggPredict3d() function. In this plot, the regression lines together form a plane.

ggPredict3d(fit1,radius=0.5)
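
If rgl is not available, a static view of the same fitted model can be sketched with base graphics. The example below evaluates the model on a grid using the coefficients printed in summary(fit1); the grid ranges are assumptions chosen for illustration and are not taken from the data. On the probability scale the fitted values form a gently curved surface.

```r
# Coefficients copied from the summary(fit1) output above.
b1 <- c(intercept = -0.4992499, pnodes = 0.0738909,
        age = -0.0053115, interaction = 0.0006119)

# Illustrative grid; these ranges are assumptions, not read from GBSG2.
pnodes_grid <- seq(0, 50, length.out = 26)
age_grid    <- seq(20, 80, length.out = 25)

# Predicted probability of cens = 1 at every grid combination.
prob_surface <- outer(pnodes_grid, age_grid, function(p, a) {
  plogis(b1[["intercept"]] + b1[["pnodes"]] * p + b1[["age"]] * a +
           b1[["interaction"]] * p * a)
})

# Static perspective plot of the fitted surface.
persp(pnodes_grid, age_grid, prob_surface,
      xlab = "pnodes", ylab = "age", zlab = "P(cens = 1)",
      theta = 40, phi = 20, ticktype = "detailed")
```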

Multiple logistic regression model with three predictor variables with interaction

You can fit a model with three predictor variables.

fit2=glm(cens~(pnodes+age)*horTh,data=GBSG2,family=binomial)
summary(fit2)

Call:
glm(formula = cens ~ (pnodes + age) * horTh, family = binomial, 
    data = GBSG2)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.8030  -1.0151  -0.7612   1.2388   1.6738  

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)     -0.813227   0.523562  -1.553    0.120    
pnodes           0.086474   0.021757   3.975 7.05e-05 ***
age              0.005110   0.009920   0.515    0.606    
horThyes        -0.352656   1.010005  -0.349    0.727    
pnodes:horThyes  0.063242   0.039017   1.621    0.105    
age:horThyes    -0.006628   0.017795  -0.372    0.710    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 939.68  on 685  degrees of freedom
Residual deviance: 887.42  on 680  degrees of freedom
AIC: 899.42

Number of Fisher Scoring iterations: 4

2D visualization with interaction

You can visualize this model with ggPredict().

ggPredict(fit2,colorn=100,interactive=TRUE)

ggPredict3d() : 3D visualization

You can make a 3D plot of this model with the ggPredict3d() function. In this plot, the regression lines form a plane for each level of horTh.

ggPredict3d(fit2,radius=0.5)

Alternatively, you can make an overlaid plot with the following R code. In this plot you can see that the regression planes cross each other.

ggPredict3d(fit2,radius=0.5,overlay=TRUE,show.legend=TRUE)

Multiple logistic regression model with three predictor variables without interaction

You can also fit a model without interaction.

fit3=glm(cens~pnodes+age+horTh,data=GBSG2,family=binomial)
summary(fit3)

Call:
glm(formula = cens ~ pnodes + age + horTh, family = binomial, 
    data = GBSG2)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.1386  -0.9945  -0.8250   1.2435   1.6142  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.793962   0.439612  -1.806   0.0709 .  
pnodes       0.108696   0.018180   5.979 2.25e-09 ***
age          0.002764   0.008248   0.335   0.7376    
horThyes    -0.408251   0.174513  -2.339   0.0193 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 939.68  on 685  degrees of freedom
Residual deviance: 890.29  on 682  degrees of freedom
AIC: 898.29

Number of Fisher Scoring iterations: 4
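
Since this model is nested in the one with (pnodes + age) * horTh, the two can be compared with a likelihood ratio test. The sketch below does the arithmetic from the residual deviances and degrees of freedom printed in the two summaries; the variable names are illustrative.

```r
# Residual deviance and degrees of freedom copied from the summaries.
dev_with    <- 887.42; df_with    <- 680   # fit2: (pnodes + age) * horTh
dev_without <- 890.29; df_without <- 682   # fit3: pnodes + age + horTh

# Likelihood ratio statistic and its degrees of freedom.
lr_stat <- dev_without - dev_with
lr_df   <- df_without - df_with

# Upper-tail chi-squared probability.
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
round(c(statistic = lr_stat, df = lr_df, p = p_value), 3)
```

The statistic is 2.87 on 2 degrees of freedom (p of about 0.24), so the printed output gives little evidence that the two interaction terms improve the fit. With the model objects available, anova(fit3, fit2, test = "Chisq") performs the same comparison.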

2D visualization without interaction

You can visualize this model with ggPredict().

ggPredict(fit3,colorn=100,interactive=TRUE)

ggPredict3d() : 3D visualization

You can make a 3D plot of this model with the ggPredict3d() function. In this plot, the regression planes are parallel.

ggPredict3d(fit3,radius=0.5,overlay=TRUE,show.legend=TRUE)
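
The parallelism can be checked numerically on the logit (linear predictor) scale: without interaction terms, the gap between the horTh = yes and horTh = no planes is the horThyes coefficient at every combination of pnodes and age. The sketch below uses the estimates printed in summary(fit3); the helper logit3() and the two example patients are hypothetical. Which scale ggPredict3d() draws on is not stated here, so this check refers to the linear predictor only.

```r
# Coefficients copied from the summary(fit3) output.
b3 <- c(intercept = -0.793962, pnodes = 0.108696,
        age = 0.002764, horThyes = -0.408251)

# Hypothetical helper: linear predictor (log-odds) of cens = 1.
logit3 <- function(pnodes, age, horTh) {
  b3[["intercept"]] + b3[["pnodes"]] * pnodes +
    b3[["age"]] * age + b3[["horThyes"]] * horTh
}

# Gap between the two planes at two very different illustrative points.
gap_a <- logit3(2, 40, 1) - logit3(2, 40, 0)
gap_b <- logit3(20, 70, 1) - logit3(20, 70, 0)

# The constant gap corresponds to a single odds ratio for hormonal therapy.
odds_ratio <- exp(b3[["horThyes"]])
round(c(gap_a = gap_a, gap_b = gap_b, odds_ratio = odds_ratio), 3)
```

Both gaps equal the horThyes estimate, and the corresponding odds ratio is about 0.66.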