Application of multivariate regression analysis

Minoo Ashtiani

IQVIA, second interview, first presentation

October 04, 2020

Multivariate Multiple Regression

  • A method for modeling several response variables at once, using a single shared set of predictor variables.
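
As a minimal sketch of the idea, the R snippet below fits two responses on one shared set of predictors by binding them with cbind() inside lm(). It uses R's built-in mtcars data purely as a stand-in (an assumption for illustration, not the data from this talk), and the object names are hypothetical.

```r
# Illustration only: mtcars is a built-in R dataset, not the data used in this talk.
# Two responses (mpg, disp) share one set of predictors (wt, hp).
fit_multi <- lm(cbind(mpg, disp) ~ wt + hp, data = mtcars)

# Separate single-response fits, for comparison.
fit_mpg  <- lm(mpg  ~ wt + hp, data = mtcars)
fit_disp <- lm(disp ~ wt + hp, data = mtcars)

# coef() returns a matrix with one column of estimates per response.
coef(fit_multi)

# The point estimates agree with the separate regressions.
same_mpg  <- all.equal(coef(fit_multi)[, "mpg"],  coef(fit_mpg))
same_disp <- all.equal(coef(fit_multi)[, "disp"], coef(fit_disp))
```

The coefficient estimates are the same either way; what the joint fit adds is the covariance between the responses' residuals, which the multivariate tests rely on.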

Application for a pharmacologist

  • A pharmacologist has collected data on seven variables from 17 overdoses of the drug amitriptyline. She wants to know how the two plasma-level responses, TOT and AMI, are related to predictor variables such as gender, AMT, and …
Variable  Role       Description
TOT       response   total TCAD plasma level
AMI       response   amount of amitriptyline present in the TCAD plasma level
GEN       predictor  gender (male = 0, female = 1)
AMT       predictor  amount of drug taken at time of overdose
PR        predictor  PR wave measurement
DIAP      predictor  diastolic blood pressure
QRS       predictor  QRS wave measurement

Data preparation

## # A tibble: 17 x 7
##      TOT   AMI   GEN   AMT    PR  DIAP   QRS
##    <int> <int> <int> <int> <int> <int> <int>
##  1  3389  3149     1  7500   220     0   140
##  2  1101   653     1  1975   200     0   100
##  3  1131   810     0  3600   205    60   111
##  4   596   448     1   675   160    60   120
##  5   896   844     1   750   185    70    83
##  6  1767  1450     1  2500   180    60    80
##  7   807   493     1   350   154    80    98
##  8  1111   941     0  1500   200    70    93
##  9   645   547     1   375   137    60   105
## 10   628   392     1  1050   167    60    74
## 11  1360  1283     1  3000   180    60    80
## 12   652   458     1   450   160    64    60
## 13   860   722     1  1750   135    90    79
## 14   500   384     0  2000   160    60    80
## 15   781   501     0  4500   180     0   100
## 16  1070   405     0  1500   170    90   120
## 17  1754  1520     1  3000   180     0   129
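
For readers who want to reproduce the analysis, the sketch below rebuilds the 17 x 7 data set from the values printed above. It assumes a base data.frame is an acceptable substitute for the tibble shown in the output; the name ami_data matches the one that appears in the model calls.

```r
# Values copied column by column from the printed table.
# Assumption: a base data.frame stands in for the tibble in the printed output.
ami_data <- data.frame(
  TOT  = c(3389, 1101, 1131, 596, 896, 1767, 807, 1111, 645, 628, 1360, 652, 860, 500, 781, 1070, 1754),
  AMI  = c(3149, 653, 810, 448, 844, 1450, 493, 941, 547, 392, 1283, 458, 722, 384, 501, 405, 1520),
  GEN  = c(1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1),
  AMT  = c(7500, 1975, 3600, 675, 750, 2500, 350, 1500, 375, 1050, 3000, 450, 1750, 2000, 4500, 1500, 3000),
  PR   = c(220, 200, 205, 160, 185, 180, 154, 200, 137, 167, 180, 160, 135, 160, 180, 170, 180),
  DIAP = c(0, 0, 60, 60, 70, 60, 80, 70, 60, 60, 60, 64, 90, 60, 0, 90, 0),
  QRS  = c(140, 100, 111, 120, 83, 80, 98, 93, 105, 74, 80, 60, 79, 80, 100, 120, 129)
)

# Quick sanity checks against the printed table.
dim(ami_data)                       # 17 rows, 7 columns
n_missing <- sum(is.na(ami_data))   # no missing values expected
table(ami_data$GEN)                 # counts of male (0) and female (1)
```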

Johnson, R.A. and Wichern, D.W., 2002. Applied multivariate statistical analysis (Vol. 5, No. 8). Upper Saddle River, NJ: Prentice Hall.

Scatterplot between variables
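
One possible way to draw such a plot is a base-graphics scatterplot matrix. The sketch below is an assumption about how the figure could be produced, not necessarily the code behind the original slide; it rebuilds the data from the printed table so that it runs on its own, and cor_mat is a hypothetical name.

```r
# Data copied from the printed table (base data.frame assumed).
ami_data <- data.frame(
  TOT  = c(3389, 1101, 1131, 596, 896, 1767, 807, 1111, 645, 628, 1360, 652, 860, 500, 781, 1070, 1754),
  AMI  = c(3149, 653, 810, 448, 844, 1450, 493, 941, 547, 392, 1283, 458, 722, 384, 501, 405, 1520),
  GEN  = c(1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1),
  AMT  = c(7500, 1975, 3600, 675, 750, 2500, 350, 1500, 375, 1050, 3000, 450, 1750, 2000, 4500, 1500, 3000),
  PR   = c(220, 200, 205, 160, 185, 180, 154, 200, 137, 167, 180, 160, 135, 160, 180, 170, 180),
  DIAP = c(0, 0, 60, 60, 70, 60, 80, 70, 60, 60, 60, 64, 90, 60, 0, 90, 0),
  QRS  = c(140, 100, 111, 120, 83, 80, 98, 93, 105, 74, 80, 60, 79, 80, 100, 120, 129)
)

# Scatterplot matrix: one panel for every pair of variables.
pairs(ami_data, main = "Amitriptyline overdose data")

# Pairwise correlations as a numeric companion to the plot.
cor_mat <- cor(ami_data)
round(cor_mat, 2)
```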

Multivariate multiple regression model

  • Model TOT and AMI as a function of GEN, AMT, PR, DIAP and QRS:
## Response TOT :
## 
## Call:
## lm(formula = TOT ~ GEN + AMT + PR + DIAP + QRS, data = ami_data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -399.2 -180.1    4.5  164.1  366.8 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.879e+03  8.933e+02  -3.224 0.008108 ** 
## GEN          6.757e+02  1.621e+02   4.169 0.001565 ** 
## AMT          2.848e-01  6.091e-02   4.677 0.000675 ***
## PR           1.027e+01  4.255e+00   2.414 0.034358 *  
## DIAP         7.251e+00  3.225e+00   2.248 0.046026 *  
## QRS          7.598e+00  3.849e+00   1.974 0.074006 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 281.2 on 11 degrees of freedom
## Multiple R-squared:  0.8871, Adjusted R-squared:  0.8358 
## F-statistic: 17.29 on 5 and 11 DF,  p-value: 6.983e-05
## 
## 
## Response AMI :
## 
## Call:
## lm(formula = AMI ~ GEN + AMT + PR + DIAP + QRS, data = ami_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -373.85 -247.29  -83.74  217.13  462.72 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.729e+03  9.288e+02  -2.938 0.013502 *  
## GEN          7.630e+02  1.685e+02   4.528 0.000861 ***
## AMT          3.064e-01  6.334e-02   4.837 0.000521 ***
## PR           8.896e+00  4.424e+00   2.011 0.069515 .  
## DIAP         7.206e+00  3.354e+00   2.149 0.054782 .  
## QRS          4.987e+00  4.002e+00   1.246 0.238622    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 292.4 on 11 degrees of freedom
## Multiple R-squared:  0.8764, Adjusted R-squared:  0.8202 
## F-statistic:  15.6 on 5 and 11 DF,  p-value: 0.0001132
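
Output of this shape can be obtained by binding the two responses with cbind() in a single lm() call, as in the sketch below. The data are rebuilt from the printed table so the block runs on its own; the names mlm1 and resid_cov are hypothetical and not taken from the slides.

```r
# Data copied from the printed table (base data.frame assumed).
ami_data <- data.frame(
  TOT  = c(3389, 1101, 1131, 596, 896, 1767, 807, 1111, 645, 628, 1360, 652, 860, 500, 781, 1070, 1754),
  AMI  = c(3149, 653, 810, 448, 844, 1450, 493, 941, 547, 392, 1283, 458, 722, 384, 501, 405, 1520),
  GEN  = c(1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1),
  AMT  = c(7500, 1975, 3600, 675, 750, 2500, 350, 1500, 375, 1050, 3000, 450, 1750, 2000, 4500, 1500, 3000),
  PR   = c(220, 200, 205, 160, 185, 180, 154, 200, 137, 167, 180, 160, 135, 160, 180, 170, 180),
  DIAP = c(0, 0, 60, 60, 70, 60, 80, 70, 60, 60, 60, 64, 90, 60, 0, 90, 0),
  QRS  = c(140, 100, 111, 120, 83, 80, 98, 93, 105, 74, 80, 60, 79, 80, 100, 120, 129)
)

# Both responses are modeled on the same five predictors.
mlm1 <- lm(cbind(TOT, AMI) ~ GEN + AMT + PR + DIAP + QRS, data = ami_data)

summary(mlm1)   # prints one coefficient table per response
coef(mlm1)      # 6 x 2 matrix: rows are terms, columns are responses

# Estimated residual covariance matrix of the two responses.
# The square roots of its diagonal are the residual standard errors.
resid_cov <- crossprod(resid(mlm1)) / df.residual(mlm1)
sqrt(diag(resid_cov))
```

The off-diagonal entry of resid_cov is the part a pair of separate regressions would ignore: it measures how the TOT and AMI residuals move together.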

MANOVA

##           Df  Pillai approx F num Df den Df    Pr(>F)    
## GEN        1 0.36666    2.895      2     10    0.1019    
## AMT        1 0.86998   33.455      2     10 3.716e-05 ***
## PR         1 0.28438    1.987      2     10    0.1877    
## DIAP       1 0.26466    1.800      2     10    0.2150    
## QRS        1 0.29184    2.061      2     10    0.1781    
## Residuals 11                                             
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##  Response TOT :
##             Df  Sum Sq Mean Sq F value   Pr(>F)    
## GEN          1  288658  288658  3.6497  0.08248 .  
## AMT          1 5616926 5616926 71.0179 3.97e-06 ***
## PR           1  341134  341134  4.3131  0.06204 .  
## DIAP         1  280973  280973  3.5525  0.08613 .  
## QRS          1  308241  308241  3.8973  0.07401 .  
## Residuals   11  870008   79092                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Response AMI :
##             Df  Sum Sq Mean Sq F value    Pr(>F)    
## GEN          1  532382  532382  6.2253   0.02977 *  
## AMT          1 5457338 5457338 63.8143 6.623e-06 ***
## PR           1  227012  227012  2.6545   0.13153    
## DIAP         1  320151  320151  3.7436   0.07913 .  
## QRS          1  132786  132786  1.5527   0.23862    
## Residuals   11  940709   85519                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
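
Tables of this shape can be produced with manova() followed by summary() and summary.aov(), as sketched below. This is an assumed reconstruction, not necessarily the exact calls behind the slide, and the names fit and pillai are hypothetical. These tests are sequential: each term is tested after the terms listed before it, which is why GEN, entered first, looks weaker here than in the coefficient tables, while QRS, entered last, has the same p-value in both.

```r
# Data copied from the printed table (base data.frame assumed).
ami_data <- data.frame(
  TOT  = c(3389, 1101, 1131, 596, 896, 1767, 807, 1111, 645, 628, 1360, 652, 860, 500, 781, 1070, 1754),
  AMI  = c(3149, 653, 810, 448, 844, 1450, 493, 941, 547, 392, 1283, 458, 722, 384, 501, 405, 1520),
  GEN  = c(1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1),
  AMT  = c(7500, 1975, 3600, 675, 750, 2500, 350, 1500, 375, 1050, 3000, 450, 1750, 2000, 4500, 1500, 3000),
  PR   = c(220, 200, 205, 160, 185, 180, 154, 200, 137, 167, 180, 160, 135, 160, 180, 170, 180),
  DIAP = c(0, 0, 60, 60, 70, 60, 80, 70, 60, 60, 60, 64, 90, 60, 0, 90, 0),
  QRS  = c(140, 100, 111, 120, 83, 80, 98, 93, 105, 74, 80, 60, 79, 80, 100, 120, 129)
)

fit <- manova(cbind(TOT, AMI) ~ GEN + AMT + PR + DIAP + QRS, data = ami_data)

# Multivariate tests: one Pillai trace per term, both responses jointly.
pillai <- summary(fit, test = "Pillai")
pillai

# Univariate follow-up: a separate sequential ANOVA table per response.
summary.aov(fit)
```

Because the tests are sequential, reordering the terms in the formula can change every row except the last.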

Comparing models with different sets of predictors

  • Based on these results, we may want to check whether a model with just GEN and AMT fits as well as the model with all five predictors.
## Analysis of Variance Table
## 
## Model 1: cbind(TOT, AMI) ~ GEN + AMT + PR + DIAP + QRS
## Model 2: cbind(TOT, AMI) ~ GEN + AMT
##   Res.Df Df Gen.var.  Pillai approx F num Df den Df Pr(>F)
## 1     11       43803                                      
## 2     14  3    51856 0.60386   1.5859      6     22 0.1983
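
A comparison of this kind can be sketched by refitting without PR, DIAP and QRS and passing both fits to anova(), which tests the three dropped predictors jointly across both responses. The data are rebuilt from the printed table; the names mlm1, mlm2 and cmp are hypothetical.

```r
# Data copied from the printed table (base data.frame assumed).
ami_data <- data.frame(
  TOT  = c(3389, 1101, 1131, 596, 896, 1767, 807, 1111, 645, 628, 1360, 652, 860, 500, 781, 1070, 1754),
  AMI  = c(3149, 653, 810, 448, 844, 1450, 493, 941, 547, 392, 1283, 458, 722, 384, 501, 405, 1520),
  GEN  = c(1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1),
  AMT  = c(7500, 1975, 3600, 675, 750, 2500, 350, 1500, 375, 1050, 3000, 450, 1750, 2000, 4500, 1500, 3000),
  PR   = c(220, 200, 205, 160, 185, 180, 154, 200, 137, 167, 180, 160, 135, 160, 180, 170, 180),
  DIAP = c(0, 0, 60, 60, 70, 60, 80, 70, 60, 60, 60, 64, 90, 60, 0, 90, 0),
  QRS  = c(140, 100, 111, 120, 83, 80, 98, 93, 105, 74, 80, 60, 79, 80, 100, 120, 129)
)

# Full model with all five predictors.
mlm1 <- lm(cbind(TOT, AMI) ~ GEN + AMT + PR + DIAP + QRS, data = ami_data)

# Reduced model: drop PR, DIAP and QRS, keep GEN and AMT.
mlm2 <- update(mlm1, . ~ . - PR - DIAP - QRS)

# Multivariate test (Pillai by default) of the full against the reduced model.
cmp <- anova(mlm1, mlm2)
cmp
```

With the p-value of about 0.198 shown in the table above, there is no evidence at the 5% level that PR, DIAP and QRS jointly improve the fit, so the smaller model appears adequate for these 17 cases.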