Regression

In regression, we study how a response variable depends on one or more predictor variables.

Consider the Boston dataset from the MASS package, whose response variable is medv (median value of owner-occupied homes, in $1000s).

library(MASS)
data("Boston")
head(Boston)
##      crim zn indus chas   nox    rm  age    dis rad tax ptratio  black
## 1 0.00632 18  2.31    0 0.538 6.575 65.2 4.0900   1 296    15.3 396.90
## 2 0.02731  0  7.07    0 0.469 6.421 78.9 4.9671   2 242    17.8 396.90
## 3 0.02729  0  7.07    0 0.469 7.185 61.1 4.9671   2 242    17.8 392.83
## 4 0.03237  0  2.18    0 0.458 6.998 45.8 6.0622   3 222    18.7 394.63
## 5 0.06905  0  2.18    0 0.458 7.147 54.2 6.0622   3 222    18.7 396.90
## 6 0.02985  0  2.18    0 0.458 6.430 58.7 6.0622   3 222    18.7 394.12
##   lstat medv
## 1  4.98 24.0
## 2  9.14 21.6
## 3  4.03 34.7
## 4  2.94 33.4
## 5  5.33 36.2
## 6  5.21 28.7

Let us fit a multiple linear regression of medv on all the other variables in the data:

fitLM <- lm(medv ~ . , data = Boston)
summary(fitLM)
## 
## Call:
## lm(formula = medv ~ ., data = Boston)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.595  -2.730  -0.518   1.777  26.199 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.646e+01  5.103e+00   7.144 3.28e-12 ***
## crim        -1.080e-01  3.286e-02  -3.287 0.001087 ** 
## zn           4.642e-02  1.373e-02   3.382 0.000778 ***
## indus        2.056e-02  6.150e-02   0.334 0.738288    
## chas         2.687e+00  8.616e-01   3.118 0.001925 ** 
## nox         -1.777e+01  3.820e+00  -4.651 4.25e-06 ***
## rm           3.810e+00  4.179e-01   9.116  < 2e-16 ***
## age          6.922e-04  1.321e-02   0.052 0.958229    
## dis         -1.476e+00  1.995e-01  -7.398 6.01e-13 ***
## rad          3.060e-01  6.635e-02   4.613 5.07e-06 ***
## tax         -1.233e-02  3.760e-03  -3.280 0.001112 ** 
## ptratio     -9.527e-01  1.308e-01  -7.283 1.31e-12 ***
## black        9.312e-03  2.686e-03   3.467 0.000573 ***
## lstat       -5.248e-01  5.072e-02 -10.347  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.745 on 492 degrees of freedom
## Multiple R-squared:  0.7406, Adjusted R-squared:  0.7338 
## F-statistic: 108.1 on 13 and 492 DF,  p-value: < 2.2e-16
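
The last three lines of the summary can be reproduced by hand from the residuals, which helps clarify what they measure. The following is a minimal sketch, not part of the original analysis; the names rss, tss, n, p, r2, adj_r2 and rse are illustrative choices.

```r
library(MASS)   # provides the Boston data
data("Boston")

fitLM <- lm(medv ~ ., data = Boston)

# Residual sum of squares and total sum of squares
rss <- sum(residuals(fitLM)^2)
tss <- sum((Boston$medv - mean(Boston$medv))^2)

n <- nrow(Boston)              # number of observations
p <- length(coef(fitLM)) - 1   # number of predictors (intercept excluded)

r2     <- 1 - rss / tss                          # multiple R-squared
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)   # adjusted R-squared
rse    <- sqrt(rss / (n - p - 1))                # residual standard error

round(c(R2 = r2, adjR2 = adj_r2, RSE = rse), 4)
```

The three values should agree with the Multiple R-squared, Adjusted R-squared and Residual standard error reported by summary(fitLM), and n - p - 1 gives the 492 degrees of freedom.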

To view the coefficient table in a cleaner format, use the xtable package:

library(xtable)
options(xtable.comment=FALSE)
sfit <- summary(fitLM)
print(xtable(sfit), type="html",html.table.attributes="border=1")
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)   36.4595      5.1035     7.14    0.0000
crim          -0.1080      0.0329    -3.29    0.0011
zn             0.0464      0.0137     3.38    0.0008
indus          0.0206      0.0615     0.33    0.7383
chas           2.6867      0.8616     3.12    0.0019
nox          -17.7666      3.8197    -4.65    0.0000
rm             3.8099      0.4179     9.12    0.0000
age            0.0007      0.0132     0.05    0.9582
dis           -1.4756      0.1995    -7.40    0.0000
rad            0.3060      0.0663     4.61    0.0000
tax           -0.0123      0.0038    -3.28    0.0011
ptratio       -0.9527      0.1308    -7.28    0.0000
black          0.0093      0.0027     3.47    0.0006
lstat         -0.5248      0.0507   -10.35    0.0000
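
In the table above, very small p-values are rounded to 0.0000, which hides their magnitude. One possible workaround, sketched here with base R only, is to build the table from coef(summary(fitLM)) and format the p-value column with format.pval(). The data frame name tab, its column names, and the threshold eps = 1e-4 are illustrative assumptions, not part of the original analysis.

```r
library(MASS)   # provides the Boston data
data("Boston")

fitLM <- lm(medv ~ ., data = Boston)

# Coefficient matrix: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- as.data.frame(coef(summary(fitLM)))

# Round the numeric columns; format p-values separately so that
# values below eps are shown with a "<" prefix instead of as zero
tab <- data.frame(
  Estimate  = round(coefs[["Estimate"]], 4),
  Std.Error = round(coefs[["Std. Error"]], 4),
  t.value   = round(coefs[["t value"]], 2),
  p.value   = format.pval(coefs[["Pr(>|t|)"]], digits = 2, eps = 1e-4),
  row.names = rownames(coefs)
)

print(tab)
```

The resulting data frame could then be passed to xtable() in place of the summary object if HTML output is still wanted.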

Using the stargazer package

library(stargazer)
## 
## Please cite as:
##  Hlavac, Marek (2015). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2. http://CRAN.R-project.org/package=stargazer
stargazer(Boston,type = "text")
## 
## =============================================
## Statistic  N   Mean   St. Dev.  Min     Max  
## ---------------------------------------------
## crim      506  3.614   8.602   0.006  88.976 
## zn        506 11.364   23.322  0.000  100.000
## indus     506 11.137   6.860   0.460  27.740 
## chas      506  0.069   0.254     0       1   
## nox       506  0.555   0.116   0.385   0.871 
## rm        506  6.285   0.703   3.561   8.780 
## age       506 68.575   28.149  2.900  100.000
## dis       506  3.795   2.106   1.130  12.127 
## rad       506  9.549   8.707     1      24   
## tax       506 408.237 168.537   187     711  
## ptratio   506 18.456   2.165   12.600 22.000 
## black     506 356.674  91.295  0.320  396.900
## lstat     506 12.653   7.141   1.730  37.970 
## medv      506 22.533   9.197   5.000  50.000 
## ---------------------------------------------

Summary of the fitted model:

stargazer(fitLM, summary.logical = TRUE, type = "text")
## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                medv            
## -----------------------------------------------
## crim                         -0.108***         
##                               (0.033)          
##                                                
## zn                           0.046***          
##                               (0.014)          
##                                                
## indus                          0.021           
##                               (0.061)          
##                                                
## chas                         2.687***          
##                               (0.862)          
##                                                
## nox                         -17.767***         
##                               (3.820)          
##                                                
## rm                           3.810***          
##                               (0.418)          
##                                                
## age                            0.001           
##                               (0.013)          
##                                                
## dis                          -1.476***         
##                               (0.199)          
##                                                
## rad                          0.306***          
##                               (0.066)          
##                                                
## tax                          -0.012***         
##                               (0.004)          
##                                                
## ptratio                      -0.953***         
##                               (0.131)          
##                                                
## black                        0.009***          
##                               (0.003)          
##                                                
## lstat                        -0.525***         
##                               (0.051)          
##                                                
## Constant                     36.459***         
##                               (5.103)          
##                                                
## -----------------------------------------------
## Observations                    506            
## R2                             0.741           
## Adjusted R2                    0.734           
## Residual Std. Error      4.745 (df = 492)      
## F Statistic          108.077*** (df = 13; 492) 
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
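
Note that the stars in this table use different thresholds from those in summary(fitLM). Here they mark p<0.1, p<0.05 and p<0.01, while summary() marks p<0.05, p<0.01 and p<0.001. This is why crim (p = 0.001087) carries two stars in the first table and three here. If the two outputs should agree, one option is stargazer's star.cutoffs argument, as in this sketch. Storing the result in out is an illustrative choice.

```r
library(MASS)        # provides the Boston data
library(stargazer)   # same package used above
data("Boston")

fitLM <- lm(medv ~ ., data = Boston)

# Use the same cutoffs as summary(): * p<0.05, ** p<0.01, *** p<0.001
# stargazer prints the table and also returns its lines invisibly
out <- stargazer(fitLM, type = "text",
                 star.cutoffs = c(0.05, 0.01, 0.001))
```

With these cutoffs, crim, chas and tax should drop to two stars, matching the summary() output.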