Data Product module Week 4

Marie Dup
18th of November 2021

Linear regression model: mpg given transmission type

A simple example of how Shiny can be used to build a web application. We chose the mtcars dataset from R because we are familiar with it. From the left sidebar menu, the user can select a few additional predictors and then assess their impact on the linear regression model.
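At its core, such an app refits the model every time the sidebar selection changes. Below is a minimal sketch of that step in base R. It assumes the selected predictors arrive as a character vector of mtcars column names. `fit_mpg_model` is a hypothetical helper for illustration, not the app's actual server code.

```r
# Hypothetical helper (not the app's actual code): fit mpg on transmission (am)
# plus whatever extra predictors the user picked in the sidebar.
fit_mpg_model <- function(extra_predictors = character(0), data = mtcars) {
  # am is always kept; extra predictors are appended to the right-hand side
  rhs <- paste(c("am", extra_predictors), collapse = " + ")
  lm(as.formula(paste("mpg ~", rhs)), data = data)
}

base_fit <- fit_mpg_model()
extended_fit <- fit_mpg_model(c("wt", "qsec"))

# Adjusted R-squared penalises extra terms, so it is a fair quick comparison
summary(base_fit)$adj.r.squared
summary(extended_fit)$adj.r.squared
```

Inside a Shiny server function, the vector would typically come from an input such as `checkboxGroupInput`, and the fit-and-summarise call would sit inside `renderPrint()`.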

The mtcars Dataset

A summary table gives a quick sense of the variables available in the dataset and their ranges.

      mpg             cyl             disp             hp       
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
 Median :19.20   Median :6.000   Median :196.3   Median :123.0  
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
      drat             wt             qsec             vs        
 Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
 1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
 Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
 Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
 3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
 Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
       am              gear            carb      
 Min.   :0.0000   Min.   :3.000   Min.   :1.000  
 1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
 Median :0.0000   Median :4.000   Median :2.000  
 Mean   :0.4062   Mean   :3.688   Mean   :2.812  
 3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
 Max.   :1.0000   Max.   :5.000   Max.   :8.000  
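The table above is the output of `summary()` on the dataset. The short sketch below reproduces it and counts cars by transmission type. Per the R documentation for `mtcars`, `am` is coded 0 = automatic and 1 = manual, which is why its mean of 0.4062 is the share of manual cars.

```r
# Min, quartiles, median and mean for every column of the built-in dataset
summary(mtcars)

# Transmission is stored as 0/1; label it for readability
transmission <- factor(mtcars$am, levels = c(0, 1),
                       labels = c("automatic", "manual"))
table(transmission)
# automatic 19, manual 13
```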

mpg and transmission: a simple linear model

A simple chart suggests that transmission type has a substantial impact on mpg, with manual cars tending to get more miles per gallon.
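The original chart type is not stated, so the boxplot below is an assumption about one plausible way to draw it. It is followed by a Welch two-sample t-test, which checks whether the visible gap in mean mpg is statistically significant rather than merely apparent.

```r
# Compare the mpg distribution for automatic (0) and manual (1) cars
boxplot(mpg ~ am, data = mtcars,
        names = c("automatic", "manual"),
        xlab = "Transmission", ylab = "Miles per US gallon")

# Welch t-test (unequal variances) on mean mpg by transmission type
mpg_test <- t.test(mpg ~ am, data = mtcars)
mpg_test$p.value
```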

The simple model:

reg <- lm(mpg ~ am, data = mtcars)
coef(reg)
(Intercept)          am 
  17.147368    7.244939 
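With a single 0/1 predictor, the coefficients have a direct reading. The intercept is the mean mpg of automatic cars, and the intercept plus the `am` coefficient is the mean mpg of manual cars (17.147 + 7.245 ≈ 24.392). The quick check below confirms this against the group means.

```r
reg <- lm(mpg ~ am, data = mtcars)

# Mean mpg per transmission group (names "0" = automatic, "1" = manual)
group_means <- tapply(mtcars$mpg, mtcars$am, mean)

# Intercept equals the automatic mean; the am slope equals the difference in means
all.equal(unname(coef(reg)[1]), unname(group_means["0"]))                      # TRUE
all.equal(unname(coef(reg)[2]), unname(group_means["1"] - group_means["0"]))   # TRUE
```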

Benefit of the application

Using the application, we found that the best model with three predictors (am, wt and qsec), plus an interaction between transmission and weight, is:


Call:
lm(formula = mpg ~ am + wt + qsec + wt:am, data = mtcars[, ])

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5076 -1.3801 -0.5588  1.0630  4.3684 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    9.723      5.899   1.648 0.110893    
am            14.079      3.435   4.099 0.000341 ***
wt            -2.937      0.666  -4.409 0.000149 ***
qsec           1.017      0.252   4.035 0.000403 ***
am:wt         -4.141      1.197  -3.460 0.001809 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.084 on 27 degrees of freedom
Multiple R-squared:  0.8959,    Adjusted R-squared:  0.8804 
F-statistic: 58.06 on 4 and 27 DF,  p-value: 7.168e-13
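Because of the am:wt interaction, the manual-transmission effect is not a single number. It equals 14.079 - 4.141 × wt, where wt is weight in units of 1000 lb. Setting this to zero gives a crossover of about 14.079 / 4.141 ≈ 3.40. Under this model, manual cars are therefore predicted to get better mpg than otherwise similar automatic cars only below roughly 3,400 lb.

The sketch below refits the model, computes that crossover, and compares the model with the simple one using an F-test, which is valid here because the two models are nested. The object names are illustrative.

```r
simple_fit <- lm(mpg ~ am, data = mtcars)
best_fit <- lm(mpg ~ am + wt + qsec + am:wt, data = mtcars)

b <- coef(best_fit)
# Weight (1000 lb) at which the estimated manual-vs-automatic gap is zero
crossover_wt <- -b[["am"]] / b[["am:wt"]]
crossover_wt

# Nested-model F-test: do wt, qsec and am:wt add explanatory power?
anova(simple_fit, best_fit)
```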