head(mtcars)
library(data.table)
library(readr)
library(readxl)
library(ggplot2)
library(ggmosaic)
library(GGally)
Registered S3 method overwritten by 'GGally':
method from
+.gg ggplot2
Attaching package: ‘GGally’
The following object is masked from ‘package:ggmosaic’:
happy
library(corrplot)
corrplot 0.94 loaded
library(car)
Loading required package: carData
setdir <- "C:/Users/Risky's/Documents/R/R Notebook/mtcars/"
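fwrite(mtcars, paste0(setdir, "mtcars.csv"))
# Read the CSV back in as a data.table; dt_mtcars is used from here on
dt_mtcars <- fread(paste0(setdir, "mtcars.csv"))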
str(dt_mtcars)
Classes ‘data.table’ and 'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : int 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : int 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : int 0 0 1 1 0 1 0 1 1 1 ...
$ am : int 1 1 1 0 0 0 0 0 0 0 ...
$ gear: int 4 4 4 3 3 3 3 4 4 4 ...
$ carb: int 4 4 1 1 2 1 4 2 2 4 ...
- attr(*, ".internal.selfref")=<externalptr>
summary(dt_mtcars)
mpg cyl disp hp drat
Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0 Min. :2.760
1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5 1st Qu.:3.080
Median :19.20 Median :6.000 Median :196.3 Median :123.0 Median :3.695
Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7 Mean :3.597
3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0 3rd Qu.:3.920
Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0 Max. :4.930
wt qsec vs am gear
Min. :1.513 Min. :14.50 Min. :0.0000 Min. :0.0000 Min. :3.000
1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:3.000
Median :3.325 Median :17.71 Median :0.0000 Median :0.0000 Median :4.000
Mean :3.217 Mean :17.85 Mean :0.4375 Mean :0.4062 Mean :3.688
3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:4.000
Max. :5.424 Max. :22.90 Max. :1.0000 Max. :1.0000 Max. :5.000
carb
Min. :1.000
1st Qu.:2.000
Median :2.000
Mean :2.812
3rd Qu.:4.000
Max. :8.000
dim(dt_mtcars)
[1] 32 11
quantile(dt_mtcars [, wt])
0% 25% 50% 75% 100%
1.51300 2.58125 3.32500 3.61000 5.42400
print(summary(dt_mtcars))
mpg cyl disp
Min. :10.40 Min. :4.000 Min. : 71.1
1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8
Median :19.20 Median :6.000 Median :196.3
Mean :20.09 Mean :6.188 Mean :230.7
3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0
Max. :33.90 Max. :8.000 Max. :472.0
hp drat wt
Min. : 52.0 Min. :2.760 Min. :1.513
1st Qu.: 96.5 1st Qu.:3.080 1st Qu.:2.581
Median :123.0 Median :3.695 Median :3.325
Mean :146.7 Mean :3.597 Mean :3.217
3rd Qu.:180.0 3rd Qu.:3.920 3rd Qu.:3.610
Max. :335.0 Max. :4.930 Max. :5.424
qsec vs am
Min. :14.50 Min. :0.0000 Min. :0.0000
1st Qu.:16.89 1st Qu.:0.0000 1st Qu.:0.0000
Median :17.71 Median :0.0000 Median :0.0000
Mean :17.85 Mean :0.4375 Mean :0.4062
3rd Qu.:18.90 3rd Qu.:1.0000 3rd Qu.:1.0000
Max. :22.90 Max. :1.0000 Max. :1.0000
gear carb
Min. :3.000 Min. :1.000
1st Qu.:3.000 1st Qu.:2.000
Median :4.000 Median :2.000
Mean :3.688 Mean :2.812
3rd Qu.:4.000 3rd Qu.:4.000
Max. :5.000 Max. :8.000
print(colSums(is.na(dt_mtcars)))
mpg cyl disp hp drat wt qsec vs am gear
0 0 0 0 0 0 0 0 0 0
carb
0
ggpairs(dt_mtcars)
cor(dt_mtcars [,.(mpg,disp,wt)])
mpg disp wt
mpg 1.0000000 -0.8475514 -0.8676594
disp -0.8475514 1.0000000 0.8879799
wt -0.8676594 0.8879799 1.0000000
cor(dt_mtcars [, 1:11])
mpg cyl disp hp drat wt qsec vs
mpg 1.0000000 -0.8521620 -0.8475514 -0.7761684 0.68117191 -0.8676594 0.41868403 0.6640389
cyl -0.8521620 1.0000000 0.9020329 0.8324475 -0.69993811 0.7824958 -0.59124207 -0.8108118
disp -0.8475514 0.9020329 1.0000000 0.7909486 -0.71021393 0.8879799 -0.43369788 -0.7104159
hp -0.7761684 0.8324475 0.7909486 1.0000000 -0.44875912 0.6587479 -0.70822339 -0.7230967
drat 0.6811719 -0.6999381 -0.7102139 -0.4487591 1.00000000 -0.7124406 0.09120476 0.4402785
wt -0.8676594 0.7824958 0.8879799 0.6587479 -0.71244065 1.0000000 -0.17471588 -0.5549157
qsec 0.4186840 -0.5912421 -0.4336979 -0.7082234 0.09120476 -0.1747159 1.00000000 0.7445354
vs 0.6640389 -0.8108118 -0.7104159 -0.7230967 0.44027846 -0.5549157 0.74453544 1.0000000
am 0.5998324 -0.5226070 -0.5912270 -0.2432043 0.71271113 -0.6924953 -0.22986086 0.1683451
gear 0.4802848 -0.4926866 -0.5555692 -0.1257043 0.69961013 -0.5832870 -0.21268223 0.2060233
carb -0.5509251 0.5269883 0.3949769 0.7498125 -0.09078980 0.4276059 -0.65624923 -0.5696071
am gear carb
mpg 0.59983243 0.4802848 -0.55092507
cyl -0.52260705 -0.4926866 0.52698829
disp -0.59122704 -0.5555692 0.39497686
hp -0.24320426 -0.1257043 0.74981247
drat 0.71271113 0.6996101 -0.09078980
wt -0.69249526 -0.5832870 0.42760594
qsec -0.22986086 -0.2126822 -0.65624923
vs 0.16834512 0.2060233 -0.56960714
am 1.00000000 0.7940588 0.05753435
gear 0.79405876 1.0000000 0.27407284
carb 0.05753435 0.2740728 1.00000000
#Simple Regression
Dependent variable = mpg
Independent variable = am (transmission)
Goal: investigate whether transmission type (automatic = 0, manual = 1) is a significant predictor of miles per gallon (mpg).
H0: Mean mpg does not differ between automatic and manual cars.
H1: Mean mpg differs between automatic and manual cars.
shapiro.test(dt_mtcars [, mpg])
Shapiro-Wilk normality test
data: dt_mtcars[, mpg]
W = 0.94756, p-value = 0.1229
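The Shapiro-Wilk null hypothesis is that mpg is normally distributed. Since p = 0.1229 > 0.05, we fail to reject it, so treating mpg as approximately normal is reasonable.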
t.test(mpg ~am, data= dt_mtcars)
Welch Two Sample t-test
data: mpg by am
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
-11.280194 -3.209684
sample estimates:
mean in group 0 mean in group 1
17.14737 24.39231
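p = 0.001374 < α = 0.05, so H0 is rejected: mean mpg differs significantly between automatic (17.15) and manual (24.39) cars.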
#Simple Linear Regression
simregression <- lm(mpg ~ am, data= dt_mtcars)
summary(simregression)
Call:
lm(formula = mpg ~ am, data = dt_mtcars)
Residuals:
Min 1Q Median 3Q Max
-9.3923 -3.0923 -0.2974 3.2439 9.5077
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.147 1.125 15.247 1.13e-15 ***
am 7.245 1.764 4.106 0.000285 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 4.902 on 30 degrees of freedom
Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
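With a single 0/1 predictor, the regression slope is simply the difference between the two group means, and the regression t-test coincides with the pooled-variance (Student) two-sample t-test rather than the Welch test. The sketch below checks both facts. It assumes the built-in `mtcars` data frame in place of `dt_mtcars`, and the names `fit`, `group_means`, and `pooled` are illustrative only.

```r
# Uses the built-in mtcars data so the chunk runs on its own.
fit <- lm(mpg ~ am, data = mtcars)

# Slope for a 0/1 predictor = mean(manual) - mean(automatic)
group_means <- tapply(mtcars$mpg, mtcars$am, mean)
slope_from_means <- unname(group_means["1"] - group_means["0"])
slope_from_lm <- unname(coef(fit)["am"])

# lm() assumes equal variances, so its test matches t.test(var.equal = TRUE)
pooled <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)
p_from_lm <- summary(fit)$coefficients["am", "Pr(>|t|)"]

print(c(slope_from_means = slope_from_means, slope_from_lm = slope_from_lm))
print(c(p_pooled = pooled$p.value, p_from_lm = p_from_lm))
```

This is why the Welch test reports p = 0.001374 while the regression reports p = 0.000285: the estimated difference is the same, but the variance assumption differs.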
simregression2 <- lm(mpg ~ cyl, data = dt_mtcars)
summary(simregression2)
Call:
lm(formula = mpg ~ cyl, data = dt_mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.9814 -2.1185 0.2217 1.0717 7.5186
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.8846 2.0738 18.27 < 2e-16 ***
cyl -2.8758 0.3224 -8.92 6.11e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.206 on 30 degrees of freedom
Multiple R-squared: 0.7262, Adjusted R-squared: 0.7171
F-statistic: 79.56 on 1 and 30 DF, p-value: 6.113e-10
simregression3 <- lm(mpg ~ as.factor(cyl), data=dt_mtcars )
summary(simregression3)
Call:
lm(formula = mpg ~ as.factor(cyl), data = dt_mtcars)
Residuals:
Min 1Q Median 3Q Max
-5.2636 -1.8357 0.0286 1.3893 7.2364
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 26.6636 0.9718 27.437 < 2e-16 ***
as.factor(cyl)6 -6.9208 1.5583 -4.441 0.000119 ***
as.factor(cyl)8 -11.5636 1.2986 -8.905 8.57e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.223 on 29 degrees of freedom
Multiple R-squared: 0.7325, Adjusted R-squared: 0.714
F-statistic: 39.7 on 2 and 29 DF, p-value: 4.979e-09
dt_mtcars [, cyl6 := as.integer (ifelse(cyl==6, 1,0))]
dt_mtcars [, cyl8 := as.integer (ifelse(cyl==8, 1,0))]
dt_mtcars
simregression4 <- lm(mpg ~ cyl6 + cyl8, data = dt_mtcars)
summary(simregression4)
Call:
lm(formula = mpg ~ cyl6 + cyl8, data = dt_mtcars)
Residuals:
Min 1Q Median 3Q Max
-5.2636 -1.8357 0.0286 1.3893 7.2364
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 26.6636 0.9718 27.437 < 2e-16 ***
cyl6 -6.9208 1.5583 -4.441 0.000119 ***
cyl8 -11.5636 1.2986 -8.905 8.57e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.223 on 29 degrees of freedom
Multiple R-squared: 0.7325, Adjusted R-squared: 0.714
F-statistic: 39.7 on 2 and 29 DF, p-value: 4.979e-09
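The two summaries above are identical because `as.factor(cyl)` makes R build the same 0/1 indicator columns that were created by hand, with 4 cylinders as the reference level. A minimal sketch of that equivalence, assuming the built-in `mtcars` data frame (the names `cars_df`, `fit_factor`, and `fit_dummy` are illustrative):

```r
cars_df <- mtcars  # built-in data; stands in for dt_mtcars
cars_df$cyl6 <- as.integer(cars_df$cyl == 6)
cars_df$cyl8 <- as.integer(cars_df$cyl == 8)

fit_factor <- lm(mpg ~ as.factor(cyl), data = cars_df)
fit_dummy  <- lm(mpg ~ cyl6 + cyl8, data = cars_df)

# Same coefficients (only the names differ) and same fitted values
same_coefs <- all.equal(unname(coef(fit_factor)), unname(coef(fit_dummy)))
same_fits  <- all.equal(fitted(fit_factor), fitted(fit_dummy))
print(c(same_coefs = isTRUE(same_coefs), same_fits = isTRUE(same_fits)))

# The intercept is the mean mpg of the reference group (4 cylinders)
ref_mean <- mean(cars_df$mpg[cars_df$cyl == 4])
print(ref_mean)
```

Each dummy coefficient is therefore the difference in mean mpg between that cylinder group and the 4-cylinder group.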
#MULTIPLE LINEAR REGRESSION
multiregression <- lm(mpg ~ ., data = dt_mtcars)
summary(multiregression)
Call:
lm(formula = mpg ~ ., data = dt_mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.4506 -1.6044 -0.1196 1.2193 4.6271
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.30337 18.71788 0.657 0.5181
cyl -0.11144 1.04502 -0.107 0.9161
disp 0.01334 0.01786 0.747 0.4635
hp -0.02148 0.02177 -0.987 0.3350
drat 0.78711 1.63537 0.481 0.6353
wt -3.71530 1.89441 -1.961 0.0633 .
qsec 0.82104 0.73084 1.123 0.2739
vs 0.31776 2.10451 0.151 0.8814
am 2.52023 2.05665 1.225 0.2340
gear 0.65541 1.49326 0.439 0.6652
carb -0.19942 0.82875 -0.241 0.8122
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.65 on 21 degrees of freedom
Multiple R-squared: 0.869, Adjusted R-squared: 0.8066
F-statistic: 13.93 on 10 and 21 DF, p-value: 3.793e-07
back_multiregression <- step(multiregression, direction ="backward", trace = FALSE)
summary(back_multiregression)
Call:
lm(formula = mpg ~ wt + qsec + am, data = dt_mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.4811 -1.5555 -0.7257 1.4110 4.6610
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.6178 6.9596 1.382 0.177915
wt -3.9165 0.7112 -5.507 6.95e-06 ***
qsec 1.2259 0.2887 4.247 0.000216 ***
am 2.9358 1.4109 2.081 0.046716 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.459 on 28 degrees of freedom
Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
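`step()` ranks candidate models by AIC, not by p-values: a predictor is dropped whenever removing it lowers the AIC. The sketch below repeats the backward search on the built-in `mtcars` data frame (an assumption, standing in for `dt_mtcars`) and compares the AIC of the full and reduced models. The names `full`, `reduced`, and `kept` are illustrative.

```r
full <- lm(mpg ~ ., data = mtcars)
reduced <- step(full, direction = "backward", trace = FALSE)

# Predictors that survive the backward search
kept <- attr(terms(reduced), "term.labels")
print(kept)

# Lower AIC is better; the reduced model should not be worse than the full one
print(c(full = AIC(full), reduced = AIC(reduced)))
```

Setting `trace = TRUE` instead prints the AIC at every elimination step, which shows the order in which predictors were removed.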
par (mfrow = c(2,2))
plot(back_multiregression)
new_mtcars$cyl <- as.factor(new_mtcars$cyl)
new_mtcars$gear <- as.factor(new_mtcars$gear)
new_mtcars$carb <- as.factor(new_mtcars$carb)
new_mtcars$am <- as.factor(new_mtcars$am)
new_mtcars$vs <- as.factor(new_mtcars$vs)
#Plot mpg, wt & vs
ggplot(new_mtcars, aes(x=wt, y=mpg, color=vs)) + geom_smooth() + geom_point() + labs(x="Weight", y="Miles/gallon", color="Engine (0=V-Shaped,1=Straight)")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
multiregression2 <- lm(mpg ~ ., data = new_mtcars)
summary(multiregression2)
Call:
lm(formula = mpg ~ ., data = new_mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.5087 -1.3584 -0.0948 0.7745 4.6251
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 23.87913 20.06582 1.190 0.2525
cyl6 -2.64870 3.04089 -0.871 0.3975
cyl8 -0.33616 7.15954 -0.047 0.9632
disp 0.03555 0.03190 1.114 0.2827
hp -0.07051 0.03943 -1.788 0.0939 .
drat 1.18283 2.48348 0.476 0.6407
wt -4.52978 2.53875 -1.784 0.0946 .
qsec 0.36784 0.93540 0.393 0.6997
vs1 1.93085 2.87126 0.672 0.5115
am1 1.21212 3.21355 0.377 0.7113
gear4 1.11435 3.79952 0.293 0.7733
gear5 2.52840 3.73636 0.677 0.5089
carb2 -0.97935 2.31797 -0.423 0.6787
carb3 2.99964 4.29355 0.699 0.4955
carb4 1.09142 4.44962 0.245 0.8096
carb6 4.47757 6.38406 0.701 0.4938
carb8 7.25041 8.36057 0.867 0.3995
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.833 on 15 degrees of freedom
Multiple R-squared: 0.8931, Adjusted R-squared: 0.779
F-statistic: 7.83 on 16 and 15 DF, p-value: 0.000124
back_multiregression2 <- step(multiregression2, direction="backward", trace=FALSE)
summary(back_multiregression2)
Call:
lm(formula = mpg ~ cyl + hp + wt + am, data = new_mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.9387 -1.2560 -0.4013 1.1253 5.0513
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 33.70832 2.60489 12.940 7.73e-13 ***
cyl6 -3.03134 1.40728 -2.154 0.04068 *
cyl8 -2.16368 2.28425 -0.947 0.35225
hp -0.03211 0.01369 -2.345 0.02693 *
wt -2.49683 0.88559 -2.819 0.00908 **
am1 1.80921 1.39630 1.296 0.20646
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.41 on 26 degrees of freedom
Multiple R-squared: 0.8659, Adjusted R-squared: 0.8401
F-statistic: 33.57 on 5 and 26 DF, p-value: 1.506e-10
library(MASS) # stepAIC() is provided by MASS
both_multiregression <- stepAIC(multiregression2, direction = "both", trace = FALSE)
summary(both_multiregression)
Call:
lm(formula = mpg ~ cyl + hp + wt + am, data = new_mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.9387 -1.2560 -0.4013 1.1253 5.0513
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 33.70832 2.60489 12.940 7.73e-13 ***
cyl6 -3.03134 1.40728 -2.154 0.04068 *
cyl8 -2.16368 2.28425 -0.947 0.35225
hp -0.03211 0.01369 -2.345 0.02693 *
wt -2.49683 0.88559 -2.819 0.00908 **
am1 1.80921 1.39630 1.296 0.20646
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.41 on 26 degrees of freedom
Multiple R-squared: 0.8659, Adjusted R-squared: 0.8401
F-statistic: 33.57 on 5 and 26 DF, p-value: 1.506e-10
Interpretation (each effect holds the other predictors constant):
- The intercept (33.71) is the predicted mpg for an automatic car with 4 cylinders when horsepower (hp) and weight (wt) are zero. It is a baseline for the other terms, not a realistic car.
- Moving from 4 to 6 cylinders decreases mpg by 3.03.
- Moving from 4 to 8 cylinders decreases mpg by 2.16 (not statistically significant, p = 0.35).
- A 1-unit increase in horsepower (hp) decreases mpg by 0.032.
- A 1-unit increase in weight (wt, measured in 1000 lbs) decreases mpg by 2.50.
- Manual transmission increases mpg by 1.81 (not statistically significant, p = 0.21).
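The coefficients can be combined into a prediction by adding the intercept, the relevant factor-level terms, and each slope times its value. The sketch below does this by hand and checks it against `predict()`. It assumes the built-in `mtcars` data frame in place of `new_mtcars`; the car described by `new_car` is hypothetical, and `cars_f`, `fit`, and `b` are illustrative names.

```r
cars_f <- mtcars
cars_f$cyl <- as.factor(cars_f$cyl)
cars_f$am  <- as.factor(cars_f$am)
fit <- lm(mpg ~ cyl + hp + wt + am, data = cars_f)

# Hypothetical car: 6 cylinders, 110 hp, weight 2.62 (1000 lbs), manual
new_car <- data.frame(
  cyl = factor("6", levels = levels(cars_f$cyl)),
  hp  = 110,
  wt  = 2.62,
  am  = factor("1", levels = levels(cars_f$am))
)

# Intercept + cyl6 effect + hp slope * hp + wt slope * wt + manual effect
b <- coef(fit)
by_hand <- unname(b["(Intercept)"] + b["cyl6"] + b["hp"] * 110 +
                  b["wt"] * 2.62 + b["am1"])
by_predict <- unname(predict(fit, newdata = new_car))
print(c(by_hand = by_hand, by_predict = by_predict))
```

With the rounded coefficients printed in the summary, the same sum is 33.708 - 3.031 - 0.03211 × 110 - 2.497 × 2.62 + 1.809 ≈ 22.4 mpg.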
# Step 7: Diagnostic plots to check assumptions
par(mfrow = c(2, 2)) # 2x2 plot layout
plot(back_multiregression2)
The four diagnostic plots can be read as follows:
Residuals vs Fitted: This plot checks the linearity assumption. Ideally, residuals should be randomly scattered around the horizontal line (y = 0).
Result: There seems to be a slight curve, suggesting potential non-linearity or an issue with model fit.
Normal Q-Q: This plot checks whether the residuals follow a normal distribution. Points should fall along the diagonal line.
Result: Most points follow the line, but deviations in the tails suggest that the residuals may not be perfectly normally distributed.
Scale-Location: This plot tests for homoscedasticity (constant variance of residuals). The residuals should be evenly spread along the fitted values.
Result: The red line appears fairly flat, but the spread increases somewhat on the right side, indicating potential heteroscedasticity (non-constant variance).
Residuals vs Leverage: This plot identifies influential points and potential outliers. Points with high leverage and large residuals could disproportionately affect the model.
Result: A few points, like 17, 18, and 20, are highlighted near Cook’s distance lines, which could indicate influential data points.
Overall, the diagnostic plots suggest that while the model performs reasonably well, there may be concerns regarding non-linearity, potential outliers, and slight heteroscedasticity. Adjustments to the model or further investigation of influential points may be needed.
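Visual impressions from diagnostic plots can be backed up with simple numerical checks. The sketch below runs a Shapiro-Wilk test on the residuals and flags observations by Cook's distance. It assumes the built-in `mtcars` data frame in place of `new_mtcars`, and the 4/n cutoff is a common rule of thumb chosen here for illustration, not a threshold taken from the analysis above.

```r
cars_f <- mtcars
cars_f$cyl <- as.factor(cars_f$cyl)
cars_f$am  <- as.factor(cars_f$am)
fit <- lm(mpg ~ cyl + hp + wt + am, data = cars_f)

# Normality of residuals (complements the Normal Q-Q plot)
sw <- shapiro.test(residuals(fit))
print(sw)

# Influence of each observation (complements Residuals vs Leverage)
cd <- cooks.distance(fit)
cutoff <- 4 / nrow(cars_f)  # assumed rule-of-thumb cutoff
flagged <- names(cd)[cd > cutoff]
print(flagged)
```

Refitting the model without the flagged rows and comparing coefficients is one way to judge whether those points actually change the conclusions.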
# Step 8: Check for multicollinearity using VIF (Variance Inflation Factor)
vif(back_multiregression2)
GVIF Df GVIF^(1/(2*Df))
cyl 5.824545 2 1.553515
hp 4.703625 1 2.168784
wt 4.007113 1 2.001778
am 2.590777 1 1.609589
The Generalized Variance Inflation Factor (GVIF) was calculated for each term to check for multicollinearity.
GVIF: the variance inflation factor generalized to terms with more than one coefficient, such as the factor cyl. Df: degrees of freedom (number of coefficients) for each term. GVIF^(1/(2*Df)) rescales GVIF by Df so that terms with different Df can be compared.
Results:
- cyl: GVIF^(1/(2*Df)) = 1.553515
- hp: GVIF^(1/(2*Df)) = 2.168784
- wt: GVIF^(1/(2*Df)) = 2.001778
- am: GVIF^(1/(2*Df)) = 1.609589
Generally, a GVIF^(1/(2*Df)) value greater than 2.5 or 3 indicates potential multicollinearity issues. In this case, none of the variables exceed this threshold, suggesting that multicollinearity is not a significant problem for the model.
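The adjusted column can be reproduced directly from the GVIF and Df columns of the `vif()` table. For a term with one degree of freedom it is simply the square root of the GVIF, so squaring it gives a value on the ordinary VIF scale. A minimal sketch using the numbers printed above (the vector names `gvif`, `df`, and `adjusted` are illustrative):

```r
# Values copied from the vif() output above
gvif <- c(cyl = 5.824545, hp = 4.703625, wt = 4.007113, am = 2.590777)
df   <- c(cyl = 2,        hp = 1,        wt = 1,        am = 1)

# GVIF^(1/(2*Df)): fourth root for cyl (Df = 2), square root for the rest
adjusted <- gvif^(1 / (2 * df))
print(round(adjusted, 6))

# Squared values are comparable with ordinary VIF rules of thumb
print(round(adjusted^2, 3))
```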
# Step 9: Conclusion (print output)
cat("\nBest variables selected for predicting mpg:\n")
Best variables selected for predicting mpg:
print(names(coef(back_multiregression2)))
[1] "(Intercept)" "cyl6" "cyl8" "hp" "wt" "am1"
cat("\nModel Summary:\n")
Model Summary:
print(summary(back_multiregression2))
Call:
lm(formula = mpg ~ cyl + hp + wt + am, data = new_mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.9387 -1.2560 -0.4013 1.1253 5.0513
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 33.70832 2.60489 12.940 7.73e-13 ***
cyl6 -3.03134 1.40728 -2.154 0.04068 *
cyl8 -2.16368 2.28425 -0.947 0.35225
hp -0.03211 0.01369 -2.345 0.02693 *
wt -2.49683 0.88559 -2.819 0.00908 **
am1 1.80921 1.39630 1.296 0.20646
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.41 on 26 degrees of freedom
Multiple R-squared: 0.8659, Adjusted R-squared: 0.8401
F-statistic: 33.57 on 5 and 26 DF, p-value: 1.506e-10
Based on the linear regression model summary, the key conclusions are as follows. The number of cylinders (specifically 6 versus 4), horsepower, and weight are significant predictors of a car’s fuel efficiency at the 0.05 level, while transmission type and having 8 cylinders are not significant in this model. The model fits the data well, explaining about 87% of the variability in mpg (adjusted R-squared = 0.8401).