Take the mtcars data set and write up an analysis that answers Motor Trend's questions using regression models and exploratory data analyses. Your report must be written as a PDF printout of a compiled (using knitr) R Markdown document. It must be brief: roughly the equivalent of 2 pages or less for the main text. Supporting figures can be included in an appendix, up to 5 pages in total including the 2 for the main report. The appendix can only include figures. Include a first-paragraph executive summary.
A data frame with 32 observations on 11 variables.
[, 1] mpg Miles/(US) gallon
[, 2] cyl Number of cylinders
[, 3] disp Displacement (cu.in.)
[, 4] hp Gross horsepower
[, 5] drat Rear axle ratio (driveshaft revolutions relative to rear-axle revolutions)
[, 6] wt Weight (1000 lbs)
[, 7] qsec 1/4 mile time
[, 8] vs Engine shape (0 = V-shaped, 1 = straight/inline), binary
[, 9] am Transmission (0 = automatic, 1 = manual), binary
[,10] gear Number of forward gears, integer
[,11] carb Number of carburetors, integer
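# Split by transmission: am == 0 is automatic, am == 1 is manual
autom <- subset(mtcars, am == 0)
manual <- subset(mtcars, am == 1)
# Full models within each group; am is constant in each subset, so its coefficient is NA
fita <- lm(mpg ~ ., data = autom)
fitm <- lm(mpg ~ ., data = manual)
# k = log(n) applies the BIC penalty; step() still labels the criterion "AIC"
step(fita, k = log(nrow(autom)))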
## Start: AIC=37.67
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
##
##
## Step: AIC=37.67
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + gear + carb
##
## Df Sum of Sq RSS AIC
## - drat 1 0.1256 29.424 34.810
## - vs 1 0.2817 29.580 34.910
## - cyl 1 0.7301 30.028 35.196
## - wt 1 2.4968 31.795 36.282
## - disp 1 4.3918 33.690 37.382
## - qsec 1 4.4261 33.724 37.402
## <none> 29.298 37.673
## - hp 1 5.9415 35.240 38.237
## - gear 1 16.0652 45.363 43.035
## - carb 1 20.8593 50.157 44.944
##
## Step: AIC=34.81
## mpg ~ cyl + disp + hp + wt + qsec + vs + gear + carb
##
## Df Sum of Sq RSS AIC
## - vs 1 0.2652 29.689 32.036
## - cyl 1 1.7497 31.173 32.963
## - wt 1 2.4244 31.848 33.370
## - disp 1 4.8457 34.269 34.762
## <none> 29.424 34.810
## - qsec 1 5.1987 34.622 34.957
## - hp 1 8.0369 37.460 36.454
## - carb 1 20.9294 50.353 42.073
## - gear 1 22.8653 52.289 42.790
##
## Step: AIC=32.04
## mpg ~ cyl + disp + hp + wt + qsec + gear + carb
##
## Df Sum of Sq RSS AIC
## - wt 1 2.2129 31.902 30.457
## - cyl 1 3.6986 33.387 31.322
## - disp 1 4.6443 34.333 31.853
## - qsec 1 4.9751 34.664 32.035
## <none> 29.689 32.036
## - hp 1 8.2840 37.973 33.767
## - carb 1 21.7825 51.471 39.546
## - gear 1 23.1149 52.804 40.032
##
## Step: AIC=30.46
## mpg ~ cyl + disp + hp + qsec + gear + carb
##
## Df Sum of Sq RSS AIC
## - qsec 1 2.772 34.674 29.096
## - cyl 1 2.881 34.782 29.155
## - disp 1 3.055 34.957 29.251
## <none> 31.902 30.457
## - hp 1 6.168 38.069 30.871
## - gear 1 23.036 54.938 37.840
## - carb 1 34.766 66.668 41.517
##
## Step: AIC=29.1
## mpg ~ cyl + disp + hp + gear + carb
##
## Df Sum of Sq RSS AIC
## - cyl 1 0.782 35.456 26.575
## - disp 1 5.689 40.363 29.038
## <none> 34.674 29.096
## - hp 1 8.928 43.602 30.505
## - gear 1 23.097 57.771 35.851
## - carb 1 35.065 69.739 39.428
##
## Step: AIC=26.58
## mpg ~ disp + hp + gear + carb
##
## Df Sum of Sq RSS AIC
## <none> 35.456 26.575
## - disp 1 8.053 43.509 27.520
## - hp 1 8.365 43.821 27.656
## - gear 1 27.866 63.322 34.650
## - carb 1 37.681 73.137 37.388
##
## Call:
## lm(formula = mpg ~ disp + hp + gear + carb, data = autom)
##
## Coefficients:
## (Intercept) disp hp gear carb
## -1.81912 -0.01180 0.04693 7.64765 -3.53781
step(fitm, k = log(nrow(manual)))
## Start: AIC=25.61
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
##
##
## Step: AIC=25.61
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + gear + carb
##
## Df Sum of Sq RSS AIC
## - vs 1 0.1484 13.106 23.190
## - cyl 1 0.3444 13.302 23.383
## <none> 12.958 25.607
## - drat 1 3.5389 16.497 26.181
## - disp 1 4.5333 17.491 26.942
## - hp 1 5.4058 18.364 27.575
## - carb 1 5.7473 18.705 27.815
## - gear 1 14.7990 27.757 32.945
## - wt 1 19.2325 32.190 34.872
## - qsec 1 23.7588 36.717 36.582
##
## Step: AIC=23.19
## mpg ~ cyl + disp + hp + drat + wt + qsec + gear + carb
##
## Df Sum of Sq RSS AIC
## - cyl 1 0.2996 13.406 20.919
## <none> 13.106 23.190
## - drat 1 3.4388 16.545 23.654
## - disp 1 4.7132 17.819 24.619
## - carb 1 6.4223 19.528 25.810
## - hp 1 7.9243 21.030 26.773
## - wt 1 20.9326 34.039 33.033
## - qsec 1 24.9928 38.099 34.498
## - gear 1 25.9273 39.033 34.813
##
## Step: AIC=20.92
## mpg ~ disp + hp + drat + wt + qsec + gear + carb
##
## Df Sum of Sq RSS AIC
## <none> 13.406 20.919
## - drat 1 6.450 19.856 23.461
## - hp 1 18.886 32.292 29.783
## - disp 1 22.215 35.620 31.058
## - carb 1 28.064 41.469 33.035
## - gear 1 28.131 41.537 33.056
## - qsec 1 47.030 60.436 37.931
## - wt 1 52.341 65.747 39.026
##
## Call:
## lm(formula = mpg ~ disp + hp + drat + wt + qsec + gear + carb,
## data = manual)
##
## Coefficients:
## (Intercept) disp hp drat wt
## -125.3722 0.1269 -0.1191 -3.4887 -9.6853
## qsec gear carb
## 7.2555 10.9650 3.4573
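The two fits above are estimated within each transmission group, so `am` is constant in both and cannot appear in either selected model. A minimal sketch of an alternative, assuming one wants a single model on all 32 cars that always keeps `am`. The `scope` lower bound and the object names `full` and `sel` are choices made for this illustration, not part of the original analysis:

```r
# Sketch: BIC-penalised backward selection on the full data, forcing `am` to stay.
data(mtcars)

full <- lm(mpg ~ ., data = mtcars)

# k = log(n) turns step()'s AIC criterion into BIC;
# scope$lower = ~ am prevents the transmission term from being dropped.
sel <- step(full,
            scope = list(lower = ~ am),
            direction = "backward",
            k = log(nrow(mtcars)),
            trace = 0)

formula(sel)                 # terms retained by the search
summary(sel)$coefficients    # estimate and standard error for am, among others
```

Keeping `am` in the lower scope means the selected model always yields an adjusted transmission coefficient, whatever other terms survive.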
summary(autom)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. :120.1 Min. : 62.0
## 1st Qu.:14.95 1st Qu.:6.000 1st Qu.:196.3 1st Qu.:116.5
## Median :17.30 Median :8.000 Median :275.8 Median :175.0
## Mean :17.15 Mean :6.947 Mean :290.4 Mean :160.3
## 3rd Qu.:19.20 3rd Qu.:8.000 3rd Qu.:360.0 3rd Qu.:192.5
## Max. :24.40 Max. :8.000 Max. :472.0 Max. :245.0
## drat wt qsec vs
## Min. :2.760 Min. :2.465 Min. :15.41 Min. :0.0000
## 1st Qu.:3.070 1st Qu.:3.438 1st Qu.:17.18 1st Qu.:0.0000
## Median :3.150 Median :3.520 Median :17.82 Median :0.0000
## Mean :3.286 Mean :3.769 Mean :18.18 Mean :0.3684
## 3rd Qu.:3.695 3rd Qu.:3.842 3rd Qu.:19.17 3rd Qu.:1.0000
## Max. :3.920 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0 Min. :3.000 Min. :1.000
## 1st Qu.:0 1st Qu.:3.000 1st Qu.:2.000
## Median :0 Median :3.000 Median :3.000
## Mean :0 Mean :3.211 Mean :2.737
## 3rd Qu.:0 3rd Qu.:3.000 3rd Qu.:4.000
## Max. :0 Max. :4.000 Max. :4.000
summary(manual)
## mpg cyl disp hp
## Min. :15.00 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:21.00 1st Qu.:4.000 1st Qu.: 79.0 1st Qu.: 66.0
## Median :22.80 Median :4.000 Median :120.3 Median :109.0
## Mean :24.39 Mean :5.077 Mean :143.5 Mean :126.8
## 3rd Qu.:30.40 3rd Qu.:6.000 3rd Qu.:160.0 3rd Qu.:113.0
## Max. :33.90 Max. :8.000 Max. :351.0 Max. :335.0
## drat wt qsec vs
## Min. :3.54 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.85 1st Qu.:1.935 1st Qu.:16.46 1st Qu.:0.0000
## Median :4.08 Median :2.320 Median :17.02 Median :1.0000
## Mean :4.05 Mean :2.411 Mean :17.36 Mean :0.5385
## 3rd Qu.:4.22 3rd Qu.:2.780 3rd Qu.:18.61 3rd Qu.:1.0000
## Max. :4.93 Max. :3.570 Max. :19.90 Max. :1.0000
## am gear carb
## Min. :1 Min. :4.000 Min. :1.000
## 1st Qu.:1 1st Qu.:4.000 1st Qu.:1.000
## Median :1 Median :4.000 Median :2.000
## Mean :1 Mean :4.385 Mean :2.923
## 3rd Qu.:1 3rd Qu.:5.000 3rd Qu.:4.000
## Max. :1 Max. :5.000 Max. :8.000
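The two summaries above already show a gap in mean mpg (17.15 for automatic versus 24.39 for manual). A short sketch that computes that unadjusted difference directly; the object names are illustrative:

```r
# Sketch: unadjusted mpg comparison by transmission (0 = automatic, 1 = manual).
data(mtcars)

grp_n    <- table(mtcars$am)                    # cars per group
grp_mean <- tapply(mtcars$mpg, mtcars$am, mean) # mean mpg per group
grp_sd   <- tapply(mtcars$mpg, mtcars$am, sd)   # spread of mpg per group

# Manual minus automatic; this equals the slope of lm(mpg ~ am)
diff_mpg <- unname(grp_mean["1"] - grp_mean["0"])

grp_n
round(rbind(mean = grp_mean, sd = grp_sd), 2)
diff_mpg
```

This difference ignores every other variable, so it describes the sample rather than the effect of the transmission itself.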
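# Violin plot of mpg by transmission (left commented out)
# dp <- ggplot(mtcars, aes(x=factor(am), y=mpg, fill=factor(am))) +
# geom_violin(trim=FALSE)+
# geom_boxplot(width=0.1, fill="white")+
# labs(title="Plot of mpg by transmission",x="Transmission (0 = automatic, 1 = manual)", y = "Miles per gallon")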
library(psych)
# pairs.panels(mtcars,
# method = "spearman", # correlation method
# hist.col = "#00AFBB",
# density = TRUE, # show density plots
# ellipses = FALSE # show correlation ellipses
# )
pairs.panels(mtcars[, c(1,3,4,5,6,7)],lm=TRUE,ellipses=FALSE,hist.col = "#00AFBB")
pairs.panels(mtcars[, c(1,2,8,9,10,11)],lm=TRUE,ellipses=FALSE,hist.col = "blue")
describe(mtcars)
## vars n mean sd median trimmed mad min max range skew
## mpg 1 32 20.09 6.03 19.20 19.70 5.41 10.40 33.90 23.50 0.61
## cyl 2 32 6.19 1.79 6.00 6.23 2.97 4.00 8.00 4.00 -0.17
## disp 3 32 230.72 123.94 196.30 222.52 140.48 71.10 472.00 400.90 0.38
## hp 4 32 146.69 68.56 123.00 141.19 77.10 52.00 335.00 283.00 0.73
## drat 5 32 3.60 0.53 3.70 3.58 0.70 2.76 4.93 2.17 0.27
## wt 6 32 3.22 0.98 3.33 3.15 0.77 1.51 5.42 3.91 0.42
## qsec 7 32 17.85 1.79 17.71 17.83 1.42 14.50 22.90 8.40 0.37
## vs 8 32 0.44 0.50 0.00 0.42 0.00 0.00 1.00 1.00 0.24
## am 9 32 0.41 0.50 0.00 0.38 0.00 0.00 1.00 1.00 0.36
## gear 10 32 3.69 0.74 4.00 3.62 1.48 3.00 5.00 2.00 0.53
## carb 11 32 2.81 1.62 2.00 2.65 1.48 1.00 8.00 7.00 1.05
## kurtosis se
## mpg -0.37 1.07
## cyl -1.76 0.32
## disp -1.21 21.91
## hp -0.14 12.12
## drat -0.71 0.09
## wt -0.02 0.17
## qsec 0.34 0.32
## vs -2.00 0.09
## am -1.92 0.09
## gear -1.07 0.13
## carb 1.26 0.29
require(datasets); data(mtcars); require(GGally)
## Loading required package: GGally
# Panel options are passed with wrap(); the old `parameters` argument is ignored
ggpairs(mtcars, lower = list(continuous = wrap("smooth", method = "loess")))
summary(lm(mpg ~ . , data = mtcars))$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.30337416 18.71788443 0.6573058 0.51812440
## cyl -0.11144048 1.04502336 -0.1066392 0.91608738
## disp 0.01333524 0.01785750 0.7467585 0.46348865
## hp -0.02148212 0.02176858 -0.9868407 0.33495531
## drat 0.78711097 1.63537307 0.4813036 0.63527790
## wt -3.71530393 1.89441430 -1.9611887 0.06325215
## qsec 0.82104075 0.73084480 1.1234133 0.27394127
## vs 0.31776281 2.10450861 0.1509915 0.88142347
## am 2.52022689 2.05665055 1.2254035 0.23398971
## gear 0.65541302 1.49325996 0.4389142 0.66520643
## carb -0.19941925 0.82875250 -0.2406258 0.81217871
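In the full model above no coefficient is significant at the 5% level, which is common when predictors are strongly correlated with one another. One possible strategy, sketched here under the assumption that `wt` and `qsec` are reasonable adjusters (they are picked only because, apart from `am`, they have the smallest p-values in the table above), is to compare nested models and watch how the `am` coefficient moves:

```r
# Sketch: nested model comparison; the choice of wt and qsec is an assumption.
data(mtcars)

m1 <- lm(mpg ~ am,             data = mtcars)
m2 <- lm(mpg ~ am + wt,        data = mtcars)
m3 <- lm(mpg ~ am + wt + qsec, data = mtcars)

# Sequential F-tests: does each added term reduce the residual sum of squares?
cmp <- anova(m1, m2, m3)
cmp

# Transmission coefficient under each specification
am_coef <- sapply(list(m1 = m1, m2 = m2, m3 = m3), function(m) unname(coef(m)["am"]))
am_coef
```

A large change in the `am` coefficient between models would indicate that part of the raw difference is explained by the added variables.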
summary(lm(mpg ~ am, data = mtcars))$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147368 1.124603 15.247492 1.133983e-15
## am 7.244939 1.764422 4.106127 2.850207e-04
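The table above gives the unadjusted difference (7.24 mpg in favour of manual, standard error 1.76). A minimal sketch of how the uncertainty around that estimate could be quantified, using a confidence interval and a Welch two-sample t-test; the object names are illustrative:

```r
# Sketch: uncertainty for the unadjusted manual-vs-automatic difference.
data(mtcars)

fit_am <- lm(mpg ~ am, data = mtcars)

# 95% confidence interval for the am coefficient (manual minus automatic)
ci <- confint(fit_am, "am", level = 0.95)
ci

# Welch t-test, which does not assume equal variances in the two groups.
# Its interval is for automatic minus manual, so the sign is reversed
# relative to the am coefficient.
tt <- t.test(mpg ~ am, data = mtcars)
tt
```

Both results concern the raw difference only; they say nothing about whether it persists after adjusting for other variables.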
custom_car <- ggpairs(mtcars[, c("mpg", "wt", "cyl", "am")], upper = "blank", title = "Custom Example")
plot <- ggplot2::ggplot(mtcars, ggplot2::aes(x=am, y=mpg, label=rownames(mtcars)))
plot <- plot +
ggplot2::geom_text(ggplot2::aes(colour=factor(am)), size = 3) +
ggplot2::scale_colour_discrete(l=40)
# custom_car[1, 4] <- plot
# plot
# car
library(GGally)
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(scales)
##
## Attaching package: 'scales'
## The following objects are masked from 'package:psych':
##
## alpha, rescale
data("mtcars")
# Panel function: scatter points plus a geom_smooth fit.
# `method` defaults to "lm" and can be changed (e.g. "loess").
my_fn <- function(data, mapping, method = "lm", ...) {
  ggplot(data = data, mapping = mapping) +
    geom_point() +
    geom_smooth(method = method, ...)
}
# Linear (lm) fit in the lower panels, the default of my_fn
ggpairs(mtcars[, c(1,3,4,5,6,7)], lower = list(continuous = my_fn))
ggpairs(mtcars[, c(9,1)], lower = list(continuous = my_fn))
library(datasets); data(mtcars); require(stats); require(graphics)
pairs(mtcars, panel = panel.smooth, main = "Motor Trend")
library("GGally")
library(ggplot2)
# diag takes a single panel type; only "barDiag" was being used
plot1 <- ggpairs(mtcars, lower = list(continuous = "smooth"),
                 diag = list(continuous = "barDiag"), axisLabels = "show",
                 columns = c("mpg", "wt", "qsec", "hp", "disp", "drat"))
plot1
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
plot2 <- ggpairs(mtcars, lower=list(continuous="smooth"),
diag=list(continuous="barDiag"), axisLabels="show",
columns = c("mpg", "am", "cyl", "vs", "carb", "gear"))
plot2
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
You work for Motor Trend, a magazine about the automobile industry. Looking at a data set of a collection of cars, they are interested in exploring the relationship between a set of variables and miles per gallon (MPG), the outcome. They are particularly interested in the following two questions: “Is an automatic or manual transmission better for MPG?” and “Quantify the MPG difference between automatic and manual transmissions.”
Did the student interpret the coefficients correctly?
Did the student do some exploratory data analyses?
Did the student fit multiple models and detail their strategy for model selection?
Did the student answer the questions of interest or detail why the question(s) is (are) not answerable?
Did the student do a residual plot and some diagnostics?
Did the student quantify the uncertainty in their conclusions and/or perform an inference correctly?
Was the report brief (about 2 pages long) for the main body of the report and no longer than 5 with supporting appendix of figures?
Did the report include an executive summary?
Was the report done in Rmd (knitr)?
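For the residual-plot and diagnostics item in the checklist above, a minimal sketch of what such a check might look like. The model `mpg ~ am + wt + qsec` is a hypothetical candidate chosen only for illustration, and the object names are not from the original analysis:

```r
# Sketch: residual plots and basic influence diagnostics for a candidate model.
data(mtcars)

fit <- lm(mpg ~ am + wt + qsec, data = mtcars)  # hypothetical candidate model

# Standard four-panel diagnostic display: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# Leverage of each car and its influence on the am coefficient
lev  <- hatvalues(fit)
infl <- dfbetas(fit)[, "am"]

head(sort(lev, decreasing = TRUE), 3)        # highest-leverage cars
head(sort(abs(infl), decreasing = TRUE), 3)  # cars that move the am estimate most

# Formal check of residual normality
shapiro.test(residuals(fit))
```

With only 32 observations, a few high-leverage cars can shift the transmission estimate noticeably, so the influence measures deserve as much attention as the plots.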