Libraries

library(ISLR2)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

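# lubridate, dplyr, and ggplot2 are already attached by tidyverse 2.0.0 above;
# these calls are redundant but harmless.
library(lubridate)
library(dplyr)
library(ggplot2)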

Dataset

str(Auto)

## 'data.frame':    392 obs. of  9 variables:
##  $ mpg         : num  18 15 18 16 17 15 14 14 14 15 ...
##  $ cylinders   : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ displacement: num  307 350 318 304 302 429 454 440 455 390 ...
##  $ horsepower  : int  130 165 150 150 140 198 220 215 225 190 ...
##  $ weight      : int  3504 3693 3436 3433 3449 4341 4354 4312 4425 3850 ...
##  $ acceleration: num  12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
##  $ year        : int  70 70 70 70 70 70 70 70 70 70 ...
##  $ origin      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ name        : Factor w/ 304 levels "amc ambassador brougham",..: 49 36 231 14 161 141 54 223 241 2 ...
##  - attr(*, "na.action")= 'omit' Named int [1:5] 33 127 331 337 355
##   ..- attr(*, "names")= chr [1:5] "33" "127" "331" "337" ...

a) Which of the predictors are quantitative, and which are qualitative?

quantitative_predictors <- data.frame(
  variable = c("mpg", "cylinders", "displacement", "horsepower", "weight", "acceleration", "year"),
  type = "Quantitative")

qualitative_predictors <- data.frame(
  variable = c("origin", "name"),
  type = "Qualitative")

predictors_info <- rbind(quantitative_predictors, qualitative_predictors)

print(predictors_info)

##       variable         type
## 1          mpg Quantitative
## 2    cylinders Quantitative
## 3 displacement Quantitative
## 4   horsepower Quantitative
## 5       weight Quantitative
## 6 acceleration Quantitative
## 7         year Quantitative
## 8       origin  Qualitative
## 9         name  Qualitative
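The classification above can be cross-checked against how R stores each column. The following is a minimal sketch, not part of the original solution: `col_classes` and `auto_typed` are names introduced here for illustration, and the origin labels (1 = American, 2 = European, 3 = Japanese) are taken from the ISLR2 help page for `Auto`.

```r
library(ISLR2)

# Storage class of every column; origin shows up as "integer"
# even though it is a category code, not a measurement.
col_classes <- sapply(Auto, class)
print(col_classes)

# Work on a copy so the original Auto data frame stays untouched.
auto_typed <- Auto

# Recode origin as a factor so R treats it as qualitative.
auto_typed$origin <- factor(
  auto_typed$origin,
  levels = 1:3,
  labels = c("American", "European", "Japanese")
)

# Counts per origin category
table(auto_typed$origin)
```

Storing `origin` as a factor keeps it out of numeric summaries such as `mean()` and `sd()`, which would be meaningless for a category code.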

b) What is the range of each quantitative predictor? You can answer this using the range() function.

range_Auto <- data.frame(sapply(Auto[ ,1:7], range))
rownames(range_Auto) <- c("min", "max")
range_Auto

##      mpg cylinders displacement horsepower weight acceleration year
## min  9.0         3           68         46   1613          8.0   70
## max 46.6         8          455        230   5140         24.8   82
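The positional index `1:7` works because the quantitative columns happen to come first. As a hedged alternative, the sketch below picks the columns by type and then drops `origin`, which is numeric in storage but qualitative in meaning. The names `numeric_cols`, `quant_vars`, and `range_by_name` are introduced here for illustration only.

```r
library(ISLR2)

# All columns stored as numbers (this still includes origin)
numeric_cols <- names(Auto)[sapply(Auto, is.numeric)]

# Drop origin, which is a category code rather than a quantity
quant_vars <- setdiff(numeric_cols, "origin")

# Range of each quantitative column, selected by name
range_by_name <- sapply(Auto[quant_vars], range)
rownames(range_by_name) <- c("min", "max")
range_by_name
```

Selecting by name keeps the code correct if the column order ever changes.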

c) What is the mean and standard deviation of each quantitative predictor?

Mean

sapply(Auto[ ,1:7], mean)

##          mpg    cylinders displacement   horsepower       weight acceleration 
##    23.445918     5.471939   194.411990   104.469388  2977.584184    15.541327 
##         year 
##    75.979592

Standard deviation

sapply(Auto[ ,1:7], sd)

##          mpg    cylinders displacement   horsepower       weight acceleration 
##     7.805007     1.705783   104.644004    38.491160   849.402560     2.758864 
##         year 
##     3.683737
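The mean and standard deviation can also be collected into one table, which is easier to read than two separate named vectors. This is a minimal sketch; `summary_stats` is a name introduced here for illustration.

```r
library(ISLR2)

# For each quantitative column, compute mean and sd together,
# then transpose so that variables are rows and statistics are columns.
summary_stats <- t(sapply(
  Auto[, 1:7],
  function(x) c(mean = mean(x), sd = sd(x))
))

round(summary_stats, 3)
```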

d) Remove the 10th through 85th observations. What’s the range, mean, and standard deviation of each predictor in the subset of the data that remains?

Range

range_Auto_subset <- data.frame(sapply(Auto[-c(10:85),1:7], range))
rownames(range_Auto_subset) <- c("min", "max")
range_Auto_subset

##      mpg cylinders displacement horsepower weight acceleration year
## min 11.0         3           68         46   1649          8.5   70
## max 46.6         8          455        230   4997         24.8   82

Mean

sapply(Auto[-c(10:85),1:7], mean)

##          mpg    cylinders displacement   horsepower       weight acceleration 
##    24.404430     5.373418   187.240506   100.721519  2935.971519    15.726899 
##         year 
##    77.145570

Standard deviation

sapply(Auto[-c(10:85),1:7], sd)

##          mpg    cylinders displacement   horsepower       weight acceleration 
##     7.867283     1.654179    99.678367    35.708853   811.300208     2.693721 
##         year 
##     3.106217
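The three subset summaries repeat the same indexing expression. One way to avoid that repetition is to build the subset once and pass it to a small helper. This is a sketch under that assumption; `auto_subset`, `describe`, and `subset_stats` are hypothetical names, not part of the original solution.

```r
library(ISLR2)

# Drop observations 10 through 85 (by position) and keep the
# seven quantitative columns.
auto_subset <- Auto[-(10:85), 1:7]

# Helper: one row per statistic, one column per variable.
describe <- function(df) {
  rbind(
    min  = sapply(df, min),
    max  = sapply(df, max),
    mean = sapply(df, mean),
    sd   = sapply(df, sd)
  )
}

subset_stats <- describe(auto_subset)

nrow(auto_subset)        # 392 - 76 = 316 rows remain
round(subset_stats, 3)
```

Calling `describe(Auto[, 1:7])` gives the same table for the full data, which makes the two sets of results easy to compare side by side.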

e) Using the full dataset, investigate the predictors graphically, using scatterplots or other tools of your choice. Create some plots highlighting the relationships among the predictors. Comment on your findings.

pairs(Auto[,1:7])
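The scatterplot matrix gives an overview, but individual plots are easier to read. The sketch below is one possible follow-up in base graphics, not the original author's analysis: a scatterplot of mpg against weight and a boxplot of mpg by number of cylinders. The name `mpg_by_cyl` is introduced here for illustration.

```r
library(ISLR2)

# Two panels side by side; remember the old settings to restore them.
op <- par(mfrow = c(1, 2))

# Continuous predictor: scatterplot of mpg against weight
plot(Auto$weight, Auto$mpg,
     xlab = "weight", ylab = "mpg",
     main = "mpg vs. weight")

# Discrete predictor: distribution of mpg within each cylinder count
boxplot(mpg ~ cylinders, data = Auto,
        xlab = "cylinders", ylab = "mpg",
        main = "mpg by cylinders")

par(op)

# Median mpg per cylinder count, as a numeric companion to the boxplot
mpg_by_cyl <- tapply(Auto$mpg, Auto$cylinders, median)
mpg_by_cyl
```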

f) Suppose that we wish to predict gas mileage (mpg) on the basis of other variables. Do your plots suggest that any of the other variables might be useful in predicting mpg? Justify your answer.

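Based on the plots above, all six of the other quantitative variables appear related to mpg. mpg falls as weight, displacement, horsepower, and cylinders increase, and it rises with year and, more weakly, with acceleration. The correlations below support this: weight (-0.832) and displacement (-0.805) have the strongest association with mpg, followed by cylinders and horsepower (both -0.778), year (0.581), and acceleration (0.423). Because weight, displacement, horsepower, and cylinders are also highly correlated with one another, they carry overlapping information.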

round(cor(Auto[,1:7]),3)

##                 mpg cylinders displacement horsepower weight acceleration
## mpg           1.000    -0.778       -0.805     -0.778 -0.832        0.423
## cylinders    -0.778     1.000        0.951      0.843  0.898       -0.505
## displacement -0.805     0.951        1.000      0.897  0.933       -0.544
## horsepower   -0.778     0.843        0.897      1.000  0.865       -0.689
## weight       -0.832     0.898        0.933      0.865  1.000       -0.417
## acceleration  0.423    -0.505       -0.544     -0.689 -0.417        1.000
## year          0.581    -0.346       -0.370     -0.416 -0.309        0.290
##                year
## mpg           0.581
## cylinders    -0.346
## displacement -0.370
## horsepower   -0.416
## weight       -0.309
## acceleration  0.290
## year          1.000
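To read the mpg row of the correlation matrix more easily, the predictors can be ranked by the absolute value of their correlation with mpg. This is a minimal sketch; `cor_with_mpg` and `ranked` are names introduced here for illustration. Correlation captures only linear association, so the ranking is a rough guide rather than a measure of predictive value.

```r
library(ISLR2)

# Correlation of mpg with each other quantitative variable
# (take the mpg row and drop mpg's correlation with itself).
cor_with_mpg <- cor(Auto[, 1:7])["mpg", -1]

# Order from strongest to weakest association, ignoring sign
ranked <- cor_with_mpg[order(abs(cor_with_mpg), decreasing = TRUE)]

round(ranked, 3)
```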

20242902_In-class1

PhamMinhTam

2024-02-29
