Objective: Introduce the “mtcars” dataset and explain the reason for its selection.
The “mtcars” dataset is a well-known built-in dataset in R that describes different car models. It was extracted from the 1974 Motor Trend US magazine and covers 32 cars, recording 11 variables such as miles per gallon (mpg), number of cylinders (cyl), displacement (disp), and horsepower (hp), all of which relate to a car’s performance and fuel efficiency.
I chose the “mtcars” dataset because it’s widely utilized in basic data analysis and statistical modeling. It’s great for examining how various car features relate to fuel efficiency and performance. Its simplicity and real-world applicability make it perfect for practicing data manipulation, visualization, and modeling in R.
# Load the mtcars dataset
data(mtcars)
# Display the first few rows of the dataset
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Objective: Describe the variables present in the “mtcars” dataset.
# Display the column (variable) names
names(mtcars)
## [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
## [11] "carb"
Interpretation of these results:
The “mpg” column represents fuel efficiency in miles per (US) gallon (continuous variable).
The “cyl” column represents the number of cylinders (a count, treated here as a categorical variable).
The “disp” column represents engine displacement in cubic inches (continuous variable).
The “hp” column represents gross horsepower (continuous variable).
The “drat” column represents rear axle ratio (continuous variable).
The “wt” column represents weight in units of 1000 lbs (continuous variable).
The “qsec” column represents quarter mile time in seconds (continuous variable).
The “vs” column represents engine shape (0 = V-shaped, 1 = straight; categorical variable).
The “am” column represents transmission (0 = automatic, 1 = manual; categorical variable).
The “gear” column represents the number of forward gears (a count, treated here as a categorical variable).
The “carb” column represents the number of carburetors (a count, treated here as a categorical variable).
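The continuous/categorical split above is an interpretation, not something R stores with the data. The following is a minimal sketch of one way to make it explicit, assuming the five categorical columns named above; the object names `mtcars_cat` and `cat_vars` are illustrative and not part of the original analysis.

```r
# Every column of the built-in data frame is stored as a plain numeric vector
sapply(mtcars, class)

# Work on a copy so the original mtcars object stays untouched
mtcars_cat <- mtcars
cat_vars <- c("cyl", "vs", "am", "gear", "carb")

# Convert the categorical columns to factors
mtcars_cat[cat_vars] <- lapply(mtcars_cat[cat_vars], factor)

# Optional: replace the 0/1 codes with readable labels (0 first, then 1)
levels(mtcars_cat$vs) <- c("V-shaped", "straight")
levels(mtcars_cat$am) <- c("automatic", "manual")

# Inspect the resulting structure
str(mtcars_cat)
```

With factors in place, functions such as `summary()` and `boxplot()` treat these columns as groups rather than as numbers.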
Objective: Discuss the number of observations in the “mtcars” dataset.
# Check the number of rows/observations in the dataset
nrow(mtcars)
## [1] 32
Interpretation: Each row in the “mtcars” dataset represents a different car model. There are a total of 32 rows/observations in the dataset.
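As a small complementary check, the sketch below looks at both dimensions at once and confirms that the car model names, which are stored as row names rather than as a column, are all distinct.

```r
# Rows first, then columns
dim(mtcars)

# Car model names live in the row names, not in a regular column
head(rownames(mtcars), 3)

# anyDuplicated() returns 0 when no row name is repeated
anyDuplicated(rownames(mtcars))
```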
Objective: Analyze if there are any missing values in the dataset.
# Check for missing values
any(is.na(mtcars))
## [1] FALSE
Interpretation: There are no missing values in the “mtcars” dataset.
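A single TRUE/FALSE answer hides where missing values would sit if there were any. A per-column count, sketched below, is a slightly more informative version of the same check; the names `na_per_column` and `n_complete` are illustrative.

```r
# Count missing values in each column
na_per_column <- colSums(is.na(mtcars))
na_per_column

# Count the rows that have no missing value in any column
n_complete <- sum(complete.cases(mtcars))
n_complete
```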
summary(mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")])
## mpg disp hp drat
## Min. :10.40 Min. : 71.1 Min. : 52.0 Min. :2.760
## 1st Qu.:15.43 1st Qu.:120.8 1st Qu.: 96.5 1st Qu.:3.080
## Median :19.20 Median :196.3 Median :123.0 Median :3.695
## Mean :20.09 Mean :230.7 Mean :146.7 Mean :3.597
## 3rd Qu.:22.80 3rd Qu.:326.0 3rd Qu.:180.0 3rd Qu.:3.920
## Max. :33.90 Max. :472.0 Max. :335.0 Max. :4.930
## wt qsec
## Min. :1.513 Min. :14.50
## 1st Qu.:2.581 1st Qu.:16.89
## Median :3.325 Median :17.71
## Mean :3.217 Mean :17.85
## 3rd Qu.:3.610 3rd Qu.:18.90
## Max. :5.424 Max. :22.90
Interpretation: The summary statistics provide information about the central tendency, spread, and distribution of the numeric variables in the dataset, including minimum, maximum, median, mean, and quartiles.
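The `summary()` output does not report the standard deviation or the interquartile range directly. A minimal sketch for adding those two spread measures, using the same six numeric columns (the names `num_vars` and `spread` are illustrative):

```r
num_vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")

# For each numeric column, compute the standard deviation and the IQR;
# sapply() collects the results into a 2 x 6 matrix
spread <- sapply(mtcars[num_vars], function(x) c(sd = sd(x), IQR = IQR(x)))

round(spread, 2)
```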
table(mtcars$cyl)
##
## 4 6 8
## 11 7 14
table(mtcars$vs)
##
## 0 1
## 18 14
table(mtcars$am)
##
## 0 1
## 19 13
table(mtcars$gear)
##
## 3 4 5
## 15 12 5
table(mtcars$carb)
##
## 1 2 3 4 6 8
## 7 10 3 10 1 1
Interpretation: The frequency tables display the count of observations for each category of the categorical variables in the dataset, showing how the cars are distributed across those categories. We carry out these descriptive analytics to gain a better understanding of the “mtcars” dataset’s characteristics and the distribution of its variables.
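The five `table()` calls can also be produced in one pass, and proportions are often easier to compare than raw counts. The sketch below assumes the same five categorical columns; `cat_vars`, `freq_tables`, and `prop_tables` are illustrative names.

```r
cat_vars <- c("cyl", "vs", "am", "gear", "carb")

# One frequency table per categorical column
freq_tables <- lapply(mtcars[cat_vars], table)

# Convert counts to proportions of the 32 cars, rounded for display
prop_tables <- lapply(freq_tables, function(tab) round(prop.table(tab), 2))
prop_tables$cyl

# A two-way table shows how two categorical variables combine
table(cyl = mtcars$cyl, am = mtcars$am)
```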
Objective: To analyze the variation in fuel efficiency (mpg) among different car models, identifying common trends and the range of mpg values across the dataset.
hist(mtcars$mpg, main = "Distribution of Miles Per Gallon", xlab = "Miles Per Gallon", ylab = "Frequency")
Interpretation: The histogram of miles per gallon (mpg) shows a single-peaked, right-skewed distribution rather than a symmetric bell shape. With R’s default bins, the tallest bar covers 15 to 20 mpg, so many car models in the dataset cluster in the mid to high teens. There is notable variability, with values ranging from 10.4 to 33.9 mpg and a tail of a few models above 30 mpg. This gives insight into fuel efficiency across the car models, showing both the range of mpg values and where most models fall.
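The shape read off a histogram can be checked numerically. The sketch below recomputes the same default bins without drawing the plot and compares the mean with the median (a mean above the median is consistent with a right skew); `mpg_hist` and `bin_counts` are illustrative names.

```r
# Compute the histogram bins and counts without plotting
mpg_hist <- hist(mtcars$mpg, plot = FALSE)

# Pair each bin's lower and upper edge with its count
bin_counts <- data.frame(
  lower = head(mpg_hist$breaks, -1),
  upper = mpg_hist$breaks[-1],
  count = mpg_hist$counts
)
bin_counts

# A positive difference means the mean lies above the median
mean(mtcars$mpg) - median(mtcars$mpg)
```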
Objective: To investigate the relationship between the number of cylinders (cyl) and the miles per gallon (mpg) of cars, aiming to understand how changes in the number of cylinders impact fuel efficiency.
boxplot(mpg ~ cyl, data = mtcars, main = "Miles Per Gallon by Number of Cylinders", xlab = "Number of Cylinders", ylab = "Miles Per Gallon")
Interpretation: The boxplot shows that mpg falls as the number of cylinders rises: the 4-cylinder cars have the highest median mpg, the 6-cylinder cars sit in the middle, and the 8-cylinder cars have the lowest. The 4-cylinder group also shows the widest spread. This suggests that the number of cylinders is strongly associated with a car’s fuel economy in this dataset.
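The centre line of each box is the group median, so the trend in the boxplot can be confirmed with a grouped summary. A minimal sketch, with `mpg_by_cyl` as an illustrative name:

```r
# Median mpg for each cylinder count (rows come back ordered by cyl)
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = median)
mpg_by_cyl

# Rank-based correlation between cylinder count and mpg;
# a value near -1 indicates a strong decreasing relationship
cor(mtcars$cyl, mtcars$mpg, method = "spearman")
```

Spearman's correlation is used here only because `cyl` takes just three distinct values; it is one reasonable choice, not the only one.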
Objective: To explore the relationship between transmission type (am) and both miles per gallon (mpg) and horsepower (hp) of cars, aiming to understand how different transmission types affect fuel efficiency and engine power.
plot(mtcars$mpg, mtcars$hp,
     col = ifelse(mtcars$am == 0, "red", "blue"),
     pch = ifelse(mtcars$am == 0, 16, 17),
     main = "Relationship between MPG, HP, and Transmission Type",
     xlab = "Miles Per Gallon (mpg)", ylab = "Horsepower (hp)",
     xlim = c(0, 40), ylim = c(0, 400))
legend("topright", legend = c("Automatic", "Manual"), col = c("red", "blue"), pch = c(16, 17), title = "Transmission Type")
Interpretation: The scatter plot visualizes the relationship between miles per gallon (mpg), horsepower (hp), and transmission type (am) of cars. Each point represents a car model, with red points indicating automatic transmissions and blue points indicating manual transmissions. The plot shows a clear negative relationship between mpg and hp: cars with more horsepower tend to get fewer miles per gallon. Transmission type overlaps with this pattern. Manual cars (blue points) cluster toward higher mpg and lower horsepower, while automatic cars (red points) are concentrated at lower mpg and mid to high horsepower, although the most powerful car in the dataset (335 hp) is a manual. This suggests that manual transmissions are associated with better fuel efficiency in this dataset, but not with higher horsepower on average. With only 32 cars and other differences between the two groups (such as weight), the plot alone does not show that transmission type causes the difference.
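A visual reading of a scatter plot like this can be cross-checked with two simple summaries: the overall correlation between mpg and hp, and the mean of each variable by transmission type. The sketch below uses illustrative names (`r_mpg_hp`, `group_means`).

```r
# Pearson correlation between mpg and hp across all 32 cars
r_mpg_hp <- cor(mtcars$mpg, mtcars$hp)
r_mpg_hp

# Mean mpg and mean hp for automatic (am = 0) and manual (am = 1) cars
group_means <- aggregate(cbind(mpg, hp) ~ am, data = mtcars, FUN = mean)
group_means
```

These are descriptive comparisons only; they do not adjust for other variables such as weight or cylinder count.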