In this analysis, we will explore the mtcars dataset,
which contains information about various car models and their
specifications. We will focus on attributes like miles per gallon (mpg),
horsepower (hp), weight (wt), and number of cylinders (cyl). The goal is
to perform data wrangling using dplyr, generate summary
statistics, and visualize relationships between key variables using
ggplot2.
The mtcars dataset is built into R, so no import is
needed. We will start by performing some data wrangling tasks such as
selecting, filtering, and mutating the data.
# Loading necessary libraries
library(dplyr)
library(ggplot2)
# View the first few rows of the mtcars dataset
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# Data wrangling: selecting, filtering, and mutating
mtcars_clean <- mtcars %>%
  select(mpg, hp, wt, cyl, gear) %>%  # Keep only the columns of interest
  filter(hp > 100) %>%                # Keep cars with more than 100 horsepower
  mutate(mpg_per_1000wt = mpg / wt)   # wt is already recorded in units of 1000 lbs, so this is mpg per 1000 lbs
# Show the first few rows of the cleaned dataset
head(mtcars_clean)
##                    mpg  hp    wt cyl gear mpg_per_1000wt
## Mazda RX4         21.0 110 2.620   6    4       8.015267
## Mazda RX4 Wag     21.0 110 2.875   6    4       7.304348
## Hornet 4 Drive    21.4 110 3.215   6    3       6.656299
## Hornet Sportabout 18.7 175 3.440   8    3       5.436047
## Valiant           18.1 105 3.460   6    3       5.231214
## Duster 360        14.3 245 3.570   8    3       4.005602
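The introduction mentions summary statistics, so a grouped summary is a natural next step after the wrangling. The following is a minimal sketch, assuming we want the number of cars plus a few averages per cylinder group. The object names `cars_over_100hp` and `mpg_by_cyl` and the choice of statistics are illustrative assumptions, not part of the original analysis.

```r
library(dplyr)

# Rebuild the filtered data so this block runs on its own
# (same hp > 100 filter as the wrangling step above).
cars_over_100hp <- mtcars %>%
  select(mpg, hp, wt, cyl, gear) %>%
  filter(hp > 100)

# Hypothetical summary table: one row per cylinder count
mpg_by_cyl <- cars_over_100hp %>%
  group_by(cyl) %>%
  summarise(
    n          = n(),          # number of cars in the group
    mean_mpg   = mean(mpg),
    median_mpg = median(mpg),
    mean_hp    = mean(hp),
    mean_wt    = mean(wt),     # in units of 1000 lbs
    .groups    = "drop"
  )

print(mpg_by_cyl)
```

The `n` column is worth checking before comparing groups, since filtering on horsepower can leave some cylinder groups with very few cars.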
# Scatter plot of horsepower vs mpg
ggplot(mtcars_clean, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(title = "Horsepower vs. Miles per Gallon",
       x = "Horsepower",
       y = "Miles per Gallon") +
  theme_minimal()
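A fitted line can make the direction of the relationship in the scatter plot easier to read. The sketch below overlays a straight-line fit using `geom_smooth()`. The linear model is an assumption made for illustration (the relationship may well be curved), and the names `cars_over_100hp` and `hp_mpg_plot` are hypothetical.

```r
library(ggplot2)

# Same hp > 100 filter as the wrangling step, rebuilt with base R
# so this block runs on its own.
cars_over_100hp <- subset(mtcars, hp > 100)

# Scatter plot with an illustrative linear trend line (no confidence band)
hp_mpg_plot <- ggplot(cars_over_100hp, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Horsepower vs. Miles per Gallon",
       x = "Horsepower",
       y = "Miles per Gallon") +
  theme_minimal()

print(hp_mpg_plot)
```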
# Box plot of mpg by number of cylinders
ggplot(mtcars_clean, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot() +
  labs(title = "MPG by Number of Cylinders",
       x = "Number of Cylinders",
       y = "Miles per Gallon") +
  theme_minimal()
## Summary and Interpretation
Horsepower vs. Miles per Gallon (MPG):
The scatter plot shows a negative relationship between horsepower and mpg: cars with higher horsepower tend to have lower fuel efficiency, consistent with more powerful engines consuming more fuel.
Cylinders vs. Miles per Gallon (MPG):
Cars with fewer cylinders (e.g., 4-cylinder engines) generally have higher mpg, while cars with 6 or 8 cylinders tend to have lower mpg. Because mtcars_clean keeps only cars with more than 100 horsepower, just 2 of its 23 cars have 4 cylinders, so the 4-cylinder box rests on very few observations.
Horsepower and Fuel Efficiency:
Higher horsepower is generally associated with lower fuel efficiency, which is consistent with a trade-off between performance and mpg.
Cylinders and Fuel Efficiency:
In this data, 4-cylinder cars are more fuel-efficient than 6-cylinder or 8-cylinder cars, which supports the idea that smaller engines consume less fuel.
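The negative relationship described above can also be quantified. The following is a minimal sketch, assuming a Pearson correlation is an adequate summary of the horsepower and mpg relationship. It computes the coefficient for both the filtered cars (hp > 100) and the full dataset, so the effect of the filter can be seen. The variable names are illustrative.

```r
# Same hp > 100 filter as the wrangling step, rebuilt with base R
# so this block runs on its own.
cars_over_100hp <- subset(mtcars, hp > 100)

# Pearson correlation between horsepower and mpg
r_filtered <- cor(cars_over_100hp$hp, cars_over_100hp$mpg)  # filtered cars only
r_all      <- cor(mtcars$hp, mtcars$mpg)                    # all 32 cars

# A value near -1 indicates a strong negative linear relationship
print(c(filtered = r_filtered, all = r_all))
```

Comparing the two values shows how much the strength of the relationship depends on which cars are included.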