# Load required libraries
library(tidyverse)
# Load the mtcars dataset
data(mtcars)
# Explore the dataset
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
# Check for missing values
missing_values <- sum(is.na(mtcars))
cat("Number of missing values:", missing_values, "\n")
## Number of missing values: 0
# Convert the transmission column to a labelled factor (0 = automatic, 1 = manual)
mtcars$am <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
# Preview the cleaned data
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs        am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0    Manual    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0    Manual    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1    Manual    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1 Automatic    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0 Automatic    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1 Automatic    3    1
# Bubble plot: MPG vs. horsepower, with bubble size and color both showing the number of cylinders
ggplot(mtcars, aes(x = hp, y = mpg, size = cyl, color = factor(cyl))) +
  geom_point(alpha = 0.7) +
  scale_size_continuous(range = c(2, 10)) +
  # One color per cylinder level, assigned in order to 4, 6, and 8 cylinders
  scale_color_manual(values = c("#1f77b4", "#ff7f0e", "#2ca02c")) +
  labs(x = "Horsepower", y = "MPG",
       size = "Number of Cylinders", color = "Number of Cylinders",
       title = "MPG vs. Horsepower (Bubble Size: Number of Cylinders)") +
  theme_minimal()
This bubble plot shows the relationship between MPG and horsepower, with the number of cylinders encoded twice: as bubble size and as bubble color. The x-axis represents horsepower and the y-axis represents MPG. Reading the plot from left to right shows how MPG changes as horsepower increases, and the bubble size and color show where the 4-, 6-, and 8-cylinder cars sit along that trend.
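To put a number on the relationship shown in the bubble plot, the following minimal sketch computes the Pearson correlation between horsepower and MPG, both overall and within each cylinder group. It is an illustrative addition rather than part of the original analysis: the names `cars`, `overall_cor`, and `cor_by_cyl` are chosen here, and the sketch reads a fresh copy of the data so that it runs on its own.

```r
# Use a fresh copy of the built-in data so this sketch does not depend on earlier steps
cars <- datasets::mtcars

# Pearson correlation between horsepower and MPG across all cars
overall_cor <- cor(cars$hp, cars$mpg)

# The same correlation computed separately within each cylinder group
cor_by_cyl <- sapply(split(cars, cars$cyl), function(d) cor(d$hp, d$mpg))

print(round(overall_cor, 2))
print(round(cor_by_cyl, 2))
```

A negative value means that MPG tends to fall as horsepower rises. The within-group values rest on only a handful of cars each, so they are best read as rough indications.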
# Pie chart: Distribution of car models by number of cylinders
cylinder_counts <- table(mtcars$cyl)
pie(cylinder_counts,
    labels = paste(names(cylinder_counts), "Cylinders"),
    main = "Distribution of Car Models by Number of Cylinders")
The pie chart shows how the 32 car models in the mtcars dataset are distributed across the three cylinder categories (4, 6, and 8 cylinders). Each slice represents the proportion of car models in one category.
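Proportions are hard to read off a pie chart by eye, so one option is to print them on the slice labels. The sketch below is a possible variant, not the original chart: it uses `prop.table()` to turn the counts into percentages, and the names `cyl_counts`, `cyl_share`, and `pie_labels` are chosen here for illustration.

```r
# Use a fresh copy of the built-in data so this sketch runs on its own
cars <- datasets::mtcars

# Count car models per cylinder category and convert the counts to percentages
cyl_counts <- table(cars$cyl)
cyl_share <- round(100 * prop.table(cyl_counts), 1)

# Build labels such as "<n> cylinders (<share>%)" for each slice
pie_labels <- paste0(names(cyl_counts), " cylinders (", cyl_share, "%)")
print(pie_labels)

pie(cyl_counts,
    labels = pie_labels,
    main = "Distribution of Car Models by Number of Cylinders")
```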
# Plot the boxplot of mpg by transmission type
ggplot(mtcars, aes(x = am, y = mpg, fill = am)) +
  geom_boxplot() +
  labs(x = "Transmission", y = "MPG", title = "MPG by Transmission Type") +
  theme_minimal()
The boxplot above illustrates that car models with manual transmission generally have higher MPG compared to those with automatic transmission.
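The visual impression from the boxplot can be checked against group summaries. The following sketch, written in base R with names chosen here (`cars`, `mpg_by_am`), tabulates the number of cars, the median MPG, and the mean MPG for each transmission type. It rebuilds the transmission factor from a fresh copy of the data, assuming the coding documented for mtcars (0 = automatic, 1 = manual).

```r
# Use a fresh copy of the built-in data and label the transmission type
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Number of cars, median MPG, and mean MPG per transmission type
mpg_by_am <- data.frame(
  n      = as.vector(table(cars$am)),
  median = as.vector(tapply(cars$mpg, cars$am, median)),
  mean   = as.vector(tapply(cars$mpg, cars$am, mean)),
  row.names = levels(cars$am)
)
print(mpg_by_am)
```

The median corresponds to the thick line inside each box, so the `median` column should match what the boxplot shows.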
# Calculate the average mpg for each number of cylinders
avg_mpg_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg))
# Plot the average mpg by number of cylinders (cyl as a factor so the axis shows only 4, 6, and 8)
ggplot(avg_mpg_by_cyl, aes(x = factor(cyl), y = avg_mpg)) +
  geom_col(fill = "purple") +
  labs(x = "Cylinders", y = "Average MPG", title = "Average MPG by Number of Cylinders") +
  theme_minimal()
The bar plot above demonstrates that car models with 4 cylinders tend to have the highest average MPG, followed by those with 6 cylinders, while car models with 8 cylinders have the lowest average MPG.
# Plot the bar plot of MPG by car model
ggplot(mtcars, aes(x = rownames(mtcars), y = mpg)) +
  geom_col(fill = "red") +
  labs(x = "Car Model", y = "MPG", title = "MPG by Car Model") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
This bar plot provides a visual representation of the MPG values for each car model in the mtcars dataset, allowing for easy comparison.
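Because the car models are character labels, ggplot2 places the bars in alphabetical order, which makes ranking by eye harder. One possible refinement, sketched below under the assumption that ggplot2 is installed, is to order the bars by MPG with `reorder()` and flip the axes so the model names stay readable. The `model` column and the object names `cars` and `p` are introduced here for illustration.

```r
library(ggplot2)

# Use a fresh copy of the built-in data so this sketch runs on its own
cars <- datasets::mtcars

# Store the model names as a factor whose levels are sorted by MPG
cars$model <- reorder(rownames(cars), cars$mpg)

# Horizontal bars, lowest MPG at the bottom and highest at the top
p <- ggplot(cars, aes(x = model, y = mpg)) +
  geom_col(fill = "red") +
  coord_flip() +
  labs(x = "Car Model", y = "MPG", title = "MPG by Car Model (Sorted)") +
  theme_minimal()
print(p)
```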
# Plot the violin plot of MPG by number of cylinders and transmission type
ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = am)) +
  geom_violin() +
  labs(x = "Cylinders", y = "MPG", title = "MPG by Cylinder and Transmission Type") +
  theme_minimal()
A violin plot is a mirrored kernel density estimate that plays a role similar to a box plot: the width of each violin shows where MPG values are concentrated. Here it shows the distribution of MPG values for each combination of the number of cylinders and transmission type.
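A density estimate is only as reliable as the number of observations behind it, so it can help to check how many cars fall into each cylinder and transmission combination before reading the violins closely. The sketch below is an illustrative check, not part of the original analysis. The cutoff `min_cars` is an arbitrary value chosen here, not a statistical rule.

```r
# Use a fresh copy of the built-in data and label the transmission type
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Number of cars in each cylinder x transmission combination
group_sizes <- table(cylinders = cars$cyl, transmission = cars$am)
print(group_sizes)

# Flag combinations with few cars (hypothetical cutoff, chosen for illustration)
min_cars <- 5
small_groups <- subset(as.data.frame(group_sizes), Freq < min_cars)
print(small_groups)
```

Violins for the flagged combinations are drawn from very few points and should be interpreted with caution.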
# Stacked bar plot: number of car models by number of cylinders and transmission type
ggplot(mtcars, aes(x = factor(cyl), fill = am)) +
  geom_bar() +
  labs(x = "Cylinders", y = "Count", fill = "Transmission", title = "Number of Car Models by Cylinder and Transmission Type") +
  theme_minimal()
The stacked bar plot shows how many car models fall into each combination of number of cylinders and transmission type. The height of each bar gives the total count for a cylinder category, and the colored segments show how that count splits between automatic and manual transmissions.
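Stacked counts show absolute numbers, but proportions are easier to compare when the bars have different heights. As a rough sketch, `prop.table()` with `margin = 1` gives the share of each transmission type within every cylinder category. The names `counts` and `share_within_cyl` are chosen here for illustration.

```r
# Use a fresh copy of the built-in data and label the transmission type
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Cross-tabulate cylinders (rows) against transmission type (columns)
counts <- table(cars$cyl, cars$am)

# Convert each row to proportions so every cylinder category sums to 1
share_within_cyl <- prop.table(counts, margin = 1)
print(round(100 * share_within_cyl, 1))
```

The same view can be drawn directly in ggplot2 by using `geom_bar(position = "fill")`, which rescales every bar to a height of 1.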
In this project, we explored the mtcars dataset visually, focusing on how MPG relates to horsepower, number of cylinders, and transmission type.
The visualizations (a bubble plot, a pie chart, a box plot, bar plots, a violin plot, and a stacked bar plot) showed how MPG values are distributed across horsepower, number of cylinders, and transmission type, and how the car models are spread across the cylinder and transmission categories.
Our analysis showed that car models with manual transmission tend to have higher MPG than those with automatic transmission. Car models with 4 cylinders had the highest average MPG, while those with 8 cylinders had the lowest. These are descriptive patterns rather than causal effects, because transmission type and number of cylinders are related to each other and to other variables such as horsepower.
These results show why factors such as transmission type and number of cylinders should be considered when examining MPG and performance in car models. Because mtcars contains only 32 car models, the patterns are best treated as illustrative rather than as a basis for decisions in the automotive industry.