In-class activity #3

mtcars Dataset Visualization

Introduction:

In this assignment, we explore the mtcars dataset, which is built into R and contains 11 measurements for 32 car models, taken from the 1974 Motor Trend US magazine. The variables are:

mpg: Miles per gallon (Numeric)

cyl: Number of cylinders (Numeric)

disp: Displacement (Numeric, cubic inches)

hp: Gross horsepower (Numeric, hp)

drat: Rear axle ratio (Numeric, ratio)

wt: Weight (Numeric, 1000 lbs)

qsec: 1/4 mile time (Numeric, seconds)

vs: V/S engine (0 = V-shaped, 1 = straight) (Numeric, binary)

am: Transmission (0 = automatic, 1 = manual) (Numeric, binary)

gear: Number of forward gears (Numeric)

carb: Number of carburetors (Numeric)

install.packages("tidyverse")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)

library(tidyverse) # Load the tidyverse package for data manipulation and visualization

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.0
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggplot2) # Already attached by library(tidyverse) above; kept to make the plotting dependency explicit

# Load the mtcars dataset (built into R)
data_mtcars <- mtcars

# View the first few rows to understand the data
head(data_mtcars)

# Convert 'am' (transmission type) and 'cyl' (number of cylinders) to factors for categorical plotting
data_mtcars$am <- as.factor(data_mtcars$am)
data_mtcars$cyl <- as.factor(data_mtcars$cyl)
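
The conversion above keeps the raw codes 0 and 1 as the factor levels for am. As a minimal sketch, assuming readable legend labels are wanted, the levels can be named when the factor is created. The object name `cars_labeled` is introduced here for illustration only and is not part of the original activity.

```r
# Sketch: work on a separate copy so data_mtcars is left untouched
cars_labeled <- mtcars

# Map the codes documented for 'am' (0 = automatic, 1 = manual) to text labels
cars_labeled$am <- factor(cars_labeled$am,
                          levels = c(0, 1),
                          labels = c("Automatic", "Manual"))

# 'cyl' is converted the same way as in the activity
cars_labeled$cyl <- factor(cars_labeled$cyl)

# Confirm both columns are now factors and inspect the group sizes
str(cars_labeled[, c("am", "cyl")])
table(cars_labeled$am)
```

With labeled levels, any plot that maps am to color or fill would show "Automatic" and "Manual" in its legend instead of 0 and 1.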

Scatter Plot:

# Create a scatter plot of car weight vs. miles per gallon, colored by cylinder count
ggplot(data_mtcars, aes(x = wt, y = mpg, color = cyl)) +
  geom_point() + # Add points to the plot
  labs(title = "Weight vs. Miles Per Gallon", x = "Weight (1000 lbs)", y = "Miles Per Gallon") # Add plot labels

Line Graph:

# Create a line graph of mpg by row number (rows stay in the dataset's original order, not sorted by mpg)
data_mtcars_line <- data_mtcars %>% mutate(index = row_number()) # Add an index column so we can plot it

ggplot(data_mtcars_line, aes(x = index, y = mpg)) +
  geom_line() + # add a line to the plot
  labs(title = "Miles Per Gallon by Index", x = "Index", y = "Miles Per Gallon") # add plot labels
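
The index in the plot above is just the row position in mtcars, so the line follows the dataset's row order. If a line through mpg values sorted from lowest to highest is intended, the rows need to be sorted before the index is assigned. A minimal sketch, assuming ascending order is wanted; the names `mtcars_sorted` and `sorted_plot` are introduced here for illustration.

```r
library(dplyr)
library(ggplot2)

# Sort by mpg first, then number the rows so the index reflects rank
mtcars_sorted <- mtcars %>%
  arrange(mpg) %>%
  mutate(index = row_number())

# Line graph of mpg against its rank (1 = lowest mpg)
sorted_plot <- ggplot(mtcars_sorted, aes(x = index, y = mpg)) +
  geom_line() +
  labs(title = "Miles Per Gallon, Sorted Ascending",
       x = "Rank by MPG", y = "Miles Per Gallon")

sorted_plot
```

Because the values are sorted, this line can only rise or stay flat from left to right, which makes the spread of mpg easier to read than in the unsorted version.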

Horizontal bar chart:

# Create a horizontal bar chart of the average horsepower grouped by cylinder count
hp_by_cyl <- data_mtcars %>% group_by(cyl) %>% summarize(avg_hp = mean(hp)) # Calculate average horsepower for each cylinder group

ggplot(hp_by_cyl, aes(y = cyl, x = avg_hp)) +
  geom_col() + # Create bars whose lengths are the calculated averages (same as geom_bar(stat = 'identity'))
  labs(title = "Average HP by Cylinder Count", y = "Cylinder Count", x = "Average Horsepower") # Add plot labels

Stacked vertical bar chart:

# Create a stacked bar chart of average mpg, disp, hp, and wt, grouped by cyl.
# Calculate the average of each measurement per cylinder group, then pivot to long format.
bar_data_mtcars <- data_mtcars %>%
  group_by(cyl) %>%
  summarize(mpg = mean(mpg), disp = mean(disp), hp = mean(hp), wt = mean(wt)) %>%
  pivot_longer(cols = c("mpg", "disp", "hp", "wt"), names_to = "Measurement", values_to = "Average")

ggplot(bar_data_mtcars, aes(x = cyl, fill = Measurement, y = Average)) +
  geom_col() + # Create bars based on the calculated averages (same as geom_bar(stat = "identity"))
  labs(title = "Average Measurements by Cylinder Count", x = "Cylinder Count", y = "Average Measurement") # Add plot labels
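
The four measurements in the stacked chart are in different units (miles per gallon, cubic inches, horsepower, and 1000 lbs), so the large disp and hp values take up most of each bar. One possible alternative, offered only as a sketch and not as part of the original activity, is to give each measurement its own panel with its own y-axis. The names `avg_by_cyl` and `facet_plot` are introduced here for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Average each measurement per cylinder group, then reshape to long format
avg_by_cyl <- mtcars %>%
  mutate(cyl = factor(cyl)) %>%
  group_by(cyl) %>%
  summarize(across(c(mpg, disp, hp, wt), mean)) %>%
  pivot_longer(cols = -cyl, names_to = "Measurement", values_to = "Average")

# One panel per measurement; y-axes are free because the units differ
facet_plot <- ggplot(avg_by_cyl, aes(x = cyl, y = Average)) +
  geom_col() +
  facet_wrap(~ Measurement, scales = "free_y") +
  labs(title = "Average Measurements by Cylinder Count",
       x = "Cylinder Count", y = "Average")

facet_plot
```

The stacked version shows all four averages in a single bar, while the faceted version keeps small-valued measures such as wt and mpg readable.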

Conclusion: This activity helped me better understand how data analysis and visualization work together to tell a story.

I was able to see how raw data can be transformed into meaningful insights using R and four different types of graphs.

One of the main takeaways is the importance of organizing data correctly before creating visualizations. If the data is not clean or well structured, the results become much harder to interpret, and I think that holds for any dataset. For example, converting cyl to a factor is what made ggplot2 color the scatter plot by discrete cylinder groups instead of a continuous gradient. I also learned how different types of charts can highlight different aspects of the data, and how choosing the right visualization can make patterns and relationships easier to understand.

Another important point is the key role of interpretation. It’s not just about creating graphs, but also about explaining what they mean.

Overall, this showed me how data visualization can simplify complex information and make it easier to draw conclusions.