What is one of the only documented advantages of a pie chart over other types of display? Try to come up with another advantage not noted by the author.
One of the only documented advantages is that it is easier to judge the combined size of adjacent slices of a pie chart than to mentally stack and sum the heights of bars in a bar chart.
An additional advantage might be that pie charts are extremely easy to remember. Because of their simplicity, one could use distinct coloring to make a point to an audience that is less technical or less versed in data visualization.
## Question 2
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
mtcars$cyl <- factor(mtcars$cyl)
mtcars$gear <- factor(mtcars$gear)
inner_data <- mtcars %>%
  group_by(cyl) %>%
  summarize(count = n())
outer_data <- mtcars %>%
  group_by(cyl, gear) %>%
  summarize(count = n())
## `summarise()` has grouped output by 'cyl'. You can override using the `.groups`
## argument.
plot_ly() %>%
  # Inner ring: number of cars per cylinder count
  add_pie(data = inner_data, labels = ~cyl, values = ~count,
          hole = 0.4, sort = FALSE,
          textinfo = 'label',
          marker = list(line = list(width = 1)),
          name = "Cylinders",
          domain = list(x = c(0.15, 0.85), y = c(0.15, 0.85))) %>%
  # Outer ring: number of cars per cylinder and gear combination
  add_pie(data = outer_data, labels = ~gear, values = ~count,
          hole = 0.7, sort = FALSE,
          textinfo = 'label',
          marker = list(line = list(width = 1)),
          name = "Gears",
          domain = list(x = c(0, 1), y = c(0, 1))) %>%
  layout(title = "Layered Pie Chart of Cylinders and Gears (mtcars)",
         showlegend = TRUE,
         annotations = list(
           list(text = "Cylinders", x = 0.5, y = 0.5, showarrow = FALSE, font = list(size = 12))
         ))
Write a few sentences describing what works and what does not work.
What works: The inner ring (cylinders) and the outer ring (gears) are distinct and easy to tell apart, the labeling is clear, and each category has its own color.
What does not work: Some of the colors are shared (4 gets the same color for both cyl and gear), which may be confusing. Context is missing, and the chart does not draw a clear connection between cylinders, gears, and the total count: how many cars fall into each combination? plotly's interactivity is a plus because hovering shows the counts, but that takes more effort from the reader, and other visualizations are better suited to this data.
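The question of how many cars fall into each cylinder and gear combination can be answered directly with a cross-tabulation. The following is a minimal sketch using base R's `xtabs()` on the built-in `mtcars` data. The object names `cars` and `combo_counts` are illustrative and not part of the original analysis.

```r
# Illustrative sketch: count cars for every cylinder/gear combination.
# `cars` and `combo_counts` are hypothetical names chosen for this example.
cars <- datasets::mtcars

# Rows are cylinder counts, columns are gear counts.
combo_counts <- xtabs(~ cyl + gear, data = cars)

# Add row and column totals so the per-cylinder totals are visible as well.
print(addmargins(combo_counts))
```

The row totals correspond to the inner ring of the pie chart, and the individual cells correspond to the segments of the outer ring.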
## Question 3
library(ggplot2)
ggplot(outer_data, aes(x = cyl, y = count, fill = gear)) +
  geom_bar(stat = "identity", position = "stack", color = "white") +
  labs(title = "Stacked Bar Chart of Cylinders and Gears",
       x = "Number of Cylinders",
       y = "Count of Cars",
       fill = "Number of Gears") +
  theme_minimal()
Address how this display is a more effective visual.
The stacked bar chart is a significantly better visualization for comparing cylinder and gear types because you can read the count of cars directly and see which combination is most common. You can also judge the relative size of each segment instead of relying on the vague percentages of a pie chart.
The legend and labels make it very easy to interpret. Overall it is better because it simplifies the comparisons and gives a clearer view of proportions.
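To check the proportions that the stacked bars suggest, the share of each gear count within a cylinder group can be computed with `prop.table()`. This is a sketch on the built-in `mtcars` data, and `cars`, `combo_counts`, and `gear_share` are illustrative names that do not appear in the analysis above.

```r
# Illustrative sketch: within-cylinder share of each gear count.
# `cars`, `combo_counts`, and `gear_share` are hypothetical names.
cars <- datasets::mtcars

combo_counts <- xtabs(~ cyl + gear, data = cars)

# margin = 1 divides each row by its row total,
# so the shares within each cylinder group sum to 1.
gear_share <- prop.table(combo_counts, margin = 1)

print(round(gear_share, 2))
```

In ggplot2, passing `position = "fill"` instead of `position = "stack"` to `geom_bar()` draws these within-group shares rather than raw counts.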
## Question 4
library(ggalluvial)
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
alluvial_data <- mtcars %>%
  group_by(cyl, gear, am) %>%
  summarize(count = n()) %>%
  ungroup()
## `summarise()` has grouped output by 'cyl', 'gear'. You can override using the
## `.groups` argument.
ggplot(alluvial_data,
       aes(axis1 = cyl, axis2 = gear, axis3 = am, y = count)) +
  geom_alluvium(aes(fill = cyl)) +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Cylinders", "Gears", "Transmission")) +
  labs(title = "Alluvial Plot of Cylinders, Gears, and Transmission Types (mtcars)",
       x = "", y = "Count of Cars") +
  theme(legend.position = "none")
## Warning in to_lodes_form(data = data, axes = axis_ind, discern =
## params$discern): Some strata appear at multiple axes.
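The warning most likely arises because the value 4 is a level of both `cyl` and `gear`, so the same stratum label appears on two axes. One possible way to avoid it, sketched here as an assumption rather than the approach used in this report, is to give each axis distinct labels before plotting. The names `cars`, `shared_before`, and `shared_after` are illustrative.

```r
# Illustrative sketch: make stratum labels unique per axis.
# `cars`, `shared_before`, and `shared_after` are hypothetical names.
cars <- datasets::mtcars

# Labels that occur in both variables before relabeling.
shared_before <- intersect(as.character(unique(cars$cyl)),
                           as.character(unique(cars$gear)))

# Append a unit to each value so no label is shared between axes.
cars$cyl  <- factor(paste(cars$cyl, "cyl"))
cars$gear <- factor(paste(cars$gear, "gears"))

# Labels that occur in both variables after relabeling.
shared_after <- intersect(levels(cars$cyl), levels(cars$gear))

print(shared_before)
print(shared_after)
```

With distinct labels, the strata on the Cylinders and Gears axes can no longer be confused with one another in the plot either.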
## Chat Improved Version
ggplot(alluvial_data,
       aes(axis1 = cyl, axis2 = gear, axis3 = am, y = count)) +
  geom_alluvium(aes(fill = cyl, alpha = count), size = 0.5) + # Thinner, transparent flows
  geom_stratum(aes(fill = cyl), color = "black") + # Black borders for clarity
  geom_text(stat = "stratum", aes(label = after_stat(stratum)), color = "black", size = 4) +
  geom_text(aes(label = count), stat = "alluvium", size = 3, color = "black", nudge_x = 0.2) + # Counts on flows
  scale_x_discrete(limits = c("Cylinders", "Gears", "Transmission")) +
  scale_fill_brewer(palette = "Set2") + # Distinct colors for clarity
  scale_alpha_continuous(range = c(0.5, 1)) + # Transparency based on count
  labs(title = "Improved Alluvial Plot of Cylinders, Gears, and Transmission Types (mtcars)",
       x = "", y = "Count of Cars") +
  theme_minimal() +
  theme(legend.position = "none")
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning in to_lodes_form(data = data, axes = axis_ind, discern =
## params$discern): Some strata appear at multiple axes.