Plot 1: Parallel Coordinates Plot
install.packages("GGally")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.6'
## (as 'lib' is unspecified)
library(GGally)
## Loading required package: ggplot2
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
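# Standardize the four measurements; as.numeric() drops the one-column
# matrix that scale() returns, so the columns stay plain numeric vectors
iris_scaled <- iris %>%
  mutate(across(1:4, ~ as.numeric(scale(.x))))
# scale = "uniminmax" rescales each column to the range 0 to 1
ggparcoord(
  data = iris_scaled,
  columns = 1:4,
  groupColumn = 5,
  scale = "uniminmax",
  showPoints = TRUE,
  alphaLines = 0.4
) +
  labs(
    title = "Parallel Coordinates Plot of Iris Flower Measurements",
    x = "Flower Measurements",
    y = "Scaled Values",
    color = "Species"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "bottom"
  )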
This parallel coordinates plot shows all four flower measurements at once for each species. Color distinguishes the species, so they can still be told apart where their lines overlap. Because the measurements sit side by side, the trend for each species can be compared directly across the axes.
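As a rough check on what the y-axis shows, the sketch below reproduces the per-column rescaling in base R. It assumes that `scale = "uniminmax"` maps each column's minimum to 0 and its maximum to 1. The helper `minmax` is a hypothetical name introduced here, not part of GGally. The comparison also suggests that z-scoring a column beforehand leaves the min-max result unchanged.

```r
# Hypothetical helper: rescale a numeric vector to the range [0, 1].
minmax <- function(x) (x - min(x)) / (max(x) - min(x))

raw    <- iris$Sepal.Length
zscore <- as.numeric(scale(raw))   # center, then divide by the standard deviation

# Min-max scaling gives the same values for the raw and the z-scored column.
same_result <- isTRUE(all.equal(minmax(raw), minmax(zscore)))
print(same_result)
print(range(minmax(raw)))
```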
Plot 2: Map
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.6'
## (as 'lib' is unspecified)
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.6'
## (as 'lib' is unspecified)
install.packages("maps")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.6'
## (as 'lib' is unspecified)
library(ggplot2)
library(dplyr)
library(maps)
data("USArrests")
USArrests$state <- rownames(USArrests)
USArrests$state <- tolower(USArrests$state)
states_map <- map_data("state")
map_data_combined <- left_join(states_map, USArrests, by = c("region" = "state"))
ggplot(map_data_combined, aes(x = long, y = lat, group = group, fill = Assault)) +
  # linewidth replaces the deprecated size argument for line widths (ggplot2 >= 3.4.0)
  geom_polygon(color = "white", linewidth = 0.2) +
  coord_fixed(1.3) +
  scale_fill_gradient(
    low = "pink",
    high = "blue",
    name = "Assault Rate"
  ) +
  labs(
    title = "Assault arrest rates across the U.S.",
    subtitle = "Data from the USArrests dataset"
  ) +
  theme_void()
Why is this plot important? A choropleth map shades each state by its assault arrest rate, which makes the data easy to read at a glance. It shows regional differences and spatial trends across the country. In this map, light pink indicates lower assault arrest rates, and the color shifts through purple toward blue as the rate increases.
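The ranking behind the shading can be checked directly in base R. This is a minimal sketch that assumes the same lowercase state key used for the map join; `arrests` and `top_assault` are names introduced here for illustration.

```r
# Copy the built-in data and add a lowercase state name, as done for the map join.
arrests <- USArrests
arrests$state <- tolower(rownames(arrests))

# Five states with the highest assault arrest rate (arrests per 100,000 residents).
top_assault <- head(arrests[order(-arrests$Assault), c("state", "Assault")], 5)
print(top_assault)
```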
Plot 3: Flow diagram
# The Titanic table used below ships with base R's datasets package
library(dplyr)
library(ggplot2)
install.packages("ggalluvial")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.6'
## (as 'lib' is unspecified)
library(ggalluvial)
# Titanic is a built-in contingency table; convert it to a data frame of counts
df <- as.data.frame(Titanic)
# Repeat each row Freq times so that every row represents one person
df_expanded <- df[rep(seq_len(nrow(df)), df$Freq), 1:4]
# Build alluvial plot
ggplot(df_expanded,
       aes(axis1 = Class,
           axis2 = Sex,
           axis3 = Age,
           axis4 = Survived,
           y = 1)) +
  geom_alluvium(aes(fill = Survived), alpha = 0.5) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("Class", "Sex", "Age", "Survival")) +
  labs(title = "Titanic Passenger Flow: Class → Sex → Age → Survival",
       y = "Count") +
  theme(
    panel.grid = element_blank(),
    axis.title = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks = element_blank(),
    legend.position = "bottom",
    plot.title = element_text(
      face = "bold",
      size = 16))
This flow plot is important because it shows how survival relates to class, sex, and age. It helps us see flow patterns and connections between categories. A flow graph is also interesting here because the colors represent whether people survived, so each band can be followed from class through sex and age to its outcome.
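The counts behind the flows can also be tabulated. The sketch below uses only base R to compute the share of people who survived within each class and sex from the built-in `Titanic` table; `counts` and `survival_share` are illustrative names introduced here.

```r
# Collapse the 4-way Titanic table to Class x Sex x Survived counts.
counts <- apply(Titanic, c(1, 2, 4), sum)

# Share of each Class x Sex group that survived.
survival_share <- counts[, , "Yes"] / (counts[, , "Yes"] + counts[, , "No"])
print(round(survival_share, 2))

# Total number of people, which should match the row count of the expanded data frame.
print(sum(Titanic))
```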
Plot 4: Raincloud plot
# Load libraries
library(ggplot2)
install.packages("ggdist")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.6'
## (as 'lib' is unspecified)
library(ggdist)
# Load dataset
data("ToothGrowth")
# Convert dose to factor
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
# Create raincloud plot
ggplot(ToothGrowth,
       aes(x = dose,
           y = len,
           fill = supp)) +
  # Violin / cloud portion
  stat_halfeye(
    adjust = 0.5,
    width = 0.6,
    justification = -0.2,
    point_colour = NA,
    alpha = 0.6
  ) +
  # Boxplot
  geom_boxplot(
    width = 0.2,
    outlier.shape = NA,
    alpha = 0.5,
    position = position_dodge(width = 0.5)
  ) +
  # Raw data points
  geom_jitter(
    aes(color = supp),
    width = 0.15,
    size = 1,
    alpha = 0.5
  ) +
  # Labels
  labs(
    title = "Raincloud Plot of Tooth Growth",
    subtitle = "Comparison by Supplement Type and Dose",
    x = "Dose (mg/day)",
    y = "Tooth Length",
    fill = "Supplement",
    color = "Supplement"
  ) +
  theme_minimal()
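A raincloud plot is useful because it shows the distribution of the data and the individual data points on the same graph. This one combines a half-density (the "cloud"), a boxplot, and jittered raw points (the "rain"). The boxplot's own outlier markers are turned off with outlier.shape = NA, so unusual values appear among the raw points instead, and the plot also shows where the groups overlap.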
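Since the boxplot layer hides its outlier markers, one way to see whether any observations would have been flagged is to apply the boxplot rule to each group. This is a minimal sketch in base R, assuming the 1.5 × IQR whisker rule that `boxplot.stats` uses by default; `flagged` is an illustrative name.

```r
# For each supplement-by-dose group, list the values beyond the boxplot whiskers.
flagged <- tapply(
  ToothGrowth$len,
  list(ToothGrowth$supp, ToothGrowth$dose),
  function(x) boxplot.stats(x)$out
)
print(flagged)

# Group medians, for comparison with the center lines of the boxplots.
group_medians <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = median)
print(group_medians)
```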
Student Portfolio
Graph 1: Boxplot
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.6'
## (as 'lib' is unspecified)
library(ggplot2)
ggplot(iris, aes(x = Species, y = Petal.Length, fill = Species)) +
  geom_boxplot(alpha = 0.6) +
  labs(
    title = "Petal Length Across Iris Species",
    x = "Species",
    y = "Petal Length"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "none"
  )
Research question: How does petal length vary across the three iris species?
Plot type: Boxplot
Interpretation: Setosa has much shorter petals than Versicolor and Virginica. Virginica appears to have the longest petals.
Appropriateness: A boxplot is appropriate because it shows the distribution of a numeric variable across several categories. It also shows the spread of petal length within each species and flags outliers.
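The summary statistics that a boxplot draws can be printed for comparison. This is a minimal sketch in base R; `petal_summary` is an illustrative name, and the hinges from `fivenum` may differ slightly from the quartiles the plot uses.

```r
# Five-number summary of petal length for each species
# (minimum, lower hinge, median, upper hinge, maximum).
petal_summary <- sapply(split(iris$Petal.Length, iris$Species), fivenum)
rownames(petal_summary) <- c("min", "lower", "median", "upper", "max")
print(petal_summary)
```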
Graph 2: Violin plot
library(ggplot2)
ggplot(iris, aes(x = Species, y = Sepal.Width, fill = Species)) +
  geom_violin(trim = FALSE, alpha = 0.7) +
  geom_boxplot(width = 0.1, fill = "white") +
  labs(
    title = "Distribution of Sepal Width by Species",
    x = "Species",
    y = "Sepal Width"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "none"
  )
Research question: How does sepal width vary across iris species?
Plot type: Violin plot
Interpretation: Setosa has wider sepals than Versicolor and Virginica, whose distributions overlap.
Appropriateness: A violin plot suits this question because it combines a boxplot with a density plot, so both the shape and the spread of the distribution are shown for each species.
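A numeric summary can back up the visual comparison. The sketch below, in base R, computes the mean and standard deviation of sepal width per species; `sepal_stats` is an illustrative name.

```r
# Mean and standard deviation of sepal width for each species.
sepal_stats <- aggregate(
  Sepal.Width ~ Species,
  data = iris,
  FUN = function(x) c(mean = mean(x), sd = sd(x))
)
print(sepal_stats)
```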
Graph 3: Scatterplot
library(ggplot2)
ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species)) +
  geom_point(size = 2, alpha = 0.8) +
  labs(
    title = "Relationship Between Petal Length and Width",
    x = "Petal Length",
    y = "Petal Width"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 16)
  )
Research Question: Across these three species, what is the relationship between petal width and length?
Plot Type: Scatterplot
Interpretation: There is a strong positive relationship between petal length and petal width. Each species forms its own cluster, especially Setosa.
Appropriateness: A scatterplot is appropriate because it shows the relationship between two numeric variables. Coloring the points by species also makes the differences between species visible.
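The strength of the relationship can be quantified with a correlation coefficient. This is a minimal sketch in base R using Pearson correlation; `overall_cor` and `by_species_cor` are illustrative names. The pooled value can be higher than the within-species values, because differences between the species clusters also contribute to it.

```r
# Pearson correlation between petal length and width, all species pooled.
overall_cor <- cor(iris$Petal.Length, iris$Petal.Width)
print(round(overall_cor, 3))

# The same correlation computed within each species.
by_species_cor <- sapply(
  split(iris, iris$Species),
  function(d) cor(d$Petal.Length, d$Petal.Width)
)
print(round(by_species_cor, 3))
```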
Graph 4: Density Plot
library(ggplot2)
ggplot(iris, aes(x = Petal.Length, fill = Species)) +
  geom_density(
    alpha = 0.6
  ) +
  labs(
    title = "Density Distribution of Petal Length by Species",
    x = "Petal Length",
    y = "Density"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(
      face = "bold",
      size = 16,
      hjust = 0.5
    )
  )
Research question: How does petal length differ among the iris species?
Plot type: Density plot
Interpretation: Setosa has much shorter petals, as shown by its separate distribution. The Versicolor and Virginica distributions overlap, which shows that they are similar, but each still has its own distinct peak.
Appropriateness: A density plot is appropriate for this research question because it is good at comparing the distribution of a continuous variable across groups.
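The separation and overlap described above can be checked against the raw ranges. This is a minimal sketch in base R; `petal_range` is an illustrative name.

```r
# Range of petal length within each species.
petal_range <- sapply(split(iris$Petal.Length, iris$Species), range)
rownames(petal_range) <- c("min", "max")
print(petal_range)

# TRUE if every setosa petal is shorter than every versicolor petal.
print(petal_range["max", "setosa"] < petal_range["min", "versicolor"])

# TRUE if the versicolor and virginica ranges overlap.
print(petal_range["max", "versicolor"] > petal_range["min", "virginica"])
```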
Graph 5: Parallel Coordinates Plot
library(GGally)
library(dplyr)
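# Standardize the four measurements; as.numeric() drops the one-column
# matrix that scale() returns, so the columns stay plain numeric vectors
iris_scaled <- iris %>%
  mutate(across(1:4, ~ as.numeric(scale(.x))))
# scale = "uniminmax" rescales each column to the range 0 to 1
ggparcoord(
  data = iris_scaled,
  columns = 1:4,
  groupColumn = 5,
  scale = "uniminmax",
  showPoints = TRUE,
  alphaLines = 0.4
) +
  labs(
    title = "Parallel Coordinates Plot of Iris Measurements",
    x = "Flower Measurements",
    y = "Scaled Values",
    color = "Species"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    legend.position = "bottom"
  )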
Research question: How do the four flower measurements vary together across the three iris species?
Plot type: Parallel Coordinates Plot
Interpretation: The pattern across the flower measurements differs clearly between species. Setosa follows a different pattern from the other two species, with noticeably smaller petal dimensions.
Appropriateness: A parallel coordinates plot is good for comparing multiple numeric variables on the same graph. This graph highlights the multivariate patterns and their differences.
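The average profile of each species can be computed to summarize what the individual lines show. This is a minimal sketch in base R, assuming each measurement is rescaled to the range 0 to 1 before averaging; `rescale01`, `scaled`, and `profile` are hypothetical names introduced here.

```r
# Hypothetical helper: rescale a numeric vector to [0, 1].
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Rescale the four measurements, then average them within each species.
scaled <- as.data.frame(lapply(iris[1:4], rescale01))
profile <- aggregate(scaled, by = list(Species = iris$Species), FUN = mean)
print(profile)
```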