library(ggplot2)
data("mtcars")
data("USArrests")
data("Titanic")
data("ToothGrowth")
data("iris")

Part 1

Parallel Coordinates Plot

Part 2

Map

A choropleth map is well suited to showing geographic variation: shading each state by a standardized rate (here, arrests per 100,000 residents) makes regional patterns easy to read at a glance. It fits the USArrests dataset because the data has one row per state, so the map shows how the murder rate changes from place to place.

library(dplyr)
library(maps)
library(tibble)

us_data <- USArrests %>%
  rownames_to_column(var = "state") %>%
  mutate(region = tolower(state)) 

us_map <- map_data("state")

map_df <- us_map %>%
  left_join(us_data, by = "region")

# Shade each state polygon by its murder arrest rate.
# `linewidth` replaces `size` for line widths as of ggplot2 3.4.0.
ggplot(map_df, aes(x = long, y = lat, group = group, fill = Murder)) +
  geom_polygon(color = "white", linewidth = 0.2) +
  coord_fixed(1.3) +
  scale_fill_viridis_c(option = "rocket", direction = -1, name = "Murder Rate") +
  labs(
    title = "US Murder Rates by State",
    subtitle = "Rates per 100,000 residents",
    caption = "Source: USArrests dataset"
  ) +
  theme_minimal() +
  theme(
    axis.text = element_blank(),
    axis.title = element_blank(),
    panel.grid = element_blank()
  )
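
Before reading the map, it is worth checking whether every state actually joined. The sketch below is an addition, not part of the original analysis. It assumes the same packages as above, and the names map_regions, missing_from_map, and missing_from_data are illustrative. It compares the state names in USArrests with the regions in the map data, in both directions.

```r
library(dplyr)
library(tibble)
library(ggplot2)  # map_data() also needs the maps package installed

us_data <- USArrests %>%
  rownames_to_column(var = "state") %>%
  mutate(region = tolower(state))

# One row per region name that has polygons in the map
map_regions <- map_data("state") %>%
  distinct(region)

# States in USArrests that have no polygon in the map
missing_from_map <- anti_join(us_data, map_regions, by = "region")$state

# Map regions with no row in USArrests (these get the NA fill color)
missing_from_data <- anti_join(map_regions, us_data, by = "region")$region

missing_from_map
missing_from_data
```

map_data("state") covers only the contiguous United States, so Alaska and Hawaii should appear in missing_from_map. Any region listed in missing_from_data is drawn with the NA fill color rather than a murder rate.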

Part 3

Flow diagram

A Sankey (flow) diagram is useful for showing connections because it shows both the structure of the data and how counts flow from one stage to the next. It suits the Titanic dataset because the dataset consists of several categorical variables (class, sex, age, and survival) whose combinations would otherwise be difficult to show together.

library(networkD3)
library(dplyr)

data("Titanic")

df <- as.data.frame(Titanic)

df <- df %>%
  group_by(Class, Sex, Age, Survived) %>%
  summarise(Freq = sum(Freq), .groups = "drop")

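# Convert factors to character first: c() on factors returned integer codes before R 4.1.0
nodes <- data.frame(
  name = unique(unlist(lapply(df[c("Class", "Sex", "Age", "Survived")], as.character)))
)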

get_id <- function(x) {
  match(x, nodes$name) - 1
}

links_class_sex <- df %>%
  group_by(Class, Sex) %>%
  summarise(value = sum(Freq), .groups = "drop") %>%
  mutate(source = get_id(Class),
         target = get_id(Sex))

links_sex_age <- df %>%
  group_by(Sex, Age) %>%
  summarise(value = sum(Freq), .groups = "drop") %>%
  mutate(source = get_id(Sex),
         target = get_id(Age))

links_age_surv <- df %>%
  group_by(Age, Survived) %>%
  summarise(value = sum(Freq), .groups = "drop") %>%
  mutate(source = get_id(Age),
         target = get_id(Survived))

# Stack the three link tables and keep only the columns sankeyNetwork() uses.
# as.data.frame() avoids the tibble-conversion message from networkD3.
links <- bind_rows(links_class_sex, links_sex_age, links_age_surv) %>%
  select(source, target, value) %>%
  as.data.frame()

sankeyNetwork(
  Links = links,
  Nodes = nodes,
  Source = "source",
  Target = "target",
  Value = "value",
  NodeID = "name",
  fontSize = 12,
  nodeWidth = 30
)
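
The three link tables above follow the same pattern, so they can also be built with one helper. This is a minimal sketch, not the original method: make_links() and stages are hypothetical names introduced here, and the helper is not part of networkD3. It assumes links should connect each variable only to the next one in the order Class, Sex, Age, Survived.

```r
library(dplyr)

df <- as.data.frame(Titanic)
stages <- c("Class", "Sex", "Age", "Survived")

# One node per category level, stored as character
nodes <- data.frame(
  name = unique(unlist(lapply(df[stages], as.character)))
)

# Hypothetical helper: total passenger counts for each (from, to) pair,
# with zero-based node indices as networkD3 expects
make_links <- function(data, from, to) {
  data %>%
    group_by(across(all_of(c(from, to)))) %>%
    summarise(value = sum(Freq), .groups = "drop") %>%
    mutate(
      source = match(as.character(.data[[from]]), nodes$name) - 1,
      target = match(as.character(.data[[to]]), nodes$name) - 1
    ) %>%
    select(source, target, value)
}

# Link each stage to the one that follows it
links <- bind_rows(
  Map(function(a, b) make_links(df, a, b), head(stages, -1), tail(stages, -1))
) %>%
  as.data.frame()
```

The resulting nodes and links have the same columns as the versions built step by step, so they can be passed to the same sankeyNetwork() call. Every passenger is counted once per stage transition, so the values for each pair of stages sum to the total number of passengers.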

Part 4

Raincloud Plot

This type of plot is useful because it combines several views of a distribution in one figure, which makes interpretation easier: you can see the density, the summary statistics, and the raw points together. It suits the ToothGrowth dataset because it allows comparison and visualization across the factors that can influence tooth growth, namely supplement type and dose.

library(ggdist)

teeth_clean <- ToothGrowth %>%
  select(len, supp, dose) %>%
  mutate(dose = factor(dose)) %>%  # treat dose as discrete so fill and color define the groups
  na.omit()

ggplot(teeth_clean, aes(x = supp, y = len, fill = dose)) +
  # Half-density "cloud" for each supplement and dose combination
  stat_halfeye(
    position = position_dodge(width = 0.75),
    adjust = 0.6,
    width = 0.55,
    .width = 0,
    justification = -0.2,
    point_colour = NA,
    alpha = 0.5
  ) +
  # Narrow boxplot for the summary statistics
  geom_boxplot(
    aes(color = dose),
    width = 0.12,
    position = position_dodge(width = 0.75),
    outlier.shape = NA,
    alpha = 0.65,
    linewidth = 0.5
  ) +
  # Jittered raw observations, dodged by the same width as the other layers
  geom_jitter(
    aes(color = dose),
    position = position_jitterdodge(
      jitter.width = 0.1,
      dodge.width = 0.75
    ),
    size = 1.8,
    alpha = 0.25
  ) +
  labs(
    title = "Tooth Length by Supplement and Dose",
    subtitle = "Raincloud plot showing distribution, summary statistics, and individual observations",
    x = "Supplement Delivery Method",
    y = "Tooth Length",
    fill = "Vitamin C Dosage",
    color = "Vitamin C Dosage"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5),
    legend.position = "right"
  )

Portfolio

library(ggplot2)
library(dplyr)
library(patchwork)

data(iris)

str(iris)
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
head(iris)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

Main research question: How do the three iris species differ in their sepal and petal measurements, and which variables best separate the species?

Plot 1: Scatterplot

Is there a relationship between sepal length and petal length, and do species cluster separately?

ggplot(iris,
       aes(x = Sepal.Length,
           y = Petal.Length,
           color = Species)) +
  geom_point(size = 3, alpha = 0.8) +
  labs(title = "Sepal Length vs Petal Length",
       subtitle = "Species by flower dimensions",
       x = "Sepal Length",
       y = "Petal Length") +
  theme_minimal()

Plot 2: Histogram

How is sepal width distributed within each species?

ggplot(iris,
       aes(x = Sepal.Width,
           fill = Species)) +
  geom_histogram(binwidth = 0.2,
                 color = "black",
                 alpha = 0.7,
                 position = "identity") +
  labs(title = "Distribution of Sepal Width",
       subtitle = "Comparing spread and overlap among species",
       x = "Sepal Width",
       y = "Count") +
  theme_minimal()

Setosa tends to have wider sepals overall, while versicolor and virginica show more overlap in their distributions. A histogram is useful for examining the distribution and frequency of a single numeric variable across groups.
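
As a quick numerical companion to the histogram (an addition; sepal_width_means is an illustrative name), the mean sepal width per species can be computed directly:

```r
# Mean sepal width for each species
sepal_width_means <- tapply(iris$Sepal.Width, iris$Species, mean)
round(sepal_width_means, 2)
```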

Plot 3: Boxplot

Which species tends to have the largest petal width, and how much variation exists within species?

ggplot(iris,
       aes(x = Species,
           y = Petal.Width,
           fill = Species)) +
  geom_boxplot(alpha = 0.8) +
  labs(title = "Petal Width by Species",
       subtitle = "How petal width differs by species",
       x = "Species",
       y = "Petal Width") +
  theme_minimal()

Virginica has the largest petal widths on average, while setosa has the smallest. Variation is also greater in virginica compared with the other species. A boxplot is appropriate because it summarizes median, spread, and potential outliers for a numeric variable across categories.
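
The quantities a boxplot draws can be checked numerically. This sketch is an addition, and petal_width_stats is an illustrative name. It computes the median, interquartile range, and standard deviation of petal width for each species.

```r
# Rows are statistics, columns are species
petal_width_stats <- sapply(
  split(iris$Petal.Width, iris$Species),
  function(x) c(median = median(x), iqr = IQR(x), sd = sd(x))
)
round(petal_width_stats, 2)
```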

Plot 4: Density Plot

How do petal length distributions overlap among species?

ggplot(iris,
       aes(x = Petal.Length,
           fill = Species)) +
  geom_density(alpha = 0.5) +
  labs(title = "Density of Petal Length",
       subtitle = "How petal length differs by species",
       x = "Petal Length",
       y = "Density") +
  theme_minimal()

Setosa has a distinct petal length distribution, while versicolor and virginica overlap slightly but still differ in their central tendencies. Density plots are useful for comparing the shape and overlap of continuous distributions between groups.
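
One simple way to check the overlap is to compare the observed range of petal length in each species. This is a sketch added for illustration, and petal_ranges is an illustrative name. Two species overlap wherever one species' maximum exceeds the other's minimum.

```r
# Minimum and maximum petal length for each species
petal_ranges <- sapply(split(iris$Petal.Length, iris$Species), range)
rownames(petal_ranges) <- c("min", "max")
petal_ranges
```

Ranges only show whether any overlap exists. The density plot remains the better guide to how much of each distribution is shared.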

Plot 5: Faceted Scatterplot

How does the relationship between sepal length and sepal width vary within each species individually?

ggplot(iris,
       aes(x = Sepal.Length,
           y = Sepal.Width,
           color = Species)) +
  geom_point(size = 2.5) +
  facet_wrap(~Species) +
  labs(title = "Sepal Dimensions by Species",
       subtitle = "Species-specific dimensions",
       x = "Sepal Length",
       y = "Sepal Width") +
  theme_minimal()

Each species shows a different spread and relationship between sepal length and width. Setosa appears more tightly clustered, while virginica shows greater variability. Faceting is appropriate because it separates each species into its own panel, making comparisons easier and reducing visual overlap.
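
Looking at each species separately matters for another reason: the relationship within a species can differ from the relationship in the pooled data. The sketch below is an addition, and within_cor and overall_cor are illustrative names. It compares the correlation between sepal length and sepal width within each species against the correlation across all 150 flowers.

```r
# Correlation between sepal length and width inside each species
within_cor <- sapply(
  split(iris, iris$Species),
  function(d) cor(d$Sepal.Length, d$Sepal.Width)
)

# Correlation when all species are pooled together
overall_cor <- cor(iris$Sepal.Length, iris$Sepal.Width)

round(within_cor, 2)
round(overall_cor, 2)
```

The correlation is positive within every species but slightly negative in the pooled data, which is the pattern the faceted panels make visible.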

The plots suggest that petal measurements are more effective than sepal measurements for distinguishing iris species. Iris setosa forms a distinct group, while versicolor and virginica show some overlap.
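
As a rough numerical check on this conclusion, the one-way ANOVA F statistic for each measurement can be compared across variables. This is an addition to the original analysis, offered as a sketch rather than a formal test, and measure_cols and f_stats are illustrative names. A larger F value means the differences between species are large relative to the spread within species.

```r
measure_cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

# F statistic from a one-way ANOVA of each measurement on species
f_stats <- sapply(measure_cols, function(v) {
  fit <- aov(iris[[v]] ~ iris$Species)
  summary(fit)[[1]][["F value"]][1]
})

sort(f_stats, decreasing = TRUE)
```

The two petal measurements come out well ahead of the two sepal measurements, in line with what the plots suggest.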