Plot 1: Multivariate Comparison

# Install once if needed: install.packages(c("GGally", "viridis", "hrbrthemes"))
library(GGally)
## Loading required package: ggplot2
library(viridis)
## Loading required package: viridisLite
library(hrbrthemes)

# Data set is provided by R natively
data <- iris

data %>%
  ggparcoord(
    columns = 1:4,
    groupColumn = 5,
    order = "anyClass",
    scale = "uniminmax",
    showPoints = TRUE,
    alphaLines = 0.3,
    title = "Comparison of Iris Species Across Floral Measurements"
  ) +
  scale_x_discrete(labels = c(
    "Petal.Length" = "Petal Length",
    "Petal.Width"  = "Petal Width",
    "Sepal.Length" = "Sepal Length",
    "Sepal.Width"  = "Sepal Width"
  )) +
  scale_color_viridis(discrete = TRUE) +
  theme_ipsum()

Explanation

This plot is appropriate for the iris dataset because it allows several numerical variables to be compared at the same time. Each line represents one flower, colored by species, and each axis displays one floral measurement rescaled to a common 0 to 1 range (scale = "uniminmax") so that the axes are comparable. This makes it easier to identify patterns, similarities, and differences among the iris species.
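The plot above uses scale = "uniminmax". As a minimal sketch of what I understand that option to do, the code below rescales each measurement separately so that its minimum is 0 and its maximum is 1. The helper `rescale01` and the object `iris_scaled` are names made up for this illustration; they are not part of GGally.

```r
# Hypothetical helper: min-max rescaling of one numeric vector to [0, 1]
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Rescale the four measurement columns one at a time
iris_scaled <- as.data.frame(lapply(iris[, 1:4], rescale01))
iris_scaled$Species <- iris$Species

# Every column should now run from 0 to 1
sapply(iris_scaled[, 1:4], range)

# Mean scaled value per species, roughly where each group's lines sit
aggregate(. ~ Species, data = iris_scaled, FUN = mean)
```

Because every axis is rescaled on its own, the plot compares relative position within each measurement, not the raw centimeter values.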

Plot 2: Map

library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Install once if needed: install.packages(c("maps", "tibble"))
library(maps)
## 
## Attaching package: 'maps'
## The following object is masked from 'package:viridis':
## 
##     unemp
library(tibble)

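# Move the state names out of the row names and lower-case them
# so they match the region names used by map_data("state")
crime_data <- USArrests %>%
  rownames_to_column("state") %>%
  mutate(state = tolower(state))

us_map <- map_data("state")

map_df <- left_join(us_map, crime_data, by = c("region" = "state"))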

ggplot(map_df, aes(long, lat, group = group, fill = Murder)) +
  geom_polygon(color = "white", linewidth = 0.2) +
  coord_fixed(1.3) +
  scale_fill_gradient(
    low = "lightblue",
    high = "darkred",
    na.value = "grey90",
    name = "Murder Rate"
  ) +
  labs(title = "US Murder Rates by State") +
  theme_void()

Explanation

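This plot shows how the crime variable Murder from USArrests (murder arrests per 100,000 residents) is distributed across the United States. I used a color gradient that runs from light blue for low murder rates to dark red for high murder rates, since red is commonly associated with danger and blue reads as its opposite. Any region without a matching row in USArrests is drawn in grey. The map makes the geographic variation across the states easy to see.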
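Since the map depends on joining state names from two sources, it is worth checking which names fail to match. The sketch below is one way to do that, assuming ggplot2 and maps are installed; `arrest_states` and `map_regions` are names introduced only for this check. I would expect Alaska and Hawaii to have data but no polygon, because map_data("state") only covers the contiguous states.

```r
library(ggplot2)  # for map_data()
library(maps)     # supplies the "state" map database

# State names as they appear in each source, lower-cased to match
arrest_states <- tolower(rownames(USArrests))
map_regions   <- unique(map_data("state")$region)

# Map regions with no crime data: these get the grey na.value fill
setdiff(map_regions, arrest_states)

# States with crime data but no polygon: these never appear on the map
setdiff(arrest_states, map_regions)
```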

Plot 3: Network and Flow Diagram

library(ggplot2)
library(ggalluvial)
library(RColorBrewer)

# Convert the Titanic contingency table to a data frame
titanic_df <- as.data.frame(Titanic)

ggplot(titanic_df, aes(axis1 = Sex, axis2 = Class, axis3 = Age, axis4 = Survived, y = Freq)) +
  geom_alluvium(
    aes(fill = Sex),
    width = 0.15,
    alpha = 0.7
  ) +
  geom_stratum(
    width = 0.22,
    fill = "grey20",
    color = "white"
  ) +
  geom_text(
    stat = "stratum",
    aes(label = after_stat(stratum)),
    color = "white",
    size = 3
  ) +
  scale_x_discrete(
    limits = c("Sex", "Class", "Age", "Survived"),
    labels = c(
      "Sex",
      "Passenger Class",
      "Age Group",
      "Survival"
    ),
    expand = c(.1, .05)
  ) +
  scale_fill_brewer(
    palette = "Dark2"
  ) +
  labs(
    title = "Titanic Passenger Survival Flow",
    fill = "Sex"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    panel.grid = element_blank(),
    axis.title = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks = element_blank(),
    legend.position = "bottom",
    plot.title = element_text(
      face = "bold",
      size = 16))

Explanation

This graph shows the flow and connection between sex, passenger class, age group, and survival outcome. This graph is appropriate because it shows all of the required variables from the Titanic dataset in a single view, with the width of each band proportional to the number of passengers. I chose to color by sex because it made the connections more distinct when following the flow from one variable to the next.
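The band widths in the diagram come from the Freq column, so the same counts can be read off directly as a table. This is a small sketch using base R only; `sex_by_survival` is a name chosen for this example.

```r
titanic_df <- as.data.frame(Titanic)

# Total passengers by sex and survival, summed over class and age
sex_by_survival <- xtabs(Freq ~ Sex + Survived, data = titanic_df)
sex_by_survival

# Share surviving within each sex (each row sums to 1)
round(prop.table(sex_by_survival, margin = 1), 2)
```

The row proportions give the numbers behind the visual impression of how the male and female bands split at the Survival axis.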

Plot 4: Distribution Visualization

# Install once if needed: install.packages("ggforce")
library(ggforce)

ggplot(iris, aes(x = Species, y = Sepal.Length, color = Species)) +
  geom_sina(
    size = 2,
    alpha = 0.7
  ) +
  labs(
    title = "Distribution of Sepal Length by Iris Species",
    x = "Species",
    y = "Sepal Length"
  ) +
  scale_color_brewer(
    palette = "Set2"
  ) +
  theme_minimal(base_size = 15) +
  theme(
    legend.position = "none",
    plot.title = element_text(
      face = "bold",
      size = 16))

Explanation

This graph is appropriate for this data because it shows the distribution of the sepal length variable within each species. Every observation is drawn as its own point, so the full spread of the data is visible. Using geom_sina spreads the points sideways where they are dense, which reduces overlap and makes the shape of each distribution clearer.
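As a numeric companion to the sina plot, the sketch below computes a five-number summary plus the mean of sepal length for each species with base R. The object name `sepal_summary` is just for this example.

```r
# One column per species: min, quartiles, median, mean, max
sepal_summary <- sapply(split(iris$Sepal.Length, iris$Species), summary)
round(sepal_summary, 2)
```

The medians and quartiles here correspond to where the points in each group are most densely packed.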

My Portfolio

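I will be using the mtcars dataset to show how vehicle characteristics such as fuel efficiency, horsepower, and weight differ across groups of cars, mainly by number of cylinders and transmission type.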

data = mtcars
mtcars
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

Plot 1

For plot 1, a box plot is used to show the distribution of MPG (median, quartiles, and range) grouped by number of cylinders. This shows how fuel efficiency is affected by the number of cylinders a vehicle has.

mtcars$cyl <- as.factor(mtcars$cyl)

ggplot(mtcars, aes(x = cyl, y = mpg, fill = cyl)) +
  geom_boxplot(alpha = 0.5) +
  labs(
    title = "Fuel Efficiency Across Cylinder Groups",
    x = "Number of Cylinders",
    y = "Miles Per Gallon (MPG)",
    fill = "Cylinders"
  ) +
  theme_minimal(base_size = 12)

Fuel efficiency decreases as the number of cylinders increases. The four-cylinder cars have a median of about 26 mpg, while the eight-cylinder cars have a median of about 15 mpg. This type of graph is appropriate because it directly shows how the number of cylinders affects a car's fuel efficiency.
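The medians quoted above can be checked directly. This sketch works on a fresh copy of the data (`cars` is a name used only here) so that it does not depend on the factor conversion done for the plot.

```r
cars <- datasets::mtcars

# Median MPG within each cylinder group
mpg_medians <- tapply(cars$mpg, cars$cyl, median)
mpg_medians
```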

Plot 2

For plot 2, a parallel coordinates plot is used to show the relationship between MPG, displacement, horsepower, and weight by number of cylinders. This shows how each characteristic differs from one cylinder group to the next.

mtcars$cyl <- factor(mtcars$cyl)

ggparcoord(
  mtcars,
  columns = c(1, 3, 4, 6),
  groupColumn = "cyl",
  scale = "uniminmax",
  showPoints = TRUE,
  alphaLines = 0.4,
  title = "Vehicle Characteristics Across Cylinder Groups"
) +
  scale_x_discrete(labels = c(
    "mpg"  = "MPG",
    "disp" = "Displacement",
    "hp"   = "Horsepower",
    "wt"   = "Weight"
  )) +
  scale_color_viridis(discrete = TRUE) +
  labs(
    x = NULL,
    y = "Scaled Value",
    color = "Cylinders"
  ) +
  theme_ipsum() +
  theme(
    plot.title = element_text(
      size = 18,
      face = "bold"))

Plot 2 shows that four-cylinder cars have higher MPG, lower horsepower, smaller engine displacement, and lower weight compared with six- and eight-cylinder cars. It also shows that eight-cylinder cars are generally heavier, more powerful, and less fuel efficient. Overall, the plot suggests a strong relationship in which larger engines and heavier vehicles are associated with lower fuel economy. This graph is appropriate because it shows several characteristics at once across the cylinder groups.
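One way to put a number on the relationship described above is a correlation matrix of the same four variables. This is only a sketch of a supporting check, not part of the plot itself; `cars` and `cor_mat` are names made up for the example.

```r
cars <- datasets::mtcars

# Pairwise Pearson correlations between the four plotted variables
cor_mat <- cor(cars[, c("mpg", "disp", "hp", "wt")])
round(cor_mat, 2)
```

Negative values in the mpg row mean that higher displacement, horsepower, or weight goes with lower fuel economy.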

Plot 3

For plot 3, a scatter plot with linear regression trend lines shows the relationship between horsepower and fuel efficiency, with a separate line for each transmission type.

# In mtcars, am is coded 0 = automatic, 1 = manual
mtcars$am <- factor(
  mtcars$am,
  labels = c("Automatic", "Manual")
)

ggplot(mtcars, aes(x = hp, y = mpg, color = am)) +
  geom_point(size = 3, alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Horsepower vs Fuel Efficiency",
    x = "Horsepower",
    y = "Miles Per Gallon",
    color = "Transmission"
  ) +
  theme_minimal(base_size = 14)
## `geom_smooth()` using formula = 'y ~ x'

Plot 3 shows a negative relationship between horsepower and fuel efficiency, meaning cars with higher horsepower generally have lower MPG. Manual transmission cars tend to have slightly better fuel efficiency than automatic cars at similar horsepower levels. A scatter plot is appropriate because it clearly displays the relationship between two continuous variables (horsepower and MPG). Adding the regression lines and color helps compare trends between transmission types and makes patterns easier to interpret.
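Because color is mapped to transmission, the trend lines are fitted separately for each transmission group. The sketch below fits the same kind of line by hand for each group so the slopes can be compared as numbers. It uses a fresh copy of the data, and `cars` and `fits` are names invented for this example.

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Fit mpg ~ hp separately within each transmission group
fits <- lapply(split(cars, cars$am), function(d) coef(lm(mpg ~ hp, data = d)))

# One row per transmission type: intercept and slope on horsepower
do.call(rbind, fits)
```

A negative value in the hp column means MPG falls as horsepower rises within that group.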

Plot 4

Plot 4 uses a histogram to explore the distribution of fuel efficiency among cars in the mtcars dataset.

hist(
  mtcars$mpg,
  main = "Histogram of MPG in mtcars Dataset",
  xlab = "Miles Per Gallon (mpg)",
  ylab = "Frequency",
  breaks = 10
)

In plot 4, the histogram shows that most cars in the dataset have MPG values between about 15 and 22. There are fewer cars with very low or very high fuel efficiency, indicating the data is somewhat concentrated around the middle range. A histogram is appropriate because it displays the distribution and frequency of a single variable. It helps identify the spread and possible skewness of MPG values in the dataset.
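The counts behind the histogram can be pulled out without drawing it, which makes the "between about 15 and 22" statement easy to check. Note that base R treats breaks = 10 as a suggestion and picks rounded break points itself. The names `cars`, `h`, and `n_mid` are only for this sketch.

```r
cars <- datasets::mtcars

# Same binning as the plot, but return the bins instead of drawing them
h <- hist(cars$mpg, breaks = 10, plot = FALSE)
data.frame(from = head(h$breaks, -1), to = tail(h$breaks, -1), count = h$counts)

# Number and share of cars with MPG from 15 to 22 inclusive
n_mid <- sum(cars$mpg >= 15 & cars$mpg <= 22)
n_mid / nrow(cars)

# A mean above the median is a rough sign of right skew
c(mean = mean(cars$mpg), median = median(cars$mpg))
```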

Plot 5

For plot 5, I used a density plot to compare the distribution of fuel efficiency across cars with different cylinder counts.

ggplot(mtcars, aes(x = mpg, fill = cyl)) +
  geom_density(alpha = 0.5) +
  labs(
    title = "Distribution of Fuel Efficiency by Cylinder Count",
    x = "Miles Per Gallon",
    y = "Density",
    fill = "Cylinders"
  ) +
  theme_minimal(base_size = 14)

Plot 5 shows that four-cylinder cars are concentrated at higher MPG values, eight-cylinder cars at lower MPG values, and six-cylinder cars in between, with some overlap between neighboring groups. This agrees with the box plot in plot 1, but the density curves also show the shape of each distribution instead of only its median and quartiles. This graph is appropriate because it allows vehicle fuel efficiency to be compared across cylinder counts, this time as smoothed densities.
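A density curve smooths over however many points each group has, so it helps to know how many cars are in each cylinder group and what MPG range they cover. This is a small base R sketch; `cars`, `group_sizes`, and `mpg_ranges` are names chosen for the example.

```r
cars <- datasets::mtcars

# Number of cars in each cylinder group
group_sizes <- table(cars$cyl)
group_sizes

# Lowest and highest MPG observed in each group
mpg_ranges <- sapply(split(cars$mpg, cars$cyl), range)
rownames(mpg_ranges) <- c("min", "max")
mpg_ranges
```

With groups this small, the smooth curves extend somewhat beyond the observed values, so the overlap in the plot is partly a result of the smoothing.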