install.packages("GGally")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
library(GGally)
## Loading required package: ggplot2
install.packages("viridis")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
library(viridis)
## Loading required package: viridisLite
install.packages("hrbrthemes")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
library(hrbrthemes)
# The iris data set ships with base R (datasets package)
data <- iris
ggparcoord(
data,
columns = 1:4,
groupColumn = 5,
order = "anyClass",
scale = "uniminmax",
showPoints = TRUE,
alphaLines = 0.3,
title = "Comparison of Iris Species Across Floral Measurements"
) +
scale_x_discrete(labels = c(
"Petal.Length" = "Petal Length",
"Petal.Width" = "Petal Width",
"Sepal.Length" = "Sepal Length",
"Sepal.Width" = "Sepal Width"
)) +
scale_color_viridis(discrete = TRUE) +
theme_ipsum()
Explanation
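This plot is appropriate for the iris dataset because it allows all four numerical measurements (sepal length, sepal width, petal length, and petal width) to be compared at the same time. Each line represents one flower observation, each vertical axis represents one measurement, and the line color indicates the species. Because scale = "uniminmax" rescales every measurement to run from 0 to 1, variables with different ranges can be read on the same axes. This makes it easier to identify patterns, similarities, and differences among the three iris species.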
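The scale = "uniminmax" option is what makes the four axes comparable: each measurement is rescaled so that its minimum maps to 0 and its maximum maps to 1. The sketch below reproduces that rescaling by hand on iris to show what the plotted values are. The helper rescale_unit is a hypothetical name used only for illustration; this is not GGally's internal code.

```r
# Hypothetical helper: min-max rescaling of one numeric vector to [0, 1],
# the same idea ggparcoord() applies per column with scale = "uniminmax"
rescale_unit <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Rescale the four floral measurements (columns 1 to 4 of iris)
scaled <- as.data.frame(lapply(iris[1:4], rescale_unit))

# Every column now runs from exactly 0 to exactly 1
sapply(scaled, range)
```

Min-max scaling keeps the ordering of flowers on each axis but discards the original units, so the axes show relative position within each measurement rather than centimetres.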
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
install.packages("maps")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
library(maps)
##
## Attaching package: 'maps'
## The following object is masked from 'package:viridis':
##
## unemp
install.packages("tibble")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
library(tibble)
crime_data = USArrests %>%
rownames_to_column("state") %>%
mutate(state = tolower(state))
us_map = map_data("state")
map_df = left_join(us_map, crime_data, by = c("region" = "state"))
ggplot(map_df, aes(long, lat, group = group, fill = Murder)) +
geom_polygon(color = "white", linewidth = 0.2) +
coord_fixed(1.3) +
scale_fill_gradient(
low = "lightblue",
high = "darkred",
na.value = "grey90",
name = "Murder arrests\nper 100,000"
) +
labs(title = "US Murder Arrest Rates by State (1973)") +
theme_void()
Explanation
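This plot shows how the murder arrest rate (arrests per 100,000 residents in 1973, from the USArrests data) varies across the United States. I used a color gradient running from light blue for low murder rates to dark red for high murder rates, since red is commonly associated with danger and blue reads as its opposite. The map highlights geographic variation across the country; it covers only the 48 contiguous states, so Alaska and Hawaii are not drawn.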
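Because the map is built with left_join() on lower-cased state names, it is worth checking which names fail to match. The sketch below compares the USArrests state names with the regions in map_data("state"). It should show that Alaska and Hawaii have arrest data but no polygon, and that the District of Columbia has a polygon but no arrest data, which is why it is filled with the na.value grey.

```r
library(maps)     # provides the "state" map database used by map_data()
library(ggplot2)  # map_data()

# USArrests state names, lower-cased to match the map's region names
arrest_states <- tolower(rownames(USArrests))
map_regions <- unique(map_data("state")$region)

# States that have arrest data but no polygon on the map
not_on_map <- setdiff(arrest_states, map_regions)
not_on_map

# Map regions with no arrest data (drawn with na.value in the plot)
no_data <- setdiff(map_regions, arrest_states)
no_data
```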
library(ggplot2)
library(ggalluvial)
library(RColorBrewer)
# Convert Titanic table to dataframe
titanic_df = as.data.frame(Titanic)
ggplot(titanic_df,aes(axis1 = Sex, axis2 = Class, axis3 = Age, axis4 = Survived, y = Freq)) +
geom_alluvium(
aes(fill = Sex),
width = 0.15,
alpha = 0.7
) +
geom_stratum(
width = 0.22,
fill = "grey20",
color = "white"
) +
geom_text(
stat = "stratum",
aes(label = after_stat(stratum)),
color = "white",
size = 3
) +
scale_x_discrete(
limits = c("Sex", "Class", "Age", "Survived"),
labels = c(
"Sex",
"Passenger Class",
"Age Group",
"Survival"
),
expand = c(.1, .05)
) +
scale_fill_brewer(
palette = "Dark2"
) +
labs(
title = "Titanic Passenger Survival Flow",
fill = "Sex"
) +
theme_minimal(base_size = 14) +
theme(
panel.grid = element_blank(),
axis.title = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
legend.position = "bottom",
plot.title = element_text(
face = "bold",
size = 16))
Explanation
This graph shows the flow and connections between sex, passenger class, age group, and survival. It is appropriate because it displays all of the required variables from the Titanic dataset in a single view, with the width of each flow proportional to the number of people it represents. I chose to color by sex because it made the connections more distinct when following each flow through the other variables.
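The thickness of each alluvium comes from the Freq column that as.data.frame() creates from the Titanic contingency table. A quick cross-tabulation puts numbers on the pattern that the sex coloring makes visible; this is a supplementary check, not part of the original plot code.

```r
# Titanic is a 4-way contingency table; as.data.frame() gives one row per
# combination of Class, Sex, Age and Survived, with the count in Freq
titanic_df <- as.data.frame(Titanic)

# Total number of people represented by all the flows
sum(titanic_df$Freq)

# Counts by sex and survival, then the survival proportion within each sex
sex_survival <- xtabs(Freq ~ Sex + Survived, data = titanic_df)
sex_survival
round(prop.table(sex_survival, margin = 1), 3)
```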
install.packages("ggforce")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
library(ggforce)
ggplot(iris,aes( x = Species, y = Sepal.Length, color = Species)) +
geom_sina(
size = 2,
alpha = 0.7
) +
labs(
title = "Distribution of Sepal Length by Iris Species",
x = "Species",
y = "Sepal Length"
) +
scale_color_brewer(
palette = "Set2"
) +
theme_minimal(base_size = 15) +
theme(
legend.position = "none",
plot.title = element_text(
face = "bold",
size = 16))
Explanation
This graph is appropriate for this data because it shows the full distribution of sepal length for each species. Every observation is plotted as its own point, which makes the spread of the data visible. geom_sina spreads the points horizontally in proportion to how densely they are packed at each sepal length, so overlap is reduced and the shape of each distribution is easier to see.
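To make the idea behind a sina plot concrete, the sketch below builds a simplified version by hand: within each species, every point gets a random horizontal offset whose maximum width is proportional to the estimated density at that point's sepal length. The helper sina_offsets and the max_width of 0.4 are hypothetical choices for illustration; ggforce's geom_sina uses its own scaling and options, so treat this as an approximation of the technique rather than its implementation.

```r
library(ggplot2)

set.seed(42)  # offsets are random, so fix the seed for reproducibility

# Hypothetical helper: horizontal offsets scaled by local density
sina_offsets <- function(y, max_width = 0.4) {
  d <- density(y)                                      # kernel density estimate
  local_density <- approx(d$x, d$y, xout = y)$y         # density at each point
  half_width <- max_width * local_density / max(local_density)
  runif(length(y), min = -half_width, max = half_width) # wider where denser
}

# Compute offsets separately within each species
offsets <- ave(iris$Sepal.Length, iris$Species, FUN = sina_offsets)

sina_df <- data.frame(
  x = as.numeric(iris$Species) + offsets,
  Sepal.Length = iris$Sepal.Length,
  Species = iris$Species
)

ggplot(sina_df, aes(x = x, y = Sepal.Length, color = Species)) +
  geom_point(alpha = 0.7) +
  scale_x_continuous(breaks = 1:3, labels = levels(iris$Species)) +
  labs(x = "Species", y = "Sepal Length") +
  theme_minimal()
```

Compared with geom_jitter(), which uses the same jitter width everywhere, scaling the width by density makes the outline of the point cloud trace the shape of each distribution.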
I will be using the mtcars data set to show how vehicle characteristics differ across types of cars, grouped by number of cylinders and by transmission type.
data = mtcars
mtcars
##                      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
## Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
## Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
## Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
## Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
## Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
## Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
## Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
## Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
## Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
## Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
## Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
## AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
## Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
## Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
## Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
For plot 1, a box plot is used to show the distribution of MPG (median, quartiles, and spread) for each cylinder group. This shows how fuel efficiency is affected by the number of cylinders a vehicle has.
mtcars$cyl <- as.factor(mtcars$cyl)
ggplot(mtcars, aes(x = cyl, y = mpg, fill = cyl)) +
geom_boxplot(alpha = 0.5) +
labs(
title = "Fuel Efficiency Across Cylinder Groups",
x = "Number of Cylinders",
y = "Miles Per Gallon (MPG)",
fill = "Cylinders"
) +
theme_minimal(base_size = 12)
Fuel efficiency decreases as the number of cylinders increases. Four cylinder cars have a median of 26 MPG, six cylinder cars about 19.7 MPG, and eight cylinder cars about 15.2 MPG. This type of graph is appropriate because it directly shows how the number of cylinders affects a car's fuel efficiency.
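The medians quoted above are the center lines of the three boxes and can be checked directly. This sketch starts from datasets::mtcars so that it does not depend on the earlier in-place conversion of cyl to a factor.

```r
cars <- datasets::mtcars

# Median MPG within each cylinder group (the center line of each box)
cyl_medians <- tapply(cars$mpg, factor(cars$cyl), median)
cyl_medians  # 4: 26.0, 6: 19.7, 8: 15.2
```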
For plot 2, a parallel coordinate plot is used to show the relationship between MPG, displacement, horsepower, and weight by cylinder group. This shows how each of these variables differs across the cylinder groups.
mtcars$cyl <- factor(mtcars$cyl)
ggparcoord(
mtcars,
columns = c(1, 3, 4, 6),
groupColumn = "cyl",
scale = "uniminmax",
showPoints = TRUE,
alphaLines = 0.4,
title = "Vehicle Characteristics Across Cylinder Groups"
) +
scale_x_discrete(labels = c(
"mpg" = "MPG",
"disp" = "Displacement",
"hp" = "Horsepower",
"wt" = "Weight"
)) +
scale_color_viridis(discrete = TRUE) +
labs(
x = NULL,
y = "Scaled Value",
color = "Cylinders"
) +
theme_ipsum() +
theme(
plot.title = element_text(
size = 18,
face = "bold"))
Plot 2 shows that the four cylinder cars have higher MPG, lower horsepower, smaller engine displacement, and lower weight compared with six and eight cylinder cars. It also shows that eight cylinder cars are generally heavier, more powerful, and less fuel efficient. Overall, the plot suggests a strong relationship where larger engines and heavier vehicles are associated with lower fuel economy. This graph is appropriate because it shows multiple characteristics across cylinder types.
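Because the parallel coordinate plot shows rescaled values, it can help to look at the group averages in their original units. The sketch below computes the mean of each plotted variable per cylinder group, again starting from datasets::mtcars so it runs on its own.

```r
cars <- datasets::mtcars

# Mean MPG, displacement (cu. in.), horsepower and weight (1000 lbs)
# for each cylinder group, in the original units
cyl_means <- aggregate(cbind(mpg, disp, hp, wt) ~ cyl, data = cars, FUN = mean)
cyl_means
```

Reading down the columns, MPG falls while displacement, horsepower, and weight rise from four to eight cylinders, which matches the direction of the lines in plot 2.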
For plot 3, a scatter plot with linear regression trend lines shows the relationship between horsepower and fuel efficiency for each transmission type.
mtcars$am <- factor(
mtcars$am,
labels = c("Automatic", "Manual")
)
ggplot(mtcars, aes(x = hp, y = mpg, color = am)) +
geom_point(size = 3, alpha = 0.6) +
geom_smooth(method = "lm", se = FALSE) +
labs(
title = "Horsepower vs Fuel Efficiency",
x = "Horsepower",
y = "Miles Per Gallon",
color = "Transmission"
) +
theme_minimal(base_size = 14)
## `geom_smooth()` using formula = 'y ~ x'
Plot 3 shows a negative relationship between horsepower and fuel efficiency, meaning cars with higher horsepower generally have lower MPG. Manual transmission cars tend to have slightly better fuel efficiency than automatic cars at similar horsepower levels. A scatter plot is appropriate because it clearly displays the relationship between two continuous variables (horsepower and MPG). Adding the regression lines and color helps compare trends between transmission types and makes patterns easier to interpret.
Plot 4 uses a histogram to explore the distribution of fuel efficiency among cars in the mtcars dataset.
hist(
mtcars$mpg,
main = "Histogram of MPG in mtcars Dataset",
xlab = "Miles Per Gallon (mpg)",
ylab = "Frequency",
breaks = 10
)
In plot 4, the histogram shows that most cars in the dataset have MPG values between about 15 and 22. There are fewer cars with very low or very high fuel efficiency, indicating the data is somewhat concentrated around the middle range. A histogram is appropriate because it displays the distribution and frequency of a single variable. It helps identify the spread and possible skewness of MPG values in the dataset.
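A few summary numbers back up the reading of the histogram. The sketch below counts how many cars fall in the 15 to 22 MPG band and compares the mean with the median; a mean above the median is consistent with the slight right skew suggested by the few high-MPG cars.

```r
mpg_values <- datasets::mtcars$mpg

# Center of the distribution
mean(mpg_values)    # about 20.09
median(mpg_values)  # 19.2

# Number of the 32 cars with MPG between 15 and 22 (inclusive)
sum(mpg_values >= 15 & mpg_values <= 22)  # 18
```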
For plot 5, a density plot is used to compare the distribution of fuel efficiency across cars with different cylinder counts.
ggplot(mtcars, aes(x = mpg, fill = cyl)) +
geom_density(alpha = 0.5) +
labs(
title = "Distribution of Fuel Efficiency by Cylinder Count",
x = "Miles Per Gallon",
y = "Density",
fill = "Cylinders"
) +
theme_minimal(base_size = 14)
Plot 5 shows that four cylinder cars have the highest and most spread-out fuel efficiency, roughly 21 to 34 MPG, while six cylinder cars form a narrow peak between about 18 and 21 MPG and eight cylinder cars sit at the low end, roughly 10 to 19 MPG. The six and eight cylinder curves overlap somewhat, but the four cylinder curve is largely separate from the other two. This graph is appropriate because it compares the full shape of the MPG distribution across cylinder groups rather than only summary statistics such as the medians shown in plot 1.
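The ranges described for each density curve can be confirmed from the raw data. The sketch below returns the minimum and maximum MPG within each cylinder group, using datasets::mtcars so it runs independently of the earlier factor conversion.

```r
cars <- datasets::mtcars

# Minimum (row 1) and maximum (row 2) MPG for each cylinder group
mpg_ranges <- sapply(split(cars$mpg, cars$cyl), range)
mpg_ranges
```

Note that a kernel density curve extends a little beyond the observed minimum and maximum because of smoothing, so the curves in plot 5 reach slightly past these values.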