WiDS 2025 Datathon @ Ball State University
2025-03-27
What is Data Viz?
Why Data Viz?
How Data Viz?
This presentation introduces the world of ggplot2, several different types of plots, and some basic features to enhance your visuals. Then it covers simple data visualization principles when working with multiple variables and how to implement these within ggplot2.
Before we can use the functions, datasets, and help pages within the tidyverse, which includes ggplot2, we need to load the package. We can do this by running library(tidyverse).
Note that a package must be installed before it can be loaded. We can install packages using the ‘Packages’ tab or by running install.packages("tidyverse").
We are going to use the msleep dataset from the ggplot2 package, written ggplot2::msleep (the packagename::functionname() syntax denotes which package a function or dataset comes from). It contains sleep times and other attributes (such as genus, diet type, brain weight, and body weight) for 83 mammals.
You can search for a function or dataset in the ‘Help’ tab, or run ?<function or dataset name> (for example, ?msleep) to bring up its documentation.
To preview the dataset, we can click on it in the ‘Environment’ tab or run glimpse(), which shows a better formatted preview than the standard print() function.
The output compactly shows the number of observations (rows), the number of variables (columns), each variable's data type, and the first few values of each variable.
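A minimal sketch of this step, assuming ggplot2 (which ships msleep) and dplyr (which provides glimpse()) are installed; dim() is added here only as a quick cross-check of the row and column counts that glimpse() reports.

```r
library(ggplot2)  # provides the msleep data set
library(dplyr)    # provides glimpse()

# Compact preview: one line per variable, with its type and first few values
glimpse(msleep)

# Number of observations (rows) and variables (columns)
dim(msleep)
```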
Building from scratch
All plots follow a similar structure that builds up from the ggplot() function.
The first step is to specify the dataset we will be using.
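As a sketch, supplying only the data gives a blank canvas: there are no aesthetics or layers yet, so nothing is drawn inside the panel. The object name p is just an illustrative choice.

```r
library(ggplot2)

# Only the data is specified: no aesthetic mapping and no layers yet,
# so printing p draws an empty grey panel
p <- ggplot(data = msleep)
p
```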
Aesthetic mapping
Next, we can add a layer of geometric objects with geom_*(). This uses aesthetic mapping, which takes the values of a variable and translates them into a visual feature such as position, color, or size.
The choice of geometry depends on the data types: sleep_rem and sleep_total are both continuous, so a scatterplot via geom_point() is a natural choice.
Use the aes() function to map variables from the specified dataset to visual attributes, and use + between ggplot2 functions to add layers.
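Putting these pieces together, a basic scatterplot might look like the sketch below. Which variable goes on which axis is our own choice; sleep_rem has missing values, which ggplot2 drops with a warning when drawing.

```r
library(ggplot2)

# Map two continuous variables to the x and y positions inside aes(),
# then add a point layer with +
p <- ggplot(data = msleep,
            aes(x = sleep_total, y = sleep_rem)) +
  geom_point()
p
```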
Other attributes
Now we can adapt the scatterplot from before to learn more about function structure.
Anything that is a simple “constant” value (i.e., not part of the data, just an option for the visual look) should be specified as an argument to the geom_*() function, outside of aes().
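For example, the sketch below sets a constant colour, size, and transparency for every point; the particular values are arbitrary choices for illustration.

```r
library(ggplot2)

# Constant appearance settings go outside aes(), as arguments to the geom
p <- ggplot(data = msleep, aes(x = sleep_total, y = sleep_rem)) +
  geom_point(color = "steelblue", size = 3, alpha = 0.6)
p

# By contrast, geom_point(aes(color = "steelblue")) would treat "steelblue"
# as a one-level data value: the points get ggplot2's default discrete colour
# and a legend entry labelled "steelblue" appears.
```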
Incorporating more variables via aes()
Only data-driven attributes go inside the aes() function.
To see how this works, let’s take a look at the iris dataset.
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
11 5.4 3.7 1.5 0.2 setosa
12 4.8 3.4 1.6 0.2 setosa
13 4.8 3.0 1.4 0.1 setosa
14 4.3 3.0 1.1 0.1 setosa
15 5.8 4.0 1.2 0.2 setosa
16 5.7 4.4 1.5 0.4 setosa
17 5.4 3.9 1.3 0.4 setosa
18 5.1 3.5 1.4 0.3 setosa
19 5.7 3.8 1.7 0.3 setosa
20 5.1 3.8 1.5 0.3 setosa
21 5.4 3.4 1.7 0.2 setosa
22 5.1 3.7 1.5 0.4 setosa
23 4.6 3.6 1.0 0.2 setosa
24 5.1 3.3 1.7 0.5 setosa
25 4.8 3.4 1.9 0.2 setosa
26 5.0 3.0 1.6 0.2 setosa
27 5.0 3.4 1.6 0.4 setosa
28 5.2 3.5 1.5 0.2 setosa
29 5.2 3.4 1.4 0.2 setosa
30 4.7 3.2 1.6 0.2 setosa
31 4.8 3.1 1.6 0.2 setosa
32 5.4 3.4 1.5 0.4 setosa
33 5.2 4.1 1.5 0.1 setosa
34 5.5 4.2 1.4 0.2 setosa
35 4.9 3.1 1.5 0.2 setosa
36 5.0 3.2 1.2 0.2 setosa
37 5.5 3.5 1.3 0.2 setosa
38 4.9 3.6 1.4 0.1 setosa
39 4.4 3.0 1.3 0.2 setosa
40 5.1 3.4 1.5 0.2 setosa
41 5.0 3.5 1.3 0.3 setosa
42 4.5 2.3 1.3 0.3 setosa
43 4.4 3.2 1.3 0.2 setosa
44 5.0 3.5 1.6 0.6 setosa
45 5.1 3.8 1.9 0.4 setosa
46 4.8 3.0 1.4 0.3 setosa
47 5.1 3.8 1.6 0.2 setosa
48 4.6 3.2 1.4 0.2 setosa
49 5.3 3.7 1.5 0.2 setosa
50 5.0 3.3 1.4 0.2 setosa
51 7.0 3.2 4.7 1.4 versicolor
52 6.4 3.2 4.5 1.5 versicolor
53 6.9 3.1 4.9 1.5 versicolor
54 5.5 2.3 4.0 1.3 versicolor
55 6.5 2.8 4.6 1.5 versicolor
56 5.7 2.8 4.5 1.3 versicolor
57 6.3 3.3 4.7 1.6 versicolor
58 4.9 2.4 3.3 1.0 versicolor
59 6.6 2.9 4.6 1.3 versicolor
60 5.2 2.7 3.9 1.4 versicolor
61 5.0 2.0 3.5 1.0 versicolor
62 5.9 3.0 4.2 1.5 versicolor
63 6.0 2.2 4.0 1.0 versicolor
64 6.1 2.9 4.7 1.4 versicolor
65 5.6 2.9 3.6 1.3 versicolor
66 6.7 3.1 4.4 1.4 versicolor
67 5.6 3.0 4.5 1.5 versicolor
68 5.8 2.7 4.1 1.0 versicolor
69 6.2 2.2 4.5 1.5 versicolor
70 5.6 2.5 3.9 1.1 versicolor
71 5.9 3.2 4.8 1.8 versicolor
72 6.1 2.8 4.0 1.3 versicolor
73 6.3 2.5 4.9 1.5 versicolor
74 6.1 2.8 4.7 1.2 versicolor
75 6.4 2.9 4.3 1.3 versicolor
76 6.6 3.0 4.4 1.4 versicolor
77 6.8 2.8 4.8 1.4 versicolor
78 6.7 3.0 5.0 1.7 versicolor
79 6.0 2.9 4.5 1.5 versicolor
80 5.7 2.6 3.5 1.0 versicolor
81 5.5 2.4 3.8 1.1 versicolor
82 5.5 2.4 3.7 1.0 versicolor
83 5.8 2.7 3.9 1.2 versicolor
84 6.0 2.7 5.1 1.6 versicolor
85 5.4 3.0 4.5 1.5 versicolor
86 6.0 3.4 4.5 1.6 versicolor
87 6.7 3.1 4.7 1.5 versicolor
88 6.3 2.3 4.4 1.3 versicolor
89 5.6 3.0 4.1 1.3 versicolor
90 5.5 2.5 4.0 1.3 versicolor
91 5.5 2.6 4.4 1.2 versicolor
92 6.1 3.0 4.6 1.4 versicolor
93 5.8 2.6 4.0 1.2 versicolor
94 5.0 2.3 3.3 1.0 versicolor
95 5.6 2.7 4.2 1.3 versicolor
96 5.7 3.0 4.2 1.2 versicolor
97 5.7 2.9 4.2 1.3 versicolor
98 6.2 2.9 4.3 1.3 versicolor
99 5.1 2.5 3.0 1.1 versicolor
100 5.7 2.8 4.1 1.3 versicolor
101 6.3 3.3 6.0 2.5 virginica
102 5.8 2.7 5.1 1.9 virginica
103 7.1 3.0 5.9 2.1 virginica
104 6.3 2.9 5.6 1.8 virginica
105 6.5 3.0 5.8 2.2 virginica
106 7.6 3.0 6.6 2.1 virginica
107 4.9 2.5 4.5 1.7 virginica
108 7.3 2.9 6.3 1.8 virginica
109 6.7 2.5 5.8 1.8 virginica
110 7.2 3.6 6.1 2.5 virginica
111 6.5 3.2 5.1 2.0 virginica
112 6.4 2.7 5.3 1.9 virginica
113 6.8 3.0 5.5 2.1 virginica
114 5.7 2.5 5.0 2.0 virginica
115 5.8 2.8 5.1 2.4 virginica
116 6.4 3.2 5.3 2.3 virginica
117 6.5 3.0 5.5 1.8 virginica
118 7.7 3.8 6.7 2.2 virginica
119 7.7 2.6 6.9 2.3 virginica
120 6.0 2.2 5.0 1.5 virginica
121 6.9 3.2 5.7 2.3 virginica
122 5.6 2.8 4.9 2.0 virginica
123 7.7 2.8 6.7 2.0 virginica
124 6.3 2.7 4.9 1.8 virginica
125 6.7 3.3 5.7 2.1 virginica
126 7.2 3.2 6.0 1.8 virginica
127 6.2 2.8 4.8 1.8 virginica
128 6.1 3.0 4.9 1.8 virginica
129 6.4 2.8 5.6 2.1 virginica
130 7.2 3.0 5.8 1.6 virginica
131 7.4 2.8 6.1 1.9 virginica
132 7.9 3.8 6.4 2.0 virginica
133 6.4 2.8 5.6 2.2 virginica
134 6.3 2.8 5.1 1.5 virginica
135 6.1 2.6 5.6 1.4 virginica
136 7.7 3.0 6.1 2.3 virginica
137 6.3 3.4 5.6 2.4 virginica
138 6.4 3.1 5.5 1.8 virginica
139 6.0 3.0 4.8 1.8 virginica
140 6.9 3.1 5.4 2.1 virginica
141 6.7 3.1 5.6 2.4 virginica
142 6.9 3.1 5.1 2.3 virginica
143 5.8 2.7 5.1 1.9 virginica
144 6.8 3.2 5.9 2.3 virginica
145 6.7 3.3 5.7 2.5 virginica
146 6.7 3.0 5.2 2.3 virginica
147 6.3 2.5 5.0 1.9 virginica
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica
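A short sketch with iris: Species is a column of the data, so it goes inside aes(), while the constant point size stays outside. The choice of petal measurements for the axes is ours, not the original slide's.

```r
library(ggplot2)

# Species is data-driven, so it is mapped inside aes();
# ggplot2 assigns one colour per species and adds a legend automatically
p <- ggplot(data = iris,
            aes(x = Petal.Length, y = Petal.Width, color = Species)) +
  geom_point(size = 2)  # constant size stays outside aes()
p
```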
geom_histogram()
A basic histogram is a univariate plot that can be used for continuous variables and is created via geom_histogram().
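Pay close attention to how wide the intervals (bins) are on histograms: geom_histogram() uses 30 bins by default, and different bin widths can make the same data look quite different.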
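To see the effect of bin width, the sketch below draws sleep_total twice: once with the default number of bins and once with an explicit binwidth = 2 (an arbitrary value chosen for illustration).

```r
library(ggplot2)

# Default: 30 bins (ggplot2 prints a message suggesting a better binwidth)
p_default <- ggplot(data = msleep, aes(x = sleep_total)) +
  geom_histogram()

# Explicit bin width of 2 hours: fewer, wider bins
p_wide <- ggplot(data = msleep, aes(x = sleep_total)) +
  geom_histogram(binwidth = 2)

p_default
p_wide
```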
Use the labs() function to add main titles, axis titles, etc. These labels can be added to any ggplot2 plot.
Themes can be added via theme_*() functions, such as theme_bw() or theme_minimal().
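Combining these, a histogram with labels and a built-in theme might look like this; the title, axis labels, and colours are illustrative choices.

```r
library(ggplot2)

p <- ggplot(data = msleep, aes(x = sleep_total)) +
  geom_histogram(binwidth = 2, fill = "grey60", color = "white") +
  labs(title = "Total daily sleep of mammals",
       x = "Total sleep (hours)",
       y = "Count") +
  theme_minimal()  # theme_bw(), theme_classic(), ... are used the same way
p
```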
boxplot() and geom_boxplot()
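Boxplots are another common plot used to visualize the distribution of a numeric variable. However, unlike a scatterplot, they do not draw every raw data point.
Instead, boxplot() (base R) and geom_boxplot() display the five-number summary (minimum, first quartile, median, third quartile, maximum) computed from the raw data. By default, points more than 1.5 times the interquartile range (IQR) beyond the box are drawn individually as outliers, and the whiskers stop at the most extreme remaining values.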
Min. 1st Qu. Median Mean 3rd Qu. Max.
-3.05367 -0.58794 -0.03286 0.02546 0.63498 3.11785
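The summary above appears to describe a simulated, roughly standard normal variable, but its data are not shown. The sketch below simulates its own data (the seed and sample size are arbitrary assumptions), computes the summaries directly, and lets geom_boxplot() compute the same quantities. Note that fivenum() uses Tukey's hinges, which can differ slightly from the quartiles reported by summary() and used by geom_boxplot().

```r
library(ggplot2)

set.seed(1)                        # hypothetical simulated data, for illustration only
sim <- data.frame(value = rnorm(1000))

fivenum(sim$value)                 # min, lower hinge, median, upper hinge, max
summary(sim$value)                 # quartile-based summary, plus the mean

# geom_boxplot() computes the box statistics itself from the raw values
p <- ggplot(sim, aes(y = value)) +
  geom_boxplot()
p
```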
Comparative boxplots
geom_boxplot() requires a continuous variable to be mapped to either the x or y argument.
We can also make comparative (side-by-side) boxplots by mapping a categorical variable to the other axis.
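For example, mapping the categorical vore (diet type) to x and sleep_total to y gives one box per diet type. In this sketch, rows with a missing vore are dropped first; otherwise NA would appear as its own group.

```r
library(ggplot2)

# Drop only the rows whose diet type (vore) is missing
msleep_vore <- subset(msleep, !is.na(vore))

# Continuous variable on y, categorical variable on x: one box per diet type
p <- ggplot(msleep_vore, aes(x = vore, y = sleep_total)) +
  geom_boxplot()
p
```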
geom_bar()
A bar graph (also known as a bar chart or bar plot) is used for categorical data and sets the height of each bar to the count of observations in a group.
Bar graphs plot a summary of the data, specifically the frequency (count) or the relative frequency (frequency / total) of each group.
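If we have raw data (not already summarized), we can use geom_bar() without mapping a y variable, because it counts the observations in each group for us. If the counts are already computed, geom_col() is used instead.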
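A sketch of the difference, keeping only non-missing vore values: geom_bar() on the raw rows, geom_col() on a pre-computed count table, and a relative-frequency version that divides each count by the total inside after_stat(). The object names are illustrative.

```r
library(ggplot2)

msleep_vore <- subset(msleep, !is.na(vore))

# Raw data: geom_bar() counts the rows in each vore group
p_raw <- ggplot(msleep_vore, aes(x = vore)) +
  geom_bar()

# Already-summarized data: geom_col() uses the y value we supply as the height
vore_counts <- as.data.frame(table(vore = msleep_vore$vore))
p_col <- ggplot(vore_counts, aes(x = vore, y = Freq)) +
  geom_col()

# Relative frequency: each count divided by the total count
p_rel <- ggplot(msleep_vore, aes(x = vore, y = after_stat(count / sum(count)))) +
  geom_bar() +
  labs(y = "Relative frequency")
```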
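# create bar graph of diet type (vore)
# drop only rows with a missing vore; na.omit(msleep) would drop every row
# with any missing value (e.g. in conservation or brainwt) and understate the counts
ggplot(data = subset(msleep, !is.na(vore)),
       aes(x = vore)) +
  geom_bar(fill = "magenta") +
  # print each bar's count just above it
  geom_text(stat = "count",
            aes(label = after_stat(count)),
            vjust = -0.5) +
  theme(axis.text.x = element_text(angle = 45, hjust = 0.5, vjust = 0.5, size = 14),
        axis.title = element_text(size = 16, face = "bold")) +
  labs(x = "Vore",
       y = "Count")
With ggplot2 and R, you can make very informative, professional plots!
# line plot of points over the season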
# -> specific colors by W/L and also take into account location with facets
# -> add reference line for average points
# The data can be downloaded from the link provided below.
# https://ballstate-my.sharepoint.com/:u:/g/personal/hazem_almofleh_bsu_edu/Ef9So2vCD6BApdofoFhkQDIBACKeHLlnSyMbizabbtLw6w?e=y0xowK
# load basketball data
load("data-bsu-game.RData")
# filter to most recent season and take out the few neutral games
data_bsu_plot <- bsu_game %>%
  filter(Season == max(Season),
         Location != "Neutral")
ggplot(data = data_bsu_plot,
       aes(x = Date,
           y = Points)) +
  # draw the connecting line first so the points sit on top of it
  geom_line(color = "lightgrey") +
  geom_point(aes(color = Outcome),
             size = 6) +
  # season average is a single constant, so it goes outside aes()
  geom_hline(yintercept = mean(data_bsu_plot$Points),
             color = "grey20",
             linetype = "dashed") +
  scale_color_manual(values = c("L" = "red", "W" = "green")) +
  labs(title = "Ball State Basketball 2012-13") +
  theme_bw()
Other layers and features can be added in the same way, e.g. geom_line(), facet_wrap(), etc.
ggplot(data = data_bsu_plot,
       aes(x = Date,
           y = Points)) +
  geom_line(color = "lightgrey") +
  geom_point(aes(color = Outcome),
             size = 6) +
  geom_hline(yintercept = mean(data_bsu_plot$Points),
             color = "grey20",
             linetype = "dashed") +
  scale_color_manual(values = c("L" = "red", "W" = "green")) +
  # one panel per game location (Home / Away)
  facet_wrap(~ Location) +
  labs(title = "Ball State Basketball 2012-13") +
  theme_bw()
The easiest way to add interactivity to plots is via plotly::ggplotly(), which lets us build plots with our usual ggplot2 workflow and then translate them to plotly.
To do this, we simply create a ggplot object, say p <- <ggplot call>, and pass it to our new function, ggplotly(p).
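A minimal sketch of that workflow, using msleep for illustration. Since plotly may not be installed everywhere, the conversion is guarded with requireNamespace(). To view the result, print p_interactive (for example, by typing its name in the RStudio console).

```r
library(ggplot2)

# 1. Build an ordinary ggplot object first
p <- ggplot(msleep, aes(x = sleep_total, y = sleep_rem, color = vore)) +
  geom_point()

# 2. Translate it into an interactive plotly widget (hover, zoom, pan).
#    Guarded so the script still runs when plotly is not installed.
if (requireNamespace("plotly", quietly = TRUE)) {
  p_interactive <- plotly::ggplotly(p)
}
```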
Shiny is an open source R package that provides an elegant and powerful web framework for building web applications using R.
Shiny helps you turn your analyses into interactive web applications without requiring HTML, CSS, or JavaScript knowledge.
shinyuieditor lets us build Shiny application UIs by drag-and-drop, and it generates clean, proper code as we build.
library(shiny)
library(tidyverse)
library(ggplot2)
# This is a shiny application that is used to explore and visualize data that
# uses a Spotify dataset with different variables, such as Energy, Danceability,
# and Loudness.
# The data can be found on the Canvas page or at the link provided below.
# https://ballstate-my.sharepoint.com/:u:/g/personal/hazem_almofleh_bsu_edu/EVa5ZQ9q-ZdKgxEx0R1VtykBJEtzUYbZe0Kv69UEAqmbMg?e=DAaXpU
load("data-music.RData")
# Select the specified columns
columns_to_keep <- c("danceability", "energy", "key", "loudness", "mode",
"speechiness", "acousticness", "instrumentalness",
"liveness", "valence", "tempo", "duration_ms", "popularity")
# Filter the dataset to include only the columns listed above
filtered_data <- data_music[, columns_to_keep]
ui <- fluidPage(
  # Resize the plots whenever the browser window is resized
  tags$head(
    tags$script(HTML("
      $(function() {
        function resizePlots() {
          var plotHeight = $(window).height() * 0.8; // 80% of window height
          $('#relation').height(plotHeight);
          $('#density').height(plotHeight);
        }
        $(window).on('resize', resizePlots);
        $(document).ready(resizePlots);
      });
    "))
  ),
  titlePanel("Spotify Data Visualizer"),
  sidebarLayout(
    sidebarPanel(
      selectInput("xvar", "Select X-axis variable:", choices = names(filtered_data)),
      selectInput("yvar", "Select Y-axis variable:", choices = names(filtered_data)),
      "Select variables, then choose a tab to see the relationship between them or the density of the X-axis variable."
    ),
    mainPanel(
      tabsetPanel(
        tabPanel(title = "Relationships Between Variables",
                 plotOutput(outputId = "relation", height = "500px")),
        tabPanel(title = "Density of Variable",
                 plotOutput(outputId = "density", height = "500px"))
      )
    )
  )
)
server <- function(input, output, session) {
  output$relation <- renderPlot({
    req(input$xvar, input$yvar)
    # .data[[...]] replaces the deprecated aes_string() for column names held in strings
    ggplot(filtered_data, aes(x = .data[[input$xvar]], y = .data[[input$yvar]])) +
      geom_smooth(fill = "blue") +  # smooth (trend) plot, as the title implies (Cody Schultz)
      theme_minimal() +
      # title changes dynamically with the selected variables (Cody Schultz)
      labs(title = paste("Average", str_to_title(input$yvar), "by", str_to_title(input$xvar), "Plot"),
           x = input$xvar, y = input$yvar)
  })
  # Density of songs across values of the X-axis variable; the Y-axis selection is not used here (Cody Schultz)
  output$density <- renderPlot({
    req(input$xvar)
    ggplot(filtered_data, aes(x = .data[[input$xvar]])) +
      geom_density(fill = "blue") +
      theme_minimal() +
      labs(title = paste(str_to_title(input$xvar), "Density Plot"),
           x = input$xvar, y = "Density of Songs")
  })
}
shinyApp(ui = ui, server = server)
http://rpubs.com/almof1hm/BSU_WiDS_2025