Our first visualization will be an interactive map, containing various locations of gas stations from the contential United States. Each gas station will be a point on the map, and each point will have a hover text box containing state, county, address, and zip code.
Since our source data contains nearly 73,000 observations, we will randomly select 500 observations.
gas<-read.csv("https://pengdsci.github.io/datasets/POC/POC.csv")
rand.df <- gas[sample(nrow(gas), size=500), ]
We will use the plotly package to map the data onto an
interactive map.
g <- list( scope = 'p',
projection = list(type = 'albers usa'),
showland = TRUE,
landcolor = toRGB("gray95"),
subunitcolor = toRGB("gray85"),
countrycolor = toRGB("gray85"),
countrywidth = 0.5,
subunitwidth = 0.5
)
fig <- plot_geo(rand.df, lat = ~ycoord, lon = ~xcoord) %>%
add_markers( text = ~paste(STATE, county, ADDRESS, ZIPnew,
sep = "<br>"),
#color = ,
symbol = "circle",
#size = ,
hoverinfo = "text") %>%
layout( title = 'Randomly Selected US Gas Stations',
geo = g )
fig
The above is a demonstration of a simple - yet effective - way to visualize data on a map. Our next will be a little more complex.
For this visualization, we will begin with a dataset of crimes committed in the city of Philadelphia between 2015 and early March of 2024.
We will need to subset the 2023 data before we can impose it over a
map. We will use the stringr library for this.
crime<-read.csv("https://pengdsci.github.io/STA553VIZ/w08/PhillyCrimeSince2015.csv")
df<-crime
df$year <- str_extract(df$date, "\\d{4}")
df$year <- as.numeric(df$year)
crime23<-subset(df, year==2023)
write.csv(crime23, "C:\\Users\\Alex\\Documents\\R\\Grad\\553\\datasets\\wk7.csv")
A copy of the 2023 data can be found at https://raw.githubusercontent.com/AlexDragonetti/STA553/main/hw7/wk7.csv
#remove observations with missing values - at least one has a missing value for coordinates
crime23.nona<-na.omit(crime23)
Finally, we will map the incident data using the leaflet
package.
color2 <- rep("red", length(crime23.nona))
color2[which(crime23.nona$fatal=="Nonfatal")] <- "blue"
color2[which(crime23.nona$fatal=="Fatal")] <- "red"
label.msg <- paste("Street:", crime23.nona$street_name,
"<br>Block Number:",crime23.nona$block_number,
"<br>Neighborhood:", crime23.nona$neighborhood,
"<br>Incident Type:", crime23.nona$fatal)
leaflet(crime23.nona) %>%
addTiles() %>%
setView(lng=mean(crime23.nona$lng), lat=mean(crime23.nona$lat), zoom = 11) %>%
addProviderTiles(providers$Esri.WorldGrayCanvas) %>%
addCircleMarkers(
~lng,
~lat,
color = color2,
stroke = FALSE,
fillOpacity = 0.5,
popup= ~label.msg) %>%
addLegend(position = "bottomright",
colors = c("red", "blue"),
labels= c("Fatal", "Nonfatal"),
title= "Type of Incident",
opacity = 0.4)
Our resulting graph is fully interactive - clicking a dot will show details of the incident.
Please note that the above graph has spots that appear purple. This
is due to the opaque, overlapping red and blue dots, indicating both
fatal and nonfatal incidents at the same address. For example, 1000 E
Bristol Street in Juniata saw an incident with two victims, one being a
fatality. For clarification on any confusing point, please refer to the
dataset linked at the end of Preparing the Data.
Our next visualization will specifically focus on the demographics of
shooting victims. The data is from OpenDataPhilly and can be accessed
here:https://opendataphilly.org/datasets/shooting-victims
(see below code for raw, direct link). We will again utilize
leaflet.
philly.data<-read.csv("https://phl.carto.com/api/v2/sql?q=SELECT+*,+ST_Y(the_geom)+AS+lat,+ST_X(the_geom)+AS+lng+FROM+shootings&filename=shootings&format=csv&skipfields=cartodb_id")
phillyNeighborShooting <- na.omit(st_read("https://pengdsci.github.io/STA553VIZ/w08/PhillyShootings.geojson"))
Reading layer `PhillyShootings' from data source
`https://pengdsci.github.io/STA553VIZ/w08/PhillyShootings.geojson'
using driver `GeoJSON'
replacing null geometries with empty geometries
Simple feature collection with 15555 features and 21 fields (with 29 geometries empty)
Geometry type: POINT
Dimension: XY
Bounding box: xmin: -75.27362 ymin: 39.87799 xmax: -74.95936 ymax: 40.13117
Geodetic CRS: WGS 84
phillyNeighbor <- st_read("https://pengdsci.github.io/STA553VIZ/w08/Neighborhoods_Philadelphia.geojson")
Reading layer `Neighborhoods_Philadelphia' from data source
`https://pengdsci.github.io/STA553VIZ/w08/Neighborhoods_Philadelphia.geojson'
using driver `GeoJSON'
Simple feature collection with 158 features and 8 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -75.28027 ymin: 39.867 xmax: -74.95576 ymax: 40.13799
Geodetic CRS: WGS 84
philly <- st_read("https://pengdsci.github.io/STA553VIZ/w08/PhillyNeighborhood-blocks.geojson")
Reading layer `PhillyNeighborhood-blocks' from data source
`https://pengdsci.github.io/STA553VIZ/w08/PhillyNeighborhood-blocks.geojson'
using driver `GeoJSON'
Simple feature collection with 17555 features and 7 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -75.28027 ymin: 39.867 xmax: -74.95576 ymax: 40.13799
Geodetic CRS: WGS 84
While the data is usable, we intend to display much more information
with each point, so we will remove unnecessary or redundant data, while
fixing a formatting quirk with the ‘date_’ variable (which will also be
renamed to ‘date’) using the stringr package. Without this,
the ‘date_’ variable ends each entry with ‘00:00:00+00’, which is clunky
and likely unintended.
philly.data2 <- philly.data %>%
select(-c(1, 2, 3, 5, 6, 17, 18))
names(philly.data2)[which(names(philly.data2) == "date_")] <- "date"
philly.data2$date <- str_replace(philly.data2$date, " 00:00:00\\+00", "")
# Convert 'date' variable to Date format for possible future use
philly.data2$date <- as.Date(philly.data2$date, format = "%Y-%m-%d")
With out map, we would like to feature individual incidents and allow
someone to click to see victim demographic information, but we would
also like to have ‘big picture’ aggregate plots available as well. We
will prepare three simple plots here using ggplot for
illustrative purposes.
#Distribution of Age across Race
ar.plot<-ggplot(philly.data2, aes(x = age, fill = race)) +
geom_density(alpha = 0.5) +
labs(title = "Distribution of Age by Race",
x = "Age",
y = "Density") +
scale_fill_discrete(name = "Race") +
theme_minimal()
#Probably unnecessary step, but made an extra dataset while testing something out
philly.data3<-subset(philly.data2)
philly.data3$fatal <- factor(philly.data3$fatal, levels = c(0, 1), labels = c("Nonfatal", "Fatal"))
# Plot the point graph with colored points
year.plot <- ggplot(philly.data3, aes(x = year, fill = fatal)) +
geom_bar(position = position_dodge(width = 0.5), stat = "count") +
scale_fill_manual(values = c("Nonfatal" = "blue", "Fatal" = "red")) +
labs(title = "Number of Fatal and Nonfatal Incidents by Year",
x = "Year",
y = "Count") +
theme_minimal() +
scale_x_continuous(breaks = seq(min(philly.data3$year), max(philly.data3$year), by = 1))
#Frequency of indoor vs outdoor incidents
outside.plot <- ggplot(philly.data2, aes(x = year, fill = factor(outside, labels = c("Indoor", "Outdoor")))) +
geom_bar(stat = "count") +
labs(title = "Number of Indoor and Outdoor Incidents by Year",
x = "Year",
y = "Count",
fill = "Location") +
scale_fill_manual(values = c("Indoor" = "darkorchid", "Outdoor" = "darkgreen")) +
scale_x_continuous(breaks = seq(min(philly.data2$year), max(philly.data2$year), by = 1)) +
theme_minimal()
We have exported and uploaded these images separately and will now redefine them:
ar="https://raw.githubusercontent.com/AlexDragonetti/STA553/main/hw8/arplot.png"
out="https://raw.githubusercontent.com/AlexDragonetti/STA553/main/hw8/outsideplot2.png"
yr="https://raw.githubusercontent.com/AlexDragonetti/STA553/main/hw8/yearplot2.png"
Our final visualization will contain a map of shooting victims, fully interactive with demographic information, as well as additional, big-picture data available.
#defining things
pal <- colorFactor(c("blue", "red"), domain = c(0, 1))
ageraceplot = st_as_sf(data.frame(x = -75.4077, y = 39.9168),
coords = c("x", "y"),
crs = 4326)
yearplot = st_as_sf(data.frame(x = -75.3877, y = 39.9168),
coords = c("x", "y"),
crs = 4326)
outdoorplot = st_as_sf(data.frame(x = -75.3677, y = 39.9168),
coords = c("x", "y"),
crs = 4326)
fig <- plot_ly(philly.data2, x = ~lng, y = ~lat,
type = 'scatter',
mode = 'markers',
marker = list(symbol = 'circle',
sizemode = 'diameter',
line = list(width = 2, color = '#FFFFFF')))
tag.map.title <- tags$style(HTML("
.leaflet-control.map-title {
transform: translate(50%,50%);
position: fixed !important;
left: 50%;
text-align: center;
padding-left: 10px;
padding-right: 10px;
background: transparent;
font-weight: bold;
font-size: 18px;}
"))
rr <- tags$div(
HTML('<img border="0" alt="ImageTitle" src="https://raw.githubusercontent.com/AlexDragonetti/STA553/main/hw8/map%20title.png" width="200" height="45">')
)
###
leaflet() %>%
setView(lng=-75.15092, lat=40.00995, zoom = 11) %>%
addProviderTiles(providers$CartoDB.DarkMatter, group="Dark") %>%
addProviderTiles(providers$CartoDB.DarkMatterNoLabels, group="DarkLabel") %>%
addProviderTiles(providers$Esri.NatGeoWorldMap, group="Esri") %>%
addControl(rr, position = "topleft", className="map-title") %>%
## mini reference map
addMiniMap() %>%
## neighborhood boundary
addPolygons(data = phillyNeighbor,
color = 'skyblue',
weight = 1) %>%
addCircleMarkers(data = ageraceplot,
color = "white",
weight = 2,
label = "Distribution of Age by Race",
stroke = FALSE,
fillOpacity = 0.95,
group = "ageraceplot") %>%
addPopupImages(ar,
width = 500,
height = 320,
tooltip = FALSE,
group = "ageraceplot") %>%
addCircleMarkers(data = yearplot,
color = "skyblue",
weight = 2,
label = "Incident Rate by Year",
stroke = FALSE,
fillOpacity = 0.95,
group = "yearplot") %>%
addPopupImages(yr,
width = 500,
height = 320,
tooltip = FALSE,
group = "yearplot") %>%
addCircleMarkers(data = outdoorplot,
color = "darkgreen",
weight = 2,
label = "Rate of Indoor vs Outdoor Incidents",
stroke = FALSE,
fillOpacity = 0.95,
group = "outdoorplot") %>%
addPopupImages(out,
width = 500,
height = 320,
group = "outdoorplot" ) %>%
## plot information on the map
addCircleMarkers(data = philly.data2,
color = ~pal(as.factor(fatal)),
stroke = FALSE,
fillOpacity = 0.5,
popup = ~popupTable(philly.data2),
clusterOptions = markerClusterOptions(maxClusterRadius = 40)) %>%
addLayersControl(baseGroups = c('Dark', 'DarkLabel', 'Esri'),
overlayGroups = c("Crime Data"),
options = layersControlOptions(collapsed = TRUE)) %>%
##
browsable()