In this note, we use a real data set containing crime cases since 2015 in Philadelphia neighborhoods as an example to illustrate various visual inspections of the given data including basic EDA and inference statistics if the data set has relevant information for inference. You can find other related data sets with information from OpenDataPhilly.
We have introduced two basic types of maps: choropleth and reference maps. When analyzing data involving geo-information of specific geographical regions at certain levels such as blocks, neighborhoods, districts, zip codes, counties, states, etc, we need to use a related boundary shapefile to draw the related regions. We use filled colors to denote the values of a key variable of interest. In general, making a choropleth map requires a process of data integration in which other relevant data sources with be merged with the shapefile.
include_graphics("shapeFileContent.png")
Plotting a reference map is relatively straightforward, the only required geo-information is longitude and latitude. The point sizes and colors can be mapped to the values of variables other than longitude and latitude.
The visual analysis is based on the following three data sets
## covid_vaccines_by_census_tract.geojson
phillyNeighbor <- st_read("https://pengdsci.github.io/STA553VIZ/w08/Neighborhoods_Philadelphia.geojson")
philly <- st_read("https://pengdsci.github.io/STA553VIZ/w08/PhillyNeighborhood-blocks.geojson") # block level data
phillyCrime <- read.csv("https://pengdsci.github.io/STA553VIZ/w08/PhillyCrimeSince2015.csv")
###
phillyCrime$name = toupper(gsub("-", "_", phillyCrime$neighborhood))
# extracting year from variable 'date'
slash.loc = unlist(gregexpr('/', phillyCrime$date))
slash2.loc = slash.loc[2*(1:dim(phillyCrime)[1])]
phillyCrime$year = substr(phillyCrime$date, slash2.loc+1, slash2.loc+4)
Since the crime data set has a neighborhood name which will be used as a key to join the shapefile of the neighborhood. There is no variable in the crime data that can be used as a key to merge with the census tract shapefile. We aggregate relevant information at the neighborhood level to make a choropleth map and a scatter-plot map to display information on individual crime cases.
SubCrime = phillyCrime[,c("dc_key","fatal","name","year")]
aggregateCrime = aggregate(SubCrime$fatal, by=list(SubCrime$name,SubCrime$year), FUN=length)
## Longitude and latitude of zip code center
ZipLon = aggregate(phillyCrime$lng, by=list(phillyCrime$zip_code), FUN=mean)
ZipLat = aggregate(phillyCrime$lat, by=list(phillyCrime$zip_code), FUN=mean)
ZipLonLat = merge(ZipLon, ZipLat, by = "Group.1")
names(ZipLonLat) = c("zip", "lng", "lat")
## Zip code crime
ZipCrimeTable = table(phillyCrime$fatal, phillyCrime$zip_code)
ZipCrime = data.frame(zip=as.numeric(names(table(phillyCrime$zip_code))),
fatal = table(phillyCrime$fatal, phillyCrime$zip_code)[1,],
nonfatal = table(phillyCrime$fatal, phillyCrime$zip_code)[2,],
total.crime = table(phillyCrime$zip_code)
)
ZipCrime = ZipCrime[, c("zip", "fatal", "nonfatal", "total.crime.Freq")]
colnames(ZipCrime) = c("zip", "fatal", "nonfatal", "total.crime")
ZipCrime = merge(ZipLonLat, ZipCrime, by = "zip")
The process of visualization involves several steps. In order to create an effective visualization, we need to know the data, the audience needs, the principles of visualization, the relevant information to be visualized, the appropriate tools for creating visualization, and
The crime data set is neither a population nor a random sample from a population. It is only a set of public records of crime in the neighborhoods of Philadelphia for the past 10 years.
The following code reflects the above thoughts.
## Define a color palette
pal <- colorFactor(c("red", "gold"), domain = c("Fatal", "Nonfatal"))
## Define objects with geo-coordinate system to plot specific information
pnt = st_as_sf(data.frame(x = -75.1652, y = 39.9526),
coords = c("x", "y"),
crs = 4326)
media = st_as_sf(data.frame(x = -75.3877, y = 39.9168),
coords = c("x", "y"),
crs = 4326)
ageDistloc = st_as_sf(data.frame(x = -75.3677, y = 39.9168),
coords = c("x", "y"),
crs = 4326)
ziploc = st_as_sf(data.frame(x = -75.3477, y = 39.9168),
coords = c("x", "y"),
crs = 4326)
aniloc = st_as_sf(data.frame(x = -75.3277, y = 39.9168),
coords = c("x", "y"),
crs = 4326)
ageDistloc01 = st_as_sf(data.frame(x = -75.3077, y = 39.9168),
coords = c("x", "y"),
crs = 4326)
#### Caution: plot_ly DID NOT show in the popup of leaflet.
fig <- plot_ly(ZipCrime, x = ~lng, y = ~lat, color = ~zip, size = ~total.crime, #colors = colors,
type = 'scatter',
mode = 'markers',
#sizes = c(min(log(ZipCrime$total.crime)), max(log(ZipCrime$total.crime))),
marker = list(symbol = 'circle',
sizemode = 'diameter',
line = list(width = 2, color = '#FFFFFF')),
text = ~paste('Zip Code:', zip,
'<br>Total Crime:', total.crime,
'<br>Fatal Crime:', fatal,
'<br>Nonfatal Crime:', nonfatal)) %>% hide_colorbar()
## Images to be plot on the map
img = "https://pengdsci.github.io/STA553VIZ/w08/PhillyCityHall.jpg"
trend = "https://pengdsci.github.io/STA553VIZ/w08/CrimeTrend.jpg"
ageDist = "https://pengdsci.github.io/STA553VIZ/w08/ageDist.jpg"
trendIcon = "https://pengdsci.github.io/STA553VIZ/w08/trend-icon.jpg"
crimeByYear= "https://pengdsci.github.io/STA553VIZ/w08/PhillyCrimeByYear.gif"
## HTML style for the map title
tag.map.title <- tags$style(HTML("
.leaflet-control.map-title {
transform: translate(50%,50%);
position: fixed !important;
left: 5%;
text-align: center;
padding-left: 10px;
padding-right: 10px;
background: transparent;
font-weight: bold;
font-size: 18px;}
"))
rr <- tags$div(
HTML('<img border="0" alt="ImageTitle" src="https://pengdsci.github.io/STA553VIZ/w08/leafTitle.png" width="200" height="45">')
)
######################################
######################################
## Data analysis
## 1. age distribution by age
fatal.cols <- c("#F76D5E", "#72D8FF")
# Basic density plot in ggplot2
fatalAge = ggplot(phillyCrime, aes(x = age, fill =fatal )) +
geom_density(alpha = 0.7) +
scale_fill_manual(values = fatal.cols)
##
tble = as.matrix(table(phillyCrime$race, phillyCrime$fatal))
dat.tbl = data.frame(race=rownames(tble), fatal = as.vector(tble[, 1]), non.fatal = as.vector(tble[, 2]))
popups = paste(str_pad("Race", 20, "right"), str_pad("Fatal", 20, "right"), str_pad("Nonfatal", 20, "right"),
"<br> Asian" ,strrep(' ', 10), 25,strrep(' ', 6), strrep(' ', 20), "86",
"<br> Black (Non-Hispanic)", strrep(' ', 4), "2628",strrep(' ', 5), "9924",
"<br> Hispanic (Black or White)", strrep(' ', 2), "393",strrep(' ', 5), "1482",
"<br> Other/Unknown", strrep(' ', 9), "2",strrep(' ', 5), "128",
"<br> White (Non-Hispanic)", strrep(' ', 5), "169", strrep(' ', 5), "683")
#poptble <- tble %>% htmlTable
##################################################
## Crime distribution by zip code - bubble plot
pal0 <- colorNumeric(palette = viridis(256, option = "B"), domain = range(ZipCrime$total.crime))
zipfig <- leaflet() %>%
setView(lng=-75.1527, lat=39.9707, zoom = 11) %>%
addProviderTiles(providers$Esri.WorldGrayCanvas) %>%
## plot the neighborhood boundary of Philly
addPolygons(data = phillyNeighbor,
color = 'skyblue',
weight = 1) %>%
##
addCircleMarkers(data = ZipCrime,
radius = ~((total.crime)^(1/3)),
color = ~pal0(total.crime),
stroke = FALSE,
fillOpacity = 0.5,
popup = ~paste('Zip Code:', zip,
'<br>Total Crime:', total.crime,
'<br>Fatal Crime:', fatal,
'<br>Nonfatal Crime:', nonfatal))
##################################################
## popup interactive graphs with plotly
fl = tempfile(fileext = ".html")
saveWidget(zipfig, file = fl)
###
leaflet() %>%
setView(lng=-75.15092, lat=40.00995, zoom = 11) %>%
#addProviderTiles(providers$CartoDB.DarkMatter) %>%
##
addProviderTiles(providers$CartoDB.DarkMatter, group="Dark") %>%
addProviderTiles(providers$CartoDB.DarkMatterNoLabels, group="DarkLabel") %>%
addProviderTiles(providers$Esri.NatGeoWorldMap, group="Esri") %>%
#addTiles(providers$CartoDB.PositronNoLabels) %>%
addControl(rr, position = "topleft", className="map-title") %>%
## mini reference map
addMiniMap() %>%
## neighborhood boundary
addPolygons(data = phillyNeighbor,
color = 'skyblue',
weight = 1) %>%
## plot information on the map
addCircleMarkers(data = phillyCrime,
radius = ~ifelse(fatal == "Fatal", 5, 3),
color = ~pal(fatal),
stroke = FALSE,
fillOpacity = 0.5,
popup = ~popupTable(phillyCrime),
clusterOptions = markerClusterOptions(maxClusterRadius = 40)) %>%
# Adding the image of city hall
addCircleMarkers(data = pnt,
color = "blue",
weight = 2,
label = "City Hall",
stroke = FALSE,
fillOpacity = 0.95,
group = "pnt") %>%
addPopupImages(img,
width = 100,
height = 120,
tooltip = FALSE,
group = "pnt") %>%
# Trend of crimes over the years
addCircleMarkers(data = media,
color = "red",
weight = 2,
label = "Trend",
stroke = FALSE,
fillOpacity = 0.95,
group = "media") %>%
addPopupImages(trend,
width = 500,
height = 350,
tooltip = FALSE,
group = "media") %>%
# age distribution within the two types of crimes
addCircleMarkers(data = ageDistloc,
color = "skyblue",
weight = 2,
label = "Age Distribution",
stroke = FALSE,
fillOpacity = 0.95,
group = "ageDistloc") %>%
addPopupImages(ageDist,
width = 500,
height = 320,
tooltip = FALSE,
group = "ageDistloc") %>%
# Crime counts by zip code: bubble plot
addCircleMarkers(data = ziploc,
color = "white",
weight = 2,
label = "ZIP Location",
stroke = FALSE,
fillOpacity = 0.95,
group = "ziploc") %>%
leafpop:::addPopupIframes(
source = fl,
width = 500,
height = 400,
group = "ziploc" ) %>%
# Animated (cumulative) crime counts by years
addCircleMarkers(data = aniloc,
color = "gold",
weight = 2,
label = "Crimes Accumulation by year",
stroke = FALSE,
fillOpacity = 0.95,
group = "aniloc") %>%
addPopupImages(crimeByYear,
width = 500,
height = 400,
tooltip = FALSE,
group = "aniloc") %>%
# Crime counts by zip code: bubble plot
addCircleMarkers(data = ageDistloc01,
color = "purple",
weight = 2,
label = "Contingency Table: Race vs Crime Type",
stroke = FALSE,
fillOpacity = 0.95,
popup = ~popups) %>%
addLayersControl(baseGroups = c('Dark', 'DarkLabel', 'Esri'),
overlayGroups = c("Crime Data"),
options = layersControlOptions(collapsed = TRUE)) %>%
##
browsable()