Data Visualization

Project Description

Before analyzing data or making assumptions about the distributions of variables and the relationships between them, it is better to visualize the data first in order to understand its properties and identify appropriate analytical techniques. Summary statistics alone often fail to capture real-world complexities such as outliers, relationships between variables, and complex distributions.
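As a small illustration of this point, the sketch below uses the `anscombe` data set that ships with base R (it is not part of this project's data). Its four x/y pairs have nearly identical means and correlations, yet they look completely different when plotted.

```r
# Anscombe's quartet: four x/y pairs with near-identical summary statistics
data(anscombe)

quartet_stats <- data.frame(
  pair   = 1:4,
  mean_y = sapply(1:4, function(i) mean(anscombe[[paste0("y", i)]])),
  cor_xy = sapply(1:4, function(i) cor(anscombe[[paste0("x", i)]],
                                       anscombe[[paste0("y", i)]]))
)
print(round(quartet_stats, 3))

# Plotting the four pairs reveals the differences the statistics hide
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i),
       main = paste("Anscombe pair", i))
}
par(op)
```

The summary table alone would suggest four interchangeable data sets; only the plots show the curve, the outlier, and the single high-leverage point.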

This project uses two visualization techniques, bar plots and a heat map, to explore the data.

Data Description

The data used in this project comes from the National Center for Missing & Exploited Children. The data set is a small sample of the attempted kidnapping information that the organization collects and stores from precincts across the United States. It contains incidents that occurred over the past five years (2012-2017). The specific incident characteristics are described as they are used.

Visualization Techniques

Barplots

The first visualization technique is the bar plot. The plot below displays the gender distribution of the children involved in attempted kidnapping incidents over the past five years.

attemptedKidnappings = read.csv("Attempts_Hackathon_5_Years_of_Data.csv")

barplot(table(attemptedKidnappings$Child.Gender.1),
        main = "Gender of Children in Kidnapping Attempts",
        ylab = "Number of Incidents")

Next, a bar plot is created to visualize the genders of the attempted kidnapping offenders.

# [-1] drops the first category of the table before plotting
barplot(table(attemptedKidnappings$Offender.Gender.1)[-1],
        main = "Gender of Offenders in Kidnapping Attempts",
        ylab = "Number of Incidents")

A bar plot can also be made that displays the races of the children involved in attempted kidnappings.

# [-6] drops the sixth category of the table before plotting
barplot(table(attemptedKidnappings$Child.Race.1)[-6],
        main = "Race of Children in Kidnapping Attempts",
        ylab = "Number of Incidents")
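The positional indices `[-1]` and `[-6]` used above presumably remove a blank or otherwise unwanted category, but they depend on the order of the categories in the table. A minimal sketch of dropping categories by name instead is shown here. The vector `race` and the labels `""` and `"Unknown"` are invented stand-ins, not values taken from the actual data set.

```r
# Hypothetical stand-in for a column such as Child.Race.1 (values are invented)
race <- c("White", "Black", "", "Hispanic", "White", "Unknown", "White", "")

counts <- table(race)

# Keep every category except the ones named here (assumed labels)
unwanted <- c("", "Unknown")
keep <- setdiff(names(counts), unwanted)
clean_counts <- counts[keep]

barplot(clean_counts,
        main = "Race of Children in Kidnapping Attempts (example data)",
        ylab = "Number of Incidents")
```

Selecting by name keeps the plot correct even if a new category appears in the data and shifts the positions.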

The races of the offenders can be displayed as well.

barplot(table(attemptedKidnappings$Offender.Race.1),
        main = "Race of Offenders in Kidnapping Attempts",
        ylab = "Number of Incidents")

These bar plots make it easy to compare the number of attempted kidnappings across various characteristics. The gender bar plots above, for example, make the difference between the genders of the children and of the offenders involved in kidnapping attempts visible at a glance. This kind of reporting can be useful in organizing rescue efforts or developing preventative measures. Knowing which groups are targeted most often can help police officers focus on the people who most need protection.
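To compare children and offenders directly, the two gender tables can be drawn side by side in one plot. The sketch below uses invented values. With the real data, the two factors would come from `Child.Gender.1` and `Offender.Gender.1`, and the level names would need to match what those columns actually contain.

```r
# Invented example values; shared levels keep the two tables aligned
gender_levels <- c("Female", "Male")
child    <- factor(c("Female", "Female", "Male", "Female"), levels = gender_levels)
offender <- factor(c("Male", "Male", "Male", "Female"),     levels = gender_levels)

# One row per group, one column per gender
gender_counts <- rbind(Children = table(child), Offenders = table(offender))

# beside = TRUE draws grouped bars instead of stacked bars
barplot(gender_counts,
        beside = TRUE,
        legend.text = TRUE,
        main = "Gender of Children and Offenders (example data)",
        ylab = "Number of Incidents")
```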

Heatmap

The next visualization technique is a heat map (more precisely, a choropleth map). It shows the number of attempted kidnappings that occurred in each state of the United States over the past five years. The darker a state's color, the higher its incident count. Users can zoom in and out of the map, and clicking a state displays the number of incidents that occurred there.
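The idea of mapping counts to darker or lighter shades can be sketched in base R as follows. This is only a conceptual illustration with invented counts. It is not how leaflet's `colorNumeric` is implemented, and the two endpoint colors are arbitrary choices.

```r
# Invented incident counts for four states
counts <- c(CA = 30, NY = 12, TX = 21, WY = 0)

# Linear ramp from a light color (low count) to a dark color (high count)
ramp <- colorRamp(c("white", "darkred"))

# Rescale the counts to the interval [0, 1] expected by the ramp
scaled <- (counts - min(counts)) / (max(counts) - min(counts))

# Convert the interpolated RGB values to hex color strings
state_colors <- rgb(ramp(scaled), maxColorValue = 255)
names(state_colors) <- names(counts)

print(state_colors)
```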

library(tmap)
library(tmaptools)
library(sf)
library(leaflet)

# read in the shape file
usgeo = read_shape(file = "cb_2016_us_state_5m/cb_2016_us_state_5m.shp", as.sf = TRUE)

# get a table of the frequencies of incidents in each state
kidnapping_table_by_state = as.data.frame(table(attemptedKidnappings$Incident.State))

# convert factors into characters
usgeo$STUSPS = as.character(usgeo$STUSPS)
kidnapping_table_by_state$Var1 = as.character(kidnapping_table_by_state$Var1)

# order the rows of both tables by state abbreviation
usgeo = usgeo[order(usgeo$STUSPS),]
kidnapping_table_by_state = kidnapping_table_by_state[order(kidnapping_table_by_state$Var1),]

# get the states that are not in both columns
removed_states = usgeo$STUSPS[!usgeo$STUSPS %in% kidnapping_table_by_state$Var1]
# remove these states
usgeo = usgeo[!usgeo$STUSPS %in% removed_states,]

# make sure the column names are the same
colnames(kidnapping_table_by_state)[1] = "STUSPS"

# merge the data
usmap = append_data(usgeo, kidnapping_table_by_state, key.shp = "STUSPS", key.data = "STUSPS")

# create the leaflet palette
mypalette = colorNumeric(palette = "Reds", domain = usmap$Freq)

# create the popup window
us_popup = paste0("<b>State: ", usmap$NAME, "</b><br/>", "Incidents: ", usmap$Freq)

# display the map
leaflet(usmap, width = "100%") %>%
  setView(lng = -97, lat = 42, zoom = 3) %>%
  addProviderTiles("CartoDB.Positron") %>%
  addPolygons(stroke = FALSE,
              smoothFactor = 0.2,
              fillOpacity = 0.8,
              popup = us_popup,
              fillColor = ~mypalette(Freq)
              )
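The join between the state table and the incident counts can also be sketched with base R's `merge()`, shown here on plain data frames with invented values. Unlike the code above, this variant keeps states that have no recorded incidents and gives them a count of 0. Newer tmaptools releases may no longer provide `read_shape()` and `append_data()`. In that case `sf::st_read()` and `merge()` are possible substitutes, though this has not been checked against the project data.

```r
# Stand-in for the shapefile's attribute table (invented subset of states)
states <- data.frame(
  STUSPS = c("CA", "NY", "TX", "WY"),
  NAME   = c("California", "New York", "Texas", "Wyoming"),
  stringsAsFactors = FALSE
)

# Stand-in for attemptedKidnappings$Incident.State (invented values)
incident_state <- c("CA", "TX", "CA", "NY", "CA", "TX")

# Frequency table of incidents per state, with a matching key column name
counts <- as.data.frame(table(incident_state), stringsAsFactors = FALSE)
colnames(counts) <- c("STUSPS", "Freq")

# Left join: all.x = TRUE keeps states that have no incidents
merged <- merge(states, counts, by = "STUSPS", all.x = TRUE)

# States without a match get NA from the join; treat them as zero incidents
merged$Freq[is.na(merged$Freq)] <- 0

print(merged)
```

Keeping zero-count states avoids holes in the map, at the cost of having to decide how a zero should be colored.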

This heat map can help people understand where to invest resources in kidnapping prevention across the country. Knowing which locations need the most help is an important step in deciding how to respond to incidents like these. The map also gives police a sense of the scope of the problem: knowing both the number of incidents and how they are spread across a geographical area may assist officers in their investigative work.

References

Leaflet for R. (n.d.). Retrieved November 30, 2017, from https://rstudio.github.io/leaflet/

Branch, G. P. (2012, September 01). Cartographic Boundary Shapefiles. Retrieved November 30, 2017, from https://www.census.gov/geo/maps-data/data/tiger-cart-boundary.html

Machlis, S. (2017, October 31). Create maps in R in 10 (fairly) easy steps. Retrieved November 30, 2017, from https://www.computerworld.com/article/3038270/data-analytics/create-maps-in-r-in-10-fairly-easy-steps.html?page=2