Introduction

Violent crime in the United States comprises murder, rape, robbery, and aggravated assault. It has fallen over the last two decades, although the number of reported violent crimes rose in 2015. Among the types of violent crime reported in the United States, aggravated assault is the most common. The goal of this analysis is to use geospatial data to map these crimes and draw meaningful results.

Data

The data are county-level records for 3,177 counties, with 161 variables related to violent and non-violent crime, obtained from the website “Social Explorer”. The second dataset, a county shapefile, was downloaded from the US Census Bureau’s website.

Statistical Analysis

In this analysis, we focus on spatial mapping of the data rather than on statistical analysis. Some kinds of data, including the data used in this assignment, are better suited to spatial analysis or mapping than to statistical analysis. We use the tmap package to plot the data on geographic shapefiles and obtain a visual representation of the prevalence and distribution of violent crime in the US. First, we follow the common pre-analysis practice of cleaning the data and converting variables. Second, we run a non-spatial analysis to show why it is not the recommended way to handle this type of data. Third, we end with the spatial mapping of the data and a conclusion.

Pre-Analysis Coding

The pre-analysis coding covers the data cleaning steps that need to be taken before proceeding to the analysis. A few variables were created, and percentages were calculated for each type of violent crime for use in the spatial maps.

Reading Data

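library(tidyverse)     # read_csv(), mutate(), parse_integer(), ggplot()
library(sf)            # st_read()
library(tmap)          # tm_shape(), tm_polygons(), tm_layout(), ...
library(tmaptools)     # aggregate_map() (available in older tmaptools releases)
library(lemon)         # coord_capped_cart()
library(RColorBrewer)  # brewer.pal(), display.brewer.all()

c1 <- read_csv("D:/Data Sets/crime.csv")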

Reading the Map File

map <- st_read("D:/Data Sets/tl_2014_us_county/tl_2014_us_county.shp", stringsAsFactors = FALSE)
## Reading layer `tl_2014_us_county' from data source `D:\Data Sets\tl_2014_us_county\tl_2014_us_county.shp' using driver `ESRI Shapefile'
## Simple feature collection with 3233 features and 17 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -179.2311 ymin: -14.60181 xmax: 179.8597 ymax: 71.44106
## epsg (SRID):    4269
## proj4string:    +proj=longlat +datum=NAD83 +no_defs

Calculating Percentages and Rate per 100,000 for Specific Variables

# Violent crime rate per 100,000 residents, plus each crime type as a
# percentage of total violent crime in the county.
c1 <- c1 %>%
  mutate(
    Violent.Crime.Rate         = Total.Violent.Crimes / Total.Population * 100000,
    Murders.Percent            = Murders / Total.Violent.Crimes * 100,
    Robberies.Percent          = Robberies / Total.Violent.Crimes * 100,
    Rape.Percent               = Rapes / Total.Violent.Crimes * 100,
    Aggravated.Assualt.Percent = Aggravated.Assaults / Total.Violent.Crimes * 100
  )
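
As a quick check on the formulas above, the following sketch applies the same rate and percentage calculations to a small invented table. It uses base R so that it runs without extra packages; the data frame toy and all of its values are assumptions for illustration, not part of the crime data. It also shows one possible way to handle a county with zero reported violent crimes, where the percentage would be 0/0.

```r
# Toy county table; the numbers are invented for illustration only.
toy <- data.frame(
  county               = c("A", "B", "C"),
  Total.Population     = c(50000, 200000, 1200),
  Total.Violent.Crimes = c(100, 900, 0),
  Robberies            = c(20, 270, 0)
)

# Rate per 100,000 residents: crimes / population * 100000.
toy$Violent.Crime.Rate <- toy$Total.Violent.Crimes / toy$Total.Population * 100000

# Share of violent crime that is robbery; 0 / 0 yields NaN, so set those to NA.
toy$Robberies.Percent <- toy$Robberies / toy$Total.Violent.Crimes * 100
toy$Robberies.Percent[is.nan(toy$Robberies.Percent)] <- NA

print(toy)
```

County C ends up with a missing percentage rather than NaN, which a mapping package would normally draw in its "missing" colour instead of silently placing it in a class.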

Converting Variables to Numeric

c1$Total.Violent.Crimes <- as.numeric(c1$Total.Violent.Crimes)
c1$Murders.Percent <- as.numeric(c1$Murders.Percent)
c1$Robberies.Percent <- as.numeric(c1$Robberies.Percent)
c1$Rape.Percent <- as.numeric(c1$Rape.Percent)
c1$Aggravated.Assualt.Percent <- as.numeric(c1$Aggravated.Assualt.Percent)
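
One thing worth checking after a conversion like this is whether any values failed to convert. The sketch below is a base R illustration on a hypothetical vector named raw (the strings are invented, not taken from the crime file) and shows how as.numeric() turns unparseable text into NA and how those entries can be listed.

```r
# Hypothetical raw values as they might arrive from a CSV export.
raw <- c("12", "7", "N/A", "1,204", "")

# as.numeric() returns NA (with a warning) for text it cannot parse.
converted <- suppressWarnings(as.numeric(raw))

# Strip thousands separators first, then convert again.
cleaned <- suppressWarnings(as.numeric(gsub(",", "", raw)))

# List the inputs that still failed to convert.
failed <- raw[is.na(cleaned)]

print(converted)
print(cleaned)
print(failed)
```

Inspecting the failed entries before mapping helps separate genuinely missing counties from formatting problems in the source file.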

Merging Shape File and Data

c2 <- c1 %>% 
  mutate(fips = Geo_FIPS)

map <- map %>% 
  mutate(fips = parse_integer(GEOID))

comb_data <- map %>% 
  left_join(c2, by = "fips")
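
The join only works if both tables store the county FIPS code in the same type, since the shapefile keeps GEOID as zero-padded text while a CSV reader may parse the same code as a number. The following base R sketch illustrates the idea on invented stand-in tables (shapes, crime, and the robbery counts are assumptions; merge() with all.x = TRUE is used here as a stand-in for left_join()).

```r
# Toy stand-ins: zero-padded text keys on the map side,
# numeric keys on the crime side. Counts are invented.
shapes <- data.frame(GEOID = c("01001", "06037", "48201"),
                     NAME  = c("Autauga", "Los Angeles", "Harris"),
                     stringsAsFactors = FALSE)
crime  <- data.frame(Geo_FIPS  = c(1001, 6037),
                     Robberies = c(10, 20))

# Put both keys on a common integer scale before joining.
shapes$fips <- as.integer(shapes$GEOID)
crime$fips  <- as.integer(crime$Geo_FIPS)

# all.x = TRUE keeps every shape, mirroring a left join.
joined <- merge(shapes, crime, by = "fips", all.x = TRUE)

# Shapes with no matching crime record show up as NA.
unmatched <- joined$NAME[is.na(joined$Robberies)]

print(joined)
print(unmatched)
```

Because the shapefile has 3233 features and the crime table covers 3177 counties, some unmatched polygons are to be expected after the real join; listing them this way shows which ones they are.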

Omitting Alaska, Hawaii, and the Territories

# State FIPS codes: 02 Alaska, 15 Hawaii, 60 American Samoa, 66 Guam,
# 69 Northern Mariana Islands, 72 Puerto Rico, 78 US Virgin Islands.
c3 <- comb_data %>%
  filter(!STATEFP %in% c("02", "15", "60", "66", "69", "72", "78"))

Setting Up State Borders

c4 <- c3 %>% 
  aggregate_map(by = "STATEFP")

Non-Spatial Analysis

Overall Averages for Reported Crime

Let’s start by analyzing the average number of reported crimes across all counties in the US. The data show that, on average, aggravated assault is the most frequently reported violent crime, at about 227 incidents per county. Similarly, about 101 robberies per county were reported on average in 2014.

c1 %>%
  summarise(mean_murders = mean(Murders, na.rm = TRUE),
            mean_robberies = mean(Robberies, na.rm = TRUE),
            mean_rapes = mean(Rapes, na.rm = TRUE),
            mean_aggravated.assaults = mean(Aggravated.Assaults, na.rm = TRUE))
## # A tibble: 1 x 4
##   mean_murders mean_robberies mean_rapes mean_aggravated.assaults
##          <dbl>          <dbl>      <dbl>                    <dbl>
## 1         4.40           101.       34.3                     227.

Regression Analysis

1: Robberies & Murders

The figure below shows the reported cases of violent crime, with x = robberies and y = murders. Robberies may increase the likelihood of murders, because resistance or retaliation during a robbery can lead to murder or fatal injuries. The non-spatial analysis shows that there is some relationship between the two types of crime, but there are outliers and most of the data points are densely clustered.

# na.rm belongs in the geoms, not inside aes()
ggplot(data = c1, aes(x = Robberies, y = Murders)) +
  geom_point(color = "black", na.rm = TRUE) +
  geom_smooth(method = "lm", se = TRUE, na.rm = TRUE) +
  theme_bw()

2: (Log) of Aggravated Assaults & Murders

Similarly, for this figure we took the log of aggravated assaults to bring aggravated assaults and murders onto a comparable scale. But instead of giving a more detailed picture, the resulting plot does not give a clear view of the association between murders and aggravated assaults. At one point on the fitted line, 100,000 aggravated assaults correspond to about 390 murders. However, reading the plot this way is not appropriate.

# na.rm belongs in the geoms, not in ggplot()
ggplot(data = c1, aes(x = log(Aggravated.Assaults), y = Murders)) +
  geom_point(color = "black", na.rm = TRUE) +
  geom_smooth(method = "lm", se = TRUE, na.rm = TRUE) +
  theme_bw()
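
One caveat with the log transform is that a county with zero aggravated assaults has log(0) = -Inf, and ggplot2 typically drops such non-finite points with a warning. The short base R sketch below, on an invented vector of counts, shows the problem and one common alternative, log1p(), which computes log(1 + x). Whether log1p() suits this analysis is a judgment call, not something the original analysis used.

```r
assaults <- c(0, 1, 9, 99, 999)   # invented counts

plain_log  <- log(assaults)     # first element is -Inf
shifted    <- log1p(assaults)   # log(1 + x): finite for zero counts

# Count how many observations a plain log() would push to -Inf.
n_dropped <- sum(!is.finite(plain_log))

print(plain_log)
print(shifted)
print(n_dropped)
```

Running the same count on log(c1$Aggravated.Assaults) would show how many counties are silently missing from the scatterplot.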

3: States & Total Incidents of Violent Crime

In this figure, we used ggplot to look at the variation in violent crime between states. However, it is very difficult to tell which states have high numbers of violent crime incidents.

ggplot(c3, aes(x=STATEFP, y=log(Total.Violent.Crimes))) + 
  geom_point() + 
  coord_capped_cart(bottom='both', left='none') +
  theme_light() 

Linear Models

To analyze the association statistically, we use linear models. Model 1 shows that each additional robbery is associated with an increase of about 0.03 in the expected number of murders. However, this can be tricky to interpret and requires a careful understanding of the dependent and independent variables.

Non-spatial analysis is not well suited to this type of data, because the data contain geospatial information that can be mapped onto a geographic shapefile to give a visual representation of the results. Therefore, the next step in this analysis is to map the data onto a shapefile.

=============================================
                     Model 1      Model 2
---------------------------------------------
(Intercept)            1.10 ***    -0.40 **
                      (0.13)       (0.14)
Robberies              0.03 ***
                      (0.00)
Aggravated.Assaults                 0.02
                                   (0.00)
---------------------------------------------
R^2                    0.88         0.86
Adj. R^2               0.88         0.86
Num. obs.              3176         3176
RMSE                   7.30         7.77
=============================================
*** p < 0.001, ** p < 0.01, * p < 0.05
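
The code that produced the two models is not shown in this report. As a minimal sketch, assuming they were ordinary least-squares fits with lm(), the example below fits the same kind of model on invented data with an exact linear relationship, so the recovered coefficients are easy to verify. The data frame toy and its values are assumptions and are not the county data.

```r
# Toy data with an exact linear relationship (invented numbers).
toy <- data.frame(Robberies = c(0, 10, 20, 30, 40))
toy$Murders <- 1 + 0.03 * toy$Robberies

# Murders regressed on robberies, as in Model 1.
fit   <- lm(Murders ~ Robberies, data = toy)
coefs <- coef(fit)
print(round(coefs, 2))

# The slope is the expected change in murders per additional robbery;
# predict() turns it into an expected count for a given number of robberies.
pred <- predict(fit, newdata = data.frame(Robberies = 100))
print(round(unname(pred), 2))
```

On the real data, the analogous calls would presumably be lm(Murders ~ Robberies, data = c1) and lm(Murders ~ Aggravated.Assaults, data = c1), with the slope read as a change in the expected count rather than as a probability.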

Spatial Mapping

The spatial mapping is done with the tmap and tigris packages. The goal of this mapping task is to plot the data in a way that appropriately represents the counties within each state.

Setting Colour for the Choropleth Map

display.brewer.all(type="seq")

pal1 <- brewer.pal(5, "YlOrRd")
pal2 <- brewer.pal(5, "OrRd")
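
Each five-colour palette pairs with a set of break points that defines five classes. The robbery map later in this report uses the breaks c(0, 5, 10, 15, 20, Inf); the base R sketch below shows how such breaks bin a handful of invented percentages. It assumes left-closed intervals such as [5, 10), which is an assumption about how the mapping package treats boundary values and is worth confirming in its documentation.

```r
# Invented robbery percentages for six counties.
pct <- c(0, 3.2, 5, 12.5, 20, 47)

# Same break points as the robbery map: five intervals for five colours.
breaks <- c(0, 5, 10, 15, 20, Inf)

# right = FALSE makes intervals closed on the left, e.g. [5,10).
classes <- cut(pct, breaks = breaks, right = FALSE)

print(table(classes))
```

A class with no counties, like [15,20) in this toy example, still takes up a legend entry, which is one reason to look at the class counts before fixing the breaks.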

Using the tigris Package (Shapefile)

Note: If cb = TRUE, tigris downloads a less detailed (generalized) boundary file from the server; if cb = FALSE, it downloads the detailed file.

Map1 <- tm_shape(c3, projection = 2163) +
  tm_polygons("Robberies.Percent",
              palette = pal1,
              breaks = c(0, 5, 10, 15, 20, Inf)) +
  tm_shape(c4) +
  tm_borders(lwd = .36,
             col = "black",
             alpha = 1) +
  tm_layout(title = "2014, Incidents of Robberies in Percent by County",
            title.position = c("center", "top"),
            legend.position = c("left", "bottom"),
            frame = FALSE,
            inner.margins = c(0.1, 0.1, 0.05, 0.05)) +
  tm_credits("2014 UCR Crime Data \n2014 US Census Bureau",
             position = c("center", "bottom"))

Map2 <- tm_shape(c3, projection = 2163) +
  tm_polygons("Murders.Percent",
              breaks = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, Inf),
              palette = pal2,
              border.col = "black",
              border.alpha = .5) +
  tm_shape(c4) +
  tm_borders(lwd = .50,
             col = "black",
             alpha = 1) +
  tm_layout(title = "2014, Murders in US States by County, Percent",
            title.position = c("center", "top"),
            legend.position = c("left", "bottom"))

# Uses c3 (county data) and c4 (state borders), as in Map1 and Map2
Map3 <- tm_shape(c3, projection = 2163) +
  tm_polygons("Rape.Percent",
              breaks = c(0, 4, 6, 8, 10, 12, 14, 16, 18, 20, Inf),
              palette = pal2,
              border.col = "black",
              border.alpha = .5) +
  tm_shape(c4) +
  tm_borders(lwd = .50,
             col = "black",
             alpha = 1) +
  tm_layout(title = "2014, Rapes in US States by County, Percent",
            title.position = c("center", "top"),
            legend.position = c("left", "bottom")) +
  tm_credits("2014 UCR Crime Data \n2014 US Census Bureau",
             position = c("center", "bottom"))

# Uses c3 (county data) and c4 (state borders), as in Map1 and Map2
Map4 <- tm_shape(c3, projection = 2163) +
  tm_polygons("Aggravated.Assualt.Percent",
              breaks = c(0, 10, 20, 40, 60, 70, 80, 100),
              palette = pal1,
              border.col = "black",
              border.alpha = .5) +
  tm_shape(c4) +
  tm_borders(lwd = .50,
             col = "black",
             alpha = 1) +
  tm_layout(title = "2014, Aggravated Assaults in US States by County, Percent",
            title.position = c("center", "top"),
            legend.position = c("left", "bottom")) +
  tm_credits("2014 UCR Crime Data \n2014 US Census Bureau",
             position = c("center", "bottom"))

Map#1: 2014 “Incidents of Robberies in Percent by County in the United States”

The map below shows the percentage of robberies in each county of the US. The percentage is the number of reported robberies divided by the total number of violent crimes reported in that county, multiplied by 100. For instance, in California robbery accounts for more than 20% of total violent crime.

tmap_mode("plot")
Map1

Map#2: 2014 “Incidents of Murders in Percent by County in the United States”

Map#2 shows the percentage of murders in the US. It is important to note that even 5% is a large share when it comes to crime statistics. For instance, in Texas two counties report murders at 9% to 10% of violent crime, which means that if 500 violent crimes were reported, about 45 to 50 of them would be murders. However, the actual number would be far higher than that. That is why the breaks for this map are set at 1% intervals. This is an interactive map.

Map#3: 2014 “Incidents of Rapes in Percent by County in the United States”

The reported incidents of rape in the US are still alarming. Most Midwestern states and counties have a high incidence of rape. Unfortunately, in the majority of states rape accounts for at least 20% of total reported violent crime.

Map#4: 2014 “Incidents of Aggravated Assaults in Percent by County in the United States”

The most frequently occurring violent crime in the US is aggravated assault. As the map below shows, in almost every county or state around 80% of overall violent crime is reported as aggravated assault. Incidents have risen since 2014; however, the more recent data are not available to the general public.

Conclusion

The spatial mapping results suggest that there has been no decline in most violent crimes in the US, with the exception of murder or homicide. Historically, it has been a huge achievement for the people of the US that crime rates have fallen significantly. Still, the pattern shows that crime in small counties and remote locations remains rampant. Spatial mapping in R is a very useful way to map geospatial data and produce fruitful results. Admittedly, maps of this kind are mainly useful for presentations and illustrative purposes. However, using both spatial and non-spatial analysis together is one of the strongest ways to deliver results.