Introduction and Research Question
According to the FBI, index crime in the United States includes violent crime and property crime. Violent crime consists of four offenses: murder and non-negligent manslaughter, rape, robbery, and aggravated assault. Property crime consists of burglary, larceny, motor vehicle theft, and arson. According to Pew Research Center, violent crime in the U.S. has fallen sharply over the past quarter century: using the FBI numbers, the violent crime rate fell 48% between 1993 and 2016. In a survey in late 2016, 57% of registered voters said crime in the U.S. had gotten worse since 2008, even though Bureau of Justice Statistics (BJS) and FBI data show that violent and property crime rates declined by double-digit percentages during that span. The FBI’s data also show big differences from state to state and city to city. In 2016, there were more than 600 violent crimes per 100,000 residents in Alaska, Nevada, New Mexico and Tennessee. By contrast, Maine, New Hampshire and Vermont had rates below 200 violent crimes per 100,000 residents.
For this assignment, I used the FBI Crime Data 2016 data set to retrieve county-level information. The crime data originate from the U.S. Department of Justice, Federal Bureau of Investigation (FBI). The FBI collects the data through the Uniform Crime Reporting (UCR) program, which provides the volume of violent and property crimes as reported by the law enforcement agencies that contributed data to the program. The purpose of this analysis is to determine the differences in violent crime among U.S. counties by mapping spatial data. Based on the Pew Research Center statistics, I expect states such as Alaska, Nevada and New Mexico to have the highest levels of violent crime and Maine and New Hampshire to have low levels.
Importing and loading the data set
library(readr)
Crime_Data<-read_csv("C:\\users\\Sangita Roy\\Desktop\\crime_data1.csv")
head(Crime_Data)
Cleaning up the data set
library(magrittr)
library(dplyr)
Crime2<-Crime_Data%>%
rename(County=Geo_NAME,
Total_Crimes=SE_T003_001,
Total_Violent_Crimes=SE_T003_002,
Total_Property_Crimes=SE_T003_003)%>%
select(Geo_FIPS,Geo_STATE,Geo_NATION, County,Total_Crimes,Total_Violent_Crimes, Total_Property_Crimes)%>%
mutate(Percent_Violent=(Total_Violent_Crimes/Total_Crimes)*100)%>%
mutate(Geo_FIPS=as.integer(Geo_FIPS))
head(Crime2)
American Nations data set
am_nations <-read_csv("C:\\users\\Sangita Roy\\Desktop\\am_nations.csv")
head(am_nations)
Reading the map file
library(sf)
ct_map <- st_read("C:\\Users\\Sangita Roy\\Desktop\\tl_2016_us_county.shp", stringsAsFactors = FALSE)
Reading layer `tl_2016_us_county' from data source `C:\Users\Sangita Roy\Desktop\tl_2016_us_county.shp' using driver `ESRI Shapefile'
Simple feature collection with 3233 features and 17 fields
geometry type: MULTIPOLYGON
dimension: XY
bbox: xmin: -179.2311 ymin: -14.60181 xmax: 179.8597 ymax: 71.44106
epsg (SRID): 4269
proj4string: +proj=longlat +datum=NAD83 +no_defs
names(ct_map)
[1] "STATEFP" "COUNTYFP" "COUNTYNS" "GEOID" "NAME" "NAMELSAD"
[7] "LSAD" "CLASSFP" "MTFCC" "CSAFP" "CBSAFP" "METDIVFP"
[13] "FUNCSTAT" "ALAND" "AWATER" "INTPTLAT" "INTPTLON" "geometry"
names(Crime_Data)
[1] "Geo_FIPS" "Geo_NAME" "Geo_QNAME" "Geo_NATION" "Geo_STATE"
[6] "Geo_COUNTY" "SE_T003_001" "SE_T003_002" "SE_T003_003"
Merging the three data sets
library(dplyr)
Crime2 <- Crime2 %>%
mutate(fips = Geo_FIPS, STATEFP=Geo_STATE)
ct_map <- ct_map %>%
mutate(fips = parse_integer(GEOID))
am_nations<- am_nations%>%
mutate(fips= fips_code)
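# Join on the county FIPS code only. STATEFP exists in both tables, so the join keeps both copies as STATEFP.x and STATEFP.y.
comb_data <- ct_map %>%
left_join(Crime2, by = "fips")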
comb_data<- comb_data %>%
left_join(am_nations, by = "fips")
Mapping the percent of violent crimes
library(tmap)
tm_shape(comb_data) + tm_polygons("Percent_Violent")

Excluding Alaska, Hawaii, and the territories
# Drop Alaska (02), Hawaii (15), American Samoa (60), Guam (66), Northern Mariana Islands (69), Puerto Rico (72), and the U.S. Virgin Islands (78)
comb_data_sub <- comb_data %>%
filter(!STATEFP.x %in% c("02", "15", "60", "66", "69", "72", "78"))
tm_shape(comb_data_sub, projection = 2163) + tm_polygons("Percent_Violent")

Changing color
comb_data_sub <- comb_data_sub %>%
mutate(redblue = Percent_Violent - mean(Crime2$Percent_Violent, na.rm=TRUE ))
tm_shape(comb_data_sub, projection = 2163) + tm_polygons("redblue", palette = "-RdBu")

Adding state borders
library(tmaptools)
us_states <- comb_data_sub %>%
aggregate_map(by = "STATEFP.x")
tm_shape(comb_data_sub, projection = 2163) + tm_polygons("redblue", palette = "-RdBu") +
tm_shape(us_states) + tm_borders(lwd = .36, col = "black", alpha = 1)

Highlighting the state line
tm_shape(comb_data_sub, projection = 2163) + tm_polygons("redblue", palette = "-RdBu", border.col = "grey", border.alpha = .4) +
tm_shape(us_states) + tm_borders(lwd = .36, col = "black", alpha = 1)

Highlighting the state lines allows us to see the differences in the percent of violent crimes among counties and states more clearly, and makes it easier to identify the states with low or high levels. In this map of the 2016 FBI data set, red indicates counties where violent crimes make up an above-average share of all crimes, whereas light blue indicates a below-average share. States with high levels of violent crime include New York, Montana, South Dakota, Pennsylvania, and New Hampshire. Much of the map is light blue, including Texas, Washington, and Maine, indicating low levels of violent crime. The predominance of light blue is consistent with the Pew Research Center finding that violent crime has declined over the years, although a single-year map cannot show a trend by itself. The states Pew mentioned only partly match my findings: Maine appears low as expected, but New Hampshire appears high, and Alaska is excluded from this map. Note also that the map shows violent crime as a percent of all crimes, while Pew reports rates per 100,000 residents, so the two measures are not directly comparable.
Showing the 11 American Nations
library(tmap)
am_nations <- comb_data_sub %>%
aggregate_map(by = "AN_KEY")
tm_shape(comb_data_sub, projection = 2163) + tm_polygons("redblue", palette = "-RdBu", border.col = "grey", border.alpha = .4) +
tm_shape(am_nations) + tm_borders(lwd = .50, col = "black", alpha = 1)

Award-winning author Colin Woodard identifies 11 distinct cultures that have historically divided the U.S. According to this map of the 11 American Nations, high levels of violent crime are located in The Far West, New Netherland, and Yankeedom. Nations with low levels of violent crime include Greater Appalachia, Deep South, and El Norte.
Boundary lines are clearer
tm_shape(am_nations, projection = 2163) + tm_polygons(col = "MAP_COLORS") + tm_borders(lwd = .50, col = "black", alpha = 1)

Using tigris package to retrieve county-level spatial shapefiles
library(tigris)
options(tigris_class = "sf")
t_county <- counties(cb = TRUE)
names(t_county)
[1] "STATEFP" "COUNTYFP" "COUNTYNS" "AFFGEOID" "GEOID" "NAME"
[7] "LSAD" "ALAND" "AWATER" "geometry"
Merging the data sets
t_comb_data <- t_county %>%
mutate(fips = parse_integer(GEOID)) %>%
left_join(Crime2, by = "fips")
# Drop Alaska (02), Hawaii (15), American Samoa (60), Guam (66), Northern Mariana Islands (69), Puerto Rico (72), and the U.S. Virgin Islands (78)
t_comb_data_sub <- t_comb_data %>%
filter(!STATEFP.x %in% c("02", "15", "60", "66", "69", "72", "78"))
Non-Spatial Mapping
The ecological regression produced for this analysis examines the relationship between the total number of crimes and the percent of violent crimes in U.S. counties, aggregated to the state level. The code below computes, for each state, the mean of total crimes and the mean percent of violent crimes across its counties, then regresses the mean percent on the mean total. The slope is statistically significant at the 5% level (p = 0.046), although the model explains only about 9% of the variance: the mean percent of violent crimes decreases by about 0.009 percentage points for every one-crime increase in a state's mean county total. The issue with evaluating violent crime at an ecological level is that a relationship found among state averages cannot be assumed to hold for individual counties or individuals; drawing that inference is known as the ecological fallacy.
Crime_er<- Crime2 %>%
group_by(STATEFP) %>%
summarise(mean_p = mean(Percent_Violent, na.rm = TRUE), mean_s = mean(Total_Crimes, na.rm = TRUE))
ecoobs <- lm(mean_p ~ mean_s, data = Crime_er)
summary(ecoobs)
Call:
lm(formula = mean_p ~ mean_s, data = Crime_er)

Residuals:
    Min      1Q  Median      3Q     Max
-16.609  -6.182  -3.118   3.724  58.103

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 23.572648   3.625822   6.501 7.54e-08 ***
mean_s      -0.009241   0.004484  -2.061   0.0455 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.85 on 42 degrees of freedom
  (2 observations deleted due to missingness)
Multiple R-squared:  0.09183,  Adjusted R-squared:  0.07021
F-statistic: 4.247 on 1 and 42 DF,  p-value: 0.04555
ggplot
The plot below shows the percent of violent crimes for each county, grouped by state. It displays non-spatial data, which is independent of all geometric considerations and is one-dimensional.
library(ggplot2)
library(lemon)
ggplot(data = Crime2, aes(x = STATEFP, y = Percent_Violent)) +
geom_point() +
coord_capped_cart(bottom='both', left='none') +
theme_light() + theme(panel.border=element_blank(), axis.line = element_line())

Conclusion
Mapping spatial data enabled me to visualize the locations and areas with high and low levels of violent crime more accurately and to apply them to the 11 American Nations. The maps showed little red, meaning that few counties in 2016 had a share of violent crimes well above the average; this is consistent with the decline in violent crime reported by Pew, although maps of a single year cannot confirm a trend. However, it is important to note that most crimes are not reported to police, and most reported crimes are not solved. In its annual survey, BJS asks victims of crime whether they reported that crime to police. In 2016, only 42% of the violent crime tracked by BJS was reported to police. According to BJS, there are a variety of reasons crime might not be reported, including a feeling that police “would not or could not do anything to help” or that the crime is “a personal issue or too trivial to report.”
Spatial vs. Non-Spatial
Spatial data are data that define a location on the surface of the Earth. They take the form of graphic primitives, usually points, lines, polygons or pixels, and include location, shape, size and orientation. In the maps above, the differences in violent crime among U.S. counties can be seen through the location, shape and borders of the areas with high or low levels. Non-spatial data, on the other hand, are not tied to a specific, precisely defined location. In other words, non-spatial data are information that is independent of all geometric considerations, stored as numbers, characters or logical values. The fundamental difference between the two is that spatial data are generally multi-dimensional and autocorrelated, whereas non-spatial data are generally one-dimensional and independent.
With a large number of locations, the data can be used to find the name of the location with the highest or lowest level of violent crime in my data set, and the values can be mapped in different colors to identify hot spots. Spatial mapping is more appealing to the eye because state lines and borders make the visualization clearer, whereas non-spatial output consists of statistical figures that can be intricate and hard to interpret accurately.
Pg 32 of Lecture #10
“cb=TRUE” requests the generalized cartographic boundary file at a resolution of 1:500k, which is the default geometry used by tidycensus. If I change it to “cb=FALSE”, the tigris default, I get the most detailed TIGER/Line file. An additional 8 variables appeared, including:
- “CLASSFP”
- “MTFCC”
- “CSAFP”
- “METDIVFP”
- “FUNCSTAT”
- “INTPTLAT”
- “INTPTLON”
References
Gramlich, John. “5 Facts about Crime in the U.S.” Pew Research Center, 30 Jan. 2018, www.pewresearch.org/fact-tank/2018/01/30/5-facts-about-crime-in-the-u-s/#.
---
title: "#**Mapping Spatial Data**"
author: "Sangita Roy"
date: "April 22, 2018"
output: html_notebook
---
###**Introduction and Research Question**
According to the FBI, index crime in the United States includes violent crime and property crime. Violent crime consists of four offenses: murder and non-negligent manslaughter, rape, robbery, and aggravated assault. Property crime consists of burglary, larceny, motor vehicle theft, and arson. According to Pew Research Center, violent crime in the U.S. has fallen sharply over the past quarter century: using the FBI numbers, the violent crime rate fell 48% between 1993 and 2016. In a survey in late 2016, 57% of registered voters said crime in the U.S. had gotten worse since 2008, even though Bureau of Justice Statistics (BJS) and FBI data show that violent and property crime rates declined by double-digit percentages during that span. The FBI's data also show big differences from state to state and city to city. In 2016, there were more than 600 violent crimes per 100,000 residents in Alaska, Nevada, New Mexico and Tennessee. By contrast, Maine, New Hampshire and Vermont had rates below 200 violent crimes per 100,000 residents.

For this assignment, I used the FBI Crime Data 2016 data set to retrieve county-level information. The crime data originate from the U.S. Department of Justice, Federal Bureau of Investigation (FBI). The FBI collects the data through the Uniform Crime Reporting (UCR) program, which provides the volume of violent and property crimes as reported by the law enforcement agencies that contributed data to the program. The purpose of this analysis is to determine the differences in violent crime among U.S. counties by mapping spatial data. Based on the Pew Research Center statistics, I expect states such as Alaska, Nevada and New Mexico to have the highest levels of violent crime and Maine and New Hampshire to have low levels.


###Importing and loading the data set
```{r message=FALSE, warning=FALSE}
library(readr)

Crime_Data<-read_csv("C:\\users\\Sangita Roy\\Desktop\\crime_data1.csv")
head(Crime_Data)
```

###Cleaning up the data set
```{r message=FALSE, warning=FALSE}
library(magrittr)
library(dplyr)
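Crime2<-Crime_Data%>%
  # Give the coded columns readable names
  rename(County=Geo_NAME,
       Total_Crimes=SE_T003_001,
       Total_Violent_Crimes=SE_T003_002,
       Total_Property_Crimes=SE_T003_003)%>%
  # Keep only the identifiers and crime counts needed for mapping
  select(Geo_FIPS,Geo_STATE,Geo_NATION, County,Total_Crimes,Total_Violent_Crimes, Total_Property_Crimes)%>%
  # Violent crimes as a percent of all reported crimes in the county
  mutate(Percent_Violent=(Total_Violent_Crimes/Total_Crimes)*100)%>%
  # Store the FIPS code as an integer so it can be joined to the map later
  mutate(Geo_FIPS=as.integer(Geo_FIPS))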
head(Crime2)
```
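
One edge case in the `Percent_Violent` calculation above is a county that reports zero total crimes: 0/0 evaluates to `NaN` in R. The sketch below uses a made-up three-row table (the counts are illustrative, not FBI figures) to show that behavior and one possible guard. Whether to set such counties to `NA` is a judgment call, not something the original analysis does.

```r
# Hypothetical counts for illustration only (not from the FBI data set).
toy_crime <- data.frame(
  County               = c("A", "B", "C"),
  Total_Crimes         = c(200, 50, 0),
  Total_Violent_Crimes = c(20, 10, 0)
)

# Same formula as in the cleaning step: share of all crimes that are violent.
toy_crime$Percent_Violent <-
  toy_crime$Total_Violent_Crimes / toy_crime$Total_Crimes * 100

# County C divides 0 by 0, so its value is NaN rather than 0.
toy_crime$Percent_Violent

# One possible guard: treat "no crimes reported" as missing.
toy_crime$Percent_Violent[toy_crime$Total_Crimes == 0] <- NA
```

Either way, `mean(..., na.rm = TRUE)` skips both `NaN` and `NA`, so the later averaging steps are unaffected.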

###American Nations data set
```{r message=FALSE, warning=FALSE}
am_nations <-read_csv("C:\\users\\Sangita Roy\\Desktop\\am_nations.csv")
head(am_nations)
```

###Reading the map file
```{r message=FALSE, warning=FALSE}
library(sf)
ct_map <- st_read("C:\\Users\\Sangita Roy\\Desktop\\tl_2016_us_county.shp", stringsAsFactors = FALSE)
```

```{r}
names(ct_map)
```

```{r}
names(Crime_Data)
```

###Merging the three data sets
```{r}
library(dplyr)
Crime2 <- Crime2 %>% 
  mutate(fips = Geo_FIPS, STATEFP=Geo_STATE)
ct_map <- ct_map %>% 
  mutate(fips = parse_integer(GEOID))
am_nations<- am_nations%>%
mutate(fips= fips_code)

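comb_data <- ct_map %>% 
  # Join on the county FIPS code only. STATEFP exists in both tables,
  # so the join keeps both copies as STATEFP.x and STATEFP.y.
  left_join(Crime2, by = "fips")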

comb_data<- comb_data %>% 
  left_join(am_nations, by = "fips")
```
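
The merge above matches on an integer `fips` key, which drops the leading zero of codes such as "01001". A minimal alternative sketch, assuming one prefers to keep the five-character form: pad the integer codes back to strings and list the keys that would fail to match before joining. The two small vectors are made up for illustration.

```r
# Hypothetical keys for illustration: GEOID-style strings and integer codes.
map_geoid  <- c("01001", "06037", "36061")   # as stored in a county shapefile
crime_fips <- c(1001L, 6037L, 48201L)        # as stored after as.integer()

# Converting the strings to integers drops the leading zero ...
as.integer(map_geoid)

# ... while padding the integers restores the five-character code.
crime_geoid <- sprintf("%05d", crime_fips)
crime_geoid

# Keys present in the map but absent from the crime table would get NA
# values after a left join, so it is worth listing them first.
unmatched <- setdiff(map_geoid, crime_geoid)
unmatched
```

Counties that show up in `unmatched` are the ones that would be drawn with missing values on the map.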

###Mapping the percent of violent crimes 
```{r message=FALSE, warning=FALSE}
library(tmap)
tm_shape(comb_data) + tm_polygons("Percent_Violent")
```

###Excluding Alaska, Hawaii, and the territories
```{r}
# Drop Alaska (02), Hawaii (15), and the territories: American Samoa (60),
# Guam (66), Northern Mariana Islands (69), Puerto Rico (72), and the
# U.S. Virgin Islands (78)
comb_data_sub <- comb_data %>% 
  filter(!STATEFP.x %in% c("02", "15", "60", "66", "69", "72", "78"))
```


```{r}
tm_shape(comb_data_sub, projection = 2163) + tm_polygons("Percent_Violent")
```

###Changing color
```{r}
comb_data_sub <- comb_data_sub %>% 
mutate(redblue = Percent_Violent - mean(Crime2$Percent_Violent, na.rm=TRUE ))
```
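
The `redblue` column is each county's percent minus the mean percent, so zero marks the average and the sign says which side of it a county falls on. A small sketch of that centering step, using made-up percentages:

```r
# Hypothetical percentages for illustration only.
pct <- c(5, 10, 15, NA, 30)

# Center on the mean, ignoring missing values as the notebook does.
centered <- pct - mean(pct, na.rm = TRUE)
centered

# Negative values fall below the average, positive values above it.
sign(centered)
```

Centering this way lets a diverging palette split at the average rather than at an arbitrary value.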


```{r}
tm_shape(comb_data_sub, projection = 2163) + tm_polygons("redblue", palette = "-RdBu")
```

###Adding state borders
```{r message=FALSE, warning=FALSE}
library(tmaptools)
us_states <- comb_data_sub %>% 
  aggregate_map(by = "STATEFP.x")
```

```{r}
tm_shape(comb_data_sub, projection = 2163) + tm_polygons("redblue", palette = "-RdBu") + 
  tm_shape(us_states) + tm_borders(lwd = .36, col = "black", alpha = 1)
```

###Highlighting the state line
```{r}
tm_shape(comb_data_sub, projection = 2163) + tm_polygons("redblue", palette = "-RdBu", border.col = "grey", border.alpha = .4) + 
  tm_shape(us_states) + tm_borders(lwd = .36, col = "black", alpha = 1)
```
Highlighting the state lines allows us to see the differences in the percent of violent crimes among counties and states more clearly, and makes it easier to identify the states with low or high levels. In this map of the 2016 FBI data set, red indicates counties where violent crimes make up an above-average share of all crimes, whereas light blue indicates a below-average share. States with high levels of violent crime include New York, Montana, South Dakota, Pennsylvania, and New Hampshire. Much of the map is light blue, including Texas, Washington, and Maine, indicating low levels of violent crime. The predominance of light blue is consistent with the Pew Research Center finding that violent crime has declined over the years, although a single-year map cannot show a trend by itself. The states Pew mentioned only partly match my findings: Maine appears low as expected, but New Hampshire appears high, and Alaska is excluded from this map. Note also that the map shows violent crime as a percent of all crimes, while Pew reports rates per 100,000 residents, so the two measures are not directly comparable.


###Showing the 11 American Nations
```{r}
library(tmap)
am_nations <- comb_data_sub %>% 
  aggregate_map(by = "AN_KEY")
```

```{r}
tm_shape(comb_data_sub, projection = 2163) + tm_polygons("redblue", palette = "-RdBu", border.col = "grey", border.alpha = .4) + 
  tm_shape(am_nations) + tm_borders(lwd = .50, col = "black", alpha = 1)
```
Award-winning author Colin Woodard identifies 11 distinct cultures that have historically divided the U.S. According to this map of the 11 American Nations, high levels of violent crime are located in The Far West, New Netherland, and Yankeedom. Nations with low levels of violent crime include Greater Appalachia, Deep South, and El Norte.

###Boundary lines are clearer
```{r message=FALSE, warning=FALSE}
tm_shape(am_nations, projection = 2163) + tm_polygons(col = "MAP_COLORS") + tm_borders(lwd = .50, col = "black", alpha = 1)
```


###Using tigris package to retrieve county-level spatial shapefiles
```{r message=FALSE, warning=FALSE}
library(tigris)
options(tigris_class = "sf")
t_county <- counties(cb = TRUE)
names(t_county)
```


###Merging the data sets
```{r}
t_comb_data <- t_county %>% 
  mutate(fips = parse_integer(GEOID)) %>% 
  left_join(Crime2, by = "fips")
```

```{r}
# Drop Alaska (02), Hawaii (15), and the territories: American Samoa (60),
# Guam (66), Northern Mariana Islands (69), Puerto Rico (72), and the
# U.S. Virgin Islands (78)
t_comb_data_sub <- t_comb_data %>% 
  filter(!STATEFP.x %in% c("02", "15", "60", "66", "69", "72", "78"))
```

###Mapping county-level information using the tmap package
```{r}
library(tmaptools)
# State outlines built from the tigris counties. The tigris data were not
# joined to the American Nations table, so only state borders are drawn here.
us_states <- t_comb_data_sub %>% 
  aggregate_map(by = "STATEFP.x")
# Center the percent of violent crimes on the county mean for the diverging palette
t_comb_data_sub <- t_comb_data_sub %>% 
  mutate(redblue = Percent_Violent - mean(Crime2$Percent_Violent, na.rm = TRUE))
```

```{r}
tm_shape(t_comb_data_sub, projection = 2163) + tm_polygons("redblue", palette = "-RdBu", border.col = "grey", border.alpha = .4) + 
  tm_shape(us_states) + tm_borders(lwd = .50, col = "black", alpha = 1)
```

###Non-Spatial Mapping
The ecological regression produced for this analysis examines the relationship between the total number of crimes and the percent of violent crimes in U.S. counties, aggregated to the state level. The code below computes, for each state, the mean of total crimes and the mean percent of violent crimes across its counties, then regresses the mean percent on the mean total. The slope is statistically significant at the 5% level (p = 0.046), although the model explains only about 9% of the variance: the mean percent of violent crimes decreases by about 0.009 percentage points for every one-crime increase in a state's mean county total. The issue with evaluating violent crime at an ecological level is that a relationship found among state averages cannot be assumed to hold for individual counties or individuals; drawing that inference is known as the ecological fallacy.
```{r}
Crime_er<- Crime2 %>% 
  group_by(STATEFP) %>% 
  summarise(mean_p = mean(Percent_Violent, na.rm = TRUE), mean_s = mean(Total_Crimes, na.rm = TRUE))

ecoobs <- lm(mean_p ~ mean_s, data = Crime_er)
summary(ecoobs)
```
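
To make the ecological fallacy concrete, here is a small sketch on made-up data (not the crime data) in which a regression on group means points in the opposite direction from the relationship inside every group. The data frame `toy` and its columns are hypothetical and chosen only so the arithmetic is easy to follow.

```r
# Made-up data: within each group y falls as x rises,
# but groups with larger x also have larger y on average.
toy <- data.frame(
  group = rep(c("A", "B", "C"), each = 3),
  x     = 1:9,
  y     = c(3, 2, 1, 6, 5, 4, 9, 8, 7)
)

# Group-level (ecological) regression on the group means.
group_means <- aggregate(cbind(x, y) ~ group, data = toy, FUN = mean)
slope_means <- coef(lm(y ~ x, data = group_means))[["x"]]

# Separate regressions inside each group.
slope_within <- sapply(
  split(toy, toy$group),
  function(d) coef(lm(y ~ x, data = d))[["x"]]
)

slope_means   # slope across group means
slope_within  # slope within each group
```

Here the slope across the group means is positive while the slope within every group is negative, which is why a state-level result should not be read as a statement about individual counties.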

###ggplot
The plot below shows the percent of violent crimes for each county, grouped by state. It displays non-spatial data, which is independent of all geometric considerations and is one-dimensional.
```{r message=FALSE, warning=FALSE}
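library(ggplot2)
library(lemon)
# One point per county, with counties lined up by state FIPS code
ggplot(data = Crime2, aes(x = STATEFP, y = Percent_Violent)) +
  geom_point() +
  # Cap the bottom axis line at the outermost ticks (lemon)
  coord_capped_cart(bottom='both', left='none') +
  theme_light() + theme(panel.border=element_blank(), axis.line = element_line())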
```


###Conclusion
Mapping spatial data enabled me to visualize the locations and areas with high and low levels of violent crime more accurately and to apply them to the 11 American Nations. The maps showed little red, meaning that few counties in 2016 had a share of violent crimes well above the average; this is consistent with the decline in violent crime reported by Pew, although maps of a single year cannot confirm a trend. However, it is important to note that most crimes are not reported to police, and most reported crimes are not solved. In its annual survey, BJS asks victims of crime whether they reported that crime to police. In 2016, only 42% of the violent crime tracked by BJS was reported to police. According to BJS, there are a variety of reasons crime might not be reported, including a feeling that police "would not or could not do anything to help" or that the crime is "a personal issue or too trivial to report."

###Spatial vs. Non-Spatial
Spatial data are data that define a location on the surface of the Earth. They take the form of graphic primitives, usually points, lines, polygons or pixels, and include location, shape, size and orientation. In the maps above, the differences in violent crime among U.S. counties can be seen through the location, shape and borders of the areas with high or low levels. Non-spatial data, on the other hand, are not tied to a specific, precisely defined location. In other words, non-spatial data are information that is independent of all geometric considerations, stored as numbers, characters or logical values. The fundamental difference between the two is that spatial data are generally multi-dimensional and autocorrelated, whereas non-spatial data are generally one-dimensional and independent.

With a large number of locations, the data can be used to find the name of the location with the highest or lowest level of violent crime in my data set, and the values can be mapped in different colors to identify hot spots. Spatial mapping is more appealing to the eye because state lines and borders make the visualization clearer, whereas non-spatial output consists of statistical figures that can be intricate and hard to interpret accurately.

###Pg 32 of Lecture #10
"cb=TRUE" requests the generalized cartographic boundary file at a resolution of 1:500k, which is the default geometry used by tidycensus. If I change it to "cb=FALSE", the tigris default, I get the most detailed TIGER/Line file. An additional 8 variables appeared, including:

+ "CLASSFP"
+ "MTFCC"
+ "CSAFP"
+ "METDIVFP"
+ "FUNCSTAT"
+ "INTPTLAT"
+ "INTPTLON"
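
The column difference can be checked directly with `setdiff()`. The sketch below copies the column names from the `names(ct_map)` and `names(t_county)` output in this notebook. It assumes that `counties(cb = FALSE)` returns the same columns as the 2016 TIGER/Line shapefile read into `ct_map`, which may not hold for every tigris version or year.

```r
# Column names copied from the two names() outputs in this notebook.
tiger_names <- c("STATEFP", "COUNTYFP", "COUNTYNS", "GEOID", "NAME",
                 "NAMELSAD", "LSAD", "CLASSFP", "MTFCC", "CSAFP", "CBSAFP",
                 "METDIVFP", "FUNCSTAT", "ALAND", "AWATER", "INTPTLAT",
                 "INTPTLON", "geometry")
cb_names <- c("STATEFP", "COUNTYFP", "COUNTYNS", "AFFGEOID", "GEOID",
              "NAME", "LSAD", "ALAND", "AWATER", "geometry")

# Columns only in the detailed TIGER/Line file.
setdiff(tiger_names, cb_names)

# Columns only in the cartographic boundary file.
setdiff(cb_names, tiger_names)

# Net difference in column count.
length(tiger_names) - length(cb_names)
```

Under that assumption the detailed file has nine columns the generalized file lacks, and the generalized file has one (`AFFGEOID`) that the detailed file lacks, which gives the net difference of eight.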

###**References**
Gramlich, John. "5 Facts about Crime in the U.S." Pew Research Center, 30 Jan. 2018, www.pewresearch.org/fact-tank/2018/01/30/5-facts-about-crime-in-the-u-s/#.
