Three Data Sets for Analysis
The three data sets analyzed in this document are listed below.
I. Food Trucks in San Francisco
II. Poverty by Selected Characteristics in the United States
III. Hate Crime Statistics
Before working on any of the three data sets, let’s load the libraries that are used throughout.
library(tidyr)
library(dplyr)
library(ggplot2)
library(stringr)
library(rgdal)
library(readxl)
Now that we’ve loaded the appropriate libraries, let’s move on to our datasets.
Food Trucks in San Francisco
Data Source: https://data.sfgov.org/Economy-and-Community/Mobile-Food-Facility-Permit/rqzj-sfat
Goal: Using this data set, find where in San Francisco the highest concentration of food trucks is.
Our first step is to read in the csv file downloaded from the data source and to take a brief look at what’s inside.
food_full <- read.csv('Mobile_Food_Facility_Permit.csv')
head(food_full)
## locationid Applicant FacilityType cnn
## 1 1222440 Faith Sandwich Push Cart 9090000
## 2 751253 Pipo's Grill Truck 5688000
## 3 735318 Ziaurehman Amini Push Cart 30727000
## 4 364218 The Chai Cart Push Cart 9543000
## 5 735315 Ziaurehman Amini Push Cart 4969000
## 6 773095 Athena SF Gyro Push Cart 30747000
## LocationDescription
## 1 MISSION ST: SHAW ALY to ANTHONY ST (543 - 586)
## 2 FOLSOM ST: 14TH ST to 15TH ST (1800 - 1899)
## 3 MARKET ST: DRUMM ST intersection
## 4 NEW MONTGOMERY ST: AMBROSE BIERCE ST to MISSION ST (77 - 99)
## 5 DRUMM ST: MARKET ST to CALIFORNIA ST (1 - 6)
## 6 MARKET ST: 11TH ST intersection
## Address blocklot block lot permit Status
## 1 560 MISSION ST 3708095 3708 095 18MFF-0108 REQUESTED
## 2 1800 FOLSOM ST 3549083 3549 083 16MFF-0010 REQUESTED
## 3 5 THE EMBARCADERO 0234017 0234 017 15MFF-0159 REQUESTED
## 4 79 NEW MONTGOMERY ST 3707014 3707 014 12MFF-0083 SUSPEND
## 5 1 CALIFORNIA ST 0264004 0264 004 15MFF-0159 REQUESTED
## 6 10 SOUTH VAN NESS AVE 3506004 3506 004 15MFF-0145 REQUESTED
## FoodItems
## 1 Vietnamese sandwiches: various meat rice plates & bowls: vermicelli: spring rolls: sticky rice: Vietnamese Goi: pho: noodles: coffee: various flavored tea : various soda and juices: water
## 2 Tacos: Burritos: Hot Dogs: and Hamburgers
## 3
## 4 Hot Indian Chai (Tea)
## 5
## 6 Gyro pita bread (Lamb or chicken): lamb over rice: chicken over rice: chicken biryani rice: soft drinks
## X Y Latitude Longitude
## 1 6012851 2115275 37.78886 -122.3994
## 2 6007857 2107724 37.76785 -122.4161
## 3 6013917 2117244 37.79433 -122.3958
## 4 6012504 2114927 37.78789 -122.4005
## 5 6013553 2116844 37.79321 -122.3970
## 6 6006927 2110076 37.77426 -122.4195
## Schedule
## 1 http://bsm.sfdpw.org/PermitsTracker/reports/report.aspx?title=schedule&report=rptSchedule&params=permit=18MFF-0108&ExportPDF=1&Filename=18MFF-0108_schedule.pdf
## 2 http://bsm.sfdpw.org/PermitsTracker/reports/report.aspx?title=schedule&report=rptSchedule&params=permit=16MFF-0010&ExportPDF=1&Filename=16MFF-0010_schedule.pdf
## 3 http://bsm.sfdpw.org/PermitsTracker/reports/report.aspx?title=schedule&report=rptSchedule&params=permit=15MFF-0159&ExportPDF=1&Filename=15MFF-0159_schedule.pdf
## 4 http://bsm.sfdpw.org/PermitsTracker/reports/report.aspx?title=schedule&report=rptSchedule&params=permit=12MFF-0083&ExportPDF=1&Filename=12MFF-0083_schedule.pdf
## 5 http://bsm.sfdpw.org/PermitsTracker/reports/report.aspx?title=schedule&report=rptSchedule&params=permit=15MFF-0159&ExportPDF=1&Filename=15MFF-0159_schedule.pdf
## 6 http://bsm.sfdpw.org/PermitsTracker/reports/report.aspx?title=schedule&report=rptSchedule&params=permit=15MFF-0145&ExportPDF=1&Filename=15MFF-0145_schedule.pdf
## dayshours NOISent Approved Received PriorPermit
## 1 Mo-Fr:8AM-3PM 2018-09-25 0
## 2 2016-02-04 0
## 3 2015-12-31 0
## 4 Mo-Su:7AM-6PM 2012-04-03 0
## 5 2015-12-31 0
## 6 We/Th/Fr:6AM-6PM 2015-09-01 0
## ExpirationDate Location
## 1 07/15/2019 12:00:00 AM (37.788864715343, -122.399359351363)
## 2 (37.7678524427181, -122.416104892532)
## 3 03/15/2016 12:00:00 AM (37.7943310032468, -122.395811053023)
## 4 (37.7878896999061, -122.400535326777)
## 5 03/15/2016 12:00:00 AM (37.7932137316634, -122.397043036718)
## 6 (37.77425926306, -122.419485988398)
There are 24 columns in this data frame, most of which are not relevant for our purposes. We also notice that the FoodItems column is colon-separated and that the first entry in each list appears to be the vendor’s primary food item. Because there are so many food types, we’ll pull only the first item for reference.
We’ll also keep only the columns that matter to us, and only the rows where the status is “Approved” and where latitude and longitude are non-zero.
food <- food_full %>%
filter(Status == 'APPROVED', Latitude != 0, Longitude != 0) %>%
select(Applicant, Latitude, Longitude, FoodItems) %>%
separate(FoodItems, sep = ":", into = c('FoodItems'))
## Warning: Expected 1 pieces. Additional pieces discarded in 353 rows [1, 2,
## 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
head(food)
## Applicant Latitude Longitude FoodItems
## 1 Senor Sisig 37.79295 -122.3981 Senor Sisig
## 2 Senor Sisig 37.78215 -122.4066 Senor Sisig
## 3 Los 2 Cuates 37.78305 -122.3941 south American-Peruvian food
## 4 Quan Catering 37.74418 -122.3867 Cold Truck
## 5 Anas Goodies Catering 37.72398 -122.3959 Cold Truck
## 6 BH & MT LLC 37.76832 -122.4271 Cold Truck
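The warning above is expected: separate() is asked for a single piece, so everything after the first colon is discarded (tidyr's separate() also accepts extra = "drop" to do this silently). As a rough illustration of the same "keep the first item" idea in base R, here is a minimal sketch. The sample strings are modelled on the FoodItems values shown earlier, and the names food_items and first_item are made up for this example.

```r
# Sample values modelled on the FoodItems column; illustrative only
food_items <- c("Tacos: Burritos: Hot Dogs: and Hamburgers",
                "Hot Indian Chai (Tea)",
                "")

# Drop everything from the first colon onward, then trim stray whitespace
first_item <- trimws(sub(":.*$", "", food_items))
first_item
```

Entries without a colon pass through unchanged, and blank entries stay blank.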
Now that we have a preliminary data set for exploratory purposes, let’s build a map of San Francisco and plot the location of each approved food truck/cart. Ideally, we would have used the ggmap library with the Google Maps API, but it appears the pricing guidelines have changed and the API is no longer free. As a substitute, I’ve downloaded the shapefile for all the neighborhoods in San Francisco, which gives us what we need. Let’s build our map.
sf <- readOGR(dsn = "/Users/chesterpoon/Project2/sf",layer = "sf")
## OGR data source with driver: ESRI Shapefile
## Source: "/Users/chesterpoon/Project2/sf", layer: "sf"
## with 92 features
## It has 3 fields
sf_df <- fortify(sf)
## Regions defined for each Polygons
sf_map <- ggplot() +
geom_polygon(data = sf_df,
aes(x = long, y = lat, group = group),
color = 'black', fill = '#fce3c4', size = .05) +
theme(rect = element_blank())
sf_map + geom_point(data = food,
aes(x = Longitude, y = Latitude),
color = '#37a347')
It appears that most of our food vendors are concentrated along the eastern part of the city. A little more research reveals that these are the financial sectors of San Francisco.
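To put a rough number on "concentration", one option is to snap the points to a coarse latitude/longitude grid and count vendors per cell. The sketch below is only an illustration: it uses the six coordinates from the head(food_full) preview rather than the filtered food data frame, and the 0.01-degree cell size and the names pts and cell_counts are assumptions made for this example.

```r
# Sample coordinates copied from the head(food_full) preview; illustrative only
pts <- data.frame(
  Latitude  = c(37.78886, 37.76785, 37.79433, 37.78789, 37.79321, 37.77426),
  Longitude = c(-122.3994, -122.4161, -122.3958, -122.4005, -122.3970, -122.4195)
)

# Snap each point to a 0.01-degree grid cell (the cell size is an arbitrary choice)
pts$lat_bin <- round(pts$Latitude, 2)
pts$lon_bin <- round(pts$Longitude, 2)
pts$n <- 1

# Count points per cell and sort so the densest cell comes first
cell_counts <- aggregate(n ~ lat_bin + lon_bin, data = pts, FUN = sum)
cell_counts <- cell_counts[order(-cell_counts$n), ]
cell_counts
```

Running the same steps on the full food data frame would give a ranked list of cells to compare against the visual impression from the map.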
Now, let’s take a look to see what the most common food items are.
nfood <- table(toupper(food$FoodItems))
nfood <- data.frame(sort(nfood, decreasing = TRUE)[1:5])
colnames(nfood) <- c("Foodtype", "n")
ggplot(nfood,aes(x = Foodtype, y = n, fill = Foodtype)) +
geom_bar(stat = 'identity') +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
axis.title.x = element_blank())
It’s a bit unclear what “Cold Truck” means. It’s also somewhat misleading that a food truck is listed as selling everything but hot dogs. Clearly, there are limitations in the data stemming from the data collection process. In any case, the most common type of food truck/cart is “Cold Truck”.
For fun, let’s pretend we would like to start a food truck business. Let’s see where the top 3 types of food trucks (“Cold Truck”, “Burgers”, “Hot Dogs”) are located.
top3 <- food %>%
filter(toupper(FoodItems) == "COLD TRUCK" |
toupper(FoodItems) == "BURGERS" |
toupper(FoodItems) == "HOT DOGS")
sf_map +
geom_point(data = top3,
aes(x=Longitude, y=Latitude, colour = toupper(FoodItems))) +
scale_color_hue("Legend")
Conclusion & Final Thoughts
From our analysis, the vast majority of food trucks/carts are on the eastern side of San Francisco. The most common types of food sold are “Cold Truck”, burgers, and hot dogs. If we were to open our own imaginary food truck/cart, we would probably do well with a good burger truck in the northeast corner of San Francisco. This assumes that business would not do as well in the western section of the city, which could explain why there is such a dearth of food trucks there.
A better analysis would be possible if the data collection clarified the meaning of “Cold Truck”. My suspicion is that vendors enter the food type they sell as free text when completing their application. Perhaps a standardized method of classifying food type would be beneficial.
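As one possible shape for such a standardized classification, the sketch below maps free-text entries to a small fixed set of categories with pattern matching. The function name classify_food, the category labels, and the patterns are all hypothetical choices for illustration; a real scheme would need to be defined by whoever collects the data.

```r
# Hypothetical classifier: maps free-text food descriptions to fixed categories
classify_food <- function(x) {
  x <- toupper(trimws(x))
  out <- rep("Other", length(x))
  # Later assignments override earlier ones when several patterns match
  out[grepl("BURGER", x)] <- "Burgers"
  out[grepl("HOT DOG", x)] <- "Hot Dogs"
  out[grepl("COLD TRUCK", x)] <- "Cold Truck"
  out
}

classify_food(c("Hamburgers", "hot dogs", "Cold Truck", "Tacos"))
```

Anything that matches none of the patterns falls into "Other", which makes unclassified entries easy to review.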
Poverty by Selected Characteristics in the United States
Data source: https://www2.census.gov/programs-surveys/demo/tables/p60/263/pov_table3.xls
Goal: What is the change in poverty rate by race and gender?
The downloaded data from the census website is in the form of a Microsoft Excel file. We’ll read in the file using read_excel.
pov_full <- read_excel('pov_table3.xls')
head(pov_full,n = 15)
## # A tibble: 15 x 13
## `Table with row… X__1 X__2 X__3 X__4 X__5 X__6 X__7 X__8 X__9
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Table 3. <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 2 People in Pover… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 3 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 4 (Numbers in tho… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 5 Characteristic 2016 <NA> <NA> <NA> <NA> 2017 <NA> <NA> <NA>
## 6 <NA> Total Belo… <NA> <NA> <NA> Total Belo… <NA> <NA>
## 7 <NA> <NA> Numb… Marg… Perc… Marg… <NA> Numb… Marg… Perc…
## 8 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 9 PEOPLE <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 10 Total..........… 3199… 40616 739 12.6… 0.20… 3225… 39698 915 12.3…
## 11 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 12 Race3 and Hispa… <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## 13 White…………………………… 2459… 27113 547 11 0.20… 2472… 26436 714 10.6…
## 14 White, not Hisp… 1952… 17263 493 8.80… 0.29… 1952… 16993 571 8.69…
## 15 Black…………………………… 41962 9234 388 22 0.90… 42474 8993 373 21.1…
## # ... with 3 more variables: X__10 <chr>, X__11 <chr>, X__12 <chr>
Unfortunately, the dataset is quite messy: the true column names are scattered inconsistently across several rows of the table. For this dataset, I decided to rename all the columns. The variables I care about for this analysis get descriptive names so the columns I need are easy to identify; the rest get placeholder letters. Let’s construct the dataframe with the goal of feeding the data to ggplot2. Below is the list of tasks we will do to clean the data:
- Rename columns in the dataframe
- Select just the columns we want
- Filter out blank rows in the dataset, the rows that have “Characteristic”, and any row that starts with “Total,” (the overall “Total” row is kept).
- Gather the appropriate columns to create a “long” version of the dataframe.
- Split the year and the “poverty vs total” column into two columns: one showing year and the other column that identifies if the number shown is the total population or if it’s the population living below the poverty line.
- Spread the column Pov|Total to go “wide” so that I can more easily calculate the poverty rate.
- Get rid of the multiple periods that occur after all the demographics in the Demographic column.
- Create a new column where we can appropriately group demographic types into their proper categories.
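The long-then-wide reshaping in the steps above can be sketched on a two-row toy frame. This is not the code used in this analysis, which relies on gather() and spread(); the sketch assumes a tidyr version that provides the newer pivot_longer() and pivot_wider() verbs. The figures are copied from the cleaned output shown further down (in thousands), and the names toy and toy_rates are made up for this example.

```r
library(tidyr)
library(dplyr)

# Two demographics with figures copied from the cleaned table (in thousands);
# values are stored as text, as they are after read_excel()
toy <- tibble::tibble(
  Demographic          = c("Asian", "Aged 65 and older"),
  `2016-Total`         = c("18879", "49274"),
  `2016-Below Poverty` = c("1908", "4568"),
  `2017-Total`         = c("19475", "51080"),
  `2017-Below Poverty` = c("1953", "4681")
)

toy_rates <- toy %>%
  # Long form: one row per demographic, year and measure
  pivot_longer(-Demographic,
               names_to = c("Year", "Measure"),
               names_sep = "-",
               values_to = "n") %>%
  mutate(n = as.numeric(n)) %>%
  # Wide again on the measure so the two counts sit side by side
  pivot_wider(names_from = Measure, values_from = n) %>%
  mutate(`Poverty Rate` = `Below Poverty` / Total)

toy_rates
```

Having Total and Below Poverty in the same row is what makes the rate a single division per row.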
colnames(pov_full) <- c('Demographic','2016-Total',
'2016-Below Poverty','d','e','f',
'2017-Total','2017-Below Poverty',
'i','j','k','l','m')
poverty <- pov_full %>%
select(Demographic,
`2016-Total`,
`2016-Below Poverty`,
`2017-Total`,
`2017-Below Poverty`) %>%
filter(!is.na(`2016-Total`),
!is.na(Demographic),
Demographic != "Characteristic",
!str_detect(Demographic, "^Total\\,")) %>%
gather("Year_Descr","n",2:5) %>%
separate(Year_Descr, sep = "-", c("Year","Pov|Total")) %>%
spread(`Pov|Total`,n) %>%
mutate(`Poverty Rate` = as.numeric(`Below Poverty`) / as.numeric(Total))
poverty$Demographic <- gsub("\\…*\\.*", "", poverty$Demographic)
poverty$demo_type <- 'Race'
poverty$demo_type[
poverty$Demographic=='Male' | poverty$Demographic=='Female'
] <- 'Sex'
poverty$demo_type[
str_detect(poverty$Demographic,
fixed("age", ignore_case = TRUE))
] <- 'Age'
poverty$demo_type[
str_detect(poverty$Demographic, "cities") |
str_detect(poverty$Demographic, "area")
] <- 'Residence'
poverty$demo_type[
str_detect(poverty$Demographic, "born") |
str_detect(poverty$Demographic, "citizen")
] <- 'Nativity'
poverty$demo_type[
str_detect(poverty$Demographic, fixed("east", ignore_case = TRUE)) |
str_detect(poverty$Demographic, fixed("west", ignore_case = TRUE)) |
str_detect(poverty$Demographic, fixed("north", ignore_case = TRUE)) |
str_detect(poverty$Demographic, fixed("south", ignore_case = TRUE))
] <- 'Region'
poverty$demo_type[
str_detect(poverty$Demographic,
fixed("work", ignore_case = TRUE)) |
str_detect(poverty$Demographic, "full-time")
] <- 'Work'
poverty$demo_type[
str_detect(poverty$Demographic, fixed("degree", ignore_case = TRUE)) |
str_detect(poverty$Demographic, fixed("school", ignore_case = TRUE))
] <- 'Education'
poverty$demo_type[
str_detect(poverty$Demographic, "disability")
] <- 'Disability'
poverty$demo_type[
str_detect(poverty$Demographic, "Total")
] <- 'Overall'
poverty
## # A tibble: 66 x 6
## Demographic Year `Below Poverty` Total `Poverty Rate` demo_type
## <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 Aged 18 to 64 2016 22795 1970… 0.116 Age
## 2 Aged 18 to 64 2017 22209 1981… 0.112 Age
## 3 Aged 65 and older 2016 4568 49274 0.0927 Age
## 4 Aged 65 and older 2017 4681 51080 0.0916 Age
## 5 All workers 2016 8743 1509… 0.0579 Work
## 6 All workers 2017 8135 1521… 0.0534 Work
## 7 Asian 2016 1908 18879 0.101 Race
## 8 Asian 2017 1953 19475 0.100 Race
## 9 Bachelor's degree… 2016 3299 74103 0.0445 Education
## 10 Bachelor's degree… 2017 3661 76924 0.0476 Education
## # ... with 56 more rows
Now that our data is clean and usable, we can feed it into ggplot2. We’ll display the data using facet_wrap to get an idea of how poverty rates may have changed from 2016 to 2017.
d1 <- poverty %>%
filter(demo_type == 'Age' |
demo_type == 'Nativity' |
demo_type == 'Race' |
demo_type == 'Sex' |
demo_type == 'Residence')
d2 <- poverty %>%
filter(demo_type == 'Work' |
demo_type == 'Education' |
demo_type == 'Region' |
demo_type == 'Overall' |
demo_type == 'Disability')
d_1 <- ggplot(d1[which(d1$`Poverty Rate`>0),], aes(x=Demographic, y=`Poverty Rate`))
# geom_col() plots each rate as given: one bar per demographic and year
d_1 +
geom_col(position = "dodge", aes(fill = Year)) +
facet_wrap( ~ demo_type, scales = "free_x") +
theme_bw() +
theme(axis.title.x = element_blank(),
axis.text.x = element_text(angle = 60, hjust = 1, size = 6))
d_2 <- ggplot(d2[which(d2$`Poverty Rate`>0),], aes(x=Demographic, y=`Poverty Rate`))
d_2 +
geom_col(position = "dodge", aes(fill = Year)) +
facet_wrap( ~ demo_type, scales = "free_x") +
theme_bw() +
theme(axis.title.x = element_blank(),
axis.text.x = element_text(angle = 60, hjust = 1, size = 6))
Conclusion & Final Thoughts
We can see that there has mostly been a slight decrease in the poverty rate from 2016 to 2017. If we drill down a bit further and look at the poverty rates across each characteristic, we find generally the same decrease. There is nothing particularly surprising about the data, but it would be more interesting to look at the intersections of demographic characteristics (e.g. African American women from the South). That type of analysis would require the original raw data set.
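To make "slight decrease" concrete, the change can be expressed in percentage points. The sketch below uses four of the rates printed in the cleaned table above; the data frame name rates and the column change_pp are made up for this example, and the selection of rows is arbitrary.

```r
# Rates copied from the first rows of the cleaned poverty table
rates <- data.frame(
  Demographic = c("Aged 18 to 64", "Aged 65 and older", "All workers", "Asian"),
  rate_2016   = c(0.116, 0.0927, 0.0579, 0.101),
  rate_2017   = c(0.112, 0.0916, 0.0534, 0.100)
)

# Change from 2016 to 2017 in percentage points (negative = decrease)
rates$change_pp <- (rates$rate_2017 - rates$rate_2016) * 100
rates
```

For these four groups every change is negative and under half a percentage point, which matches the impression from the plots.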
Hate Crime Statistics
Data Source: https://ucr.fbi.gov/hate-crime/2016/tables/table-4 https://ucr.fbi.gov/hate-crime/2015/tables-and-data-declarations/4tabledatadecpdf
Goal: Has there been an increase/decrease in hate crimes from 2015 to 2016? What are the most significant changes if any?
To do a comparative analysis between 2015 and 2016, we need to join two datasets, one for each year. Let’s read them both in and take a look. I’ve also skipped the first 5 rows of each file to better display the data.
hate_2015 <- read_excel('hate_crimes_2015.xls', skip = 5)
hate_2016 <- read_excel('hate_crimes_2016.xls', skip = 5)
hate_2015 <- hate_2015[,-c(4:5)]
hate_2016 <- hate_2016[,-c(4:5)]
hate_2015
## # A tibble: 47 x 15
## X__1 X__2 `Murder and\nno… `Aggravated\nas… `Simple\nassaul…
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Total 6885 18 882 1696
## 2 Sing… 6837 17 876 1690
## 3 Race… 4029 11 557 967
## 4 Anti… 734 1 101 206
## 5 Anti… 2125 10 279 488
## 6 Anti… 137 0 8 18
## 7 Anti… 132 0 21 32
## 8 Anti… 6 0 3 1
## 9 Anti… 138 0 12 15
## 10 Anti… 47 0 2 22
## # ... with 37 more rows, and 10 more variables: Intimidation <dbl>,
## # Other3 <dbl>, Robbery <dbl>, Burglary <dbl>, `Larceny-\ntheft` <dbl>,
## # `Motor\nvehicle\ntheft` <dbl>, Arson <dbl>,
## # `Destruction/\ndamage/\nvandalism` <dbl>, Other3__1 <dbl>, X__3 <dbl>
hate_2016
## # A tibble: 47 x 15
## X__1 X__2 `Murder and\nno… `Aggravated\nas… `Simple\nassaul…
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Total 7321 9 873 1687
## 2 Sing… 7227 9 866 1677
## 3 Race… 4229 7 548 1002
## 4 Anti… 876 5 120 241
## 5 Anti… 2122 2 273 455
## 6 Anti… 161 0 8 17
## 7 Anti… 131 0 15 40
## 8 Anti… 9 0 0 4
## 9 Anti… 178 0 17 36
## 10 Anti… 56 0 8 16
## # ... with 37 more rows, and 10 more variables: Intimidation <dbl>,
## # Other3 <dbl>, Robbery <dbl>, Burglary <dbl>, `Larceny-\ntheft` <dbl>,
## # `Motor\nvehicle\ntheft` <dbl>, Arson <dbl>,
## # `Destruction/\ndamage/\nvandalism` <dbl>, Other3__1 <dbl>, X__3 <dbl>
For our purposes, we won’t need the breakdown by offense type, so we can remove all of those columns. We’re only really interested in the hate crime numbers for each demographic. We’ll clean the data by doing the following:
- Rename the column that holds the number of incidents to the year, and the column that holds the type of hate crime to “Type”.
- Filter out the notes section at the bottom of the data frame, which has a value of NA in the year column (that is, the number of incidents column).
- Select the columns we want for each data set.
- Join the two data frames to form one.
- Calculate the change in number of incidents from 2015 to 2016.
- Calculate the change proportional to the number of incidents in 2015.
- Create a new column to determine if the change was a negative or positive change (negative = decrease, positive = increase)
- Create two data frames to feed into ggplot2: one for a “macro” categorical hate crime set and the other for a “micro” categorical hate crime set.
colnames(hate_2015)[colnames(hate_2015)=="X__2"] <- "2015"
colnames(hate_2016)[colnames(hate_2016)=="X__2"] <- "2016"
colnames(hate_2015)[colnames(hate_2015)=="X__1"] <- "Type"
colnames(hate_2016)[colnames(hate_2016)=="X__1"] <- "Type"
hate_2015 <- hate_2015 %>%
filter(!is.na(`2015`)) %>%
select(Type,`2015`)
hate_2016 <- hate_2016 %>%
filter(!is.na(`2016`)) %>%
select(Type,`2016`)
hate_crimes <- full_join(hate_2015, hate_2016, by = "Type")
hate_crimes$Change <- hate_crimes$`2016`-hate_crimes$`2015`
hate_crimes$`Percent Change` <- hate_crimes$Change/hate_crimes$`2015`
num_sign <- vector()
for (i in hate_crimes$Change) {
if (i >= 0) {
num_sign <- c(num_sign,'pos')
} else {
num_sign <- c(num_sign,'neg')
}
}
hate_crimes$num_sign <- num_sign
hate_crimes_micro <- hate_crimes %>%
filter(str_detect(Type,'Anti'))
hate_crimes_macro <- hate_crimes %>%
filter(!str_detect(Type,'Anti'),
Type != "Total")
hate_crimes_macro
## # A tibble: 8 x 6
## Type `2015` `2016` Change `Percent Change` num_sign
## <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Single-Bias Incidents 6837 7227 390 0.0570 pos
## 2 Race/Ethnicity/Ancestry: 4029 4229 200 0.0496 pos
## 3 Religion: 1354 1538 184 0.136 pos
## 4 Sexual Orientation: 1219 1218 -1 -0.000820 neg
## 5 Disability: 88 76 -12 -0.136 neg
## 6 Gender: 29 36 7 0.241 pos
## 7 Gender Identity: 118 130 12 0.102 pos
## 8 Multiple-Bias Incidents4 48 94 46 0.958 pos
hate_crimes_micro
## # A tibble: 34 x 6
## Type `2015` `2016` Change `Percent Change` num_sign
## <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Anti-White 734 876 142 0.193 pos
## 2 Anti-Black or African A… 2125 2122 -3 -0.00141 neg
## 3 Anti-American Indian or… 137 161 24 0.175 pos
## 4 Anti-Asian 132 131 -1 -0.00758 neg
## 5 Anti-Native Hawaiian or… 6 9 3 0.5 pos
## 6 Anti-Multiple Races, Gr… 138 178 40 0.290 pos
## 7 Anti-Arab 47 56 9 0.191 pos
## 8 Anti-Hispanic or Latino 379 449 70 0.185 pos
## 9 Anti-Other Race/Ethnici… 331 247 -84 -0.254 neg
## 10 Anti-Jewish 695 834 139 0.2 pos
## # ... with 24 more rows
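A join on Type only lines up rows whose labels match exactly in both years. The labels printed above carry a footnote digit and trailing colons (for example “Multiple-Bias Incidents4” and “Religion:”), so if the two source tables ever differed in those details, rows would fail to pair up. Whether that happens in these particular files is not known from the output; the sketch below simply shows one way to normalize labels before joining. The helper name clean_type is hypothetical.

```r
# Hypothetical helper: strip trailing footnote digits and colons from labels
clean_type <- function(x) {
  x <- trimws(x)
  x <- sub("[0-9]+$", "", x)  # footnote marker such as the "4" in "Incidents4"
  x <- sub(":$", "", x)       # trailing colon on category headings
  trimws(x)
}

clean_type(c("Multiple-Bias Incidents4", "Religion:", " Anti-White "))
```

Note that this would also strip digits from any label that legitimately ends in a number, so it should be checked against the full list of labels before use.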
Now that we have our datasets set up, let’s first plot our “macro” level dataset and its change proportional to the 2015 level. We’ll also take a look at the raw change in hate crimes.
ggplot(hate_crimes_macro, aes(Type, `Percent Change`)) +
geom_bar(stat = "identity", aes(fill = Type)) +
theme(axis.text.x = element_text(angle = 60, hjust = 1))
ggplot(hate_crimes_macro, aes(Type, Change)) +
geom_bar(stat = "identity", aes(fill = Type)) +
theme(axis.text.x = element_text(angle = 60, hjust = 1))
We can see that the race and religion categories have the largest increases in the count of reported hate crimes from 2015 to 2016. Proportionally, though, “multiple-bias incidents” saw the highest increase, which is most likely due to their low starting count.
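The effect of a small base on the proportional change is easy to check by hand with the counts from the macro table. The helper name pct_change is made up for this example.

```r
# Proportional change relative to the starting value
pct_change <- function(old, new) (new - old) / old

# Multiple-bias incidents: 48 in 2015, 94 in 2016
pct_change(48, 94)

# Race/Ethnicity/Ancestry: 4029 in 2015, 4229 in 2016
pct_change(4029, 4229)
```

An increase of 46 incidents on a base of 48 is close to a doubling, while an increase of 200 on a base of 4029 is only about 5%.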
Now we’ll take a look at our “micro” level analysis.
ggplot(hate_crimes_micro, aes(Type,`Percent Change`)) +
geom_bar(stat = "identity", aes(fill = num_sign)) +
coord_flip() +
theme_bw() +
theme(legend.title=element_blank())
ggplot(hate_crimes_micro, aes(Type,Change)) +
geom_bar(stat = "identity", aes(fill = num_sign)) +
coord_flip() +
theme_bw() +
theme(legend.title=element_blank())
The largest increases in the count of reported hate crimes are in the anti-white, anti-Jewish, anti-Islamic, and anti-Hispanic categories. The proportional change is much smaller for these same groups, which indicates that their overall counts are higher. Let’s take a closer look at the total hate crime incidents across both years for both the macro and micro groups. First we need to add a “Total” column.
hate_crimes_macro$Total <- hate_crimes_macro$`2015` + hate_crimes_macro$`2016`
hate_crimes_micro$Total <- hate_crimes_micro$`2015` + hate_crimes_micro$`2016`
hate_crimes_macro
## # A tibble: 8 x 7
## Type `2015` `2016` Change `Percent Change` num_sign Total
## <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 Single-Bias Incide… 6837 7227 390 0.0570 pos 14064
## 2 Race/Ethnicity/Anc… 4029 4229 200 0.0496 pos 8258
## 3 Religion: 1354 1538 184 0.136 pos 2892
## 4 Sexual Orientation: 1219 1218 -1 -0.000820 neg 2437
## 5 Disability: 88 76 -12 -0.136 neg 164
## 6 Gender: 29 36 7 0.241 pos 65
## 7 Gender Identity: 118 130 12 0.102 pos 248
## 8 Multiple-Bias Inci… 48 94 46 0.958 pos 142
hate_crimes_micro
## # A tibble: 34 x 7
## Type `2015` `2016` Change `Percent Change` num_sign Total
## <chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
## 1 Anti-White 734 876 142 0.193 pos 1610
## 2 Anti-Black or Afr… 2125 2122 -3 -0.00141 neg 4247
## 3 Anti-American Ind… 137 161 24 0.175 pos 298
## 4 Anti-Asian 132 131 -1 -0.00758 neg 263
## 5 Anti-Native Hawai… 6 9 3 0.5 pos 15
## 6 Anti-Multiple Rac… 138 178 40 0.290 pos 316
## 7 Anti-Arab 47 56 9 0.191 pos 103
## 8 Anti-Hispanic or … 379 449 70 0.185 pos 828
## 9 Anti-Other Race/E… 331 247 -84 -0.254 neg 578
## 10 Anti-Jewish 695 834 139 0.2 pos 1529
## # ... with 24 more rows
Now that we’ve added the “Total” column to both data frames, let’s plot them.
ggplot(hate_crimes_macro, aes(Type,Total, fill = Total)) +
geom_bar(stat = "identity") +
scale_colour_gradientn(colors = 'navy') +
coord_flip() +
theme_bw() +
theme(legend.title=element_blank())
ggplot(hate_crimes_micro, aes(Type,Total, fill = Total)) +
geom_bar(stat = "identity") +
scale_colour_gradientn(colors = 'navy') +
coord_flip() +
theme_bw() +
theme(legend.title=element_blank())
Conclusion & Final Thoughts
Of all reported hate crimes, the most frequent (the most prominent peaks in the plot) are anti-Black, anti-Jewish, anti-gay, and anti-white. At the broader level, the most frequent are rooted in race/ethnicity, followed by religion and sexual orientation. Given that these are the most frequently reported hate crimes, their proportional increases from 2015 to 2016 are more disturbing in context. Seemingly “small” proportional increases of 20-25% can mean significant increases in the sheer number of reported hate crimes for certain groups. Of particular note are the moderate proportional increases in anti-white, anti-Jewish, anti-Hispanic, and anti-Islamic hate crimes, which suggest that America’s growing tribalism around the 2016 election may have been a significant factor in the increase.
Further analysis of the trend over the last 20 years would be ideal. The data are also limited in that a hate crime must be reported to be logged in the database.