Three Data Sets for Analysis

The three data sets we’ll analyze are detailed below. You can click on any section listed below to jump to it.

I. Food Trucks in San Francisco

II. Poverty by Selected Characteristics in the United States

III. Hate Crime Statistics

Before we start working on any of the three data sets, let’s load the libraries that will be used throughout.

library(tidyr)    # reshaping: separate(), gather(), spread()
library(dplyr)    # data manipulation verbs and joins
library(ggplot2)  # plotting
library(stringr)  # string helpers such as str_detect()
library(rgdal)    # readOGR() for reading shapefiles
library(readxl)   # read_excel() for Excel files

Now that we’ve loaded the appropriate libraries, let’s move on to our datasets.

Food Trucks in San Francisco

Data Source: https://data.sfgov.org/Economy-and-Community/Mobile-Food-Facility-Permit/rqzj-sfat

Goal: Using this data set, find where food trucks are most heavily concentrated in San Francisco.

Our first step is to read in the CSV file downloaded from the data source and take a brief look at what’s inside.

food_full <- read.csv('Mobile_Food_Facility_Permit.csv')
head(food_full)
##   locationid        Applicant FacilityType      cnn
## 1    1222440   Faith Sandwich    Push Cart  9090000
## 2     751253     Pipo's Grill        Truck  5688000
## 3     735318 Ziaurehman Amini    Push Cart 30727000
## 4     364218    The Chai Cart    Push Cart  9543000
## 5     735315 Ziaurehman Amini    Push Cart  4969000
## 6     773095   Athena SF Gyro    Push Cart 30747000
##                                            LocationDescription
## 1               MISSION ST: SHAW ALY to ANTHONY ST (543 - 586)
## 2                  FOLSOM ST: 14TH ST to 15TH ST (1800 - 1899)
## 3                             MARKET ST: DRUMM ST intersection
## 4 NEW MONTGOMERY ST: AMBROSE BIERCE ST to MISSION ST (77 - 99)
## 5                 DRUMM ST: MARKET ST to CALIFORNIA ST (1 - 6)
## 6                              MARKET ST: 11TH ST intersection
##                 Address blocklot block lot     permit    Status
## 1        560 MISSION ST  3708095  3708 095 18MFF-0108 REQUESTED
## 2        1800 FOLSOM ST  3549083  3549 083 16MFF-0010 REQUESTED
## 3     5 THE EMBARCADERO  0234017  0234 017 15MFF-0159 REQUESTED
## 4  79 NEW MONTGOMERY ST  3707014  3707 014 12MFF-0083   SUSPEND
## 5       1 CALIFORNIA ST  0264004  0264 004 15MFF-0159 REQUESTED
## 6 10 SOUTH VAN NESS AVE  3506004  3506 004 15MFF-0145 REQUESTED
##                                                                                                                                                                                      FoodItems
## 1 Vietnamese sandwiches: various meat rice plates & bowls: vermicelli: spring rolls: sticky rice: Vietnamese Goi: pho: noodles: coffee:  various flavored tea : various soda and juices: water
## 2                                                                                                                                                    Tacos: Burritos: Hot Dogs: and Hamburgers
## 3                                                                                                                                                                                             
## 4                                                                                                                                                                        Hot Indian Chai (Tea)
## 5                                                                                                                                                                                             
## 6                                                                                      Gyro pita bread (Lamb or chicken): lamb over rice: chicken over rice: chicken biryani rice: soft drinks
##         X       Y Latitude Longitude
## 1 6012851 2115275 37.78886 -122.3994
## 2 6007857 2107724 37.76785 -122.4161
## 3 6013917 2117244 37.79433 -122.3958
## 4 6012504 2114927 37.78789 -122.4005
## 5 6013553 2116844 37.79321 -122.3970
## 6 6006927 2110076 37.77426 -122.4195
##                                                                                                                                                          Schedule
## 1 http://bsm.sfdpw.org/PermitsTracker/reports/report.aspx?title=schedule&report=rptSchedule&params=permit=18MFF-0108&ExportPDF=1&Filename=18MFF-0108_schedule.pdf
## 2 http://bsm.sfdpw.org/PermitsTracker/reports/report.aspx?title=schedule&report=rptSchedule&params=permit=16MFF-0010&ExportPDF=1&Filename=16MFF-0010_schedule.pdf
## 3 http://bsm.sfdpw.org/PermitsTracker/reports/report.aspx?title=schedule&report=rptSchedule&params=permit=15MFF-0159&ExportPDF=1&Filename=15MFF-0159_schedule.pdf
## 4 http://bsm.sfdpw.org/PermitsTracker/reports/report.aspx?title=schedule&report=rptSchedule&params=permit=12MFF-0083&ExportPDF=1&Filename=12MFF-0083_schedule.pdf
## 5 http://bsm.sfdpw.org/PermitsTracker/reports/report.aspx?title=schedule&report=rptSchedule&params=permit=15MFF-0159&ExportPDF=1&Filename=15MFF-0159_schedule.pdf
## 6 http://bsm.sfdpw.org/PermitsTracker/reports/report.aspx?title=schedule&report=rptSchedule&params=permit=15MFF-0145&ExportPDF=1&Filename=15MFF-0145_schedule.pdf
##          dayshours NOISent Approved   Received PriorPermit
## 1    Mo-Fr:8AM-3PM                  2018-09-25           0
## 2                                   2016-02-04           0
## 3                                   2015-12-31           0
## 4    Mo-Su:7AM-6PM                  2012-04-03           0
## 5                                   2015-12-31           0
## 6 We/Th/Fr:6AM-6PM                  2015-09-01           0
##           ExpirationDate                              Location
## 1 07/15/2019 12:00:00 AM  (37.788864715343, -122.399359351363)
## 2                        (37.7678524427181, -122.416104892532)
## 3 03/15/2016 12:00:00 AM (37.7943310032468, -122.395811053023)
## 4                        (37.7878896999061, -122.400535326777)
## 5 03/15/2016 12:00:00 AM (37.7932137316634, -122.397043036718)
## 6                          (37.77425926306, -122.419485988398)

There are 24 columns in this data frame, most of which are not applicable for our purposes. We also notice that the FoodItems column is colon-separated, and it appears each vendor’s primary food item comes first in the list. Because there are so many food types, we’ll pull only the first item for reference.

We’ll also keep only the columns that matter to us, only the rows where the status is “APPROVED”, and only the rows with sensible latitude and longitude values.

food <- food_full %>%
  filter(Status == 'APPROVED', Latitude != 0, Longitude != 0) %>%
  select(Applicant, Latitude, Longitude, FoodItems) %>%
  separate(FoodItems, sep = ":", into = c('FoodItems')) 
## Warning: Expected 1 pieces. Additional pieces discarded in 353 rows [1, 2,
## 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
head(food)
##               Applicant Latitude Longitude                    FoodItems
## 1           Senor Sisig 37.79295 -122.3981                  Senor Sisig
## 2           Senor Sisig 37.78215 -122.4066                  Senor Sisig
## 3          Los 2 Cuates 37.78305 -122.3941 south American-Peruvian food
## 4         Quan Catering 37.74418 -122.3867                   Cold Truck
## 5 Anas Goodies Catering 37.72398 -122.3959                   Cold Truck
## 6           BH & MT LLC 37.76832 -122.4271                   Cold Truck
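
Incidentally, the warning from separate() above is expected: we asked for only one piece, so the remaining colon-separated pieces are discarded. If we wanted to make that intent explicit and silence the warning, tidyr’s separate() accepts extra = "drop":

food <- food_full %>%
  filter(Status == 'APPROVED', Latitude != 0, Longitude != 0) %>%
  select(Applicant, Latitude, Longitude, FoodItems) %>%
  separate(FoodItems, sep = ":", into = c('FoodItems'), extra = "drop")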

Now that we have a preliminary data set for exploratory purposes, let’s build a map of San Francisco and plot the location of each approved food truck/cart. Ideally, the ggmap library backed by the Google Maps API would have been preferable, but Google has changed its pricing model and the API is no longer free. As a substitute, I’ve downloaded the shapefile for all the neighborhoods in San Francisco, which gives us what we need. Let’s build our map.

sf <- readOGR(dsn = "/Users/chesterpoon/Project2/sf",layer = "sf")
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/chesterpoon/Project2/sf", layer: "sf"
## with 92 features
## It has 3 fields
sf_df <- fortify(sf)
## Regions defined for each Polygons
sf_map <- ggplot() +
  geom_polygon(data = sf_df,
            aes(x = long, y = lat, group = group),
            color = 'black', fill = '#fce3c4', size = .05) +
  theme(rect = element_blank())

sf_map + geom_point(data = food,
             aes(x = Longitude, y = Latitude),
             color = '#37a347')

It appears that most of our food vendors are concentrated along the eastern edge of the city. A little more research reveals that this area includes San Francisco’s Financial District.
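
To quantify the concentration rather than just eyeballing the map, we could join each point to its containing neighborhood polygon with sp::over(). This is only a sketch: it assumes the shapefile is in WGS84 longitude/latitude (consistent with the plot above) and that its first attribute field holds the neighborhood name; check names(sf@data) to be sure.

library(sp)  # already loaded as a dependency of rgdal
pts <- SpatialPoints(food[, c("Longitude", "Latitude")],
                     proj4string = CRS(proj4string(sf)))
hoods <- over(pts, sf)  # polygon attributes for each food truck point
head(sort(table(hoods[[1]]), decreasing = TRUE))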

Now let’s see what the most common food items are.

nfood <- table(toupper(food$FoodItems))
nfood <- data.frame(sort(nfood, decreasing = TRUE)[1:5])
colnames(nfood) <- c("Foodtype", "n")

ggplot(nfood,aes(x = Foodtype, y = n, fill = Foodtype)) +
  geom_bar(stat = 'identity') +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title.x = element_blank())

It’s a bit unclear what “Cold Truck” means. It’s also somewhat misleading that a food truck would sell everything but hot dogs. Clearly there are limitations in the data stemming from the collection process. In any case, the most common type of food truck/cart is “Cold Truck”.
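
One way to probe the label is to look at the full, unsplit FoodItems strings for a few “Cold Truck” vendors in the original data — a quick sketch:

food_full %>%
  filter(str_detect(toupper(FoodItems), "COLD TRUCK")) %>%
  pull(FoodItems) %>%
  head(3)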

For fun, let’s pretend we would like to start a food truck business. Let’s see where the top 3 types of food trucks (“Cold Truck”, “Burgers”, “Hot Dogs”) are located.

top3 <- food %>%
  filter(toupper(FoodItems) == "COLD TRUCK" | 
           toupper(FoodItems) == "BURGERS" | 
           toupper(FoodItems) == "HOT DOGS")

sf_map +
  geom_point(data = top3,
             aes(x=Longitude, y=Latitude, colour = toupper(FoodItems))) +
  scale_color_hue("Legend")

Conclusion & Final Thoughts

From our analysis, the vast majority of food trucks/carts are on the eastern side of San Francisco. The most common types of food sold are “Cold Truck”, burgers, and hot dogs. If we were to open our own imaginary food truck/cart, we would probably do well with a good burger truck in the northeast corner of San Francisco. This assumes that business would not do well on the western side of the city, which could explain the dearth of food trucks there.

A better analysis would be possible with cleaner data collection, particularly clarification of what “Cold Truck” means. My suspicion is that vendors enter the food type they sell as free text when completing their application. A standardized method of classifying food type would be beneficial.

Navigate back to the top

Poverty by Selected Characteristics in the United States

Data source: https://www2.census.gov/programs-surveys/demo/tables/p60/263/pov_table3.xls

Goal: What is the change in poverty rate by race and gender?

The downloaded data from the census website is in the form of a Microsoft Excel file. We’ll read in the file using read_excel.

pov_full <- read_excel('pov_table3.xls')
head(pov_full,n = 15)
## # A tibble: 15 x 13
##    `Table with row… X__1  X__2  X__3  X__4  X__5  X__6  X__7  X__8  X__9 
##    <chr>            <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
##  1 Table 3.         <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
##  2 People in Pover… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
##  3 <NA>             <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
##  4 (Numbers in tho… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
##  5 Characteristic   2016  <NA>  <NA>  <NA>  <NA>  2017  <NA>  <NA>  <NA> 
##  6 <NA>             Total Belo… <NA>  <NA>  <NA>  Total Belo… <NA>  <NA> 
##  7 <NA>             <NA>  Numb… Marg… Perc… Marg… <NA>  Numb… Marg… Perc…
##  8 <NA>             <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
##  9 PEOPLE           <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
## 10 Total..........… 3199… 40616 739   12.6… 0.20… 3225… 39698 915   12.3…
## 11 <NA>             <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
## 12 Race3 and Hispa… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
## 13 White…………………………… 2459… 27113 547   11    0.20… 2472… 26436 714   10.6…
## 14 White, not Hisp… 1952… 17263 493   8.80… 0.29… 1952… 16993 571   8.69…
## 15 Black…………………………… 41962 9234  388   22    0.90… 42474 8993  373   21.1…
## # ... with 3 more variables: X__10 <chr>, X__11 <chr>, X__12 <chr>

Unfortunately, the data set is quite messy: the true column names are located inconsistently throughout the table. I decided to rename all the columns, giving real names to the variables I care about so the columns I need are easy to identify. Let’s properly construct the data frame with the goal of feeding the data to ggplot2. Below is the list of cleaning tasks (the code follows the list):

  • Rename the columns in the data frame.
  • Select just the columns we want.
  • Filter out blank rows, the repeated “Characteristic” header rows, and any row that starts with “Total”.
  • Gather the appropriate columns to create a “long” version of the data frame.
  • Split the combined year and “poverty vs. total” column into two columns: one for the year, and one identifying whether the number is the total population or the population living below the poverty line.
  • Spread the Pov|Total column back out to “wide” so the poverty rate is easier to calculate.
  • Remove the runs of periods and ellipses that trail every demographic in the Demographic column.
  • Create a new column that groups each demographic into its proper category.

colnames(pov_full) <- c('Demographic','2016-Total',
                        '2016-Below Poverty','d','e','f',
                        '2017-Total','2017-Below Poverty',
                        'i','j','k','l','m')
poverty <- pov_full %>%
  select(Demographic,
         `2016-Total`,
         `2016-Below Poverty`,
         `2017-Total`,
         `2017-Below Poverty`) %>%
  filter(!is.na(`2016-Total`),
         !is.na(Demographic),
         Demographic != "Characteristic",
         !str_detect(Demographic, "^Total\\,")) %>%
  gather("Year_Descr","n",2:5) %>%
  separate(Year_Descr, sep = "-", c("Year","Pov|Total")) %>%
  spread(`Pov|Total`,n) %>%
  mutate(`Poverty Rate` = as.numeric(`Below Poverty`) / as.numeric(Total))

poverty$Demographic <- gsub("[….]+", "", poverty$Demographic)  # strip trailing dots/ellipses
poverty$demo_type <- 'Race'
poverty$demo_type[
  poverty$Demographic=='Male' | poverty$Demographic=='Female'
  ] <- 'Sex'
poverty$demo_type[
  str_detect(poverty$Demographic,
             fixed("age", ignore_case = TRUE))
  ] <- 'Age'
poverty$demo_type[
  str_detect(poverty$Demographic, "cities") | 
    str_detect(poverty$Demographic, "area")
  ] <- 'Residence'

poverty$demo_type[
  str_detect(poverty$Demographic, "born") | 
    str_detect(poverty$Demographic, "citizen")
  ] <- 'Nativity'

poverty$demo_type[
  str_detect(poverty$Demographic, fixed("east", ignore_case = TRUE)) |
    str_detect(poverty$Demographic, fixed("west", ignore_case = TRUE)) |
    str_detect(poverty$Demographic, fixed("north", ignore_case = TRUE)) |
    str_detect(poverty$Demographic, fixed("south", ignore_case = TRUE))
  ] <- 'Region'

poverty$demo_type[
  str_detect(poverty$Demographic,
             fixed("work", ignore_case = TRUE)) | 
    str_detect(poverty$Demographic, "full-time")
  ] <- 'Work'

poverty$demo_type[
  str_detect(poverty$Demographic, fixed("degree", ignore_case = TRUE)) |
    str_detect(poverty$Demographic, fixed("school", ignore_case = TRUE))
  ] <- 'Education'

poverty$demo_type[
  str_detect(poverty$Demographic, "disability")
  ] <- 'Disability'
poverty$demo_type[
  str_detect(poverty$Demographic, "Total")
  ] <- 'Overall'

poverty
## # A tibble: 66 x 6
##    Demographic        Year  `Below Poverty` Total `Poverty Rate` demo_type
##    <chr>              <chr> <chr>           <chr>          <dbl> <chr>    
##  1 Aged 18 to 64      2016  22795           1970…         0.116  Age      
##  2 Aged 18 to 64      2017  22209           1981…         0.112  Age      
##  3 Aged 65 and older  2016  4568            49274         0.0927 Age      
##  4 Aged 65 and older  2017  4681            51080         0.0916 Age      
##  5 All workers        2016  8743            1509…         0.0579 Work     
##  6 All workers        2017  8135            1521…         0.0534 Work     
##  7 Asian              2016  1908            18879         0.101  Race     
##  8 Asian              2017  1953            19475         0.100  Race     
##  9 Bachelor's degree… 2016  3299            74103         0.0445 Education
## 10 Bachelor's degree… 2017  3661            76924         0.0476 Education
## # ... with 56 more rows
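
As an aside, the long chain of demo_type assignments above can be collapsed into a single dplyr::case_when() call. Here is an equivalent sketch; since case_when() is first-match-wins, the conditions are listed in reverse order of the assignments above:

poverty <- poverty %>%
  mutate(demo_type = case_when(
    str_detect(Demographic, "Total") ~ "Overall",
    str_detect(Demographic, "disability") ~ "Disability",
    str_detect(Demographic, regex("degree|school", ignore_case = TRUE)) ~ "Education",
    str_detect(Demographic, regex("work", ignore_case = TRUE)) |
      str_detect(Demographic, "full-time") ~ "Work",
    str_detect(Demographic, regex("east|west|north|south", ignore_case = TRUE)) ~ "Region",
    str_detect(Demographic, "born|citizen") ~ "Nativity",
    str_detect(Demographic, "cities|area") ~ "Residence",
    str_detect(Demographic, regex("age", ignore_case = TRUE)) ~ "Age",
    Demographic %in% c("Male", "Female") ~ "Sex",
    TRUE ~ "Race"
  ))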

Now that our data is clean and usable, we can feed it into ggplot2. We’ll display the data using facet_wrap to get an idea of how poverty levels changed from 2016 to 2017.

d1 <- poverty %>%
  filter(demo_type == 'Age' |
           demo_type == 'Nativity' |
           demo_type == 'Race' |
           demo_type == 'Sex' |
           demo_type == 'Residence')

d2 <- poverty %>%
  filter(demo_type == 'Work' |
           demo_type == 'Education' |
           demo_type == 'Region' |
           demo_type == 'Overall' |
           demo_type == 'Disability')

d_1 <- ggplot(d1[which(d1$`Poverty Rate`>0),], aes(x=Demographic, y=`Poverty Rate`))
d_1 +
  geom_bar(stat = "sum", position = "dodge", aes(fill = Year)) +
  guides(colour = "colorbar",size = "none") +
  facet_wrap( ~ demo_type, scales = "free_x") +
  theme_bw() +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_text(angle = 60, hjust = 1, size = 6))

d_2 <- ggplot(d2[which(d2$`Poverty Rate`>0),], aes(x=Demographic, y=`Poverty Rate`))
d_2 +
  geom_bar(stat = "sum", position = "dodge", aes(fill = Year)) +
  guides(colour = "colorbar",size = "none") +
  facet_wrap( ~ demo_type, scales = "free_x") +
  theme_bw() +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_text(angle = 60, hjust = 1, size = 6))
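
Since the two plots above differ only in their input data, the shared layers could be factored into a small helper, sketched below (the inert colour guide from the originals is dropped, since no colour aesthetic is mapped):

plot_poverty <- function(df) {
  ggplot(df[which(df$`Poverty Rate` > 0), ],
         aes(x = Demographic, y = `Poverty Rate`)) +
    geom_bar(stat = "sum", position = "dodge", aes(fill = Year)) +
    guides(size = "none") +
    facet_wrap( ~ demo_type, scales = "free_x") +
    theme_bw() +
    theme(axis.title.x = element_blank(),
          axis.text.x = element_text(angle = 60, hjust = 1, size = 6))
}

plot_poverty(d1)
plot_poverty(d2)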

Conclusion & Final Thoughts

We can see that there has mostly been a slight decrease in the poverty rate from 2016 to 2017. Drilling down across each characteristic, we find generally the same decrease. There is nothing particularly surprising in the data, but a closer look at the intersections of demographic characteristics (e.g., African American women from the South) would be more interesting. That type of analysis would require the original raw data set.

Navigate back to the top

Hate Crime Statistics

Data Sources: https://ucr.fbi.gov/hate-crime/2016/tables/table-4 and https://ucr.fbi.gov/hate-crime/2015/tables-and-data-declarations/4tabledatadecpdf

Goal: Has there been an increase/decrease in hate crimes from 2015 to 2016? What are the most significant changes if any?

To do a comparative analysis between 2015 and 2016, we need to join two datasets: 2015 & 2016. Let’s read them both in and take a look. I’ve also skipped the first 5 rows of each file to better display the data.

hate_2015 <- read_excel('hate_crimes_2015.xls', skip = 5)
hate_2016 <- read_excel('hate_crimes_2016.xls', skip = 5)

# drop two columns we won't use
hate_2015 <- hate_2015[,-c(4:5)]
hate_2016 <- hate_2016[,-c(4:5)]

hate_2015
## # A tibble: 47 x 15
##    X__1   X__2 `Murder and\nno… `Aggravated\nas… `Simple\nassaul…
##    <chr> <dbl>            <dbl>            <dbl>            <dbl>
##  1 Total  6885               18              882             1696
##  2 Sing…  6837               17              876             1690
##  3 Race…  4029               11              557              967
##  4 Anti…   734                1              101              206
##  5 Anti…  2125               10              279              488
##  6 Anti…   137                0                8               18
##  7 Anti…   132                0               21               32
##  8 Anti…     6                0                3                1
##  9 Anti…   138                0               12               15
## 10 Anti…    47                0                2               22
## # ... with 37 more rows, and 10 more variables: Intimidation <dbl>,
## #   Other3 <dbl>, Robbery <dbl>, Burglary <dbl>, `Larceny-\ntheft` <dbl>,
## #   `Motor\nvehicle\ntheft` <dbl>, Arson <dbl>,
## #   `Destruction/\ndamage/\nvandalism` <dbl>, Other3__1 <dbl>, X__3 <dbl>
hate_2016
## # A tibble: 47 x 15
##    X__1   X__2 `Murder and\nno… `Aggravated\nas… `Simple\nassaul…
##    <chr> <dbl>            <dbl>            <dbl>            <dbl>
##  1 Total  7321                9              873             1687
##  2 Sing…  7227                9              866             1677
##  3 Race…  4229                7              548             1002
##  4 Anti…   876                5              120              241
##  5 Anti…  2122                2              273              455
##  6 Anti…   161                0                8               17
##  7 Anti…   131                0               15               40
##  8 Anti…     9                0                0                4
##  9 Anti…   178                0               17               36
## 10 Anti…    56                0                8               16
## # ... with 37 more rows, and 10 more variables: Intimidation <dbl>,
## #   Other3 <dbl>, Robbery <dbl>, Burglary <dbl>, `Larceny-\ntheft` <dbl>,
## #   `Motor\nvehicle\ntheft` <dbl>, Arson <dbl>,
## #   `Destruction/\ndamage/\nvandalism` <dbl>, Other3__1 <dbl>, X__3 <dbl>

For our purposes, we won’t need the offense-type breakdown (murder, assault, and so on), so we can remove those columns. We’re only really interested in the incident counts for each bias category. We’ll clean the data as follows (the code follows the list):

  • Rename the column holding the number of incidents to the year, and the demographic column to “Type”.
  • Filter out the notes rows at the bottom of each data frame, which have NA in the incident-count column.
  • Select the columns we want from each data set.
  • Join the two data frames into one.
  • Calculate the change in the number of incidents from 2015 to 2016.
  • Calculate that change as a proportion of the 2015 count.
  • Create a new column flagging whether the change was negative or positive (negative = decrease, positive = increase).
  • Create two data frames to feed into ggplot2: one for the “macro” hate crime categories and one for the “micro” categories.

colnames(hate_2015)[colnames(hate_2015)=="X__2"] <- "2015"
colnames(hate_2016)[colnames(hate_2016)=="X__2"] <- "2016"
colnames(hate_2015)[colnames(hate_2015)=="X__1"] <- "Type"
colnames(hate_2016)[colnames(hate_2016)=="X__1"] <- "Type"

hate_2015 <- hate_2015 %>%
  filter(!is.na(`2015`)) %>%
  select(Type,`2015`)

hate_2016 <- hate_2016 %>%
  filter(!is.na(`2016`)) %>%
  select(Type,`2016`)

hate_crimes <- full_join(hate_2015, hate_2016, by = "Type")

hate_crimes$Change <- hate_crimes$`2016`-hate_crimes$`2015`
hate_crimes$`Percent Change` <- hate_crimes$Change/hate_crimes$`2015`

num_sign <- vector()

for (i in hate_crimes$Change) {
  if (i >= 0) {
    num_sign <- c(num_sign,'pos')
    } else {
      num_sign <- c(num_sign,'neg')
      }
}

hate_crimes$num_sign <- num_sign
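
# Equivalently, a vectorized one-liner could replace the loop above:
#   hate_crimes$num_sign <- ifelse(hate_crimes$Change >= 0, 'pos', 'neg')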
  
hate_crimes_micro <- hate_crimes %>%
  filter(str_detect(Type,'Anti'))

hate_crimes_macro <- hate_crimes %>%
  filter(!str_detect(Type,'Anti'),
         Type != "Total")

hate_crimes_macro
## # A tibble: 8 x 6
##   Type                     `2015` `2016` Change `Percent Change` num_sign
##   <chr>                     <dbl>  <dbl>  <dbl>            <dbl> <chr>   
## 1 Single-Bias Incidents      6837   7227    390         0.0570   pos     
## 2 Race/Ethnicity/Ancestry:   4029   4229    200         0.0496   pos     
## 3 Religion:                  1354   1538    184         0.136    pos     
## 4 Sexual Orientation:        1219   1218     -1        -0.000820 neg     
## 5 Disability:                  88     76    -12        -0.136    neg     
## 6 Gender:                      29     36      7         0.241    pos     
## 7 Gender Identity:            118    130     12         0.102    pos     
## 8 Multiple-Bias Incidents4     48     94     46         0.958    pos
hate_crimes_micro
## # A tibble: 34 x 6
##    Type                     `2015` `2016` Change `Percent Change` num_sign
##    <chr>                     <dbl>  <dbl>  <dbl>            <dbl> <chr>   
##  1 Anti-White                  734    876    142          0.193   pos     
##  2 Anti-Black or African A…   2125   2122     -3         -0.00141 neg     
##  3 Anti-American Indian or…    137    161     24          0.175   pos     
##  4 Anti-Asian                  132    131     -1         -0.00758 neg     
##  5 Anti-Native Hawaiian or…      6      9      3          0.5     pos     
##  6 Anti-Multiple Races, Gr…    138    178     40          0.290   pos     
##  7 Anti-Arab                    47     56      9          0.191   pos     
##  8 Anti-Hispanic or Latino     379    449     70          0.185   pos     
##  9 Anti-Other Race/Ethnici…    331    247    -84         -0.254   neg     
## 10 Anti-Jewish                 695    834    139          0.2     pos     
## # ... with 24 more rows

Now that we have our data sets set up, let’s first plot our “macro” level data and its change proportional to the 2015 level. We’ll also take a look at the raw change in hate crime counts.

ggplot(hate_crimes_macro, aes(Type, `Percent Change`)) +
  geom_bar(stat = "identity", aes(fill = Type)) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))

ggplot(hate_crimes_macro, aes(Type, Change)) +
  geom_bar(stat = "identity", aes(fill = Type)) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))

We can see that race and religious affiliation saw the highest increases in the count of reported hate crimes from 2015 to 2016. Proportionally, though, “multiple-bias incidents” saw the largest increase, most likely because of their low base count.
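
We can confirm that ordering directly:

hate_crimes_macro %>%
  arrange(desc(`Percent Change`)) %>%
  select(Type, `2015`, `2016`, `Percent Change`)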

Now we’ll take a look at our “micro” level analysis.

ggplot(hate_crimes_micro, aes(Type,`Percent Change`)) +
  geom_bar(stat = "identity", aes(fill = num_sign)) +
  coord_flip() +
  theme_bw() +
  theme(legend.title=element_blank())

ggplot(hate_crimes_micro, aes(Type,Change)) +
  geom_bar(stat = "identity", aes(fill = num_sign)) +
  coord_flip() +
  theme_bw() +
  theme(legend.title=element_blank())

The largest increases in the overall count of reported hate crimes are anti-white, anti-Jewish, anti-Islamic, and anti-Hispanic in nature. The proportional change for these same groups is more modest, which also indicates that their overall counts were already high. Let’s take a closer look at the total hate crime incidents across both years for both the macro and micro groups. First we need to add a “Total” column.

hate_crimes_macro$Total <- hate_crimes_macro$`2015` + hate_crimes_macro$`2016`
hate_crimes_micro$Total <- hate_crimes_micro$`2015` + hate_crimes_micro$`2016`

hate_crimes_macro
## # A tibble: 8 x 7
##   Type                `2015` `2016` Change `Percent Change` num_sign Total
##   <chr>                <dbl>  <dbl>  <dbl>            <dbl> <chr>    <dbl>
## 1 Single-Bias Incide…   6837   7227    390         0.0570   pos      14064
## 2 Race/Ethnicity/Anc…   4029   4229    200         0.0496   pos       8258
## 3 Religion:             1354   1538    184         0.136    pos       2892
## 4 Sexual Orientation:   1219   1218     -1        -0.000820 neg       2437
## 5 Disability:             88     76    -12        -0.136    neg        164
## 6 Gender:                 29     36      7         0.241    pos         65
## 7 Gender Identity:       118    130     12         0.102    pos        248
## 8 Multiple-Bias Inci…     48     94     46         0.958    pos        142
hate_crimes_micro
## # A tibble: 34 x 7
##    Type               `2015` `2016` Change `Percent Change` num_sign Total
##    <chr>               <dbl>  <dbl>  <dbl>            <dbl> <chr>    <dbl>
##  1 Anti-White            734    876    142          0.193   pos       1610
##  2 Anti-Black or Afr…   2125   2122     -3         -0.00141 neg       4247
##  3 Anti-American Ind…    137    161     24          0.175   pos        298
##  4 Anti-Asian            132    131     -1         -0.00758 neg        263
##  5 Anti-Native Hawai…      6      9      3          0.5     pos         15
##  6 Anti-Multiple Rac…    138    178     40          0.290   pos        316
##  7 Anti-Arab              47     56      9          0.191   pos        103
##  8 Anti-Hispanic or …    379    449     70          0.185   pos        828
##  9 Anti-Other Race/E…    331    247    -84         -0.254   neg        578
## 10 Anti-Jewish           695    834    139          0.2     pos       1529
## # ... with 24 more rows

Now that we’ve added the “Total” column to both data frames, let’s plot them.

ggplot(hate_crimes_macro, aes(Type, Total, fill = Total)) +
  geom_bar(stat = "identity") +
  # note: the scale_colour_gradientn('navy') call originally here was a no-op,
  # since the bars use the fill aesthetic; the default fill gradient applies
  coord_flip() +
  theme_bw() +
  theme(legend.title=element_blank())

ggplot(hate_crimes_micro, aes(Type, Total, fill = Total)) +
  geom_bar(stat = "identity") +
  # as above, the inert colour scale is removed; fill maps to Total
  coord_flip() +
  theme_bw() +
  theme(legend.title=element_blank())

Conclusion & Final Thoughts

Of all reported hate crimes, the most frequent (the most prominent peaks) are anti-Black, anti-Jewish, anti-gay, and anti-white. At the macro level, the most frequent are rooted in race/ethnicity or sexual orientation. With that context, the proportional increases from 2015 to 2016 are more disturbing: seemingly “small” proportional increases of 20-25% can mean significant increases in the sheer number of reported hate crimes for certain groups. Of particular note are the moderate proportional increases in anti-white, anti-Jewish, anti-Hispanic, and anti-Islamic hate crimes, which may suggest that America’s growing tribalism around the 2016 election played a significant role in the increase.

Further analysis looking for trends over the last 20 years would be ideal. There are also limitations in the data: a hate crime must be reported in the first place to be logged in the database.

Navigate back to the top