A Handful Of Cities Are Driving 2016’s Rise In Murders
This article examines the rise in murder rates across some major cities during 2014-2016. It also argues that in 2016 the rise was concentrated in a few major cities.
We use two data sets for comparison. The 2015 data set provides murder counts for 2014 and 2015 across major cities. The second data set provides murder counts for 2015 and 2016; it is a preliminary data set.
The following operations are performed as part of this assignment.
Please find the link to the article.
Load required packages
library(ggplot2)
library(dplyr)
library(tidyr)
library(curl)
Read the two CSV files into data frames
# Read the data from GitHub
murder2016 <- read.csv(curl("https://raw.githubusercontent.com/fivethirtyeight/data/master/murder_2016/murder_2016_prelim.csv"))
murder2015 <- read.csv(curl("https://raw.githubusercontent.com/fivethirtyeight/data/master/murder_2016/murder_2015_final.csv"))
# Rename the two murder-count columns (positions 3 and 4) so both data frames use consistent names
colnames(murder2015)[3] <- 'murders_2014'
colnames(murder2015)[4] <- 'murders_2015'
colnames(murder2016)[3] <- 'murders_2015'
colnames(murder2016)[4] <- 'murders_2016'
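Renaming by position works only as long as the column order in the files stays the same. As a minimal sketch, assuming the CSV headers begin with the year (the header text below is a made-up stand-in, not read from the real files), the columns can instead be renamed by matching their old names:

```r
# Toy CSV text; the year-prefixed header is an assumption for illustration.
csv_text <- "city,state,2014_murders,2015_murders,change
Springfield,XX,10,15,5
Shelbyville,YY,20,18,-2"

toy <- read.csv(text = csv_text)

# read.csv() passes headers through make.names(), so a name that starts
# with a digit gets an "X" prefix, e.g. "X2014_murders".
print(colnames(toy))

# Rename by matching the old name instead of relying on column position.
colnames(toy)[colnames(toy) == "X2014_murders"] <- "murders_2014"
colnames(toy)[colnames(toy) == "X2015_murders"] <- "murders_2015"
print(colnames(toy))
```

If a name does not match, nothing is renamed, which is easier to spot than a silently mislabeled column.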
summary(murder2016)
##      city              state            murders_2015     murders_2016
##  Length:79          Length:79          Min.   :  0.00   Min.   :  1.00
##  Class :character   Class :character   1st Qu.: 13.00   1st Qu.: 14.50
##  Mode  :character   Mode  :character   Median : 30.00   Median : 30.00
##                                        Mean   : 56.47   Mean   : 62.38
##                                        3rd Qu.: 71.50   3rd Qu.: 81.50
##                                        Max.   :378.00   Max.   :536.00
##      change            source             as_of
##  Min.   :-21.000   Length:79          Length:79
##  1st Qu.: -3.000   Class :character   Class :character
##  Median :  2.000   Mode  :character   Mode  :character
##  Mean   :  5.911
##  3rd Qu.:  9.000
##  Max.   :158.000
summary(murder2015)
##      city              state            murders_2014     murders_2015
##  Length:83          Length:83          Min.   :  0.00   Min.   :  1.00
##  Class :character   Class :character   1st Qu.: 19.50   1st Qu.: 22.50
##  Mode  :character   Mode  :character   Median : 32.00   Median : 39.00
##                                        Mean   : 65.75   Mean   : 75.48
##                                        3rd Qu.: 82.00   3rd Qu.: 94.00
##                                        Max.   :411.00   Max.   :478.00
##      change
##  Min.   :-19.000
##  1st Qu.: -3.000
##  Median :  4.000
##  Mean   :  9.735
##  3rd Qu.: 14.000
##  Max.   :133.000
mean_2014murder <- mean(murder2015$murders_2014)
mean_2015murder <- mean(murder2015$murders_2015)
mean_2016murder <- mean(murder2016$murders_2016)
print(paste("Mean of 2014 murders ",mean_2014murder))
## [1] "Mean of 2014 murders 65.7469879518072"
print(paste("Mean of 2015 murders ",mean_2015murder))
## [1] "Mean of 2015 murders 75.4819277108434"
print(paste("Mean of 2016 murders ",mean_2016murder))
## [1] "Mean of 2016 murders 62.379746835443"
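Note that the 2015 mean differs between the two summaries (75.48 in the final data set, 56.47 in the preliminary one), and the data sets cover different numbers of cities (83 vs. 79), so the three means above are not directly comparable. One way to compare like with like is to restrict the means to cities present in both data frames. The sketch below uses made-up toy frames (`toy2015`, `toy2016` and their values are hypothetical), not the real data:

```r
# Hypothetical toy frames standing in for murder2015 and murder2016.
toy2015 <- data.frame(city = c("A", "B", "C"), state = c("X", "Y", "Z"),
                      murders_2014 = c(10, 20, 30),
                      murders_2015 = c(12, 25, 33))
toy2016 <- data.frame(city = c("A", "B"), state = c("X", "Y"),
                      murders_2015 = c(6, 11),
                      murders_2016 = c(9, 10))

# merge() keeps only rows whose keys appear in both frames by default.
# The shared column murders_2015 gets a suffix from each source.
common <- merge(toy2015, toy2016, by = c("city", "state"),
                suffixes = c("_final", "_prelim"))

# Column means over the same set of cities.
means <- colMeans(common[, c("murders_2014", "murders_2015_final",
                             "murders_2015_prelim", "murders_2016")])
print(means)
```

Within the preliminary data set, comparing `murders_2015` with `murders_2016` is the fairer year-over-year comparison, since both columns come from the same file.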
We consider the 10 cities with the largest increases in murders
# In this example we focus only on cities where murders increased drastically
# Find the 10th-highest value of change in each data set
top10value2016 <- sort(murder2016$change, decreasing = TRUE)[10]
top10value2015 <- sort(murder2015$change, decreasing = TRUE)[10]
# Keep cities whose increase is at least the 10th-highest value; for 2016 also subset the columns
murder2016_top10 <- subset(murder2016, change >= top10value2016, select = c(city, state, murders_2015, murders_2016, change))
murder2015_top10 <- subset(murder2015, change >= top10value2015)
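Filtering on the 10th-highest value can return more than 10 rows when several cities tie at the cutoff. A small sketch on made-up data (the `toy` frame and `n` are hypothetical) contrasts the threshold approach with sorting and taking the first `n` rows:

```r
# Toy data with a tie at the cutoff value (7).
toy <- data.frame(city = c("A", "B", "C", "D", "E"),
                  change = c(5, 9, 7, 7, 2))
n <- 2

# Threshold approach, as used above: keeps every row at or above the n-th value.
threshold <- sort(toy$change, decreasing = TRUE)[n]
by_threshold <- subset(toy, change >= threshold)

# Ordering approach: sort descending, then keep exactly n rows.
by_order <- head(toy[order(toy$change, decreasing = TRUE), ], n)

print(nrow(by_threshold))  # 3 rows, because two cities tie at 7
print(nrow(by_order))      # 2 rows
```

Which behavior is preferable depends on whether tied cities should all be shown or the list must have a fixed length.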
Add a new column for the percentage increase
# Add a new column derived from existing columns: change as a percentage of the earlier year's count
murder2016_top10 <- transform(murder2016_top10 ,increase = round(change/murders_2015 *100,0))
murder2015_top10 <- transform(murder2015_top10 ,increase = round(change/murders_2014 *100,0))
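The percentage increase is the change divided by the earlier year's count, times 100. The summaries above report a minimum of 0 for both baseline columns (`murders_2014` and `murders_2015` in the preliminary data), so applying this formula to every city can divide by zero and yield `Inf` or `NaN`. A minimal sketch of a guarded version, where `pct_increase` is a hypothetical helper and not part of the original analysis:

```r
# Hypothetical helper: percentage increase relative to a baseline count.
# Returns NA where the baseline is zero instead of Inf or NaN.
pct_increase <- function(change, baseline) {
  ifelse(baseline == 0, NA_real_, round(change / baseline * 100, 0))
}

# Worked examples: 5 on a base of 10 is 50%; a zero base gives NA;
# -2 on a base of 8 is -25%.
res <- pct_increase(change = c(5, 3, -2), baseline = c(10, 0, 8))
print(res)
```

Rows with `NA` can then be dropped or reported separately before plotting.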
Create a new data frame by combining the two data frames, which makes the comparison easier. The data frames are joined on city and state.
murder2016_subset <- transform(murder2016 ,increase_rate = round(change/murders_2015 *100,0))
murder2015_subset <- transform(murder2015 ,increase_rate = round(change/murders_2014 *100,0))
murder2016_subset <-subset(murder2016_subset,select = c(city,state,increase_rate,change))
murder2015_subset <-subset(murder2015_subset,select = c(city,state,increase_rate,change))
joinedresult = inner_join(murder2016_subset , murder2015_subset, by = c("city","state"), suffix = c("_2016","_2015"))
joinedresult10 <- head(joinedresult,10)
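To illustrate how the join and its `suffix` argument behave, here is a small sketch on made-up frames (`a`, `b` and their values are hypothetical). It also shows that `head()` returns the first rows in their current order, so sorting first is needed if the largest values are wanted:

```r
library(dplyr)

# Toy frames sharing the key columns city and state, plus a clashing column.
a <- data.frame(city = c("A", "B", "C"), state = c("X", "Y", "Z"),
                change = c(1, 2, 3))
b <- data.frame(city = c("B", "C", "D"), state = c("Y", "Z", "W"),
                change = c(20, 30, 40))

# inner_join keeps only key combinations present in both frames;
# the non-key column that clashes (change) gets the given suffixes.
joined <- inner_join(a, b, by = c("city", "state"),
                     suffix = c("_2016", "_2015"))
print(joined)

# head() alone takes rows in their existing order; arrange first to get
# the rows with the largest change_2016.
largest <- joined %>% arrange(desc(change_2016)) %>% head(1)
print(largest)
```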
Display the 10 cities with the largest increase in murders over the previous year (2016), with the percentage increase for each
ggplot(data = murder2016_top10, aes(x = city, y = increase, color = city, fill = city)) +
  geom_bar(stat = "identity") +
  labs(title = 'Crime Rate Across major cities 2016', x = 'City', y = 'Crime Rate increase') +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  geom_text(aes(label = increase), position = position_dodge(width = 0.9), vjust = -0.25)
Comparison of murder rate increases in 2015 and 2016
joinedresult10_new <- subset(joinedresult10, select = c(city, increase_rate_2016, increase_rate_2015))
joinedresult10_new_transform <- joinedresult10_new %>%
  gather("Stat", "Value", -city)
ggplot(joinedresult10_new_transform, aes(x = city, y = Value, fill = Stat)) +
  geom_col(position = "dodge") +
  labs(title = 'Crime Rate Comparison', x = 'City', y = 'Crime Rate increase') +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
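The dodged bar chart needs the data in long format: one row per city and year, with the year label in one column and the value in another. As a sketch of that reshaping step on made-up data (the `wide` frame is hypothetical), `pivot_longer()` from tidyr, assuming version 1.0.0 or later, does the same job as the `gather()` call above:

```r
library(tidyr)

# Toy wide frame: one row per city, one column per year's increase rate.
wide <- data.frame(city = c("A", "B"),
                   increase_rate_2016 = c(10, -5),
                   increase_rate_2015 = c(3, 8))

# Reshape to long format: every column except city becomes a (Stat, Value) pair.
long <- pivot_longer(wide, cols = -city,
                     names_to = "Stat", values_to = "Value")
print(long)
```

Each city now appears once per year, which is what `fill = Stat` with `position = "dodge"` expects.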
The analysis shows a widespread rise in murder rates in 2015. In 2016 the rise appears more concentrated in a few big cities, such as Chicago and Orlando.