Analysis of a dataset of US arrests.
The dataset is obtained from the site below: http://vincentarelbundock.github.io/
Dataset reference: US Arrests
theURL <- "http://vincentarelbundock.github.io/Rdatasets/csv/datasets/USArrests.csv";
arrest_data <- read.table(file = theURL, header = TRUE, sep = ",");
head(arrest_data);
#State column is named 'X'. Changing the column name to US_State
colnames(arrest_data)[1] <- "US_State";
head(arrest_data);
#Write the data to a CSV
write.table(arrest_data, file = "us_arrest.csv", sep = ",", row.names = FALSE);
#The saved file is uploaded to GitHub; the URL below is used from here on.
#https://raw.githubusercontent.com/arunk13/MSDA-Assignments/master/BridgeCourse/Week4/us_arrest.csv
## US_State Murder Assault UrbanPop Rape
## 1 Alabama 13.2 236 58 21.2
## 2 Alaska 10.0 263 48 44.5
## 3 Arizona 8.1 294 80 31.0
## 4 Arkansas 8.8 190 50 19.5
## 5 California 9.0 276 91 40.6
## 6 Colorado 7.9 204 78 38.7
summary(arrest_data);
## US_State Murder Assault UrbanPop
## Alabama : 1 Min. : 0.800 Min. : 45.0 Min. :32.00
## Alaska : 1 1st Qu.: 4.075 1st Qu.:109.0 1st Qu.:54.50
## Arizona : 1 Median : 7.250 Median :159.0 Median :66.00
## Arkansas : 1 Mean : 7.788 Mean :170.8 Mean :65.54
## California: 1 3rd Qu.:11.250 3rd Qu.:249.0 3rd Qu.:77.75
## Colorado : 1 Max. :17.400 Max. :337.0 Max. :91.00
## (Other) :44
## Rape
## Min. : 7.30
## 1st Qu.:15.07
## Median :20.10
## Mean :21.23
## 3rd Qu.:26.18
## Max. :46.00
##
# g_murder <- ggplot(data = arrest_data_sub, aes(y = Murder, x = US_State, fill = US_State));
# g_murder + geom_bar(stat = "identity", width = 0.2) + guides(fill = FALSE) + xlab("US States") + ylab("Murder(per 100K)") + ggtitle("Crime rate: Murder rate versus states");
# g_murder_UrbanPop <- ggplot(data = arrest_data, aes(y = Murder, x = UrbanPop));
# g_murder_UrbanPop + geom_point(shape = 1) + geom_smooth(method = lm);
# g_rape_UrbanPop <- ggplot(data = arrest_data, aes(y = Rape, x = UrbanPop));
# g_rape_UrbanPop + geom_point(shape = 1) + geom_smooth(method = lm);
# g_assault_UrbanPop <- ggplot(data = arrest_data, aes(y = Assault, x = UrbanPop));
# g_assault_UrbanPop + geom_point(shape = 1) + geom_smooth(method = lm);
Sort the dataset by urban population, in descending order
arrest_data_sub <- arrest_data[order(-arrest_data$UrbanPop),];
## Loading required package: ggplot2
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.800 4.075 7.250 7.788 11.250 17.400
Fact: On average, there were about 8 murder arrests per 100,000 residents in 1973 (mean 7.788).
Extending the above study to the top 7 states: 1. California is among the top 7 states by both rape and assault arrests.
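The top-7 comparison can be sketched as follows. This is a minimal sketch, not the original analysis code: it assumes R's built-in `USArrests` data frame holds the same values as the downloaded CSV, and `top_n_by` is a hypothetical helper introduced only for illustration.

```r
# Assumption: the built-in USArrests dataset matches the CSV used above.
# Rebuild the same layout, with state names in a US_State column.
arrests <- data.frame(US_State = rownames(USArrests), USArrests,
                      row.names = NULL, stringsAsFactors = FALSE)

# Hypothetical helper: names of the n states with the highest value in `column`.
top_n_by <- function(df, column, n = 7) {
  head(df[order(-df[[column]]), "US_State"], n)
}

top_rape    <- top_n_by(arrests, "Rape")
top_assault <- top_n_by(arrests, "Assault")
top_urban   <- top_n_by(arrests, "UrbanPop")

# States that appear in both the rape and assault top-7 lists
print(intersect(top_rape, top_assault))

# Of those, the ones that are also among the 7 most urbanised states
print(intersect(intersect(top_rape, top_assault), top_urban))
```

Comparing the lists with `intersect` makes it easy to see which highly urbanised states also rank high on arrests, instead of reading the sorted tables by eye.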
With the available data, it can be concluded that the percentage of the population living in urban areas doesn't have a significant impact on the crime rate. A detailed analysis of the arrest data for California is needed to understand whether urban population had an impact on its crime rates.
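One way to check this conclusion numerically is to compute the correlation between urban population and each arrest rate. The sketch below is illustrative only and assumes the built-in `USArrests` data frame matches the CSV used above. A coefficient close to 0 supports the conclusion for that crime type, while a larger coefficient suggests the relationship deserves a closer look.

```r
# Assumption: the built-in USArrests dataset matches the CSV used above.
crime_cols <- c("Murder", "Assault", "Rape")

# Pearson correlation of each arrest rate with the urban population percentage
urban_cor <- sapply(crime_cols, function(col) {
  cor(USArrests[[col]], USArrests$UrbanPop)
})
print(round(urban_cor, 3))

# Significance test for one pair; repeat for the other columns as needed
murder_test <- cor.test(USArrests$Murder, USArrests$UrbanPop)
print(murder_test$p.value)
```

The scatter plots with `geom_smooth(method = lm)` show the same relationships visually; the coefficients and p-value put a number on them.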