Conclusion: The average murder rate is 5.4 per 100k residents. The
average time served by convicted murderers is 136.5 months.
2. Data wrangling: Please perform some basic transformations. They
will need to make sense but could include column renaming, creating a
subset of the data, replacing values, or creating new columns with
derived data (for example – if it makes sense you could sum two columns
together)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
new_MR <- select(MR, "rate", "time", "income", "southern")
colnames(new_MR) <- c("MurderRate","Time_Served", "Family_income", "Southern_Region")
S_new_MR <- subset(new_MR, Southern_Region == "yes")
N_new_MR <- subset(new_MR, Southern_Region == "no")
summary(S_new_MR)
## MurderRate Time_Served Family_income Southern_Region
## Min. : 2.830 Min. : 34.00 Min. :0.760 Length:15
## 1st Qu.: 7.525 1st Qu.: 74.50 1st Qu.:1.195 Class :character
## Median :10.440 Median : 95.00 Median :1.350 Mode :character
## Mean :10.107 Mean : 95.93 Mean :1.401
## 3rd Qu.:12.065 3rd Qu.:124.00 3rd Qu.:1.570
## Max. :19.250 Max. :161.00 Max. :2.070
summary(N_new_MR)
## MurderRate Time_Served Family_income Southern_Region
## Min. :0.810 Min. : 56.0 Min. :1.550 Length:29
## 1st Qu.:1.410 1st Qu.:101.0 1st Qu.:1.810 Class :character
## Median :2.800 Median :148.0 Median :1.970 Mode :character
## Mean :2.971 Mean :157.5 Mean :1.978
## 3rd Qu.:3.710 3rd Qu.:199.0 3rd Qu.:2.120
## Max. :8.310 Max. :298.0 Max. :2.390
head(new_MR)
## MurderRate Time_Served Family_income Southern_Region
## 1 19.25 47 1.10 yes
## 2 7.53 58 0.92 yes
## 3 5.66 82 1.72 no
## 4 3.21 100 2.18 no
## 5 2.80 222 1.75 no
## 6 1.41 164 2.26 no
head(S_new_MR)
## MurderRate Time_Served Family_income Southern_Region
## 1 19.25 47 1.10 yes
## 2 7.53 58 0.92 yes
## 7 6.18 161 2.07 yes
## 8 12.15 70 1.43 yes
## 14 10.44 104 1.35 yes
## 15 9.58 126 1.26 yes
head(N_new_MR)
## MurderRate Time_Served Family_income Southern_Region
## 3 5.66 82 1.72 no
## 4 3.21 100 2.18 no
## 5 2.80 222 1.75 no
## 6 1.41 164 2.26 no
## 9 1.34 219 1.92 no
## 10 3.71 81 1.82 no
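The same wrangling could also be written as one dplyr pipeline. The sketch below is a hypothetical alternative, not the code used above. It assumes a small stand-in data frame, `MR_demo`, built from the six rows printed by `head(new_MR)`, and the `*_demo` names are invented for illustration. `select()` renames columns while selecting them, and `filter()` takes the place of `subset()`.

```r
library(dplyr)

# Stand-in for MR (assumption): the six rows printed by head(new_MR) above.
MR_demo <- data.frame(
  rate     = c(19.25, 7.53, 5.66, 3.21, 2.80, 1.41),
  time     = c(47, 58, 82, 100, 222, 164),
  income   = c(1.10, 0.92, 1.72, 2.18, 1.75, 2.26),
  southern = c("yes", "yes", "no", "no", "no", "no")
)

# select() can rename as it selects, using new_name = old_name.
new_demo <- MR_demo %>%
  select(MurderRate = rate, Time_Served = time,
         Family_income = income, Southern_Region = southern)

# filter() keeps the rows that satisfy the condition, like subset().
S_demo <- filter(new_demo, Southern_Region == "yes")
N_demo <- filter(new_demo, Southern_Region == "no")

nrow(S_demo)
nrow(N_demo)
```

Renaming inside `select()` avoids the separate `colnames()` assignment, so the column order and the new names cannot drift apart.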
3. Graphics: Please make sure to display at least one scatter plot,
box plot and histogram. Don’t be limited to this. Please explore the
many other options in R packages such as ggplot2.
library(ggplot2)
# Histogram grouped by Southern_Region
ggplot(data=new_MR) + geom_histogram(aes(x=MurderRate)) + labs(title = "No. Murder Rate by Region", x= "MurderRate", y = "Count") + facet_grid(~Southern_Region)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
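The `stat_bin()` message above suggests choosing a bin width explicitly. A minimal sketch of how that might look follows. The value `binwidth = 2` is an assumption picked for illustration, not a tuned choice, and `hist_demo` is a hypothetical stand-in built from the six rows printed by `head(new_MR)`.

```r
library(ggplot2)

# Stand-in data (assumption): the six rows printed by head(new_MR) above.
hist_demo <- data.frame(
  MurderRate      = c(19.25, 7.53, 5.66, 3.21, 2.80, 1.41),
  Southern_Region = c("yes", "yes", "no", "no", "no", "no")
)

# An explicit binwidth replaces the default of 30 bins;
# boundary = 0 aligns the bin edges to multiples of the bin width.
p <- ggplot(hist_demo, aes(x = MurderRate)) +
  geom_histogram(binwidth = 2, boundary = 0) +
  labs(title = "Murder Rate by Region", x = "Murder Rate", y = "Count") +
  facet_grid(~Southern_Region)

p
```

With the full `new_MR` data, it is worth trying a few widths (for example 1, 2, and 5) to see which shows the shape of each region's distribution most clearly.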

# Box plot grouped by Southern_Region
ggplot(new_MR,aes(y=MurderRate, x=Southern_Region)) + geom_boxplot() + labs(title = "Murder Rate by Region", x= "Southern", y = "Murder Rate")

# Scatter plot of murder rate vs. family income, faceted by Southern_Region
g <- ggplot(new_MR,aes(x=MurderRate, y=Family_income))
g + geom_point() + labs(title = "Murder Rate vs Family Income by Region", x= "MurderRate", y = "Family Income") + facet_grid(~Southern_Region)

# Scatter plot of murder rate vs. time served, faceted by Southern_Region
g2 <- ggplot(new_MR,aes(x=MurderRate, y=Time_Served))
g2 + geom_point() + labs(title = "Murder Rate vs Time Served by Region", x= "MurderRate", y = "Time Served") + facet_grid(~Southern_Region)

4. Meaningful question for analysis: Please state at the
beginning a meaningful question for analysis. Use the first three steps
and anything else that would be helpful to answer the question you are
posing from the data set you chose. Please write a brief conclusion
paragraph in R markdown at the end.
Question: How does the Southern region differ from the non-Southern
region in terms of murder rate, time served, family income, and
non-Caucasian population?
Answer:
In terms of murder rate: The Southern region has a higher murder
rate. Its median is 10.44 per 100k residents, while the non-Southern
median is only 2.80 per 100k residents (means: 10.11 vs. 2.97). The
box plot shows the same pattern: the box for the Southern region sits
higher on the murder-rate axis.
In terms of time served: The scatter plot shows that, although the
Southern region has a relatively higher murder rate, its time served
is lower than in the non-Southern region. Comparing the medians of the
two regions (95 vs. 148 months), convicted murderers in the
non-Southern region serve 53 months longer.
In terms of family income: In the Southern region, lower family income
appears to go with a higher murder rate. Most of the observations are
clustered in the lower right area of the plot (high murder rate, low
income). In the non-Southern region, all murder rates are below 10,
and most observations are clustered in the top left area (low murder
rate, high income). This suggests that higher family income is
associated with a lower murder rate in both regions.
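The median comparisons in the answer can be checked with a few lines of R. The sketch below copies the medians from the `summary(S_new_MR)` and `summary(N_new_MR)` output above into a small data frame (the names `medians` and `diffs` are assumptions for illustration) and subtracts the Southern row from the non-Southern row.

```r
# Medians copied from summary(S_new_MR) and summary(N_new_MR) above.
medians <- data.frame(
  region        = c("Southern", "Non-Southern"),
  MurderRate    = c(10.44, 2.80),
  Time_Served   = c(95, 148),
  Family_income = c(1.35, 1.97)
)

# Non-Southern minus Southern for each numeric column (region column dropped).
diffs <- medians[medians$region == "Non-Southern", -1] -
  medians[medians$region == "Southern", -1]

diffs
```

The `Time_Served` difference is the 53 months quoted above. On the full data, the medians themselves could be computed in one call, for example `aggregate(cbind(MurderRate, Time_Served, Family_income) ~ Southern_Region, data = new_MR, FUN = median)`.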
5. BONUS – place the original .csv in a github file and have R read
from the link. This will be a very useful skill as you progress in your
data science education and career.
library(readr)
MR2 <- read.csv(url("https://raw.githubusercontent.com/tonyCUNY/test/main/MurderRates.csv"))
summary(MR2)
## X rate convictions executions
## Min. : 1.00 Min. : 0.810 Min. :0.1080 Min. :0.00000
## 1st Qu.:11.75 1st Qu.: 1.808 1st Qu.:0.1663 1st Qu.:0.02625
## Median :22.50 Median : 3.625 Median :0.2260 Median :0.04500
## Mean :22.50 Mean : 5.404 Mean :0.2605 Mean :0.06034
## 3rd Qu.:33.25 3rd Qu.: 7.725 3rd Qu.:0.3202 3rd Qu.:0.08225
## Max. :44.00 Max. :19.250 Max. :0.7570 Max. :0.40000
## time income lfp noncauc
## Min. : 34.0 Min. :0.760 Min. :47.00 Min. :0.00300
## 1st Qu.: 94.0 1st Qu.:1.550 1st Qu.:51.50 1st Qu.:0.02175
## Median :124.0 Median :1.830 Median :53.40 Median :0.06450
## Mean :136.5 Mean :1.781 Mean :53.07 Mean :0.10559
## 3rd Qu.:179.0 3rd Qu.:2.070 3rd Qu.:54.52 3rd Qu.:0.14450
## Max. :298.0 Max. :2.390 Max. :58.80 Max. :0.45400
## southern
## Length:44
## Class :character
## Mode :character
##
##
##