#1 Data Exploration: This should include summary statistics, means, medians, quartiles, or any other relevant information about the data set. Please include some conclusions in the R Markdown text.
myData <- read.csv("https://raw.githubusercontent.com/arinolan/Nolan_Weel-3-Assignement/main/MurderRates.csv")
summary(myData)
## X rate convictions executions
## Min. : 1.00 Min. : 0.810 Min. :0.1080 Min. :0.00000
## 1st Qu.:11.75 1st Qu.: 1.808 1st Qu.:0.1663 1st Qu.:0.02625
## Median :22.50 Median : 3.625 Median :0.2260 Median :0.04500
## Mean :22.50 Mean : 5.404 Mean :0.2605 Mean :0.06034
## 3rd Qu.:33.25 3rd Qu.: 7.725 3rd Qu.:0.3202 3rd Qu.:0.08225
## Max. :44.00 Max. :19.250 Max. :0.7570 Max. :0.40000
## time income lfp noncauc
## Min. : 34.0 Min. :0.760 Min. :47.00 Min. :0.00300
## 1st Qu.: 94.0 1st Qu.:1.550 1st Qu.:51.50 1st Qu.:0.02175
## Median :124.0 Median :1.830 Median :53.40 Median :0.06450
## Mean :136.5 Mean :1.781 Mean :53.07 Mean :0.10559
## 3rd Qu.:179.0 3rd Qu.:2.070 3rd Qu.:54.52 3rd Qu.:0.14450
## Max. :298.0 Max. :2.390 Max. :58.80 Max. :0.45400
## southern
## Length:44
## Class :character
## Mode :character
##
##
##
summary(myData$rate)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.810 1.808 3.625 5.404 7.725 19.250
summary(myData$convictions)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1080 0.1663 0.2260 0.2605 0.3202 0.7570
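`summary()` reports quartiles and the mean, but it gives no measure of spread and does not split the data by region. The following is a minimal base R sketch of both. To stay self-contained, it uses only the six rows that `head(myData_subset)` prints in step #2. The data frame name `toy` is a hypothetical placeholder (substitute `myData` for the full data), so the numbers it produces are illustrative only and are not results for the data set.

```r
# Toy data: the six rows printed by head(myData_subset) in step #2.
# Hypothetical placeholder; replace `toy` with `myData` for the real analysis.
toy <- data.frame(
  rate        = c(19.25, 7.53, 5.66, 3.21, 2.80, 1.41),
  income      = c(1.10, 0.92, 1.72, 2.18, 1.75, 2.26),
  southern    = c("yes", "yes", "no", "no", "no", "no"),
  convictions = c(0.204, 0.327, 0.401, 0.318, 0.350, 0.283)
)

# Spread measures that summary() does not print
rate_sd  <- sd(toy$rate)
rate_iqr <- IQR(toy$rate)

# Median murder rate within each level of `southern`
rate_by_region <- tapply(toy$rate, toy$southern, median)

# Means of several columns per region in a single call
region_means <- aggregate(cbind(rate, convictions) ~ southern,
                          data = toy, FUN = mean)

print(c(sd = rate_sd, IQR = rate_iqr))
print(rate_by_region)
print(region_means)
```

Grouping by `southern` anticipates the factor conversion in step #2. `tapply()` and `aggregate()` both accept the character column directly.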
#2 Data wrangling: Please perform some basic transformations. They will need to make sense but could include column renaming, creating a subset of the data, replacing values, or creating new columns with derived data (for example – if it makes sense you could sum two columns together).
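# Keep only the columns used in the analysis
myData_subset <- myData[, c("rate", "income", "southern", "convictions")]
# southern is read in as character; convert it to a factor with levels "no"/"yes"
myData_subset$southern <- as.factor(myData_subset$southern)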
head(myData_subset)
## rate income southern convictions
## 1 19.25 1.10 yes 0.204
## 2 7.53 0.92 yes 0.327
## 3 5.66 1.72 no 0.401
## 4 3.21 2.18 no 0.318
## 5 2.80 1.75 no 0.350
## 6 1.41 2.26 no 0.283
summary(myData_subset$southern)
## no yes
## 29 15
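The prompt also mentions renaming, replacing values, and deriving new columns. The following minimal sketch covers all three on the six rows printed by `head(myData_subset)` above. The names `toy`, `murder_rate`, `region`, `high_rate` and the labels "Southern"/"Non-southern" are hypothetical choices for illustration, not part of the data set.

```r
# Toy copy of the six rows printed by head(myData_subset); hypothetical stand-in
toy <- data.frame(
  rate        = c(19.25, 7.53, 5.66, 3.21, 2.80, 1.41),
  income      = c(1.10, 0.92, 1.72, 2.18, 1.75, 2.26),
  southern    = factor(c("yes", "yes", "no", "no", "no", "no")),
  convictions = c(0.204, 0.327, 0.401, 0.318, 0.350, 0.283)
)

# Rename a column (hypothetical new name)
names(toy)[names(toy) == "rate"] <- "murder_rate"

# Replace values: recode the yes/no factor into readable region labels
toy$region <- factor(toy$southern, levels = c("no", "yes"),
                     labels = c("Non-southern", "Southern"))

# Derived column: is the rate above the median of these rows?
toy$high_rate <- toy$murder_rate > median(toy$murder_rate)

print(toy)
```

Recoding with `factor(..., labels = ...)` keeps the original `southern` column intact, so the yes/no version remains available for comparison.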
#3 Graphics: Please make sure to display at least one scatter plot, box plot and histogram. Don’t be limited to this. Please explore the many other options in R packages such as ggplot2.
library("ggplot2")
## Warning: package 'ggplot2' was built under R version 4.1.3
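# Box plot of the murder rate (murders per 100,000 people, not a percentage)
boxplot(myData_subset$rate, main = "Murder rate", ylab = "Murders per 100,000")
# Histogram of convictions
hist(myData_subset$convictions, main = "Convictions", xlab = "Convictions")
# Scatter plots; executions is not in myData_subset, so use myData
plot(myData$rate, myData$convictions, main = "Rate vs. Convictions",
     xlab = "Rate", ylab = "Convictions", pch = 19)
plot(myData$rate, myData$executions, main = "Rate vs. Executions",
     xlab = "Rate", ylab = "Executions", pch = 19)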
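ggplot2 is loaded above, but every plot here uses base graphics. A grouped box plot compares the two regions directly (in ggplot2 the equivalent layer is `geom_boxplot()`). The following is a minimal base R sketch using the formula interface. It assumes only the `rate` and `southern` columns and, to stay self-contained, uses the six rows from `head(myData_subset)` as a hypothetical stand-in (`toy`) for `myData_subset`.

```r
# Toy data: six rows from head(myData_subset); use myData_subset for the real plot
toy <- data.frame(
  rate     = c(19.25, 7.53, 5.66, 3.21, 2.80, 1.41),
  southern = factor(c("yes", "yes", "no", "no", "no", "no"))
)

# Draw to a null PDF device so the sketch also runs non-interactively
pdf(NULL)
# One box per level of `southern`
b <- boxplot(rate ~ southern, data = toy,
             main = "Murder rate by region",
             xlab = "Southern state", ylab = "Murder rate")
invisible(dev.off())

# boxplot() also returns the statistics it drew: one column per group,
# rows are lower whisker, lower hinge, median, upper hinge, upper whisker
print(b$stats)
```

Drop the `pdf(NULL)` and `dev.off()` lines inside an R Markdown chunk so the plot is rendered in the knitted document.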
#4 Meaningful question for analysis: Please state at the beginning a meaningful question for analysis. Use the first three steps and anything else that would be helpful to answer the question you are posing from the data set you chose. Please write a brief conclusion paragraph in R markdown at the end.
My question was whether the murder rate is correlated with either convictions or executions. The scatter plots suggest at most a weak relationship for both rate vs. convictions and rate vs. executions: neither shows a clear linear trend, although the points cluster closely along the rate axis.
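The visual impression of a weak relationship can be checked numerically with base R's `cor()` and `cor.test()`. The following is a minimal sketch. To keep it self-contained, it runs on the six (rate, convictions) pairs from `head(myData_subset)` only, so its values say nothing about the full data set. With the real data, pass `myData$rate` together with `myData$convictions` or `myData$executions`. Because the summary shows `rate` is right-skewed (mean 5.404, median 3.625), the rank-based Spearman coefficient is also worth reporting alongside Pearson.

```r
# Toy data: six pairs from head(myData_subset) in step #2 (illustration only)
rate        <- c(19.25, 7.53, 5.66, 3.21, 2.80, 1.41)
convictions <- c(0.204, 0.327, 0.401, 0.318, 0.350, 0.283)

# Pearson measures linear association; Spearman works on ranks and is
# less sensitive to the long right tail in `rate`
pearson  <- cor(rate, convictions, method = "pearson")
spearman <- cor(rate, convictions, method = "spearman")

# Test whether the Pearson correlation differs from zero
ct <- cor.test(rate, convictions, method = "pearson")

print(c(pearson = pearson, spearman = spearman, p_value = ct$p.value))
```

A coefficient near zero with a large p-value would support the "weak relationship" reading. With only six points, as here, the test has very little power either way.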
#5 BONUS – place the original .csv in a GitHub repository and have R read from the link. This will be a very useful skill as you progress in your data science education and career.
getHubData <- read.csv("https://raw.githubusercontent.com/arinolan/Nolan_Weel-3-Assignement/main/MurderRates.csv")