#1 Data Exploration: This should include summary statistics, means, medians, quartiles, or any other relevant information about the data set. Please include some conclusions in the R Markdown text.

myData <- read.csv("https://raw.githubusercontent.com/arinolan/Nolan_Weel-3-Assignement/main/MurderRates.csv")
summary(myData)
##        X              rate         convictions       executions     
##  Min.   : 1.00   Min.   : 0.810   Min.   :0.1080   Min.   :0.00000  
##  1st Qu.:11.75   1st Qu.: 1.808   1st Qu.:0.1663   1st Qu.:0.02625  
##  Median :22.50   Median : 3.625   Median :0.2260   Median :0.04500  
##  Mean   :22.50   Mean   : 5.404   Mean   :0.2605   Mean   :0.06034  
##  3rd Qu.:33.25   3rd Qu.: 7.725   3rd Qu.:0.3202   3rd Qu.:0.08225  
##  Max.   :44.00   Max.   :19.250   Max.   :0.7570   Max.   :0.40000  
##       time           income           lfp           noncauc       
##  Min.   : 34.0   Min.   :0.760   Min.   :47.00   Min.   :0.00300  
##  1st Qu.: 94.0   1st Qu.:1.550   1st Qu.:51.50   1st Qu.:0.02175  
##  Median :124.0   Median :1.830   Median :53.40   Median :0.06450  
##  Mean   :136.5   Mean   :1.781   Mean   :53.07   Mean   :0.10559  
##  3rd Qu.:179.0   3rd Qu.:2.070   3rd Qu.:54.52   3rd Qu.:0.14450  
##  Max.   :298.0   Max.   :2.390   Max.   :58.80   Max.   :0.45400  
##    southern        
##  Length:44         
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
summary(myData$rate)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.810   1.808   3.625   5.404   7.725  19.250
summary(myData$convictions)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1080  0.1663  0.2260  0.2605  0.3202  0.7570
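summary() reports location (quartiles, mean, median) but not spread. In the output above, the mean murder rate (5.404) sits well above the median (3.625), which suggests a right-skewed distribution pulled up by a few high-rate states. Below is a minimal sketch of how standard deviation and IQR could be added and how the mean/median gap can be checked for every numeric column. So that it runs offline, it uses only the six rows printed by head(myData_subset) in section #2 as a stand-in; `sample_df`, `num_df`, `spread`, and `skew_hint` are illustrative names, and on the real data you would pass myData instead.

```r
# Stand-in data: the six rows printed by head(myData_subset) in section #2.
# Replace sample_df with myData to run this on the full data set.
sample_df <- data.frame(
  rate        = c(19.25, 7.53, 5.66, 3.21, 2.80, 1.41),
  income      = c(1.10, 0.92, 1.72, 2.18, 1.75, 2.26),
  southern    = c("yes", "yes", "no", "no", "no", "no"),
  convictions = c(0.204, 0.327, 0.401, 0.318, 0.350, 0.283)
)

# Keep only the numeric columns (southern is character)
numeric_cols <- sapply(sample_df, is.numeric)
num_df <- sample_df[numeric_cols]

# Spread measures that summary() does not report
spread <- data.frame(
  sd  = sapply(num_df, sd),
  iqr = sapply(num_df, IQR)
)
print(round(spread, 3))

# A mean above the median is a quick hint of right skew
skew_hint <- sapply(num_df, function(x) mean(x) > median(x))
print(skew_hint)
```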

#2 Data wrangling: Please perform some basic transformations. They will need to make sense but could include column renaming, creating a subset of the data, replacing values, or creating new columns with derived data (for example – if it makes sense you could sum two columns together)

# Keep only the columns needed for the analysis
myData_subset <- myData[, c("rate", "income", "southern", "convictions")]
# Convert the yes/no region indicator to a factor
myData_subset$southern <- as.factor(myData_subset$southern)
head(myData_subset)
##    rate income southern convictions
## 1 19.25   1.10      yes       0.204
## 2  7.53   0.92      yes       0.327
## 3  5.66   1.72       no       0.401
## 4  3.21   2.18       no       0.318
## 5  2.80   1.75       no       0.350
## 6  1.41   2.26       no       0.283
summary(myData_subset$southern)
##  no yes 
##  29  15
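The prompt also lists column renaming, replacing values, and derived columns, which the subset above does not yet show. Here is a minimal sketch of all three on the same six-row stand-in (swap in myData_subset for the real data). The names `murder_rate`, `region`, and `high_rate`, and the choice of the median as a cut-off, are illustrative assumptions, not part of the original analysis.

```r
# Stand-in data: the six rows printed by head(myData_subset) above
wrangled <- data.frame(
  rate        = c(19.25, 7.53, 5.66, 3.21, 2.80, 1.41),
  income      = c(1.10, 0.92, 1.72, 2.18, 1.75, 2.26),
  southern    = c("yes", "yes", "no", "no", "no", "no"),
  convictions = c(0.204, 0.327, 0.401, 0.318, 0.350, 0.283)
)

# Rename: give "rate" a more descriptive (hypothetical) name
names(wrangled)[names(wrangled) == "rate"] <- "murder_rate"

# Replace values: recode the yes/no flag into readable factor labels
wrangled$region <- factor(wrangled$southern,
                          levels = c("no", "yes"),
                          labels = c("Non-southern", "Southern"))

# Derive a new column: flag rows above the median murder rate
wrangled$high_rate <- wrangled$murder_rate > median(wrangled$murder_rate)

print(wrangled)
print(table(wrangled$region, wrangled$high_rate))
```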

#3 Graphics: Please make sure to display at least one scatter plot, box plot and histogram. Don’t be limited to this. Please explore the many other options in R packages such as ggplot2.

library("ggplot2")
## Warning: package 'ggplot2' was built under R version 4.1.3
boxplot(myData_subset$rate, main = "Murder Rate", ylab = "Murders per 100,000")

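Because southern is already a factor, a grouped box plot is a natural next step for comparing southern and non-southern states. A minimal base-R sketch on the six stand-in rows from section #2 (replace `sample_df` with myData_subset for the real data); the axis labels assume rate is measured in murders per 100,000 people.

```r
# Stand-in data: rate and southern from the six rows printed in section #2
sample_df <- data.frame(
  rate     = c(19.25, 7.53, 5.66, 3.21, 2.80, 1.41),
  southern = c("yes", "yes", "no", "no", "no", "no")
)

# One box per region; boxplot() invisibly returns the statistics it drew
box_stats <- boxplot(rate ~ southern, data = sample_df,
                     main = "Murder Rate by Region",
                     xlab = "Southern state", ylab = "Murders per 100,000")
print(box_stats$n)  # number of observations in each box
```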
hist(myData_subset$convictions, main = "Convictions", xlab = "Convictions per murder")

plot(myData$rate, myData$convictions, main = "Rate v Convictions",
   xlab = "Murder rate", ylab = "Convictions", pch = 19)

plot(myData$rate, myData$executions, main = "Rate v Executions",
   xlab = "Murder rate", ylab = "Executions", pch = 19)
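ggplot2 is loaded above but not used by the base-graphics plots. A minimal ggplot2 sketch of the rate v convictions scatter plot follows, colouring points by region and overlaying a single least-squares line (`geom_smooth(method = "lm")`) so the strength of any trend is easier to judge. It uses the six stand-in rows from section #2; replace `sample_df` with myData for the full data set.

```r
library(ggplot2)

# Stand-in data: rate, southern, and convictions from the six rows in section #2
sample_df <- data.frame(
  rate        = c(19.25, 7.53, 5.66, 3.21, 2.80, 1.41),
  southern    = c("yes", "yes", "no", "no", "no", "no"),
  convictions = c(0.204, 0.327, 0.401, 0.318, 0.350, 0.283)
)

# Colour points by region; fit one least-squares line to all points
p <- ggplot(sample_df, aes(x = rate, y = convictions)) +
  geom_point(aes(colour = southern), size = 2) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "grey40") +
  labs(title = "Rate v Convictions",
       x = "Murder rate", y = "Convictions", colour = "Southern")
print(p)
```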

#4 Meaningful question for analysis: Please state at the beginning a meaningful question for analysis. Use the first three steps and anything else that would be helpful to answer the question you are posing from the data set you chose. Please write a brief conclusion paragraph in R markdown at the end.

Question: Is a state's murder rate correlated with either its conviction rate or its execution rate? The scatter plots of rate v convictions and rate v executions both show only a weak relationship. Neither plot has a clear linear trend; instead, most points are bunched at the low end of the rate axis, with only a few states at high murder rates.
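A correlation coefficient can put a number on "weak" instead of relying on the plots alone. Below is a minimal sketch using base R's cor() and cor.test(): Pearson measures linear association, while Spearman works on ranks and is therefore less affected by the few very high murder rates. It runs on the six stand-in rows from section #2 only, which are far too few to support any conclusion; the values it prints merely illustrate the calls.

```r
# Stand-in data: rate and convictions from the six rows printed in section #2
rate        <- c(19.25, 7.53, 5.66, 3.21, 2.80, 1.41)
convictions <- c(0.204, 0.327, 0.401, 0.318, 0.350, 0.283)

# Pearson: linear association; Spearman: rank-based, robust to skew
pearson  <- cor(rate, convictions, method = "pearson")
spearman <- cor(rate, convictions, method = "spearman")
print(round(c(pearson = pearson, spearman = spearman), 3))

# cor.test() adds a p-value and, for Pearson, a confidence interval
print(cor.test(rate, convictions, method = "pearson"))
```

On the full data, the same calls would be `cor(myData$rate, myData$convictions)` and `cor(myData$rate, myData$executions)`.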

#5 BONUS – place the original .csv in a GitHub repository and have R read from the link. This will be a very useful skill as you progress in your data science education and career.

getHubData <- read.csv("https://raw.githubusercontent.com/arinolan/Nolan_Weel-3-Assignement/main/MurderRates.csv")
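The link works because raw.githubusercontent.com serves a file's plain contents at a URL of the form `https://raw.githubusercontent.com/<user>/<repo>/<branch>/<path>`, rather than the HTML page that github.com shows for the same file. Here is a minimal sketch of assembling that URL from its parts; `raw_github_url` is a hypothetical helper, not part of any package. Because getHubData reads the same URL as myData in section #1, `identical(myData, getHubData)` should confirm the two copies match.

```r
# Hypothetical helper: build a raw.githubusercontent.com URL for a file
# stored in a public GitHub repository
raw_github_url <- function(user, repo, branch, path) {
  sprintf("https://raw.githubusercontent.com/%s/%s/%s/%s",
          user, repo, branch, path)
}

url <- raw_github_url("arinolan", "Nolan_Weel-3-Assignement", "main",
                      "MurderRates.csv")
print(url)

# read.csv() accepts a URL directly (requires network access):
# getHubData <- read.csv(url)
```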