We will examine a set of US arrest data available at http://vincentarelbundock.github.io/Rdatasets/

week3 <- read.csv("c:/Users/Nate/Documents/Dataset/USArrests.csv")
View(week3)
attach(week3)
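The file path above is specific to one machine. As a minimal sketch, assuming the built-in `USArrests` data frame in R's `datasets` package matches the downloaded CSV, the same table can be rebuilt without a local file. The name `arrests` is hypothetical, and columns are referenced explicitly instead of through `attach()`.

```r
# Assumption: the built-in USArrests data frame matches the CSV used above.
data("USArrests", package = "datasets")

# Recreate the CSV layout: state names move from row names into a column X.
# `arrests` is a hypothetical name, not one used elsewhere in this analysis.
arrests <- data.frame(X = rownames(USArrests), USArrests, row.names = NULL)

# Inspect the structure and refer to columns explicitly (no attach() needed).
str(arrests)
head(arrests$Assault)
```

Referencing columns as `arrests$Assault` avoids the name clashes that `attach()` can cause when other objects share a column's name.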

Next we will do some basic data exploration:

summary(week3)
##           X          Murder          Assault         UrbanPop    
##  Alabama   : 1   Min.   : 0.800   Min.   : 45.0   Min.   :32.00  
##  Alaska    : 1   1st Qu.: 4.075   1st Qu.:109.0   1st Qu.:54.50  
##  Arizona   : 1   Median : 7.250   Median :159.0   Median :66.00  
##  Arkansas  : 1   Mean   : 7.788   Mean   :170.8   Mean   :65.54  
##  California: 1   3rd Qu.:11.250   3rd Qu.:249.0   3rd Qu.:77.75  
##  Colorado  : 1   Max.   :17.400   Max.   :337.0   Max.   :91.00  
##  (Other)   :44                                                   
##       Rape      
##  Min.   : 7.30  
##  1st Qu.:15.07  
##  Median :20.10  
##  Mean   :21.23  
##  3rd Qu.:26.18  
##  Max.   :46.00  
## 

We can see that, of the crime rates (per 100,000), murder is the lowest, with a mean rate of 7.788 and a max of 17.4, and assault is the highest, with a mean of 170.8 and a max of 337.0. Note that most of the rates seem symmetrically distributed, as the medians are close to the means, but Assault seems to be right skewed, with a median of 159 and a mean of 170.8. Perhaps the min of 45 is an outlier? The asymmetry of the Assault rate is interesting, and I wonder if there is a relationship to urban population that affects its distribution?
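One quick way to check the direction of skew is to compare the mean and the median of each column: a positive gap suggests a longer right tail, a negative gap a longer left tail. This is a rough sketch, assuming the built-in `USArrests` data frame matches the CSV; `center_gap` is a hypothetical name.

```r
# Assumption: the built-in USArrests data frame matches the CSV used above.
data("USArrests", package = "datasets")

# Mean minus median for each numeric column.
# Positive values hint at right skew, negative values at left skew.
center_gap <- sapply(USArrests, function(x) mean(x) - median(x))
round(center_gap, 3)
```

This is only a heuristic; a histogram or boxplot of each column gives a fuller picture of its shape.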

total_crime <- Assault + Rape + Murder
View(total_crime)
percent_assualt <- Assault/total_crime

We can look at the assault data:

boxplot(Assault)

hist(percent_assualt)

Here it looks like one state has a very low percentage of assault as a share of total crime.

hist(Assault)

The assault distribution looks slightly bimodal, with a second peak between 250 and 300.

plot(UrbanPop,Assault)

plot(UrbanPop, percent_assualt)

It does appear that one state has a high urban population but a low percent of assault.
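Rather than reading the state off a plot, the lowest assault share can be looked up directly. This is a minimal sketch, assuming the built-in `USArrests` data frame matches the CSV; the names `percent_assault` and `lowest` are hypothetical.

```r
# Assumption: the built-in USArrests data frame matches the CSV used above.
data("USArrests", package = "datasets")

# Assault as a share of the three crime rates, labelled by state.
total_crime <- with(USArrests, Assault + Rape + Murder)
percent_assault <- USArrests$Assault / total_crime
names(percent_assault) <- rownames(USArrests)

# Index (and name) of the state with the smallest assault share.
lowest <- which.min(percent_assault)
names(lowest)

# Full row for that state, including its urban population.
USArrests[lowest, ]
```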

fit <- lm(Assault~UrbanPop)
summary(fit)
## 
## Call:
## lm(formula = Assault ~ UrbanPop)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -150.78  -61.85  -18.68   58.05  196.85 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  73.0766    53.8508   1.357   0.1811  
## UrbanPop      1.4904     0.8027   1.857   0.0695 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 81.33 on 48 degrees of freedom
## Multiple R-squared:  0.06701,    Adjusted R-squared:  0.04758 
## F-statistic: 3.448 on 1 and 48 DF,  p-value: 0.06948

It does not appear that there is a linear relationship between urban population and assault rates: the slope for UrbanPop is not statistically significant at the 0.05 level (p = 0.0695), and R-squared is only 0.067. The plots do show one outlier where the percent assault is lower than in other locations. Using the View() function in RStudio, this looks to be Hawaii, with 83% urban population and only 46 assaults per 100,000. The mean is sensitive to extreme values, which is why the mean (170.8) and the median (159) differ. Note, though, that a low value such as Hawaii's pulls the mean down, so it cannot by itself explain a mean that sits above the median.
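As a cross-check on the regression, a Pearson correlation test between the two variables can be run; with a single predictor, its p-value should agree with the p-value of the slope. This is a sketch, assuming the built-in `USArrests` data frame matches the CSV; `ct` and `slope_p` are hypothetical names.

```r
# Assumption: the built-in USArrests data frame matches the CSV used above.
data("USArrests", package = "datasets")

# Pearson correlation test between urban population and assault rate.
ct <- cor.test(USArrests$UrbanPop, USArrests$Assault)
ct$estimate   # sample correlation
ct$p.value    # two-sided p-value

# Refit the simple regression and pull out the p-value of the slope.
fit <- lm(Assault ~ UrbanPop, data = USArrests)
slope_p <- summary(fit)$coefficients["UrbanPop", "Pr(>|t|)"]

# In simple linear regression the two p-values should coincide.
all.equal(unname(ct$p.value), slope_p)
```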