Loading Data & Libraries

I first set up the environment and take a look at the data.

# Load libraries and data

library("ggplot2")
library("bitops")
library("RCurl")
library("cowplot")
## 
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggplot2':
## 
##     ggsave
url = "https://raw.githubusercontent.com/chrisestevez/MSDA-Bridge/master/USArrest1973.csv"

Rdata = getURL(url)

MyData = read.csv(text = Rdata,header = TRUE,sep=",", na.strings = "..")

str(MyData)
## 'data.frame':    50 obs. of  5 variables:
##  $ State   : Factor w/ 50 levels "Alabama","Alaska",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Murder  : num  13.2 10 8.1 8.8 9 7.9 3.3 5.9 15.4 17.4 ...
##  $ Assault : int  236 263 294 190 276 204 110 238 335 211 ...
##  $ UrbanPop: int  58 48 80 50 91 78 77 72 80 60 ...
##  $ Rape    : num  21.2 44.5 31 19.5 40.6 38.7 11.1 15.8 31.9 25.8 ...
summary(MyData)
##         State        Murder          Assault         UrbanPop    
##  Alabama   : 1   Min.   : 0.800   Min.   : 45.0   Min.   :32.00  
##  Alaska    : 1   1st Qu.: 4.075   1st Qu.:109.0   1st Qu.:54.50  
##  Arizona   : 1   Median : 7.250   Median :159.0   Median :66.00  
##  Arkansas  : 1   Mean   : 7.788   Mean   :170.8   Mean   :65.54  
##  California: 1   3rd Qu.:11.250   3rd Qu.:249.0   3rd Qu.:77.75  
##  Colorado  : 1   Max.   :17.400   Max.   :337.0   Max.   :91.00  
##  (Other)   :44                                                   
##       Rape      
##  Min.   : 7.30  
##  1st Qu.:15.07  
##  Median :20.10  
##  Mean   :21.23  
##  3rd Qu.:26.18  
##  Max.   :46.00  
## 
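
As an offline cross-check, the same figures appear to ship with base R as the built-in `USArrests` data set, where the state names are stored as row names rather than in a column. A minimal sketch, assuming the CSV above is an export of that data set; `LocalData` is a hypothetical name and not part of the original analysis:

```r
# Built-in copy of the data; state names are row names, not a column.
data("USArrests", package = "datasets")

# Move the row names into a State column to mirror the CSV layout
# (assumption: the CSV has the same five columns in this order).
LocalData <- data.frame(State = rownames(USArrests), USArrests,
                        row.names = NULL)

str(LocalData)
```

If the download step fails, this gives a data frame with the same column names to continue with.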

Renaming and subsetting

In the step below, I create a data frame from the original source, select the columns of interest, and rename them. The variables State, Murder, and Rape (arrests per 100k) are analyzed further.

FinalUSAarrests =  data.frame(MyData)

FinalUSAarrests = subset(FinalUSAarrests,select = c(State,Murder,Rape))

colnames( FinalUSAarrests) = c("States","MurderP100k","Rapep100k")

head(FinalUSAarrests, n=10)
##         States MurderP100k Rapep100k
## 1      Alabama        13.2      21.2
## 2       Alaska        10.0      44.5
## 3      Arizona         8.1      31.0
## 4     Arkansas         8.8      19.5
## 5   California         9.0      40.6
## 6     Colorado         7.9      38.7
## 7  Connecticut         3.3      11.1
## 8     Delaware         5.9      15.8
## 9      Florida        15.4      31.9
## 10     Georgia        17.4      25.8
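
The select-and-rename step can also be written with an explicit new-name-to-old-name mapping, which keeps the renaming from depending on column order. A minimal sketch on a toy data frame built from the first three rows shown above; `toy`, `keep`, and `toy_small` are hypothetical names:

```r
# Toy data frame using the first three rows printed by str()/head() above.
toy <- data.frame(
  State    = c("Alabama", "Alaska", "Arizona"),
  Murder   = c(13.2, 10.0, 8.1),
  Assault  = c(236L, 263L, 294L),
  UrbanPop = c(58L, 48L, 80L),
  Rape     = c(21.2, 44.5, 31.0)
)

# Mapping of new name = old name; only these columns are kept.
keep <- c(States = "State", MurderP100k = "Murder", Rapep100k = "Rape")

# Select the old columns by name, then apply the new names.
toy_small <- setNames(toy[, unname(keep)], names(keep))

toy_small
```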

Plot 1 Histogram

h1 = ggplot(FinalUSAarrests, aes(x = MurderP100k)) + geom_histogram(binwidth = 3, color = "white", fill = "blue") + ggtitle("USA Arrest 1973 Murders")

h2 = ggplot(FinalUSAarrests, aes(x = Rapep100k)) + geom_histogram(binwidth = 3, color = "white", fill = "red") + ggtitle("USA Arrest 1973 Rapes")

HistPlot = plot_grid(h1, h2, align = 'h')

HistPlot

Plot 2 Box plot

# Reduce the data to the first five rows (states).
top = head(FinalUSAarrests, n=5)

BoxplotEx = ggplot(top, aes(y=MurderP100k, x=States)) + geom_boxplot()

BoxplotEx
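
Because each state contributes a single row, every box in this plot has only one value and collapses to a flat line. A box plot becomes informative when each group holds several observations, for example one box per crime type across states. A minimal sketch using the ten rows printed earlier; `demo`, `long`, and `BoxByCrime` are hypothetical names, and grouping by crime type is an assumption rather than the approach taken in this assignment:

```r
library("ggplot2")

# The ten rows shown in the head() output above.
demo <- data.frame(
  MurderP100k = c(13.2, 10.0, 8.1, 8.8, 9.0, 7.9, 3.3, 5.9, 15.4, 17.4),
  Rapep100k   = c(21.2, 44.5, 31.0, 19.5, 40.6, 38.7, 11.1, 15.8, 31.9, 25.8)
)

# stack() reshapes the two numeric columns into long format:
# 'values' holds the rates and 'ind' holds the source column name.
long <- stack(demo)

# One box per crime type, each summarising ten states.
BoxByCrime <- ggplot(long, aes(x = ind, y = values)) +
  geom_boxplot() +
  labs(x = "Crime", y = "Arrests per 100k")

BoxByCrime
```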

Plot 3 Scatter

To plot a line across the data, I used the guide below.

http://r4stats.com/examples/graphics-ggplot2/

scatterP =ggplot(FinalUSAarrests, aes(MurderP100k, Rapep100k))+ geom_point()+ geom_smooth()

scatterP
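
With no arguments, `geom_smooth()` fits a smoothed curve (loess for a data set of this size) rather than a straight line. If a straight regression line is wanted, `method = "lm"` is one option. A minimal sketch using the ten rows printed earlier; `demo` and `scatterLm` are hypothetical names:

```r
library("ggplot2")

# The ten rows shown in the head() output above.
demo <- data.frame(
  MurderP100k = c(13.2, 10.0, 8.1, 8.8, 9.0, 7.9, 3.3, 5.9, 15.4, 17.4),
  Rapep100k   = c(21.2, 44.5, 31.0, 19.5, 40.6, 38.7, 11.1, 15.8, 31.9, 25.8)
)

# method = "lm" draws a least-squares line; se = FALSE hides the confidence band.
scatterLm <- ggplot(demo, aes(MurderP100k, Rapep100k)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

scatterLm
```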

Conclusion

The selected data was not ideal for the box plot because it lacks additional factors: each state has only a single observation. This led to the use of various techniques to plot the data properly throughout the assignment.