#install.packages("dplyr")
#install.packages("ggplot2")
#install.packages("dslabs")

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(dslabs)

options(scipen = 999)
# Summary statistics quoted in the write-up (run each against murders separately):
# murders %>% summarize(pop = sum(population), tot = sum(total)) %>%
#   mutate(rate = tot / pop * 1000000)
# murders %>% summarize(meanmurders = mean(total))
# murders %>% summarize(medianmurders = median(total))

# Scatterplot of total gun murders against population, both on log10 axes.
# expand_limits(y = 0) was dropped: zero cannot be drawn on a log10 axis,
# and it caused the "infinite values" warning.
murders %>%
  ggplot(aes(x = population / 1000000, y = total, color = region)) +
  geom_point() +
  geom_text(aes(label = abb)) +
  scale_x_log10() +
  scale_y_log10() +
  xlab("Population in millions (log scale)") +
  ylab("Total number of murders (log scale)") +
  ggtitle("US Gun Murders in 2010")

The data set gives insight into gun murders in the United States in 2010, state by state.

California has the most gun murders, followed by Texas and Florida. Wyoming and Vermont are among the states with the fewest. The West region contains an extreme outlier, California, which leads every state in total gun murders; a high outlier like this pulls the distribution to the right, so the totals are positively skewed. The South region also contains an outlier, Washington DC (a district rather than a state), whose point sits toward the low-population end of the plot. One low point in one region is not enough to conclude that the data are negatively skewed.
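The ranking described above can be checked with a short sketch like the one below. It assumes the `murders` data frame from dslabs with its `state`, `region`, `population`, and `total` columns; the name `ranked` is introduced here only for illustration.

```r
library(dplyr)
library(dslabs)

# Rank every row of murders by total gun murders, highest first.
ranked <- murders %>%
  select(state, region, population, total) %>%
  arrange(desc(total))

head(ranked, 3)  # rows with the most gun murders
tail(ranked, 3)  # rows with the fewest gun murders

# Position of Washington DC in the ranking (1 = most murders).
which(ranked$state == "District of Columbia")
```

Sorting the table this way is a quick guard against misreading crowded labels on the scatterplot.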

The mean number of gun murders per state in 2010 was 184.3725. The mean is sensitive to outliers, and this data set has extreme outliers, so the median, 97, is the better measure of central tendency for this skewed distribution. States close to the median include, but are not limited to, Connecticut and Nevada. On the log-scaled scatterplot the points fall close to a straight line, which indicates a strong positive relationship between population and total gun murders. Finally, the overall gun murder rate was 30.3455 per million people.
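The figures quoted above can be reproduced in one pass with a sketch along these lines. It assumes the dslabs `murders` data frame; the column names in `stats` are illustrative, and the correlation on log10 values is added here as one possible way to quantify the straight-line pattern, not something computed in the original code.

```r
library(dplyr)
library(dslabs)

stats <- murders %>%
  summarize(
    mean_murders     = mean(total),    # sensitive to outliers such as California
    median_murders   = median(total),  # robust to outliers
    # Overall rate: all murders divided by all residents, per million people.
    rate_per_million = sum(total) / sum(population) * 1000000,
    # Pearson correlation on the same log10 scales used in the plot.
    log_log_cor      = cor(log10(population), log10(total))
  )

stats
```

A mean well above the median is the usual numeric sign of the positive skew discussed in the text.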