R Markdown

The "Guns" dataset was created to study whether greater availability of guns was related to less crime in the United States (the 50 states plus the District of Columbia) from 1977 to 1999. Recent events suggest this is not necessarily true: guns in the wrong hands are extremely dangerous and are a major cause of the many mass shootings seen in recent years. I believe contributing factors include a lack of mental health awareness, violent video games, family problems and negligence, and social media. I wanted to see whether violence was already trending upward in those years, and so whether the current violence in our nation could have been predicted. My question is: is there an upward trend in violence from 1977 to 1999, and does the trend in New York mirror that of the country as a whole (is there a correlation between the two)?

guns_data<-read.csv('https://raw.githubusercontent.com/Sangeetha-007/R-Practice/master/Bridge-Program/GunsProject/Guns.csv')
head(guns_data)
##   X year violent murder robbery prisoners     afam     cauc     male population
## 1 1 1977   414.4   14.2    96.8        83 8.384873 55.12291 18.17441   3.780403
## 2 2 1978   419.1   13.3    99.1        94 8.352101 55.14367 17.99408   3.831838
## 3 3 1979   413.3   13.2   109.5       144 8.329575 55.13586 17.83934   3.866248
## 4 4 1980   448.5   13.2   132.1       141 8.408386 54.91259 17.73420   3.900368
## 5 5 1981   470.5   11.9   126.5       149 8.483435 54.92513 17.67372   3.918531
## 6 6 1982   447.7   10.6   112.0       183 8.514000 54.89621 17.51052   3.925229
##     income   density   state law
## 1 9563.148 0.0745524 Alabama  no
## 2 9932.000 0.0755667 Alabama  no
## 3 9877.028 0.0762453 Alabama  no
## 4 9541.428 0.0768288 Alabama  no
## 5 9548.351 0.0771866 Alabama  no
## 6 9478.919 0.0773185 Alabama  no

I removed the fields "afam" (percent of state population that is African-American, ages 10 to 64), "cauc" (percent of state population that is Caucasian, ages 10 to 64), "male" (percent of state population that is male, ages 10 to 29), "income", and "law" because I felt they were irrelevant to my current analysis.

guns_data_cleaned<-guns_data[c(1:6, 10, 12:13)]
head(guns_data_cleaned)
##   X year violent murder robbery prisoners population   density   state
## 1 1 1977   414.4   14.2    96.8        83   3.780403 0.0745524 Alabama
## 2 2 1978   419.1   13.3    99.1        94   3.831838 0.0755667 Alabama
## 3 3 1979   413.3   13.2   109.5       144   3.866248 0.0762453 Alabama
## 4 4 1980   448.5   13.2   132.1       141   3.900368 0.0768288 Alabama
## 5 5 1981   470.5   11.9   126.5       149   3.918531 0.0771866 Alabama
## 6 6 1982   447.7   10.6   112.0       183   3.925229 0.0773185 Alabama
summary(guns_data_cleaned)
##        X             year         violent           murder      
##  Min.   :   1   Min.   :1977   Min.   :  47.0   Min.   : 0.200  
##  1st Qu.: 294   1st Qu.:1982   1st Qu.: 283.1   1st Qu.: 3.700  
##  Median : 587   Median :1988   Median : 443.0   Median : 6.400  
##  Mean   : 587   Mean   :1988   Mean   : 503.1   Mean   : 7.665  
##  3rd Qu.: 880   3rd Qu.:1994   3rd Qu.: 650.9   3rd Qu.: 9.800  
##  Max.   :1173   Max.   :1999   Max.   :2921.8   Max.   :80.600  
##     robbery         prisoners        population         density         
##  Min.   :   6.4   Min.   :  19.0   Min.   : 0.4027   Min.   : 0.000707  
##  1st Qu.:  71.1   1st Qu.: 114.0   1st Qu.: 1.1877   1st Qu.: 0.031911  
##  Median : 124.1   Median : 187.0   Median : 3.2713   Median : 0.081569  
##  Mean   : 161.8   Mean   : 226.6   Mean   : 4.8163   Mean   : 0.352038  
##  3rd Qu.: 192.7   3rd Qu.: 291.0   3rd Qu.: 5.6856   3rd Qu.: 0.177718  
##  Max.   :1635.1   Max.   :1913.0   Max.   :33.1451   Max.   :11.102120  
##     state          
##  Length:1173       
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
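The column drop above relies on column positions, which breaks silently if the file's column order ever changes. A minimal sketch of the same step done by name, using a toy data frame built from the first two rows of the `head()` output (the names `guns_toy`, `drop_cols`, and `guns_toy_cleaned` are made up for illustration):

```r
# Toy data frame with the same column names as Guns.csv; the values are the
# first two rows shown in the head() output above.
guns_toy <- data.frame(
  X = 1:2, year = c(1977, 1978), violent = c(414.4, 419.1),
  murder = c(14.2, 13.3), robbery = c(96.8, 99.1), prisoners = c(83, 94),
  afam = c(8.384873, 8.352101), cauc = c(55.12291, 55.14367),
  male = c(18.17441, 17.99408), population = c(3.780403, 3.831838),
  income = c(9563.148, 9932.000), density = c(0.0745524, 0.0755667),
  state = c("Alabama", "Alabama"), law = c("no", "no")
)

# Columns to drop, listed by name rather than by position.
drop_cols <- c("afam", "cauc", "male", "income", "law")

# Keep every column whose name is not in drop_cols.
guns_toy_cleaned <- guns_toy[, !(names(guns_toy) %in% drop_cols)]
names(guns_toy_cleaned)
```

On the real data, the same two lines would apply to `guns_data` in place of `guns_toy`.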

The mean violent crime rate (the "violent" column, incidents per 100,000 residents) across all states and years from 1977 to 1999:

mean(guns_data_cleaned$violent)
## [1] 503.0747
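A single overall mean cannot show whether violence rose over time. One way to look at the trend question is to average the rate per year and fit a straight line to those yearly means. The sketch below uses a toy data frame built only from the Alabama and New York rows printed in this report (1977 to 1982), so its result says nothing about the full dataset; `toy`, `yearly_mean`, and `trend_fit` are illustrative names. The yearly mean here is an unweighted average across states, not a population-weighted national rate.

```r
# Toy data: the six Alabama and six New York rows shown in this report.
toy <- data.frame(
  state = rep(c("Alabama", "New York"), each = 6),
  year = rep(1977:1982, times = 2),
  violent = c(414.4, 419.1, 413.3, 448.5, 470.5, 447.7,
              831.8, 841.0, 917.4, 1029.5, 1069.6, 990.1)
)

# Unweighted mean violent crime rate per year across the states in the data.
yearly_mean <- aggregate(violent ~ year, data = toy, FUN = mean)

# Simple linear trend: a positive slope on year suggests an upward trend.
trend_fit <- lm(violent ~ year, data = yearly_mean)
coef(trend_fit)["year"]
```

Replacing `toy` with `guns_data_cleaned` would apply the same steps to all 51 jurisdictions and 23 years.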

Creating the subset for New York:

#print(guns_data_cleaned)
library(dplyr)  # provides %>% and filter()
ny_data <- guns_data_cleaned %>% filter(state == 'New York')
head(ny_data)
##     X year violent murder robbery prisoners population   density    state
## 1 737 1977   831.8   10.7   472.6        98   17.81261 0.3724071 New York
## 2 738 1978   841.0   10.3   472.1       108   17.68059 0.3696471 New York
## 3 739 1979   917.4   11.9   529.6       114   17.58384 0.3676244 New York
## 4 740 1980  1029.5   12.7   641.3       120   17.56675 0.3707865 New York
## 5 741 1981  1069.6   12.3   684.0       123   17.56773 0.3708071 New York
## 6 742 1982   990.1   11.4   610.7       145   17.58975 0.3712718 New York
summary(ny_data)
##        X              year         violent           murder     
##  Min.   :737.0   Min.   :1977   Min.   : 588.8   Min.   : 5.00  
##  1st Qu.:742.5   1st Qu.:1982   1st Qu.: 841.5   1st Qu.: 9.80  
##  Median :748.0   Median :1988   Median : 965.6   Median :11.10  
##  Mean   :748.0   Mean   :1988   Mean   : 941.3   Mean   :10.67  
##  3rd Qu.:753.5   3rd Qu.:1994   3rd Qu.:1071.5   3rd Qu.:12.50  
##  Max.   :759.0   Max.   :1999   Max.   :1180.9   Max.   :14.50  
##     robbery        prisoners       population       density      
##  Min.   :240.8   Min.   : 98.0   Min.   :17.57   Min.   :0.3676  
##  1st Qu.:472.4   1st Qu.:151.5   1st Qu.:17.72   1st Qu.:0.3729  
##  Median :514.1   Median :229.0   Median :17.94   Median :0.3787  
##  Mean   :501.8   Mean   :244.7   Mean   :17.91   Mean   :0.3780  
##  3rd Qu.:588.1   3rd Qu.:347.0   3rd Qu.:18.14   3rd Qu.:0.3842  
##  Max.   :684.0   Max.   :397.0   Max.   :18.20   Max.   :0.3853  
##     state          
##  Length:23         
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

The mean violent crime rate in New York from 1977 to 1999 (far higher than the all-states mean of 503.1):

mean(ny_data$violent)
## [1] 941.3174
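To put a number on "far higher", the two means printed above can be divided directly. This is only a sketch of the arithmetic, with the values copied from the output in this report; `ny_mean`, `all_mean`, and `ratio` are illustrative names.

```r
# Means copied from the output printed earlier in this report.
ny_mean <- 941.3174   # mean(ny_data$violent)
all_mean <- 503.0747  # mean(guns_data_cleaned$violent)

# How many times the all-states mean the New York mean is.
ratio <- ny_mean / all_mean
round(ratio, 2)
```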

A scatter plot of New York State's violent crime rate over the 23 years I examined (1977 to 1999):

library(ggplot2)  # provides ggplot(), aes(), and geom_point()
ny_data_plot <- ggplot(data = ny_data, mapping = aes(x = year, y = violent)) + geom_point()
print(ny_data_plot)

# The vertical axis of a boxplot shows the values themselves, not a frequency.
boxplot(guns_data_cleaned$violent, ylab = "Violent crime rate",
        main = "Boxplot of Violent Crime Rate in the United States", col = "red")

boxplot(ny_data$violent, ylab = "Violent crime rate",
        main = "Boxplot of Violent Crime Rate in NY", col = "blue")

hist(guns_data_cleaned$violent) 

hist(ny_data$violent) 

Side-by-side scatter plot comparison of New York and the United States as a whole (the United States plot is on the bottom):

states_guns_plot <- ggplot(data = guns_data_cleaned, mapping = aes(x = year, y = violent))+ geom_point() 
print(states_guns_plot)

According to my analysis, the trend in violence in New York State is not fully correlated with the trend for the nation. New York State's mean violent crime rate was far higher than the national mean (about 1.87 times as high). Comparing the scatter plots above, you can see drops in New York's rate in periods when there were increases nationally, and vice versa. Before doing this analysis, I believed violence would rise steadily in both New York State and the country as a whole as population increased over time, but I was wrong. I would need outside data, or possibly a look at trends in population density instead, before I can see a clearer correlation.
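The comparison above is made by eye from the plots. A correlation coefficient would put a number on it. The sketch below shows one possible way, assuming the comparison series is the unweighted yearly mean of every state other than New York. It uses only the Alabama and New York rows printed in this report (1977 to 1982), so its value does not represent the full dataset and does not test the conclusion above; `toy`, `ny_series`, `rest_series`, and `both` are illustrative names.

```r
# Toy data: the six Alabama and six New York rows shown in this report.
toy <- data.frame(
  state = rep(c("Alabama", "New York"), each = 6),
  year = rep(1977:1982, times = 2),
  violent = c(414.4, 419.1, 413.3, 448.5, 470.5, 447.7,
              831.8, 841.0, 917.4, 1029.5, 1069.6, 990.1)
)

# New York's yearly series.
ny_series <- toy[toy$state == "New York", c("year", "violent")]

# Yearly mean over all other states, so New York is not correlated with itself.
rest_series <- aggregate(violent ~ year,
                         data = toy[toy$state != "New York", ],
                         FUN = mean)

# Line the two series up by year and compute Pearson's correlation.
both <- merge(ny_series, rest_series, by = "year", suffixes = c("_ny", "_rest"))
cor(both$violent_ny, both$violent_rest)
```

With `guns_data_cleaned` in place of `toy`, the same steps would compare New York against the other 50 jurisdictions over all 23 years.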