The data set Guns is a balanced panel of data on 50 US states, plus the District of Columbia (for a total of 51 states), by year for 1977–1999.
The data set contains 13 variables and 1,173 observations:
state: factor indicating state.
year: factor indicating year.
violent: violent crime rate (incidents per 100,000 members of the population).
murder: murder rate (incidents per 100,000).
robbery: robbery rate (incidents per 100,000).
prisoners: incarceration rate in the state in the previous year (sentenced prisoners per 100,000 residents).
afam: percent of state population that is African-American, ages 10 to 64.
cauc: percent of state population that is Caucasian, ages 10 to 64.
male: percent of state population that is male, ages 10 to 29.
population: state population, in millions of people.
income: real per capita personal income in the state (US dollars).
density: population per square mile of land area, divided by 1,000.
law: factor. Does the state have a shall carry law in effect in that year?
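The observation count follows from the panel structure: 51 units observed in each of 23 years. A minimal base R sketch of that arithmetic, using an illustrative skeleton (the names `panel` and `state_id` are hypothetical and are not columns of the Guns data):

```r
# Sketch: the shape of a balanced state-by-year panel (illustrative names only).
n_states <- 51          # 50 states plus the District of Columbia
years <- 1977:1999      # 23 years

# One row per state-year combination.
panel <- expand.grid(state_id = seq_len(n_states), year = years)
n_obs <- nrow(panel)

# In a balanced panel every unit appears exactly once per year.
is_balanced <- all(table(panel$state_id) == length(years))

print(n_obs)
print(is_balanced)
```

The same `table()` check can be run on the real `state` column once the data is loaded.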
# Libraries
# tidyverse has the data manipulation packages.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(hrbrthemes)
## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
## Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
## if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
library(kableExtra)
##
## Attaching package: 'kableExtra'
##
## The following object is masked from 'package:dplyr':
##
## group_rows
options(knitr.table.format = "html")
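# Read the Guns data (AER package) from the Rdatasets mirror
gun <- read.table("https://vincentarelbundock.github.io/Rdatasets/csv/AER/Guns.csv",
                  header = TRUE, stringsAsFactors = FALSE, sep = ',')
# Drop the row-index column X that comes with the CSV
gun$X <- NULL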
head(gun)
## year violent murder robbery prisoners afam cauc male population
## 1 1977 414.4 14.2 96.8 83 8.384873 55.12291 18.17441 3.780403
## 2 1978 419.1 13.3 99.1 94 8.352101 55.14367 17.99408 3.831838
## 3 1979 413.3 13.2 109.5 144 8.329575 55.13586 17.83934 3.866248
## 4 1980 448.5 13.2 132.1 141 8.408386 54.91259 17.73420 3.900368
## 5 1981 470.5 11.9 126.5 149 8.483435 54.92513 17.67372 3.918531
## 6 1982 447.7 10.6 112.0 183 8.514000 54.89621 17.51052 3.925229
## income density state law
## 1 9563.148 0.0745524 Alabama no
## 2 9932.000 0.0755667 Alabama no
## 3 9877.028 0.0762453 Alabama no
## 4 9541.428 0.0768288 Alabama no
## 5 9548.351 0.0771866 Alabama no
## 6 9478.919 0.0773185 Alabama no
Our exploratory data analysis includes generating summary information about the data set and investigating how murders are associated with gun violence. We need to gather insights and make better sense of the data.
summary(gun)
## year violent murder robbery
## Min. :1977 Min. : 47.0 Min. : 0.200 Min. : 6.4
## 1st Qu.:1982 1st Qu.: 283.1 1st Qu.: 3.700 1st Qu.: 71.1
## Median :1988 Median : 443.0 Median : 6.400 Median : 124.1
## Mean :1988 Mean : 503.1 Mean : 7.665 Mean : 161.8
## 3rd Qu.:1994 3rd Qu.: 650.9 3rd Qu.: 9.800 3rd Qu.: 192.7
## Max. :1999 Max. :2921.8 Max. :80.600 Max. :1635.1
## prisoners afam cauc male
## Min. : 19.0 Min. : 0.2482 Min. :21.78 Min. :12.21
## 1st Qu.: 114.0 1st Qu.: 2.2022 1st Qu.:59.94 1st Qu.:14.65
## Median : 187.0 Median : 4.0262 Median :65.06 Median :15.90
## Mean : 226.6 Mean : 5.3362 Mean :62.95 Mean :16.08
## 3rd Qu.: 291.0 3rd Qu.: 6.8507 3rd Qu.:69.20 3rd Qu.:17.53
## Max. :1913.0 Max. :26.9796 Max. :76.53 Max. :22.35
## population income density state
## Min. : 0.4027 Min. : 8555 Min. : 0.000707 Length:1173
## 1st Qu.: 1.1877 1st Qu.:11935 1st Qu.: 0.031911 Class :character
## Median : 3.2713 Median :13402 Median : 0.081569 Mode :character
## Mean : 4.8163 Mean :13725 Mean : 0.352038
## 3rd Qu.: 5.6856 3rd Qu.:15271 3rd Qu.: 0.177718
## Max. :33.1451 Max. :23647 Max. :11.102120
## law
## Length:1173
## Class :character
## Mode :character
##
##
##
The mean murder rate is 7.67 per 100,000 members of the population and the median is 6.4. The mean incarceration rate is 226.58 sentenced prisoners per 100,000 residents and the median is 187. The mean percent of state population that is African-American, ages 10 to 64, is 5.34 and the median is 4.03. The mean income is 13,724.80 US dollars and the median is 13,401.55. The mean violent crime rate is 503.07 incidents per 100,000 members of the population and the median is 443. The mean percent of state population that is Caucasian, ages 10 to 64, is 62.95 and the median is 65.06. The quantiles appear at the end of the output below.
# Mean, median, and quantiles of the murder rate, incarceration rate, and related variables
# Variables used
# murder ----- murder rate (incidents per 100,000).
# prisoners ---- incarceration rate in the state in the previous year (sentenced prisoners per 100,000 residents).
apply(gun[c("murder","prisoners","afam","income","violent","cauc")],MARGIN=2, FUN = mean)
## murder prisoners afam income violent cauc
## 7.665132 226.579710 5.336217 13724.796066 503.074680 62.945432
# afam --- percent of state population that is African-American, ages 10 to 64
# income -- real per capita personal income in the state (US dollars).
# violent ---- violent crime rate (incidents per 100,000 members of the population).
# cauc ----- percent of state population that is Caucasian, ages 10 to 64.
apply(gun[c("murder","afam","income","violent","cauc","prisoners")],MARGIN=2, FUN = median)
## murder afam income violent cauc prisoners
## 6.400000 4.026213 13401.550000 443.000000 65.061280 187.000000
apply(gun[c("murder","prisoners","afam","cauc","income","violent")],MARGIN=2, FUN = quantile)
## murder prisoners afam cauc income violent
## 0% 0.2 19 0.2482066 21.78043 8554.884 47.0
## 25% 3.7 114 2.2021960 59.93970 11934.760 283.1
## 50% 6.4 187 4.0262130 65.06128 13401.550 443.0
## 75% 9.8 291 6.8506730 69.20010 15271.010 650.9
## 100% 80.6 1913 26.9795700 76.52575 23646.710 2921.8
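The three `apply()` calls above can also be folded into a single pass. The sketch below is one way to do it, shown on the first six rows of the data (Alabama, 1977 to 1982) copied from the `head()` output; `describe_column` is a hypothetical helper name, not part of any package.

```r
# First six rows of the Guns data (Alabama, 1977-1982), copied from the head() output.
toy <- data.frame(
  murder    = c(14.2, 13.3, 13.2, 13.2, 11.9, 10.6),
  prisoners = c(83, 94, 144, 141, 149, 183)
)

# Hypothetical helper: mean, median, and quartiles in one named vector.
describe_column <- function(x) {
  c(mean = mean(x), median = median(x), quantile(x, probs = c(0.25, 0.75)))
}

# sapply() applies the helper to each column and binds the results into a matrix
# with one row per statistic and one column per variable.
stats <- sapply(toy, describe_column)
print(round(stats, 2))
```

Replacing `toy` with `gun[c("murder", "prisoners", "afam", "income", "violent", "cauc")]` would give the full-data version in one table.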
Our purpose is to discover which state has the most violent crime in America and how murder is correlated with the American population. We create a subset of the 20 states with the highest violent crime totals, obtained by summing each state's yearly violent crime rate over 1977 to 1999.
crimesdata <- gun %>%
  filter(!is.na(violent)) %>%
  group_by(state) %>%
  summarise(violent = sum(violent)) %>%  # sum of the yearly rates per state
  arrange(violent) %>%
  tail(20)                               # keep the 20 largest totals
# Look at the states with the highest violent crime totals
tail(crimesdata)
## # A tibble: 6 × 2
## state violent
## <chr> <dbl>
## 1 Illinois 19048.
## 2 Maryland 19634.
## 3 California 20182.
## 4 New York 21650.
## 5 Florida 22982.
## 6 District of Columbia 47126.
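Summing a rate over years gives a total that grows with the number of years, so an average yearly rate is often easier to read. The following is a minimal sketch, assuming dplyr 1.0.0 or later for `slice_max()`, that computes both on a toy panel built from values printed in this document; `toy`, `state_totals`, and `top_state` are illustrative names.

```r
library(dplyr)

# Toy panel: violent crime rates copied from outputs in this document
# (Alabama 1977-1979 and North Dakota 1984-1986).
toy <- data.frame(
  state   = rep(c("Alabama", "North Dakota"), each = 3),
  year    = c(1977, 1978, 1979, 1984, 1985, 1986),
  violent = c(414.4, 419.1, 413.3, 53.6, 47.0, 51.3)
)

state_totals <- toy %>%
  group_by(state) %>%
  summarise(
    total_violent = sum(violent),   # sum of yearly rates, as in the pipeline above
    mean_violent  = mean(violent)   # average yearly rate
  )

# slice_max() keeps the n rows with the largest value of the ordering column.
top_state <- state_totals %>% slice_max(total_violent, n = 1)
print(top_state)
```

On the full data, `slice_max(total_violent, n = 20)` would select the same 20 states as `arrange()` followed by `tail(20)`, only listed in descending order.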
# create a new variable to distinguish the States with the highest crime rate
StateCrime <- crimesdata %>% mutate(CrimeStatus = ifelse(violent < 19200, 'Moderate Crime Volume', 'High Crime Volume'))
StateCrime
## # A tibble: 20 × 3
## state violent CrimeStatus
## <chr> <dbl> <chr>
## 1 Alabama 12838 Moderate Crime Volume
## 2 Delaware 12980. Moderate Crime Volume
## 3 Tennessee 13356. Moderate Crime Volume
## 4 Missouri 13401 Moderate Crime Volume
## 5 Georgia 13698. Moderate Crime Volume
## 6 Alaska 13726. Moderate Crime Volume
## 7 Arizona 13986. Moderate Crime Volume
## 8 Texas 14091. Moderate Crime Volume
## 9 Massachusetts 14184. Moderate Crime Volume
## 10 Michigan 15990. Moderate Crime Volume
## 11 New Mexico 17109 Moderate Crime Volume
## 12 Nevada 17366. Moderate Crime Volume
## 13 Louisiana 17904. Moderate Crime Volume
## 14 South Carolina 18406. Moderate Crime Volume
## 15 Illinois 19048. Moderate Crime Volume
## 16 Maryland 19634. High Crime Volume
## 17 California 20182. High Crime Volume
## 18 New York 21650. High Crime Volume
## 19 Florida 22982. High Crime Volume
## 20 District of Columbia 47126. High Crime Volume
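The 19,200 cutoff is hard-coded. As a hedged alternative, base R's `cut()` expresses the same split with explicit breaks and extends naturally to more than two bands. The sketch uses six totals rounded from the table above.

```r
# Six of the state totals, rounded from the table above.
totals <- c(Illinois = 19048, Maryland = 19634, California = 20182,
            `New York` = 21650, Florida = 22982, `District of Columbia` = 47126)

# right = FALSE makes the intervals closed on the left: [-Inf, 19200) and [19200, Inf),
# which matches ifelse(violent < 19200, ...).
crime_status <- cut(
  totals,
  breaks = c(-Inf, 19200, Inf),
  labels = c("Moderate Crime Volume", "High Crime Volume"),
  right  = FALSE
)

print(table(crime_status))
```

Adding a third break, for example at a quantile of the totals, would only require one more entry in `breaks` and `labels`.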
# Bar plot of the 20 states, ordered by violent crime total
crimesdata %>% mutate(state = factor(state, state)) %>%
  ggplot(aes(x = state, y = violent)) +
  geom_bar(stat = "identity", fill = "#69b3a2") +
  coord_flip() +
  theme_ipsum() +
  theme(
    panel.grid.minor.y = element_blank(),
    panel.grid.major.y = element_blank(),
    legend.position = "none"
  ) +
  xlab("Top 20 States with the Most Gun Violence") +
  ggtitle("Highest Rated States for Gun Violence") +
  ylab("Violent crime rate (incidents per 100,000), summed over 1977-1999")
This bar chart shows, for the 20 states with the highest totals, the violent crime rate summed over 1977 to 1999. In the 1990s, the District of Columbia was greatly affected by a large increase in crime. According to Wikipedia, the number of homicides in Washington peaked in 1991 at 482, a rate of 80.6 homicides per 100,000 residents, and the city eventually became known as the “murder capital” of the United States.
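The "per 100,000" rates used throughout are counts divided by population and scaled by 100,000. As a small worked example, the two figures quoted above imply a population for Washington in 1991. The variable names are illustrative, and the result is only what those two numbers imply, not a census figure.

```r
homicides <- 482         # count quoted above for 1991
rate_per_100k <- 80.6    # rate quoted above

# rate = count / population * 100000, so population = count / rate * 100000
implied_population <- homicides / rate_per_100k * 100000

print(round(implied_population))
```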
We now wrangle the data to see which state has the highest murder rate under each gun law regime. Some states restrict who may carry a firearm, while others give people more flexibility to own and carry a gun. The law variable records whether a shall carry law was in effect in a given state and year.
References : https://en.wikipedia.org/wiki/Crime_in_Washington,_D.C.
data <- gun %>% select(year,violent, murder,prisoners,afam,cauc,income, state,law) %>% arrange(violent,murder,prisoners)
murder_with_no_license <- data %>% select(state,law,murder) %>% filter(law == 'no') %>%
summarise( aggregate(list(murder = murder), # list function
list(state = state),
sum)) %>% arrange(murder)
murder_with_license <- data %>% select(state,law,murder) %>% filter(law == 'yes') %>%
summarise( aggregate(list(murder = murder), # list function
list(state = state),
sum)) %>% arrange(murder)
murder_with_no_license
## state murder
## 1 North Dakota 12.1
## 2 Maine 13.9
## 3 South Dakota 16.8
## 4 Utah 35.0
## 5 Iowa 46.3
## 6 Idaho 50.7
## 7 Montana 59.6
## 8 Minnesota 61.4
## 9 Oregon 68.2
## 10 West Virginia 73.1
## 11 Pennsylvania 73.3
## 12 Nebraska 74.8
## 13 Massachusetts 78.3
## 14 Wisconsin 81.4
## 15 Rhode Island 83.2
## 16 Wyoming 88.7
## 17 Hawaii 106.5
## 18 Connecticut 109.9
## 19 Delaware 111.6
## 20 New Jersey 121.7
## 21 Kansas 128.5
## 22 Colorado 132.5
## 23 Florida 133.6
## 24 Ohio 137.5
## 25 Arizona 152.2
## 26 Kentucky 153.0
## 27 Virginia 154.5
## 28 Oklahoma 158.4
## 29 Georgia 162.4
## 30 Mississippi 164.0
## 31 Arkansas 174.1
## 32 Tennessee 174.4
## 33 North Carolina 183.2
## 34 Alaska 184.3
## 35 South Carolina 202.8
## 36 Missouri 209.4
## 37 Louisiana 218.0
## 38 Michigan 222.4
## 39 Illinois 223.6
## 40 Maryland 231.8
## 41 New Mexico 234.1
## 42 Nevada 238.7
## 43 New York 245.3
## 44 Alabama 249.0
## 45 California 252.8
## 46 Texas 265.0
## 47 District of Columbia 1133.3
murder_with_license
## state murder
## 1 Kentucky 15.8
## 2 Wyoming 16.0
## 3 North Dakota 17.3
## 4 Texas 19.7
## 5 South Carolina 23.0
## 6 Virginia 26.6
## 7 Oklahoma 26.7
## 8 Idaho 27.5
## 9 South Dakota 27.6
## 10 Montana 28.6
## 11 North Carolina 32.1
## 12 Arkansas 32.4
## 13 Oregon 36.3
## 14 Utah 38.3
## 15 Maine 38.5
## 16 Alaska 40.7
## 17 Arizona 43.2
## 18 Nevada 43.7
## 19 Tennessee 45.2
## 20 New Hampshire 48.9
## 21 West Virginia 52.0
## 22 Vermont 54.6
## 23 Pennsylvania 60.0
## 24 Georgia 98.2
## 25 Florida 102.7
## 26 Washington 109.8
## 27 Mississippi 110.0
## 28 Louisiana 131.2
## 29 Indiana 159.3
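The two tables sum the murder rate over however many years each state spent under each regime, so a state with more years in one group gets a larger total there. A minimal dplyr sketch that reports the number of years and the mean rate alongside the sum is shown below. It uses only the ten North Dakota rows printed in the `head(10)` output later in this document, so its totals will not match the full-data tables; `toy` and `by_law` are illustrative names.

```r
library(dplyr)

# Ten North Dakota rows copied from the head(10) output later in this document.
toy <- data.frame(
  state  = "North Dakota",
  year   = c(1985, 1986, 1984, 1983, 1980, 1987, 1988, 1979, 1982, 1989),
  law    = c("no", "yes", "no", "no", "no", "yes", "yes", "no", "no", "yes"),
  murder = c(1.0, 1.0, 1.2, 2.1, 1.2, 1.5, 1.8, 1.5, 0.7, 0.6)
)

by_law <- toy %>%
  group_by(state, law) %>%
  summarise(
    years       = n(),           # number of state-years in this regime
    total       = sum(murder),   # the quantity the tables above report
    mean_murder = mean(murder),  # average yearly rate, comparable across regimes
    .groups = "drop"
  )

print(by_law)
```

Comparing `mean_murder` rather than `total` removes the effect of unequal numbers of years in each group.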
Summed murder rates by state for years without a shall carry law (law == 'no'):
hist(murder_with_no_license$murder, xlab = "Summed murder rate, no shall carry law",
     col = 'yellow', border = 'blue', main = "Distribution of Murder by Gun")
Summed murder rates by state for years with a shall carry law (law == 'yes'):
hist(murder_with_license$murder, xlab = "Summed murder rate, shall carry law in effect",
     col = 'yellow', border = 'blue', main = "Distribution of Murder by Gun")
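The summed murder rates are larger for years without a shall carry law than
for years with one. Part of that gap comes from coverage: 47 states appear in
the first table and only 29 in the second, and each sum depends on how many
years a state spent under that regime. The scatter plots below compare the
murder rate with the African-American and Caucasian population shares. The
data set records population percentages, not who committed the murders.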
# Murder rate versus percent of state population that is African-American
ggplot(data, aes(x = murder, y = afam, alpha = state)) +
  geom_point(size = 6, color = "#69b3a2") +
  theme_ipsum()
# Murder rate versus percent of state population that is Caucasian
ggplot(data, aes(x = murder, y = cauc, alpha = state)) +
  geom_point(size = 6, color = "#69b3a2") +
  theme_ipsum()
# Convert the integer year to a Date (January 1 of that year).
# mdy() expects month-day-year input and returns NA for a bare year, so use make_date().
data %>% mutate(new_year = make_date(year)) %>% head(10)
## year violent murder prisoners afam cauc income state law
## 1 1985 47.0 1.0 54 1.636063 67.49587 11862.990 North Dakota no
## 2 1986 51.3 1.0 55 1.683373 67.07564 11931.530 North Dakota yes
## 3 1984 53.6 1.2 51 1.588102 67.86971 11802.410 North Dakota no
## 4 1983 53.7 2.1 47 1.536609 68.22214 11386.000 North Dakota no
## 5 1980 54.0 1.2 19 1.433876 69.05927 9786.855 North Dakota no
## 6 1987 56.8 1.5 53 1.741012 66.72974 11896.500 North Dakota yes
## 7 1988 59.1 1.8 57 1.817105 66.64963 10731.290 North Dakota yes
## 8 1979 61.3 1.5 21 1.401761 69.25468 11522.060 North Dakota no
## 9 1982 61.8 0.7 33 1.502760 68.53995 11341.680 North Dakota no
## 10 1989 63.2 0.6 62 1.885228 66.47451 11528.670 North Dakota yes
## new_year
## 1 1985-01-01
## 2 1986-01-01
## 3 1984-01-01
## 4 1983-01-01
## 5 1980-01-01
## 6 1987-01-01
## 7 1988-01-01
## 8 1979-01-01
## 9 1982-01-01
## 10 1989-01-01
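Because `year` holds a bare four-digit integer rather than a full date, it has to be anchored to a month and day before it can become a Date. A minimal base R sketch, assuming January 1 is an acceptable anchor (`years` and `year_as_date` are illustrative names):

```r
years <- c(1985L, 1986L, 1984L)   # first three year values from the output above

# Anchor each year to January 1 so that it forms a valid ISO date string.
year_as_date <- as.Date(paste0(years, "-01-01"))

print(year_as_date)
print(class(year_as_date))
```

For plotting yearly data the integer year works on the x axis as is, so the conversion is only needed when date arithmetic or date scales are wanted.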
# Evolution of gun violence in America
# Note: data is sorted by violent, so tail(10) keeps the 10 rows with the highest violent crime rate
data %>%
  tail(10) %>%
  ggplot(aes(x = year, y = murder)) +
  geom_line(color = "grey") +
  geom_point(shape = 21, color = "black", fill = "#69b3a2", size = 6) +
  theme_ipsum() +
  ggtitle("Evolution of Gun Violence in America")
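A trend over time needs one value per year. The sketch below shows one way to get that, assuming an unweighted average of state rates is acceptable (this is not the same as a national rate, which would weight states by population). The toy values are murder rates printed in this document for the three years that the Alabama and North Dakota outputs share.

```r
# Murder rates copied from this document's outputs for years both states share.
toy <- data.frame(
  state  = rep(c("Alabama", "North Dakota"), each = 3),
  year   = rep(c(1979, 1980, 1982), times = 2),
  murder = c(13.2, 13.2, 10.6, 1.5, 1.2, 0.7)
)

# Average the murder rate across states within each year.
yearly <- aggregate(murder ~ year, data = toy, FUN = mean)

print(yearly)
```

Passing a data frame like `yearly`, computed on the full data, to the `ggplot()` call would give one point per year from 1977 to 1999.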
We create a correlation matrix to identify all possible strong relationships in the data set.
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
corPlot(data[,2:7],main = "Correlation of Gun violence features")
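The numbers behind a plot like this can be inspected directly with base R's `cor()`, which returns the Pearson correlation matrix by default. A minimal sketch on the first six rows of the data (Alabama, 1977 to 1982, copied from the `head()` output). With only six rows from one state, the values illustrate the mechanics and say nothing about the full panel.

```r
# First six rows of three numeric columns, copied from the head() output.
toy <- data.frame(
  violent   = c(414.4, 419.1, 413.3, 448.5, 470.5, 447.7),
  murder    = c(14.2, 13.3, 13.2, 13.2, 11.9, 10.6),
  prisoners = c(83, 94, 144, 141, 149, 183)
)

# Pairwise Pearson correlations: symmetric, with ones on the diagonal.
cor_matrix <- cor(toy)

print(round(cor_matrix, 2))
```

Running `cor(data[, 2:7])` on the full data would give the coefficients to quote when interpreting the plot.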
The correlation matrix shows the relationships among the numeric variables. Violent crime is correlated with murder, and both have strong links with the incarceration rate and with the African-American share of the population. The data track murder rates from 1977 to 1999. The Caucasian share of the population does not show a strong relationship with violent crime, murder, or incarceration, which is not surprising given the historical context.

Our question was which state has the most violent crime in America and how murder is correlated with the American population. As the bar plot shows, the District of Columbia, famously called the “murder capital” in that period, has by far the highest violent crime total. According to the correlation matrix, states with a larger African-American population share tend to have higher murder, violent crime, and incarceration rates. Because these are state-level percentages, they describe where crime is concentrated rather than who the victims or offenders are. Murder is most closely associated with violent crime.
My GitHub account is joewarner89.
gun_data <- read.csv("https://raw.githubusercontent.com/joewarner89/Gun-/main/Guns.csv",
header = T, stringsAsFactors = F, sep = ",")
gun_data$X <- NULL
head(gun_data)
## year violent murder robbery prisoners afam cauc male population
## 1 1977 414.4 14.2 96.8 83 8.384873 55.12291 18.17441 3.780403
## 2 1978 419.1 13.3 99.1 94 8.352101 55.14367 17.99408 3.831838
## 3 1979 413.3 13.2 109.5 144 8.329575 55.13586 17.83934 3.866248
## 4 1980 448.5 13.2 132.1 141 8.408386 54.91259 17.73420 3.900368
## 5 1981 470.5 11.9 126.5 149 8.483435 54.92513 17.67372 3.918531
## 6 1982 447.7 10.6 112.0 183 8.514000 54.89621 17.51052 3.925229
## income density state law
## 1 9563.148 0.0745524 Alabama no
## 2 9932.000 0.0755667 Alabama no
## 3 9877.028 0.0762453 Alabama no
## 4 9541.428 0.0768288 Alabama no
## 5 9548.351 0.0771866 Alabama no
## 6 9478.919 0.0773185 Alabama no