1. Data Exploration: This should include summary statistics, means, medians, quartiles, or any other relevant information about the data set. Please include some conclusions in the R Markdown text.

The data set Guns is a balanced panel of data on 50 US states, plus the District of Columbia (51 units in total), by year for 1977–1999.

The data set contains 13 variables and 1,173 observations (51 units × 23 years).
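A balanced panel has exactly one row per state per year, so the row count should equal 51 × 23 = 1,173. The base-R sketch below checks both facts. `is_balanced()` is a hypothetical helper written for this illustration, and the toy panel only has made-up unit labels (no real values).

```r
# Expected size of the Guns panel: 51 units (50 states + DC) x 23 years (1977-1999)
n_units <- 51
years <- 1977:1999
expected_rows <- n_units * length(years)
expected_rows  # 1173

# Hypothetical helper: TRUE when every unit appears exactly once in every year
is_balanced <- function(df, unit = "state", time = "year") {
  counts <- table(df[[unit]], df[[time]])
  all(counts == 1)
}

# Hypothetical toy panel (structure only, no real values): 2 units x 3 years
toy <- expand.grid(state = c("A", "B"), year = 1977:1979)
is_balanced(toy)        # TRUE
is_balanced(toy[-1, ])  # FALSE: unit "A" is missing in 1977
```

On the full data, `is_balanced(gun)` would be expected to return `TRUE` if the panel is balanced as its documentation states.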

- state: factor indicating state.
- year: factor indicating year.
- violent: violent crime rate (incidents per 100,000 members of the population).
- murder: murder rate (incidents per 100,000).
- robbery: robbery rate (incidents per 100,000).
- prisoners: incarceration rate in the state in the previous year (sentenced prisoners per 100,000 residents; value for the previous year).
- afam: percent of state population that is African-American, ages 10 to 64.
- cauc: percent of state population that is Caucasian, ages 10 to 64.
- male: percent of state population that is male, ages 10 to 29.
- population: state population, in millions of people.
- income: real per capita personal income in the state (US dollars).
- density: population per square mile of land area, divided by 1,000.
- law: factor. Does the state have a shall carry law in effect in that year?

# Libraries
# tidyverse has the data manipulation packages. 

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(hrbrthemes)
## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
##       Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
##       if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
library(kableExtra)
## 
## Attaching package: 'kableExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     group_rows
options(knitr.table.format = "html")
gun <- read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/AER/Guns.csv",
                header = TRUE, stringsAsFactors = FALSE)
# Drop the row-index column that the CSV import names "X"
gun$X <- NULL
head(gun)
##   year violent murder robbery prisoners     afam     cauc     male population
## 1 1977   414.4   14.2    96.8        83 8.384873 55.12291 18.17441   3.780403
## 2 1978   419.1   13.3    99.1        94 8.352101 55.14367 17.99408   3.831838
## 3 1979   413.3   13.2   109.5       144 8.329575 55.13586 17.83934   3.866248
## 4 1980   448.5   13.2   132.1       141 8.408386 54.91259 17.73420   3.900368
## 5 1981   470.5   11.9   126.5       149 8.483435 54.92513 17.67372   3.918531
## 6 1982   447.7   10.6   112.0       183 8.514000 54.89621 17.51052   3.925229
##     income   density   state law
## 1 9563.148 0.0745524 Alabama  no
## 2 9932.000 0.0755667 Alabama  no
## 3 9877.028 0.0762453 Alabama  no
## 4 9541.428 0.0768288 Alabama  no
## 5 9548.351 0.0771866 Alabama  no
## 6 9478.919 0.0773185 Alabama  no
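The variable descriptions call state, year and law factors, but with string-to-factor conversion turned off the import keeps state and law as character and year as integer. A minimal sketch of the conversion, assuming base R only; `to_factors()` is a hypothetical helper, demonstrated on the six Alabama rows printed above.

```r
# Six rows copied from the head(gun) output above (Alabama, 1977-1982)
rows <- data.frame(
  state = rep("Alabama", 6),
  year = 1977:1982,
  murder = c(14.2, 13.3, 13.2, 13.2, 11.9, 10.6),
  law = rep("no", 6),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn the identifier columns into factors
to_factors <- function(df) {
  df$state <- factor(df$state)
  df$year <- factor(df$year)
  # Fix the level order so "no" is the first (reference) level even if "yes" never occurs
  df$law <- factor(df$law, levels = c("no", "yes"))
  df
}

rows_f <- to_factors(rows)
str(rows_f)
```

Later steps in this document plot year as a number, so one might keep a numeric copy of year alongside the factor version.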

Exploratory Data Analysis

Our exploratory data analysis generates summary information about the data set and investigates why so many murders are associated with gun violence. The goal is to gather insights and make better sense of the data.

##       year         violent           murder          robbery      
##  Min.   :1977   Min.   :  47.0   Min.   : 0.200   Min.   :   6.4  
##  1st Qu.:1982   1st Qu.: 283.1   1st Qu.: 3.700   1st Qu.:  71.1  
##  Median :1988   Median : 443.0   Median : 6.400   Median : 124.1  
##  Mean   :1988   Mean   : 503.1   Mean   : 7.665   Mean   : 161.8  
##  3rd Qu.:1994   3rd Qu.: 650.9   3rd Qu.: 9.800   3rd Qu.: 192.7  
##  Max.   :1999   Max.   :2921.8   Max.   :80.600   Max.   :1635.1  
##    prisoners           afam              cauc            male      
##  Min.   :  19.0   Min.   : 0.2482   Min.   :21.78   Min.   :12.21  
##  1st Qu.: 114.0   1st Qu.: 2.2022   1st Qu.:59.94   1st Qu.:14.65  
##  Median : 187.0   Median : 4.0262   Median :65.06   Median :15.90  
##  Mean   : 226.6   Mean   : 5.3362   Mean   :62.95   Mean   :16.08  
##  3rd Qu.: 291.0   3rd Qu.: 6.8507   3rd Qu.:69.20   3rd Qu.:17.53  
##  Max.   :1913.0   Max.   :26.9796   Max.   :76.53   Max.   :22.35  
##    population          income         density             state          
##  Min.   : 0.4027   Min.   : 8555   Min.   : 0.000707   Length:1173       
##  1st Qu.: 1.1877   1st Qu.:11935   1st Qu.: 0.031911   Class :character  
##  Median : 3.2713   Median :13402   Median : 0.081569   Mode  :character  
##  Mean   : 4.8163   Mean   :13725   Mean   : 0.352038                     
##  3rd Qu.: 5.6856   3rd Qu.:15271   3rd Qu.: 0.177718                     
##  Max.   :33.1451   Max.   :23647   Max.   :11.102120                     
##      law           
##  Length:1173       
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

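The mean murder rate is 7.67 per 100,000 members of the population and the median is 6.4. The mean incarceration rate is 226.58 sentenced prisoners per 100,000 residents and the median is 187. The mean percentage of the state population that is African-American (ages 10 to 64) is 5.34 and the median is 4.03. Mean real per capita income is $13,724.80 and the median is $13,401.55. The mean violent crime rate is 503.07 incidents per 100,000 members of the population and the median is 443. The mean percentage of the state population that is Caucasian (ages 10 to 64) is 62.95 and the median is 65.06. For murder, violent crime and prisoners, the mean is well above the median and the maximum is far above the third quartile (for example, a maximum murder rate of 80.6 against a third quartile of 9.8), so these distributions are strongly right-skewed by a few extreme state-years. The quartiles are printed at the end of the output below.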
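One way to make the skewness claim concrete is Tukey's 1.5 × IQR rule, the same convention `boxplot()` uses by default for its whiskers (although `boxplot()` computes its hinges slightly differently from `quantile()`). This sketch applies the rule to the quartiles and maxima copied from the summary output above.

```r
# Quartiles and maxima copied from the summary(gun) output above
q <- data.frame(
  variable = c("murder", "violent", "prisoners"),
  q1  = c(3.7, 283.1, 114),
  q3  = c(9.8, 650.9, 291),
  max = c(80.6, 2921.8, 1913)
)

# Tukey's rule: values above Q3 + 1.5 * IQR are flagged as outliers
q$iqr <- q$q3 - q$q1
q$upper_fence <- q$q3 + 1.5 * q$iqr
q$max_is_outlier <- q$max > q$upper_fence
print(q)
```

The upper fences come out at 18.95, 1202.6 and 556.5, and every maximum lies far beyond its fence.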

# Means, medians and quartiles of the key variables
# Understanding the variables
# murder ----- murder rate (incidents per 100,000).
# prisoners ---- incarceration rate in the state in the previous year (sentenced prisoners per 100,000 residents; value for the previous year).
apply(gun[c("murder","prisoners","afam","income","violent","cauc")],MARGIN=2, FUN = mean)
##       murder    prisoners         afam       income      violent         cauc 
##     7.665132   226.579710     5.336217 13724.796066   503.074680    62.945432
# afam --- percent of state population that is African-American, ages 10 to 64
# income -- real per capita personal income in the state (US dollars).
# violent ---- violent crime rate (incidents per 100,000 members of the population).
# cauc ----- percent of state population that is Caucasian, ages 10 to 64.
apply(gun[c("murder","afam","income","violent","cauc","prisoners")],MARGIN=2, FUN = median)
##       murder         afam       income      violent         cauc    prisoners 
##     6.400000     4.026213 13401.550000   443.000000    65.061280   187.000000
apply(gun[c("murder","prisoners","afam","cauc","income","violent")],MARGIN=2, FUN = quantile)
##      murder prisoners       afam     cauc    income violent
## 0%      0.2        19  0.2482066 21.78043  8554.884    47.0
## 25%     3.7       114  2.2021960 59.93970 11934.760   283.1
## 50%     6.4       187  4.0262130 65.06128 13401.550   443.0
## 75%     9.8       291  6.8506730 69.20010 15271.010   650.9
## 100%   80.6      1913 26.9795700 76.52575 23646.710  2921.8
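The three `apply()` calls above can be folded into one table with `sapply()`, which returns one column per variable and one row per statistic. `describe_cols()` is a hypothetical helper; it is shown on the six Alabama rows from `head(gun)` so that it runs on its own.

```r
# Hypothetical helper: mean plus the five quantiles for each selected column
describe_cols <- function(df, cols) {
  sapply(df[cols], function(x) {
    c(mean = mean(x, na.rm = TRUE),
      quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE))
  })
}

# Six Alabama rows copied from the head(gun) output above
alabama <- data.frame(
  murder    = c(14.2, 13.3, 13.2, 13.2, 11.9, 10.6),
  prisoners = c(83, 94, 144, 141, 149, 183)
)
res <- describe_cols(alabama, c("murder", "prisoners"))
round(res, 2)
```

On the full data the call would be `describe_cols(gun, c("murder", "prisoners", "afam", "cauc", "income", "violent"))`.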

2. Data wrangling: Please perform some basic transformations. They will need to make sense but could include column renaming, creating a subset of the data, replacing values, or creating new columns with derived data (for example – if it makes sense you could sum two columns together)

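Our purpose is to discover which state has the most violent crime in America and how murder is correlated with the characteristics of the American population. We create a subset of the 20 states with the highest violent crime, ranking each state by the sum of its annual violent crime rates over 1977–1999.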
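A sum of annual rates is not a count of incidents, but because every state contributes the same number of years (23) to this balanced panel, ranking by the total gives the same order as ranking by the average annual rate, which is easier to interpret. The sketch below illustrates this with six years per state copied from rows printed in this document (Alabama 1977–1982 from `head(gun)`, and six North Dakota years from the sorted `data` output in section 4).

```r
# Violent crime rates copied from rows printed in this document
panel <- data.frame(
  state = rep(c("Alabama", "North Dakota"), each = 6),
  violent = c(414.4, 419.1, 413.3, 448.5, 470.5, 447.7,
              61.3, 54.0, 61.8, 53.7, 53.6, 47.0)
)

totals <- aggregate(violent ~ state, data = panel, FUN = sum)
averages <- aggregate(violent ~ state, data = panel, FUN = mean)

# With the same number of years per state, total = 6 * average,
# so both orderings rank the states identically
identical(order(totals$violent), order(averages$violent))
totals
```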

crimesdata <- gun %>%
  filter(!is.na(violent)) %>%
  group_by(state) %>%
  # Sum each state's annual violent crime rates over 1977-1999
  summarise(violent = sum(violent)) %>%
  # Sort ascending and keep the 20 states with the largest totals
  arrange(violent) %>%
  tail(20)

# Look at the six states with the highest cumulative violent crime rate
# (crimesdata is already sorted in ascending order)
tail(crimesdata)
## # A tibble: 6 × 2
##   state                violent
##   <chr>                  <dbl>
## 1 Illinois              19048.
## 2 Maryland              19634.
## 3 California            20182.
## 4 New York              21650.
## 5 Florida               22982.
## 6 District of Columbia  47126.
# Flag states whose cumulative violent crime rate is at least 19,200 as high volume
StateCrime <- crimesdata %>% mutate(CrimeStatus = ifelse(violent < 19200, 'Moderate Crime Volume', 'High Crime Volume')) 
StateCrime
## # A tibble: 20 × 3
##    state                violent CrimeStatus          
##    <chr>                  <dbl> <chr>                
##  1 Alabama               12838  Moderate Crime Volume
##  2 Delaware              12980. Moderate Crime Volume
##  3 Tennessee             13356. Moderate Crime Volume
##  4 Missouri              13401  Moderate Crime Volume
##  5 Georgia               13698. Moderate Crime Volume
##  6 Alaska                13726. Moderate Crime Volume
##  7 Arizona               13986. Moderate Crime Volume
##  8 Texas                 14091. Moderate Crime Volume
##  9 Massachusetts         14184. Moderate Crime Volume
## 10 Michigan              15990. Moderate Crime Volume
## 11 New Mexico            17109  Moderate Crime Volume
## 12 Nevada                17366. Moderate Crime Volume
## 13 Louisiana             17904. Moderate Crime Volume
## 14 South Carolina        18406. Moderate Crime Volume
## 15 Illinois              19048. Moderate Crime Volume
## 16 Maryland              19634. High Crime Volume    
## 17 California            20182. High Crime Volume    
## 18 New York              21650. High Crime Volume    
## 19 Florida               22982. High Crime Volume    
## 20 District of Columbia  47126. High Crime Volume
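The same two-level flag can be written with `cut()`, which extends naturally to more than two categories and returns a factor whose levels run from low to high (useful for legend order in plots). This sketch uses the rounded totals of the six states shown by `tail(crimesdata)` above.

```r
# Cumulative violent crime rates for the six states shown by tail(crimesdata) (rounded)
totals <- c("Illinois" = 19048, "Maryland" = 19634, "California" = 20182,
            "New York" = 21650, "Florida" = 22982, "District of Columbia" = 47126)

# right = FALSE makes the bins [-Inf, 19200) and [19200, Inf),
# matching ifelse(violent < 19200, ...) above
status <- cut(totals, breaks = c(-Inf, 19200, Inf), right = FALSE,
              labels = c("Moderate Crime Volume", "High Crime Volume"))
data.frame(state = names(totals), violent = unname(totals), CrimeStatus = status)
```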

3. Graphics: Please make sure to display at least one scatter plot, box plot and histogram. Don’t be limited to this. Please explore the many other options in R packages such as ggplot2.

# Bar plot of the 20 states with the highest cumulative violent crime rate
crimesdata %>%
  # Keep the ascending order from arrange() instead of alphabetical order
  mutate(state = factor(state, levels = state)) %>%
  ggplot(aes(x = state, y = violent)) +
  geom_col(fill = "#69b3a2") +
  coord_flip() +
  theme_ipsum() +
  theme(
    panel.grid.minor.y = element_blank(),
    panel.grid.major.y = element_blank(),
    legend.position = "none"
  ) +
  xlab("State") +
  ylab("Sum of annual violent crime rates, 1977-1999 (incidents per 100,000)") +
  ggtitle("20 States with the Highest Violent Crime Rates")

This bar plot shows, for each of the 20 states, the sum of its annual violent crime rates from 1977 to 1999. The District of Columbia stands far above every other state: its total (47,126) is about twice that of Florida (22,982), the next highest. During this period the District was greatly affected by a large increase in crime. According to Wikipedia, the number of homicides in Washington peaked in 1991 at 482, a rate of 80.6 homicides per 100,000 residents, and the city eventually became known as the “murder capital” of the United States. That rate matches the maximum murder rate in the data set (80.6).

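Next, we wrangle the data to see which states have the highest murder rates with and without a shall-carry law. The law variable records whether a state had such a law in effect in a given year; under a shall-carry law, authorities must issue a concealed-carry permit to applicants who meet basic legal requirements, whereas other states give authorities more discretion to refuse one.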

Reference: https://en.wikipedia.org/wiki/Crime_in_Washington,_D.C.

# Keep the columns used in the rest of the analysis, sorted by violent crime, murder and prisoners
data <- gun %>%
  select(year, violent, murder, prisoners, afam, cauc, income, state, law) %>%
  arrange(violent, murder, prisoners)

# Sum each state's annual murder rates over the years WITHOUT a shall-carry law
murder_with_no_license <- data %>%
  filter(law == "no") %>%
  aggregate(murder ~ state, data = ., FUN = sum) %>%
  arrange(murder)

# Same summary over the years WITH a shall-carry law
murder_with_license <- data %>%
  filter(law == "yes") %>%
  aggregate(murder ~ state, data = ., FUN = sum) %>%
  arrange(murder)
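The two filtered pipelines above can also be produced by a single `aggregate()` call that groups by both state and law, giving one row per state and law status. This sketch uses the ten North Dakota rows printed in the `head(10)` output of section 4.

```r
# Murder rates and law status of the ten North Dakota rows printed in section 4
nd <- data.frame(
  state  = "North Dakota",
  murder = c(1.0, 1.0, 1.2, 2.1, 1.2, 1.5, 1.8, 1.5, 0.7, 0.6),
  law    = c("no", "yes", "no", "no", "no", "yes", "yes", "no", "no", "yes")
)

# One row per (state, law) combination, summing the annual murder rates
by_state_law <- aggregate(murder ~ state + law, data = nd, FUN = sum)
by_state_law
```

On the full data, `aggregate(murder ~ state + law, data = data, FUN = sum)` would return both tables stacked in one data frame.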

murder_with_no_license
##                   state murder
## 1          North Dakota   12.1
## 2                 Maine   13.9
## 3          South Dakota   16.8
## 4                  Utah   35.0
## 5                  Iowa   46.3
## 6                 Idaho   50.7
## 7               Montana   59.6
## 8             Minnesota   61.4
## 9                Oregon   68.2
## 10        West Virginia   73.1
## 11         Pennsylvania   73.3
## 12             Nebraska   74.8
## 13        Massachusetts   78.3
## 14            Wisconsin   81.4
## 15         Rhode Island   83.2
## 16              Wyoming   88.7
## 17               Hawaii  106.5
## 18          Connecticut  109.9
## 19             Delaware  111.6
## 20           New Jersey  121.7
## 21               Kansas  128.5
## 22             Colorado  132.5
## 23              Florida  133.6
## 24                 Ohio  137.5
## 25              Arizona  152.2
## 26             Kentucky  153.0
## 27             Virginia  154.5
## 28             Oklahoma  158.4
## 29              Georgia  162.4
## 30          Mississippi  164.0
## 31             Arkansas  174.1
## 32            Tennessee  174.4
## 33       North Carolina  183.2
## 34               Alaska  184.3
## 35       South Carolina  202.8
## 36             Missouri  209.4
## 37            Louisiana  218.0
## 38             Michigan  222.4
## 39             Illinois  223.6
## 40             Maryland  231.8
## 41           New Mexico  234.1
## 42               Nevada  238.7
## 43             New York  245.3
## 44              Alabama  249.0
## 45           California  252.8
## 46                Texas  265.0
## 47 District of Columbia 1133.3
murder_with_license
##             state murder
## 1        Kentucky   15.8
## 2         Wyoming   16.0
## 3    North Dakota   17.3
## 4           Texas   19.7
## 5  South Carolina   23.0
## 6        Virginia   26.6
## 7        Oklahoma   26.7
## 8           Idaho   27.5
## 9    South Dakota   27.6
## 10        Montana   28.6
## 11 North Carolina   32.1
## 12       Arkansas   32.4
## 13         Oregon   36.3
## 14           Utah   38.3
## 15          Maine   38.5
## 16         Alaska   40.7
## 17        Arizona   43.2
## 18         Nevada   43.7
## 19      Tennessee   45.2
## 20  New Hampshire   48.9
## 21  West Virginia   52.0
## 22        Vermont   54.6
## 23   Pennsylvania   60.0
## 24        Georgia   98.2
## 25        Florida  102.7
## 26     Washington  109.8
## 27    Mississippi  110.0
## 28      Louisiana  131.2
## 29        Indiana  159.3

Distribution of each state's summed murder rate over the years without a shall-carry law

hist(murder_with_no_license$murder,
     xlab = "Sum of annual murder rates per 100,000 (years without a shall-carry law)",
     col = "yellow", border = "blue", main = "Distribution of Summed Murder Rates by State")

Distribution of each state's summed murder rate over the years with a shall-carry law

hist(murder_with_license$murder,
     xlab = "Sum of annual murder rates per 100,000 (years with a shall-carry law)",
     col = "yellow", border = "blue", main = "Distribution of Summed Murder Rates by State")
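A box plot of the annual murder rate per state-year, split by law status, gives a side-by-side comparison that does not depend on how many years each state spent under each regime. This base-R sketch uses the ten North Dakota rows printed in section 4. `plot = FALSE` returns the box statistics without drawing; removing it draws the plot. With ggplot2, `geom_boxplot()` produces the same kind of comparison.

```r
# Murder rates and law status of the ten North Dakota rows printed in section 4
nd <- data.frame(
  murder = c(1.0, 1.0, 1.2, 2.1, 1.2, 1.5, 1.8, 1.5, 0.7, 0.6),
  law = c("no", "yes", "no", "no", "no", "yes", "yes", "no", "no", "yes")
)

# plot = FALSE returns the statistics only
b <- boxplot(murder ~ law, data = nd, plot = FALSE)
b$stats  # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
b$names  # group order: "no", "yes"
```

On the full data, `boxplot(murder ~ law, data = data)` would draw one box per law status.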

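The summed murder rates are larger for years without a shall-carry law than for years with one, but the two totals are not directly comparable. Each is a sum of annual rates, so it grows with the number of years a state spent under each regime: 22 states (counting the District of Columbia) never had such a law during 1977–1999 and appear only in the first table, while 4 states had one throughout and appear only in the second. Comparing the average murder rate per state-year would be fairer. The data also do not record who committed each murder. The scatter plots below only relate each state-year's murder rate to the percentage of its population that is African-American or Caucasian, so they cannot show whether one group committed more murders than another.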
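The effect of unequal year counts can be seen on a small scale with `tapply()`, using the ten North Dakota rows printed in section 4: the totals differ mainly because six years fall under "no" and four under "yes", while the per-year averages are close.

```r
# Murder rates and law status of the ten North Dakota rows printed in section 4
murder <- c(1.0, 1.0, 1.2, 2.1, 1.2, 1.5, 1.8, 1.5, 0.7, 0.6)
law <- c("no", "yes", "no", "no", "no", "yes", "yes", "no", "no", "yes")

totals <- tapply(murder, law, sum)     # per-group sum, as in the two tables above
years  <- tapply(murder, law, length)  # how many state-years feed each total
means  <- tapply(murder, law, mean)    # average rate per state-year
rbind(total = totals, years = years, mean = means)
```

On the full data, `tapply(data$murder, data$law, mean)` would give the average murder rate per state-year for each law status.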

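# Murder rate vs. percent of the state population that is African-American (ages 10 to 64)
# A fixed alpha replaces alpha = state, which mapped transparency to 51 state names
ggplot(data, aes(x = murder, y = afam)) +
  geom_point(size = 6, alpha = 0.4, color = "#69b3a2") +
  theme_ipsum()

# Murder rate vs. percent of the state population that is Caucasian (ages 10 to 64)
ggplot(data, aes(x = murder, y = cauc)) +
  geom_point(size = 6, alpha = 0.4, color = "#69b3a2") +
  theme_ipsum()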

4. Meaningful question for analysis: Please state at the beginning a meaningful question for analysis. Use the first three steps and anything else that would be helpful to answer the question you are posing from the data set you chose. Please write a brief conclusion paragraph in R markdown at the end

# Convert the integer year to a Date (1 January of each year);
# mdy() returned NA because year is not a month-day-year string
data %>% mutate(new_year = make_date(year)) %>% head(10)
##    year violent murder prisoners     afam     cauc    income        state law
## 1  1985    47.0    1.0        54 1.636063 67.49587 11862.990 North Dakota  no
## 2  1986    51.3    1.0        55 1.683373 67.07564 11931.530 North Dakota yes
## 3  1984    53.6    1.2        51 1.588102 67.86971 11802.410 North Dakota  no
## 4  1983    53.7    2.1        47 1.536609 68.22214 11386.000 North Dakota  no
## 5  1980    54.0    1.2        19 1.433876 69.05927  9786.855 North Dakota  no
## 6  1987    56.8    1.5        53 1.741012 66.72974 11896.500 North Dakota yes
## 7  1988    59.1    1.8        57 1.817105 66.64963 10731.290 North Dakota yes
## 8  1979    61.3    1.5        21 1.401761 69.25468 11522.060 North Dakota  no
## 9  1982    61.8    0.7        33 1.502760 68.53995 11341.680 North Dakota  no
## 10 1989    63.2    0.6        62 1.885228 66.47451 11528.670 North Dakota yes
##      new_year
## 1  1985-01-01
## 2  1986-01-01
## 3  1984-01-01
## 4  1983-01-01
## 5  1980-01-01
## 6  1987-01-01
## 7  1988-01-01
## 8  1979-01-01
## 9  1982-01-01
## 10 1989-01-01
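# Evolution of the murder rate in America: unweighted mean across the 51 units for each year
# (tail(10) of the sorted data would only keep the ten highest-violence rows)
data %>%
  group_by(year) %>%
  summarise(murder = mean(murder)) %>%
  ggplot(aes(x = year, y = murder)) +
  geom_line(color = "grey") +
  geom_point(shape = 21, color = "black", fill = "#69b3a2", size = 6) +
  theme_ipsum() +
  ggtitle("Evolution of the Average Murder Rate in America, 1977-1999")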

We create a correlation matrix to identify strong pairwise relationships among the numeric variables.

library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
corPlot(data[,2:7],main = "Correlation of Gun violence features")
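`corPlot()` draws the matrix as a heat map. To read the coefficients as numbers, base R's `cor()` returns the same kind of Pearson correlation matrix. The sketch below runs on the six Alabama rows from `head(gun)` purely so it is self-contained; six years of one state say nothing about the full panel.

```r
# Six Alabama rows (1977-1982) copied from the head(gun) output above
al <- data.frame(
  violent   = c(414.4, 419.1, 413.3, 448.5, 470.5, 447.7),
  murder    = c(14.2, 13.3, 13.2, 13.2, 11.9, 10.6),
  prisoners = c(83, 94, 144, 141, 149, 183),
  afam      = c(8.384873, 8.352101, 8.329575, 8.408386, 8.483435, 8.514000)
)

r <- cor(al)  # Pearson correlations, one row and one column per variable
round(r, 2)
```

On the full data, `round(cor(data[, c("violent", "murder", "prisoners", "afam", "cauc", "income")]), 2)` would print the numbers behind the plot.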

The correlation matrix shows the relationships between all pairs of numeric variables. The violent crime rate is strongly correlated with the murder rate, and both are also strongly linked with the incarceration rate (prisoners) and with the percentage of the population that is African-American. The Caucasian share of the population does not show strong relationships with violent crime, murder or incarceration. The data run from 1977 to 1999, and the line plot traces murder rates over that period. Our question was which state has the most violent crime in America and how murder is correlated with the characteristics of the American population. The District of Columbia, famously called the “murder capital” during this period, has by far the highest violent crime, as the bar plot shows. These correlations are measured across state-years, not individuals: they show that states with a larger African-American population tend to have higher murder, violent crime and incarceration rates, but they cannot tell us whether African Americans are more likely to be victims, offenders or prisoners. Likewise, the data do not record which murders involved guns, so they cannot show that murders are mostly committed with guns.

5. BONUS – place the original .csv in a github file and have R read from the link. This will be a very useful skill as you progress in your data science education and career.

My GitHub account is joewarner89.

gun_data <- read.csv("https://raw.githubusercontent.com/joewarner89/Gun-/main/Guns.csv",
                     header = TRUE, stringsAsFactors = FALSE)
gun_data$X <- NULL
head(gun_data)
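`read.csv()` needs the raw file address, not the GitHub web page for the file. The sketch below converts a GitHub "blob" page URL into the `raw.githubusercontent.com` form used above. `github_raw_url()` is a hypothetical helper, and the blob URL is an assumption based on GitHub's usual `https://github.com/<user>/<repo>/blob/<branch>/<path>` pattern for this repository.

```r
# Hypothetical helper: turn a GitHub "blob" (web page) URL into the raw-file URL
github_raw_url <- function(blob_url) {
  url <- sub("^https://github\\.com/", "https://raw.githubusercontent.com/", blob_url)
  sub("/blob/", "/", url, fixed = TRUE)
}

# Assumed blob URL for the file read above
github_raw_url("https://github.com/joewarner89/Gun-/blob/main/Guns.csv")
```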
##   year violent murder robbery prisoners     afam     cauc     male population
## 1 1977   414.4   14.2    96.8        83 8.384873 55.12291 18.17441   3.780403
## 2 1978   419.1   13.3    99.1        94 8.352101 55.14367 17.99408   3.831838
## 3 1979   413.3   13.2   109.5       144 8.329575 55.13586 17.83934   3.866248
## 4 1980   448.5   13.2   132.1       141 8.408386 54.91259 17.73420   3.900368
## 5 1981   470.5   11.9   126.5       149 8.483435 54.92513 17.67372   3.918531
## 6 1982   447.7   10.6   112.0       183 8.514000 54.89621 17.51052   3.925229
##     income   density   state law
## 1 9563.148 0.0745524 Alabama  no
## 2 9932.000 0.0755667 Alabama  no
## 3 9877.028 0.0762453 Alabama  no
## 4 9541.428 0.0768288 Alabama  no
## 5 9548.351 0.0771866 Alabama  no
## 6 9478.919 0.0773185 Alabama  no