Exploring R Final By: Jason Reeves
# A tibble: 239,677 x 29
   incident_id date  state city_or_county address n_killed n_injured
         <int> <fct> <fct> <fct>          <fct>      <int>     <int>
 1      461105 1/1/~ Penn~ Mckeesport     1506 V~        0         4
 2      460726 1/1/~ Cali~ Hawthorne      13500 ~        1         3
 3      478855 1/1/~ Ohio  Lorain         1776 E~        1         3
 4      478925 1/5/~ Colo~ Aurora         16000 ~        4         0
 5      478959 1/7/~ Nort~ Greensboro     307 Mo~        2         2
 6      478948 1/7/~ Okla~ Tulsa          6000 b~        4         0
 7      479363 1/19~ New ~ Albuquerque    2806 L~        5         0
 8      479374 1/21~ Loui~ New Orleans    LaSall~        0         5
 9      479389 1/21~ Cali~ Brentwood      1100 b~        0         4
10      492151 1/23~ Mary~ Baltimore      1500 b~        1         6
# ... with 239,667 more rows, and 22 more variables: incident_url <fct>,
# source_url <fct>, incident_url_fields_missing <lgl>,
# congressional_district <int>, gun_stolen <fct>, gun_type <fct>,
# incident_characteristics <fct>, latitude <dbl>,
# location_description <fct>, longitude <dbl>, n_guns_involved <int>,
# notes <fct>, participant_age <fct>, participant_age_group <fct>,
# participant_gender <fct>, participant_name <fct>,
# participant_relationship <fct>, participant_status <fct>,
# participant_type <fct>, sources <fct>, state_house_district <int>,
# state_senate_district <int>
Introduction
In recent months and years there appears to have been an increase in gun violence around the country. I wanted to investigate this for myself: “Trust but verify!” This data set has 239,677 observations and 29 variables. The incidents span a portion of 2013, all of 2014-2017, and a portion of 2018. One notable incident is not included in this data: the shooting in Las Vegas at the Mandalay Bay, in which a total of 59 people were killed and 411 injured. I still chose to use this data because it is otherwise complete. In this report I will dissect the data to show various aspects of the information.
Data at a Glimpse
Observations: 239,677
Variables: 29
$ incident_id <int> 461105, 460726, 478855, 478925, 47...
$ date <fct> 1/1/2013, 1/1/2013, 1/1/2013, 1/5/...
$ state <fct> Pennsylvania, California, Ohio, Co...
$ city_or_county <fct> Mckeesport, Hawthorne, Lorain, Aur...
$ address <fct> 1506 Versailles Avenue and Coursin...
$ n_killed <int> 0, 1, 1, 4, 2, 4, 5, 0, 0, 1, 1, 1...
$ n_injured <int> 4, 3, 3, 0, 2, 0, 0, 5, 4, 6, 3, 3...
$ incident_url <fct> http://www.gunviolencearchive.org/...
$ source_url <fct> http://www.post-gazette.com/local/...
$ incident_url_fields_missing <lgl> FALSE, FALSE, FALSE, FALSE, FALSE,...
$ congressional_district <int> 14, 43, 9, 6, 6, 1, 1, 2, 9, 7, 3,...
$ gun_stolen <fct> , , 0::Unknown||1::Unknown, , 0::U...
$ gun_type <fct> , , 0::Unknown||1::Unknown, , 0::H...
$ incident_characteristics <fct> "Shot - Wounded/Injured||Mass Shoo...
$ latitude <dbl> 40.3467, 33.9090, 41.4455, 39.6518...
$ location_description <fct> , , Cotton Club, , , Fairmont Terr...
$ longitude <dbl> -79.8559, -118.3330, -82.1377, -10...
$ n_guns_involved <int> NA, NA, 2, NA, 2, NA, 2, NA, NA, N...
$ notes <fct> "Julian Sims under investigation: ...
$ participant_age <fct> 0::20, 0::20, 0::25||1::31||2::33|...
$ participant_age_group <fct> 0::Adult 18+||1::Adult 18+||2::Adu...
$ participant_gender <fct> 0::Male||1::Male||3::Male||4::Fema...
$ participant_name <fct> "0::Julian Sims", "0::Bernard Gill...
$ participant_relationship <fct> , , , , 3::Family, , 5::Family, , ...
$ participant_status <fct> "0::Arrested||1::Injured||2::Injur...
$ participant_type <fct> 0::Victim||1::Victim||2::Victim||3...
$ sources <fct> http://pittsburgh.cbslocal.com/201...
$ state_house_district <int> NA, 62, 56, 40, 62, 72, 10, 93, 11...
$ state_senate_district <int> NA, 35, 13, 28, 27, 11, 14, 5, 7, ...
Cleaning up the Data
Using the lubridate package
Here I need to make sure that the data is organized, which makes analysis easier as the project goes along. The date column is read in as a factor, so I convert it to a proper date and split out the year, quarter, month, and day. If we don’t do this at the beginning, it can cause real trouble later in the analysis.
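library(lubridate)  # mdy(), year(), quarter(), month(), day()
library(dplyr)      # %>% and mutate()

# Parse the month/day/year strings into dates, then derive the date parts
guns$date <- mdy(guns$date)
guns$year <- year(guns$date)
guns$quarter <- quarter(guns$date)
guns$month <- month(guns$date, label = TRUE)
guns$day <- day(guns$date)
# Combine month label and day of month into one field, e.g. "Jan 1"
guns <- guns %>%
  mutate(month_day = paste(month, day))
Date Analysis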
Using the lubridate package
According to the data there has been an increase in incidents every year. The greatest increase was from 2015 to 2016. Is there anything else that the data tells us? It has been theorised that there is more violence during the warmer months. What does the data say? We will look at several different aspects.
By Year
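A minimal sketch of how yearly counts, and the change from one year to the next, could be tallied. The data frame `toy` and its rows are made up for illustration and stand in for the real `guns` data. The sketch is written in base R so it runs without extra packages; the report itself uses lubridate for the date steps.

```r
# Hypothetical stand-in for the guns data: one row per incident
toy <- data.frame(
  date = c("1/1/2014", "6/15/2014", "3/2/2015", "7/4/2015", "8/9/2015"),
  stringsAsFactors = FALSE
)

# Parse month/day/year strings and pull out the year (base R counterparts
# of the mdy() and year() calls used in the report)
toy$date <- as.Date(toy$date, format = "%m/%d/%Y")
toy$year <- as.integer(format(toy$date, "%Y"))

# Count incidents per year, then the change from the previous year
by_year <- as.data.frame(table(year = toy$year), responseName = "incidents")
by_year$change <- c(NA, diff(by_year$incidents))
print(by_year)
```

The first year has no previous year to compare against, so its `change` is `NA`.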
By Quarter
It appears that there is an increase in gun incidents in the warmer months, the second and third quarters. I removed the partial 2013 and 2018 data for aesthetics.
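The quarterly comparison can be sketched as follows, assuming incidents are simply counted per quarter after dropping the partially covered years. The data frame `toy` and its dates are hypothetical, and the quarter is computed in base R rather than with lubridate's `quarter()` so the example needs no extra packages.

```r
# Hypothetical incidents, including rows from the partial years 2013 and 2018
toy <- data.frame(
  date = as.Date(c("2013-02-01", "2014-01-10", "2014-05-20", "2014-08-03",
                   "2015-04-11", "2015-07-19", "2015-11-30", "2018-03-01"))
)
toy$year <- as.integer(format(toy$date, "%Y"))
# Months 1-3 -> quarter 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4
toy$quarter <- (as.integer(format(toy$date, "%m")) - 1) %/% 3 + 1

# Keep only the fully covered years (2014-2017)
full_years <- toy[toy$year >= 2014 & toy$year <= 2017, ]

# Incidents per quarter across the retained years
by_quarter <- table(quarter = full_years$quarter)
print(by_quarter)
```

Dropping 2013 and 2018 also keeps the quarters comparable, since a year with missing months would undercount some quarters.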
Running Median of those Killed by Quarter
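One way a running median could be computed is with `runmed()` from base R's stats package. This is a sketch under assumptions: the report does not say which function or window width was used, the window of three quarters is chosen only for illustration, and the values in `killed` are made up.

```r
# Hypothetical total killed per quarter, in time order (made-up values)
killed <- c(10, 12, 40, 13, 11, 14, 15, 12)

# Running median over a window of 3 quarters; k must be odd.
# endrule = "keep" leaves the first and last values unchanged.
smoothed <- runmed(killed, k = 3, endrule = "keep")
print(as.numeric(smoothed))
```

A running median damps a single extreme quarter (the 40 above) far more than a running mean would, which is useful when one incident dominates a quarter's total.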