Exploring R Final By: Jason Reeves

# A tibble: 239,677 x 29
   incident_id date  state city_or_county address n_killed n_injured
         <int> <fct> <fct> <fct>          <fct>      <int>     <int>
 1      461105 1/1/~ Penn~ Mckeesport     1506 V~        0         4
 2      460726 1/1/~ Cali~ Hawthorne      13500 ~        1         3
 3      478855 1/1/~ Ohio  Lorain         1776 E~        1         3
 4      478925 1/5/~ Colo~ Aurora         16000 ~        4         0
 5      478959 1/7/~ Nort~ Greensboro     307 Mo~        2         2
 6      478948 1/7/~ Okla~ Tulsa          6000 b~        4         0
 7      479363 1/19~ New ~ Albuquerque    2806 L~        5         0
 8      479374 1/21~ Loui~ New Orleans    LaSall~        0         5
 9      479389 1/21~ Cali~ Brentwood      1100 b~        0         4
10      492151 1/23~ Mary~ Baltimore      1500 b~        1         6
# ... with 239,667 more rows, and 22 more variables: incident_url <fct>,
#   source_url <fct>, incident_url_fields_missing <lgl>,
#   congressional_district <int>, gun_stolen <fct>, gun_type <fct>,
#   incident_characteristics <fct>, latitude <dbl>,
#   location_description <fct>, longitude <dbl>, n_guns_involved <int>,
#   notes <fct>, participant_age <fct>, participant_age_group <fct>,
#   participant_gender <fct>, participant_name <fct>,
#   participant_relationship <fct>, participant_status <fct>,
#   participant_type <fct>, sources <fct>, state_house_district <int>,
#   state_senate_district <int>

Introduction

In recent months and years there appears to have been an increase in gun violence around the country. I wanted to investigate this for myself: “Trust but verify!” This data set has 239,677 observations and 29 variables. The incidents cover a portion of 2013, all of 2014-2017, and a portion of 2018. One incident is not included in this data: the shooting in Las Vegas at the Mandalay Bay, in which a total of 59 people were killed and 411 were injured. I still chose to use this data because it is otherwise complete. In this report I will dissect the data to show various aspects of the information.


Data at a Glimpse

Observations: 239,677
Variables: 29
$ incident_id                 <int> 461105, 460726, 478855, 478925, 47...
$ date                        <fct> 1/1/2013, 1/1/2013, 1/1/2013, 1/5/...
$ state                       <fct> Pennsylvania, California, Ohio, Co...
$ city_or_county              <fct> Mckeesport, Hawthorne, Lorain, Aur...
$ address                     <fct> 1506 Versailles Avenue and Coursin...
$ n_killed                    <int> 0, 1, 1, 4, 2, 4, 5, 0, 0, 1, 1, 1...
$ n_injured                   <int> 4, 3, 3, 0, 2, 0, 0, 5, 4, 6, 3, 3...
$ incident_url                <fct> http://www.gunviolencearchive.org/...
$ source_url                  <fct> http://www.post-gazette.com/local/...
$ incident_url_fields_missing <lgl> FALSE, FALSE, FALSE, FALSE, FALSE,...
$ congressional_district      <int> 14, 43, 9, 6, 6, 1, 1, 2, 9, 7, 3,...
$ gun_stolen                  <fct> , , 0::Unknown||1::Unknown, , 0::U...
$ gun_type                    <fct> , , 0::Unknown||1::Unknown, , 0::H...
$ incident_characteristics    <fct> "Shot - Wounded/Injured||Mass Shoo...
$ latitude                    <dbl> 40.3467, 33.9090, 41.4455, 39.6518...
$ location_description        <fct> , , Cotton Club, , , Fairmont Terr...
$ longitude                   <dbl> -79.8559, -118.3330, -82.1377, -10...
$ n_guns_involved             <int> NA, NA, 2, NA, 2, NA, 2, NA, NA, N...
$ notes                       <fct> "Julian Sims under investigation: ...
$ participant_age             <fct> 0::20, 0::20, 0::25||1::31||2::33|...
$ participant_age_group       <fct> 0::Adult 18+||1::Adult 18+||2::Adu...
$ participant_gender          <fct> 0::Male||1::Male||3::Male||4::Fema...
$ participant_name            <fct> "0::Julian Sims", "0::Bernard Gill...
$ participant_relationship    <fct> , , , , 3::Family, , 5::Family, , ...
$ participant_status          <fct> "0::Arrested||1::Injured||2::Injur...
$ participant_type            <fct> 0::Victim||1::Victim||2::Victim||3...
$ sources                     <fct> http://pittsburgh.cbslocal.com/201...
$ state_house_district        <int> NA, 62, 56, 40, 62, 72, 10, 93, 11...
$ state_senate_district       <int> NA, 35, 13, 28, 27, 11, 14, 5, 7, ...

Cleaning up the Data

Using the lubridate package

Here I need to make sure that the data is organized. The date column is read in as a factor, so I convert it to a proper date and derive the year, quarter, month, and day from it. This makes analysis easier as the project goes along. If we don’t do this at the beginning, it can really cause trouble later in the analysis.

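library(lubridate)
library(dplyr)

# Parse the month/day/year strings (stored as a factor) into Date values
guns$date <- mdy(as.character(guns$date))
# Derive the calendar fields used in the analysis below
guns$year <- year(guns$date)
guns$quarter <- quarter(guns$date)
guns$month <- month(guns$date, label = TRUE)
guns$day <- day(guns$date)
# Month-day label such as "Jan 1", for comparing dates across years
guns <- guns %>% 
  mutate(month_day = paste(month, day))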
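
Before relying on the derived columns, it can help to confirm that every date string actually parsed. The following is a minimal sketch on a toy vector, not the real data; `raw_dates`, `parsed`, `n_failed`, and `parts` are names I made up for illustration.

```r
library(lubridate)

# Toy stand-in for guns$date: month/day/year strings stored as a factor
raw_dates <- factor(c("1/1/2013", "7/4/2016", "12/31/2017"))

# Convert to character first so parsing does not depend on factor handling
parsed <- mdy(as.character(raw_dates))

# Any NA here would mean a date string failed to parse
n_failed <- sum(is.na(parsed))

# Derived fields, mirroring the columns created in the cleaning step
parts <- data.frame(
  year    = year(parsed),
  quarter = quarter(parsed),
  month   = month(parsed, label = TRUE),
  day     = day(parsed)
)

print(n_failed)
print(parts)
```

On the full data, the same check would be `sum(is.na(guns$date))` right after the conversion.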

Date Analysis

Using the lubridate package

According to the data there has been an increase in incidents every year. The greatest increase was from 2015 to 2016. Is there anything else that the data tells us? It has been theorized that there is more violence during the warmer months. What does the data say? We will look at several different aspects.

By Year
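
As a rough sketch of how the yearly totals and the year-over-year change could be tabulated, assuming the `year` column created during cleaning: the toy values below are illustrative only, and `guns_toy` and `by_year` are hypothetical names.

```r
library(dplyr)

# Toy stand-in for the cleaned data (values are illustrative only)
guns_toy <- tibble(
  year = c(2014, 2014, 2015, 2016, 2016, 2016)
)

# One row per year, plus the change from the previous year
by_year <- guns_toy %>%
  count(year, name = "Total") %>%
  mutate(Change = Total - lag(Total))

print(by_year)
```

The largest value in `Change` identifies the pair of years with the greatest increase.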

By Quarter

It appears that there is an increase in gun incidents in the warmer months, the second and third quarters. I removed the 2013 and 2018 data for aesthetics, since both years are only partially covered.
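
A minimal sketch of the quarterly count with the partial years dropped, assuming the `year` and `quarter` columns from the cleaning step. The toy rows are made up, and `by_quarter` is a hypothetical name.

```r
library(dplyr)

# Toy stand-in for the cleaned data (values are illustrative only)
guns_toy <- tibble(
  year    = c(2013, 2014, 2014, 2015, 2017, 2018),
  quarter = c(1L, 2L, 3L, 3L, 4L, 1L)
)

# Keep only the complete years, then count incidents per year and quarter
by_quarter <- guns_toy %>%
  filter(year >= 2014, year <= 2017) %>%
  count(year, quarter, name = "Total")

print(by_quarter)
```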


Running Median of those Killed by Quarter
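
One way a running median over quarterly totals could be computed is with base R's `runmed()`. This is only a sketch: the window of 3 quarters and the `endrule = "keep"` choice are my assumptions, and the quarterly totals below are invented toy numbers, not values from the data.

```r
# Toy quarterly totals of people killed (illustrative values only)
killed_by_quarter <- c(10, 50, 12, 14, 13, 40, 15)

# Running median over a 3-quarter window; "keep" leaves the two end points as-is
smoothed <- runmed(killed_by_quarter, k = 3, endrule = "keep")

print(data.frame(raw = killed_by_quarter, smoothed = as.numeric(smoothed)))
```

A running median is less sensitive to a single extreme quarter than a running mean, which is why the two spikes in the toy series are flattened.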

By Month

It appears that July and August have the most incidents. But January is third. Why?

# A tibble: 12 x 2
   Month Total
   <ord> <int>
 1 Jul   21109
 2 Aug   21026
 3 Jan   20620
 4 Mar   20255
 5 May   19917
 6 Oct   19879
 7 Sep   19642
 8 Jun   18739
 9 Apr   18619
10 Dec   18095
11 Nov   17963
12 Feb   16773
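
Months differ in length, so part of the spread in this table is just calendar arithmetic. As a sketch, the totals above can be divided by the number of days in each month. The 28.25 days for February is an assumption (one leap year in four), and the result is only a relative comparison because the table does not say how many years it covers.

```r
# Monthly totals copied from the table above, in calendar order
total <- c(Jan = 20620, Feb = 16773, Mar = 20255, Apr = 18619,
           May = 19917, Jun = 18739, Jul = 21109, Aug = 21026,
           Sep = 19642, Oct = 19879, Nov = 17963, Dec = 18095)

# Days per month; February uses 28.25 as an assumed four-year average
days <- c(Jan = 31, Feb = 28.25, Mar = 31, Apr = 30, May = 31, Jun = 30,
          Jul = 31, Aug = 31, Sep = 30, Oct = 31, Nov = 30, Dec = 31)

# Incidents per calendar day of each month, sorted from highest to lowest
per_day <- sort(total / days, decreasing = TRUE)

print(round(per_day, 1))
```

This does not explain the January figure, but it does show how much of February's low total comes from it being the shortest month.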

By Date

What was the most common date for incidents in the US during 2014-2017? It appears to be New Year's Day, followed by Independence Day. I also noticed that July 5th is close behind the 4th. This suggests to me that some of these incidents possibly happened late in the evening, around fireworks time, and were not reported until the 5th.

# A tibble: 10 x 2
# Groups:   Date [10]
   Date   Total
   <chr>  <int>
 1 Jan 1   1115
 2 Jul 4    876
 3 Jul 5    820
 4 Jul 30   788
 5 Oct 25   742
 6 Jul 17   740
 7 Jul 19   740
 8 Aug 13   734
 9 Aug 1    730
10 Jul 25   730
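
A table like this one could be built from the `month_day` column created during cleaning. The following is a minimal sketch on invented rows; `guns_toy` and `top_dates` are hypothetical names, and the exact filtering used for the table above is an assumption on my part.

```r
library(dplyr)

# Toy stand-in for the cleaned data (values are illustrative only)
guns_toy <- tibble(
  year      = c(2014, 2015, 2016, 2014, 2015, 2013, 2017),
  month_day = c("Jan 1", "Jan 1", "Jan 1", "Jul 4", "Jul 4", "Jan 1", "Oct 25")
)

# Restrict to the complete years, count per calendar date, keep the top rows
top_dates <- guns_toy %>%
  filter(year %in% 2014:2017) %>%
  count(Date = month_day, name = "Total", sort = TRUE) %>%
  head(2)

print(top_dates)
```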

Top Cities by Incident

There really was no surprise here. Chicago leads this category, and it isn’t even close. With more than 10K incidents, it leads me to question the reason behind it. Where is New York City? It is not listed here because its boroughs are recorded separately. Please look at the New York section in the further analysis by state.

# A tibble: 10 x 2
   City         Total
   <fct>        <int>
 1 Chicago      10814
 2 Baltimore     3943
 3 Washington    3279
 4 New Orleans   3071
 5 Philadelphia  2963
 6 Houston       2501
 7 Saint Louis   2501
 8 Milwaukee     2487
 9 Jacksonville  2448
10 Memphis       2386

Further State Analysis

Using the leaflet package

So where are the most incidents? It appears that IL, CA, and FL are the states that have the most occurrences. I decided to take a deeper look at which cities within these states have the most incidents. Select a tab to see further analysis within the selected state.
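
The drill-down from states to cities can be sketched as two counts, one over `state` and one over `city_or_county` within a single state. The toy rows are invented, and `by_state` and `il_cities` are hypothetical names.

```r
library(dplyr)

# Toy stand-in for the cleaned data (values are illustrative only)
guns_toy <- tibble(
  state          = c("Illinois", "Illinois", "Illinois",
                     "California", "Florida", "Illinois"),
  city_or_county = c("Chicago", "Chicago", "Peoria",
                     "Oakland", "Miami", "Chicago")
)

# Incidents per state, largest first
by_state <- guns_toy %>%
  count(state, name = "Total", sort = TRUE)

# Top cities within one state
il_cities <- guns_toy %>%
  filter(state == "Illinois") %>%
  count(City = city_or_county, name = "Total", sort = TRUE) %>%
  head(10)

print(by_state)
print(il_cities)
```

Swapping the state name in `filter()` would give the equivalent table for any other state.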

Illinois

As discussed previously, Chicago has the most incidents. The state also has some very high counts outside the city. Please hover your cursor over a point for some of the notes that were filed in the reports.
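
A map with hover notes like the one described could be sketched with leaflet as follows. This is an illustration under assumptions, not the code behind the original map: the coordinates and notes are made-up placeholders, and `il_toy` and `il_map` are hypothetical names.

```r
library(leaflet)

# Toy stand-in for a few Illinois rows; coordinates and notes are placeholders
il_toy <- data.frame(
  latitude  = c(41.88, 40.69),
  longitude = c(-87.63, -89.59),
  notes     = c("example note A", "example note B")
)

# One circle marker per incident; the label appears when hovering over a point
il_map <- leaflet(il_toy) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~longitude, lat = ~latitude,
                   label = ~notes, radius = 4)

il_map
```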

# A tibble: 10 x 2
# Groups:   City [10]
   City                Total
   <fct>               <int>
 1 Chicago             10814
 2 Peoria                920
 3 Rockford              842
 4 Chicago (Englewood)   542
 5 Springfield           303
 6 Champaign             213
 7 Joliet                198
 8 Aurora                191
 9 Kankakee              186
10 Chicago (Roseland)    159

Minnesota

Here we don’t have much of a surprise either. Minneapolis and Saint Paul have the most incidents. The one city that stood out to me was Albert Lea. Again, hover your cursor over a point for the notes that were filed with the report. You can also zoom in and view your own neighborhood.

# A tibble: 10 x 2
# Groups:   City [10]
   City        Total
   <fct>       <int>
 1 Minneapolis   667
 2 Saint Paul    404
 3 Rochester     173
 4 Duluth         65
 5 Saint Cloud    56
 6 Albert Lea     51
 7 Austin         46
 8 Bemidji        40
 9 Moorhead       24
10 Brainerd       21

California

The most incidents in CA occurred in Oakland, Los Angeles, and Fresno.

# A tibble: 10 x 2
# Groups:   City [10]
   City          Total
   <fct>         <int>
 1 Oakland        1478
 2 Los Angeles    1066
 3 Fresno         1057
 4 Bakersfield     605
 5 Stockton        555
 6 Sacramento      477
 7 San Francisco   421
 8 San Diego       372
 9 Long Beach      329
10 Salinas         301

Florida

# A tibble: 10 x 2
# Groups:   City [10]
   City            Total
   <fct>           <int>
 1 Jacksonville     2317
 2 Orlando          1020
 3 Miami             837
 4 Tampa             601
 5 West Palm Beach   431
 6 Fort Myers        387
 7 Ocala             280
 8 Tallahassee       260
 9 Fort Pierce       235
10 Fort Lauderdale   213

New York

This is very interesting. The boroughs are listed separately: Brooklyn, Bronx, Manhattan, Staten Island, and Queens. If these were put together, their total would be 3,151. This total would place NYC between DC and New Orleans in the top cities list.

# A tibble: 13 x 2
# Groups:   City [13]
   City                 Total
   <fct>                <int>
 1 Brooklyn              1405
 2 Buffalo                954
 3 Bronx                  845
 4 Rochester              689
 5 Syracuse               642
 6 Staten Island          411
 7 New York (Manhattan)   360
 8 Albany                 203
 9 Utica                  183
10 Niagara Falls          182
11 Corona (Queens)        156
12 Manhattan              140
13 Queens                 130
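
The 3,151 figure can be checked against the table. It matches the sum of five of the rows, which is my inference from the arithmetic rather than something stated in the report. The table also has separate "Corona (Queens)" and "Manhattan" rows, and the sketch below shows the total with and without them.

```r
# NYC-related rows copied from the New York table above
ny <- c("Brooklyn" = 1405, "Bronx" = 845, "Staten Island" = 411,
        "New York (Manhattan)" = 360, "Corona (Queens)" = 156,
        "Manhattan" = 140, "Queens" = 130)

# Rows that reproduce the 3,151 total (inferred, not stated in the report)
five_rows <- c("Brooklyn", "Bronx", "Staten Island",
               "New York (Manhattan)", "Queens")

total_five <- sum(ny[five_rows])  # 3151
total_all  <- sum(ny)             # 3447

print(c(five_rows = total_five, all_rows = total_all))
```

With all seven rows included, NYC would sit between Baltimore (3,943) and Washington (3,279) in the top cities table instead.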

Alaska

# A tibble: 10 x 2
# Groups:   City [10]
   City        Total
   <fct>       <int>
 1 Anchorage     469
 2 Fairbanks     217
 3 Wasilla       114
 4 Juneau         97
 5 North Pole     31
 6 Palmer         31
 7 Soldotna       20
 8 Kodiak         17
 9 Eagle River    16
10 Ketchikan      16

Conclusion

Gun incidents in the US have been a hot-button issue for years. Looking at the recent data, we notice a disturbing trend. I do not believe in bringing politics into the education sector, only facts. People can draw their own conclusions when given the facts without bias. Further analysis could dig into possible gang affiliations and even the districts of interest. The data could also be combined with census data to show per capita incidents. This information can be very powerful for those interested in the safety of their families.

2019-02-06