Load the packages:

library(leaflet)
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0     ✔ purrr   0.2.5
## ✔ tibble  2.0.0     ✔ dplyr   0.7.8
## ✔ tidyr   0.8.2     ✔ stringr 1.3.1
## ✔ readr   1.3.1     ✔ forcats 0.3.0
## ── Conflicts ─────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
library(htmltools)
library(rtweet)
## 
## Attaching package: 'rtweet'
## The following object is masked from 'package:purrr':
## 
##     flatten
library(readxl)
library(broom)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(DT)

Read the data:

shootings <- read_csv("https://docs.google.com/spreadsheets/d/1b9o6uDO18sLxBqPwl_Gh9bnhW-ev_dABH83M5Vb5L8o/export?format=csv")
## Warning: Duplicated column names deduplicated: 'location' =>
## 'location_1' [8]
## Parsed with column specification:
## cols(
##   .default = col_character(),
##   fatalities = col_double(),
##   injured = col_double(),
##   total_victims = col_double(),
##   age_of_shooter = col_double(),
##   latitude = col_double(),
##   longitude = col_double(),
##   year = col_double()
## )
## See spec(...) for full column specifications.

Assignment: Shootings.

For your notebook, do some further analyses on the shootings data.

  1. Create a map of the shootings. There are columns called latitude and longitude in the data that can generate dots on a map of where the shootings took place.
    (A). Because there was a shooting in Hawaii, the map is spread out a lot by default. Center on the continental US by looking up the center of the continental US, converting it to decimal degrees, finding a good zoom level, and using setView(lng = x, lat = y, zoom = z).
    (B). Change the size of the dots so that the more victims, the larger the dot on the map. You can do this by setting radius = ~fatalities or radius = ~total_victims inside of addCircleMarkers(). If the dots are too big, you can do something like radius = ~total_victims/10, or radius = ~log(total_victims). I like that last one because larger numbers are reduced more than smaller ones.

  2. Report some additional statistics, including:
    A. Median number of total_victims and median number of fatalities.
    B. A histogram of the number of shootings per year. You will want to set nbinsx = 40 inside add_histogram(), because that is the approx. number of years covered by the data.
    C. Heatmaps with add_histogram2dcontour() of the gender and race of the shooter, and the gender and age of the shooter. You’ll notice some problems with the data that you’ll need to fix with fct_collapse().
    D. A scatterplot with plotly of the number injured by the number of fatalities in the shooting.

  3. Conduct a regression analysis testing the hypothesis that the number of shootings has increased over the years.

To create the regression model, y = the number of shootings and x = year, you need to set up the data like this:

num_per_year <- shootings %>% 
  filter(fatalities > 3) %>% 
  count(year) %>% 
  filter(year < 2019)

num_per_year

In your regression model, y = n and x = year.

1. Create a map of the shootings

A.

The geographic center of the United States is 44°58′02″N 103°46′18″W (from Wikipedia).

Converting the minutes and seconds to decimal degrees:

58/60
## [1] 0.9666667
2/(60*60)
## [1] 0.0005555556
46/60
## [1] 0.7666667
18/(60*60)
## [1] 0.005

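Adding these fractions to the whole degrees gives lat = 44.967, lng = -103.771 (west longitude is negative). For zoom, 3 is roughly continent level and 5 is roughly country level. At zoom 5 the visible area is too small to show the whole country, so use 4.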
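The conversion from degrees, minutes, and seconds to decimal degrees can also be wrapped in a small helper. This is only a sketch: `dms_to_decimal()` is a hypothetical function written for this notebook, not part of leaflet or any other package.

```r
# Hypothetical helper: convert degrees/minutes/seconds to decimal degrees.
# South and west hemispheres are returned as negative values.
dms_to_decimal <- function(degrees, minutes, seconds, hemisphere) {
  decimal <- degrees + minutes / 60 + seconds / 3600
  if (hemisphere %in% c("S", "W")) -decimal else decimal
}

lat <- dms_to_decimal(44, 58, 2, "N")
lng <- dms_to_decimal(103, 46, 18, "W")

# Print both values rounded to three decimals for use in setView()
round(c(lat = lat, lng = lng), 3)
```

Keeping the sign handling inside the helper avoids forgetting the minus sign on the western longitude.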

leaflet() %>% 
  addTiles() %>% 
  setView(lat = 44.967, lng = -103.771, zoom = 4)

B.

shootings %>% 
  leaflet() %>% 
  addTiles() %>%
  addCircleMarkers(stroke = FALSE, 
                   fillOpacity = .6, 
                   radius = ~log(total_victims))
## Assuming "longitude" and "latitude" are longitude and latitude, respectively
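To see why the log transform suits the marker radius, it helps to compare it with a linear scaling on a few counts. The victim counts below are hypothetical values chosen only for illustration; they are not taken from the shootings data.

```r
# Hypothetical victim counts, chosen only to illustrate the scaling
victims <- c(3, 10, 50, 600)

radius_table <- data.frame(
  victims    = victims,
  linear     = victims / 10,   # radius = ~total_victims / 10
  log_scaled = log(victims)    # radius = ~log(total_victims)
)

radius_table
```

With linear scaling the largest event dominates the map, while the log keeps every radius within a narrow range. Because the radius of addCircleMarkers() is given in pixels, the log values may come out quite small; multiplying them by a constant is one possible adjustment.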

2. Report some additional statistics

A.

Median number of total_victims:

shootings %>% 
  summarize(median_total_victims = median(total_victims)) 

Median number of fatalities:

shootings %>% 
  summarize(median_fatalities = median(fatalities)) 
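Both medians can also be computed in a single summarize() call. The sketch below uses a small hypothetical data frame so that it runs on its own, and it adds na.rm = TRUE on the assumption that a column might contain missing values.

```r
library(dplyr)

# Hypothetical toy data standing in for the shootings table
toy <- tibble(
  total_victims = c(5, 12, 7, NA, 30),
  fatalities    = c(4, 6, 3, 5, NA)
)

# One summarize() call returns both medians in a single row;
# na.rm = TRUE drops missing values before taking the median
medians <- toy %>%
  summarize(
    median_total_victims = median(total_victims, na.rm = TRUE),
    median_fatalities    = median(fatalities, na.rm = TRUE)
  )

medians
```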

B.

A histogram of the number of shootings per year:

shootings %>% 
  plot_ly(x = ~year) %>% 
  add_histogram(nbinsx = 40)

C.

Heatmap of the gender and race of the shooter:

shootings %>% 
  plot_ly(x = ~gender, y = ~race) %>% 
  add_histogram2dcontour()

Heatmap of the gender and age of the shooter:

shootings %>% 
  plot_ly(x = ~gender, y = ~age_of_shooter) %>% 
  add_histogram2dcontour()

The gender column mixes "Male" with "M" and "Female" with "F", so combine the categories with fct_collapse():

shootings %>% 
  mutate(gender = as_factor(gender)) %>% 
  mutate(gender = fct_collapse(gender, Male = c("Male", "M"), Female = c("Female", "F"))) %>% 
  count(gender)
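Once the categories are collapsed, the cleaned column can be passed straight into the heatmap so that each gender appears only once on the axis. This is a minimal sketch on hypothetical toy data; the real column values may differ.

```r
library(dplyr)
library(forcats)
library(plotly)

# Hypothetical toy data; not taken from the shootings spreadsheet
toy <- tibble(
  gender = c("Male", "M", "Female", "F", "Male"),
  race   = c("White", "White", "Black", "Asian", "Latino")
)

# Collapse the duplicate spellings into two levels
cleaned <- toy %>%
  mutate(gender = fct_collapse(as_factor(gender),
                               Male = c("Male", "M"),
                               Female = c("Female", "F")))

# Build the heatmap from the cleaned data
heatmap <- cleaned %>%
  plot_ly(x = ~gender, y = ~race) %>%
  add_histogram2dcontour()

heatmap
```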

D.

A scatterplot of the number injured by the number of fatalities in the shooting:

shootings %>% 
  plot_ly(x = ~fatalities, 
          y = ~injured) %>% 
  add_markers()

3. Conduct a regression analysis

Prepare the data by calculating the number of shootings per year:

num_per_year <- shootings %>%  
  filter(fatalities > 3) %>% 
  count(year) %>% 
  filter(year < 2019)

num_per_year

Regression analysis:

num_per_year_model <- lm(n ~ year, data = num_per_year)

num_per_year_model
## 
## Call:
## lm(formula = n ~ year, data = num_per_year)
## 
## Coefficients:
## (Intercept)         year  
##   -218.8464       0.1107

More information on the coefficients and the model fit:

tidy(num_per_year_model)
glance(num_per_year_model)

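The p-value for the year coefficient is greater than 0.05, so the positive slope (about 0.11 more shootings per year) is not statistically significant at the 5% level. This analysis therefore does not support the hypothesis that the number of shootings has increased over the years.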
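The decision rule applied to the p-value can also be written out in code. The sketch below fits the same kind of model to hypothetical yearly counts so that it runs on its own; the numbers are made up and say nothing about the real shootings data.

```r
# Hypothetical yearly counts, used only so the sketch is self-contained
toy_counts <- data.frame(
  year = 2000:2009,
  n    = c(1, 2, 2, 3, 3, 4, 5, 5, 6, 7)
)

toy_model <- lm(n ~ year, data = toy_counts)

# Pull the slope and its p-value out of the coefficient table
coefs   <- summary(toy_model)$coefficients
slope   <- coefs["year", "Estimate"]
p_value <- coefs["year", "Pr(>|t|)"]

# The hypothesis of an increase is supported only if the slope is
# positive AND its p-value is below the chosen 0.05 threshold
increase_supported <- slope > 0 && p_value < 0.05
increase_supported
```

Applied to num_per_year_model, the same two lookups give the values reported by tidy() in the estimate and p.value columns of the year row.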