Load the packages:
library(leaflet)
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0 ✔ purrr 0.2.5
## ✔ tibble 2.0.0 ✔ dplyr 0.7.8
## ✔ tidyr 0.8.2 ✔ stringr 1.3.1
## ✔ readr 1.3.1 ✔ forcats 0.3.0
## ── Conflicts ─────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
library(htmltools)
library(rtweet)
##
## Attaching package: 'rtweet'
## The following object is masked from 'package:purrr':
##
## flatten
library(readxl)
library(broom)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(DT)
Read the data:
shootings <- read_csv("https://docs.google.com/spreadsheets/d/1b9o6uDO18sLxBqPwl_Gh9bnhW-ev_dABH83M5Vb5L8o/export?format=csv")
## Warning: Duplicated column names deduplicated: 'location' =>
## 'location_1' [8]
## Parsed with column specification:
## cols(
## .default = col_character(),
## fatalities = col_double(),
## injured = col_double(),
## total_victims = col_double(),
## age_of_shooter = col_double(),
## latitude = col_double(),
## longitude = col_double(),
## year = col_double()
## )
## See spec(...) for full column specifications.
Assignment: Shootings.
For your notebook, do some further analyses on the shootings data.
Create a map of the shootings. There are columns called latitude and longitude in the data that can generate dots on a map of where the shootings took place.
(A). Because there was a shooting in Hawaii, the map is spread out a lot by default. Center it on the continental US by looking up the center of the continental US, converting the coordinates to decimal degrees, finding a good zoom level, and using setView(lng = x, lat = y, zoom = z).
(B). Change the size of the dots so that the more victims, the larger the dot on the map. You can do this by setting radius = ~fatalities or radius = ~total_victims inside of addCircleMarkers(). If the dots are too big, you can do something like radius = ~total_victims/10, or radius = ~log(total_victims). I like that last one because larger numbers are reduced more than smaller ones.
Report some additional statistics, including:
A. Median number of total_victims and median number of fatalities.
B. A histogram of the number of shootings per year. You will want to set nbinsx = 40 inside add_histogram(), because that is the approx. number of years covered by the data.
C. Heatmaps with add_histogram2dcontour() of the gender and race of the shooter, and the gender and age of the shooter. You’ll notice some problems with the data that you’ll need to fix with fct_collapse().
D. A scatterplot with plotly of the number injured by the number of fatalities in the shooting.
Conduct a regression analysis testing the hypothesis that the number of shootings has increased over the years.
To create the regression model, y = the number of shootings and x = year, you need to set up the data like this:
num_per_year <- shootings %>%
filter(fatalities > 3) %>%
count(year) %>%
filter(year < 2019)
num_per_year
In your regression model, y = n and x = year.
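The geographic center of the United States: 44°58′02″N 103°46′18″W (from Wikipedia). This point is the center of all 50 states; the center of the contiguous states lies farther southeast, near Lebanon, Kansas.
Converting the minutes and seconds to decimal degrees (minutes / 60, seconds / 3600):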
58/60
## [1] 0.9666667
2/(60*60)
## [1] 0.0005555556
46/60
## [1] 0.7666667
18/(60*60)
## [1] 0.005
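Adding the degrees, minutes, and seconds together gives lat = 44.967 and lng = -103.771 (west longitudes are negative). For zoom, 3 shows roughly a whole continent and 5 shows roughly one country. At zoom 5 the visible area is too small to hold the whole US, so use 4.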
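The conversion above can also be wrapped in a small function. This is a minimal sketch; `dms_to_decimal` is a hypothetical helper written for this notebook, not a function from any of the loaded packages.

```r
# Convert degrees, minutes, and seconds to decimal degrees.
# `dms_to_decimal` is a hypothetical helper, not part of any package.
dms_to_decimal <- function(degrees, minutes, seconds, hemisphere) {
  decimal <- degrees + minutes / 60 + seconds / 3600
  # South latitudes and west longitudes are negative by convention.
  if (hemisphere %in% c("S", "W")) -decimal else decimal
}

lat <- dms_to_decimal(44, 58, 2, "N")
lng <- dms_to_decimal(103, 46, 18, "W")
round(c(lat = lat, lng = lng), 3)
```

Rounded to three decimals the longitude comes out as -103.772 rather than the truncated -103.771. The difference is a thousandth of a degree, which is not visible at zoom level 4.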
leaflet() %>%
addTiles() %>%
setView(lat= 44.967, lng = -103.771, zoom = 4)
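shootings %>%
  leaflet() %>%
  addTiles() %>%
  # Center on the coordinates found above so Hawaii does not stretch the view
  setView(lng = -103.771, lat = 44.967, zoom = 4) %>%
  addCircleMarkers(stroke = FALSE,
                   fillOpacity = .6,
                   radius = ~log(total_victims))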
## Assuming "longitude" and "latitude" are longitude and latitude, respectively
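To see why the log scaling suggested in the assignment shrinks large incidents more than small ones, it may help to compare the two radius options side by side. The victim counts below are hypothetical and chosen only for illustration.

```r
# Hypothetical victim counts, chosen only to illustrate the scaling.
total_victims <- c(5, 50, 500)

radius_options <- data.frame(
  total_victims = total_victims,
  divided_by_10 = total_victims / 10,
  natural_log   = round(log(total_victims), 2)
)
radius_options
```

Dividing by 10 keeps the hundredfold spread between the smallest and largest count, while the log compresses it to a much narrower range. One caveat: log(1) is 0, so any row with exactly one victim would get a radius of zero; log(total_victims + 1) is one possible workaround.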
Median number of total_victims:
shootings %>%
summarize(median_total_victims = median(total_victims))
Median number of fatalities:
shootings %>%
summarize(median_fatalities = median(fatalities))
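Both medians can be computed in a single summarize() call. The sketch below uses a hypothetical toy table in place of the shootings data, and adds na.rm = TRUE on the assumption that some rows might have missing counts (median() returns NA otherwise).

```r
library(dplyr)

# Hypothetical toy data standing in for the shootings table.
toy <- tibble(
  fatalities    = c(4, 6, 9, NA),
  total_victims = c(7, 10, 30, 12)
)

medians <- toy %>%
  summarize(
    median_total_victims = median(total_victims, na.rm = TRUE),
    median_fatalities    = median(fatalities, na.rm = TRUE)
  )
medians
```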
A histogram of the number of shootings per year:
shootings %>%
plot_ly(x = ~year) %>%
add_histogram(nbinsx = 40)
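The value 40 is an approximation of the number of years covered. A sketch of how the exact span could be computed is shown here; the years are hypothetical stand-ins for the year column.

```r
# Hypothetical years standing in for shootings$year.
years <- c(1982, 1990, 2005, 2018)

# Number of calendar years from the first to the last, inclusive.
n_years <- diff(range(years)) + 1
n_years
```

With the real data, the same expression applied to shootings$year would give a value to pass as nbinsx. Note that plotly treats nbinsx as a maximum number of bins, so the plotted bars may still be wider than one year.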
Heatmaps of the gender and race of the shooter:
shootings %>%
plot_ly(x = ~gender, y = ~race) %>%
add_histogram2dcontour()
Heatmap of the gender and age of the shooter:
shootings %>%
plot_ly(x = ~gender, y = ~age_of_shooter) %>%
add_histogram2dcontour()
Combine the inconsistent gender categories ("M" with "Male", "F" with "Female"):
shootings %>%
mutate(gender = as_factor(gender)) %>%
mutate(gender = fct_collapse(gender, Male = c("Male", "M"), Female = c("Female", "F"))) %>%
count(gender)
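The effect of fct_collapse() is easier to see on a small vector. This is a minimal sketch with hypothetical labels that have the same kind of inconsistency as the gender column.

```r
library(forcats)

# Hypothetical gender labels with mixed spellings.
gender <- factor(c("Male", "M", "Female", "F", "M", "Male"))

# Each new level on the left absorbs the old levels listed on the right.
gender_clean <- fct_collapse(
  gender,
  Male   = c("Male", "M"),
  Female = c("Female", "F")
)
table(gender_clean)
```

The pipeline above only prints the counts. For the heatmaps to use the cleaned categories, the same mutate() steps would need to come before plot_ly() in the plotting pipeline, or the result would need to be assigned back to shootings.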
A scatterplot of the number injured by the number of fatalities in the shooting:
shootings %>%
plot_ly(x = ~fatalities,
y = ~injured) %>%
add_markers()
Prepare the data by counting the number of shootings per year:
num_per_year <- shootings %>%
filter(fatalities > 3) %>%
count(year) %>%
filter(year < 2019)
num_per_year
Regression analysis:
num_per_year_model <- lm( n ~ year , data = num_per_year)
num_per_year_model
##
## Call:
## lm(formula = n ~ year, data = num_per_year)
##
## Coefficients:
## (Intercept) year
## -218.8464 0.1107
More information:
tidy(num_per_year_model)
glance(num_per_year_model)
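The p-value for year is greater than 0.05, so the slope (about 0.11 additional shootings per year) is not statistically significant at the 5% level. This analysis therefore does not support the hypothesis that the number of shootings has increased over the years.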
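For readers who want to see where the slope and its p-value come from, here is a minimal sketch using base R only. The yearly counts are hypothetical, invented to show how the coefficient table is read, and say nothing about the real data.

```r
# Hypothetical yearly counts, invented only to show how to read the output.
toy_counts <- data.frame(
  year = 2000:2009,
  n    = c(1, 2, 1, 3, 2, 4, 3, 5, 4, 6)
)

toy_model <- lm(n ~ year, data = toy_counts)

# Coefficient table: one row per term, with estimate, std. error, t, p-value.
coefs   <- summary(toy_model)$coefficients
slope   <- coefs["year", "Estimate"]
p_value <- coefs["year", "Pr(>|t|)"]

c(slope = slope, p_value = p_value)
p_value < 0.05  # TRUE means the slope differs from zero at the 5% level
```

The tidy() call above returns the same numbers in its estimate and p.value columns, in the row where term is "year".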