## ── Attaching packages ─────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ ggplot2 3.2.1     ✔ purrr   0.3.3
## ✔ tibble  2.1.3     ✔ dplyr   0.8.3
## ✔ tidyr   1.0.0     ✔ stringr 1.4.0
## ✔ readr   1.3.1     ✔ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Data description

The murders dataset from the dslabs package contains 2010 gun murder statistics for each U.S. state and the District of Columbia. For each entry it records the state name, the state abbreviation, the geographical region, the state population in 2010, and the total number of gun murders in the state in 2010. The data were taken from Wikipedia, whose page attributes the population figures to the U.S. Census Bureau and the gun murder figures to FBI reports.

Department of Justice, Federal Bureau of Investigation (September 2016). “Table 4, Crime in the United States by Region, Geographic Division, and State, 2014–2015”. Crime in the United States, 2015.

Population Division, US Census Bureau (December 2016). “Annual Estimates of the Resident Population: April 1, 2010 to July 1, 2016”. www.census.gov.

Summary table:

Variable      Description
State         U.S. state (full name)
Abbreviation  U.S. state (two-letter abbreviation)
Region        Geographical region of the U.S. state (South, Northeast, North Central, West)
Population    U.S. state population (in 2010)
Gun Murders   Total number of gun murders in the U.S. state (in 2010)

Data visualizations

murders
##                   state abb        region population total
## 1               Alabama  AL         South    4779736   135
## 2                Alaska  AK          West     710231    19
## 3               Arizona  AZ          West    6392017   232
## 4              Arkansas  AR         South    2915918    93
## 5            California  CA          West   37253956  1257
## 6              Colorado  CO          West    5029196    65
## 7           Connecticut  CT     Northeast    3574097    97
## 8              Delaware  DE         South     897934    38
## 9  District of Columbia  DC         South     601723    99
## 10              Florida  FL         South   19687653   669
## 11              Georgia  GA         South    9920000   376
## 12               Hawaii  HI          West    1360301     7
## 13                Idaho  ID          West    1567582    12
## 14             Illinois  IL North Central   12830632   364
## 15              Indiana  IN North Central    6483802   142
## 16                 Iowa  IA North Central    3046355    21
## 17               Kansas  KS North Central    2853118    63
## 18             Kentucky  KY         South    4339367   116
## 19            Louisiana  LA         South    4533372   351
## 20                Maine  ME     Northeast    1328361    11
## 21             Maryland  MD         South    5773552   293
## 22        Massachusetts  MA     Northeast    6547629   118
## 23             Michigan  MI North Central    9883640   413
## 24            Minnesota  MN North Central    5303925    53
## 25          Mississippi  MS         South    2967297   120
## 26             Missouri  MO North Central    5988927   321
## 27              Montana  MT          West     989415    12
## 28             Nebraska  NE North Central    1826341    32
## 29               Nevada  NV          West    2700551    84
## 30        New Hampshire  NH     Northeast    1316470     5
## 31           New Jersey  NJ     Northeast    8791894   246
## 32           New Mexico  NM          West    2059179    67
## 33             New York  NY     Northeast   19378102   517
## 34       North Carolina  NC         South    9535483   286
## 35         North Dakota  ND North Central     672591     4
## 36                 Ohio  OH North Central   11536504   310
## 37             Oklahoma  OK         South    3751351   111
## 38               Oregon  OR          West    3831074    36
## 39         Pennsylvania  PA     Northeast   12702379   457
## 40         Rhode Island  RI     Northeast    1052567    16
## 41       South Carolina  SC         South    4625364   207
## 42         South Dakota  SD North Central     814180     8
## 43            Tennessee  TN         South    6346105   219
## 44                Texas  TX         South   25145561   805
## 45                 Utah  UT          West    2763885    22
## 46              Vermont  VT     Northeast     625741     2
## 47             Virginia  VA         South    8001024   250
## 48           Washington  WA          West    6724540    93
## 49        West Virginia  WV         South    1852994    27
## 50            Wisconsin  WI North Central    5686986    97
## 51              Wyoming  WY          West     563626     5
ggplot(data = murders, aes(x = population, y = total)) +
  geom_point(aes(color = region)) +
  theme_bw() +
  labs(title = "Population of State vs. Total Gun Murders in State in 2010",
       x = "Population of State (in 2010)", 
       y = "Total Gun Murders in State (in 2010)",
       color = "Geographical Region")

The scatterplot above shows each state's 2010 population on the x-axis against its total gun murders in 2010 on the y-axis, with each point colored by geographical region. These variables were chosen to see whether a state's population is related to its total gun murders, and whether that relationship varies by geographical region.

The scatterplot shows a positive correlation between a state's population and its total gun murders. This makes sense: it means that total gun murders are roughly proportional to population. There do not seem to be any large outliers from this trend, and the trend does not appear to differ between the geographical regions.
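The visual impression of a positive, roughly proportional relationship can also be checked numerically. The following is a minimal sketch in base R, assuming the values are copied from the printed murders table above in the same row order; the names `population`, `total`, `r`, and `rate_per_100k` are illustrative and not part of the original analysis.

```r
# Values copied from the printed murders table above (same row order).
population <- c(4779736, 710231, 6392017, 2915918, 37253956, 5029196,
                3574097, 897934, 601723, 19687653, 9920000, 1360301,
                1567582, 12830632, 6483802, 3046355, 2853118, 4339367,
                4533372, 1328361, 5773552, 6547629, 9883640, 5303925,
                2967297, 5988927, 989415, 1826341, 2700551, 1316470,
                8791894, 2059179, 19378102, 9535483, 672591, 11536504,
                3751351, 3831074, 12702379, 1052567, 4625364, 814180,
                6346105, 25145561, 2763885, 625741, 8001024, 6724540,
                1852994, 5686986, 563626)
total <- c(135, 19, 232, 93, 1257, 65, 97, 38, 99, 669, 376, 7, 12, 364,
           142, 21, 63, 116, 351, 11, 293, 118, 413, 53, 120, 321, 12, 32,
           84, 5, 246, 67, 517, 286, 4, 310, 111, 36, 457, 16, 207, 8,
           219, 805, 22, 2, 250, 93, 27, 97, 5)

# Pearson correlation between state population and total gun murders.
r <- cor(population, total)

# Gun murders per 100,000 residents. If totals were exactly proportional
# to population, this rate would be the same for every state, so its
# spread is one way to look for outliers that the scatterplot may hide.
rate_per_100k <- total / population * 1e5

print(r)
print(summary(rate_per_100k))
```

With the dslabs package loaded, the same quantities could be computed directly from `murders$population` and `murders$total`.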

ggplot(data = murders) +
  geom_col(aes(x = region, y = total, fill = abb)) +
  theme_bw() +
  labs(title = "Total Gun Murders in Each U.S. Geographical Region in 2010",
       x = "U.S. Geographical Region",
       y = "Total Gun Murders",
       fill = "State")

The barplot above shows the total gun murders in each geographical region in 2010, with each bar divided to show roughly how many gun murders each state contributes within its region. These variables were chosen to see whether any geographical region has a higher or lower total than the others, and whether there is a state (or several states) to which those totals can be attributed.

From the barplot we can see that the South has a much higher total number of gun murders than the other three regions, whose totals are similar to one another. One possible explanation is that gun ownership is traditionally thought to be higher in Southern states, which could contribute to the high number of gun murders there. However, because this barplot stacks so many states, it is difficult to tell which particular states contribute most (or least) to the South's total.
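Since the stacked colors are hard to read, one alternative is to rank the Southern states by their totals directly. The following is a minimal sketch in base R, assuming the totals are copied from the rows labeled South in the printed murders table; the names `south_total`, `south_sorted`, and `south_share` are illustrative.

```r
# Total gun murders for the rows labeled "South" in the printed table.
south_total <- c(AL = 135, AR = 93, DE = 38, DC = 99, FL = 669, GA = 376,
                 KY = 116, LA = 351, MD = 293, MS = 120, NC = 286,
                 OK = 111, SC = 207, TN = 219, TX = 805, VA = 250,
                 WV = 27)

# Sort from largest to smallest contributor.
south_sorted <- sort(south_total, decreasing = TRUE)

# Each state's share of the Southern total, in percent.
south_share <- round(100 * south_sorted / sum(south_total), 1)

# Show the five largest contributors.
print(head(data.frame(total = south_sorted, share_pct = south_share), 5))
```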

On further thought, however, the barplot above may simply be misleading. If we count how many states are in each U.S. geographical region, we see that the South has more states than any other region (as shown in the barplot below). This suggests that the higher total in the South may reflect the larger number of states there, rather than a higher tendency toward gun murder than in other regions. In fact, the proportions of the second and third barplots are very similar. We therefore circle back to the trend from the first scatterplot: with more states and more people, there will be more gun murders.

ggplot(data = murders) +
  geom_bar(aes(x = region, fill = abb)) +
  theme_bw() +
  labs(title = "Number of States in Each U.S. Geographical Region",
       x = "U.S. Geographical Region",
       y = "Number of States",
       fill = "State")
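The argument that regional totals mostly reflect the number of states and people in each region can be checked by putting the regions on a per-capita scale. The following is a minimal sketch in base R, assuming the rows are copied from the printed murders table above; the names `murders_tbl`, `by_region`, `n_states`, and `rate_per_100k` are illustrative and not part of the original analysis.

```r
# Rows copied from the printed murders table above.
murders_tbl <- read.csv(text = "abb,region,population,total
AL,South,4779736,135
AK,West,710231,19
AZ,West,6392017,232
AR,South,2915918,93
CA,West,37253956,1257
CO,West,5029196,65
CT,Northeast,3574097,97
DE,South,897934,38
DC,South,601723,99
FL,South,19687653,669
GA,South,9920000,376
HI,West,1360301,7
ID,West,1567582,12
IL,North Central,12830632,364
IN,North Central,6483802,142
IA,North Central,3046355,21
KS,North Central,2853118,63
KY,South,4339367,116
LA,South,4533372,351
ME,Northeast,1328361,11
MD,South,5773552,293
MA,Northeast,6547629,118
MI,North Central,9883640,413
MN,North Central,5303925,53
MS,South,2967297,120
MO,North Central,5988927,321
MT,West,989415,12
NE,North Central,1826341,32
NV,West,2700551,84
NH,Northeast,1316470,5
NJ,Northeast,8791894,246
NM,West,2059179,67
NY,Northeast,19378102,517
NC,South,9535483,286
ND,North Central,672591,4
OH,North Central,11536504,310
OK,South,3751351,111
OR,West,3831074,36
PA,Northeast,12702379,457
RI,Northeast,1052567,16
SC,South,4625364,207
SD,North Central,814180,8
TN,South,6346105,219
TX,South,25145561,805
UT,West,2763885,22
VT,Northeast,625741,2
VA,South,8001024,250
WA,West,6724540,93
WV,South,1852994,27
WI,North Central,5686986,97
WY,West,563626,5", stringsAsFactors = FALSE)

# Sum population and gun murders within each region.
by_region <- aggregate(cbind(population, total) ~ region,
                       data = murders_tbl, FUN = sum)

# Number of states (rows) in each region, matched by region name.
region_counts <- table(murders_tbl$region)
by_region$n_states <- as.vector(region_counts[as.character(by_region$region)])

# Gun murders per 100,000 residents in each region.
by_region$rate_per_100k <- by_region$total / by_region$population * 1e5

print(by_region[order(-by_region$rate_per_100k), ])
```

If the regional rates come out close to one another, that supports the explanation above; if one region's rate stands apart, the number of states and people alone would not fully account for its total.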