## ── Attaching packages ─────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.3
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
The murders dataset from the dslabs package contains gun murder statistics for 2010, organized by U.S. state. For each state, the dataset gives the state name, its abbreviation, its geographical region, its 2010 population, and the total number of gun murders in 2010. The data was taken from Wikipedia; the Wikipedia page states that the population data came from the U.S. Census Bureau and the gun murder data came from FBI reports.
Department of Justice, Federal Bureau of Investigation (September 2016). “Table 4, Crime in the United States by Region, Geographic Division, and State, 2014–2015”. Crime in the United States, 2015.
Population Division, US Census Bureau (December 2016). “Annual Estimates of the Resident Population: April 1, 2010 to July 1, 2016”. www.census.gov.
| Variable | Column in murders | Description |
|---|---|---|
| State | state | U.S. state (full name) |
| Abbreviation | abb | U.S. state (two-letter abbreviation) |
| Region | region | Geographical region of the U.S. state (South, Northeast, North Central, West) |
| Population | population | U.S. state population (in 2010) |
| Gun Murders | total | Total number of gun murders in the U.S. state (in 2010) |
library(dslabs)  # provides the murders data frame
data(murders)
murders
## state abb region population total
## 1 Alabama AL South 4779736 135
## 2 Alaska AK West 710231 19
## 3 Arizona AZ West 6392017 232
## 4 Arkansas AR South 2915918 93
## 5 California CA West 37253956 1257
## 6 Colorado CO West 5029196 65
## 7 Connecticut CT Northeast 3574097 97
## 8 Delaware DE South 897934 38
## 9 District of Columbia DC South 601723 99
## 10 Florida FL South 19687653 669
## 11 Georgia GA South 9920000 376
## 12 Hawaii HI West 1360301 7
## 13 Idaho ID West 1567582 12
## 14 Illinois IL North Central 12830632 364
## 15 Indiana IN North Central 6483802 142
## 16 Iowa IA North Central 3046355 21
## 17 Kansas KS North Central 2853118 63
## 18 Kentucky KY South 4339367 116
## 19 Louisiana LA South 4533372 351
## 20 Maine ME Northeast 1328361 11
## 21 Maryland MD South 5773552 293
## 22 Massachusetts MA Northeast 6547629 118
## 23 Michigan MI North Central 9883640 413
## 24 Minnesota MN North Central 5303925 53
## 25 Mississippi MS South 2967297 120
## 26 Missouri MO North Central 5988927 321
## 27 Montana MT West 989415 12
## 28 Nebraska NE North Central 1826341 32
## 29 Nevada NV West 2700551 84
## 30 New Hampshire NH Northeast 1316470 5
## 31 New Jersey NJ Northeast 8791894 246
## 32 New Mexico NM West 2059179 67
## 33 New York NY Northeast 19378102 517
## 34 North Carolina NC South 9535483 286
## 35 North Dakota ND North Central 672591 4
## 36 Ohio OH North Central 11536504 310
## 37 Oklahoma OK South 3751351 111
## 38 Oregon OR West 3831074 36
## 39 Pennsylvania PA Northeast 12702379 457
## 40 Rhode Island RI Northeast 1052567 16
## 41 South Carolina SC South 4625364 207
## 42 South Dakota SD North Central 814180 8
## 43 Tennessee TN South 6346105 219
## 44 Texas TX South 25145561 805
## 45 Utah UT West 2763885 22
## 46 Vermont VT Northeast 625741 2
## 47 Virginia VA South 8001024 250
## 48 Washington WA West 6724540 93
## 49 West Virginia WV South 1852994 27
## 50 Wisconsin WI North Central 5686986 97
## 51 Wyoming WY West 563626 5
ggplot(data = murders, aes(x = population, y = total)) +
  geom_point(aes(color = region)) +
  theme_bw() +
  labs(title = "Population of State vs. Total Gun Murders in State in 2010",
       x = "Population of State (in 2010)",
       y = "Total Gun Murders in State (in 2010)",
       color = "Geographical Region")
The scatterplot above shows each state's 2010 population on the x-axis against its total gun murders in 2010 on the y-axis, with each point colored by geographical region. These variables were chosen to see whether a state's population is associated with its total gun murders, and whether that relationship varies by geographical region.
The scatterplot shows a positive correlation between a state's population and its total gun murders. This makes sense: if gun murders occur at a broadly similar per-person rate everywhere, the total number of gun murders will be roughly proportional to the population. There do not seem to be any large outliers from this trend, and the trend does not appear to differ between geographical regions.
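As a rough numerical check on what the scatterplot suggests, the sketch below computes the Pearson correlation between population and total gun murders, along with a per-100,000 rate for each state. It assumes the dslabs package is installed; the names `pop_total_cor` and `rate` are introduced here for illustration and are not part of the dataset.

```r
library(dslabs)  # assumed installed; provides the murders data frame
data(murders)

# Pearson correlation between state population and total gun murders.
# A value close to 1 would support the roughly linear trend in the scatterplot.
pop_total_cor <- cor(murders$population, murders$total)
print(pop_total_cor)

# Gun murders per 100,000 residents (illustrative column, not in the original data).
# If totals were exactly proportional to population, this rate would be constant.
murders$rate <- murders$total / murders$population * 1e5
summary(murders$rate)
```

If the rate varies widely between states, population alone does not fully explain the totals, even when the correlation is high.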
ggplot(data = murders) +
  geom_col(aes(x = region, y = total, fill = abb)) +
  theme_bw() +
  labs(title = "Total Gun Murders in Each U.S. Geographical Region in 2010",
       x = "U.S. Geographical Region",
       y = "Total Gun Murders",
       fill = "State")
The barplot above shows the total gun murders in each geographical region in 2010, with each bar split by state to show roughly how many gun murders each state contributes to its region's total. These variables were chosen to see whether any geographical region has a noticeably higher or lower total than the others, and whether that total can be attributed to one state or to several.
The barplot shows that the South has a much higher total number of gun murders than the other three regions, whose totals are closer to one another. One possible explanation is that gun ownership is traditionally thought to be more common in Southern states, which could contribute to the high number of gun murders there. However, because each bar is split into so many states, it is difficult to tell from this barplot which particular states contribute most (or least) to the South's high total.
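Since the stacked bar is hard to read, one option is to rank the Southern states directly. The following is a minimal sketch, assuming dslabs and dplyr are installed; `south_states` and the `share` column are illustrative names, not part of the original analysis.

```r
library(dslabs)  # assumed installed; provides the murders data frame
library(dplyr)   # filter(), arrange(), mutate(), select()
data(murders)

# Keep only Southern states, sort by total gun murders (largest first),
# and compute each state's share of the South's total.
south_states <- murders %>%
  filter(region == "South") %>%
  arrange(desc(total)) %>%
  mutate(share = total / sum(total)) %>%
  select(state, abb, total, share)

# Show the five largest contributors
head(south_states, 5)
```

The same approach works for any other region by changing the value passed to `filter()`.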
On closer inspection, though, the barplot may simply be misleading. If we count how many states are in each U.S. geographical region, we see that the South contains more states than any other region (as shown in the barplot below). This suggests that the South's higher total is at least partly because it contains more states, and not necessarily because people kill more in the South than in any other geographical region. In fact, the bars in our second and third barplots follow a similar pattern, with the regions ranked in the same order. This brings us back to the trend we found in the scatterplot: with more states and more people, there will be more gun murders.
ggplot(data = murders) +
  geom_bar(aes(x = region, fill = abb)) +
  theme_bw() +
  labs(title = "Number of States in Each U.S. Geographical Region",
       x = "U.S. Geographical Region",
       y = "Number of States",
       fill = "State")
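To check the comparison between the last two barplots numerically rather than by eye, one could summarise each region's number of states, total gun murders, and population, and then compare shares and a per-100,000 rate. This is a sketch under the assumption that dslabs and dplyr are installed; `by_region` and its column names are illustrative.

```r
library(dslabs)  # assumed installed; provides the murders data frame
library(dplyr)   # group_by(), summarise(), mutate()
data(murders)

by_region <- murders %>%
  group_by(region) %>%
  summarise(
    n_states      = n(),              # number of states in the region
    murders_total = sum(total),       # total gun murders in the region
    pop_total     = sum(population)   # total population of the region
  ) %>%
  mutate(
    share_states   = n_states / sum(n_states),            # region's share of all states
    share_murders  = murders_total / sum(murders_total),  # region's share of all gun murders
    rate_per_100k  = murders_total / pop_total * 1e5      # gun murders per 100,000 residents
  )

print(by_region)
```

If `share_murders` tracks `share_states` closely, the number of states accounts for most of the difference between regions; comparing `rate_per_100k` shows whether any difference remains once population is taken into account.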