library(tidyverse)
library(tidyr)
library(highcharter)
library(leaflet)
library(DataExplorer)
library(psych)
library(tidyverse)
library(ggfortify)
library(htmltools)
library(plotly)Final Project
Guns and violence
Tools in the day to day life, they were not only for protection but for hunting, digging, woodworking and kitchen work.
The debate on violence as been since the first days of humanity after all no one wants to suffer any kind of violence. For many years humanity have carried in public dangerous tools in the day to day life such as knifes and woodwork tools, they were not only for hunting, digging, woodworking and kitchen work but for protection also. Objects as swords were common in the waist of all in the streets. These technologies are actually means of violence them selfs, the only way to stop a immigrate aggregation is to mach and overpower the aggressor.
Technology flew by and knifes and swords became pistols and riffles in the idea of a tool for day to day life also used for protection ended when people did not need to hunt for their survival. Today the handgun or riffle as no affect on life besides the usage in others, what does that imply into the minds of the big lab rats that humans are? Are we more violent because of them? more safe? or it simply does not matter? Those were the question decide to research upon and to see if statistics tell us anything about it.
For that the team chose two meaningful data sets for inspection. One gathers the data of violent crimes in the USA, the other gun ownership in the country.
The first is from the FBI Data base for crimes. There were extracted violent crimes from the years of 1979 to 2022 all data is from police reports.
From this year we have the following variables: state_abbr, state_name, population, violent_crime, homicide, rape_legacy, rape_revised, robbery, aggravated_assault, property_crime, burglary, larceny, motor_vehicle_theft and we will use all except for rape_revised and population. All of them numerical, expect for state and state abbreviation, but they were all accrued numbers of each crime.
The second is from the Rand Coorporaton were they track gun ownership by state and some other factors that will be discused furthermore. With data ranging from 1980 to 2016.
This data set as many tributes. Such As the year of the data, the state, FIP codes for the states, Factor scores for household firearm ownership latent factor, standardize factor, background checks, state has permit for purchase law, female and male suicide rates, survey, hunting licences, indicator for question wording changes and many more minor analyses. For the project only the suicide rates, states, year and Factor scores for household firearm ownership latent factor will be used. All that extract from surveys, hunting licences, caring laws, and some more information from the states and federal goverments. They are some one and zeros for laws os caring or hunting in the state and the rest and more important are variables from 0 to 1 on the probability a house hold has a gun in it. Extracted by they statistics of combined data.
The selection of variables was based on used of firearms on the crime and scenario and excluding external legal permits and statistics.
Starting libraries
Every project we must start with our libraries and with what we expect to explore in the data, we want to create interactive graths.
Starting Crimes database
The data is still on the original state and so we must look into it and extract only what is interresting to our project.
metadata1 <- read_csv("estimated_crimes_1979_2022(1).csv")Rows: 2284 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): state_abbr, state_name, caveats
dbl (12): year, population, violent_crime, homicide, rape_legacy, rape_revis...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(metadata1)# A tibble: 6 × 15
year state_abbr state_name population violent_crime homicide rape_legacy
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 1979 <NA> <NA> 220099000 1208030 21460 76390
2 1979 AK Alaska 406000 1994 54 292
3 1979 AL Alabama 3769000 15578 496 1037
4 1979 AR Arkansas 2180000 7984 198 595
5 1979 AZ Arizona 2450000 14528 219 1120
6 1979 CA California 22696000 184087 2952 12239
# ℹ 8 more variables: rape_revised <dbl>, robbery <dbl>,
# aggravated_assault <dbl>, property_crime <dbl>, burglary <dbl>,
# larceny <dbl>, motor_vehicle_theft <dbl>, caveats <chr>
Treating the data
First lets Filter the dataset to mach the other dataset beging date, 1980, or extract the 1979 variables. Also it ends in 2016, and so we must extract the new years also.
filtered1 <- metadata1 |>
filter(year != 1979) |>
filter(year <= 2016)
head(filtered1)# A tibble: 6 × 15
year state_abbr state_name population violent_crime homicide rape_legacy
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 1980 <NA> <NA> 225349264 1344520 23040 82990
2 1980 AK Alaska 440142 1919 39 250
3 1980 AL Alabama 3861466 17320 509 1158
4 1980 AR Arkansas 2284037 7656 210 609
5 1980 AZ Arizona 2715357 17673 279 1227
6 1980 CA California 23532680 210290 3411 13693
# ℹ 8 more variables: rape_revised <dbl>, robbery <dbl>,
# aggravated_assault <dbl>, property_crime <dbl>, burglary <dbl>,
# larceny <dbl>, motor_vehicle_theft <dbl>, caveats <chr>
Upon inspection the NA names is for the sum of all states or the usa. And so lets take it out. We want to compare state to state.
NONA1 <- filtered1 |>
filter(!is.na(state_name) & !is.na(state_abbr)) |>
select(-rape_revised, -caveats)
head(NONA1)# A tibble: 6 × 13
year state_abbr state_name population violent_crime homicide rape_legacy
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 1980 AK Alaska 440142 1919 39 250
2 1980 AL Alabama 3861466 17320 509 1158
3 1980 AR Arkansas 2284037 7656 210 609
4 1980 AZ Arizona 2715357 17673 279 1227
5 1980 CA California 23532680 210290 3411 13693
6 1980 CO Colorado 2878407 15215 198 1510
# ℹ 6 more variables: robbery <dbl>, aggravated_assault <dbl>,
# property_crime <dbl>, burglary <dbl>, larceny <dbl>,
# motor_vehicle_theft <dbl>
Create data with population
The date needs to be treated in reagards of its own population, let divide all crimes by the state population.
NONA1$violent_crime = 1000*NONA1$violent_crime/NONA1$population
NONA1$homicide = 1000*NONA1$homicide/NONA1$population
NONA1$rape_legacy = 1000*NONA1$rape_legacy/NONA1$population
NONA1$robbery = 1000*NONA1$robbery/NONA1$population
NONA1$aggravated_assault = 1000*NONA1$aggravated_assault/NONA1$population
NONA1$property_crime = 1000*NONA1$property_crime/NONA1$population
NONA1$burglary = 1000*NONA1$burglary/NONA1$population
NONA1$larceny = 1000*NONA1$larceny/NONA1$population
NONA1$robbery = 1000*NONA1$robbery/NONA1$population
NONA1$motor_vehicle_theft = 1000*NONA1$motor_vehicle_theft/NONA1$population
head(NONA1)# A tibble: 6 × 13
year state_abbr state_name population violent_crime homicide rape_legacy
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 1980 AK Alaska 440142 4.36 0.0886 0.568
2 1980 AL Alabama 3861466 4.49 0.132 0.300
3 1980 AR Arkansas 2284037 3.35 0.0919 0.267
4 1980 AZ Arizona 2715357 6.51 0.103 0.452
5 1980 CA California 23532680 8.94 0.145 0.582
6 1980 CO Colorado 2878407 5.29 0.0688 0.525
# ℹ 6 more variables: robbery <dbl>, aggravated_assault <dbl>,
# property_crime <dbl>, burglary <dbl>, larceny <dbl>,
# motor_vehicle_theft <dbl>
Starting Gun ownership database
Now lets treat the other dataset and then we can unite them.
metadata2 <- read_csv("Book1.csv")Rows: 1850 Columns: 20
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): STATE
dbl (19): FIP, Year, HFR, HFR_se, universl, permit, Fem_FS_S, Male_FS_S, BRF...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(metadata2)# A tibble: 6 × 20
FIP Year STATE HFR HFR_se universl permit Fem_FS_S Male_FS_S BRFSS GALLUP
<dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1980 Alab… 0.608 0.031 0 0 0.824 0.834 -9 0.554
2 1 1981 Alab… 0.597 0.047 0 0 0.692 0.831 -9 -9
3 1 1982 Alab… 0.661 0.036 0 0 0.772 0.821 -9 -9
4 1 1983 Alab… 0.586 0.038 0 0 0.688 0.819 -9 0.611
5 1 1984 Alab… 0.624 0.036 0 0 0.71 0.776 -9 -9
6 1 1985 Alab… 0.644 0.031 0 0 0.756 0.835 -9 0.612
# ℹ 9 more variables: GSS <dbl>, PEW <dbl>, HuntLic <dbl>, GunsAmmo <dbl>,
# BackChk <dbl>, PewQChng <dbl>, BS1 <dbl>, BS2 <dbl>, BS3 <dbl>
Uniting the datasets
Upon inspection we saw that some values of feminine suicide were -9 in a original range og 0 to 1, for that we will use 0.
filtered2 <- select(metadata2,Year, STATE, HFR, Fem_FS_S, Male_FS_S)
filtered2$Fem_FS_S[filtered2$Fem_FS_S == -9] <- 0
head(filtered2)# A tibble: 6 × 5
Year STATE HFR Fem_FS_S Male_FS_S
<dbl> <chr> <dbl> <dbl> <dbl>
1 1980 Alabama 0.608 0.824 0.834
2 1981 Alabama 0.597 0.692 0.831
3 1982 Alabama 0.661 0.772 0.821
4 1983 Alabama 0.586 0.688 0.819
5 1984 Alabama 0.624 0.71 0.776
6 1985 Alabama 0.644 0.756 0.835
Uniting the datasets
Now that everything as the values we eant to use lets merge both datasets into one.
nrow(filtered2)[1] 1850
nrow(NONA1)[1] 1887
What?? they sould be the same. LEts See: 50 states times 2016-1979 years equals 1850. Why there are still 37 on the crimes data lets inspect it better.
unique(NONA1$state_name) [1] "Alaska" "Alabama" "Arkansas"
[4] "Arizona" "California" "Colorado"
[7] "Connecticut" "District of Columbia" "Delaware"
[10] "Florida" "Georgia" "Hawaii"
[13] "Iowa" "Idaho" "Illinois"
[16] "Indiana" "Kansas" "Kentucky"
[19] "Louisiana" "Massachusetts" "Maryland"
[22] "Maine" "Michigan" "Minnesota"
[25] "Missouri" "Mississippi" "Montana"
[28] "North Carolina" "North Dakota" "Nebraska"
[31] "New Hampshire" "New Jersey" "New Mexico"
[34] "Nevada" "New York" "Ohio"
[37] "Oklahoma" "Oregon" "Pennsylvania"
[40] "Rhode Island" "South Carolina" "South Dakota"
[43] "Tennessee" "Texas" "Utah"
[46] "Virginia" "Vermont" "Washington"
[49] "Wisconsin" "West Virginia" "Wyoming"
I see there is the Distric of collumbia here still. Lets take it out.
NEWNONA1 <- NONA1 |>
filter(state_name != "District of Columbia")
nrow(NEWNONA1)[1] 1850
New both have the same data set and we can unite them.
Final <- merge(NEWNONA1, filtered2, by.x = c("year", "state_name"), by.y = c("Year", "STATE"))
head(Final) year state_name state_abbr population violent_crime homicide rape_legacy
1 1980 Alabama AL 3861466 4.485343 0.13181522 0.2998861
2 1980 Alaska AK 440142 4.359957 0.08860777 0.5679985
3 1980 Arizona AZ 2715357 6.508536 0.10274892 0.4518743
4 1980 Arkansas AR 2284037 3.351960 0.09194247 0.2666332
5 1980 California CA 23532680 8.936084 0.14494737 0.5818717
6 1980 Colorado CO 2878407 5.285910 0.06878805 0.5245957
robbery aggravated_assault property_crime burglary larceny
1 0.0003421653 2.732382 44.85110 15.26674 26.42235
2 0.0018583045 2.885432 52.09682 12.59821 33.88906
3 0.0007131267 4.017520 75.19932 21.55407 48.91217
4 0.0003542384 2.184290 34.75863 11.19027 21.69755
5 0.0001632758 4.366948 69.39465 23.16515 38.80009
6 0.0005561706 3.091641 68.04875 20.30811 43.25795
motor_vehicle_theft HFR Fem_FS_S Male_FS_S
1 3.162012 0.608 0.8243243 0.8337950
2 5.609553 0.789 0.9230769 0.8727273
3 4.733079 0.563 0.4722222 0.6988636
4 1.870810 0.693 0.6774194 0.8676471
5 7.429413 0.366 0.3232759 0.5558670
6 4.482688 0.540 0.4074074 0.6464088
Relationships in the Dataset
Lets see how each variable can tell something about the other, lets see with we can observe a relationship with a regression model. Specificcly looking at variables the summerise the others, such as violant crimes and property crimes.
library(GGally)Warning: package 'GGally' was built under R version 4.3.3
Registered S3 method overwritten by 'GGally':
method from
+.gg ggplot2
ggpairs(Final, columns = c(5, 10, 14, 15, 16))Wow, look at the discoveries. The graphs above show the relations between the variables in the axis, we can see the male and female disparities in suicide rates by gun fire, are really direct -.634 and 0.760. About the other characteristics we see a very characteristic frontier in the bottom and upper limit limit that gives a arrowhead shape that is interesting. It says that if everyone as weapons there is not much crime, the same if anyone has weapons, but as it gets to the middle points.
ggpairs(Final, columns = c(6, 7, 8, 9, 11, 12, 13, 14))Here we have more interesting data. When we get more into the specifics we see a more direct lines and values that although not large are interesting.
create a model
Can we try to find the weapon in household by giving the crime and suicide rates?
fit_HFR<- lm(HFR ~ violent_crime + homicide + rape_legacy +robbery+aggravated_assault+property_crime+burglary+larceny+motor_vehicle_theft+Fem_FS_S+Male_FS_S, data = Final)
summary(fit_HFR)
Call:
lm(formula = HFR ~ violent_crime + homicide + rape_legacy + robbery +
aggravated_assault + property_crime + burglary + larceny +
motor_vehicle_theft + Fem_FS_S + Male_FS_S, data = Final)
Residuals:
Min 1Q Median 3Q Max
-0.157596 -0.036830 -0.002818 0.030869 0.232265
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.178652 0.008917 -20.035 < 2e-16 ***
violent_crime -0.020513 0.002790 -7.352 2.93e-13 ***
homicide -0.257499 0.081675 -3.153 0.001644 **
rape_legacy 0.131754 0.012899 10.215 < 2e-16 ***
robbery 13.731335 3.173423 4.327 1.59e-05 ***
aggravated_assault 0.011554 0.003360 3.439 0.000597 ***
property_crime 1.470617 1.226528 1.199 0.230679
burglary -1.470048 1.226534 -1.199 0.230862
larceny -1.468315 1.226524 -1.197 0.231408
motor_vehicle_theft -1.474989 1.226531 -1.203 0.229298
Fem_FS_S -0.006101 0.011814 -0.516 0.605642
Male_FS_S 0.965115 0.017429 55.375 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.05476 on 1838 degrees of freedom
Multiple R-squared: 0.8575, Adjusted R-squared: 0.8567
F-statistic: 1006 on 11 and 1838 DF, p-value: < 2.2e-16
autoplot(fit_HFR, 1:4, nrow=2, ncol=2)Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_segment()`).
As we see female homicide, burglary, larceny, motor_vehicle_theft, aggravated_assault, property_crime do not have anything to say about a gun in the household Lets see if we take that away we can see something better. The scale location tell a lot of the problems in the model. Lets use the ones with more significance in the correlation, specific crime values and high significance to the model.
fit_HFR<- lm(HFR ~ Male_FS_S+larceny+motor_vehicle_theft+ homicide + rape_legacy, data = Final)
summary(fit_HFR)
Call:
lm(formula = HFR ~ Male_FS_S + larceny + motor_vehicle_theft +
homicide + rape_legacy, data = Final)
Residuals:
Min 1Q Median 3Q Max
-0.162706 -0.037079 -0.002108 0.033587 0.242700
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.2053257 0.0079438 -25.847 <2e-16 ***
Male_FS_S 1.0031040 0.0129736 77.319 <2e-16 ***
larceny 0.0025816 0.0002315 11.152 <2e-16 ***
motor_vehicle_theft -0.0092045 0.0009790 -9.402 <2e-16 ***
homicide -0.7787315 0.0549467 -14.172 <2e-16 ***
rape_legacy 0.1100952 0.0113305 9.717 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.05677 on 1844 degrees of freedom
Multiple R-squared: 0.8464, Adjusted R-squared: 0.846
F-statistic: 2032 on 5 and 1844 DF, p-value: < 2.2e-16
autoplot(fit_HFR, 1:4, nrow=2, ncol=2)That is a better Adjusted R-squared value of 84% witch is really high. And the diagnostic plots look much better. That give us a model of:
Estimation of weapon rate in the household = 1.0x(Male suicide rate by fire weapon)+ 0.0024*larceny -0.009XMotor vehicle theft-0.8X(homicides)+0.1Xrape -0.2
And so we can, with a verry precise model actually.
plot exploration
Homicide and gun ownership though time by state.
P1<-ggplot(Final, aes(x=year))+
geom_point(Final, mapping=aes(y=HFR, x=year)) +
geom_line(Final, mapping=aes(y=violent_crime, x=year))
P1Well this is not nice, we have to many years in the data frame.
lets just do one at a time.
P2<-ggplot(Final, aes(x=year))+
geom_point(Final, mapping=aes(y=HFR, x=year))
P2P2<-ggplot(Final, aes(x=year))+
geom_point(Final, mapping=aes(y=HFR, x=year, group=state_abbr))
P2There are to many point to see, each year with 50 states. To have a meaningful plt I will need to use only part of it. Lets use tablet to see the points in a more historical manner, and use gun ownership only in 2016 to see how it is affecting it all.
Final <- Final |>
filter(year == 2016)P2<-ggplot(Final, mapping=aes(y=HFR, x=violent_crime))+
geom_point() +
geom_smooth(method='lm',formula=y~x)
P2Not much of a cohelation.
P2<-ggplot(Final, mapping=aes(y=HFR, x=burglary))+
geom_point() +
geom_smooth(method='lm',formula=y~x)
P2Now we see somthing here, really interreting, the more the household has of weapons the more it it likly to be invaded. That mught be because they want the guns. Lets see thie in top of maps.
plot 1
This dashboard on tableu was done by the team to see the transformation in these relationships trought time, out of curiosity. But the findings were choking. Please go see it.
plot 2
With the revelationsof the graphs showned in the team tableau page, we revolved not in exploring any particular year but to see the excalation of violance in the usa as a hole.
For the map we will need the coordinates of all sates:
us_states <- map_data("state")
Final$state_name<-tolower(Final$state_name)
head(Final) year state_name state_abbr population violent_crime homicide rape_legacy
1 2016 alabama AL 4860545 5.324094 0.08373547 0.2839188
2 2016 alaska AK 741522 8.045614 0.07012604 1.0168276
3 2016 arizona AZ 6908642 4.710332 0.05630629 0.3562205
4 2016 arkansas AR 2988231 5.542744 0.07261821 0.5354338
5 2016 california CA 39296476 4.448134 0.04911382 0.2578348
6 2016 colorado CO 5530105 3.441164 0.03417657 0.4846201
robbery aggravated_assault property_crime burglary larceny
1 1.983924e-04 3.882075 29.47386 7.004359 20.05907
2 1.545862e-03 5.409145 33.54722 5.465785 23.95883
3 1.476031e-04 3.156047 30.00836 5.531623 21.80139
4 2.379746e-04 4.018096 32.82611 7.969263 22.45174
5 3.548021e-05 2.656091 25.50025 4.791880 16.21036
6 1.152636e-04 2.112256 27.51232 4.308236 19.65243
motor_vehicle_theft HFR Fem_FS_S Male_FS_S
1 2.410429 0.528 0.6463415 0.7115385
2 4.122602 0.572 0.3695652 0.6530612
3 2.675345 0.360 0.4090909 0.6477157
4 2.405102 0.518 0.5106383 0.6256039
5 4.498011 0.163 0.1825476 0.4285714
6 3.551650 0.379 0.3371648 0.5788313
wmap <- merge(Final, us_states, sort = FALSE, by.x = "state_name", by.y = "region")
wmap<-wmap %>%
group_by(state_name)%>%
summarize(Estimate=HFR, burglary=burglary, lat=mean(lat), long=mean(long), StateName=unique(state_name), population=population)Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
always returns an ungrouped data frame and adjust accordingly.
`summarise()` has grouped output by 'state_name'. You can override using the
`.groups` argument.
wmap<-distinct(wmap)
head(wmap)# A tibble: 6 × 7
# Groups: state_name [6]
state_name Estimate burglary lat long StateName population
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 alabama 0.528 7.00 31.7 -86.9 alabama 4860545
2 arizona 0.36 5.53 34.5 -113. arizona 6908642
3 arkansas 0.518 7.97 34.6 -91.3 arkansas 2988231
4 california 0.163 4.79 36.7 -121. california 39296476
5 colorado 0.379 4.31 38.6 -105. colorado 5530105
6 connecticut 0.188 2.82 41.4 -72.7 connecticut 3587685
popout <- paste0(
"<b>State: </b>", wmap$StateName, "<br>",
"<b>Percentage of household with weapons:</b>", wmap$HFR*100,"<br>",
"<b>Burgulary per 1000 people: </b>", wmap$burglary, "<br>"
)Warning: Unknown or uninitialised column: `HFR`.
ohio_lat <- 38.000000
ohio_lon <--97.000000
leaflet() |>
setView(lng = ohio_lon, lat = ohio_lat, zoom =4.3) |>
addProviderTiles("Esri.WorldStreetMap") |>
addCircles(
data = wmap,
radius = wmap$Estimate*200000,
color = 'darkgray',
fillColor = 'black',
fillOpacity = wmap$burglary/10,
popup=popout
)Assuming "long" and "lat" are longitude and latitude, respectively
In the image above we can see a relationship in the smaller is of the bubble and the darker it is. The color is derived from burglar incidents per population and size from governorship.
Final Essay
The data tract here is really interesting is point to some relationship between the more guns a person has or as the freedom to have, the less crimes are committed in their regions. At least for the crimes listed here. The tableau is very concise in telling the tale historically. Another interesting fact was that ladies do not use weapons as much as men to comit suicide, and that how much that is correlated to the contact with weapons in the day to day life. One thing is to think about ending your own life, another is to have the means to do it, and the data show that if you have, you use it.
I believe there is needed to better understand all the factors that come into play, and although the data here shows some correlation correlation does noe mean causation and so more studies need to happen on the topic, for it all to be better understand, and expressed. And that the firearms culture in the states is alarming its own population.
Bibliography:
https://news.temple.edu/news/2022-12-06/understanding-america-relationship-firearms
https://www.rand.org/pubs/tools/TL354.html