My topic for Project 1 concerns different types of crimes reported in the states of the United States from 2001 to 2017. I focus on three main variables: two quantitative variables, murder_manslaughter and vehicle_theft, and the categorical variable “jurisdiction”, which identifies the state.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.3.6 v purrr 0.3.4
## v tibble 3.1.8 v dplyr 1.0.10
## v tidyr 1.2.1 v stringr 1.4.1
## v readr 2.1.2 v forcats 0.5.2
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(readr)
library(ggplot2)
library(dplyr)
library(ggfortify)
library(RColorBrewer)
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(htmltools)
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
library(alluvial)
library(ggalluvial)
ucr_by_state_1_ <- read_csv("C:/Users/claud/Downloads/ucr_by_state (1).csv")
## Rows: 869 Columns: 15
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): jurisdiction
## dbl (3): year, crime_reporting_change, crimes_estimated
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(ucr_by_state_1_)
## # A tibble: 6 x 15
## jurisd~1 year crime~2 crime~3 state~4 viole~5 murde~6 rape_~7 rape_~8 robbery
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Alaska 2001 0 0 633630 3735 39 501 NA 514
## 2 Alaska 2002 0 0 641482 3627 33 511 NA 489
## 3 Alaska 2003 0 0 648280 3877 39 605 NA 446
## 4 Alaska 2004 0 0 657755 4159 37 558 NA 447
## 5 Alaska 2005 0 0 663253 4194 32 538 NA 537
## 6 Alaska 2006 0 0 670053 4610 36 512 NA 600
## # ... with 5 more variables: agg_assault <dbl>, property_crime_total <dbl>,
## # burglary <dbl>, larceny <dbl>, vehicle_theft <dbl>, and abbreviated
## # variable names 1: jurisdiction, 2: crime_reporting_change,
## # 3: crimes_estimated, 4: state_population, 5: violent_crime_total,
## # 6: murder_manslaughter, 7: rape_legacy, 8: rape_revised
Identify the columns that contain NA values
na.cols <- which(colSums(is.na(ucr_by_state_1_)) >0)
sort(colSums(sapply(ucr_by_state_1_[na.cols], is.na)),decreasing = TRUE)
## rape_revised rape_legacy
## 612 104
paste('Number of columns with missing values:', length(na.cols))
## [1] "Number of columns with missing values: 2"
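As a minimal sketch of the counting step above, the same idea can be tried on a small made-up table. The toy values below are invented for illustration and are not taken from the UCR file; only the column names mirror the dataset.

```r
# Toy data: values are invented for illustration only
toy <- data.frame(
  jurisdiction = c("A", "A", "B", "B"),
  rape_legacy  = c(10, NA, 12, 14),
  rape_revised = c(NA, NA, 20, NA),
  robbery      = c(5, 6, 7, 8)
)

# Count NA values per column, keep only columns that have any,
# and sort from most to least missing
na_counts <- colSums(is.na(toy))
na_counts <- sort(na_counts[na_counts > 0], decreasing = TRUE)
na_counts
```

Here only rape_revised (3 missing) and rape_legacy (1 missing) are reported, in that order, which is the same shape of result as the output above.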
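Remove the rows that contain NA values. Only 153 of the 869 rows remain after this step, and they cover the years 2013 to 2015 only.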
# Is N/A necessary for analysis?
ucr_by_state <- ucr_by_state_1_ %>%
  filter(!is.na(rape_revised) & !is.na(rape_legacy))
which(is.na(ucr_by_state), arr.ind=TRUE)
## row col
str(ucr_by_state)
## spec_tbl_df [153 x 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ jurisdiction : chr [1:153] "Alaska" "Alaska" "Arizona" "Arizona" ...
## $ year : num [1:153] 2013 2014 2013 2014 2013 ...
## $ crime_reporting_change: num [1:153] 0 0 0 0 0 0 0 0 0 0 ...
## $ crimes_estimated : num [1:153] 0 0 0 0 0 0 0 0 0 0 ...
## $ state_population : num [1:153] 737259 736732 6634997 6731484 4833996 ...
## $ violent_crime_total : num [1:153] 4709 4684 27576 26916 20834 ...
## $ murder_manslaughter : num [1:153] 34 41 355 319 346 ...
## $ rape_legacy : num [1:153] 657 555 2344 2464 1449 ...
## $ rape_revised : num [1:153] 925 771 3174 3378 2055 ...
## $ robbery : num [1:153] 623 629 6656 6249 4645 ...
## $ agg_assault : num [1:153] 3127 3243 17391 16970 13788 ...
## $ property_crime_total : num [1:153] 21211 20334 223294 215240 161835 ...
## $ burglary : num [1:153] 2917 3150 48292 43562 42410 ...
## $ larceny : num [1:153] 16599 15445 158036 154091 108862 ...
## $ vehicle_theft : num [1:153] 1695 1739 16966 17587 10563 ...
## - attr(*, "spec")=
## .. cols(
## .. jurisdiction = col_character(),
## .. year = col_double(),
## .. crime_reporting_change = col_double(),
## .. crimes_estimated = col_double(),
## .. state_population = col_number(),
## .. violent_crime_total = col_number(),
## .. murder_manslaughter = col_number(),
## .. rape_legacy = col_number(),
## .. rape_revised = col_number(),
## .. robbery = col_number(),
## .. agg_assault = col_number(),
## .. property_crime_total = col_number(),
## .. burglary = col_number(),
## .. larceny = col_number(),
## .. vehicle_theft = col_number()
## .. )
## - attr(*, "problems")=<externalptr>
summary(ucr_by_state)
## jurisdiction year crime_reporting_change crimes_estimated
## Length:153 Min. :2013 Min. :0.000000 Min. :0.00000
## Class :character 1st Qu.:2013 1st Qu.:0.000000 1st Qu.:0.00000
## Mode :character Median :2014 Median :0.000000 Median :0.00000
## Mean :2014 Mean :0.006536 Mean :0.01307
## 3rd Qu.:2015 3rd Qu.:0.000000 3rd Qu.:0.00000
## Max. :2015 Max. :1.000000 Max. :1.00000
## state_population violent_crime_total murder_manslaughter rape_legacy
## Min. : 583223 Min. : 622 Min. : 10.0 Min. : 99
## 1st Qu.: 1654930 1st Qu.: 5392 1st Qu.: 56.0 1st Qu.: 522
## Median : 4399583 Median : 15744 Median : 167.0 Median :1168
## Mean : 6146744 Mean : 23279 Mean : 289.1 Mean :1648
## 3rd Qu.: 6828065 3rd Qu.: 26916 3rd Qu.: 399.0 3rd Qu.:1861
## Max. :39144818 Max. :166883 Max. :1861.0 Max. :9387
## rape_revised robbery agg_assault property_crime_total
## Min. : 110 Min. : 53 Min. : 432 Min. : 8806
## 1st Qu.: 700 1st Qu.: 1045 1st Qu.: 3316 1st Qu.: 37717
## Median : 1646 Median : 3674 Median : 9470 Median : 114871
## Mean : 2278 Mean : 6395 Mean :14318 Mean : 161062
## 3rd Qu.: 2531 3rd Qu.: 7114 3rd Qu.:17676 3rd Qu.: 187472
## Max. :12811 Max. :53640 Max. :99349 Max. :1024914
## burglary larceny vehicle_theft
## Min. : 1689 Min. : 6660 Min. : 178
## 1st Qu.: 7777 1st Qu.: 26898 1st Qu.: 3783
## Median : 23122 Median : 83385 Median : 8526
## Mean : 34026 Mean :113399 Mean : 13636
## 3rd Qu.: 40930 3rd Qu.:134514 3rd Qu.: 13630
## Max. :232058 Max. :656517 Max. :170993
As indicated above, I begin my visualization with murder_manslaughter from 2001 to 2017, restricted to the Mid-Atlantic states, which I filter below.
# Keep only the Mid-Atlantic jurisdictions (full 2001-2017 data, NA rows included)
ucr_by_state <- ucr_by_state_1_ %>%
  filter(jurisdiction %in% c("Delaware", "DC", "Maryland", "New Jersey",
                             "New York", "Pennsylvania", "Virginia", "West Virginia"))
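A minimal sketch of filtering to a fixed set of jurisdictions, run on a hypothetical toy table (the rows and counts are invented). It checks that `%in%` keeps the same rows as a chain of `==` comparisons joined by `|`. Whether the real file codes the District of Columbia as "DC" is taken from the filter above and not verified here.

```r
library(dplyr)

# Toy table: jurisdictions and counts are invented for illustration
toy <- tibble(
  jurisdiction        = c("Maryland", "Texas", "DC", "Ohio", "Virginia"),
  murder_manslaughter = c(1, 2, 3, 4, 5)
)

mid_atlantic <- c("Delaware", "DC", "Maryland", "New Jersey",
                  "New York", "Pennsylvania", "Virginia", "West Virginia")

# Set membership test
by_in <- toy %>% filter(jurisdiction %in% mid_atlantic)

# Equivalent chain of comparisons
by_or <- toy %>%
  filter(jurisdiction == "Delaware" | jurisdiction == "DC" |
         jurisdiction == "Maryland" | jurisdiction == "New Jersey" |
         jurisdiction == "New York" | jurisdiction == "Pennsylvania" |
         jurisdiction == "Virginia" | jurisdiction == "West Virginia")

all.equal(by_in, by_or)
```

Storing the state names in one vector also makes it possible to reuse the same list for every filter in the report.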
The first plot presents the number of murder_manslaughter cases reported in each Mid-Atlantic state from 2001 to 2017.
ggplot(ucr_by_state, aes(x=year, y=murder_manslaughter, color=jurisdiction))+
geom_point(size = 1.2)+
geom_line() +
theme_bw() +
scale_color_brewer(palette = "Set1") +
labs(x = "year", y = "Number of murder_manslaughter cases") + ggtitle("Mid-Atlantic states: murder_manslaughter reported")
To make my project more interesting, I add a quantitative variable: the murder_manslaughter rate, that is, murder_manslaughter as a percentage of violent_crime_total, which I then examine for the Mid-Atlantic states.
murderRate_by_state <- read_csv("C:/Users/claud/Downloads/ucr_by_state (1).csv")
## Rows: 869 Columns: 15
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): jurisdiction
## dbl (3): year, crime_reporting_change, crimes_estimated
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
murderRate_by_state<-mutate(murderRate_by_state,murderRate_by_state = murder_manslaughter*100/violent_crime_total)
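As a quick worked check of this formula, the sketch below applies it to the first row printed by head() above (Alaska, 2001: 39 murder_manslaughter cases out of 3735 violent crimes). The variable names `murder`, `violent`, and `rate` are only illustrative.

```r
# Values copied from the Alaska 2001 row shown by head() above
murder  <- 39    # murder_manslaughter
violent <- 3735  # violent_crime_total

# Same formula as in the mutate() call: share of violent crime, in percent
rate <- murder * 100 / violent
round(rate, 2)
# → 1.04
```

So in that row, murder_manslaughter accounts for roughly 1% of the violent crimes reported.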
With the variable added, I create a plot that shows murder_manslaughter as a percentage of violent_crime_total in the Mid-Atlantic states.
ucr_by_state2 <- murderRate_by_state %>%
  mutate(murderRate_by_state = murder_manslaughter * 100 / violent_crime_total) %>%
  filter(jurisdiction %in% c("Delaware", "DC", "Maryland", "New Jersey",
                             "New York", "Pennsylvania", "Virginia", "West Virginia"))
p2<-ucr_by_state2%>%
ggplot(aes(x=year, y=murderRate_by_state, alluvium=jurisdiction))+
theme_bw() +
geom_alluvium(aes(fill = jurisdiction),
color = "white",
width = .1,
alpha = .8,
decreasing = FALSE) +
scale_fill_brewer(palette = "Set3", name = "State jurisdiction") +
labs(x = "year", y = "murder_manslaughter per 100 violent crimes (%)") + ggtitle("Mid-Atlantic states' rate of murder_manslaughter")
p2
For the rest of my project, I highlight a different variable, vehicle_theft, in order to compare it with the first variable I worked with.
ucr_by_state <- ucr_by_state_1_ %>%
  filter(jurisdiction %in% c("Delaware", "DC", "Maryland", "New Jersey",
                             "New York", "Pennsylvania", "Virginia", "West Virginia"))
The third plot presents the number of vehicle_theft cases reported in the Mid-Atlantic states from 2001 to 2017.
ggplot(ucr_by_state, aes(x=year,y=vehicle_theft, color=jurisdiction))+
geom_boxplot() +
coord_trans() +
theme_grey() +
labs(x = "year", y = "Number of vehicle_theft cases") + ggtitle("Mid-Atlantic states: vehicle_theft reported")
In this last plot, I highlight two very different types of crime: one that is involuntary (manslaughter) and one that, in my opinion, is more voluntary (vehicle theft), in order to see which of these crimes is more prevalent in the Mid-Atlantic states.
ggplot(ucr_by_state, aes(x = murder_manslaughter, y = vehicle_theft, color = jurisdiction)) +
  geom_point(size = 1.2) +
  geom_smooth(method = "lm", color = "darkred") +
  labs(title = "murder_manslaughter vs vehicle_theft by jurisdiction",
       x = "murder_manslaughter", y = "vehicle_theft", color = "jurisdiction")
## `geom_smooth()` using formula 'y ~ x'
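The smoothing layer above fits a straight line of the form `y ~ x`. A minimal sketch of that kind of fit, done directly with `lm()`, is shown below. It uses the first five murder_manslaughter and vehicle_theft values printed by str() earlier, which come from the NA-filtered table and not from the Mid-Atlantic subset, so the coefficients are illustrative only.

```r
# First five value pairs shown in the str() output above (not Mid-Atlantic states)
sample_rows <- data.frame(
  murder_manslaughter = c(34, 41, 355, 319, 346),
  vehicle_theft       = c(1695, 1739, 16966, 17587, 10563)
)

# Ordinary least squares line: vehicle_theft as a linear function of murder_manslaughter
fit <- lm(vehicle_theft ~ murder_manslaughter, data = sample_rows)

# Intercept and slope of the fitted line
coef(fit)
```

A positive slope means that rows with more murder_manslaughter cases also tend to have more vehicle thefts. With raw counts this partly reflects population size, since larger states report more of both.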
My visualization focuses primarily on violent crimes committed by individuals, willingly or unwittingly. Given the number of states, I decided to narrow my focus to the Mid-Atlantic states, which include Maryland and its surrounding areas, since I live there now and it is good to be informed about what surrounds us. For this reason the primary variable I treated was manslaughter, which is a criminal offense in the United States, defined as homicide committed by a person who did not have the specific intent to cause death, for example in a road accident, or who acted on impulse or through recklessness or negligence.
For me, this variable was very important because nowadays we see that the number of manslaughter cases continues to increase, and many innocent people lose their lives because of the carelessness of others. I was greatly surprised to find that Maryland ranks third among the Mid-Atlantic states for the number of manslaughter cases between 2001 and 2017, after New York and Pennsylvania (ref_Plot1). The curves in Plot 1 also show that Maryland recorded a remarkable growth in manslaughter cases between 2014 and 2017 while the figures in the other states were decreasing, which should make us all more careful.
After that, given the increasing number of unintentional crimes, I wanted to know more about their share of total violent crime. I added a variable that computes the rate of unintentional crime over total violent crime and chose an alluvial plot to display it. Again, I was surprised by the result: here it is Virginia that shows a large increase in unintentional crime relative to total violent crime.
Not wanting to work with only one variable, I turned to another type of crime, which falls into the group of voluntary crimes and is very frequent in our states: vehicle theft. I first wanted to see which state records the highest number of vehicle thefts. It is New York, which did not really surprise me given its reputation as a state where this type of crime is recurrent, but again Maryland comes second, with many cases of vehicle theft (Ref-Plot3).
Finally, in my last plot, I compared the number of manslaughter cases with the number of vehicle thefts by state, to see which states rank among the most at risk and where Maryland stands. From this I concluded that New York and Maryland are the states most exposed to these unintentional and willful crimes (Ref-Plot4).
After this visualization, I would have liked to explore more variables concerning manslaughter in other regions of the United States, in particular the arrests and incarcerations of the main perpetrators of these crimes by state. Admittedly, these figures are only estimates and there are probably more cases, but it would also be interesting to show how the law punishes the perpetrators of these crimes. Unfortunately, I did not have enough information to study this angle.