Source and topic of the data

https://data.world/datasets/crime

Crime and Incarceration by state

Let me introduce my dataset topic!

This project examines several types of crime reported in the states of the United States from 2001 to 2017. I focus on three main variables: two quantitative variables, murder_manslaughter and vehicle_theft, and the categorical variable “jurisdiction”, which identifies the state.

Load the libraries and view the dataset

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.3.6      v purrr   0.3.4 
## v tibble  3.1.8      v dplyr   1.0.10
## v tidyr   1.2.1      v stringr 1.4.1 
## v readr   2.1.2      v forcats 0.5.2 
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(readr)
library(ggplot2)
library(dplyr)
library(ggfortify)
library(RColorBrewer)
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
library(htmltools)
library(scales)
## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor
library(alluvial)
library(ggalluvial)
ucr_by_state_1_ <- read_csv("C:/Users/claud/Downloads/ucr_by_state (1).csv")
## Rows: 869 Columns: 15
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): jurisdiction
## dbl (3): year, crime_reporting_change, crimes_estimated
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(ucr_by_state_1_)
## # A tibble: 6 x 15
##   jurisd~1  year crime~2 crime~3 state~4 viole~5 murde~6 rape_~7 rape_~8 robbery
##   <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 Alaska    2001       0       0  633630    3735      39     501      NA     514
## 2 Alaska    2002       0       0  641482    3627      33     511      NA     489
## 3 Alaska    2003       0       0  648280    3877      39     605      NA     446
## 4 Alaska    2004       0       0  657755    4159      37     558      NA     447
## 5 Alaska    2005       0       0  663253    4194      32     538      NA     537
## 6 Alaska    2006       0       0  670053    4610      36     512      NA     600
## # ... with 5 more variables: agg_assault <dbl>, property_crime_total <dbl>,
## #   burglary <dbl>, larceny <dbl>, vehicle_theft <dbl>, and abbreviated
## #   variable names 1: jurisdiction, 2: crime_reporting_change,
## #   3: crimes_estimated, 4: state_population, 5: violent_crime_total,
## #   6: murder_manslaughter, 7: rape_legacy, 8: rape_revised
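
The message printed by `read_csv()` mentions `show_col_types = FALSE`. As a minimal sketch of what that option does, the block below writes a tiny CSV to a temporary file and reads it back quietly. The temporary file and the object name `toy` are assumptions made for illustration; the two rows are copied from the `head()` output above.

```r
library(readr)

# Write a tiny CSV to a temporary file so the sketch is self-contained.
# Column names follow the dataset; the two rows come from the head() output.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "jurisdiction,year,murder_manslaughter",
  "Alaska,2001,39",
  "Alaska,2002,33"
), tmp)

# show_col_types = FALSE suppresses the column specification message
toy <- read_csv(tmp, show_col_types = FALSE)
print(toy)
```

The same argument could be added to the real `read_csv()` call; using a path relative to the project folder instead of an absolute local path would also make the report easier to rerun on another machine.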

Clean the dataset

Check for NA values

Create a vector of the columns that contain NA values

# Columns that contain at least one NA
na.cols <- which(colSums(is.na(ucr_by_state_1_)) > 0)
# Number of NA values per column, largest first
sort(colSums(is.na(ucr_by_state_1_[na.cols])), decreasing = TRUE)
## rape_revised  rape_legacy 
##          612          104
paste('Number of columns with missing values:', length(na.cols))
## [1] "Number of columns with missing values: 2"
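
To show how the NA count works, here is a minimal sketch on a toy data frame. The data frame `toy` and its values are made up for illustration and are not taken from the dataset.

```r
# Toy data frame (values are made up for illustration only)
toy <- data.frame(
  jurisdiction = c("A", "A", "B", "B"),
  rape_legacy  = c(501, NA, 605, 558),
  rape_revised = c(NA, NA, NA, 925),
  robbery      = c(514, 489, 446, 447)
)

# is.na() gives a TRUE/FALSE matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(toy))

# Keep only the columns with at least one NA, largest count first
na_counts <- sort(na_counts[na_counts > 0], decreasing = TRUE)
print(na_counts)
```

Here `rape_revised` has three missing values and `rape_legacy` has one, so only those two columns are reported.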

Remove unnecessary values

Remove rows with NA values

# Drop rows with NA in either rape column
ucr_by_state <- ucr_by_state_1_ %>%
  filter(!is.na(rape_revised) & !is.na(rape_legacy))

which(is.na(ucr_by_state), arr.ind=TRUE)
##      row col
str(ucr_by_state)
## spec_tbl_df [153 x 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ jurisdiction          : chr [1:153] "Alaska" "Alaska" "Arizona" "Arizona" ...
##  $ year                  : num [1:153] 2013 2014 2013 2014 2013 ...
##  $ crime_reporting_change: num [1:153] 0 0 0 0 0 0 0 0 0 0 ...
##  $ crimes_estimated      : num [1:153] 0 0 0 0 0 0 0 0 0 0 ...
##  $ state_population      : num [1:153] 737259 736732 6634997 6731484 4833996 ...
##  $ violent_crime_total   : num [1:153] 4709 4684 27576 26916 20834 ...
##  $ murder_manslaughter   : num [1:153] 34 41 355 319 346 ...
##  $ rape_legacy           : num [1:153] 657 555 2344 2464 1449 ...
##  $ rape_revised          : num [1:153] 925 771 3174 3378 2055 ...
##  $ robbery               : num [1:153] 623 629 6656 6249 4645 ...
##  $ agg_assault           : num [1:153] 3127 3243 17391 16970 13788 ...
##  $ property_crime_total  : num [1:153] 21211 20334 223294 215240 161835 ...
##  $ burglary              : num [1:153] 2917 3150 48292 43562 42410 ...
##  $ larceny               : num [1:153] 16599 15445 158036 154091 108862 ...
##  $ vehicle_theft         : num [1:153] 1695 1739 16966 17587 10563 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   jurisdiction = col_character(),
##   ..   year = col_double(),
##   ..   crime_reporting_change = col_double(),
##   ..   crimes_estimated = col_double(),
##   ..   state_population = col_number(),
##   ..   violent_crime_total = col_number(),
##   ..   murder_manslaughter = col_number(),
##   ..   rape_legacy = col_number(),
##   ..   rape_revised = col_number(),
##   ..   robbery = col_number(),
##   ..   agg_assault = col_number(),
##   ..   property_crime_total = col_number(),
##   ..   burglary = col_number(),
##   ..   larceny = col_number(),
##   ..   vehicle_theft = col_number()
##   .. )
##  - attr(*, "problems")=<externalptr>
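
The `str()` output above shows that 153 of the original 869 rows remain after the filter. The sketch below illustrates why dropping rows on one sparse column can also remove years from the other variables. The toy values and the object names `toy`, `dropped` and `kept` are assumptions for illustration.

```r
library(dplyr)

# Toy data (hypothetical values): rape_revised is only filled in from 2013 on
toy <- tibble(
  jurisdiction  = rep("A", 4),
  year          = c(2011, 2012, 2013, 2014),
  rape_revised  = c(NA, NA, 925, 771),
  vehicle_theft = c(1500, 1600, 1695, 1739)
)

# Filtering on rape_revised also drops the early years of vehicle_theft
dropped <- toy %>% filter(!is.na(rape_revised))
print(range(dropped$year))

# Alternative: drop the column that is not needed and keep every year
kept <- toy %>% select(-rape_revised)
print(nrow(kept))
```

This may be why the plots later in this report start again from `ucr_by_state_1_`, which still covers every year.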

Summarize and view the updated data

summary(ucr_by_state)
##  jurisdiction            year      crime_reporting_change crimes_estimated 
##  Length:153         Min.   :2013   Min.   :0.000000       Min.   :0.00000  
##  Class :character   1st Qu.:2013   1st Qu.:0.000000       1st Qu.:0.00000  
##  Mode  :character   Median :2014   Median :0.000000       Median :0.00000  
##                     Mean   :2014   Mean   :0.006536       Mean   :0.01307  
##                     3rd Qu.:2015   3rd Qu.:0.000000       3rd Qu.:0.00000  
##                     Max.   :2015   Max.   :1.000000       Max.   :1.00000  
##  state_population   violent_crime_total murder_manslaughter  rape_legacy  
##  Min.   :  583223   Min.   :   622      Min.   :  10.0      Min.   :  99  
##  1st Qu.: 1654930   1st Qu.:  5392      1st Qu.:  56.0      1st Qu.: 522  
##  Median : 4399583   Median : 15744      Median : 167.0      Median :1168  
##  Mean   : 6146744   Mean   : 23279      Mean   : 289.1      Mean   :1648  
##  3rd Qu.: 6828065   3rd Qu.: 26916      3rd Qu.: 399.0      3rd Qu.:1861  
##  Max.   :39144818   Max.   :166883      Max.   :1861.0      Max.   :9387  
##   rape_revised      robbery       agg_assault    property_crime_total
##  Min.   :  110   Min.   :   53   Min.   :  432   Min.   :   8806     
##  1st Qu.:  700   1st Qu.: 1045   1st Qu.: 3316   1st Qu.:  37717     
##  Median : 1646   Median : 3674   Median : 9470   Median : 114871     
##  Mean   : 2278   Mean   : 6395   Mean   :14318   Mean   : 161062     
##  3rd Qu.: 2531   3rd Qu.: 7114   3rd Qu.:17676   3rd Qu.: 187472     
##  Max.   :12811   Max.   :53640   Max.   :99349   Max.   :1024914     
##     burglary         larceny       vehicle_theft   
##  Min.   :  1689   Min.   :  6660   Min.   :   178  
##  1st Qu.:  7777   1st Qu.: 26898   1st Qu.:  3783  
##  Median : 23122   Median : 83385   Median :  8526  
##  Mean   : 34026   Mean   :113399   Mean   : 13636  
##  3rd Qu.: 40930   3rd Qu.:134514   3rd Qu.: 13630  
##  Max.   :232058   Max.   :656517   Max.   :170993

Create one data visualization with this dataset

As indicated above, I begin my visualization with murder_manslaughter from 2001 to 2017, specifically in the Mid-Atlantic states, which I filter below.

# Keep only the Mid-Atlantic jurisdictions
ucr_by_state <- ucr_by_state_1_ %>%
  filter(jurisdiction %in% c("Delaware", "DC", "Maryland", "New Jersey",
                             "New York", "Pennsylvania", "Virginia", "West Virginia"))
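
A filter on names silently drops any jurisdiction whose spelling does not match the data exactly. As a hedged sketch, `setdiff()` can list the requested names that are absent. The vector `available` below is hypothetical, and "District of Columbia" only stands in for a possible spelling mismatch; how the real dataset spells it is not assumed here.

```r
# Jurisdiction names requested in the filter (taken from the code above)
mid_atlantic <- c("Delaware", "DC", "Maryland", "New Jersey",
                  "New York", "Pennsylvania", "Virginia", "West Virginia")

# Hypothetical set of names present in a data frame
available <- c("Alaska", "Delaware", "District of Columbia", "Maryland",
               "New Jersey", "New York", "Pennsylvania", "Virginia",
               "West Virginia")

# Names that were requested but would be silently dropped by the filter
missing_names <- setdiff(mid_atlantic, available)
print(missing_names)
```

With the real data, `available` would be `unique(ucr_by_state_1_$jurisdiction)`. An empty result means every requested jurisdiction was found.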

Plot the Line/Point

Plot 1

The first plot presents the number of murder_manslaughter cases reported in the different Mid-Atlantic states from 2001 to 2017.

# One line per jurisdiction, with a point for each year
ggplot(ucr_by_state, aes(x = year, y = murder_manslaughter, color = jurisdiction)) +
  geom_point(size = 1.2) +
  geom_line() +
  theme_bw() +
  scale_color_brewer(palette = "Set1") +
  labs(x = "year", y = "Number of murder_manslaughter") +
  ggtitle("Mid-Atlantic states' murder_manslaughter reported")

Add a variable

To make my project more interesting, I will add a quantitative variable: murder_manslaughter as a percentage of violent_crime_total, which I will then plot for the Mid-Atlantic states.

murderRate_by_state <- read_csv("C:/Users/claud/Downloads/ucr_by_state (1).csv")
## Rows: 869 Columns: 15
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): jurisdiction
## dbl (3): year, crime_reporting_change, crimes_estimated
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Murder/manslaughter as a percentage of total violent crime
murderRate_by_state <- mutate(murderRate_by_state,
                              murderRate_by_state = murder_manslaughter * 100 / violent_crime_total)
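
As a worked example of this formula, the sketch below uses the Alaska 2001 row from the `head()` output earlier (39 cases of murder_manslaughter out of a violent_crime_total of 3735). The variable name `rate` is only illustrative.

```r
# Alaska, 2001 (values taken from the head() output earlier in this report)
murder_manslaughter <- 39
violent_crime_total <- 3735

# Share of violent crime that is murder/manslaughter, in percent
rate <- murder_manslaughter * 100 / violent_crime_total
print(round(rate, 2))
```

The result is about 1.04, meaning roughly 1% of the violent crimes reported in that row were murder or manslaughter. The variable is a share of violent crime, not a rate per inhabitant.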

Create Plot2

With the variable added, I will create a plot that shows murder_manslaughter as a percentage of violent_crime_total in the Mid-Atlantic states.

# Compute the rate and keep the Mid-Atlantic jurisdictions
ucr_by_state2 <- murderRate_by_state %>%
  mutate(murderRate_by_state = murder_manslaughter * 100 / violent_crime_total) %>%
  filter(jurisdiction %in% c("Delaware", "DC", "Maryland", "New Jersey",
                             "New York", "Pennsylvania", "Virginia", "West Virginia"))
p2 <- ucr_by_state2 %>%
  ggplot(aes(x = year, y = murderRate_by_state, alluvium = jurisdiction)) +
  theme_bw() +
  geom_alluvium(aes(fill = jurisdiction),
                color = "white",
                width = .1,
                alpha = .8,
                decreasing = FALSE) +
  # One fill scale only: a second scale_fill_*() call would replace the first
  scale_fill_brewer(palette = "Set3", name = "State_jurisdiction") +
  labs(x = "year", y = "murderRate_by_state") +
  ggtitle("Mid-Atlantic states' rate of murder_manslaughter")

Plot 2 - Alluvial

p2

For the rest of my project, I want to highlight a different variable, vehicle_theft, in order to compare it with the first variable I worked with.

Declare the variable and filter “jurisdiction”

# Keep only the Mid-Atlantic jurisdictions (vehicle_theft is already a column)
ucr_by_state <- ucr_by_state_1_ %>%
  filter(jurisdiction %in% c("Delaware", "DC", "Maryland", "New Jersey",
                             "New York", "Pennsylvania", "Virginia", "West Virginia"))

Plot 3 - Box Plot

The third plot presents the number of vehicle_theft cases reported in the different Mid-Atlantic states from 2001 to 2017.

# One box per jurisdiction, summarizing its yearly vehicle_theft counts
ggplot(ucr_by_state, aes(x = jurisdiction, y = vehicle_theft, color = jurisdiction)) +
  geom_boxplot() +
  theme_grey() +
  labs(x = "jurisdiction", y = "Number of vehicle_theft") +
  ggtitle("Mid-Atlantic states' vehicle_theft reported")

Plot 4

Compare two different variables: vehicle theft vs. murder-manslaughter

In this last plot, I highlight two very different types of crime: one that I consider involuntary (manslaughter) and one that, in my opinion, is more voluntary (vehicle theft). The goal is to see which of these crimes is more prevalent in the Mid-Atlantic states.

# Scatter plot of the two crime counts with a linear fit
ggplot(ucr_by_state, aes(x = murder_manslaughter, y = vehicle_theft, color = jurisdiction)) +
  geom_point(size = 1.2) +
  geom_smooth(method = lm, color = "darkred") +
  labs(title = "murder_manslaughter vs vehicle_theft by jurisdiction",
       x = "murder_manslaughter", y = "vehicle_theft", color = "jurisdiction")
## `geom_smooth()` using formula 'y ~ x'
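
The `method = lm` option fits a straight line of the form `y ~ x`. As a rough sketch of the same calculation outside the plot, the block below fits that model to the first five values of murder_manslaughter and vehicle_theft shown in the `str()` output earlier. Those rows are not Mid-Atlantic states, so the numbers only illustrate the mechanics and are not a result of this project.

```r
# First five values from the str() output earlier in this report
murder_manslaughter <- c(34, 41, 355, 319, 346)
vehicle_theft       <- c(1695, 1739, 16966, 17587, 10563)

# Same model form that geom_smooth(method = lm) uses: y ~ x
fit <- lm(vehicle_theft ~ murder_manslaughter)
print(coef(fit))

# Strength of the linear association between the two counts
r <- cor(murder_manslaughter, vehicle_theft)
print(r)
```

A positive slope means that rows with more murder_manslaughter cases also tend to have more vehicle thefts. Both counts grow with state population, so this does not by itself show that one crime is linked to the other.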

Short Essay

My visualization focuses primarily on violent crimes committed by individuals, whether intentionally or not. Given the number of states in the dataset, I narrowed my focus to the Mid-Atlantic states, which include Maryland and the surrounding area, because I live there now and it is good to be informed about what surrounds us. For this reason, the first variable I examined was manslaughter, a criminal offense in the United States defined as homicide committed by a person who did not have the specific intent to cause death, for example in a road accident, or who acted on impulse, recklessly, or negligently. This variable was very important to me because the number of manslaughter cases appears to keep increasing, and many innocent people lose their lives because of the carelessness of others. I was greatly surprised to find that Maryland ranks third among these states for the number of manslaughter cases between 2001 and 2017, after New York and Pennsylvania (ref. Plot 1). The curves in Plot 1 show that Maryland recorded a marked increase in manslaughter cases between 2014 and 2017, while the figures in the other states were decreasing. This should make all of us more careful.

Given the increasing number of unintentional crimes, I then wanted to know more about their share of total violent crime. I added a variable that computes the rate of unintentional crime relative to total violent crime and displayed it in an alluvial plot. Again I was surprised: here it is Virginia that shows a large increase in unintentional crime relative to total violent crime.

Not wanting to work with only one variable, I turned to another type of crime, one that I consider voluntary and that is very frequent in our states: vehicle theft. I first wanted to see which state records the highest number of vehicle thefts. It is New York, which did not really surprise me given its reputation for this type of crime, but Maryland is again second, with many recorded cases of vehicle theft (ref. Plot 3). Finally, in my last plot, I compared the number of manslaughter cases with the number of vehicle thefts by state to see which states are most at risk and where Maryland stands. From this I concluded that New York and Maryland are the states most exposed to these unintentional and willful crimes (ref. Plot 4).

After this visualization, I would have liked to explore more variables concerning manslaughter in other regions of the United States, in particular the arrests and incarcerations of the perpetrators of these crimes by state. Admittedly, these figures are only estimates and there are probably more cases, but it would be interesting to also show how the law punishes the perpetrators. Unfortunately, I did not have enough information to study this angle.