Final Project - Is crime decreasing in the US according to the FBI Uniform Crime Reporting (UCR) Program?

Dmitriy Burtsev

Load libraries

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.6     v dplyr   1.0.4
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(htmltab)
## Warning: package 'htmltab' was built under R version 4.0.4

Load the dataset from the web into a data frame

The FBI collects these data through the Uniform Crime Reporting (UCR) Program. The table used here is "Crime in the United States, by Volume and Rate per 100,000 Inhabitants, 1997–2016".

Violent crime includes the offenses of murder and nonnegligent manslaughter, rape (legacy definition), robbery, and aggravated assault. Property crime includes the offenses of burglary, larceny-theft, and motor vehicle theft. The UCR Program does not have sufficient data to estimate arson offenses.

url <- "https://ucr.fbi.gov/crime-in-the-u.s/2016/crime-in-the-u.s.-2016/topic-pages/tables/table-1"
dfrm = htmltab(doc = url, which = 1, rm_nodata_cols = FALSE)
class(dfrm)
## [1] "data.frame"

Convert R data frame to tibble

I convert the base R data frame to a tibble. The column names contain spaces, and a tibble keeps such names unchanged and works naturally with the tidyverse functions used below.

df = as_tibble(dfrm)
class(df)
## [1] "tbl_df"     "tbl"        "data.frame"
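
As a small illustration of column names that contain spaces, the sketch below builds a toy data frame from the first two rows of the table (the object names `toy` and `toy_tbl` are hypothetical) and shows that `as_tibble()` leaves the names unchanged. Such columns can be referenced with backticks or with `[[ ]]`.

```r
library(tibble)

# Toy data frame; check.names = FALSE keeps the spaces in the column name.
toy <- data.frame(
  Year = c("1997", "1998"),
  `Violent crime rate` = c("611.0", "567.6"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

toy_tbl <- as_tibble(toy)

names(toy_tbl)                    # column names are kept as-is
toy_tbl$`Violent crime rate`      # backticks for names with spaces
toy_tbl[["Violent crime rate"]]   # or double-bracket indexing
```

Both access forms return the same character vector, which is why the conversion code later in this report can use `df[["..."]]` throughout.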

Get the structure of the data frame

The str() function compactly displays the structure of an arbitrary R object.

str(df)
## tibble [20 x 22] (S3: tbl_df/tbl/data.frame)
##  $ Year                                     : chr [1:20] "1997" "1998" "1999" "2000" ...
##  $ Population                               : chr [1:20] "267,783,607" "270,248,003" "272,690,813" "281,421,906" ...
##  $ Violentcrime                             : chr [1:20] "1,636,096" "1,533,887" "1,426,044" "1,425,486" ...
##  $ Violent crime rate                       : chr [1:20] "611.0" "567.6" "523.0" "506.5" ...
##  $ Murder andnonnegligent manslaughter      : chr [1:20] "18,208" "16,974" "15,522" "15,586" ...
##  $ Murder and nonnegligent manslaughter rate: chr [1:20] "6.8" "6.3" "5.7" "5.5" ...
##  $ Rape(revised definition)                 : chr [1:20] NA NA NA NA ...
##  $ Rape(revised definition) rate            : chr [1:20] NA NA NA NA ...
##  $ Rape(legacy definition)                  : chr [1:20] "96,153" "93,144" "89,411" "90,178" ...
##  $ Rape(legacy definition) rate             : chr [1:20] "35.9" "34.5" "32.8" "32.0" ...
##  $ Robbery                                  : chr [1:20] "498,534" "447,186" "409,371" "408,016" ...
##  $ Robbery rate                             : chr [1:20] "186.2" "165.5" "150.1" "145.0" ...
##  $ Aggravated assault                       : chr [1:20] "1,023,201" "976,583" "911,740" "911,706" ...
##  $ Aggravated assault rate                  : chr [1:20] "382.1" "361.4" "334.3" "324.0" ...
##  $ Property crime                           : chr [1:20] "11,558,475" "10,951,827" "10,208,334" "10,182,584" ...
##  $ Property crime rate                      : chr [1:20] "4,316.3" "4,052.5" "3,743.6" "3,618.3" ...
##  $ Burglary                                 : chr [1:20] "2,460,526" "2,332,735" "2,100,739" "2,050,992" ...
##  $ Burglary rate                            : chr [1:20] "918.8" "863.2" "770.4" "728.8" ...
##  $ Larceny-theft                            : chr [1:20] "7,743,760" "7,376,311" "6,955,520" "6,971,590" ...
##  $ Larceny-theft rate                       : chr [1:20] "2,891.8" "2,729.5" "2,550.7" "2,477.3" ...
##  $ Motor vehicle theft                      : chr [1:20] "1,354,189" "1,242,781" "1,152,075" "1,160,002" ...
##  $ Motor vehicle theft rate                 : chr [1:20] "505.7" "459.9" "422.5" "412.2" ...

Data cleanup and transformations - remove columns with NA

Show the number of NAs in each column of the data frame

colSums(is.na(df))
##                                      Year 
##                                         0 
##                                Population 
##                                         0 
##                              Violentcrime 
##                                         0 
##                        Violent crime rate 
##                                         0 
##       Murder andnonnegligent manslaughter 
##                                         0 
## Murder and nonnegligent manslaughter rate 
##                                         0 
##                  Rape(revised definition) 
##                                        16 
##             Rape(revised definition) rate 
##                                        16 
##                   Rape(legacy definition) 
##                                         0 
##              Rape(legacy definition) rate 
##                                         0 
##                                   Robbery 
##                                         0 
##                              Robbery rate 
##                                         0 
##                        Aggravated assault 
##                                         0 
##                   Aggravated assault rate 
##                                         0 
##                            Property crime 
##                                         0 
##                       Property crime rate 
##                                         0 
##                                  Burglary 
##                                         0 
##                             Burglary rate 
##                                         0 
##                             Larceny-theft 
##                                         0 
##                        Larceny-theft rate 
##                                         0 
##                       Motor vehicle theft 
##                                         0 
##                  Motor vehicle theft rate 
##                                         0

There are two columns with NA values: Rape(revised definition) and Rape(revised definition) rate. Each is missing in 16 of the 20 rows.

We should remove the columns with NA values.

df = select(df, -c(`Rape(revised definition)`,`Rape(revised definition) rate`))
str(df)
## tibble [20 x 20] (S3: tbl_df/tbl/data.frame)
##  $ Year                                     : chr [1:20] "1997" "1998" "1999" "2000" ...
##  $ Population                               : chr [1:20] "267,783,607" "270,248,003" "272,690,813" "281,421,906" ...
##  $ Violentcrime                             : chr [1:20] "1,636,096" "1,533,887" "1,426,044" "1,425,486" ...
##  $ Violent crime rate                       : chr [1:20] "611.0" "567.6" "523.0" "506.5" ...
##  $ Murder andnonnegligent manslaughter      : chr [1:20] "18,208" "16,974" "15,522" "15,586" ...
##  $ Murder and nonnegligent manslaughter rate: chr [1:20] "6.8" "6.3" "5.7" "5.5" ...
##  $ Rape(legacy definition)                  : chr [1:20] "96,153" "93,144" "89,411" "90,178" ...
##  $ Rape(legacy definition) rate             : chr [1:20] "35.9" "34.5" "32.8" "32.0" ...
##  $ Robbery                                  : chr [1:20] "498,534" "447,186" "409,371" "408,016" ...
##  $ Robbery rate                             : chr [1:20] "186.2" "165.5" "150.1" "145.0" ...
##  $ Aggravated assault                       : chr [1:20] "1,023,201" "976,583" "911,740" "911,706" ...
##  $ Aggravated assault rate                  : chr [1:20] "382.1" "361.4" "334.3" "324.0" ...
##  $ Property crime                           : chr [1:20] "11,558,475" "10,951,827" "10,208,334" "10,182,584" ...
##  $ Property crime rate                      : chr [1:20] "4,316.3" "4,052.5" "3,743.6" "3,618.3" ...
##  $ Burglary                                 : chr [1:20] "2,460,526" "2,332,735" "2,100,739" "2,050,992" ...
##  $ Burglary rate                            : chr [1:20] "918.8" "863.2" "770.4" "728.8" ...
##  $ Larceny-theft                            : chr [1:20] "7,743,760" "7,376,311" "6,955,520" "6,971,590" ...
##  $ Larceny-theft rate                       : chr [1:20] "2,891.8" "2,729.5" "2,550.7" "2,477.3" ...
##  $ Motor vehicle theft                      : chr [1:20] "1,354,189" "1,242,781" "1,152,075" "1,160,002" ...
##  $ Motor vehicle theft rate                 : chr [1:20] "505.7" "459.9" "422.5" "412.2" ...
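
The two columns above were dropped by name. A more general variant, sketched here on a toy tibble built from the first two rows of the table (the names `toy`, `na_counts`, and `toy_clean` are hypothetical), drops every column that contains at least one NA without naming it.

```r
library(dplyr)
library(tibble)

# Toy tibble: the revised-definition column is NA for these years.
toy <- tibble(
  Year = c("1997", "1998"),
  Robbery = c("498,534", "447,186"),
  `Rape(revised definition)` = c(NA_character_, NA_character_)
)

# Count missing values per column.
na_counts <- colSums(is.na(toy))
na_counts

# Keep only the columns that contain no missing values.
toy_clean <- select(toy, where(~ !anyNA(.x)))
names(toy_clean)
```

This is convenient when many columns are affected, but it would also silently drop a column with a single missing value, so selecting by name remains the more explicit choice.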

All columns in the data frame are character strings. We should convert them to integer or numeric types, removing the thousands separators (commas) where they occur.

df[["Year"]] = as.integer(df[["Year"]])
df[["Population"]] = as.integer(gsub(",", "", df[["Population"]], fixed = TRUE))
df[["Violentcrime"]] = as.integer(gsub(",", "", df[["Violentcrime"]], fixed = TRUE))
df[["Violent crime rate"]] = as.numeric(df[["Violent crime rate"]])
df[["Murder andnonnegligent manslaughter"]] = as.integer(gsub(",", "", df[["Murder andnonnegligent manslaughter"]], fixed = TRUE))
df[["Murder and nonnegligent manslaughter rate"]] = as.numeric(df[["Murder and nonnegligent manslaughter rate"]])
df[["Rape(legacy definition)"]] = as.integer(gsub(",", "", df[["Rape(legacy definition)"]], fixed = TRUE))
df[["Rape(legacy definition) rate"]] = as.numeric(df[["Rape(legacy definition) rate"]])
df[["Robbery"]] = as.integer(gsub(",", "", df[["Robbery"]], fixed = TRUE))
df[["Robbery rate"]] = as.numeric(df[["Robbery rate"]])
df[["Aggravated assault"]] = as.integer(gsub(",", "", df[["Aggravated assault"]], fixed = TRUE))
df[["Aggravated assault rate"]] = as.numeric(df[["Aggravated assault rate"]])
df[["Property crime"]] = as.integer(gsub(",", "", df[["Property crime"]], fixed = TRUE))
df[["Property crime rate"]] = as.numeric(gsub(",", "", df[["Property crime rate"]], fixed = TRUE))
df[["Burglary"]] = as.integer(gsub(",", "", df[["Burglary"]], fixed = TRUE))
df[["Burglary rate"]] = as.numeric(df[["Burglary rate"]])
df[["Larceny-theft"]] = as.integer(gsub(",", "", df[["Larceny-theft"]], fixed = TRUE))
df[["Larceny-theft rate"]] = as.numeric(gsub(",", "", df[["Larceny-theft rate"]], fixed = TRUE))
df[["Motor vehicle theft"]] = as.integer(gsub(",", "", df[["Motor vehicle theft"]], fixed = TRUE))
df[["Motor vehicle theft rate"]] = as.numeric(df[["Motor vehicle theft rate"]])
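
The same conversion can be written more compactly. The following is a minimal sketch on a toy tibble with values copied from the first two rows of the table, assuming the readr package is available; `toy` and `toy_num` are hypothetical names. Note one difference from the code above: `parse_number()` returns doubles, so the count columns are not stored as integers.

```r
library(dplyr)
library(readr)
library(tibble)

toy <- tibble(
  Year = c("1997", "1998"),
  Population = c("267,783,607", "270,248,003"),
  `Property crime rate` = c("4,316.3", "4,052.5")
)

toy_num <- toy %>%
  mutate(
    Year = as.integer(Year),
    # parse_number() ignores the grouping commas when converting.
    across(-Year, parse_number)
  )

str(toy_num)
```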

Create a table directly from R Markdown

knitr::kable(df, caption = 'Crime in the United States')
Crime in the United States
| Year | Population | Violentcrime | Violent crime rate | Murder andnonnegligent manslaughter | Murder and nonnegligent manslaughter rate | Rape(legacy definition) | Rape(legacy definition) rate | Robbery | Robbery rate | Aggravated assault | Aggravated assault rate | Property crime | Property crime rate | Burglary | Burglary rate | Larceny-theft | Larceny-theft rate | Motor vehicle theft | Motor vehicle theft rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1997 | 267783607 | 1636096 | 611.0 | 18208 | 6.8 | 96153 | 35.9 | 498534 | 186.2 | 1023201 | 382.1 | 11558475 | 4316.3 | 2460526 | 918.8 | 7743760 | 2891.8 | 1354189 | 505.7 |
| 1998 | 270248003 | 1533887 | 567.6 | 16974 | 6.3 | 93144 | 34.5 | 447186 | 165.5 | 976583 | 361.4 | 10951827 | 4052.5 | 2332735 | 863.2 | 7376311 | 2729.5 | 1242781 | 459.9 |
| 1999 | 272690813 | 1426044 | 523.0 | 15522 | 5.7 | 89411 | 32.8 | 409371 | 150.1 | 911740 | 334.3 | 10208334 | 3743.6 | 2100739 | 770.4 | 6955520 | 2550.7 | 1152075 | 422.5 |
| 2000 | 281421906 | 1425486 | 506.5 | 15586 | 5.5 | 90178 | 32.0 | 408016 | 145.0 | 911706 | 324.0 | 10182584 | 3618.3 | 2050992 | 728.8 | 6971590 | 2477.3 | 1160002 | 412.2 |
| 2001 | 285317559 | 1439480 | 504.5 | 16037 | 5.6 | 90863 | 31.8 | 423557 | 148.5 | 909023 | 318.6 | 10437189 | 3658.1 | 2116531 | 741.8 | 7092267 | 2485.7 | 1228391 | 430.5 |
| 2002 | 287973924 | 1423677 | 494.4 | 16229 | 5.6 | 95235 | 33.1 | 420806 | 146.1 | 891407 | 309.5 | 10455277 | 3630.6 | 2151252 | 747.0 | 7057379 | 2450.7 | 1246646 | 432.9 |
| 2003 | 290788976 | 1383676 | 475.8 | 16528 | 5.7 | 93883 | 32.3 | 414235 | 142.5 | 859030 | 295.4 | 10442862 | 3591.2 | 2154834 | 741.0 | 7026802 | 2416.5 | 1261226 | 433.7 |
| 2004 | 293656842 | 1360088 | 463.2 | 16148 | 5.5 | 95089 | 32.4 | 401470 | 136.7 | 847381 | 288.6 | 10319386 | 3514.1 | 2144446 | 730.3 | 6937089 | 2362.3 | 1237851 | 421.5 |
| 2005 | 296507061 | 1390745 | 469.0 | 16740 | 5.6 | 94347 | 31.8 | 417438 | 140.8 | 862220 | 290.8 | 10174754 | 3431.5 | 2155448 | 726.9 | 6783447 | 2287.8 | 1235859 | 416.8 |
| 2006 | 299398484 | 1435123 | 479.3 | 17309 | 5.8 | 94472 | 31.6 | 449246 | 150.0 | 874096 | 292.0 | 10019601 | 3346.6 | 2194993 | 733.1 | 6626363 | 2213.2 | 1198245 | 400.2 |
| 2007 | 301621157 | 1422970 | 471.8 | 17128 | 5.7 | 92160 | 30.6 | 447324 | 148.3 | 866358 | 287.2 | 9882212 | 3276.4 | 2190198 | 726.1 | 6591542 | 2185.4 | 1100472 | 364.9 |
| 2008 | 304059724 | 1394461 | 458.6 | 16465 | 5.4 | 90750 | 29.8 | 443563 | 145.9 | 843683 | 277.5 | 9774152 | 3214.6 | 2228887 | 733.0 | 6586206 | 2166.1 | 959059 | 315.4 |
| 2009 | 307006550 | 1325896 | 431.9 | 15399 | 5.0 | 89241 | 29.1 | 408742 | 133.1 | 812514 | 264.7 | 9337060 | 3041.3 | 2203313 | 717.7 | 6338095 | 2064.5 | 795652 | 259.2 |
| 2010 | 309330219 | 1251248 | 404.5 | 14722 | 4.8 | 85593 | 27.7 | 369089 | 119.3 | 781844 | 252.8 | 9112625 | 2945.9 | 2168459 | 701.0 | 6204601 | 2005.8 | 739565 | 239.1 |
| 2011 | 311587816 | 1206005 | 387.1 | 14661 | 4.7 | 84175 | 27.0 | 354746 | 113.9 | 752423 | 241.5 | 9052743 | 2905.4 | 2185140 | 701.3 | 6151095 | 1974.1 | 716508 | 230.0 |
| 2012 | 313873685 | 1217057 | 387.8 | 14856 | 4.7 | 85141 | 27.1 | 355051 | 113.1 | 762009 | 242.8 | 9001992 | 2868.0 | 2109932 | 672.2 | 6168874 | 1965.4 | 723186 | 230.4 |
| 2013 | 316497531 | 1168298 | 369.1 | 14319 | 4.5 | 82109 | 25.9 | 345093 | 109.0 | 726777 | 229.6 | 8651892 | 2733.6 | 1932139 | 610.5 | 6019465 | 1901.9 | 700288 | 221.3 |
| 2014 | 318907401 | 1153022 | 361.6 | 14164 | 4.4 | 84864 | 26.6 | 322905 | 101.3 | 731089 | 229.2 | 8209010 | 2574.1 | 1713153 | 537.2 | 5809054 | 1821.5 | 686803 | 215.4 |
| 2015 | 320896618 | 1199310 | 373.7 | 15883 | 4.9 | 91261 | 28.4 | 328109 | 102.2 | 764057 | 238.1 | 8024115 | 2500.5 | 1587564 | 494.7 | 5723488 | 1783.6 | 713063 | 222.2 |
| 2016 | 323127513 | 1248185 | 386.3 | 17250 | 5.3 | 95730 | 29.6 | 332198 | 102.8 | 803007 | 248.5 | 7919035 | 2450.7 | 1515096 | 468.9 | 5638455 | 1745.0 | 765484 | 236.9 |

Summary of Data in Data Frame

A statistical summary of each column (minimum, quartiles, mean, and maximum) can be obtained by applying the summary() function.

print(summary(df))  
##       Year        Population         Violentcrime     Violent crime rate
##  Min.   :1997   Min.   :267783607   Min.   :1153022   Min.   :361.6     
##  1st Qu.:2002   1st Qu.:287309833   1st Qu.:1240403   1st Qu.:387.6     
##  Median :2006   Median :300509820   Median :1387211   Median :466.1     
##  Mean   :2006   Mean   :298634769   Mean   :1352038   Mean   :456.3     
##  3rd Qu.:2011   3rd Qu.:312159283   3rd Qu.:1425626   3rd Qu.:496.9     
##  Max.   :2016   Max.   :323127513   Max.   :1636096   Max.   :611.0     
##  Murder andnonnegligent manslaughter Murder and nonnegligent manslaughter rate
##  Min.   :14164                       Min.   :4.400                            
##  1st Qu.:15263                       1st Qu.:4.875                            
##  Median :16092                       Median :5.500                            
##  Mean   :16006                       Mean   :5.375                            
##  3rd Qu.:16799                       3rd Qu.:5.700                            
##  Max.   :18208                       Max.   :6.800                            
##  Rape(legacy definition) Rape(legacy definition) rate    Robbery      
##  Min.   :82109           Min.   :25.90                Min.   :322905  
##  1st Qu.:88329           1st Qu.:28.23                1st Qu.:354975  
##  Median :91062           Median :31.10                Median :409057  
##  Mean   :90690           Mean   :30.50                Mean   :399834  
##  3rd Qu.:94378           3rd Qu.:32.33                3rd Qu.:428559  
##  Max.   :96153           Max.   :35.90                Max.   :498534  
##   Robbery rate   Aggravated assault Aggravated assault rate Property crime    
##  Min.   :101.3   Min.   : 726777    Min.   :229.2           Min.   : 7919035  
##  1st Qu.:113.7   1st Qu.: 777397    1st Qu.:247.1           1st Qu.: 9040055  
##  Median :141.7   Median : 853206    Median :287.9           Median : 9950906  
##  Mean   :135.0   Mean   : 845507    Mean   :285.4           Mean   : 9685756  
##  3rd Qu.:148.3   3rd Qu.: 895811    3rd Qu.:311.8           3rd Qu.:10348837  
##  Max.   :186.2   Max.   :1023201    Max.   :382.1           Max.   :11558475  
##  Property crime rate    Burglary       Burglary rate   Larceny-theft    
##  Min.   :2451        Min.   :1515096   Min.   :468.9   Min.   :5638455  
##  1st Qu.:2896        1st Qu.:2088302   1st Qu.:693.8   1st Qu.:6164429  
##  Median :3312        Median :2153043   Median :727.9   Median :6608952  
##  Mean   :3271        Mean   :2084819   Mean   :703.2   Mean   :6590070  
##  3rd Qu.:3621        3rd Qu.:2191397   3rd Qu.:741.2   3rd Qu.:6985393  
##  Max.   :4316        Max.   :2460526   Max.   :918.8   Max.   :7743760  
##  Larceny-theft rate Motor vehicle theft Motor vehicle theft rate
##  Min.   :1745       Min.   : 686803     Min.   :215.4           
##  1st Qu.:1972       1st Qu.: 735470     1st Qu.:235.3           
##  Median :2199       Median :1126274     Median :382.6           
##  Mean   :2224       Mean   :1010867     Mean   :343.5           
##  3rd Qu.:2457       3rd Qu.:1236357     3rd Qu.:424.5           
##  Max.   :2892       Max.   :1354189     Max.   :505.7

Pivot the Year and Violentcrime columns to wide format

df_year_violent = select(df, Year, Violentcrime)
tbl2 = df_year_violent %>% pivot_wider(names_from = Year, values_from = c(Violentcrime))
knitr::kable(tbl2, caption = 'Crime in the United States by Year')
Crime in the United States by Year
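| 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1636096 | 1533887 | 1426044 | 1425486 | 1439480 | 1423677 | 1383676 | 1360088 | 1390745 | 1435123 | 1422970 | 1394461 | 1325896 | 1251248 | 1206005 | 1217057 | 1168298 | 1153022 | 1199310 | 1248185 |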

Statistical analysis

ggplot(data = df, aes(x=Year)) + geom_point(aes(y = `Violent crime rate`, color = "Violent crime rate")) + 
geom_point(aes(y = `Murder and nonnegligent manslaughter rate`, color = "Murder and nonnegligent manslaughter rate")) +
geom_point(aes(y = `Rape(legacy definition) rate`, color = "Rape(legacy definition) rate")) +
geom_point(aes(y = `Robbery rate`, color = "Robbery rate")) +
geom_point(aes(y = `Aggravated assault rate`, color = "Aggravated assault rate")) + 
geom_point(aes(y = `Property crime rate`, color = "Property crime rate")) +
geom_point(aes(y = `Burglary rate`, color = "Burglary rate")) +  
geom_point(aes(y = `Larceny-theft rate`, color = "Larceny-theft rate")) +
geom_point(aes(y = `Motor vehicle theft rate`, color = "Motor vehicle theft rate"))  
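
The plot above adds one geom_point() layer per rate column. An alternative, sketched here on a three-row toy tibble with values copied from the table, is to reshape the rate columns to long format with pivot_longer() and map the offense name to color in a single layer. The names `toy`, `rates_long`, `Offense`, `Rate`, and `p` are hypothetical.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(tibble)

toy <- tibble(
  Year = c(1997L, 1998L, 1999L),
  `Violent crime rate` = c(611.0, 567.6, 523.0),
  `Robbery rate` = c(186.2, 165.5, 150.1),
  `Burglary rate` = c(918.8, 863.2, 770.4)
)

# One row per (Year, Offense) pair instead of one column per offense.
rates_long <- toy %>%
  pivot_longer(
    cols = ends_with(" rate"),
    names_to = "Offense",
    values_to = "Rate"
  )

# A single layer now draws every offense, colored by its name.
p <- ggplot(rates_long, aes(x = Year, y = Rate, color = Offense)) +
  geom_point() +
  labs(y = "Rate per 100,000 inhabitants")
p
```

With the data in long format, adding or removing an offense needs no change to the plotting code.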

Next, I fit and visualize a simple linear regression of the violent crime rate on year.

The regression line is added with geom_smooth(), with method = "lm" to fit a straight line. The shaded band that geom_smooth() draws around the line by default is its confidence interval.

df.graph<-ggplot(df, aes(x = Year, y=`Violent crime rate`)) + geom_point() + geom_smooth(method="lm", col="black")
df.graph
## `geom_smooth()` using formula 'y ~ x'
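
geom_smooth() draws the fitted line but does not report its coefficients. As a minimal sketch, the same straight-line model can be fitted directly with lm() to read off the slope, which is the average change in the rate per year. The violent crime rates below are copied from the table; `crime`, `fit`, and `slope` are hypothetical names.

```r
# Violent crime rate per 100,000 inhabitants, 1997-2016, from the table.
crime <- data.frame(
  Year = 1997:2016,
  rate = c(611.0, 567.6, 523.0, 506.5, 504.5, 494.4, 475.8, 463.2, 469.0, 479.3,
           471.8, 458.6, 431.9, 404.5, 387.1, 387.8, 369.1, 361.6, 373.7, 386.3)
)

# Same model that geom_smooth(method = "lm") fits: rate ~ Year.
fit <- lm(rate ~ Year, data = crime)

slope <- coef(fit)[["Year"]]   # average change in rate per year
slope
summary(fit)$r.squared         # share of variance explained by the line
```

A negative slope corresponds to a downward trend over the whole period, even though individual years can rise.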

Conclusion

The violent crime rate decreased over the period 1997-2016: it fell from 611.0 per 100,000 inhabitants in 1997 to 386.3 in 2016. There was a small increase from 2014 (361.6) to 2016 (386.3). The table used here ends in 2016, so more recent years are not covered by this analysis.
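
The size of these changes can be checked with a little arithmetic. The sketch below uses the violent crime rates from the table for 1997, 2014, and 2016; the helper name `pct_change` is hypothetical.

```r
rate_1997 <- 611.0
rate_2014 <- 361.6
rate_2016 <- 386.3

# Percent change from one rate to another.
pct_change <- function(from, to) 100 * (to - from) / from

round(pct_change(rate_1997, rate_2016), 1)  # -36.8
round(pct_change(rate_2014, rate_2016), 1)  # 6.8
```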