Dmitriy Burtsev
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.6 v dplyr 1.0.4
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(htmltab)
## Warning: package 'htmltab' was built under R version 4.0.4
The FBI collects these data through its Uniform Crime Reporting (UCR) Program. The table analysed here is "Crime in the United States, by Volume and Rate per 100,000 Inhabitants, 1997–2016" (Table 1 of Crime in the United States, 2016). Violent crime includes the offenses of murder and nonnegligent manslaughter, rape (legacy definition), robbery, and aggravated assault. Property crime includes the offenses of burglary, larceny-theft, and motor vehicle theft. The UCR Program does not have sufficient data to estimate arson.
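These definitions imply that each total is the sum of its component offenses. A quick arithmetic check with the 1997 counts, copied from the converted table shown later in this document (the variable names are only for illustration):

```r
# 1997 counts from the FBI table (rape uses the legacy definition)
murder      <- 18208
rape_legacy <- 96153
robbery     <- 498534
agg_assault <- 1023201
burglary    <- 2460526
larceny     <- 7743760
mv_theft    <- 1354189

# Violent crime = murder + rape (legacy) + robbery + aggravated assault
violent_total <- murder + rape_legacy + robbery + agg_assault
# Property crime = burglary + larceny-theft + motor vehicle theft
property_total <- burglary + larceny + mv_theft

violent_total   # 1636096, the Violentcrime value for 1997
property_total  # 11558475, the Property crime value for 1997
```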
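url <- "https://ucr.fbi.gov/crime-in-the-u.s/2016/crime-in-the-u.s.-2016/topic-pages/tables/table-1"
# which = 1 reads the first table on the page;
# rm_nodata_cols = FALSE keeps columns that contain no data
dfrm = htmltab(doc = url, which = 1, rm_nodata_cols = FALSE)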
class(dfrm)
## [1] "data.frame"
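Because the column names contain spaces, I convert the original data frame to a tibble. Tibbles never alter non-syntactic column names, and such columns can be referenced with backticks.
df = as_tibble(dfrm)
class(df)
## [1] "tbl_df" "tbl" "data.frame"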
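One practical difference behind this choice: base `data.frame()` runs `make.names()` on column names by default (`check.names = TRUE`), while `tibble()` keeps them as given. A minimal illustration with a single made-up cell:

```r
library(tibble)

# data.frame() rewrites non-syntactic names by default (check.names = TRUE)
base_df <- data.frame(`Violent crime rate` = 611.0)
names(base_df)  # "Violent.crime.rate"

# tibble() never modifies names; access such columns with backticks
tbl <- tibble(`Violent crime rate` = 611.0)
names(tbl)      # "Violent crime rate"
tbl$`Violent crime rate`
```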
The `str()` function compactly displays the structure of an arbitrary R object:
str(df)
## tibble [20 x 22] (S3: tbl_df/tbl/data.frame)
## $ Year : chr [1:20] "1997" "1998" "1999" "2000" ...
## $ Population : chr [1:20] "267,783,607" "270,248,003" "272,690,813" "281,421,906" ...
## $ Violentcrime : chr [1:20] "1,636,096" "1,533,887" "1,426,044" "1,425,486" ...
## $ Violent crime rate : chr [1:20] "611.0" "567.6" "523.0" "506.5" ...
## $ Murder andnonnegligent manslaughter : chr [1:20] "18,208" "16,974" "15,522" "15,586" ...
## $ Murder and nonnegligent manslaughter rate: chr [1:20] "6.8" "6.3" "5.7" "5.5" ...
## $ Rape(revised definition) : chr [1:20] NA NA NA NA ...
## $ Rape(revised definition) rate : chr [1:20] NA NA NA NA ...
## $ Rape(legacy definition) : chr [1:20] "96,153" "93,144" "89,411" "90,178" ...
## $ Rape(legacy definition) rate : chr [1:20] "35.9" "34.5" "32.8" "32.0" ...
## $ Robbery : chr [1:20] "498,534" "447,186" "409,371" "408,016" ...
## $ Robbery rate : chr [1:20] "186.2" "165.5" "150.1" "145.0" ...
## $ Aggravated assault : chr [1:20] "1,023,201" "976,583" "911,740" "911,706" ...
## $ Aggravated assault rate : chr [1:20] "382.1" "361.4" "334.3" "324.0" ...
## $ Property crime : chr [1:20] "11,558,475" "10,951,827" "10,208,334" "10,182,584" ...
## $ Property crime rate : chr [1:20] "4,316.3" "4,052.5" "3,743.6" "3,618.3" ...
## $ Burglary : chr [1:20] "2,460,526" "2,332,735" "2,100,739" "2,050,992" ...
## $ Burglary rate : chr [1:20] "918.8" "863.2" "770.4" "728.8" ...
## $ Larceny-theft : chr [1:20] "7,743,760" "7,376,311" "6,955,520" "6,971,590" ...
## $ Larceny-theft rate : chr [1:20] "2,891.8" "2,729.5" "2,550.7" "2,477.3" ...
## $ Motor vehicle theft : chr [1:20] "1,354,189" "1,242,781" "1,152,075" "1,160,002" ...
## $ Motor vehicle theft rate : chr [1:20] "505.7" "459.9" "422.5" "412.2" ...
Count the NAs in each column of the data frame:
## Year
## 0
## Population
## 0
## Violentcrime
## 0
## Violent crime rate
## 0
## Murder andnonnegligent manslaughter
## 0
## Murder and nonnegligent manslaughter rate
## 0
## Rape(revised definition)
## 16
## Rape(revised definition) rate
## 16
## Rape(legacy definition)
## 0
## Rape(legacy definition) rate
## 0
## Robbery
## 0
## Robbery rate
## 0
## Aggravated assault
## 0
## Aggravated assault rate
## 0
## Property crime
## 0
## Property crime rate
## 0
## Burglary
## 0
## Burglary rate
## 0
## Larceny-theft
## 0
## Larceny-theft rate
## 0
## Motor vehicle theft
## 0
## Motor vehicle theft rate
## 0
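One way to produce such counts is `colSums(is.na(...))`. This is a sketch, not necessarily the call used above; the small tibble below is a hypothetical stand-in with placeholder values, not FBI data:

```r
library(tibble)

# Hypothetical stand-in for the scraped table (placeholder values)
toy <- tibble(
  Year = c("A", "B", "C"),
  `Rape(revised definition)` = c(NA, "1,000", "2,000"),
  Robbery = c("10", "20", "30")
)

# is.na() on a data frame gives a logical matrix; colSums() counts TRUEs per column
na_counts <- colSums(is.na(toy))
na_counts
```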
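Remove the columns that contain NAs (the two Rape(revised definition) columns, which are missing for 16 of the 20 years). The result has 20 columns: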
## tibble [20 x 20] (S3: tbl_df/tbl/data.frame)
## $ Year : chr [1:20] "1997" "1998" "1999" "2000" ...
## $ Population : chr [1:20] "267,783,607" "270,248,003" "272,690,813" "281,421,906" ...
## $ Violentcrime : chr [1:20] "1,636,096" "1,533,887" "1,426,044" "1,425,486" ...
## $ Violent crime rate : chr [1:20] "611.0" "567.6" "523.0" "506.5" ...
## $ Murder andnonnegligent manslaughter : chr [1:20] "18,208" "16,974" "15,522" "15,586" ...
## $ Murder and nonnegligent manslaughter rate: chr [1:20] "6.8" "6.3" "5.7" "5.5" ...
## $ Rape(legacy definition) : chr [1:20] "96,153" "93,144" "89,411" "90,178" ...
## $ Rape(legacy definition) rate : chr [1:20] "35.9" "34.5" "32.8" "32.0" ...
## $ Robbery : chr [1:20] "498,534" "447,186" "409,371" "408,016" ...
## $ Robbery rate : chr [1:20] "186.2" "165.5" "150.1" "145.0" ...
## $ Aggravated assault : chr [1:20] "1,023,201" "976,583" "911,740" "911,706" ...
## $ Aggravated assault rate : chr [1:20] "382.1" "361.4" "334.3" "324.0" ...
## $ Property crime : chr [1:20] "11,558,475" "10,951,827" "10,208,334" "10,182,584" ...
## $ Property crime rate : chr [1:20] "4,316.3" "4,052.5" "3,743.6" "3,618.3" ...
## $ Burglary : chr [1:20] "2,460,526" "2,332,735" "2,100,739" "2,050,992" ...
## $ Burglary rate : chr [1:20] "918.8" "863.2" "770.4" "728.8" ...
## $ Larceny-theft : chr [1:20] "7,743,760" "7,376,311" "6,955,520" "6,971,590" ...
## $ Larceny-theft rate : chr [1:20] "2,891.8" "2,729.5" "2,550.7" "2,477.3" ...
## $ Motor vehicle theft : chr [1:20] "1,354,189" "1,242,781" "1,152,075" "1,160,002" ...
## $ Motor vehicle theft rate : chr [1:20] "505.7" "459.9" "422.5" "412.2" ...
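A tidyverse way to drop every column that contains at least one NA is `select()` with a `where()` predicate. This is a sketch on a hypothetical placeholder tibble, not the exact code behind the output above:

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the scraped table (placeholder values)
toy <- tibble(
  Year = c("A", "B", "C"),
  `Rape(revised definition)` = c(NA, "1,000", "2,000"),
  Robbery = c("10", "20", "30")
)

# Keep only the columns with no missing values
# (a base R equivalent is Filter(function(col) !anyNA(col), toy))
cleaned <- select(toy, where(~ !anyNA(.x)))
names(cleaned)
```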
# Strip the "," thousands separators, then convert counts to integer and rates to numeric
df[["Year"]] = as.integer(df[["Year"]])
df[["Population"]] = as.integer(gsub(",", "", df[["Population"]], fixed = TRUE))
df[["Violentcrime"]] = as.integer(gsub(",", "", df[["Violentcrime"]], fixed = TRUE))
df[["Violent crime rate"]] = as.numeric(df[["Violent crime rate"]])
df[["Murder andnonnegligent manslaughter"]] = as.integer(gsub(",", "", df[["Murder andnonnegligent manslaughter"]], fixed = TRUE))
df[["Murder and nonnegligent manslaughter rate"]] = as.numeric(df[["Murder and nonnegligent manslaughter rate"]])
df[["Rape(legacy definition)"]] = as.integer(gsub(",", "", df[["Rape(legacy definition)"]], fixed = TRUE))
df[["Rape(legacy definition) rate"]] = as.numeric(df[["Rape(legacy definition) rate"]])
df[["Robbery"]] = as.integer(gsub(",", "", df[["Robbery"]], fixed = TRUE))
df[["Robbery rate"]] = as.numeric(df[["Robbery rate"]])
df[["Aggravated assault"]] = as.integer(gsub(",", "", df[["Aggravated assault"]], fixed = TRUE))
df[["Aggravated assault rate"]] = as.numeric(df[["Aggravated assault rate"]])
df[["Property crime"]] = as.integer(gsub(",", "", df[["Property crime"]], fixed = TRUE))
df[["Property crime rate"]] = as.numeric(gsub(",", "", df[["Property crime rate"]], fixed = TRUE))
df[["Burglary"]] = as.integer(gsub(",", "", df[["Burglary"]], fixed = TRUE))
df[["Burglary rate"]] = as.numeric(df[["Burglary rate"]])
df[["Larceny-theft"]] = as.integer(gsub(",", "", df[["Larceny-theft"]], fixed = TRUE))
df[["Larceny-theft rate"]] = as.numeric(gsub(",", "", df[["Larceny-theft rate"]], fixed = TRUE))
df[["Motor vehicle theft"]] = as.integer(gsub(",", "", df[["Motor vehicle theft"]], fixed = TRUE))
df[["Motor vehicle theft rate"]] = as.numeric(df[["Motor vehicle theft rate"]])
knitr::kable(df)

| Year | Population | Violentcrime | Violent crime rate | Murder andnonnegligent manslaughter | Murder and nonnegligent manslaughter rate | Rape(legacy definition) | Rape(legacy definition) rate | Robbery | Robbery rate | Aggravated assault | Aggravated assault rate | Property crime | Property crime rate | Burglary | Burglary rate | Larceny-theft | Larceny-theft rate | Motor vehicle theft | Motor vehicle theft rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1997 | 267783607 | 1636096 | 611.0 | 18208 | 6.8 | 96153 | 35.9 | 498534 | 186.2 | 1023201 | 382.1 | 11558475 | 4316.3 | 2460526 | 918.8 | 7743760 | 2891.8 | 1354189 | 505.7 |
| 1998 | 270248003 | 1533887 | 567.6 | 16974 | 6.3 | 93144 | 34.5 | 447186 | 165.5 | 976583 | 361.4 | 10951827 | 4052.5 | 2332735 | 863.2 | 7376311 | 2729.5 | 1242781 | 459.9 |
| 1999 | 272690813 | 1426044 | 523.0 | 15522 | 5.7 | 89411 | 32.8 | 409371 | 150.1 | 911740 | 334.3 | 10208334 | 3743.6 | 2100739 | 770.4 | 6955520 | 2550.7 | 1152075 | 422.5 |
| 2000 | 281421906 | 1425486 | 506.5 | 15586 | 5.5 | 90178 | 32.0 | 408016 | 145.0 | 911706 | 324.0 | 10182584 | 3618.3 | 2050992 | 728.8 | 6971590 | 2477.3 | 1160002 | 412.2 |
| 2001 | 285317559 | 1439480 | 504.5 | 16037 | 5.6 | 90863 | 31.8 | 423557 | 148.5 | 909023 | 318.6 | 10437189 | 3658.1 | 2116531 | 741.8 | 7092267 | 2485.7 | 1228391 | 430.5 |
| 2002 | 287973924 | 1423677 | 494.4 | 16229 | 5.6 | 95235 | 33.1 | 420806 | 146.1 | 891407 | 309.5 | 10455277 | 3630.6 | 2151252 | 747.0 | 7057379 | 2450.7 | 1246646 | 432.9 |
| 2003 | 290788976 | 1383676 | 475.8 | 16528 | 5.7 | 93883 | 32.3 | 414235 | 142.5 | 859030 | 295.4 | 10442862 | 3591.2 | 2154834 | 741.0 | 7026802 | 2416.5 | 1261226 | 433.7 |
| 2004 | 293656842 | 1360088 | 463.2 | 16148 | 5.5 | 95089 | 32.4 | 401470 | 136.7 | 847381 | 288.6 | 10319386 | 3514.1 | 2144446 | 730.3 | 6937089 | 2362.3 | 1237851 | 421.5 |
| 2005 | 296507061 | 1390745 | 469.0 | 16740 | 5.6 | 94347 | 31.8 | 417438 | 140.8 | 862220 | 290.8 | 10174754 | 3431.5 | 2155448 | 726.9 | 6783447 | 2287.8 | 1235859 | 416.8 |
| 2006 | 299398484 | 1435123 | 479.3 | 17309 | 5.8 | 94472 | 31.6 | 449246 | 150.0 | 874096 | 292.0 | 10019601 | 3346.6 | 2194993 | 733.1 | 6626363 | 2213.2 | 1198245 | 400.2 |
| 2007 | 301621157 | 1422970 | 471.8 | 17128 | 5.7 | 92160 | 30.6 | 447324 | 148.3 | 866358 | 287.2 | 9882212 | 3276.4 | 2190198 | 726.1 | 6591542 | 2185.4 | 1100472 | 364.9 |
| 2008 | 304059724 | 1394461 | 458.6 | 16465 | 5.4 | 90750 | 29.8 | 443563 | 145.9 | 843683 | 277.5 | 9774152 | 3214.6 | 2228887 | 733.0 | 6586206 | 2166.1 | 959059 | 315.4 |
| 2009 | 307006550 | 1325896 | 431.9 | 15399 | 5.0 | 89241 | 29.1 | 408742 | 133.1 | 812514 | 264.7 | 9337060 | 3041.3 | 2203313 | 717.7 | 6338095 | 2064.5 | 795652 | 259.2 |
| 2010 | 309330219 | 1251248 | 404.5 | 14722 | 4.8 | 85593 | 27.7 | 369089 | 119.3 | 781844 | 252.8 | 9112625 | 2945.9 | 2168459 | 701.0 | 6204601 | 2005.8 | 739565 | 239.1 |
| 2011 | 311587816 | 1206005 | 387.1 | 14661 | 4.7 | 84175 | 27.0 | 354746 | 113.9 | 752423 | 241.5 | 9052743 | 2905.4 | 2185140 | 701.3 | 6151095 | 1974.1 | 716508 | 230.0 |
| 2012 | 313873685 | 1217057 | 387.8 | 14856 | 4.7 | 85141 | 27.1 | 355051 | 113.1 | 762009 | 242.8 | 9001992 | 2868.0 | 2109932 | 672.2 | 6168874 | 1965.4 | 723186 | 230.4 |
| 2013 | 316497531 | 1168298 | 369.1 | 14319 | 4.5 | 82109 | 25.9 | 345093 | 109.0 | 726777 | 229.6 | 8651892 | 2733.6 | 1932139 | 610.5 | 6019465 | 1901.9 | 700288 | 221.3 |
| 2014 | 318907401 | 1153022 | 361.6 | 14164 | 4.4 | 84864 | 26.6 | 322905 | 101.3 | 731089 | 229.2 | 8209010 | 2574.1 | 1713153 | 537.2 | 5809054 | 1821.5 | 686803 | 215.4 |
| 2015 | 320896618 | 1199310 | 373.7 | 15883 | 4.9 | 91261 | 28.4 | 328109 | 102.2 | 764057 | 238.1 | 8024115 | 2500.5 | 1587564 | 494.7 | 5723488 | 1783.6 | 713063 | 222.2 |
| 2016 | 323127513 | 1248185 | 386.3 | 17250 | 5.3 | 95730 | 29.6 | 332198 | 102.8 | 803007 | 248.5 | 7919035 | 2450.7 | 1515096 | 468.9 | 5638455 | 1745.0 | 765484 | 236.9 |
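The twenty conversion lines above could be condensed into a single `mutate(across())` call. A sketch, assuming readr's `parse_number()` is acceptable: it drops the grouping commas but returns doubles, so `Year` and the counts become numeric rather than integer (unlike the `as.integer()` calls above). The two rows below are copied from the table:

```r
library(dplyr)
library(readr)
library(tibble)

# Two rows from the table, still stored as character strings with commas
raw <- tibble(
  Year = c("1997", "2016"),
  Population = c("267,783,607", "323,127,513"),
  `Property crime rate` = c("4,316.3", "2,450.7")
)

# parse_number() removes "," grouping marks and returns doubles
converted <- raw %>% mutate(across(everything(), parse_number))
converted
```

On the full data this would be `df %>% mutate(across(everything(), parse_number))`, applied while all columns are still character.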
The `summary()` function gives a statistical summary of each column:
summary(df)
## Year Population Violentcrime Violent crime rate
## Min. :1997 Min. :267783607 Min. :1153022 Min. :361.6
## 1st Qu.:2002 1st Qu.:287309833 1st Qu.:1240403 1st Qu.:387.6
## Median :2006 Median :300509820 Median :1387211 Median :466.1
## Mean :2006 Mean :298634769 Mean :1352038 Mean :456.3
## 3rd Qu.:2011 3rd Qu.:312159283 3rd Qu.:1425626 3rd Qu.:496.9
## Max. :2016 Max. :323127513 Max. :1636096 Max. :611.0
## Murder andnonnegligent manslaughter Murder and nonnegligent manslaughter rate
## Min. :14164 Min. :4.400
## 1st Qu.:15263 1st Qu.:4.875
## Median :16092 Median :5.500
## Mean :16006 Mean :5.375
## 3rd Qu.:16799 3rd Qu.:5.700
## Max. :18208 Max. :6.800
## Rape(legacy definition) Rape(legacy definition) rate Robbery
## Min. :82109 Min. :25.90 Min. :322905
## 1st Qu.:88329 1st Qu.:28.23 1st Qu.:354975
## Median :91062 Median :31.10 Median :409057
## Mean :90690 Mean :30.50 Mean :399834
## 3rd Qu.:94378 3rd Qu.:32.33 3rd Qu.:428559
## Max. :96153 Max. :35.90 Max. :498534
## Robbery rate Aggravated assault Aggravated assault rate Property crime
## Min. :101.3 Min. : 726777 Min. :229.2 Min. : 7919035
## 1st Qu.:113.7 1st Qu.: 777397 1st Qu.:247.1 1st Qu.: 9040055
## Median :141.7 Median : 853206 Median :287.9 Median : 9950906
## Mean :135.0 Mean : 845507 Mean :285.4 Mean : 9685756
## 3rd Qu.:148.3 3rd Qu.: 895811 3rd Qu.:311.8 3rd Qu.:10348837
## Max. :186.2 Max. :1023201 Max. :382.1 Max. :11558475
## Property crime rate Burglary Burglary rate Larceny-theft
## Min. :2451 Min. :1515096 Min. :468.9 Min. :5638455
## 1st Qu.:2896 1st Qu.:2088302 1st Qu.:693.8 1st Qu.:6164429
## Median :3312 Median :2153043 Median :727.9 Median :6608952
## Mean :3271 Mean :2084819 Mean :703.2 Mean :6590070
## 3rd Qu.:3621 3rd Qu.:2191397 3rd Qu.:741.2 3rd Qu.:6985393
## Max. :4316 Max. :2460526 Max. :918.8 Max. :7743760
## Larceny-theft rate Motor vehicle theft Motor vehicle theft rate
## Min. :1745 Min. : 686803 Min. :215.4
## 1st Qu.:1972 1st Qu.: 735470 1st Qu.:235.3
## Median :2199 Median :1126274 Median :382.6
## Mean :2224 Mean :1010867 Mean :343.5
## 3rd Qu.:2457 3rd Qu.:1236357 3rd Qu.:424.5
## Max. :2892 Max. :1354189 Max. :505.7
Violent crime counts by year, reshaped into a single wide row:
df_year_violent = select(df, Year, Violentcrime)
tbl2 = df_year_violent %>% pivot_wider(names_from = Year, values_from = Violentcrime)
knitr::kable(tbl2, caption = 'Violent crime in the United States by year')

| 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 | 2014 | 2015 | 2016 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1636096 | 1533887 | 1426044 | 1425486 | 1439480 | 1423677 | 1383676 | 1360088 | 1390745 | 1435123 | 1422970 | 1394461 | 1325896 | 1251248 | 1206005 | 1217057 | 1168298 | 1153022 | 1199310 | 1248185 |
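`pivot_wider()` turns each distinct `Year` into its own column, and `pivot_longer()` reverses the reshape. A small round-trip sketch on three years copied from the table; `names_transform` converts the "1997"-style column names back into integers:

```r
library(tibble)
library(tidyr)

# Three years of Year / Violentcrime, copied from the table above
long <- tibble(
  Year = c(1997L, 1998L, 1999L),
  Violentcrime = c(1636096L, 1533887L, 1426044L)
)

# Long -> wide: one row, one column per year
wide <- pivot_wider(long, names_from = Year, values_from = Violentcrime)

# Wide -> long again; column names become an integer Year column
back <- pivot_longer(wide, everything(),
                     names_to = "Year", values_to = "Violentcrime",
                     names_transform = list(Year = as.integer))
```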
Plot the rate per 100,000 inhabitants of each offense against the year:
ggplot(data = df, aes(x = Year)) + geom_point(aes(y = `Violent crime rate`, color = "Violent crime rate")) +
geom_point(aes(y = `Murder and nonnegligent manslaughter rate`, color = "Murder and nonnegligent manslaughter rate")) +
geom_point(aes(y = `Rape(legacy definition) rate`, color = "Rape(legacy definition) rate")) +
geom_point(aes(y = `Robbery rate`, color = "Robbery rate")) +
geom_point(aes(y = `Aggravated assault rate`, color = "Aggravated assault rate")) +
geom_point(aes(y = `Property crime rate`, color = "Property crime rate")) +
geom_point(aes(y = `Burglary rate`, color = "Burglary rate")) +
geom_point(aes(y = `Larceny-theft rate`, color = "Larceny-theft rate")) +
geom_point(aes(y = `Motor vehicle theft rate`, color = "Motor vehicle theft rate"))
Next, add a regression line with `geom_smooth()`, passing `method = "lm"` to fit a linear regression:
df.graph <- ggplot(df, aes(x = Year, y = `Violent crime rate`)) + geom_point() + geom_smooth(method = "lm", col = "black")
df.graph
## `geom_smooth()` using formula 'y ~ x'
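To put a number on the trend that `geom_smooth(method = "lm")` draws, the same ordinary least-squares line can be fitted directly with `lm()`. A sketch using the violent crime rates copied from the table above (the `rates` data frame exists only for this illustration); the slope is the average change in the rate per year:

```r
# Violent crime rate per 100,000 inhabitants, 1997-2016 (from the table above)
rates <- data.frame(
  Year = 1997:2016,
  rate = c(611.0, 567.6, 523.0, 506.5, 504.5, 494.4, 475.8, 463.2, 469.0, 479.3,
           471.8, 458.6, 431.9, 404.5, 387.1, 387.8, 369.1, 361.6, 373.7, 386.3)
)

# Same straight-line fit (y ~ x) that geom_smooth(method = "lm") plots
fit <- lm(rate ~ Year, data = rates)
slope <- coef(fit)[["Year"]]  # average change in the rate per year
slope
```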
The violent crime rate decreased over 1997–2016: it was lower in 2016 (386.3 per 100,000 inhabitants) than in 1997 (611.0). The rate rose slightly from 2014 (361.6) to 2016. Unfortunately, this table from the 2016 report contains no data beyond 2016.
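The size of the overall decline and of the 2014-2016 uptick can be checked with simple arithmetic on the rates from the table. A minimal sketch; `pct_change()` is a hypothetical helper, not part of any package:

```r
# Percentage change from one rate to another
pct_change <- function(from, to) 100 * (to - from) / from

round(pct_change(611.0, 386.3), 1)  # -36.8 (1997 to 2016)
round(pct_change(361.6, 386.3), 1)  # 6.8 (2014 to 2016)
```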