In the previous posts about crime in Chicago we explored the corresponding data from US.gov (2001-2017) and drew some conclusions based on statistical analysis:
1. https://rpubs.com/alex-lev/248923
2. https://rpubs.com/alex-lev/249124
3. https://rpubs.com/alex-lev/249354
4. https://rpubs.com/alex-lev/249370
5. https://rpubs.com/alex-lev/249747
6. https://rpubs.com/alex-lev/249997
7. https://rpubs.com/alex-lev/255283
8. https://rpubs.com/alex-lev/254361
Now we want to compare Chicago and London using UK open data.
For more on UK crime analysis with R, see Carl Goodwin's post: https://thinkr.biz/wp-content/uploads/2018/02/Univariate_Regression.html.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(tidyquant)
## Loading required package: lubridate
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
## Loading required package: PerformanceAnalytics
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
##
## first, last
##
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
##
## legend
## Loading required package: quantmod
## Loading required package: TTR
## Version 0.4-0 included new data defaults. See ?getSymbols.
## Loading required package: tidyverse
## ── Attaching packages ───────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1 ✔ purrr 0.2.4
## ✔ tibble 1.4.1 ✔ stringr 1.2.0
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## ── Conflicts ──────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ lubridate::as.difftime() masks base::as.difftime()
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ xts::first() masks dplyr::first()
## ✖ lubridate::intersect() masks base::intersect()
## ✖ dplyr::lag() masks stats::lag()
## ✖ xts::last() masks dplyr::last()
## ✖ lubridate::setdiff() masks base::setdiff()
## ✖ lubridate::union() masks base::union()
library(ggplot2)
library(qcc)
## Package 'qcc' version 2.7
## Type 'citation("qcc")' for citing this R package in publications.
library(readr)
url <- "https://files.datapress.com/london/dataset/recorded-crime-summary-data-london-borough-level/2017-01-26T18:50:00/MPS_Borough_Level_Crime.csv"
london_crime_df <-
  read_csv(url,
    col_types = "cccci",
    col_names = c("month", "borough", "maj_cat", "min_cat", "count"),
    skip = 1
  ) %>%
  separate(month, c("year", "month"), sep = 4) %>%
  mutate(
    year = factor(year),
    month = factor(month),
    maj_cat = factor(maj_cat),
    borough = factor(borough)
  ) %>%
  group_by(year, month, borough, maj_cat) %>%
  summarise(count = sum(count))
london_crime_df
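The `separate(month, c("year", "month"), sep = 4)` step above splits each value after its fourth character. A minimal base R sketch of the same split follows. The sample values and the YYYYMM layout are assumptions made for illustration, not taken from the CSV itself.

```r
# Hypothetical sample values; the YYYYMM layout is an assumption about the CSV.
raw_month <- c("201501", "201512", "201607")

# A numeric `sep` in tidyr::separate() is a character position to split at,
# so sep = 4 keeps characters 1-4 as the year and the rest as the month.
# The same split in base R:
year  <- substr(raw_month, 1, 4)
month <- substr(raw_month, 5, nchar(raw_month))

# Convert to factors, as the mutate() step does.
data.frame(year = factor(year), month = factor(month))
```

If the real column used a different layout (for example a separator between year and month), the split position would need to change accordingly.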
london_cat_df <- london_crime_df %>%
  group_by(maj_cat) %>%
  tally(wt = count) %>%
  arrange(desc(n)) %>%
  transmute(
    Category = maj_cat,
    Total = n,
    "%" = 100 * n / sum(n),
    Cum = cumsum(n),
    "Cum%" = 100 * Cum / sum(n)
  )
london_cat_df
# Many thanks to David Smith for the nice chart idea: https://www.r-bloggers.com/how-to-learn-r-part-1-learn-from-a-master-data-scientists-code/
london_cat_df %>%
  ggplot(aes(x = Total, y = Category, color = Category)) +
  geom_segment(aes(xend = 0, yend = Category), size = 2) +
  geom_point(size = 4) +
  geom_label(aes(label = paste0(scales::percent(Total / sum(Total)))),
    hjust = "inward", size = 3.5
  ) +
  expand_limits(x = 0) +
  labs(
    title = "Crimes in London",
    subtitle = "London Police Data for 2014-2017 Years",
    x = "Total Number of Crimes", y = "Category of Crime"
  ) +
  scale_color_tq() +
  theme_tq() +
  theme(legend.position = "none")
london_mon_df <- london_crime_df %>%
  group_by(month, maj_cat) %>%
  tally(wt = count) %>%
  arrange(desc(n))
# Many thanks to David Smith for the nice chart idea: https://www.r-bloggers.com/how-to-learn-r-part-1-learn-from-a-master-data-scientists-code/
london_mon_df %>%
  rename(Month = month, Category = maj_cat, Total = n) %>%
  ggplot(aes(Month, Total, fill = Category)) +
  geom_bar(stat = "identity") +
  geom_text(aes(x = Month, y = Total, label = Month),
    vjust = -1, color = palette_light()[[1]], size = 3
  ) +
  facet_wrap(~ Category, ncol = 3) +
  scale_fill_tq() +
  theme_tq() +
  labs(
    title = "Crimes category in London by month",
    subtitle = "London Police Data for 2014-2017 Years",
    x = "Month", y = "Total number of crimes"
  )
We use a Pareto chart as the basis for the comparison, although the category structure of crimes and the time intervals differ between London and Chicago. As we will see, this makes almost no difference for the Pareto principle (20/80), i.e. roughly 20% of crime categories account for about 80% of all crimes, whether in London or in Chicago.
pareto_df <- as.vector(london_cat_df$Total)
names(pareto_df) <- as.vector(substr(london_cat_df$Category, 1, 16))
pareto.chart(pareto_df, cumperc = seq(0, 100, 20), main = "Pareto chart for London crimes")
##
## Pareto chart analysis for pareto_df
## Frequency Cum.Freq. Percentage Cum.Percent.
## Theft & Handling 9.345070e+05 9.345070e+05 3.944590e+01 3.944590e+01
## Violence Against 6.863680e+05 1.620875e+06 2.897186e+01 6.841776e+01
## Burglary 2.375880e+05 1.858463e+06 1.002868e+01 7.844645e+01
## Criminal Damage 2.014040e+05 2.059867e+06 8.501341e+00 8.694779e+01
## Drugs 1.367910e+05 2.196658e+06 5.774001e+00 9.272179e+01
## Robbery 7.373200e+04 2.270390e+06 3.112256e+00 9.583405e+01
## Sexual Offences 5.050800e+04 2.320898e+06 2.131962e+00 9.796601e+01
## Other Notifiable 4.487500e+04 2.365773e+06 1.894191e+00 9.986020e+01
## Fraud & Forgery 3.312000e+03 2.369085e+06 1.398008e-01 1.000000e+02
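To read the table against the 20/80 rule, it may help to compute how many categories are needed before the cumulative share first reaches 80%. The sketch below uses only base R and the frequencies copied from the output above. The variable names (`totals`, `cum_pct`, `k80`) and the 80% threshold check are illustrative additions, not part of the original analysis.

```r
# Category totals copied from the pareto.chart() output above.
totals <- c(
  "Theft & Handling" = 934507,
  "Violence Against" = 686368,
  "Burglary"         = 237588,
  "Criminal Damage"  = 201404,
  "Drugs"            = 136791,
  "Robbery"          = 73732,
  "Sexual Offences"  = 50508,
  "Other Notifiable" = 44875,
  "Fraud & Forgery"  = 3312
)

# Sort in decreasing order and compute the cumulative percentage.
totals  <- sort(totals, decreasing = TRUE)
cum_pct <- 100 * cumsum(totals) / sum(totals)

# Index of the first category at which the cumulative share reaches 80%.
k80 <- which(cum_pct >= 80)[1]

# Share of all categories needed to reach that threshold.
share_of_categories <- 100 * k80 / length(totals)

print(round(cum_pct, 2))
print(k80)
print(round(share_of_categories, 1))
```

With these totals, the two largest categories (2 of 9, about 22% of categories) cover about 68% of recorded crimes, and the 80% mark is first passed at the fourth category, Criminal Damage. The same check can be repeated on the Chicago totals for a like-for-like comparison.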
Compare this chart with the one for Chicago: https://rpubs.com/alex-lev/248923.
[Figure: Chicago crimes]
Any difference?