Preface

In previous posts about crime in Chicago, we explored the corresponding data from US.gov (2001-2017) and drew some conclusions based on statistical analysis:
1. https://rpubs.com/alex-lev/248923
2. https://rpubs.com/alex-lev/249124
3. https://rpubs.com/alex-lev/249354
4. https://rpubs.com/alex-lev/249370
5. https://rpubs.com/alex-lev/249747
6. https://rpubs.com/alex-lev/249997
7. https://rpubs.com/alex-lev/255283
8. https://rpubs.com/alex-lev/254361

Now we want to compare Chicago and London using openly published UK data.

For more on UK crime analysis with R, see Carl Goodwin's article: https://thinkr.biz/wp-content/uploads/2018/02/Univariate_Regression.html.

London crimes data

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyr)
library(tidyquant)

## Loading required package: lubridate

## 
## Attaching package: 'lubridate'

## The following object is masked from 'package:base':
## 
##     date

## Loading required package: PerformanceAnalytics

## Loading required package: xts

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## 
## Attaching package: 'xts'

## The following objects are masked from 'package:dplyr':
## 
##     first, last

## 
## Attaching package: 'PerformanceAnalytics'

## The following object is masked from 'package:graphics':
## 
##     legend

## Loading required package: quantmod

## Loading required package: TTR

## Version 0.4-0 included new data defaults. See ?getSymbols.

## Loading required package: tidyverse

## ── Attaching packages ───────────────────────────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
## ✔ tibble  1.4.1     ✔ stringr 1.2.0
## ✔ readr   1.1.1     ✔ forcats 0.3.0

## ── Conflicts ──────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ lubridate::as.difftime() masks base::as.difftime()
## ✖ lubridate::date()        masks base::date()
## ✖ dplyr::filter()          masks stats::filter()
## ✖ xts::first()             masks dplyr::first()
## ✖ lubridate::intersect()   masks base::intersect()
## ✖ dplyr::lag()             masks stats::lag()
## ✖ xts::last()              masks dplyr::last()
## ✖ lubridate::setdiff()     masks base::setdiff()
## ✖ lubridate::union()       masks base::union()

library(ggplot2)
library(qcc)

## Package 'qcc' version 2.7

## Type 'citation("qcc")' for citing this R package in publications.

library(readr)

url <- "https://files.datapress.com/london/dataset/recorded-crime-summary-data-london-borough-level/2017-01-26T18:50:00/MPS_Borough_Level_Crime.csv"

london_crime_df <-
  read_csv(url,
    col_types = "cccci",
    col_names = c("month", "borough", "maj_cat", "min_cat", "count"),
    skip = 1
  ) %>%
  separate(month, c("year", "month"), sep = 4) %>%
  mutate(
    year = factor(year),
    month = factor(month),
    maj_cat = factor(maj_cat),
    borough = factor(borough)
  ) %>%
  group_by(year, month, borough, maj_cat) %>%
  summarise(count = sum(count))
london_crime_df
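
To make the reshaping step above easier to follow, here is a minimal sketch on a toy table. It assumes the `month` column holds six-character strings of the form YYYYMM; the values, the minor category labels and the names `toy_df` and `toy_summary` are invented for illustration and are not taken from the London file.

```r
library(dplyr)
library(tidyr)

# Toy data: all values are invented, only the column layout mimics the London file
toy_df <- tibble(
  month   = c("201501", "201501", "201502"),
  borough = c("Camden", "Camden", "Camden"),
  maj_cat = c("Burglary", "Burglary", "Burglary"),
  min_cat = c("minor_a", "minor_b", "minor_a"),  # hypothetical minor categories
  count   = c(10L, 5L, 7L)
)

toy_summary <- toy_df %>%
  # split after the 4th character: "201501" becomes year "2015" and month "01"
  separate(month, c("year", "month"), sep = 4) %>%
  group_by(year, month, borough, maj_cat) %>%
  # collapse the minor categories into one row per major category
  summarise(count = sum(count)) %>%
  ungroup()

toy_summary
```

The two January rows are merged into a single row because they share the same year, month, borough and major category; the February row is kept as is.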

London crimes by category and month

london_cat_df <- london_crime_df %>%
  group_by(maj_cat) %>%
  tally(wt = count) %>%
  arrange(desc(n)) %>%
  transmute(
    Category = maj_cat,
    Total = n,
    "%" = 100 * n / sum(n),
    Cum = cumsum(n),
    "Cum%" = 100 * Cum / sum(n)
  )
london_cat_df
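
The share and cumulative columns are easier to verify on a small example. The sketch below repeats the same steps on three hypothetical categories with invented counts; the column names `Pct` and `CumPct` are stand-ins for the "%" and "Cum%" columns used above.

```r
library(dplyr)

# Invented counts for three hypothetical categories
toy_counts <- tibble(
  maj_cat = c("A", "B", "C"),
  count   = c(50, 30, 20)
)

toy_cat <- toy_counts %>%
  group_by(maj_cat) %>%
  tally(wt = count) %>%          # weighted tally: sums `count` into column n
  arrange(desc(n)) %>%           # largest category first, as a Pareto table needs
  transmute(
    Category = maj_cat,
    Total    = n,
    Pct      = 100 * n / sum(n),     # share of each category
    Cum      = cumsum(n),            # running total
    CumPct   = 100 * Cum / sum(n)    # running share
  )

toy_cat
```

Sorting before taking the cumulative sum matters: `cumsum()` follows row order, so the running share is only a Pareto curve if the rows are already in descending order of `n`.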

# Many thanks to David Smith for the nice chart idea: https://www.r-bloggers.com/how-to-learn-r-part-1-learn-from-a-master-data-scientists-code/
london_cat_df %>%
  ggplot(aes(x = Total, y = Category, color = Category)) +
  geom_segment(aes(xend = 0, yend = Category), size = 2) +
  geom_point(size = 4) +
  geom_label(aes(label = paste0(scales::percent(Total/sum(Total)))), 
             hjust = "inward", size = 3.5) +
  expand_limits(x = 0) +
  labs(
    title = "Crimes in London",
    subtitle = "London Police Data for 2014-2017 Years",
    x = "Total Number of Crimes", y = "Category of Crime") +
  scale_color_tq() +
  theme_tq() +
  theme(legend.position = "none")

london_mon_df <- london_crime_df %>%
  group_by(month, maj_cat) %>%
  tally(wt = count) %>%
  arrange(desc(n))

# Many thanks to David Smith for the nice chart idea: https://www.r-bloggers.com/how-to-learn-r-part-1-learn-from-a-master-data-scientists-code/
london_mon_df %>%
  rename(Month = month, Category = maj_cat, Total = n) %>%
  ggplot(aes(Month, Total, fill = Category)) +
  geom_bar(stat = "identity") +
  geom_text(aes(x = Month, y = Total, label = Month), 
            vjust = -1, color = palette_light()[[1]], size = 3) +
  facet_wrap(~ Category, ncol = 3) +
  scale_fill_tq() +
  theme_tq() +
  labs(
    title = "Crimes category in London by month",
    subtitle = "London Police Data for 2014-2017 Years",
    x = "Month", y = "Total number of crimes"
  )

Pareto chart as a clue

We use a Pareto chart as a clue for the comparison, even though both the structure of crime categories and the time intervals differ between London and Chicago. As we will see, this makes almost no difference to the Pareto (20/80) principle, i.e. roughly 20% of crime categories account for about 80% of all crimes, in London as in Chicago.

pareto_df <- as.vector(london_cat_df$Total)
names(pareto_df) <- as.vector(substr(london_cat_df$Category,1,16))
pareto.chart(pareto_df,cumperc=seq(0,100,20), main="Pareto chart for London crimes")

##                   
## Pareto chart analysis for pareto_df
##                       Frequency    Cum.Freq.   Percentage Cum.Percent.
##   Theft & Handling 9.345070e+05 9.345070e+05 3.944590e+01 3.944590e+01
##   Violence Against 6.863680e+05 1.620875e+06 2.897186e+01 6.841776e+01
##   Burglary         2.375880e+05 1.858463e+06 1.002868e+01 7.844645e+01
##   Criminal Damage  2.014040e+05 2.059867e+06 8.501341e+00 8.694779e+01
##   Drugs            1.367910e+05 2.196658e+06 5.774001e+00 9.272179e+01
##   Robbery          7.373200e+04 2.270390e+06 3.112256e+00 9.583405e+01
##   Sexual Offences  5.050800e+04 2.320898e+06 2.131962e+00 9.796601e+01
##   Other Notifiable 4.487500e+04 2.365773e+06 1.894191e+00 9.986020e+01
##   Fraud & Forgery  3.312000e+03 2.369085e+06 1.398008e-01 1.000000e+02
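
The 20/80 reading of this table can be checked with a few lines of base R. The sketch below copies the totals from the output above and counts how many of the largest categories are needed to reach 80% of all recorded crimes; the names `totals`, `cum_pct` and `k` are chosen here for illustration.

```r
# Totals copied from the Pareto chart analysis above
totals <- c(
  "Theft & Handling" = 934507,
  "Violence Against" = 686368,
  "Burglary"         = 237588,
  "Criminal Damage"  = 201404,
  "Drugs"            = 136791,
  "Robbery"          = 73732,
  "Sexual Offences"  = 50508,
  "Other Notifiable" = 44875,
  "Fraud & Forgery"  = 3312
)

# Cumulative share of all crimes, in percent (totals are already sorted)
cum_pct <- 100 * cumsum(totals) / sum(totals)

# Number of leading categories needed to reach at least 80% of all crimes
k <- unname(which(cum_pct >= 80)[1])

round(cum_pct, 1)
k
round(100 * k / length(totals), 1)  # the same number as a share of all categories
```

With these figures the two largest categories (2 of 9) cover about 68% of crimes and the three largest about 78%, so the 80% mark is crossed only with the fourth category. The split is close to the 20/80 rule of thumb rather than an exact match.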

Compare this chart with the one for Chicago: https://rpubs.com/alex-lev/248923.

Chicago crimes

Any difference?

London versus Chicago by crimes statistics

Alexander Levakov, Senior Research Fellow, Ph.D

March, 2018
