This dataset measures the number of impaired driving incidents per 100,000 population. Incidents include impairment from alcohol, narcotics, or prescription medication, and include failure to comply with testing. Data is obtained from Statistics Canada, tables 252-0075 to 252-0081
link to “codebook”: https://www.opendatanetwork.com/dataset/dashboard.edmonton.ca/n5ux-m2xi
also more information from source tables is here: http://www5.statcan.gc.ca/cansim/a26?lang=eng&retrLang=eng&id=2520076&&pattern=&stByVal=1&p1=1&p2=31&tabMode=dataTable&csid=
There are 19 variables and 25 observations
Data is in wide format with 18 Years (1998-2015) and Police Jurisdiction as column names.
“Year”" from 1998 to 2015 and entries are coded as numeric values
“Police Jurisdiction” There are 25 Police jurisdictions in this dataset and are coded as characters
“NA” While the data set contains blank entries, the original data source states that “a process of imputation was applied to derive counts for violations that do not exist on their own in the aggregate survey. For approximately 80% of the aggregate offence codes, there is a 1:1 mapping with a new incident-based violation code. For violations where this was not the case, such as the aggregate other Criminal Code category, it was necessary to estimate (impute) this figure using the distribution of other Criminal Code offences from existing Incident-based UCR2 respondents.”
“Data from the previous year are revised to reflect any updates or changes that have been received from the police services.” The latest years data (2015) may be revised in future, but data prior to 2015 should be correct.
“With the release of 2012 data, revised population estimates at the respondent level were applied back to and including 2004.”
“Prior to 1999, a number of Royal Canadian Mounted Police (RCMP) detachments in Saskatchewan were double counting the number of actual offences of impaired driving. This over-counting was corrected in 1999, therefore, comparisons with previous years should be made with caution. It is recommended that the analysis of impaired driving be based on persons charged data rather than actual offences”
“In 2011, the introduction of the Immediate Roadside Prohibition (IRP) in British Columbia provided an alternative method for officers to proceed with penalties for impaired drivers and may account for the trends reported for 2011 and 2012.”
library(jsonlite) #load package to import json file
library(tidyverse) #load packages to tidy data and create graphics
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
Impaired <- fromJSON("https://dashboard.edmonton.ca/resource/kcvf-z5h9.json")
Impaired <- as_tibble(Impaired)
Impaired
## # A tibble: 25 × 19
## `_1998` `_1999` `_2000` `_2001` `_2002`
## * <chr> <chr> <chr> <chr> <chr>
## 1 107.6 92.5 104.35 174.68 164.62
## 2 212.33 206.56 211.3 214.16 217.33
## 3 147.72 175.07 187.27 176.1 181.23
## 4 340.97 311.08999999999997 300.93 250.99 234.28
## 5 <NA> <NA> <NA> <NA> 184.27
## 6 <NA> <NA> <NA> <NA> <NA>
## 7 119.61 100.64 114.53 106.86 97.11
## 8 102.3 100.58 104.61 95.87 86.87
## 9 114.58 154.58000000000001 146.57 160.33000000000001 139.32
## 10 196.35 186.05 179.34 152.58000000000001 154.84
## # ... with 15 more rows, and 14 more variables: `_2003` <chr>,
## # `_2004` <chr>, `_2005` <chr>, `_2006` <chr>, `_2007` <chr>,
## # `_2008` <chr>, `_2009` <chr>, `_2010` <chr>, `_2011` <chr>,
## # `_2012` <chr>, `_2013` <chr>, `_2014` <chr>, `_2015` <chr>,
## # police_jurisdiction <chr>
str(Impaired)
## Classes 'tbl_df', 'tbl' and 'data.frame': 25 obs. of 19 variables:
## $ _1998 : chr "107.6" "212.33" "147.72" "340.97" ...
## $ _1999 : chr "92.5" "206.56" "175.07" "311.08999999999997" ...
## $ _2000 : chr "104.35" "211.3" "187.27" "300.93" ...
## $ _2001 : chr "174.68" "214.16" "176.1" "250.99" ...
## $ _2002 : chr "164.62" "217.33" "181.23" "234.28" ...
## $ _2003 : chr "179.13" "197.2" "181.7" "220.56" ...
## $ _2004 : chr "155.18" "198.24" "162.25" "212.89" ...
## $ _2005 : chr "172.06" "185.76" "139.43" "199.11" ...
## $ _2006 : chr "159.47999999999999" "166.2" "144.80000000000001" "200.66" ...
## $ _2007 : chr "125.78" "159.07" "139.76" "239.35" ...
## $ _2008 : chr "132.85" "155.46" "134.58000000000001" "255.26" ...
## $ _2009 : chr "182.28" "154.54" "138.03" "253.1" ...
## $ _2010 : chr "211.79" "152.44" "147" "257.83" ...
## $ _2011 : chr "258.07" "155.84" "137.21" "251.9" ...
## $ _2012 : chr "181.23" "149.32" "129.38" "248.46" ...
## $ _2013 : chr "200.54" "121.57" "113.79" "209.32" ...
## $ _2014 : chr "221.22" "105.58" "121.4" "196.87" ...
## $ _2015 : chr "174.91" "86.7" "116.61" "171.71" ...
## $ police_jurisdiction: chr "Burnaby, BC" "Calgary, AB" "Durham Region, ON" "Edmonton, AB" ...
The jsonlite package seems to have imported all the observations as characters despite the datasource information saying these were numeric values. The observations for years 1998-2015 will need to be converted to numerical values
Some observations have many decimal points and some only 1. All will be converted to 2 decimal places
Police Jurisdiction includes both city and province. These will be split into separate fields for regional analysis
names(Impaired) <- gsub("_", "", names(Impaired), fixed = TRUE)
names(Impaired)
## [1] "1998" "1999" "2000"
## [4] "2001" "2002" "2003"
## [7] "2004" "2005" "2006"
## [10] "2007" "2008" "2009"
## [13] "2010" "2011" "2012"
## [16] "2013" "2014" "2015"
## [19] "policejurisdiction"
cols <- c("1998", "1999", "2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008",
"2009", "2010", "2011", "2012", "2013", "2014", "2015")
Impaired[cols] <- sapply(Impaired[cols],as.numeric)
sapply(Impaired, class)
## 1998 1999 2000
## "numeric" "numeric" "numeric"
## 2001 2002 2003
## "numeric" "numeric" "numeric"
## 2004 2005 2006
## "numeric" "numeric" "numeric"
## 2007 2008 2009
## "numeric" "numeric" "numeric"
## 2010 2011 2012
## "numeric" "numeric" "numeric"
## 2013 2014 2015
## "numeric" "numeric" "numeric"
## policejurisdiction
## "character"
19 missing values until 2007. Missing values reduce each year until 2007. From 2008-2015 there are no missing values
Will determine what to do with missing values upon further analysis
## [1] 19
## 1998 1999 2000 2001
## Min. : 52.45 Min. : 46.81 Min. : 35.78 Min. : 76.81
## 1st Qu.:107.60 1st Qu.:106.36 1st Qu.:113.17 1st Qu.:125.75
## Median :147.72 Median :167.01 Median :165.16 Median :157.74
## Mean :173.97 Mean :171.80 Mean :174.02 Mean :169.47
## 3rd Qu.:196.35 3rd Qu.:185.71 3rd Qu.:193.53 3rd Qu.:191.94
## Max. :402.44 Max. :409.65 Max. :441.67 Max. :376.50
## NA's :4 NA's :3 NA's :3 NA's :3
## 2002 2003 2004 2005
## Min. : 83.79 Min. : 64.75 Min. : 72.86 Min. : 60.19
## 1st Qu.:127.83 1st Qu.:117.51 1st Qu.:117.51 1st Qu.:111.70
## Median :154.06 Median :141.38 Median :145.57 Median :136.00
## Mean :162.64 Mean :150.05 Mean :155.17 Mean :142.83
## 3rd Qu.:188.19 3rd Qu.:179.77 3rd Qu.:177.44 3rd Qu.:164.01
## Max. :293.45 Max. :302.69 Max. :391.66 Max. :368.01
## NA's :1 NA's :1 NA's :1 NA's :1
## 2006 2007 2008 2009
## Min. : 61.61 Min. : 64.89 Min. : 77.83 Min. : 70.04
## 1st Qu.:100.52 1st Qu.:103.22 1st Qu.: 94.56 1st Qu.:102.11
## Median :127.61 Median :125.57 Median :128.39 Median :134.58
## Mean :138.32 Mean :141.73 Mean :147.78 Mean :153.99
## 3rd Qu.:167.51 3rd Qu.:165.45 3rd Qu.:170.24 3rd Qu.:198.88
## Max. :317.99 Max. :347.84 Max. :365.25 Max. :377.59
## NA's :1 NA's :1
## 2010 2011 2012 2013
## Min. : 72.01 Min. : 74.06 Min. : 65.66 Min. : 58.58
## 1st Qu.: 92.98 1st Qu.: 95.03 1st Qu.: 84.01 1st Qu.: 85.75
## Median :136.93 Median :125.56 Median :129.38 Median :121.57
## Mean :153.00 Mean :154.49 Mean :142.06 Mean :139.56
## 3rd Qu.:193.89 3rd Qu.:197.85 3rd Qu.:181.23 3rd Qu.:172.57
## Max. :322.16 Max. :309.52 Max. :291.51 Max. :349.13
##
## 2014 2015 policejurisdiction
## Min. : 48.17 Min. : 47.97 Length:25
## 1st Qu.: 88.96 1st Qu.: 86.70 Class :character
## Median :111.10 Median :106.44 Mode :character
## Mean :134.10 Mean :128.90
## 3rd Qu.:161.60 3rd Qu.:168.95
## Max. :299.37 Max. :294.15
##
ImpairedL <-Impaired %>%
gather(year, IncidentRate, -policejurisdiction) %>%
separate(policejurisdiction, into = c("City", "Province"), sep = ", ")
head(ImpairedL)
## # A tibble: 6 × 4
## City Province year IncidentRate
## <chr> <chr> <chr> <dbl>
## 1 Burnaby BC 1998 107.60
## 2 Calgary AB 1998 212.33
## 3 Durham Region ON 1998 147.72
## 4 Edmonton AB 1998 340.97
## 5 Gatineau QB 1998 NA
## 6 Halifax Region NS 1998 NA
tail(ImpairedL)
## # A tibble: 6 × 4
## City Province year IncidentRate
## <chr> <chr> <chr> <dbl>
## 1 Toronto ON 2015 47.97
## 2 Vancouver BC 2015 119.11
## 3 Waterloo Region ON 2015 82.95
## 4 Windsor ON 2015 104.41
## 5 Winnipeg MB 2015 80.18
## 6 York Region ON 2015 94.23
dim(ImpairedL)
## [1] 450 4
Trends over time by city/region> Are number of infractions increasing/decreasing (Visualization using line graphs facet wrap for each jurisdiction)
Year over year % increase/decrease in infractions by city and by region.
Calculate city/regional average over time frame Map with intensity of color (?). What cities regions have highest incidents of impaired driving? Any common factors?