INTRODUCTION

This is the week 3 project for the Developing Data Products course in the Johns Hopkins University Data Science Specialization on Coursera. The objective of the assignment is to create a web page presentation using R Markdown that features a plot created with Plotly.

DATA

The data comes from the City of Vancouver’s Open Data Catalogue (Vancouver, British Columbia, Canada). The dataset, released by the Vancouver Police Department (VPD), contains crime data on a year-by-year basis beginning in 2003 and is updated every Sunday morning. The data is available at https://data.vancouver.ca/datacatalogue/crime-data.htm.

EXPLORATORY DATA ANALYSIS

Let’s load some packages that will be used in this analysis:

library(dplyr)
library(tidyr)
library(plotly)

And let’s load the data from the City of Vancouver’s server:

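url <- "ftp://webftp.vancouver.ca/opendata/csv/crime_csv_all_years.zip"
temp <- tempfile(fileext = ".zip")
# mode = "wb" downloads the archive as binary so it is not corrupted on Windows
download.file(url, temp, mode = "wb")
# stringsAsFactors = TRUE keeps TYPE and NEIGHBOURHOOD as factors on R >= 4.0,
# which is what the summary output further down assumes
data <- read.csv(unz(temp, "crime_csv_all_years.csv"), stringsAsFactors = TRUE)
unlink(temp)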

Below is the list of variables that are included in this dataset:

Attribute        Description
TYPE             The type of crime activity
YEAR             A four-digit field that indicates the year when the reported crime activity occurred
MONTH            A numeric field that indicates the month when the reported crime activity occurred
DAY              Day of the month when the reported crime activity occurred
HOUR             Hour (in 24-hour format) when the reported crime activity occurred
MINUTE           Minute when the reported crime activity occurred
HUNDRED_BLOCK    Generalized location of the reported crime activity
NEIGHBOURHOOD    Neighbourhood within the City of Vancouver
X                Coordinate value, projected in UTM Zone 10
Y                Coordinate value, projected in UTM Zone 10

As stated on the City of Vancouver’s website, all coordinate data in this dataset are offset, and in some cases not disclosed, to protect privacy. Therefore only the following attributes will be used in this analysis:

Attribute        Description
TYPE             The type of crime activity
YEAR             A four-digit field that indicates the year when the reported crime activity occurred
MONTH            A numeric field that indicates the month when the reported crime activity occurred
DAY              Day of the month when the reported crime activity occurred
HOUR             Hour (in 24-hour format) when the reported crime activity occurred
MINUTE           Minute when the reported crime activity occurred
NEIGHBOURHOOD    Neighbourhood within the City of Vancouver

Removing the irrelevant variables:

data <- select(data, -X, -Y, -HUNDRED_BLOCK)
summary(data)
##                                 TYPE             YEAR     
##  Theft from Vehicle               :191704   Min.   :2003  
##  Mischief                         : 77960   1st Qu.:2006  
##  Break and Enter Residential/Other: 64022   Median :2009  
##  Other Theft                      : 58982   Mean   :2010  
##  Offence Against a Person         : 58301   3rd Qu.:2014  
##  Theft of Vehicle                 : 40156   Max.   :2018  
##  (Other)                          : 89796                 
##      MONTH             DAY             HOUR           MINUTE     
##  Min.   : 1.000   Min.   : 1.00   Min.   : 0.00   Min.   : 0.00  
##  1st Qu.: 4.000   1st Qu.: 8.00   1st Qu.: 9.00   1st Qu.: 0.00  
##  Median : 7.000   Median :15.00   Median :15.00   Median :10.00  
##  Mean   : 6.502   Mean   :15.41   Mean   :13.72   Mean   :17.03  
##  3rd Qu.: 9.000   3rd Qu.:23.00   3rd Qu.:19.00   3rd Qu.:30.00  
##  Max.   :12.000   Max.   :31.00   Max.   :23.00   Max.   :59.00  
##                                   NA's   :58540   NA's   :58540  
##                    NEIGHBOURHOOD   
##  Central Business District:124888  
##                           : 60924  
##  West End                 : 45219  
##  Fairview                 : 34631  
##  Mount Pleasant           : 33747  
##  Grandview-Woodland       : 29686  
##  (Other)                  :251826
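
The summary above reports 58540 NA values in both HOUR and MINUTE, and 60924 records whose NEIGHBOURHOOD is blank. As a minimal sketch of how such counts can be checked directly, the snippet below uses a small made-up data frame (toy, a hypothetical stand-in for the real data) and base R only:

```r
# Toy stand-in for the crime data; all values are made up for illustration.
toy <- data.frame(
  TYPE = c("Mischief", "Other Theft", "Offence Against a Person", "Mischief"),
  HOUR = c(14L, 9L, NA, 22L),
  MINUTE = c(30L, 0L, NA, 15L),
  NEIGHBOURHOOD = c("West End", "Fairview", "", "West End"),
  stringsAsFactors = FALSE
)

# Number of NA values in each column
naCounts <- colSums(is.na(toy))

# Number of rows whose neighbourhood is an empty string
blankNeighbourhoods <- sum(toy$NEIGHBOURHOOD == "")

print(naCounts)
print(blankNeighbourhoods)
```

Running the same two expressions on data instead of toy should reproduce the NA and blank counts shown in the summary.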

For this assignment, the total number of reported crimes per year (regardless of type) is plotted using Plotly for two popular neighbourhoods in Downtown Vancouver, the Central Business District and the West End:

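# Exclude the current calendar year, whose data would be incomplete
currentYear <- as.integer(format(Sys.Date(), "%Y"))

# Yearly totals for the Central Business District
dist <- "Central Business District"
dataPlot <- data %>%
            filter(NEIGHBOURHOOD == dist, YEAR != currentYear) %>%
            group_by(YEAR) %>%
            summarize(totalCrime = n())
plot_ly(data = dataPlot, x = ~as.factor(YEAR), y = ~totalCrime, type = "scatter", mode = "lines") %>%
    layout(title = dist, xaxis = list(title = "Year"), yaxis = list(title = "Total reported crimes"))

# Yearly totals for the West End
dist <- "West End"
dataPlot <- data %>%
            filter(NEIGHBOURHOOD == dist, YEAR != currentYear) %>%
            group_by(YEAR) %>%
            summarize(totalCrime = n())
plot_ly(data = dataPlot, x = ~as.factor(YEAR), y = ~totalCrime, type = "scatter", mode = "lines") %>%
    layout(title = dist, xaxis = list(title = "Year"), yaxis = list(title = "Total reported crimes"))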
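
Since both charts come from the same filter, group and count steps, one possible variation is to draw the two neighbourhoods as separate lines on a single chart. The sketch below is only an illustration of that idea, not part of the original analysis: toy, districts and yearly are hypothetical names, and the counts are made up.

```r
library(dplyr)
library(plotly)

# Toy stand-in for the crime data; one row per reported crime, values made up.
toy <- data.frame(
  YEAR = c(2003, 2003, 2003, 2004, 2004, 2004, 2004),
  NEIGHBOURHOOD = c("West End", "Central Business District",
                    "Central Business District", "West End", "West End",
                    "Central Business District", "Fairview"),
  stringsAsFactors = FALSE
)

districts <- c("Central Business District", "West End")

# Count reported crimes per neighbourhood and year
yearly <- toy %>%
  filter(NEIGHBOURHOOD %in% districts) %>%
  count(NEIGHBOURHOOD, YEAR, name = "totalCrime")

# split = ~NEIGHBOURHOOD draws one line (trace) per neighbourhood
p <- plot_ly(yearly, x = ~as.factor(YEAR), y = ~totalCrime,
             split = ~NEIGHBOURHOOD, type = "scatter", mode = "lines")
p
```

With the real data, replacing toy with data (and keeping the current-year filter) would put both trends on the same axes, which makes them easier to compare.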

More analysis will be performed on this dataset as part of the course’s final project.

Until then!