Statement of goals:
This report is mainly an analysis of crime location and time, using the ggplot2 and lubridate packages. Since the data covers Baltimore and includes detailed locations, I also used the ggmap package to get a first look at where crimes occur. My goal is to use 2000 random samples out of 276529 records to find the safe and unsafe areas in Baltimore, and to identify the riskiest times. With this report, people who live in Baltimore, or who want to move to Baltimore, will have a data-based reference that helps them live more safely.
Statement of methods:
The full R script uses 3 main data.frames.
1. Basic data.frame: “crimedf”. The sample crimedf is a data.frame with 2000 obs. of 15 variables. It records the date, time, crime type, location, description, district, etc. Most of the variables are categorical, and each contains about 200-700 distinct values.
2. Time series data.frame: “datedf”. Derived from “crimedf”, it contains Crime Date, Crime Time, and Crime count, and I added the extracted “Year”, “Month”, and “Weekday” as variables.
3. Map data.frame: “mapdf”. Contains only the longitude and latitude of each crime.
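As an illustration of how the second and third data.frames could be derived from the first, here is a minimal sketch on toy data. The column names (CrimeDate, CrimeTime, Longitude, Latitude) and the "MM/DD/YYYY" date format are assumptions, not taken from the original script, and the toy rows are made up.

```r
library(lubridate)

# Toy stand-in for crimedf; column names and date format are assumptions
crimedf <- data.frame(
  CrimeDate = c("05/12/2016", "05/13/2016", "11/02/2015"),
  CrimeTime = c("23:30:00", "00:15:00", "14:05:00"),
  Longitude = c(-76.61, -76.59, -76.65),
  Latitude  = c(39.29, 39.31, 39.33),
  stringsAsFactors = FALSE
)

# Time data.frame: parse the date, then extract Year, Month, and Weekday
crime_date <- mdy(crimedf$CrimeDate)
datedf <- data.frame(
  CrimeDate = crime_date,
  CrimeTime = crimedf$CrimeTime,
  Year      = year(crime_date),
  Month     = month(crime_date, label = TRUE),
  Weekday   = wday(crime_date, label = TRUE)
)

# Map data.frame: keep only complete coordinate pairs
mapdf <- na.omit(crimedf[, c("Longitude", "Latitude")])
```

With label = TRUE, month() and wday() return ordered factors, so later bar charts keep calendar order instead of alphabetical order.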
Based on these 3 data.frames, the analysis proceeds in 4 steps: Data Cleaning, Explanatory analysis (summary), Exploratory Data Analysis (location and time), and Advanced Map plot with the ggmap & maps packages.
Step 1: Data cleaning
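# Load the full crime data set (file chosen interactively); keep text columns as character
data <- read.csv(file.choose(), header = TRUE, stringsAsFactors = FALSE)
# Draw a reproducible random sample of 2000 records
set.seed(123)
crime <- data[sample(nrow(data), 2000), ]
crimedf <- data.frame(crime)
# Recode Inside.Outside to the short codes "I" and "O"
crimedf$Inside.Outside[crime$Inside.Outside == "Inside"] <- "I"
crimedf$Inside.Outside[crime$Inside.Outside == "Outside"] <- "O"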
Step 2: Explanatory analysis
Step 3: Exploratory Data Analysis (Location and Time analysis)
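One possible way to find the riskiest hour is to count crimes per hour of day. The sketch below uses base R on made-up toy times and assumes the crime time is stored as an "HH:MM:SS" string. It is not the original script.

```r
# Toy crime times, assumed to be "HH:MM:SS" strings
crime_time <- c("23:30:00", "00:15:00", "23:05:00", "14:05:00", "23:59:00")

# Extract the hour and count crimes for every hour of the day (0-23)
crime_hour  <- as.integer(substr(crime_time, 1, 2))
hour_counts <- table(factor(crime_hour, levels = 0:23))

# Hour with the most crimes in the toy sample
peak_hour <- as.integer(names(hour_counts)[which.max(hour_counts)])
```

The same table() and which.max() pattern can be applied to a weekday, month, or district column, and barplot(hour_counts) gives a quick chart of the counts.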
Step 4: Advanced Map plot by ggmap & maps package
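As a rough sketch of the mapping idea, crime locations can be plotted by longitude and latitude. This example deliberately uses plain ggplot2 instead of ggmap, so it draws no background map tiles. The column names and coordinates are made-up assumptions.

```r
library(ggplot2)

# Toy stand-in for mapdf; column names and coordinates are made up
mapdf <- data.frame(
  Longitude = c(-76.61, -76.59, -76.65, -76.60, NA),
  Latitude  = c(39.29, 39.31, 39.33, 39.30, 39.28)
)

# Drop records with missing coordinates before plotting
mapdf <- na.omit(mapdf)

# Scatter plot of crime locations; coord_quickmap() keeps a map-like aspect ratio
p <- ggplot(mapdf, aes(x = Longitude, y = Latitude)) +
  geom_point(alpha = 0.4, colour = "red") +
  coord_quickmap() +
  labs(x = "Longitude", y = "Latitude")
print(p)
```

Dense clusters of points suggest less safe areas, and sparse regions suggest safer ones.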
Summary:
In analyzing the crime data from Baltimore, the charts show that crime occurred most frequently in the following categories:
Hours: around midnight, 23:00-1:00
Weekday: Friday
Month: May
District: Northeastern
Description: Larceny
In addition, the maps help identify the safer areas to live in Baltimore.