Statement of goals:

My analysis report is mainly an analysis of crime location and time, using the ggplot2 and lubridate packages. Since the data covers Baltimore with detailed locations, the ggmap package was my first choice for mapping. My goal is to use 2000 random samples out of 276529 records to find the safe and unsafe areas in Baltimore, and to identify the risky times. With this report, people who live in Baltimore, or who want to move there, will have a data-based reference that helps them live more safely.

Statement of methods

The R script as a whole is built around 3 main data.frames.

1. Basic data.frame: “crimedf”. The sample crimedf is a data.frame with 2000 obs. of 15 variables. It records the date, time, crime type, location, description, district, etc. Most of the variables are categorical, and each variable contains about 200-700 distinct values.

2. Time series data.frame: “datedf”. Derived from “crimedf”, it contains Crime Date, Crime Time, and Crime count, plus the extracted “Year”, “Month”, and “Weekday” variables that I added.

3. Map data.frame: “mapdf”. Contains only the longitude and latitude of each crime.
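
As a minimal sketch of how the second and third data.frames could be derived from the first, the block below uses a tiny made-up stand-in for crimedf. The column names CrimeDate, CrimeTime, Longitude and Latitude and the MM/DD/YYYY date format are assumptions about the data, the crime count column is left out, and base R date functions stand in for lubridate helpers such as year(), month() and wday().

```r
# Toy stand-in for crimedf; column names and date format are assumptions
crimedf <- data.frame(
  CrimeDate = c("05/12/2016", "05/13/2016", "11/02/2015"),
  CrimeTime = c("23:30:00", "00:15:00", "14:05:00"),
  Longitude = c(-76.61, -76.59, NA),
  Latitude  = c(39.29, 39.31, NA),
  stringsAsFactors = FALSE
)

# Time series data frame: parse the date, then extract Year, Month, Weekday
dates <- as.Date(crimedf$CrimeDate, format = "%m/%d/%Y")
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
datedf <- data.frame(
  CrimeDate = dates,
  CrimeTime = crimedf$CrimeTime,
  Year      = as.integer(format(dates, "%Y")),
  Month     = as.integer(format(dates, "%m")),
  # POSIXlt$wday counts from 0 = Sunday, so shift by one to index day_names
  Weekday   = day_names[as.POSIXlt(dates)$wday + 1],
  stringsAsFactors = FALSE
)

# Map data frame: keep only rows with complete coordinate pairs
mapdf <- na.omit(crimedf[, c("Longitude", "Latitude")])
```

Dropping rows with missing coordinates at this stage keeps the later map layers from failing or warning on NA points.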

Using these 3 data.frames, the analysis proceeds in 4 steps: Data cleaning, Explanatory analysis (summary), Exploratory Data Analysis (location and time), and Advanced Map plot by ggmap & maps package.

Step 1: Data cleaning

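# Load the full data set from a file chosen interactively; keep text columns as character
data <- read.csv(file.choose(), header = TRUE, stringsAsFactors = FALSE)
# Fix the seed so the 2000-row random sample is reproducible
set.seed(123)
crime <- data[sample(nrow(data), 2000), ]
crimedf <- data.frame(crime)
# Recode the long labels to the short codes "I" and "O"
crimedf$Inside.Outside[crimedf$Inside.Outside == "Inside"] <- "I"
crimedf$Inside.Outside[crimedf$Inside.Outside == "Outside"] <- "O"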

Step 2: Explanatory analysis

Figure 1: Waffle Chart of Crime Type

Figure 2: Pie Chart of Crime District
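
Both the waffle chart and the pie chart are drawn from counts per category. A minimal sketch of that counting step follows, on made-up values and assuming the columns are named Description and District:

```r
# Made-up sample; the real crimedf has 2000 rows
crimedf <- data.frame(
  Description = c("LARCENY", "BURGLARY", "LARCENY", "ROBBERY", "LARCENY"),
  District    = c("NORTHEASTERN", "CENTRAL", "NORTHEASTERN",
                  "SOUTHERN", "CENTRAL"),
  stringsAsFactors = FALSE
)

# Counts per crime type, largest first (the input for a waffle chart)
type_counts <- sort(table(crimedf$Description), decreasing = TRUE)

# Percentage share per district (the input for a pie chart)
district_share <- round(100 * prop.table(table(crimedf$District)), 1)
```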

Step 3: Exploratory Data Analysis (Location and Time analysis)

Figure 3: Bar Chart: District against Premise
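
A bar chart of District against Premise rests on a two-way count table. As a sketch with made-up rows (the column names District and Premise and the premise labels are assumptions):

```r
crimedf <- data.frame(
  District = c("NORTHEASTERN", "NORTHEASTERN", "CENTRAL", "CENTRAL", "CENTRAL"),
  Premise  = c("STREET", "DWELLING", "STREET", "STREET", "PARKING LOT"),
  stringsAsFactors = FALSE
)

# Two-way counts: rows are districts, columns are premises
cross <- table(crimedf$District, crimedf$Premise)

# Long format, one row per District/Premise pair, which suits ggplot2
cross_long <- as.data.frame(cross, stringsAsFactors = FALSE)
names(cross_long) <- c("District", "Premise", "Count")
```

With ggplot2 loaded, cross_long could then be drawn with ggplot(cross_long, aes(x = District, y = Count, fill = Premise)) + geom_col().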

Step 3-2: Time analysis

P.S. For this “Time analysis” part, thanks go to HoiNing for the code that handles Year, and to the Prof. for the factor function that puts the weekdays in order.

Figure 4: Crime Trend by Year

Figure 5: Crime Count by Month

Figure 6: Crime Count by Hour against Crime Type
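
To count crimes by hour, the hour first has to be pulled out of the time field. The sketch below assumes CrimeTime is stored as "HH:MM:SS" text and that Description holds the crime type; both are assumptions about the data layout, and the rows are made up.

```r
crimedf <- data.frame(
  CrimeTime   = c("23:30:00", "23:05:00", "00:15:00", "14:05:00"),
  Description = c("LARCENY", "ROBBERY", "LARCENY", "LARCENY"),
  stringsAsFactors = FALSE
)

# The first two characters are the hour (0-23)
hour <- as.integer(substr(crimedf$CrimeTime, 1, 2))

# Fix the levels to 0..23 so hours with no crime still appear as zero
crimedf$Hour <- factor(hour, levels = 0:23)

# Rows are hours, columns are crime types
hour_by_type <- table(crimedf$Hour, crimedf$Description)
```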

Figure 7: Crime Count by Ordered Weekdays
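
By default R sorts weekday names alphabetically, which puts Friday first on the axis. One way to get calendar order is to turn the column into a factor with explicit levels. The sketch below uses made-up values and a Monday-first week, which is an assumption about the ordering used in Figure 7.

```r
weekday <- c("Friday", "Monday", "Friday", "Sunday", "Wednesday")

# Explicit levels define the display order
week_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday")
weekday_f <- factor(weekday, levels = week_levels)

# table() follows the level order, so counts come out Monday to Sunday
weekday_counts <- table(weekday_f)
```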

Step 4: Advanced Map plot by ggmap & maps package

Figure 8: Dot Map of Crime in Baltimore

Figure 9: Dot Toner Map of Crime in Baltimore

Figure 10: Density of Crime in District
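
A density map smooths the crime locations into a surface. As a much cruder stand-in that needs no map tiles, the sketch below bins made-up coordinates into a grid and counts crimes per cell. The grid size and coordinate values are hypothetical, and this is not the ggmap code behind Figure 10.

```r
# Made-up coordinates in the rough area of Baltimore
mapdf <- data.frame(
  Longitude = c(-76.61, -76.61, -76.62, -76.55, -76.70),
  Latitude  = c(39.29, 39.29, 39.30, 39.35, 39.25)
)

# Cut each axis into 3 equal-width bins over the observed range
lon_bin <- cut(mapdf$Longitude, breaks = 3)
lat_bin <- cut(mapdf$Latitude, breaks = 3)

# Crimes per grid cell; the largest count marks the densest cell
grid_counts <- table(lon_bin, lat_bin)
densest <- which(grid_counts == max(grid_counts), arr.ind = TRUE)
```

A finer grid shows more local detail but leaves more empty cells, so with a 2000-row sample a fairly coarse grid is probably the safer starting point.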

Summary:

From the charts of the Baltimore crime data, we can see that crime happened most frequently in the following categories:

Hours: Around midnight, 11:00 PM-1:00 AM

Weekday: Friday

Month: May

District: Northeastern

Description: Larceny

Also, the maps help to identify the safer areas to live in Baltimore.