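Read the crime data from the given input file. Text columns are read as factors, which is what the structure below shows; since R 4.0.0 this has to be requested explicitly, because stringsAsFactors now defaults to FALSE.
crimesSF <- read.csv(file = "sanfrancisco_incidents_summer_2014.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)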
Here’s the structure, summary, and first few records of the crimesSF data frame.
str(crimesSF)
## 'data.frame': 28993 obs. of 13 variables:
## $ IncidntNum: int 140734311 140736317 146177923 146177531 140734220 140734349 140734349 140734349 140738147 140734258 ...
## $ Category : Factor w/ 34 levels "ARSON","ASSAULT",..: 1 20 16 16 20 7 7 6 21 30 ...
## $ Descript : Factor w/ 368 levels "ABANDONMENT OF CHILD",..: 15 179 143 143 132 247 239 93 107 347 ...
## $ DayOfWeek : Factor w/ 7 levels "Friday","Monday",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ Date : Factor w/ 92 levels "06/01/2014","06/02/2014",..: 92 92 92 92 92 92 92 92 92 92 ...
## $ Time : Factor w/ 1379 levels "00:01","00:02",..: 1370 1365 1351 1351 1344 1334 1334 1334 1321 1321 ...
## $ PdDistrict: Factor w/ 10 levels "BAYVIEW","CENTRAL",..: 1 4 8 7 7 8 8 8 3 2 ...
## $ Resolution: Factor w/ 16 levels "ARREST, BOOKED",..: 12 12 12 12 12 1 1 1 12 2 ...
## $ Address : Factor w/ 8055 levels "0 Block of 10TH ST",..: 6843 4022 1098 6111 5096 1263 1263 1263 1575 5236 ...
## $ X : num -122 -122 -122 -122 -123 ...
## $ Y : num 37.7 37.8 37.8 37.8 37.8 ...
## $ Location : Factor w/ 8732 levels "(37.7080829769301, -122.419241455854)",..: 1970 3730 5834 4802 4777 4993 4993 4993 2543 7598 ...
## $ PdId : num 1.41e+13 1.41e+13 1.46e+13 1.46e+13 1.41e+13 ...
summary(crimesSF)
##    IncidntNum                  Category
##  Min.   : 10284385   LARCENY/THEFT :9466
##  1st Qu.:140545607   OTHER OFFENSES:3567
##  Median :140632022   NON-CRIMINAL  :3023
##  Mean   :142017280   ASSAULT       :2882
##  3rd Qu.:140719664   VEHICLE THEFT :1966
##  Max.   :990367398   WARRANTS      :1782
##                      (Other)       :6307
##                          Descript         DayOfWeek            Date
##  GRAND THEFT FROM LOCKED AUTO: 3766   Friday   :4451   06/28/2014:  410
##  STOLEN AUTOMOBILE           : 1350   Monday   :4005   08/09/2014:  410
##  LOST PROPERTY               : 1202   Saturday :4319   08/08/2014:  403
##  PETTY THEFT OF PROPERTY     : 1125   Sunday   :4218   06/29/2014:  397
##  WARRANT ARREST              :  980   Thursday :3968   08/29/2014:  388
##  PETTY THEFT FROM LOCKED AUTO:  955   Tuesday  :3930   06/04/2014:  380
##  (Other)                     :19615   Wednesday:4102   (Other)   :26605
##       Time           PdDistrict             Resolution
##  12:00  :  784   SOUTHERN :5739   NONE           :19139
##  00:01  :  661   MISSION  :3700   ARREST, BOOKED : 6502
##  18:00  :  649   NORTHERN :3589   ARREST, CITED  : 1419
##  19:00  :  621   CENTRAL  :3513   LOCATED        : 1042
##  17:00  :  594   BAYVIEW  :2725   UNFOUNDED      :  260
##  20:00  :  586   INGLESIDE:2378   JUVENILE BOOKED:  163
##  (Other):25098   (Other)  :7349   (Other)        :  468
##                      Address            X                Y
##  800 Block of BRYANT ST  :  948   Min.   :-122.5   Min.   :37.71
##  800 Block of MARKET ST  :  288   1st Qu.:-122.4   1st Qu.:37.76
##  900 Block of POTRERO AV :  230   Median :-122.4   Median :37.78
##  1000 Block of POTRERO AV:  199   Mean   :-122.4   Mean   :37.77
##  2000 Block of MISSION ST:  149   3rd Qu.:-122.4   3rd Qu.:37.79
##  16TH ST / MISSION ST    :  116   Max.   :-122.4   Max.   :37.82
##  (Other)                 :27063
##                                   Location          PdId
##  (37.775420706711, -122.403404791479) :  940   Min.   :1.028e+12
##  (37.7571580431915, -122.406604919508):  224   1st Qu.:1.405e+13
##  (37.7564864109309, -122.406539115148):  196   Median :1.406e+13
##  (37.7650501214668, -122.419671780296):  152   Mean   :1.420e+13
##  (37.7841893501425, -122.407633520742):  150   3rd Qu.:1.407e+13
##  (37.7285280627465, -122.475647460786):  102   Max.   :9.904e+13
##  (Other)                              :27229
head(crimesSF)
##   IncidntNum      Category                     Descript DayOfWeek
## 1  140734311         ARSON           ARSON OF A VEHICLE    Sunday
## 2  140736317  NON-CRIMINAL                LOST PROPERTY    Sunday
## 3  146177923 LARCENY/THEFT GRAND THEFT FROM LOCKED AUTO    Sunday
## 4  146177531 LARCENY/THEFT GRAND THEFT FROM LOCKED AUTO    Sunday
## 5  140734220  NON-CRIMINAL               FOUND PROPERTY    Sunday
## 6  140734349 DRUG/NARCOTIC      POSSESSION OF MARIJUANA    Sunday
##         Date  Time PdDistrict     Resolution                   Address
## 1 08/31/2014 23:50    BAYVIEW           NONE LOOMIS ST / INDUSTRIAL ST
## 2 08/31/2014 23:45    MISSION           NONE    400 Block of CASTRO ST
## 3 08/31/2014 23:30   SOUTHERN           NONE  1000 Block of MISSION ST
## 4 08/31/2014 23:30   RICHMOND           NONE       FULTON ST / 26TH AV
## 5 08/31/2014 23:23   RICHMOND           NONE  800 Block of LA PLAYA ST
## 6 08/31/2014 23:13   SOUTHERN ARREST, BOOKED        11TH ST / MINNA ST
##           X        Y                              Location         PdId
## 1 -122.4056 37.73832 (37.7383221869053, -122.405646994567) 1.407343e+13
## 2 -122.4350 37.76177 (37.7617677182954, -122.435012093789) 1.407363e+13
## 3 -122.4098 37.78004 (37.7800356268394, -122.409795194505) 1.461779e+13
## 4 -122.4853 37.77252 (37.7725176473142, -122.485262988324) 1.461775e+13
## 5 -122.5099 37.77231 (37.7723131976814, -122.509895418239) 1.407342e+13
## 6 -122.4166 37.77391  (37.773907074489, -122.416578493475) 1.407343e+13
Perform some data management to facilitate the crime analysis. Add two new categorical variables (factors in R) to the data set: one holding the hour at which the crime occurred, and one indicating whether the crime was some form of theft. In this analysis, a crime counts as a theft if its category is one of burglary, extortion, kidnapping, larceny/theft, robbery, or vehicle theft.
crimesSF$Hour <- factor(substr(crimesSF$Time, 1, 2))
crimesSF$Theft <- factor(with(crimesSF, ifelse(Category=="BURGLARY" | Category=="EXTORTION" | Category=="KIDNAPPING" | Category=="LARCENY/THEFT" | Category=="ROBBERY" | Category=="VEHICLE THEFT","Yes","No")))
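The derivation of the two new variables can be checked on a small hand-made data frame before trusting it on the full data set. The following is a minimal sketch: the data frame toy and its rows are invented for illustration and are not taken from the incident file, and theftCategories is a hypothetical helper name. It uses %in% as a more compact alternative to chaining == comparisons with |.

```r
# Toy rows invented for illustration; they are not from the incident file.
toy <- data.frame(
  Category = c("LARCENY/THEFT", "ASSAULT", "VEHICLE THEFT", "WARRANTS"),
  Time     = c("23:50", "09:15", "18:00", "00:01"),
  stringsAsFactors = TRUE
)

# Categories treated as theft in this analysis (hypothetical helper name).
theftCategories <- c("BURGLARY", "EXTORTION", "KIDNAPPING",
                     "LARCENY/THEFT", "ROBBERY", "VEHICLE THEFT")

# Hour is the first two characters of the "HH:MM" time string.
toy$Hour <- factor(substr(toy$Time, 1, 2))

# Theft is "Yes" when the category is in the list above, "No" otherwise.
toy$Theft <- factor(ifelse(toy$Category %in% theftCategories, "Yes", "No"))

print(toy)
```

Keeping the theft categories in one vector makes the definition of a theft easy to review and change in a single place.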
library(ggplot2)
ggplot(crimesSF, aes(x=Category, fill=Theft)) + geom_bar() + labs(x = "Different types of crimes reported", y = "Count of crimes", title = "Frequencies of the different types of crimes reported in San Francisco") + coord_flip()
Larceny/theft is by far the most frequently reported category of crime in San Francisco. Taken together, however, the non-theft categories seem to account for slightly more incidents than the theft categories.
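A comparison of theft and non-theft totals is easier to judge from numbers than from stacked bars. The sketch below shows one way to get counts and shares with table and prop.table. The vector theft is made up for illustration; with the real data, crimesSF$Theft would take its place.

```r
# Toy theft flags invented for illustration; use crimesSF$Theft on the real data.
theft <- factor(c("Yes", "No", "No", "Yes", "No"))

# Absolute number of incidents per level.
theftCounts <- table(theft)

# Share of each level: counts divided by the total number of incidents.
theftShares <- prop.table(theftCounts)

print(theftCounts)
print(round(theftShares, 2))
```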
ggplot(crimesSF, aes(x=PdDistrict, fill=Theft)) + geom_bar(position="dodge") + labs(x = "PD Districts in San Francisco", y = "Count of crimes", title = "Reported crimes by police department districts in San Francisco") + theme(axis.text.x=element_text(angle=50, vjust=0.5))
The Southern district seems to be the worst-affected area of San Francisco, while the Richmond and Park districts seem to be the least affected. In the Bayview, Mission, and Tenderloin districts, thefts are less prevalent than other crimes.
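The district comparison can also be tabulated directly. The sketch below ranks districts by total incidents and computes the share of thefts within each district. The data frame toy is invented for illustration and does not reflect the real counts; the real analysis would use crimesSF$PdDistrict and crimesSF$Theft instead.

```r
# Toy rows invented for illustration; they do not reflect the real counts.
toy <- data.frame(
  PdDistrict = c("SOUTHERN", "SOUTHERN", "SOUTHERN", "PARK", "MISSION", "MISSION"),
  Theft      = c("Yes", "Yes", "No", "Yes", "No", "No")
)

# Cross-tabulate district (rows) against the theft flag (columns).
districtByTheft <- table(toy$PdDistrict, toy$Theft)

# Total incidents per district, largest first.
totals <- sort(rowSums(districtByTheft), decreasing = TRUE)

# Share of thefts within each district (each row of the table sums to 1).
theftShare <- prop.table(districtByTheft, margin = 1)[, "Yes"]

print(totals)
print(round(theftShare, 2))
```

A within-district share below 0.5 corresponds to a district where non-theft crimes outnumber thefts.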
ggplot(crimesSF, aes(x=Hour, fill=Theft)) + geom_bar(position="dodge") + labs(x = "Hours of a Day", y = "Count of Crimes in San Francisco", title = "Reported crimes in San Francisco by hours of a day")
The plot shows that criminal activity increases as the sun comes up and decreases as night falls. Both theft and non-theft crimes seem to follow this pattern. Thefts peak in the early evening, possibly because traffic is typically congested at that time and it takes the authorities longer to reach the affected residences.
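The peak hour can be read off a frequency table as well as from the plot. The sketch below uses a made-up vector of times and derives the hour as above. Fixing the factor levels to all 24 hours is an assumption added here so that hours without incidents appear with a count of zero.

```r
# Toy times invented for illustration; use crimesSF$Time on the real data.
times <- c("18:05", "18:40", "03:10", "12:00", "18:59", "12:30")

# Hour as a factor with all 24 levels, so empty hours are counted as 0.
hour <- factor(substr(times, 1, 2), levels = sprintf("%02d", 0:23))

# Number of incidents per hour of the day.
hourCounts <- table(hour)

# Label of the hour with the highest count (first one in case of ties).
peakHour <- names(hourCounts)[which.max(hourCounts)]

print(hourCounts)
print(peakHour)
```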
ggplot(crimesSF, aes(x=Hour, fill=Theft)) + geom_bar(position="dodge") + facet_grid(PdDistrict ~ .) + theme(strip.text.y=element_text(angle=0)) + labs(x = "Hours of a Day", y = "Count of Crimes in San Francisco by District", title = "Reported crimes in San Francisco by hours of a day")
The distribution of criminal activity across districts and hours of the day is consistent with the patterns seen above, when these variables were considered separately. The Southern district seems to be rife with crime, particularly in the early evening hours, in comparison to other places and times of the day. Travellers would be well advised to avoid that part of the city!
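The busiest combination of district and hour can be located by cross-tabulating the two variables. The sketch below does this on an invented data frame toy; the column names Count and the object busiest are hypothetical names chosen for this example, and the real analysis would pass crimesSF$PdDistrict and crimesSF$Hour to table instead.

```r
# Toy rows invented for illustration; they do not reflect the real counts.
toy <- data.frame(
  PdDistrict = c("SOUTHERN", "SOUTHERN", "SOUTHERN", "PARK", "MISSION"),
  Hour       = c("18", "18", "12", "18", "09")
)

# Count incidents for every district/hour pair and flatten to long format.
counts <- as.data.frame(table(toy$PdDistrict, toy$Hour), stringsAsFactors = FALSE)
names(counts) <- c("PdDistrict", "Hour", "Count")

# Row with the highest count (first one in case of ties).
busiest <- counts[which.max(counts$Count), ]

print(busiest)
```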