Get San Francisco crime data into R

Read the crime data from the given input file.

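crimesSF <- read.csv(file = "sanfrancisco_incidents_summer_2014.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)  # stringsAsFactors = TRUE keeps text columns as factors (the default changed in R 4.0.0)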

Here’s the structure, summary, and first few records of the crimesSF data frame.

str(crimesSF)
## 'data.frame':    28993 obs. of  13 variables:
##  $ IncidntNum: int  140734311 140736317 146177923 146177531 140734220 140734349 140734349 140734349 140738147 140734258 ...
##  $ Category  : Factor w/ 34 levels "ARSON","ASSAULT",..: 1 20 16 16 20 7 7 6 21 30 ...
##  $ Descript  : Factor w/ 368 levels "ABANDONMENT OF CHILD",..: 15 179 143 143 132 247 239 93 107 347 ...
##  $ DayOfWeek : Factor w/ 7 levels "Friday","Monday",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ Date      : Factor w/ 92 levels "06/01/2014","06/02/2014",..: 92 92 92 92 92 92 92 92 92 92 ...
##  $ Time      : Factor w/ 1379 levels "00:01","00:02",..: 1370 1365 1351 1351 1344 1334 1334 1334 1321 1321 ...
##  $ PdDistrict: Factor w/ 10 levels "BAYVIEW","CENTRAL",..: 1 4 8 7 7 8 8 8 3 2 ...
##  $ Resolution: Factor w/ 16 levels "ARREST, BOOKED",..: 12 12 12 12 12 1 1 1 12 2 ...
##  $ Address   : Factor w/ 8055 levels "0 Block of 10TH ST",..: 6843 4022 1098 6111 5096 1263 1263 1263 1575 5236 ...
##  $ X         : num  -122 -122 -122 -122 -123 ...
##  $ Y         : num  37.7 37.8 37.8 37.8 37.8 ...
##  $ Location  : Factor w/ 8732 levels "(37.7080829769301, -122.419241455854)",..: 1970 3730 5834 4802 4777 4993 4993 4993 2543 7598 ...
##  $ PdId      : num  1.41e+13 1.41e+13 1.46e+13 1.46e+13 1.41e+13 ...
summary(crimesSF)
##    IncidntNum                  Category   
##  Min.   : 10284385   LARCENY/THEFT :9466  
##  1st Qu.:140545607   OTHER OFFENSES:3567  
##  Median :140632022   NON-CRIMINAL  :3023  
##  Mean   :142017280   ASSAULT       :2882  
##  3rd Qu.:140719664   VEHICLE THEFT :1966  
##  Max.   :990367398   WARRANTS      :1782  
##                      (Other)       :6307  
##                          Descript         DayOfWeek            Date      
##  GRAND THEFT FROM LOCKED AUTO: 3766   Friday   :4451   06/28/2014:  410  
##  STOLEN AUTOMOBILE           : 1350   Monday   :4005   08/09/2014:  410  
##  LOST PROPERTY               : 1202   Saturday :4319   08/08/2014:  403  
##  PETTY THEFT OF PROPERTY     : 1125   Sunday   :4218   06/29/2014:  397  
##  WARRANT ARREST              :  980   Thursday :3968   08/29/2014:  388  
##  PETTY THEFT FROM LOCKED AUTO:  955   Tuesday  :3930   06/04/2014:  380  
##  (Other)                     :19615   Wednesday:4102   (Other)   :26605  
##       Time           PdDistrict             Resolution   
##  12:00  :  784   SOUTHERN :5739   NONE           :19139  
##  00:01  :  661   MISSION  :3700   ARREST, BOOKED : 6502  
##  18:00  :  649   NORTHERN :3589   ARREST, CITED  : 1419  
##  19:00  :  621   CENTRAL  :3513   LOCATED        : 1042  
##  17:00  :  594   BAYVIEW  :2725   UNFOUNDED      :  260  
##  20:00  :  586   INGLESIDE:2378   JUVENILE BOOKED:  163  
##  (Other):25098   (Other)  :7349   (Other)        :  468  
##                      Address            X                Y        
##  800 Block of BRYANT ST  :  948   Min.   :-122.5   Min.   :37.71  
##  800 Block of MARKET ST  :  288   1st Qu.:-122.4   1st Qu.:37.76  
##  900 Block of POTRERO AV :  230   Median :-122.4   Median :37.78  
##  1000 Block of POTRERO AV:  199   Mean   :-122.4   Mean   :37.77  
##  2000 Block of MISSION ST:  149   3rd Qu.:-122.4   3rd Qu.:37.79  
##  16TH ST / MISSION ST    :  116   Max.   :-122.4   Max.   :37.82  
##  (Other)                 :27063                                   
##                                   Location          PdId          
##  (37.775420706711, -122.403404791479) :  940   Min.   :1.028e+12  
##  (37.7571580431915, -122.406604919508):  224   1st Qu.:1.405e+13  
##  (37.7564864109309, -122.406539115148):  196   Median :1.406e+13  
##  (37.7650501214668, -122.419671780296):  152   Mean   :1.420e+13  
##  (37.7841893501425, -122.407633520742):  150   3rd Qu.:1.407e+13  
##  (37.7285280627465, -122.475647460786):  102   Max.   :9.904e+13  
##  (Other)                              :27229
head(crimesSF)
##   IncidntNum      Category                     Descript DayOfWeek
## 1  140734311         ARSON           ARSON OF A VEHICLE    Sunday
## 2  140736317  NON-CRIMINAL                LOST PROPERTY    Sunday
## 3  146177923 LARCENY/THEFT GRAND THEFT FROM LOCKED AUTO    Sunday
## 4  146177531 LARCENY/THEFT GRAND THEFT FROM LOCKED AUTO    Sunday
## 5  140734220  NON-CRIMINAL               FOUND PROPERTY    Sunday
## 6  140734349 DRUG/NARCOTIC      POSSESSION OF MARIJUANA    Sunday
##         Date  Time PdDistrict     Resolution                   Address
## 1 08/31/2014 23:50    BAYVIEW           NONE LOOMIS ST / INDUSTRIAL ST
## 2 08/31/2014 23:45    MISSION           NONE    400 Block of CASTRO ST
## 3 08/31/2014 23:30   SOUTHERN           NONE  1000 Block of MISSION ST
## 4 08/31/2014 23:30   RICHMOND           NONE       FULTON ST / 26TH AV
## 5 08/31/2014 23:23   RICHMOND           NONE  800 Block of LA PLAYA ST
## 6 08/31/2014 23:13   SOUTHERN ARREST, BOOKED        11TH ST / MINNA ST
##           X        Y                              Location         PdId
## 1 -122.4056 37.73832 (37.7383221869053, -122.405646994567) 1.407343e+13
## 2 -122.4350 37.76177 (37.7617677182954, -122.435012093789) 1.407363e+13
## 3 -122.4098 37.78004 (37.7800356268394, -122.409795194505) 1.461779e+13
## 4 -122.4853 37.77252 (37.7725176473142, -122.485262988324) 1.461775e+13
## 5 -122.5099 37.77231 (37.7723131976814, -122.509895418239) 1.407342e+13
## 6 -122.4166 37.77391  (37.773907074489, -122.416578493475) 1.407343e+13

Next, prepare the data for analysis by adding two categorical variables (factors in R) to the data set: Hour, the hour of the day at which the crime occurred, and Theft, which flags whether the crime was some form of theft. In this analysis, a crime counts as a theft if its category is burglary, extortion, kidnapping, larceny/theft, robbery, or vehicle theft.

# Hour of day: the first two characters of the "HH:MM" time string
crimesSF$Hour <- factor(substr(crimesSF$Time, 1, 2))
# Categories treated as theft in this analysis
theftCategories <- c("BURGLARY", "EXTORTION", "KIDNAPPING", "LARCENY/THEFT", "ROBBERY", "VEHICLE THEFT")
crimesSF$Theft <- factor(ifelse(crimesSF$Category %in% theftCategories, "Yes", "No"))
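
As a quick sanity check on the derived Theft factor, a sketch like the following lists each crime category next to the Theft label it received, so the mapping described above can be verified by eye. The helper name check_theft_mapping is hypothetical (it is not part of the original analysis), and it assumes the data frame already has the Category and Theft columns.

```r
# Hypothetical helper: show which Theft label(s) each crime category received.
check_theft_mapping <- function(df) {
  # Rows are categories, columns are the Theft levels ("No"/"Yes").
  tab <- table(df$Category, df$Theft)
  # For each category, collect the Theft levels that have at least one incident.
  labels <- apply(tab, 1, function(counts) {
    paste(colnames(tab)[counts > 0], collapse = "/")
  })
  data.frame(
    Category = rownames(tab),
    Label    = unname(labels),
    stringsAsFactors = FALSE
  )
}

# Usage with the data frame built above:
# check_theft_mapping(crimesSF)
```

Every category should show exactly one label; a value such as "No/Yes" would indicate an inconsistent mapping.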

Load the ggplot2 package for visualization.

library(ggplot2)

What are the different types of crimes reported in the database? What are their frequencies?

ggplot(crimesSF, aes(x=Category, fill=Theft)) + geom_bar() + labs(x = "Different types of crimes reported", y = "Count of crimes", title = "Frequencies of the different types of crimes reported in San Francisco") + coord_flip()

Larceny/theft is by far the most frequently reported category, accounting for 9,466 of the 28,993 incidents in the summer 2014 data. Taken together, however, non-theft crimes seem to occur slightly more often than thefts in this city.
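
The plot gives only a visual impression of the theft versus non-theft split, so the totals can also be tabulated directly. The sketch below is one way to do that in base R; theft_share is a hypothetical helper name, and it assumes the Theft factor created earlier.

```r
# Hypothetical helper: counts and percentage shares of theft vs. non-theft incidents.
theft_share <- function(df) {
  counts <- table(df$Theft)
  data.frame(
    Theft   = names(counts),
    Count   = as.vector(counts),
    Percent = round(100 * as.vector(counts) / sum(counts), 1),
    stringsAsFactors = FALSE
  )
}

# Usage with the data frame built above:
# theft_share(crimesSF)
# Category frequencies, most common first:
# sort(table(crimesSF$Category), decreasing = TRUE)
```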

Where in San Francisco do the criminals strike most often? What kind of crimes occur in various districts of the city?

ggplot(crimesSF, aes(x=PdDistrict, fill=Theft)) + geom_bar(position="dodge") + labs(x = "PD Districts in San Francisco", y = "Count of crimes", title = "Reported crimes by police department districts in San Francisco") + theme(axis.text.x=element_text(angle=50, vjust=0.5))

The Southern district seems to be the worst affected region in San Francisco, with 5,739 reported incidents, while the Richmond and Park districts seem to be the least affected. Thefts are less prevalent than other crimes in the Bayview, Mission, and Tenderloin districts.
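
To put numbers behind the district comparison, the following sketch computes the incident total for each district and the percentage of those incidents flagged as theft. The helper name district_summary is hypothetical, and the code assumes the PdDistrict column and the "Yes"/"No" Theft factor used in this analysis.

```r
# Hypothetical helper: incidents per district and the share flagged as theft.
district_summary <- function(df) {
  # Number of incidents in each district
  total <- tapply(df$Theft, df$PdDistrict, length)
  # Mean of a logical vector is the proportion of TRUE values
  theft_pct <- tapply(df$Theft == "Yes", df$PdDistrict, mean) * 100
  out <- data.frame(
    PdDistrict   = names(total),
    Total        = as.vector(total),
    TheftPercent = round(as.vector(theft_pct), 1),
    stringsAsFactors = FALSE
  )
  # Busiest districts first
  out[order(-out$Total), ]
}

# Usage with the data frame built above:
# district_summary(crimesSF)
```

Districts with a TheftPercent below 50 are those where non-theft crimes outnumber thefts.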

At what hours are the criminals most active?

ggplot(crimesSF, aes(x=Hour, fill=Theft)) + geom_bar(position="dodge") + labs(x = "Hours of a Day", y = "Count of Crimes in San Francisco", title = "Reported crimes in San Francisco by hours of a day")

The plot shows that criminal activity increases after sunrise and declines as night falls. Both theft and non-theft crimes seem to follow this pattern. Thefts peak in the early evening, possibly because traffic is typically congested at that time and the authorities take longer to reach the affected locations.
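
The peak hours read off the plot can be confirmed with a small tabulation. This is a minimal sketch in base R; peak_hour is a hypothetical helper name, and it assumes the Hour and Theft factors created earlier.

```r
# Hypothetical helper: hour with the most incidents, separately for each Theft level.
peak_hour <- function(df) {
  # Rows are hours ("00" to "23"), columns are the Theft levels.
  tab <- table(df$Hour, df$Theft)
  # For each column, return the hour with the largest count (the first one on ties).
  apply(tab, 2, function(counts) rownames(tab)[which.max(counts)])
}

# Usage with the data frame built above:
# peak_hour(crimesSF)
# Full hour-by-theft table, for reference:
# table(crimesSF$Hour, crimesSF$Theft)
```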

Criminal activities by time of the day and district in San Francisco

ggplot(crimesSF, aes(x=Hour, fill=Theft)) + geom_bar(position="dodge") + facet_grid(PdDistrict ~ .) + theme(strip.text.y=element_text(angle=0)) + labs(x = "Hours of a Day", y = "Count of Crimes in San Francisco by District", title = "Reported crimes in San Francisco by hours of a day")

Across districts and hours of the day, the distribution of criminal activity is consistent with the patterns seen above when these variables were considered separately. The Southern district seems to be rife with crime, particularly in the early evening hours, in comparison to other places and times of the day. Travellers are best advised to avoid that part of the city!
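
The faceted plot is dense, so it can help to extract the busiest hour of each district as a table. The sketch below does this in base R; peak_hour_by_district is a hypothetical helper name, and it assumes the PdDistrict column and the Hour factor used above.

```r
# Hypothetical helper: busiest hour in each district, with its incident count.
peak_hour_by_district <- function(df) {
  # Rows are districts, columns are hours.
  tab <- table(df$PdDistrict, df$Hour)
  data.frame(
    PdDistrict = rownames(tab),
    # Column index of the largest count in each row (the first one on ties)
    PeakHour   = colnames(tab)[apply(tab, 1, which.max)],
    Incidents  = as.vector(apply(tab, 1, max)),
    stringsAsFactors = FALSE
  )
}

# Usage with the data frame built above:
# peak_hour_by_district(crimesSF)
```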