Required packages:

Make sure these packages are installed in R before running any code. Most of them are on CRAN and regularly updated, so running install.packages("PACKAGE_NAME") should install anything that’s missing.

##   ggplot2     dplyr lubridate  reshape2  magrittr   stringi gridExtra 
##      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE 
##     tidyr     ggmap  ggthemes    scales  RSocrata animation 
##      TRUE      TRUE      TRUE      TRUE      TRUE      TRUE
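
The chunk that produced the TRUE/FALSE table above is not shown. A minimal sketch of one way to build a similar check is given here, assuming base R's requireNamespace() is an acceptable test for "installed"; the variable names pkgs, installed and missing_pkgs are illustrative and not from the original.

```r
# Packages listed in the table above
pkgs <- c("ggplot2", "dplyr", "lubridate", "reshape2", "magrittr",
          "stringi", "gridExtra", "tidyr", "ggmap", "ggthemes",
          "scales", "RSocrata", "animation")

# TRUE if the package can be loaded, FALSE otherwise (nothing is attached)
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Report anything missing, along with the command that would install it
missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  message('Try: install.packages(c("',
          paste(missing_pkgs, collapse = '", "'), '"))')
}
```

The sketch only reports missing packages rather than installing them, so running it never changes your library.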



Get Data from City of Austin APIs

Using https://data.austintexas.gov and the Socrata API


APD INCIDENT EXTRACT YTD

APD.file_path   <- "Data/APD_YTD.csv"
APD.socrata_URL <- "https://data.austintexas.gov/resource/b4y9-5x39.csv"

# Import data
APD <- RSocrata::read.socrata(APD.socrata_URL)

# Write as a .csv
write.csv(APD, file = APD.file_path)

Load the dataset covering the last 18 months of APD incidents. See the description on the APD Incidents webpage for more details.

The live-updated Socrata path is https://data.austintexas.gov/resource/b4y9-5x39.csv, and the cached comma-separated values (CSV) file is saved at Data/APD_YTD.csv.
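
Since there is both a live path and a cached file, one option is to read the cache when it exists and only hit the API otherwise. The following is a rough sketch of that idea, not the code used for this report: load_cached() and its fetch argument are hypothetical names, and unlike the chunk above it writes the CSV without row names.

```r
# Hypothetical helper: read the cached CSV if present, otherwise download
# the data, save it for next time, and return it.
load_cached <- function(file_path, url, fetch = RSocrata::read.socrata) {
  if (file.exists(file_path)) {
    return(read.csv(file_path, stringsAsFactors = FALSE))
  }
  data <- fetch(url)
  # Create the cache directory (e.g. "Data") if it is not there yet
  dir.create(dirname(file_path), showWarnings = FALSE, recursive = TRUE)
  write.csv(data, file = file_path, row.names = FALSE)
  data
}

# Example call (not run here, since it needs network access on first use):
# APD <- load_cached("Data/APD_YTD.csv",
#                    "https://data.austintexas.gov/resource/b4y9-5x39.csv")
```

Keep in mind that a cached file is a snapshot: delete it whenever you want to pull the current data again.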


From their website
* Due to the methodological differences in data collection, different data sources may produce different results.
* Our on-line database is continuously being updated. The data provided here represents a particular point in time.
* Updates to the police report database occur daily. Information is available from today’s date back 18 months.
* Due to several factors (once-a-day updates, offense reclassification, reported versus occurred dates, etc.) comparisons should not be made between numbers generated with this database to any other official police reports.
* Data provided represents only calls for police service where a report was written.

The Austin Police Department does not assume any liability for any decision made or action taken or not taken by the recipient in reliance upon any information or data provided.


MUNICIPAL COURT CASELOAD INFORMATION

Court.file_path   <- "Data/CourtData.csv"
Court.socrata_URL <- "https://data.austintexas.gov/resource/8jyt-x94k.csv"

# Import data
Court <- RSocrata::read.socrata(Court.socrata_URL)

# Write as a .csv
write.csv(Court, file = Court.file_path)

This data is provided to help with analysis of various violations charged throughout the City of Austin. See the Court Cases webpage for more details.

The live-updated Socrata path is https://data.austintexas.gov/resource/8jyt-x94k.csv, and the cached comma-separated values (CSV) file is saved at Data/CourtData.csv.



What Crimes Are Committed Most Often?

## Totals by crime (regardless of date)
By.Crime <- d %>% group_by(Crime.Type) %>% 
  
  # Collapse data.frame by summing Count within each crime type
  summarise(total = sum(Count)) %>% 
  
  # Add columns for cumulative distribution and ranking (rank 1 = most frequent)
  mutate(cume_dist = cume_dist(total),
         rank      = dense_rank(total), 
         rank      = max(rank) + 1 - rank) %>% arrange(rank)

To kick things off, let’s take a look at the frequency of each Crime.Type in the APD Incident Report. To do this, we’ll aggregate each Crime.Type into a data.frame using the summarise() function in dplyr.

This collapses our data.frame by an aggregate function, yielding a new data.frame with two columns (the type of crime and the total number of observations of that crime) and 129777 rows. For convenience, I added two additional columns to help me visualize and select some basic attributes by rank.
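
To make the aggregation and the two helper columns concrete, here is a small worked example on made-up data. The toy data.frame and its values are invented for illustration, and dense_rank(desc(total)) is used as a shorthand that should give the same ranking as the two-step rank calculation in the chunk above.

```r
library(dplyr)

# Made-up incidents: several rows per crime type, each with a Count
toy <- data.frame(
  Crime.Type = c("THEFT", "THEFT", "DWI", "HARASSMENT", "DWI", "THEFT"),
  Count      = c(3, 2, 4, 1, 1, 5),
  stringsAsFactors = FALSE
)

toy_by_crime <- toy %>%
  group_by(Crime.Type) %>%
  # One row per crime type, with the summed Count
  summarise(total = sum(Count)) %>%
  # cume_dist: share of crime types with a total at or below this one
  # rank: 1 for the largest total, with no gaps between tied ranks
  mutate(cume_dist = cume_dist(total),
         rank      = dense_rank(desc(total))) %>%
  arrange(rank)

print(toy_by_crime)
```

In this toy case THEFT totals 10, DWI 5 and HARASSMENT 1, so THEFT gets rank 1 and a cume_dist of 1, which matches how the top row of the real table reads.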

Distribution of Rank by Frequency

Now let’s plot the ranking of each Crime.Type versus the number of occurrences so far this year.

Figure 01a

Looks like it’s log distributed. I’ll apply two transformations to the plot I just created to yield:

Figure 01b & Figure 01c

That’s much better! Looks like there are quite a few crimes that occur with great frequency, so we’ll investigate those first.
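
The plotting code for Figures 01a to 01c is not shown, so the exact transformations are unknown. One plausible reading is a log scale on the count axis and then on the rank axis as well; the sketch below assumes that and uses invented data (toy_rank) in place of By.Crime.

```r
library(ggplot2)

# Invented rank/frequency data with a steep drop-off, standing in for By.Crime
toy_rank <- data.frame(rank = 1:50)
toy_rank$total <- round(300000 / toy_rank$rank^1.5)

# Untransformed plot: most points hug the x-axis
p_raw <- ggplot(toy_rank, aes(x = rank, y = total)) +
  geom_point()

# Assumed transformation 1: log10 scale on the counts
p_log_y <- p_raw + scale_y_log10()

# Assumed transformation 2: log10 scale on the rank as well
p_log_xy <- p_log_y + scale_x_log10()
```

Printing p_raw, p_log_y and p_log_xy in turn shows how each scale change spreads out the low-frequency crimes.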

Top 50 Crimes

| Crime.Type | total | cume_dist | rank |
|---|---|---|---|
| CRASH/LEAVING THE SCENE | 293874 | 1.0000000 | 1 |
| THEFT | 259057 | 0.9975728 | 2 |
| BURGLARY OF VEHICLE | 223076 | 0.9951456 | 3 |
| FAMILY DISTURBANCE | 217208 | 0.9927184 | 4 |
| CRIMINAL TRESPASS NOTICE | 87720 | 0.9902913 | 5 |
| CRIMINAL MISCHIEF | 84229 | 0.9878641 | 6 |
| THEFT BY SHOPLIFTING | 37916 | 0.9854369 | 7 |
| DWI | 36047 | 0.9830097 | 8 |
| LOST PROP | 35447 | 0.9805825 | 9 |
| DISTURBANCE - OTHER | 33626 | 0.9781553 | 10 |
| ASSAULT W/INJURY-FAM/DATE VIOL | 32052 | 0.9757282 | 11 |
| HARASSMENT | 24786 | 0.9733010 | 12 |
| BURGLARY OF RESIDENCE | 23622 | 0.9708738 | 13 |
| WARRANT ARREST NON TRAFFIC | 20215 | 0.9684466 | 14 |
| PUBLIC INTOXICATION | 20038 | 0.9660194 | 15 |
| REQUEST TO APPREHEND | 19464 | 0.9635922 | 16 |
| ASSAULT WITH INJURY | 17997 | 0.9611650 | 17 |
| ABANDONED VEH | 16024 | 0.9587379 | 18 |
| AUTO THEFT | 13840 | 0.9563107 | 19 |
| POSS MARIJUANA | 13798 | 0.9538835 | 20 |
| FOUND PROPERTY | 12201 | 0.9514563 | 21 |
| FRAUD - OTHER | 10454 | 0.9490291 | 22 |
| CUSTODY ARREST TRAFFIC WARR | 10288 | 0.9466019 | 23 |
| IDENTITY THEFT | 9865 | 0.9441748 | 24 |
| EMERGENCY PROTECTIVE ORDER | 9295 | 0.9417476 | 25 |
| ASSAULT BY CONTACT | 7334 | 0.9393204 | 26 |
| DRIVING WHILE LICENSE INVALID | 7111 | 0.9368932 | 27 |
| POSS CONTROLLED SUB/NARCOTIC | 7036 | 0.9344660 | 28 |
| THEFT OF BICYCLE | 6447 | 0.9320388 | 29 |
| BURGLARY NON RESIDENCE | 5494 | 0.9296117 | 30 |
| CRIMINAL TRESPASS | 4705 | 0.9271845 | 31 |
| VIOL CITY ORDINANCE - OTHER | 4546 | 0.9247573 | 32 |
| DEBIT CARD ABUSE | 3390 | 0.9223301 | 33 |
| ASSAULT BY CONTACT FAM/DATING | 3187 | 0.9199029 | 34 |
| POSS OF DRUG PARAPHERNALIA | 3069 | 0.9174757 | 35 |
| THEFT OF SERVICE | 3048 | 0.9150485 | 36 |
| DATING DISTURBANCE | 2790 | 0.9126214 | 37 |
| BICYCLE REGISTRATION | 2637 | 0.9101942 | 38 |
| ASSAULT BY THREAT | 2378 | 0.9077670 | 39 |
| ASSIST COMPLAINANT | 2272 | 0.9053398 | 40 |
| BURGLARY INFORMATION | 2050 | 0.9029126 | 41 |
| CRED CARD ABUSE - OTHER | 2010 | 0.9004854 | 42 |
| TERRORISTIC THREAT | 2007 | 0.8980583 | 43 |
| DWI .15 BAC OR ABOVE | 1780 | 0.8956311 | 44 |
| THEFT INFORMATION | 1701 | 0.8932039 | 45 |
| DWI 2ND | 1678 | 0.8907767 | 46 |
| IMPOUNDED VEH | 1440 | 0.8883495 | 47 |
| ASSAULT INFORMATION | 1423 | 0.8859223 | 48 |
| AGG ASLT STRANGLE/SUFFOCATE | 1331 | 0.8834951 | 49 |
| FOUND CONTROLLED SUBSTANCE | 1300 | 0.8810680 | 50 |



Incidents This Year

Now let’s look at the data set over time.

## Totals by date (regardless of crime)
By.Day <- d %>% group_by(Date) %>% 
  summarise(total = n()) %>% 
  filter(Date < as.Date("2015-11-01"))

Scatter Plot | Boring!

Figure 02

This is just plain black and white… So boring!

By Day of The Week

Scatter Plot | Colored by Day of Week

Figure 03a

Scatter Plot + Linear Model | Colored by Day of Week

Figure 03b

Line Plot Colored by Day of Week

Figure 03c

It looks like Friday might be a bad day for crime. Let’s look at each day’s linear model, but include a scatter plot so we can get a better estimate.

Scatter Plot + Colored Linear Models | Now we’re talking

Figure 03d

Looks like we might be on to something with our Friday hypothesis. We can make a violin plot (like a boxplot, but with the width representing density) to investigate further.

Violin Plot | By Day of Week

Figure 03e

Ah, that’s much easier to visualize.
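
The code behind Figures 03a to 03e is not included here. As a rough sketch of how the day-of-week plots could be built, the example below derives a weekday column with lubridate::wday() and then draws per-day linear fits and a violin plot. The data (toy_days) is simulated and the column name Weekday is an assumption, so the shapes will not match the real figures.

```r
library(ggplot2)
library(lubridate)

# Simulated daily totals standing in for By.Day
set.seed(1)
toy_days <- data.frame(
  Date = seq(as.Date("2015-01-01"), as.Date("2015-10-31"), by = "day")
)
toy_days$total <- rpois(nrow(toy_days), lambda = 300)

# Ordered factor with one level per day of the week
toy_days$Weekday <- wday(toy_days$Date, label = TRUE)

# Scatter plot with one linear model per weekday
p_lm <- ggplot(toy_days, aes(x = Date, y = total, color = Weekday)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Violin plot: distribution of daily totals for each weekday
p_violin <- ggplot(toy_days, aes(x = Weekday, y = total, fill = Weekday)) +
  geom_violin()
```

Because the simulated totals are drawn from the same distribution every day, any Friday effect in these toy plots is noise; the point is only the plotting pattern.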

This is still a work in progress. Everything below this chunk is most 
certainly incomplete and will (hopefully) be finished at a later date!

Fig 04

Doesn’t look like anything special happens.

GIFs

Fig02

Fig02

Fig02

Fig02

Contact

Hunter Ratliff

Email: HunterRatliff1@gmail.com
Twitter: @HunterRatliff1

Copyright (C) 2015 Hunter Ratliff

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.