Be sure to have these packages installed in R before running any code! Most of them should be on CRAN and regularly updated, so you should be able to run install.packages("PACKAGE_NAME") to install anything that’s missing!
## ggplot2 dplyr lubridate reshape2 magrittr stringi gridExtra
## TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## tidyr ggmap ggthemes scales RSocrata animation
## TRUE TRUE TRUE TRUE TRUE TRUE
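A check like the one above can be reproduced by asking R which of the listed packages it can find. The sketch below uses only base R. `missing_packages()` is a hypothetical helper name, not something from the original script, and the install line is left commented out so nothing gets installed unless you choose to.

```r
# Packages listed in the check above
needed <- c("ggplot2", "dplyr", "lubridate", "reshape2", "magrittr",
            "stringi", "gridExtra", "tidyr", "ggmap", "ggthemes",
            "scales", "RSocrata", "animation")

# Hypothetical helper: return the package names that are not installed.
# system.file() returns "" when a package cannot be found.
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, function(p) nzchar(system.file(package = p)),
                  logical(1))
  pkgs[!found]
}

to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install whatever is missing from CRAN
# if (length(to_install) > 0) install.packages(to_install)
```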
Using https://data.austintexas.gov and the Socrata API
APD.file_path <- "Data/APD_YTD.csv"
APD.socrata_URL <- "https://data.austintexas.gov/resource/b4y9-5x39.csv"
# Import data
APD <- RSocrata::read.socrata(APD.socrata_URL)
# Write as a .csv
write.csv(APD, file = APD.file_path)
Load the dataset for the last 18 months of APD incidents. See the description on the APD Incidents webpage for more details.
The live-updated Socrata path is https://data.austintexas.gov/resource/b4y9-5x39.csv and the cached comma-separated values (CSV) file may be found at Data/APD_YTD.csv.
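Because the Socrata endpoint changes daily, it can be handy to download only when no cached copy exists. Here is a minimal sketch of that pattern. `load_cached()` and its `fetch` argument are hypothetical names introduced for illustration, and the commented call assumes the RSocrata package is installed and the network is reachable.

```r
# Hypothetical helper: read the cached CSV if it exists; otherwise call
# `fetch()` to get a fresh data.frame and cache it at `path`.
load_cached <- function(path, fetch) {
  if (file.exists(path)) {
    return(read.csv(path, stringsAsFactors = FALSE))
  }
  data <- fetch()
  dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
  # row.names = FALSE avoids writing an extra index column
  write.csv(data, file = path, row.names = FALSE)
  data
}

# Example call (needs network access and RSocrata):
# APD <- load_cached(APD.file_path,
#                    function() RSocrata::read.socrata(APD.socrata_URL))
```

Delete the cached file whenever you want a fresh snapshot.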
From their website
* Due to the methodological differences in data collection, different data sources may produce different results.
* Our on-line database is continuously being updated. The data provided here represents a particular point in time.
* Updates to the police report database occur daily. Information is available from today’s date back 18 months.
* Due to several factors (once-a-day updates, offense reclassification, reported versus occurred dates, etc.) comparisons should not be made between numbers generated with this database to any other official police reports.
* Data provided represents only calls for police service where a report was written.
The Austin Police Department does not assume any liability for any decision made or action taken or not taken by the recipient in reliance upon any information or data provided.
Court.file_path <- "Data/CourtData.csv"
Court.socrata_URL <- "https://data.austintexas.gov/resource/8jyt-x94k.csv"
# Import data
Court <- RSocrata::read.socrata(Court.socrata_URL)
# Write as a .csv
write.csv(Court, file = Court.file_path)
This data is provided to help with analysis of various violations charged throughout the City of Austin. See the Court Cases webpage for more details.
The live-updated Socrata path is https://data.austintexas.gov/resource/8jyt-x94k.csv and the cached comma-separated values (CSV) file may be found at Data/CourtData.csv.
## Totals by crime (regardless of date)
By.Crime <- d %>% group_by(Crime.Type) %>%
  # Collapse to one row per crime type, summing the counts
  summarise(total = sum(Count)) %>%
  # Add columns for cumulative distribution and rank (1 = most frequent)
  mutate(cume_dist = cume_dist(total),
         rank = dense_rank(total),
         rank = max(rank) + 1 - rank) %>% arrange(rank)
To kick things off, let’s take a look at the frequency of each Crime.Type under the APD Incident Report. To do this, we’ll aggregate each Crime.Type into a data.frame using the summarise() function in dplyr.
This collapses our data.frame with an aggregate function, yielding a new data.frame with two columns (the type of crime and the total number of observations of that crime) and 129777 rows. For convenience, I added two additional columns to help me visualize and select some basic attributes by rank.
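To see what the aggregation and the two helper columns do, here is a small sketch on made-up data (the values are not from the APD dataset). It assumes dplyr is installed, and it uses `dense_rank(desc(total))` as a one-step equivalent of reversing the ascending rank.

```r
library(dplyr)

# Toy stand-in for the incident data (made-up values, not APD data)
d <- data.frame(
  Crime.Type = c("THEFT", "THEFT", "DWI", "HARASSMENT"),
  Count      = c(3, 2, 1, 2),
  stringsAsFactors = FALSE
)

By.Crime <- d %>%
  group_by(Crime.Type) %>%
  # One row per crime type, with the summed count
  summarise(total = sum(Count)) %>%
  # cume_dist: share of crime types with a total <= this one
  # rank: 1 for the most frequent crime type
  mutate(cume_dist = cume_dist(total),
         rank = dense_rank(desc(total))) %>%
  arrange(rank)

print(By.Crime)
```

The most frequent crime type always ends up with rank 1 and a cume_dist of 1, which matches the first row of the table in this post.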
Now let’s plot the ranking of each Crime.Type versus the number of occurrences so far this year.
Looks like it’s log distributed. I’ll apply two transformations to the plot I just created to yield:
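The plotting code is not shown in this write-up, so the following is only a guess at its general shape: a rank-versus-total scatter plot with a log10 y-axis, which is one common transformation for this kind of skewed data. The five rows are the top of the ranked table in this post. Which transformations were actually applied is an assumption here.

```r
library(ggplot2)

# First five rows of the ranked table in this post
top5 <- data.frame(
  rank  = 1:5,
  total = c(293874, 259057, 223076, 217208, 87720)
)

# Scatter plot of rank against total, with totals on a log10 scale
p <- ggplot(top5, aes(x = rank, y = total)) +
  geom_point() +
  scale_y_log10() +
  labs(x = "Rank", y = "Total incidents (log scale)")

print(p)
```

With the full By.Crime data.frame in place of `top5`, the log scale spreads out the long tail of rare crime types that would otherwise be squashed against the x-axis.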
That’s much better! Looks like there are quite a few crimes that occur with great frequency, so we’ll investigate those first.
| Crime.Type | total | cume_dist | rank |
|---|---|---|---|
| CRASH/LEAVING THE SCENE | 293874 | 1.0000000 | 1 |
| THEFT | 259057 | 0.9975728 | 2 |
| BURGLARY OF VEHICLE | 223076 | 0.9951456 | 3 |
| FAMILY DISTURBANCE | 217208 | 0.9927184 | 4 |
| CRIMINAL TRESPASS NOTICE | 87720 | 0.9902913 | 5 |
| CRIMINAL MISCHIEF | 84229 | 0.9878641 | 6 |
| THEFT BY SHOPLIFTING | 37916 | 0.9854369 | 7 |
| DWI | 36047 | 0.9830097 | 8 |
| LOST PROP | 35447 | 0.9805825 | 9 |
| DISTURBANCE - OTHER | 33626 | 0.9781553 | 10 |
| ASSAULT W/INJURY-FAM/DATE VIOL | 32052 | 0.9757282 | 11 |
| HARASSMENT | 24786 | 0.9733010 | 12 |
| BURGLARY OF RESIDENCE | 23622 | 0.9708738 | 13 |
| WARRANT ARREST NON TRAFFIC | 20215 | 0.9684466 | 14 |
| PUBLIC INTOXICATION | 20038 | 0.9660194 | 15 |
| REQUEST TO APPREHEND | 19464 | 0.9635922 | 16 |
| ASSAULT WITH INJURY | 17997 | 0.9611650 | 17 |
| ABANDONED VEH | 16024 | 0.9587379 | 18 |
| AUTO THEFT | 13840 | 0.9563107 | 19 |
| POSS MARIJUANA | 13798 | 0.9538835 | 20 |
| FOUND PROPERTY | 12201 | 0.9514563 | 21 |
| FRAUD - OTHER | 10454 | 0.9490291 | 22 |
| CUSTODY ARREST TRAFFIC WARR | 10288 | 0.9466019 | 23 |
| IDENTITY THEFT | 9865 | 0.9441748 | 24 |
| EMERGENCY PROTECTIVE ORDER | 9295 | 0.9417476 | 25 |
| ASSAULT BY CONTACT | 7334 | 0.9393204 | 26 |
| DRIVING WHILE LICENSE INVALID | 7111 | 0.9368932 | 27 |
| POSS CONTROLLED SUB/NARCOTIC | 7036 | 0.9344660 | 28 |
| THEFT OF BICYCLE | 6447 | 0.9320388 | 29 |
| BURGLARY NON RESIDENCE | 5494 | 0.9296117 | 30 |
| CRIMINAL TRESPASS | 4705 | 0.9271845 | 31 |
| VIOL CITY ORDINANCE - OTHER | 4546 | 0.9247573 | 32 |
| DEBIT CARD ABUSE | 3390 | 0.9223301 | 33 |
| ASSAULT BY CONTACT FAM/DATING | 3187 | 0.9199029 | 34 |
| POSS OF DRUG PARAPHERNALIA | 3069 | 0.9174757 | 35 |
| THEFT OF SERVICE | 3048 | 0.9150485 | 36 |
| DATING DISTURBANCE | 2790 | 0.9126214 | 37 |
| BICYCLE REGISTRATION | 2637 | 0.9101942 | 38 |
| ASSAULT BY THREAT | 2378 | 0.9077670 | 39 |
| ASSIST COMPLAINANT | 2272 | 0.9053398 | 40 |
| BURGLARY INFORMATION | 2050 | 0.9029126 | 41 |
| CRED CARD ABUSE - OTHER | 2010 | 0.9004854 | 42 |
| TERRORISTIC THREAT | 2007 | 0.8980583 | 43 |
| DWI .15 BAC OR ABOVE | 1780 | 0.8956311 | 44 |
| THEFT INFORMATION | 1701 | 0.8932039 | 45 |
| DWI 2ND | 1678 | 0.8907767 | 46 |
| IMPOUNDED VEH | 1440 | 0.8883495 | 47 |
| ASSAULT INFORMATION | 1423 | 0.8859223 | 48 |
| AGG ASLT STRANGLE/SUFFOCATE | 1331 | 0.8834951 | 49 |
| FOUND CONTROLLED SUBSTANCE | 1300 | 0.8810680 | 50 |
Now let’s look at the data set from the time perspective
## Totals by date (regardless of crime)
By.Day <- d %>% group_by(Date) %>%
summarise(total = n()) %>%
filter(Date < as.Date("2015-11-01"))
This is just plain black and white… So boring!
It looks like Friday might be a bad day for crime. Let’s look at each day’s linear model, but include a scatter plot so we can get a better estimate.
Looks like we might be on to something with our Friday hypothesis. We can make a violin plot (like a box plot, but with the width showing the density of the data) to further investigate.
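A violin plot by weekday could be built roughly as follows. This is a sketch under stated assumptions: the daily totals are synthetic (drawn from a Poisson distribution purely for illustration, not APD data), the `Weekday` column name is introduced here, and ggplot2 and lubridate are assumed to be installed.

```r
library(ggplot2)
library(lubridate)

# Synthetic daily totals for illustration only (not APD data)
set.seed(1)
By.Day <- data.frame(
  Date = seq(as.Date("2015-01-01"), as.Date("2015-03-31"), by = "day")
)
By.Day$total <- rpois(nrow(By.Day), lambda = 300)

# Label each date with its day of the week (an ordered factor)
By.Day$Weekday <- wday(By.Day$Date, label = TRUE)

# One violin per weekday, with the raw points jittered on top
p <- ggplot(By.Day, aes(x = Weekday, y = total)) +
  geom_violin() +
  geom_jitter(width = 0.1, alpha = 0.5)

print(p)
```

Swapping in the real By.Day data.frame only requires that it has `Date` and `total` columns, as the summarise() step above produces.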
Ah, that’s much easier to visualize.
This is still a work in progress. Everything below this chunk is most certainly incomplete and will (hopefully) be finished at a later date!
Doesn’t look like anything special happens
Hunter Ratliff
Email: HunterRatliff1@gmail.com
Twitter: @HunterRatliff1
Copyright (C) 2015 Hunter Ratliff
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.