You can use R and QGIS together to streamline mapping process. R is a much better tool for data processing and data exploration. And then once you get comfortable with R, you can use it to perform analysis to inform your analysis and supplement your maps.
Open R Studio - it should look similar to below, though your panes may be in a different order
Source/learn more about R Studio’s functionality
We’re going to create a script to import a dataset of all license applications in New York City since 1998.
Let’s go!
Create new R script File > R Scriptand save it in your class12 folder as explore_nyc_licenses.R
At the top of your script, write a comment describing the purpose of this file:
# Processing and exploring the dataset of all license applications in New York City since 1998 to create a dataset of the new tobacco licenses in Brooklyn in 2021
# source = https://data.cityofnewyork.us/Business/License-Applications/ptev-4hud
Things to notice: Notice how that box above is gray? For this lesson, the gray box indicates it is something you should type into your script in R Studio. The boxes with a white background (see the white box below, after
library(tidyverse)) are the output that you should expect to see in your Console pane in R Studio.
First load the tidyverse collection of packages to your environment (if you have not used the tidyverse before, you will have to install the Package in the Package pane)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.4 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Things to notice:
- Notice the conflicts message that’s printed when you load the tidyverse. Those are just fine. It tells you that dplyr overwrites some functions in base R. If you want to use the base version of these functions after loading dplyr, you’ll need to use their full names: stats::filter() and stats::lag().
- You only need to install a package once, but you must load a package into your R session every time
- install.packages(“package_name”) installs the package on your computer, or use the interface in the Packages pane
- use library(“package_name”) to load the package into the current R environment
read_csv() function.
raw_license to indicate that this is the original form of the data.raw_license <- read_csv("/Users/sarahodges/spatial/NewSchool/methods3_sara/class12/data/raw_data/license_applications/license_applications.csv")
# use the glimpse function to see the columns and data type of the data
glimpse(raw_license)
## Rows: 413,299
## Columns: 20
## $ `Application ID` <chr> "1066-2017-RHIC", "7066-2014-ASLS", "1164-…
## $ `License Number` <chr> "1294131-DCA", "2009515-DCA", "1472251-DCA…
## $ `License Type` <chr> "Business", "Business", "Business", "Busin…
## $ `Application or Renewal` <chr> "Renewal", "Application", "Renewal", "Rene…
## $ `Business Name` <chr> "PEYKO TZENOV", "7TH AVE GOURMET INC.", "A…
## $ Status <chr> "Issued", "Issued", "Issued", "Issued", "I…
## $ Year <dbl> 2017, 2014, 2019, 2017, 2019, 2019, 2017, …
## $ `Temp Op Letter Issued` <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Temp Op Letter Expiration` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `License Category` <chr> "Home Improvement Contractor", "Stoop Line…
## $ `Application Category` <chr> "Special", "Basic", "Basic", "Basic", "Bas…
## $ `Building Number` <chr> "3280", "75", "23", "886", "3015", "5509",…
## $ Street <chr> "RESERVOIR OVAL E", "7TH AVE", "DIVISION S…
## $ `Street 2` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `Unit Type` <chr> NA, NA, NA, NA, NA, NA, NA, NA, "UNIT", NA…
## $ Unit <chr> NA, NA, NA, NA, NA, NA, NA, NA, "303", NA,…
## $ Description <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ City <chr> "BRONX", "NEW YORK", "NEW YORK", "BROOKLYN…
## $ State <chr> "NY", "NY", "NY", "NY", "NY", "NY", "NY", …
## $ Zip <chr> "10467", "10011", "10002", "11222", "11226…
Things to notice:
<-is the ‘assignment operator’, it is used to define a value or dataframe in R- R doesn’t like spaces in column names
- If there is a space, you have to put your variable name in single-quotes or back-ticks to call it
- You can open the help section for the
read_csv()function by typing?read_csvin your Console* pane
Click on your data frame in your Environment pane to open the file within R Studio.
# Explore this data
# What are the columns in the data?
names(raw_license)
## [1] "Application ID" "License Number"
## [3] "License Type" "Application or Renewal"
## [5] "Business Name" "Status"
## [7] "Year" "Temp Op Letter Issued"
## [9] "Temp Op Letter Expiration" "License Category"
## [11] "Application Category" "Building Number"
## [13] "Street" "Street 2"
## [15] "Unit Type" "Unit"
## [17] "Description" "City"
## [19] "State" "Zip"
# What kinds of licenses are given out?
unique(raw_license$`License Category`)
## [1] "Home Improvement Contractor" "Stoop Line Stand"
## [3] "Dealer In Products" "Electronics Store"
## [5] "Laundries" "Debt Collection Agency"
## [7] "Pedicab Driver" "Home Improvement Salesperson"
## [9] "Tobacco Retail Dealer" "Locksmith"
## [11] "Amusement Device Temporary" "Temporary Street Fair Vendor"
## [13] "Ticket Seller" "Scale Dealer Repairer"
## [15] "Pawnbroker" "Sidewalk Cafe"
## [17] "Electronic Cigarette Dealer" "General Vendor"
## [19] "Employment Agency" "Electronic & Appliance Service"
## [21] "Pedicab Business" "Secondhand Dealer - General"
## [23] "Garage" "Amusement Device Portable"
## [25] "Sightseeing Guide" "Newsstand"
## [27] "Garage and Parking Lot" "Tow Truck Driver"
## [29] "Laundry" "Tow Truck Company"
## [31] "Car Wash" "Secondhand Dealer - Auto"
## [33] "Games of Chance" "Scrap Metal Processor"
## [35] "Amusement Arcade" "Auctioneer"
## [37] "Gaming Cafe" "Special Sale"
## [39] "Parking Lot" "Process Server Individual"
## [41] "Auction House Premises" "Process Serving Agency"
## [43] "Pool or Billiard Room" "Horse Drawn Driver"
## [45] "Horse Drawn Cab Owner" "Commercial Lessor"
## [47] "Amusement Device Permanent" "General Vendor Distributor"
## [49] "Bingo Game Operator" "Laundry Jobber"
## [51] "Tow Truck Exemption" "Storage Warehouse"
## [53] "Motion Picture Projectionist" "Catering Establishment"
## [55] "Locksmith Apprentice" "Ticket Seller Business"
## [57] "Cabaret" "Booting Company"
## [59] "Sightseeing Bus" "Secondhand Dealer - Firearms"
# How many licenses of each category were given out?
table(raw_license$`License Category`)
##
## Amusement Arcade Amusement Device Permanent
## 82 946
## Amusement Device Portable Amusement Device Temporary
## 4106 534
## Auction House Premises Auctioneer
## 180 1423
## Bingo Game Operator Booting Company
## 81 18
## Cabaret Car Wash
## 296 472
## Catering Establishment Commercial Lessor
## 127 44
## Dealer In Products Debt Collection Agency
## 3683 6843
## Electronic & Appliance Service Electronic Cigarette Dealer
## 7977 7543
## Electronics Store Employment Agency
## 15554 1465
## Games of Chance Gaming Cafe
## 502 137
## Garage Garage and Parking Lot
## 5268 588
## General Vendor General Vendor Distributor
## 17946 41
## Home Improvement Contractor Home Improvement Salesperson
## 59828 52806
## Horse Drawn Cab Owner Horse Drawn Driver
## 303 1107
## Laundries Laundry
## 8924 3783
## Laundry Jobber Locksmith
## 2707 10530
## Locksmith Apprentice Motion Picture Projectionist
## 75 274
## Newsstand Parking Lot
## 1573 1699
## Pawnbroker Pedicab Business
## 2716 1787
## Pedicab Driver Pool or Billiard Room
## 5582 158
## Process Server Individual Process Serving Agency
## 3955 524
## Scale Dealer Repairer Scrap Metal Processor
## 276 568
## Secondhand Dealer - Auto Secondhand Dealer - Firearms
## 2937 13
## Secondhand Dealer - General Sidewalk Cafe
## 20125 4498
## Sightseeing Bus Sightseeing Guide
## 56 13668
## Special Sale Stoop Line Stand
## 581 9859
## Storage Warehouse Temporary Street Fair Vendor
## 214 17316
## Ticket Seller Ticket Seller Business
## 6736 15
## Tobacco Retail Dealer Tow Truck Company
## 87389 2295
## Tow Truck Driver Tow Truck Exemption
## 12422 144
# What years does this data cover?
table(raw_license$Year)
##
## 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012
## 5233 3195 3219 4311 4007 3618 3758 3369 3633 3260 3058 5736 5868
## 2013 2014 2015 2016 2017 2018 2019 2020 2021
## 4709 50038 51853 48963 48449 51292 51050 22593 28986
# Let's look at the data.
After we’ve looked at the data in the Viewer and explored a bit, we can begin to process the data.
We will:
We’ll learn four dplyr functions that are the backbone of data transformation in R
- mutate() - create or redefine a variable (column)
- select() - subset variables (columns) by their names
- filter() - subset observations (rows) by their values
- rename() - rename the variables
# Process the data to select new tobacco license applications in Brooklyn that we given out in 2021
# and create a new column with street address for geocoding
tobacco <- raw_license %>%
filter(`License Category` == "Tobacco Retail Dealer" &
`Application or Renewal` == "Application" &
Year == "2021" &
City == "BROOKLYN") %>%
mutate(address1 = paste(`Building Number`, Street, sep = " ")) %>%
rename(app_id = `Application ID`,
bus_name = `Business Name`,
street2 = `Street 2`,
unit_type = `Unit Type`) %>% # rename variables we want to keep so all have no spaces
select(app_id, bus_name, address1, City, State, Zip, street2, unit_type, Unit, Description) # select the variables to keep
Things to notice:
- The tidyverse uses a pipe operator “%>%” to string together multiple commands
- think of it as meaning “and then”
- you’ll eventually love the “%>%”
- keystroke = Cmd + Shift + M(Mac)/Ctrl + Shift + M(Windows)
- To create a new variable mutate(var_name = equation)
- you use the equal sign within dplyr functions
- Within a code chunk, you don’t need to use the dollar sign ($) before a column name
Write out your processed csv and use it in QGIS!
write_csv(tobacco, "/Users/sarahodges/spatial/NewSchool/methods3_sara/class12/data/processed/bk_tobacco_licenses_2021.csv")