R to QGIS workflow






You can use R and QGIS together to streamline mapping process. R is a much better tool for data processing and data exploration. And then once you get comfortable with R, you can use it to perform analysis to inform your analysis and supplement your maps.

Some examples

  • process census data in R —> join to shapefile in QGIS
  • explore dataset in R —> map it in QGIS
  • create a map of census tracts in QGIS — agggregate the data up to neighborhood-level in R to provide context
  • process address data in R —> geocode it in QGIS —> map it in QGIS


Today we’re going to explore a data set in R, select data from it, and prepare it to geocode. Then we’ll write it as a csv to our computer to geocode it using the MMQGIS Plugin in QGIS.




Introduction to R and R Studio


Terms and Definitions

  • R: a programming language and software environment for statistical computing and graphics
  • R Studio: an application that helps you write in R in a user-friendly way
  • R script: a recipe for your analysis
    • a collection of commands, equations, and functions that you write to do something (like answer a data question!)
    • scripts make your analysis reproducible and are a record of what you have done
  • Base R: the functions that come standard with your R installation
  • R package: a collection of functions and datasets, created by the community. Verified and published for other R users to access.


R Studio layout

Open R Studio - it should look similar to below, though your panes may be in a different order

  • If you wish to to move your panes so they are in different locations:
    • Select View >> Panes >> Pane Layout
    • It’s completely personal choice


Source/learn more about R Studio’s functionality


Some things to remember

  • R scripts sometimes run slightly differently on different machines
    • different operating systems, R versions, package versions, …
    • this shouldn’t impact your results
  • You can google R, try it! (“r import csv”)
  • Other people have probably had the same problem as you or asked the same question
  • Style matters! Make it readable for future you
  • Include comments explaining what you are doing/thinking
    • put a hash # before any text that you don’t want the computer to run
  • Learn R shortcuts
    • cheatsheet link
    • cheatsheet keystroke in R Studio:
      • Option + Shift + K (Mac)
      • Alt + Shift + K (WINDOWS)





Create a script

We’re going to create a script to import a dataset of all license applications in New York City since 1998.

  • We will look at the dataset
  • Explore the data a bit
  • Create a new dataset of the new tobacco licenses in Brooklyn in 2021
  • Create a new column that formats the street address for geocoding
  • Write the dataset to your computer as a csv
  • Geocode the addresses in QGIS

Let’s go!

Create new R script File > R Scriptand save it in your class12 folder as explore_nyc_licenses.R

At the top of your script, write a comment describing the purpose of this file:

# Processing and exploring the dataset of all license applications in New York City since 1998 to create a dataset of the new tobacco licenses in Brooklyn in 2021
# source = https://data.cityofnewyork.us/Business/License-Applications/ptev-4hud

Things to notice: Notice how that box above is gray? For this lesson, the gray box indicates it is something you should type into your script in R Studio. The boxes with a white background (see the white box below, after library(tidyverse)) are the output that you should expect to see in your Console pane in R Studio.


First load the tidyverse collection of packages to your environment (if you have not used the tidyverse before, you will have to install the Package in the Package pane)

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.4     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Things to notice:

  • Notice the conflicts message that’s printed when you load the tidyverse. Those are just fine. It tells you that dplyr overwrites some functions in base R. If you want to use the base version of these functions after loading dplyr, you’ll need to use their full names: stats::filter() and stats::lag().
  • You only need to install a package once, but you must load a package into your R session every time
    • install.packages(“package_name”) installs the package on your computer, or use the interface in the Packages pane
    • use library(“package_name”) to load the package into the current R environment



Import the csv

  • Read the dataset into your R Environment using the read_csv() function.
    • Use the full filepath - you can copy it from the Import Dataset window by clicking on the file the Files pane in R Studio
  • Name it raw_license to indicate that this is the original form of the data.
  • Run your code by either:
    • Click Run on the upper right above your script - this will run your whole script
    • Highlight the portion of your script you want to run and click RUN
    • Highlight the portion of your script you want to run and press Cmd + Enter(Mac)/Cntrl + Enter(Windows)
raw_license <- read_csv("/Users/sarahodges/spatial/NewSchool/methods3_sara/class12/data/raw_data/license_applications/license_applications.csv") 

# use the glimpse function to see the columns and data type of the data
glimpse(raw_license)
## Rows: 413,299
## Columns: 20
## $ `Application ID`            <chr> "1066-2017-RHIC", "7066-2014-ASLS", "1164-…
## $ `License Number`            <chr> "1294131-DCA", "2009515-DCA", "1472251-DCA…
## $ `License Type`              <chr> "Business", "Business", "Business", "Busin…
## $ `Application or Renewal`    <chr> "Renewal", "Application", "Renewal", "Rene…
## $ `Business Name`             <chr> "PEYKO TZENOV", "7TH AVE GOURMET INC.", "A…
## $ Status                      <chr> "Issued", "Issued", "Issued", "Issued", "I…
## $ Year                        <dbl> 2017, 2014, 2019, 2017, 2019, 2019, 2017, …
## $ `Temp Op Letter Issued`     <date> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ `Temp Op Letter Expiration` <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `License Category`          <chr> "Home Improvement Contractor", "Stoop Line…
## $ `Application Category`      <chr> "Special", "Basic", "Basic", "Basic", "Bas…
## $ `Building Number`           <chr> "3280", "75", "23", "886", "3015", "5509",…
## $ Street                      <chr> "RESERVOIR OVAL E", "7TH AVE", "DIVISION S…
## $ `Street 2`                  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ `Unit Type`                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, "UNIT", NA…
## $ Unit                        <chr> NA, NA, NA, NA, NA, NA, NA, NA, "303", NA,…
## $ Description                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ City                        <chr> "BRONX", "NEW YORK", "NEW YORK", "BROOKLYN…
## $ State                       <chr> "NY", "NY", "NY", "NY", "NY", "NY", "NY", …
## $ Zip                         <chr> "10467", "10011", "10002", "11222", "11226…


Things to notice:

  • <- is the ‘assignment operator’, it is used to define a value or dataframe in R
  • R doesn’t like spaces in column names
    • If there is a space, you have to put your variable name in single-quotes or back-ticks to call it
    • You can open the help section for the read_csv() function by typing ?read_csv in your Console* pane

Explore your data

Click on your data frame in your Environment pane to open the file within R Studio.

  • You can sort a column by clicking on the column header
  • You can filter the rows by clicking on the Filter button above the table and typing into the white box above each column
  • You can also use R functions to explore your data
  • To call a specific column, use the format dataframe$column name
# Explore this data

# What are the columns in the data?
names(raw_license)
##  [1] "Application ID"            "License Number"           
##  [3] "License Type"              "Application or Renewal"   
##  [5] "Business Name"             "Status"                   
##  [7] "Year"                      "Temp Op Letter Issued"    
##  [9] "Temp Op Letter Expiration" "License Category"         
## [11] "Application Category"      "Building Number"          
## [13] "Street"                    "Street 2"                 
## [15] "Unit Type"                 "Unit"                     
## [17] "Description"               "City"                     
## [19] "State"                     "Zip"
# What kinds of licenses are given out?
unique(raw_license$`License Category`)
##  [1] "Home Improvement Contractor"    "Stoop Line Stand"              
##  [3] "Dealer In Products"             "Electronics Store"             
##  [5] "Laundries"                      "Debt Collection Agency"        
##  [7] "Pedicab Driver"                 "Home Improvement Salesperson"  
##  [9] "Tobacco Retail Dealer"          "Locksmith"                     
## [11] "Amusement Device Temporary"     "Temporary Street Fair Vendor"  
## [13] "Ticket Seller"                  "Scale Dealer Repairer"         
## [15] "Pawnbroker"                     "Sidewalk Cafe"                 
## [17] "Electronic Cigarette Dealer"    "General Vendor"                
## [19] "Employment Agency"              "Electronic & Appliance Service"
## [21] "Pedicab Business"               "Secondhand Dealer - General"   
## [23] "Garage"                         "Amusement Device Portable"     
## [25] "Sightseeing Guide"              "Newsstand"                     
## [27] "Garage and Parking Lot"         "Tow Truck Driver"              
## [29] "Laundry"                        "Tow Truck Company"             
## [31] "Car Wash"                       "Secondhand Dealer - Auto"      
## [33] "Games of Chance"                "Scrap Metal Processor"         
## [35] "Amusement Arcade"               "Auctioneer"                    
## [37] "Gaming Cafe"                    "Special Sale"                  
## [39] "Parking Lot"                    "Process Server Individual"     
## [41] "Auction House Premises"         "Process Serving Agency"        
## [43] "Pool or Billiard Room"          "Horse Drawn Driver"            
## [45] "Horse Drawn Cab Owner"          "Commercial Lessor"             
## [47] "Amusement Device Permanent"     "General Vendor Distributor"    
## [49] "Bingo Game Operator"            "Laundry Jobber"                
## [51] "Tow Truck Exemption"            "Storage Warehouse"             
## [53] "Motion Picture Projectionist"   "Catering Establishment"        
## [55] "Locksmith Apprentice"           "Ticket Seller Business"        
## [57] "Cabaret"                        "Booting Company"               
## [59] "Sightseeing Bus"                "Secondhand Dealer - Firearms"
# How many licenses of each category were given out?
table(raw_license$`License Category`)
## 
##               Amusement Arcade     Amusement Device Permanent 
##                             82                            946 
##      Amusement Device Portable     Amusement Device Temporary 
##                           4106                            534 
##         Auction House Premises                     Auctioneer 
##                            180                           1423 
##            Bingo Game Operator                Booting Company 
##                             81                             18 
##                        Cabaret                       Car Wash 
##                            296                            472 
##         Catering Establishment              Commercial Lessor 
##                            127                             44 
##             Dealer In Products         Debt Collection Agency 
##                           3683                           6843 
## Electronic & Appliance Service    Electronic Cigarette Dealer 
##                           7977                           7543 
##              Electronics Store              Employment Agency 
##                          15554                           1465 
##                Games of Chance                    Gaming Cafe 
##                            502                            137 
##                         Garage         Garage and Parking Lot 
##                           5268                            588 
##                 General Vendor     General Vendor Distributor 
##                          17946                             41 
##    Home Improvement Contractor   Home Improvement Salesperson 
##                          59828                          52806 
##          Horse Drawn Cab Owner             Horse Drawn Driver 
##                            303                           1107 
##                      Laundries                        Laundry 
##                           8924                           3783 
##                 Laundry Jobber                      Locksmith 
##                           2707                          10530 
##           Locksmith Apprentice   Motion Picture Projectionist 
##                             75                            274 
##                      Newsstand                    Parking Lot 
##                           1573                           1699 
##                     Pawnbroker               Pedicab Business 
##                           2716                           1787 
##                 Pedicab Driver          Pool or Billiard Room 
##                           5582                            158 
##      Process Server Individual         Process Serving Agency 
##                           3955                            524 
##          Scale Dealer Repairer          Scrap Metal Processor 
##                            276                            568 
##       Secondhand Dealer - Auto   Secondhand Dealer - Firearms 
##                           2937                             13 
##    Secondhand Dealer - General                  Sidewalk Cafe 
##                          20125                           4498 
##                Sightseeing Bus              Sightseeing Guide 
##                             56                          13668 
##                   Special Sale               Stoop Line Stand 
##                            581                           9859 
##              Storage Warehouse   Temporary Street Fair Vendor 
##                            214                          17316 
##                  Ticket Seller         Ticket Seller Business 
##                           6736                             15 
##          Tobacco Retail Dealer              Tow Truck Company 
##                          87389                           2295 
##               Tow Truck Driver            Tow Truck Exemption 
##                          12422                            144
# What years does this data cover?
table(raw_license$Year)
## 
##  2000  2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011  2012 
##  5233  3195  3219  4311  4007  3618  3758  3369  3633  3260  3058  5736  5868 
##  2013  2014  2015  2016  2017  2018  2019  2020  2021 
##  4709 50038 51853 48963 48449 51292 51050 22593 28986
# Let's look at the data.

After we’ve looked at the data in the Viewer and explored a bit, we can begin to process the data.



Transform the data

We will:

  • select new tobacco license applications in Brooklyn that were given out in 2021
  • create a new column with street address for geocoding to use in this QGIS tool:



We’ll learn four dplyr functions that are the backbone of data transformation in R

  • mutate() - create or redefine a variable (column)
  • select() - subset variables (columns) by their names
  • filter() - subset observations (rows) by their values
  • rename() - rename the variables



# Process the data to select new tobacco license applications in Brooklyn that we given out in 2021
# and create a new column with street address for geocoding
tobacco <- raw_license %>% 
  filter(`License Category` == "Tobacco Retail Dealer" & 
           `Application or Renewal` == "Application" &
           Year == "2021" & 
           City == "BROOKLYN") %>% 
  mutate(address1 = paste(`Building Number`, Street, sep = " ")) %>% 
  rename(app_id = `Application ID`,
         bus_name = `Business Name`,
         street2 = `Street 2`,
         unit_type = `Unit Type`) %>% # rename variables we want to keep so all have no spaces
  select(app_id, bus_name, address1, City, State, Zip, street2, unit_type, Unit, Description) # select the variables to keep

Things to notice:

  • The tidyverse uses a pipe operator “%>%” to string together multiple commands
    • think of it as meaning “and then”
    • you’ll eventually love the “%>%”
    • keystroke = Cmd + Shift + M(Mac)/Ctrl + Shift + M(Windows)
  • To create a new variable mutate(var_name = equation)
    • you use the equal sign within dplyr functions
  • Within a code chunk, you don’t need to use the dollar sign ($) before a column name





Export data to your computer

Write out your processed csv and use it in QGIS!

write_csv(tobacco, "/Users/sarahodges/spatial/NewSchool/methods3_sara/class12/data/processed/bk_tobacco_licenses_2021.csv")