Methods 1, Week 1

PGUD 5160 DUE Methods 1

URBAN DATA ANALYSIS, MAPPING AND VISUALIZATION Section A | Fall 2022

Sara Hodges (she/her/hers) | hodgess@newschool.edu

Slack workspace: duemethods1fall2022.slack.com

Class: Mondays, 9am - 11:40am

Building: Parsons 2 W 13th St

Room: 1108

TODAY

  • Introduction
  • Class Description
    • Learning Goals
    • Assignments
    • Projects
  • Introductions
  • LAB
    • Install R and R Studio
    • Intro to R

Learning Goals

  • Develop research methods to analyze the urban world that acknowledge and moderate bias
  • Define thoughtful data-driven research projects
  • Combine qualitative and quantitative sources and methods
  • Create compelling and truthful visualizations to accompany your research

Technical Learning Goals

  • Learn R for data analysis and visualization.
  • Use it to enhance your research and advocacy.
    • Write analysis R scripts that are easy to understand
      • Clear analysis
      • Well-documented
      • Reproducible
  • Produce simple, effective data visualizations that make your analysis easy to understand and use for advocacy

Class Structure

Lecture and/or Reading Discussion
Homework Assignments Questions
Lab

  • Technical skill development
  • Mostly R

Canvas and Slack

  • Between-class discussion / share interesting stuff
  • Homework help

Weekly Assignments

Each week:

  • Readings
    • Theory
    • Technical instruction
  • Technical
    • Practice skills learned in the lab
    • Data analysis to enrich current or previous readings

Ongoing:

  • Research Journal

Final Project

Quantitative Research Project using R, on the topic of your choosing

  • Research Journal
  • Project Proposal
  • Publish results in a presentation, book, or e-book made with Quarto
  • Presentation on last day of class

Introductions - Sara

My favorite visualization tools

Group Introductions

  • 7 Questions
  • We’ll create our first class dataset
  • 3 - 6 volunteers to collect the data
  • Link to the dataset

Group Introductions

  • Your name and preferred pronouns
  • The last state or country you lived in
  • When you moved to the New York City area
  • Any particular research interests that you hope to explore in this class?
  • Have you used R before?
  • Have you used any other data analysis or visualization tool?
  • Name one small thing that made you happy in the last 6 months

LET’S TAKE A BREAK!

Intro to R and R Studio



From R for Data Science, by Hadley Wickham & Garrett Grolemund

Terms/Definitions

  • R: a programming language and software environment for statistical computing and graphics
  • RStudio: an application that helps you write in R in a user-friendly way
  • R script: an executable document with instructions in R to complete a task.
  • function: a set of statements organized together to perform a specific task.
  • Base R: the functions that come standard with your R installation.
  • R package: a collection of functions created by the community that other R users can use.

R, R Studio, and CRAN

R is free and open source

  • anyone can create a package, though it has to be approved before it is published on CRAN

CRAN: software repository for R + gatekeepers for new packages

  • supported by the R Foundation
  • team of volunteers that maintain R and manage new packages

RStudio: the company that created the RStudio application

  • pays people to create new, comprehensive packages that people use a lot and trust

A few things

  • R scripts sometimes run slightly differently on different machines
    (different operating systems, R versions, package versions, etc)
  • You can google R, try it! (“r import csv”)
  • Other people have probably had the same problem as you or asked the same question
  • Style matters! Make it readable for future you, and for your colleagues
    • Indent for readability
    • Include comments explaining what you are doing/thinking
    • Include sources in your script
  • Learn R shortcuts (cheatsheet)
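
Because scripts can behave differently across operating systems, R versions, and package versions, it can help to record your setup before asking for help. The lines below are a minimal sketch using only functions that ship with R; the object name setup is just for illustration.

```r
# Collect basic facts about the machine and R installation running this script
setup <- list(
  r_version = R.version.string,                        # the R release in use
  os        = Sys.info()[["sysname"]],                 # operating system name
  stats_pkg = as.character(packageVersion("stats"))    # version of a package that ships with R
)

# Print the details so you can paste them into a help request
print(setup)

# sessionInfo() gives a fuller report, including any loaded packages
sessionInfo()
```

Pasting this output alongside a question makes it easier for classmates to spot a version difference.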

Installation


Install R

Install RStudio

Lecture Format: Quarto

I use Quarto to make slides and an e-book for this course.

  • It contains text and executable R code together

  • Text in a box is a code chunk - something you can execute in R

  • The output from the code chunk displays below the box

# this is a gray box
print("this is the output from the code chunk")
[1] "this is the output from the code chunk"
  • You can follow along with all of the lectures in R Studio on your computer.

R Studio Layout

The Console pane

The Console is where you can type code that executes immediately, and where you view the output.


Type into your console, and then press enter:

4 + 5
[1] 9
14 - 7
[1] 7

The Console Pane


Notice

  • Press return/enter on your keyboard to execute a command in the console
  • R keeps a history of your console commands
    • use up or down arrows to view previous commands

The Console Pane

Create new objects (variables):

variable_name <- value of the variable

<- is the ‘assignment operator’. It works like an equal sign: the value on the right is stored under the name on the left.

# create a variable to hold the value 9
calc1 <- 4 + 5

# create a variable to hold the value 7
calc2 <- 14 - 7

# add the two variables together
total <- calc1 + calc2

total 
[1] 16

The Console Pane

Notice

  • In R, create a new object with an assignment operator <-
    • Alt+- (Windows) / Option+- (Mac)
    • See all the keyboard shortcuts!
      • Alt+Shift+K (Windows) / Option+Shift+K (Mac)
  • A hash # tells R not to run that line of code
    • In R-speak, it’s called “comment out”
  • When you define an object, the console does not display the value
    • type the object name to return the current value
    • you can see all defined objects in the Environment pane
  • Object names must begin with a letter, and only contain letters, numbers, _ (underscores) and . (periods).
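
The points above can be tried out in a few lines. This is a small sketch rather than part of the lecture script: make.names() is a base R function (not covered in the slides) that returns a name unchanged when it is valid and repairs it when it is not, and the names passed to it are made up for illustration.

```r
# Assignment stores a value without printing it
calc1 <- 4 + 5

# Typing the name on its own line prints the current value
calc1

# Wrapping the assignment in parentheses assigns and prints in one step
(calc2 <- 14 - 7)

# make.names() returns the name unchanged when it is already valid ...
make.names("calc_1.total")

# ... and repairs it when it is not (here the name starts with a number)
make.names("2nd_total")
```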

The Console Pane


calc3 <- 4 + calc2
calc3
[1] 11
# try this one too
# calc4 <- 4 + seven


Notice

  • You can perform math operations on numbers and on objects that hold numbers
  • The commented-out line would fail if you ran it, because no object named seven has been created
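
To see what happens with the commented-out line without stopping your work, you can wrap it in tryCatch(), a base R function that is not covered in this lecture. The sketch below assumes you have not created an object called seven yourself; message_text is an illustrative name.

```r
calc2 <- 14 - 7

# Math works on an object that holds a number
calc3 <- 4 + calc2
calc3

# 'seven' was never created, so R raises an error.
# tryCatch() captures the error message instead of halting the script.
message_text <- tryCatch(
  4 + seven,
  error = function(e) conditionMessage(e)
)
message_text
```

Reading the message closely is usually the fastest way to find the problem: it names the object R could not find.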

The Source pane

Use the Source Pane to write scripts to save your work.

Or to open and run existing scripts.

In-class exercise: Create your first script

  • First, create a file structure so that everything is organized and everyone in the class has the same file structure. This will help a lot when you are helping each other!
  • If you haven’t already, create a folder for this class. Call it “methods1”.
  • Within your methods1 folder, create a new folder called “class1”.
  • Within your class1 folder, create a new folder called “data”.
  • In R Studio, create a new script
    • File > New File > R Script
    • At the top, type “# this is my first script” (make sure you include the hash #)
    • Save the script in your class1 folder as my_first_script.R
    • Create the same variables in your script as above
    • Make sure you add comments to describe your work
  • Run the script by clicking the down arrow next to RUN on the top-right of your Source Window
  • Save your script

Notice

  • You can save your work easily with a script
  • In R, a hash (#) comments out a line, meaning the computer ignores it
    • Use # to explain your script as you go
    • You can comment before a line, or on the same line
  • There are lots of different ways to run your script
    • Place your cursor at the end of a line, then press Cmd+Return (Mac) / Ctrl+Return (Windows)
    • Place your cursor at the end of a line, then click RUN
    • Highlight the code to run, then use the keyboard shortcut or click RUN
  • Cmd+S (Mac) / Ctrl+S (Windows) is the keyboard shortcut to save - do it a lot
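
As an illustration of the two comment placements mentioned above, here is a short sketch that reuses the variables from the console exercise. The line that sets calc1 to 100 is invented purely to show a commented-out statement.

```r
# A comment on its own line describes the step that follows
calc1 <- 4 + 5

calc2 <- 14 - 7  # a comment can also sit at the end of a line of code

# Commenting out a line keeps it in the script without running it
# calc1 <- 100

# add the two variables together
total <- calc1 + calc2
total
```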

Projects

Projects are a good way to keep track of all of the files for a specific task or project. We’ll create projects for each class in this course.

In-class exercise: Create a project

  • Create a new project
    • File > New Project
    • Existing Directory > navigate to your class1 folder
    • Click Create Project

Notice

  • There is now a class1.Rproj file in your class1 folder
    • It keeps track of your file path and a few other things
  • Projects make it easy to use a shorter file path
  • Within a project, the Files window shows all of the files in your project
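
The shorter file path works because a project sets R's working directory to the project folder. The sketch below shows one way to check this; it assumes the folder layout from the earlier exercise (a data folder inside class1) and uses the dataset file name that appears later in this lecture.

```r
# The working directory is where R looks for files; inside a project it is
# the project folder (class1 in this lecture)
getwd()

# Build a path relative to the project folder instead of typing the full
# location on your computer
data_path <- file.path("data", "full_data_19_geo_exc.csv")
data_path

# Check that the file is where the script expects it before importing;
# this returns FALSE until you have downloaded the file into the data folder
file.exists(data_path)
```

Because the path is relative, the same script can run on a classmate's computer as long as they use the same folder structure.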

The Files Pane


There are lots of useful tabs in this pane

The Files Pane


The Files window is like file explorer

Notice

You should be in your class1 folder to see your script



The Plots window displays charts and maps you create

The Files Pane


The Packages window lists the packages you have installed and provides a user interface to search for other packages and install them.

Packages are collections of functions and datasets developed by the R community to expand the things you can do in R.

  • Some, like the tidyverse, have become the backbone of analysis in R.

In-class exercise: Install your first package

  • Let’s install the tidyverse:
  • You can either click Install in the Packages pane and use the user interface to install the package. OR type into the Console:
  • install.packages('tidyverse')
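
A package only needs to be installed once per computer, so a script can check before installing again. The helper below is a hypothetical convenience function written for this example (it is not part of R or the tidyverse); it relies on requireNamespace(), which reports whether a package is already installed.

```r
# Hypothetical helper: install a package only when it is missing
install_if_missing <- function(pkg) {
  # requireNamespace() returns TRUE when the package is already installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package is available after the check
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# 'stats' ships with R, so this call finds it and installs nothing
install_if_missing("stats")

# For this lecture you would run:
# install_if_missing("tidyverse")
```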

The Files Pane

The Help window is where you learn about packages and functions.

Two ways to open documentation:

  • Help tab
    • In the Help Search bar, type readr and press enter
  • Console
    • type directly into the console:
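    • ??readr
      • searches the help pages of your installed packages for "readr"
      • a single ? followed by a function name (for example, ?names) opens the documentation for that function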
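
The same lookups can be written in a script. This is a small sketch using base R's help() function; the lines that would open a help page are left as comments so the script does not interrupt you when it runs, and names_help is an illustrative object name.

```r
# help() looks up the documentation for a function; storing the result
# avoids opening the help page right away
names_help <- help("names")

# Printing the object opens the page (in the Help tab when run in RStudio)
# print(names_help)

# Shortcuts to type in the Console:
# ?names                    # same as help("names")
# ??csv                     # search installed documentation for a term
# help(package = "readr")   # list the help topics in a package you have installed
```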

The Environment pane


The Environment shows all of the objects that you have in your workspace



If you are following along, you should have at least 4 objects in your Environment.

In-class exercise: Import a CSV

  • Download the 2019 education dataset from EdBuild
  • Move it to the data folder in your class1 folder
  • Load the tidyverse package into your environment (add this at the top of your script):
## Load packages necessary for this script
library(tidyverse)

Now we’ll import our first dataset into R using the read_csv function:

## import the 2019 education dataset using read_csv from the readr package of the tidyverse
ed18 <- read_csv("data/full_data_19_geo_exc.csv")

Notice

  • You installed tidyverse on your computer earlier, but you can’t use it until you load it into your environment
    • you load all of the packages necessary for your script at the top of the script
  • The table is listed in your Environment
    • To view details about the data, click the down arrow next to the filename
  • Notice the messages in the Console
    • They provide information about the table and the process of importing it
    • The data type for each column was automatically determined
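
Automatic type detection is convenient, but it is worth checking. The sketch below uses base R's read.csv() as a stand-in for read_csv() so that it runs without extra packages, and the two functions do not always guess the same way. The tiny file contains three columns and two rows copied from the education dataset; the object names are illustrative.

```r
# Write a tiny CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("CONUM,County,ENROLL",
    "01001,Autauga County,9094",
    "01003,Baldwin County,32267"),
  csv_path
)

# Let R guess every column type: here CONUM is read as a number
# and loses its leading zero
guessed <- read.csv(csv_path)
sapply(guessed, class)
guessed$CONUM

# State the type for CONUM yourself to keep it as text
typed <- read.csv(csv_path, colClasses = c(CONUM = "character"))
sapply(typed, class)
typed$CONUM
```

Identifiers such as county codes are usually safer as text, since a leading zero is part of the code.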

Data frames

Data tables are called dataframes in R.

  • When you read a table into your R Environment, none of the changes that you make to the dataframe are saved to your computer until you save the dataframe to a file (in R we say write it out).
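
To make "writing out" concrete, here is a minimal sketch with a small stand-in data frame and base R's write.csv(). The column enroll_thousands and the object names are invented for the example, and a temporary file is used so nothing is left behind on your computer.

```r
# A small data frame standing in for the imported table
toy <- data.frame(
  County = c("Autauga County", "Baldwin County"),
  ENROLL = c(9094, 32267),
  stringsAsFactors = FALSE
)

# Changing the data frame only changes the copy in R's memory
toy$enroll_thousands <- toy$ENROLL / 1000

# Writing it out saves that copy to a file
out_path <- tempfile(fileext = ".csv")
write.csv(toy, out_path, row.names = FALSE)

# Reading the file back shows that the new column was saved
saved <- read.csv(out_path)
names(saved)
```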

Let’s explore our first data frame by typing some functions in our script

## list all of the column names in the console
names(ed18)
 [1] "NCESID"               "state_id"             "State"               
 [4] "STATE_FIPS"           "NAME"                 "County"              
 [7] "CONUM"                "ENROLL"               "LRPP"                
[10] "SRPP"                 "SLRPP"                "LR"                  
[13] "SR"                   "SLR"                  "SRPP_cola"           
[16] "LRPP_cola"            "SLRPP_cola"           "dType"               
[19] "dUrbanicity"          "dOperational_schools" "dEnroll_district"    
[22] "dLEP"                 "dWhite"               "dBlack"              
[25] "dHispanic"            "dAsian_PI"            "dHawaiian_PI"        
[28] "dAmIndian_Aknative"   "d2plus_races"         "pctNonwhite"         
[31] "TPop"                 "StPop"                "StPov"               
[34] "StPovRate"            "MHI"                  "MPV"                 
[37] "sd_area"              "student_per_sq_mile"  "sdType"              
[40] "dIEP"                


# print information about each column 
glimpse(ed18)
Rows: 13,291
Columns: 40
$ NCESID               <chr> "0100240", "0100270", "0100300", "0101410", "0100…
$ state_id             <chr> "AL-001", "AL-002", "AL-003", "AL-133", "AL-004",…
$ State                <chr> "Alabama", "Alabama", "Alabama", "Alabama", "Alab…
$ STATE_FIPS           <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
$ NAME                 <chr> "Autauga County School District", "Baldwin County…
$ County               <chr> "Autauga County", "Baldwin County", "Barbour Coun…
$ CONUM                <chr> "01001", "01003", "01005", "01005", "01007", "010…
$ ENROLL               <dbl> 9094, 32267, 755, 5427, 3236, 7651, 1503, 1438, 3…
$ LRPP                 <dbl> 2268.749, 6040.382, 2903.311, 1389.165, 2054.079,…
$ SRPP                 <dbl> 6269.958, 5160.443, 7099.338, 5306.799, 7152.658,…
$ SLRPP                <dbl> 8538.707, 11200.824, 10002.649, 6695.965, 9206.73…
$ LR                   <dbl> 20632000, 194905000, 2192000, 7539000, 6647000, 1…
$ SR                   <dbl> 57019000, 166512000, 5360000, 28800000, 23146000,…
$ SLR                  <dbl> 77651000, 361417000, 7552000, 36339000, 29793000,…
$ SRPP_cola            <dbl> 6593.016, 5420.633, 7932.221, 5929.385, 7956.238,…
$ LRPP_cola            <dbl> 2385.645, 6344.939, 3243.923, 1552.140, 2284.849,…
$ SLRPP_cola           <dbl> 8978.661, 11765.572, 11176.144, 7481.525, 10241.0…
$ dType                <chr> "1-Regular local school district that is NOT a co…
$ dUrbanicity          <chr> "41-Rural: Fringe", "41-Rural: Fringe", "43-Rural…
$ dOperational_schools <dbl> 15, 45, 3, 6, 9, 17, 3, 4, 7, 7, 19, 2, 3, 7, 11,…
$ dEnroll_district     <dbl> 9094, 32267, 755, 5427, 3236, 7651, 1503, 1438, 3…
$ dLEP                 <dbl> 159, 1309, 49, 122, 51, 396, 97, 146, 8, 28, 167,…
$ dWhite               <dbl> 5952, 22769, 47, 2735, 2349, 6300, 989, 31, 1030,…
$ dBlack               <dbl> 2358, 3716, 592, 2232, 706, 101, 78, 1187, 1896, …
$ dHispanic            <dbl> 336, 3069, 108, 350, 121, 1181, 349, 216, 42, 92,…
$ dAsian_PI            <dbl> 204, 284, 3, 54, 10, 15, 17, 2, 15, 5, 54, 27, 7,…
$ dHawaiian_PI         <dbl> 7, 21, 1, 14, 1, 6, 2, NA, 2, 2, 20, 3, 1, 4, 1, …
$ dAmIndian_Aknative   <dbl> 30, 89, 2, 29, 9, 12, 5, 2, 2, NA, 25, 1, 4, 4, 7…
$ d2plus_races         <dbl> 207, 2319, 2, 13, 40, 36, 63, NA, 28, 18, NA, 3, …
$ pctNonwhite          <dbl> 0.34550253, 0.29435646, 0.93774834, 0.49603833, 0…
$ TPop                 <dbl> 55869, 223234, 12909, 11777, 22394, 51188, 6638, …
$ StPop                <dbl> 9688, 35515, 1622, 2151, 3311, 8816, 1001, 1541, …
$ StPov                <dbl> 1376, 4641, 671, 797, 808, 1843, 224, 591, 1024, …
$ StPovRate            <dbl> 0.1420314, 0.1306772, 0.4136868, 0.3705253, 0.244…
$ MHI                  <dbl> 58731, 58320, 32243, 33132, 47542, 50198, 39669, …
$ MPV                  <dbl> 154500, 197900, 65500, 128100, 92800, 125100, 156…
$ sd_area              <dbl> 604.4, 1609.9, 831.0, 73.5, 626.2, 634.8, 15.8, 6…
$ student_per_sq_mile  <dbl> 15.046, 20.043, 0.909, 73.837, 5.168, 12.053, 95.…
$ sdType               <chr> "uni", "uni", "uni", "uni", "uni", "uni", "uni", …
$ dIEP                 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…

View the dataframe by clicking on it in the Environment pane or typing View(ed18) in the Console.

NCESID | state_id | State | STATE_FIPS | NAME | County | CONUM | ENROLL | LRPP | SRPP | SLRPP | LR | SR | SLR | SRPP_cola | LRPP_cola | SLRPP_cola | dType | dUrbanicity | dOperational_schools | dEnroll_district | dLEP | dWhite | dBlack | dHispanic | dAsian_PI | dHawaiian_PI | dAmIndian_Aknative | d2plus_races | pctNonwhite | TPop | StPop | StPov | StPovRate | MHI | MPV | sd_area | student_per_sq_mile | sdType | dIEP
0100240 | AL-001 | Alabama | 1 | Autauga County School District | Autauga County | 01001 | 9094 | 2268.749 | 6269.958 | 8538.707 | 20632000 | 57019000 | 77651000 | 6593.016 | 2385.645 | 8978.661 | 1-Regular local school district that is NOT a component of a supervisory union | 41-Rural: Fringe | 15 | 9094 | 159 | 5952 | 2358 | 336 | 204 | 7 | 30 | 207 | 0.3455025 | 55869 | 9688 | 1376 | 0.1420314 | 58731 | 154500 | 604.4 | 15.046 | uni | NA
0100270 | AL-002 | Alabama | 1 | Baldwin County School District | Baldwin County | 01003 | 32267 | 6040.382 | 5160.443 | 11200.824 | 194905000 | 166512000 | 361417000 | 5420.633 | 6344.939 | 11765.572 | 1-Regular local school district that is NOT a component of a supervisory union | 41-Rural: Fringe | 45 | 32267 | 1309 | 22769 | 3716 | 3069 | 284 | 21 | 89 | 2319 | 0.2943565 | 223234 | 35515 | 4641 | 0.1306772 | 58320 | 197900 | 1609.9 | 20.043 | uni | NA

Viewing a dataframe

Notice

  • You can Search or Filter when you view the dataframe
  • The data type is displayed in the column header and in the Environment window
  • You can also open the data frame by clicking on the data frame name in the Environment pane

Basic data types in R

  • Numeric
    • Integers (whole numbers)
    • Doubles (numbers that can have a decimal part)
  • Character (string)
  • Logical (boolean) - TRUE or FALSE

Notice

  • A whole column is always the same type. If there is one character value in an otherwise numeric column, the whole column will be type character.
  • NA stands for a missing value
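
These types can be inspected directly in the Console. The sketch below uses base R functions that are not introduced in the slides (typeof(), class(), is.na(), mean()); the enrollment values are two numbers from the education dataset plus an NA added for illustration.

```r
# typeof() reports how R stores a value
typeof(9L)       # "integer": the L suffix asks for a whole number
typeof(9)        # "double": numbers are doubles by default
typeof("nine")   # "character"
typeof(TRUE)     # "logical"

# A vector (and so a data frame column) holds one type only:
# one character value turns the whole vector into character
mixed <- c(1, 2, "three")
class(mixed)     # "character"

# NA marks a missing value; is.na() finds it
enroll <- c(755, NA, 5427)
is.na(enroll)                 # FALSE TRUE FALSE
mean(enroll, na.rm = TRUE)    # average of the non-missing values
```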

In-class exercise: Create two new columns in our dataframe:

  • County Number = numeric version of the county number
  • Percent of students with Limited English Proficiency (LEP) = Number of students with LEP / Number of enrolled students
### Sometimes you want an id to be numeric instead of string
# Create a new numeric column for county number
ed18$county_num <- as.numeric(ed18$CONUM)

# Create a new numeric column - Percent of LEP students 
ed18$percent_lep <- ed18$dLEP/ed18$dEnroll_district

View the dataframe

NCESID | state_id | State | STATE_FIPS | NAME | County | CONUM | ENROLL | LRPP | SRPP | SLRPP | LR | SR | SLR | SRPP_cola | LRPP_cola | SLRPP_cola | dType | dUrbanicity | dOperational_schools | dEnroll_district | dLEP | dWhite | dBlack | dHispanic | dAsian_PI | dHawaiian_PI | dAmIndian_Aknative | d2plus_races | pctNonwhite | TPop | StPop | StPov | StPovRate | MHI | MPV | sd_area | student_per_sq_mile | sdType | dIEP | county_num | percent_lep
0100240 | AL-001 | Alabama | 1 | Autauga County School District | Autauga County | 01001 | 9094 | 2268.749 | 6269.958 | 8538.707 | 20632000 | 57019000 | 77651000 | 6593.016 | 2385.645 | 8978.661 | 1-Regular local school district that is NOT a component of a supervisory union | 41-Rural: Fringe | 15 | 9094 | 159 | 5952 | 2358 | 336 | 204 | 7 | 30 | 207 | 0.3455025 | 55869 | 9688 | 1376 | 0.1420314 | 58731 | 154500 | 604.4 | 15.046 | uni | NA | 1001 | 0.0174841
0100270 | AL-002 | Alabama | 1 | Baldwin County School District | Baldwin County | 01003 | 32267 | 6040.382 | 5160.443 | 11200.824 | 194905000 | 166512000 | 361417000 | 5420.633 | 6344.939 | 11765.572 | 1-Regular local school district that is NOT a component of a supervisory union | 41-Rural: Fringe | 45 | 32267 | 1309 | 22769 | 3716 | 3069 | 284 | 21 | 89 | 2319 | 0.2943565 | 223234 | 35515 | 4641 | 0.1306772 | 58320 | 197900 | 1609.9 | 20.043 | uni | NA | 1003 | 0.0405678

Notice

  • The new column is added to the end of the data frame
  • You can sort ascending and descending by any column

Create a new dataframe of New York school districts

### Create a new dataframe of New York school districts
newyork18 <- subset(ed18, State == "New York")

# Calculate how many districts there are in New York
ny_districts <- nrow(newyork18)
ny_districts
[1] 682


Notice

  • The value for ny_districts in your Environment has an L after it - this means it's an integer
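
The filtering and counting steps can be checked on a toy data frame. In this sketch the district labels are made up (only the State column matters), and the square-bracket version is an equivalent base R alternative to subset() that is not shown in the lecture.

```r
# Made-up district labels; only the State column matters for this example
toy <- data.frame(
  State = c("Alabama", "New York", "New York"),
  NAME  = c("District A", "District B", "District C"),
  stringsAsFactors = FALSE
)

# Keep only the rows where State equals "New York" (== tests equality)
toy_ny <- subset(toy, State == "New York")

# nrow() counts rows and returns an integer
n_ny <- nrow(toy_ny)
n_ny
is.integer(n_ny)

# The same filter written with square brackets: the row condition goes
# before the comma, and leaving the part after it empty keeps all columns
toy_ny2 <- toy[toy$State == "New York", ]
```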

Assignments

See assignments for week 1 in Canvas.

  • Assignment 1a: Readings
  • Assignment 1b: Idea from Research Journals
  • Assignment 1c: Technical: Import and explore data in R Studio