Methods 1, Week 1

PGUD 5160 DUE Methods 1

URBAN DATA ANALYSIS, MAPPING AND VISUALIZATION Section A | Fall 2024

Sara Hodges (she/her/hers) | hodgess@newschool.edu

Class: Mondays, 9am - 11:40am

Building: Parsons 2 W 13th St

Room: 1108

TODAY

  • Introduction
  • Class Description
    • Learning Goals
    • Assignments
    • Projects
  • Introductions
  • LAB
    • Install R and R Studio
    • Intro to R

Learning Goals

  • Develop research methods to analyze the urban world that acknowledge and moderate bias
  • Define thoughtful data-driven research projects
  • Combine qualitative and quantitative sources and methods
  • Create compelling and truthful visualizations to accompany your research

Technical Learning Goals

  • Learn R for data analysis and visualization.
  • Use it to enhance your research and advocacy.
    • Write analysis R scripts that are easy to understand
      • Clear analysis
      • Well-documented
      • Reproducible
  • Produce simple, effective data visualizations that make your analysis easy to understand and use for advocacy

Class Structure

Lecture and/or Reading Discussion
Homework Assignment Questions
Lab

  • Technical skill development
  • Mostly R

Canvas

  • Between class discussion/Share interesting stuff
  • Homework help

Weekly Assignments

Each week:

  • Readings
    • Theory
    • Technical instruction
  • Technical
    • Practice skills learned in the lab
    • Data analysis to enrich current or previous readings

Ongoing:

  • Research Journal

Keep Up With Assignments

  • Assignments build upon each other
  • We go over assignments at the beginning of each class
  • There are resources to help when you struggle with assignments
    • in-class lab time
    • your classmates
    • me
  • If you make an attempt, I am a very generous grader
  • I will deduct 20% if the assignment is late

Final Project

Quantitative Research Project using R, on the topic of your choosing

  • Research Journal
  • Project Proposal
  • Publish results in a presentation, book, or e-book made with Quarto
  • Presentation on last day of class

Final Project Examples

Introductions - Sara

My favorite visualization tools

Group Introductions

  • Your name and preferred pronouns
  • The last state or country you lived in
  • When you moved to the New York City area
  • Any particular research interests that you hope to explore in this class?
  • Have you used R before?
  • Have you used any other data analysis or visualization tool?
  • Name one small thing that made you happy in the last 6 months

Intro to R and R Studio

R is a powerful programming language and software environment to handle data. You will use it to:

  • manipulate your data so that it is useful in answering research questions
  • learn things by exploring your data
  • learn things by performing simple analyses
  • explain what you have learned to others with tables, charts, and visualizations
  • make your analysis repeatable by you and others

Working with data



From R for Data Science, by Hadley Wickham & Garrett Grolemund

Terms/Definitions

  • R: a programming language and software environment for statistical computing and graphics
  • RStudio: an application that helps you write in R in a user-friendly way
  • R script: an executable document with instructions in R to complete a task.
  • function: a set of statements organized together to perform a specific task.
  • Base R: the functions that come standard with your R installation.
  • R package: a collection of functions created by the community that other R users can use.

R, R Studio, and CRAN

R is free and open source

  • anyone can create a package, though it has to be approved before it is published on CRAN

CRAN: software repository for R + gatekeepers for new packages

  • supported by the R Foundation
  • team of volunteers that maintain R and manage new packages

RStudio: the company (renamed Posit in 2022) that created the RStudio application

  • pays people to create new, comprehensive packages that people use a lot and trust

A few things

  • R scripts sometimes run slightly differently on different machines
    (different operating systems, R versions, package versions, etc.)
  • You can google R, try it! (“r import csv”)
  • Other people have probably had the same problem as you or asked the same question
  • Style matters! Make it readable for future you, and for your colleagues
    • Indent for readability
    • Include comments explaining what you are doing/thinking
    • Include sources in your script
  • Learn R shortcuts (cheatsheet)

AI

  • AI can be very helpful to learn R
  • It is not very helpful for providing the right code when you don’t understand what the code does
  • DO use it to explain what a function is or why something works
  • DO NOT use it to write your R code for you
    • it is often wrong
    • or worse, nearly right, which makes you spend a bunch of time troubleshooting its code rather than learning the language

Installation


Install R

Install RStudio

LET’S TAKE A BREAK!

Lecture Format: Quarto

I use Quarto to make slides and an e-book for this course.

  • It contains text and executable R code together

  • Text in a box is a code chunk - something you can execute in R

  • The output from the code chunk displays below the box

# this is a gray box
print("this is the output from the code chunk")
[1] "this is the output from the code chunk"
  • You can follow along with all of the lectures in R Studio on your computer.

R Studio Layout

The Console pane

The Console is where you can type code that executes immediately, and where you view the output.


Type into your console, and then press enter:

4 + 5
[1] 9
14 - 7
[1] 7

The Console Pane


Notice

  • Press return/enter on your keyboard to execute a command in the console
  • R keeps a history of your console commands
    • use up or down arrows to view previous commands

The Console Pane

Create new objects (variables):

variable_name <- value of the variable

<- is the ‘assignment operator’; it works like an equal sign

# create a variable to hold the value 9
calc1 <- 4 + 5

# create a variable to hold the value 7
calc2 <- 14 - 7

# add the two variables together
total <- calc1 + calc2

total 
[1] 16

The Console Pane

Notice

  • In R, create a new object with an assignment operator <-
    • Alt+- (Windows) / Option+- (Mac)
    • See all the keyboard shortcuts!
      • Alt+Shift+K (Windows) / Option+Shift+K (Mac)
  • A hash # tells R not to run the rest of that line
    • In R-speak, this is called “commenting out” the code
  • When you define an object, the console does not display the value
    • type the object name to return the current value
    • you can see all defined objects in the Environment pane
  • Object names must begin with a letter and contain only letters, numbers, _ (underscores), and . (periods).
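
One way to check the naming rule above is sketched below. It relies on make.names() from base R, which rewrites any string into a syntactically valid name, so a string that comes back unchanged was already valid. The candidate names are made up for this example.

```r
# Candidate object names; all are hypothetical examples
candidates <- c("calc1", "my_total", "total.2024", "2nd_total", "my total")

# make.names() returns a valid version of each string.
# If the result equals the input, the input was already a valid name.
is_valid <- make.names(candidates) == candidates

# Show each name next to the result of the check
data.frame(name = candidates, valid = is_valid)
```

The last two names fail the check: one starts with a number and the other contains a space.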

The Console Pane


calc3 <- 4 + calc2
calc3
[1] 11
# try this one too
# calc4 <- 4 + seven


Notice

  • You can perform math operations on numbers and on objects that hold numbers
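
If you run the commented-out calc4 line, R stops with an error because no object called seven exists. The sketch below, which assumes seven has not been defined in your session, uses tryCatch() from base R to capture that error message instead of stopping.

```r
calc2 <- 14 - 7

# 'seven' was never defined, so R cannot find it.
# tryCatch() runs the code and, if it fails, returns the error message as text.
error_message <- tryCatch(
  4 + seven,
  error = function(e) conditionMessage(e)
)
error_message

# Using an object that does exist works as expected
calc3 <- 4 + calc2
calc3
```

The message should say that the object 'seven' was not found, which is R's way of telling you to check spelling or define the object first.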

The Source pane

Use the Source pane to write scripts that save your work, or to open and run existing scripts.

In-class exercise: Create your first script

First, we will set up our files so that they are organized and everyone in the class has the same file structure.

  • Download the methods1 folder, which contains the file structure and all of the data that we will use for the first few weeks of class.
    • Right-click the methods1 folder to download it to your computer
  • Move the methods1 folder from Downloads to the location where you will keep your methods1 files (Choose wisely! It will be a major pain if you move this folder later)
    • Open two File Explorer (Windows) or Finder (Mac) windows
    • In one, navigate to your Downloads folder
      • Double-click on the methods1-xxxxx.zip file to unzip the files - you will now see a methods1 folder in your Downloads folder
    • In the other, navigate to the place where you want your methods1 work to live
    • Drag the methods1 folder from Downloads into that location
  • In R Studio, create a new script
    • File > New File > R Script
    • At the top, type “# this is my first script” (make sure you include the hash #)
    • Save the script in your methods1/part1/scripts folder as my_first_script.R
    • Create the same variables (calc1, calc2, etc.) in your script
    • Run the script by clicking the down arrow next to RUN on the top-right of your Source Window
  • Save your script

Notice

  • You can save your work easily with a script
  • In R, a hash (#) comments out a line, meaning the computer ignores it
    • Use # to explain your script as you go
    • You can comment before a line, or on the same line
  • There are lots of different ways to run your script
    • Place your cursor at the end of a line, then press Cmd+Return (Mac) / Ctrl+Enter (Windows)
    • Place your cursor at the end of a line, then click RUN
    • Highlight the code to run, then use the keyboard shortcut or click RUN
  • Cmd+S (Mac) / Ctrl+S (Windows) is the keyboard shortcut to save - do it a lot
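
The two comment placements mentioned above might look like this in a script. This is a small sketch; the numbers are the enrollment values from the first row of the dataset used later in this lecture.

```r
# Comment before a line: explain the step that follows
enroll <- 369

white_enroll <- 351  # comment on the same line: describe this one value

# Everything after a hash is ignored, so the next line does not run:
# enroll <- 0

# Subtract one object from the other
difference <- enroll - white_enroll
difference
```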

Projects

Projects are a good way to keep track of all of the files for a specific task or project. We’ll create projects for each class in this course.

In-class exercise: Create a project

  • Create a new project
    • File > New Project
    • Existing Directory > navigate to your part1 folder
    • Click Create Project

Notice

  • There is now a part1.Rproj file in your part1 folder
    • It keeps track of your file path and a few other things
  • Projects make it easy to use a shorter file path
  • Within a project, the Files window shows all of the files in your project

The Files Pane


There are lots of useful tabs in this pane

The Files Pane


The Files window is like File Explorer (Windows) or Finder (Mac)

Notice

You should be in your part1 folder to see your script



The Plots window displays the charts and maps you create

The Files Pane


The Packages window lists the packages you have installed and provides a user interface to search for other packages and install them.

Packages are collections of functions and datasets developed by the R community to expand the things you can do in R.

  • Some, like the tidyverse, have become the backbone of analysis in R.

In-class exercise: Install your first package

  • Let’s install the tidyverse. You can either:
    • click Install in the Packages pane and use the user interface to install the package, or
    • type into the Console:
      install.packages('tidyverse')
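
You only need to install a package once per computer. As a hedged sketch, you can check whether a package is already installed with requireNamespace() from base R before installing it again. The object names pkg and already_installed are chosen for this example.

```r
# The package we want to check
pkg <- "tidyverse"

# requireNamespace() returns TRUE if the package is installed and can be loaded
already_installed <- requireNamespace(pkg, quietly = TRUE)

if (already_installed) {
  message(pkg, " is already installed")
} else {
  # We only print a reminder here rather than installing automatically
  message("Run install.packages('", pkg, "') in the Console")
}
```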

The Files Pane

The Help window is where you learn about packages and functions.

Two ways to open documentation:

  • Help tab
    • In the Help Search bar, type readr and press enter
  • Console
    • type directly into the console:
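    • ??readr
      • searches the installed documentation for the term readr
    • ?read_csv
      • opens the documentation for a specific function (its package must be loaded first)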
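
The Console help shortcuts have longer function forms, sketched below with base R only. In the Console you would normally just type ?mean or ??csv; here the results are stored in objects (names made up for this example) so that the script runs without opening the Help tab.

```r
# help("mean") is the long form of ?mean:
# it finds the help page for one specific function
mean_help <- help("mean")

# help.search("csv") is the long form of ??csv:
# it searches all installed documentation for a term
csv_search <- help.search("csv")

# Typing an object's name prints it, which displays the help page or search results
# mean_help
# csv_search
```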

The Environment pane


The Environment shows all of the objects that you have in your workspace



If you are following along, you should have at least 4 objects in your Environment.

In-class exercise: Import a CSV

  • Open a new script
  • File > New File > R Script
  • Save it in your methods1/part1/scripts folder as explore_student_demographics.R
  • Load the tidyverse package into your environment
    • add this at the top of your script:
## Load packages necessary for this script
library(tidyverse)

Now we’ll import our first dataset into R using the read_csv function:

## import the education dataset for 2022 using read_csv from the readr package of the tidyverse
ed22 <- read_csv("data/raw/school_district_demographics_2022.csv")

Notice

  • You installed tidyverse on your computer earlier, but you can’t use it until you load it into your environment
    • you load all of the packages necessary for your script at the top of the script
  • The table is listed in your Environment
    • To view details about the data, click the arrow next to the object name
  • Notice the messages in the Console
    • They provide information about the table and the process of importing it
    • The data type for each column was automatically determined
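
To see how the column types are determined, here is a small sketch that reads two rows typed directly into the script instead of a file. The values are copied from the first two rows of the dataset; wrapping the text in I() assumes readr version 2.0 or later. In this tiny sample district_id looks like a number, so readr guesses a numeric type unless told otherwise.

```r
library(readr)

# Two rows copied from the start of the dataset, typed as text
csv_text <- "district_id,district,enroll,pct_bipoc
1700105,A-C CENTRAL CUSD 262,369,0.049
2700106,A.C.G.C. PUBLIC SCHOOL DISTRICT,889,0.134
"

# I() tells read_csv the string is the data itself, not a file path
guessed <- read_csv(I(csv_text), show_col_types = FALSE)

# spec() lists the column types that readr guessed
spec(guessed)

# You can override a guess for one column with col_types
as_text <- read_csv(
  I(csv_text),
  col_types = cols(district_id = col_character()),
  show_col_types = FALSE
)
class(as_text$district_id)
```

Storing an ID as text is a common choice because IDs are labels, not quantities you would add or average.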

Data frames

Data tables are called dataframes in R.

  • When you read a table into your R Environment, any changes that you make to the dataframe exist only in R. They are not saved to your computer until you save the dataframe to a file (in R we say write it out).
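
A minimal sketch of writing a dataframe out, assuming the readr package is installed. The dataframe, the new column name, and the output file name are made up for this example, and the file goes to a temporary folder rather than your methods1 folder.

```r
library(readr)

# A tiny dataframe built by hand from two rows of the dataset
example_df <- data.frame(
  district = c("A-C CENTRAL CUSD 262", "A.C.G.C. PUBLIC SCHOOL DISTRICT"),
  enroll = c(369, 889)
)

# This change exists only in R's memory
example_df$enroll_hundreds <- example_df$enroll / 100

# Write the dataframe out to a CSV file in a temporary folder
out_path <- file.path(tempdir(), "example_districts.csv")
write_csv(example_df, out_path)

# Confirm the file now exists on your computer
file.exists(out_path)
```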

Let’s explore our first data frame by typing some functions in our script

## list all of the column names in the console
names(ed22)
 [1] "district_id"          "district"             "state"               
 [4] "postal"               "county"               "conum"               
 [7] "enroll"               "native_enroll"        "aapi_enroll"         
[10] "latinx_enroll"        "black_enroll"         "white_enroll"        
[13] "hawpi_enroll"         "two_plus_race_enroll" "pct_bipoc"           
[16] "fte"                 


# print information about each column 
glimpse(ed22)
Rows: 13,025
Columns: 16
$ district_id          <chr> "1700105", "2700106", "4500690", "5500030", "4807…
$ district             <chr> "A-C CENTRAL CUSD 262", "A.C.G.C. PUBLIC SCHOOL D…
$ state                <chr> "Illinois", "Minnesota", "South Carolina", "Wisco…
$ postal               <chr> "IL", "MN", "SC", "WI", "TX", "CA", "ID", "MS", "…
$ county               <chr> "Cass County", "Meeker County", "Abbeville County…
$ conum                <chr> "17017", "27093", "45001", "55019", "48217", "060…
$ enroll               <dbl> 369, 889, 2946, 772, 283, 18889, 690, 1043, 3262,…
$ native_enroll        <dbl> NA, 4, 3, 1, 0, 41, 5, 0, 75, 267, 0, 6, 37, 2, 1…
$ aapi_enroll          <dbl> NA, 6, 11, NA, 0, 6641, 1, 9, 52, 191, 1, 4, 186,…
$ latinx_enroll        <dbl> 6, 72, 48, 476, 40, 8583, 420, 0, 1152, 334, 431,…
$ black_enroll         <dbl> 4, 1, 984, 8, 4, 1414, 6, 1005, 34, 100, 14, 6, 1…
$ white_enroll         <dbl> 351, 770, 1831, 278, 234, 979, 245, 13, 1695, 326…
$ hawpi_enroll         <dbl> NA, NA, 2, NA, 0, 98, 1, 0, 8, 19, 0, NA, 30, NA,…
$ two_plus_race_enroll <dbl> 8, 36, 67, 9, 5, 1133, 12, 16, 246, 246, 11, 57, …
$ pct_bipoc            <dbl> 0.049, 0.134, 0.378, 0.640, 0.173, 0.948, 0.645, …
$ fte                  <dbl> 41.50, 68.18, 222.87, 58.97, 23.56, 842.75, 50.11…

View the dataframe by clicking on it in the Environment pane or typing View(ed22) in the Console.

| district_id | district | state | postal | county | conum | enroll | native_enroll | aapi_enroll | latinx_enroll | black_enroll | white_enroll | hawpi_enroll | two_plus_race_enroll | pct_bipoc | fte |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1700105 | A-C CENTRAL CUSD 262 | Illinois | IL | Cass County | 17017 | 369 | NA | NA | 6 | 4 | 351 | NA | 8 | 0.049 | 41.50 |
| 2700106 | A.C.G.C. PUBLIC SCHOOL DISTRICT | Minnesota | MN | Meeker County | 27093 | 889 | 4 | 6 | 72 | 1 | 770 | NA | 36 | 0.134 | 68.18 |

Viewing a dataframe

Notice

When you view the dataframe you can:

  • Search - type within the white box with a magnifying glass to view rows with your search terms in any column
  • Filter - Click the Filter button to view rows by filtering one column
  • Sort - Click on any column to sort ascending or descending
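
The same search, filter, and sort actions can also be written as code. The sketch below assumes the dplyr package (part of the tidyverse) and uses a two-row table built by hand from the rows shown above, so the object names are made up for this example.

```r
library(dplyr)

# A tiny table built from two rows of the dataset
districts <- tibble(
  district = c("A-C CENTRAL CUSD 262", "A.C.G.C. PUBLIC SCHOOL DISTRICT"),
  state = c("Illinois", "Minnesota"),
  enroll = c(369, 889)
)

# Filter: keep the rows that match a condition in one column
illinois <- filter(districts, state == "Illinois")

# Search within text: grepl() is TRUE where the pattern appears in the name
has_central <- filter(districts, grepl("CENTRAL", district))

# Sort: arrange() orders the rows; desc() sorts from largest to smallest
by_enroll <- arrange(districts, desc(enroll))
by_enroll
```

Unlike clicking in the viewer, these steps are saved in your script, so you and others can repeat them later.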

In-class exercise: Explore data and metadata

You should always look at the metadata (information about your dataset) so that you understand what you’re looking at and the limitations of the data.

Import data/raw/school_district_demographics_metadata.xlsx to see the definitions of each column.

## import the metadata for the education dataset using read_excel from the readxl package
ed22_meta <- readxl::read_excel("data/raw/school_district_demographics_metadata.xlsx")
  • What does fte mean and what does that variable count?
  • What is the variable conum?

Now look at the ed22 dataframe to answer some questions by filtering and sorting the dataframe.

  • Which district has the second-highest enrollment?
  • How many districts have “Central” in their name?

Assignments

See assignments for week 1 in Canvas.

  • Assignment 1a: Readings
  • Assignment 1b: Idea from Research Journals
  • Assignment 1c: Technical: Import and explore data in R Studio