Methods 1, Week 11

Outline

  • Course schedule

  • Homework Questions

  • Methods

  • R Notebooks

  • Interactivity

  • Assignment questions and overview

  • Homework

Course schedule

  • Class 11: Intro to R Notebooks / Interactive Plots
  • Class 12: R Notebooks / NYC Open Data
  • Class 13: Final Project Lab 1
  • Class 14: Final Project Lab 2
  • Class 15: Final Project Presentations (December 11th)
  • Wednesday, December 13th 11:59pm: Final Project due

Methods Section


The methods section for a data-driven research project should describe:

  • the purpose of the research
  • data sources used
  • equations and/or methods used to calculate results
  • justification of the methods
    • theoretical justification or other research that has used similar methods
  • how the results were analyzed and interpreted

Methods Examples

Dividing Lines: full and rather complicated methods section

Housing Affordability: Census-based, simple methods section

R Markdown and R Notebooks


R Markdown is a document format that combines Markdown text with R code. Knitting an R Markdown file runs the R code and produces a document (such as an HTML page) that readers can open and view without R or RStudio.

An R Notebook is an R Markdown document with chunks that can be executed independently and interactively, with output visible immediately beneath the input.

They are excellent tools for collaborating and for publishing results.

R Notebook Example

R Notebook of City Educational Attainment scatterplot

R Notebook

Create an R Notebook

An R Notebook is a document written in R Markdown that can display output from R code. It combines formatted text with R code chunks that can be executed independently and interactively, with output visible immediately beneath the input.

To create a new R Notebook: File -> New File -> R Notebook

  • An R Notebook template opens with simple instructions and examples.

Text and Code in an R Notebook

  • Select the Source view to type in your text and code (I always work in Source mode)

  • Select the Visual view to see what your Notebook will look like

  • Type text directly into the Notebook (we’ll discuss formatting later)

  • Insert R code into a Notebook in a code chunk:

    • Windows: Ctrl + Alt + I
    • macOS: Cmd + Option + I

Preview your document

Preview or Knit as you add elements to your Notebook to see the output in the Viewer pane

  • Preview renders your text and any code that has already been run
  • Knit runs your code, renders all text and code, and displays your Notebook in the Viewer
    • For mysterious reasons, the Preview button will often change to Knit (both are fine, but knitting takes longer)
    • I like to Preview or Knit on Save so that my notebook automatically updates

Format your text with R Markdown

You can type regular text directly into your R Notebook. All formatting uses R Markdown syntax.

Text formatting practice

  • Change the title of your R Notebook to Example Notebook
  • Replace the text between the title section and the first R code chunk with the following:
  • Click Preview to see what your Notebook looks like so far

Preview your code

To test your R code, click the green arrow within a code chunk to Run it. The output will display below.

  • RUN the code chunk that is in the R Notebook template
  • Click Preview to see what your Notebook looks like so far

In-class exercise overview

Create an R Notebook to publish the Percent West Indian ancestry map

  • Step by step, copy and paste your Percent West Indian ancestry script into code chunks in your R Notebook
    • Add a methods section
    • Add text to explain the map
  • Adjust the code so it displays neatly in the R Notebook
  • Publish the R Notebook to RPubs

Detailed Instructions, part 1: prep

  • Open a new R Notebook file
  • Save it in part2/scripts as west_indian_map_notebook.Rmd
  • First, carefully read the instructions in the R Notebook
    • You will delete the instructions before you publish, but you can keep them to refer to as you work if you wish
  • Open your West Indian ancestry map script; you will add it to the Notebook piece by piece, following the instructions on the next slides
    • you want the script to be neat and easy for someone else to follow
    • you may have to clean it up as you add it to a notebook

Detailed Instructions, part 2: notebook

  • Change the title to West Indian Ancestry in Brooklyn
    • be sure you don’t change ANYTHING other than the title name in this section
    • R Notebooks are VERY picky about this title section
  • Below the title section, type a short description of the analysis (1-3 sentences)
  • In an R code chunk, insert the part of your script that loads your packages
  • Click Preview to see what your Notebook looks like so far (you will likely see conflict messages)

Detailed Instructions, part 3: suppress messages

To hide the console messages in your R Notebook, add 2 options to the header of your code chunk, for example {r, message = FALSE, results = 'hide'}:

  • message = FALSE: hides messages
  • results = 'hide': hides text output

See the R Markdown Reference Guide for a complete list of knitr chunk options.
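If typing message = FALSE in every chunk gets repetitive, knitr can also set chunk defaults once for the whole notebook with knitr::opts_chunk$set(). A minimal sketch, assuming you place it in a first chunk whose header is {r setup, include = FALSE} so the setup code itself stays hidden; warning = FALSE is an extra choice beyond the steps above. results = 'hide' is deliberately left out, because setting it for every chunk would also hide printed output, such as summary tables, that you want readers to see.

```r
# Setup chunk: set defaults that apply to every chunk after this one.
# Any chunk can still override them in its own header.
knitr::opts_chunk$set(
  message = FALSE,  # hide messages, e.g. package startup and conflict notes
  warning = FALSE   # also hide warnings (optional)
)
```

Keep adding results = 'hide' to individual chunks that print output you don't want in the published notebook.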

Detailed Instructions, part 4: methods section

  • After the code chunk, type RETURN twice to create 2 blank lines, then type “### Methods”
    • The 3 hashes will format the word “Methods” as a header
  • Add a few sentences that explain the methods for getting the data (see example below)

Detailed Instructions, part 5: import data

  • In a code chunk, insert the code from your script to import the ancestry data and process it
    • include the options to hide the messages and text output
    • leave out the load_variables() call; it isn't needed to run the code and would be a distraction

Detailed Instructions, part 6: map it

  • After the code chunk, type RETURN twice to create 2 blank lines, then type “### Results”
  • Insert the code to create a map of census tracts with the Brooklyn boundary
  • Write a short description of what the visualization shows.
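A minimal sketch of a choropleth map with a boundary drawn on top. So that it runs without a Census API key, it uses the North Carolina counties shapefile that ships with the sf package as a stand-in for census tracts, maps the percent of 1974 births that were nonwhite (100 * NWBIR74 / BIR74), and uses st_union() to dissolve the counties into a single outline standing in for the Brooklyn boundary. The object names (nc, nc_boundary, nc_map) are hypothetical; in your notebook, swap in your own tract data, variable and boundary.

```r
library(sf)
library(ggplot2)

# Stand-in data: North Carolina counties, shipped with the sf package
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Variable to map: percent of 1974 births that were nonwhite
nc$pct_nonwhite <- 100 * nc$NWBIR74 / nc$BIR74

# Stand-in for the Brooklyn boundary: dissolve all polygons into one outline
nc_boundary <- st_as_sf(st_union(nc))

nc_map <- ggplot() +
  # Layer 1: polygons filled by the variable, no outlines
  geom_sf(data = nc, aes(fill = pct_nonwhite), color = NA) +
  # Layer 2: boundary drawn on top, with no fill so layer 1 shows through
  geom_sf(data = nc_boundary, fill = NA, color = "black") +
  scale_fill_viridis_c(name = "% nonwhite births") +
  theme_void()

nc_map
```

Layer order matters: the boundary is added last so it is drawn over the filled polygons.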

Detailed Instructions, part 7: add a tooltip for interactivity

Using the plotly package, you can add a tooltip to most ggplots to make them interactive.

  • A tooltip is a text box that opens when you hover over a data point in your ggplot

Install the plotly package and add it to your script in the packages code chunk:
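A sketch of the packages chunk after adding plotly. ggplot2 and sf are assumptions standing in for whatever your script already loads (library(tidyverse) also loads ggplot2). The install line is commented out because it only needs to run once, from the Console; left in the notebook, it would reinstall plotly every time you knit. Loading plotly prints notes about functions it masks, which is exactly what message = FALSE hides.

```r
# Run once in the Console, not in the notebook:
# install.packages("plotly")

# Packages chunk: load plotly alongside the packages your script already uses
library(ggplot2)
library(sf)
library(plotly)
```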

See the next slides for how to add the tooltip to your map and display it with ggplotly.

Detailed Instructions, part 7.2: add a tooltip for interactivity

  • Give your ggplot a name (I named it west_indian_map)
  • Move your data definition and aes() to the geom_sf() function in your ggplot
  • Then, within the aes(), define the tooltip text in a new text parameter
    • Notice I am using the paste0() function to construct it
    • <br> inserts a line break, so the text after it starts on a new line
  • I also changed the outline color to “transparent”

Detailed Instructions, part 7.3: add a tooltip for interactivity

Then, display it with the ggplotly() function:
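Putting steps 7.2 and 7.3 together, here is a sketch under the same stand-in assumptions: sf's North Carolina counties replace your tracts, and nc_map / nc_plotly are hypothetical names for your own objects (for example, west_indian_map). The data and aes() sit inside geom_sf(), the tooltip text is built with paste0() and <br>, outlines are transparent, and ggplotly(..., tooltip = "text") limits the hover box to the text you built.

```r
library(sf)
library(ggplot2)
library(plotly)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc$pct_nonwhite <- 100 * nc$NWBIR74 / nc$BIR74

# Name the ggplot so it can be passed to ggplotly()
nc_map <- ggplot() +
  geom_sf(
    data = nc,  # data and aes() live inside geom_sf()
    aes(
      fill = pct_nonwhite,
      # Tooltip text; <br> starts a new line in the hover box.
      # geom_sf() has no "text" aesthetic, so ggplot2 warns and ignores it,
      # but ggplotly() uses it for the tooltip.
      text = paste0(NAME, " County<br>",
                    round(pct_nonwhite, 1), "% nonwhite births")
    ),
    color = "transparent"  # no polygon outlines
  ) +
  scale_fill_viridis_c(name = "% nonwhite births") +
  theme_void()

# tooltip = "text" shows only the text aesthetic when hovering
nc_plotly <- ggplotly(nc_map, tooltip = "text")
nc_plotly
```

Without tooltip = "text", ggplotly() would also list the fill value and other aesthetics in the hover box.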

Detailed Instructions, part 8: publish it

  • Preview your R Notebook and edit your text and code until it looks neat and readable
  • To publish, click the down arrow next to Preview or Knit
    • select Knit to HTML
  • In the Viewer, click Publish
    • Select RPubs in the pop-up window
  • It will prompt you to create a free account (choose an account name you will want to share)
    • Follow the instructions and create a slug and a descriptive name for your published notebook
  • Copy the link to your published Notebook and paste it in Assignment 11a

Assignments

Assignment 11a

  • Complete the in-class assignment. Submit the URL of your published R Notebook.

Assignment 11b

  • Publish an R Notebook with a census tract-level analysis of race in New York City described on the next slide.

OR

  • Publish an R Notebook with a housing affordability analysis of one state, one city or one borough of New York City.

OR

  • Publish an R Notebook with an analysis of your choice

Assignment 11b description, option 1

  • Publish an R Notebook with a census tract-level analysis of race in New York City from the decennial census. Download census data to create the following variables at the census-tract level:
    • Percent Hispanic or Latino
    • Percent Black-alone, not Hispanic or Latino
    • Percent Asian-alone, not Hispanic or Latino
    • Percent White-alone, not Hispanic or Latino

In the Notebook, include:

  • A short description of your analysis
  • The code to process the data
  • A methods section
  • A results section with:
    • A summary table that displays the percent of each racial category by county
    • A map of each variable
      • Use a different color gradient for each variable
    • A description of the results of your analysis
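For the county summary table, one step is easy to get wrong: a county's percent should be computed from summed counts, not by averaging tract percents, because tracts differ in population. The sketch below uses made-up toy counts and hypothetical column names (county, total, hispanic), not real census data; with real data these counts would come from your decennial census download (for example, via tidycensus::get_decennial()).

```r
library(dplyr)

# Hypothetical toy counts, made up for illustration (not real census data):
# two tracts in each of two counties
tracts <- tibble(
  county   = c("Bronx", "Bronx", "Kings", "Kings"),
  total    = c(1000, 3000, 2000, 2000),
  hispanic = c(500, 600, 200, 600)
)

# Tract-level percent: this is the variable you would map
tracts <- tracts %>%
  mutate(pct_hispanic = 100 * hispanic / total)

# County-level summary
county_summary <- tracts %>%
  group_by(county) %>%
  summarise(
    # Averaging tract percents ignores that tracts differ in population
    avg_tract_pct = mean(pct_hispanic),
    # Better: sum the counts first, then divide
    hispanic      = sum(hispanic),
    total         = sum(total),
    pct_hispanic  = 100 * hispanic / total
  )

county_summary
```

For the toy "Bronx" rows the two approaches differ (35% vs. 27.5%) because the 3,000-person tract counts no more than the 1,000-person tract in the simple average; the same sum-then-divide pattern applies to the Black, Asian and White variables.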

Upload a link to your published R Notebook to Canvas

Assignment 11b description, option 2

  • Publish an R Notebook with a census tract-level housing affordability analysis of one state, one city or one borough of New York City, similar to Assignment 8b. Calculate the affordability difference for every census tract to determine which parts of the state, city or borough are affordable.

In the Notebook, include:

  • A short description of your analysis
  • The code to process the data
  • A methods section
  • A results section with:
    • A map of census tracts colored by whether the median house is affordable to the median household
    • A summary table that compares at least 3 variables in tracts where the median house is affordable and in tracts where it is not
      • ex: Median Household Income, Percent Hispanic, Percent BIPOC
    • A description of the results of your analysis

Upload a link to your published R Notebook to Canvas