Methods 1, Week 10

Outline

  • Course schedule

  • Research Project Framing and Visual Communication

  • Methods

  • R Notebooks

  • Assignment questions and overview

  • Homework

Course schedule

  • Class 10: Research Project Framing / Methods Sections
  • Class 11: R Notebooks / NYC Open Data
  • Class 12: Interactive Plots
  • Class 13: Final Project Lab 1
  • Class 14: Final Project Lab 2
  • Class 15: Final Project Presentations

Research project examples and frameworks


Methods Section


The methods section for a data-driven research project should describe:

  • the purpose of the research
  • data sources used
  • equations and/or methods used to calculate results
  • justification of the methods
    • theoretical justification or other research that has used similar methods
  • how the results were analyzed and interpreted

Methods Examples

Fractured: full and rather complicated methods section

Housing Affordability: Census-based, simple methods section

R Markdown and R Notebooks


R Markdown is a document format that combines formatted text with R code. The rendered output (for example, an HTML file) shows the code and its results, and does not require R or RStudio to open and view.

An R Notebook is an R Markdown document with chunks that can be executed independently and interactively, with output visible immediately beneath the input.

Both are excellent tools for collaborating and for publishing results.

R Notebook Example

R Notebook of City Educational Attainment scatterplot

R Notebook

Create an R Notebook

  • File > New File > R Notebook
  • You can add normal text for humans to read in the white areas
  • You can add R code in a code chunk
    • the code and the output will be displayed below

In-class exercise overview

Create an R Notebook to publish an analysis and visualization you have already completed.

  • Select a script that you want to share
  • Create an R Notebook that contains the code, with extra description for your reader
  • Publish the R Notebook to RPubs
  • Post your notebook in a discussion on Canvas by Wednesday, November 9
  • See detailed instructions in the next 3 slides

Detailed Instructions, part 1: prep

  • First, carefully read the instructions in the R Notebook
    • You will delete those instructions before you publish, but you can keep them to refer to as you work
  • Pick your favorite recent script that you want to try out as an R Notebook
    • you want the script to be neat and easy for someone else to follow
    • you may have to clean it up as you add it to a notebook
  • Save the R Notebook script in main_data/scripts
    • name it something that describes your analysis!

Detailed Instructions, part 2: notebook

  • Change the title
    • be sure you don’t change ANYTHING other than the title name in this section
    • R Notebooks are VERY picky about this title section
  • Below the title section, type a short description of your analysis (2-3 sentences)
  • Click Preview to see what your Notebook looks like so far
  • Insert the data processing portion of your script in a code chunk
  • After the code chunk, type “### OUTPUT”
    • The 3 hashes will format the word “OUTPUT” as a header
  • In a code chunk, insert the code to create the summary table or visualization from your script
  • After the code chunk, write a short description of what the visualization shows
  • Preview your R Notebook and edit your text and code until it looks neat and readable
  • To Publish, you will Knit to HTML, and then Publish
    • You will be prompted to create a free RPubs account when you publish
  • Go to the next slide to see the instructions for publishing your R Notebook
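
The steps above can be sketched as a notebook skeleton. This is a minimal illustration, not the course template: the title, the file name my_analysis.Rmd, and all placeholder text are assumptions. The R code below writes the skeleton to a temporary file and prints it, so you can compare the layout with your own notebook.

```r
# Hypothetical notebook skeleton that follows the steps on this slide.
# Title, text, and chunk contents are placeholders, not course material.
notebook_lines <- c(
  "---",
  'title: "My Analysis Title"',        # change only the text inside the quotes
  "output: html_notebook",
  "---",
  "",
  "A 2-3 sentence description of the analysis goes here.",
  "",
  "```{r}",
  "# data processing code from your script",
  "```",
  "",
  "### OUTPUT",
  "",
  "```{r}",
  "# code that creates the summary table or visualization",
  "```",
  "",
  "A short description of what the visualization shows."
)

# Write the skeleton to a temporary file (hypothetical file name)
notebook_path <- file.path(tempdir(), "my_analysis.Rmd")
writeLines(notebook_lines, notebook_path)

# Print the file contents to the console
cat(readLines(notebook_path), sep = "\n")
```

The four lines between the "---" markers are the title section; everything below them is either plain text or a code chunk.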

Detailed Instructions, part 3: publish

  • Click the down arrow next to the Preview button, select Knit to HTML
  • In the Viewer, click Publish
    • Select RPubs in the pop-up window
  • It will prompt you to create a free account (create an account name you will want to share)
    • Follow the instructions and create a slug and a descriptive name for your published notebook
  • Copy the link to your published Notebook and paste it in Assignment 10a by Wednesday, November 9
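
If the Knit button is unavailable, rendering from the R console is roughly comparable to Knit to HTML. The sketch below assumes the rmarkdown package and pandoc are installed (RStudio bundles pandoc); the stand-in file render_demo.Rmd is hypothetical. Publishing to RPubs is still done with the Publish button in RStudio.

```r
# Write a tiny stand-in notebook so this sketch is self-contained.
rmd_path <- file.path(tempdir(), "render_demo.Rmd")
writeLines(c(
  "---",
  'title: "Render demo"',
  "output: html_notebook",
  "---",
  "",
  "Placeholder text.",
  "",
  "```{r}",
  "summary(c(1, 2, 3))",
  "```"
), rmd_path)

# Render only if the rmarkdown package and pandoc are both available.
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_render) {
  # Render the file to a standalone HTML document
  html_path <- rmarkdown::render(
    rmd_path,
    output_format = "html_document",
    quiet = TRUE
  )
  message("Rendered to: ", html_path)
} else {
  message("rmarkdown or pandoc not found; use the Knit button in RStudio instead.")
}
```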

Assignments

Assignment 10a

  • Complete the in-class assignment by EOD Wednesday, November 9
  • Review a classmate's R Notebook and reply to their post with any questions, comments, or corrections
    • Be kind and inquisitive! Start with something that is interesting about the analysis. Ask at least one question. If you see an error, politely point it out.

Assignment 10b

  • Publish an R Notebook with a census tract-level analysis of race in New York City described on the next slide.

OR

  • Publish an R Notebook with a housing affordability analysis of one state, one city or one borough of New York City.

Assignment 10b description, option 1

  • Publish an R Notebook with a census tract-level analysis of race in New York City from the decennial census. Download census data to create the following variables at the census-tract level:
    • Percent Hispanic or Latino
    • Percent Black-alone, not Hispanic or Latino
    • Percent Asian-alone, not Hispanic or Latino
    • Percent White-alone, not Hispanic or Latino
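
Each percent variable is the group count divided by the tract's total population, times 100. A minimal sketch with a toy data frame follows; every column name, county label, and number is invented for illustration and does not correspond to real census variable codes. The county summary sums the counts first and then computes percents, which avoids averaging tract percents that have different population sizes.

```r
# Toy tract-level counts; all column names and numbers are made up.
tracts <- data.frame(
  tract     = c("A", "B", "C", "D"),
  county    = c("Kings", "Kings", "Queens", "Queens"),
  total_pop = c(1000, 2000, 1500, 500),
  hispanic  = c(200, 500, 600, 100),
  black_nh  = c(300, 400, 150, 50),   # Black alone, not Hispanic or Latino
  asian_nh  = c(100, 300, 450, 150),  # Asian alone, not Hispanic or Latino
  white_nh  = c(400, 800, 300, 200)   # White alone, not Hispanic or Latino
)

groups <- c("hispanic", "black_nh", "asian_nh", "white_nh")

# Tract-level percents: group count / total population * 100
for (g in groups) {
  tracts[[paste0("pct_", g)]] <- 100 * tracts[[g]] / tracts$total_pop
}

# County summary: sum the counts by county, then compute percents
county <- aggregate(tracts[c("total_pop", groups)],
                    by = list(county = tracts$county), FUN = sum)
for (g in groups) {
  county[[paste0("pct_", g)]] <- round(100 * county[[g]] / county$total_pop, 1)
}

print(county[c("county", paste0("pct_", groups))])
```

Tracts with a total population of zero would produce NaN here, so real data may need a filter before the division.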

In the Notebook, include:

  • A short description of your analysis
  • The code to process the data
  • A methods section
  • A results section with:
    • A summary table that displays the percent of each racial category by county
    • A map of each variable
      • Use a different color gradient for each variable
    • A description of the results of your analysis

Upload a link to your published R Notebook to Canvas

Assignment 10b description, option 2

  • Publish an R Notebook with a census tract-level housing affordability analysis of one state, one city or one borough of New York City, similar to Assignment 7b. Calculate the affordability difference for every census tract to determine what parts of the state, city or borough are affordable.

In the Notebook, include:

  • A short description of your analysis
  • The code to process the data
  • A methods section
  • A results section with:
    • A map of census tracts colored by whether the median house is affordable to the median household
    • A summary table that compares at least 3 variables between the tracts that are affordable and those that are not
      • ex: Median Household Income, Percent Hispanic, Percent BIPOC
    • A description of the results of your analysis
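
The shape of this calculation can be sketched as follows. The affordability rule here is an assumption made only for illustration (a home counts as affordable when its value is at most 3 times household income); the actual rule is the one defined in Assignment 7b. All column names and values are invented.

```r
# Toy tract-level data; all names and values are made up.
tracts <- data.frame(
  tract             = c("A", "B", "C", "D"),
  median_hh_income  = c(50000, 120000, 80000, 40000),
  median_home_value = c(300000, 300000, 200000, 400000),
  pct_hispanic      = c(40, 10, 25, 55)
)

# ASSUMPTION: placeholder multiplier; replace with the rule from Assignment 7b.
price_to_income <- 3

# Price the median household could afford under the assumed rule
tracts$affordable_price <- price_to_income * tracts$median_hh_income

# Affordability difference: positive means the median home is within reach
tracts$afford_diff <- tracts$affordable_price - tracts$median_home_value
tracts$affordable  <- tracts$afford_diff >= 0

# Summary table: mean of three variables in affordable vs. unaffordable tracts
summary_tbl <- aggregate(
  cbind(median_hh_income, median_home_value, pct_hispanic) ~ affordable,
  data = tracts, FUN = mean
)
print(summary_tbl)
```

The table averages tract-level medians, which is a simple comparison rather than a true median for each group; the logical affordable column is also the variable a map would be colored by.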

Upload a link to your published R Notebook to Canvas