Outline

  • General rules of R Notebooks
  • R Markdown syntax
  • R Notebook formatting
  • NYC Open Data
  • In-class exercise

General rules of R Notebooks

An R Notebook is an interactive document, written in the R Markdown language, that can display output from R code. It includes formatted text and R code chunks that can be executed independently and interactively, with output visible immediately beneath the input.

When you create a new R Notebook (File -> New File -> R Notebook), you open an R Notebook template with simple instructions and examples.

  • Select the Source view to type in your text and code - I always work in Source mode

  • Select the Visual view to see what your Notebook will look like

  • Type text directly into the Notebook (we’ll discuss formatting later)

  • Insert R code into a Notebook in a chunk

    • Windows: Ctrl + Alt + I
    • macOS: Cmd + Option + I


Preview your document

Preview or Knit as you add elements to your Notebook to see the output in the Viewer pane

  • Preview quickly renders your code into a notebook and displays it in the Viewer
  • Knit renders your code into a notebook in the publication format and displays it in the Viewer
    • For mysterious reasons, the Preview will often convert to Knit (both are fine, but Knitting takes longer)
    • I like to Preview or Knit on Save so that my notebook automatically updates

Preview your code

To test your R code, click the green arrow within a code chunk to Run it. The output will display below.

R Markdown syntax

AKA how to format text in R Markdown

You can type directly into your R Notebook with regular text. All formatting uses the R Markdown syntax.

  • See the R Markdown cheat sheet for details and examples. These are the most common:

  • IMPORTANT: You must put a blank line after any header or it won’t register


R Notebook display options

As you have noticed, you don’t always want to display the output from your code. You can define how your code displays using chunk options in 2 ways:

  • globally: defines the default for every code chunk
  • by chunk: defines the display options for the chunk only

The most common chunk options are:

  • include = FALSE prevents code and results from appearing in the finished file. R Markdown still runs the code in the chunk, and the results can be used by other chunks.
  • echo = FALSE prevents the code, but not the results, from appearing in the finished file. This is a useful way to embed figures.
  • message = FALSE prevents messages that are generated by code from appearing in the finished file.
  • warning = FALSE prevents warnings that are generated by code from appearing in the finished file.
  • fig.cap = "..." adds a caption to graphical results.

See the R Markdown Reference Guide for a complete list of knitr chunk options.

Global display options

At the top of your R Notebook, under the title, insert a code chunk that sets global options with knitr::opts_chunk$set(). These define the default display options for all code chunks in the Notebook. For example, a typical setup chunk:


  • names the chunk “setup”

  • uses include=FALSE : do not display this code chunk in the output

  • echo = T : defines global default to show the code chunk

  • quietly = T : defines global default to suppress messages

  • message = F : defines global default to suppress messages (different and more robust method)

This code chunk should be alone - it should not include any other code
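
A minimal sketch of what that setup chunk could contain, reconstructed from the bullets above rather than copied from the original notebook. The chunk header is shown as a comment. The quietly setting is left out because it is not among the chunk options listed earlier in this section.

```r
# In the notebook, this code sits in a chunk whose header reads:
#   {r setup, include=FALSE}
# include=FALSE hides the setup chunk itself from the rendered notebook.

knitr::opts_chunk$set(
  echo = TRUE,     # default: show the code of every chunk
  message = FALSE  # default: hide messages that code generates
)

# Check the defaults now in effect
knitr::opts_chunk$get(c("echo", "message"))
```

Any chunk can still override these defaults in its own header.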


Code chunk display options

To change those global display options for one code chunk (like load_acs, which is very noisy), add chunk options within the {r} of that code chunk. For example, to display the output but not the code for a code chunk:


This chunk displays the following formatted table using the knitr package, but doesn’t display the code.
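
A minimal sketch of such a chunk. It assumes the built-in mtcars data as a stand-in for the real table, and the chunk header is shown as a comment:

```r
# In the notebook, this code sits in a chunk whose header reads:
#   {r, echo=FALSE}
# echo=FALSE hides these lines but keeps the table they produce.

library(knitr)

# mtcars ships with R; it stands in for whatever data the notebook uses.
example_table <- kable(
  head(mtcars[, c("mpg", "cyl", "hp")]),
  caption = "First six rows of the built-in mtcars data"
)
example_table
```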

YAML

You can also define pre-designed themes, add a table of contents, and much more in the title section, called the YAML header (originally Yet Another Markup Language, now "YAML Ain't Markup Language" = stupid coding joke). The formatting is very finicky: when you are following an example, make it look EXACTLY the same. It is complicated, but we’ll learn a few options.

See this chapter for more details on adjusting the html document in the title section.


  • Theme

    • add styles with pre-packaged themes
    • this notebook uses yeti, see here for more
  • Table of Contents

    • toc: true: create a table of contents
    • toc_depth: 3: create entries in the Table of Contents for Header 1 through Header 3
    • toc_float: true: toc sticks to the side so you can always see it



NYC Open Data

NYC Open Data is free public data published by New York City agencies and other partners.

https://opendata.cityofnewyork.us/


There is a vast amount of data. You can download data from NYC Open Data, or use the RSocrata package to import the data directly into R.

We’ll use an example from Boyan Kostadinov at City Tech.

In-class exercise:

The goal of this activity is to explore the 2021 DOE Middle School Directory data from the New York City Open Data Portal. This activity is an introduction to exploratory data analysis and visualizations using R and RStudio.

Install three new packages:

  • RSocrata: for loading the data from NYC Open Data
  • knitr: for printing tables
  • DT: for interactive tables in html format
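
One way to install them, sketched under the assumption that you want to skip packages you already have. The repos URL is the general CRAN cloud mirror, not something this course requires:

```r
# Packages named in the list above
pkgs <- c("RSocrata", "knitr", "DT")

# Keep only the packages that are not installed yet
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

# Install whatever is missing (does nothing if everything is present)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```

Run this once in the Console rather than inside the Notebook, so the packages are not reinstalled every time you Knit.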

Create a new R Notebook

  • name it nyc_middle_schools.Rmd
  • save it in main_data/scripts
  • load the tidyverse and the three new packages

Import middle school data

  • Go to the NYC Open Data Portal
  • Find the 2021 DOE Middle School Directory data
    • We can use the RSocrata package to load the data in the CSV format, directly from the API tab in the NYC Open Data Portal, using the unique identifier f6s7-vytj for the data.
    • Find the link for the data by clicking on the API button, and copying the link for the CSV version of the data

library(tidyverse)
library(RSocrata)
library(knitr)
library(DT)

# import the data directly into RStudio using url path
data <- read.socrata("https://data.cityofnewyork.us/resource/f6s7-vytj.csv")
## Warning in read.socrata("https://data.cityofnewyork.us/resource/f6s7-vytj.csv"):
## Dates and currency fields will be converted to character

Explore and process the data

Follow the instructions in Kostadinov’s R Notebook to select columns, create summary statistics of the number of math teachers, and correct some missing values.
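
The general pattern for those three steps can be sketched as follows. The data frame and its column names (name, borough, math_teachers) are made up for illustration; the real columns come from the imported data and Kostadinov’s notebook and may differ.

```r
library(dplyr)

# Hypothetical stand-in for the imported directory (not real school data)
schools <- tibble(
  name = c("School A", "School B", "School C", "School D"),
  borough = c("QUEENS", "QUEENS", "BROOKLYN", NA),
  math_teachers = c(5, NA, 8, 3)
)

summary_by_borough <- schools %>%
  # 1. keep only the columns needed for the analysis
  select(name, borough, math_teachers) %>%
  # 2. correct missing values: label schools with no borough recorded
  mutate(borough = if_else(is.na(borough), "UNKNOWN", borough)) %>%
  # 3. summary statistics per borough, ignoring missing teacher counts
  group_by(borough) %>%
  summarise(
    schools = n(),
    mean_math_teachers = mean(math_teachers, na.rm = TRUE)
  )

summary_by_borough
```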

Use this dataset to test out different ways to style and format an R Notebook

Create an R Notebook to share

  • Create a new R Notebook to share your middle school math teacher analysis.

    • Define global display options
    • Define display options for at least one code chunk
    • Use the kable or datatable functions to create formatted tables
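
A short sketch of the two table functions side by side. It assumes the built-in iris data in place of the middle school data, and the pageLength value is an arbitrary choice:

```r
library(knitr)
library(DT)

# iris ships with R and stands in for the middle school data here.
sample_rows <- head(iris, 10)

# Static table: works in any output format
static_table <- kable(
  sample_rows,
  digits = 1,
  caption = "Static table made with kable()"
)

# Interactive table: sortable and searchable, html output only
interactive_table <- datatable(sample_rows, options = list(pageLength = 5))

static_table
interactive_table
```

kable() is a reasonable default for short summary tables, and datatable() is more useful when readers need to search or sort many rows.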

Assignment 11a: NYC Open Data analysis

Explore NYC Open Data to see what data is available. Select a dataset to import via RSocrata and answer a question about Corona or your final project. Create an R Notebook to share your analysis. Include:

  • Define global display options
  • Define display options for at least one code chunk
  • Use the kable or datatable functions to create at least one formatted table
  • Create at least one plot or map

Some suggestions:

  • Active Tobacco Retailer Dealer Licenses:
    • import this dataset of all tobacco licenses in New York City
    • summarize by Borough or Neighborhood Tabulation Area to see how many tobacco licenses were granted in Corona vs other neighborhoods in the last year
  • Motor Vehicle Collisions - Crashes
    • import the dataset of all crashes in NYC
    • summarize by Borough or Zip Code to see how many crashes occurred in Corona vs other neighborhoods in the last year
    • filter the crashes to the Corona Zip codes, then use the Lat/Long to create a map of the crash locations in Corona (hint: use st_as_sf())
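
To expand on the st_as_sf() hint, here is a minimal sketch of turning Lat/Long columns into map-ready points. The three rows are made-up coordinates, and the column names latitude and longitude are an assumption, so check names() on your imported data:

```r
library(sf)

# Hypothetical crash records (not real data); one row has no location
crashes <- data.frame(
  collision_id = 1:3,
  latitude = c(40.7470, 40.7440, NA),
  longitude = c(-73.8620, -73.8580, NA)
)

# st_as_sf() stops on missing coordinates by default, so drop those rows first
has_location <- !is.na(crashes$latitude) & !is.na(crashes$longitude)
crashes_located <- crashes[has_location, ]

# coords takes x first (longitude), then y (latitude);
# crs = 4326 is WGS84, the usual system for plain Lat/Long values
crash_points <- st_as_sf(
  crashes_located,
  coords = c("longitude", "latitude"),
  crs = 4326
)

# Quick look at the point locations
plot(st_geometry(crash_points))
```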

Add links to both Notebooks on CANVAS: one for your in-class assignment and one for this analysis.



Assignment 11b: Final Project framing

Answer the “Beginning of the Project” framing questions for your final project:

  • What is the question I want to answer with this research?
  • What is the goal?
  • Who is the audience?
  • What will the audience expect the answer to be?
  • What data and specific research questions will answer the question?

Upload to the assignment on CANVAS.