Outline
- Homework
- RMarkdown syntax
- R Notebook formatting
- NYC Open Data
- In-class exercise
R Notebook display options
As you have noticed, you don’t always want to display the output from
your code. You can define how your code displays using chunk options in
2 ways:
- globally: defines the default for every code chunk
- by chunk: defines the display options for that chunk only
The most common chunk options are:
- include = FALSE: prevents code and results from appearing in the
finished file. R Markdown still runs the code in the chunk, and the
results can be used by other chunks.
- echo = FALSE: prevents code, but not the results, from appearing in
the finished file. This is a useful way to embed figures.
- message = FALSE: prevents messages that are generated by code from
appearing in the finished file.
- warning = FALSE: prevents warnings that are generated by code from
appearing in the finished file.
- fig.cap = "...": adds a caption to graphical results.
See the R Markdown Reference Guide for a complete list of
knitr chunk options.
Global display options
At the top of your R Notebook, under the title, insert a code chunk
that sets global options with the knitr::opts_chunk$set() function.
These define the default display options for all code chunks in the
Notebook. For example, a typical setup chunk:
- names the chunk “setup”
- uses include = FALSE: do not display this code chunk in the output
- echo = T: sets the global default to show the code chunk
- quietly = T: sets the global default to suppress messages
- message = F: sets the global default to suppress messages (different
and more robust method)
This code chunk should stand alone: it should not include any other
code.
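A setup chunk matching the list above might look like the following sketch. The chunk header is shown as a comment because it sits on the opening line of the chunk in the .Rmd file. quietly = TRUE is copied from the list above; knitr does not appear to document an option by that name, so message = FALSE is the setting that does the work here.

```r
# Chunk header in the .Rmd file: {r setup, include=FALSE}
knitr::opts_chunk$set(
  echo = TRUE,      # show code by default (T is shorthand for TRUE)
  quietly = TRUE,   # taken from the list above; not a documented knitr option
  message = FALSE   # hide messages by default (F is shorthand for FALSE)
)
```

Writing TRUE and FALSE in full is slightly safer than T and F, because T and F are ordinary variables that can be overwritten.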
Code chunk display options
To change those global display options for one code chunk (like
load_acs, which is very noisy), add any chunk options within the
{r}
of your code chunk. For example, to display the output but not the
code for a code chunk, add echo = FALSE inside the braces. A chunk
written this way displays a formatted table made with the knitr
package, but doesn’t display the code.
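As a minimal sketch of such a chunk (the chunk name school_table and the toy data are hypothetical, not taken from the class data), the header carries the option and the body builds the table:

```r
# Chunk header in the .Rmd file: {r school_table, echo=FALSE}
# (school_table is a hypothetical chunk name)
library(knitr)

# Toy data standing in for a real table
schools <- data.frame(
  borough = c("Brooklyn", "Queens"),
  n_schools = c(2, 3)
)

# kable() formats the data frame as a table; printing it shows the table
school_table <- kable(schools, caption = "Schools per borough (toy data)")
school_table
```

With echo=FALSE in the header, the finished file shows only the table, not the code that produced it.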

YAML
You can also define pre-designed themes, add a table of contents,
and much more in the title section called the YAML (originally “Yet
Another Markup Language”, now “YAML Ain’t Markup Language” = stupid
coding joke). The formatting is very finicky: when you are following
an example, make it look EXACTLY the same, indentation included. It
is complicated but we’ll learn a few options.
See this chapter for more details on adjusting the html
document in the title section.
Theme
- add styles with pre-packaged themes
- this notebook uses yeti, see here for more
Table of Contents
- toc: true: create a table of contents
- toc_depth: 3: create entries in the Table of Contents for headers
down to level 3 (Header 1 through Header 3)
- toc_float: true: toc sticks to the side so you can always see it
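Put together, a title section using these options might look like the sketch below. The title is hypothetical, and html_notebook is an assumption based on this being an R Notebook; html_document accepts the same four options.

```yaml
---
title: "NYC Middle Schools"   # hypothetical title
output:
  html_notebook:              # assumed output format for an R Notebook
    theme: yeti
    toc: true
    toc_depth: 3
    toc_float: true
---
```

The indentation matters: each option is nested two spaces under the output format it belongs to.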

NYC Open Data
NYC Open Data is free public data published by New York City agencies
and other partners.
https://opendata.cityofnewyork.us/
There is a vast amount of data. You can download data from NYC Open
Data, or use the RSocrata package to import the data directly into R.
We’ll use an example from Boyan Kostadinov at City Tech.
In-class exercise:
The goal of this activity is to explore the 2021 DOE Middle School Directory data from the New
York City Open Data Portal. This activity is an introduction to
exploratory data analysis and visualizations using R and RStudio.
Install three new packages:
- RSocrata: for loading the data from NYC Open Data
- knitr: for printing tables
- DT: for interactive tables in html format
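One way to install the three packages, sketched here on the assumption that they are available from your default package repository, is to install only the ones that are not already present:

```r
# Packages named in the list above
pkgs <- c("RSocrata", "knitr", "DT")

# Keep only the packages that are not installed yet
to_install <- setdiff(pkgs, rownames(installed.packages()))

# Install whatever is missing (does nothing if all three are present)
if (length(to_install) > 0) {
  install.packages(to_install)
}
```

Run this once in the Console rather than inside the Notebook, so the Notebook does not try to reinstall packages every time it runs.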
Create a new R Notebook
- name it nyc_middle_schools.Rmd
- save it in part2/scripts
- load the tidyverse and the three new packages
Import middle school data
- Go to the NYC Open Data Portal
- Find the 2021 DOE Middle School Directory data
- We can use the RSocrata package to load the data in the CSV format,
directly from the API tab in the NYC Open Data Portal, using the unique
identifier f6s7-vytj for the data.
- Find the link for the data by clicking on the API button, and
copying the link for the CSV version of the data

library(tidyverse)
library(RSocrata)
library(knitr)
library(DT)
# import the data directly into RStudio using url path
data <- read.socrata("https://data.cityofnewyork.us/resource/f6s7-vytj.csv")
## Warning in
## read.socrata("https://data.cityofnewyork.us/resource/f6s7-vytj.csv"): Dates and
## currency fields will be converted to character
Explore and process the data
Follow the instructions in Kostadinov’s R Notebook to select columns, and
create summary statistics of the number of math professors, and correct
some missing values.
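The general pattern for these steps can be sketched with dplyr on a small toy table. The column names and values below are hypothetical stand-ins, not the real directory columns, and the steps in Kostadinov’s Notebook may differ.

```r
library(dplyr)

# Toy stand-in for the imported data; column names are hypothetical
schools <- tibble(
  name = c("School A", "School B", "School C", "School D"),
  borough = c("BROOKLYN", "BROOKLYN", "QUEENS", "QUEENS"),
  totalstudents = c("350", "N/A", "420", "510"),
  extra = c(1, 2, 3, 4)
)

# Select the columns of interest and turn "N/A" text into real missing values
schools_clean <- schools %>%
  select(name, borough, totalstudents) %>%
  mutate(totalstudents = as.numeric(na_if(totalstudents, "N/A")))

# Summary statistics by borough, ignoring missing values
summary_by_borough <- schools_clean %>%
  group_by(borough) %>%
  summarise(
    n_schools = n(),
    mean_students = mean(totalstudents, na.rm = TRUE)
  )
```

Converting placeholder text to NA before as.numeric() avoids coercion warnings and makes na.rm = TRUE behave as expected.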
Use this dataset to test out different ways to style and format an R
Notebook.
Create an R Notebook to share
Assignment 12a: NYC Open Data analysis
Explore NYC Open Data to see what data is available. Select
a dataset to import via RSocrata and answer a research question about
New York City or related to your final project. Create an R Notebook to
share your analysis. Include:
- Define global display options
- Define display options for at least one code chunk
- Use the kable or datatable functions to create at least one
formatted table
- Create at least one plot or map
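As a rough sketch of the table and plot requirements, the example below uses a toy data frame (the boroughs and counts are made up) with kable, datatable, and a basic ggplot2 bar chart. ggplot2 is loaded as part of the tidyverse.

```r
library(knitr)
library(DT)
library(ggplot2)

# Toy summary table; values are made up for illustration
borough_counts <- data.frame(
  borough = c("Bronx", "Brooklyn", "Manhattan"),
  n = c(10, 25, 15)
)

# Static formatted table
static_table <- kable(
  borough_counts,
  col.names = c("Borough", "Count"),
  caption = "Counts by borough (toy data)"
)

# Interactive table with sorting and search in the html output
interactive_table <- datatable(
  borough_counts,
  rownames = FALSE,
  options = list(pageLength = 5)
)

# Simple bar chart of the same counts
count_plot <- ggplot(borough_counts, aes(x = borough, y = n)) +
  geom_col() +
  labs(x = "Borough", y = "Count")
```

kable suits short summary tables, while datatable is more useful when readers need to sort or search a longer table.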
Some suggestions:
- Noise Complaints on 311 in one week:
  - import this dataset of all 311 calls in New York City for one
    month
  - filter to noise complaints
  - summarize by Borough or Neighborhood Tabulation Area to see how
    many 311 calls reported a noise complaint in Sunset Park vs. other
    neighborhoods
- Motor Vehicle Collisions - Crashes
  - import the dataset of all crashes in NYC
  - summarize by Borough or Zip Code to see how many crashes occurred
    in one neighborhood vs. other neighborhoods in the last year
  - filter the crashes in the Zip codes in your chosen neighborhood and
    use the Lat/Long to create a map of the crash locations in one
    neighborhood (hint: use st_as_sf())
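Both suggestions follow the same filter-then-summarize pattern. The sketch below shows it for the crash example on toy data, then converts the Lat/Long columns to map points with st_as_sf(). The column names, coordinates, and Zip codes are assumptions for illustration; check them against the dataset you import.

```r
library(dplyr)
library(sf)

# Toy stand-in for the crash data; column names are assumptions
crashes <- tibble(
  borough = c("BROOKLYN", "BROOKLYN", "QUEENS", "BROOKLYN"),
  zip_code = c("11220", "11232", "11101", "11220"),
  latitude = c(40.641, 40.656, 40.747, NA),
  longitude = c(-74.016, -74.005, -73.940, NA)
)

# Summarize: number of crashes per borough
crashes_by_borough <- crashes %>%
  count(borough)

# Example Zip codes for the chosen neighborhood (an assumption)
neighborhood_zips <- c("11220", "11232")

# Filter to the neighborhood, drop rows without coordinates,
# then build point geometries (crs 4326 = longitude/latitude in WGS84)
crashes_sf <- crashes %>%
  filter(
    zip_code %in% neighborhood_zips,
    !is.na(latitude),
    !is.na(longitude)
  ) %>%
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326)

# Quick look at the point locations
plot(st_geometry(crashes_sf))
```

Rows with missing coordinates are removed first because st_as_sf() stops with an error by default when coordinates are missing. Note that coords lists longitude before latitude.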
Add the links to your 2 Notebooks on CANVAS: one for your in-class
assignment and one for this analysis.
