---
title: "R Notebooks & NYC Open Data"
output:
  html_document:
    theme: yeti
    toc: true
    toc_depth: 3
    toc_float: true
    code_download: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE,
                      message = FALSE)
```

## Outline

-   Homework
-   RMarkdown syntax
-   R Notebook formatting
-   NYC Open Data
-   In-class exercise

</br>
</br>

## Homework Questions

</br>

R Notebook example: [Unemployment in New Jersey](https://rpubs.com/spatialcollections/nj-unemployment){target="_blank"}



### <span style="color:purple">Who else wants to share their R Notebook?</span>

</br>
</br></br>

## R Notebook display options

As you may have noticed, you don't always want to display all of your code or its output. You can control how your code displays with chunk options in two ways:

-   *globally*: defines the default for every code chunk
-   *by chunk*: defines the display options for the chunk only

The most common chunk options are:

-   `include = FALSE` prevents code and results from appearing in the finished file. R Markdown still runs the code in the chunk, and the results can be used by other chunks.
-   `echo = FALSE` prevents the code, but not the results, from appearing in the finished file. This is a useful way to embed figures.
-   `message = FALSE` prevents messages that are generated by code from appearing in the finished file.
-   `warning = FALSE` prevents warnings that are generated by code from appearing in the finished file.
-   `fig.cap = "..."` adds a caption to graphical results.

See the [R Markdown Reference Guide](https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf?_ga=2.173068196.662875585.1668357439-2095049193.1630274735){target="_blank"} for a complete list of knitr chunk options.

### global display options

At the top of your R Notebook, right below the YAML header with the title, insert a code chunk that sets global options with the `knitr::opts_chunk$set()` function. These options define the default display options for every code chunk in the Notebook. For example, the following setup chunk:

<br>

![](img/global.png) 

- names the chunk "setup"

- `include = FALSE`: do not display this code chunk in the output

- `echo = T`: show the code of every chunk by default

- `quietly = T`: meant to suppress messages, but `quietly` is not a knitr chunk option, so knitr ignores it here (it is an argument of functions like `library()`)

- `message = F`: suppress messages by default; this is the chunk option that actually hides them

Keep this chunk on its own: it should not include any other code.
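As a copy-pasteable sketch of the body of that setup chunk (assuming the chunk header `{r setup, include=FALSE}` from the screenshot), the options can be written as below. It spells out `TRUE`/`FALSE`, because `T` and `F` are ordinary variables that can be overwritten, and leaves out `quietly`, which knitr does not use. The `warning = FALSE` line is an optional extra, not part of the setup described above.

```r
# body of the setup chunk, header: {r setup, include=FALSE}
knitr::opts_chunk$set(
  echo    = TRUE,   # show the code of each chunk by default
  message = FALSE,  # hide messages, e.g. package startup messages
  warning = FALSE   # optional: also hide warnings by default
)

# check the current default for one option
knitr::opts_chunk$get("message")
```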

<br>

### code chunk display options

To change the global display options for one code chunk (like `load_acs`, which is very *noisy*), add chunk options inside the `{r}` of that chunk's header. For example, to display the output, but not the code, of a chunk:

![](img/echo_false.png)

<br>

This chunk displays the following formatted table, made with `kable()` from the `knitr` package, but doesn't display the code.

![](img/kable_output.png)
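The screenshots show this as images. Here is a text sketch of the same idea, with R's built-in `mtcars` data standing in for your own table; the chunk header would be `{r, echo=FALSE}` so readers see only the table. The caption text is just an example.

```r
library(knitr)

# chunk header: {r, echo=FALSE}  -> the table is shown, this code is not
tbl <- kable(
  head(mtcars[, c("mpg", "cyl", "hp")]),  # first 6 rows, 3 columns
  caption = "First rows of the built-in mtcars data"
)
tbl  # printing the object draws the table in the knitted file
```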

### YAML

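You can also apply pre-designed themes, add a table of contents, and much more in the header section at the top of the file, called the *YAML* header (YAML originally stood for "Yet Another Markup Language" and now officially stands for "YAML Ain't Markup Language", a classic coding joke). The formatting is very finicky: indentation and spacing matter, so when you follow an example, make yours look **EXACTLY** the same. It is complicated, but we'll learn a few options.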

See [this chapter](https://bookdown.org/yihui/rmarkdown/html-document.html){target="_blank"} for more details on adjusting the html document in the title section.

<br>

-   Theme

    -   add styles with pre-packaged themes
    -   this notebook uses yeti; see this [R Markdown theme gallery](https://www.datadreaming.org/post/r-markdown-theme-gallery/){target="_blank"} for more

-   Table of Contents

    -   `toc: true`: create a table of contents
    -   `toc_depth: 3`: include headers down to level 3 (`#`, `##`, and `###`) in the table of contents
    -   `toc_float: true`: the table of contents sticks to the side so you can always see it

![](img/yaml.png)
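Each key nested under `html_document:` in the YAML header is an argument of the `rmarkdown::html_document()` function, so that function's help page is a good place to check a setting and its spelling. The sketch below builds the same output format in R; `code_download = TRUE` lets readers download the `.Rmd` source from the page. The `render()` call is commented out and assumes the in-class file path.

```r
library(rmarkdown)

# the YAML header's html_document settings, written as function arguments
fmt <- html_document(
  theme         = "yeti",  # pre-packaged theme
  toc           = TRUE,    # table of contents
  toc_depth     = 3,       # headers #, ## and ### go in the TOC
  toc_float     = TRUE,    # TOC stays visible at the side
  code_download = TRUE     # readers can download the .Rmd source
)

# render the notebook with these settings (path from the in-class exercise)
# render("part2/scripts/nyc_middle_schools.Rmd", output_format = fmt)
```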

<br> <br>

## NYC Open Data

NYC Open Data is free public data published by New York City agencies and other partners.

[https://opendata.cityofnewyork.us/](https://opendata.cityofnewyork.us/){target="_blank"}

<br>

There is a vast amount of data. You can download data from NYC Open Data, or use the `RSocrata` package to import the data directly into R.
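`read.socrata()` should also accept a CSV endpoint URL that carries SoQL query parameters such as `$where` and `$limit`, which lets you download only the rows you need from a very large dataset. The helper below is a hypothetical sketch that only builds such a URL. The dataset IDs come from this page; the `zip_code` column name and the example values are assumptions to check against the dataset's API page, and `$limit` is mainly useful for a quick test download (check `nrow()` of what you get back).

```r
# Hypothetical helper: build an NYC Open Data CSV endpoint with optional
# SoQL $where / $limit parameters; URLencode() escapes spaces and other
# unsafe characters but leaves $, &, = and quotes alone.
socrata_csv_url <- function(dataset_id, where = NULL, limit = NULL,
                            domain = "https://data.cityofnewyork.us") {
  url <- paste0(domain, "/resource/", dataset_id, ".csv")
  params <- c(
    if (!is.null(where)) paste0("$where=", where),
    if (!is.null(limit)) paste0("$limit=", format(limit, scientific = FALSE))
  )
  if (length(params) > 0) {
    url <- paste0(url, "?", paste(params, collapse = "&"))
  }
  utils::URLencode(url)
}

# the middle school data used in the in-class exercise
socrata_csv_url("f6s7-vytj")
# crashes in one ZIP code (assumes a `zip_code` column), at most 1000 rows
socrata_csv_url("h9gi-nx95", where = "zip_code='11220'", limit = 1000)

# then, with RSocrata loaded:
# crashes <- read.socrata(socrata_csv_url("h9gi-nx95", where = "zip_code='11220'"))
```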

We'll use an [example from Boyan Kostadinov](https://rpubs.com/bkostadi/data_analysis_cite2022){target="_blank"} at City Tech.

## In-class exercise

The goal of this activity is to explore the [2021 DOE Middle School Directory data](https://data.cityofnewyork.us/Education/2021-DOE-Middle-School-Directory/f6s7-vytj){target="_blank"} from the New York City Open Data Portal. This activity is an introduction to exploratory data analysis and visualizations using R and RStudio.

### Install three new packages:

-   `RSocrata`: for loading the data from NYC Open Data
-   `knitr`: for printing tables
-   `DT`: for interactive tables in HTML format
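Packages only need to be installed once per computer, not in every notebook, so a sketch like this could be run once in the Console. It installs only the packages that are missing; `requireNamespace()` checks whether a package can be loaded without attaching it, and `quietly = TRUE` is one of its real arguments.

```r
pkgs <- c("RSocrata", "knitr", "DT")

# TRUE for each package that can already be loaded
have <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# install only the ones that are missing
to_install <- pkgs[!have]
if (length(to_install) > 0) install.packages(to_install)
```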

### Create a new R Notebook

-   name it `nyc_middle_schools.Rmd`
-   save it in `part2/scripts`
-   load the tidyverse and the three new packages

### Import middle school data

-   Go to the NYC Open Data Portal
-   Find the [2021 DOE Middle School Directory data](https://data.cityofnewyork.us/Education/2021-DOE-Middle-School-Directory/f6s7-vytj){target="_blank"}
    -   We can use the `RSocrata` package to load the data in CSV format directly from the API on the NYC Open Data Portal, using the dataset's unique identifier `f6s7-vytj`.
    -   To find the link, click the API button and copy the link for the CSV version of the data.

![](img/middle_school.png)

```{r}
library(tidyverse)
library(RSocrata)
library(knitr)
library(DT)

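# import the data directly into R from the Socrata API (CSV endpoint)
# note: read.socrata() warns that date and currency fields will be converted to character
data <- read.socrata("https://data.cityofnewyork.us/resource/f6s7-vytj.csv")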

```

### Explore and process the data

Follow the instructions in [Kostadinov's R Notebook](https://rpubs.com/bkostadi/data_analysis_cite2022){target="_blank"} to select columns, create summary statistics of the number of math teachers, and correct some missing values.

Use this dataset to try out different ways to style and format an R Notebook.
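Kostadinov's notebook has its own steps for the middle school data. As a generic sketch of the same three moves (select columns, summarize, handle missing values), here they are on R's built-in `airquality` data, which has missing `Ozone` readings. Replacing missing values with 0 is shown only for the syntax: whether 0, or any value, is a sensible replacement depends on what a missing value means in your data.

```r
library(dplyr)
library(tidyr)

# 1. select: keep only the columns needed for the question
aq <- airquality %>%
  select(Month, Day, Ozone, Temp)

# 2. summarize: monthly statistics; na.rm = TRUE skips missing Ozone values
aq_summary <- aq %>%
  group_by(Month) %>%
  summarise(
    mean_ozone = mean(Ozone, na.rm = TRUE),
    n_missing  = sum(is.na(Ozone)),
    .groups    = "drop"
  )

# 3. missing values: replace them explicitly (0 is only an example value)
aq_filled <- aq %>%
  replace_na(list(Ozone = 0L))
```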

### Create an R Notebook to share

-   Create a new R Notebook to share your middle school math teacher analysis.

    -   Define global display options
    -   Define display options for at least one code chunk
    -   Use the `kable` or `datatable` functions to create formatted tables
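`kable()` makes a static table, while `datatable()` from `DT` makes an interactive one that readers can search, sort, and page through in the HTML file. A minimal sketch with built-in data; `pageLength` (rows per page) is one of the DataTables settings passed through `options`.

```r
library(DT)

# interactive HTML table: search box, sortable columns, pagination
dt <- datatable(
  mtcars,
  options = list(pageLength = 5)  # show 5 rows per page
)
dt  # printing the widget embeds it in the knitted HTML

# for the middle school data: datatable(data)
```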

## Assignment 12a: NYC Open Data analysis

Explore [NYC Open Data](https://opendata.cityofnewyork.us/){target="_blank"} to see what data is available. Select a dataset to import with `RSocrata` and answer a research question about New York City or one related to your final project. Create an R Notebook to share your analysis. Include:

-   Define global display options
-   Define display options for at least one code chunk
-   Use the `kable` or `datatable` functions to create at least one formatted table
-   Create at least one plot or map

Some suggestions:

-   [Noise complaints reported to 311](https://data.cityofnewyork.us/City-Government/311-Call-Center-Inquiry/wewp-mm3p/about_data){target="_blank"}:
    -   import this dataset of all 311 calls in New York City for one month
    -   filter to noise complaints 
    -   summarize by Borough or [Neighborhood Tabulation Area](https://data.cityofnewyork.us/City-Government/NTA-map/d3qk-pfyz){target="_blank"} to compare the number of 311 noise complaints in Sunset Park with other neighborhoods
-   [Motor Vehicle Collisions - Crashes](https://data.cityofnewyork.us/Public-Safety/Motor-Vehicle-Collisions-Crashes/h9gi-nx95){target="_blank"}
    -   import the dataset of all crashes in NYC
    -   summarize by Borough or Zip Code to see how many crashes occurred in one neighborhood vs. other neighborhoods in the last year
    -   filter to crashes in the Zip Codes of your chosen neighborhood and use the latitude/longitude columns to map the crash locations in that neighborhood (hint: use `st_as_sf()` from the `sf` package)
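Both suggestions follow a filter, then count by group, then (for the crashes) map pattern. The sketch below wraps those steps in two hypothetical helpers, `count_matching()` and `points_to_sf()`, and tries them on built-in data (`ggplot2::mpg` and `quakes`) so it runs without downloading anything. For the real datasets, the column names in the usage comments (`complaint_type`, `borough`, `latitude`, `longitude`) are assumptions: check `names()` on your data frame first. `crs = 4326` assumes the coordinates are longitude/latitude in degrees (WGS84).

```r
library(dplyr)
library(sf)

# Hypothetical helper: keep rows whose `type_col` contains `pattern`
# (case-insensitive), then count rows per `group_col`, largest first.
count_matching <- function(df, type_col, pattern, group_col) {
  df %>%
    filter(grepl(pattern, .data[[type_col]], ignore.case = TRUE)) %>%
    count(.data[[group_col]], sort = TRUE)
}

# Hypothetical helper: turn a data frame with longitude/latitude columns
# into an sf point layer in WGS84 (EPSG:4326).
points_to_sf <- function(df, lon_col = "longitude", lat_col = "latitude") {
  coord_cols <- c(lon_col, lat_col)  # x (longitude) first, then y (latitude)
  df %>%
    # coordinates sometimes arrive as text; make sure they are numeric
    mutate(across(all_of(coord_cols), as.numeric)) %>%
    # st_as_sf() stops on missing coordinates by default, so drop them first
    filter(!is.na(.data[[lon_col]]), !is.na(.data[[lat_col]])) %>%
    st_as_sf(coords = coord_cols, crs = 4326)
}

# try both on built-in data: SUVs per manufacturer, earthquake locations
suvs_by_make <- count_matching(ggplot2::mpg, "class", "suv", "manufacturer")
quake_pts    <- points_to_sf(quakes, lon_col = "long", lat_col = "lat")

# with your own data (hypothetical column names):
# noise_by_borough <- count_matching(calls, "complaint_type", "noise", "borough")
# crash_pts <- points_to_sf(crashes)
# ggplot2::ggplot(crash_pts) + ggplot2::geom_sf()
```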

Add links to both Notebooks on CANVAS: one for your in-class exercise and one for this analysis.

<br> <br>


