Outline

  • Homework
  • RMarkdown syntax
  • R Notebook formatting
  • NYC Open Data
  • In-class exercise



Homework Questions


R Notebook example: Unemployment in New Jersey

Who else wants to share their RNotebook?




R Notebook display options

As you have noticed, you don’t always want to display the output from your code. You can define how your code displays using chunk options in 2 ways:

  • globally: defines the default for every code chunk
  • by chunk: defines the display options for the chunk only

The most common chunk options are:

  • include = FALSE prevents code and results from appearing in the finished file. R Markdown still runs the code in the chunk, and the results can be used by other chunks.
  • echo = FALSE prevents code, but not the results from appearing in the finished file. This is a useful way to embed figures.
  • message = FALSE prevents messages that are generated by code from appearing in the finished file.
  • warning = FALSE prevents warnings that are generated by code from appearing in the finished file.
  • fig.cap = "..." adds a caption to graphical results.

See the R Markdown Reference Guide for a complete list of knitr chunk options.

global display options

At the top of your R Notebook, under the title, insert a code chunk with global options using the knitr::opts_chunk$set function. These will define the default display options for all code chunks in the Notebook. For example, the following chunk:


  • names the chunk “setup”

  • uses include=FALSE : do not display this code chunk in the output

  • echo = T : defines global default to show the code chunk

  • quietly = T : meant to suppress messages by default; quietly is not a standard knitr chunk option, so it has no effect on its own

  • message = F : defines global default to suppress messages (different and more robust method)

This code chunk should stand alone: it should not include any other code.


code chunk display options

To change those global display options for one code chunk (like load_acs, which is very noisy), add any chunk options within the {r} of your code chunk. For example, to display the output, but not the code, for a code chunk:


This chunk displays the following formatted table using the knitr package, but doesn’t display the code.

YAML

You can also define pre-designed themes, add a table of contents, and much more in the title section called the yaml (Yet Another Markup Language = stupid coding joke). The formatting is very finicky: when you are following an example, make it look EXACTLY the same. It is complicated, but we’ll learn a few options.

See this chapter for more details on adjusting the html document in the title section.


  • Theme

    • add styles with pre-packaged themes
    • this notebook uses yeti, see here for more
  • Table of Contents

    • toc: true: create a table of contents
    • toc_depth: 3: include headers down to level 3 (Header 1, 2, and 3) in the table of contents
    • toc_float: true: toc sticks to the side so you can always see it



NYC Open Data

NYC Open Data is free public data published by New York City agencies and other partners.

https://opendata.cityofnewyork.us/


There is a vast amount of data. You can download data from NYC Open Data, or use the RSocrata package to import the data directly into R.

We’ll use an example from Boyan Kostadinov at City Tech.

In-class exercise:

The goal of this activity is to explore the 2021 DOE Middle School Directory data from the New York City Open Data Portal. This activity is an introduction to exploratory data analysis and visualizations using R and RStudio.

Install three new packages:

  • RSocrata: for loading the data from NYC Open Data
  • knitr: for printing tables
  • DT: for interactive tables in html format

Create a new R Notebook

  • name it nyc_middle_schools.Rmd
  • save it in part2/scripts
  • load the tidyverse and the three new packages

Import middle school data

  • Go to the NYC Open Data Portal
  • Find the 2021 DOE Middle School Directory data
    • We can use the RSocrata package to load the data in the CSV format, directly from the API tab in the NYC Open Data Portal, using the unique identifier f6s7-vytj for the data.
    • Find the link for the data by clicking on the API button, and copying the link for the CSV version of the data

library(tidyverse)
library(RSocrata)
library(knitr)
library(DT)

# import the data directly into RStudio using url path
data <- read.socrata("https://data.cityofnewyork.us/resource/f6s7-vytj.csv")
## Warning in
## read.socrata("https://data.cityofnewyork.us/resource/f6s7-vytj.csv"): Dates and
## currency fields will be converted to character

Explore and process the data

Follow the instructions in Kostadinov’s R Notebook to select columns, and create summary statistics of the number of math professors, and correct some missing values.

Use this dataset to test out different ways to style and format an R Notebook

Create an R Notebook to share

  • Create a new R Notebook to share your middle school math teacher analysis.

    • Define global display options
    • Define display options for at least one code chunk
    • Use the kable or datatable functions to create formatted tables

Assignment 12a: NYC Open Data analysis

Explore NYC Open Data to see what data is available. Select a dataset to import via RSocrata and answer a research question about New York City or related to your final project. Create an R Notebook to share your analysis. Include:

  • Define global display options
  • Define display options for at least one code chunk
  • Use the kable or datatable functions to create at least one formatted table
  • Create at least one plot or map

Some suggestions:

  • Active Tobacco Retail Dealer Licenses:
    • import this dataset of all tobacco licenses in New York City
    • summarize by Borough or Neighborhood Tabulation Area to see how many tobacco licenses were granted in Corona vs other neighborhoods in the last year
  • Motor Vehicle Collisions - Crashes
    • import the dataset of all crashes in NYC
    • summarize by Borough or Zip Code to see how many crashes occurred in one neighborhood vs other neighborhoods in the last year
    • filter the crashes in the Zip codes in your chosen neighborhood and use the Lat/Long to create a map of the crash locations in one neighborhood (hint: use st_as_sf())

Add the link to 2 Notebooks on CANVAS: for your in-class assignment and for this analysis.



---
title: "R Notebooks & NYC Open Data"
output:
  html_document:
    theme: yeti
    toc: true
    toc_depth: 3
    toc_float: true
    code_download: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = T,
                      quietly = T,
                      message = F)
```

## Outline

-   Homework
-   RMarkdown syntax
-   R Notebook formatting
-   NYC Open Data
-   In-class exercise

<br>
<br>

## Homework Questions

<br>

R Notebook example: [Unemployment in New Jersey](https://rpubs.com/spatialcollections/nj-unemployment){target="_blank"}



### <span style="color:purple">Who else wants to share their RNotebook?</span>

<br>
<br><br>

## R Notebook display options

As you have noticed, you don't always want to display the output from your code. You can define how your code displays using chunk options in 2 ways:

-   *globally*: defines the default for every code chunk
-   *by chunk*: defines the display options for the chunk only

The most common chunk options are:

-   `include = FALSE` prevents code and results from appearing in the finished file. R Markdown still runs the code in the chunk, and the results can be used by other chunks.
-   `echo = FALSE` prevents code, but not the results from appearing in the finished file. This is a useful way to embed figures.
-   `message = FALSE` prevents messages that are generated by code from appearing in the finished file.
-   `warning = FALSE` prevents warnings that are generated by code from appearing in the finished file.
-   `fig.cap = "..."` adds a caption to graphical results.

See the [R Markdown Reference Guide](https://www.rstudio.com/wp-content/uploads/2015/03/rmarkdown-reference.pdf?_ga=2.173068196.662875585.1668357439-2095049193.1630274735){target="_blank"} for a complete list of knitr chunk options.

### global display options

At the top of your R Notebook, under the title, insert a code chunk with global options using the `knitr::opts_chunk$set()` function. These will define the default display options for all code chunks in the Notebook. For example, the following chunk:

<br>

![](img/global.png) 

- names the chunk "setup"

- uses `include=FALSE` : do not display this code chunk in the output

- `echo = T` : defines global default to show the code chunk

- `quietly = T` : meant to suppress messages by default; `quietly` is not a standard knitr chunk option, so it has no effect on its own

- `message = F` : defines global default to suppress messages (different and more robust method)

This code chunk should stand alone: it should not include any other code.
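
As a rough sketch, the body of such a setup chunk could look like the following, with `TRUE`/`FALSE` spelled out. The `warning = FALSE` line is an addition for illustration and is not part of this notebook's own setup chunk. The last line reads the defaults back so you can check what was set.

```r
# Body of a setup chunk; the chunk header would be {r setup, include=FALSE}
library(knitr)

opts_chunk$set(
  echo    = TRUE,   # show the code of each chunk by default
  message = FALSE,  # hide messages, e.g. package start-up notes
  warning = FALSE   # hide warnings (illustrative addition)
)

# Read the defaults back to check them
opts_chunk$get(c("echo", "message", "warning"))
```

Any chunk can still override these defaults in its own header.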

<br>

### code chunk display options

To change those global display options for one code chunk (like load_acs, which is very *noisy*), add any chunk options within the `{r}` of your code chunk. For example, to display the output, but not the code, for a code chunk:

![](img/echo_false.png)

<br>

This chunk displays the following formatted table using the `knitr` package, but doesn't display the code.

![](img/kable_output.png)
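
In text form, a sketch of such a chunk might look like this: the chunk options go on the header line, and the body is ordinary R. The chunk label `demo_table` and the built-in `mtcars` data are stand-ins chosen for illustration, not the table shown above.

```r
# Chunk header, written on the opening fence line of the chunk:
#   {r demo_table, echo=FALSE, message=FALSE}
# echo=FALSE hides the code below; the table it produces is still displayed.

library(knitr)

# mtcars ships with R and stands in for your own data frame
demo_table <- kable(
  head(mtcars[, c("mpg", "cyl", "hp")], 5),
  caption = "First five rows of a stand-in dataset"
)
demo_table
```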

### YAML

You can also define pre-designed themes, add a table of contents, and much more in the title section called the *yaml* (Yet Another Markup Language = stupid coding joke). The formatting is very finicky: when you are following an example, make it look **EXACTLY** the same. It is complicated, but we'll learn a few options.

See [this chapter](https://bookdown.org/yihui/rmarkdown/html-document.html){target="_blank"} for more details on adjusting the html document in the title section.

<br>

-   Theme

    -   add styles with pre-packaged themes
    -   this notebook uses yeti, see [here](https://www.datadreaming.org/post/r-markdown-theme-gallery/){target="_blank"} for more

-   Table of Contents

    -   `toc: true`: create a table of contents
    -   `toc_depth: 3`: include headers down to level 3 (Header 1, 2, and 3) in the table of contents
    -   `toc_float: true`: toc sticks to the side so you can always see it

![](img/yaml.png)
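
The same settings can also be passed from R instead of the YAML header, which can help when you want to check that an option name is spelled correctly. This is only a sketch, assuming the `rmarkdown` package is installed; the file path is just an example and the render step is skipped if the file does not exist.

```r
library(rmarkdown)

# Build an output format with the same options as the YAML header
notebook_format <- html_document(
  theme         = "yeti",  # pre-packaged theme
  toc           = TRUE,    # add a table of contents
  toc_depth     = 3,       # include headers down to level 3
  toc_float     = TRUE,    # keep the table of contents visible at the side
  code_download = TRUE     # let readers download the .Rmd source
)

# Render only if the file exists (the path is an example)
rmd_file <- "part2/scripts/nyc_middle_schools.Rmd"
if (file.exists(rmd_file)) {
  render(rmd_file, output_format = notebook_format)
}
```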

<br> <br>

## NYC Open Data

NYC Open Data is free public data published by New York City agencies and other partners.

[https://opendata.cityofnewyork.us/](https://opendata.cityofnewyork.us/){target="_blank"}

<br>

There is a vast amount of data. You can download data from NYC Open Data, or use the `RSocrata` package to import the data directly into R.

We'll use an [example from Boyan Kostadinov](https://rpubs.com/bkostadi/data_analysis_cite2022){target="_blank"} at City Tech.

## In-class exercise:

The goal of this activity is to explore the [2021 DOE Middle School Directory data](https://data.cityofnewyork.us/Education/2021-DOE-Middle-School-Directory/f6s7-vytj){target="_blank"} from the New York City Open Data Portal. This activity is an introduction to exploratory data analysis and visualizations using R and RStudio.

### Install three new packages:

-   `RSocrata`: for loading the data from NYC Open Data
-   `knitr`: for printing tables
-   `DT`: for interactive tables in html format

### Create a new R Notebook

-   name it `nyc_middle_schools.Rmd`
-   save it in `part2/scripts`
-   load the tidyverse and the three new packages

### Import middle school data

-   Go to the NYC Open Data Portal
-   Find the [2021 DOE Middle School Directory data](https://data.cityofnewyork.us/Education/2021-DOE-Middle-School-Directory/f6s7-vytj){target="_blank"}
    -   We can use the RSocrata package to load the data in the CSV format, directly from the API tab in the NYC Open Data Portal, using the unique identifier f6s7-vytj for the data.
    -   Find the link for the data by clicking on the API button, and copying the link for the CSV version of the data

![](img/middle_school.png)

```{r}
library(tidyverse)
library(RSocrata)
library(knitr)
library(DT)

# import the data directly into RStudio using url path
data <- read.socrata("https://data.cityofnewyork.us/resource/f6s7-vytj.csv")

```
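
The URL in the chunk above follows the pattern `https://data.cityofnewyork.us/resource/<identifier>.csv`. As a small sketch, the two helper functions below build that URL from an identifier so the same code can be reused for other datasets. The names `socrata_csv_url()` and `load_nyc_dataset()` are made up for illustration and are not part of `RSocrata`.

```r
# Hypothetical helper: build the CSV endpoint for an NYC Open Data identifier
socrata_csv_url <- function(dataset_id) {
  paste0("https://data.cityofnewyork.us/resource/", dataset_id, ".csv")
}

# Hypothetical helper: import a dataset by identifier (needs an internet connection)
load_nyc_dataset <- function(dataset_id) {
  RSocrata::read.socrata(socrata_csv_url(dataset_id))
}

# URL for the 2021 DOE Middle School Directory
socrata_csv_url("f6s7-vytj")

# Usage, commented out so the data is not downloaded again:
# schools <- load_nyc_dataset("f6s7-vytj")
# dim(schools)    # number of rows and columns
# names(schools)  # column names
```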

### Explore and process the data

Follow the instructions in [Kostadinov's R Notebook](https://rpubs.com/bkostadi/data_analysis_cite2022){target="_blank"} to select columns, and create summary statistics of the number of math professors, and correct some missing values.

Use this dataset to test out different ways to style and format an R Notebook

### Create an R Notebook to share

-   Create a new R Notebook to share your middle school math teacher analysis.

    -   Define global display options
    -   Define display options for at least one code chunk
    -   Use the `kable` or `datatable` functions to create formatted tables
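
A minimal sketch of the two table functions, using the built-in `iris` data as a stand-in for the middle school data (swap in your own data frame). `kable()` produces a static table, while `datatable()` produces an interactive one that readers can sort and search in the html output.

```r
library(knitr)
library(DT)

# iris ships with R and stands in for your own data frame
static_table <- kable(
  head(iris, 5),
  caption = "Static table made with kable()"
)

interactive_table <- datatable(
  iris,
  caption = "Interactive table made with datatable()",
  options = list(pageLength = 5)  # show five rows per page
)

static_table       # prints the static table
interactive_table  # prints the interactive table (html output only)
```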

## Assignment 12a: NYC Open Data analysis

Explore [NYC Open Data](https://opendata.cityofnewyork.us/){target="_blank"} to see what data is available. Select a dataset to import via RSocrata and answer a research question about New York City or related to your final project. Create an R Notebook to share your analysis. Include:

-   Define global display options
-   Define display options for at least one code chunk
-   Use the `kable` or `datatable` functions to create at least one formatted table
-   Create at least one plot or map

Some suggestions:

-   [Active Tobacco Retail Dealer Licenses](https://data.cityofnewyork.us/Business/Active-Tobacco-Retail-Dealer-Licenses/adw8-wvxb/data){target="_blank"}:
    -   import this dataset of all tobacco licenses in New York City
    -   summarize by Borough or [Neighborhood Tabulation Area](https://data.cityofnewyork.us/City-Government/NTA-map/d3qk-pfyz){target="_blank"} to see how many tobacco licenses were granted in Corona vs other neighborhoods in the last year
-   [Motor Vehicle Collisions - Crashes](https://data.cityofnewyork.us/Public-Safety/Motor-Vehicle-Collisions-Crashes/h9gi-nx95){target="_blank"}
    -   import the dataset of all crashes in NYC
    -   summarize by Borough or Zip Code to see how many crashes occurred in one neighborhood vs other neighborhoods in the last year
    -   filter the crashes in the Zip codes in your chosen neighborhood and use the Lat/Long to create a map of the crash locations in one neighborhood (hint: use `st_as_sf()`)
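
For the map hint above, here is a sketch of turning latitude and longitude columns into point geometries with `st_as_sf()`. The tiny data frame is hypothetical, with made-up coordinates, and the column names `borough`, `latitude`, and `longitude` are assumptions, so check the names in the dataset you import. It also assumes the `sf` package is installed.

```r
library(sf)

# Hypothetical stand-in for imported crash data (coordinates are made up)
crashes <- data.frame(
  borough   = c("QUEENS", "QUEENS", "BROOKLYN", "QUEENS"),
  latitude  = c(40.7498, 40.7445, 40.6782, NA),
  longitude = c(-73.8627, -73.8580, -73.9442, NA)
)

# Count crashes per borough
crashes_by_borough <- as.data.frame(table(borough = crashes$borough))

# st_as_sf() does not accept missing coordinates, so drop those rows first
crashes_located <- crashes[!is.na(crashes$latitude) & !is.na(crashes$longitude), ]

# coords takes x (longitude) first, then y (latitude); 4326 is the WGS84 lat/long CRS
crashes_sf <- st_as_sf(
  crashes_located,
  coords = c("longitude", "latitude"),
  crs = 4326
)

# Quick map of the crash locations
plot(st_geometry(crashes_sf))
```

Rows without coordinates are dropped only for the map; the borough counts still use every row.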

Add the link to 2 Notebooks on CANVAS: for your in-class assignment and for this analysis.

<br> <br>


