---
title: "Methods 1 - Class 12"
output:
  html_document:
    toc: yes
    toc_depth: '4'
    toc_float: yes
    df_print: paged
    code_download: true
---

<br><br><br><hr>

Today we're going to learn a whole lot more about R Markdown and how to use it to share your analysis with people who don't know R.

### Homework overview

<br><hr><br><br>

```{r echo=F, message=F, warning=F, fig.width=7}
library(tidycensus)
library(tidyverse)
library(plotly)
library(scales)

nj_unemployment <- read_csv("data/output/nj_unemployment.csv")

nj_unemployment_plot <- ggplot(data=nj_unemployment, aes(x=reorder(group,unemployment_rate), 
                                 y=unemployment_rate,
                                 text = paste0("The unemployment rate for people that identify as <br>", group, 
                                               " in the Census is ",
                                               scales::percent(unemployment_rate, accuracy=1),
                                               " (+/- ", scales::comma(moe, accuracy=.1), ")."))) +
  geom_col(fill = "#c5c6d0") +
  geom_hline(aes(yintercept = 0.055),size = 1, colour = "#32a991", linetype="dashed") + 
  theme(axis.text.x=element_text(angle=25,hjust=1),
        axis.title = element_blank()) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  annotate("text", x=1.75, y=0.06, label= "State-wide Unemployment Rate", color = "#32a991") +
  labs(title = "New Jersey Unemployment Rate by Race and Ethnicity") +
  theme(panel.border = element_blank(), 
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), 
        panel.background = element_blank(), 
        axis.line = element_line(colour = "black"))

ggplotly(nj_unemployment_plot, tooltip = "text") %>% 
  layout(margin = list(t = 75))

```


[Link to Published Plot](https://rpubs.com/DUE-methods1/nj-unemployment)

<br><br>

### R Markdown

Markdown is a lightweight markup language: you write plain text with simple formatting symbols, and it is converted to HTML (or other formats).  An R Markdown document combines Markdown text and R code to create a user-friendly page with formatted text and output from the R code. You can export an R Markdown document as:

1. HTML file
2. PDF
3. Word doc

<br>
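
The output format can also be chosen from the R console instead of the Knit button. The following is a minimal sketch, assuming the `rmarkdown` package and Pandoc are installed (both come with RStudio). The file `example.Rmd` and its contents are hypothetical and are written to a temporary folder only for this demonstration.

```r
library(rmarkdown)

# Write a tiny, hypothetical R Markdown file to a temporary folder
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(c(
  "---",
  'title: "Export example"',
  "---",
  "",
  "Some **formatted** text."
), rmd_path)

# Render the same source to two of the formats listed above.
# PDF output (output_format = "pdf_document") also needs a LaTeX
# installation, so it is left out of this sketch.
html_file <- render(rmd_path, output_format = "html_document", quiet = TRUE)
word_file <- render(rmd_path, output_format = "word_document", quiet = TRUE)

basename(html_file)  # "example.html"
basename(word_file)  # "example.docx"
```

The same .Rmd source produces both files; only the `output_format` argument changes.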

#### R Markdown Resources

* [R for Data Science: R Markdown chapter](https://r4ds.had.co.nz/r-markdown.html)
* [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/basics.html)
* [R Markdown Cookbook](https://bookdown.org/yihui/rmarkdown-cookbook/how-to-read-this-book.html)
* [R Markdown Cheatsheet](https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf)

Today's exercises borrow heavily from the R Markdown chapter in R for Data Science. I recommend bookmarking all of these resources, and reading some or all of them if you continue to use R Markdown.

<br>

#### Purpose

Common uses for R Markdown docs are:

* Interactive display of research
* Simple way to share your analysis on the web, for free
* Lessons/Tutorials
* Technical Books
* Research Collaboration
* Research Papers

<br>

**Today we're going to use an R Notebook to create an HTML document, viewable in any web browser, that displays:**

* **the output of our educational attainment script from class 10**
* **the sources**
* **the methods**
* **a way for viewers to download our R scripts and reproduce our research**

<br>

#### **<font color="#953445">Create an R Markdown document:**

> * File > New File > R Markdown 
>   + select Document, HTML, and name it r_markdown_test
> * Save it to your class12 folder, name it r_markdown_test
> * Look at the files created in your class12 folder
> * In RStudio, `Knit` r_markdown_test
>   + See the HTML document that pops up
>   + Look at the files created in your class12 folder
>   + Now we'll look at the document together

</font>

<br>

#### Components

There are three basic components of an R Markdown document:

* **metadata**: instructions for RStudio on the type of document and how it looks
   + written in YAML between a pair of lines containing three dashes (`---`)
   + you can add themes and a table of contents
   + it is VERY precisely formatted (indentation matters)
* **text**: the text that you want to display in your document
   + written in the Markdown language
* **code**: the R code that will create the tables, charts, maps, and graphics displayed in the doc
   + A code chunk starts with *three* backticks followed by `{r}` and ends with *three* backticks 
   + An inline R code expression starts with *one* backtick followed by the letter r and ends with *one* backtick.

<br><hr>

![](img/rnotebook_anatomy_rotated.png){width=900px}

<br><hr>
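
To make the three components concrete, here is a rough sketch that builds a tiny document and then pulls the metadata and the code back out of it. It assumes the `rmarkdown` and `knitr` packages are installed; the file `anatomy.Rmd` and its contents are hypothetical.

```r
library(rmarkdown)
library(knitr)

# A hypothetical three-part document: metadata, text, and one code chunk
rmd_lines <- c(
  "---",
  'title: "Minimal example"',
  "output: html_document",
  "---",
  "",
  "This sentence is **text** written in Markdown.",
  "",
  "```{r}",
  "summary(cars$speed)",
  "```"
)
rmd_path <- file.path(tempdir(), "anatomy.Rmd")
writeLines(rmd_lines, rmd_path)

# 1. metadata: the YAML between the two lines of three dashes
meta <- yaml_front_matter(rmd_path)
meta$title   # "Minimal example"
meta$output  # "html_document"

# 2. code: purl() extracts only what sits inside the code chunks
r_path <- purl(rmd_path, output = file.path(tempdir(), "anatomy.R"),
               documentation = 0, quiet = TRUE)
readLines(r_path)

# 3. text: everything else in the file is Markdown
```

Everything that is neither metadata nor code chunk is treated as Markdown text when the document is knit.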

### R Notebooks

R Notebooks are a newer type of R Markdown document that renders to HTML by default. You can `Preview` the document as you work, and the published web page gives viewers the option to download the .Rmd file. This makes them a good choice for collaboration.

*Note:* The `Preview` only displays output from code that has already been run.  To see all of the R code output, you have to `Run` every code chunk first.

#### **<font color="#953445">Create an R Notebook document:**

> * File > New File > R Notebook 
> * Save it to your class12 folder, name it educational_attainment
> * In RStudio, `Preview` educational_attainment
>   + Look at the HTML document that pops up
> * Click the green arrow on the upper right of the first code chunk
> * `Preview` educational_attainment again
>   + Notice that it now displays the output of the code chunk
>   + Click on the *Code* button on the upper right of your Preview
>   + Download the Rmd, find it in your Downloads folder, and open it

<br>

</font>

<br>

### R code in R Markdown

To run code inside an R Markdown document and display the output in your HTML document, you need to put the R code inside a code chunk so that RStudio knows to execute it. To insert a chunk, use either:

* The keyboard shortcut Ctrl + Alt + I (Cmd + Option + I on a Mac)
* The “Insert” button icon in the toolbar

Inside the code chunk you can do anything you would in a regular R script: import data, process it, display it.  Instead of appearing in the RStudio console or Plots pane, the output appears within the R Notebook, directly below the chunk, and in the HTML document created from it.  

You'll want to put code in different chunks, depending on whether you want to display the code or not.  


#### Setup 

The first code chunk is conventionally used to set the default chunk options and load packages, and you usually don't want to display it.

![](img/setup.png){width=900px}

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

library(tidyverse)
library(tidycensus)
library(plotly)
library(scales)

```


The above code chunk:

* is labeled `setup`, so RStudio runs it automatically before any other code chunk when you run chunks interactively.  
    + **You can only have one setup**, because chunk labels must be unique. 
    + `include=FALSE` means the setup code chunk is not shown in your output HTML document
* `echo = TRUE` sets the default to show the code in your HTML document
* loads the libraries

<br>
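
The effect of `knitr::opts_chunk$set()` can be checked directly in the console. This is a small sketch, assuming a fresh R session with `knitr` installed; the options changed here are only examples.

```r
library(knitr)

opts_chunk$get("echo")   # knitr's built-in default: TRUE

# What a setup chunk typically does: change the defaults for later chunks
opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)
opts_chunk$get("echo")   # FALSE

# Put knitr's built-in defaults back
opts_chunk$restore()
opts_chunk$get("echo")   # TRUE
```

Options set this way are only defaults: any individual chunk can still override them in its own header.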

#### **<font color="#953445">Set up your R Notebook:**

> * In educational_attainment:
> * Delete everything after the metadata
> * Add a setup code chunk like the one above
</font>

<br>

#### Add R code

After you set up your R Notebook, you can add your R code to do the work. 

<br>

#### **<font color="#953445">Add R code to your R Notebook:**

> * In educational_attainment:
> * Below your setup code, start a new code chunk
> * Insert the code below (or your own code from your homework if you prefer)
</font>

<br>

```{r, eval=FALSE}
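# tidycensus needs a Census API key; set it once with census_api_key("YOUR_KEY", install = TRUE)
acs_vars <- load_variables(2019, "acs1", cache = TRUE)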

c15003_vars <- acs_vars %>% 
  filter(grepl("C15003", name)) %>% 
  mutate(label = str_replace(label, "Estimate!!", ""),
         label = str_replace(label, ":!!", "_")) %>% 
  select(name, label) %>% 
  rename(variable = name)

c_education <- get_acs(survey = "acs1", 
                       geography = "state", 
                       state = "NY", 
                       table = "C15003",
                       year = "2019") %>% 
  left_join(c15003_vars, by = "variable")

#### Import census data for all places ####

##### Population data #####
raw_city_pop <- get_acs(survey = "acs1", 
                         geography = "place", 
                         variables = "B01003_001",
                         year = "2019")

##### Educational attainment data #####
raw_city_education <- get_acs(survey = "acs1", 
                              geography = "place", 
                              table = "C15003",
                              year = "2019") %>% 
  left_join(c15003_vars, by = "variable")

##### Per-capita income data #####
raw_city_pc_income <- get_acs(survey = "acs1", 
                              geography = "place", 
                              variables = "B19301_001",
                              year = "2019")

#### Process census data for all places ####

city_pop <- raw_city_pop %>% 
  rename(pop = estimate) %>% 
  select(GEOID, pop)

city_income <-   raw_city_pc_income %>% 
  rename(per_capita_income = estimate) %>% 
  select(GEOID, per_capita_income)

city_ed_attainment <- raw_city_education %>% 
  filter(label == "Total_Bachelor's degree" | 
           label == "Total_Master's degree" | 
           label == "Total_Professional school degree" | 
           label == "Total_Doctorate degree" | 
           label == "Total:") %>% 
  select(GEOID, NAME, estimate, label) %>% 
  pivot_wider(names_from = label, values_from = estimate) %>% 
  mutate(bachelor_and_higher = rowSums(across(`Total_Bachelor's degree`:`Total_Doctorate degree`)),
         # quick test of rows sums, I add up the first row, 11436+7687+950+2223, it worked!
         pct_at_least_bachelors = bachelor_and_higher/`Total:`) 

#### Create data frame for analysis and scatterplot ####
city_analysis <- city_ed_attainment %>% 
  select(GEOID, NAME, pct_at_least_bachelors) %>% 
  separate(NAME, c("city","state"), sep = ", ") %>% 
  left_join(city_pop, by = "GEOID") %>% 
  left_join(city_income, by = "GEOID") %>% 
  filter(pop > 65000)

#### Make it interactive ####
city_plot <- city_analysis %>% 
  ggplot(aes(x = pct_at_least_bachelors, y = per_capita_income, size = pop,
             color = state,
             text = paste0(city,", ", state,
                           "<br>Population : ", scales::comma(pop, accuracy=1L),
                           "<br>Adults with at least a Bachelor's Degree : ", scales::percent(pct_at_least_bachelors, accuracy=1L),
                           "<br>Per-capita income : ", scales::dollar(per_capita_income, accuracy=1L)))) +
  geom_point(alpha = .75) +
  guides(size = "none",
         color = "none") +
  # make sure you have the `scales` package loaded!
  scale_x_continuous(labels = percent_format(accuracy = 1)) +
  scale_y_continuous(labels = dollar_format(accuracy = 1)) + 
  # change legend label formatting
  scale_size_area(labels = comma, max_size = 10) +
  labs(x = "Proportion of Adults with at least a Bachelor's Degree", y = "Per-capita Income",
       title = "Educational Attainment and Per Capita Income",
       caption = "Source: ACS 1-year estimates, 2019", 
       # add nice labels for the size and color elements
       size = "Population",
       color = "State") +
  theme_bw()  +
  theme(legend.position = 'none')

ggplotly(city_plot, tooltip = "text") %>% 
  layout(margin = list(t = 25))
```
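
The processing step in the middle of that script (pivot wider, sum the degree columns, divide by the total) can be tried without a Census API key. The sketch below uses a made-up two-city data set, so the numbers and city names are hypothetical; only the column names mirror the script above.

```r
library(dplyr)
library(tidyr)

# Made-up data in the same long format that get_acs() returns
toy <- tibble(
  GEOID = rep(c("01", "02"), each = 5),
  NAME = rep(c("Alpha city, New York", "Beta city, New Jersey"), each = 5),
  label = rep(c("Total:", "Total_Bachelor's degree", "Total_Master's degree",
                "Total_Professional school degree", "Total_Doctorate degree"),
              times = 2),
  estimate = c(1000, 200, 100, 30, 20,
               2000, 300, 150, 30, 20)
)

toy_attainment <- toy %>%
  # one row per city, one column per education level
  pivot_wider(names_from = label, values_from = estimate) %>%
  # add up the four degree columns, then divide by the total
  mutate(bachelor_and_higher = rowSums(across(`Total_Bachelor's degree`:`Total_Doctorate degree`)),
         pct_at_least_bachelors = bachelor_and_higher / `Total:`) %>%
  # split "city, state" on the comma and the space that follows it
  separate(NAME, c("city", "state"), sep = ", ")

toy_attainment %>% select(city, state, pct_at_least_bachelors)
```

For the first city this gives (200 + 100 + 30 + 20) / 1000 = 0.35, and for the second (300 + 150 + 30 + 20) / 2000 = 0.25. Splitting on a comma plus a space keeps the state names free of a leading space.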


<br>

#### Hide R code
<br>

In some cases, you may want to display your R code, but usually you want to keep it hidden and show the output only.  For each code chunk, you can set chunk options that determine whether the code, warnings, and output are displayed in your HTML document.  The following are the most commonly used options:

* `eval = FALSE` prevents code from being evaluated. (And obviously if the code is not run, no results will be generated). This is useful for displaying example code, or for disabling a large block of code without commenting each line.
* `include = FALSE` runs the code, but doesn’t show the code or results in the final document. Use this for setup code that you don’t want cluttering your report.
* `echo = FALSE` prevents code, but not the results, from appearing in the finished file. Use this when writing reports aimed at people who don’t want to see the underlying R code.
* `message = FALSE` or `warning = FALSE` prevents messages or warnings from appearing in the finished file.
* `results = 'hide'` hides printed output; `fig.show = 'hide'` hides plots.
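
One way to see what these options do is to knit the same one-line chunk under each of them and check what survives in the output. This is a sketch assuming `knitr` is installed; `knit_demo()`, `has_code()`, and `has_result()` are hypothetical helpers written for this illustration, not knitr functions.

```r
library(knitr)

# Knit a one-line chunk with the given options and return the resulting Markdown
knit_demo <- function(options) {
  src <- c(paste0("```{r, ", options, "}"), "1 + 1", "```")
  knit(text = src, quiet = TRUE)
}

shown  <- knit_demo("echo=TRUE")      # code and result
hidden <- knit_demo("echo=FALSE")     # result only
unrun  <- knit_demo("eval=FALSE")     # code only
silent <- knit_demo("include=FALSE")  # neither

# Does the knitted Markdown contain the source code? The printed result?
has_code   <- function(md) grepl("1 + 1", md, fixed = TRUE)
has_result <- function(md) grepl("[1] 2", md, fixed = TRUE)

data.frame(
  option = c("echo=TRUE", "echo=FALSE", "eval=FALSE", "include=FALSE"),
  code   = c(has_code(shown), has_code(hidden), has_code(unrun), has_code(silent)),
  result = c(has_result(shown), has_result(hidden), has_result(unrun), has_result(silent))
)
```

The resulting table has one row per option, with `TRUE` wherever the code or the result made it into the knitted output.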

You can also let R-curious readers download your Rmd and look at your code by setting an option in the YAML header.  Add `code_download: true` under the output format in the YAML header like this:

![](img/yaml.png){width=200px}

<br>
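
Because indentation matters in the YAML header, it can help to check how R reads it. The sketch below assumes the `rmarkdown` package is installed; the title and the file `header_check.Rmd` are hypothetical, and the header uses `html_document`, the same output format as this lesson's own header.

```r
library(rmarkdown)

# A hypothetical header with code_download nested under the output format.
# Each level of nesting is indented two more spaces.
rmd_lines <- c(
  "---",
  'title: "Educational Attainment"',
  "output:",
  "  html_document:",
  "    code_download: true",
  "---",
  "",
  "Body text."
)
rmd_path <- file.path(tempdir(), "header_check.Rmd")
writeLines(rmd_lines, rmd_path)

# Read the header back as a nested R list
meta <- yaml_front_matter(rmd_path)
meta$output$html_document$code_download  # TRUE
```

If `code_download` were indented at the wrong level, it would not appear under `html_document` in the parsed list, which is a quick way to spot a formatting mistake.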

#### **<font color="#953445">Hide R code in your R Notebook:**

> * Add the options `echo=FALSE, message=F, warning=F` so that the beginning of your code chunk looks like this:
> ![](img/echo.png){width=250px}
> *  Add `code_download: true` to the YAML header
</font>

<br><br>

### Text Formatting in R Markdown

![](img/markdown_text.png){width=300px}

You can also add vertical space with the HTML line break tag, [`<br>`](https://www.w3schools.com/tags/tag_br.asp).

<br><br>
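
To see how Markdown syntax turns into HTML, you can convert a few lines by hand. This sketch uses the `commonmark` package, which is an assumption: it is not used elsewhere in this lesson and must be installed separately. R Markdown itself converts with Pandoc, so its HTML may differ in small details.

```r
library(commonmark)

# A few lines of Markdown: header, bold, italic, link, bullet list
md <- c(
  "#### Data Sources",
  "",
  "**bold** and *italic* text",
  "",
  "[tidycensus](https://walker-data.com/tidycensus/)",
  "",
  "* first item",
  "* second item"
)

# Convert to HTML and print the result
html <- markdown_html(paste(md, collapse = "\n"))
cat(html)
```

In the printed HTML, the four `#` signs become an `<h4>` header, `**` becomes `<strong>`, `*` becomes `<em>`, the bracket-and-parenthesis pair becomes an `<a href>` link, and the starred lines become `<li>` list items.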

#### **<font color="#953445">Add text to your R Notebook:**

> * In educational_attainment:
> * Create citations for the educational attainment data from Assignment 10 
>   + Make the citations look like below:
>       + Data Sources & Headers are 4th level headers
>       + Variables are **bold**
>       + Table names are *italicized*
>       + tidycensus links to https://walker-data.com/tidycensus/
>       + lists of variables use bullet points

</font>

![](img/citations.png){width=600px}
<br><br><br><hr><br><br><br>


### Assignment 12

##### R Assignment

Read through the lesson from today again. Complete the R Notebook of the educational attainment assignment and publish it to RPubs. Make sure that you make your Rmd downloadable!

Submit the link to your RPubs web page on Canvas.

##### Readings: 

Read [United, Chapter One of Minor Feelings by Cathy Park Hong](https://drive.google.com/file/d/1Avq15KBGRNVwonmWoCBsA5dvDfSXc3ij/view?usp=sharing)

Look through all of the R Markdown resources:

* [R for Data Science: R Markdown chapter](https://r4ds.had.co.nz/r-markdown.html)
* [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown/basics.html)
* [R Markdown Cookbook](https://bookdown.org/yihui/rmarkdown-cookbook/how-to-read-this-book.html)
* [R Markdown Cheatsheet](https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf)


<br><br><br><hr><br><br><br>