Today we’re going to learn a whole lot more about R Markdown and how to use it to share your analysis with people who don’t know R.
Markdown is a lightweight mark-up language for formatting plain text so that it can be converted to HTML. An R Markdown document combines Markdown and R code to create a user-friendly page with formatted text and output from R code. You can export an R Markdown document as:
Today’s exercises borrow heavily from the R Markdown chapter in R for Data Science. I recommend bookmarking all of these resources, and reading some or all of them if you continue to use R Markdown.
Common uses for R Markdown docs are:
Today we’re going to use this R Notebook to create an HTML document that can be viewed in a web browser that displays:
- File > New File > R Markdown
- select Document, HTML, and name it r_markdown_test
- Save it to your class12 folder, name it r_markdown_test
- Look at the files created in your class12 folder
- In R Studio, Knit r_markdown_test
- See the HTML document that pops up
- Look at the files created in your class12 folder
- Now we’ll look at the document together
There are three basic components of an R Markdown document:
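As a rough sketch, assuming the three components are the YAML metadata header, Markdown text, and R code chunks (the breakdown used in R for Data Science), the R code below writes a tiny .Rmd file containing one of each and renders it to HTML, much as the Knit button does. The file contents are placeholders, and the example assumes the rmarkdown package and pandoc (bundled with R Studio) are available.

```r
library(rmarkdown)

# Three backticks open and close a code chunk; built here to keep the example tidy
fence <- strrep("`", 3)

# Write a minimal R Markdown file to a temporary folder (placeholder content)
rmd_path <- file.path(tempdir(), "r_markdown_test.Rmd")
writeLines(c(
  "---",                                      # 1. metadata (YAML header)
  'title: "r_markdown_test"',
  "output: html_document",
  "---",
  "",
  "Some **bold** text written in Markdown.",  # 2. formatted text
  "",
  paste0(fence, "{r}"),                       # 3. an R code chunk
  "summary(cars)",
  fence
), rmd_path)

# Render the file to HTML; render() returns the path of the file it created
html_path <- render(rmd_path, quiet = TRUE)
file.exists(html_path)
```

Opening the file at `html_path` in a browser shows the title, the formatted sentence, and the summary table produced by the chunk.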
R Notebooks are the most modern (and most common) type of R Markdown document. They are R Markdown documents whose default output is an HTML file. They also let you Preview as you work, and the published web page gives viewers the option to download the .Rmd document. This makes them the best choice for collaboration.
Note: The Preview only displays output from code chunks that have already been run. To see all of the R code output, you have to Run the code first.
- File > New File > R Notebook
- Save it to your class12 folder, name it educational_attainment
- In R Studio, Preview educational_attainment
- Look at the HTML document that pops up
- Click the green arrow on the upper right of the first code chunk
- Preview educational_attainment again
- Notice that it now displays the output of the code chunk
- Click on the Code button on the upper right of your Preview
- Download the Rmd, find it in your Downloads and open it
To run code inside an R Markdown document and display the output in your HTML document, you need to put the R code inside a code chunk so that R Studio knows to execute it. To insert a chunk:
Inside the code chunk you can do anything you would in a regular R script - import data, process it, display it. Instead of appearing in your R Studio console, the output appears inside the R Notebook and in the HTML document it creates.
You’ll want to put code in different chunks, depending on whether you want to display the code or not.
The first code chunk is typically used to set document-wide options, and you usually don’t want to display it.
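A minimal sketch of what the body of such a setup chunk might contain. It assumes the chunk's opening line is `{r setup, include=FALSE}` (the header used in R Studio's default R Markdown template), so the chunk runs but is not displayed.

```r
# Body of the setup chunk; the header {r setup, include=FALSE} is an assumption
# Show the code from later chunks in the HTML document by default
knitr::opts_chunk$set(echo = TRUE)
```

Individual chunks can still override this default with their own chunk options.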
The above code chunk:
- is named setup, so it runs before any other code chunk, no matter where it is.
- sets echo = TRUE, which makes showing the code in your HTML document the default
- In educational_attainment:
- Delete everything after the metadata
- Add a setup code chunk like the one above
After you set up your R Notebook, you can add your R code to do the work.
- In educational_attainment:
- Below your setup code, start a new code chunk
- Insert the code below (or your own code from your homework if you prefer)
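# Load the packages used below (skip any already loaded in your setup chunk)
library(tidyverse)
library(tidycensus)
library(scales)
library(plotly)
# Look up variable names and labels for the 2019 1-year ACS
acs_vars <- load_variables(2019, "acs1", cache = TRUE)
# Keep the C15003 (educational attainment) variables and tidy their labels
c15003_vars <- acs_vars %>%
  filter(grepl("C15003", name)) %>%
  mutate(label = str_replace(label, "Estimate!!", ""),
         label = str_replace(label, ":!!", "_")) %>%
  select(name, label) %>%
  rename(variable = name)
# Educational attainment for New York State
c_education <- get_acs(survey = "acs1",
                       geography = "state",
                       state = "NY",
                       table = "C15003",
                       year = 2019) %>%
  left_join(c15003_vars, by = "variable")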
#### Import census data for all places ####
##### Population data #####
raw_city_pop <- get_acs(survey = "acs1",
                        geography = "place",
                        variables = "B01003_001",
                        year = 2019)
##### Educational attainment data #####
raw_city_education <- get_acs(survey = "acs1",
                              geography = "place",
                              table = "C15003",
                              year = 2019) %>%
  left_join(c15003_vars, by = "variable")
##### Per-capita income data #####
raw_city_pc_income <- get_acs(survey = "acs1",
                              geography = "place",
                              variables = "B19301_001",
                              year = 2019)
#### Process census data for all places ####
city_pop <- raw_city_pop %>%
  rename(pop = estimate) %>%
  select(GEOID, pop)
city_income <- raw_city_pc_income %>%
  rename(per_capita_income = estimate) %>%
  select(GEOID, per_capita_income)
city_ed_attainment <- raw_city_education %>%
  filter(label == "Total_Bachelor's degree" |
           label == "Total_Master's degree" |
           label == "Total_Professional school degree" |
           label == "Total_Doctorate degree" |
           label == "Total:") %>%
  select(GEOID, NAME, estimate, label) %>%
  pivot_wider(names_from = label, values_from = estimate) %>%
  mutate(bachelor_and_higher = rowSums(across(`Total_Bachelor's degree`:`Total_Doctorate degree`)),
         # quick check of the row sums: adding up the first row by hand (11436 + 7687 + 950 + 2223) matched
         pct_at_least_bachelors = bachelor_and_higher / `Total:`)
#### Create data frame for analysis and scatterplot ####
city_analysis <- city_ed_attainment %>%
  select(GEOID, NAME, pct_at_least_bachelors) %>%
  # split on ", " so the state name does not keep a leading space
  separate(NAME, c("city", "state"), sep = ", ") %>%
  left_join(city_pop, by = "GEOID") %>%
  left_join(city_income, by = "GEOID") %>%
  filter(pop > 65000)
#### Make it interactive ####
city_plot <- city_analysis %>%
  ggplot(aes(x = pct_at_least_bachelors, y = per_capita_income, size = pop,
             color = state,
             # text is only used for the plotly tooltip
             text = paste0(city, ", ", state,
                           "<br>Population : ", scales::comma(pop, accuracy = 1L),
                           "<br>Adults with at least a Bachelor's Degree : ", scales::percent(pct_at_least_bachelors, accuracy = 1L),
                           "<br>Per-capita income : ", scales::dollar(per_capita_income, accuracy = 1L)))) +
  geom_point(alpha = .75) +
  guides(size = "none",
         color = "none") +
  # make sure you have the `scales` package loaded!
  scale_x_continuous(labels = percent_format(accuracy = 1)) +
  scale_y_continuous(labels = dollar_format(accuracy = 1)) +
  # change legend label formatting
  scale_size_area(labels = comma, max_size = 10) +
  labs(x = "Proportion of Adults with at least a Bachelor's Degree", y = "Per-capita Income",
       title = "Educational Attainment and Per Capita Income",
       caption = "Source: ACS 1-year estimates, 2019",
       # labels for the size and color elements (their legends are hidden)
       size = "Population",
       color = "State") +
  theme_bw() +
  theme(legend.position = 'none')
ggplotly(city_plot, tooltip = "text") %>%
  layout(margin = list(t = 25))
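To see why the filter above looks for labels such as "Total:" and "Total_Bachelor's degree", here is a small worked example of the two str_replace() steps. The two raw labels are assumed examples of the format load_variables() returns for table C15003, inferred from the labels the filter uses.

```r
library(stringr)

# Two labels in the assumed raw format for table C15003
raw_labels <- c("Estimate!!Total:",
                "Estimate!!Total:!!Bachelor's degree")

# Step 1: drop the "Estimate!!" prefix
step1 <- str_replace(raw_labels, "Estimate!!", "")

# Step 2: turn the ":!!" separator into an underscore
clean_labels <- str_replace(step1, ":!!", "_")

clean_labels
# "Total:"  "Total_Bachelor's degree"
```

The top-level total keeps its trailing colon because it has no ":!!" separator to replace, which is why the column is later referred to as `Total:`.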
In some cases, you may want to display your R code, but usually you want to keep it hidden and show the output only. For each code chunk, you can set chunk options that determine whether the code, warnings, and output are displayed in your HTML document. The following are the most commonly used options:
- eval = FALSE prevents code from being evaluated. (And obviously if the code is not run, no results will be generated). This is useful for displaying example code, or for disabling a large block of code without commenting each line.
- include = FALSE runs the code, but doesn’t show the code or results in the final document. Use this for setup code that you don’t want cluttering your report.
- echo = FALSE prevents code, but not the results, from appearing in the finished file. Use this when writing reports aimed at people who don’t want to see the underlying R code.
- message = FALSE or warning = FALSE prevents messages or warnings from appearing in the finished file.
- results = 'hide' hides printed output; fig.show = 'hide' hides plots.
You can also set an option in the YAML header that lets readers download the Rmd, so that R-curious readers can look at your code. To do this, add code_download: true to the YAML header.
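A sketch of a YAML header with the download option turned on. It assumes the document's output format is html_document, and the title is a placeholder; your own header may list a different output format.

```yaml
---
title: "Educational Attainment"
output:
  html_document:
    code_download: true
---
```

Note that code_download is indented under the output format, because it is an option of that format and not of the document as a whole.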
- Add the options echo=FALSE, message=F, warning=F to the beginning of your code chunk
- Add code_download: true to the YAML header
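As a sketch, once the three options are added, the opening line of the chunk (the part that follows the three backticks) would read as follows, assuming the chunk has no name:

```text
{r, echo=FALSE, message=F, warning=F}
```

If your chunk does have a name, it goes right after the r and before the first comma.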
You can also add vertical space with the HTML line break tag, <br>
- In r_notebook_test:
- Create citations for the educational attainment data from assignment 10
- Make the citations look like below:
- Data Sources & Headers are 4th level headers
- Variables are bold
- Table names are italicized
- tidycensus links to https://walker-data.com/tidycensus/
- lists of variables use bullet points
Read through the lesson from today again. Complete the R Notebook of the educational attainment assignment and publish it to RPubs. Make sure that you make your Rmd downloadable!
Submit the link to your RPubs web page on Canvas.
Read United, Chapter One of Minor Feelings by Cathy Park Hong
Look through all of the R Markdown resources: