February 7, 2017

Class overview

  • Test ability to create PDF
  • Refresh R skills with 2 exercises
  • Develop skills to create and format a 1-page PDF document
  • Build your own PDF

Your project for the course

  • Create a 1-page PDF document
    • Include a title
    • Include at least 1 chart
    • Include at least 1 table
    • Submit the 1-page PDF to me by email on or before February 16, 2018

Brief overview of Markdown + knitr

  • Markdown is a lightweight syntax for formatting plain text
  • Multiple Markdown flavors exist
  • We will use R Markdown, ‘an authoring framework for data science’
  • R Markdown files integrate well with HTML, CSS, & Pandoc Markdown
  • We will use a little of each when it makes development of your PDF easier
  • For our purposes, knitr is the Knit button at the top of the RStudio window that compiles your document
    • knitr is magical and so much more than just a button
  • Resources

Test ability to create PDF

  • Open a new .Rmd file (File > New File > R Markdown)
    • Select PDF as the output format under ‘Document’
  • Save the file you just created
  • Click ‘Knit’ at the top of the RStudio window (look for the blue ball of yarn) to create the PDF
  • The document should render in the RStudio viewer
  • The PDF file should appear in the same folder where you saved your .Rmd file

Test ability to create PDF

Did you run into problems? If so…

  1. Confirm you installed a TeX distribution (MacTeX on macOS, MiKTeX on Windows). Install one now if you haven’t already.
  2. If prompted, install the missing TeX packages
  3. Try reinstalling the packages below
  4. Ask for additional help
install.packages(c('rmarkdown', 'knitr'))
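Before reinstalling anything, it can help to check whether R can see a LaTeX engine at all. A minimal sketch using base R's Sys.which(), assuming your TeX distribution provides pdflatex (the default engine for pdf_document):

```r
# Sys.which() returns the full path to a program, or "" if it is not on the PATH
latex_path <- Sys.which('pdflatex')
has_latex <- nzchar(latex_path)

if (has_latex) {
  message('Found pdflatex at: ', latex_path)
} else {
  message('pdflatex not found; install MacTeX (macOS) or MiKTeX (Windows), then restart RStudio')
}
```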

Exercises

Exercise 1

  • Answer the following question
    • In donor, what are the average and largest donation amounts for contributors employed by the employers in the vector below?
    • c('MICROSOFT', 'THE COCA-COLA COMPANY', 'NUCOR STEEL SEATTLE, INC.', 'FARMERS INSURANCE', 'UNIVERSITY OF WASHINGTON')
  • Your output should be a data frame
  • You will need to use filter() and group_by() %>% summarise()
  • These employer names are in the contributor_employer_name variable

Hint

donor %>% 
    filter(contributor_employer_name %in% c('MICROSOFT', 'THE COCA-COLA COMPANY'
                                            , 'NUCOR STEEL SEATTLE, INC.'
                                            , 'FARMERS INSURANCE' 
                                            , 'UNIVERSITY OF WASHINGTON')) %>% 
    group_by() %>% 
    summarise()

Hint

donor %>% 
    filter(contributor_employer_name %in% c('MICROSOFT', 'THE COCA-COLA COMPANY'
                                            , 'NUCOR STEEL SEATTLE, INC.'
                                            , 'FARMERS INSURANCE' 
                                            , 'UNIVERSITY OF WASHINGTON')) %>% 
    group_by(contributor_employer_name) %>% 
    summarise(
        avg_amount = mean(amount)
        , max_amount = max(amount)
        ) 

Exercise 1

donor %>% 
    filter(contributor_employer_name %in% c('MICROSOFT', 'THE COCA-COLA COMPANY'
                                            , 'NUCOR STEEL SEATTLE, INC.'
                                            , 'FARMERS INSURANCE' 
                                            , 'UNIVERSITY OF WASHINGTON')) %>% 
    group_by(contributor_employer_name) %>% 
    summarise(avg_amount = mean(amount), max_amount = max(amount))
## # A tibble: 5 x 3
##   contributor_employer_name avg_amount max_amount
##   <fctr>                         <dbl>      <dbl>
## 1 FARMERS INSURANCE              27.1       182  
## 2 MICROSOFT                     387        1400  
## 3 NUCOR STEEL SEATTLE, INC.       1.49       10.0
## 4 THE COCA-COLA COMPANY           9.75       40.4
## 5 UNIVERSITY OF WASHINGTON      144         500
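If amount ever contains missing values, mean() and max() return NA for that employer. The sketch below reruns the same pipeline on a small hypothetical data frame (toy_donor, with made-up values, not the course data) and adds na.rm = TRUE so missing amounts are skipped:

```r
library(dplyr)

# Hypothetical toy data standing in for `donor` (values made up for illustration)
toy_donor <- data.frame(
  contributor_employer_name = c('MICROSOFT', 'MICROSOFT', 'FARMERS INSURANCE',
                                'FARMERS INSURANCE', 'ACME CORP'),
  amount = c(100, 300, 20, NA, 50),
  stringsAsFactors = FALSE
)

employers <- c('MICROSOFT', 'FARMERS INSURANCE')

toy_summary <- toy_donor %>%
  filter(contributor_employer_name %in% employers) %>%  # keep only the listed employers
  group_by(contributor_employer_name) %>%               # one group per employer
  summarise(
    avg_amount = mean(amount, na.rm = TRUE),            # skip missing amounts
    max_amount = max(amount, na.rm = TRUE)
  )

toy_summary
```

Without na.rm = TRUE, the FARMERS INSURANCE row in this toy example would show NA for both columns.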

Exercise 2

  • Answer one or both of the questions below
    • In the subset of contributors data analyzed in Exercise 1, which party received the most contributions?
    • To what extent is there a relationship between party and a geography variable (location, city, county, region, etc.) for the subset of contributors from Exercise 1?
  • Create a visualization that helps answer the question(s)
  • To subset the data, use filter()
    • Use the same filter code as in Exercise 1
    • You can create a new data object with <- if it helps you build your data visualization
  • Recode NA values in party as 'Unknown'
    • Use ifelse() inside mutate()
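The recoding step can be sketched on a hypothetical toy column (toy_party; the party labels are made up). is.na() tests for missing values, and as.character() matters when party is stored as a factor, because ifelse() would otherwise return the factor's integer codes instead of its labels:

```r
library(dplyr)

# Hypothetical toy data: a factor column with one missing value
toy_party <- data.frame(
  party = factor(c('DEMOCRAT', 'REPUBLICAN', NA, 'DEMOCRAT'))
)

toy_party <- toy_party %>%
  # Replace NA with 'Unknown'; keep the factor labels as text for everything else
  mutate(party = ifelse(is.na(party), 'Unknown', as.character(party)))

# Number of rows per party, including the new 'Unknown' group
party_counts <- count(toy_party, party)
party_counts
```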

Hint

donor_sds <- donor %>% 
    filter(contributor_employer_name %in% c('MICROSOFT', 'THE COCA-COLA COMPANY'
                                            , 'NUCOR STEEL SEATTLE, INC.'
                                            , 'FARMERS INSURANCE' 
                                            , 'UNIVERSITY OF WASHINGTON')) 

Hint

donor_sds <- donor %>% 
    filter(contributor_employer_name %in% c('MICROSOFT', 'THE COCA-COLA COMPANY'
                                            , 'NUCOR STEEL SEATTLE, INC.'
                                            , 'FARMERS INSURANCE' 
                                            , 'UNIVERSITY OF WASHINGTON')) 
                                        
donor_sds %>% ggplot(aes()) + 

Exercise 2

In the subset of contributors data analyzed in Exercise 1, which party received the most contributions?

donor_sds <- donor %>% 
    filter(contributor_employer_name %in% c('MICROSOFT', 'THE COCA-COLA COMPANY'
                                            , 'NUCOR STEEL SEATTLE, INC.'
                                            , 'FARMERS INSURANCE' 
                                            , 'UNIVERSITY OF WASHINGTON')) %>%
    mutate(party = ifelse(is.na(party), 'Unknown', as.character(party)))

library(ggthemes)  # provides theme_economist()
donor_sds %>% ggplot(aes(party)) + geom_bar() + theme_economist() + 
  labs(x = '', y = 'Donations (#)')

Exercise 2

To what extent is there a relationship between party and a geography variable (location, city, county, region, etc.) for the subset of contributors from Exercise 1?

donor_sds <- donor %>% 
    filter(contributor_employer_name %in% c('MICROSOFT', 'THE COCA-COLA COMPANY'
                                            , 'NUCOR STEEL SEATTLE, INC.'
                                            , 'FARMERS INSURANCE' 
                                            , 'UNIVERSITY OF WASHINGTON')) %>%
    mutate(party = ifelse(is.na(party), 'Unknown', as.character(party))) %>%
    filter(party != 'Unknown')

library(ggthemes)  # provides theme_economist()
donor_sds %>% ggplot(aes(x = contributor_state, fill = party)) + 
  geom_bar() + theme_economist() + labs(x = 'State', y = '')
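Raw counts make it hard to compare states with very different numbers of donations. One option, not part of the original exercise, is geom_bar(position = 'fill'), which stretches every bar to a height of 1 so the fill shows each party's share within a state. A sketch on hypothetical toy data (states and parties made up):

```r
library(ggplot2)

# Hypothetical toy data for illustration only
toy_geo <- data.frame(
  contributor_state = c('WA', 'WA', 'WA', 'OR', 'OR', 'CA'),
  party = c('DEMOCRAT', 'DEMOCRAT', 'REPUBLICAN', 'REPUBLICAN', 'DEMOCRAT', 'REPUBLICAN')
)

# position = 'fill' stacks each bar to 1, so the y axis is a proportion, not a count
p <- ggplot(toy_geo, aes(x = contributor_state, fill = party)) +
  geom_bar(position = 'fill') +
  labs(x = 'State', y = 'Share of donations')

p
```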

Anatomy of your .Rmd

Anatomy of your .Rmd

  • YAML or front-matter
  • Code chunks
  • Code for formatting (Markdown and LaTeX)
  • Text

Anatomy of your .Rmd

  • YAML or front-matter
    • Use YAML to set document-wide options
    • Place it at the very top of the document, between two lines of three dashes (---)
  • Code chunks
  • Code for formatting (Markdown and LaTeX)
  • Text

YAML

---
title: "What a great title!"
output: pdf_document
---
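To see how R Markdown reads these settings, here is a small sketch that writes a hypothetical .Rmd file to a temporary path and parses its header with rmarkdown::yaml_front_matter(), which returns each YAML field as an element of a named list:

```r
library(rmarkdown)

# Write a small hypothetical .Rmd file to a temporary location
rmd_file <- tempfile(fileext = '.Rmd')
writeLines(c(
  '---',
  'title: "What a great title!"',
  'output: pdf_document',
  'fontsize: 12pt',
  '---',
  '',
  'Some text.'
), rmd_file)

# Parse only the block between the two --- lines
front <- yaml_front_matter(rmd_file)
str(front)
```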

Anatomy of your .Rmd

  • YAML or front-matter
  • Code chunks
    • Transform data and build tables and visualizations within code chunks
    • Open and close each chunk with three backticks (```), and put r and any chunk options inside { } in the header
  • Code for formatting (Markdown and LaTeX)
  • Text

Code chunks

```{r, warning=FALSE, message=FALSE}
library(tidyverse)  # load libraries
police <- read.csv('https://goo.gl/nNAuDy')  # read your data
```
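Rather than repeating options such as warning=FALSE in every chunk header, you can set defaults once near the top of the document, typically in a first chunk with the header {r setup, include=FALSE}. A minimal sketch of the R code that would go in that chunk, using knitr::opts_chunk$set():

```r
library(knitr)

# Default options for every chunk that follows;
# an individual chunk header can still override them
opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)

opts_chunk$get('echo')
```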

Anatomy of your .Rmd

  • YAML or front-matter
  • Code chunks
  • Code for formatting (Markdown and LaTeX)
    • Format text with short Markdown or LaTeX commands
  • Text

Code for formatting (Markdown and LaTeX)

# Use hashes for headers

**List (bolded)**
- Bullet 1
- Bullet 2

Text

Text size

  • Header
  • Body

Text size

  • Header
    • Write headers with hashes
    • Fewer hashes produce a larger header
  • Body
# Header
## Header
### Header
#### Header
##### Header
###### Header

Text size

  • Header
  • Body
    • Add fontsize to the YAML
    • Font size options are limited when developing PDF documents
    • Only 10pt, 11pt, and 12pt are recognized
    • The fontsize setting applies to all text elements
---
title: "Foo"
output: pdf_document
fontsize: 12pt
---

Text color

  • Use LaTeX code to change colors
  • LaTeX code is required to change text color when creating a PDF
    • Other options are available for other file types (e.g., HTML)
  • Predefined colors
black, blue, brown, cyan, darkgray, gray, green, lightgray, lime, magenta,  
olive, orange, pink, purple, red, teal, violet, white, yellow
  • To change text color, use the following convention
Look at the \textcolor{red}{balloons}!

Look at the balloons!

Text alignment

  • Use LaTeX code to align text
  • Alignment options
flushleft, flushright, center
  • To change text alignment, use the following convention
\begin{center}Look at the \textcolor{red}{balloons}!\end{center}

Look at the balloons!

Other tricks for text

Add a hyperlink

[Add a hyperlink](https://www.a.url.com)



Bold, italicize, and underline words

**Bold words**
*Italicize words*
\underline{Underline words}

Bullet points

  • Use -, +, or * to create an un-ordered list
  • Use numbers followed by a period to create an ordered list (1.)
  • Indent for sub-items
- Item 1
- Item 2
  - Sub-item 1
  - Sub-item 2
  • Item 1
  • Item 2
    • Sub-item 1
    • Sub-item 2

Font style

  • ‘Off the shelf’ options are limited for font style
  • Three options
    • Roman font: \textrm{...}
    • Sans serif font: \textsf{...}
    • Teletype font: \texttt{...}
Look at the \textcolor{red}{balloons}!  
\textsf{Look at the \textcolor{red}{balloons}!}  
\textrm{Look at the \textcolor{red}{balloons}!}  
\texttt{Look at the \textcolor{red}{balloons}!}  

Background color

  • Update the YAML
    • Note the header-includes field and the \usepackage command
  • Set the page color with \pagecolor just below the YAML
---
title: "Foo"
output: pdf_document
fontsize: 12pt
header-includes:
- \usepackage{pagecolor}
---
\pagecolor{yellow}

Charts

Rendering charts

  • Create and format charts in code chunks just like in R Notebook
    • The label after r in the braces is the name of the code chunk ({r plot})
    • Chunk names must be unique
    • Include echo=FALSE in the braces to hide your code in the output
    • You can divide code between two or more chunks if necessary
```{r plot, echo=FALSE}
library(tidyverse)
mpg %>% ggplot() + geom_point(aes(displ, hwy))
```
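Formatting happens in the same chunk. Here is a sketch, not from the original slides, that adds axis labels, a title, and theme_minimal() to the same mpg scatter plot (mpg ships with ggplot2; displ is engine displacement in litres and hwy is highway miles per gallon):

```r
library(ggplot2)

# Same scatter plot, with readable labels and a lighter theme
p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(alpha = 0.6) +                       # semi-transparent points show overlap
  labs(x = 'Engine displacement (litres)',
       y = 'Highway fuel economy (mpg)',
       title = 'Highway mileage by engine size') +
  theme_minimal()

p
```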

Chart size and alignment

  • Between the braces, use fig.width and fig.height (values are in inches)
  • Start with values close to 4 and adjust accordingly
  • Use fig.align to move the chart to the left, right, or center
```{r plot2, echo=FALSE, fig.width=7, fig.height=4, fig.align='right'}
library(tidyverse)
mpg %>% ggplot() + geom_point(aes(displ, hwy))
```

Tables

knitr::kable() basics

  • Use kable() from knitr to create tables
  • Not all kable() functionality is available when creating PDFs
  • Include results = 'asis' in the chunk header so the table renders as a table rather than raw text (this matters most when you print() tables, for example inside a loop)
  • Create a table by passing a data frame to the kable() function
```{r, echo = FALSE, warning=FALSE, message=FALSE, results = 'asis'}
police %>%
  group_by(event_clearance_ampm) %>%
  summarise(n = n()) %>%
  knitr::kable(format = 'markdown')  # knitr:: avoids needing library(knitr)
```

event_clearance_ampm      n
AM                     9026
PM                      897
NA                       77

knitr::kable() arguments and kableExtra functions by output format

Argument or function         latex   markdown   pandoc
caption                      X                  X
digits                       X       X          X
align                        X       X          X
column_spec (kableExtra)     X
row_spec (kableExtra)        X
kable_styling (kableExtra)   X

knitr::kable() arguments and kableExtra functions

Argument or function   Description
caption                Title of the table
digits                 Number of decimal places for numbers
align                  Column alignment (“l”, “r”, “c”)
column_spec            Change column width, color, and other characteristics (kableExtra)
row_spec               Change row color, text angle, and other characteristics (kableExtra)
kable_styling          Change font size and placement of the table (kableExtra)
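A minimal sketch of the three kable() arguments above, using a hypothetical data frame (toy_table) rather than the police data; format = 'latex' is assumed because the target is a PDF, so kable() returns LaTeX code that is passed through to the document:

```r
library(knitr)

# Hypothetical data frame used only to illustrate kable() arguments
toy_table <- data.frame(
  period = c('AM', 'PM'),
  share = c(0.9026, 0.0897)
)

tab <- kable(toy_table, format = 'latex',
             caption = 'Share of events by period',  # table title
             digits = 2,                             # round numbers to 2 decimal places
             align = c('l', 'r'))                    # left-align text, right-align numbers
tab
```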

Your project for the course

  • Create a 1-page PDF document
    • Include a title
    • Include at least 1 chart
    • Include at least 1 table
    • Submit the 1-page PDF to me by email on or before February 16, 2018