BUA 455 - Week 13

Author

Penelope Pooler Eisenbies

Published

December 1, 2022

# set default options for all R chunks
knitr::opts_chunk$set(echo = TRUE,
                      highlight = TRUE)

# discourage scientific notation in printed numbers and
# silence quantmod's getSymbols() version 4.0 warning message
options(scipen=100,
        getSymbols.warning4.0 = FALSE)

# install helper package (pacman), if needed
if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")
Loading required package: pacman
# install and load required packages
# pacman should be first package in parentheses and then list others
pacman::p_load(pacman, tidyverse, knitr, gt, tidyquant)

# verify packages (comment out in finished documents)
p_loaded()
 [1] "tidyquant"            "quantmod"             "TTR"                 
 [4] "PerformanceAnalytics" "xts"                  "zoo"                 
 [7] "lubridate"            "timechange"           "gt"                  
[10] "knitr"                "forcats"              "stringr"             
[13] "dplyr"                "purrr"                "readr"               
[16] "tidyr"                "tibble"               "ggplot2"             
[19] "tidyverse"            "pacman"              
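The scipen option in the setup chunk is a penalty R applies when it decides between fixed and scientific notation for printed numbers. Here is a minimal base R sketch of the effect, using 1e10 as an arbitrary example value:

```r
old <- options(scipen = 0)    # R's default penalty; save the previous setting

default_fmt <- format(1e10)   # "1e+10": shorter than the fixed form, so R uses it
options(scipen = 100)         # heavy penalty against scientific notation
fixed_fmt <- format(1e10)     # "10000000000"

options(old)                  # restore the setting that was in place before
c(default_fmt, fixed_fmt)
```

Because options(old) restores the earlier value, the sketch can be run without changing the rest of the session.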

Final Projects

Presentations are next week, Tue. 12/6 and Thu. 12/8

  • If your group wants to present on Tue. 12/6, let me know by Thursday.

    • Otherwise, random order will be posted on Friday.
  • Attendance is required for all students

  • Dress: Business casual with emphasis on casual

  • Suits, Ties, Dresses, and Jackets are NOT required

  • No sweats, t-shirts, or pajamas

  • You will present better if you dress the part (at least a little)

  • All students should be prepared to answer questions about the work presented.

  • Each student will evaluate other groups and their own group members

  • All project components must be submitted by Tuesday, December 13th at 5:00 PM


Project Memos

  • Project Description - Memos are described on Page 5

  • Template for Memo to Supervisor

    • Supervisor Memo’s Goal:

      • Provide your supervisor with what they need
      • They will want to be knowledgeable about the data and dashboard, but have very limited time.
      • Anticipate questions your supervisor might have and questions a client might ask.
  • Template for memo to Colleague

    • Colleague Memo’s Goal:

      • Colleague should be able to follow memo to update dashboard quickly and seamlessly when new data are available.

      • I (or the TAs) will follow the memo and verify that the instructions are clear, the links work, and the dashboard can be updated from this memo when new data are available.

Questions about Project and Templates?


R Markdown (.Rmd) and Quarto (.qmd) formats

  • RStudio is currently in transition from R Markdown to Quarto

  • Documents can be rendered from R Markdown (.Rmd) or Quarto (.qmd)

  • Presentations can be rendered from R Markdown (.Rmd) or Quarto (.qmd)

    • PowerPoint

      • better for non-technical talks
    • Quarto Presentations (RevealJS) will replace Xaringan

      • These slides were made with Xaringan

      • Quarto's newer options produce better slides with more customization

      • Xaringan and RevealJS are preferred for including code and output


Best way to learn Quarto

  • Examine examples provided in R

  • Examine examples in the Quarto Gallery

    • click on code symbol </> to see the code used to create the documents or presentations

    • Examine and modify code for your document

    • Also use Google, website documentation, and Stack Overflow for questions


Resources - Where to go next

For all aspects of analytics and R and RStudio


Tutorials

  • As SU students, you also have free access to LinkedIn Learning

    • Great tutorials in R, Python, SQL

    • Employers are likely to expect some familiarity with each.

    • R is most versatile and powerful

    • Employers may prefer Python, SQL, or another language/environment because that is what they know.

    • NOTE: Python, SQL, and other languages can all be used through RStudio.

    • Different languages can be combined in one R Markdown document in separate chunks.

  • DataCamp - Not Free, but Excellent.

    • Provides certificates of completion

    • Published this excellent document about data fluency

      • Download this document and save it for when you have to apply for jobs and answer questions about your skillset.
  • Other companies are quickly developing tutorials and training, too (some are good)


Sharing and Collaborating - GitHub vs. RPubs

  • Last week I introduced you to RPubs, which is ideal for sharing a dashboard.

  • Alternatively, you may have already come across GitHub in searching for files or a package.

    • Slides for this course are stored on GitHub

    • Required for files where data, code and text are maintained together as a project, referred to as a repository or repo.

    • Not required for finished dashboard.

  • GitHub is an online code sharing and code development platform.

  • Many R packages start as development code on GitHub and over time they are refined and published.


More about GitHub

  • Once you create a free account, you can learn more about how it works in this tutorial.

  • Collaborative coding is common on GitHub but is a little more complex than working on a shared drive.

    • Developers of games, R packages, other software, etc., have huge code files and need to protect them.

    • There is a system in place (version control) where people can create a project with multiple code versions and edits. Over time, a project develops more and more branches, like a tree, while the trunk (the main version) stays intact.

    • Original code is preserved and changes can be incorporated as they are verified and approved.

Evaluations


Course Evaluation QR Code


Material Added on Wednesday 11/30

Plan for today

  • Address some submitted questions about Quarto and R Markdown

  • Tips for a Better Dashboard

  • Tables with GT

  • Five (more) minutes for evaluations

  • In-class work time


Tips for a Better Dashboard

A good rule for this project (every project):

Edit yourself

  • You may have a lot to say and show, BUT always consider:

    • What can you present WELL in the space and time you have?
  • An important skill to develop is the ability to filter the data to a representative subset, especially for a visualization.

  • Think about what data can be presented well and what the audience can digest.

  • Don’t try to show too much in any one plot or panel.

  • You can always show other parts of the data by

    • creating more panels
    • creating multi-plot grids to show other parts of the data
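One possible way to filter data to a smaller subset for a plot is to keep only the largest groups. This is a sketch, assuming a hypothetical tibble `sales` with made-up values; slice_max() is from dplyr (version 1.0 or later).

```r
library(dplyr)

# Hypothetical data: 8 product categories, 3 made-up sales amounts each
sales <- tibble(
  category = rep(LETTERS[1:8], each = 3),
  amount   = c( 5,  6,  7,   50, 55, 60,    1,  2,  1,   30, 35, 40,
                2,  2,  3,   80, 85, 90,    4,  4,  4,   20, 25, 20)
)

# Total sales per category; keep the 3 categories with the largest totals
top_cats <- sales |>
  group_by(category) |>
  summarise(total = sum(amount), .groups = "drop") |>
  slice_max(total, n = 3)

# Subset of the original rows to use in one plot or panel
sales_top <- sales |>
  filter(category %in% top_cats$category)
```

The remaining categories are not lost; they can be shown in another panel or in a multi-plot grid.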

Tips for a Better Dashboard

At each stage of the process, take a step back and examine each dashboard panel as if you are seeing it with fresh eyes.

  • As you do that, ask yourself these questions:

  • Does this panel achieve a specific goal?

  • Is the point of this panel clear?

  • If not, how can I edit or augment this panel to clarify what it is showing?

  • Are the text and symbols readable from any distance?

  • What can I simplify to make it more clear?

    • Are there aspects I think are important that might not be obvious?

    • If so, how can I modify the panel or dashboard to highlight the key aspects?

  • When in doubt, ask someone like a roommate to look at it and ask if they understand what you hope to convey.


Tables in Dashboards and Documents

  • This course focuses mostly on data visualizations, with a few table summaries.

  • A more complex table may be useful to you and could be included as an additional main panel in your dashboard instead of a side panel.

    • Not required, but it may be helpful.
  • Even if you don’t use gt tables in your dashboard, they will definitely be helpful if you use R and RStudio to manage, analyze, and document data in the future.

  • If you have not already done so, please rerun the setup for this lecture, which now includes the gt and tidyquant packages.

  • The gt website has many examples with detailed step-by-step instructions.

  • Notes about examples on gt website:

    • Examples use the older magrittr pipe %>%, which does the same job as the native pipe |> in these examples (one difference: the placeholder is . for %>% and _ for |>).

    • Examples also use different commands to complete the same data management tasks covered in BUA 455.

    • If you are unsure about a command you come across, ask me or Google it. The R help index is also good.

      • e.g., ?glue or ?glue::glue opens the help page for the glue() function in the glue package
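A quick comparison of the two pipes, assuming the magrittr package is installed (dplyr also re-exports %>%) and R 4.2 or later for the `_` placeholder:

```r
library(magrittr)   # provides %>% (dplyr also re-exports it)

x <- c(4, 9, 16)

# The same two steps, sqrt() then sum(), written with each pipe
with_magrittr <- x %>% sqrt() %>% sum()
with_native   <- x |> sqrt() |> sum()
identical(with_magrittr, with_native)   # TRUE; both equal 9

# The placeholder differs: "." for %>%, "_" for |> (R 4.2+, named argument only)
fit_magrittr <- mtcars %>% lm(mpg ~ wt, data = .)
fit_native   <- mtcars |> lm(mpg ~ wt, data = _)
all.equal(coef(fit_magrittr), coef(fit_native))   # TRUE
```

When translating a gt website example, replacing %>% with |> is usually all that is needed unless the example uses the . placeholder.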

Tables in Dashboards and Documents

First let’s look at what can be customized:


Example: Importing, Summarizing and Displaying Stock Data

getSymbols("^GSPC", from="2022-01-01", to="2022-11-01") 
[1] "^GSPC"
snp22 <- GSPC |>
  fortify.zoo() |> as_tibble(.name_repair = "minimal") |>
  rename("date" = "Index") |>
  mutate(mnth=month(date))|>
  group_by(mnth) |>
  filter(date==max(date)) |>  # filtered data to last day of each month
  ungroup() |>
  select(-c(6,8))    # remove volume and month variables
names(snp22)[2:6] <- c("Open", "High", "Low", "Close", "Adjusted")
head(snp22,3)
# A tibble: 3 × 6
  date        Open  High   Low Close Adjusted
  <date>     <dbl> <dbl> <dbl> <dbl>    <dbl>
1 2022-01-31 4432. 4517. 4414. 4516.    4516.
2 2022-02-28 4354. 4389. 4315. 4374.    4374.
3 2022-03-31 4599. 4603. 4530. 4530.    4530.
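The month-end filter above can be checked on a small hypothetical series that includes every calendar day (stock data only include trading days, so there the filter keeps the last trading day of each month). This sketch swaps month() for lubridate's floor_date(), which keeps the year in the grouping key; month() alone would combine, e.g., January 2022 with January 2023 if the date range spanned more than one year. For the 2022-only range above, both give the same result.

```r
library(dplyr)
library(lubridate)

# Hypothetical daily series: every calendar day from Jan 1 to Mar 31, 2022
daily <- tibble(
  date  = seq(as.Date("2022-01-01"), as.Date("2022-03-31"), by = "day"),
  value = seq_along(date)   # day number 1, 2, ..., 90
)

month_end <- daily |>
  mutate(mnth = floor_date(date, unit = "month")) |>  # first day of month, year included
  group_by(mnth) |>
  filter(date == max(date)) |>   # keep the latest date within each month
  ungroup() |>
  select(-mnth)

month_end   # 3 rows: 2022-01-31, 2022-02-28, 2022-03-31
```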

Example: Importing, Summarizing and Displaying Stock Data

  • The code below does not make a new tibble or data frame

  • Instead, it creates a formatted table object that can be saved as a .png file, like a plot.

    • kable also creates a formatted table object, but gt has more features.
  • As with plots, I am saving this object and then displaying it afterwards.
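Saving a gt table works much like saving a plot. Below is a minimal sketch using gt's gtsave() on a built-in dataset; the file name is hypothetical, and the .png line is commented out because image output needs the webshot2 package and a Chrome-based browser.

```r
library(gt)

# Small example table built from a base R dataset
tbl <- gt(head(mtcars[, c("mpg", "cyl", "wt")], 3))

# Save as a standalone HTML file in a temporary folder
gtsave(tbl, filename = "example_table.html", path = tempdir())
file.exists(file.path(tempdir(), "example_table.html"))  # TRUE

# Saving as an image uses the same call but requires webshot2:
# gtsave(tbl, filename = "example_table.png", path = tempdir())
```

The file extension in filename determines the output format.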

snp_fmt <- snp22 |>
  gt(rowname_col = "date") |>
  tab_header(title = "S&P 500",
             subtitle = "Last Day of Each Month in 2022") |>
  tab_stubhead(label = "Date") |>
  fmt_date(columns=date, date_style=3) |>  # formats date
  fmt_currency(columns=Open:Adjusted, currency = "USD")|>     # formats values as US$
  tab_footnote("Data Source: https://finance.yahoo.com") |>
  tab_footnote("Symbol: ^GSPC")

In a dashboard or document, the code would be hidden, but it is shown here:

snp_fmt         # code to display created table
S&P 500
Last Day of Each Month in 2022
Date                    Open       High        Low      Close   Adjusted
Mon, Jan 31, 2022  $4,431.79  $4,516.89  $4,414.02  $4,515.55  $4,515.55
Mon, Feb 28, 2022  $4,354.17  $4,388.84  $4,315.12  $4,373.94  $4,373.94
Thu, Mar 31, 2022  $4,599.02  $4,603.07  $4,530.41  $4,530.41  $4,530.41
Fri, Apr 29, 2022  $4,253.75  $4,269.68  $4,124.28  $4,131.93  $4,131.93
Tue, May 31, 2022  $4,151.09  $4,168.34  $4,104.88  $4,132.15  $4,132.15
Thu, Jun 30, 2022  $3,785.99  $3,818.99  $3,738.67  $3,785.38  $3,785.38
Fri, Jul 29, 2022  $4,087.33  $4,140.15  $4,079.22  $4,130.29  $4,130.29
Wed, Aug 31, 2022  $4,000.67  $4,015.37  $3,954.53  $3,955.00  $3,955.00
Fri, Sep 30, 2022  $3,633.48  $3,671.44  $3,584.13  $3,585.62  $3,585.62
Mon, Oct 31, 2022  $3,881.85  $3,893.73  $3,863.18  $3,871.98  $3,871.98
Data Source: https://finance.yahoo.com
Symbol: ^GSPC

Editing a Table for a Presentation

  • A table in a dashboard side panel should be small.

  • Previous table would not fit, but we can select variables and shorten the date.

  • Limit the table to key variables that highlight important characteristics of your data.

  • Other variables (e.g., Open and Low) can be shown in a plot.
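A sketch of trimming a table for a side panel, assuming a hypothetical tibble `full` with made-up values in the same layout as snp22. Selecting columns by name instead of by position keeps the code working if the column order changes, and gt's tab_options() with table.font.size shrinks the text; 12 px is an arbitrary choice.

```r
library(dplyr)
library(gt)

# Hypothetical tibble with made-up values, in the same layout as snp22
full <- tibble(
  date     = as.Date(c("2022-01-31", "2022-02-28")),
  Open     = c(100, 110),
  High     = c(105, 115),
  Low      = c( 98, 108),
  Close    = c(104, 112),
  Adjusted = c(104, 112)
)

# Keep only key columns, selected by name rather than position
side_data <- full |>
  select(date, High, Adjusted)

side_tbl <- side_data |>
  gt(rowname_col = "date") |>
  fmt_currency(columns = c(High, Adjusted), currency = "USD") |>
  tab_options(table.font.size = px(12))   # smaller text for a narrow panel

side_tbl
```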

(snp_sm <- snp22 |>              # outer () saves and displays the table
  select(1, 3, 6) |>             # keep date, High, and Adjusted
  gt(rowname_col = "date") |>
  tab_header(title = "S&P 500",
             subtitle = "2022 - Last Day of Each Month") |>
  tab_stubhead(label = "Date") |>
  fmt_date(columns = date, date_style = 6) |>       # e.g., Jan 31, 2022
  fmt_currency(columns = High:Adjusted,
               currency = "USD") |>
  tab_footnote("Source: https://finance.yahoo.com") |>
  tab_footnote("Symbol: ^GSPC"))
S&P 500
2022 - Last Day of Each Month
Date               High   Adjusted
Jan 31, 2022  $4,516.89  $4,515.55
Feb 28, 2022  $4,388.84  $4,373.94
Mar 31, 2022  $4,603.07  $4,530.41
Apr 29, 2022  $4,269.68  $4,131.93
May 31, 2022  $4,168.34  $4,132.15
Jun 30, 2022  $3,818.99  $3,785.38
Jul 29, 2022  $4,140.15  $4,130.29
Aug 31, 2022  $4,015.37  $3,955.00
Sep 30, 2022  $3,671.44  $3,585.62
Oct 31, 2022  $3,893.73  $3,871.98
Source: https://finance.yahoo.com
Symbol: ^GSPC

Project Questions

  • The rest of class time can be used for group projects.

  • Let me know TODAY if you would prefer to present on Tuesday.

  • Group presentation days and times will be randomly assigned and posted by Friday (12/2)


Key Points from Week 13

  • Project Info

    • Two Memos - Information, Templates, & Examples provided
  • Taking advantage of RStudio

    • R Markdown and Quarto
      • Data management and reporting are seamless.
      • Can combine R chunks with Python, SQL, etc.
  • GitHub and RPubs

    • For large projects, GitHub is essential
    • For BUA 455, RPubs is ideal
  • Links for Learning More

  • DataCamp white paper about data skills

You may submit an ‘Engagement Question or Comment’ about Week 13 lectures until Thursday, 12/1, at midnight on Blackboard.