# set default options for all R chunks
knitr::opts_chunk$set(echo = TRUE, highlight = TRUE)

# suppress scientific notation
options(scipen = 100, getSymbols.warning4.0 = FALSE)

# install helper package (pacman), if needed
if (!require("pacman")) install.packages("pacman", repos = "http://lib.stat.cmu.edu/R/CRAN/")
Loading required package: pacman
# install and load required packages
# pacman should be first package in parentheses and then list others
pacman::p_load(pacman, tidyverse, knitr, gt, tidyquant)

# verify packages (comment out in finished documents)
p_loaded()
A colleague should be able to follow the memo to update the dashboard quickly and seamlessly when new data are available.
I (or the TAs) will follow the memo and verify that the instructions are clear, the links are functional, and the dashboard can be updated from the memo when new data are available.
Questions about Project and Templates?
R Markdown (.Rmd) and Quarto (.qmd) formats
RStudio is currently in transition
Documents can be rendered from R Markdown (.Rmd) or Quarto (.qmd)
HTML, Word documents, PDF
R Markdown will be available for the foreseeable future
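Rendering can also be done from code instead of the Knit/Render button. This is a minimal sketch, assuming the rmarkdown package is installed. The file name demo_memo.Rmd and its contents are made up for illustration.

```r
library(rmarkdown)

# Hypothetical file, written to a temporary folder for this sketch
rmd_file <- file.path(tempdir(), "demo_memo.Rmd")

writeLines(c(
  "---",
  "title: \"Demo Memo\"",
  "output: html_document",
  "---",
  "",
  "Two plus two is `r 2 + 2`."
), rmd_file)

# Rendering needs pandoc (bundled with RStudio); skip the step if it is missing
if (pandoc_available()) {
  out_file <- render(rmd_file, quiet = TRUE)
  print(out_file)  # path of the rendered HTML file
}
```

For a .qmd file, quarto::quarto_render() plays the same role, assuming the quarto package and the Quarto command line tool are installed.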
Download this document and save it for when you have to apply for jobs and answer questions about your skillset.
Other companies are quickly developing tutorial training too (some are good)
Sharing and Collaborating - GitHub vs. RPubs
Last week I introduced you to RPubs, which is ideal for sharing a dashboard.
Alternatively, you may have already come across GitHub in searching for files or a package.
Slides for this course are stored on GitHub
Required for files where data, code and text are maintained together as a project, referred to as a repository or repo.
Not required for finished dashboard.
GitHub is an online code sharing and code development platform.
Many R packages start as development code on GitHub and over time they are refined and published.
More about GitHub
Once you create free account, you can learn more about how it works in this tutorial.
Collaborative coding is common on GitHub but is a little more complex than working on a shared drive.
Developers of games, R packages, other software, etc., have huge code files and need to protect them.
There is a system in place (version control) where people can create a project with multiple code versions and edits. Over time, a project develops more and more branches, like a tree, but the trunk stays intact.
The original code is preserved, and changes can be incorporated as they are verified and approved.
Some GitHub links
Some tutorial links for collaborative coding on GitHub:
Address some submitted questions about Quarto and R Markdown
Tips for a Better Dashboard
Tables with GT
Five (more) minutes for evaluations
In-class work time
Tips for a Better Dashboard
A good rule for this project (every project):
Edit yourself
You may have a lot to say and show, BUT always consider:
What can you present WELL in the space and time you have?
An important skill to develop is the ability to filter the data to a representative subset, especially for a visualization.
Think about what data can be presented well and what the audience can digest.
Don’t try to show too much in any one plot or panel.
You can always show other parts of the data by:
creating more panels
creating multi-plot grids to show other parts of the data
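One possible way to narrow data down before plotting is sketched below, assuming the dplyr package is installed. It uses the built-in USArrests data, and the choice of ten states ranked by UrbanPop is only an illustration, not a rule for the project.

```r
library(dplyr)

# Built-in data: 50 states is too many bars for one small panel
arrests <- USArrests |>
  mutate(state = rownames(USArrests)) |>
  as_tibble()

# Keep only the ten states with the highest urban population share
top_urban <- arrests |>
  slice_max(UrbanPop, n = 10, with_ties = FALSE) |>
  select(state, UrbanPop, Murder)

top_urban
```

The remaining states could go in a second panel or a multi-plot grid rather than being squeezed into the same plot.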
Tips for a Better Dashboard
At each stage of the process, take a step back and examine each dashboard panel as if you are seeing it with fresh eyes.
As you do that, ask yourself these questions:
Does this panel achieve a specific goal?
Is the point of this panel clear?
If not, how can I edit or augment this panel to clarify what it is showing?
Are the text and symbols readable from any distance?
What can I simplify to make it more clear?
Are there aspects I think are important that might not be obvious?
If so, how can I modify the panel or dashboard to highlight the key aspects?
When in doubt, ask someone like a roommate to look at it and tell you whether they understand what you hope to convey.
Tables in Dashboards and Documents
This course focuses mostly on data visualizations, with a few table summaries.
Creating a more complex table may be useful to you, and it could be included as an additional main panel in your dashboard instead of a side panel.
Not required, but it may be helpful.
Even if you don’t use gt tables in your dashboard, they will definitely be helpful if you use R and RStudio to manage, analyze, and document data in the future.
If you have not already done so, please rerun the setup for this lecture, which now includes the gt package and the package.
The gt website has many examples with detailed step by step instructions.
Notes about examples on gt website:
Examples use the older pipe notation %>%, which works the same way as |> for the commands shown.
Examples also use different commands to complete the same data management tasks covered in BUA 455.
If you are unsure about a command you come across, ask me or google it. The R help index is also good.
e.g. ?glue or ?glue::glue opens the help page for the glue command in the glue package
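To illustrate the note about pipes, here is a small sketch that writes the same summary both ways. It assumes the dplyr package is installed (dplyr makes %>% available) and uses the built-in mtcars data, which is not part of the gt examples.

```r
library(dplyr)  # also makes the %>% pipe available

# Summary written with the older pipe
old_pipe <- mtcars %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg))

# Same summary written with the native pipe (R 4.1 or later)
new_pipe <- mtcars |>
  group_by(cyl) |>
  summarize(mean_mpg = mean(mpg))

# Compare the two results
all.equal(old_pipe, new_pipe)
```

When copying code from the gt website, swapping %>% for |> should be safe in simple pipelines like this one.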
Tables in Dashboards and Documents
First let’s look at what can be customized:
Example: Importing, Summarizing and Displaying Stock Data
# assumes GSPC (S&P 500 price data) was already imported, e.g. with getSymbols()
snp22 <- GSPC |>
  fortify.zoo() |>
  as_tibble(.name_repair = "minimal") |>
  rename("date" = "Index") |>
  mutate(mnth = month(date)) |>
  group_by(mnth) |>
  filter(date == max(date)) |>   # filtered data to last day of each month
  ungroup() |>
  select(-c(6, 8))               # remove volume and month variables

names(snp22)[2:6] <- c("Open", "High", "Low", "Close", "Adjusted")
head(snp22, 3)
# A tibble: 3 × 6
date Open High Low Close Adjusted
<date> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2022-01-31 4432. 4517. 4414. 4516. 4516.
2 2022-02-28 4354. 4389. 4315. 4374. 4374.
3 2022-03-31 4599. 4603. 4530. 4530. 4530.
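The month-end filter can be tried without downloading anything. This is a minimal sketch, assuming the dplyr and lubridate packages are installed. The toy tibble and its close values are made up for illustration and are not real market data.

```r
library(dplyr)
library(lubridate)

# Made-up daily prices for weekdays only (illustration, not real market data)
days <- seq(as.Date("2022-01-03"), as.Date("2022-03-31"), by = "day")
days <- days[!wday(days) %in% c(1, 7)]   # default coding: 1 = Sunday, 7 = Saturday
toy <- tibble(date = days, close = 100 + seq_along(days))

# Same pattern as above: keep the last available date within each month
month_end <- toy |>
  mutate(mnth = month(date)) |>
  group_by(mnth) |>
  filter(date == max(date)) |>
  ungroup() |>
  select(-mnth)

month_end
```

Because max(date) is taken within each month group, the result is the last trading day in the data, which is not always the last calendar day.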
Example: Importing, Summarizing and Displaying Stock Data
The code below does not make a new tibble or data frame
Instead, it creates a formatted table object that can be output as a .png file like a plot.
kable also creates a formatted table object, but gt has more features.
As with plots, I am saving this object and then displaying it afterwards.
snp_fmt <- snp22 |>
  gt(rowname_col = "date") |>
  tab_header(title = "S&P 500",
             subtitle = "Last Day of Each Month in 2022") |>
  tab_stubhead(label = "Date") |>
  fmt_date(columns = date, date_style = 3) |>                  # formats date
  fmt_currency(columns = Open:Adjusted, currency = "USD") |>   # formats values as US$
  tab_footnote("Data Source: https://finance.yahoo.com") |>
  tab_footnote("Symbol: ^GSPC")
In a dashboard or document, the code would be hidden, but it is shown here:
snp_fmt # code to display created table
S&P 500
Last Day of Each Month in 2022

Date                Open        High        Low         Close       Adjusted
Mon, Jan 31, 2022   $4,431.79   $4,516.89   $4,414.02   $4,515.55   $4,515.55
Mon, Feb 28, 2022   $4,354.17   $4,388.84   $4,315.12   $4,373.94   $4,373.94
Thu, Mar 31, 2022   $4,599.02   $4,603.07   $4,530.41   $4,530.41   $4,530.41
Fri, Apr 29, 2022   $4,253.75   $4,269.68   $4,124.28   $4,131.93   $4,131.93
Tue, May 31, 2022   $4,151.09   $4,168.34   $4,104.88   $4,132.15   $4,132.15
Thu, Jun 30, 2022   $3,785.99   $3,818.99   $3,738.67   $3,785.38   $3,785.38
Fri, Jul 29, 2022   $4,087.33   $4,140.15   $4,079.22   $4,130.29   $4,130.29
Wed, Aug 31, 2022   $4,000.67   $4,015.37   $3,954.53   $3,955.00   $3,955.00
Fri, Sep 30, 2022   $3,633.48   $3,671.44   $3,584.13   $3,585.62   $3,585.62
Mon, Oct 31, 2022   $3,881.85   $3,893.73   $3,863.18   $3,871.98   $3,871.98

Data Source: https://finance.yahoo.com
Symbol: ^GSPC
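A formatted gt table can also be written to a file with gtsave(). This is a minimal sketch, assuming the gt package is installed. The toy data and the file name toy_table.html are made up for illustration.

```r
library(gt)

# Made-up values for illustration only (not real market data)
toy <- data.frame(
  date  = as.Date(c("2022-01-31", "2022-02-28")),
  Close = c(101.5, 99.25)
)

toy_fmt <- toy |>
  gt(rowname_col = "date") |>
  tab_header(title = "Toy Prices") |>
  fmt_currency(columns = Close, currency = "USD")

# Saving as .html needs only gt; saving as .png also needs the webshot2 package
out_file <- file.path(tempdir(), "toy_table.html")
gtsave(toy_fmt, filename = out_file)
```

Changing the file extension to .png should produce an image that can be placed in a dashboard like a plot, provided webshot2 and a Chrome-based browser are available.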
Editing a Table for a Presentation
A table in a dashboard side panel should be small.
The previous table would not fit, but we can select variables and shorten the date.
Limit the table to key variables that highlight important characteristics of your data.
Other variables, e.g. Open and Low, can be shown in a plot.
(snp_sm <- snp22 |>
    select(1, 3, 6) |>
    gt(rowname_col = "date") |>
    tab_header(title = "S&P 500",
               subtitle = "2022 - Last Day of Each Month") |>
    tab_stubhead(label = "Date") |>
    fmt_date(columns = date, date_style = 6) |>
    fmt_currency(columns = High:Adjusted, currency = "USD") |>
    tab_footnote("Source: https://finance.yahoo.com") |>
    tab_footnote("Symbol: ^GSPC"))
S&P 500
2022 - Last Day of Each Month

Date           High        Adjusted
Jan 31, 2022   $4,516.89   $4,515.55
Feb 28, 2022   $4,388.84   $4,373.94
Mar 31, 2022   $4,603.07   $4,530.41
Apr 29, 2022   $4,269.68   $4,131.93
May 31, 2022   $4,168.34   $4,132.15
Jun 30, 2022   $3,818.99   $3,785.38
Jul 29, 2022   $4,140.15   $4,130.29
Aug 31, 2022   $4,015.37   $3,955.00
Sep 30, 2022   $3,671.44   $3,585.62
Oct 31, 2022   $3,893.73   $3,871.98

Source: https://finance.yahoo.com
Symbol: ^GSPC
Project Questions
The rest of class time can be used for group projects.
Let me know TODAY if you would prefer to present on Tuesday.
Group presentation days and times will be randomly assigned and posted by Friday (12/1).
Key Points from Week 13
Project Info
Two Memos - Information, Templates, & Examples provided
Taking advantage of RStudio
R Markdown and Quarto
Data management and reporting are seamless.
Can combine R chunks with Python, SQL, etc.
GitHub and RPubs
For large projects, GitHub is essential
For BUA 455, RPubs is ideal
Links for Learning More
Data Camp White Paper about Skillset
You may submit an ‘Engagement Question or Comment’ about Week 13 lectures until Thursday, 12/1, at midnight on Blackboard.