library(tidyverse)
library(tidyquant)
library(lubridate)
library(dplyr)
library(ggplot2)
library(readxl)
library(esquisse)
library(here)
library(janitor)
library(ggthemes)
library(ggrepel)
library(gt)
library(viridis)
library(hrbrthemes)
library(RColorBrewer)
library(timetk)
Session 1
Week 1: RStudio and publishing your first Quarto document
Agenda
- Who these sessions are designed for
- mainly those with no programming background
- A tour of RStudio
- Basic terminology: objects and data types
- Libraries (and how to import them)
- Executing code
- Typing in the script vs the console
- The importance of the “global environment” concept
- Importing data
- Tidyverse vs Base R
Terminology
It’s important to understand some basic terminology, specifically objects and data types in R, so that you understand how those objects interact with R code, how they should be manipulated, and what people are talking about online when you go asking for help.
Objects
- Objects in R are what we call variables in other programming languages. They are instances of a class. A vector, a matrix, a data frame (DF), a list, an array, or a factor are all objects in R. Everything in R is an object.
- Things to keep in mind when naming an object:
- Names should be short and explicit.
- Names should not start with a number, e.g. 5Data. R will throw a syntax error.
- “Data” is different from “data” because R is case sensitive.
- Do not include spaces in the name of your object. R reads the pieces as separate tokens and throws an error.
- Avoid punctuation other than a period (.) or underscore (_). A dash (-) is not allowed because R reads it as a minus sign.
- R’s reserved words cannot be used as object names, e.g. function, if, else, repeat. Also avoid reusing the names of common functions such as c or mean, since that makes code confusing.
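The naming rules above can be checked directly in R. The sketch below leans on base R's make.names(), which rewrites a string into a syntactically valid name, so a name is valid when make.names() leaves it unchanged. The helper is_valid_name and the sample names are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: a name is syntactically valid if make.names()
# returns it unchanged.
is_valid_name <- function(x) make.names(x) == x

candidates <- c("distance_km", "swap.data", "5Data", "my data", "my-data", "if")
valid <- is_valid_name(candidates)
names(valid) <- candidates
print(valid)

# R is case sensitive: these are two different objects.
data_a <- 1
Data_a <- 2
print(data_a == Data_a)

# Each of these is an object with its own class.
print(class(c(1, 2, 3)))              # a vector of doubles
print(class(list(1, "a")))            # a list
print(class(factor(c("x", "y"))))     # a factor
print(class(data.frame(x = 1:2)))     # a data frame
```

Only the first two candidates pass: the others start with a number, contain a space or dash, or are a reserved word.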
# Creating an object
distance_km <- 134
# Print the object
distance_km
[1] 134
dice <- c(1,2,3,4,5,6)
Important concept: to permanently change an object, you need to “save” it as a new object or to the same name as the current object. There are times when we might want to manipulate the data for data vis purposes but do not want to permanently change the underlying DF. In those instances we would not save to a new object.
Example:
str0 <- "2012-03-15"
as.Date(str0, format = "%Y-%m-%d")
[1] "2012-03-15"
class(str0)
[1] "character"
# We converted the string to a date, but because we did not save the result, str0 is still a character. We need to assign the result to a new object or back to the same name.
str0 <- as.Date(str0, format = "%Y-%m-%d")
class(str0)
[1] "Date"
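The same idea applies to data frames. Here is a minimal sketch using a small made-up data frame (loans and its columns are hypothetical, not part of the course data). Calling a function without assigning the result only shows a modified copy, while assigning it makes the change stick.

```r
# Hypothetical data frame for illustration only.
loans <- data.frame(bank = c("A", "B", "C"),
                    amount = c(100e6, 250e6, 75e6))

# Temporary: the result is printed, but loans itself is untouched.
print(transform(loans, amount_millions = amount / 1e6))
print(ncol(loans))

# Permanent: assign the result back to the same name.
loans <- transform(loans, amount_millions = amount / 1e6)
print(ncol(loans))
```

The first ncol() call still reports the original two columns. Only after the assignment does loans carry the new column.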
Data types
Character (chr)
- Used to specify character or string values in a variable. In programming, a string is a set of characters.
- “Apple” - note the quotations around it. If a variable is labeled a character, you need to include these.
apple <- "Apple"
apple
[1] "Apple"
class(apple)
[1] "character"
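One practical consequence of the quotation rule: a number wrapped in quotes is a character, not a number, and arithmetic on it fails until it is converted. A short sketch with made-up values:

```r
amount_chr <- "134"          # quoted, so R stores it as text
print(class(amount_chr))

# Arithmetic on text fails; try() lets the script continue past the error.
result <- try(amount_chr + 1, silent = TRUE)
print(inherits(result, "try-error"))

# Convert to a number first, then arithmetic works.
amount_num <- as.numeric(amount_chr)
print(amount_num + 1)
```

This comes up often with imported data, where a column of numbers sometimes arrives as text.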
Double (dbl); also known as numeric
- stores regular numbers
class(dice)
[1] "numeric"
Integers (int)
- stores numbers that can be written without a decimal point; in code, add an L after the number (e.g. 1L) to create one
- Do not worry about the difference between doubles and integers. R converts freely between them as needed based on the numbers and operations you are passing it.
numbers <- c(1L,2L,3L)
numbers
[1] 1 2 3
class(numbers)
[1] "integer"
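To see the free conversion between integers and doubles in action, here is a small sketch with made-up values. typeof() reports how R stores a value internally.

```r
whole <- 3L     # the L suffix asks for an integer
plain <- 3      # without it, R stores a double

print(typeof(whole))
print(typeof(plain))

# Mixing the two silently promotes the result to double.
print(typeof(whole + plain))

# Division returns a double, even for two integers.
print(typeof(4L / 2L))

# The values still compare as equal.
print(whole == plain)
```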
Logicals
- Any time you type TRUE or FALSE in capital letters (without quotation marks), R will treat the input as logical data
logic <- c(TRUE,FALSE)
logic
[1] TRUE FALSE
class(logic)
[1] "logical"
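Logicals usually show up as the result of a comparison rather than being typed by hand. A short sketch, reusing the dice vector from earlier (redefined here so the block runs on its own):

```r
dice <- c(1, 2, 3, 4, 5, 6)

# A comparison returns one TRUE/FALSE per element.
high_roll <- dice > 4
print(high_roll)
print(class(high_roll))

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches.
print(sum(high_roll))

# A logical vector can pick out the matching elements.
print(dice[high_roll])

# Quoted "TRUE" is text, not a logical.
print(class("TRUE"))
```

This is the mechanism behind filtering rows later on: a condition such as Amount > 300000000 produces a logical vector that decides which rows are kept.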
Date (date), POSIXlt, and POSIXct
- Dates can be tricky to work with and will typically be imported as strings (character). The problem is that when we want to manipulate the dates in the DF for data visualization, R will usually reject the command if the column is still classified as a character.
- There are three basic date and time classes: Date, POSIXlt, and POSIXct. Class Date handles dates without times. POSIXct (calendar time) and POSIXlt (local time) represent dates and times (hours, minutes, seconds).
str1 <- "2012-03-15"
date1 <- as.Date(str1, format = "%Y-%m-%d")
date1
[1] "2012-03-15"
class(date1)
[1] "Date"
date_string <- "2019-01-14 14:17:30"
date_string <- as.POSIXct(date_string)
class(date_string)
[1] "POSIXct" "POSIXt"
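Here is a sketch of why the conversion matters, using a few made-up dates written as month/day/year text (trade_chr is a hypothetical vector, not the course data). The format argument tells as.Date() how the text is laid out.

```r
# Hypothetical dates in month/day/year text form.
trade_chr <- c("12/21/2022", "1/5/2023", "6/29/2022")

# Arithmetic on text dates fails.
attempt <- try(trade_chr[2] - trade_chr[1], silent = TRUE)
print(inherits(attempt, "try-error"))

# %m = month, %d = day, %Y = four-digit year.
trade_date <- as.Date(trade_chr, format = "%m/%d/%Y")
print(class(trade_date))

# Now date operations work.
print(max(trade_date))                              # latest date
print(as.numeric(trade_date[2] - trade_date[1]))    # days between the first two
print(sort(trade_date))                             # chronological order

# POSIXct keeps the time of day as well. tz is set explicitly here so the
# result does not depend on the computer's time zone.
stamp <- as.POSIXct("2019-01-14 14:17:30", tz = "UTC")
print(format(stamp, "%H:%M"))
```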
Libraries and packages
# Library stack -> this is the set of libraries I typically load. I copy and paste it from previous work rather than retyping it each time. You can dump all of this into your file, but it is more useful to load libraries individually as you need them, so you learn which features come from which library.
# You will hear the professor talk about different packages like tidyr and tidyquant. Think of a package as a book you buy and put on your bookshelf. This is what happens when you install the package. When you use the library() function, you're telling R to go to the bookshelf and open that book, for example the tidyverse package, for use.
library(tidyverse)
library(tidyquant)
library(lubridate)
library(dplyr)
library(ggplot2)
library(readxl)
library(esquisse)
library(here)
library(janitor)
library(ggthemes)
library(ggrepel)
library(gt)
library(viridis)
library(hrbrthemes)
library(RColorBrewer)
library(timetk)
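The difference between installing and loading described above can be sketched as follows. This is a minimal sketch that uses lubridate only as an example name. requireNamespace() is base R and reports whether a package is on your bookshelf without opening it.

```r
pkg <- "lubridate"   # example package; swap in any package name

# Is the package installed (on the bookshelf)?
installed <- requireNamespace(pkg, quietly = TRUE)
print(installed)

if (!installed) {
  # Installing is done once per computer. This is left as a message so the
  # sketch never downloads anything on its own.
  message("Run install.packages(\"", pkg, "\") and then try again.")
} else {
  # Loading is done once per R session (opening the book).
  library(pkg, character.only = TRUE)
  print(paste0("package:", pkg) %in% search())
}
```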
# Question: I tried to load a library and R said it doesn't recognize it. Why?
# Answer: This is probably because you either A) spelled it wrong or B) do not have the package installed. Let's say this happened with lubridate. Fix by installing the package as follows:
#install.packages("lubridate") # note that we need to put this in quotations as the install.packages function expects us to pass it a string rather than a number or object (more on this in a bit)
Importing Data
# Let's now import a dataset for us to work with:
swaps <- read.csv("001_data//Fed_swaps.csv")
# Observe how this populates in the "Environment" tab in the upper right corner. We can click on it to see what the imported data looks like in a visually appealing format. This is a fairly clean data set. We could start working on this with minimal cleaning. We're not going to go into that today, but in subsequent sessions we'll work with some messier data to show how to get it into a "tidy" format.
# If I call the object by typing its name in, we can also see what it looks like as output in the Quarto document, as shown below.
# swaps
# If we use glimpse(), it displays in the console and shows the columns as rows. This is useful for allowing us to see all the column headers at once. Notice how when we call the object as just "swaps", it displays 8 of the 11 columns up front but we have to scroll right. There are a lot of commands in R that allow us to look at the dataset from various angles to understand what data we're working with and how to best clean it. We'll dive into exploratory analysis more in later sessions.
glimpse(swaps)
Rows: 529
Columns: 11
$ Operation.Type <chr> "U.S. Dollar Liquidity Swap", "U.S. Dollar Liquidity S…
$ Counterparty <chr> "European Central Bank", "European Central Bank", "Ban…
$ Currency <chr> "USD", "USD", "USD", "USD", "USD", "USD", "USD", "USD"…
$ Trade.Date <chr> "12/21/2022", "12/14/2022", "12/13/2022", "12/7/2022",…
$ Settlement.Date <chr> "12/22/2022", "12/15/2022", "12/15/2022", "12/8/2022",…
$ Maturity.Date <chr> "1/5/2023", "12/22/2022", "12/22/2022", "12/15/2022", …
$ Term..days. <int> 14, 7, 7, 7, 7, 7, 6, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7,…
$ Amount <dbl> 412200000, 209000000, 1000000, 205500000, 1000000, 199…
$ Interest.Rate <dbl> 4.59, 4.59, 4.53, 4.09, 4.09, 4.08, 4.08, 4.08, 4.08, …
$ isSmallValue <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", ""…
$ Last.Updated <chr> "12/22/2022 16:00", "12/15/2022 16:00", "12/15/2022 16…
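glimpse() is one of several ways to inspect a data frame. The sketch below runs a few base R alternatives on a tiny made-up data frame (swaps_demo and its values are hypothetical stand-ins, since the real file is not bundled here).

```r
# Hypothetical stand-in for the imported data.
swaps_demo <- data.frame(
  Counterparty = c("Bank A", "Bank B", "Bank C"),
  Amount = c(412200000, 1000000, 205500000),
  Term_days = c(14L, 7L, 7L)
)

print(dim(swaps_demo))              # number of rows and columns
print(names(swaps_demo))            # column headers
str(swaps_demo)                     # column types, similar in spirit to glimpse()
print(head(swaps_demo, 2))          # first two rows
print(summary(swaps_demo$Amount))   # quick numeric summary of one column
```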
# Problem: I call a function that I've seen you use before but R tells me it can't find it for use.
# Answer: If you're sure you're spelling/using it correctly, it's likely you haven't loaded the correct library. Do a google search for the function and what library it maps to and see if you have it listed in the code. If it is listed, make sure you've actually run the code so R knows it's active.
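Two base R tools can help with this problem. In the sketch below, sd() and head() are used only because they ship with every R installation. environmentName(environment(f)) reports which package an available function comes from, and the package::function form calls a function from an installed package without loading the whole library.

```r
# Which package does an available function come from?
print(environmentName(environment(sd)))

# exists() reports whether R can currently find something by that name.
print(exists("sd"))
print(exists("a_function_that_is_not_loaded"))   # hypothetical name

# package::function works even if library() was never called for that package.
print(utils::head(letters, 3))

# To search the help pages of installed packages for a function you cannot
# find, run this interactively in the console:
# ??glimpse
```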
# importing from excel with multiple sheets.
dw19q1 <- read_excel("001_data//dw_data_2019-2020.xlsx", sheet = "19Q1", skip = 3)
dw19q1
# A tibble: 553 × 28
`Loan date` `Maturity date` Term `Repayment date` Lending F…¹
<dttm> <dttm> <dbl> <dttm> <chr>
1 2019-01-02 00:00:00 2019-01-03 00:00:00 1 2019-01-03 00:00:00 New York (…
2 2019-01-02 00:00:00 2019-01-03 00:00:00 1 2019-01-03 00:00:00 Chicago (7)
3 2019-01-02 00:00:00 2019-01-03 00:00:00 1 2019-01-03 00:00:00 Chicago (7)
4 2019-01-02 00:00:00 2019-01-03 00:00:00 1 2019-01-03 00:00:00 Chicago (7)
5 2019-01-02 00:00:00 2019-02-01 00:00:00 30 2019-01-04 00:00:00 Minneapoli…
6 2019-01-02 00:00:00 2019-01-03 00:00:00 1 2019-01-03 00:00:00 Kansas Cit…
7 2019-01-02 00:00:00 2019-01-03 00:00:00 1 2019-01-03 00:00:00 San Franci…
8 2019-01-03 00:00:00 2019-01-04 00:00:00 1 2019-01-04 00:00:00 Boston (1)
9 2019-01-03 00:00:00 2019-01-04 00:00:00 1 2019-01-04 00:00:00 Chicago (7)
10 2019-01-03 00:00:00 2019-01-04 00:00:00 1 2019-01-04 00:00:00 Chicago (7)
# … with 543 more rows, 23 more variables: Borrower <chr>,
# `Borrower city` <chr>, `Borrower state` <chr>, `Borrower ABA number` <chr>,
# `Type of credit` <chr>, `Interest rate` <dbl>, `Loan amount` <dbl>,
# `Other outstanding loans` <dbl>, `Total outstanding loans` <dbl>,
# `Total collateral1` <dbl>, `Commercial loans1` <dbl>,
# `Residential mortgages1` <dbl>, `Commercial real estate loans1` <dbl>,
# `Consumer loans1` <dbl>, `U.S. Treasury/Agency securities1` <dbl>, …
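When a workbook has several sheets, it helps to list them before importing. Here is a sketch, assuming the readxl package is installed. It uses datasets.xlsx, an example workbook that ships with readxl, rather than the course file.

```r
library(readxl)

# Path to an example workbook bundled with readxl.
path <- readxl_example("datasets.xlsx")

# List the sheet names so you know what to pass to the sheet argument.
sheets <- excel_sheets(path)
print(sheets)

# Read one sheet by name.
first_sheet <- read_excel(path, sheet = sheets[1])
print(dim(first_sheet))

# Read every sheet into a named list of tibbles.
all_sheets <- lapply(sheets, function(s) read_excel(path, sheet = s))
names(all_sheets) <- sheets
print(names(all_sheets))
```

With the course workbook, the same pattern would let you pull each quarterly sheet into its own element of the list. Arguments such as skip would need to be passed along in the read_excel() call.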
Tidyverse and other packages vs Base R
Tidyverse is essentially a package of packages (loading it loads ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats) that provides a series of tools for working with and manipulating data with better flow and visually appealing code. It provides a better interface that’s similar to Python and looks much cleaner. It is important to understand the difference between tidyverse and base R so you do not get confused when trying to figure out how to do something online. Generally, for every command in the tidyverse, there’s a way to also do it in base R, though it might not be as clean looking or intuitive.
# dplyr (tidyverse) method
swaps %>%
  select(c(Trade.Date,Amount)) %>%
  filter(Amount > 300000000)
Trade.Date Amount
1 12/21/2022 412200000
2 6/29/2022 346500000
3 3/30/2022 365500000
4 3/16/2022 308000000
5 1/5/2022 313500000
6 12/22/2021 939500000
7 12/16/2020 3134200000
8 12/16/2020 514000000
9 10/6/2020 330000000
10 9/29/2020 800000000
11 9/23/2020 670000000
12 9/15/2020 742000000
13 9/9/2020 585000000
14 9/9/2020 640000000
15 9/1/2020 355000000
16 8/25/2020 673000000
17 8/18/2020 405000000
18 8/18/2020 1523000000
19 8/11/2020 2752000000
20 8/5/2020 418000000
21 8/4/2020 3647000000
22 8/4/2020 370000000
23 7/28/2020 4023000000
24 7/28/2020 770000000
25 7/22/2020 457000000
26 7/20/2020 500000000
27 7/14/2020 11860000000
28 7/14/2020 6970000000
29 7/13/2020 1370000000
30 7/8/2020 343500000
31 7/7/2020 19607000000
32 7/7/2020 7910000000
33 7/1/2020 345000000
34 6/30/2020 21182000000
35 6/30/2020 6372000000
36 6/24/2020 1350000000
37 6/23/2020 21303000000
38 6/23/2020 16016000000
39 6/17/2020 3194800000
40 6/16/2020 23114000000
41 6/16/2020 17044000000
42 6/10/2020 480000000
43 6/9/2020 7933000000
44 6/9/2020 15890000000
45 6/3/2020 495000000
46 6/2/2020 2747000000
47 6/2/2020 1355000000
48 5/27/2020 1501000000
49 5/27/2020 1510000000
50 5/26/2020 5250000000
51 5/20/2020 442000000
52 5/20/2020 600000000
53 5/19/2020 9292000000
54 5/19/2020 2373000000
55 5/13/2020 791600000
56 5/13/2020 3245000000
57 5/12/2020 9489000000
58 5/12/2020 2890000000
59 5/7/2020 400000000
60 5/6/2020 1795000000
61 5/6/2020 2291500000
62 5/4/2020 1721300000
63 4/30/2020 2042000000
64 4/30/2020 500000000
65 4/29/2020 3005300000
66 4/29/2020 1610000000
67 4/28/2020 6670000000
68 4/28/2020 1016000000
69 4/27/2020 541000000
70 4/27/2020 1868000000
71 4/24/2020 310000000
72 4/23/2020 722000000
73 4/23/2020 920000000
74 4/22/2020 971000000
75 4/22/2020 3814000000
76 4/22/2020 2003000000
77 4/21/2020 19903000000
78 4/21/2020 1290000000
79 4/20/2020 1020000000
80 4/20/2020 1740000000
81 4/17/2020 640000000
82 4/16/2020 664000000
83 4/16/2020 440000000
84 4/15/2020 1260000000
85 4/15/2020 4805500000
86 4/15/2020 2260200000
87 4/14/2020 2210000000
88 4/14/2020 26958000000
89 4/14/2020 485000000
90 4/13/2020 931000000
91 4/10/2020 600000000
92 4/9/2020 998000000
93 4/9/2020 463000000
94 4/8/2020 1080000000
95 4/8/2020 11230700000
96 4/8/2020 5922300000
97 4/7/2020 29442000000
98 4/7/2020 9360000000
99 4/7/2020 943000000
100 4/6/2020 12880000000
101 4/6/2020 2270000000
102 4/3/2020 5750000000
103 4/2/2020 1135000000
104 4/2/2020 925000000
105 4/1/2020 950000000
106 4/1/2020 6850200000
107 4/1/2020 16468000000
108 3/31/2020 9285000000
109 3/31/2020 29724000000
110 3/31/2020 2950000000
111 3/30/2020 24100000000
112 3/30/2020 6650000000
113 3/27/2020 13100000000
114 3/27/2020 2165000000
115 3/26/2020 2265000000
116 3/26/2020 3205000000
117 3/25/2020 4950000000
118 3/25/2020 27810000000
119 3/25/2020 17267000000
120 3/24/2020 73805000000
121 3/24/2020 15465000000
122 3/24/2020 4115000000
123 3/23/2020 34850000000
124 3/18/2020 36265000000
125 3/18/2020 75820000000
126 3/17/2020 2053000000
127 3/17/2020 30272000000
128 12/18/2019 3728400000
129 9/25/2019 972700000
130 8/28/2019 871500000
131 3/27/2019 1365000000
132 12/19/2018 4197000000
133 6/27/2018 1090200000
134 3/28/2018 5011000000
135 1/24/2018 672000000
# is the same as
# Base method (this version also saves the result as swaps_filtered instead of printing it)
swaps_filtered <- subset(swaps[,c("Trade.Date", "Amount")], Amount > 300000000)
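To check that the two approaches give the same answer, the sketch below runs both on a small made-up data frame (swaps_demo is hypothetical) and compares the results. It assumes dplyr is installed. Row names are reset before comparing because the two methods may number the surviving rows differently.

```r
library(dplyr)

# Hypothetical sample data for illustration only.
swaps_demo <- data.frame(
  Trade.Date = c("12/21/2022", "12/14/2022", "6/29/2022", "12/13/2022"),
  Amount = c(412200000, 209000000, 346500000, 1000000),
  Currency = "USD"
)

# dplyr: pick columns, then keep rows above the threshold.
tidy_result <- swaps_demo %>%
  select(Trade.Date, Amount) %>%
  filter(Amount > 300000000)

# Base R: the same two steps in one subset() call.
base_result <- subset(swaps_demo[, c("Trade.Date", "Amount")], Amount > 300000000)

# Reset row names, then compare.
rownames(tidy_result) <- NULL
rownames(base_result) <- NULL
print(tidy_result)
print(isTRUE(all.equal(tidy_result, base_result)))
```

The dplyr version reads top to bottom as a sequence of steps, while the base version nests the column selection inside subset(). Which one to use is largely a matter of readability.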