Session 1

library(tidyverse) 
library(tidyquant)
library(lubridate)
library(dplyr)
library(ggplot2)
library(readxl)
library(esquisse)
library(here)
library(janitor)
library(ggthemes)
library(ggrepel)
library(gt)
library(viridis)
library(hrbrthemes)
library(RColorBrewer)
library(timetk)

Week 1: RStudio and publishing your first Quarto document

Agenda

  • Who these sessions are designed for
    • mainly those with no programming background
  • A tour of RStudio
    • Basic terminology: objects and data types
    • Libraries (and how to import them)
    • Executing code
    • Typing in the script vs the console
    • The importance of the “global environment” concept
  • Importing data
  • Tidyverse vs Base R

Terminology

It’s important to understand some basic terminology, specifically objects and data types in R, so that you understand how those objects interact with R code, how they should be manipulated, and what people are talking about online when you go asking for help.

Objects

  • Objects in R are what we call variables in other programming languages. They are instances of a class. A vector, a matrix, a data frame (DF), a list, an array, or a factor are all objects in R. Everything in R is an object.
  • Things to keep in mind when naming an object:
    • Names should be short and explicit
    • Names cannot start with a number, e.g. 5Data. R will throw an error
    • “Data” is different from “data” because R is case sensitive
    • Do not include spaces in the name. R reads the pieces as separate text
    • Avoid punctuation other than a period (.) or underscore (_). A dash (-) is not allowed because R reads it as a minus sign
    • Reserved words cannot be used as object names, e.g. function, if, else, repeat. Also avoid reusing the names of common functions such as c or mean
# Creating an object

distance_km <- 134

# Print the object 

distance_km
[1] 134
dice <- c(1,2,3,4,5,6)
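
As a quick check of the naming rules above, the sketch below uses base R's make.names() to show how R would repair names that break them. The object names are made up for illustration.

```r
# R is case sensitive: these are two separate objects
data_km <- 100
Data_km <- 200
identical(data_km, Data_km)   # FALSE

# make.names() converts invalid names into valid ones,
# which shows what R objects to in each case
bad_names <- c("5Data", "my data", "my-data", "if")
fixed_names <- make.names(bad_names)
fixed_names
# [1] "X5Data"  "my.data" "my.data" "if."
```

The leading number gets an X prefix, the space and the dash become periods, and the reserved word if gets a trailing period.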

Important concept: to permanently change an object, you need to “save” the result, either to a new object or back to the same name as the current object. Sometimes we want to manipulate the data for data vis purposes without permanently changing the underlying DF. In those instances we would not save the result.

Example:

str0 <- "2012-03-15"

as.Date(str0, format = "%Y-%m-%d")
[1] "2012-03-15"
class(str0)
[1] "character"
# We tried to change the data type, but str0 still reads as character because we never saved the result. We need to save it to a new object or back to the same name.

str0 <- as.Date(str0, format = "%Y-%m-%d")

class(str0)
[1] "Date"
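
The same idea applies to any function, not just as.Date(). Here is a minimal sketch with a made-up vector called prices: calling sort() on its own prints a sorted copy, and the object only changes once we assign the result back.

```r
prices <- c(3, 1, 2)

# sort() returns a sorted copy; prices itself is untouched
sort(prices)
# [1] 1 2 3
prices
# [1] 3 1 2

# Assigning the result back to the same name makes the change stick
prices <- sort(prices)
prices
# [1] 1 2 3
```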

Data types

  • Character (chr)

    • Used to store text (string) values. In programming, a string is a sequence of characters.
    • “Apple” - note the quotation marks around it. Character values must always be wrapped in quotation marks; without them, R looks for an object named Apple.
    apple <- "Apple"

    apple
[1] "Apple"
    class(apple)
[1] "character"
  • Double (dbl); also known as numeric

    • stores regular numbers
    class(dice)
[1] "numeric"
  • Integers (int)

    • stores numbers that can be written without a decimal point
    • Do not worry about the difference between these two. R converts freely between them as needed based on the numbers and operations you are passing it.
    numbers <- c(1L,2L,3L)

    numbers
[1] 1 2 3
    class(numbers)
[1] "integer"
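
To see the conversion between integers and doubles in action, here is a small sketch using the same numbers vector. The L suffix is what tells R to store a value as an integer.

```r
numbers <- c(1L, 2L, 3L)

class(numbers)          # "integer"
class(numbers * 2L)     # "integer": integer times integer stays integer
class(numbers + 0.5)    # "numeric": mixing in a double converts the result
class(numbers / 2L)     # "numeric": division always returns a double

# A number typed without the L suffix is stored as a double
class(5)                # "numeric"
class(5L)               # "integer"
```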
  • Logicals

    • Any time you type TRUE or FALSE in capital letters (without quotation marks), R will treat the input as logical data
logic <- c(TRUE,FALSE)

logic
[1]  TRUE FALSE
class(logic)
[1] "logical"
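
In practice you rarely type TRUE and FALSE yourself; they usually come out of a comparison. A short sketch, using a made-up amounts vector:

```r
amounts <- c(412, 209, 1, 205)   # made-up values for illustration

# A comparison returns one logical value per element
is_large <- amounts > 300
is_large
# [1]  TRUE FALSE FALSE FALSE
class(is_large)     # "logical"

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches
sum(is_large)       # 1

# A logical vector can also pick out the matching elements
amounts[is_large]   # 412
```

This is the same mechanism that filtering a DF relies on: the condition produces a logical vector, and only the TRUE rows are kept.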
  • Date (date), POSIXlt, and POSIXct
    • dates can be tricky to work with and will typically be imported as strings (character). The problem is that when we want to manipulate the dates in the DF for data vis, functions that expect dates will usually fail or give wrong results on a character column.
    • there are three basic date and time classes: Date, POSIXlt, and POSIXct. Class Date handles dates without times. POSIXct (calendar time) and POSIXlt (local time) represent dates and times (hours, minutes, seconds).
str1 <- "2012-03-15"

date1 <- as.Date(str1, format = "%Y-%m-%d")

date1
[1] "2012-03-15"
class(date1)
[1] "Date"
date_string <- "2019-01-14 14:17:30"

date_string <- as.POSIXct(date_string)

class(date_string)
[1] "POSIXct" "POSIXt" 
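
Here is a minimal sketch of why the conversion matters, assuming dates that arrive as month/day/year text (the two strings below are illustrative). Arithmetic fails on the character version and works once it is a Date. Note that the format argument describes how the text is laid out, so it has to match the input.

```r
settle_chr   <- "12/22/2022"   # month/day/year stored as text
maturity_chr <- "1/5/2023"

# Arithmetic on a character value throws an error; tryCatch() captures it
attempt <- tryCatch(settle_chr + 14, error = function(e) "error")
attempt
# [1] "error"

# Convert with a format string that matches the text layout
settle_date   <- as.Date(settle_chr,   format = "%m/%d/%Y")
maturity_date <- as.Date(maturity_chr, format = "%m/%d/%Y")
class(settle_date)   # "Date"

# Now date arithmetic works
settle_date + 14
# [1] "2023-01-05"
as.numeric(maturity_date - settle_date)   # 14 (days)
```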

Libraries and packages

# Library stack -> this is the set of libraries I typically load; I copy and paste it from previous work rather than retyping it. You can dump all of this into your file, but it is useful to load libraries individually so you understand which features map to which library.

# You will hear the professor talk about different packages like tidyr and tidyquant. Think of a package as a book you buy and put on your bookshelf: that is what happens when you install the package. When you use the library() function, you're telling R to go to the bookshelf and open that package for use.

library(tidyverse) 
library(tidyquant)
library(lubridate)
library(dplyr)
library(ggplot2)
library(readxl)
library(esquisse)
library(here)
library(janitor)
library(ggthemes)
library(ggrepel)
library(gt)
library(viridis)
library(hrbrthemes)
library(RColorBrewer)
library(timetk)

# Question: I tried to load a library and R said it doesn't recognize it. Why? 

# Answer: This is probably because you either A) spelled it wrong or B) do not have the package installed. Let's say this happened with lubridate. Fix by installing the package as follows: 

#install.packages("lubridate") # note that we need to put this in quotations as the install.packages function expects us to pass it a string rather than number or object (more on this in a bit)
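
If you are not sure whether a package is installed, you can check before calling library(). The sketch below is one way to do it; is_installed() is a helper name made up for this example, built on base R's requireNamespace().

```r
# Hypothetical helper: TRUE if the package is on your "bookshelf"
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

is_installed("stats")                  # TRUE: stats ships with R
is_installed("notARealPackage12345")   # FALSE: nothing by this name

# Typical pattern: install only when missing, then load.
# Left commented out so nothing is installed by accident.
# if (!is_installed("lubridate")) install.packages("lubridate")
# library(lubridate)
```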

Importing Data

# Let's now import a dataset for us to work with: 
swaps <- read.csv("001_data//Fed_swaps.csv")

# Observe how this populates in the "Environment" tab in the upper right corner. We can click on it to see what the imported data looks like in a visually appealing format. This is a fairly clean data set. We could start working on this with minimal cleaning. We're not going to go into that today, but in subsequent sessions we'll work with some messier data to show how to get it into a "tidy" format.

# If I call the object by typing its name, we can also see what it looks like as output in the Quarto document, as shown below.

# swaps

# If we use glimpse(), it displays in the console and shows the columns as rows. This is useful for allowing us to see all the column headers at once. Notice how when we call the object as just "swaps", it displays 8 of the 11 columns up front but we have to scroll right. There are a lot of commands in R that allow us to look at the dataset from various angles to understand what data we're working with and how to best clean it. We'll dive into exploratory analysis more in later sessions.  
glimpse(swaps)
Rows: 529
Columns: 11
$ Operation.Type  <chr> "U.S. Dollar Liquidity Swap", "U.S. Dollar Liquidity S…
$ Counterparty    <chr> "European Central Bank", "European Central Bank", "Ban…
$ Currency        <chr> "USD", "USD", "USD", "USD", "USD", "USD", "USD", "USD"…
$ Trade.Date      <chr> "12/21/2022", "12/14/2022", "12/13/2022", "12/7/2022",…
$ Settlement.Date <chr> "12/22/2022", "12/15/2022", "12/15/2022", "12/8/2022",…
$ Maturity.Date   <chr> "1/5/2023", "12/22/2022", "12/22/2022", "12/15/2022", …
$ Term..days.     <int> 14, 7, 7, 7, 7, 7, 6, 8, 8, 7, 7, 7, 7, 7, 7, 7, 7, 7,…
$ Amount          <dbl> 412200000, 209000000, 1000000, 205500000, 1000000, 199…
$ Interest.Rate   <dbl> 4.59, 4.59, 4.53, 4.09, 4.09, 4.08, 4.08, 4.08, 4.08, …
$ isSmallValue    <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", ""…
$ Last.Updated    <chr> "12/22/2022 16:00", "12/15/2022 16:00", "12/15/2022 16…
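
glimpse() comes from the tidyverse. Base R has its own functions for looking at a DF from different angles. The sketch below uses a small made-up data frame called toy (not the swaps data) so it runs on its own.

```r
# Made-up data frame for illustration
toy <- data.frame(
  Counterparty = c("Bank A", "Bank B", "Bank C"),
  Amount       = c(100, 250, 75)
)

dim(toy)             # rows and columns: 3 2
names(toy)           # column headers: "Counterparty" "Amount"
head(toy, 2)         # first two rows
str(toy)             # base R's closest equivalent to glimpse()
summary(toy$Amount)  # min, quartiles, mean, max for one column
```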
# Problem: I call a function that I've seen you use before but R tells me it can't find it for use. 
# Answer: If you're sure you're spelling/using it correctly, it's likely you haven't loaded the correct library. Do a google search for the function and what library it maps to and see if you have it listed in the code. If it is listed, make sure you've actually run the code so R knows it's active. 
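
You can also ask R directly whether it can see a function and where it comes from. A short sketch using median(), which lives in the stats package that R loads by default:

```r
# exists() reports whether R can currently find something by this name
exists("median")            # TRUE

# find() reports which attached package provides it
utils::find("median")       # "package:stats"

# package::function() calls a function from a specific package,
# even if that package has not been loaded with library()
stats::median(c(1, 5, 9))   # 5
```

The same pattern works for tidyverse functions, e.g. dplyr::glimpse(swaps), as long as the package is installed.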


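# Importing from an Excel workbook with multiple sheets. The sheet argument picks the worksheet, and skip = 3 tells read_excel() to ignore the first three rows before reading the column headers.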
dw19q1 <- read_excel("001_data//dw_data_2019-2020.xlsx", sheet = "19Q1", skip = 3)
dw19q1
# A tibble: 553 × 28
   `Loan date`         `Maturity date`      Term `Repayment date`    Lending F…¹
   <dttm>              <dttm>              <dbl> <dttm>              <chr>      
 1 2019-01-02 00:00:00 2019-01-03 00:00:00     1 2019-01-03 00:00:00 New York (…
 2 2019-01-02 00:00:00 2019-01-03 00:00:00     1 2019-01-03 00:00:00 Chicago (7)
 3 2019-01-02 00:00:00 2019-01-03 00:00:00     1 2019-01-03 00:00:00 Chicago (7)
 4 2019-01-02 00:00:00 2019-01-03 00:00:00     1 2019-01-03 00:00:00 Chicago (7)
 5 2019-01-02 00:00:00 2019-02-01 00:00:00    30 2019-01-04 00:00:00 Minneapoli…
 6 2019-01-02 00:00:00 2019-01-03 00:00:00     1 2019-01-03 00:00:00 Kansas Cit…
 7 2019-01-02 00:00:00 2019-01-03 00:00:00     1 2019-01-03 00:00:00 San Franci…
 8 2019-01-03 00:00:00 2019-01-04 00:00:00     1 2019-01-04 00:00:00 Boston (1) 
 9 2019-01-03 00:00:00 2019-01-04 00:00:00     1 2019-01-04 00:00:00 Chicago (7)
10 2019-01-03 00:00:00 2019-01-04 00:00:00     1 2019-01-04 00:00:00 Chicago (7)
# … with 543 more rows, 23 more variables: Borrower <chr>,
#   `Borrower city` <chr>, `Borrower state` <chr>, `Borrower ABA number` <chr>,
#   `Type of credit` <chr>, `Interest rate` <dbl>, `Loan amount` <dbl>,
#   `Other outstanding loans` <dbl>, `Total outstanding loans` <dbl>,
#   `Total collateral1` <dbl>, `Commercial loans1` <dbl>,
#   `Residential mortgages1` <dbl>, `Commercial real estate loans1` <dbl>,
#   `Consumer loans1` <dbl>, `U.S. Treasury/Agency securities1` <dbl>, …
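
If you do not know the sheet names in a workbook, readxl can list them for you. The sketch below uses datasets.xlsx, an example workbook that ships with the readxl package, rather than the course data, so the sheet names here are not the ones in our file.

```r
library(readxl)

# Path to an example workbook bundled with readxl
example_path <- readxl_example("datasets.xlsx")

# List the sheet names before importing anything
sheet_names <- excel_sheets(example_path)
sheet_names

# Import one sheet by name
cars <- read_excel(example_path, sheet = "mtcars")
dim(cars)
```

For our own file, the equivalent first step would be calling excel_sheets() on its path to see which quarters are available.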

Tidyverse and other packages vs Base R

Tidyverse is essentially a package of packages (loading it loads ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, and forcats) that provides a series of tools for working with and manipulating data with better flow and more readable code. Its interface is similar to Python and looks much cleaner than Base R. It is important to understand the difference so you do not get confused when searching online for how to do something. Generally, for every command in the tidyverse, there is also a way to do it in Base R, though it might not be as clean looking or intuitive.

# Tidyverse (dplyr) method: keep two columns, then keep rows where Amount is above 300 million
swaps %>%
  select(Trade.Date, Amount) %>%
  filter(Amount > 300000000)
    Trade.Date      Amount
1   12/21/2022   412200000
2    6/29/2022   346500000
3    3/30/2022   365500000
4    3/16/2022   308000000
5     1/5/2022   313500000
6   12/22/2021   939500000
7   12/16/2020  3134200000
8   12/16/2020   514000000
9    10/6/2020   330000000
10   9/29/2020   800000000
11   9/23/2020   670000000
12   9/15/2020   742000000
13    9/9/2020   585000000
14    9/9/2020   640000000
15    9/1/2020   355000000
16   8/25/2020   673000000
17   8/18/2020   405000000
18   8/18/2020  1523000000
19   8/11/2020  2752000000
20    8/5/2020   418000000
21    8/4/2020  3647000000
22    8/4/2020   370000000
23   7/28/2020  4023000000
24   7/28/2020   770000000
25   7/22/2020   457000000
26   7/20/2020   500000000
27   7/14/2020 11860000000
28   7/14/2020  6970000000
29   7/13/2020  1370000000
30    7/8/2020   343500000
31    7/7/2020 19607000000
32    7/7/2020  7910000000
33    7/1/2020   345000000
34   6/30/2020 21182000000
35   6/30/2020  6372000000
36   6/24/2020  1350000000
37   6/23/2020 21303000000
38   6/23/2020 16016000000
39   6/17/2020  3194800000
40   6/16/2020 23114000000
41   6/16/2020 17044000000
42   6/10/2020   480000000
43    6/9/2020  7933000000
44    6/9/2020 15890000000
45    6/3/2020   495000000
46    6/2/2020  2747000000
47    6/2/2020  1355000000
48   5/27/2020  1501000000
49   5/27/2020  1510000000
50   5/26/2020  5250000000
51   5/20/2020   442000000
52   5/20/2020   600000000
53   5/19/2020  9292000000
54   5/19/2020  2373000000
55   5/13/2020   791600000
56   5/13/2020  3245000000
57   5/12/2020  9489000000
58   5/12/2020  2890000000
59    5/7/2020   400000000
60    5/6/2020  1795000000
61    5/6/2020  2291500000
62    5/4/2020  1721300000
63   4/30/2020  2042000000
64   4/30/2020   500000000
65   4/29/2020  3005300000
66   4/29/2020  1610000000
67   4/28/2020  6670000000
68   4/28/2020  1016000000
69   4/27/2020   541000000
70   4/27/2020  1868000000
71   4/24/2020   310000000
72   4/23/2020   722000000
73   4/23/2020   920000000
74   4/22/2020   971000000
75   4/22/2020  3814000000
76   4/22/2020  2003000000
77   4/21/2020 19903000000
78   4/21/2020  1290000000
79   4/20/2020  1020000000
80   4/20/2020  1740000000
81   4/17/2020   640000000
82   4/16/2020   664000000
83   4/16/2020   440000000
84   4/15/2020  1260000000
85   4/15/2020  4805500000
86   4/15/2020  2260200000
87   4/14/2020  2210000000
88   4/14/2020 26958000000
89   4/14/2020   485000000
90   4/13/2020   931000000
91   4/10/2020   600000000
92    4/9/2020   998000000
93    4/9/2020   463000000
94    4/8/2020  1080000000
95    4/8/2020 11230700000
96    4/8/2020  5922300000
97    4/7/2020 29442000000
98    4/7/2020  9360000000
99    4/7/2020   943000000
100   4/6/2020 12880000000
101   4/6/2020  2270000000
102   4/3/2020  5750000000
103   4/2/2020  1135000000
104   4/2/2020   925000000
105   4/1/2020   950000000
106   4/1/2020  6850200000
107   4/1/2020 16468000000
108  3/31/2020  9285000000
109  3/31/2020 29724000000
110  3/31/2020  2950000000
111  3/30/2020 24100000000
112  3/30/2020  6650000000
113  3/27/2020 13100000000
114  3/27/2020  2165000000
115  3/26/2020  2265000000
116  3/26/2020  3205000000
117  3/25/2020  4950000000
118  3/25/2020 27810000000
119  3/25/2020 17267000000
120  3/24/2020 73805000000
121  3/24/2020 15465000000
122  3/24/2020  4115000000
123  3/23/2020 34850000000
124  3/18/2020 36265000000
125  3/18/2020 75820000000
126  3/17/2020  2053000000
127  3/17/2020 30272000000
128 12/18/2019  3728400000
129  9/25/2019   972700000
130  8/28/2019   871500000
131  3/27/2019  1365000000
132 12/19/2018  4197000000
133  6/27/2018  1090200000
134  3/28/2018  5011000000
135  1/24/2018   672000000
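# The same selection and filter can be done in Base R.

# Base method (note that this version saves the result to swaps_filtered instead of printing it)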
swaps_filtered <- subset(swaps[,c("Trade.Date", "Amount")], Amount > 300000000)
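
To confirm that the two approaches return the same rows, here is a minimal sketch on a small made-up data frame (toy stands in for swaps, with invented values). It assumes dplyr is installed.

```r
library(dplyr)

# Made-up stand-in for the swaps data
toy <- data.frame(
  Trade.Date = c("1/5/2022", "3/16/2022", "3/30/2022"),
  Amount     = c(100000000, 308000000, 365500000)
)

# Tidyverse (dplyr) method
tidy_result <- toy %>%
  select(Trade.Date, Amount) %>%
  filter(Amount > 300000000)

# Base method
base_result <- subset(toy[, c("Trade.Date", "Amount")], Amount > 300000000)

# Both keep the same two rows
identical(tidy_result$Trade.Date, base_result$Trade.Date)   # TRUE
identical(tidy_result$Amount, base_result$Amount)           # TRUE
```

One small difference: subset() keeps the original row numbers (2 and 3 here), while filter() renumbers the rows starting from 1.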