The zoo package will help us re-format the period to be a useable date. Zoo stands for Z’s Ordered Observations.
# install.packages("zoo")
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.2 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.2 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(zoo)
Attaching package: 'zoo'
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
Loading Data from a Working Directory
You may get data from a URL, from a pre-built dataset, or from a file in your working directory. A working directory is simply the folder on your computer where R looks for files by default.
The easiest way to find your current working directory is to use the command getwd().
This command shows you (in your console below) the path to your directory. My current path is: [1] “C:/Users/rsaidi/Dropbox/Rachel/MontColl/DATA110/Notes”
If you want to change the path, there are several ways to do so. I find the easiest way is to click the “Session” menu at the top of RStudio. Select “Set Working Directory”, and then arrow over to “Choose Directory”. At this point, it will take you to your computer folders, and you need to select the folder where your data is held. I suggest you create a folder called “Datasets” and keep all the data you load for this class in that folder.
Notice that the console below will show the new path you have chosen: setwd("C:/Users/rsaidi/Dropbox/Rachel/MontColl/Datasets/Datasets"). At this point, I copy that command and put it directly into a new chunk.
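The same steps can also be done entirely in code. Here is a minimal sketch, assuming a hypothetical “Datasets” folder created inside R's temporary directory so that it runs on any computer; on your own machine you would use the path to your real folder instead.

```r
# Remember the current working directory so we can return to it later
old_wd <- getwd()
print(old_wd)

# Hypothetical folder used only for illustration (not the course folder)
data_dir <- file.path(tempdir(), "Datasets")
dir.create(data_dir, showWarnings = FALSE)

# Change the working directory, then confirm the change
setwd(data_dir)
print(getwd())

# Switch back to where we started
setwd(old_wd)
```

Note that R paths use forward slashes, even on Windows.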
Load the data
The following data comes from the New York Fed (https://www.newyorkfed.org/microeconomics/hhdc.html) and covers household debt for housing and non-housing expenses.
Download this dataset, Household_debt, from http://bit.ly/2P3084E and save it in your dataset folder. Change your working directory to load the dataset from YOUR folder. Then run this code.
Rows: 64 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Period
dbl (7): Mortgage, HE Revolving, Auto Loan, Credit Card, Student Loan, Other...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
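The message above is what read_csv() prints when it loads a file. Here is a minimal sketch of that kind of load step. It uses a tiny stand-in file with made-up values and a hypothetical file name, so it runs without the download; with the real data you would pass the name of the file you saved in your dataset folder.

```r
library(readr)

# Stand-in file with made-up values, written to a temporary location.
# The file name is hypothetical; use your downloaded file in practice.
csv_path <- file.path(tempdir(), "household_debt_sample.csv")
writeLines(c(
  "Period,Mortgage,Auto Loan",
  "03:Q1,1.0,0.5",
  "03:Q2,2.0,0.6"
), csv_path)

# read_csv() guesses each column type and reports it in the console
household_sample <- read_csv(csv_path)
household_sample
```

Here "Period" is read as characters and the other columns as doubles, which mirrors the column specification shown for the full dataset.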
Clean data headings and variable names
Very soon, you will find data from other sources, and that data will require some cleaning. Here are some important points to check:
1. Be sure the format is .csv.
2. Be sure there are no spaces within variable names (headers).
3. Set all variable names to lowercase so you do not have to keep track of capitalization.
Here are some useful cleaning commands:
Make all headings (column names) lowercase. Remove all spaces between words in headings and replace them with underscores with the gsub command. Then look at it with “head”
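The two cleaning steps described above can be sketched as follows. The small data frame is a made-up stand-in (its name "toy" and its values are assumptions for illustration); the same two lines would apply to the names of your own dataset.

```r
# Made-up stand-in data with capitalized headings that contain spaces
toy <- data.frame(
  Period = c("03:Q1", "03:Q2"),
  `Auto Loan` = c(0.5, 0.6),
  `Credit Card` = c(0.7, 0.8),
  check.names = FALSE  # keep the spaces so we can clean them ourselves
)

# Make all headings lowercase
names(toy) <- tolower(names(toy))

# Replace every space in the headings with an underscore
names(toy) <- gsub(" ", "_", names(toy))

# Look at the first rows to check the result
head(toy)
```

After these steps the headings are period, auto_loan, and credit_card.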
Mutate is a powerful command in tidyverse. It creates a new variable (column) in your dataset. In our dataset, “period” is stored as text, so it is not useful for plotting chronological data. We will use mutate from “tidyverse” together with the package “zoo” to create a usable date format.
Create a new variable to use instead of “period”
You should see that there are 64 observations and 8 variables. All variables are “col_double” (continuous values) except “period”, which is interpreted as characters. We need the “zoo” package to fix the unusual format of “period”. We will mutate it to create a new variable, date.
household2 <- household %>%
  mutate(date = as.Date(as.yearqtr(period, format = "%y:Q%q")))
household2
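To see what the conversion does on its own, here is a small sketch applied to a few period strings. The sample values are assumptions written to match the "%y:Q%q" pattern: a two-digit year, then ":Q", then the quarter number.

```r
library(zoo)

# Sample period strings (illustrative values in the "%y:Q%q" pattern)
periods <- c("03:Q1", "03:Q2", "18:Q4")

# Step 1: parse the text into zoo's year-quarter class
quarters <- as.yearqtr(periods, format = "%y:Q%q")

# Step 2: convert each quarter to the date of its first day
dates <- as.Date(quarters)
dates
```

Each quarter maps to its first day, so "03:Q1" becomes 2003-01-01 and "18:Q4" becomes 2018-10-01.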
Use “facet_wrap” to show all types of debt together
Facet_wrap allows you to plot all variables together for comparison. In order to do this, you have to “reshape” the data from a wide format to a long format. Finally, I colored each variable with a different color.
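The reshape-then-facet idea can be sketched as follows. This is one possible approach, not necessarily the exact code used for the plot: it assumes pivot_longer() from tidyr for the reshaping, and the data frame, column names, and values are made up for illustration.

```r
library(tidyr)
library(ggplot2)

# Made-up wide data: one row per date, one column per type of debt
toy <- data.frame(
  date = as.Date(c("2003-01-01", "2003-04-01", "2003-07-01")),
  mortgage = c(1, 2, 3),
  auto_loan = c(0.5, 0.6, 0.7)
)

# Wide to long: one row per date and debt type
toy_long <- pivot_longer(
  toy,
  cols = -date,            # reshape every column except date
  names_to = "debt_type",
  values_to = "amount"
)

# One panel per debt type, each drawn in its own color
p <- ggplot(toy_long, aes(x = date, y = amount, color = debt_type)) +
  geom_line() +
  facet_wrap(~debt_type)
p
```

In the long format, each type of debt becomes a value in the debt_type column, which is what facet_wrap needs in order to split the plot into panels.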