Introduction to R: Practical

Task 1 - Create an Environment

Open RStudio and create a new project in which to complete the assignment.
Make sure that you are working inside the new project, then download the necessary assignment materials using this link. Extract the data folder from the compressed folder you downloaded and copy it to your project's root directory.

Hint: Run getwd() in your console to check the file path of your working directory.

Create a new R script and save it as [yourstudentnumber].R in your project directory. Write and run the code for the rest of the assignment in this script.

Task 2 - Load Packages

Using comments, create headings in your script for each of the subsequent tasks. Write your code under the appropriate headings.

Hint: In .R files, comments are created using the pound sign, i.e., #.

Load the pacman package using install.packages("pacman") (if necessary) and library().
Install/load the following packages using pacman's p_load() function:
- tidyverse
- huxtable
- fixest
- readxl
- tsibble
- mFilter

Task 3 - Create a Directory

Create a folder named "data" in your current directory using R.

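A minimal base-R sketch of this step follows. It creates the folder only if it is missing and otherwise prints a short message. The variable name data_dir is an illustrative choice and is not part of the assignment.

```r
# Create the "data" folder inside the current working directory, but only if it is missing
data_dir <- file.path(".", "data")

if (!dir.exists(data_dir)) {
  dir.create(data_dir)
} else {
  message("Directory already exists: ", data_dir)
}
```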

Task 4 - Load Data

If your antivirus software does not block the download:

Create a variable called url that takes the value below:

https://www.columbia.edu/~mu2166/book/empirics/usg_data_quarterly.xls

Use the url variable to save the Excel spreadsheet at that location in the new data directory.
Complete the following path object. You can use the file name usg_data_quarterly.xls:

file_path <- file.path([directory where data is to be stored], [file name])

Download the file by completing the code below:

download.file([file's address on the internet] , [the file's path once downloaded], mode = "wb")

Read the data into a data frame called usg_data using read_excel().
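The sketch below shows one way to fill in the two templates above and read the file. It assumes the data folder from Task 3 and an installed readxl package. The tryCatch() wrapper is an addition: it turns a blocked download or a missing package into a message instead of stopping the script.

```r
# Address of the spreadsheet and where to save it (inside the "data" folder)
url <- "https://www.columbia.edu/~mu2166/book/empirics/usg_data_quarterly.xls"
file_path <- file.path("data", "usg_data_quarterly.xls")
dir.create("data", showWarnings = FALSE)  # no-op if the folder already exists

usg_data <- tryCatch({
  # Download only if there is no local copy yet; mode = "wb" keeps the binary .xls intact
  if (!file.exists(file_path)) download.file(url, file_path, mode = "wb")
  readxl::read_excel(file_path)
}, error = function(e) {
  message("Could not download or read the file: ", conditionMessage(e))
  NULL
})
```

If a download fails part-way, delete the partial file in data/ before re-running. Otherwise the file.exists() check will skip the download.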

Otherwise use the data here:

Task 5 - Getting Unique Values and Creating Your Dataset

Create an array called countries that keeps only the Country Name column from the data you imported.
Replace the countries array with an array that contains only its unique() values.
Load the function below into R, then replace 123456789 in the last line with your student number.

# Function that assigns you three countries: two fixed ones plus one drawn at random, using your student number as the seed
my_countries <- function(studentnumber, array) {
  set.seed(studentnumber)

  # Fixed first two numbers
  country_index <- c(204, 174)

  # Generate the third number, ensuring it is not 204 or 174
  repeat {
    third_number <- sample(1:214, 1)  # Generate a random number
    if (third_number != 204 && third_number != 174) break  # Exit loop if valid
  }

  # Add the third number to the list of indices
  country_index <- c(country_index, third_number)

  # Get the corresponding countries
  selected_countries <- array[country_index]

  # Create a dataframe with the selected countries and their indices
  return_countries <- data.frame(Index = country_index, Country = selected_countries)

  return(return_countries)
}
keep_countries <- my_countries(123456789, countries)

Merge keep_countries with usg_data so that only the rows for your selected countries are kept. Call this new data frame raw_countries.
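This task has two steps: get the unique country names, then keep only the selected countries. The sketch below runs both on a small made-up data frame. The toy data and the choice of base R's merge() are illustrative assumptions. dplyr::inner_join(keep_countries, usg_data, by = c("Country" = "Country Name")) is an equivalent tidyverse route.

```r
# Hypothetical stand-in for usg_data: one row per country-indicator pair
usg_toy <- data.frame(
  `Country Name`   = c("A", "A", "B", "B", "C"),
  `Indicator Code` = c("X1", "X2", "X1", "X2", "X1"),
  check.names = FALSE,        # keep the space in the column names, as read_excel() does
  stringsAsFactors = FALSE
)

# Country Name column as a vector, then drop the repeats
countries <- usg_toy$`Country Name`
countries <- unique(countries)

# Stand-in for keep_countries: the same Index/Country layout that my_countries() returns
keep_toy <- data.frame(Index = c(1, 3), Country = countries[c(1, 3)],
                       stringsAsFactors = FALSE)

# Inner merge (the default): keep only rows of usg_toy whose country was selected
raw_toy <- merge(keep_toy, usg_toy, by.x = "Country", by.y = "Country Name")
```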

Task 6 - Making the Data Useable

You will see that your dataset is, unfortunately, quite unusable in its current form. Let’s fix that by reshaping the data into the standard panel format.

Run the code below and add comments explaining what each line does.

long_data <- raw_countries %>%
  pivot_longer(
    cols = `1960`:`2011`,
    names_to = "Year",
    values_to = "Value"
  )

Run the code below and add comments explaining what each line does.

long_data <- long_data %>%
  select(-`Indicator Name`) %>%
  distinct() %>%
  pivot_wider(
    names_from = `Indicator Code`,
    values_from = Value
  )
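To see what the two pipes above do without the real data, here is a sketch on a tiny made-up table with the same layout: one country, two indicators and two years. The values and indicator codes are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per country-indicator, one column per year
toy <- tibble(
  `Country Name`   = c("A", "A"),
  `Indicator Name` = c("GDP per capita", "Population"),
  `Indicator Code` = c("GDP", "POP"),
  `1960` = c(1, 10),
  `1961` = c(2, 20)
)

# Step 1: stack the year columns -> one row per country-indicator-year (4 rows)
toy_long <- toy %>%
  pivot_longer(cols = `1960`:`1961`, names_to = "Year", values_to = "Value")

# Step 2: drop the long label, then spread indicators into columns
# -> one row per country-year (2 rows) with columns GDP and POP
toy_panel <- toy_long %>%
  select(-`Indicator Name`) %>%
  distinct() %>%
  pivot_wider(names_from = `Indicator Code`, values_from = Value)
```

Dropping Indicator Name matters. Otherwise pivot_wider() treats it as an identifier column, and because it differs across indicators you get one row per indicator, padded with NAs, instead of one row per country-year.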

The code below attaches each indicator's name to the corresponding variable as a label. You do not need to comment on it:

# Attach custom attributes to store Indicator Name as metadata
for (code in unique(raw_countries$`Indicator Code`)) {
  # Find the corresponding indicator name
  label <- unique(raw_countries$`Indicator Name`[raw_countries$`Indicator Code` == code])

  # Assign the label as an attribute to the corresponding column
  attr(long_data[[code]], "label") <- label
}

Convert the Year variable of long_data to a number. After pivot_longer() it is stored as text, because the years came from column names.
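A base-R sketch of the conversion follows, run on a hypothetical two-row stand-in for long_data. tsibble expects a numeric or date-like index, so this step has to happen before as_tsibble(). as.numeric() would work equally well here.

```r
# Hypothetical stand-in for long_data: Year holds the old column names as text
toy_panel <- data.frame(
  `Country Name` = c("A", "A"),
  Year = c("1960", "1961"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Convert in place; inside a pipe the same step is mutate(Year = as.integer(Year))
toy_panel$Year <- as.integer(toy_panel$Year)
```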

Convert your data into a time-series tibble using the code below.

long_data <- long_data %>%
  as_tsibble(index = Year, key = `Country Name`)

In the dataset you have now created, rename the variable NY.GDP.PCAP.KN to gdp_pcap. The variable must keep its label without any additional code. Do not create a new data frame or change the data frame’s name for this task.
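dplyr's rename() changes only the column name. It leaves the column vector itself untouched, including its attributes, so a label attached the way the loop above does it survives. The sketch uses a hypothetical one-column stand-in, and the label text is illustrative.

```r
library(dplyr)

# Hypothetical stand-in with a label attached the same way as in the labelling loop
toy <- tibble(NY.GDP.PCAP.KN = c(100, 110))
attr(toy$NY.GDP.PCAP.KN, "label") <- "GDP per capita (constant LCU)"

# Overwrite the same object: no new data frame, and its name is unchanged
toy <- toy %>% rename(gdp_pcap = NY.GDP.PCAP.KN)
```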

Create the following variables:

Create a variable called cons_pcap that contains consumption per capita in constant local currency units.
Create a variable called gdi_pcap that contains investment per capita in constant local currency units.
Ensure that your series have the correct order of magnitude.
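Which indicators you combine depends on what is in your dataset. Purely as an illustration, suppose consumption and investment are available only as percentages of GDP. The column names cons_share and inv_share below are hypothetical. Per capita levels in constant LCU then follow from gdp_pcap, and dividing the shares by 100 keeps the result at the right order of magnitude.

```r
# Hypothetical inputs: GDP per capita in constant LCU and expenditure shares in percent
toy <- data.frame(
  gdp_pcap   = c(50000, 51000),
  cons_share = c(68, 69),   # consumption, % of GDP (hypothetical)
  inv_share  = c(20, 21)    # gross investment, % of GDP (hypothetical)
)

# Convert percentages to fractions before multiplying; otherwise the result is 100x too big
toy$cons_pcap <- toy$gdp_pcap * toy$cons_share / 100
toy$gdi_pcap  <- toy$gdp_pcap * toy$inv_share / 100
```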

Combine tasks 1-6, along with your comments, into a single tidyverse pipe.
Create a data frame called data_to_use that includes the country name, country code and year, as well as the three per capita measures you created above. Ensure that data_to_use has no rows with missing values.
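complete.cases() drops rows with any missing value; inside a pipe, tidyr::drop_na() does the same. The sketch uses hypothetical values. On a tsibble, dropping rows can leave gaps in the Year index.

```r
# Hypothetical panel slice with gaps in two of the series
toy <- data.frame(
  Year      = 1960:1963,
  gdp_pcap  = c(1, NA, 3, 4),
  cons_pcap = c(1, 2, NA, 4)
)

# Keep only rows where every column is non-missing
toy_complete <- toy[complete.cases(toy), ]
```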

Create the natural logarithms of the national accounts variables. Name each one l followed by the original variable name (e.g., lgdp_pcap).
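The sketch below uses dplyr's across(), which applies R's log() (the natural logarithm) to several columns at once. It builds the new names from a glue-style template. The values are hypothetical.

```r
library(dplyr)

# Hypothetical per capita levels
toy <- tibble(gdp_pcap = c(100, 200), cons_pcap = c(60, 120), gdi_pcap = c(20, 40))

# Natural log of each series; .names = "l{.col}" gives lgdp_pcap, lcons_pcap, lgdi_pcap
toy <- toy %>%
  mutate(across(c(gdp_pcap, cons_pcap, gdi_pcap), ~ log(.x), .names = "l{.col}"))
```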

Task 7 - Using your data

Provide a table in the form of the one below based on your data.

Hint: use a tidyverse pipe that looks something like this:

summary_stats <- data_to_use %>%
  as_tibble() %>% # Temporarily remove the tsibble structure
  group_by(...) %>%
  summarise(...
  )
# You can confirm you are on the right track with this
summary_stats
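One way to fill in the pipe is sketched below on a hypothetical stand-in. Which statistics to report depends on the table you are asked to reproduce. The mean, standard deviation and count here are placeholders.

```r
library(dplyr)

# Hypothetical stand-in for data_to_use (already a plain tibble)
toy <- tibble(
  `Country Name` = rep(c("A", "B"), each = 3),
  gdp_pcap       = c(1, 2, 3, 10, 20, 30)
)

summary_stats <- toy %>%
  group_by(`Country Name`) %>%          # one row of output per country
  summarise(
    mean_gdp_pcap = mean(gdp_pcap, na.rm = TRUE),
    sd_gdp_pcap   = sd(gdp_pcap, na.rm = TRUE),
    n_years       = n()
  )
summary_stats
```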

Run the regressions below on the United States only. In all questions, use the feols() function from the fixest package.

2.1. \(lgdp\_pcap_t = \rho_0 + \tau Year_t\)

2.2. \(lgdp\_pcap_t = \rho_0 + \tau Year_t + \rho_1 lgdp\_pcap_{t-1}\)

2.3. \(\Delta lgdp\_pcap_t = \rho_0 + \tau Year_t + \rho_1 lgdp\_pcap_{t-1}\)

2.4. Combine the three regressions into a single table using huxreg(). It should look something like this:

               (1)            (2)          (3)
(Intercept)    -28.112 ***    -3.960       -3.960
               (0.694)        (3.078)      (3.078)
Year           0.019 ***      0.003        0.003
               (0.000)        (0.002)      (0.002)
lgdp_pcap_lag                 0.835 ***    -0.165
                              (0.106)      (0.106)
N              47             46           46
R2             0.986          0.994        0.106
logLik         95.443         114.296      114.296
AIC            -186.886       -222.593     -222.593
*** p < 0.001; ** p < 0.01; * p < 0.05.

2.5. Given your results above, would you say the series is stationary? Support your answer using graphs.
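The sketch below covers 2.1 to 2.3 and one graphical check for 2.5. It runs fixest::feols() on a simulated series, so the numbers are made up and are not the US data; with the real data, first keep only the United States rows. The lag and first difference are built by hand, which assumes the rows are sorted by Year with no gaps. In the graph, autocorrelations that die out only slowly are a typical sign of non-stationarity.

```r
library(fixest)

# Simulated stand-in for the US rows of data_to_use (hypothetical numbers)
set.seed(1)
us <- data.frame(Year = 1965:2011)
us$lgdp_pcap <- 10 + 0.02 * (us$Year - 1965) + cumsum(rnorm(nrow(us), sd = 0.02))

# One-period lag and first difference (rows must be sorted by Year with no gaps)
us <- us[order(us$Year), ]
us$lgdp_pcap_lag <- c(NA, head(us$lgdp_pcap, -1))
us$dlgdp_pcap    <- us$lgdp_pcap - us$lgdp_pcap_lag

# 2.1 to 2.3: feols() drops the first year in (2) and (3) because its lag is NA
m1 <- feols(lgdp_pcap ~ Year, data = us)
m2 <- feols(lgdp_pcap ~ Year + lgdp_pcap_lag, data = us)
m3 <- feols(dlgdp_pcap ~ Year + lgdp_pcap_lag, data = us)

# 2.5: the level over time next to its sample autocorrelations
op <- par(mfrow = c(1, 2))
plot(us$Year, us$lgdp_pcap, type = "l", xlab = "Year", ylab = "lgdp_pcap", main = "Level")
acf_out <- acf(us$lgdp_pcap, main = "ACF of lgdp_pcap")
par(op)  # restore the previous plotting layout
```

Model (3) has the same regressors as (2) and only subtracts lgdp_pcap_lag from the dependent variable. Its lag coefficient therefore equals that of (2) minus one, and the intercept and Year estimates are unchanged, which is the pattern in the example table in 2.4. Passing m1, m2 and m3 to huxtable::huxreg() should then give a comparable layout.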

Task 8

Below, I create the linearly detrended series for lgdp_pcap as lgdp_pcap_ldt. Create the quadratically detrended series and comment on your code. (Hint: you can use poly(Year, n) to create a polynomial of order n in a linear model formula, e.g., lm(y ~ poly(x, n)).)

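data_to_use <- data_to_use %>%
  group_by_key() %>%   # one group per country (the tsibble key)
  mutate(
    # Linear detrending: residuals from regressing lgdp_pcap on a linear time trend
    # (pick() replaces cur_data(), which is deprecated from dplyr 1.1.0)
    lgdp_pcap_ldt = residuals(lm(lgdp_pcap ~ Year, data = pick(lgdp_pcap, Year)))
  ) %>%
  ungroup()

data_to_use <- data_to_use %>%
  group_by_key() %>%
  mutate(
    # Quadratic detrending: poly(Year, 2) adds orthogonal linear and squared terms;
    # up to rounding, the residuals equal those from lm(lgdp_pcap ~ Year + I(Year^2))
    lgdp_pcap_qdt = residuals(lm(lgdp_pcap ~ poly(Year, 2), data = pick(lgdp_pcap, Year)))
  ) %>%
  ungroup()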
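The hint works because poly(Year, 2) spans the same space as an intercept, a linear term and a squared term, just with orthogonal columns. The residuals therefore match those from an explicitly written quadratic trend. Below is a base-R check on a simulated series with hypothetical data. Year is centred in the explicit version only to keep the squared term numerically well behaved.

```r
# Simulated log series with a trend (hypothetical data)
set.seed(1)
Year      <- 1965:2011
lgdp_pcap <- 10 + 0.02 * (Year - 1965) + cumsum(rnorm(length(Year), sd = 0.02))

# Quadratic detrending with orthogonal polynomial terms, as in the hint
res_poly <- residuals(lm(lgdp_pcap ~ poly(Year, 2)))

# The same trend written out explicitly with a centred year
t_c     <- Year - mean(Year)
res_raw <- residuals(lm(lgdp_pcap ~ t_c + I(t_c^2)))

all.equal(unname(res_poly), unname(res_raw))   # TRUE up to rounding
```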

Below, I detrend lgdp_pcap and lcons_pcap with the HP filter using \(\lambda = 100\). Do the same for all the series, and then create the filtered series for all the series using \(\lambda = 6.25\).

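hp_data <- data_to_use %>%
  group_by_key() %>%   # filter each country's series separately
  mutate(across(
    c(lgdp_pcap, lcons_pcap),
    # keep the cyclical component; freq is read as lambda (type = "lambda" is the default)
    ~ as.numeric(mFilter::hpfilter(.x, freq = 100)$cycle)
  )) %>%
  ungroup()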
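A small base-R version of the HP filter shows what \(\lambda\) does. This is an illustrative re-implementation, not mFilter's code. The trend \(\tau\) solves \((I + \lambda D'D)\tau = y\), where \(D\) takes second differences, and the cycle is \(y - \tau\). A larger \(\lambda\) penalises curvature in the trend more, so more of the movement ends up in the cycle. The simulated series is hypothetical; with the real data, apply the filter to one country's series at a time.

```r
# HP filter: trend solves (I + lambda * D'D) trend = y, D = second-difference matrix
hp_cycle <- function(y, lambda) {
  n     <- length(y)
  D     <- diff(diag(n), differences = 2)            # (n - 2) x n
  trend <- solve(diag(n) + lambda * crossprod(D), y)
  y - trend                                          # cyclical component
}

set.seed(1)
y <- cumsum(rnorm(47, sd = 0.02)) + 10   # hypothetical log series

cycle_100  <- hp_cycle(y, lambda = 100)
cycle_6.25 <- hp_cycle(y, lambda = 6.25)
c(sd_100 = sd(cycle_100), sd_6.25 = sd(cycle_6.25))  # the lambda = 100 cycle is larger
```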

Confirm that the HP-filtered series above are stationary.