Introduction to R: Practical

This practical accompanies the “Introduction to R” document for MCom (Economics) students. Nothing in this task is for submission; it is simply an exercise to assist your learning.

Task 1 - Create an Environment

Hint: Run getwd() in your console to check the file path of your working directory.
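
As a minimal sketch of the hint above, the snippet below prints the working directory and lists what it contains. The variable name wd is only illustrative.

```r
# Show where R is currently reading and writing files
wd <- getwd()
print(wd)

# List the files in that directory (may be empty in a fresh project)
print(list.files(wd))
```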

Task 2 - Load Packages

Hint: In .R files, comments are created using the pound sign, i.e., #.
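
One possible way to load packages with comments is sketched below. The package list is an assumption inferred from the functions used later in this practical (read_excel, as_tsibble, fixest, huxreg, mFilter::hpfilter); adjust it to whatever your course document specifies.

```r
# Packages assumed from the functions used later in this practical
pkgs <- c("tidyverse", "readxl", "tsibble", "fixest", "huxtable", "mFilter")

# requireNamespace() reports whether each package is available without attaching it
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  # Nothing is installed automatically; this only tells you what is missing
  message("Install these first: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Attach each package; comments like this one start with #
  for (p in pkgs) library(p, character.only = TRUE)
}
```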

Task 3 - Create a Directory

Create a folder named "data" in your current directory using R.

Directory already exists: ./data 
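
A minimal sketch of one way to do this with base R, which prints a message like the one above when the folder is already there. The name data_dir is illustrative.

```r
# Path to the new folder, relative to the working directory
data_dir <- file.path(".", "data")

if (dir.exists(data_dir)) {
  message("Directory already exists: ", data_dir)
} else {
  dir.create(data_dir)
  message("Directory created: ", data_dir)
}
```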

Task 4 - Load Data

If your antivirus does not block you:

  1. Create a variable called url that takes the value below:

https://www.columbia.edu/~mu2166/book/empirics/usg_data_quarterly.xls

  2. Use the variable called url to save the Excel spreadsheet at that location in the new directory.

  3. Complete the following path object. You can use the file name usg_data_quarterly.xls:

file_path <- file.path([directory where data is to be stored], [file name])

  4. Download the file by completing the code below:

download.file([file's address on the internet], [the file's path once downloaded], mode = "wb")

  5. Read the data into a data frame using read_excel.
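
The steps above could be sketched as follows. This is one possible approach, not the required solution: the helper download_if_missing() is hypothetical, and the download and read calls are left commented out so the snippet runs without network access or the readxl package.

```r
url <- "https://www.columbia.edu/~mu2166/book/empirics/usg_data_quarterly.xls"

# file.path() joins the folder and file name with "/" on every platform
file_path <- file.path("data", basename(url))
print(file_path)

# Hypothetical helper: download only when the file is not already on disk
download_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    dir.create(dirname(dest), showWarnings = FALSE, recursive = TRUE)
    # mode = "wb" writes the file in binary mode, which .xls files need on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Uncomment to download and read the spreadsheet:
# download_if_missing(url, file_path)
# usg_data <- readxl::read_excel(file_path)
```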

Otherwise use the data here:

Task 5 - Getting Unique Values and Creating Your Dataset

  1. Create an array called countries. It should keep only the Country Name column from the data you imported.

  2. Replace the countries array with an array that contains only unique() values

  3. Load the function below into R, then replace the example student number (123456789) in the call at the end with your own student number.

# Function to assign you a random country based on your student number
my_countries <- function(studentnumber, array) {
  set.seed(studentnumber)

  # Fixed first two numbers
  country_index <- c(204, 174)

  # Generate the third number, ensuring it is not 204 or 174
  repeat {
    third_number <- sample(1:214, 1)  # Generate a random number
    if (third_number != 204 && third_number != 174) break  # Exit loop if valid
  }

  # Add the third number to the list of indices
  country_index <- c(country_index, third_number)

  # Get the corresponding countries
  selected_countries <- array[country_index]

  # Create a dataframe with the selected countries and their indices
  return_countries <- data.frame(Index = country_index, Country = selected_countries)

  return(return_countries)
}
keep_countries <- my_countries(123456789, countries)

  4. Merge keep_countries with usg_data using the data generated above. Call the new data frame raw_countries.
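
To illustrate how unique() and a merge behave, here is a small sketch on made-up data using base R. The data frame usg_toy, its values, and the country names are hypothetical stand-ins for the imported spreadsheet.

```r
# Toy stand-in for the imported data (hypothetical values)
usg_toy <- data.frame(
  `Country Name` = c("Aland", "Aland", "Borduria", "Syldavia"),
  `Indicator Code` = c("X1", "X2", "X1", "X1"),
  `1960` = c(1, 2, 3, 4),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# One entry per country, in order of first appearance
countries <- unique(usg_toy$`Country Name`)

# Toy version of keep_countries: pick the first and third country
keep_toy <- data.frame(Index = c(1, 3), Country = countries[c(1, 3)],
                       stringsAsFactors = FALSE)

# merge() performs an inner join by default, so only selected countries remain
raw_toy <- merge(usg_toy, keep_toy, by.x = "Country Name", by.y = "Country")
print(raw_toy)
```

Because the key columns have different names in the two data frames, by.x and by.y are given explicitly.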

Task 6 - Making the Data Usable

You will see that your dataset is unfortunately quite unusable. Let’s fix that by reshaping the data into the standard panel format.

  1. Run the code below and comment on what each line does.

long_data <- raw_countries %>%
  pivot_longer(
    cols = `1960`:`2011`,
    names_to = "Year",
    values_to = "Value"
  )

  2. Run the code below and comment on what each line does.

long_data <- long_data %>%
  select(-`Indicator Name`) %>%
  distinct() %>%
  pivot_wider(
    names_from = `Indicator Code`,
    values_from = Value
  )
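
To see what the two reshaping steps do to the shape of a table, the sketch below applies the same calls to a tiny made-up data frame. It assumes dplyr and tidyr are installed; toy_wide and its values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Two indicators for one country, with one column per year (hypothetical values)
toy_wide <- tibble::tibble(
  `Country Name` = c("Aland", "Aland"),
  `Indicator Name` = c("GDP per capita", "Population"),
  `Indicator Code` = c("GDP", "POP"),
  `1960` = c(1.0, 10),
  `1961` = c(1.1, 11)
)

# Year columns become rows: one row per country, indicator and year
toy_long <- toy_wide %>%
  pivot_longer(cols = `1960`:`1961`, names_to = "Year", values_to = "Value")

# Indicator codes become columns: one row per country and year
toy_panel <- toy_long %>%
  select(-`Indicator Name`) %>%
  distinct() %>%
  pivot_wider(names_from = `Indicator Code`, values_from = Value)

print(dim(toy_long))
print(toy_panel)
```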

I include code with which you can get labels for the variables in the dataset as well. You don’t have to comment on this:

# Attach custom attributes to store Indicator Name as metadata
for (code in unique(raw_countries$`Indicator Code`)) {
  # Find the corresponding indicator name
  label <- unique(raw_countries$`Indicator Name`[raw_countries$`Indicator Code` == code])

  # Assign the label as an attribute to the corresponding column
  attr(long_data[[code]], "label") <- label
}

  3. Convert the Year variable of long_data to a number.

  4. Convert your data into a time-series tibble (tsibble) using the code below.

long_data <- long_data %>%
  as_tsibble(index = Year, key = `Country Name`)

  5. In the dataset that you have now created, rename the field NY.GDP.PCAP.KN to gdp_pcap. The variable must retain its label without any additional code. You may not create a new data frame or change the data frame’s name for this task.

  6. Create the following variables:

  7. Combine tasks 1-6, along with your comments, into a single tidyverse pipe.

  8. Create a data frame called data_to_use that includes the country name, country code, year, and the three per capita measures you created above. Ensure that data_to_use has no rows with empty values.

  9. Create logged values of the national accounts variables. Call these variables l[name].
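
A small sketch of three of the operations above on made-up data: converting Year to a number, renaming a column while keeping its label attribute, and creating a logged copy with an l prefix. It assumes dplyr is installed; the tibble toy and its label text are hypothetical, and this is only one of several ways to do it.

```r
library(dplyr)

toy <- tibble::tibble(
  Year = c("1960", "1961"),
  NY.GDP.PCAP.KN = c(100, 110)
)
# Hypothetical label, attached the same way as in the labelling loop
attr(toy$NY.GDP.PCAP.KN, "label") <- "GDP per capita (hypothetical label)"

toy <- toy %>%
  mutate(Year = as.integer(Year)) %>%      # character to integer
  rename(gdp_pcap = NY.GDP.PCAP.KN) %>%    # rename() changes the name only
  mutate(across(gdp_pcap, log, .names = "l{.col}"))  # adds lgdp_pcap

print(attr(toy$gdp_pcap, "label"))
print(names(toy))
```

rename() changes only the column name, so attributes attached to the column itself, such as the label, carry over.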

Task 7 - Using your data

  1. Provide a table in the form of the one below based on your data. It should look something close to this:

Hint: Use a tidyverse pipe that looks something like this:

summary_stats <- data_to_use %>%
  as_tibble() %>% # Temporarily remove the tsibble structure
  group_by(...) %>%
  summarise(...
  )
# You can confirm you are on the right track with this
summary_stats
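
To illustrate the group_by() and summarise() pattern in the hint, here is a sketch on made-up data. It assumes dplyr is installed; the grouping variable and the statistics shown are illustrative choices, not the ones the task requires.

```r
library(dplyr)

# Hypothetical data: two countries with a few observations each
toy_data <- tibble::tibble(
  country = c("A", "A", "A", "B", "B"),
  lgdp_pcap = c(1, 2, 3, 4, 6)
)

# One row per country, one column per statistic
toy_summary <- toy_data %>%
  group_by(country) %>%
  summarise(
    mean_lgdp = mean(lgdp_pcap),
    sd_lgdp = sd(lgdp_pcap),
    n_obs = n()
  )

print(toy_summary)
```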

  2. Run the regressions below on the United States only. In all questions, use the fixest package.

2.1. \(lgdp\_pcap_t = \rho_0 + \tau Year_t\)

2.2. \(lgdp\_pcap_t = \rho_0 + \tau Year_t + \rho_1 lgdp\_pcap_{t-1}\)

2.3. \(\Delta lgdp\_pcap_t = \rho_0 + \tau Year_t + \rho_1 lgdp\_pcap_{t-1}\)

2.4. Combine all tables into a huxreg. It should look something like the below:

                (1)           (2)           (3)
(Intercept)     -28.112 ***   -3.960        -3.960
                (0.694)       (3.078)       (3.078)
Year            0.019 ***     0.003         0.003
                (0.000)       (0.002)       (0.002)
lgdp_pcap_lag                 0.835 ***     -0.165
                              (0.106)       (0.106)
N               47            46            46
R2              0.986         0.994         0.106
logLik          95.443        114.296       114.296
AIC             -186.886      -222.593      -222.593
*** p < 0.001; ** p < 0.01; * p < 0.05.

2.5. Given your results above, would you say the data is stationary? Support your answer using graphs.
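
The sketch below runs the three regression forms on a simulated series with feols() from fixest. It is only an illustration: the data are simulated, and the names toy, y, y_lag and dy are hypothetical. Subtracting the lagged value from both sides of regression 2.2 gives regression 2.3 with the lag coefficient reduced by one and all other coefficients unchanged, which is consistent with the values 0.835 and -0.165 in the table above.

```r
library(fixest)

set.seed(42)
toy <- data.frame(Year = 1960:2006)

# Simulated log series: linear trend plus a random walk (illustrative only)
toy$y <- 0.02 * (toy$Year - 1960) + cumsum(rnorm(nrow(toy), sd = 0.02))
toy$y_lag <- c(NA, head(toy$y, -1))  # y at t-1; first observation has no lag
toy$dy <- toy$y - toy$y_lag          # first difference

m1 <- feols(y ~ Year, data = toy)           # form of 2.1
m2 <- feols(y ~ Year + y_lag, data = toy)   # form of 2.2
m3 <- feols(dy ~ Year + y_lag, data = toy)  # form of 2.3

# The lag coefficient in m3 equals the one in m2 minus 1
print(coef(m2)["y_lag"] - 1)
print(coef(m3)["y_lag"])
```

The three fitted models could then be passed to huxtable::huxreg(m1, m2, m3) to produce a combined table.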

Task 8

  1. Below, I create the linearly detrended series for lgdp_pcap as lgdp_pcap_ldt. Create the quadratically detrended series and comment on your code. (Hint: you can specify poly(Year, n) to create a polynomial of order n in the linear model environment, e.g. lm(y ~ poly(x, n)).)

data_to_use <- data_to_use %>%
  group_by_key() %>%
  mutate(
    # Linear detrending
    lgdp_pcap_ldt = residuals(lm(lgdp_pcap ~ Year, data = cur_data())),
  ) %>%
  ungroup()

data_to_use <- data_to_use %>%
  group_by_key() %>%
  mutate(
    # Quadratic detrending
    lgdp_pcap_qdt = residuals(lm(lgdp_pcap ~ poly(Year,2), data = cur_data())),
  ) %>%
  ungroup()
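
To see what the two detrending steps leave behind, the sketch below applies them to a made-up series that is exactly quadratic in time, using base R only. The data frame toy is hypothetical.

```r
# Hypothetical series that is exactly quadratic in time
toy <- data.frame(Year = 1960:1979)
t_idx <- toy$Year - 1960
toy$y <- 1 + 0.5 * t_idx + 0.01 * t_idx^2

# Residuals from a linear trend still contain the curvature
lin_resid <- residuals(lm(y ~ Year, data = toy))

# Residuals from a quadratic trend are zero up to rounding error
quad_resid <- residuals(lm(y ~ poly(Year, 2), data = toy))

print(round(range(lin_resid), 3))
print(max(abs(quad_resid)))
```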

  2. Below, I detrend the data using the HP filter with \(\lambda = 100\) for all series; do the same for lcons_pcap. Then repeat for all the series using \(\lambda = 6.25\).

hp_data <- data_to_use %>%
  group_by_key() %>% # Filter each country's series separately
  mutate(across(
    c(lgdp_pcap, lcons_pcap),
    ~ mFilter::hpfilter(.x, freq = 100)$cycle
  )) %>%
  ungroup()

  3. Confirm that the HP-filtered series above are stationary.
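
For intuition about what the smoothing parameter does, here is a hand-rolled base R sketch of the HP filter. It is not the mFilter implementation: hp_cycle() is a hypothetical helper that solves the standard HP minimisation problem directly, and the series is simulated.

```r
# Hypothetical helper: HP-filter cycle via the closed-form solution
# trend = (I + lambda * D'D)^(-1) y, where D takes second differences
hp_cycle <- function(y, lambda) {
  n <- length(y)
  D <- diff(diag(n), differences = 2)  # (n - 2) x n second-difference matrix
  trend <- solve(diag(n) + lambda * crossprod(D), y)
  as.numeric(y - trend)
}

set.seed(1)
# Simulated series: linear trend plus noise (illustrative only)
y <- 0.02 * (1:50) + rnorm(50, sd = 0.05)

cycle_100 <- hp_cycle(y, lambda = 100)
cycle_625 <- hp_cycle(y, lambda = 6.25)

# A smaller lambda lets the trend follow the data more closely,
# so the remaining cycle is smaller
print(sum(cycle_100^2))
print(sum(cycle_625^2))
```

A series that is exactly linear has zero second differences, so its HP cycle is zero for any value of the smoothing parameter.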