L4 (Lab) - OLS with 1 Regressor

In this lab exercise, you will learn:

- How to make a scatter plot
  - plot
- How to run an OLS regression in R
  - lm: run OLS regression
  - summary: obtain regression results
  - coef: obtain estimated coefficients
- How to compute the predicted value of \(y\), \(\hat{y}\)
  - predict

The data set used for this exercise is Growth.xlsx from E4.1 of Stock and Watson (2020, e4). Growth.xlsx contains data on average growth rates over 1960-1995 for 65 countries, along with variables that are potentially related to growth. A detailed description is given in Growth_Description.pdf, available in LMS.

Clear the Workspace

rm(list=ls())

Install and Load Needed Packages

Let’s load all the packages needed for this lab exercise (this assumes you’ve already installed them).

#install.packages("openxlsx")   # install R package "openxlsx"
library(openxlsx)               # load the package

Import the Growth Data

id <- "1BZAxYZsUtZjeuEugYrHUuHWSlHXZ_4tu"
Growth <- read.xlsx(sprintf("https://docs.google.com/uc?id=%s&export=download",id),
                 sheet=1,startRow=1,colNames=TRUE,rowNames=FALSE)
str(Growth)

## 'data.frame':    65 obs. of  8 variables:
##  $ country_name : chr  "India" "Argentina" "Japan" "Brazil" ...
##  $ growth       : num  1.915 0.618 4.305 2.93 1.712 ...
##  $ oil          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ rgdp60       : num  766 4462 2954 1784 9895 ...
##  $ tradeshare   : num  0.141 0.157 0.158 0.16 0.161 ...
##  $ yearsschool  : num  1.45 4.99 6.71 2.89 8.66 ...
##  $ rev_coups    : num  0.133 0.933 0 0.1 0 ...
##  $ assasinations: num  0.867 1.933 0.2 0.1 0.433 ...

Description of variables:

country_name: Name of country
growth: Average annual percentage growth of real Gross Domestic Product (GDP) from 1960 to 1995.
tradeshare: The average share of trade in the economy from 1960 to 1995, measured as the sum of exports and imports divided by GDP; that is, the average value of \((X + M)/GDP\) from 1960 to 1995, where \(X\) = exports and \(M\) = imports (both \(X\) and \(M\) are positive).
rgdp60: The value of GDP per capita in 1960, converted to 1960 US dollars
yearsschool: Average number of years of schooling of adult residents in that country in 1960
rev_coups: Average annual number of revolutions, insurrections (successful or not), and coups d’état in that country from 1960 to 1995
assasinations: Average annual number of political assassinations in that country from 1960 to 1995 (per million population)
oil: \(= 1\) if oil accounted for at least half of exports in 1960; \(= 0\) otherwise

1. Scatterplot

Construct a scatterplot of average annual growth rate (\(growth\)) on the average trade share (\(tradeshare\)).

Does there appear to be a relationship between the variables?

plot(x=Growth$tradeshare, y=Growth$growth, 
     main="Average annual growth rate (y) vs. average trade share (x)", 
     xlab="trade share", ylab="annual growth rate")

We could also construct a scatterplot with country names attached to each point:

rownames(Growth) <- Growth$country_name # assign country name to each row
plot(growth~tradeshare, data=Growth, ylim=c(-3,8),
     main="Average annual growth rate (y) vs. average trade share (x)", 
     xlab="trade share", ylab="annual growth rate")
text(growth~tradeshare, labels=rownames(Growth), data=Growth, cex=0.5, font=1, pos=3) # font must be an integer code (1 = plain); pos=3 places labels above points

2. Run a Linear Regression

We want to investigate how growth rate is related to a country’s trade share. Using all observations, run a regression of \(growth\) (\(y\)) on \(tradeshare\) (\(x\)): \[growth = \beta_0 + \beta_1 \cdot tradeshare + u.\] The R function used for OLS regression is lm.

fit <- lm(growth~tradeshare, data=Growth)
summary(fit)

## 
## Call:
## lm(formula = growth ~ tradeshare, data = Growth)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3739 -0.8864  0.2329  0.9248  5.3889 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   0.6403     0.4900   1.307  0.19606   
## tradeshare    2.3064     0.7735   2.982  0.00407 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.79 on 63 degrees of freedom
## Multiple R-squared:  0.1237, Adjusted R-squared:  0.1098 
## F-statistic: 8.892 on 1 and 63 DF,  p-value: 0.00407
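
With a single regressor, the OLS estimates have closed-form expressions: \(\hat\beta_1 = s_{xy}/s_x^2\) (sample covariance of \(x\) and \(y\) divided by the sample variance of \(x\)) and \(\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}\). The sketch below checks that these formulas reproduce what lm reports. It uses a small made-up data frame (toy, with hypothetical columns x and y), not the Growth data, so that it runs on its own.

```r
# Toy data (made-up numbers, NOT the Growth data set)
toy <- data.frame(x = c(0.2, 0.4, 0.5, 0.7, 1.0),
                  y = c(1.0, 1.8, 1.5, 2.6, 3.1))

# Textbook OLS formulas for a single regressor
b1.hand <- cov(toy$x, toy$y) / var(toy$x)       # slope
b0.hand <- mean(toy$y) - b1.hand * mean(toy$x)  # intercept

# Same estimates from lm
toy.fit <- lm(y ~ x, data = toy)

print(c(b0.hand, b1.hand))
print(coef(toy.fit))

# The two approaches should agree up to floating-point error
all.equal(unname(coef(toy.fit)), c(b0.hand, b1.hand))
```

The same check can be run on the lab data by replacing toy$x and toy$y with Growth$tradeshare and Growth$growth.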

Exercise-1: What is the estimated slope? How do you interpret it? What is the estimated intercept?

3. Plot the Estimated Regression Functions

Add the OLS regression line to the scatterplot.

plot(x=Growth$tradeshare, y=Growth$growth, 
     main="Average annual growth rate (y) vs. average trade share (x)", 
     xlab="trade share", ylab="annual growth rate")
abline(fit, col="red")

4. Predict the Growth Rate

Use the regression to predict the growth rate for a country with a trade share of \(0.5\) and for another with a trade share equal to \(1.0\): \[\widehat{growth} = \hat\beta_0 + \hat\beta_1 \cdot tradeshare.\]

Step-1: Use function coef to obtain the estimated coefficients.

b0 <- coef(fit)[1]
b1 <- coef(fit)[2]

Step-2: Compute the predicted values.

pre.y1 <- b0 + b1*0.5
pre.y2 <- b0 + b1*1
print(pre.y1)

## (Intercept) 
##    1.793482

print(pre.y2)

## (Intercept) 
##    2.946699
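
The label (Intercept) in the printed output is not part of the prediction: coef returns a named vector, and the result of b0 + b1*0.5 simply inherits the name of b0. A minimal sketch of a tidier version follows, again on a made-up data frame (toy is hypothetical, not the Growth data). It selects the coefficients by name, computes both predictions in one vectorized step, and drops the names with unname.

```r
# Toy data (made-up numbers, NOT the Growth data set)
toy <- data.frame(tradeshare = c(0.2, 0.4, 0.5, 0.7, 1.0),
                  growth     = c(1.0, 1.8, 1.5, 2.6, 3.1))
toy.fit <- lm(growth ~ tradeshare, data = toy)

b     <- coef(toy.fit)   # named vector: "(Intercept)", "tradeshare"
x.new <- c(0.5, 1)       # trade shares at which to predict

# Select coefficients by name, then drop any inherited labels
pre.y <- unname(b["(Intercept)"] + b["tradeshare"] * x.new)
print(pre.y)
```

Selecting by name rather than by position keeps the code correct if more regressors are added to the model later.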

Use predict:

An alternative way to compute the predicted growth rate is the function predict. The argument newdata in predict(object, newdata) must be a data frame whose column names match the regressors in the fitted model (here, tradeshare). If newdata is omitted, the fitted values are used.

Example: Compute \(\widehat{growth}\) with \(tradeshare=0.5\).

new.x <- data.frame(tradeshare=c(0.5))
predict(fit, newdata = new.x)

##        1 
## 1.793482

Example: Compute \(\widehat{growth}\) for both \(tradeshare=0.5\) and \(tradeshare=1\).

new.x <- data.frame(tradeshare=c(0.5, 1))
predict(fit, newdata = new.x)

##        1        2 
## 1.793482 2.946699
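
The column name in newdata matters. The sketch below, on a made-up data frame (toy is hypothetical, not the Growth data), shows what can happen when the column is misnamed: predict looks for a variable called tradeshare, does not find it in newdata, and stops with an error. This assumes no object named tradeshare exists in the workspace; if one did, R would silently use it instead.

```r
# Toy data (made-up numbers, NOT the Growth data set)
toy <- data.frame(tradeshare = c(0.2, 0.4, 0.5, 0.7, 1.0),
                  growth     = c(1.0, 1.8, 1.5, 2.6, 3.1))
toy.fit <- lm(growth ~ tradeshare, data = toy)

# Column name matches the regressor: works
ok <- predict(toy.fit, newdata = data.frame(tradeshare = c(0.5, 1)))
print(ok)

# Column deliberately misnamed "share": capture the error message
msg <- tryCatch(
  predict(toy.fit, newdata = data.frame(share = c(0.5, 1))),
  error = function(e) conditionMessage(e)
)
print(msg)
```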

Try predict(fit). Without specifying any new data, what do you get using predict?

5. An Outlier - Malta

One country, Malta, has a trade share much larger than the other countries. Find Malta on the scatterplot.

Exercise-2: What’s the trade share of Malta?
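
Besides reading the scatterplot, an extreme observation can be located directly in the data. A minimal sketch follows, using a made-up data frame (toy, with hypothetical countries A to D) rather than the Growth data: which.max returns the row index of the largest value, and a logical condition on country_name looks up one country by name.

```r
# Toy data (made-up numbers, NOT the Growth data set)
toy <- data.frame(country_name = c("A", "B", "C", "D"),
                  tradeshare   = c(0.3, 0.5, 1.9, 0.6),
                  growth       = c(1.2, 2.0, 6.0, 1.5))

# Row index of the largest trade share
i <- which.max(toy$tradeshare)
print(toy[i, c("country_name", "tradeshare")])

# Look up one country's trade share by name
print(toy$tradeshare[toy$country_name == "C"])
```

Applying the same two lines to Growth (with "Malta" in place of "C") is one way to answer the exercise.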

6. Re-run the OLS Regression Excluding Malta

Investigate the effect of outliers on the OLS regression.

Step-1: Exclude Malta from the data.

Growth.noM <- subset(Growth, country_name != "Malta")
str(Growth.noM)

## 'data.frame':    64 obs. of  8 variables:
##  $ country_name : chr  "India" "Argentina" "Japan" "Brazil" ...
##  $ growth       : num  1.915 0.618 4.305 2.93 1.712 ...
##  $ oil          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ rgdp60       : num  766 4462 2954 1784 9895 ...
##  $ tradeshare   : num  0.141 0.157 0.158 0.16 0.161 ...
##  $ yearsschool  : num  1.45 4.99 6.71 2.89 8.66 ...
##  $ rev_coups    : num  0.133 0.933 0 0.1 0 ...
##  $ assasinations: num  0.867 1.933 0.2 0.1 0.433 ...

Step-2: Re-run the OLS regression.

fit.noM <- lm(growth~tradeshare, data=Growth.noM)
summary(fit.noM)

## 
## Call:
## lm(formula = growth ~ tradeshare, data = Growth.noM)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4247 -0.9383  0.2091  0.9265  5.3776 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   0.9574     0.5804   1.650   0.1041  
## tradeshare    1.6809     0.9874   1.702   0.0937 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.789 on 62 degrees of freedom
## Multiple R-squared:  0.04466,    Adjusted R-squared:  0.02925 
## F-statistic: 2.898 on 1 and 62 DF,  p-value: 0.09369
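
To compare two fits, it can be easier to put the coefficients side by side than to scan two summary printouts. The sketch below does this with cbind on a made-up data frame (toy is hypothetical, not the Growth data) in which country F plays the role of the outlier.

```r
# Toy data (made-up numbers, NOT the Growth data set); "F" is the outlier
toy <- data.frame(country_name = c("A", "B", "C", "D", "E", "F"),
                  tradeshare   = c(0.2, 0.3, 0.4, 0.5, 0.6, 2.0),
                  growth       = c(1.5, 1.0, 2.2, 1.4, 2.0, 6.5))

fit.all <- lm(growth ~ tradeshare, data = toy)
fit.sub <- lm(growth ~ tradeshare, data = subset(toy, country_name != "F"))

# One column per fit, one row per coefficient
comp <- cbind(all = coef(fit.all), no.outlier = coef(fit.sub))
print(comp)

# Change in each coefficient caused by dropping the outlier
print(comp[, "all"] - comp[, "no.outlier"])
```

With the lab objects, the equivalent call would be cbind(coef(fit), coef(fit.noM)).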

Step-3: Plot both regression lines on the scatterplot.

plot(x=Growth$tradeshare, y=Growth$growth, 
     main="Average annual growth rate (y) vs. average trade share (x)", 
     xlab="trade share", ylab="annual growth rate")
abline(fit, col="red")
abline(fit.noM, col="blue")
legend(1.5, 0, legend=c("with Malta", "w/o Malta"),
       col=c("red", "blue"), lty=1, cex=0.8)   # both lines are drawn solid, so lty=1

Exercise-3: What’s the impact of an outlier on OLS regression?

7. Lab Assignment (E4.2, Stock and Watson (e4))

For Lab_Assignment_Ch4, the dataset Earnings_and_Height used for E4.2 can be downloaded from LMS or imported with the following R code:

id <- "1XKjDOQBJcxwslhwipkJAF2qLNmFW9Bfu"
earn <- read.xlsx(sprintf("https://docs.google.com/uc?id=%s&export=download",id),
                  sheet=1,startRow=1,colNames=TRUE,rowNames=FALSE)
str(earn)

A detailed description is given in Earnings_and_Height_Description.pdf, available in LMS.