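In this lab exercise, you will learn how to construct a scatterplot, estimate a simple linear regression by OLS with lm, compute predicted values with predict, and examine how an outlier affects the OLS estimates.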
The data set used for this exercise is Growth.xlsx from E4.1 of Stock and Watson (2020, e4). Growth.xlsx contains data on average growth rates over 1960-1995 for 65 countries, along with variables that are potentially related to growth. A detailed description is given in Growth_Description.pdf, available in LMS.
rm(list=ls())
Let’s load all the packages needed for this lab exercise (this assumes you’ve already installed them).
#install.packages("openxlsx") # install R package "openxlsx"
library(openxlsx) # load the package
id <- "1BZAxYZsUtZjeuEugYrHUuHWSlHXZ_4tu"
Growth <- read.xlsx(sprintf("https://docs.google.com/uc?id=%s&export=download",id),
sheet=1,startRow=1,colNames=TRUE,rowNames=FALSE)
str(Growth)
## 'data.frame': 65 obs. of 8 variables:
## $ country_name : chr "India" "Argentina" "Japan" "Brazil" ...
## $ growth : num 1.915 0.618 4.305 2.93 1.712 ...
## $ oil : num 0 0 0 0 0 0 0 0 0 0 ...
## $ rgdp60 : num 766 4462 2954 1784 9895 ...
## $ tradeshare : num 0.141 0.157 0.158 0.16 0.161 ...
## $ yearsschool : num 1.45 4.99 6.71 2.89 8.66 ...
## $ rev_coups : num 0.133 0.933 0 0.1 0 ...
## $ assasinations: num 0.867 1.933 0.2 0.1 0.433 ...
Description of variables:
country_name: Name of country
growth: Average annual percentage growth of real Gross Domestic Product (GDP) from 1960 to 1995.
tradeshare: The average share of trade in the economy from 1960 to 1995, measured as the sum of exports plus imports, divided by GDP; that is, the average value of \((X + M)/GDP\) from 1960 to 1995, where \(X\) = exports and \(M\) = imports (both \(X\) and \(M\) are positive).
rgdp60: The value of GDP per capita in 1960, converted to 1960 US dollars
yearsschool: Average number of years of schooling of adult residents in that country in 1960
rev_coups: Average annual number of revolutions, insurrections (successful or not) and coups d’état in that country from 1960 to 1995
assasinations: Average annual number of political assassinations in that country from 1960 to 1995 (per million population)
oil: \(= 1\) if oil accounted for at least half of exports in 1960; \(= 0\) otherwise
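As a quick illustration of the tradeshare definition, here is the calculation for one hypothetical country (the numbers are made up for illustration and are not taken from Growth.xlsx), with exports of 30, imports of 20, and GDP of 100 in the same currency units:

```latex
\[
\frac{X + M}{GDP} = \frac{30 + 20}{100} = 0.5
\]
```

Because exports and imports are both counted in the numerator, the ratio is not bounded above by 1. With hypothetical values \(X = 70\), \(M = 60\) and \(GDP = 100\), the trade share would be \(1.3\).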
Construct a scatterplot of the average annual growth rate (\(growth\)) against the average trade share (\(tradeshare\)).
Does there appear to be a relationship between the variables?
plot(x=Growth$tradeshare, y=Growth$growth,
main="Average annual growth rate (y) vs. average trade share (x)",
xlab="trade share", ylab="annual growth rate")
We could also construct a scatterplot with country names attached to each point:
rownames(Growth) <- Growth$country_name # assign country name to each row
plot(growth~tradeshare, data=Growth, ylim=c(-3,8),
main="Average annual growth rate (y) vs. average trade share (x)",
xlab="trade share", ylab="annual growth rate")
text(growth~tradeshare, labels=rownames(Growth), data=Growth, cex=0.5, pos=3) # label each point above it; font must be an integer (1-4), so the invalid font=0.3 is dropped
We want to investigate how growth rate is related to a country’s trade share. Using all observations, run a regression of \(growth\) (\(y\)) on \(tradeshare\) (\(x\)): \[growth = \beta_0 + \beta_1 \cdot tradeshare + u.\] The R function used for OLS regression is lm.
fit <- lm(growth~tradeshare, data=Growth)
summary(fit)
##
## Call:
## lm(formula = growth ~ tradeshare, data = Growth)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.3739 -0.8864 0.2329 0.9248 5.3889
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.6403 0.4900 1.307 0.19606
## tradeshare 2.3064 0.7735 2.982 0.00407 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.79 on 63 degrees of freedom
## Multiple R-squared: 0.1237, Adjusted R-squared: 0.1098
## F-statistic: 8.892 on 1 and 63 DF, p-value: 0.00407
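With a single regressor, the OLS estimates have a simple closed form: the slope is the sample covariance of x and y divided by the sample variance of x, and the intercept is the mean of y minus the slope times the mean of x. The sketch below checks these formulas against lm. It uses a small made-up data frame (toy, with columns x and y, all hypothetical and not taken from Growth.xlsx) so that it runs without downloading anything.

```r
# Hypothetical toy data (not from Growth.xlsx), used only to illustrate the formulas
toy <- data.frame(x = c(0.2, 0.4, 0.5, 0.7, 0.9, 1.1),
                  y = c(1.0, 1.8, 1.5, 2.6, 2.4, 3.3))

# Closed-form OLS estimates for a regression with one regressor
b1.hand <- cov(toy$x, toy$y) / var(toy$x)       # slope
b0.hand <- mean(toy$y) - b1.hand * mean(toy$x)  # intercept

# The same estimates computed by lm
toy.fit <- lm(y ~ x, data = toy)

# Side-by-side comparison; the two columns should agree
print(cbind(by.hand = c(b0.hand, b1.hand), lm = coef(toy.fit)))
```

The same comparison should work on the lab data by replacing toy$x and toy$y with Growth$tradeshare and Growth$growth.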
Exercise-1: What’s the estimated slope? How should it be interpreted? What’s the estimated intercept?
Add the OLS regression line to the scatterplot.
plot(x=Growth$tradeshare, y=Growth$growth,
main="Average annual growth rate (y) vs. average trade share (x)",
xlab="trade share", ylab="annual growth rate")
abline(fit, col="red")
Use the regression to predict the growth rate for a country with a trade share of \(0.5\) and for another with a trade share equal to \(1.0\): \[\widehat{growth} = \hat\beta_0 + \hat\beta_1 \cdot tradeshare.\]
b0 <- coef(fit)[1]
b1 <- coef(fit)[2]
pre.y1 <- b0 + b1*0.5
pre.y2 <- b0 + b1*1
print(pre.y1)
## (Intercept)
## 1.793482
print(pre.y2)
## (Intercept)
## 2.946699
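As a check, the two predictions can be reproduced by hand from the rounded estimates in the summary output (\(\hat\beta_0 \approx 0.6403\), \(\hat\beta_1 \approx 2.3064\)):

```latex
\[
\widehat{growth}\,\big|_{tradeshare = 0.5} \approx 0.6403 + 2.3064 \times 0.5 = 1.7935,
\qquad
\widehat{growth}\,\big|_{tradeshare = 1.0} \approx 0.6403 + 2.3064 \times 1.0 = 2.9467 .
\]
```

The two predictions differ by \(0.5 \times \hat\beta_1 \approx 1.1532\). The label (Intercept) printed above each value is only the name carried over from coef(fit)[1]; it does not mean the printed number is an intercept.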
Use predict:
An alternative way to compute the predicted growth rate is to use the function predict. The argument newdata in predict(object, newdata) must be a data frame that contains the regressors under the same names used in the model (here, tradeshare). If newdata is omitted, predict returns the fitted values for the estimation sample.
new.x <- data.frame(tradeshare=c(0.5))
predict(fit, newdata = new.x)
## 1
## 1.793482
new.x <- data.frame(tradeshare=c(0.5, 1))
predict(fit, newdata = new.x)
## 1 2
## 1.793482 2.946699
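The requirement that newdata use the same column name as the regressor is easy to get wrong. The following sketch illustrates it on a small made-up data set (toy, share and rate are hypothetical names and values, not part of Growth.xlsx). It assumes a fresh R session in which no object called share exists in the workspace.

```r
# Hypothetical toy data (not from Growth.xlsx)
toy <- data.frame(share = c(0.2, 0.4, 0.5, 0.7, 0.9, 1.1),
                  rate  = c(1.0, 1.8, 1.5, 2.6, 2.4, 3.3))
toy.fit <- lm(rate ~ share, data = toy)

# Correct: the column in newdata has the same name as the regressor
good <- predict(toy.fit, newdata = data.frame(share = c(0.5, 1)))
print(good)

# Misnamed column: predict cannot find 'share' in newdata, so it raises an error.
# tryCatch stores the error message instead of stopping the script.
bad <- tryCatch(predict(toy.fit, newdata = data.frame(tradeshare = c(0.5, 1))),
                error = function(e) conditionMessage(e))
print(bad)
```

If an object named share does exist in the workspace, R may pick it up instead of failing, which is one more reason to name the columns of newdata carefully.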
Try predict(fit). Without specifying any new data, what do you get using predict?
One country, Malta, has a trade share much larger than the other
countries. Find Malta on the scatterplot.
Exercise-2: What’s the trade share of Malta?
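A value for a single country can be read off the data frame with logical indexing rather than from the plot. The sketch below shows the idea on a made-up data frame (the country labels "A" to "D" and their values are hypothetical); the same pattern should apply to Growth with country_name == "Malta".

```r
# Hypothetical toy data; the country labels and values are made up
toy <- data.frame(country_name = c("A", "B", "C", "D"),
                  tradeshare   = c(0.30, 0.55, 1.80, 0.45),
                  stringsAsFactors = FALSE)

# Look up the trade share of one country by name
share.C <- toy[toy$country_name == "C", "tradeshare"]
print(share.C)

# Find the country with the largest trade share without knowing its name
top.country <- toy$country_name[which.max(toy$tradeshare)]
print(top.country)
```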
To investigate the effect of this outlier on the OLS regression, drop Malta from the sample and re-estimate the model.
Growth.noM <- subset(Growth, country_name != "Malta")
str(Growth.noM)
## 'data.frame': 64 obs. of 8 variables:
## $ country_name : chr "India" "Argentina" "Japan" "Brazil" ...
## $ growth : num 1.915 0.618 4.305 2.93 1.712 ...
## $ oil : num 0 0 0 0 0 0 0 0 0 0 ...
## $ rgdp60 : num 766 4462 2954 1784 9895 ...
## $ tradeshare : num 0.141 0.157 0.158 0.16 0.161 ...
## $ yearsschool : num 1.45 4.99 6.71 2.89 8.66 ...
## $ rev_coups : num 0.133 0.933 0 0.1 0 ...
## $ assasinations: num 0.867 1.933 0.2 0.1 0.433 ...
fit.noM <- lm(growth~tradeshare, data=Growth.noM)
summary(fit.noM)
##
## Call:
## lm(formula = growth ~ tradeshare, data = Growth.noM)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.4247 -0.9383 0.2091 0.9265 5.3776
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.9574 0.5804 1.650 0.1041
## tradeshare 1.6809 0.9874 1.702 0.0937 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.789 on 62 degrees of freedom
## Multiple R-squared: 0.04466, Adjusted R-squared: 0.02925
## F-statistic: 2.898 on 1 and 62 DF, p-value: 0.09369
plot(x=Growth$tradeshare, y=Growth$growth,
main="Average annual growth rate (y) vs. average trade share (x)",
xlab="trade share", ylab="annual growth rate")
abline(fit, col="red")
abline(fit.noM, col="blue")
legend(1.5, 0, legend=c("with Malta", "w/o Malta"),
col=c("red", "blue"), lty=1, cex=0.8) # both lines are drawn solid, so the legend uses lty=1 for both
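One way to see why a single observation such as Malta can move the regression line is to look at leverage: observations whose x value lies far from the sample mean of x pull the fitted line toward themselves. The sketch below uses made-up data (toy, with one deliberately extreme x value; none of the numbers come from Growth.xlsx) and the stats function hatvalues. The cutoff of twice the average leverage is a common rule of thumb, assumed here for illustration rather than taken from the lab.

```r
# Hypothetical toy data: five ordinary points plus one with an extreme x value
toy <- data.frame(x = c(0.2, 0.3, 0.4, 0.5, 0.6, 2.0),
                  y = c(1.5, 2.5, 1.0, 3.0, 2.0, 6.0))

fit.all  <- lm(y ~ x, data = toy)        # all six observations
fit.drop <- lm(y ~ x, data = toy[-6, ])  # extreme observation removed

# Compare intercept and slope with and without the extreme point
print(rbind(all = coef(fit.all), drop = coef(fit.drop)))

# Leverage of each observation in the full-sample fit
h <- hatvalues(fit.all)
print(round(h, 3))

# Flag observations whose leverage exceeds twice the average (rule of thumb)
flagged <- which(h > 2 * mean(h))
print(flagged)
```

Applied to the lab data, hatvalues(fit) would give the leverage of each country in the full-sample regression.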
Exercise-3: What’s the impact of an outlier on OLS regression?
For Lab_Assignment_Ch4, the dataset Earnings_and_Height used for E4.2 can be downloaded from LMS or loaded with the following R code:
id <- "1XKjDOQBJcxwslhwipkJAF2qLNmFW9Bfu"
earn <- read.xlsx(sprintf("https://docs.google.com/uc?id=%s&export=download",id),sheet=1,startRow=1,colNames=TRUE,rowNames=FALSE)
str(earn)
A detailed description is given in Earnings_and_Height_Description.pdf, available in LMS.