Simple OLS: X and Y

This is a template and example for completing your homework. The goal of this assignment is to collect two variables, plot them together, and run a simple linear regression. This should be data you dig up yourself, but I’ll mention some possible exceptions in class.

Your report should explain what data you are using, where you got that data, and what you believe it is measuring. For example:

I am using the Economic Freedom of the World Index (EFW) and the Economic Complexity Index (ECI).

The EFW is from the Economic Freedom of the World 2015 Report. This project constructs an index measuring the market-friendliness of a country’s institutions. It is published by the Fraser Institute.

The ECI measures the sophistication of an economy based on the rarity of the products it exports and the variety of products it exports. It is published by the Center for International Development at the Harvard Kennedy School.

Your works cited should give details about where the data is from. At a minimum, there should be enough detail that a reader could track down the data without too much trouble. Some datasets include specific instructions for citation; follow any such instructions. Consider using Zotero to make the process easy (please view their tutorials before asking me for help).

You should also give the mean and standard deviation of each variable. This allows interpretation of the later regression results relative to the mean values and typical variation of the data. Please give these details in a table for readability. The example below is the minimum requirement for this. Make it prettier if you can. Make sure it’s obvious what the variable names mean (i.e. don’t leave your variables with names like “X1”).

##   Variable        Mean Standard.Deviation
## 1      EFW 6.818697395          0.9310743
## 2      ECI 0.003458884          1.0542632
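One way to produce a table like the one above in R is sketched below. It assumes your data frame is called `data` with columns `EFW` and `ECI`, as in the regression code in this template; the simulated numbers are placeholders, not the real EFW or ECI data. The `na.rm = TRUE` argument matters because real data sets, including this one, often have missing values.

```r
# Simulated placeholder data; replace with your own data frame called `data`.
set.seed(1)
data <- data.frame(EFW = rnorm(50, mean = 6.8, sd = 0.9),
                   ECI = rnorm(50, mean = 0, sd = 1))
data$ECI[c(3, 7)] <- NA  # real data sets often have a few missing values

vars <- c("EFW", "ECI")
summary.table <- data.frame(
  Variable = c("Economic Freedom of the World (EFW)",
               "Economic Complexity Index (ECI)"),
  Mean = sapply(data[vars], mean, na.rm = TRUE),  # na.rm = TRUE skips missing values
  Standard.Deviation = sapply(data[vars], sd, na.rm = TRUE),
  row.names = NULL
)
print(summary.table, digits = 3)
```

In an R Markdown report, `knitr::kable(summary.table, digits = 3)` prints the same table in a more readable format.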

You may also want to include more detail about the “shape” of the data. For example, the minimum and maximum may be interesting for some variables, but including them for several variables may be cumbersome. If one of your variables has a notable distribution, you may wish to either mention it or include a histogram of that variable.
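A minimal base-R sketch of both ideas: `range()` gives the minimum and maximum, and `hist()` draws the distribution. The vector `efw` is simulated stand-in data assumed for illustration; in your report you would pass a column such as `data$EFW`.

```r
# Simulated stand-in for one of your variables (not real EFW data)
set.seed(2)
efw <- rnorm(200, mean = 6.8, sd = 0.9)

range(efw)  # minimum and maximum; use range(x, na.rm = TRUE) if x has missing values

# hist() draws the plot and (invisibly) returns the bin breaks and counts
h <- hist(efw, breaks = 20,
          main = "Distribution of EFW scores (simulated)",
          xlab = "EFW score")
```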

Your assignment should also include a scatterplot displaying the data with a line of best fit.

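library(ggplot2)
# Scatterplot of ECI against EFW with the OLS line of best fit (shaded band: 95% confidence interval)
ggplot(data, aes(x = EFW, y = ECI)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(x = "Economic Freedom of the World (EFW)", y = "Economic Complexity Index (ECI)")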
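If ggplot2 is not available, base R graphics can draw an equivalent figure: `plot()` with a formula draws the scatterplot and `abline()` adds the fitted OLS line. Unlike `geom_smooth()`, this draws no confidence band. The simulated `data` frame is a placeholder assumption, not the real data.

```r
# Simulated placeholder data with a positive relationship
set.seed(3)
data <- data.frame(EFW = runif(100, min = 4, max = 9))
data$ECI <- -4 + 0.6 * data$EFW + rnorm(100, sd = 0.9)

fit <- lm(ECI ~ EFW, data = data)
plot(ECI ~ EFW, data = data,
     xlab = "Economic Freedom of the World (EFW)",
     ylab = "Economic Complexity Index (ECI)")
abline(fit, col = "blue", lwd = 2)  # line of best fit from the OLS regression
```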

Finally, you should include a summary of the linear regression results and a brief explanation of the findings along with any relevant interpretations.

out1 <- lm(ECI ~ EFW, data = data)
summary(out1)

## 
## Call:
## lm(formula = ECI ~ EFW, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0429 -0.7177 -0.1226  0.7379  2.6249 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.14569    0.17062  -24.30   <2e-16 ***
## EFW          0.61571    0.02479   24.83   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8928 on 1495 degrees of freedom
##   (119 observations deleted due to missingness)
## Multiple R-squared:  0.292,  Adjusted R-squared:  0.2916 
## F-statistic: 616.7 on 1 and 1495 DF,  p-value: < 2.2e-16
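Rather than copying numbers by hand from the printed summary, you can pull them out of the fitted model object. The sketch below uses simulated data (an assumption, not the real data) with some missing values; as in the output above, `lm()` drops incomplete rows by default, and `nobs()` reports how many observations were actually used.

```r
set.seed(4)
data <- data.frame(EFW = runif(120, min = 4, max = 9))
data$ECI <- -4 + 0.6 * data$EFW + rnorm(120, sd = 0.9)
data$ECI[sample(120, 10)] <- NA   # 10 missing values, dropped by lm()

out1 <- lm(ECI ~ EFW, data = data)
s <- summary(out1)

coef(s)            # estimates, standard errors, t values, p-values
s$r.squared        # multiple R-squared
confint(out1)      # 95% confidence intervals for the coefficients
nobs(out1)         # observations used in the fit (120 - 10 = 110)
```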

In this case, we can see a strong positive relationship between EFW and ECI. It appears that countries with a high degree of economic freedom also tend to have a high degree of economic complexity (although at this point it’s hard to tell if this is a causal relationship or if I’m missing some other variable). This model explains just under 30% of the variation in ECI (\(R^2 = 0.29\)). The EFW coefficient is statistically and economically significant. A one-point increase in a country’s EFW score is associated with an increase of about 0.62 in its ECI score. Since both variables have a standard deviation close to 1, this is a fairly large change.

A one-standard-deviation change in EFW (about 0.93 points) is associated with a change of about 0.54 standard deviations in ECI (0.616 × 0.931 / 1.054 ≈ 0.54).
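The standardized effect is the slope multiplied by the ratio of standard deviations, \(\hat\beta \cdot s_{EFW} / s_{ECI}\). The sketch below first applies this to the slope and standard deviations reported in this template, then checks on simulated data (an assumption, not the real data) that it equals the slope from regressing standardized ECI on standardized EFW, which in a simple regression is also the correlation. Strictly speaking, the standard deviations should come from the same observations used in the regression.

```r
# Using the slope and standard deviations reported in this template
doc.effect <- 0.61571 * 0.9310743 / 1.0542632
round(doc.effect, 2)  # about 0.54 standard deviations of ECI

# Check the formula on simulated data (placeholder, not the real data)
set.seed(5)
x <- rnorm(200, mean = 6.8, sd = 0.9)
y <- -4 + 0.6 * x + rnorm(200, sd = 0.9)

slope <- coef(lm(y ~ x))[["x"]]
std.effect <- slope * sd(x) / sd(y)

# Slope from the regression on standardized (z-scored) variables
std.slope <- coef(lm(scale(y) ~ scale(x)))[[2]]

all.equal(std.effect, std.slope)  # TRUE
all.equal(std.effect, cor(x, y))  # TRUE: in simple OLS this is the correlation
```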

Optionally, if you did some fancy data wrangling and you’d like to show off, feel free to include your R code. If I’m impressed enough I might give you some extra credit points. Don’t let this lead you to only take on the most difficult data cleaning projects, but let it be some incentive to put in the extra work when the results are interesting.

## Incorporating EFW data with ISO code data to sort out names and add continent and region information.
library(readr) ; library(dplyr) ; library(tidyr)
# EFW data
EFW <- read.csv("../../../../../../Write/EFW/EFWcomponents.csv") %>% as_tibble()
## Note on file location:
# Each "../" means, "go up one directory." 
# So if my working directory is C:/Users/weberr/Dropbox/Teach/classes/F15/380/Homework Template/Simple OLS
# then "../../" means C:/Users/weberr/Dropbox/Teach/classes/F15/380
# and the read.csv() call above is looking for C:/Users/weberr/Dropbox/Write/EFW/EFWcomponents.csv
##
# Get rid of everything that isn't normalized into the 0-10 scale.
EFW <- EFW[,c(1:4,6,8,10,12,14:26,28,30,32:34,36,38,40:69)]
# Save the poorly formatted, but descriptive, variable names
variables.EFW <- colnames(EFW)
# Rename the variables to something easier to use.
names.EFW <- c("ISO_Code",
               "Year","Country",
               paste("EFW_",1,LETTERS[1:3],sep=""),
               "EFW_1Di","EFW_1Dii","EFW_1D","EFW_1",
               paste("EFW_",2,LETTERS[1:9],sep=""),
               "EFW_2",
               paste("EFW_",3,LETTERS[1:4],sep=""),
               "EFW_3",
               paste("EFW_",4,"A",tolower(as.roman(1:3)),sep=""),
               "EFW_4A",
               paste("EFW_",4,"B",tolower(as.roman(1:2)),sep=""),
               "EFW_4B","EFW_4C",
               paste("EFW_",4,"D",tolower(as.roman(1:3)),sep=""),
               "EFW_4D","EFW_4",
               paste("EFW_",5,"A",tolower(as.roman(1:3)),sep=""),
               "EFW_5A",
               paste("EFW_",5,"B",tolower(as.roman(1:6)),sep=""),
               "EFW_5B",
               paste("EFW_",5,"C",tolower(as.roman(1:6)),sep=""),
               "EFW_5C",
               "EFW_5",
               "EFW")
colnames(EFW) <- names.EFW
EFW.bak <- EFW ; #EFW <- EFW.bak
################################################################################
# Sort out the ISO data to allow a continent/region variable
ISO.reg <- as_tibble(read.csv("ISO_Codes_Regions.csv",comment.char="#"))
ISO.nam <- read_csv("ISO_Codes_Names.csv") # read_csv() already returns a tibble
ISO.nam$Country2 <- as.character(ISO.nam$Country2)
ISO.reg$Country2 <- as.character(ISO.reg$Country2)
#ISO.nam$ISO_Code <- as.character(ISO.nam$ISO_Code)
# The UN isn't allowed to recognize Taiwan... sort that out.
ISO.nam <- rbind(ISO.nam,c(490,"Taiwan","TWN"))
ISO.reg <- rbind(ISO.reg,c(490,"Taiwan",142,30))
# Change numeric UN codes for continents and regions to character descriptions.
# Match the codes exactly with named lookup vectors: sub() treats each code as a
# regular expression, so sub(14, ...) would also rewrite 143 and 145.
continents <- c("2" = "Africa", "9" = "Oceania", "21" = "North America",
                "142" = "Asia", "150" = "Europe", "419" = "Latin America and Caribbean")
regions <- c("14" = "Eastern Africa", "17" = "Middle Africa", "15" = "Northern Africa",
             "18" = "Southern Africa", "11" = "Western Africa",
             "29" = "Caribbean", "13" = "Central America", "5" = "South America",
             "143" = "Central Asia", "30" = "Eastern Asia", "34" = "Southern Asia",
             "35" = "South-Eastern Asia", "145" = "Western Asia",
             "151" = "Eastern Europe", "154" = "Northern Europe",
             "39" = "Southern Europe", "155" = "Western Europe",
             "53" = "Australia and New Zealand", "54" = "Melanesia",
             "57" = "Micronesia", "61" = "Polynesia")
# as.integer() turns codes stored as numbers, factors, or text (e.g. "021") into plain
# numbers, so they match the names of the lookup vectors above
ISO.reg$Continent <- unname(continents[as.character(as.integer(as.character(ISO.reg$Continent)))])
ISO.reg$Region <- unname(regions[as.character(as.integer(as.character(ISO.reg$Region)))])
ISO <- merge(ISO.reg,ISO.nam) %>% as_tibble() # And put it all together.
################################################################################
# EFW lists Barbados as BRD instead of BRB
EFW$ISO_Code <- as.character(EFW$ISO_Code)
EFW$ISO_Code[EFW$ISO_Code == "BRD"] <- "BRB"
################################################################################
# Put it all together
EFW <- merge(ISO,EFW) %>% as_tibble()
# Hooray!
rm(ISO,ISO.reg,ISO.nam,EFW.bak,names.EFW) # Get rid of objects we aren't going to use again
write_csv(EFW,"EFW-clean.csv") # And spit out a cleaned version of the data in this working directory.
# done
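Because `merge()` performs an inner join by default, rows whose key has no match in the other table are dropped silently; that is how a country listed under a nonstandard code such as BRD would disappear. A quick check before merging is to list the keys that will not match. The two small data frames below are hypothetical illustrations with placeholder values, not the real EFW or ISO files.

```r
# Hypothetical miniature versions of the EFW and ISO tables (placeholder values)
efw <- data.frame(ISO_Code = c("USA", "BRD", "TWN"), value = 1:3,
                  stringsAsFactors = FALSE)
iso <- data.frame(ISO_Code = c("USA", "BRB", "TWN"),
                  Continent = c("North America", "Latin America and Caribbean", "Asia"),
                  stringsAsFactors = FALSE)

# Keys in the EFW data with no partner in the ISO table
unmatched <- setdiff(efw$ISO_Code, iso$ISO_Code)
unmatched  # "BRD"

n.before <- nrow(merge(iso, efw))  # 2: the inner join silently drops the unmatched row

# After recoding, every row survives the merge
efw$ISO_Code[efw$ISO_Code == "BRD"] <- "BRB"
n.after <- nrow(merge(iso, efw))   # 3
```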

Requirements

A complete submission will include the following:

A complete works cited section
A verbal description of your data
The mean and standard deviation of each of your data series
A scatterplot of your data with a line of best fit
The summary from a linear regression of your two variables

Works Cited

Economic Freedom of the World Index

Data available at http://www.freetheworld.com/datasets_efw.html

Authors: James Gwartney, Robert Lawson, and Joshua Hall

Title: 2014 Economic Freedom Dataset, published in Economic Freedom of the World: 2014 Annual Report

Publisher: Fraser Institute

Year: 2014

URL: http://www.freetheworld.com/datasets_efw.html

Economic Complexity Index

Data available at http://atlas.cid.harvard.edu/rankings/

“Country Rankings (2013) | The Atlas of Economic Complexity.” 2015. Accessed October 5. http://atlas.cid.harvard.edu/rankings/.

Simple OLS: X and Y

Rick Weber

Sunday, October 04, 2015
