Data
I chose the OECD gas dataset (OECDGas, from the AER package). The included variables are country, year, and the log values of gasoline consumption per car, real per-capita income, real gasoline price, and the stock of cars per capita. I’d like to model gas consumption using cars as the independent variable. I expected the coefficient to be positive, but it was not, so I used the fixed effects approach to account for omitted variable bias (OVB).
EDA, Panel balance
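library(AER)        # provides the OECDGas dataset
library(stargazer)  # stargazer() tables
library(Amelia)     # missmap()
data("OECDGas")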
stargazer(OECDGas,
type = "text",
title = "Summary Statistics",
style = "qje")
##
## Summary Statistics
## -----------------------------------------------
## Statistic N Mean St. Dev. Min Max
## ===============================================
## year 342 1,969.000 5.485 1,960 1,978
## gas 342 4.296 0.549 3.380 6.157
## income 342 -6.139 0.635 -8.073 -5.221
## price 342 -0.523 0.678 -2.896 1.125
## cars 342 -9.042 1.219 -13.475 -7.536
## ===============================================
missmap(OECDGas)
#No missing values. Let's check the number of years for each country (member of the panel)
unique(OECDGas$country)
## [1] Austria Belgium Canada Denmark France Germany
## [7] Greece Ireland Italy Japan Netherlands Norway
## [13] Spain Sweden Switzerland Turkey UK USA
## 18 Levels: Austria Belgium Canada Denmark France Germany Greece ... USA
The data span the years 1960-1978. The entities, or members of the study group, are 18 countries: mostly European countries, plus the US, Japan and Canada. Now I’ll see if the panel is balanced. The missingness map above shows no missing values, so I just need to confirm that each country is represented across the full time span of the data.
library(dplyr)  # %>%, group_by(), summarize(), n_distinct()
library(knitr)  # kable()
# Check if each country has the same years
summary_years <- OECDGas %>%
group_by(country) %>%
summarize(min_year = min(year), max_year = max(year), unique_years = n_distinct(year))
kable(summary_years, caption = "Check of balanced data", format = "html")
| country | min_year | max_year | unique_years |
|---|---|---|---|
| Austria | 1960 | 1978 | 19 |
| Belgium | 1960 | 1978 | 19 |
| Canada | 1960 | 1978 | 19 |
| Denmark | 1960 | 1978 | 19 |
| France | 1960 | 1978 | 19 |
| Germany | 1960 | 1978 | 19 |
| Greece | 1960 | 1978 | 19 |
| Ireland | 1960 | 1978 | 19 |
| Italy | 1960 | 1978 | 19 |
| Japan | 1960 | 1978 | 19 |
| Netherlands | 1960 | 1978 | 19 |
| Norway | 1960 | 1978 | 19 |
| Spain | 1960 | 1978 | 19 |
| Sweden | 1960 | 1978 | 19 |
| Switzerland | 1960 | 1978 | 19 |
| Turkey | 1960 | 1978 | 19 |
| UK | 1960 | 1978 | 19 |
| USA | 1960 | 1978 | 19 |
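The same balance check can also be run as a cross-tabulation. The sketch below is a minimal base R alternative, assuming the AER package is installed; the names `obs_per_cell` and `is_balanced` are my own and not part of the original analysis.

```r
# Load the data without attaching the whole package (assumes AER is installed)
data("OECDGas", package = "AER")

# Count observations in every country-year cell
obs_per_cell <- table(OECDGas$country, OECDGas$year)

# A balanced panel has exactly one row per country-year combination
is_balanced <- all(obs_per_cell == 1)
is_balanced
```

Unlike a count of distinct years per country, this also catches duplicated country-year rows.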
As the table above shows, each country has the same start year, end year, and number of years in the dataset. This indicates that the panel is balanced, meaning each member of the study group is represented evenly (of course, data quality could vary by country, but we don’t have information on that). Now I’ll create a basic model of gas consumption using cars per capita as my independent variable.
\[\text{gas consumption}_{it} = \beta_0 + \beta_1 \text{cars}_{it} + \epsilon_{it}\]
gasmodel1 <- lm(gas ~ cars, data = OECDGas)
stargazer(gasmodel1,
type = "text",
title = "Summary Statistics")
##
## Summary Statistics
## ===============================================
## Dependent variable:
## ---------------------------
## gas
## -----------------------------------------------
## cars -0.312***
## (0.018)
##
## Constant 1.471***
## (0.160)
##
## -----------------------------------------------
## Observations 342
## R2 0.481
## Adjusted R2 0.480
## Residual Std. Error 0.396 (df = 340)
## F Statistic 315.565*** (df = 1; 340)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Basic OLS model interpretation
A 1% increase in the stock of cars is associated with a 0.31% fall in gas consumption. The estimate is statistically significant at the 1% level. This is counter-intuitive to me: I would have expected more cars to be associated with more gasoline consumption. There appears to be some omitted variable bias in the model. What could be causing this?
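Because both gas and cars enter the model in logs, the slope can be read as an elasticity. A short worked version, using the estimate from the table above (the 10% case is my own illustration):

\[\beta_1 = \frac{\partial \ln(\text{gas})}{\partial \ln(\text{cars})} \approx \frac{\%\Delta\,\text{gas}}{\%\Delta\,\text{cars}}, \qquad \hat{\beta}_1 = -0.312\]

\[\text{cars} \uparrow 10\% \;\Rightarrow\; \frac{\text{gas}_{new}}{\text{gas}_{old}} = 1.10^{-0.312} \approx 0.971, \text{ a fall of about } 2.9\%\]

The percentage reading is exact only for small changes, which is why the 10% case gives slightly less than 3.12%.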
OLS with country fixed effects i.e. dummies
\[\text{gas consumption}_{it} = \beta_0 + \beta_1 \text{cars}_{it} + \gamma_2 \text{country2}_i + \gamma_3 \text{country3}_i + \dots + \gamma_{18} \text{country18}_i + \epsilon_{it}\]
gasmodel_fe_entity <- lm(gas ~ cars + country,data = OECDGas)
stargazer(gasmodel_fe_entity,
type = "text",
title = "Summary Statistics")
##
## Summary Statistics
## ===============================================
## Dependent variable:
## ---------------------------
## gas
## -----------------------------------------------
## cars -0.352***
## (0.011)
##
## countryBelgium -0.058
## (0.036)
##
## countryCanada 1.075***
## (0.037)
##
## countryDenmark 0.226***
## (0.036)
##
## countryFrance -0.102***
## (0.036)
##
## countryGermany -0.043
## (0.036)
##
## countryGreece 0.142***
## (0.042)
##
## countryIreland 0.103***
## (0.036)
##
## countryItaly -0.319***
## (0.036)
##
## countryJapan 0.257***
## (0.038)
##
## countryNetherlands 0.035
## (0.036)
##
## countryNorway 0.082**
## (0.036)
##
## countrySpain -0.370***
## (0.038)
##
## countrySweden 0.160***
## (0.037)
##
## countrySwitzerland 0.289***
## (0.036)
##
## countryTurkey 0.440***
## (0.053)
##
## countryUK 0.034
## (0.036)
##
## countryUSA 1.138***
## (0.038)
##
## Constant 0.945***
## (0.099)
##
## -----------------------------------------------
## Observations 342
## R2 0.961
## Adjusted R2 0.959
## Residual Std. Error 0.111 (df = 323)
## F Statistic 443.560*** (df = 18; 323)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
# Plot coefficients (coefplot() comes from the coefplot package)
library(coefplot)
coefplot(gasmodel_fe_entity, title = "Country FE model coefficients", xlab = "Coefficients")
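Estimating country dummies is one way to implement entity fixed effects; demeaning each variable by country (the within transformation) should give the same slope on cars. The sketch below is an illustration rather than part of the original analysis: it assumes the AER package is installed, and the names `gas_dm`, `cars_dm`, `b_within`, and `b_dummy` are my own.

```r
data("OECDGas", package = "AER")

# Subtract each country's mean from gas and cars (within transformation)
gas_dm  <- with(OECDGas, gas  - ave(gas,  country))
cars_dm <- with(OECDGas, cars - ave(cars, country))

# Slope from the demeaned regression (no intercept needed after demeaning)
b_within <- unname(coef(lm(gas_dm ~ cars_dm - 1)))

# Slope from the dummy-variable regression used in this section
b_dummy <- unname(coef(lm(gas ~ cars + country, data = OECDGas))["cars"])

# The two slopes should agree up to floating-point error
all.equal(b_within, b_dummy)
```

The standard errors from the demeaned `lm()` are not directly comparable to the dummy-variable model, because the demeaned fit does not count the 18 estimated country means in its degrees of freedom.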
This model adds country fixed effects in the form of a dummy variable for each country, with Austria as the omitted baseline. The dummy variables capture the fixed, time-invariant characteristics of each entity that are not included in the independent variables, which helps control for country-specific effects. By including these, I had hoped to control for any country-specific characteristics that could introduce bias. As you can see, the cars coefficient is nearly the same (-0.352 versus -0.312 in the basic model), so the fixed effects did not change its sign.
OLS with country and time fixed effects
\[\text{gas consumption}_{it} = \beta_0 + \beta_1 \text{cars}_{it} + \gamma_2 \text{country2}_i + \dots + \gamma_{18} \text{country18}_i + \delta_2 \text{year2}_t + \dots + \delta_{19} \text{year19}_t + \epsilon_{it}\]
gasmodel_fe_time_entity <- lm(gas ~ cars + factor(year)+ country ,data = OECDGas)
stargazer(gasmodel_fe_time_entity,
type = "text",
title = "Summary Statistics")
##
## Summary Statistics
## ===============================================
## Dependent variable:
## ---------------------------
## gas
## -----------------------------------------------
## cars -0.579***
## (0.017)
##
## factor(year)1961 0.050*
## (0.028)
##
## factor(year)1962 0.058**
## (0.028)
##
## factor(year)1963 0.085***
## (0.029)
##
## factor(year)1964 0.135***
## (0.029)
##
## factor(year)1965 0.154***
## (0.030)
##
## factor(year)1966 0.200***
## (0.030)
##
## factor(year)1967 0.234***
## (0.031)
##
## factor(year)1968 0.268***
## (0.032)
##
## factor(year)1969 0.288***
## (0.033)
##
## factor(year)1970 0.328***
## (0.033)
##
## factor(year)1971 0.358***
## (0.034)
##
## factor(year)1972 0.392***
## (0.035)
##
## factor(year)1973 0.439***
## (0.035)
##
## factor(year)1974 0.373***
## (0.036)
##
## factor(year)1975 0.407***
## (0.037)
##
## factor(year)1976 0.418***
## (0.037)
##
## factor(year)1977 0.438***
## (0.038)
##
## factor(year)1978 0.460***
## (0.038)
##
## countryBelgium -0.008
## (0.027)
##
## countryCanada 1.250***
## (0.030)
##
## countryDenmark 0.287***
## (0.027)
##
## countryFrance -0.012
## (0.028)
##
## countryGermany 0.035
## (0.028)
##
## countryGreece -0.298***
## (0.042)
##
## countryIreland 0.060**
## (0.027)
##
## countryItaly -0.315***
## (0.027)
##
## countryJapan 0.007
## (0.033)
##
## countryNetherlands 0.042
## (0.027)
##
## countryNorway 0.101***
## (0.027)
##
## countrySpain -0.609***
## (0.032)
##
## countrySweden 0.296***
## (0.029)
##
## countrySwitzerland 0.359***
## (0.028)
##
## countryTurkey -0.382***
## (0.066)
##
## countryUK 0.102***
## (0.028)
##
## countryUSA 1.381***
## (0.032)
##
## Constant -1.338***
## (0.165)
##
## -----------------------------------------------
## Observations 342
## R2 0.979
## Adjusted R2 0.977
## Residual Std. Error 0.084 (df = 305)
## F Statistic 400.098*** (df = 36; 305)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
#Plot coefficients
coefplot(gasmodel_fe_time_entity, title = "Country and year FE model coefficients", xlab = "Coefficients")
Adding the time dummies noticeably increases the number of variables in the model (from 18 to 36 regressors) and does not have the intended effect of making the cars coefficient positive. The coefficient does change, but it becomes even more negative (-0.579). The size of this change and the large number of variables suggest that multicollinearity may be an issue here. Of the three models, the country fixed effects model is likely the best one, as it contains the fixed effects and has a lower risk of multicollinearity.
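One way to probe the multicollinearity concern is a variance inflation factor (VIF) for cars, computed from an auxiliary regression of cars on the fixed-effect dummies. This is a sketch under my own assumptions (AER installed; `aux_fit`, `r2_aux`, and `vif_cars` are hypothetical names), not a result from the original analysis.

```r
data("OECDGas", package = "AER")

# Auxiliary regression: how well do the country and year dummies predict cars?
aux_fit <- lm(cars ~ country + factor(year), data = OECDGas)
r2_aux  <- summary(aux_fit)$r.squared

# VIF for cars in the two-way fixed effects model: 1 / (1 - R^2)
vif_cars <- 1 / (1 - r2_aux)
vif_cars
```

A common rule of thumb treats values above 10 as a sign of strong collinearity, though that cutoff is a convention rather than a formal test. Dropping `factor(year)` from the auxiliary formula gives the corresponding figure for the country-only model, which allows a direct comparison.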