Data

I chose the OECD gas dataset (OECDGas, from the AER package). It contains country, year, and four variables in logs: gasoline consumption per car (gas), real per-capita income (income), real gasoline price (price), and the stock of cars per capita (cars). I’d like to model gas consumption using cars as the independent variable. I expected the coefficient to be positive, but it was not, so I used the fixed effects approach to account for omitted variable bias (OVB).

EDA and panel balance

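library(AER)        # provides the OECDGas panel dataset
library(stargazer)  # text tables for summaries and regressions

data("OECDGas")

stargazer(OECDGas, 
          type = "text",
          title = "Summary Statistics",
          style = "qje")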
## 
## Summary Statistics
## -----------------------------------------------
## Statistic  N    Mean    St. Dev.   Min    Max  
## ===============================================
## year      342 1,969.000  5.485    1,960  1,978 
## gas       342   4.296    0.549    3.380  6.157 
## income    342  -6.139    0.635   -8.073  -5.221
## price     342  -0.523    0.678   -2.896  1.125 
## cars      342  -9.042    1.219   -13.475 -7.536
## ===============================================
library(Amelia)  # provides missmap()
missmap(OECDGas)

# No missing values. List the countries (the members of the panel) before checking the years for each one
unique(OECDGas$country)
##  [1] Austria     Belgium     Canada      Denmark     France      Germany    
##  [7] Greece      Ireland     Italy       Japan       Netherlands Norway     
## [13] Spain       Sweden      Switzerland Turkey      UK          USA        
## 18 Levels: Austria Belgium Canada Denmark France Germany Greece ... USA

The data span 1960 to 1978 (19 years). The entities, or members of the study group, are 18 countries: mostly European countries, plus the US, Japan, and Canada. Now I’ll check whether the panel is balanced. The missingness map above shows no missing values, so I just need to confirm that each country is observed in every year of the time span.
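The same balance check can also be expressed in base R by cross-tabulating entity against period: a balanced panel has exactly one observation in every cell. The sketch below uses a hypothetical helper, is_balanced() (not from any package), and a made-up toy panel; it also counts missing values per column as a numeric complement to the missingness map.

```r
# Hypothetical helper (not from any package): TRUE when every entity is
# observed exactly once in every period that appears anywhere in the data.
is_balanced <- function(df, id, time) {
  counts <- table(df[[id]], df[[time]])  # entity-by-period cell counts
  all(counts == 1)
}

# Made-up toy panel: 3 countries observed in each of 4 years
toy <- expand.grid(country = c("A", "B", "C"), year = 2000:2003)
toy$gas <- seq_len(nrow(toy))

colSums(is.na(toy))                                   # missing values per column
is_balanced(toy, "country", "year")                   # complete panel
is_balanced(toy[-5, ], "country", "year")             # one country-year dropped
is_balanced(rbind(toy, toy[1, ]), "country", "year")  # one row duplicated
```

On the real data the call would be is_balanced(OECDGas, "country", "year"); given 18 countries and 19 years in 342 rows, it should return TRUE. Note that a year missing for every country would not be flagged, because the table only has columns for periods that occur in the data.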

library(dplyr)  # group_by(), summarize(), n_distinct() and %>%
library(knitr)  # kable()

# Check if each country has the same years
summary_years <- OECDGas %>%
  group_by(country) %>%
  summarize(min_year = min(year), max_year = max(year), unique_years = n_distinct(year))

kable(summary_years, caption = "Check of balanced data", format = "html")
Check of balanced data
country      min_year  max_year  unique_years
Austria      1960      1978      19
Belgium      1960      1978      19
Canada       1960      1978      19
Denmark      1960      1978      19
France       1960      1978      19
Germany      1960      1978      19
Greece       1960      1978      19
Ireland      1960      1978      19
Italy        1960      1978      19
Japan        1960      1978      19
Netherlands  1960      1978      19
Norway       1960      1978      19
Spain        1960      1978      19
Sweden       1960      1978      19
Switzerland  1960      1978      19
Turkey       1960      1978      19
UK           1960      1978      19
USA          1960      1978      19

As you can see above, each country has the same start year, end year, and number of years in the dataset, and 18 countries × 19 years = 342, which matches the number of observations. This indicates that the panel is balanced, meaning each member of the study group is represented evenly (of course, data quality could vary by country, but we don’t have information on that). Now I’ll create a basic model of gas consumption using cars per capita as my independent variable.

\[\text{gas}_{it} = \beta_0 + \beta_1\,\text{cars}_{it} + \epsilon_{it}\]
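Because gas and cars are both in logs, β1 in this equation is an elasticity: a 1% rise in cars changes gas by roughly β1 percent. The short sketch below converts a log-log slope into the exact implied percentage change; the helper pct_change() is hypothetical, and it plugs in -0.312, the pooled OLS estimate reported in the table below.

```r
# Hypothetical helper: exact % change in y implied by a log-log slope b
# when x rises by pct_x percent, since log(y) changes by b * log(1 + pct_x / 100).
pct_change <- function(b, pct_x) 100 * ((1 + pct_x / 100)^b - 1)

b1 <- -0.312  # pooled OLS estimate on cars, taken from the table below

pct_change(b1, 1)   # about -0.31: almost identical to b1 for a 1% change
pct_change(b1, 10)  # about -2.93
b1 * 10             # linear approximation, -3.12, drifts for larger changes
```

For small changes the coefficient can be read directly as a percentage; the exact formula only matters once the change in x is large.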

gasmodel1 <- lm(gas ~ cars, data = OECDGas)
stargazer(gasmodel1, 
          type = "text",
          title = "Pooled OLS: gas on cars")
## 
## Pooled OLS: gas on cars
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                 gas            
## -----------------------------------------------
## cars                         -0.312***         
##                               (0.018)          
##                                                
## Constant                     1.471***          
##                               (0.160)          
##                                                
## -----------------------------------------------
## Observations                    342            
## R2                             0.481           
## Adjusted R2                    0.480           
## Residual Std. Error      0.396 (df = 340)      
## F Statistic          315.565*** (df = 1; 340)  
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

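Basic OLS model interpretation

Because both variables are in logs, the slope is an elasticity: a 1% increase in the stock of cars per capita is associated with a decrease of about 0.31% in gas consumption per car. The coefficient is statistically significant at the 1% level. This is counter-intuitive to me. I would have expected that more cars would be associated with more gasoline consumption. The sign suggests there may be omitted variable bias in the model. What could be causing this? It is also worth keeping in mind that gas is measured per car, so a negative slope does not necessarily mean that total gasoline use falls as the number of cars grows.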
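One textbook source of this kind of bias is an omitted, time-invariant country characteristic that is correlated with cars. The simulation below uses a made-up data-generating process, not the OECD data: the true within-country slope is +0.5, but countries with more cars per capita have a lower baseline outcome. Pooled OLS then returns a negative slope, while adding country dummies recovers roughly 0.5. It illustrates the mechanism that the country fixed effects models that follow are meant to address; it does not show that this mechanism is what drives the OECDGas result.

```r
set.seed(42)
# Made-up balanced panel with the same shape as OECDGas: 18 countries x 19 years
sim <- expand.grid(country = factor(seq_len(18)), year = seq_len(19))
cid <- as.integer(sim$country)

car_level <- seq(-2, 2, length.out = 18)  # each country's typical (log) car stock
alpha     <- -2 * car_level               # omitted country effect, correlated with cars

sim$x <- car_level[cid] + rnorm(nrow(sim), sd = 0.5)
sim$y <- 1 + 0.5 * sim$x + alpha[cid] + rnorm(nrow(sim), sd = 0.3)

pooled <- lm(y ~ x, data = sim)            # ignores the country effect
fe     <- lm(y ~ x + country, data = sim)  # country dummies absorb it

round(c(true = 0.5,
        pooled = coef(pooled)[["x"]],
        fe = coef(fe)[["x"]]), 3)
```

The pooled slope is pulled down because the omitted country effect moves in the opposite direction from x across countries; once each country gets its own intercept, only within-country variation in x is used.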

OLS with country fixed effects (country dummies)

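\[\text{gas}_{it} = \beta_0 + \beta_1\,\text{cars}_{it} + \gamma_2\,\text{country2}_i + \gamma_3\,\text{country3}_i + \dots + \gamma_{18}\,\text{country18}_i + \epsilon_{it}\]

Austria (country 1) is the omitted reference category, so its effect is absorbed into the intercept \(\beta_0\).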
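The dummy-variable regression above gives exactly the same slope as the "within" estimator, which subtracts each entity's mean from both gas and cars and regresses the demeaned values (a consequence of the Frisch–Waugh–Lovell theorem). The sketch below checks this on a small made-up panel; the data and the helper demean() are hypothetical.

```r
set.seed(1)
# Made-up balanced panel: 5 entities observed over 10 periods
panel <- expand.grid(id = factor(1:5), period = 1:10)
ent   <- as.integer(panel$id)
alpha <- rnorm(5)                        # hypothetical entity effects
panel$x <- rnorm(nrow(panel)) + alpha[ent]
panel$y <- 2 * panel$x + 3 * alpha[ent] + rnorm(nrow(panel), sd = 0.1)

# (1) LSDV: intercept plus one dummy per entity except the first
lsdv <- lm(y ~ x + id, data = panel)

# (2) Within estimator: demean y and x by entity, then regress without intercept
demean <- function(v, g) v - ave(v, g)
within_fit <- lm(demean(y, id) ~ demean(x, id) - 1, data = panel)

c(lsdv = unname(coef(lsdv)["x"]), within = unname(coef(within_fit)[1]))
```

The point estimates match, but the standard errors from the hand-demeaned fit are slightly too small, because lm() does not know that the entity means were estimated. Dedicated panel tools such as plm() from the plm package (with model = "within") make that degrees-of-freedom correction.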

gasmodel_fe_entity <- lm(gas ~ cars + country, data = OECDGas)
stargazer(gasmodel_fe_entity, 
          type = "text",
          title = "OLS with country fixed effects")
## 
## OLS with country fixed effects
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                 gas            
## -----------------------------------------------
## cars                         -0.352***         
##                               (0.011)          
##                                                
## countryBelgium                -0.058           
##                               (0.036)          
##                                                
## countryCanada                1.075***          
##                               (0.037)          
##                                                
## countryDenmark               0.226***          
##                               (0.036)          
##                                                
## countryFrance                -0.102***         
##                               (0.036)          
##                                                
## countryGermany                -0.043           
##                               (0.036)          
##                                                
## countryGreece                0.142***          
##                               (0.042)          
##                                                
## countryIreland               0.103***          
##                               (0.036)          
##                                                
## countryItaly                 -0.319***         
##                               (0.036)          
##                                                
## countryJapan                 0.257***          
##                               (0.038)          
##                                                
## countryNetherlands             0.035           
##                               (0.036)          
##                                                
## countryNorway                 0.082**          
##                               (0.036)          
##                                                
## countrySpain                 -0.370***         
##                               (0.038)          
##                                                
## countrySweden                0.160***          
##                               (0.037)          
##                                                
## countrySwitzerland           0.289***          
##                               (0.036)          
##                                                
## countryTurkey                0.440***          
##                               (0.053)          
##                                                
## countryUK                      0.034           
##                               (0.036)          
##                                                
## countryUSA                   1.138***          
##                               (0.038)          
##                                                
## Constant                     0.945***          
##                               (0.099)          
##                                                
## -----------------------------------------------
## Observations                    342            
## R2                             0.961           
## Adjusted R2                    0.959           
## Residual Std. Error      0.111 (df = 323)      
## F Statistic          443.560*** (df = 18; 323) 
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
library(coefplot)  # provides coefplot()
# Plot coefficients
coefplot(gasmodel_fe_entity, title = "Country FE model coefficients", xlab = "Coefficients")

I added the country fixed effects as dummy variables, one for each country except the reference category (Austria). The dummy variables capture fixed, time-invariant characteristics of each country that are not included in the independent variables, so the cars coefficient is now estimated only from variation within each country over time. By including these, I had hoped to control for any country-specific characteristics that could introduce bias. As you can see, the cars coefficient barely changed (from -0.312 to -0.352) and is still negative, so the country fixed effects did not resolve the unexpected sign. The fit did improve substantially: R2 rose from 0.481 to 0.961.

OLS with country and time fixed effects

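\[\text{gas}_{it} = \beta_0 + \beta_1\,\text{cars}_{it} + \gamma_2\,\text{country2}_i + \gamma_3\,\text{country3}_i + \dots + \gamma_{18}\,\text{country18}_i + \delta_2\,\text{year2}_t + \delta_3\,\text{year3}_t + \dots + \delta_{19}\,\text{year19}_t + \epsilon_{it}\]

Here 1960 (year 1) is the omitted reference year, alongside Austria as the reference country.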
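In a balanced panel, both sets of dummies can be swept out at once: subtract each entity's mean and each period's mean, then add back the grand mean. The sketch below checks on made-up data that this "two-way within" transformation reproduces the cars-style slope from the regression with entity and period dummies. The data and the helper dd() are hypothetical, and the shortcut is exact only for balanced panels such as OECDGas.

```r
set.seed(7)
# Made-up balanced panel: 6 entities x 8 periods
p   <- expand.grid(id = factor(1:6), period = factor(1:8))
ent <- as.integer(p$id)
per <- as.integer(p$period)
a <- rnorm(6)        # hypothetical entity effects
d <- 0.3 * (1:8)     # hypothetical common time effects (a rising trend)
p$x <- 0.5 * per + a[ent] + rnorm(nrow(p))
p$y <- -0.8 * p$x + a[ent] + d[per] + rnorm(nrow(p), sd = 0.2)

# (1) Two-way LSDV: entity dummies plus period dummies
twoway_lsdv <- lm(y ~ x + id + period, data = p)

# (2) Two-way within transformation (balanced panels only):
#     v_it - entity mean - period mean + grand mean
dd <- function(v) v - ave(v, p$id) - ave(v, p$period) + mean(v)
twoway_within <- lm(dd(p$y) ~ dd(p$x) - 1)

c(lsdv = unname(coef(twoway_lsdv)["x"]), within = unname(coef(twoway_within)[1]))
```

This also makes clear what the year dummies do: any movement in cars that is common to all countries in a given year is removed before the slope is estimated.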

gasmodel_fe_time_entity <- lm(gas ~ cars + factor(year) + country, data = OECDGas)
stargazer(gasmodel_fe_time_entity, 
          type = "text",
          title = "OLS with country and year fixed effects")
## 
## OLS with country and year fixed effects
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                 gas            
## -----------------------------------------------
## cars                         -0.579***         
##                               (0.017)          
##                                                
## factor(year)1961              0.050*           
##                               (0.028)          
##                                                
## factor(year)1962              0.058**          
##                               (0.028)          
##                                                
## factor(year)1963             0.085***          
##                               (0.029)          
##                                                
## factor(year)1964             0.135***          
##                               (0.029)          
##                                                
## factor(year)1965             0.154***          
##                               (0.030)          
##                                                
## factor(year)1966             0.200***          
##                               (0.030)          
##                                                
## factor(year)1967             0.234***          
##                               (0.031)          
##                                                
## factor(year)1968             0.268***          
##                               (0.032)          
##                                                
## factor(year)1969             0.288***          
##                               (0.033)          
##                                                
## factor(year)1970             0.328***          
##                               (0.033)          
##                                                
## factor(year)1971             0.358***          
##                               (0.034)          
##                                                
## factor(year)1972             0.392***          
##                               (0.035)          
##                                                
## factor(year)1973             0.439***          
##                               (0.035)          
##                                                
## factor(year)1974             0.373***          
##                               (0.036)          
##                                                
## factor(year)1975             0.407***          
##                               (0.037)          
##                                                
## factor(year)1976             0.418***          
##                               (0.037)          
##                                                
## factor(year)1977             0.438***          
##                               (0.038)          
##                                                
## factor(year)1978             0.460***          
##                               (0.038)          
##                                                
## countryBelgium                -0.008           
##                               (0.027)          
##                                                
## countryCanada                1.250***          
##                               (0.030)          
##                                                
## countryDenmark               0.287***          
##                               (0.027)          
##                                                
## countryFrance                 -0.012           
##                               (0.028)          
##                                                
## countryGermany                 0.035           
##                               (0.028)          
##                                                
## countryGreece                -0.298***         
##                               (0.042)          
##                                                
## countryIreland                0.060**          
##                               (0.027)          
##                                                
## countryItaly                 -0.315***         
##                               (0.027)          
##                                                
## countryJapan                   0.007           
##                               (0.033)          
##                                                
## countryNetherlands             0.042           
##                               (0.027)          
##                                                
## countryNorway                0.101***          
##                               (0.027)          
##                                                
## countrySpain                 -0.609***         
##                               (0.032)          
##                                                
## countrySweden                0.296***          
##                               (0.029)          
##                                                
## countrySwitzerland           0.359***          
##                               (0.028)          
##                                                
## countryTurkey                -0.382***         
##                               (0.066)          
##                                                
## countryUK                    0.102***          
##                               (0.028)          
##                                                
## countryUSA                   1.381***          
##                               (0.032)          
##                                                
## Constant                     -1.338***         
##                               (0.165)          
##                                                
## -----------------------------------------------
## Observations                    342            
## R2                             0.979           
## Adjusted R2                    0.977           
## Residual Std. Error      0.084 (df = 305)      
## F Statistic          400.098*** (df = 36; 305) 
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
# Plot coefficients
coefplot(gasmodel_fe_time_entity, title = "Country and year FE model coefficients", xlab = "Coefficients")

Adding the year dummies raises the number of regressors from 18 to 36 and does not have the intended effect of making the cars coefficient positive. The coefficient does change, but it becomes more negative (from -0.352 to -0.579). The year effects rise steadily, from 0.050 in 1961 to 0.460 in 1978, so they absorb a common upward trend that cars per capita likely shares; this overlap shows up as a larger standard error on cars (0.017 versus 0.011). The size of the change in the cars coefficient and the large number of variables suggest that multicollinearity may be an issue here. Of the three models, the country fixed effects model is likely the best one, as it contains the fixed effects and has a lower risk of multicollinearity.
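Multicollinearity by itself inflates standard errors rather than biasing coefficients, so one way to probe the claim above is to measure how much of the variation in cars the dummies already explain. The variance inflation factor is 1 / (1 - R²) from an auxiliary regression of cars on the other right-hand-side terms. A nested F-test (anova() on two lm() fits) then shows whether the year dummies matter jointly. The sketch below runs both on made-up data in which x trends upward over time; for the real models one would swap in OECDGas, cars, and factor(year).

```r
set.seed(3)
# Made-up balanced panel: 18 "countries" x 19 "years"; x differs by country
# and trends upward over time, loosely like a stock of cars
sim <- expand.grid(country = factor(1:18), year = factor(1:19))
cid <- as.integer(sim$country)
yid <- as.integer(sim$year)
sim$x <- rnorm(18)[cid] + 0.15 * yid + rnorm(nrow(sim), sd = 0.2)
sim$y <- 0.3 * sim$x + 0.02 * yid + rnorm(nrow(sim), sd = 0.1)

# VIF for x: 1 / (1 - R^2) from regressing x on the dummy sets
aux_r2 <- function(f) summary(lm(f, data = sim))$r.squared
vif_country <- 1 / (1 - aux_r2(x ~ country))
vif_twoway  <- 1 / (1 - aux_r2(x ~ country + year))
round(c(country_only = vif_country, country_and_year = vif_twoway), 1)

# Nested F-test: are all year effects jointly zero?
m_country <- lm(y ~ x + country, data = sim)
m_twoway  <- lm(y ~ x + country + year, data = sim)
anova(m_country, m_twoway)

# Compare the standard error on x with and without the year dummies
se_x <- function(m) summary(m)$coefficients["x", "Std. Error"]
c(country_only = se_x(m_country), country_and_year = se_x(m_twoway))
```

A large jump in the VIF when the year dummies are added means little independent variation in x remains once common trends are removed, while the F-test indicates whether dropping those dummies is supported by the data; both are useful when choosing between the two fixed effects specifications.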