Regression Discussion 7

Author

Langley

Setup

# Clear the workspace
rm(list = ls()) # Clear environment-remove all files from your workspace
gc()            # Clear unused memory
          used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells  599859 32.1    1363758 72.9         NA   700248 37.4
Vcells 1104475  8.5    8388608 64.0      18432  1963387 15.0
cat("\f")       # Clear the console
graphics.off()  # Clear all graphs


# Prepare needed libraries
packages <- c("AER",          # load dataset
              "plm",          # for FE
              "stargazer",    # summary stats for sharing
              "ggplot2",      #for graphing,
              "psych",
              "naniar",      # for visualisation of missing data,
              "visdat",      # for visualisation of missing data,
              "VIM",         # for visualisation of missing data,
              "DataExplorer", # for visualisation of missing data,
              "dplyr",
              "magrittr"
              )

for (i in 1:length(packages)) {
  if (!packages[i] %in% rownames(installed.packages())) {
    install.packages(packages[i]
                     , repos = "http://cran.rstudio.com/"
                     , dependencies = TRUE
                     )
  }
  library(packages[i], character.only = TRUE)
}
Loading required package: car
Loading required package: carData
Loading required package: lmtest
Loading required package: zoo

Attaching package: 'zoo'
The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
Loading required package: sandwich
Loading required package: survival

Please cite as: 
 Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
 R package version 5.2.3. https://CRAN.R-project.org/package=stargazer 

Attaching package: 'psych'
The following objects are masked from 'package:ggplot2':

    %+%, alpha
The following object is masked from 'package:car':

    logit
Loading required package: colorspace
Loading required package: grid
VIM is ready to use.
Suggestions and bug-reports can be submitted at: https://github.com/statistikat/VIM/issues

Attaching package: 'VIM'
The following object is masked from 'package:datasets':

    sleep

Attaching package: 'dplyr'
The following objects are masked from 'package:plm':

    between, lag, lead
The following object is masked from 'package:car':

    recode
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
rm(packages)

Data Loading

#install.packages("AER")
library(AER)
data("NaturalGas")


head(NaturalGas)
  state statecode year consumption price eprice oprice lprice heating   income
1    NY        35 1967      313656  1.42   2.98   7.40   1.47    6262 10903.75
2    NY        35 1968      319282  1.38   2.91   7.77   1.42    6125 11370.02
3    NY        35 1969      331326  1.37   2.84   7.96   1.38    6040 11578.68
4    NY        35 1970      346533  1.40   2.87   8.33   1.37    6085 11586.77
5    NY        35 1971      352085  1.50   3.07   8.80   1.40    5907 11657.42
6    NY        35 1972      363412  1.62   3.26   8.85   1.50    6248 11860.80
naturalgas <- as.data.frame(NaturalGas)

Summary Stats

stargazer(naturalgas, type= "text")

============================================================
Statistic    N     Mean      St. Dev.      Min       Max    
------------------------------------------------------------
consumption 138 252,901.500 184,478.100   9,430    637,289  
price       138    3.422       2.169      0.680     8.060   
eprice      138    5.054       2.578      1.980     10.860  
oprice      138   24.636      15.401      5.010     51.730  
lprice      138    3.208       2.125      0.680     7.870   
heating     138  4,154.529   2,451.998     481      7,440   
income      138 11,193.240   1,906.726  7,465.340 16,425.330
------------------------------------------------------------

Balanced Dataset- 138 observations for each of the 10 variables. 3 are factor variables.

str(naturalgas)
'data.frame':   138 obs. of  10 variables:
 $ state      : Factor w/ 6 levels "CA","FL","MI",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ statecode  : Factor w/ 6 levels "5","10","23",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ year       : Factor w/ 23 levels "1967","1968",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ consumption: int  313656 319282 331326 346533 352085 363412 342608 341032 327384 339949 ...
 $ price      : num  1.42 1.38 1.37 1.4 1.5 1.62 1.74 2 2.54 2.87 ...
 $ eprice     : num  2.98 2.91 2.84 2.87 3.07 3.26 3.51 4.66 5.13 5.37 ...
 $ oprice     : num  7.4 7.77 7.96 8.33 8.8 ...
 $ lprice     : num  1.47 1.42 1.38 1.37 1.4 1.5 1.62 1.74 2 2.54 ...
 $ heating    : int  6262 6125 6040 6085 5907 6248 5450 5858 5583 6238 ...
 $ income     : num  10904 11370 11579 11587 11657 ...

Factor Variables include: State, Statecode, and Year.

State: State Abbreviation

Year: factor Coding Year

Consumption: Consumption of natural gas by residential sector

Price: Price of Natural Gas

Eprice: Price of Electricity

Oprice: Price of distillate fuel oil

Lprice: Price of liquified petroleum gas

Heating: Heating degree days

Income: Real per-capita personal income

OLS

Does Income, and pricing effect consumption?

\[ Consumption_i = \beta_0 + \beta_1Income_i+\beta_2Price_i+\beta_3Eprice_i+ \epsilon_i \]

model1 <- lm(consumption ~ price + eprice + income, data = naturalgas)

stargazer(model1, type = "text")

===============================================
                        Dependent variable:    
                    ---------------------------
                            consumption        
-----------------------------------------------
price                     -100,083.800***      
                           (10,761.060)        
                                               
eprice                     28,924.930***       
                            (9,671.677)        
                                               
income                      101.267***         
                              (6.140)          
                                               
Constant                  -684,263.200***      
                           (55,694.640)        
                                               
-----------------------------------------------
Observations                    138            
R2                             0.732           
Adjusted R2                    0.726           
Residual Std. Error    96,541.900 (df = 134)   
F Statistic          122.080*** (df = 3; 134)  
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01

The estimated coefficients here state that of the three predictor varibales, price, eprice and income, eprice and price are the most significant while income is slightly less. The magnitude of the price coefficient, meaning if the price of natural gas increases by 1 dollar, the consumption by residential sector will decrease by 100 083 units, holding all other variables constant.. The Electricity price estimator has a coefficient of 28 924, which predicts with every 1 dollar increase in electricity price consumption will increase in the residential sector by 28 924 units, holding all other variables constant. This is unexpected - we would expect when price increases the consumption would decrease. Lastly, income coefficient is 101.267, therefore a 1 dollar increase in salary is estimated to increase consumption roughly by 101 units, holding all other variables constant.

Factors that could influence consumption that are not included in this OLS equation:

Time Invariant

  • Weather conditions: Some states having higher number of heating degree days/cooling degree days

  • Population Density: Higher density areas might have different consumption patterns due to differences in housing types

  • Age Distribution: Different age groups may have varying energy needs and consumption habits.

  • Household Types: More or less people in a housing unit could contribute to natural gas consumption

Time Variant

  • Economic Conditions: Factors such as national unemployment rates or economic growth affect all states and vary over time.

  • National Energy Policies: Changes in federal regulations or incentives that impact energy consumption across all states would be time-varying.

    Fixed Effect Model 1

\[ Consumption_{it} = \beta_0 + \beta_1Income_{it}+\beta_2Price_{it}+\beta_3Eprice_{it} + \beta_4 FL_i + \\\beta_5MI_i+ \beta_6NY_i + \beta_7TX_i+ \beta_8UT_i + \epsilon_{it} \]

fe_model1 <- lm(consumption ~ price + eprice + income + factor(state), data = naturalgas)
  
stargazer(fe_model1, type = "text")

===============================================
                        Dependent variable:    
                    ---------------------------
                            consumption        
-----------------------------------------------
price                       -1,305.957         
                            (5,356.337)        
                                               
eprice                      -3,193.104         
                            (4,683.359)        
                                               
income                         4.014           
                              (3.565)          
                                               
factor(state)FL           -521,146.200***      
                           (13,769.690)        
                                               
factor(state)MI           -199,813.300***      
                            (9,102.106)        
                                               
factor(state)NY           -201,307.200***      
                            (8,675.776)        
                                               
factor(state)TX           -307,050.900***      
                           (11,383.360)        
                                               
factor(state)UT           -480,960.500***      
                           (15,788.260)        
                                               
Constant                  513,625.000***       
                           (38,911.210)        
                                               
-----------------------------------------------
Observations                    138            
R2                             0.982           
Adjusted R2                    0.981           
Residual Std. Error    25,642.080 (df = 129)   
F Statistic          870.243*** (df = 8; 129)  
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01

The coefficients do end up changing which is expected. This is due to the addition of those omitted state variables, the model may now be capturing different relationships than it would if it only considered overall averages of the states. The eprice magnitude and negative relationship now matches what is expected, as the price of electricity increases by one dollar the consumption now is estimated to decrease. When controlling for these state effects by including them in the model it seems that the relationship that was predicted between the price, eprice and income with the consumption of natural gas all weakened to insignificant.

By keeping price, eprice and income in the model while adding state dummy variables the model now controls for both some time invariant and time variant.

Two Way - State and Time

\[ Consumption_{it} = \beta_0 + \beta_1Income_{it}+\beta_2Price_{it}+\beta_3Eprice_{it} + \beta_4 FL_i + \\\beta_5MI_i+ \beta_6NY_i + \beta_7TX_i+ \beta_8UT_i + \beta_nYear(n)_t + \epsilon_{it} \]

fe_model2 <- lm(consumption ~ price + eprice + income + factor(state) + factor(year), data = naturalgas)
  
stargazer(fe_model2, type = "text")

===============================================
                        Dependent variable:    
                    ---------------------------
                            consumption        
-----------------------------------------------
price                        8,870.303         
                            (7,764.214)        
                                               
eprice                      -2,442.999         
                            (5,551.516)        
                                               
income                        -5.857           
                              (5.049)          
                                               
factor(state)FL           -556,847.500***      
                           (18,433.120)        
                                               
factor(state)MI           -214,932.000***      
                           (10,368.850)        
                                               
factor(state)NY           -216,692.400***      
                           (12,399.400)        
                                               
factor(state)TX           -331,968.000***      
                           (14,626.630)        
                                               
factor(state)UT           -516,342.800***      
                           (21,230.460)        
                                               
factor(year)1968             6,211.188         
                           (14,274.950)        
                                               
factor(year)1969            22,032.600         
                           (14,508.850)        
                                               
factor(year)1970            26,562.100*        
                           (14,557.120)        
                                               
factor(year)1971           42,844.420***       
                           (14,641.250)        
                                               
factor(year)1972           49,615.860***       
                           (15,123.610)        
                                               
factor(year)1973           42,215.460***       
                           (15,667.700)        
                                               
factor(year)1974           33,600.010**        
                           (15,910.840)        
                                               
factor(year)1975           38,987.970**        
                           (16,000.380)        
                                               
factor(year)1976           38,781.380**        
                           (17,078.890)        
                                               
factor(year)1977            20,222.190         
                           (18,594.610)        
                                               
factor(year)1978            29,987.700         
                           (20,195.070)        
                                               
factor(year)1979           49,772.870**        
                           (21,800.360)        
                                               
factor(year)1980            26,787.090         
                           (25,245.850)        
                                               
factor(year)1981            11,608.450         
                           (28,749.950)        
                                               
factor(year)1982            12,655.480         
                           (32,158.010)        
                                               
factor(year)1983            -5,279.623         
                           (36,091.820)        
                                               
factor(year)1984            -4,362.007         
                           (38,895.080)        
                                               
factor(year)1985             3,277.635         
                           (39,656.850)        
                                               
factor(year)1986            -5,381.288         
                           (38,153.780)        
                                               
factor(year)1987             -564.530          
                           (38,139.290)        
                                               
factor(year)1988            10,485.910         
                           (39,244.700)        
                                               
factor(year)1989            20,620.480         
                           (40,870.500)        
                                               
Constant                  586,117.500***       
                           (55,293.780)        
                                               
-----------------------------------------------
Observations                    138            
R2                             0.986           
Adjusted R2                    0.982           
Residual Std. Error    24,510.650 (df = 107)   
F Statistic          255.123*** (df = 30; 107) 
===============================================
Note:               *p<0.1; **p<0.05; ***p<0.01

The coefficients change after implementing the year fixed effects as well. This alters the relationships in the model, the estimate coefficients and the significance similarly to how adding state dummy variable effected the original model.