Setup
# Clear the workspace
rm (list = ls ()) # Clear environment-remove all files from your workspace
gc () # Clear unused memory
used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 599859 32.1 1363758 72.9 NA 700248 37.4
Vcells 1104475 8.5 8388608 64.0 18432 1963387 15.0
cat (" \f " ) # Clear the console
graphics.off () # Clear all graphs
# Prepare needed libraries
packages <- c ("AER" , # load dataset
"plm" , # for FE
"stargazer" , # summary stats for sharing
"ggplot2" , #for graphing,
"psych" ,
"naniar" , # for visualisation of missing data,
"visdat" , # for visualisation of missing data,
"VIM" , # for visualisation of missing data,
"DataExplorer" , # for visualisation of missing data,
"dplyr" ,
"magrittr"
)
for (i in 1 : length (packages)) {
if (! packages[i] %in% rownames (installed.packages ())) {
install.packages (packages[i]
, repos = "http://cran.rstudio.com/"
, dependencies = TRUE
)
}
library (packages[i], character.only = TRUE )
}
Loading required package: car
Loading required package: carData
Loading required package: lmtest
Loading required package: zoo
The following objects are masked from 'package:base':
as.Date, as.Date.numeric
Loading required package: sandwich
Loading required package: survival
Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
Attaching package: 'psych'
The following objects are masked from 'package:ggplot2':
%+%, alpha
The following object is masked from 'package:car':
logit
Loading required package: colorspace
Loading required package: grid
Suggestions and bug-reports can be submitted at: https://github.com/statistikat/VIM/issues
The following object is masked from 'package:datasets':
sleep
Attaching package: 'dplyr'
The following objects are masked from 'package:plm':
between, lag, lead
The following object is masked from 'package:car':
recode
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Data Loading
library (AER)
data ("NaturalGas" )
head (NaturalGas)
state statecode year consumption price eprice oprice lprice heating income
1 NY 35 1967 313656 1.42 2.98 7.40 1.47 6262 10903.75
2 NY 35 1968 319282 1.38 2.91 7.77 1.42 6125 11370.02
3 NY 35 1969 331326 1.37 2.84 7.96 1.38 6040 11578.68
4 NY 35 1970 346533 1.40 2.87 8.33 1.37 6085 11586.77
5 NY 35 1971 352085 1.50 3.07 8.80 1.40 5907 11657.42
6 NY 35 1972 363412 1.62 3.26 8.85 1.50 6248 11860.80
naturalgas <- as.data.frame (NaturalGas)
Summary Stats
stargazer (naturalgas, type= "text" )
============================================================
Statistic N Mean St. Dev. Min Max
------------------------------------------------------------
consumption 138 252,901.500 184,478.100 9,430 637,289
price 138 3.422 2.169 0.680 8.060
eprice 138 5.054 2.578 1.980 10.860
oprice 138 24.636 15.401 5.010 51.730
lprice 138 3.208 2.125 0.680 7.870
heating 138 4,154.529 2,451.998 481 7,440
income 138 11,193.240 1,906.726 7,465.340 16,425.330
------------------------------------------------------------
Balanced Dataset- 138 observations for each of the 10 variables. 3 are factor variables.
'data.frame': 138 obs. of 10 variables:
$ state : Factor w/ 6 levels "CA","FL","MI",..: 4 4 4 4 4 4 4 4 4 4 ...
$ statecode : Factor w/ 6 levels "5","10","23",..: 4 4 4 4 4 4 4 4 4 4 ...
$ year : Factor w/ 23 levels "1967","1968",..: 1 2 3 4 5 6 7 8 9 10 ...
$ consumption: int 313656 319282 331326 346533 352085 363412 342608 341032 327384 339949 ...
$ price : num 1.42 1.38 1.37 1.4 1.5 1.62 1.74 2 2.54 2.87 ...
$ eprice : num 2.98 2.91 2.84 2.87 3.07 3.26 3.51 4.66 5.13 5.37 ...
$ oprice : num 7.4 7.77 7.96 8.33 8.8 ...
$ lprice : num 1.47 1.42 1.38 1.37 1.4 1.5 1.62 1.74 2 2.54 ...
$ heating : int 6262 6125 6040 6085 5907 6248 5450 5858 5583 6238 ...
$ income : num 10904 11370 11579 11587 11657 ...
Factor Variables include: State, Statecode, and Year.
State: State Abbreviation
Year: factor Coding Year
Consumption: Consumption of natural gas by residential sector
Price: Price of Natural Gas
Eprice: Price of Electricity
Oprice: Price of distillate fuel oil
Lprice: Price of liquified petroleum gas
Heating: Heating degree days
Income: Real per-capita personal income
OLS
Does Income, and pricing effect consumption?
\[
Consumption_i = \beta_0 + \beta_1Income_i+\beta_2Price_i+\beta_3Eprice_i+ \epsilon_i
\]
model1 <- lm (consumption ~ price + eprice + income, data = naturalgas)
stargazer (model1, type = "text" )
===============================================
Dependent variable:
---------------------------
consumption
-----------------------------------------------
price -100,083.800***
(10,761.060)
eprice 28,924.930***
(9,671.677)
income 101.267***
(6.140)
Constant -684,263.200***
(55,694.640)
-----------------------------------------------
Observations 138
R2 0.732
Adjusted R2 0.726
Residual Std. Error 96,541.900 (df = 134)
F Statistic 122.080*** (df = 3; 134)
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01
The estimated coefficients here state that of the three predictor varibales, price, eprice and income, eprice and price are the most significant while income is slightly less. The magnitude of the price coefficient, meaning if the price of natural gas increases by 1 dollar, the consumption by residential sector will decrease by 100 083 units, holding all other variables constant.. The Electricity price estimator has a coefficient of 28 924, which predicts with every 1 dollar increase in electricity price consumption will increase in the residential sector by 28 924 units, holding all other variables constant. This is unexpected - we would expect when price increases the consumption would decrease. Lastly, income coefficient is 101.267, therefore a 1 dollar increase in salary is estimated to increase consumption roughly by 101 units, holding all other variables constant.
Factors that could influence consumption that are not included in this OLS equation:
Time Invariant
Weather conditions: Some states having higher number of heating degree days/cooling degree days
Population Density: Higher density areas might have different consumption patterns due to differences in housing types
Age Distribution: Different age groups may have varying energy needs and consumption habits.
Household Types: More or less people in a housing unit could contribute to natural gas consumption
Time Variant
Economic Conditions : Factors such as national unemployment rates or economic growth affect all states and vary over time.
National Energy Policies : Changes in federal regulations or incentives that impact energy consumption across all states would be time-varying.
Fixed Effect Model 1
\[
Consumption_{it} = \beta_0 + \beta_1Income_{it}+\beta_2Price_{it}+\beta_3Eprice_{it} + \beta_4 FL_i + \\\beta_5MI_i+ \beta_6NY_i + \beta_7TX_i+ \beta_8UT_i + \epsilon_{it}
\]
fe_model1 <- lm (consumption ~ price + eprice + income + factor (state), data = naturalgas)
stargazer (fe_model1, type = "text" )
===============================================
Dependent variable:
---------------------------
consumption
-----------------------------------------------
price -1,305.957
(5,356.337)
eprice -3,193.104
(4,683.359)
income 4.014
(3.565)
factor(state)FL -521,146.200***
(13,769.690)
factor(state)MI -199,813.300***
(9,102.106)
factor(state)NY -201,307.200***
(8,675.776)
factor(state)TX -307,050.900***
(11,383.360)
factor(state)UT -480,960.500***
(15,788.260)
Constant 513,625.000***
(38,911.210)
-----------------------------------------------
Observations 138
R2 0.982
Adjusted R2 0.981
Residual Std. Error 25,642.080 (df = 129)
F Statistic 870.243*** (df = 8; 129)
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01
The coefficients do end up changing which is expected. This is due to the addition of those omitted state variables, the model may now be capturing different relationships than it would if it only considered overall averages of the states. The eprice magnitude and negative relationship now matches what is expected, as the price of electricity increases by one dollar the consumption now is estimated to decrease. When controlling for these state effects by including them in the model it seems that the relationship that was predicted between the price, eprice and income with the consumption of natural gas all weakened to insignificant.
By keeping price, eprice and income in the model while adding state dummy variables the model now controls for both some time invariant and time variant.
Two Way - State and Time
\[
Consumption_{it} = \beta_0 + \beta_1Income_{it}+\beta_2Price_{it}+\beta_3Eprice_{it} + \beta_4 FL_i + \\\beta_5MI_i+ \beta_6NY_i + \beta_7TX_i+ \beta_8UT_i + \beta_nYear(n)_t + \epsilon_{it}
\]
fe_model2 <- lm (consumption ~ price + eprice + income + factor (state) + factor (year), data = naturalgas)
stargazer (fe_model2, type = "text" )
===============================================
Dependent variable:
---------------------------
consumption
-----------------------------------------------
price 8,870.303
(7,764.214)
eprice -2,442.999
(5,551.516)
income -5.857
(5.049)
factor(state)FL -556,847.500***
(18,433.120)
factor(state)MI -214,932.000***
(10,368.850)
factor(state)NY -216,692.400***
(12,399.400)
factor(state)TX -331,968.000***
(14,626.630)
factor(state)UT -516,342.800***
(21,230.460)
factor(year)1968 6,211.188
(14,274.950)
factor(year)1969 22,032.600
(14,508.850)
factor(year)1970 26,562.100*
(14,557.120)
factor(year)1971 42,844.420***
(14,641.250)
factor(year)1972 49,615.860***
(15,123.610)
factor(year)1973 42,215.460***
(15,667.700)
factor(year)1974 33,600.010**
(15,910.840)
factor(year)1975 38,987.970**
(16,000.380)
factor(year)1976 38,781.380**
(17,078.890)
factor(year)1977 20,222.190
(18,594.610)
factor(year)1978 29,987.700
(20,195.070)
factor(year)1979 49,772.870**
(21,800.360)
factor(year)1980 26,787.090
(25,245.850)
factor(year)1981 11,608.450
(28,749.950)
factor(year)1982 12,655.480
(32,158.010)
factor(year)1983 -5,279.623
(36,091.820)
factor(year)1984 -4,362.007
(38,895.080)
factor(year)1985 3,277.635
(39,656.850)
factor(year)1986 -5,381.288
(38,153.780)
factor(year)1987 -564.530
(38,139.290)
factor(year)1988 10,485.910
(39,244.700)
factor(year)1989 20,620.480
(40,870.500)
Constant 586,117.500***
(55,293.780)
-----------------------------------------------
Observations 138
R2 0.986
Adjusted R2 0.982
Residual Std. Error 24,510.650 (df = 107)
F Statistic 255.123*** (df = 30; 107)
===============================================
Note: *p<0.1; **p<0.05; ***p<0.01
The coefficients change after implementing the year fixed effects as well. This alters the relationships in the model, the estimate coefficients and the significance similarly to how adding state dummy variable effected the original model.