library(AER)
## Warning: package 'AER' was built under R version 4.2.3
## Loading required package: car
## Loading required package: carData
## Loading required package: lmtest
## Warning: package 'lmtest' was built under R version 4.2.3
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 4.2.3
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
## Warning: package 'sandwich' was built under R version 4.2.3
## Loading required package: survival
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(plm)
## Warning: package 'plm' was built under R version 4.2.3
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.2.3
library(psych)
## Warning: package 'psych' was built under R version 4.2.3
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
## The following object is masked from 'package:car':
##
## logit
library(naniar)
## Warning: package 'naniar' was built under R version 4.2.3
library(visdat)
## Warning: package 'visdat' was built under R version 4.2.3
library(VIM)
## Warning: package 'VIM' was built under R version 4.2.3
## Loading required package: colorspace
## Loading required package: grid
## VIM is ready to use.
## Suggestions and bug-reports can be submitted at: https://github.com/statistikat/VIM/issues
##
## Attaching package: 'VIM'
## The following object is masked from 'package:datasets':
##
## sleep
library(Amelia)
## Warning: package 'Amelia' was built under R version 4.2.3
## Loading required package: Rcpp
## ##
## ## Amelia II: Multiple Imputation
## ## (Version 1.8.1, built: 2022-11-18)
## ## Copyright (C) 2005-2023 James Honaker, Gary King and Matthew Blackwell
## ## Refer to http://gking.harvard.edu/amelia/ for more information
## ##
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plm':
##
## between, lag, lead
## The following object is masked from 'package:car':
##
## recode
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(magrittr)
In this discussion, I am using the “CigarettesSW” data from the AER package. This panel data set records cigarette consumption for the 48 continental US states from 1985-1995. This data frame consists of 96 observations of 9 variables.
data("CigarettesSW")
stargazer(CigarettesSW, type = "text", title = "Panel Data Sumamry Stats (CigarettesSW)")
##
## Panel Data Sumamry Stats (CigarettesSW)
## ==================================================================
## Statistic N Mean St. Dev. Min Max
## ------------------------------------------------------------------
## cpi 96 1.300 0.225 1.076 1.524
## population 96 5,168,866.000 5,442,345.000 478,447 31,493,524
## packs 96 109.182 25.871 49.272 197.994
## income 96 99,878,736.000 120,541,138.000 6,887,097 771,470,144
## tax 96 42.684 16.138 18.000 99.000
## price 96 143.448 43.887 84.968 240.850
## taxs 96 48.326 19.332 21.268 112.633
## ------------------------------------------------------------------
missmap(CigarettesSW)
freq <- table(CigarettesSW$state)
print(freq)
##
## AL AR AZ CA CO CT DE FL GA IA ID IL IN KS KY LA MA MD ME MI MN MO MS MT NC ND
## 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## NE NH NJ NM NV NY OH OK OR PA RI SC SD TN TX UT VA VT WA WI WV WY
## 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
Based on the output given above, from the frequency table, we can see that each state (entities) was observed in the two years (1985 and 1995). We can also see from the “Missingness Map” that there are not any missing data points. This allows us to identify that the data frame is balanced.
For a traditional OLS model I have decided to use packs (the number of packs of cigarettes per capita) as the dependent variable, while the independent variable will be price (average price of cigarettes during fiscal year, including sales tax).
\[ packs = \beta_0 + \beta_1 price + \epsilon \]
OLS_1 <- lm(packs ~ price, data = CigarettesSW)
stargazer(OLS_1, type = "text", title = "OLS Model Sumamry Stats")
##
## OLS Model Sumamry Stats
## ===============================================
## Dependent variable:
## ---------------------------
## packs
## -----------------------------------------------
## price -0.385***
## (0.046)
##
## Constant 164.356***
## (6.909)
##
## -----------------------------------------------
## Observations 96
## R2 0.426
## Adjusted R2 0.420
## Residual Std. Error 19.710 (df = 94)
## F Statistic 69.684*** (df = 1; 94)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
Based on the OLS model preformed above, when the average price of cigarettes (including sales tax) increases by $1, we interpret with statistical significance at p<0.01, that the number of packs per capita would decrease by 0.385. Intuitively this makes sense, as an increase in price shouldn’t decrease the demand for an item such as Cigarettes (addictive), however, the quantity demanded should decrease, as people may not be able to afford as many cigarettes.
Even though our results from the previous OLS model seem to be intuitive in the direction associated with the estimated coefficient, it appears that we may run into some issues with OVB, as the magnitude may vary for the effects on increases of price, since each state has their own socio-economic backgrounds. Additionally, differeing states have different viewpoints socio-politically about the tobacco industry.
Income could cause people to consume more or less cigarettes
taxes are state dependent and could cause more of an impact on consumption depending on the state’s views on tobacco (higher or lower tax)
\[ packs = \beta_0 + \beta_1 price_it + \delta_2 state 2_i + \delta_3 state 3_i + ... + \delta_n state N_i + \epsilon \]
OLS_FE <- lm(packs ~ price + state, data = CigarettesSW)
stargazer( OLS_FE, type = "text", title = "SUMMARY STATS FIXED EFFECTS MODEL")
##
## SUMMARY STATS FIXED EFFECTS MODEL
## ===============================================
## Dependent variable:
## ---------------------------
## packs
## -----------------------------------------------
## price -0.330***
## (0.015)
##
## stateAR 13.723**
## (6.115)
##
## stateAZ -12.842**
## (6.124)
##
## stateCA -20.626***
## (6.130)
##
## stateCO -10.837*
## (6.114)
##
## stateCT -0.242
## (6.149)
##
## stateDE 26.619***
## (6.114)
##
## stateFL 5.857
## (6.122)
##
## stateGA 2.420
## (6.114)
##
## stateIA -0.395
## (6.119)
##
## stateID -16.220**
## (6.116)
##
## stateIL 1.451
## (6.122)
##
## stateIN 25.536***
## (6.114)
##
## stateKS -3.825
## (6.115)
##
## stateKY 66.000***
## (6.117)
##
## stateLA 10.164
## (6.115)
##
## stateMA -1.276
## (6.136)
##
## stateMD -6.677
## (6.115)
##
## stateME 13.732**
## (6.123)
##
## stateMI 9.990
## (6.148)
##
## stateMN 1.274
## (6.139)
##
## stateMO 16.968***
## (6.114)
##
## stateMS 4.834
## (6.115)
##
## stateMT -13.911**
## (6.114)
##
## stateNC 25.397***
## (6.117)
##
## stateND -9.788
## (6.121)
##
## stateNE -7.126
## (6.117)
##
## stateNH 68.642***
## (6.114)
##
## stateNJ -1.591
## (6.127)
##
## stateNM -29.045***
## (6.115)
##
## stateNV 18.908***
## (6.131)
##
## stateNY -3.265
## (6.138)
##
## stateOH 11.656*
## (6.114)
##
## stateOK 10.950*
## (6.114)
##
## stateOR 1.444
## (6.117)
##
## statePA 1.966
## (6.117)
##
## stateRI 14.620**
## (6.134)
##
## stateSC 6.036
## (6.115)
##
## stateSD -6.126
## (6.114)
##
## stateTN 18.689***
## (6.114)
##
## stateTX -7.255
## (6.123)
##
## stateUT -45.068***
## (6.118)
##
## stateVA 10.536*
## (6.114)
##
## stateVT 27.678***
## (6.115)
##
## stateWA -10.059
## (6.169)
##
## stateWI 0.542
## (6.128)
##
## stateWV 7.880
## (6.115)
##
## stateWY 10.621*
## (6.114)
##
## Constant 151.837***
## (4.759)
##
## -----------------------------------------------
## Observations 96
## R2 0.972
## Adjusted R2 0.944
## Residual Std. Error 6.114 (df = 47)
## F Statistic 34.462*** (df = 48; 47)
## ===============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
FE_state <- plm(formula = packs ~ price + state,
data = CigarettesSW,
index = c("year"), # declaring data to be panel
model = "within" # fixed effects mdodel
)
stargazer(FE_state, type = "text" )
##
## ========================================
## Dependent variable:
## ---------------------------
## packs
## ----------------------------------------
## price -0.468***
## (0.065)
##
## stateAR 14.853**
## (5.907)
##
## stateAZ -9.644
## (6.072)
##
## stateCA -16.663***
## (6.171)
##
## stateCO -10.764*
## (5.883)
##
## stateCT 5.639
## (6.501)
##
## stateDE 27.136***
## (5.888)
##
## stateFL 8.768
## (6.040)
##
## stateGA 1.943
## (5.887)
##
## stateIA 1.812
## (5.974)
##
## stateID -14.712**
## (5.925)
##
## stateIL 4.356
## (6.039)
##
## stateIN 24.861***
## (5.891)
##
## stateKS -2.894
## (5.899)
##
## stateKY 64.109***
## (5.950)
##
## stateLA 11.237*
## (5.904)
##
## stateMA 3.440
## (6.287)
##
## stateMD -5.480
## (5.910)
##
## stateME 16.731***
## (6.050)
##
## stateMI 15.834**
## (6.494)
##
## stateMN 6.311
## (6.342)
##
## stateMO 16.695***
## (5.884)
##
## stateMS 5.792
## (5.900)
##
## stateMT -14.257**
## (5.885)
##
## stateNC 23.642***
## (5.940)
##
## stateND -7.147
## (6.013)
##
## stateNE -5.327
## (5.943)
##
## stateNH 68.751***
## (5.883)
##
## stateNJ 2.041
## (6.126)
##
## stateNM -27.785***
## (5.913)
##
## stateNV 23.039***
## (6.196)
##
## stateNY 1.626
## (6.317)
##
## stateOH 12.051**
## (5.886)
##
## stateOK 11.708*
## (5.894)
##
## stateOR 3.280
## (5.946)
##
## statePA 3.659
## (5.936)
##
## stateRI 19.067***
## (6.244)
##
## stateSC 4.864
## (5.909)
##
## stateSD -5.813
## (5.885)
##
## stateTN 19.269***
## (5.889)
##
## stateTX -4.166
## (6.060)
##
## stateUT -42.968***
## (5.965)
##
## stateVA 10.380*
## (5.883)
##
## stateVT 28.780***
## (5.906)
##
## stateWA -2.651
## (6.838)
##
## stateWI 4.343
## (6.149)
##
## stateWV 8.900
## (5.902)
##
## stateWY 10.035*
## (5.889)
##
## ----------------------------------------
## Observations 96
## R2 0.967
## Adjusted R2 0.931
## F Statistic 27.771*** (df = 48; 46)
## ========================================
## Note: *p<0.1; **p<0.05; ***p<0.01
FE_time <- plm(formula = packs ~ price + year,
data = CigarettesSW,
index = c("state"), # declaring data to be panel
model = "within" # fixed effects mdodel
)
stargazer(OLS_1, OLS_FE, FE_state, FE_time, type = "text")
##
## ==================================================================================================================
## Dependent variable:
## ----------------------------------------------------------------------------------------------
## packs
## OLS panel
## linear
## (1) (2) (3) (4)
## ------------------------------------------------------------------------------------------------------------------
## price -0.385*** -0.330*** -0.468*** -0.468***
## (0.046) (0.015) (0.065) (0.065)
##
## stateAR 13.723** 14.853**
## (6.115) (5.907)
##
## stateAZ -12.842** -9.644
## (6.124) (6.072)
##
## stateCA -20.626*** -16.663***
## (6.130) (6.171)
##
## stateCO -10.837* -10.764*
## (6.114) (5.883)
##
## stateCT -0.242 5.639
## (6.149) (6.501)
##
## stateDE 26.619*** 27.136***
## (6.114) (5.888)
##
## stateFL 5.857 8.768
## (6.122) (6.040)
##
## stateGA 2.420 1.943
## (6.114) (5.887)
##
## stateIA -0.395 1.812
## (6.119) (5.974)
##
## stateID -16.220** -14.712**
## (6.116) (5.925)
##
## stateIL 1.451 4.356
## (6.122) (6.039)
##
## stateIN 25.536*** 24.861***
## (6.114) (5.891)
##
## stateKS -3.825 -2.894
## (6.115) (5.899)
##
## stateKY 66.000*** 64.109***
## (6.117) (5.950)
##
## stateLA 10.164 11.237*
## (6.115) (5.904)
##
## stateMA -1.276 3.440
## (6.136) (6.287)
##
## stateMD -6.677 -5.480
## (6.115) (5.910)
##
## stateME 13.732** 16.731***
## (6.123) (6.050)
##
## stateMI 9.990 15.834**
## (6.148) (6.494)
##
## stateMN 1.274 6.311
## (6.139) (6.342)
##
## stateMO 16.968*** 16.695***
## (6.114) (5.884)
##
## stateMS 4.834 5.792
## (6.115) (5.900)
##
## stateMT -13.911** -14.257**
## (6.114) (5.885)
##
## stateNC 25.397*** 23.642***
## (6.117) (5.940)
##
## stateND -9.788 -7.147
## (6.121) (6.013)
##
## stateNE -7.126 -5.327
## (6.117) (5.943)
##
## stateNH 68.642*** 68.751***
## (6.114) (5.883)
##
## stateNJ -1.591 2.041
## (6.127) (6.126)
##
## stateNM -29.045*** -27.785***
## (6.115) (5.913)
##
## stateNV 18.908*** 23.039***
## (6.131) (6.196)
##
## stateNY -3.265 1.626
## (6.138) (6.317)
##
## stateOH 11.656* 12.051**
## (6.114) (5.886)
##
## stateOK 10.950* 11.708*
## (6.114) (5.894)
##
## stateOR 1.444 3.280
## (6.117) (5.946)
##
## statePA 1.966 3.659
## (6.117) (5.936)
##
## stateRI 14.620** 19.067***
## (6.134) (6.244)
##
## stateSC 6.036 4.864
## (6.115) (5.909)
##
## stateSD -6.126 -5.813
## (6.114) (5.885)
##
## stateTN 18.689*** 19.269***
## (6.114) (5.889)
##
## stateTX -7.255 -4.166
## (6.123) (6.060)
##
## stateUT -45.068*** -42.968***
## (6.118) (5.965)
##
## stateVA 10.536* 10.380*
## (6.114) (5.883)
##
## stateVT 27.678*** 28.780***
## (6.115) (5.906)
##
## stateWA -10.059 -2.651
## (6.169) (6.838)
##
## stateWI 0.542 4.343
## (6.128) (6.149)
##
## stateWV 7.880 8.900
## (6.115) (5.902)
##
## stateWY 10.621* 10.035*
## (6.114) (5.889)
##
## year1995 11.515**
## (5.276)
##
## Constant 164.356*** 151.837***
## (6.909) (4.759)
##
## ------------------------------------------------------------------------------------------------------------------
## Observations 96 96 96 96
## R2 0.426 0.972 0.967 0.917
## Adjusted R2 0.420 0.944 0.931 0.829
## Residual Std. Error 19.710 (df = 94) 6.114 (df = 47)
## F Statistic 69.684*** (df = 1; 94) 34.462*** (df = 48; 47) 27.771*** (df = 48; 46) 255.432*** (df = 2; 46)
## ==================================================================================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
As we can see from the table above, we observe that there is a higher Adjusted R^2 component associated with the OLS w/ FE model, this indicates that 94.4% of the variance is explained by the OLS w/FE model as opposed to only 42% of the variance being explained by the normal OLS model, 93.1% of the variance being explained by the FE_state model, and 82.9% of the variance explained by the FE_time model . Additionally, we can observe that the intuition associated with an increase in price on the the effects of pack consumption still makes sense, as we observe a negative value. This value for the FE model is -.330 as opposed to -.385 in the regular OLS model, however, we can see that the state observed plays a major role in influencing the number of packs per capita. We can see from the table that Utah has a statistically significant beta coefficient at p<0.01 and is the greatest magnitude in the negative direction (OLS w/FE model). This value is -45.068. Therefore, if someone is a resident of California, then their pack consumption decreases by 45.068 packs. On the other hand, we can observe that the statistically significant (p<0.01) greatest positive magnitude associated with a state of residence is 68.642 in New Hampshire (OLS w/FE model). Therefore, being a resident of New Hampshire is likely to increases pack consumption by 68.642 packs. These coefficients show us that there are socio-economics factors that play a large role in whether people even desire consuming cigarettes in the first place.
The Fixed Effect of State is controlling the time-invariant characteristics of pack consumption, as the variable state is a product of the state characteristics that do not change over time. The Fixed effects of Year from model 4 is a time-variant characteristic of pack consumption, as the variable can help to explain differences in cultural attitude towards cigarette consumption, and it can also explain difference in pack consumption attributed to legislative changes or technologies/alternatives to tobacco that come to the market. Due to the fact that the year based model includes a positive effect on pack consumption for 1995, we can most likely observe that there were no popular alternatives at the time that would affect cigarette consumption such as nicotine vapes or pouches nowadays.
We can see that calculating the fixed effect of state in model 3 (stargazer table) outputs a coefficient estimate of -0.468 for price, while the fixed effect of year in model 4 (stargazer table) also outputs a coefficient estimate of -0.468 for price. Therefore, specifying the Fixed Effect as time or individual dependent will allow for the same interpretations of the non-fixed effect variables (in the case of the plm models).