Discussion 4: Panel Data

1. Panel Data Selection

I chose the panel data set “CigarettesSW” (from the AER package), which records cigarette consumption in 48 U.S. states in the years 1985 and 1995. The variables are:

state: Factor indicating state
year: Factor indicating year
cpi: Consumer price index
population: State population
packs: Number of packs per capita
income: State personal income (total, nominal)
tax: Average state, federal and average local excise taxes for fiscal year
price: Average price during fiscal year, including sales tax
taxs: Average excise taxes for fiscal year, including sales tax

The panel appears balanced: each of the 48 states is observed in both years (48 × 2 = 96 rows), and no variable has missing entries. The time component is year (1985, 1995) and the entity component is the U.S. state.

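library(AER); library(dplyr); library(ggplot2); library(stargazer); library(fixest)
data("CigarettesSW")  # shipped with the AER package
glimpse(CigarettesSW)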

## Rows: 96
## Columns: 9
## $ state      <fct> AL, AR, AZ, CA, CO, CT, DE, FL, GA, IA, ID, IL, IN, KS, KY,…
## $ year       <fct> 1985, 1985, 1985, 1985, 1985, 1985, 1985, 1985, 1985, 1985,…
## $ cpi        <dbl> 1.076, 1.076, 1.076, 1.076, 1.076, 1.076, 1.076, 1.076, 1.0…
## $ population <dbl> 3973000, 2327000, 3184000, 26444000, 3209000, 3201000, 6180…
## $ packs      <dbl> 116.4863, 128.5346, 104.5226, 100.3630, 112.9635, 109.2784,…
## $ income     <dbl> 46014968, 26210736, 43956936, 447102816, 49466672, 60063368…
## $ tax        <dbl> 32.50000, 37.00000, 31.00000, 26.00000, 31.00000, 42.00000,…
## $ price      <dbl> 102.18167, 101.47500, 108.57875, 107.83734, 94.26666, 128.0…
## $ taxs       <dbl> 33.34834, 37.00000, 36.17042, 32.10400, 31.00000, 51.48333,…

reorder_size <- function(x) {
        factor(x, levels = names(sort(table(x), decreasing = TRUE)))
}
ggplot(data = CigarettesSW, 
       aes(x = reorder_size(year)
           )
       ) +
        geom_bar() +
        xlab("Year") + ylab("Frequency") +
        theme(axis.text.x = element_text(angle = 45)
              )

ggplot(data = CigarettesSW, 
       aes(x = reorder_size(state)
           )
       ) +
        geom_bar() +
        xlab("State") + ylab("Frequency") +
        theme(axis.text.x = element_text(angle = 90)
              )
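
The two bar charts check balance visually (every year should have 48 rows and every state should have 2). The same check can be sketched numerically. The helper name is_balanced and the toy data frame below are my own illustrations, not part of any package; the idea is simply that a balanced panel has exactly one row in every state-by-year cell.

```r
# Toy panel (hypothetical data): 3 states observed in 2 years.
toy <- data.frame(
  state = rep(c("AL", "AR", "AZ"), times = 2),
  year  = rep(c(1985, 1995), each = 3)
)

# A panel is balanced when every entity-by-period cell holds exactly one row.
# `id` and `time` are column names given as strings.
is_balanced <- function(df, id, time) {
  cells <- table(df[[id]], df[[time]])
  all(cells == 1)
}

balanced_toy   <- is_balanced(toy, "state", "year")        # full panel
unbalanced_toy <- is_balanced(toy[-1, ], "state", "year")  # one row dropped

print(c(full = balanced_toy, one_row_dropped = unbalanced_toy))
```

Assuming the data are loaded as above, is_balanced(CigarettesSW, "state", "year") would apply the same check to the real panel.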

2. Estimating Equations

I will test whether the number of packs per capita is influenced by average taxes, prices, and the CPI. Proceeding in the typical fashion, I compare candidate specifications and choose among them based on the \(R_{adj}^2\):

smoking_rate1 <- lm(data = CigarettesSW, packs ~ cpi + tax + price + taxs)
smoking_rate2 <- lm(data = CigarettesSW, packs ~ tax + price + taxs)
smoking_rate3 <- lm(data = CigarettesSW, packs ~ cpi + price + taxs)
smoking_rate4 <- lm(data = CigarettesSW, packs ~ cpi + tax + price)

stargazer(smoking_rate1, smoking_rate2, smoking_rate3, smoking_rate4 , type = "text", title="Results", align=TRUE, style = "aer")

## 
## Results
## ===============================================================================================================
##                                                                packs                                           
##                              (1)                    (2)                    (3)                    (4)          
## ---------------------------------------------------------------------------------------------------------------
## cpi                       104.302***                                    103.926***             87.014***       
##                            (38.881)                                      (38.363)               (31.476)       
##                                                                                                                
## tax                         -0.055                 0.186                                         0.389         
##                            (0.715)                (0.733)                                       (0.413)        
##                                                                                                                
## price                     -1.106***                -0.179               -1.102***              -0.920***       
##                            (0.364)                (0.118)                (0.357)                (0.269)        
##                                                                                                                
## taxs                        0.620                  -0.658                 0.569                                
##                            (0.816)                (0.684)                (0.469)                               
##                                                                                                                
## Constant                  104.584***             158.774***             104.580***             111.471***      
##                            (21.570)               (7.814)                (21.453)               (19.534)       
##                                                                                                                
## Observations                  96                     96                     96                     96          
## R2                          0.488                  0.447                  0.488                  0.485         
## Adjusted R2                 0.465                  0.429                  0.471                  0.468         
## Residual Std. Error    18.918 (df = 91)       19.545 (df = 92)       18.816 (df = 92)       18.875 (df = 92)   
## F Statistic         21.666*** (df = 4; 91) 24.818*** (df = 3; 92) 29.201*** (df = 3; 92) 28.827*** (df = 3; 92)
## ---------------------------------------------------------------------------------------------------------------
## Notes:              ***Significant at the 1 percent level.                                                     
##                     **Significant at the 5 percent level.                                                      
##                     *Significant at the 10 percent level.

smoking_rate3 <- lm(data = CigarettesSW, packs ~ cpi + price + taxs)
smoking_rate5 <- lm(data = CigarettesSW, packs ~ price + taxs)
smoking_rate6 <- lm(data = CigarettesSW, packs ~ cpi + taxs)
smoking_rate7 <- lm(data = CigarettesSW, packs ~ cpi + price)

stargazer(smoking_rate3, smoking_rate5, smoking_rate6, smoking_rate7, type = "text", title="Results", align=TRUE, style = "aer")

## 
## Results
## ===============================================================================================================
##                                                                packs                                           
##                              (1)                    (2)                    (3)                    (4)          
## ---------------------------------------------------------------------------------------------------------------
## cpi                       103.926***                                      -8.332               64.879***       
##                            (38.363)                                      (12.613)               (20.914)       
##                                                                                                                
## price                     -1.102***                -0.183                                      -0.688***       
##                            (0.357)                (0.116)                                       (0.107)        
##                                                                                                                
## taxs                        0.569                 -0.498*               -0.811***                              
##                            (0.469)                (0.264)                (0.147)                               
##                                                                                                                
## Constant                  104.580***             159.463***             159.228***             123.546***      
##                            (21.453)               (7.293)                (12.624)               (14.724)       
##                                                                                                                
## Observations                  96                     96                     96                     96          
## R2                          0.488                  0.447                  0.435                  0.480         
## Adjusted R2                 0.471                  0.435                  0.423                  0.468         
## Residual Std. Error    18.816 (df = 92)       19.446 (df = 93)       19.657 (df = 93)       18.863 (df = 93)   
## F Statistic         29.201*** (df = 3; 92) 37.572*** (df = 2; 93) 35.779*** (df = 2; 93) 42.850*** (df = 2; 93)
## ---------------------------------------------------------------------------------------------------------------
## Notes:              ***Significant at the 1 percent level.                                                     
##                     **Significant at the 5 percent level.                                                      
##                     *Significant at the 10 percent level.

Selecting model #3 from the first table (smoking_rate3), which has the highest \(R_{adj}^2\) of the candidates (0.471), the best estimating equation is:

\[ Packs = \beta_{0} + \beta_1CPI+\;\beta_{2}Price \; +\;\beta_{3}Taxs \]

\[ Packs = 104.58\; +\; 103.93\times CPI\;-1.1\times Price\;+0.57\times Taxs \]
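
As a check on the selection rule, the adjusted R² values printed by stargazer can be reproduced from R², the number of observations n, and the number of regressors k, using adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1). This is only a sketch: the helper name adj_r2 is my own, and the inputs are the rounded R² values from the first table, so the results match only to the printed precision.

```r
# Adjusted R-squared from R-squared, sample size n, and k regressors
# (k excludes the intercept).
adj_r2 <- function(r2, n, k) {
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

# Rounded R-squared values and regressor counts for models (1)-(4)
# of the first stargazer table.
r2 <- c(m1 = 0.488, m2 = 0.447, m3 = 0.488, m4 = 0.485)
k  <- c(m1 = 4,     m2 = 3,     m3 = 3,     m4 = 3)

adj  <- round(adj_r2(r2, n = 96, k = k), 3)
best <- names(which.max(adj))

print(adj)
print(best)
```

Models 1 and 3 share the same R² (0.488), but model 1 spends an extra degree of freedom on tax, which is why its adjusted R² comes out lower and model 3 is preferred.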

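In terms of the coefficients, we would expect all of the variables to be negatively associated with the number of packs per capita. However, in the model, the coefficient on price is negative, but the coefficient on excise tax (taxs) is positive, although it is not statistically significant. This may be an instance of omitted variable bias, so I check how the omitted variable tax correlates with packs and with the included regressors.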

cor(CigarettesSW$tax, CigarettesSW$packs)

## [1] -0.6421176

cor(CigarettesSW$tax, CigarettesSW$price)

## [1] 0.8993727

cor(CigarettesSW$tax, CigarettesSW$taxs)

## [1] 0.985333

cor(CigarettesSW$tax, CigarettesSW$cpi)

## [1] 0.6857145

Since tax is negatively correlated with packs and positively correlated with the other regressors, omitting it should bias the estimated coefficients on those regressors downward (a negative bias). (Perhaps someone could confirm this? I am still somewhat confused on OVB…)
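
One way to sanity-check the direction of the bias is the standard omitted-variable identity for a single included regressor: the coefficient from the short regression equals the coefficient from the long regression plus (effect of the omitted variable) × (slope from regressing the omitted variable on the included one). The sketch below uses simulated data, not CigarettesSW, and every number in it is an assumption chosen for illustration. With several included regressors the slopes become partial coefficients, so the sign is less clear-cut than in this one-regressor case.

```r
set.seed(1)
n     <- 200
price <- rnorm(n, mean = 100, sd = 10)
tax   <- 0.5 * price + rnorm(n, sd = 5)   # tax moves positively with price
packs <- 250 - 0.5 * price - 0.8 * tax + rnorm(n, sd = 5)  # tax lowers packs

long  <- lm(packs ~ price + tax)  # correctly specified model
short <- lm(packs ~ price)        # tax omitted
aux   <- lm(tax ~ price)          # how the omitted variable tracks price

b_long  <- coef(long)[["price"]]
b_tax   <- coef(long)[["tax"]]
delta   <- coef(aux)[["price"]]
b_short <- coef(short)[["price"]]

# OVB identity: b_short = b_long + b_tax * delta (exact in-sample for OLS).
bias <- b_tax * delta

print(c(b_short = b_short, b_long = b_long, bias = bias))
```

Here the omitted variable has a negative effect on packs and a positive relationship with price, so the bias term is negative and the short-regression price coefficient is pushed below the long-regression one, which agrees with the reasoning above in this simplified setting.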

Please let me know if you have any suggestions.

3. Fixed Effects Model

Running the two-way fixed effects model to control for state and year, we have:

smoking_rate3 <- lm(data = CigarettesSW, packs ~ cpi + price + taxs)

smoking_rate_FE <- feols(packs ~ cpi + price + taxs | state + year,
                         data = CigarettesSW)

## The variable 'cpi' has been removed because of collinearity (see $collin.var).

summary(smoking_rate_FE)

## OLS estimation, Dep. Var.: packs
## Observations: 96
## Fixed-effects: state: 48,  year: 2
## Standard-errors: IID 
##        Estimate Std. Error   t value  Pr(>|t|)    
## price -0.579361   0.199082 -2.910161 0.0055981 ** 
## taxs   0.152363   0.256588  0.593803 0.5556174    
## ... 1 variable was removed because of collinearity (cpi)
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 4.05627     Adj. R2: 0.947558
##                 Within R2: 0.53662

stargazer(smoking_rate3,type = "text")

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                packs           
## -----------------------------------------------
## cpi                         103.926***         
##                              (38.363)          
##                                                
## price                        -1.102***         
##                               (0.357)          
##                                                
## taxs                           0.569           
##                               (0.469)          
##                                                
## Constant                    104.580***         
##                              (21.453)          
##                                                
## -----------------------------------------------
## Observations                    96             
## R2                             0.488           
## Adjusted R2                    0.471           
## Residual Std. Error      18.816 (df = 92)      
## F Statistic           29.201*** (df = 3; 92)   
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

Since cpi was dropped for collinearity (it takes a single value per year, so the year fixed effects absorb it completely), out of curiosity I’ll see what the models look like with the variables taxs, tax and price.
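
The reason cpi drops out can be reproduced on a toy panel with base R. This is a sketch under assumptions: the data are simulated, the 1995 cpi value is an illustrative placeholder, and lm() with dummy variables stands in for feols(). A regressor that takes one value per year is an exact linear function of the year dummy, so it carries no information once year effects are in the model.

```r
set.seed(2)
states <- c("AL", "AR", "AZ", "CA")
toy <- data.frame(
  state = factor(rep(states, times = 2)),
  year  = factor(rep(c(1985, 1995), each = length(states)))
)
# cpi varies only by year (1.5 is an illustrative value, not real data).
toy$cpi   <- ifelse(toy$year == "1985", 1.076, 1.5)
toy$price <- runif(nrow(toy), min = 90, max = 200)
toy$packs <- 150 - 0.5 * toy$price + rnorm(nrow(toy), sd = 5)

# With state and year dummies already in the model, cpi is redundant,
# so lm() reports NA for its coefficient.
fit <- lm(packs ~ price + state + year + cpi, data = toy)
cpi_coef <- coef(fit)[["cpi"]]

# Equivalent view: the year dummy explains cpi perfectly.
cpi_resid <- max(abs(resid(lm(cpi ~ year, data = toy))))

print(cpi_coef)
print(cpi_resid)
```

cpi is listed last in the formula on purpose: lm() drops whichever collinear column comes later, so placing cpi before year would drop the year dummy instead.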

smoking_rate8 <- lm(data = CigarettesSW, packs ~ price + taxs + tax)
stargazer(smoking_rate8,type = "text")

## 
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                                packs           
## -----------------------------------------------
## price                         -0.179           
##                               (0.118)          
##                                                
## taxs                          -0.658           
##                               (0.684)          
##                                                
## tax                            0.186           
##                               (0.733)          
##                                                
## Constant                    158.774***         
##                               (7.814)          
##                                                
## -----------------------------------------------
## Observations                    96             
## R2                             0.447           
## Adjusted R2                    0.429           
## Residual Std. Error      19.545 (df = 92)      
## F Statistic           24.818*** (df = 3; 92)   
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

summary(smoking_rate_FE)

## OLS estimation, Dep. Var.: packs
## Observations: 96
## Fixed-effects: state: 48,  year: 2
## Standard-errors: IID 
##        Estimate Std. Error   t value  Pr(>|t|)    
## price -0.579361   0.199082 -2.910161 0.0055981 ** 
## taxs   0.152363   0.256588  0.593803 0.5556174    
## ... 1 variable was removed because of collinearity (cpi)
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 4.05627     Adj. R2: 0.947558
##                 Within R2: 0.53662

smoking_rate_FE2 <- feols(packs ~ price + taxs + tax | state + year,
                          data = CigarettesSW)
summary(smoking_rate_FE2)

## OLS estimation, Dep. Var.: packs
## Observations: 96
## Fixed-effects: state: 48,  year: 2
## Standard-errors: IID 
##        Estimate Std. Error   t value  Pr(>|t|)    
## price -0.581433   0.199297 -2.917419 0.0055395 ** 
## taxs   0.572783   0.510465  1.122081 0.2679141    
## tax   -0.514392   0.539740 -0.953036 0.3457781    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## RMSE: 4.01504     Adj. R2: 0.947451
##                 Within R2: 0.545992

So, holding state and year fixed, price is still negatively associated with the number of cigarette packs per capita, but the two tax measures (taxs and tax) have opposite signs, and neither is statistically significant. The coefficient on price is quite similar in both FE models (-0.579, -0.581). My thought is that these effects could be due to variation in state tax codes and in levels of personal income per person (which were not included in my analysis). As we know, some states don’t have income tax, and others have lower or higher taxes on certain goods. It would therefore make sense that the average price of cigarettes would be the strongest indicator of the number of packs per capita.
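
Because the panel has only two years, it may help to see what "holding state and year fixed" means mechanically: with a balanced two-period panel, the two-way fixed effects slope equals the slope from regressing the 1985-to-1995 change in packs on the change in price. The sketch below uses simulated data and lm() with dummies rather than feols(); all numbers are assumptions for illustration only.

```r
set.seed(3)
n_states <- 10
toy <- data.frame(
  state = factor(rep(seq_len(n_states), times = 2)),
  year  = factor(rep(c(1985, 1995), each = n_states))
)
state_effect <- rep(rnorm(n_states, sd = 20), times = 2)  # fixed per state
year_effect  <- ifelse(toy$year == "1995", -15, 0)        # common 1995 shift
toy$price <- runif(nrow(toy), min = 90, max = 200)
toy$packs <- 150 - 0.6 * toy$price + state_effect + year_effect +
  rnorm(nrow(toy), sd = 3)

# Two-way fixed effects via state and year dummies.
fe <- lm(packs ~ price + state + year, data = toy)

# First differences: rows are ordered by state within each year,
# so subtracting the 1985 half from the 1995 half matches states up.
is_85   <- toy$year == "1985"
d_packs <- toy$packs[!is_85] - toy$packs[is_85]
d_price <- toy$price[!is_85] - toy$price[is_85]
fd <- lm(d_packs ~ d_price)

b_fe <- coef(fe)[["price"]]
b_fd <- coef(fd)[["d_price"]]

print(c(fixed_effects = b_fe, first_difference = b_fd))
```

Read this way, the FE price coefficient describes how within-state changes in price line up with within-state changes in packs, net of the common change between the two years.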


Will Brewster

2026-04-07
