Panel Data

Author

Betty Wang

Quarto

Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see https://quarto.org.

Running Code

When you click the Render button a document will be generated that includes both content and the output of embedded code. You can embed code like this:

1 + 1
[1] 2

You can add options to executable code like this

[1] 4

The echo: false option disables the printing of code (only output is displayed).

# Load the AER package
library(AER)
Loading required package: car
Loading required package: carData
Loading required package: lmtest
Loading required package: zoo

Attaching package: 'zoo'
The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
Loading required package: sandwich
Loading required package: survival
# Load the Grunfeld dataset
data("Grunfeld")

# View the structure of the dataset
str(Grunfeld)
'data.frame':   220 obs. of  5 variables:
 $ invest : num  318 392 411 258 331 ...
 $ value  : num  3078 4662 5387 2792 4313 ...
 $ capital: num  2.8 52.6 156.9 209.2 203.4 ...
 $ firm   : Factor w/ 11 levels "General Motors",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ year   : int  1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 ...
# Check if the data is balanced
table(Grunfeld$firm, Grunfeld$year)
                   
                    1935 1936 1937 1938 1939 1940 1941 1942 1943 1944 1945 1946
  General Motors       1    1    1    1    1    1    1    1    1    1    1    1
  US Steel             1    1    1    1    1    1    1    1    1    1    1    1
  General Electric     1    1    1    1    1    1    1    1    1    1    1    1
  Chrysler             1    1    1    1    1    1    1    1    1    1    1    1
  Atlantic Refining    1    1    1    1    1    1    1    1    1    1    1    1
  IBM                  1    1    1    1    1    1    1    1    1    1    1    1
  Union Oil            1    1    1    1    1    1    1    1    1    1    1    1
  Westinghouse         1    1    1    1    1    1    1    1    1    1    1    1
  Goodyear             1    1    1    1    1    1    1    1    1    1    1    1
  Diamond Match        1    1    1    1    1    1    1    1    1    1    1    1
  American Steel       1    1    1    1    1    1    1    1    1    1    1    1
                   
                    1947 1948 1949 1950 1951 1952 1953 1954
  General Motors       1    1    1    1    1    1    1    1
  US Steel             1    1    1    1    1    1    1    1
  General Electric     1    1    1    1    1    1    1    1
  Chrysler             1    1    1    1    1    1    1    1
  Atlantic Refining    1    1    1    1    1    1    1    1
  IBM                  1    1    1    1    1    1    1    1
  Union Oil            1    1    1    1    1    1    1    1
  Westinghouse         1    1    1    1    1    1    1    1
  Goodyear             1    1    1    1    1    1    1    1
  Diamond Match        1    1    1    1    1    1    1    1
  American Steel       1    1    1    1    1    1    1    1
# Summary of the number of observations for each firm
obs_per_firm <- table(Grunfeld$firm)

# Check if all firms have the same number of observations
obs_per_firm

   General Motors          US Steel  General Electric          Chrysler 
               20                20                20                20 
Atlantic Refining               IBM         Union Oil      Westinghouse 
               20                20                20                20 
         Goodyear     Diamond Match    American Steel 
               20                20                20 

If each firm has the same number of observations over the entire period, then all entities (firms) are observed for all time periods meaning the panel dataset is balanced. Since all firms have exactly 20 observations (one for each year from 1935 to 1954), the dataset is balanced. Time Component: represented by the year variable which ranges from 1935 to 1954. Entity Component: firm variable, which represents 10 U.S. firms (labeled as 1, 2, 3,…, 10). \[ invest_{it} = \beta_0 + \beta_1 value_{it} + \beta_2 capital_{it} + \epsilon_{it} \]

# Estimate the OLS regression model
ols_model <- lm(invest ~ value + capital, data = Grunfeld)

# Summary of the model
summary(ols_model)

Call:
lm(formula = invest ~ value + capital, data = Grunfeld)

Residuals:
    Min      1Q  Median      3Q     Max 
-290.33  -25.76   11.06   29.74  377.94 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -38.410054   8.413371  -4.565 8.35e-06 ***
value         0.114534   0.005519  20.753  < 2e-16 ***
capital       0.227514   0.024228   9.390  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 90.28 on 217 degrees of freedom
Multiple R-squared:  0.8179,    Adjusted R-squared:  0.8162 
F-statistic: 487.3 on 2 and 217 DF,  p-value: < 2.2e-16

Both the coefficients for the value variable and capital variable are statistically significant at the alpha level, indicating that both factors are related to the outcome variable investment. The estimated coefficients, 0.114534 for value and 0.227514 for capital are not big so the economic magnitude of these two factors are relatively small.

There could be omitted variable bias (OVB). If there are relevant factors that affect investment but are not included in the model, estimates of the coefficients will be biased and inconsistent. It is possible that firm-specific characteristics or time-specific effects (e.g., macroeconomic conditions) that influence investment are omitted. \[ invest_{it} = \beta_0 + \beta_1 value_{it} + \beta_2 capital_{it} + \alpha_i + \epsilon_{it} \] Three Different Ways to Estimate the Fixed Effects Model in R: 1. Using the plm package (within estimator): This method explicitly removes the firm-specific effects by transforming the data using the “within” transformation (demeaning). 2. Using the lm() function with dummy variables: This method creates dummy variables for each firm and includes them in the regression to control for firm-specific fixed effects. 3. Manually demean the data (within transformation): This method manually removes the firm-specific means from the data to estimate the fixed effects.

# 1. plm (within estimator)
# Load the 'plm' package for panel data analysis
library(plm)

# Convert the Grunfeld dataset to a panel data frame
grunfeld_panel <- pdata.frame(Grunfeld, index = c("firm", "year"))

# Estimate the fixed effects model using the 'within' method
fe_model_plm <- plm(invest ~ value + capital, data = grunfeld_panel, model = "within")

# Summary of the fixed effects model
summary(fe_model_plm)
Oneway (individual) effect Within Model

Call:
plm(formula = invest ~ value + capital, data = grunfeld_panel, 
    model = "within")

Balanced Panel: n = 11, T = 20, N = 220

Residuals:
      Min.    1st Qu.     Median    3rd Qu.       Max. 
-184.00792  -15.66024    0.27161   16.41421  250.75337 

Coefficients:
        Estimate Std. Error t-value  Pr(>|t|)    
value    0.11013    0.01130  9.7461 < 2.2e-16 ***
capital  0.31003    0.01654 18.7439 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    2244500
Residual Sum of Squares: 523720
R-Squared:      0.76667
Adj. R-Squared: 0.75314
F-statistic: 340.079 on 2 and 207 DF, p-value: < 2.22e-16
# 2. lm() with Dummy Variables for Firms
# Use the 'lm()' function and add dummy variables for firms
fe_model_lm <- lm(invest ~ value + capital + factor(firm), data = Grunfeld)

# Summary of the fixed effects model
summary(fe_model_lm)

Call:
lm(formula = invest ~ value + capital + factor(firm), data = Grunfeld)

Residuals:
     Min       1Q   Median       3Q      Max 
-184.008  -15.660    0.272   16.414  250.753 

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)    
(Intercept)                    -70.29907   47.37535  -1.484    0.139    
value                            0.11013    0.01130   9.746  < 2e-16 ***
capital                          0.31003    0.01654  18.744  < 2e-16 ***
factor(firm)US Steel           172.20381   29.70010   5.798 2.48e-08 ***
factor(firm)General Electric  -165.27033   30.28530  -5.457 1.38e-07 ***
factor(firm)Chrysler            42.48996   41.84991   1.015    0.311    
factor(firm)Atlantic Refining  -44.30345   48.12221  -0.921    0.358    
factor(firm)IBM                 47.13887   44.61445   1.057    0.292    
factor(firm)Union Oil            3.75484   48.19182   0.078    0.938    
factor(firm)Westinghouse        12.75258   41.98604   0.304    0.762    
factor(firm)Goodyear           -16.91548   46.17941  -0.366    0.715    
factor(firm)Diamond Match       63.73104   47.96886   1.329    0.185    
factor(firm)American Steel      49.72087   48.28006   1.030    0.304    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 50.3 on 207 degrees of freedom
Multiple R-squared:  0.9461,    Adjusted R-squared:  0.9429 
F-statistic: 302.6 on 12 and 207 DF,  p-value: < 2.2e-16
# 3. Manually Demean the Data (Within Transformation)
# First, calculate the firm means for each variable
firm_means <- aggregate(cbind(invest, value, capital) ~ firm, data = Grunfeld, mean)

# Merge the firm means with the original dataset
Grunfeld <- merge(Grunfeld, firm_means, by = "firm", suffixes = c("", "_mean"))

# Manually demean the variables by subtracting firm means
Grunfeld$invest_demeaned <- Grunfeld$invest - Grunfeld$invest_mean
Grunfeld$value_demeaned <- Grunfeld$value - Grunfeld$value_mean
Grunfeld$capital_demeaned <- Grunfeld$capital - Grunfeld$capital_mean

# Estimate the fixed effects model using the demeaned data
fe_model_manual <- lm(invest_demeaned ~ value_demeaned + capital_demeaned - 1, data = Grunfeld)

# Summary of the fixed effects model
summary(fe_model_manual)

Call:
lm(formula = invest_demeaned ~ value_demeaned + capital_demeaned - 
    1, data = Grunfeld)

Residuals:
     Min       1Q   Median       3Q      Max 
-184.008  -15.660    0.272   16.414  250.753 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
value_demeaned    0.11013    0.01101   10.00   <2e-16 ***
capital_demeaned  0.31003    0.01612   19.24   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 49.01 on 218 degrees of freedom
Multiple R-squared:  0.7667,    Adjusted R-squared:  0.7645 
F-statistic: 358.2 on 2 and 218 DF,  p-value: < 2.2e-16

The estimated coefficients, 0.11013 for value and 0.31003 for capital, changed for the fixed effect models, and all of the FE models give the same coefficient estimates for the variables value and capital. 1. The plm() function automatically transforms the data by removing the firm-specific means and uses the within estimator to control for fixed effects. 2. The lm() function creates a dummy variable for each firm using the factor(firm) term and estimates fixed effects by including firm dummies. 3. The de-mean method manually removes the firm-specific means from the variables (i.e., it demeans the data) and then runs an OLS regression on the demeaned data. The -1 in the formula removes the intercept since the data has already been demeaned. The fixed effect model run above using the three methods only control for time-invariant characteristics of the entity (such as firm-specific factors). These are characteristics that do not change over time but differ across entities (firms), such as: industry type, geographical location, and other structural or permanent differences. \[ invest_{it} = \beta_0 + \beta_1 value_{it} + \beta_2 capital_{it} + \alpha_i + \lambda_t + \epsilon_{it} \] Now, run another fixed effect models using the three methods again to include both entity fixed effects and time fixed effects. By doing so, we account for both time-invariant characteristics of the entities and time-varying characteristics that affect all entities.

# 1. plm
# Load the 'plm' package if not already loaded
library(plm)

# Convert the Grunfeld dataset to a panel data frame
grunfeld_panel2 <- pdata.frame(Grunfeld, index = c("firm", "year"))

# Estimate the fixed effects model with both entity and time fixed effects
fe_model_plm_time <- plm(invest ~ value + capital + factor(year), data = grunfeld_panel2, model = "within")

# Summary of the fixed effects model
summary(fe_model_plm_time)
Oneway (individual) effect Within Model

Call:
plm(formula = invest ~ value + capital + factor(year), data = grunfeld_panel2, 
    model = "within")

Balanced Panel: n = 11, T = 20, N = 220

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-163.4113  -17.6747   -1.8345   17.9490  217.1070 

Coefficients:
                   Estimate Std. Error t-value  Pr(>|t|)    
value              0.116681   0.012933  9.0219 < 2.2e-16 ***
capital            0.351436   0.021049 16.6964 < 2.2e-16 ***
factor(year)1936 -16.959225  21.517883 -0.7881 0.4316042    
factor(year)1937 -36.375640  22.363809 -1.6265 0.1055097    
factor(year)1938 -35.623722  21.162286 -1.6834 0.0939654 .  
factor(year)1939 -63.099393  21.505219 -2.9341 0.0037613 ** 
factor(year)1940 -39.824767  21.626070 -1.8415 0.0671215 .  
factor(year)1941 -16.487765  21.528944 -0.7658 0.4447309    
factor(year)1942 -17.999327  21.274589 -0.8460 0.3986014    
factor(year)1943 -37.772443  21.414676 -1.7639 0.0793802 .  
factor(year)1944 -38.320061  21.459336 -1.7857 0.0757588 .  
factor(year)1945 -49.539482  21.687469 -2.2842 0.0234746 *  
factor(year)1946 -27.754389  21.865751 -1.2693 0.2059001    
factor(year)1947 -34.877538  21.588537 -1.6156 0.1078668    
factor(year)1948 -38.330726  21.733580 -1.7637 0.0794130 .  
factor(year)1949 -65.200751  21.900872 -2.9771 0.0032931 ** 
factor(year)1950 -67.387721  22.027641 -3.0592 0.0025432 ** 
factor(year)1951 -54.834631  22.436806 -2.4440 0.0154498 *  
factor(year)1952 -56.489038  22.819259 -2.4755 0.0141899 *  
factor(year)1953 -58.512579  23.819421 -2.4565 0.0149371 *  
factor(year)1954 -81.793910  24.204108 -3.3793 0.0008834 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    2244500
Residual Sum of Squares: 459400
R-Squared:      0.79533
Adj. R-Squared: 0.76158
F-statistic: 34.7874 on 21 and 188 DF, p-value: < 2.22e-16
# 2. lm
# Estimate the fixed effects model using lm() with dummies for firms and years
fe_model_lm_time <- lm(invest ~ value + capital + factor(firm) + factor(year), data = Grunfeld)

# Summary of the fixed effects model
summary(fe_model_lm_time)

Call:
lm(formula = invest ~ value + capital + factor(firm) + factor(year), 
    data = Grunfeld)

Residuals:
     Min       1Q   Median       3Q      Max 
-163.411  -17.675   -1.835   17.949  217.107 

Coefficients:
                                Estimate Std. Error t value Pr(>|t|)    
(Intercept)                    -83.68199   52.83502  -1.584 0.114912    
value                            0.11668    0.01293   9.022  < 2e-16 ***
capital                          0.35144    0.02105  16.696  < 2e-16 ***
factor(firm)US Steel           202.31880   33.10138   6.112 5.55e-09 ***
factor(firm)General Electric  -139.31536   33.63785  -4.142 5.21e-05 ***
factor(firm)Chrysler            88.17030   47.63465   1.851 0.065743 .  
factor(firm)Atlantic Refining  -10.73113   54.42347  -0.197 0.843901    
factor(firm)IBM                 95.31235   50.87165   1.874 0.062539 .  
factor(firm)Union Oil           44.97606   54.70390   0.822 0.412020    
factor(firm)Westinghouse        60.05315   47.85534   1.255 0.211077    
factor(firm)Goodyear            23.80679   52.39144   0.454 0.650063    
factor(firm)Diamond Match      118.26245   54.92529   2.153 0.032581 *  
factor(firm)American Steel     101.76963   55.17670   1.844 0.066694 .  
factor(year)1936               -16.95922   21.51788  -0.788 0.431604    
factor(year)1937               -36.37564   22.36381  -1.627 0.105510    
factor(year)1938               -35.62372   21.16229  -1.683 0.093965 .  
factor(year)1939               -63.09939   21.50522  -2.934 0.003761 ** 
factor(year)1940               -39.82477   21.62607  -1.842 0.067121 .  
factor(year)1941               -16.48777   21.52894  -0.766 0.444731    
factor(year)1942               -17.99933   21.27459  -0.846 0.398601    
factor(year)1943               -37.77244   21.41468  -1.764 0.079380 .  
factor(year)1944               -38.32006   21.45934  -1.786 0.075759 .  
factor(year)1945               -49.53948   21.68747  -2.284 0.023475 *  
factor(year)1946               -27.75439   21.86575  -1.269 0.205900    
factor(year)1947               -34.87754   21.58854  -1.616 0.107867    
factor(year)1948               -38.33073   21.73358  -1.764 0.079413 .  
factor(year)1949               -65.20075   21.90087  -2.977 0.003293 ** 
factor(year)1950               -67.38772   22.02764  -3.059 0.002543 ** 
factor(year)1951               -54.83463   22.43681  -2.444 0.015450 *  
factor(year)1952               -56.48904   22.81926  -2.475 0.014190 *  
factor(year)1953               -58.51258   23.81942  -2.457 0.014937 *  
factor(year)1954               -81.79391   24.20411  -3.379 0.000883 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 49.43 on 188 degrees of freedom
Multiple R-squared:  0.9527,    Adjusted R-squared:  0.9449 
F-statistic: 122.1 on 31 and 188 DF,  p-value: < 2.2e-16
# 3. Manually demean
# First, calculate the firm means for each variable
firm_means <- aggregate(cbind(invest, value, capital) ~ firm, data = Grunfeld, mean)

# Merge the firm means with the original dataset, use different suffixes to avoid duplication
Grunfeld <- merge(Grunfeld, firm_means, by = "firm", suffixes = c("", "_firm_mean"))

# Manually demean the variables by subtracting firm means
Grunfeld$invest_demeaned2 <- Grunfeld$invest - Grunfeld$invest_firm_mean
Grunfeld$value_demeaned2 <- Grunfeld$value - Grunfeld$value_firm_mean
Grunfeld$capital_demeaned2 <- Grunfeld$capital - Grunfeld$capital_firm_mean

# Estimate the fixed effects model using the demeaned data and include time dummies
fe_model_manual_time <- lm(invest_demeaned2 ~ value_demeaned2 + capital_demeaned2 + factor(year) - 1, data = Grunfeld)

# Summary of the fixed effects model
summary(fe_model_manual_time)

Call:
lm(formula = invest_demeaned2 ~ value_demeaned2 + capital_demeaned2 + 
    factor(year) - 1, data = Grunfeld)

Residuals:
     Min       1Q   Median       3Q      Max 
-163.411  -17.675   -1.835   17.949  217.107 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
value_demeaned2     0.11668    0.01260   9.259  < 2e-16 ***
capital_demeaned2   0.35144    0.02051  17.135  < 2e-16 ***
factor(year)1935   41.85916   15.33714   2.729  0.00692 ** 
factor(year)1936   24.89993   15.00703   1.659  0.09866 .  
factor(year)1937    5.48352   15.41684   0.356  0.72246    
factor(year)1938    6.23543   14.84962   0.420  0.67501    
factor(year)1939  -21.24024   14.69853  -1.445  0.15002    
factor(year)1940    2.03439   14.72881   0.138  0.89028    
factor(year)1941   25.37139   14.64477   1.732  0.08475 .  
factor(year)1942   23.85983   14.70399   1.623  0.10625    
factor(year)1943    4.08671   14.57883   0.280  0.77953    
factor(year)1944    3.53909   14.56985   0.243  0.80833    
factor(year)1945   -7.68033   14.59185  -0.526  0.59924    
factor(year)1946   14.10477   14.63597   0.964  0.33637    
factor(year)1947    6.98162   14.66620   0.476  0.63457    
factor(year)1948    3.52843   14.80270   0.238  0.81185    
factor(year)1949  -23.34160   14.86583  -1.570  0.11798    
factor(year)1950  -25.52857   14.83278  -1.721  0.08680 .  
factor(year)1951  -12.97548   14.78247  -0.878  0.38114    
factor(year)1952  -14.62988   15.01966  -0.974  0.33122    
factor(year)1953  -16.65342   15.68305  -1.062  0.28959    
factor(year)1954  -39.93475   16.07066  -2.485  0.01379 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 48.17 on 198 degrees of freedom
Multiple R-squared:  0.7953,    Adjusted R-squared:  0.7726 
F-statistic: 34.97 on 22 and 198 DF,  p-value: < 2.2e-16

The estimated coefficients, 0.11668 for value and 0.35144 for capital, changed for the two-way fixed effect models. The previous FE model controlled for entity fixed effects only by removing the effect of time-invariant firm-specific characteristics. In comparison, the two-way FE model also controls for time fixed effects and captures time-varying shocks or changes that affect all firms similarly. The fixed effects model, now including both entity and time fixed effects, controls for both time-invariant characteristics of the entities (e.g., firm-specific factors) and time-varying characteristics affecting all entities (e.g., macroeconomic conditions or industry trends). Including both effects helps mitigate omitted variable bias from unobserved factors that are constant within firms over time and those that vary over time across all firms.