Panel Regression

Set up Rmarkdown

Setting up R Markdown at the start lets you create dynamic, reproducible reports, presentations, and documents that communicate your data analysis and research findings effectively.

Introduction to Panel Regression

Panel regression, also known as panel data analysis or longitudinal data analysis, is a statistical method used to analyze data that has both cross-sectional and time-series dimensions. Panel data sets consist of observations on multiple individuals or entities over time, and are commonly used in economics, finance, and social sciences.

Panel regression is particularly useful for studying the effects of time-varying and time-invariant factors on a dependent variable, as it allows for the estimation of both within-group and between-group effects. Within-group effects measure how changes in the independent variables affect the dependent variable within each individual or entity over time, while between-group effects measure how differences in the independent variables across individuals or entities affect the dependent variable.
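The difference between within-group and between-group effects can be sketched in base R. This is a minimal illustration on simulated data, not the bank data used later; the variable names (id, alpha, x, y) and all numeric values are assumptions made for the example.

```r
# Simulated toy panel (hypothetical data): 5 entities, 6 periods each
set.seed(1)
id    <- rep(1:5, each = 6)
alpha <- rep(c(-2, -1, 0, 1, 2), each = 6)   # entity-specific level
x     <- alpha + rnorm(30)                   # x is correlated with the entity level
y     <- 1 + 0.5 * x + 2 * alpha + rnorm(30, sd = 0.3)

# Within effect: regress deviations from entity means on each other
x_w <- x - ave(x, id)
y_w <- y - ave(y, id)
within_fit <- lm(y_w ~ x_w - 1)

# Between effect: regress entity means of y on entity means of x
x_b <- tapply(x, id, mean)
y_b <- tapply(y, id, mean)
between_fit <- lm(y_b ~ x_b)

coef(within_fit)    # near the slope of 0.5 used to generate the data
coef(between_fit)   # slope also absorbs the entity-level differences
```

In this sketch the within slope recovers the effect of changes in x over time, while the between slope is much larger because it mixes in the entity-level differences.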

There are several types of panel regression models, including fixed effects, random effects, and hybrid models that combine both types of effects. Fixed effects models control for time-invariant individual or entity-level factors that are unobserved, while random effects models assume that these factors are uncorrelated with the independent variables. Hybrid models, such as the Hausman-Taylor model, combine both fixed and random effects.

Panel regression can be conducted using a variety of software packages, including R, Stata, and SAS. In R, the plm package is commonly used for panel data analysis.

Overall, panel regression is a powerful tool for analyzing panel data and can provide insights into the relationships between independent and dependent variables over time.

Libraries Installation and loading

Load the following packages for data manipulation, graphics, and the other functions used below. Each package is loaded once.

library(vars) # for function `VAR()`
library(ggplot2)
library(tseries) # for `adf.test()`
library(tidyverse)
library(stargazer)
library(readxl)
library(plm)
library(corrplot)
library(dplyr)
library(dynlm) # for function `dynlm()`
library(nlWaldTest) # for the `nlWaldtest()` function
library(lmtest) # for `coeftest()` and `bptest()`
library(broom) # for `glance()` and `tidy()`
library(car) # for `hccm()` robust standard errors
library(sandwich)
library(knitr) # for `kable()`
library(forecast)
library(systemfit)
library(AER)
library(ggpubr)
library(kableExtra)
library(jtools)
library(gtsummary)
library(xtable)
library(psych)
library(psycho)

To run a panel regression in RStudio, you can use the plm package, which provides a set of functions for panel data analysis. Here is a simple example of how to run a panel regression:

Load the data set

mydata<-read.csv("C:\\Users\\user\\Downloads\\bankdata.csv")
attach(mydata)

View the first few observations

head(mydata,5)

Variables

Year (2010-2019)
Bank (Banks A, B, C)
Debt to Equity Ratio (DER)
Liquidity Current Ratio (LCR)
Size (Assets) (SIZE)
Operating Expense Ratio (OER)

Fixed Effects Model(s)

We use a fixed effects (FE) model when we are interested in the impact of variables that vary over time. It explores the relationship between the predictor and outcome variables within an entity or unit, e.g. a bank. Each bank has its own characteristics that may influence the predictors and the outcome, so we need to control for them. The FE model removes the effect of these time-invariant characteristics and lets us assess the net impact of the predictors on the outcome variable. We also assume that these characteristics are unique to each bank, so that one entity's error term should not be correlated with those of the others. If this is not the case, FE is not suitable and the RE model may be used instead.

Random Effect Model

This model is used when the entity-specific effects are uncorrelated with the predictors. The error term captures both the random effect due to the panel unit and the usual random error, which is why the model is also called the error components model (ECM). The rationale behind the random effects model is that, unlike the fixed effects model, the variation across entities is assumed to be random and uncorrelated with the predictor or independent variables included in the model (Green, 2008). If differences across entities are believed to influence the response variable, the random effects model is appropriate. Unlike the fixed effects model, where time-invariant variables are absorbed by the intercept, this model allows time-invariant variables such as sex to be included. It assumes that each entity's error term is not correlated with the explanatory variables. Controlling for time and entity effects may therefore affect the estimated relationship with the response variable. If there are no fixed effects and the variance across entities is zero, pooled OLS (POLS) can be used to estimate the coefficients; in panel data analysis, POLS is simply the OLS estimator applied to the stacked data.

Other Panel Regression Models

RE TOBIT
RE PROBIT
Dynamic Models
RE Multinomial Logit.

Make some visuals

Correlation Matrix

cor_mat <- data.frame(DER, LCR, SIZE, OER)
corr_matrix <- cor(cor_mat)
lower<-corr_matrix
lower[lower.tri(corr_matrix)]<-""
lower<-as.data.frame(lower)
lower

The correlation matrix above shows the correlation coefficients between four variables: DER, LCR, SIZE, and OER. The matrix is a 4x4 square matrix, with diagonal elements of 1, since the correlation of a variable with itself is always perfect.

Looking at the matrix, we can see strong positive correlations between DER and SIZE (0.7569) and between DER and OER (0.7912), indicating that banks with higher debt-to-equity ratios tend to be larger and to have higher operating expense ratios. In contrast, the correlation between LCR and OER is weak and negative (-0.0593), suggesting at most a slight tendency for banks with higher liquidity ratios to have lower operating expense ratios.

There is also a weak positive correlation between DER and LCR (0.2247), but this correlation is not statistically significant at conventional levels. Similarly, there is a weak positive correlation between LCR and SIZE (0.0297), which is also not significant.
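Significance of a correlation coefficient can be checked with a t test. For r = 0.2247 and the 40 observations in this dataset, the statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2), which is about 1.42. The sketch below shows the same calculation with base R's cor.test() on simulated data; the vectors a and b are hypothetical and are not the bank variables.

```r
# Simulated, weakly related variables (hypothetical data)
set.seed(42)
n <- 40
a <- rnorm(n)
b <- 0.2 * a + rnorm(n)

ct <- cor.test(a, b)   # Pearson correlation test, H0: correlation = 0
ct$estimate
ct$p.value

# The t statistic computed by hand matches the one reported by cor.test()
r <- unname(ct$estimate)
t_manual <- r * sqrt(n - 2) / sqrt(1 - r^2)
all.equal(t_manual, unname(ct$statistic))

# Same formula applied to the DER-LCR correlation reported in the text
r_doc <- 0.2247
t_doc <- r_doc * sqrt(40 - 2) / sqrt(1 - r_doc^2)
p_doc <- 2 * pt(-abs(t_doc), df = 38)   # two-sided p-value
```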

Overall, the correlation matrix provides a useful summary of the relationships between the variables, but it is important to note that correlation does not imply causation. Further analysis, such as regression analysis or structural equation modeling, would be needed to explore the causal relationships between these variables.

The results can also be shown graphically using a correlation plot, as shown below.

Correlation Graph

mydata %>% 
  dplyr::select(DER:OER) %>%
  cor() %>%
  round(3) %>%
  corrplot(method = "color", addCoef.col="white", type = "upper", 
           title="Correlation - DER/LCR/SIZE/OER",
           mar=c(0,0,2,0),
           tl.cex=0.5, number.cex = 0.4)

Boxplot

Boxplots are a graphical tool used for displaying and summarizing the distribution of a set of continuous data. They are often used in exploratory data analysis to quickly visualize the distribution of data and to identify any outliers or unusual observations.

Boxplots display five summary statistics: the ends of the two whiskers, the lower and upper quartiles (the edges of the box), and the median (the line inside the box). In ggplot2, each whisker by default extends to the most extreme observation within 1.5 times the interquartile range of the box, so the whiskers coincide with the minimum and maximum only when there are no outliers. The box itself represents the middle 50% of the data, with the lower quartile (25th percentile) at the bottom of the box and the upper quartile (75th percentile) at the top. The length of the box is therefore a measure of the spread of the data, and the median indicates the central tendency.

Outliers, which are observations that fall outside of the whiskers, are typically plotted as individual points. Boxplots can also be used to compare the distributions of multiple groups of data side-by-side, by plotting the boxplots for each group next to each other.
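The numbers behind a boxplot can be inspected directly in base R. The sketch below uses a small made-up vector (vals is hypothetical) and boxplot.stats(), which applies the 1.5 times IQR whisker rule using hinges; ggplot2 uses quartiles, which can differ slightly.

```r
# Hypothetical values with one unusually large observation
vals <- c(2, 3, 3, 4, 5, 5, 6, 7, 8, 30)

fivenum(vals)              # minimum, lower hinge, median, upper hinge, maximum
bs <- boxplot.stats(vals)  # default rule: whiskers reach at most 1.5 * IQR past the hinges
bs$stats                   # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out                     # observations drawn as individual outlier points
```

Here the upper whisker stops at 8 and the value 30 is reported as an outlier, so the whisker end differs from the maximum.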

Some common uses of boxplots include:

Identifying the shape of the distribution (e.g., skewed, symmetric, bimodal)
Identifying outliers or extreme values in the data
Comparing the distributions of different groups or samples
Visualizing changes in a distribution over time or across different conditions.

Overall, boxplots are a useful tool for exploring and summarizing the distribution of continuous data, and are widely used in data analysis and visualization.

Boxplot of DER across banks

ggplot(mydata, aes(x = BANK, y = DER))+
  labs(title = "A Boxplot showing the distribution of DER across banks", y = "DER", x = "BANK")+
  geom_boxplot(aes(fill = BANK)) +theme(legend.position="none")+
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

A Boxplot of SIZE across Banks

ggplot(mydata, aes(x = BANK, y = SIZE))+
  labs(title = "A Boxplot showing the distribution of SIZE across banks", y = "SIZE", x = "BANK")+
  geom_boxplot(aes(fill = BANK)) +theme(legend.position="none")+
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

A Boxplot of OER across Banks

ggplot(mydata, aes(x = BANK, y = OER))+
  labs(title = "A Boxplot showing the distribution of OER across banks", y = "OER", x = "BANK")+
  geom_boxplot(aes(fill = BANK)) +theme(legend.position="none")+
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

A line plot showing the trend in Debt to Equity Ratio (DER) over the years for the four banks

ggplot(data = mydata, aes(x = Year, y = DER, color = BANK)) +
  geom_line() +
  labs(x = "Year", y = "Debt Equity Ratio", color = "BANK") +
  theme_classic()

A line plot showing the trend in Liquidity Current Ratio (LCR) over the years for the four banks

ggplot(data = mydata, aes(x = Year, y = LCR, color = BANK)) +
  geom_line() +
  labs(x = "Year", y = "Liquidity Current Ratio", color = "BANK") +
  theme_classic()

Summary Statistics

describe(mydata[,3:6])

Fixed-Effect and Random Effect Models

Fixed effect and random effect models are two commonly used methods in panel data analysis to control for unobserved heterogeneity. The main difference between the two is how they treat the unobserved heterogeneity.

In a fixed effects model, individual-specific intercepts are included in the regression to capture the time-invariant differences between individuals or entities. The regression coefficients are therefore estimated from variation within each individual over time, and the model controls for all time-invariant differences between individuals. The fixed effects model assumes that the unobserved heterogeneity is constant over time, but it may be correlated with the explanatory variables.

In contrast, a random effects model treats the unobserved heterogeneity as a random draw from a population. Individual-specific random effects, which are also constant over time, enter the regression as part of the error term. The random effects model assumes that the unobserved heterogeneity is uncorrelated with the observed explanatory variables and has a constant variance.

The main advantage of the fixed effect model is that it controls for all time-invariant differences between individuals, even if they are unobserved. This makes it more robust to omitted variable bias and other sources of endogeneity. However, the fixed effect model does not allow for estimation of the coefficients of time-invariant variables.

The main advantage of the random effects model is that it allows for estimation of the coefficients of time-invariant variables. However, the random effects model may suffer from omitted variable bias if the unobserved heterogeneity is correlated with the observed explanatory variables.

Fixed-Effect Model

A fixed-effect panel regression is a statistical method used to analyze data that contains both time-series and cross-sectional dimensions. In this type of regression, the focus is on identifying the impact of a particular independent variable on a dependent variable while controlling for individual-specific characteristics that are fixed over time, also known as individual fixed effects.

The fixed-effect model controls for unobserved heterogeneity that is specific to each individual entity, such as individual characteristics or unmeasured factors that are constant over time. This is achieved by including a separate intercept term for each individual entity in the regression model. By including individual fixed effects, the model effectively removes the influence of time-invariant factors and focuses solely on the relationship between the independent and dependent variables.
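The equivalence between a separate intercept per entity and removing entity means can be checked in base R. This is a minimal sketch on simulated data; the names id, a_i, x, y and the coefficient values are assumptions for illustration only.

```r
# Simulated panel (hypothetical data): 4 entities, 10 periods each
set.seed(7)
id  <- factor(rep(1:4, each = 10))
a_i <- rep(c(0.1, 0.4, -0.3, 0.8), each = 10)   # time-invariant entity effects
x   <- rnorm(40) + a_i
y   <- a_i + 0.3 * x + rnorm(40, sd = 0.2)

# (1) Dummy-variable form: one intercept per entity
lsdv <- lm(y ~ x + id)

# (2) Within form: subtract each entity's mean from y and from x
wfit <- lm(I(y - ave(y, id)) ~ I(x - ave(x, id)) - 1)

# Both forms give the same slope on x
slopes <- c(lsdv = unname(coef(lsdv)["x"]), within = unname(coef(wfit)[1]))
slopes
```

The within form is what plm(..., model = "within") estimates; its standard errors differ from those of the naive demeaned lm() above because the degrees of freedom must account for the entity means.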

Fixed-effect panel regression is commonly used in economics and social science research, particularly when analyzing data that is collected over multiple periods and involves a large number of individuals or groups. The fixed-effect approach can help to improve the accuracy of the regression analysis by reducing omitted variable bias and other forms of specification errors.

What about panel regression

Panel data gathers information about several individuals (cross-sectional units) over several periods. The panel is balanced if all units are observed in all periods; if some units are missing in some periods, the panel is unbalanced.
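Whether a long-format dataset is balanced can be checked by counting the observations in every unit-period cell. The sketch below uses a small made-up panel; the column names unit and period and the helper is_balanced() are hypothetical.

```r
# Hypothetical long-format panel: 3 units observed in 4 periods
panel <- expand.grid(unit = c("A", "B", "C"), period = 2010:2013)
unbalanced <- panel[-5, ]   # drop one unit-period row to break the balance

# A panel is balanced when every unit-period cell holds exactly one row
is_balanced <- function(d) {
  counts <- table(d$unit, d$period)
  all(counts == 1)
}

is_balanced(panel)        # TRUE
is_balanced(unbalanced)   # FALSE
```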

Organizing the Data as a Panel

A wide panel has the cross-sectional dimension (N) much larger than the longitudinal dimension (T); when the opposite is true, we have a long panel. Normally, the same units are observed in all periods; when this is not the case and each period samples mostly other units, the result is not proper panel data but a set of pooled cross-sections.

This manual uses the panel data package plm, which also makes it possible to organize the data as a panel. Panel datasets come in two main forms: the long form has a column for each variable and a row for each individual-period; the wide form has a column for each variable-period and a row for each individual. Most panel data methods require the long form, but many data sources provide one wide-form table for each variable; assembling the data from different sources into a long-form data frame is often not a trivial matter.
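Converting one wide-form table into the long form can be sketched with base R's reshape(). The data frame below is made up for illustration; the column names DER.2010 and DER.2011 are assumptions about how a wide table might be labeled.

```r
# Hypothetical wide table: one row per bank, one column per year
wide <- data.frame(bank = c("A", "B"),
                   DER.2010 = c(1.2, 0.8),
                   DER.2011 = c(1.3, 0.9))

# Stack the year columns into one DER column plus a year column
long <- reshape(wide, direction = "long",
                varying = c("DER.2010", "DER.2011"),
                idvar = "bank", timevar = "year", sep = ".")
long <- long[order(long$bank, long$year), ]
rownames(long) <- NULL
long   # one row per bank-year, as required by most panel methods
```

With several variables, each wide table would be reshaped this way and the results merged on the bank and year columns.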

The next code sequence creates a panel structure for the dataset mydata using the function pdata.frame() of the plm package and displays the first five rows before and after the conversion. The index argument names the columns that identify each observation.

head(mydata,5)

mydata_1 <- pdata.frame(mydata, index = c("Year", "BANK"))
head(mydata_1,5)

Check if your panel is balanced

pdim(mydata_1)

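Balanced Panel: n = 10, T = 4, N = 40

Note that pdata.frame() expects the individual identifier first and the time identifier second in index. Because "Year" is listed first above, plm treats the 10 years as the individual dimension (n = 10) and the 4 banks as the time dimension (T = 4). To treat banks as the individuals, use index = c("BANK", "Year"). The estimates reported in the rest of this illustration were produced with the index order shown above; the within, random effects, and test results would differ under the other order.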

The Pooled Model

A pooled model uses a single intercept and a single set of slope coefficients for all observations, so it does not allow for intercept or slope differences among individuals. Such a model can be estimated in R by setting model = "pooling" in the plm() function, as the following code sequence illustrates.
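The pooled model can be written out as follows. This is a sketch in the usual textbook notation; the number of coefficients K and the error symbol e_it are notational assumptions, and the second line applies the same form to the variables of the bank example.

```latex
y_{it} = \beta_1 + \beta_2 x_{2it} + \dots + \beta_K x_{Kit} + e_{it}

% Applied to the bank data used here:
\mathit{OER}_{it} = \beta_1 + \beta_2\,\mathit{DER}_{it} + \beta_3\,\mathit{LCR}_{it} + \beta_4\,\mathit{SIZE}_{it} + e_{it}
```

None of the coefficients carries an individual subscript i, which is what rules out intercept or slope differences among individuals.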

Estimate the model

OER.pooled <- plm(OER~DER+LCR+SIZE, 
  model="pooling", data=mydata_1)

View the results using stargazer

stargazer(OER.pooled,report = "vc*stp",type = "text",out = "./q7results.txt")


========================================
                 Dependent variable:    
             ---------------------------
                         OER            
----------------------------------------
DER                   0.044***          
                       (0.013)          
                      t = 3.270         
                      p = 0.003         
                                        
LCR                    -0.050           
                       (0.087)          
                     t = -0.570         
                      p = 0.572         
                                        
SIZE                 0.00000***         
                      (0.00000)         
                      t = 2.821         
                      p = 0.008         
                                        
Constant              0.284***          
                       (0.065)          
                      t = 4.337         
                     p = 0.0002         
                                        
----------------------------------------
Observations             40             
R2                      0.713           
Adjusted R2             0.689           
F Statistic    29.761*** (df = 3; 36)   
========================================
Note:        *p<0.1; **p<0.05; ***p<0.01

Alternative way to visualize the results

kable(tidy(OER.pooled), digits=3, 
           caption="Pooled model")

Pooled model
term	estimate	std.error	statistic	p.value
(Intercept)	0.284	0.065	4.337	0.000
DER	0.044	0.013	3.270	0.002
LCR	-0.050	0.087	-0.570	0.572
SIZE	0.000	0.000	2.821	0.008

Model Interpretation

A pooled panel model is a regression that combines cross-sectional and time-series data. The same group of individuals or entities (the panel) is observed over time, and all observations are stacked and analyzed together as if they came from a single cross-section. The model estimates one common intercept and one common set of slopes.

Pooled panel models are a natural starting point when data are available on individual units over time, such as in longitudinal studies, cohort studies, or analyses of policies or interventions. They serve as a baseline against which fixed effects and random effects models can be compared.

It is important to note that pooled panel models assume that the relationship between the dependent variable and the independent variables is constant across all individuals and over time, and that there is no unobserved heterogeneity among individuals. Therefore, caution should be taken when interpreting the results, and alternative models, such as fixed effects or random effects models, should be considered when there is potential for unobserved heterogeneity or individual-level effects.

The model presented here is a pooled regression in which the dependent variable is OER and the independent variables are DER, LCR, and SIZE. Each coefficient represents the change in OER when the corresponding independent variable increases by one unit, holding all other variables constant.

The coefficient for DER is 0.044, which is statistically significant at the 1% level (p<0.01), indicating that an increase in DER by one unit leads to an increase in OER by 0.044 units, holding all other variables constant. On the other hand, the coefficient for LCR is -0.050, which is not statistically significant (p>0.1), indicating that the variable does not have a significant impact on OER.

The coefficient for SIZE is displayed as 0.00000 and is statistically significant at the 1% level (p<0.01). The estimate is positive but smaller than the five decimal places shown. This most likely reflects the units of SIZE (assets, presumably recorded in large raw amounts) rather than a negligible effect: a one-unit change in SIZE is tiny relative to its range. Rescaling SIZE, for example to millions, or taking its logarithm would give a coefficient that is easier to read.
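How the unit of a regressor drives the printed size of its coefficient can be seen in a small simulation. The data below are made up (size and oer are hypothetical stand-ins, not the bank data), and the scaling factor of one million is an assumption chosen for illustration.

```r
# Hypothetical data: a regressor recorded in large raw units
set.seed(3)
size <- runif(40, 1e6, 5e7)
oer  <- 0.3 + 4e-9 * size + rnorm(40, sd = 0.02)

fit_raw    <- lm(oer ~ size)            # size in raw units
fit_scaled <- lm(oer ~ I(size / 1e6))   # size in millions

coef(fit_raw)[2]      # prints as 0 when rounded to a few decimals
coef(fit_scaled)[2]   # same effect, expressed per million units
```

The two fits are the same model; only the unit of the slope changes, by exactly the scaling factor.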

The constant term is 0.284, which is statistically significant at the 1% level (p<0.01), indicating that the intercept value of OER is 0.284 when all independent variables are zero. Besides, the R-squared value of 0.713 indicates that the model explains approximately 71.3% of the variation in OER, while the adjusted R-squared value of 0.689 adjusts for the number of independent variables in the model.

The F-statistic of 29.761 is statistically significant at the 1% level (p<0.01), indicating that at least one of the independent variables has a significant impact on OER. Overall, the model suggests that DER and SIZE have a significant impact on OER, while LCR does not.

Clustered Robust Standard Errors

tbl <- tidy(coeftest(OER.pooled, vcov=vcovHC(OER.pooled,
                    type="HC0",cluster="group")))
kable(tbl, digits=5, caption=
"Pooled 'OER' model with cluster robust standard errors")

Pooled ‘OER’ model with cluster robust standard errors
term	estimate	std.error	statistic	p.value
(Intercept)	0.28396	0.04288	6.62290	0.00000
DER	0.04366	0.01161	3.76144	0.00060
LCR	-0.04965	0.04524	-1.09757	0.27968
SIZE	0.00000	0.00000	2.93246	0.00581

The Fixed Effects Model

The fixed effects model takes into account individual differences, translated into different intercepts of the regression line for different individuals. The model in this case assigns the subscript i to the constant term β1, giving a separate intercept β1i for each individual. The constant terms calculated in this way are called fixed effects. Additionally, variables that change little or not at all over time, such as some individual characteristics, should not be included in a fixed effects model because they produce collinearity with the fixed effects.
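With the subscript i on the constant term, the fixed effects specification can be sketched as follows. The notation (K coefficients, error term e_it) is an assumption carried over from the usual textbook form.

```latex
y_{it} = \beta_{1i} + \beta_2 x_{2it} + \dots + \beta_K x_{Kit} + e_{it}
```

Only the intercept varies across individuals; the slopes are common to all of them. A regressor that is constant over time for every individual would be a linear combination of the individual intercepts, which is the collinearity problem mentioned above.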

Create the fixed-effect model

model_fe <- plm(OER ~ DER + LCR + SIZE, data = mydata_1, model = "within")

stargazer(model_fe,report = "vc*stp",type = "text",out = "./q7results.txt")


========================================
                 Dependent variable:    
             ---------------------------
                         OER            
----------------------------------------
DER                    0.046**          
                       (0.018)          
                      t = 2.540         
                      p = 0.018         
                                        
LCR                    -0.083           
                       (0.165)          
                     t = -0.501         
                      p = 0.621         
                                        
SIZE                  0.00000**         
                      (0.00000)         
                      t = 2.514         
                      p = 0.019         
                                        
----------------------------------------
Observations             40             
R2                      0.716           
Adjusted R2             0.590           
F Statistic    22.723*** (df = 3; 27)   
========================================
Note:        *p<0.1; **p<0.05; ***p<0.01

Alternative

OER.within <- plm(OER ~ DER + LCR + SIZE, data = mydata_1, 
                  model="within")
tbl <- tidy(OER.within)
kable(tbl, digits=5, caption=
"Fixed effects using 'within' with full sample")

Fixed effects using ‘within’ with full sample
term	estimate	std.error	statistic	p.value
DER	0.04559	0.01795	2.54048	0.01713
LCR	-0.08253	0.16466	-0.50122	0.62028
SIZE	0.00000	0.00000	2.51393	0.01821

kable(tidy(pFtest(OER.within, OER.pooled)), caption=
        "Fixed effects test: Ho:'No fixed effects'")

Fixed effects test: Ho:‘No fixed effects’
df1	df2	statistic	p.value	method	alternative
9	27	0.5890368	0.7944397	F test for individual effects	significant effects
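In the output above, the p-value of 0.794 means the null hypothesis of no fixed effects is not rejected. The test compares the pooled model with the model that has one intercept per individual, and the same idea can be sketched with base R's anova() on simulated data. The data below are hypothetical and generated without any individual effects; with one regressor the degrees of freedom are 9 and 29 rather than the 9 and 27 of the three-regressor bank model.

```r
# Simulated panel (hypothetical data): 10 individuals, 4 periods, no individual effects
set.seed(11)
id <- factor(rep(1:10, each = 4))
x  <- rnorm(40)
y  <- 0.5 + 0.2 * x + rnorm(40)

pooled <- lm(y ~ x)        # restricted model: one common intercept
lsdv   <- lm(y ~ x + id)   # unrestricted model: one intercept per individual

ft <- anova(pooled, lsdv)  # F test of the 9 additional intercepts
ft$Df[2]                   # numerator degrees of freedom
ft$Res.Df[2]               # denominator degrees of freedom
ft[2, "Pr(>F)"]            # p-value of the test
```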

Random Effect Model

The random effects model elaborates on the fixed effects model by recognizing that, since the individuals in the panel are randomly selected, their characteristics, measured by the intercept β1i, should also be random. Thus, the random effects model writes the intercept as the sum of a population average, β1, and an individual-specific random term, ui. As in the case of fixed effects, random effects are also time-invariant.
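The form of the intercept and the resulting model can be sketched as follows. The bar over the population average and the grouping of the two error components are notational assumptions in the usual textbook style.

```latex
\beta_{1i} = \bar{\beta}_1 + u_i

y_{it} = \bar{\beta}_1 + \beta_2 x_{2it} + \dots + \beta_K x_{Kit} + (e_{it} + u_i)
```

The composite error in parentheses contains the individual term u_i in every period, so errors for the same individual are correlated over time.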

OER_ReTest <- plmtest(OER.pooled, effect="individual")
kable(tidy(OER_ReTest), caption=
        "A random effects test for the OER equation")

A random effects test for the OER equation
statistic	p.value	method	alternative
-0.9073556	0.8178906	Lagrange Multiplier Test - (Honda)	significant effects

The Honda Lagrange multiplier test above (p = 0.818) does not reject the null hypothesis of no individual random effects. Random effects estimators are reliable under the assumption that individual characteristics (heterogeneity) are exogenous, that is, independent of the regressors in the random effects equation. The Hausman test for endogeneity can be used to check this, with the null hypothesis that the individual random effects are exogenous. The test function phtest() compares the fixed effects and the random effects models; the next code lines estimate the random effects model and perform the Hausman endogeneity test.

Estimate the random effect model

OER.random <- plm(OER~DER+LCR+SIZE,
                  data=mydata_1, random.method="swar",
                  model="random")
kable(tidy(OER.random), digits=4, caption=
      "The random effects results for the OER equation")

The random effects results for the OER equation
term	estimate	std.error	statistic	p.value
(Intercept)	0.2840	0.0655	4.3374	0.0000
DER	0.0437	0.0134	3.2700	0.0011
LCR	-0.0497	0.0870	-0.5705	0.5684
SIZE	0.0000	0.0000	2.8211	0.0048

kable(tidy(phtest(OER.within, OER.random)), caption=
 "Hausman endogeneity test for the random effects OER model")

Hausman endogeneity test for the random effects OER model
statistic	p.value	parameter	method	alternative
0.1358947	0.9872066	3	Hausman Test	one model is inconsistent

The results above show a high p-value (0.987), so the null hypothesis that the individual random effects are exogenous is not rejected; the random effects estimator is therefore consistent. In this case the random effects model is preferred over the fixed effects model.
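The Hausman statistic compares the two sets of coefficient estimates, weighting their difference by the difference of their covariance matrices. The sketch below is a hand-written illustration, not the code of phtest(); the helper hausman_stat() and its one-coefficient inputs are hypothetical.

```r
# Hausman statistic: d' (V_fe - V_re)^(-1) d, with d = b_fe - b_re
hausman_stat <- function(b_fe, b_re, V_fe, V_re) {
  d <- b_fe - b_re
  H <- as.numeric(t(d) %*% solve(V_fe - V_re) %*% d)
  c(statistic = H, df = length(d),
    p.value = pchisq(H, df = length(d), lower.tail = FALSE))
}

# Hypothetical one-coefficient example
res <- hausman_stat(b_fe = 0.50, b_re = 0.48,
                    V_fe = matrix(0.010), V_re = matrix(0.006))
res["statistic"]   # 0.02^2 / 0.004 = 0.1

# p-value implied by the statistic and 3 degrees of freedom reported above
pchisq(0.1358947, df = 3, lower.tail = FALSE)
```

A small statistic means the fixed effects and random effects estimates are close, which is the case in the output above.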

Second Illustration

Grunfeld’s Investment Example

The dataset grunfeld2 is a subset of the original Grunfeld dataset; it includes two firms, GE and WE, observed over the period 1935 to 1954. The purpose of this example is to identify various issues that should be taken into account when building a panel data econometric model. The problem is to find the determinants of investment by a firm, inv_it, among regressors such as the value of the firm, v_it, and the capital stock, k_it. The table below gives a glimpse of the grunfeld2 panel data.

library(devtools)
install_git("https://github.com/ccolonescu/PoEdata")

data("grunfeld2", package="PoEdata")

View the first few observations

head(grunfeld2,10)

Create the panel data

grun <- pdata.frame(grunfeld2, index=c("firm","year"))
head(grun,10)

Let us consider a pooling model first, assuming that the coefficients of the regression equation, as well as the error variances are the same for both firms (no individual heterogeneity).

grun.pool <- plm(inv~v+k, 
                 model="pooling",data=grun)
kable(tidy(grun.pool), digits=5, caption=
  "Grunfeld dataset, pooling panel data results")

Grunfeld dataset, pooling panel data results
term	estimate	std.error	statistic	p.value
(Intercept)	17.87200	7.02408	2.54439	0.01525
v	0.01519	0.00620	2.45191	0.01905
k	0.14358	0.01860	7.71890	0.00000

SSE.pool <- sum(resid(grun.pool)^2)
sigma2.pool <- SSE.pool/(grun.pool$df.residual)
sigma2.pool

[1] 447.6487

Alternative way to display the results

grun.pool_1 <- plm(inv~v+k, 
                 model="pooling",data=grun)|>
  tidy() |> 
  kable() |> 
  kable_classic()
grun.pool_1

term	estimate	std.error	statistic	p.value
(Intercept)	17.8720011	7.0240806	2.544390	0.0152529
v	0.0151926	0.0061962	2.451913	0.0190508
k	0.1435792	0.0186010	7.718900	0.0000000

Allowing for different coefficients across firms but the same error structure gives the fixed effects model summarized in the results below. Note that the firm-specific intercept and slopes are obtained by interacting the regressors with the factor grun$firm inside a "pooling" specification.

grun.fe <- plm(inv~v*grun$firm+k*grun$firm, 
               model="pooling",data=grun)
kable(tidy(grun.fe), digits=4, caption=
  "Grunfeld dataset, 'pooling' panel data results")

Grunfeld dataset, ‘pooling’ panel data results
term	estimate	std.error	statistic	p.value
(Intercept)	-9.9563	23.6264	-0.4214	0.6761
v	0.0266	0.0117	2.2651	0.0300
grun$firm2	9.4469	28.8054	0.3280	0.7450
k	0.1517	0.0194	7.8369	0.0000
v:grun$firm2	0.0263	0.0344	0.7668	0.4485
grun$firm2:k	-0.0593	0.1169	-0.5070	0.6155

SSE.fe <- sum(resid(grun.fe)^2)
sigma2.fe <- SSE.fe/(grun.fe$df.residual)
sigma2.fe

[1] 440.8771

Alternative Display of Results

grun.fe_1 <- plm(inv~v*grun$firm+k*grun$firm, 
               model="pooling",data=grun)|>
  tidy() |> 
  kable() |> 
  kable_classic()
grun.fe_1

term	estimate	std.error	statistic	p.value
(Intercept)	-9.9563065	23.6263647	-0.4214066	0.6761105
v	0.0265512	0.0117220	2.2650640	0.0299963
grun$firm2	9.4469163	28.8053507	0.3279570	0.7449553
k	0.1516939	0.0193564	7.8368647	0.0000000
v:grun$firm2	0.0263429	0.0343527	0.7668380	0.4484701
grun$firm2:k	-0.0592874	0.1169464	-0.5069618	0.6154540

A test to see if the coefficients are significantly different between the pooling and fixed effects equations can be done in R using the function pooltest() from the package plm; to perform this test, the fixed effects model should be estimated with the function pvcm() with the argument model = "within", as the next code lines show.

grun.pvcm <- pvcm(inv~v+k, 
                   model="within", data=grun)
coef(grun.pvcm)

pooltest(grun.pool, grun.pvcm)


    F statistic

data:  inv ~ v + k
F = 1.1894, df1 = 3, df2 = 34, p-value = 0.3284
alternative hypothesis: unstability

The result shows that the null hypothesis of zero coefficients for the individual dummy terms cannot be rejected. (However, the pvcm function is not equivalent to the fixed effects model that uses individual dummies; it is, though, useful for testing the ‘poolability’ of a dataset.)

Now, if we allow for different coefficients and different error variances, the equation for each individual is independent of those for the other individuals and can be estimated separately.

grun1.pool <- plm(inv~v+k, model="pooling",
                 subset=grun$firm==1, data=grun)
SSE.pool1<- sum(resid(grun1.pool)^2)
sig2.pool1 <- SSE.pool1/grun1.pool$df.residual
kable(tidy(grun1.pool), digits=4, align='c', caption=
        "Pooling estimates for the GE firm (firm=1)")

Pooling estimates for the GE firm (firm=1)
term	estimate	std.error	statistic	p.value
(Intercept)	-9.9563	31.3742	-0.3173	0.7548
v	0.0266	0.0156	1.7057	0.1063
k	0.1517	0.0257	5.9015	0.0000

grun2.pool <- plm(inv~v+k, model="pooling",
                 subset=grun$firm==2, data=grun)
SSE.pool2 <- sum(resid(grun2.pool)^2)
sig2.pool2 <- SSE.pool2/grun2.pool$df.residual
kable(tidy(grun2.pool), digits=4, align='c', caption=
        "Pooling estimates for the WE firm (firm=2)")

Pooling estimates for the WE firm (firm=2)
term	estimate	std.error	statistic	p.value
(Intercept)	-0.5094	8.0153	-0.0636	0.9501
v	0.0529	0.0157	3.3677	0.0037
k	0.0924	0.0561	1.6472	0.1179

The tables above show the results for the equations estimated on subsets of the data, separated by firm. A Goldfeld-Quandt test can be carried out to determine whether the variances differ between firms, as the next code shows.

gqtest(grun.pool, point=0.5, alternative="two.sided",
       order.by=grun$firm)


    Goldfeld-Quandt test

data:  grun.pool
GQ = 0.13417, df1 = 17, df2 = 17, p-value = 0.000143
alternative hypothesis: variance changes from segment 1 to 2
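The GQ statistic is the ratio of the residual variances of the two segments. As a rough check, the sketch below uses the per-firm residual variances printed in the SUR output of this document (777.446 for GE and 104.308 for WE), assuming they equal the residual sum of squares divided by 17 from the separate OLS fits; that reading of the output is an assumption.

```r
# Per-firm residual variances as printed in the SUR output of this document
# (assumed to be SSE / 17 from the separate OLS fits)
s2_GE <- 777.446
s2_WE <- 104.308

gq <- s2_WE / s2_GE                     # variance ratio, segment 2 over segment 1
p_lower <- pf(gq, df1 = 17, df2 = 17)   # left-tail probability under equal variances
p_two_sided <- 2 * min(p_lower, 1 - p_lower)

round(gq, 5)    # 0.13417, the GQ value in the output above
p_two_sided     # two-sided p-value
```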

The result is rejection of the null hypothesis that the variances are equal, indicating that estimating separate equations for each firm is the correct model. What happens when we assume that the only link between the two firms is correlation between their contemporaneous error terms? This is the model of seemingly unrelated regressions, a generalized least squares method.

library(systemfit)
grunf<- grunfeld2
grunf$Firm <- ifelse(grunf$firm == 1, "GE", "WE")  # label firm 1 as GE, firm 2 as WE
grunf$firm <- NULL
names(grunf)<- c("inv", "val", "cap", "year", "firm")
grunfpd <- plm.data(grunf, c("firm","year"))
grunf.SUR <- systemfit(inv~val+cap, method="SUR", data=grunfpd)
summary(grunf.SUR, resdCov=FALSE, equations=FALSE)


systemfit results 
method: SUR 

        N DF     SSR detRCov   OLS-R2 McElroy-R2
system 40 34 15589.7 35640.6 0.698968   0.615103

    N DF     SSR     MSE    RMSE       R2   Adj R2
GE 20 17 13788.4 811.081 28.4795 0.692557 0.656388
WE 20 17  1801.3 105.959 10.2936 0.740401 0.709860

The covariance matrix of the residuals used for estimation
        GE      WE
GE 777.446 207.587
WE 207.587 104.308

The covariance matrix of the residuals
        GE      WE
GE 811.081 224.278
WE 224.278 105.959

The correlations of the residuals
         GE       WE
GE 1.000000 0.765043
WE 0.765043 1.000000


Coefficients:
                  Estimate  Std. Error  t value   Pr(>|t|)    
GE_(Intercept) -27.7193171  29.3212188 -0.94537  0.3577155    
GE_val           0.0383102   0.0144152  2.65763  0.0165755 *  
GE_cap           0.1390363   0.0249856  5.56466 3.4234e-05 ***
WE_(Intercept)  -1.2519882   7.5452174 -0.16593  0.8701684    
WE_val           0.0576298   0.0145463  3.96182  0.0010072 ** 
WE_cap           0.0639781   0.0530406  1.20621  0.2442559    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

First, please note that the systemfit() function, as used here, requires a panel data file created with plm.data() instead of the pdata.frame() that we have used above (plm.data() is deprecated in newer releases of plm, so this step may need adapting); second, for some reason I had to change the names of the variables to names having more than one letter to make the function work. I did this using the function names().

Panel Regression

Lumumba W. Vivtor

2023-03-05
