RMarkdown lets you create dynamic, reproducible, and visually appealing reports, presentations, and documents that help you communicate your data analysis and research findings more effectively.
Panel regression, also known as panel data analysis or longitudinal data analysis, is a statistical method used to analyze data that has both cross-sectional and time-series dimensions. Panel data sets consist of observations on multiple individuals or entities over time, and are commonly used in economics, finance, and social sciences.
Panel regression is particularly useful for studying the effects of time-varying and time-invariant factors on a dependent variable, as it allows for the estimation of both within-group and between-group effects. Within-group effects measure how changes in the independent variables affect the dependent variable within each individual or entity over time, while between-group effects measure how differences in the independent variables across individuals or entities affect the dependent variable.
There are several types of panel regression models, including fixed effects, random effects, and hybrid models that combine both types of effects. Fixed effects models control for time-invariant individual or entity-level factors that are unobserved, while random effects models assume that these factors are uncorrelated with the independent variables. Hybrid models, such as the Hausman-Taylor model, combine both fixed and random effects.
Panel regression can be conducted using a variety of software packages, including R, Stata, and SAS. In R, the plm package is commonly used for panel data analysis.
Overall, panel regression is a powerful tool for analyzing panel data and can provide insights into the relationships between independent and dependent variables over time.
Load the following packages for data manipulation, graphics, and the estimation functions used below:
library(vars) # for function `VAR()`
library(ggplot2)
library(tseries) # for `adf.test()`
library(tidyverse) # also attaches dplyr
library(stargazer)
library(readxl)
library(plm)
library(corrplot)
library(dynlm) # for function `dynlm()`
library(nlWaldTest) # for the `nlWaldtest()` function
library(lmtest) # for `coeftest()` and `bptest()`
library(broom) # for `glance()` and `tidy()`
library(car) # for `hccm()` robust standard errors
library(sandwich)
library(knitr) # for `kable()`
library(forecast)
library(systemfit)
library(AER)
library(ggpubr)
library(kableExtra)
library(jtools)
library(gtsummary)
library(xtable)
library(psych)
library(psycho)
To run a panel regression in RStudio, you can use the plm package, which provides a set of functions for panel data analysis. Here is a simple example of how to run a panel regression, starting with reading the data:
mydata<-read.csv("C:\\Users\\user\\Downloads\\bankdata.csv")
attach(mydata)
head(mydata,5)
We use the fixed effects (FE) model when we are interested in analyzing the impact of variables that vary over time. It explores the relationship between the predictor and outcome variables within an entity or unit, e.g. a bank. Each bank has its own time-invariant characteristics that may affect the relationship between the variables of interest, so we need to control for them. The FE model removes these time-invariant characteristics and allows us to assess the net impact of the predictors on the outcome variable. We also assume that these characteristics are unique to each bank, so that one entity's error term is not correlated with those of the others. If this is not the case, the random effects (RE) model is used instead.
The random effects (RE) model is used when the panel-specific effects are uncorrelated with the predictors. Its composite error term contains both the random effect due to the panel and the idiosyncratic random error (the error components model, ECM). The rationale behind the random effects model is that, unlike in the fixed effects model, the variation across entities is assumed to be random and uncorrelated with the predictor or independent variables included in the model (Green, 2008). If we believe that differences across entities (here, banks) influence the response variable, the random effects model is appropriate. It also allows the inclusion of time-invariant variables such as sex, which a fixed effects model would absorb into the intercept. The key assumption is that the entity's error term is not correlated with the explanatory variables.
If there are no entity or time effects, that is, if the variance of the effects across entities is zero, pooled ordinary least squares (POLS) can be used to estimate the coefficients of each variable. In panel data analysis, POLS is simply the OLS estimator of a linear regression model applied to the stacked data.
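# Correlation matrix of the four model variables (no attach() needed)
cor_mat <- mydata[, c("DER", "LCR", "SIZE", "OER")]
corr_matrix <- round(cor(cor_mat), 4)
# Blank out the lower triangle so that only the upper triangle is displayed
upper <- corr_matrix
upper[lower.tri(corr_matrix)] <- ""
upper <- as.data.frame(upper)
upper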
The correlation matrix above shows the correlation coefficients between four variables: DER, LCR, SIZE, and OER. The matrix is a 4x4 square matrix, with diagonal elements of 1, since the correlation of a variable with itself is always perfect.
Looking at the matrix, we can see strong positive correlations between DER and SIZE (0.7569) and between DER and OER (0.7912), indicating that banks with higher levels of debt tend to be larger and to have higher operating expenses (OER). In contrast, the correlation between LCR and OER (-0.0593) is negative but so close to zero that the two variables are essentially unrelated.
There is also a weak positive correlation between DER and LCR (0.2247); with 40 observations, a correlation of this size is not statistically significant at the 5% level. The correlation between LCR and SIZE (0.0297) is weaker still and likewise not significant.
Overall, the correlation matrix provides a useful summary of the relationships between the variables, but it is important to note that correlation does not imply causation. Further analysis, such as regression analysis or structural equation modeling, would be needed to explore the causal relationships between these variables.
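The significance of a correlation can be checked with a short calculation. The sketch below tests the DER-LCR correlation using only the reported coefficient and the 40 observations in the dataset; it assumes independent observations, which is only an approximation for panel data, so the p-value should be read as indicative.

```r
# t-test of H0: rho = 0 for a Pearson correlation, given only r and n.
# r is the reported DER-LCR correlation; n is the number of observations.
# Assumption: observations are independent (an approximation for a panel).
r <- 0.2247
n <- 40

t_stat  <- r * sqrt(n - 2) / sqrt(1 - r^2)   # t statistic with n - 2 df
p_value <- 2 * pt(-abs(t_stat), df = n - 2)  # two-sided p-value

cat("t =", round(t_stat, 3), " p =", round(p_value, 3), "\n")
```

With the raw data available, `cor.test(mydata$DER, mydata$LCR)` performs the same test directly.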
The same results can be displayed graphically as a correlation plot:
mydata %>%
  dplyr::select(DER:OER) %>%
  cor() %>%
  round(3) %>%
  corrplot(method = "color", addCoef.col = "white", type = "upper",
           title = "Correlation - DER/LCR/SIZE/OER",
           mar = c(0, 0, 2, 0),
           tl.cex = 0.5, number.cex = 0.4)
Boxplots are a graphical tool used for displaying and summarizing the distribution of a set of continuous data. They are often used in exploratory data analysis to quickly visualize the distribution of data and to identify any outliers or unusual observations.
Boxplots display five summary statistics for the data: the lower and upper quartiles (the edges of the box), the median (the line inside the box), and the ends of the two whiskers. With the ggplot2 defaults used below, each whisker extends to the most extreme observation within 1.5 times the interquartile range of the box, so the whiskers mark the minimum and maximum only when there are no outliers. The box itself represents the middle 50% of the data, with the lower quartile (25th percentile) at the bottom of the box and the upper quartile (75th percentile) at the top. The length of the box is therefore a measure of the spread of the data, and the median indicates the central tendency.
Outliers, which are observations that fall outside of the whiskers, are typically plotted as individual points. Boxplots can also be used to compare the distributions of multiple groups of data side-by-side, by plotting the boxplots for each group next to each other.
Overall, boxplots are a useful tool for exploring and summarizing the distribution of continuous data, and are widely used in data analysis and visualization.
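To make the boxplot statistics concrete, the following sketch computes them by hand for a small made-up vector (the values in `x` are hypothetical and are not taken from the bank data). It assumes the 1.5 x IQR whisker rule, which is, as far as I know, the default used by `geom_boxplot()`.

```r
# Hypothetical data with one unusually large value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)

# Quartiles and median (the box and the line inside it)
q   <- unname(quantile(x, c(0.25, 0.50, 0.75)))
iqr <- q[3] - q[1]

# Fences: observations beyond them are drawn as individual outlier points
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[3] + 1.5 * iqr
outliers    <- x[x < lower_fence | x > upper_fence]

# Whiskers end at the most extreme observations inside the fences
whisker_low  <- min(x[x >= lower_fence])
whisker_high <- max(x[x <= upper_fence])

print(c(whisker_low = whisker_low, q1 = q[1], median = q[2],
        q3 = q[3], whisker_high = whisker_high))
print(outliers)
```

Here the value 30 lies above the upper fence, so the upper whisker stops at 9 and 30 is reported as an outlier.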
ggplot(mydata, aes(x = BANK, y = DER))+
labs(title = "A Boxplot showing the distribution of DER across banks", y = "DER", x = "BANK")+
geom_boxplot(aes(fill = BANK)) +theme(legend.position="none")+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
ggplot(mydata, aes(x = BANK, y = SIZE))+
labs(title = "A Boxplot showing the distribution of SIZE across banks", y = "SIZE", x = "BANK")+
geom_boxplot(aes(fill = BANK)) +theme(legend.position="none")+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
ggplot(mydata, aes(x = BANK, y = OER))+
labs(title = "A Boxplot showing the distribution of OER across banks", y = "OER", x = "BANK")+
geom_boxplot(aes(fill = BANK)) +theme(legend.position="none")+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5))
ggplot(data = mydata, aes(x = Year, y = DER, color = BANK)) +
geom_line() +
labs(x = "Year", y = "Debt Equity Ratio", color = "BANK") +
theme_classic()
ggplot(data = mydata, aes(x = Year, y = OER, color = BANK)) +
geom_line() +
labs(x = "Year", y = "OER", color = "BANK") + # label matches the plotted variable
theme_classic()
describe(mydata[,3:6])
Fixed effect and random effect models are two commonly used methods in panel data analysis to control for unobserved heterogeneity. The main difference between the two is how they treat the unobserved heterogeneity.
In a fixed effects model, individual-specific fixed effects are included in the regression to capture the time-invariant differences between individuals or entities. The regression coefficients are therefore estimated only from variation within each individual over time, and the model controls for all time-invariant differences between individuals. The fixed effects model assumes that the unobserved heterogeneity is constant over time.
In contrast, a random effects model treats the unobserved heterogeneity as a random draw from a population: individual-specific random effects, which are also constant over time, enter the regression as part of the error term. The random effects model assumes that this unobserved heterogeneity is uncorrelated with the observed explanatory variables and has a constant variance across individuals.
The main advantage of the fixed effect model is that it controls for all time-invariant differences between individuals, even if they are unobserved. This makes it more robust to omitted variable bias and other sources of endogeneity. However, the fixed effect model does not allow for estimation of the coefficients of time-invariant variables.
The main advantage of the random effects model is that it allows for estimation of the coefficients of time-invariant variables. However, the random effects model may suffer from omitted variable bias if the unobserved heterogeneity is correlated with the observed explanatory variables.
A fixed-effect panel regression is a statistical method used to analyze data that contains both time-series and cross-sectional dimensions. In this type of regression, the focus is on identifying the impact of a particular independent variable on a dependent variable while controlling for individual-specific characteristics that are fixed over time, also known as individual fixed effects.
The fixed-effect model controls for unobserved heterogeneity that is specific to each individual entity, such as individual characteristics or unmeasured factors that are constant over time. This is achieved by including a separate intercept term for each individual entity in the regression model. By including individual fixed effects, the model effectively removes the influence of time-invariant factors and focuses solely on the relationship between the independent and dependent variables.
Fixed-effect panel regression is commonly used in economics and social science research, particularly when analyzing data that is collected over multiple periods and involves a large number of individuals or groups. The fixed-effect approach can help to improve the accuracy of the regression analysis by reducing omitted variable bias and other forms of specification errors.
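The claim that a separate intercept per entity removes time-invariant factors can be checked on simulated data. The sketch below uses base R only, and all names and parameter values are made up for illustration. It compares three estimates of the same slope: dummy variables for each unit, the "within" (demeaning) transformation, and pooled OLS, which ignores the unit effects.

```r
set.seed(42)
n_units   <- 5
n_periods <- 8
id <- rep(1:n_units, each = n_periods)

# Unit-specific intercepts that are correlated with the regressor
alpha <- rnorm(n_units, sd = 2)
x <- alpha[id] + rnorm(n_units * n_periods)
y <- alpha[id] + 0.5 * x + rnorm(n_units * n_periods, sd = 0.5)

# (1) Dummy-variable regression: one intercept per unit
lsdv <- lm(y ~ x + factor(id))

# (2) Within transformation: subtract each unit's time average
x_dm <- x - ave(x, id)
y_dm <- y - ave(y, id)
within_fit <- lm(y_dm ~ x_dm - 1)

# (3) Pooled OLS: ignores the unit effects
pooled <- lm(y ~ x)

b_lsdv   <- unname(coef(lsdv)["x"])
b_within <- unname(coef(within_fit)["x_dm"])
b_pooled <- unname(coef(pooled)["x"])
print(c(lsdv = b_lsdv, within = b_within, pooled = b_pooled))
```

The first two estimates are identical up to rounding error, which is why the fixed effects estimator is also called the within estimator. The pooled estimate differs because it also picks up the correlation between `x` and the unit intercepts.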
Panel data gathers information about several individuals (cross-sectional units) over several periods. The panel is balanced if all units are observed in all periods; if some units are missing in some periods, the panel is unbalanced.
A wide panel has a cross-sectional dimension (N) much larger than its longitudinal dimension (T); when the opposite is true, we have a long panel. Normally, the same units are observed in all periods; when this is not the case and each period samples mostly different units, the result is not a proper panel but a pooled cross-sections dataset.
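Whether a panel is balanced can be checked by counting observations per unit and period. The helper `is_balanced()` below is a hypothetical function written for this illustration (it is not part of plm), and the toy data are made up.

```r
# TRUE if every unit is observed exactly once in every period
is_balanced <- function(df, id, time) {
  counts <- table(df[[id]], df[[time]])
  all(counts == 1)
}

# Toy panel: 3 banks observed in 3 years
panel <- expand.grid(BANK = c("A", "B", "C"), Year = 2019:2021)

# Drop one bank-year observation to create an unbalanced panel
unbalanced <- panel[-2, ]

is_balanced(panel, "BANK", "Year")       # TRUE
is_balanced(unbalanced, "BANK", "Year")  # FALSE
```

For a `pdata.frame`, the function `pdim()` from plm reports the same information together with N and T.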
This manual uses the panel data package plm, which also makes it possible to organize the data as a panel. Panel datasets can be organized in two main forms: the long form has a column for each variable and a row for each individual-period; the wide form has a column for each variable-period and a row for each individual. Most panel data methods require the long form, but many data sources provide one wide-form table for each variable; assembling the data from different sources into a long-form data frame is often not a trivial matter.
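As a minimal sketch of the wide-to-long conversion, the example below reshapes a made-up wide table with base R's `reshape()`. The bank names, years, and values are hypothetical; real data with several variables would list one group of columns per variable in `varying`.

```r
# Wide form: one row per bank, one column per variable-period
wide <- data.frame(
  BANK     = c("A", "B"),
  DER_2019 = c(1.0, 2.0),
  DER_2020 = c(1.5, 2.5)
)

# Long form: one row per bank-year, one column per variable
long <- reshape(
  wide,
  direction = "long",
  idvar     = "BANK",
  varying   = c("DER_2019", "DER_2020"),
  v.names   = "DER",
  timevar   = "Year",
  times     = c(2019, 2020)
)
long <- long[order(long$BANK, long$Year), ]
rownames(long) <- NULL
print(long)
```

The resulting data frame has the columns BANK, Year, and DER, which is the layout that `pdata.frame()` expects.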
The next code sequence creates a panel structure for the dataset mydata using the function pdata.frame of the plm package and displays the first rows of the result, followed by the panel dimensions.
head(mydata,5)
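# pdata.frame() expects index = c(<individual>, <time>). As written, Year is
# treated as the individual dimension; use c("BANK", "Year") if banks are the
# cross-sectional units (the estimates reported below may then differ).
mydata_1 <- pdata.frame(mydata, index = c("Year", "BANK"))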
head(mydata_1,5)
pdim(mydata_1)
Balanced Panel: n = 10, T = 4, N = 40
A pooled model has the specification $y_{it} = \beta_1 + \beta_2 x_{2it} + \dots + \beta_K x_{Kit} + e_{it}$, which does not allow for intercept or slope differences among individuals. Such a model can be estimated in R using the option model = "pooling" in the plm() function, as the following code sequence illustrates.
OER.pooled <- plm(OER~DER+LCR+SIZE,
model="pooling", data=mydata_1)
stargazer(OER.pooled,report = "vc*stp",type = "text",out = "./q7results.txt")
========================================
Dependent variable:
---------------------------
OER
----------------------------------------
DER 0.044***
(0.013)
t = 3.270
p = 0.003
LCR -0.050
(0.087)
t = -0.570
p = 0.572
SIZE 0.00000***
(0.00000)
t = 2.821
p = 0.008
Constant 0.284***
(0.065)
t = 4.337
p = 0.0002
----------------------------------------
Observations 40
R2 0.713
Adjusted R2 0.689
F Statistic 29.761*** (df = 3; 36)
========================================
Note: *p<0.1; **p<0.05; ***p<0.01
kable(tidy(OER.pooled), digits=3,
caption="Pooled model")
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 0.284 | 0.065 | 4.337 | 0.000 |
| DER | 0.044 | 0.013 | 3.270 | 0.002 |
| LCR | -0.050 | 0.087 | -0.570 | 0.572 |
| SIZE | 0.000 | 0.000 | 2.821 | 0.008 |
A pooled panel model is a regression that combines cross-sectional and time-series data. The same group of individuals or entities (the panel) is observed over time, and all observations are stacked and analyzed together as a single sample, so the panel structure is ignored: every individual and every period share the same intercept and slopes.
Pooled models are a natural starting point when data are available for several units over time, such as in longitudinal or cohort studies, or when analyzing the effect of policies or interventions over time. They use both the cross-sectional and the time variation in the data, but they do not separate individual-level effects from time-specific effects.
It is important to note that pooled panel models assume that the relationship between the dependent variable and the independent variables is constant across all individuals and over time, and that there is no unobserved heterogeneity among individuals. Therefore, caution should be taken when interpreting the results and alternative models, such as fixed effects or random effects models, should be considered when there is potential for unobserved heterogeneity or individual-level effects.
The model presented is a pooled regression, the simplest panel model, where the dependent variable is OER and the independent variables are DER, LCR, and SIZE. Each coefficient represents the change in OER associated with a one-unit increase in the corresponding independent variable, holding all other variables constant.
The coefficient for DER is 0.044 and is statistically significant at the 1% level (p<0.01): a one-unit increase in DER is associated with an increase in OER of 0.044 units, holding all other variables constant. The coefficient for LCR is -0.050 but is not statistically significant (p = 0.572), so there is no evidence that LCR affects OER.
The coefficient for SIZE is displayed as 0.00000 yet is statistically significant at the 1% level (p<0.01). The estimate is positive but smaller than the five decimal places shown, which most likely reflects the large units in which SIZE is measured rather than a negligible effect; rescaling SIZE (for example, expressing it in millions or taking logs) would make the coefficient easier to read.
The constant term is 0.284, which is statistically significant at the 1% level (p<0.01), indicating that the intercept value of OER is 0.284 when all independent variables are zero. Besides, the R-squared value of 0.713 indicates that the model explains approximately 71.3% of the variation in OER, while the adjusted R-squared value of 0.689 adjusts for the number of independent variables in the model.
The F-statistic of 29.761 is statistically significant at the 1% level (p<0.01), so the independent variables are jointly significant. Overall, the model suggests that DER and SIZE are significantly associated with OER, while LCR is not.
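A coefficient that prints as 0.00000 but is still significant, like the one on SIZE, is usually a matter of units. The simulated sketch below (all numbers are made up, and `size` is a hypothetical variable measured in currency units) shows that dividing a regressor by one million multiplies its coefficient by one million and leaves the t-statistic unchanged.

```r
set.seed(1)
n    <- 40
size <- runif(n, 1e6, 5e7)                    # hypothetical size in currency units
y    <- 0.3 + 2e-9 * size + rnorm(n, sd = 0.02)

fit_raw    <- lm(y ~ size)                    # size in original units
fit_scaled <- lm(y ~ I(size / 1e6))           # size in millions

raw    <- summary(fit_raw)$coefficients[2, ]
scaled <- summary(fit_scaled)$coefficients[2, ]

# The raw estimate rounds to zero at five decimals; the scaled one does not
print(round(unname(raw["Estimate"]), 5))
print(unname(scaled["Estimate"]))
print(c(t_raw = unname(raw["t value"]), t_scaled = unname(scaled["t value"])))
```

Only the presentation changes: the fit, the t-statistic, and the p-value are the same in both regressions.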
tbl <- tidy(coeftest(OER.pooled, vcov=vcovHC(OER.pooled,
type="HC0",cluster="group")))
kable(tbl, digits=5, caption=
"Pooled 'OER' model with cluster robust standard errors")
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 0.28396 | 0.04288 | 6.62290 | 0.00000 |
| DER | 0.04366 | 0.01161 | 3.76144 | 0.00060 |
| LCR | -0.04965 | 0.04524 | -1.09757 | 0.27968 |
| SIZE | 0.00000 | 0.00000 | 2.93246 | 0.00581 |
The fixed effects model takes into account individual differences, translated into different intercepts of the regression line for different individuals. The model in this case assigns the subscript i to the constant term β1, giving β1i; the constant terms estimated in this way are called fixed effects. Additionally, variables that change little or not at all over time, such as some individual characteristics, should not be included in a fixed effects model because they are collinear with the fixed effects.
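In equation form, the fixed effects specification and the "within" transformation can be sketched as follows. The notation is assumed here for illustration: $y_{it}$ is the outcome for individual $i$ in period $t$, $x_{2it}, \dots, x_{Kit}$ are the regressors, and a bar denotes an average over time for one individual. Averaging each individual's equation over time and subtracting it from the original equation removes the individual intercepts $\beta_{1i}$.

```latex
\begin{aligned}
y_{it} &= \beta_{1i} + \beta_2 x_{2it} + \dots + \beta_K x_{Kit} + e_{it} \\
\bar{y}_{i} &= \beta_{1i} + \beta_2 \bar{x}_{2i} + \dots + \beta_K \bar{x}_{Ki} + \bar{e}_{i} \\
y_{it} - \bar{y}_{i} &= \beta_2 (x_{2it} - \bar{x}_{2i}) + \dots + \beta_K (x_{Kit} - \bar{x}_{Ki}) + (e_{it} - \bar{e}_{i})
\end{aligned}
```

A regressor that does not change over time has $x_{kit} - \bar{x}_{ki} = 0$ for every observation, which is why its coefficient cannot be estimated in a fixed effects model.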
model_fe <- plm(OER ~ DER + LCR + SIZE, data = mydata_1, model = "within")
stargazer(model_fe,report = "vc*stp",type = "text",out = "./q7results.txt")
========================================
Dependent variable:
---------------------------
OER
----------------------------------------
DER 0.046**
(0.018)
t = 2.540
p = 0.018
LCR -0.083
(0.165)
t = -0.501
p = 0.621
SIZE 0.00000**
(0.00000)
t = 2.514
p = 0.019
----------------------------------------
Observations 40
R2 0.716
Adjusted R2 0.590
F Statistic 22.723*** (df = 3; 27)
========================================
Note: *p<0.1; **p<0.05; ***p<0.01
OER.within <- plm(OER ~ DER + LCR + SIZE, data = mydata_1,
model="within")
tbl <- tidy(OER.within)
kable(tbl, digits=5, caption=
"Fixed effects using 'within' with full sample")
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| DER | 0.04559 | 0.01795 | 2.54048 | 0.01713 |
| LCR | -0.08253 | 0.16466 | -0.50122 | 0.62028 |
| SIZE | 0.00000 | 0.00000 | 2.51393 | 0.01821 |
kable(tidy(pFtest(OER.within, OER.pooled)), caption=
"Fixed effects test: Ho:'No fixed effects'")
| df1 | df2 | statistic | p.value | method | alternative |
|---|---|---|---|---|---|
| 9 | 27 | 0.5890368 | 0.7944397 | F test for individual effects | significant effects |
With a p-value of 0.794, the null hypothesis of no fixed effects cannot be rejected, so this test gives no evidence against the pooled model.
The random effects model elaborates on the fixed effects model by recognizing that, since the individuals in the panel are randomly selected, their characteristics, measured by the intercept β1i, should also be random. Thus, the random effects model assumes that the intercept has the form $\beta_{1i} = \bar{\beta}_1 + u_i$, where $\bar{\beta}_1$ stands for the population average and $u_i$ represents an individual-specific random term. As in the case of fixed effects, random effects are also time-invariant.
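Writing the random intercept as a population average $\bar{\beta}_1$ plus an individual-specific term $u_i$ and substituting it into the regression gives the error components form sketched below. The symbols $\sigma_u^2$ and $\sigma_e^2$ for the variances of $u_i$ and $e_{it}$ are notation assumed for this illustration, with $u_i$ and $e_{it}$ taken to be uncorrelated.

```latex
\begin{aligned}
y_{it} &= \bar{\beta}_1 + \beta_2 x_{2it} + \dots + \beta_K x_{Kit} + v_{it},
\qquad v_{it} = u_i + e_{it} \\
\operatorname{corr}(v_{it}, v_{is}) &= \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
\qquad \text{for } t \neq s
\end{aligned}
```

Because the same $u_i$ appears in every period, the composite errors of one individual are correlated over time. If $\sigma_u^2 = 0$ this correlation vanishes and the model reduces to the pooled model, which is the null hypothesis of the random effects test.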
OER_ReTest <- plmtest(OER.pooled, effect="individual")
kable(tidy(OER_ReTest), caption=
"A random effects test for the OER equation")
| statistic | p.value | method | alternative |
|---|---|---|---|
| -0.9073556 | 0.8178906 | Lagrange Multiplier Test - (Honda) | significant effects |
The Honda statistic is negative and its p-value is 0.818, so the null hypothesis of no individual random effects cannot be rejected.
Random effects estimators are reliable under the assumption that individual characteristics (heterogeneity) are exogenous, that is, independent of the regressors in the random effects equation. The same Hausman test for endogeneity we have already used in another chapter can be used here as well, with the null hypothesis that individual random effects are exogenous. The test function phtest() compares the fixed effects and the random effects models; the next code lines estimate the random effects model and perform the Hausman endogeneity test.
OER.random <- plm(OER~DER+LCR+SIZE,
data=mydata_1, random.method="swar",
model="random")
kable(tidy(OER.random), digits=4, caption=
"The random effects results for the OER equation")
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 0.2840 | 0.0655 | 4.3374 | 0.0000 |
| DER | 0.0437 | 0.0134 | 3.2700 | 0.0011 |
| LCR | -0.0497 | 0.0870 | -0.5705 | 0.5684 |
| SIZE | 0.0000 | 0.0000 | 2.8211 | 0.0048 |
kable(tidy(phtest(OER.within, OER.random)), caption=
"Hausman endogeneity test for the random effects OER model")
| statistic | p.value | parameter | method | alternative |
|---|---|---|---|---|
| 0.1358947 | 0.9872066 | 3 | Hausman Test | one model is inconsistent |
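The results above show a high p-value (0.987), so the null hypothesis that the individual random effects are exogenous cannot be rejected, and the random effects estimator is consistent. In this case the random effects model is preferred over the fixed effects model. Note, however, that the random effects estimates reported above coincide with the pooled estimates, which is in line with the earlier tests finding no evidence of individual effects.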
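The p-values of the three specification tests can be recovered from the reported statistics, which makes the decision rule explicit. The sketch below copies the statistics and degrees of freedom from the tables above. The reference distributions (F for the fixed effects test, one-sided standard normal for the Honda test, chi-squared for the Hausman test) are my understanding of how plm computes these tests, so treat them as an assumption.

```r
alpha <- 0.05  # conventional significance level

# F test for individual effects: F = 0.5890368 with 9 and 27 df
p_f <- pf(0.5890368, df1 = 9, df2 = 27, lower.tail = FALSE)

# Honda LM test for random effects: one-sided standard normal
p_honda <- pnorm(-0.9073556, lower.tail = FALSE)

# Hausman test: chi-squared with 3 df (one per slope coefficient)
p_hausman <- pchisq(0.1358947, df = 3, lower.tail = FALSE)

p_values <- c(fixed_effects = p_f, honda = p_honda, hausman = p_hausman)
print(round(p_values, 4))

# TRUE means the null hypothesis is rejected at the 5% level
print(p_values < alpha)
```

None of the three null hypotheses is rejected, so the tests find no individual effects and no evidence against the exogeneity assumption of the random effects model.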
The dataset grunfeld2 is a subset of the initial grunfeld dataset; it includes two firms, GE and WE, observed over the period 1935 to 1954. The purpose of this example is to identify various issues that should be taken into account when building a panel data econometric model. The problem is to find the determinants of investment by a firm, $inv_{it}$, among regressors such as the value of the firm, $v_{it}$, and its capital stock, $k_{it}$. The table below gives a glimpse of the grunfeld panel data.
library(devtools)
install_git("https://github.com/ccolonescu/PoEdata")
data("grunfeld2", package="PoEdata")
head(grunfeld2,10)
grun <- pdata.frame(grunfeld2, index=c("firm","year"))
head(grun,10)
Let us consider a pooling model first, assuming that the coefficients of the regression equation, as well as the error variances are the same for both firms (no individual heterogeneity).
grun.pool <- plm(inv~v+k,
model="pooling",data=grun)
kable(tidy(grun.pool), digits=5, caption=
"Grunfeld dataset, pooling panel data results")
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 17.87200 | 7.02408 | 2.54439 | 0.01525 |
| v | 0.01519 | 0.00620 | 2.45191 | 0.01905 |
| k | 0.14358 | 0.01860 | 7.71890 | 0.00000 |
SSE.pool <- sum(resid(grun.pool)^2)
sigma2.pool <- SSE.pool/(grun.pool$df.residual)
sigma2.pool
[1] 447.6487
grun.pool_1 <- plm(inv~v+k,
model="pooling",data=grun)|>
tidy() |>
kable() |>
kable_classic()
grun.pool_1
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 17.8720011 | 7.0240806 | 2.544390 | 0.0152529 |
| v | 0.0151926 | 0.0061962 | 2.451913 | 0.0190508 |
| k | 0.1435792 | 0.0186010 | 7.718900 | 0.0000000 |
Allowing for different coefficients across firms but the same error structure gives the fixed effects model summarized in the results below. Note that the firm-specific coefficients are modeled by interacting the regressors with the factor variable grun$firm.
grun.fe <- plm(inv~v*grun$firm+k*grun$firm,
model="pooling",data=grun)
kable(tidy(grun.fe), digits=4, caption=
"Grunfeld dataset, 'pooling' panel data results")
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -9.9563 | 23.6264 | -0.4214 | 0.6761 |
| v | 0.0266 | 0.0117 | 2.2651 | 0.0300 |
| grun$firm2 | 9.4469 | 28.8054 | 0.3280 | 0.7450 |
| k | 0.1517 | 0.0194 | 7.8369 | 0.0000 |
| v:grun$firm2 | 0.0263 | 0.0344 | 0.7668 | 0.4485 |
| grun$firm2:k | -0.0593 | 0.1169 | -0.5070 | 0.6155 |
SSE.fe <- sum(resid(grun.fe)^2)
sigma2.fe <- SSE.fe/(grun.fe$df.residual)
sigma2.fe
[1] 440.8771
grun.fe_1 <- plm(inv~v*grun$firm+k*grun$firm,
model="pooling",data=grun)|>
tidy() |>
kable() |>
kable_classic()
grun.fe_1
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -9.9563065 | 23.6263647 | -0.4214066 | 0.6761105 |
| v | 0.0265512 | 0.0117220 | 2.2650640 | 0.0299963 |
| grun$firm2 | 9.4469163 | 28.8053507 | 0.3279570 | 0.7449553 |
| k | 0.1516939 | 0.0193564 | 7.8368647 | 0.0000000 |
| v:grun$firm2 | 0.0263429 | 0.0343527 | 0.7668380 | 0.4484701 |
| grun$firm2:k | -0.0592874 | 0.1169464 | -0.5069618 | 0.6154540 |
A test to see if the coefficients are significantly different between the pooling and fixed effects equations can be done in R using the function pooltest from package plm; to perform this test, the fixed effects model should be estimated with the function pvcm with the argument model= “within”, as the next code lines show.
grun.pvcm <- pvcm(inv~v+k,
model="within", data=grun)
coef(grun.pvcm)
pooltest(grun.pool, grun.pvcm)
F statistic
data: inv ~ v + k
F = 1.1894, df1 = 3, df2 = 34, p-value = 0.3284
alternative hypothesis: unstability
The result shows that the null hypothesis of zero coefficients for the individual dummy terms cannot be rejected (p = 0.3284). (However, the pvcm function is not equivalent to the fixed effects model that uses individual dummies; it is, though, useful for testing the 'poolability' of a dataset.)
Now, if we allow for different coefficients and different error variances, the equation for each individual is independent of those for the other individuals and can be estimated separately.
grun1.pool <- plm(inv~v+k, model="pooling",
subset=grun$firm==1, data=grun)
SSE.pool1<- sum(resid(grun1.pool)^2)
sig2.pool1 <- SSE.pool1/grun1.pool$df.residual
kable(tidy(grun1.pool), digits=4, align='c', caption=
"Pooling estimates for the GE firm (firm=1)")
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -9.9563 | 31.3742 | -0.3173 | 0.7548 |
| v | 0.0266 | 0.0156 | 1.7057 | 0.1063 |
| k | 0.1517 | 0.0257 | 5.9015 | 0.0000 |
grun2.pool <- plm(inv~v+k, model="pooling",
subset=grun$firm==2, data=grun)
SSE.pool2 <- sum(resid(grun2.pool)^2)
sig2.pool2 <- SSE.pool2/grun2.pool$df.residual
kable(tidy(grun2.pool), digits=4, align='c', caption=
"Pooling estimates for the WE firm (firm=2)")
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | -0.5094 | 8.0153 | -0.0636 | 0.9501 |
| v | 0.0529 | 0.0157 | 3.3677 | 0.0037 |
| k | 0.0924 | 0.0561 | 1.6472 | 0.1179 |
The tables above show the results for the equations estimated on subsets of the data, separated by firm. A Goldfeld-Quandt test can be carried out to determine whether the variances differ between firms, as the next code shows.
gqtest(grun.pool, point=0.5, alternative="two.sided",
order.by=grun$firm)
Goldfeld-Quandt test
data: grun.pool
GQ = 0.13417, df1 = 17, df2 = 17, p-value = 0.000143
alternative hypothesis: variance changes from segment 1 to 2
The result is a rejection of the null hypothesis that the variances are equal, which supports estimating separate equations for each firm. What happens when we assume that the only link between the two firms is correlation between their contemporaneous error terms? This is the model of seemingly unrelated regressions (SUR), which is estimated by a generalized least squares method.
library(systemfit)
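grunf <- grunfeld2
# Label the firms by name: firm 1 is GE, firm 2 is WE
grunf$Firm <- ifelse(grunf$firm == 1, "GE", "WE")
grunf$firm <- NULL
# Rename the variables so that every name has more than one letter
names(grunf) <- c("inv", "val", "cap", "year", "firm")
# plm.data() is deprecated in newer plm versions in favour of pdata.frame()
grunfpd <- plm.data(grunf, c("firm","year"))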
grunf.SUR <- systemfit(inv~val+cap, method="SUR", data=grunfpd)
summary(grunf.SUR, resdCov=FALSE, equations=FALSE)
systemfit results
method: SUR
N DF SSR detRCov OLS-R2 McElroy-R2
system 40 34 15589.7 35640.6 0.698968 0.615103
N DF SSR MSE RMSE R2 Adj R2
GE 20 17 13788.4 811.081 28.4795 0.692557 0.656388
WE 20 17 1801.3 105.959 10.2936 0.740401 0.709860
The covariance matrix of the residuals used for estimation
GE WE
GE 777.446 207.587
WE 207.587 104.308
The covariance matrix of the residuals
GE WE
GE 811.081 224.278
WE 224.278 105.959
The correlations of the residuals
GE WE
GE 1.000000 0.765043
WE 0.765043 1.000000
Coefficients:
Estimate Std. Error t value Pr(>|t|)
GE_(Intercept) -27.7193171 29.3212188 -0.94537 0.3577155
GE_val 0.0383102 0.0144152 2.65763 0.0165755 *
GE_cap 0.1390363 0.0249856 5.56466 3.4234e-05 ***
WE_(Intercept) -1.2519882 7.5452174 -0.16593 0.8701684
WE_val 0.0576298 0.0145463 3.96182 0.0010072 **
WE_cap 0.0639781 0.0530406 1.20621 0.2442559
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
First, please note that the systemfit() function requires a panel data file created with plm.data, instead of the pdata.frame that we have used above; second, for some reason I had to change the names of the variables to names having more than one letter to make the function work. I did this using the function names().