Main differences between cross-sectional, time series, and panel data regression analysis
• Type of data used
  o Cross-sectional: Analyzes many individuals (people, firms, countries, etc.) at a single point in time.
  o Time series: Analyzes one individual or variable over a period of time.
  o Panel data: Combines both; analyzes multiple individuals across multiple time periods.
• Dimension of analysis
  o Cross-sectional: Has only an individual dimension (N).
  o Time series: Has only a time dimension (T).
  o Panel data: Has both an individual dimension (N) and a time dimension (T) simultaneously.
• Type of variability captured
  o Cross-sectional: Compares differences between individuals.
  o Time series: Examines changes over time.
  o Panel data: Allows analysis of both differences between individuals and changes over time simultaneously.
• Common statistical issues
  o Cross-sectional: May suffer from heteroskedasticity.
  o Time series: May suffer from autocorrelation and non-stationarity problems.
  o Panel data: May involve fixed or random effects, as well as autocorrelation and heteroskedasticity.
• Analytical capacity and depth
  o Cross-sectional: Simpler, but does not capture time dynamics.
  o Time series: Allows analysis of trends and cycles over time.
  o Panel data: More comprehensive because it better controls for individual heterogeneity and improves estimator precision.
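The three data structures can be illustrated with a small sketch. It assumes the plm package is installed and uses the Grunfeld data set that ships with plm (10 firms over 20 years) as a stand-in, because the export panel itself is not reproduced here.

```r
library(plm)

# Stand-in data (assumption): Grunfeld ships with plm, 10 firms x 20 years.
data("Grunfeld", package = "plm")

# Cross-section: many individuals at one point in time (N only).
cross_section <- subset(Grunfeld, year == 1935)

# Time series: one individual followed over time (T only).
time_series <- subset(Grunfeld, firm == 1)

# Panel: declare which columns index the individual and time dimensions.
panel <- pdata.frame(Grunfeld, index = c("firm", "year"))
dims <- pdim(panel)

nrow(cross_section)  # number of firms observed in 1935
nrow(time_series)    # number of years observed for firm 1
dims                 # reports n, T, N and whether the panel is balanced
```

The `pdim()` summary is the same kind of line that appears in the model output below ("Balanced Panel: n = 32, T = 9, N = 288").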
Models
We defined a panel data model where real exports depend on industrial
specialization, salaries, population density, and public investment per
capita.
model_formula <- real_exports ~ lq_secondary +
average_daily_salary + pop_density + real_public_investment_pc
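Written out, the formula corresponds to the equation below. The symbols are our own notation and do not come from the plm output: i indexes the 32 states, t indexes the 9 years, α_i is a state effect, and λ_t is a year effect.

```latex
\begin{aligned}
\text{real\_exports}_{it} ={}& \beta_0
  + \beta_1\,\text{lq\_secondary}_{it}
  + \beta_2\,\text{average\_daily\_salary}_{it} \\
&+ \beta_3\,\text{pop\_density}_{it}
  + \beta_4\,\text{real\_public\_investment\_pc}_{it}
  + \alpha_i + \lambda_t + \varepsilon_{it}
\end{aligned}
```

The models estimated below differ only in how they treat α_i and λ_t. The pooled model sets both to zero, the individual fixed effects model estimates α_i, the time fixed effects model estimates λ_t, the two-way model estimates both, and the random effects model treats α_i as a random draw that is uncorrelated with the regressors.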
We first estimated a pooled OLS model, which ignores differences between
states and years. Three of the four variables were significant
(real_public_investment_pc was not), and the model explained about 62
percent of the variation in exports.
pooling <- plm(model_formula, data = panel_data, model = "pooling")
summary(pooling)
## Pooling Model
##
## Call:
## plm(formula = model_formula, data = panel_data, model = "pooling")
##
## Balanced Panel: n = 32, T = 9, N = 288
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -4.80e+08 -1.11e+08 4.19e+06 0.00e+00 1.09e+08 4.42e+08
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) -1004258854 83563782 -12.0179 < 2.2e-16 ***
## lq_secondary 455153589 28339329 16.0608 < 2.2e-16 ***
## average_daily_salary 2449813 259179 9.4522 < 2.2e-16 ***
## pop_density -49737 11526 -4.3153 2.205e-05 ***
## real_public_investment_pc -37243 25461 -1.4627 0.1447
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 2.2278e+19
## Residual Sum of Squares: 8.3902e+18
## R-Squared: 0.62339
## Adj. R-Squared: 0.61807
## F-statistic: 117.11 on 4 and 283 DF, p-value: < 2.22e-16
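The individual fixed effects model controls for time-invariant
differences between states. None of the variables was statistically
significant, and the within R-squared was below 0.01. The within
R-squared measures fit after state means have been removed, so it is not
directly comparable with the pooled R-squared.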
fe_ind <- plm(model_formula, data = panel_data, model = "within", effect = "individual")
summary(fe_ind)
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = model_formula, data = panel_data, effect = "individual",
## model = "within")
##
## Balanced Panel: n = 32, T = 9, N = 288
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.20e+08 -1.12e+07 -7.81e+04 0.00e+00 8.88e+06 1.51e+08
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## lq_secondary 25034558.1 29583697.1 0.8462 0.3982
## average_daily_salary -17708.3 116039.2 -0.1526 0.8788
## pop_density -21049.3 271104.2 -0.0776 0.9382
## real_public_investment_pc 7340.4 6973.5 1.0526 0.2935
##
## Total Sum of Squares: 3.0639e+17
## Residual Sum of Squares: 3.0385e+17
## R-Squared: 0.0082919
## Adj. R-Squared: -0.12945
## F-statistic: 0.526758 on 4 and 252 DF, p-value: 0.71616
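To see what "controlling for differences between states" does mechanically, the sketch below reproduces the within estimator by hand: each variable is demeaned by individual and then regressed by OLS without an intercept. It uses the Grunfeld data from plm as a stand-in (an assumption, since panel_data is not available here), with firms playing the role of states.

```r
library(plm)

# Stand-in data (assumption): Grunfeld from plm, not the export panel.
data("Grunfeld", package = "plm")

# Individual fixed effects (within) estimator from plm.
fe <- plm(inv ~ value + capital, data = Grunfeld,
          index = c("firm", "year"),
          model = "within", effect = "individual")

# Same estimator by hand: subtract each firm's own mean from every variable.
demean <- function(x, g) x - ave(x, g)
demeaned <- data.frame(
  inv     = demean(Grunfeld$inv,     Grunfeld$firm),
  value   = demean(Grunfeld$value,   Grunfeld$firm),
  capital = demean(Grunfeld$capital, Grunfeld$firm)
)

# OLS on the demeaned data, without an intercept.
ols <- lm(inv ~ 0 + value + capital, data = demeaned)

# The slope estimates coincide up to numerical precision.
cbind(plm_within = coef(fe), demeaned_ols = coef(ols))
```

Because only deviations from each individual's mean are used, regressors that vary mostly between individuals and little over time leave the within estimator with very little variation to work with.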
The time fixed effects model controls for differences between years that
are common to all states. All four variables were significant, and this
model had the highest R-squared (about 0.67).
fe_time <- plm(model_formula, data = panel_data, model = "within", effect = "time")
summary(fe_time)
## Oneway (time) effect Within Model
##
## Call:
## plm(formula = model_formula, data = panel_data, effect = "time",
## model = "within")
##
## Balanced Panel: n = 32, T = 9, N = 288
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.85e+08 -9.29e+07 1.67e+06 0.00e+00 1.12e+08 4.15e+08
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## lq_secondary 435521531 27192797 16.0161 < 2.2e-16 ***
## average_daily_salary 3233824 280159 11.5428 < 2.2e-16 ***
## pop_density -63991 11248 -5.6889 3.268e-08 ***
## real_public_investment_pc -76084 25687 -2.9620 0.003324 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 2.2211e+19
## Residual Sum of Squares: 7.3903e+18
## R-Squared: 0.66727
## Adj. R-Squared: 0.65275
## F-statistic: 137.876 on 4 and 275 DF, p-value: < 2.22e-16
The two-way fixed effects model controls for both state and year
effects. Only real_public_investment_pc was significant, and only at the
5 percent level (p = 0.049); the within R-squared was about 0.02.
fe_tw <- plm(model_formula, data = panel_data, model = "within", effect = "twoways")
summary(fe_tw)
## Twoways effects Within Model
##
## Call:
## plm(formula = model_formula, data = panel_data, effect = "twoways",
## model = "within")
##
## Balanced Panel: n = 32, T = 9, N = 288
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.04e+08 -1.43e+07 -1.69e+06 0.00e+00 1.59e+07 1.29e+08
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## lq_secondary 13539301.8 26702971.7 0.5070 0.61259
## average_daily_salary -28770.2 229440.3 -0.1254 0.90032
## pop_density -286037.4 252623.3 -1.1323 0.25863
## real_public_investment_pc 13038.5 6598.8 1.9759 0.04929 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 2.3933e+17
## Residual Sum of Squares: 2.3375e+17
## R-Squared: 0.023305
## Adj. R-Squared: -0.14882
## F-statistic: 1.4555 on 4 and 244 DF, p-value: 0.21642
The random effects model assumes that unobserved state differences are
not correlated with the independent variables. Only lq_secondary was
significant, and the fit (R-squared of about 0.04) was much weaker than
in the time fixed effects model.
re <- plm(model_formula, data = panel_data, model = "random")
summary(re)
## Oneway (individual) effect Random Effect Model
## (Swamy-Arora's transformation)
##
## Call:
## plm(formula = model_formula, data = panel_data, model = "random")
##
## Balanced Panel: n = 32, T = 9, N = 288
##
## Effects:
## var std.dev share
## idiosyncratic 1.206e+15 3.472e+07 0.044
## individual 2.602e+16 1.613e+08 0.956
## theta: 0.9284
##
## Residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -80456012 -18157289 -7849908 0 10480115 203393118
##
## Coefficients:
## Estimate Std. Error z-value Pr(>|z|)
## (Intercept) 139354203.5 56498460.8 2.4665 0.013644 *
## lq_secondary 86780057.3 30093475.3 2.8837 0.003931 **
## average_daily_salary 31246.0 116089.7 0.2692 0.787811
## pop_density -28431.3 29383.7 -0.9676 0.333251
## real_public_investment_pc 7097.1 7531.2 0.9424 0.346009
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 4.1893e+17
## Residual Sum of Squares: 4.0313e+17
## R-Squared: 0.037702
## Adj. R-Squared: 0.024101
## Chisq: 11.0878 on 4 DF, p-value: 0.025595
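We used F tests to check whether fixed effects are necessary, comparing
each fixed effects model with the pooled model. All three tests reject
the null hypothesis of no effects, so both state and time effects are
important.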
##
## F test for individual effects
##
## data: model_formula
## F = 216.34, df1 = 31, df2 = 252, p-value < 2.2e-16
## alternative hypothesis: significant effects
##
## F test for time effects
##
## data: model_formula
## F = 4.6511, df1 = 8, df2 = 275, p-value = 2.384e-05
## alternative hypothesis: significant effects
##
## F test for twoways effects
##
## data: model_formula
## F = 218.31, df1 = 39, df2 = 244, p-value < 2.2e-16
## alternative hypothesis: significant effects
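The code that produced these tests is not shown in the report. One plausible way to obtain output of this form is `pFtest()` from plm, which compares a within model against the pooled model. The sketch below is an assumption about the approach, run on the Grunfeld stand-in data rather than on panel_data.

```r
library(plm)

# Stand-in data (assumption): Grunfeld from plm.
data("Grunfeld", package = "plm")
f   <- inv ~ value + capital
idx <- c("firm", "year")

pooled <- plm(f, data = Grunfeld, index = idx, model = "pooling")
fe_i   <- plm(f, data = Grunfeld, index = idx, model = "within", effect = "individual")
fe_t   <- plm(f, data = Grunfeld, index = idx, model = "within", effect = "time")
fe_2   <- plm(f, data = Grunfeld, index = idx, model = "within", effect = "twoways")

# Null hypothesis in each case: the pooled model is adequate (no effects).
test_ind  <- pFtest(fe_i, pooled)  # individual effects
test_time <- pFtest(fe_t, pooled)  # time effects
test_two  <- pFtest(fe_2, pooled)  # individual and time effects jointly

test_ind
test_time
test_two
```

With the report's objects, the analogous calls would be `pFtest(fe_ind, pooling)`, `pFtest(fe_time, pooling)`, and `pFtest(fe_tw, pooling)`.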
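The Hausman test compares the fixed and random effects estimates. It
strongly rejects the null hypothesis that both estimators are consistent
(chisq = 729.9, p-value < 2.2e-16), which suggests that fixed effects
are more appropriate.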
##
## Hausman Test
##
## data: model_formula
## chisq = 729.9, df = 4, p-value < 2.2e-16
## alternative hypothesis: one model is inconsistent
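The call behind this output is also not shown. In plm, a Hausman test is usually run with `phtest()` on a within model and a random effects model. The following is a minimal sketch on the Grunfeld stand-in data, not the report's actual code.

```r
library(plm)

# Stand-in data (assumption): Grunfeld from plm.
data("Grunfeld", package = "plm")
f   <- inv ~ value + capital
idx <- c("firm", "year")

fe_model <- plm(f, data = Grunfeld, index = idx, model = "within")
re_model <- plm(f, data = Grunfeld, index = idx, model = "random")

# Null hypothesis: both estimators are consistent, so random effects is preferred.
# A small p-value points toward fixed effects instead.
hausman <- phtest(fe_model, re_model)
hausman
```

With the report's objects, the equivalent call would presumably be `phtest(fe_ind, re)`, which has the 4 degrees of freedom shown above (one per regressor).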
We checked for multicollinearity using variance inflation factors (VIF).
All VIF values were below 1.5, so there is no serious multicollinearity
problem.
vif(lm(model_formula, data = panel_data))
## lq_secondary average_daily_salary pop_density
## 1.112735 1.398501 1.460673
## real_public_investment_pc
## 1.054705
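For reference, the VIF of a regressor is 1 / (1 - R²), where R² comes from regressing that variable on the other regressors. The sketch below computes it by hand with base R on the built-in mtcars data (a stand-in chosen only so the example runs without panel_data). The `vif()` call above presumably comes from the car package, which the report does not load explicitly.

```r
# Stand-in regressors (assumption): three columns of the built-in mtcars data.
X <- mtcars[, c("wt", "hp", "disp")]

# For each regressor, regress it on the others and convert R-squared to a VIF.
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

vif_manual
```

A VIF of 1 means a regressor is uncorrelated with the others; values near 1, like those reported above, indicate very little shared variation.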
The Breusch-Pagan test rejects homoskedasticity, and the
Breusch-Godfrey/Wooldridge test rejects the absence of serial
correlation in the errors. This means we should use robust standard
errors.
##
## studentized Breusch-Pagan test
##
## data: fe_time
## BP = 108.78, df = 4, p-value < 2.2e-16
##
## Breusch-Godfrey/Wooldridge test for serial correlation in panel models
##
## data: model_formula
## chisq = 187.28, df = 9, p-value < 2.2e-16
## alternative hypothesis: serial correlation in idiosyncratic errors
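The test calls are not shown in the report. A hedged sketch of how such diagnostics are commonly run follows, using `bptest()` from lmtest and `pbgtest()` from plm on the Grunfeld stand-in data. Writing the time effects as `factor(year)` dummies for the Breusch-Pagan test is our own choice here, not necessarily what the report did.

```r
library(plm)
library(lmtest)

# Stand-in data (assumption): Grunfeld from plm.
data("Grunfeld", package = "plm")

fe_t <- plm(inv ~ value + capital, data = Grunfeld,
            index = c("firm", "year"),
            model = "within", effect = "time")

# Breusch-Pagan test (studentized by default) on the equivalent
# dummy-variable regression with year effects.
bp <- bptest(inv ~ value + capital + factor(year), data = Grunfeld)

# Breusch-Godfrey/Wooldridge test for serial correlation in the panel model.
bg <- pbgtest(fe_t)

bp
bg
```

Small p-values in either test indicate that the conventional standard errors are unreliable.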
We re-estimated the time fixed effects model using robust standard
errors to correct for heteroskedasticity and serial correlation. The
coefficient estimates are unchanged, but the standard errors are roughly
two to four times larger. After this correction, lq_secondary,
average_daily_salary, and pop_density remained statistically
significant. However, real_public_investment_pc was no longer
statistically significant (p = 0.149). The main results therefore hold
under robust standard errors.
coeftest(fe_time, vcov = vcovHC(fe_time, method = "arellano", type = "HC1"))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## lq_secondary 435521531 100305923 4.3419 1.987e-05 ***
## average_daily_salary 3233824 906833 3.5661 0.0004273 ***
## pop_density -63991 22148 -2.8892 0.0041691 **
## real_public_investment_pc -76084 52580 -1.4470 0.1490316
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
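The effect of the correction is easiest to see by placing the conventional and robust standard errors side by side. The sketch below does this on the Grunfeld stand-in data (an assumption, not the export panel), and spells out `cluster = "group"`, which is the default clustering for `vcovHC()` on plm models.

```r
library(plm)
library(lmtest)

# Stand-in data (assumption): Grunfeld from plm.
data("Grunfeld", package = "plm")

fe_t <- plm(inv ~ value + capital, data = Grunfeld,
            index = c("firm", "year"),
            model = "within", effect = "time")

# Conventional standard errors.
conventional <- coeftest(fe_t)

# Arellano-type robust standard errors, clustered by individual.
robust <- coeftest(
  fe_t,
  vcov = vcovHC(fe_t, method = "arellano", type = "HC1", cluster = "group")
)

# Point estimates are identical; only the standard errors change.
cbind(
  estimate        = conventional[, "Estimate"],
  conventional_se = conventional[, "Std. Error"],
  robust_se       = robust[, "Std. Error"]
)
```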
Results
library(plm)
models <- list(
Pooling = pooling,
FE_Individual = fe_ind,
FE_TwoWays = fe_tw,
RE = re,
FE_Time = fe_time
)
comparison <- data.frame(
Model = names(models),
R2 = sapply(models, function(m) summary(m)$r.squared["rsq"]),
Adj_R2 = sapply(models, function(m) summary(m)$r.squared["adjrsq"])
)
comparison
## Model R2 Adj_R2
## Pooling.rsq Pooling 0.623389824 0.61806671
## FE_Individual.rsq FE_Individual 0.008291903 -0.12944533
## FE_TwoWays.rsq FE_TwoWays 0.023304595 -0.14881796
## RE.rsq RE 0.037702199 0.02410082
## FE_Time.rsq FE_Time 0.667272482 0.65275346
According to the time fixed effects model with robust standard
errors, the main driver of Mexico’s export inflows is industrial
specialization (lq_secondary). States with a stronger secondary sector
tend to have significantly higher export levels. This suggests that
manufacturing and industrial concentration play a key role in export
performance.
Average daily salary is also positive and statistically significant.
This may indicate that more productive and developed states, which pay
higher wages, are also more competitive in international markets.
Population density has a negative and significant effect. This could
suggest congestion effects or structural differences in highly populated
states that reduce export performance.
Real public investment per capita was not statistically significant
after using robust errors. Therefore, public investment does not appear
to have a clear direct effect on export inflows in this model.
In summary, the main drivers of exports are industrial specialization
and wage levels, while high population density may represent a
structural challenge for export growth.
Elasticity
This is a log-log model, so the coefficients can be read as elasticities.
Industrial specialization has an elasticity of 2.69 and is highly
significant. A 1 percent increase in industrial concentration is
associated with about 2.7 percent higher exports. This is the main
driver of export inflows.
Average daily salary has an elasticity of 6.74 and is also
significant. Higher wages are associated with higher exports.
Population density and public investment are not statistically
significant.
The model explains about 67 percent of export variation.
elasticity_formula <- log(real_exports) ~ log(lq_secondary) +
log(average_daily_salary) +
log(pop_density) +
log(real_public_investment_pc)
fe_time_elasticity <- plm(
elasticity_formula,
data = panel_data,
model = "within",
effect = "time"
)
summary(fe_time_elasticity)
## Oneway (time) effect Within Model
##
## Call:
## plm(formula = elasticity_formula, data = panel_data, effect = "time",
## model = "within")
##
## Balanced Panel: n = 32, T = 9, N = 288
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -2.378580 -0.747761 -0.019808 0.786587 2.443928
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## log(lq_secondary) 2.688744 0.159687 16.8375 <2e-16 ***
## log(average_daily_salary) 6.744763 0.535786 12.5885 <2e-16 ***
## log(pop_density) -0.048301 0.050077 -0.9645 0.3356
## log(real_public_investment_pc) 0.013008 0.081292 0.1600 0.8730
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 896.68
## Residual Sum of Squares: 296.84
## R-Squared: 0.66895
## Adj. R-Squared: 0.65451
## F-statistic: 138.925 on 4 and 275 DF, p-value: < 2.22e-16
Industrial specialization is the main driver of exports. A 1 percent
increase in industrial concentration is associated with about 2.7
percent higher exports. This shows that manufacturing strength is very
important for Mexico’s export inflows.
Average daily salary also has a strong positive association. A 1 percent
increase in wages is associated with about 6.7 percent higher exports.
This suggests that more productive and developed states export more.
Population density and public investment are not statistically
significant, so they do not show a clear impact on export inflows in
this model.
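Reading an elasticity as "percent change per 1 percent change" is a first-order approximation. The exact change implied by a log-log coefficient β for a 1 percent increase in the regressor is (1.01^β - 1) × 100. The short calculation below applies this to the two significant coefficients from the output above; the variable names are our own.

```r
# Significant coefficients copied from the log-log time fixed effects output.
elasticities <- c(lq_secondary = 2.688744, average_daily_salary = 6.744763)

# First-order approximation: percent change in exports per 1 percent increase.
approx_pct <- elasticities

# Exact percent change implied by the log-log form.
exact_pct <- (1.01^elasticities - 1) * 100

round(cbind(approx_pct, exact_pct), 2)
```

For a 1 percent change the two readings are close (roughly 2.7 and 6.9 percent on the exact scale), but the gap widens with larger changes and larger elasticities, so the approximation should not be extrapolated to, say, a 20 percent wage increase.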