For each key variable, report: 1. Economic magnitude 2. Direction (positive or negative) 3. Expected effect (note when the expected sign is negative) 4. Statistical significance: t = Beta1 / (standard error of Beta1), tested against the null hypothesis of no effect (Beta1 = 0).
# Set up environment
# Clear workspace
rm(list = ls())
gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 578798 31.0 1323511 70.7 660394 35.3
Vcells 1049087 8.1 8388608 64.0 1769558 13.6
# Clear the console
cat("\f")
# Clear all graphs
graphics.off()
# Identify components
entity_component <- "firm"
time_component <- "year"
1. Please choose any panel data (can check out inbuilt R datasets or from the AER package), and show/tell if your data is balanced or not. What is the time component and the entity component in the data?
We want to check if the dataset is balanced and identify the time and entity components
library(car)
library(AER)
library(plm)
library(stargazer)
library(dplyr)
library(psych)
library(naniar)
library(visdat)
library(VIM)
library(DataExplorer)
library(magrittr)
library(pisaRT)
=============================================
Statistic   N     Mean   St. Dev.   Min     Max
---------------------------------------------
firm       200      5.5      2.9      1      10
year       200  1,944.5      5.8  1,935   1,954
inv        200    146.0    216.9    0.9 1,486.7
value      200  1,081.7  1,314.5   58.1 6,241.7
capital    200    276.0    301.1    0.8 2,226.3
---------------------------------------------
Balance Check
# Check if the dataset is balanced
counts <- Grunfeld %>%
  group_by(firm) %>%
  summarise(n_time = n_distinct(year))
is_balanced <- all(counts$n_time == counts$n_time[1])
cat("Is the dataset balanced? T/F:", is_balanced, "\n")
Is the dataset balanced? T/F: TRUE
Balanced Panel: A panel is balanced if every entity (in this case, each firm) is observed in every time period (year). Here each of the 10 firms has 20 annual observations (1935 to 1954), giving 200 observations in total, so the panel is balanced.
Time and Entity Components: For the Grunfeld dataset, the time component is year, and the entity component is firm.
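The count-based check above only confirms that every firm has the same number of distinct years. As a cross-check, here is a minimal sketch using plm's own helpers, assuming the Grunfeld data in use is the copy shipped with plm (10 firms, 20 years) rather than the AER version. To my understanding, `is.pbalanced()` looks at every firm-year cell, so it is a slightly stricter test.

```r
library(plm)

# Assumption: Grunfeld is the version bundled with plm
# (columns firm, year, inv, value, capital).
data("Grunfeld", package = "plm")

# TRUE if the firm-by-year panel has no missing cells
balanced <- is.pbalanced(Grunfeld, index = c("firm", "year"))

# Panel dimensions: entities (n), time periods (T), observations (N)
dims <- pdim(Grunfeld, index = c("firm", "year"))

print(balanced)
print(dims)
```

The `index` argument names the entity component first and the time component second, which matches the firm/year choice stated above.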
2. Type out meaningful estimating equation and run the OLS regression/estimate the coefficients.
Do the estimated coefficients make sense (direction, magnitude, statistical significance)? Could there be omitted variable bias that could potentially be reduced by throwing in fixed effects?
Direction: The coefficient on value is 0.141, indicating a positive relationship with investment.
Magnitude: A one-unit increase in value is associated with an increase in investment of approximately 0.141 units.
Statistical Significance: The p-value is < 2e-16, so we reject the null hypothesis of no effect at any conventional significance level; value is a strong predictor of investment.
Omitted Variable Bias: Unobserved factors, such as firm characteristics or economic conditions, could influence both investment and value. Including fixed effects could help mitigate this bias.
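The OLS call itself is not shown in this write-up. A minimal sketch of how such a regression could be run is below. It assumes plm's Grunfeld data and a specification with value as the only regressor, since only the value coefficient is discussed; if capital was also included, the formula would be `inv ~ value + capital` instead.

```r
# Assumption: Grunfeld is the version bundled with plm
data("Grunfeld", package = "plm")

# Assumed specification: investment regressed on firm value only
ols_model <- lm(inv ~ value, data = Grunfeld)

# Coefficient table: estimate, standard error, t value, p-value
ols_table <- summary(ols_model)$coefficients
print(ols_table)

# The t statistic is the estimate divided by its standard error
t_value <- ols_table["value", "Estimate"] / ols_table["value", "Std. Error"]
print(t_value)
```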
Summary
The OLS results suggest that higher value is associated with increased investment, with a significant and positive effect. However, potential omitted variable bias indicates that fixed effects modeling might provide more accurate estimates.
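The estimating equation behind this discussion can be written out as follows. This is a sketch that assumes value is the only regressor; the symbols $\beta_0$, $\beta_1$, and $u_{it}$ are notation introduced here, with $i$ indexing firms and $t$ indexing years.

```latex
\begin{aligned}
inv_{it} &= \beta_0 + \beta_1 \, value_{it} + u_{it} \\
H_0 &: \beta_1 = 0, \qquad t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}
\end{aligned}
```

Pooled OLS treats all 200 firm-year observations as one sample, so any firm-specific factor left in $u_{it}$ that is correlated with value would bias $\hat{\beta}_1$.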
3. Now, run a fixed effects model (there are three different ways to do so). Type out the estimating equation (pay attention to the subscript).
Do your coefficients change? Why or why not? Tell us what the fixed effects are controlling for (time-invariant characteristics of the entity, or time-varying characteristics affecting all entities, or both, based on your specification). It is common to include both time and entity fixed effects in many applications in Economics. Do you get the same coefficient if you specify the fixed effect in an alternative way? Show (or at least argue).
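For the firm fixed effects specification estimated below, the estimating equation can be sketched as follows. The symbols $\alpha_i$ (firm fixed effect), $\lambda_t$ (year fixed effect), and $u_{it}$ are notation introduced here; $i$ indexes firms and $t$ indexes years.

```latex
\begin{aligned}
\text{Firm FE:} \quad & inv_{it} = \beta_1 \, value_{it} + \beta_2 \, capital_{it} + \alpha_i + u_{it} \\
\text{Within form:} \quad & (inv_{it} - \overline{inv}_i) = \beta_1 (value_{it} - \overline{value}_i) + \beta_2 (capital_{it} - \overline{capital}_i) + (u_{it} - \bar{u}_i) \\
\text{Firm and year FE:} \quad & inv_{it} = \beta_1 \, value_{it} + \beta_2 \, capital_{it} + \alpha_i + \lambda_t + u_{it}
\end{aligned}
```

The subscript matters: $\alpha_i$ varies across firms but not over time, so subtracting firm means removes it, while $\lambda_t$ varies over years but is common to all firms.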
# Convert to panel data frame
pdata <- pdata.frame(Grunfeld, index = c("firm", "year"))
# Run fixed effects model
fe_model <- plm(inv ~ value + capital, data = pdata, model = "within")
summary(fe_model)
Oneway (individual) effect Within Model
Call:
plm(formula = inv ~ value + capital, data = pdata, model = "within")
Balanced Panel: n = 10, T = 20, N = 200
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-184.00857 -17.64316 0.56337 19.19222 250.70974
Coefficients:
Estimate Std. Error t-value Pr(>|t|)
value 0.110124 0.011857 9.2879 < 2.2e-16 ***
capital 0.310065 0.017355 17.8666 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Total Sum of Squares: 2244400
Residual Sum of Squares: 523480
R-Squared: 0.76676
Adj. R-Squared: 0.75311
F-statistic: 309.014 on 2 and 188 DF, p-value: < 2.22e-16
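As a check on the reported t-values, each one is the estimate divided by its standard error, using the numbers printed in the coefficient table above:

```latex
t_{value} = \frac{0.110124}{0.011857} \approx 9.29,
\qquad
t_{capital} = \frac{0.310065}{0.017355} \approx 17.87
```

Both are far above conventional critical values, so the null hypothesis of no effect is rejected for each variable.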
# Fixed Effects with dummy variables for each firm
fe_model_dummy <- lm(inv ~ value + capital + factor(firm), data = Grunfeld)
summary(fe_model_dummy)
Call:
lm(formula = inv ~ value + capital + factor(firm), data = Grunfeld)
Residuals:
Min 1Q Median 3Q Max
-184.009 -17.643 0.563 19.192 250.710
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -70.29672 49.70796 -1.414 0.159
value 0.11012 0.01186 9.288 < 2e-16 ***
capital 0.31007 0.01735 17.867 < 2e-16 ***
factor(firm)2 172.20253 31.16126 5.526 1.08e-07 ***
factor(firm)3 -165.27512 31.77556 -5.201 5.14e-07 ***
factor(firm)4 42.48742 43.90988 0.968 0.334
factor(firm)5 -44.32010 50.49226 -0.878 0.381
factor(firm)6 47.13542 46.81068 1.007 0.315
factor(firm)7 3.74324 50.56493 0.074 0.941
factor(firm)8 12.75106 44.05263 0.289 0.773
factor(firm)9 -16.92555 48.45327 -0.349 0.727
factor(firm)10 63.72887 50.33023 1.266 0.207
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 52.77 on 188 degrees of freedom
Multiple R-squared: 0.9441, Adjusted R-squared: 0.9408
F-statistic: 288.5 on 11 and 188 DF, p-value: < 2.2e-16
# Demean the data
Grunfeld_demeaned <- Grunfeld %>%
  group_by(firm) %>%
  mutate(inv_demeaned = inv - mean(inv),
         value_demeaned = value - mean(value),
         capital_demeaned = capital - mean(capital))
# Run regression on demeaned data
fe_model_demeaned <- lm(inv_demeaned ~ value_demeaned + capital_demeaned, data = Grunfeld_demeaned)
summary(fe_model_demeaned)
Call:
lm(formula = inv_demeaned ~ value_demeaned + capital_demeaned,
data = Grunfeld_demeaned)
Residuals:
Min 1Q Median 3Q Max
-184.009 -17.643 0.563 19.192 250.710
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.541e-15 3.645e+00 0.000 1
value_demeaned 1.101e-01 1.158e-02 9.508 <2e-16 ***
capital_demeaned 3.101e-01 1.695e-02 18.289 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 51.55 on 197 degrees of freedom
Multiple R-squared: 0.7668, Adjusted R-squared: 0.7644
F-statistic: 323.8 on 2 and 197 DF, p-value: < 2.2e-16
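The demeaned regression reproduces the coefficients, but its standard errors are slightly smaller (0.01158 versus 0.01186 for value) because `lm()` reports 197 residual degrees of freedom and does not count the 10 estimated firm means. The sketch below, which assumes plm's Grunfeld data and uses a hypothetical helper `dm()` for demeaning, rescales the standard errors by the ratio of degrees of freedom and compares them with plm's.

```r
library(plm)

# Assumption: Grunfeld is the version bundled with plm
data("Grunfeld", package = "plm")

# Within estimator via plm
fe_plm <- plm(inv ~ value + capital, data = Grunfeld,
              index = c("firm", "year"), model = "within")

# Hypothetical helper: subtract the group mean from each observation
dm <- function(x, g) x - ave(x, g)
demeaned <- data.frame(
  inv     = dm(Grunfeld$inv, Grunfeld$firm),
  value   = dm(Grunfeld$value, Grunfeld$firm),
  capital = dm(Grunfeld$capital, Grunfeld$firm)
)
fe_dm <- lm(inv ~ value + capital, data = demeaned)

# lm() uses N - K - 1 residual df; the within model uses N - n - K
n_firms <- length(unique(Grunfeld$firm))
df_lm <- df.residual(fe_dm)
df_fe <- df_lm + 1 - n_firms

# Rescale the lm() standard errors to the within-model degrees of freedom
se_dm  <- summary(fe_dm)$coefficients[c("value", "capital"), "Std. Error"]
se_adj <- se_dm * sqrt(df_lm / df_fe)
se_plm <- summary(fe_plm)$coefficients[, "Std. Error"]

print(rbind(unadjusted = se_dm, adjusted = se_adj, plm = se_plm))
```

With the output above, the correction factor is sqrt(197 / 188), about 1.024, which is enough to explain the gap between the two sets of standard errors.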
# Comparison of the fixed effects models
stargazer(fe_model, fe_model_dummy, fe_model_demeaned,
          type = "text",
          title = "Comparison of Fixed Effects Models",
          column.labels = c("Fixed Effects (plm)", "Fixed Effects (Dummy)", "Demeaned Fixed Effects"),
          dep.var.labels = "Investment")
Direction: Both coefficients are positive, so higher value and a larger capital stock are each associated with higher investment.
Expected Effect: We expect a positive effect on investment from both value and capital, and the estimates agree with that expectation.
Magnitude: Within a firm, a one-unit increase in value is associated with about 0.110 more units of investment, and a one-unit increase in capital with about 0.310 more units.
Statistical Significance: Both coefficients are statistically significant (p < 0.001), so we reject the null hypothesis of no effect for each.
Coefficient Changes: The coefficient on value is 0.141 in the OLS model and 0.110 in the fixed effects model. It becomes smaller but keeps its sign and significance.
Fixed Effects Control: This specification includes firm (entity) fixed effects only, which control for time-invariant characteristics of each firm. It does not control for time-varying shocks that affect all firms; that would require adding year fixed effects.
Comparison of Models: The three fixed effects approaches (the plm within estimator, firm dummy variables, and manual demeaning) give the same coefficients: 0.110 on value and 0.310 on capital.
The coefficients change when moving from OLS to fixed effects because the fixed effects model controls for unobserved, time-invariant characteristics of the firms, which can cause omitted variable bias in OLS when they are correlated with the regressors. The within transformation removes these constant factors, so the estimates rely only on variation within each firm over time. The alternative ways of specifying the fixed effects yield identical coefficients, since the dummy variable regression and the demeaned regression are algebraically the same estimator as plm's within model. Only the standard errors of the manually demeaned regression differ slightly (0.01158 versus 0.01186 for value), because lm() reports 197 residual degrees of freedom and does not account for the 10 estimated firm means, whereas the other two models use 188.
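The models above include firm fixed effects only. Since the question notes that both entity and time fixed effects are common, here is a minimal sketch of how a two-way specification could be estimated with plm's `effect` argument, assuming plm's Grunfeld data. It is an illustration of the option, not a result reported in this write-up.

```r
library(plm)

# Assumption: Grunfeld is the version bundled with plm
data("Grunfeld", package = "plm")

# Firm fixed effects only (the specification used above)
fe_firm <- plm(inv ~ value + capital, data = Grunfeld,
               index = c("firm", "year"),
               model = "within", effect = "individual")

# Firm and year fixed effects
fe_twoway <- plm(inv ~ value + capital, data = Grunfeld,
                 index = c("firm", "year"),
                 model = "within", effect = "twoways")

# Side-by-side coefficients for the two specifications
print(cbind(firm_fe = coef(fe_firm), twoway_fe = coef(fe_twoway)))
```

Any difference between the two columns would reflect year-specific shocks common to all firms, which the firm-only model leaves in the error term.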