| Method | Process | Manual calculation | Assumptions | Advantages | Disadvantages | Parameter testing |
|---|---|---|---|---|---|---|
| OLS | Minimizes the sum of squared residuals to estimate parameters. | | Linearity, no perfect multicollinearity, homoscedasticity, uncorrelated errors, normally distributed errors (for inference). | Simple, easy to interpret; BLUE under the Gauss-Markov assumptions. | Sensitive to outliers and multicollinearity; inefficient under heteroscedasticity or autocorrelation. | Use `regress`. Coefficients are interpreted as the effect of a one-unit change in the regressor. Test with t-tests and assess significance with p-values. |
| GLS | Adjusts for heteroscedasticity or autocorrelation by transforming the data or using weights. | | Correctly specified structure of heteroscedasticity or autocorrelation. | Efficient under heteroscedasticity or autocorrelation. | Requires knowledge or estimation of the error structure; loses efficiency and gives unreliable standard errors if that structure is misspecified. | Use `prais` or `xtgls`. Test as with OLS, but interpret results in the context of the transformed model. |
| 2SLS | Stage 1: regress the endogenous variables on the instruments. Stage 2: use the fitted values in the main regression. | | Instruments must be relevant (correlated with the endogenous variables) and exogenous (uncorrelated with the errors). | Corrects endogeneity bias; consistent under valid instruments. | Sensitive to weak instruments; relies on instrument validity. | Use `ivregress 2sls`. Test instrument relevance with first-stage F-statistics (`estat firststage`) and test overidentifying restrictions with `estat overid` (Sargan test after 2SLS; Hansen J-test after GMM). |
| GMM | Minimizes a weighted quadratic form in the sample moment conditions derived from the data and model. | | Valid moment conditions; correct specification of the weighting matrix. | Flexible under heteroscedasticity or autocorrelation; handles overidentified models. | Computationally intensive; sensitive to the choice of weighting matrix. | Use `gmm` or `xtabond2` (user-written). Test overidentifying restrictions with the Hansen J-test. Interpret coefficients in light of the moment conditions used. |
| GME | Maximizes entropy subject to constraints (data and prior information). | | Carefully chosen constraints; intended for small samples or ill-posed problems. | Handles multicollinearity and small samples; incorporates prior information. | Relatively uncommon; computationally demanding; interpretation depends on entropy weights. | Limited support in Stata. Often requires external packages or manual programming. Coefficients depend on the entropy constraints and weights. |
| MLE | Maximizes the likelihood function of the data given the model. | | Correct specification of the likelihood function; errors are i.i.d. | Asymptotically efficient and consistent; flexible for non-linear models. | Sensitive to a misspecified likelihood; computationally intensive. | Use `ml`, `logit`, or `probit`. Compare nested models with likelihood-ratio (LR) tests on the log-likelihood values (`lrtest`). Interpret coefficients in terms of the fitted likelihood model (for example, log-odds in `logit`). |
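
The two stages in the 2SLS row can be sketched by hand for the simplest case of one endogenous regressor and one instrument. This is a minimal illustration, not a replacement for `ivregress 2sls`: the helper names and the toy data are hypothetical, and only the Python standard library is used.

```python
def mean(values):
    return sum(values) / len(values)


def slope_intercept(x, y):
    """OLS fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def two_stage_least_squares(y, x, z):
    """2SLS fit with one endogenous regressor x and one instrument z."""
    # Stage 1: regress the endogenous regressor on the instrument.
    a1, b1 = slope_intercept(z, x)
    x_hat = [a1 + b1 * zi for zi in z]
    # Stage 2: regress the outcome on the first-stage fitted values.
    return slope_intercept(x_hat, y)


# Hypothetical toy data, chosen only so that the example runs.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.0, 2.5, 6.0, 4.0, 9.0, 7.5]

_, b_ols = slope_intercept(x, y)
_, b_2sls = two_stage_least_squares(y, x, z)
print("OLS slope: ", b_ols)
print("2SLS slope:", b_2sls)
```

In this just-identified case the 2SLS slope equals the ratio of the sample covariance of `z` and `y` to that of `z` and `x`. Standard errors taken from a manual second stage are not valid, which is one reason to rely on `ivregress 2sls` in practice.
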

| Problem | Meaning | Consequences | Solution |
|---|---|---|---|
| Endogeneity | Occurs when an explanatory variable is correlated with the error term, often due to reverse causality, omitted variables, or measurement error. | Biased and inconsistent coefficient estimates; incorrect inference and policy recommendations. | Use instrumental variables (IV) or two-stage least squares (2SLS); include omitted variables; improve data quality. |
| Multicollinearity | Occurs when two or more independent variables are highly correlated, making it hard to estimate their individual effects. | Inflated standard errors, leading to low statistical significance and difficulty in determining the effect of each variable. | Center variables (useful when the collinearity comes from interaction or polynomial terms), drop one variable, or use ridge regression or principal component analysis (PCA). |
| Omitted Variable Bias | Happens when a relevant variable is excluded from the model, causing biased and inconsistent estimates. | Biased coefficient estimates; results cannot reliably reflect the true relationship between variables. | Include the omitted variable if data is available; use proxy variables; apply sensitivity analysis. |
| Heteroskedasticity | Occurs when the variance of the error term is not constant across observations. | Inefficient estimates, invalid hypothesis tests, and incorrect standard errors. | Use robust standard errors (e.g., White’s robust estimator) or generalized least squares (GLS). |
| Autocorrelation | Happens when error terms are correlated across observations, often in time-series data. | Biased standard errors, leading to invalid hypothesis tests and inefficient estimates. | Use Newey-West standard errors; model the autocorrelation structure (e.g., ARMA or Prais-Winsten regression). |
| Measurement Error | Occurs when observed variables are recorded with error, so the data differ from the true values. | Biased and inconsistent parameter estimates (classical error in a regressor pulls its coefficient toward zero); less reliable results. | Use instrumental variables (IV) to address measurement error; improve data collection methods. |
| Non-Linearity | Occurs when the relationship between the dependent and independent variables is not linear, violating the linearity assumption. | Incorrect model specification leads to biased estimates and poor predictive accuracy. | Apply non-linear models such as polynomial regression, log-transformation, or generalized additive models (GAM). |
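
The omitted variable bias row can be made concrete with a small worked example. The sketch below assumes a noise-free model y = beta1*x1 + beta2*x2 with hypothetical data and coefficients; regressing y on x1 alone gives a slope of beta1 + beta2*delta, where delta is the slope from regressing x2 on x1.

```python
def mean(values):
    return sum(values) / len(values)


def cross_dev(a, b):
    """Sum of cross-products of deviations from the means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


def simple_slope(x, y):
    """Slope from an OLS regression of y on x with an intercept."""
    return cross_dev(x, y) / cross_dev(x, x)


# Hypothetical data and coefficients, for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 1.0, 4.0, 3.0]
beta1, beta2 = 1.0, 2.0
y = [beta1 * a + beta2 * b for a, b in zip(x1, x2)]

short_slope = simple_slope(x1, y)   # regression that omits x2
delta = simple_slope(x1, x2)        # how x2 moves with x1
predicted = beta1 + beta2 * delta   # omitted variable bias formula

print("slope when x2 is omitted:", short_slope)
print("beta1 + beta2 * delta:   ", predicted)
```

The bias term beta2*delta vanishes only if the omitted variable has no effect (beta2 = 0) or is uncorrelated with the included regressor (delta = 0).
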

| Problem | Test | Intuition and process | Stata command | Statistic and interpretation |
|---|---|---|---|---|
| Endogeneity | Durbin-Wu-Hausman Test | Compares the OLS and IV estimates. If the IV estimates differ significantly from OLS, endogeneity is likely present. | `ivregress`, then `hausman` (or `estat endogenous`) | Chi-square statistic. Null: the regressors are exogenous, so OLS is consistent. Rejecting the null suggests endogeneity; check the p-value. |
| Multicollinearity | Variance Inflation Factor (VIF) | Checks whether independent variables are highly correlated. A high VIF indicates multicollinearity. | `estat vif` after `regress` | VIF > 10 is a common rule of thumb for high multicollinearity. Examine the VIF of each independent variable. |
| Omitted Variable Bias | No direct test, but look for model misfit and theoretical relevance. | Omitted variable bias cannot be directly tested but can be suspected when model fit is poor, residuals are large, or theoretical relationships are overlooked. | No specific command; examine model fit and theoretical relevance. | No direct statistic; look for patterns in residual plots, model misfit, or theoretical gaps. |
| Heteroskedasticity | Breusch-Pagan Test, White Test | Detects non-constant variance in the residuals. Breusch-Pagan tests whether the variance is a function of the independent variables; White’s test checks for heteroskedasticity without specifying a form. | `estat hettest` for Breusch-Pagan; `estat imtest, white` for White | Breusch-Pagan: a large chi-square statistic (small p-value) suggests heteroskedasticity. White: same chi-square interpretation, without assuming a particular form of heteroskedasticity. |
| Autocorrelation | Durbin-Watson Test, Breusch-Godfrey LM Test | Tests whether error terms are serially correlated. Durbin-Watson focuses on adjacent residuals; Breusch-Godfrey handles higher-order autocorrelation. | `estat dwatson`; `estat bgodfrey` | Durbin-Watson: a statistic near 2 suggests no first-order autocorrelation; values well below 2 suggest positive autocorrelation and values well above 2 suggest negative autocorrelation. Breusch-Godfrey: chi-square statistic; a small p-value indicates serial correlation. |
| Measurement Error | No direct test; look for inconsistent results or discrepancies in estimates. | Measurement error tests are often qualitative; look for issues in data collection or unexpected inconsistencies in results. | No specific command; address by improving data quality or using IV methods. | No formal statistic. Look for bias and inconsistencies in coefficients across models. |
| Non-Linearity | Ramsey RESET Test | Checks whether higher-order terms improve the fit of the model. Ramsey RESET uses powers of the fitted values to test for specification errors. | `estat ovtest` | A high F-statistic (small p-value) suggests non-linearity or omitted-variable problems. |
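
Two of the statistics in the table are simple enough to compute by hand, which may help in reading the Stata output. The sketch below uses hypothetical helper names and covers only the Durbin-Watson statistic and the VIF in the special case of exactly two regressors, where the VIF reduces to 1 / (1 - r^2) with r the correlation between them.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squares."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


def vif_two_regressors(x1, x2):
    """VIF for a model with exactly two regressors plus a constant."""
    m1 = sum(x1) / len(x1)
    m2 = sum(x2) / len(x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r_squared = s12 ** 2 / (s11 * s22)
    return 1.0 / (1.0 - r_squared)


# Hypothetical residuals: alternating signs mimic negative autocorrelation.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
# Hypothetical regressors with moderate correlation.
print(vif_two_regressors([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]))
```

With more than two regressors, each VIF comes from the R-squared of regressing that variable on all the others, which is what `estat vif` reports.
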

| Method | Estimator | Variance | Key assumptions |
|---|---|---|---|
| MLE | \(\hat{\mu}_{MLE} = \frac{1}{\bar{y}}\) | \(\frac{\mu^2}{n}\) | Correctly specified likelihood function |
| MME | \(\hat{\mu}_{MME} = \frac{1}{\bar{y}}\) | \(\frac{\mu^2}{n}\) | Validity of the moment condition |
| GMM | \(\hat{\mu}_{GMM} = \arg \min_\mu Q(\mu)\) | \((m_\mu' W_n^{-1} m_\mu)^{-1}\) | Correctly specified moment conditions |
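
The MLE and MME rows are consistent with an exponential model with rate \(\mu\), although the source does not state the density. Assuming \(f(y;\mu) = \mu e^{-\mu y}\) for i.i.d. observations \(y_1, \dots, y_n\), the entries can be derived as follows:

\[
\begin{aligned}
\ell(\mu) &= n \log \mu - \mu \sum_{i=1}^{n} y_i, \\
\ell'(\mu) &= \frac{n}{\mu} - \sum_{i=1}^{n} y_i = 0
  \;\Rightarrow\; \hat{\mu}_{MLE} = \frac{n}{\sum_{i=1}^{n} y_i} = \frac{1}{\bar{y}}, \\
-\mathrm{E}\left[\ell''(\mu)\right] &= \frac{n}{\mu^2}
  \;\Rightarrow\; \operatorname{Var}(\hat{\mu}_{MLE}) \approx \frac{\mu^2}{n}.
\end{aligned}
\]

Under the same assumption \(\mathrm{E}[y] = 1/\mu\), so matching the first moment, \(\bar{y} = 1/\hat{\mu}\), gives \(\hat{\mu}_{MME} = 1/\bar{y}\), the same estimator with the same asymptotic variance.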