Remember:

Assumptions

! If these assumptions are met, the Gauss-Markov theorem states that the OLS estimator is the Best Linear Unbiased Estimator (BLUE)

Ordinary Least Squares method (OLS)

Additional requirements

    1. Dependent variable is numeric; explanatory variables are either numeric or dummy
    1. Each explanatory variable varies, and it is desirable that these variables possess the widest possible range of variability
    1. Outliers and units with big impact are identified (e.g. via standardized residuals) and excluded
    • Remember: An outlier is a value of the dependent variable with a large residual. A unit with big impact changes the slope a lot.
    • Violation: drop the violating units. If any observations were excluded, re-run the OLS regression. Regarding the dropping process: drop units either by ID or by, e.g., Cook’s distance greater than… .
    1. No overly strong multicollinearity
    • Violation: Inflated standard errors can undermine the explanatory capability of the model.
    • Solution: drop the variable
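
One common way to quantify multicollinearity is the variance inflation factor (VIF). The notes do not name a specific diagnostic, so treat this as an assumption. The sketch below covers only the special case of exactly two explanatory variables, where both share the same VIF, 1 / (1 - r^2), with r being their Pearson correlation. All data and function names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a model with exactly two of them."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical data: x2 is almost a linear function of x1, x3 is uncorrelated with x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]
x3 = [1.0, -2.0, 2.0, -2.0, 1.0]

vif_collinear = vif_two_predictors(x1, x2)    # very large: standard errors are inflated
vif_independent = vif_two_predictors(x1, x3)  # equals 1: no inflation at all
print(vif_collinear, vif_independent)
```

A VIF of 1 means the standard error of a coefficient is not inflated. The larger the VIF, the stronger the case for dropping one of the two variables.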

Analysis

    1. Run the summary of the regression and get an output
    1. Check the p-value.
    • The corresponding test is called ‘test of partial regression coefficient’
    • Interpret the results
    1. Check the estimates in the middle.
    • Complete the (estimated) regression function with their help.
    • Interpret the function.
    1. Pay attention to the coefficient of determination (R-squared) below.
    • With only one explanatory variable it is the simple R-squared; with two or more it is the multiple R-squared. Its value lies between 0 and +1. It shows what percentage of the variability of the dependent variable is explained by the linear effect of the explanatory variables. Interpret the value.
    1. Pay attention to the last line of the output with the F-Test.
    • It is the test of significance of the regression model (‘rho^2’).

*rho^2 is the population coefficient of determination

! Regarding the last two points: they fit together logically. If the coefficient of determination is quite high, the model explains much of the variability of the dependent variable, and the F-test will usually reject H0 in favour of H1: rho^2 > 0, meaning the model is significant and appropriate. If the coefficient of determination is low, H0: rho^2 = 0 usually cannot be rejected; the model barely explains anything, so it isn’t significant. R^2 describes the sample, while rho^2 refers to the population. Importantly, for rho^2 we cannot tell which proportion of the dependent variable’s variability is explained by the model; we only say that rho^2 is positive.
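
To see how the estimates, R-squared and the F statistic relate, here is a minimal sketch for the one-variable case, using only the Python standard library. The data and the function name `simple_ols` are hypothetical. The F statistic is computed from R-squared as (R^2 / k) / ((1 - R^2) / (n - k - 1)), with k explanatory variables and n observations.

```python
def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, r_squared, f_stat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx                # slope estimate
    a = mean_y - b * mean_x      # intercept estimate
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    tss = sum((yi - mean_y) ** 2 for yi in y)                    # total sum of squares
    r_squared = 1.0 - rss / tss
    k = 1                        # one explanatory variable
    # F statistic for H0: the population coefficient of determination is 0
    f_stat = (r_squared / k) / ((1.0 - r_squared) / (n - k - 1))
    return a, b, r_squared, f_stat

# Hypothetical sample of five units.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
a, b, r_squared, f_stat = simple_ols(x, y)
print(f"y_hat = {a:.2f} + {b:.2f} * x, R^2 = {r_squared:.2f}, F = {f_stat:.2f}")
```

The p-value of the test comes from comparing F with an F distribution with k and n - k - 1 degrees of freedom. That step is left out here because it needs a statistics library.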

If different models are to be compared

Post-analysis

    1. Look at the multiple correlation coefficient (the square root of R-squared; ‘multiple’ because it involves all explanatory variables)! We never talk about positive or negative correlation when estimating multiple correlation.
    • It tells how strong the relationship is between all explanatory variables and the dependent one.
    1. Check the standardized partial regression coefficients (lm.beta()). We need to standardize them because the variables are measured in different units (e.g. one is a proportion, another a percentage).
    • The higher the absolute value, the bigger the impact of the variable
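
Why standardizing removes the effect of units can be sketched in Python, as a stand-in for the lm.beta() call mentioned above. It assumes the usual definition: standardized coefficient = b * s_x / s_y. The data and helper names are hypothetical. The same explanatory variable is expressed once as a proportion and once as a percentage.

```python
from statistics import mean, stdev

def slope(x, y):
    """Least-squares slope of y on a single explanatory variable x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def standardized_coefficient(b, x, y):
    """Rescale the slope b to standard-deviation units: b * s_x / s_y."""
    return b * stdev(x) / stdev(y)

# Hypothetical data: the same variable as a proportion and as a percentage.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
share = [0.1, 0.2, 0.3, 0.4, 0.5]
share_pct = [100.0 * s for s in share]

b_share = slope(share, y)   # raw slope depends on the unit ...
b_pct = slope(share_pct, y) # ... and is 100 times smaller for the percentage

beta_share = standardized_coefficient(b_share, share, y)
beta_pct = standardized_coefficient(b_pct, share_pct, y)
print(b_share, b_pct, beta_share, beta_pct)  # the two standardized values coincide
```

The raw slopes differ by a factor of 100, while the standardized coefficients are identical. This is why only the standardized values can be compared across variables.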

2nd regression analysis

  • Run the full OLS regression and its summary.

Comparison of both models and conclusion

  • Conduct the ANOVA test
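
As a rough sketch of what the ANOVA comparison computes, assume the first model is nested in the full one, i.e. the full model contains all of its explanatory variables plus some more. The F statistic then compares the drop in the residual sum of squares (RSS) with the remaining unexplained variability. The function name and all numbers below are hypothetical.

```python
def nested_f_statistic(rss_reduced, rss_full, n, k_reduced, k_full):
    """F statistic for comparing a reduced OLS model with a full one.

    n is the number of observations; k_reduced and k_full are the numbers of
    explanatory variables (excluding the intercept) in each model.
    Returns (F, numerator df, denominator df).
    """
    q = k_full - k_reduced    # number of added explanatory variables
    df_full = n - k_full - 1  # residual degrees of freedom of the full model
    f_stat = ((rss_reduced - rss_full) / q) / (rss_full / df_full)
    return f_stat, q, df_full

# Hypothetical residual sums of squares taken from two regression outputs.
f_stat, df_num, df_den = nested_f_statistic(
    rss_reduced=120.0, rss_full=90.0, n=50, k_reduced=2, k_full=4
)
print(f_stat, df_num, df_den)
```

A significant result suggests that the added variables improve the model; otherwise the smaller model is preferred.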

When making conclusions:

  • Don’t forget to mention ‘on average’ and ‘assuming all other variables remain unchanged’ (ceteris paribus condition)
  • If the explanatory variable is measured in %, describe the change in the explanation as an increase of 1 percentage point (pp)
  • When the explanatory variable is dichotomous, don’t forget to mention ‘in comparison to {another category}’
  • Additionally, when the explanatory variable is dichotomous, the ceteris paribus part is as follows: given the value(s) of {all other explanatory variables}
  • lin-log model: assuming all other variables remain unchanged, if {explanatory variable} increases by 1%, the {dependent variable} increases on average by b/100
  • coefficient of determination R^2: if the influencing explanatory variables need to be named, mention them in the form in which they enter the model, e.g. ‘natural logarithm of {explanatory variable}’
  • Interactions:
    • num.exp.var. alone: assuming all other variables remain unchanged, if the {num.exp.var.} for {ref. cat. of dummy var.} increases by 1, the {dep.var.} increases on average by ‘b’
    • Interaction of num.exp.var. and category of dummy var.: assuming all other variables remain unchanged, if the {num.exp.var.} for {category of dummy var.} increases by 1, the {dep.var.} increases on average by ‘b’ MORE than for {ref. cat.}
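
Two of the wording rules above can be checked numerically. The sketch below assumes a lin-log model y = a + b * ln(x) and an interaction model y = a + b1 * x + b2 * d + b3 * x * d with a 0/1 dummy d. All coefficient values and function names are hypothetical.

```python
from math import log

def linlog_effect(b, pct_increase=1.0):
    """Exact change in y when x grows by pct_increase percent in y = a + b*ln(x)."""
    return b * log(1.0 + pct_increase / 100.0)

def slope_for_category(b1, b3, d):
    """Slope of x in y = a + b1*x + b2*d + b3*x*d for dummy value d (0 or 1)."""
    return b1 + b3 * d

# Lin-log: the rule of thumb b/100 is close to the exact effect of a 1% increase.
b = 250.0                       # hypothetical coefficient of ln(x)
exact_change = linlog_effect(b)
rule_of_thumb = b / 100.0
print(exact_change, rule_of_thumb)

# Interaction: b1 is the slope for the reference category (d = 0),
# b3 is how much MORE the slope is for the other category (d = 1).
b1, b3 = 2.0, 0.5               # hypothetical coefficients
slope_ref = slope_for_category(b1, b3, d=0)
slope_other = slope_for_category(b1, b3, d=1)
print(slope_ref, slope_other, slope_other - slope_ref)
```

The difference between the two slopes is exactly the interaction coefficient, which is why it is interpreted as ‘more than for the reference category’.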