1 OLS is a special case of MLE

Ordinary Least Squares (OLS) can be viewed as a special case of Maximum Likelihood Estimation (MLE) under certain assumptions. The relationship between OLS and MLE is particularly evident in the context of the simple linear regression model.

In a simple linear regression model with a normally distributed error term, the OLS estimates of the regression coefficients (slope and intercept) are equivalent to the MLE estimates. The OLS method minimizes the sum of squared differences between the observed and predicted values, and when the error term is normally distributed, this corresponds to maximizing the likelihood function.

Mathematically, the OLS estimates for the slope (\(\beta_1\)) and intercept (\(\beta_0\)) in the simple linear regression model \(Y = \beta_0 + \beta_1 X + \epsilon\) are obtained by minimizing the sum of squared residuals:

\[ \min_{\beta_0, \beta_1} \sum_{i=1}^{n} (Y_i - (\beta_0 + \beta_1 X_i))^2 \]

The MLE estimates for the same model assume a normal distribution for the error term \(\epsilon\), and the likelihood function to be maximized is proportional to the product of the normal probability density functions.
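
Concretely (a sketch of the standard argument): if the errors are i.i.d. \(N(0, \sigma^2)\), the log-likelihood of the sample is

\[ \ell(\beta_0, \beta_1, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(Y_i - \beta_0 - \beta_1 X_i\right)^2 \]

For any fixed \(\sigma^2\), maximizing \(\ell\) over \(\beta_0\) and \(\beta_1\) amounts to minimizing the sum of squared residuals above, which is why the OLS and MLE coefficient estimates coincide.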

In summary, in the specific case of simple linear regression with normally distributed errors, OLS and MLE lead to equivalent parameter estimates. However, in more general cases or with different assumptions about the error distribution, OLS and MLE may not be equivalent.

2 OLS vs MLE

OLS (Ordinary Least Squares) regression and ML (Maximum Likelihood) regression are both statistical methods used for estimating the parameters of a regression model. However, they differ in terms of their underlying assumptions and estimation techniques.

1. Assumptions:

- OLS: The classical OLS framework assumes that the errors (residuals) in the regression model have constant variance (homoscedasticity) and are independent of each other; normality of the errors is additionally assumed for exact finite-sample inference.

- ML: ML requires a full specification of the error distribution (for example, normal). Given that distributional assumption, it treats the errors as independent and identically distributed (i.i.d.) draws from that distribution.

2. Estimation Technique:

- OLS: OLS estimates the parameters of the regression model by minimizing the sum of squared residuals. It finds the parameter values whose fitted values are, in the squared-error sense, closest to the observed values.

- ML: ML estimates the parameters of the regression model by maximizing the likelihood function. It finds the values of the parameters under which the observed data are most probable, given the assumed error distribution.

3. Efficiency:

- OLS: OLS provides unbiased estimates of the parameters, but they may not always be the most efficient (minimum variance) estimates.

- ML: ML provides estimates that are asymptotically efficient, meaning they are the most efficient estimates as the sample size approaches infinity.

4. Hypothesis Testing:

- OLS: OLS allows for straightforward hypothesis testing using t-tests, F-tests, etc., based on assumptions about the distribution of errors.

- ML: ML allows for hypothesis testing using likelihood ratio tests, Wald tests, etc., which are based on the likelihood function (see the sketch after this list).
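
To make point 4 concrete, here is a minimal, self-contained sketch of a likelihood-ratio test; the data, hypothesis, and variable names are purely illustrative. It tests \(H_0: \mu = 0\) for normally distributed data by comparing the maximized log-likelihoods of the restricted and unrestricted models.

# Hypothetical sketch: likelihood-ratio test of H0: mu = 0 for normal data
set.seed(1)
y <- rnorm(n = 50, mean = 0.5, sd = 1)

# Unrestricted MLEs: sample mean, and sigma computed with divisor n
loglik_u <- sum(dnorm(y, mean = mean(y), sd = sqrt(mean((y - mean(y))^2)), log = TRUE))

# Restricted MLE under H0 (mu fixed at 0): only sigma is estimated
loglik_r <- sum(dnorm(y, mean = 0, sd = sqrt(mean(y^2)), log = TRUE))

lr_stat <- 2 * (loglik_u - loglik_r)                    # LR statistic
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)  # asymptotically chi-squared with 1 df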

To recap -

  • OLS rests on zero-mean, homoscedastic, and uncorrelated errors (with normality added for exact inference), while ML requires a complete distributional assumption for the errors.

  • OLS minimizes the sum of squared residuals, while ML maximizes the likelihood function.

  • OLS provides unbiased estimates, while ML provides asymptotically efficient estimates. The sketch below puts both approaches side by side on the same simulated data.
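
To see the equivalence from Section 1 in action, here is a minimal sketch (the data and variable names are illustrative, not part of the cited sources) that fits the same simulated simple regression by OLS with lm() and by MLE with optim(). With normal errors, the two sets of coefficient estimates should agree up to numerical tolerance.

# Minimal sketch: OLS via lm() vs. MLE via optim() on simulated regression data
set.seed(42)
n_sim <- 200
x_sim <- rnorm(n_sim)
y_sim <- 1 + 2 * x_sim + rnorm(n_sim, sd = 1.5)    # true intercept 1, slope 2

ols_fit <- lm(y_sim ~ x_sim)                       # OLS estimates

# Negative log-likelihood of the regression under normal errors
neg_loglik_reg <- function(par, y, x) {
  beta0 <- par[1]
  beta1 <- par[2]
  sigma <- exp(par[3])                             # log-parameterization keeps sigma > 0
  -sum(dnorm(y, mean = beta0 + beta1 * x, sd = sigma, log = TRUE))
}

mle_fit <- optim(par = c(0, 0, 0), fn = neg_loglik_reg, y = y_sim, x = x_sim, method = "BFGS")

coef(ols_fit)        # OLS intercept and slope
mle_fit$par[1:2]     # MLE intercept and slope -- essentially identical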

Let’s consider an example where we want to estimate the parameters of a normal distribution using MLE in R.

3 MLE Theory

3.1 Basic Idea

Maximum Likelihood Estimation (MLE) is often applied to cross-sectional data when estimating the parameters of a specific probability distribution. The idea is to choose the parameter values under which the observed dataset is most probable.
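
In symbols (a standard formulation; this notation is not used elsewhere in this note), for i.i.d. observations \(x_1, \dots, x_n\) drawn from a density \(f(x \mid \theta)\), the maximum likelihood estimator is

\[ \hat{\theta}_{MLE} = \arg\max_{\theta} \; \sum_{i=1}^{n} \log f(x_i \mid \theta) \]

The log-likelihood is maximized rather than the likelihood itself because sums are easier to differentiate and numerically more stable than products; the maximizer is the same.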

3.2 Background

Please watch the videos below to strengthen your understanding of MLE -

  1. Maximum Likelihood, clearly explained!!! (StatQuest with Josh Starmer)

    • Talks about the intuition behind MLE for the normal distribution.
  2. Maximum Likelihood For the Normal Distribution, step-by-step!!!

    • Talks about the difference between probability and likelihood, and walks through the MLE derivation for the normal distribution.

4 Implementation

# Clear the workspace
rm(list = ls()) # Clear environment
gc()            # Clear memory
##           used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells  551624 29.5    1225948 65.5         NA   700240 37.4
## Vcells 1024492  7.9    8388608 64.0      49152  1963456 15.0
cat("\f")       # Clear the console
# Load necessary libraries
library(stats)

# Set a random seed for reproducibility
set.seed(123)

4.1 Collect the data generated from the underlying population distribution

We will have to create our data (that we observe) from a population distribution (normal here) with parameters \(N(\mu = 5, \sigma = 2)\) that the analyst treats as unknown.

For the sake of creating the data we collect, I assume we have random sample draws from this normal distribution (with mean 5 and standard deviation 2). When we collect the data, we do not know what these parameters are, and the econometrician has to estimate them.

# Generate synthetic normal distributed cross-sectional data
n           <- 100  # Number of observations
mu_true     <- 5    # True mean
sigma_true  <- 2    # True standard deviation

# Generate normal distributed data
data <- rnorm(n     = n, 
              mean  = mu_true, 
              sd    = sigma_true
              )

4.2 MLE function

Now that we have the data, let's code the MLE function -

# Log-likelihood of the normal distribution (returned as its negative, since optim() minimizes)

log_likelihood_normal <- function(parameters, data) {
  mu     <- parameters[1]
  sigma  <- parameters[2]
  
  # Calculate log-likelihood
  log_lik <- sum(dnorm(x    = data, 
                       mean = mu, 
                       sd   = sigma, 
                       log  = TRUE
                       )
                 )
  
  return(-log_lik)  # Return the negative log-likelihood, since optim() minimizes by default
}

# Use optim to find MLE
?optim
mle_result_normal <- optim(par  = c(0, 1),               # Initial values for the parameters (mu, sigma)
                           fn   = log_likelihood_normal, # Function to be minimized; its first argument is the parameter vector and it returns a scalar
                           data = data                   # Additional argument passed on to fn
                           )


# Store the estimated parameters as scalars
mu_hat    <- mle_result_normal$par[1]
sigma_hat <- mle_result_normal$par[2]

4.2.1 Explanation:

  1. We generate synthetic cross-sectional data from a normal distribution with known mean (\(\mu\)) and standard deviation (\(\sigma\)).

  2. We define the log-likelihood function for a normal distribution. The likelihood function calculates the probability of observing the given data under a specific set of parameters.

  3. We use the optim function to find the values of the mean and standard deviation that minimize the negative log-likelihood (equivalently, maximize the log-likelihood), which yields the MLE.

  4. Finally, we print the true parameter values and the estimated MLE (below).

mu_true      # actual mean
## [1] 5
mu_hat       # estimated mean is close
## [1] 5.180711
sigma_true   # actual sd
## [1] 2
sigma_hat    # estimated sd is close
## [1] 1.816117

As we can see, if we assume the correct distribution, we get distribution parameter estimates that are close to the true (unknown) distribution parameters.
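
As a quick cross-check (an addition for illustration), the normal-distribution MLE also has a closed form: the sample mean, and the standard deviation computed with divisor \(n\) rather than \(n - 1\). Computing these directly from the same data should reproduce the optim() estimates up to the optimizer's tolerance.

# Closed-form normal MLEs from the same data vector
mu_closed    <- mean(data)                          # MLE of the mean
sigma_closed <- sqrt(mean((data - mean(data))^2))   # MLE of sigma (divisor n, not n - 1)

mu_closed     # should match mu_hat up to optimizer tolerance
sigma_closed  # should match sigma_hat up to optimizer tolerance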

This example is for illustration purposes and uses a normal distribution. In practice, the choice of distribution and the form of the likelihood function will depend on the specific characteristics of your data and the assumptions of your statistical model.
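
For instance, here is a hedged sketch (names and true values are illustrative) of the same optim() pattern under a different distributional assumption, an exponential model:

# Hypothetical sketch: MLE for an exponential model using the same optim() pattern
neg_loglik_exp <- function(log_rate, data) {
  -sum(dexp(x = data, rate = exp(log_rate), log = TRUE))  # log-parameterization keeps the rate positive
}

exp_data <- rexp(n = 100, rate = 0.5)   # simulated data with true rate 0.5
fit_exp  <- optim(par = 0, fn = neg_loglik_exp, data = exp_data, method = "BFGS")
exp(fit_exp$par)                        # estimated rate, should be close to 0.5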

5 MLE vs GMM

GMM estimation was formalized by Hansen (1982), and it has since become one of the most widely used methods of estimation for models in economics and finance. Unlike maximum likelihood estimation (MLE), GMM does not require complete knowledge of the distribution of the data.

Only specified moments derived from an underlying model are needed for GMM estimation. In some cases in which the distribution of the data is known, MLE can be computationally very burdensome whereas GMM can be computationally very easy.
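
To illustrate the contrast (a minimal sketch, not a full GMM treatment; it reuses the data vector from Section 4 with the simplest possible moment conditions and weighting), matching the first two moments of the normal model gives a just-identified GMM (method-of-moments) estimator:

# Just-identified GMM (method-of-moments) sketch for the normal model,
# reusing the `data` vector generated in Section 4.
# Moment conditions: E[x] - mu = 0 and E[x^2] - (mu^2 + sigma^2) = 0.
gmm_objective <- function(par, data) {
  mu     <- par[1]
  sigma2 <- par[2]                        # variance, to avoid a sign constraint on sigma
  g <- c(mean(data)   - mu,
         mean(data^2) - (mu^2 + sigma2))
  sum(g^2)                                # quadratic form with an identity weighting matrix
}

gmm_result <- optim(par = c(0, 1), fn = gmm_objective, data = data)
gmm_result$par[1]          # estimate of mu
sqrt(gmm_result$par[2])    # estimate of sigma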

5.0.1 Key questions to answer when deciding between MLE and GMM

  1. How much data is available for the estimation? Large samples make GMM relatively more attractive than MLE because of GMM's good large-sample properties and the fewer assumptions it requires on the model.

  2. How complex is the model? Linear or quadratic models are much easier to estimate with MLE than highly nonlinear models are. Rational expectations models (macroeconomics) introduce an even more difficult level of nonlinearity that pushes you toward GMM estimation.

  3. How comfortable are you making strong distributional assumptions? MLE requires a complete specification of all distributional assumptions of the model DGP. If you think these assumptions are too strong, you should use GMM.

6 References

  1. Popal, Ahsanullah. (2023). Re: What are the basic differences between OLS and Maximum Likelihood method? Retrieved from: https://www.researchgate.net/post/What_are_the_basic_differences_between_OLS_and_Maximum_Likelihood_method/652bb7d15f970987d60e18b2/citation/download

  2. Chapter 2 Linear Regression by OLS and MLE

    https://bookdown.org/mrwhalen/abracadabrabook/linear-regression-by-ols-and-mle.html

  3. GMM vs MLE Comparison: Strengths and Weaknesses

    https://notes.quantecon.org/submission/5b3b1856b9eab00015b89f90

  4. GMM

    https://faculty.washington.edu/ezivot/econ583/gmm.pdf