Introduction

In this blog post, we build on the previous post, which covered Ordinary Least Squares (OLS) regression and how it can be used to predict ticket sales. Here we look at another type of linear regression, one that takes into account the variability of the errors (residuals) in the data: weighted least squares regression.

About Weighted Least Squares Regression

In OLS regression, the sum of the squared differences between the observed and predicted values is minimized in order to find the line of best fit, which is then used to predict future values. However, the conditions for OLS are not always met, for example when outliers are present or when the variance of the errors is not constant. In those cases, each observation can be given a weight based on the inverse of its error variance, so that more precise data points count for more and less precise ones count for less. The fit then minimizes the sum of the weighted squared residuals. The linear equation for weighted least squares regression is the same as that of ordinary least squares regression:

\(y = β_0 + β_1X_1 + β_2X_2 + ... + ε\)

where \(y\) is the dependent variable, the \(X\)s are the independent variables, the \(β\)s are the regression coefficients, and \(ε\) is the error term.

The weight factor \(w_i\) for each observation \(i\) is defined as:

\(w_i = \frac{1}{var(ε_i)}\)
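
To make the role of the weights concrete, it can help to write out the quantity being minimized. This is the standard weighted least squares objective. The symbols \(\hat{y}_i\) (the predicted value for observation \(i\)) and \(n\) (the number of observations) are notation introduced here for illustration:

\(\sum_{i=1}^{n} w_i (y_i - \hat{y}_i)^2\)

Setting every \(w_i = 1\) recovers the ordinary least squares objective, so OLS can be seen as the special case in which all observations are trusted equally.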

Predicting Ticket Sales Using Weighted Least Squares

Referring back to the previous blog post, let’s say a concert promoter has observations with the following variables:

However, there are outliers in the data. For example, early in an artist’s career they sold orders of magnitude fewer tickets to their concerts. We don’t want to discard this data, particularly as it is useful for predicting early-career ticket sales as part of artist development. Instead, we can apply weights to the observations and fit a line that minimizes the sum of the weighted squared residuals, which gives more accurate predictions.
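
As a rough illustration of the idea, here is a minimal sketch in Python using only NumPy. It is not the promoter's actual data or model: the variable names (`spend`, `tickets`), the numbers, and the assumption that the noise grows with marketing spend are all made up for the example. The helper `weighted_least_squares` is a hypothetical function written for this post. It rescales each row by the square root of its weight, which turns the weighted problem into an ordinary one that `np.linalg.lstsq` can solve.

```python
import numpy as np


def weighted_least_squares(X, y, weights):
    """Fit y = b0 + b1*x1 + ... by minimizing sum(w_i * residual_i**2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    # Scaling each row by sqrt(w_i) turns the weighted problem into an
    # ordinary least squares problem on the rescaled data.
    root_w = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return coef


# Hypothetical, synthetic data: marketing spend (in $1000s) vs. tickets sold.
rng = np.random.default_rng(0)
spend = np.linspace(1, 50, 40)
noise_sd = 20 * spend  # assumption: noise grows with spend
tickets = 500 + 120 * spend + rng.normal(0, noise_sd)

# Weights are the inverse of each observation's (assumed known) error variance.
weights = 1.0 / noise_sd**2

wls_coef = weighted_least_squares(spend, tickets, weights)
ols_coef = weighted_least_squares(spend, tickets, np.ones_like(spend))  # all weights 1 = OLS

print("WLS intercept, slope:", wls_coef)
print("OLS intercept, slope:", ols_coef)
```

In this sketch the error variance is treated as known, which is rarely true in practice. With real data the variances usually have to be estimated, for example from the residuals of an initial OLS fit, and the quality of the weighted fit depends on how good that estimate is.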

Conclusion

Weighted least squares regression is a valuable alternative to OLS when the data contain outliers or the errors are heteroskedastic, as it can produce more accurate predictions in those cases.