Multiple Linear Regression
ABSTRACT
Multiple linear regression (MLR), also known simply as multiple regression, is a statistical technique that uses several explanatory variables to predict the outcome of a response variable. The goal of multiple linear regression is to model the linear relationship between the explanatory (independent) variables and response (dependent) variables. In essence, multiple regression is the extension of ordinary least-squares (OLS) regression because it involves more than one explanatory variable.
Multiple regression extends simple linear (OLS) regression, which uses just one explanatory variable.
MLR is used extensively in econometrics and financial inference.
INTRODUCTION
Simple linear regression is a function that allows an analyst or statistician to make predictions about one variable based on the information that is known about another variable. Linear regression can only be used when one has two continuous variables—an independent variable and a dependent variable. The independent variable is the parameter that is used to calculate the dependent variable or outcome. A multiple regression model extends to several explanatory variables.
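As a rough illustration of the idea, the sketch below fits a multiple regression with two explanatory variables by ordinary least squares. It assumes only NumPy, and the data are synthetic and purely illustrative.

```python
import numpy as np

# Synthetic data (purely illustrative): 100 observations, 2 explanatory variables
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

# Add an intercept column and solve the ordinary least-squares problem
X_design = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("Estimated coefficients (intercept, b1, b2):", beta)
```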
The multiple regression model is based on the following assumptions:
There is a linear relationship between the dependent variable and the independent variables
The independent variables are not too highly correlated with each other
The observations yi are selected independently and randomly from the population
Residuals are normally distributed with a mean of 0 and constant variance σ² (a quick diagnostic sketch of these checks follows this list)
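As a minimal sketch of how these assumptions might be checked in practice (assuming NumPy and SciPy are available), the helper below fits an OLS model and prints a few rough diagnostics; it is illustrative only, not a substitute for proper residual analysis.

```python
import numpy as np
from scipy import stats

def check_mlr_assumptions(X, y):
    """Rough diagnostics for the assumptions listed above (illustrative only)."""
    X_design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    residuals = y - X_design @ beta

    # Residuals should be centred on zero with roughly constant spread
    print("Residual mean:", residuals.mean())
    print("Residual std:", residuals.std())
    # Shapiro-Wilk test as a rough check of residual normality
    print("Shapiro-Wilk p-value:", stats.shapiro(residuals).pvalue)
    # Pairwise predictor correlations: values near ±1 signal multicollinearity
    print("Predictor correlation matrix:\n", np.corrcoef(X, rowvar=False))
```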
The coefficient of determination (R-squared) is a statistical metric that measures how much of the variation in the outcome can be explained by variation in the independent variables. R2 always increases as more predictors are added to the MLR model, even if those predictors are unrelated to the outcome variable.
Thus R2 by itself cannot be used to identify which predictors should be included in a model and which should be excluded. R2 ranges from 0 to 1, where 0 indicates that the outcome cannot be predicted by any of the independent variables and 1 indicates that the outcome can be predicted without error from the independent variables.
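The sketch below shows how R2 is computed from an OLS fit; it also reports adjusted R2, a common variant (not discussed above) that penalizes added predictors, precisely because plain R2 never decreases as predictors are added. It assumes only NumPy.

```python
import numpy as np

def r_squared(X, y):
    """R^2 and adjusted R^2 for an OLS fit (illustrative sketch)."""
    X_design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    residuals = y - X_design @ beta
    ss_res = np.sum(residuals ** 2)               # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)          # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    n, p = X.shape                                # observations, predictors
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj_r2
```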
When interpreting the results of multiple regression, each beta coefficient is interpreted holding all other variables constant (“all else equal”). The output from a multiple regression can be displayed horizontally as an equation, or vertically in table form.
FORMULAS:
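The general form of the multiple linear regression model, consistent with the description above, is:

y = β0 + β1x1 + β2x2 + … + βpxp + ε

where y is the dependent (response) variable, x1 … xp are the explanatory (independent) variables, β0 is the intercept, β1 … βp are the regression coefficients, and ε is the model's error term (residual).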
Real Life Applications:
Finance:
Any econometric model that examines more than one variable is a multiple regression model. Factor models compare two or more factors to analyze relationships between variables and the resulting performance. The Fama and French three-factor model is one such model: it expands on the capital asset pricing model (CAPM), itself a regression model, by adding size-risk and value-risk factors to CAPM's market-risk factor. By including these two additional factors, the model adjusts for the tendency of small-cap and value stocks to outperform the broader market on a regular basis, which is thought to make it a better tool for evaluating manager performance.
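As a hedged sketch of how such a factor regression might be set up, the snippet below regresses a portfolio's excess return on the three Fama-French factors; the return series are random placeholders, not real factor data.

```python
import numpy as np

# Placeholder monthly return series (purely illustrative, not real data)
rng = np.random.default_rng(1)
excess_return = rng.normal(0.010, 0.04, 60)   # portfolio return minus risk-free rate
market = rng.normal(0.008, 0.03, 60)          # market excess return (CAPM factor)
smb = rng.normal(0.002, 0.02, 60)             # size factor (small minus big)
hml = rng.normal(0.003, 0.02, 60)             # value factor (high minus low)

# Three-factor model: excess_return = a + b1*market + b2*smb + b3*hml + e
X = np.column_stack([np.ones(60), market, smb, hml])
coeffs, *_ = np.linalg.lstsq(X, excess_return, rcond=None)
print("alpha, market beta, SMB loading, HML loading:", coeffs)
```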
Medical:
Medical researchers often use linear regression to understand the relationship between drug dosage and blood pressure of patients.
For example, researchers might administer various dosages of a certain drug to patients and observe how their blood pressure responds. They might fit a simple linear regression model using dosage as the predictor variable and blood pressure as the response variable. The regression model would take the following form:
blood pressure = β0 + β1(dosage)
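A minimal sketch of fitting this model, assuming NumPy and made-up dosage and blood-pressure readings (purely illustrative, not clinical data):

```python
import numpy as np

# Hypothetical dosage (mg) and systolic blood pressure (mmHg) readings
dosage = np.array([10, 20, 30, 40, 50, 60], dtype=float)
blood_pressure = np.array([118, 115, 112, 110, 106, 103], dtype=float)

# Fit blood_pressure = b0 + b1 * dosage by ordinary least squares
b1, b0 = np.polyfit(dosage, blood_pressure, 1)
print(f"blood pressure = {b0:.2f} + ({b1:.3f}) * dosage")
```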
Problems and Solutions:
Calculate the two regression equations of X on Y and Y on X from the data given below, taking deviations from the actual means of X and Y.
Estimate the likely demand when the price is Rs.20.
Solution:
Calculation of the regression equations
(i) Regression equation of X on Y
(ii) Regression equation of Y on X
Y = 44.25 – 0.25X
When X is 20, Y will be
Y = –0.25(20) + 44.25
= –5 + 44.25
= 39.25
Thus, when the price is Rs. 20, the likely demand is 39.25.
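Since the original data table is not reproduced here, the sketch below shows the deviations-from-means method generically; passing the price series (X) and demand series (Y) would reproduce both regression equations. It assumes only NumPy.

```python
import numpy as np

def regression_lines(x, y):
    """Both regression equations via deviations from the actual means (illustrative)."""
    dx = x - x.mean()                            # deviations of X from its mean
    dy = y - y.mean()                            # deviations of Y from its mean
    b_xy = np.sum(dx * dy) / np.sum(dy ** 2)     # slope of X on Y
    b_yx = np.sum(dx * dy) / np.sum(dx ** 2)     # slope of Y on X
    print(f"X on Y: X = {x.mean() - b_xy * y.mean():.2f} + ({b_xy:.2f}) Y")
    print(f"Y on X: Y = {y.mean() - b_yx * x.mean():.2f} + ({b_yx:.2f}) X")
    return b_xy, b_yx
```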
Conclusion:
The main principles of multiple linear regression were presented, together with illustrative Python sketches of the computations. The framework was applied to a simple example, in which the statistical significance of the parameters and the main assumptions about residuals in linear least-squares problems were examined.
Linear regression models are tools for uncovering previously unrecognized patterns and relationships between variables. They are useful for making estimates and predictions, which can form the basis for decision making.