Linear Regression

Edward Correa
11/5/2015

Definition

Linear regression is a statistical approach to modelling the relationship between a dependent variable and one or more explanatory variables. Explanatory variables are essentially independent variables; however, some researchers prefer the term "explanatory variable" when the quantities may not be entirely statistically independent.

Components of a Regression

Dependent Variable

A dependent variable is a variable that can be affected by an independent variable. For instance, suppose a researcher wishes to find out what effect listening to classical music has on maths ability during an examination. The dependent variable would be the test outcome, i.e. the number of marks achieved, which can be affected by the independent variable (playing or not playing classical music during the examination).

How does it work?

A linear regression model assumes that the relationship between the variables is a straight line. In contrast, non-linear regression models assume that the relationship between the variables follows a curve.
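To make the straight-line assumption concrete, here is a minimal sketch in Python of fitting a line by ordinary least squares with a single explanatory variable. The function name `fit_line` and the data (hours of study against marks achieved) are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: hours of study (x) and marks achieved (y).
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 68]

intercept, slope = fit_line(hours, marks)
print(f"marks = {intercept:.2f} + {slope:.2f} * hours")
```

The slope is the estimated change in the dependent variable for each one-unit increase in the explanatory variable, and the intercept is the predicted value when the explanatory variable is zero.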

What should it look like?

[Figure: example of a linear regression plot]

Interpreting a linear regression summary in R

[Figure: linear regression summary output in R]

What is the p-value?

The p-value for each term tests the null hypothesis that the coefficient is equal to zero (no effect). A low p-value (< 0.05) indicates that you can reject the null hypothesis. In other words, a predictor that has a low p-value is likely to be a meaningful addition to your model because changes in the predictor's value are related to changes in the response variable.

Conversely, a larger (non-significant) p-value means the data give no evidence that changes in the predictor are associated with changes in the response.
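For reference, the p-value reported for each term in an ordinary least-squares summary comes from a t-test on that coefficient. The notation here is an assumption introduced for illustration: n is the number of observations, k the number of predictors (excluding the intercept), and T a random variable following Student's t distribution with n - k - 1 degrees of freedom.

```latex
t_j = \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)},
\qquad
p_j = 2 \, \Pr\left( T_{n-k-1} > |t_j| \right)
```

An estimate that is large relative to its standard error gives a large |t| and therefore a small p-value.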

Interpreting regression coefficients

Regression coefficients represent the mean change in the response variable for one unit of change in the predictor variable while holding other predictors in the model constant. This statistical control that regression provides is important because it isolates the role of one variable from all of the others in the model.
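The "holding other predictors constant" reading can be checked numerically. The sketch below, in Python, fits a model with two predictors by solving the normal equations, then raises one predictor by one unit while the other stays fixed. The helper names (`solve`, `fit`, `predict`) and the noise-free data are hypothetical; a real analysis would use a statistics package rather than a hand-written solver.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(rows, ys):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 models the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical, noise-free data generated from y = 1 + 2*x1 + 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
b0, b1, b2 = fit(rows, ys)


def predict(x1, x2):
    return b0 + b1 * x1 + b2 * x2


# Raising x1 by one unit while x2 stays fixed changes the prediction by b1.
change = predict(3, 5) - predict(2, 5)
print(round(b1, 6), round(change, 6))
```

Because the data contain no noise, the fitted coefficients recover the generating values up to rounding, and the change in the prediction equals the coefficient on the first predictor.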

Standard Error

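The standard error of the mean permits the researcher to construct a confidence interval in which the population mean is likely to fall. The quantity (1 - P), most often with P = 0.05, is the confidence level: the probability that an interval calculated this way will contain the population mean (usually 95%). In a regression summary, the standard error reported for each coefficient plays the same role for that coefficient's estimate.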
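As a rough illustration of how a standard error turns into a confidence interval, the following Python sketch computes an interval for a sample mean. It assumes a normal approximation, which is reasonable only for larger samples; small samples would normally use a quantile from the t distribution instead. The function name `mean_confidence_interval` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, p=0.05):
    """Return (lower, upper, standard_error) for a (1 - p) interval around the mean.

    Uses a normal approximation (an assumption of this sketch).
    """
    n = len(sample)
    centre = mean(sample)
    # Standard error of the mean: sample standard deviation over sqrt(n).
    se = stdev(sample) / sqrt(n)
    # Two-sided critical value, about 1.96 when p = 0.05.
    z = NormalDist().inv_cdf(1 - p / 2)
    return centre - z * se, centre + z * se, se


# Hypothetical examination marks.
sample = [52, 55, 61, 64, 68, 59, 63, 57, 66, 60]
lower, upper, se = mean_confidence_interval(sample)
print(f"mean = {mean(sample):.2f}, SE = {se:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

A smaller standard error gives a narrower interval, which is why larger samples produce more precise estimates.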

Thank you