Linear Regression

Rasim Muzaffer Musal

Goals

  • Recall Goals of Regression
  • Intuition Behind Regression
  • Mathematical Foundation for Simple Linear Regression
  • Fit a Simple Linear Regression
  • Interpret Simple Linear Regression Output
  • Simple Linear Regression Assumptions
  • Mathematical Foundation for Multiple Linear Regression
  • Interpret Multiple Linear Regression Output
  • Multiple Linear Regression Assumptions

Some terminology

  • X,Y are random variables.
  • X is used to represent a possible causal factor.
  • Y is used to represent a r.v. whose values we would like to know about.
  • E(Y) is expected/predicted value of Y
  • \(E(Y \lvert X)\) is expected value of Y given values of X.
  • \(E(Y_{i} \lvert X_{i})\) is expected value of Y given values of X for observation \(i\).

Some terminology

  • Below: \(\mathbf{X}\) indicates a matrix composed of n rows (observations) and K+1 columns where K is the number of possible causal variables. We will discuss the +1 later below.
  • \(\mathbf{Y}\) represents a column vector of length n.
  • \(\mathbf{X}'\) represents the transpose of matrix X, creating a matrix of K+1 rows and n columns.
  • In general, a matrix \(\mathbf{H}\) with dimensions K+1 rows and n columns multiplied with a vector Y that has n rows and 1 column will lead to a vector with K+1 rows and 1 column.

Some terminology

  • The inverse of a square matrix is another square matrix such that, when multiplied with the original matrix, it yields an identity matrix.

  • An identity matrix is a square matrix where off diagonal elements are 0s and the diagonal elements are all 1.

  • These properties are used to find solutions to equations with unknowns.

  • \((\mathbf{X'X})^{-1}\) is the inverse of the matrix created by multiplying the transpose of matrix \(\mathbf{X}\) with \(\mathbf{X}\), leading to a square matrix with K+1 rows and K+1 columns.
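A minimal R sketch (not part of the original slides) of these matrix operations, using a small made-up 2 by 2 matrix; t() gives the transpose, solve() gives the inverse, and diag() builds an identity matrix.

# A small made-up matrix to illustrate transpose, inverse and identity
A = matrix(nrow=2, ncol=2, c(1, 1, 1, 4))
t(A)            # transpose of A
solve(A)        # inverse of A
solve(A) %*% A  # multiplying the inverse with A recovers the identity matrix
diag(2)         # the 2 by 2 identity matrix directly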

Motivation for regression

  • Does X (statistically) cause Y?
  • If I increase X by 1 unit, how much do I expect Y to change by?
  • What do I expect Y to be if X is fixed to a particular value?
  • If applying Multiple Regression:
    • When I have \(X_{1}\) in the model, does \(X_{2}\) still have an effect on \(Y\)?
    • What is the expected value of Y when I have \(X_{1},\cdots,X_{K}\) in the model?

Intuition of Regression

  • Think of two sets of coordinates (data) in the X,Y Euclidean plane.
  • \((X_{1}=1, Y_{1}=3);(X_{2}=4,Y_{2}=6)\)
  • When X was 1, the Y value was 3 and when X was 4, the Y value was 6.
  • If you do not have any context about what X and Y are and
  • you observe X to be 1, you will declare Y to be 3.
  • you observe X to be 4, you will declare Y to be 6.
  • If these are the only coordinates you have observed and will ever observe, you will be correct.

Intuition of Regression

  • There is only one unique line that can pass through any two distinct points in the Euclidean plane.

  • This imaginary line, fitted through these two points can be conceptualized as a line of extrapolation.

  • An extrapolation line will give you a predicted Y value for any hypothetical X value.

Two points on the Euclidean Space

Only one possible line passes through them

Solving n unknowns with n eqns.

  • To find \(\beta_{0},\beta_{1}\) or \(b,m\)
  • Write the general equation of the geometric line first.
  • Write the two coordinates as (1) and (2) within the geometric line equation.
\[\begin{aligned} Y = m \times X + b\\ (1)\quad 3 = m \times 1 + b\\ (2)\quad 6 = m \times 4 + b\\ \end{aligned}\]
  • Multiply (1) with 4 to equate the number of m values with (2).

Solving n unknowns with n eqns.

\[\begin{aligned} 4 \times 3 = 4 \times m \times 1 + 4 \times b\\ \end{aligned}\]
  • Subtract equation (2) from this new equation.
\[\begin{aligned} 12 - 6 = 4\times m - 4 \times m + 4 \times b - b \\ 6 = 0 \times m + 4 \times b - b\\ \end{aligned}\]
  • Solve for b and m.

Solving n unknowns with n eqns.

\[\begin{aligned} 6 = 3 b \\ 2 = b\\ 3 = m \times 1 +2 \\ 3-2=1=m\\ Y = 1 \times X + 2 \end{aligned}\]
  • You can check whether the equation we have identified as \(Y=X+2\) is correct by substituting the values into (1) and (2).

  • This extrapolation line works perfectly only if the relationship between X and Y is deterministic.

Solving Linear Equations with Matrix Algebra.

\[ \mathbf{Y}=\mathbf{X}\mathbf{B} \] - Where \(\mathbf{Y}\) is an \(n\) by 1 column vector, \(\mathbf{X}\) is an \(n\) by \(n\) square matrix and \(\mathbf{B}\) is an \(n\) by 1 column vector.

  • Both sides of the equality will be multiplied on the left by the inverse of the square matrix \(\mathbf{X}\).
  • This operation will create an \(n\) rows by \(n\) columns identity matrix on the right hand side of the equality. \[ \mathbf{X}^{-1} \mathbf{Y}=\mathbf{I}\mathbf{B} = \mathbf{B} \]
  • since \(\mathbf{X}^{-1} \mathbf{X}=\mathbf{I}\)

Solving Linear Equations with Matrix Algebra.

\[\begin{aligned} &\text{Write the equations in matrix multiplication form}\\ &\begin{bmatrix} 3 \\ 6 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} b_{0} \\ b_{1} \end{bmatrix}\\ &\text{Multiply lhs of both sides of equality with square matrix inverse}\\ & \begin{bmatrix} 1 & 1 \\ 1 & 4 \end{bmatrix}^{-1} \begin{bmatrix} 3 \\ 6 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 4 \end{bmatrix}^{-1} \begin{bmatrix} 1 & 1 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} b_{0} \\ b_{1} \end{bmatrix}\\ & \begin{bmatrix} 1 & 1 \\ 1 & 4 \end{bmatrix}^{-1} \begin{bmatrix} 3 \\ 6 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} b_{0} \\ b_{1} \end{bmatrix}= \begin{bmatrix} b_{0} \\ b_{1} \end{bmatrix}=\begin{bmatrix} 2 \\ 1 \end{bmatrix} \end{aligned}\]

Solving Linear Equations with Matrix Algebra in R.

#Set up the values of the X, known values.
X=matrix(nrow=2,ncol=2,c(1,1,1,4))
Y=matrix(nrow=2,ncol=1,c(3,6))
# first value is b_0, second value b_1
solve(X)%*%Y
     [,1]
[1,]    2
[2,]    1
#Or simply
solve(X,Y)
     [,1]
[1,]    2
[2,]    1

Curious Cats

  • Why do we have 1s in the first column of X?

  • What would be the consequences if there were more rows on X but no additional unknowns in the \(\mathbf{\beta}\) matrix? Is this realistic?

Solving a set of linear equations vs Regression

  • What we do when we have n equations with n unknowns will be different from what we do when we have n equations (observations) and only K+1 unknowns, where n is much larger than K.

  • As we will discuss on the next set of slides, regression requires us to think about a situation where we would like to know an outcome Y when we know about some set of variables \((X_{1},\ldots,X_{K})\). Not only will we not know everything that affects Y, but we will also not know the structural form of how X affects Y. We will have to make several assumptions involving X and how it relates to Y. Our main advantage will be that \(n\), the number of observations, should be a lot larger than \(K\), the number of known unknowns.

Regression

  • Think about a situation where you know some features (X, independent variables, covariates, predictors, …) that are somehow associated with a variable to be predicted (Y, target variable, dependent variable, outcome variable, response variable, …).

  • Most of us are neither omnipotent nor omniscient. This means that for the majority of phenomena, not only do we not know how values of X affect values of Y, we will not ever observe all possible Xs to “know” what Y is going to be.

Regression

\[Y=f(X)\]

  • Regression is about declaring the structure of \(f\) and finding the parameters of \(f\).

  • To do this we need 2 things.

    • Assume a structure of f(), and
    • Identify the parameters that optimize \(d(Y,f(X))\), where d is a comparison function between the observed Y and the predicted Y \((f(X))\).

Simple Linear Regression

  • The simplest form of regression I can think of is Simple Linear Regression.
    • Linear and Additive Structure

\[Y=\beta_{0}+\beta_{1} \times X_{1}+\epsilon \]

  • The f(X) is the additive structure formed by the \(\beta\) terms and X, plus the \(\epsilon\) term.
  • \(\beta_{0}\) is the intercept; for some reason, some CS people call it the bias.
  • \(\beta_{1}\) is the slope, or the coefficient of \(X_{1}\).
  • \(\epsilon\) is called the residual/error.

Curious Cats

  • What is the difference between the equation you see on the previous slide vs the one with the 2 sets of coordinates?

  • Think about the equation \(Y=\beta_{0}+\beta_{1}X_{1}+\epsilon\), what does \(\epsilon\) represent?

  • Assume for a moment that the additive model of S.L.R. is the correct one. If all the uncertainty that can not be accounted for by X is represented by \(\epsilon\), is there a scenario where you can reduce \(\epsilon\) to 0? What would this mean?

Mathematical Foundations of SLR

\[\mathbf{Y}=\beta_{0}+\beta_{1}\mathbf{X_{1}}+\mathbf{\epsilon}\] \[ \begin{bmatrix} y_{1} \\ y_{2} \\ y_{3} \\ \vdots\\ y_{n}\\ \end{bmatrix} = \beta_{0} + \beta_{1} \begin{bmatrix}x_{1}\\ x_{2} \\x_{3}\\ \vdots \\ x_{n} \\ \end{bmatrix} + \begin{bmatrix}\epsilon_{1}\\ \epsilon_{2} \\ \epsilon_{3}\\ \vdots \\ \epsilon_{n} \\ \end{bmatrix} \] - X and Y are data. \(\mathbf{\epsilon}\) are the error terms (more on them later). How do we estimate \(\beta_{0}\) and \(\beta_{1}\)?

Mathematical Foundations of LR

  • Let us think about the discrepancy function between Y and f(Y) which is going to lead us to pick \(\beta_{0}\) and \(\beta_{1}\) estimates.

  • What are some choices?

    1. MAE=\(\frac{\sum_{i=1}^{i=N}{\lvert Y_{i}-\hat{Y_{i}} \rvert}}{N}\)
    2. MSE=\(\frac{\sum_{i=1}^{i=N}{(Y_{i}-\hat{Y_{i}})^{2}}}{N}\)
  • \(\hat{Y_{i}}\) is predicted value of Y for observation i.
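For illustration, a short R sketch of how the two discrepancy measures could be computed on a made-up vector of observed and predicted values.

# Made-up observed and predicted values, purely for illustration
Y     = c(3, 6, 5, 8)
Y_hat = c(2.5, 6.5, 5, 9)
mean(abs(Y - Y_hat))   # MAE
mean((Y - Y_hat)^2)    # MSE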

Mathematical Foundations of LR

  • In this course we are going to focus on MSE.
  • Min \(\frac{\sum_{i=1}^{i=N}{(Y_{i}-\hat{Y_{i}})^{2}}}{N}\)
    • N is a constant and can be ignored for the purposes of optimization.
  • Leads to the formulation to identify \[\mathbf{\hat{\beta}}=b_{0},\ldots,b_{K}=(X'X)^{-1}X'Y\]
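A brief R sketch of this formula on made-up data; the first column of X is set to 1s for the intercept, and the result can be checked against lm().

# Made-up data, purely to illustrate beta_hat = (X'X)^{-1} X'Y
set.seed(1)
n  = 50
x1 = runif(n, 0, 10)
Y  = 2 + 3 * x1 + rnorm(n)
X  = cbind(1, x1)                      # column of 1s for the intercept
b  = solve(t(X) %*% X) %*% t(X) %*% Y  # (X'X)^{-1} X'Y
b
coef(lm(Y ~ x1))                       # should match b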

Mathematical Foundations of LR

  • The hat matrix is \[\mathbf{H}=\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\] and can be used to identify influential observations.

  • The matrix multiplication \[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}=\hat{\mathbf{\beta}}\] gives the coefficient estimates.

  • And \(\mathbf{X}\hat{\mathbf{\beta}}=\mathbf{H}\mathbf{Y}= \hat{\mathbf{Y}}\)

  • We should ask ourselves how \(\hat{\mathbf{\beta}}\) are derived.

  • \(\hat{\mathbf{\beta}}\) are defined as B.L.U.E.: Best Linear Unbiased Estimators.
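Continuing the made-up data from the previous sketch, the hat matrix and fitted values can be computed directly; the diagonal of H is what hatvalues() reports as leverage.

# Hat matrix H = X (X'X)^{-1} X' using the X and Y from the previous sketch
H = X %*% solve(t(X) %*% X) %*% t(X)
Y_hat = H %*% Y                 # fitted values, same as X %*% b
head(diag(H))                   # leverages of the first observations
head(hatvalues(lm(Y ~ x1)))     # should match diag(H)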

Linear Regression Derivation of B.L.U.E. of \(\beta\) terms

\[Min (\mathbf{Y}-\mathbf{X}\mathbf{\beta})'(\mathbf{Y}-\mathbf{X}\mathbf{\beta})\] \[Min \mathbf{Y}'\mathbf{Y}-2\mathbf{\beta}'\mathbf{X}'\mathbf{Y}+\mathbf{\beta}'\mathbf{X}'\mathbf{X}\mathbf{\beta} \] - Take the derivative of the function with respect to \(\mathbf{\beta}\) and equate it to 0 to find the \(\mathbf{\hat{\beta}}\) that minimizes the function. Note \(\mathbf{Y}'\mathbf{Y}\) does not contain any \(\beta\) terms so its derivative is 0.

\[ -2\mathbf{X}'\mathbf{y}+2\mathbf{X}'\mathbf{X}\mathbf{\hat{\beta}}=0\]

  • Move the term without \(\beta\) to the r.h.s. of the equality and divide both sides by 2.

\[ \mathbf{X}'\mathbf{X}\mathbf{\hat{\beta}}=\mathbf{X}'\mathbf{y}\]

Linear Regression Derivation of B.L.U.E. of \(\beta\) terms

  • Recall how we solved linear equations earlier. Multiply both sides on the left with the inverse of the square matrix \((\mathbf{X}'\mathbf{X})\)

\[ (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}\mathbf{\hat{\beta}}=\mathbf{I}\mathbf{\hat{\beta}}=\mathbf{\hat{\beta}}=(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \]

Assumptions we have made so far:

  • Note how we have multiplied and summed \(\mathbf{X}\) variables. Each value contributed to the sum independently of the value of other multiplications.

  • This in turn means that the \(\beta\) values are constants; they do not change according to the differing levels of \(\mathbf{X}\).

  • Further assumptions will need to be made for additional predictive and inferential results.

  • Before going further we need to test a hypothesis about \(\mathbf{\beta}\).

  • To understand what that hypothesis is, we need to discuss a bit of visual intuition.

Mean of Y as Predictions for Y

  • How would we predict the value of Y if we do not know any X and the only thing we know is Y itself?

  • To answer this question you need to know the discrepancy function between Y and what you will predict it to be.

  • If it is the squared loss function that we have discussed earlier \(\sum_{i=1}^{N} (Y_{i}-\hat{Y})^2\):

  • The 5 vertical lines drawn from each y-coordinate to the mean of Y demonstrate the error that the Y-only model would create for these 5 points. To distinguish this prediction from predictions with a regression model with X, we use \(\bar{Y}\)

Mean of Y as Predictions for Y

  • The X-axis is only used for clarity and linking with the following plots. Imagine this on a single dimension of Y if you can.

Sum of Squares Total

  • Each \((Y_{i}-\bar{Y})\) term for each i would be the error term if we used \(\bar{Y}\) as our prediction for \(Y_{i}\).

  • If we obtain each and every \((Y_{i}-\bar{Y})\) term and add them up we get 0.

\[\sum_{i=1}^{i=N} (Y_{i}-\bar{Y})=0 \]

  • Therefore it can not be used as the aggregate amount of error we would commit if we used the constant value \(\bar{Y}\) as our predictions for each \(Y_{i}\).

Curious Cats

  • Show \[\sum_{i=1}^{i=N} (Y_{i}-\bar{Y})=0 \] using math.

  • Hint: \[\sum_{i=1}^{i=N} (Y_{i}-\bar{Y})=Y_{1}+\cdots+Y_{N}- \]

Sum of Squares Total

\[S.S.T.= \sum_{i=1}^{i=N} (Y_{i}-\bar{Y})^{2}\]

  • If we divided this with \(N-1\), we would obtain sample variance of Y.

    • SST is the total quantity of error we would commit if we used \(\bar{Y}\) as the prediction for each \(Y_{i}\).

    • It is also a measure of how much variation there is in Y.
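A quick R check of this relationship on a few made-up Y values.

# Made-up Y values, purely for illustration
Y   = c(3, 6, 5, 8, 4)
SST = sum((Y - mean(Y))^2)
SST / (length(Y) - 1)   # equals the sample variance of Y
var(Y)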

\(\mathbf{\hat{Y}}\) as Predictions for \(Y_{i}\)

  • We use information from a set of covariates \(\mathbf{X}\) and come up with \(b_{0},b_{1},\cdots,b_{K}\) in order to predict each value \(Y_{i}\) in a way that minimizes the sum of squared errors.

  • We would like to quantify how much better the regression model with covariates does compared to \(\bar{Y}\) as the predicted value. Note that for the purpose of predicting the observed data, the predictions using \(b_{0},b_{1},\cdots,b_{K}\) overall, never do worse than \(\bar{Y}\).

  • This aggregate construct leads to the quantity referred to as the Sum of Squares Regression (S.S.R.).

\[\sum_{i=1}^{i=N}(\hat{Y}_{i}-\bar{Y})^{2}\]

\(\mathbf{\hat{Y}}\) as Predictions for \(Y_{i}\)

  • Naturally we would like to aggregate the discrepancy (error) between what we predict \(Y_{i}\) to be and what \(Y_{i}\) is using the squared error loss function. This aggregate component is called sum of squared errors (S.S.E.). \(Y_{i}-\hat{Y_{i}}\) is the error the prediction model commits. We can shorten this notation with \(\epsilon_{i}\)

\[\sum_{i=1}^{i=N}(Y_{i}-\hat{Y_{i}})^{2}=\sum_{i=1}^{i=N}\epsilon_{i}^{2} \]

Mean of Y vs \(\mathbf{\hat{Y}}\) as Predictions

S.S.T.=S.S.R.+S.S.E.

\[\sum_{i=1}^{i=N}(Y_{i}-\bar{Y})^{2}=\sum_{i=1}^{i=N}(\hat{Y}_{i}-\bar{Y})^{2}+\sum_{i=1}^{i=N}(Y_{i}-\hat{Y}_{i})^{2} \]

  • Sum of Squares Total = Total amount of variation in Y

  • Sum of Squares Regression = Total amount of variation that the regression model accounts for.

  • Sum of Squares Error = Total amount of variation in Y that can not be explained with regression model.

\(r^{2}\) as Percentage of Variation Explained in Y

  • SST: Amount of variation there is in Y
  • SSR: Amount of variation explained by the regression.

\[r^{2}=\frac{SSR}{SST} \]

  • \(r^{2}\) is the percentage of variation that the regression model can explain in Y.

  • So far we have made no use of distributional assumptions.
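The decomposition and \(r^{2}\) can be checked numerically in R; the sketch below uses made-up data and compares the hand-computed \(r^{2}\) with the one reported by summary().

# Made-up data, purely to verify SST = SSR + SSE and r^2 = SSR/SST
set.seed(2)
x = runif(40, 0, 10)
Y = 5 + 2 * x + rnorm(40, sd = 3)
fit   = lm(Y ~ x)
Y_hat = fitted(fit)
SST = sum((Y - mean(Y))^2)
SSR = sum((Y_hat - mean(Y))^2)
SSE = sum((Y - Y_hat)^2)
SST
SSR + SSE                 # should equal SST
SSR / SST                 # r^2
summary(fit)$r.squared    # should match SSR/SST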

The key concept \(\epsilon\)

  • We assume that \(\epsilon\) is normally distributed.

  • We can use mathematical statistics to prove that if \(\epsilon\) has a normal distribution, then SSR and SSE have \(\chi^{2}\) distributions.

  • Furthermore, if SSR is divided by what is referred to as its degrees of freedom (K), you obtain the mean square regression \(\left(\text{Mean Square Regression}=\frac{SSR}{K}\right)\), where K is the number of covariates.

The key concept \(\epsilon\)

  • Analogously, SSE divided by its degrees of freedom (n-K-1) leads to the Mean Square Error \(\left(\text{Mean Square Error}=\frac{SSE}{n-K-1}\right)\)

  • All of this leads in turn to the \(F\text{ Ratio}=\frac{MSR}{MSE}\), which has an F distribution with K and n-K-1 degrees of freedom.

  • All of these relationships follow from \(\epsilon\) being normally distributed.

  • The calculated F-Ratio is a test statistic equivalent in purpose to \(t_{computed}\), which allows the following hypothesis test, referred to as the Global F Test.
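A short R sketch of these quantities, continuing the made-up fit from the previous sketch (K = 1 covariate, n = 40 observations); the p value is the area to the right of the F ratio.

# MSR, MSE and the F ratio for the made-up fit above
n = 40; K = 1
MSR = SSR / K
MSE = SSE / (n - K - 1)
F_ratio = MSR / MSE
F_ratio
pf(F_ratio, K, n - K - 1, lower.tail = FALSE)  # p value of the global F test
summary(fit)$fstatistic                        # first element should match F_ratio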

Global F Test: Simple Linear Regression

\[\begin{aligned} H_{0}:\beta_{1}=0 \\ H_{a}:\beta_{1} \ne 0 \end{aligned}\]
  • Null hypothesis states that the coefficient of the covariate is 0. This means that the covariate \(X_{1}\) does not affect Y. The alternative hypothesis states that \(X_{1}\) does have an effect on Y. Recall that the equation of Linear Regression is \(Y=\beta_{0}+\beta_{1}X_{1} +\epsilon\). Therefore, if \(\beta_{1}=0\), it does not matter what value \(X_{1}\) is.

Global F Test: Multiple Linear Regression

\[\begin{aligned} H_{0}: & \beta_{1}=\ldots =\beta_{K}=0\\ H_{a}: & \text{at least one } \beta_{k} \ne 0 \end{aligned}\]
  • Null hypothesis states that none of the covariates have any effect. The alternative hypothesis states that at least one covariate has an effect on Y (at least one coefficient is different from 0).

  • It does NOT say all covariates have an effect on Y. That would be a very different test.

Excel Regression User Interface 1.

First steps involve adding Data Analysis toolpak, click on File, More, Options

Excel Regression User Interface 2.

Click on Add-ins, Click on Manage Excel Add-ins

Excel Regression User Interface

Make sure Analysis ToolPak is checked

Excel Regression User Interface

Data Analysis should appear on the Data tab

Excel Regression User Interface

Select Regression, Click OK

Excel Regression User Interface

Input Y and X ranges designate the dependent and independent variables, respectively. The data has the labels Price and Size, so they are selected. It is not necessary, but check the Residuals box. Click OK.

Excel Output

The first thing to look at in the regression output is the p value associated with the global hypothesis test and whether you can reject the null hypothesis.

Discussing Excel Output

  • The p value is under the label Significance F. It is the area to the right of the F Ratio statistic which is approximately 530.71 within the F distribution with 1 (K=1) and 516 (n-K-1=518-1-1) degrees of freedom. It is a tiny number and scientific notation has to be used to represent it. There are 80 0s to the right of the decimal point. We can say that since this number is smaller than any reasonable \(\alpha\) (alpha) value we can reject the null hypothesis that \(\beta_{1}=0\). In other words we can claim that Size has an effect on Price.

Excel Output

Next you can look at R Square, Coefficient Estimate etc… in no particular order

Discussing Excel Output

  • \(r^{2}\) is calculated as \(\frac{SSR}{SST}\). Recall SSR is the quantity of variation in Y (Price in this example) that you can explain with your model and SST is the total quantity of variation in Y. Therefore \(r^{2}\) is a number between 0 and 1. It quantifies the percentage of variation that your regression model explains in your dependent variable.

  • \(r^{2}\) is a useful quantity but its interpretation can be limited when extended to multiple linear regression. It has the property that as you add more variables, \(r^{2}\) never goes down, even if the variable you add has nothing to do with the dependent variable. If you solely focus on \(r^{2}\) as a measure of the strength of your model, you will be misled.

  • If you have n observations and n-1 independent variables, \(r^{2}\) will be 1. If this is the case you clearly have an overfitted model where information from the model can not be generalized to new data observed from the same process.

Discussing Excel Output

  • \(r^{2}\) is approximately 51 percent. You are able to explain 51% of the variation in the dependent/target variable, the Price of a house.

  • The estimate of \(\beta_{0}\), \(b_{0}\) the intercept estimate is approximately 22,939. This is what you expect the price to be if the Size of the house is 0.

  • The coefficient estimate of Size is approximately 37.71. Each unit increase in Size increases the price of the house by approximately this amount.

Predicting an arbitrary house price

  • Predict the price of a house that is 1,000 square feet.
\[\begin{aligned}E(Price|X_{Size}=1000) & =22939+37.71 \times 1000 \\ & = 60649 \end{aligned}\]
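The same prediction can be obtained in R from the fitted model object o1 (created in the R-scripts slides later in the deck); predict() uses the unrounded coefficient estimates, so it will differ slightly from the hand calculation above.

# Using the fitted model o1 instead of the rounded coefficients
predict(o1, newdata = data.frame(Size = 1000))
# Equivalent hand calculation with the stored coefficients
coef(o1)[1] + coef(o1)[2] * 1000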

Gatos Curiosos

  • The intercept’s general interpretation is that it is the expected value of the dependent variable when the independent variable is 0.

  • In this example the intercept is 22,939 dollars. How can a house that is 0 square feet have a positive value? Why is it not 0? Would you pay for a house that is 0 square feet?

  • We can all speculate about answers; however, the answer simply lies in \[\mathbf{\hat{\beta}}=b_{0},b_{1}=(X'X)^{-1}X'Y\]

Computing Residuals

  • The first of the houses has a square footage of 1,076. The actual price of this particular house is 62,400 dollars.

  • Let us first find the predicted price for this house.

\[22939+37.7*1076=63504.2\] - The residual is simply \(Y-E(Y|X)\) and therefore \(62400-63504.2\) is the residual for this house \(-1104.2\).
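In R, the fitted value and residual for this first house can be read directly off the model object o1 (defined in the R-scripts slides); the small difference from the hand calculation comes from rounding the coefficients.

# Fitted value and residual for the first observation, using the model o1
fitted(o1)[1]              # predicted price of the first house
resid(o1)[1]               # its residual, Price - predicted Price
Price[1] - fitted(o1)[1]   # same quantity computed by hand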

Gatos Curiosos

  • There is more than one house of the same size. The predicted values for those houses are the same. What changes, of course, is the calculated residual.

  • If you had 100 houses for sale, each of the same size you would get 100 residuals.

  • What properties should these residuals have?

Regression via R-Scripts

#The file parameter specifies the file location, including the name of the file and its extension. NOTE THAT SLASHES ARE FORWARD / NOT BACKWARD \

# header = TRUE specifies that the file has labels on row 1

# sep="\t" means that each column is separated from the others with a tab; if the file were comma separated, "\t" would be replaced with ","

real_estate=read.table(file='C:/Users/rm84/Downloads/realestate.txt',header=TRUE,sep="\t")

# This is not always best practice but this attach function allows us to refer to the labels of the columns in the script.
attach(real_estate)

#The attach function allows us to specify Price or Size instead of referring to the data object real_estate first or column indices. o1 is the regression object we will use functions on. 
o1=lm(Price~Size)

Output via R-Scripts

summary(o1)

Call:
lm(formula = Price ~ Size)

Residuals:
   Min     1Q Median     3Q    Max 
-26100 -10552  -1141  11000  28267 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 22938.737   3110.831   7.374 6.65e-13 ***
Size           37.708      1.637  23.037  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12620 on 516 degrees of freedom
Multiple R-squared:  0.507, Adjusted R-squared:  0.5061 
F-statistic: 530.7 on 1 and 516 DF,  p-value: < 2.2e-16

Discussing Output via R-Scripts

  • The p value of the global F test is a very small number, leading to the rejection of the null hypothesis that the Size of the house has no effect on the price of the house. The number you see on the R output and the Excel output is not identical since in R, if p values are extremely small, the output lists them as being smaller than 0.00000000000000022 (15 0s after the decimal point).

\(r^{2}\) is 51%, \(b_{0}\) is 22,939 and \(b_{Size}=37.7\)

Assumptions: Simple Linear Regression

  • There is a linear relationship between X and Y

  • Residuals are Normally Distributed.

  • Residuals have an expectation of 0

  • Residuals have a constant variance
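A few standard diagnostic plots in R give a rough visual check of these assumptions for the fitted model o1; this is a sketch of informal checks, not formal tests.

# Residuals vs fitted values: look for no pattern (linearity), roughly
# constant vertical spread (constant variance), and centering around 0
plot(fitted(o1), resid(o1)); abline(h = 0)
# Normal Q-Q plot of the residuals: points near the line suggest normality
qqnorm(resid(o1)); qqline(resid(o1))
# Histogram of residuals as another informal check
hist(resid(o1))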