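library(RCurl)  # provides getURL()
my_git_url <- getURL("https://raw.githubusercontent.com/AhmedBuckets/SPS605/main/home_data.csv")
price_data <- read.csv(text = my_git_url)
# Simple linear regression of sale price on living area (square feet)
price_sqft_model <- lm(price ~ sqft_living, data = price_data)
summary(price_sqft_model)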
##
## Call:
## lm(formula = price ~ sqft_living, data = price_data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1476062  -147486   -24043   106182  4362067
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -43580.743   4402.690  -9.899   <2e-16 ***
## sqft_living    280.624      1.936 144.920   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 261500 on 21611 degrees of freedom
## Multiple R-squared: 0.4929, Adjusted R-squared: 0.4928
## F-statistic: 2.1e+04 on 1 and 21611 DF, p-value: < 2.2e-16
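The fitted line is price = -43580.743 + 280.624 × sqft_living, so each additional square foot of living area adds roughly 280.6 to the predicted price. As a quick arithmetic sketch, the printed coefficients can be plugged in for a hypothetical 2,000 sq ft home (the house size is an illustrative assumption, not a value taken from the data):

```r
# Coefficients copied from the summary output above (rounded as printed)
intercept <- -43580.743
slope <- 280.624   # estimated change in price per extra square foot of living area

# Predicted price for a hypothetical 2,000 sq ft home
sqft <- 2000
predicted_price <- intercept + slope * sqft
predicted_price
```

With the fitted model in memory, `predict(price_sqft_model, newdata = data.frame(sqft_living = 2000))` should return essentially the same value, differing only by the rounding of the printed coefficients. The negative intercept has no practical meaning on its own, since no home has zero living area.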
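The residuals are the differences between the observed prices and the prices predicted by the model (observed minus predicted). A negative residual therefore means the model overestimated the price, so the minimum residual, -1476062, is the model's largest overestimate, and the maximum, 4362067, is its largest underestimate. The median residual is -24043, meaning the model slightly overpredicts the price of a typical home. The residuals are more informative when we compare them with the model's predictions by plotting them against the fitted values.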
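A residuals-versus-fitted plot can be drawn with base R graphics. The sketch below wraps it in a hypothetical helper, `plot_resid_vs_fitted()`, that accepts any fitted `lm()` model; it is demonstrated on a small simulated data set (made-up numbers, not the home sales data), and with the real data you would pass `price_sqft_model` instead.

```r
# Hypothetical helper: residuals-vs-fitted plot for any fitted lm() model
plot_resid_vs_fitted <- function(model) {
  fit <- fitted(model)
  res <- resid(model)   # observed minus fitted: < 0 means the model overestimated
  plot(fit, res, pch = 20, col = rgb(0, 0, 0, 0.3),
       xlab = "Fitted value", ylab = "Residual (observed - fitted)")
  abline(h = 0, col = "red", lty = 2)   # points above this line were underestimated
  invisible(data.frame(fitted = fit, residual = res))
}

# Simulated example (assumed values, not the home sales data)
set.seed(42)
sim <- data.frame(sqft_living = runif(200, 500, 4000))
sim$price <- 280 * sim$sqft_living + rnorm(200, sd = 40 * sim$sqft_living)
sim_model <- lm(price ~ sqft_living, data = sim)
out <- plot_resid_vs_fitted(sim_model)
```

Returning the fitted values and residuals invisibly lets the same helper feed later checks without refitting the model.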
We see that the residuals are scattered fairly evenly around 0. However, their magnitude tends to grow as the predicted price increases (non-constant variance, or heteroscedasticity), so the model's predictions are less reliable for more expensive homes.
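One rough way to put a number on the widening spread is to split the fitted values into quantile bins and compare the standard deviation of the residuals in each bin. The sketch below uses a hypothetical helper, `resid_spread_by_bin()`, and simulated data whose noise is assumed to grow with house size; it is an informal check, not a formal test for heteroscedasticity.

```r
# Hypothetical helper: residual spread within quantile bins of the fitted values
resid_spread_by_bin <- function(model, n_bins = 4) {
  fit <- fitted(model)
  res <- resid(model)
  breaks <- quantile(fit, probs = seq(0, 1, length.out = n_bins + 1))
  bins <- cut(fit, breaks = breaks, include.lowest = TRUE)
  tapply(res, bins, sd)   # one standard deviation per bin, lowest to highest fit
}

# Simulated data: noise sd proportional to size, so spread should grow across bins
set.seed(7)
sim <- data.frame(sqft_living = runif(1000, 500, 4000))
sim$price <- 280 * sim$sqft_living + rnorm(1000, sd = 40 * sim$sqft_living)
spread <- resid_spread_by_bin(lm(price ~ sqft_living, data = sim))
spread
```

If the standard deviations climb steadily from the first bin to the last, the residuals fan out in the way the plot suggests.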
We can use a Q-Q plot to visualize whether or not the residuals are normally distributed:
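The residuals stay close to the reference line until the right-hand (upper) tail, where they depart from it sharply. This matches the residual summary: the largest residual (4362067) is far larger in magnitude than the smallest (-1476062), so the residuals are right-skewed rather than normally distributed.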
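A normal Q-Q plot of a model's residuals needs only `qqnorm()` and `qqline()` from base R. The sketch below wraps them in a hypothetical helper, `qq_residuals()`, and tries it on simulated data with assumed right-skewed (log-normal) noise, loosely imitating a long upper tail; with the real data you would pass `price_sqft_model`.

```r
# Hypothetical helper: normal Q-Q plot of a model's residuals with a reference line
qq_residuals <- function(model) {
  res <- resid(model)
  qqnorm(res, main = "Normal Q-Q plot of residuals")
  qqline(res, col = "red")   # line through the first and third quartiles
  invisible(res)
}

# Simulated data with right-skewed noise (made-up values, centred near zero)
set.seed(3)
sim <- data.frame(sqft_living = runif(500, 500, 4000))
sim$price <- 280 * sim$sqft_living + 1e5 * (rlnorm(500, 0, 1) - exp(0.5))
res <- qq_residuals(lm(price ~ sqft_living, data = sim))

# A long right tail shows up as max(res) far exceeding |min(res)|
c(min = min(res), median = median(res), max = max(res))
```

Comparing the extreme residuals this way echoes the reasoning applied to the `summary()` output: a maximum much larger than the absolute minimum, together with a negative median, points to a right-skewed distribution.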