library(mosaic)
In this exercise you will study the data described in Agresti EXAMPLE 9.10.
You are studying house sales in Gainesville, Florida, where among other things the data contain the selling price (Price), property taxes (Taxes) and house size (Size).
HousePrices <- read.table("http://asta.math.aau.dk/dan/static/datasets?file=HousePrice.dat", header=TRUE)
head(HousePrices)
## Taxes Price Size
## 1 3104 279900 2048
## 2 1173 146500 912
## 3 3076 237700 1654
## 4 1608 200000 2068
## 5 1454 159900 1477
## 6 2997 499900 3153
plot(HousePrices)
Taxes and Size. The scatterplot shows a positive, roughly linear association between Taxes and Size. The correlation test gives an estimated correlation of 0.82 with a 95% confidence interval of 0.74 to 0.87. Since the interval does not contain 0 (p-value < 2.2e-16), we reject the null hypothesis that the true correlation is 0.
cor.test(~ Size + Taxes, data = HousePrices)
##
## Pearson's product-moment correlation
##
## data: Size and Taxes
## t = 14.119, df = 98, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.7416554 0.8745614
## sample estimates:
## cor
## 0.8187958
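As a rough cross-check of the output above, the t statistic and the confidence interval can be reproduced by hand from the reported correlation. This is only a sketch: it assumes n = 100 observations (df + 2) and uses Fisher's z-transformation, which gives an approximate interval for Pearson's correlation. The variable names are illustrative.

```r
# Values copied from the cor.test output above
r <- 0.8187958
n <- 98 + 2            # df = n - 2

# t statistic for H0: true correlation is 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Approximate 95% CI via Fisher's z-transformation
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

t_stat
ci
```

Both results should agree with the t value and the interval printed by cor.test up to rounding.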
Taxes and Size as predictors. Here we fit a multiple regression model with Price as the response variable and Taxes and Size as predictors.
model <- lm(Price ~ Taxes + Size, data = HousePrices)
summary(model)
##
## Call:
## lm(formula = Price ~ Taxes + Size, data = HousePrices)
##
## Residuals:
## Min 1Q Median 3Q Max
## -188027 -26138 347 22944 200114
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -28608.744 13519.096 -2.116 0.0369 *
## Taxes 39.601 6.917 5.725 1.16e-07 ***
## Size 66.512 12.817 5.189 1.16e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 48830 on 97 degrees of freedom
## Multiple R-squared: 0.7722, Adjusted R-squared: 0.7675
## F-statistic: 164.4 on 2 and 97 DF, p-value: < 2.2e-16
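The parameters are the intercept and one slope for each predictor. Both slopes are positive: holding Size fixed, a one-unit increase in Taxes is associated with an increase of about 39.60 in the predicted Price, and holding Taxes fixed, a one-unit increase in Size is associated with an increase of about 66.51.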
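To make the prediction equation concrete, the sketch below plugs the estimates from summary(model) into Price = a + b1*Taxes + b2*Size. The helper predict_price is hypothetical and written only for illustration; with the fitted object available, predict(model, newdata = ...) would do the same job.

```r
# Coefficients copied from summary(model)
b <- c(intercept = -28608.744, taxes = 39.601, size = 66.512)

# Hypothetical helper: predicted price from the additive model
predict_price <- function(taxes, size) {
  unname(b["intercept"] + b["taxes"] * taxes + b["size"] * size)
}

# First house shown by head(HousePrices): Taxes = 3104, Size = 2048
pred <- predict_price(3104, 2048)
pred
```

For this house the model predicts roughly 230,500, while the observed price is 279,900. The residual of about 49,400 is close to the residual standard error of 48,830.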
Explain the output where model is the fitted multiple regression model. This explanation should as a minimum include
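t value and determination and interpretation of p-value. The t value equals the estimate divided by its standard error for each coefficient. The values below are computed from rounded estimates and standard errors, so they differ slightly from the summary.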
tval1 = -28608.7 / 13519.1
tval2 = 39.6 / 6.9
tval3 = 66.5 / 12.8
tval1
## [1] -2.116169
tval2
## [1] 5.73913
tval3
## [1] 5.195312
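The p-values for Taxes (1.16e-07) and Size (1.16e-06) are much less than 5%. This means that we can reject the null hypothesis that the coefficient is zero for both the x1 (Taxes) and x2 (Size) variables. The p-value for the intercept (0.0369) is also below 5%.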
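The p-values in the summary can be determined from the t values and the residual degrees of freedom. The sketch below assumes the two-sided test of H0: coefficient = 0 and uses the t values as printed in summary(model); the variable names are illustrative.

```r
# t values and residual degrees of freedom copied from summary(model)
resid_df <- 97
t_values <- c(intercept = -2.116, taxes = 5.725, size = 5.189)

# Two-sided p-value: probability of a t value at least this extreme under H0
p_values <- 2 * pt(abs(t_values), df = resid_df, lower.tail = FALSE)
p_values
```

Because the t values are rounded, the results match the Pr(>|t|) column only approximately.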
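(7) Interpretation of Multiple R-squared. R^2 = (TSS - SSE) / TSS is the proportion of the total variation in Price (TSS) that is explained by the model, where SSE is the unexplained (residual) variation. Here R^2 = 0.7722, so Taxes and Size together explain about 77% of the variation in Price.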
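The other fit statistics in the summary follow from Multiple R-squared. As a sketch, assuming n = 100 observations (97 residual degrees of freedom plus 3 estimated parameters) and k = 2 predictors, the adjusted R-squared and the F statistic can be recovered like this:

```r
# Values copied from summary(model)
r2 <- 0.7722
n  <- 100   # assumed: residual df (97) + number of parameters (3)
k  <- 2     # number of predictors

# Adjusted R-squared penalises for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# F statistic for H0: all slopes are zero
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))

adj_r2
f_stat
```

Both values should agree with the summary (0.7675 and 164.4) up to the rounding of R-squared.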
confint. 95% confidence interval: est ± t*se
t=qt (0.025, df=97, lower.tail = FALSE)
-28608.7 + (13519.1)*(t)
## [1] -1777.029
-28608.7 - (13519.1)*(t)
## [1] -55440.37
39.601 + (6.9)*(t)
## [1] 53.29559
39.601 - (6.9)*(t)
## [1] 25.90641
66.5 + (12.8)*(t)
## [1] 91.90446
66.5 - (12.8)*(t)
## [1] 41.09554
confint(model)
## 2.5 % 97.5 %
## (Intercept) -55440.40818 -1777.08054
## Taxes 25.87192 53.32920
## Size 41.07304 91.95066
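With 95% confidence, the intercept (a) is between -55440.40818 and -1777.08054, the coefficient of Taxes (b1) is between 25.87192 and 53.32920, and the coefficient of Size (b2) is between 41.07304 and 91.95066 in our prediction equation. The hand-computed intervals differ slightly from confint(model) because they use rounded estimates and standard errors.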
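The same est ± t*se calculation can be done for all three coefficients at once. The sketch below uses the estimates and standard errors as printed in summary(model) rather than the shorter rounded values, which brings the result closer to confint(model); the variable names are illustrative.

```r
# Estimates and standard errors copied from summary(model)
est <- c(intercept = -28608.744, taxes = 39.601, size = 66.512)
se  <- c(13519.096, 6.917, 12.817)

# Critical t value for a 95% interval with 97 residual degrees of freedom
t_crit <- qt(0.975, df = 97)

# One row per coefficient: lower and upper limits
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
ci
```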
Taxes and the effect of Size as predictors of Price.
model2 <- lm(Price ~ Taxes * Size, data = HousePrices)
summary(model2)
##
## Call:
## lm(formula = Price ~ Taxes * Size, data = HousePrices)
##
## Residuals:
## Min 1Q Median 3Q Max
## -202902 -23642 -224 20081 213409
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.396e+04 2.450e+04 0.978 0.3305
## Taxes 1.991e+01 1.026e+01 1.941 0.0551 .
## Size 3.329e+01 1.806e+01 1.844 0.0683 .
## Taxes:Size 1.036e-02 4.072e-03 2.544 0.0126 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 47510 on 96 degrees of freedom
## Multiple R-squared: 0.7866, Adjusted R-squared: 0.7799
## F-statistic: 117.9 on 3 and 96 DF, p-value: < 2.2e-16
We look at the p-value of each term, starting with the interaction term Taxes:Size. If the p-value for Taxes:Size were more than 5%, the interaction would not have a significant effect on the response variable, and we would drop it and return to the model without interaction in summary(model). Here the p-value for Taxes:Size is 0.0126, which is less than 5%, so the interaction is significant. In that case the main effects Taxes and Size stay in the model and are included in the calculation, even though their individual p-values (0.0551 and 0.0683) are above 5%.
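There is a significant interaction, so the effect of Size on Price depends on the level of Taxes. Multiple R-squared is 0.7866 (adjusted 0.7799), slightly higher than for the model without interaction (0.7722), so the model fits rather well.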
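With an interaction in the model, the slope for Size is no longer a single number: it equals the Size coefficient plus the Taxes:Size coefficient times Taxes. The sketch below evaluates this slope at two tax levels taken from head(HousePrices); the helper size_slope is hypothetical and only meant to illustrate the interpretation.

```r
# Coefficients copied from summary(model2)
b_size  <- 3.329e+01
b_inter <- 1.036e-02

# Hypothetical helper: slope of Size at a given level of Taxes
size_slope <- function(taxes) b_size + b_inter * taxes

# Tax levels of the second and first house in head(HousePrices)
slopes <- size_slope(c(1173, 3104))
slopes
```

At the lower tax level one extra unit of Size adds about 45 to the predicted Price, and at the higher tax level about 65, which is what a positive interaction means here.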