A glimpse through R
| Variable | Description |
|---|---|
| Price | Sales price of residence (dollars) |
| Sqft | Finished area of residence (square feet) |
| Bedroom | Total number of bedrooms in residence |
| Bathroom | Total number of bathrooms in residence |
| Airconditioning | \(1\) = Presence of air conditioning, \(0\) = otherwise |
| Garage | Number of cars that garage will hold |
| Pool | \(1\) = Presence of Pool, \(0\) = otherwise |
| YearBuild | Year of Construction |
| Quality | \(1\) = High quality, \(2\) = Medium, \(3\) = Low |
| Lot | Lot size (in square feet) |
| AdjHighway | \(1\) = Property is adjacent to a highway, \(0\) = otherwise |
Some questions we might ask of these data:
- Do all the variables have a significant effect on Price?
- Do older houses tend to sell for lower prices?
- How much does adjacency to a highway affect the price?
- Does the average price differ between air-conditioned and non-air-conditioned houses?
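The last question can be approached with a two-sample comparison of means. The sketch below is illustrative only: it uses a small simulated data frame (`toy`, a hypothetical stand-in, not the housing data) with the same column names, and applies Welch's t-test from base R.

```r
# Simulated stand-in for the housing data (an assumption, NOT the real data set).
set.seed(1)
n <- 200
toy <- data.frame(
  Airconditioning = factor(rbinom(n, 1, 0.8), levels = c(0, 1))
)
# Simulated prices: air-conditioned houses are given a higher mean by construction.
toy$Price <- 200000 + 40000 * (toy$Airconditioning == "1") + rnorm(n, sd = 50000)

# Welch two-sample t-test; H0: the mean price is the same in both groups.
ac_test <- t.test(Price ~ Airconditioning, data = toy)
print(ac_test$estimate)  # the two group means
print(ac_test$p.value)   # two-sided p-value
```

This compares the raw group means. It answers a different question from the `Airconditioning1` coefficient in a regression, which measures the difference after adjusting for the other variables.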
'data.frame': 522 obs. of 11 variables:
$ Price : int 360000 340000 250000 205500 275500 248000 229900 150000 195000 160000 ...
$ Sqft : int 3032 2058 1780 1638 2196 1966 2216 1597 1622 1976 ...
$ Bedroom : int 4 4 4 4 4 4 3 2 3 3 ...
$ Bathroom : int 4 2 3 2 3 3 2 1 2 3 ...
$ Airconditioning: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 1 ...
$ Garage : int 2 2 2 2 2 5 2 1 2 1 ...
$ Pool : Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 1 1 1 ...
$ YearBuild : int 1972 1976 1980 1963 1968 1972 1972 1955 1975 1918 ...
$ Quality : Factor w/ 3 levels "1","2","3": 2 2 2 2 2 2 2 2 3 3 ...
$ Lot : int 22221 22912 21345 17342 21786 18902 18639 22112 14321 32358 ...
$ AdjHighway : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
First, we fit a linear model with all the variables: \[\boldsymbol{y}=X_{n\times p}\boldsymbol{\beta}+\boldsymbol{\epsilon},\] where \(n=522\) and \(p=12\),
\[\boldsymbol{\beta}=\left(\alpha,\beta_{1},\beta_{2},\ldots,\beta_{11}\right)',\] \[X=\left(\boldsymbol{1}_{n},\boldsymbol{x}_{1},\boldsymbol{x}_{2},\ldots,\boldsymbol{x}_{11}\right),\] and we assume \[\boldsymbol{\epsilon}\sim\mathcal{N}_{n}\left(\boldsymbol{0},\sigma^{2}I_{n}\right),\] that is, the errors \(\epsilon_{i}\) are iid \(\mathcal{N}\left(0,\sigma^{2}\right)\).
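This is an analysis of covariance (ANCOVA) model: the regressors include both quantitative covariates (such as Sqft and Lot) and categorical factors (such as Quality and Pool).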
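To see why ten predictors give \(p=12\) columns, it helps to look at how R expands factors into indicator columns. The following is a minimal sketch on a hypothetical four-row data frame (`toy`, not the housing data), using R's default treatment contrasts.

```r
# Hypothetical toy rows, only to show how factors are dummy-coded.
toy <- data.frame(
  Sqft    = c(1500, 2200, 1800, 2600),
  Quality = factor(c(1, 2, 3, 2), levels = 1:3),
  Pool    = factor(c(0, 1, 0, 0), levels = 0:1)
)

# Design matrix: intercept, the numeric column, and one indicator
# for every non-baseline level of each factor.
X <- model.matrix(~ Sqft + Quality + Pool, data = toy)
print(colnames(X))
# "(Intercept)" "Sqft" "Quality2" "Quality3" "Pool1"
```

A three-level factor contributes two columns and a two-level factor one. In the full model, six numeric variables, three binary factors, the two Quality indicators, and the intercept add up to twelve columns.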
The ordinary least squares (OLS) estimator is \[\hat{\boldsymbol{\beta}}_{OLS}=\left(X'X\right)^{-1}X'\boldsymbol{y}\]
OLS estimate of \(\boldsymbol{\beta}\):
[,1]
(Intercept) -2358196.4156
Sqft 87.0047
Bedroom -5125.0967
Bathroom 8126.9009
Airconditioning1 4850.7151
Garage 10888.3678
Pool1 10138.7609
YearBuild 1269.4213
Quality2 -142985.0702
Quality3 -148375.5019
Lot 1.5565
AdjHighway1 -27373.9498
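A matrix of estimates like the one above can be obtained directly from the normal equations. The sketch below assumes nothing about how the original was computed. It uses simulated data (`toy`, a hypothetical stand-in with made-up coefficients) and checks the hand-computed estimate against `lm()`.

```r
# Simulated stand-in data (an assumption, NOT the housing data).
set.seed(42)
n <- 100
toy <- data.frame(
  Sqft    = runif(n, 1000, 3500),
  Quality = factor(sample(1:3, n, replace = TRUE), levels = 1:3)
)
toy$Price <- 50000 + 90 * toy$Sqft -
  60000 * (toy$Quality != "1") + rnorm(n, sd = 20000)

# Design matrix and response.
X <- model.matrix(Price ~ ., data = toy)
y <- toy$Price

# Solve the normal equations X'X b = X'y rather than inverting X'X explicitly.
beta_hat <- solve(crossprod(X), crossprod(X, y))
print(beta_hat)

# lm() uses a QR decomposition internally; the two should agree
# up to numerical error.
fit <- lm(Price ~ ., data = toy)
print(all.equal(as.vector(beta_hat), unname(coef(fit)), tolerance = 1e-6))
```

Solving the linear system is numerically preferable to forming \(\left(X'X\right)^{-1}\), although for routine work `lm()` is the more stable choice.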
Call:
lm(formula = Price ~ ., data = real)
Residuals:
Min 1Q Median 3Q Max
-204865 -28010 -4973 21315 298892
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.358e+06 3.991e+05 -5.909 6.29e-09 ***
Sqft 8.700e+01 6.570e+00 13.242 < 2e-16 ***
Bedroom -5.125e+03 3.275e+03 -1.565 0.1182
Bathroom 8.127e+03 4.288e+03 1.895 0.0586 .
Airconditioning1 4.851e+03 8.086e+03 0.600 0.5488
Garage 1.089e+04 5.060e+03 2.152 0.0319 *
Pool1 1.014e+04 1.040e+04 0.975 0.3303
YearBuild 1.269e+03 2.024e+02 6.272 7.60e-10 ***
Quality2 -1.430e+05 1.021e+04 -14.007 < 2e-16 ***
Quality3 -1.484e+05 1.404e+04 -10.564 < 2e-16 ***
Lot 1.556e+00 2.363e-01 6.587 1.12e-10 ***
AdjHighway1 -2.737e+04 1.810e+04 -1.512 0.1311
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 58770 on 510 degrees of freedom
Multiple R-squared: 0.8223, Adjusted R-squared: 0.8184
F-statistic: 214.5 on 11 and 510 DF, p-value: < 2.2e-16
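As a reading aid for the coefficient table, each t value is the estimate divided by its standard error, referred to a t distribution with \(n-p=522-12=510\) degrees of freedom. Using the rounded figures printed above for Sqft and Garage:

\[t_{\text{Sqft}}=\frac{87.0047}{6.570}\approx 13.24,\qquad t_{\text{Garage}}=\frac{10888.37}{5060}\approx 2.15\]

The column \(\Pr(>|t|)\) is the two-sided p-value \(P\left(|T_{510}|>|t|\right)\), which is why Sqft is significant at every conventional level while Garage is significant only at the 5% level.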
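The inference above relies on the following assumptions about the errors:
- The errors have mean zero.
- The errors are homoscedastic (constant variance).
- The errors are uncorrelated.
- The errors are normally distributed.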
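These assumptions can be examined through the residuals of the fitted model. The sketch below is a rough illustration on simulated data (`toy`, a hypothetical stand-in, not the housing data) using base R only. The numeric summaries are crude indicators chosen for this example, not formal tests of every assumption.

```r
# Simulated stand-in data (an assumption, NOT the housing data).
set.seed(7)
n <- 150
toy <- data.frame(Sqft = runif(n, 1000, 3500))
toy$Price <- 40000 + 95 * toy$Sqft + rnorm(n, sd = 25000)

fit <- lm(Price ~ Sqft, data = toy)
res <- residuals(fit)

# Mean zero: with an intercept in the model, OLS residuals average to zero
# by construction, so this mainly confirms the fit.
mean_res <- mean(res)

# Homoscedasticity (crude indicator): correlation between absolute
# residuals and fitted values; values far from 0 suggest changing spread.
spread_cor <- cor(abs(res), fitted(fit))

# Uncorrelatedness (crude indicator): lag-1 autocorrelation of residuals
# in row order; meaningful only if the rows have a natural ordering.
lag1 <- cor(res[-1], res[-n])

# Normality: Shapiro-Wilk test on the residuals.
sw <- shapiro.test(res)

print(c(mean = mean_res, spread = spread_cor, lag1 = lag1, shapiro_p = sw$p.value))
```

In practice these numbers are usually read alongside the standard diagnostic plots, for example `plot(fit, which = 1:2)` for residuals against fitted values and the normal Q-Q plot.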
Indian Statistical Institute, Delhi