Determining the factors that affect house prices and predicting the price of a house for varying factors
str(House_Price_Kaggle)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 21613 obs. of 6 variables:
$ bedrooms : num 3 3 2 4 3 4 3 3 3 3 ...
$ bathrooms : num 1 2.25 1 3 2 4.5 2.25 1.5 1 2.5 ...
$ sqft_living: num 1180 2570 770 1960 1680 ...
$ sqft_lot : num 5650 7242 10000 5000 8080 ...
$ floors : num 1 2 1 1 1 1 2 1 1 2 ...
$ price : num 221900 538000 180000 604000 510000 ...
The output above shows the structure of the data: 21613 observations of 6 numeric variables.
summary(House_Price_Kaggle)
bedrooms bathrooms sqft_living
Min. : 0.000 Min. :0.000 Min. : 290
1st Qu.: 3.000 1st Qu.:1.750 1st Qu.: 1427
Median : 3.000 Median :2.250 Median : 1910
Mean : 3.371 Mean :2.115 Mean : 2080
3rd Qu.: 4.000 3rd Qu.:2.500 3rd Qu.: 2550
Max. :33.000 Max. :8.000 Max. :13540
sqft_lot floors price
Min. : 520 Min. :1.000 Min. : 75000
1st Qu.: 5040 1st Qu.:1.000 1st Qu.: 321950
Median : 7618 Median :1.500 Median : 450000
Mean : 15107 Mean :1.494 Mean : 540088
3rd Qu.: 10688 3rd Qu.:2.000 3rd Qu.: 645000
Max. :1651359 Max. :3.500 Max. :7700000
The output above gives descriptive statistics for each variable. Price is right-skewed (mean 540088 against a median of 450000), and bedrooms has an extreme maximum of 33.
This scatterplot shows how house price varies with sqft_lot (lot area in square feet) and floors.
The histograms above show the number of bathrooms in houses for each number of floors. Most single-floor houses have between 1 and 3 bathrooms.
The graph above shows that most houses have 1 or 2 floors.
Graphical overview of the house data.
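The two observations read off the graphs (most houses have 1 or 2 floors; most single-floor houses have 1 to 3 bathrooms) can also be checked numerically. A minimal sketch, assuming the data frame has numeric `floors` and `bathrooms` columns as in the `str()` output; the helper name `floor_bathroom_summary` and the toy data are hypothetical, for illustration only.

```r
# Hypothetical helper: count houses per number of floors and, for
# single-floor houses, the share whose bathroom count lies between 1 and 3.
floor_bathroom_summary <- function(df) {
  floor_counts <- table(df$floors)
  one_floor <- df$bathrooms[df$floors == 1]
  share_1_to_3 <- mean(one_floor >= 1 & one_floor <= 3)
  list(floor_counts = floor_counts, share_1_to_3 = share_1_to_3)
}

# Toy data for illustration only (not the Kaggle data)
toy <- data.frame(
  floors    = c(1, 1, 1, 2, 2, 1.5),
  bathrooms = c(1, 2.25, 4, 2.5, 3, 1.75)
)
res <- floor_bathroom_summary(toy)
res$floor_counts   # 3 houses with 1 floor, 1 with 1.5, 2 with 2
res$share_1_to_3   # 2 of the 3 single-floor houses
```

On the real data the call would be `floor_bathroom_summary(House_Price_Kaggle)`.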
cor(House_Price_Kaggle)
bedrooms bathrooms sqft_living
bedrooms 1.00000000 0.51588364 0.5766707
bathrooms 0.51588364 1.00000000 0.7546653
sqft_living 0.57667069 0.75466528 1.0000000
sqft_lot 0.03170324 0.08773966 0.1728257
floors 0.17542894 0.50065317 0.3539493
price 0.30834960 0.52513751 0.7020351
sqft_lot floors price
bedrooms 0.031703243 0.175428935 0.30834960
bathrooms 0.087739662 0.500653173 0.52513751
sqft_living 0.172825661 0.353949290 0.70203505
sqft_lot 1.000000000 -0.005200991 0.08966086
floors -0.005200991 1.000000000 0.25679389
price 0.089660861 0.256793888 1.00000000
The correlation matrix above shows the linear relationship between each pair of variables; pairs with a coefficient greater than 0.50 are taken here to have a strong relationship. Since the goal is to predict the price of a house, price is our dependent (target) variable, and all the remaining variables are initially used as independent variables.
The plot above visualizes the correlation matrix: the darker the colour, the stronger the relationship between the corresponding variables, whether negative (brown) or positive (blue).
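The 0.50 screening rule can be applied directly to the price column of the correlation matrix. A small sketch, using the correlations copied from the `cor()` output above rather than recomputing them; note that this is only a pairwise screen, while the regression that follows judges the predictors jointly.

```r
# Correlations of each predictor with price, copied from the cor() output.
# With the data loaded, the same values come from
# cor(House_Price_Kaggle)[, "price"] after dropping the price entry itself.
r_price <- c(bedrooms = 0.30834960, bathrooms = 0.52513751,
             sqft_living = 0.70203505, sqft_lot = 0.08966086,
             floors = 0.25679389)

ranked <- sort(r_price, decreasing = TRUE)    # strongest association first
strong <- names(r_price)[abs(r_price) > 0.5]  # the 0.50 cut-off from the text
ranked
strong   # "bathrooms" "sqft_living"
```

`abs()` is used so that a strong negative correlation would also pass the cut-off.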
house_price <- lm(price ~ bedrooms + bathrooms + sqft_living + sqft_lot + floors, data = House_Price_Kaggle)
summary(house_price)
In the full model, coefficients of independent variables with a p-value greater than 0.05 are treated as not significantly different from zero. On that basis, we conclude that only bedrooms, sqft_lot and sqft_living affect the price of a house, so the regression is run again with only these three variables as independent variables.
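The selection step described above (fit on all predictors, drop those with p > 0.05, refit once) can be written as a small function. This is a sketch under assumptions: the helper name `refit_significant` and the toy data are hypothetical, all predictors are assumed numeric so coefficient names match column names, and it performs a single pass rather than iterative backward elimination.

```r
# Hypothetical helper: fit target ~ all other columns, drop every predictor
# whose p-value exceeds alpha, and refit once on the remaining predictors.
refit_significant <- function(df, target = "price", alpha = 0.05) {
  predictors <- setdiff(names(df), target)
  full <- lm(reformulate(predictors, response = target), data = df)
  coefs <- summary(full)$coefficients
  p <- setNames(coefs[, "Pr(>|t|)"], rownames(coefs))
  p <- p[names(p) != "(Intercept)"]            # the intercept is always kept
  keep <- names(p)[p <= alpha]
  if (length(keep) == 0) stop("no predictor is significant at this alpha")
  lm(reformulate(keep, response = target), data = df)
}

# Toy data (hypothetical): y depends on x1 only. The noise e is the residual
# of a regression on x1 and x2, so it is orthogonal to both and the fitted
# x2 coefficient is essentially zero (p-value close to 1).
x1 <- 1:20
x2 <- rep(c(1, -1), 10)
e  <- residuals(lm(sin(1:20) ~ x1 + x2))
toy <- data.frame(y = 3 * x1 + e, x1 = x1, x2 = x2)

fit <- refit_significant(toy, target = "y")
formula(fit)   # y ~ x1
```

Applied as `refit_significant(House_Price_Kaggle)`, it would be expected to keep the same three predictors reported in the text, though that depends on the full-model p-values.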
house_priceR <- lm(price ~ bedrooms + sqft_lot + sqft_living, data = House_Price_Kaggle)
summary(house_priceR)
Call:
lm(formula = price ~ bedrooms + sqft_lot + sqft_living, data = House_Price_Kaggle)
Residuals:
Min 1Q Median 3Q Max
-1578538 -143639 -22850 102992 4149420
Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  8.278e+04  6.604e+03  12.536   <2e-16 ***
bedrooms    -5.880e+04  2.312e+03 -25.428   <2e-16 ***
sqft_lot    -3.818e-01  4.307e-02  -8.866   <2e-16 ***
sqft_living  3.179e+02  2.376e+00 133.801   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 257400 on 21609 degrees of freedom
Multiple R-squared: 0.5086, Adjusted R-squared: 0.5085
F-statistic: 7455 on 3 and 21609 DF, p-value: < 2.2e-16
All three predictors have p-values below 0.05, so this regression equation is used to predict the price. The multiple R-squared of 0.5086 means that about 51% of the variability in price is explained by the three independent variables.
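The R-squared reported by `summary()` follows from the residual and total sums of squares, R² = 1 - SS_res / SS_tot, and the adjusted version divides each sum by its degrees of freedom. A short sketch that recomputes both by hand; the function names are hypothetical, and the check uses R's built-in `mtcars` data because the house data is not bundled here.

```r
# Multiple R-squared: 1 - SS_res / SS_tot, the share of the variance in the
# response that the fitted model accounts for (model with an intercept).
r_squared <- function(fit) {
  y <- model.response(model.frame(fit))
  ss_res <- sum(residuals(fit)^2)
  ss_tot <- sum((y - mean(y))^2)
  1 - ss_res / ss_tot
}

# Adjusted R-squared: each sum of squares is divided by its degrees of
# freedom, which penalises predictors that add little explanatory power.
adj_r_squared <- function(fit) {
  y <- model.response(model.frame(fit))
  n <- length(y)
  ss_res <- sum(residuals(fit)^2)
  ss_tot <- sum((y - mean(y))^2)
  1 - (ss_res / df.residual(fit)) / (ss_tot / (n - 1))
}

# Check against summary.lm() on the built-in mtcars data
fit <- lm(mpg ~ wt + hp, data = mtcars)
c(manual = r_squared(fit), lm = summary(fit)$r.squared)
c(manual = adj_r_squared(fit), lm = summary(fit)$adj.r.squared)
```

With the house model, `r_squared(house_priceR)` should agree with the 0.5086 shown above.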
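price_pred <- predict(house_priceR, newdata = data.frame(bedrooms = 3, sqft_lot = 7000, sqft_living = 5496))
price_pred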
1
1651089
This is the predicted price of a house with 3 bedrooms, sqft_living = 5496 and sqft_lot = 7000.
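The prediction is just the fitted equation, price = 82780 - 58800 × bedrooms - 0.3818 × sqft_lot + 317.9 × sqft_living, evaluated at the new house's values. A sketch of that arithmetic using the rounded estimates from the summary above; it gives about 1650886, slightly off the 1651089 from `predict()` because the printed coefficients are rounded to four significant figures.

```r
# Rounded estimates copied from summary(house_priceR)
b <- c(intercept = 8.278e+04, bedrooms = -5.880e+04,
       sqft_lot = -3.818e-01, sqft_living = 3.179e+02)

# New house: 1 for the intercept, 3 bedrooms, sqft_lot = 7000, sqft_living = 5496
x <- c(1, 3, 7000, 5496)

manual_pred <- sum(b * x)   # 82780 - 176400 - 2672.6 + 1747178.4
manual_pred                 # about 1650886
```

`predict()` also accepts `interval = "prediction"`, which would add a range around this single point estimate.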
plot(house_priceR,col="red")
abline(h=0,col="blue",lwd=1)
The plots above are the standard regression diagnostic plots for the fitted model (residuals vs fitted, normal Q-Q, scale-location, and residuals vs leverage).