Determining the factors that affect house prices and predicting the price of a house from those factors

str(House_Price_Kaggle)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   21613 obs. of  6 variables:
 $ bedrooms   : num  3 3 2 4 3 4 3 3 3 3 ...
 $ bathrooms  : num  1 2.25 1 3 2 4.5 2.25 1.5 1 2.5 ...
 $ sqft_living: num  1180 2570 770 1960 1680 ...
 $ sqft_lot   : num  5650 7242 10000 5000 8080 ...
 $ floors     : num  1 2 1 1 1 1 2 1 1 2 ...
 $ price      : num  221900 538000 180000 604000 510000 ...

The output above shows the structure of the data: 21,613 observations of six numeric variables (bedrooms, bathrooms, sqft_living, sqft_lot, floors and price).

summary(House_Price_Kaggle)
    bedrooms        bathrooms      sqft_living   
 Min.   : 0.000   Min.   :0.000   Min.   :  290  
 1st Qu.: 3.000   1st Qu.:1.750   1st Qu.: 1427  
 Median : 3.000   Median :2.250   Median : 1910  
 Mean   : 3.371   Mean   :2.115   Mean   : 2080  
 3rd Qu.: 4.000   3rd Qu.:2.500   3rd Qu.: 2550  
 Max.   :33.000   Max.   :8.000   Max.   :13540  
    sqft_lot           floors          price        
 Min.   :    520   Min.   :1.000   Min.   :  75000  
 1st Qu.:   5040   1st Qu.:1.000   1st Qu.: 321950  
 Median :   7618   Median :1.500   Median : 450000  
 Mean   :  15107   Mean   :1.494   Mean   : 540088  
 3rd Qu.:  10688   3rd Qu.:2.000   3rd Qu.: 645000  
 Max.   :1651359   Max.   :3.500   Max.   :7700000  

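The output above gives descriptive statistics for each variable. Price is right-skewed (mean 540,088 against a median of 450,000), and some maxima, such as 33 bedrooms and a 1,651,359 sq ft lot, lie far above the third quartile.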

The scatterplot shows how house price varies with sqft_lot (lot area in square feet).

The histogram above shows the distribution of bathrooms, broken down by number of floors. It is clear from the graph that most single-floor houses have between 1 and 3 bathrooms.
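The claim about single-floor houses can also be checked numerically instead of being read off the histogram. The sketch below uses only the first ten rows printed by str() above, so it illustrates the computation and is not the full-data result. On the real data the two vectors would be House_Price_Kaggle$bathrooms and House_Price_Kaggle$floors.

```r
# First ten rows of bathrooms and floors, as printed by str() above
bathrooms <- c(1, 2.25, 1, 3, 2, 4.5, 2.25, 1.5, 1, 2.5)
floors    <- c(1, 2, 1, 1, 1, 1, 2, 1, 1, 2)

# Bathrooms of the single-floor houses only
one_floor_baths <- bathrooms[floors == 1]

# Share of single-floor houses with between 1 and 3 bathrooms (inclusive)
share <- mean(one_floor_baths >= 1 & one_floor_baths <= 3)
share   # 6 of the 7 single-floor houses in this small sample
```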

The bar graph above shows that most houses have 1 or 2 floors.
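The bar heights are simply the counts of each floor value, and table() returns these directly. As before, this sketch uses only the ten floors values shown by str(), purely to illustrate the call.

```r
# floors for the first ten houses, as printed by str() above
floors <- c(1, 2, 1, 1, 1, 1, 2, 1, 1, 2)

# Frequency of each number of floors: these are the bar heights
counts <- table(floors)
counts

# The same counts expressed as proportions of all houses
prop.table(counts)
```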

The plot above gives a graphical overview of the house data.

cor(House_Price_Kaggle)
              bedrooms  bathrooms sqft_living
bedrooms    1.00000000 0.51588364   0.5766707
bathrooms   0.51588364 1.00000000   0.7546653
sqft_living 0.57667069 0.75466528   1.0000000
sqft_lot    0.03170324 0.08773966   0.1728257
floors      0.17542894 0.50065317   0.3539493
price       0.30834960 0.52513751   0.7020351
                sqft_lot       floors      price
bedrooms     0.031703243  0.175428935 0.30834960
bathrooms    0.087739662  0.500653173 0.52513751
sqft_living  0.172825661  0.353949290 0.70203505
sqft_lot     1.000000000 -0.005200991 0.08966086
floors      -0.005200991  1.000000000 0.25679389
price        0.089660861  0.256793888 1.00000000

The correlation matrix above shows the pairwise linear relationships between the variables. Here, pairs with a coefficient greater than 0.50 are considered strongly related. For price, these are sqft_living (0.70) and bathrooms (0.53). Since we want to predict the price of a house, price is our dependent (target) variable, and initially we take all the remaining variables as independent variables.
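The 0.50 rule of thumb can be applied programmatically by pulling the price column out of the correlation matrix. So that it runs on its own, the sketch below hard-codes the price correlations printed above. With the data loaded, cor(House_Price_Kaggle)[, "price"] should give the same values. Note that cor() on a data frame requires every column to be numeric.

```r
# Correlation of each candidate predictor with price, copied from the cor() output above
price_cor <- c(bedrooms    = 0.30834960,
               bathrooms   = 0.52513751,
               sqft_living = 0.70203505,
               sqft_lot    = 0.08966086,
               floors      = 0.25679389)

# Keep predictors whose absolute correlation with price exceeds 0.50
strong <- names(price_cor)[abs(price_cor) > 0.50]
strong

# Rank all predictors by the strength of their correlation with price
sort(abs(price_cor), decreasing = TRUE)
```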

The plot above visualizes the correlation matrix. The darker the color, the stronger the correlation between the corresponding variables, whether negative (brown) or positive (blue).

# Full model with all five candidate predictors
house_price <- lm(price ~ bedrooms + bathrooms + sqft_living + sqft_lot + floors, data = House_Price_Kaggle)
summary(house_price)

In the full model, the coefficients of independent variables with a p-value greater than 0.05 are treated as not significantly different from zero. We therefore conclude that only bedrooms, sqft_lot and sqft_living have a significant effect on the price of a house. We then refit the regression using only these three variables as independent variables.
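The p-value screening described above can be done directly from the coefficient table returned by summary(). The house data are not reproduced here, so the sketch instead fits a model to synthetic data, which is an assumption made purely for illustration. In this data x1 truly affects y, while x2 is pure noise. The names x1, x2 and y are hypothetical.

```r
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)               # unrelated to y by construction
y  <- 3 + 5 * x1 + rnorm(n)  # only x1 carries signal

fit <- lm(y ~ x1 + x2)

# The "Pr(>|t|)" column of the coefficient table holds the p-values
p_values <- summary(fit)$coefficients[, "Pr(>|t|)"]
p_values <- p_values[names(p_values) != "(Intercept)"]

# Predictors retained at the 5% significance level
keep <- names(p_values)[p_values < 0.05]
keep

# Refit using only the retained predictors
reduced <- lm(reformulate(keep, response = "y"))
summary(reduced)
```

Dropping terms one p-value at a time is a simple heuristic. A term that is not significant is not proven to have zero effect.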

summary(house_priceR)

Call:
lm(formula = price ~ bedrooms + sqft_lot + sqft_living, data = House_Price_Kaggle)

Residuals:
     Min       1Q   Median       3Q      Max 
-1578538  -143639   -22850   102992  4149420 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  8.278e+04  6.604e+03  12.536   <2e-16
bedrooms    -5.880e+04  2.312e+03 -25.428   <2e-16
sqft_lot    -3.818e-01  4.307e-02  -8.866   <2e-16
sqft_living  3.179e+02  2.376e+00 133.801   <2e-16
               
(Intercept) ***
bedrooms    ***
sqft_lot    ***
sqft_living ***
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 257400 on 21609 degrees of freedom
Multiple R-squared:  0.5086,    Adjusted R-squared:  0.5085 
F-statistic:  7455 on 3 and 21609 DF,  p-value: < 2.2e-16

All the variables now have p-values below 0.05, so we use this regression equation for prediction. The multiple R-squared is the proportion of the variability in price explained by the independent variables. Here it is 0.5086, so the model explains about 51% of the variation in price.
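To make the definition concrete, the sketch below recomputes R-squared as 1 - SS_res / SS_tot and the adjusted version as 1 - (1 - R^2)(n - 1)/(n - p - 1), where p is the number of predictors. The house data are not included here, so it uses R's built-in mtcars data as a stand-in. It then applies the same adjustment to the house model's printed values.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

y      <- mtcars$mpg
ss_res <- sum(residuals(fit)^2)     # variation left unexplained by the model
ss_tot <- sum((y - mean(y))^2)      # total variation around the mean
r2     <- 1 - ss_res / ss_tot

n <- nrow(mtcars)
p <- 2                              # number of predictors (wt, hp)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

c(manual = r2, lm = summary(fit)$r.squared)
c(manual = adj_r2, lm = summary(fit)$adj.r.squared)

# Same adjustment for the house model above (R^2 = 0.5086, n = 21613, p = 3)
1 - (1 - 0.5086) * (21613 - 1) / (21613 - 3 - 1)   # about 0.5085
```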

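my_pred <- data.frame(bedrooms = 3, sqft_living = 5496, sqft_lot = 7000)
price_pred <- predict(house_priceR, my_pred)
price_pred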
      1 
1651089 

This is the predicted price of a house with 3 bedrooms, sqft_living = 5496 and sqft_lot = 7000.
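As a check on this number, the prediction can be reproduced by hand from the coefficients printed by summary(house_priceR). The fitted equation is price = 82,780 - 58,800 * bedrooms - 0.3818 * sqft_lot + 317.9 * sqft_living. The printed coefficients are rounded to four significant figures, so the hand calculation lands within a few hundred of the 1,651,089 from predict() and does not match it exactly.

```r
# Coefficients as printed (rounded) by summary(house_priceR)
coefs <- c("(Intercept)" = 8.278e+04,
           bedrooms      = -5.880e+04,
           sqft_lot      = -3.818e-01,
           sqft_living   = 3.179e+02)

# The house being priced: 3 bedrooms, 7000 sq ft lot, 5496 sq ft living area
x <- c("(Intercept)" = 1, bedrooms = 3, sqft_lot = 7000, sqft_living = 5496)

# Linear predictor: intercept plus the sum of coefficient * value
pred <- sum(coefs * x[names(coefs)])
pred   # about 1,650,886
```

A living area of 5,496 sq ft is well above the third quartile of sqft_living (2,550). This prediction therefore relies on the sparser upper end of the data.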

plot(house_priceR,col="red")

abline(h=0,col="blue",lwd=1)

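Above are the diagnostic plots for the reduced model. Calling plot() on an lm object draws four standard plots: residuals vs fitted, normal Q-Q, scale-location, and residuals vs leverage. The abline() call adds a horizontal reference line at zero to the last of these.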
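The zero line is a reference for the residuals. For a least-squares fit with an intercept, the residuals sum to zero and are uncorrelated with the fitted values, so a well-specified model should show them scattered evenly around that line. The sketch below draws the residuals-vs-fitted plot by hand and checks both properties. It uses R's built-in mtcars data as a stand-in for the house data.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

res <- residuals(fit)
fv  <- fitted(fit)

# Residuals vs fitted values, with the zero reference line
plot(fv, res, col = "red", xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, col = "blue", lwd = 1)

# OLS with an intercept: residuals average zero and are orthogonal to the fitted values
mean(res)
sum(res * fv)
```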
