This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.

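my_locale <- Sys.getlocale("LC_ALL")  # remember the current locale
Sys.setlocale("LC_ALL", my_locale)    # this OS refused the composite locale string, hence the message below
OS reports request to set locale to "LC_COLLATE=English_India.1252;LC_CTYPE=English_India.1252;LC_MONETARY=English_India.1252;LC_NUMERIC=C;LC_TIME=English_India.1252" cannot be honored
[1] ""
library(readxl)  # provides read_excel()
House_Price_Kaggle <- read_excel("C:/Users/DELL/Desktop/Imarticus/Assignments excel/House_Price_Kaggle.xlsx")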

This dataset consists of six variables: bedrooms, bathrooms, sqft_living, sqft_lot, floors and price. Here price is the dependent variable and the other five are the independent variables.
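In R, the split between the dependent and independent variables is written as a model formula, with the response on the left of `~` and the predictors on the right. A minimal sketch that builds this formula from the column names with base R's `reformulate()`; the names `predictors` and `model_formula` are only illustrative.

```r
# Column names of the dataset; price is the response, the rest are predictors
predictors <- c("bedrooms", "bathrooms", "sqft_living", "sqft_lot", "floors")

# reformulate() (package stats) joins the predictors with "+"
# and places the response on the left-hand side of "~"
model_formula <- reformulate(predictors, response = "price")
print(model_formula)
```

This is the same formula that is later passed to `lm()` when the full regression model is fitted.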

View(House_Price_Kaggle)

The View() function opens the dataset in a spreadsheet-style data viewer, so we can browse it row by row.

summary(House_Price_Kaggle)
    bedrooms        bathrooms      sqft_living       sqft_lot      
 Min.   : 0.000   Min.   :0.000   Min.   :  290   Min.   :    520  
 1st Qu.: 3.000   1st Qu.:1.750   1st Qu.: 1427   1st Qu.:   5040  
 Median : 3.000   Median :2.250   Median : 1910   Median :   7618  
 Mean   : 3.371   Mean   :2.115   Mean   : 2080   Mean   :  15107  
 3rd Qu.: 4.000   3rd Qu.:2.500   3rd Qu.: 2550   3rd Qu.:  10688  
 Max.   :33.000   Max.   :8.000   Max.   :13540   Max.   :1651359  
     floors          price        
 Min.   :1.000   Min.   :  75000  
 1st Qu.:1.000   1st Qu.: 321950  
 Median :1.500   Median : 450000  
 Mean   :1.494   Mean   : 540088  
 3rd Qu.:2.000   3rd Qu.: 645000  
 Max.   :3.500   Max.   :7700000  

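summary() gives the minimum, first quartile, median, mean, third quartile and maximum of every variable, which gives us a basic understanding of the dataset. A few things already stand out: the maximum of 33 bedrooms is far above the third quartile of 4, and price is right-skewed, with a mean (540,088) well above the median (450,000) and a maximum of 7,700,000.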
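The six numbers that summary() prints for a numeric column are the five points returned by quantile() with the mean inserted after the median. A small check, using the first ten bedrooms values from the str() output as a tiny illustrative sample (`bedrooms_sample` is a hypothetical name, not part of the original analysis):

```r
# First ten bedrooms values as printed by str() (illustrative sample only)
bedrooms_sample <- c(3, 3, 2, 4, 3, 4, 3, 3, 3, 3)

# quantile() returns Min, 1st Qu., Median, 3rd Qu. and Max (R's default type 7)
q <- quantile(bedrooms_sample)

# summary() reports the same values with the mean placed after the median
manual_summary <- c(q[1:3], Mean = mean(bedrooms_sample), q[4:5])
print(summary(bedrooms_sample))
print(manual_summary)
```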

str(House_Price_Kaggle)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   21613 obs. of  6 variables:
 $ bedrooms   : num  3 3 2 4 3 4 3 3 3 3 ...
 $ bathrooms  : num  1 2.25 1 3 2 4.5 2.25 1.5 1 2.5 ...
 $ sqft_living: num  1180 2570 770 1960 1680 ...
 $ sqft_lot   : num  5650 7242 10000 5000 8080 ...
 $ floors     : num  1 2 1 1 1 1 2 1 1 2 ...
 $ price      : num  221900 538000 180000 604000 510000 ...

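str() (short for "structure") shows the class of the object and the type of every column, so we can see which variables are character and which are numeric. Here the data is a tibble (the type read_excel() returns) with 21,613 observations of 6 variables, all of them numeric (num).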
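str() answers this question by eye; the same check can be done programmatically with sapply(). A sketch on the first five rows shown in the str() output above (`house_sample` is a hypothetical small copy, not the full dataset):

```r
# First five rows as printed by str(House_Price_Kaggle)
house_sample <- data.frame(
  bedrooms    = c(3, 3, 2, 4, 3),
  bathrooms   = c(1, 2.25, 1, 3, 2),
  sqft_living = c(1180, 2570, 770, 1960, 1680),
  sqft_lot    = c(5650, 7242, 10000, 5000, 8080),
  floors      = c(1, 2, 1, 1, 1),
  price       = c(221900, 538000, 180000, 604000, 510000)
)

# Class of every column: "numeric" for num columns, "character" for chr columns
column_classes <- sapply(house_sample, class)
print(column_classes)

# Numeric columns can be passed directly to cor() and lm()
numeric_cols <- names(house_sample)[sapply(house_sample, is.numeric)]
print(numeric_cols)
```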

plot(House_Price_Kaggle)

Calling plot() on a data frame draws a scatter-plot matrix: one small scatter plot for every pair of the six variables.

scatter.smooth(House_Price_Kaggle)
Warning messages (the same four, repeated five times):
pseudoinverse used at 3
neighborhood radius 1
reciprocal condition number 0
There are other near singularities as well. 1

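scatter.smooth() draws a scatter plot and adds a loess smoothing curve. When it is given the whole data frame it uses only the first two columns, so this plot shows bathrooms against bedrooms rather than all the variables. The loess warnings most likely arise because bedrooms takes only a few distinct integer values, so many points share the same x value and the local fits become near-singular.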
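This behaviour comes from how scatter.smooth() resolves a single argument: it hands it to grDevices::xy.coords(), which for a data frame takes the first column as x and the second as y. A small check on a five-row copy of three columns (`house_sample` and `coords` are illustrative names):

```r
# Small stand-in for the dataset: first five rows of three of its columns
house_sample <- data.frame(
  bedrooms  = c(3, 3, 2, 4, 3),
  bathrooms = c(1, 2.25, 1, 3, 2),
  price     = c(221900, 538000, 180000, 604000, 510000)
)

# With y = NULL, xy.coords() uses column 1 as x and column 2 as y;
# the remaining columns (here price) are ignored
coords <- xy.coords(house_sample)
print(coords$x)
print(coords$y)
```

To smooth price against one chosen predictor, pass both vectors explicitly, e.g. scatter.smooth(House_Price_Kaggle$bedrooms, House_Price_Kaggle$price).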

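boxplot(House_Price_Kaggle$bedrooms, col = 'blue', main = 'Bedrooms')

This is a boxplot of the bedrooms variable.

boxplot(House_Price_Kaggle$bathrooms, col = 'green', main = 'Bathrooms')

This boxplot is for the bathrooms variable.

boxplot(House_Price_Kaggle$sqft_living, col = 'orange', main = 'sqft_living')

This boxplot is for the sqft_living variable.

boxplot(House_Price_Kaggle$sqft_lot, col = 'chocolate', main = 'sqft_lot')

This boxplot is for the sqft_lot variable.

boxplot(House_Price_Kaggle$floors, col = 'chocolate', main = 'Floors')

This boxplot is for the floors variable.

boxplot(House_Price_Kaggle$price, col = 'cyan1', main = 'Price')

This boxplot is for the price variable. In each boxplot, points drawn beyond the whiskers (more than 1.5 times the interquartile range from the box) are potential outliers, such as the house with 33 bedrooms.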
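The outlier rule behind these boxplots can be worked through by hand. A sketch using the bedrooms quartiles from summary(); boxplot() actually uses Tukey's hinges, which for 21,613 observations should be very close to these quartiles, so the fences are approximate. The vector `bedroom_counts` is a hypothetical range of values to classify (only 0 and 33 come from the summary output).

```r
# Quartiles of bedrooms as printed by summary(House_Price_Kaggle)
q1 <- 3
q3 <- 4
iqr <- q3 - q1

# Points further than 1.5 * IQR from the box are drawn individually
lower_fence <- q1 - 1.5 * iqr   # 1.5
upper_fence <- q3 + 1.5 * iqr   # 5.5

# Hypothetical bedroom counts to classify (0 and 33 are the observed min and max)
bedroom_counts <- c(0, 1, 2, 3, 4, 5, 6, 33)
flagged <- bedroom_counts[bedroom_counts < lower_fence | bedroom_counts > upper_fence]
print(flagged)   # 0 1 6 33

# boxplot.stats() applies the same rule (with hinges) to raw data
print(boxplot.stats(c(1, 2, 3, 4, 100))$out)   # 100
```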

The next scatter plots compare price with bathrooms, sqft_living, sqft_lot and floors in turn. scatter.smooth() also overlays a loess smoothing curve, which makes the general trend in each plot easier to see.

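par(mfrow = c(2, 2))  # 2 rows x 2 columns: one panel per plot
# Predictor on the x-axis, price (the dependent variable) on the y-axis;
# col colours the points, lpars sets the colour of the loess curve
scatter.smooth(House_Price_Kaggle$bathrooms, House_Price_Kaggle$price, col = 'brown4', lpars = list(col = 'green'), main = 'Price vs Bathrooms')
scatter.smooth(House_Price_Kaggle$sqft_living, House_Price_Kaggle$price, col = 'green', lpars = list(col = 'brown1'), main = 'Price vs Sqft_living')
scatter.smooth(House_Price_Kaggle$sqft_lot, House_Price_Kaggle$price, col = 'green', lpars = list(col = 'black'), main = 'Price vs Sqft_lot')
scatter.smooth(House_Price_Kaggle$floors, House_Price_Kaggle$price, col = 'green', lpars = list(col = 'mediumvioletred'), main = 'Price vs Floors')
par(mfrow = c(1, 1))  # restore the default single-panel layout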

To see all four graphs on a single screen we use the par() function: par(mfrow = c(rows, columns)) splits the plotting area into a grid, and the following plots fill it row by row. Four plots fit in two rows and two columns.
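A self-contained sketch of the par(mfrow) pattern, including saving and restoring the previous setting. It draws to a null PDF device so it also runs without a screen; `old_par` and `layout_used` are illustrative names.

```r
# Draw on a null device so the sketch runs non-interactively
pdf(NULL)

# mfrow = c(rows, columns); par() returns the old settings so they can be restored
old_par <- par(mfrow = c(2, 2))
layout_used <- par("mfrow")

# Four plots fill the 2 x 2 grid row by row
for (i in 1:4) plot(1:10, main = paste("Panel", i))

par(old_par)          # back to one plot per screen
invisible(dev.off())
```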

cor(House_Price_Kaggle)
              bedrooms  bathrooms sqft_living     sqft_lot       floors
bedrooms    1.00000000 0.51588364   0.5766707  0.031703243  0.175428935
bathrooms   0.51588364 1.00000000   0.7546653  0.087739662  0.500653173
sqft_living 0.57667069 0.75466528   1.0000000  0.172825661  0.353949290
sqft_lot    0.03170324 0.08773966   0.1728257  1.000000000 -0.005200991
floors      0.17542894 0.50065317   0.3539493 -0.005200991  1.000000000
price       0.30834960 0.52513751   0.7020351  0.089660861  0.256793888
                 price
bedrooms    0.30834960
bathrooms   0.52513751
sqft_living 0.70203505
sqft_lot    0.08966086
floors      0.25679389
price       1.00000000

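cor() computes the Pearson correlation coefficient between every pair of variables. It ranges from -1 to 1 and measures the strength and direction of a linear relationship. Against price, sqft_living has the strongest correlation (0.70), followed by bathrooms (0.53), bedrooms (0.31), floors (0.26) and sqft_lot (0.09). Note also that bathrooms and sqft_living are strongly correlated with each other (0.75).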
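Each entry of the matrix is a Pearson coefficient (cor()'s default method): the covariance of two variables divided by the product of their standard deviations. A sketch that computes it by hand for sqft_living and price on the first five rows from the str() output; on this tiny sample the value will differ from the 0.70 obtained on all 21,613 rows.

```r
# sqft_living and price for the first five rows shown by str()
sqft_living <- c(1180, 2570, 770, 1960, 1680)
price       <- c(221900, 538000, 180000, 604000, 510000)

# Pearson r = sum of products of centred values / sqrt(product of sums of squares)
dx <- sqft_living - mean(sqft_living)
dy <- price - mean(price)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

print(r_manual)
print(cor(sqft_living, price))   # Pearson by default
```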

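library(corrplot)
a <- cor(House_Price_Kaggle)
corrplot(a)

To plot the correlation coefficients we load the corrplot package, store the correlation matrix in the variable a and pass it to corrplot(), which draws the matrix as a grid of coloured circles whose size and colour show the strength and sign of each coefficient.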

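Regprice <- lm(price ~ bedrooms + bathrooms + sqft_living + sqft_lot + floors, data = House_Price_Kaggle)
Regprice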

Call:
lm(formula = price ~ bedrooms + bathrooms + sqft_living + sqft_lot + 
    floors, data = House_Price_Kaggle)

Coefficients:
(Intercept)     bedrooms    bathrooms  sqft_living     sqft_lot  
  8.066e+04   -5.953e+04    6.958e+03    3.143e+02   -3.788e-01  
     floors  
 -1.758e+03  

After checking the correlations we move on to regression. We fit a multiple linear regression of price on all five independent variables with the lm() function and store the fitted model in Regprice; printing it shows the estimated coefficients.
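lm() estimates the coefficients by ordinary least squares, i.e. the values that minimise the sum of squared residuals. As a sketch of what that means, the same estimates can be recovered from the normal equations (X'X) beta = X'y. This uses only five rows and two predictors so the arithmetic stays small; lm() itself uses a QR decomposition, which is numerically more stable. `house_sample`, `fit` and `beta` are illustrative names.

```r
# Five rows from the str() output: two predictors and the response
house_sample <- data.frame(
  bedrooms    = c(3, 3, 2, 4, 3),
  sqft_living = c(1180, 2570, 770, 1960, 1680),
  price       = c(221900, 538000, 180000, 604000, 510000)
)

# Fit with lm(), as in the notebook
fit <- lm(price ~ bedrooms + sqft_living, data = house_sample)

# Same estimates from the normal equations (X'X) beta = X'y
X <- model.matrix(fit)          # columns: (Intercept), bedrooms, sqft_living
y <- house_sample$price
beta <- solve(crossprod(X), crossprod(X, y))

print(coef(fit))
print(drop(beta))
```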

summary(Regprice)

Call:
lm(formula = price ~ bedrooms + bathrooms + sqft_living + sqft_lot + 
    floors, data = House_Price_Kaggle)

Residuals:
     Min       1Q   Median       3Q      Max 
-1573404  -143855   -22380   102493  4148365 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  8.066e+04  7.696e+03  10.481   <2e-16 ***
bedrooms    -5.953e+04  2.351e+03 -25.319   <2e-16 ***
bathrooms    6.958e+03  3.809e+03   1.827   0.0678 .  
sqft_living  3.143e+02  3.132e+00 100.355   <2e-16 ***
sqft_lot    -3.788e-01  4.320e-02  -8.769   <2e-16 ***
floors      -1.758e+03  3.776e+03  -0.466   0.6415    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 257400 on 21607 degrees of freedom
Multiple R-squared:  0.5087,    Adjusted R-squared:  0.5086 
F-statistic:  4474 on 5 and 21607 DF,  p-value: < 2.2e-16

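summary() of the fitted model adds, for each coefficient, its standard error, t value and p-value (Pr(>|t|)), which tell us which variables to keep in the regression equation. At the 5% level, bedrooms, sqft_living and sqft_lot are significant (p < 2e-16), while bathrooms (p = 0.0678) and floors (p = 0.6415) are not. The multiple R-squared of 0.5087 means the model explains about 51% of the variation in price.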
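The t value in the table is the estimate divided by its standard error, and Pr(>|t|) is the two-sided tail probability of a t distribution with the residual degrees of freedom. A sketch that reproduces the two non-significant rows from the printed (rounded) numbers, so results match only to about three decimals:

```r
# Estimates and standard errors copied from summary(Regprice)
estimate  <- c(bathrooms = 6.958e+03, floors = -1.758e+03)
std_error <- c(bathrooms = 3.809e+03, floors = 3.776e+03)
df_resid  <- 21607   # residual degrees of freedom reported by summary()

# t value = estimate / standard error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with df_resid degrees of freedom
p_value <- 2 * pt(-abs(t_value), df = df_resid)
print(round(t_value, 3))
print(round(p_value, 4))

# Variables passing a 5% significance threshold (neither of these two)
print(names(p_value)[p_value < 0.05])
```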

regfinal <- lm(price ~ bedrooms + sqft_living + sqft_lot, data = House_Price_Kaggle)
regfinal

Call:
lm(formula = price ~ bedrooms + sqft_living + sqft_lot, data = House_Price_Kaggle)

Coefficients:
(Intercept)     bedrooms  sqft_living     sqft_lot  
  8.278e+04   -5.880e+04    3.179e+02   -3.818e-01  

After checking the p-values we drop the two non-significant variables, bathrooms and floors, and form a new equation with the remaining three: bedrooms, sqft_living and sqft_lot.
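Reading the printed coefficients of regfinal (rounded as R displays them) gives the fitted regression equation:

```latex
\widehat{\text{price}} = 82780 - 58800\,\text{bedrooms} + 317.9\,\text{sqft\_living} - 0.3818\,\text{sqft\_lot}
```

Each coefficient is the expected change in price for a one-unit increase in that variable with the other two held fixed. For example, the negative bedrooms coefficient means that, for the same living area and lot size, an extra bedroom is associated with a price about 58,800 lower.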

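new_house <- data.frame(bedrooms = 8, sqft_living = 4561, sqft_lot = 3877)  # values to predict for
my_prediction_price <- predict(regfinal, new_house)
my_prediction_price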
      1 
1061021 

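Finally, predict() applies the fitted regression equation to new values: for a house with 8 bedrooms, 4,561 sq ft of living space and a 3,877 sq ft lot, the model predicts a price of about 1,061,021. The same call works for any other combination of values.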
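predict() simply evaluates the linear equation: intercept plus each coefficient times its value. A sketch that does this by hand with the coefficients printed for regfinal; because those are rounded to four significant digits, the result (about 1,060,842) differs slightly from the 1,061,021 that predict() computes with full-precision coefficients.

```r
# Coefficients of regfinal as printed (rounded to 4 significant digits)
coefs <- c(intercept = 8.278e+04, bedrooms = -5.880e+04,
           sqft_living = 3.179e+02, sqft_lot = -3.818e-01)

# The new house; the leading 1 multiplies the intercept
new_house <- c(intercept = 1, bedrooms = 8, sqft_living = 4561, sqft_lot = 3877)

# Linear prediction: intercept + sum of coefficient * value
manual_prediction <- sum(coefs * new_house)
print(manual_prediction)
```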
