LBB 4

BACKGROUND

LBB Requirements

In making a report, don’t forget to cover the following:

  • Selection of the target variable depends on the perspective of the case you want to take
  • Data analysis and the process of selecting predictor variables / feature selection
  • Test the validity of the model
  • Model interpretations and recommendations related to the initial case

Case Study

I conducted exploratory data analysis and regression to gain insight into housing prices in relation to other attributes. The dataset, “Housing Prices”, was obtained from Kaggle.

Insight

The aim is to analyze how home prices relate to the other variables, which will be treated as factors that can affect prices.

DATA PREPARATION

Data Input
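
The data-input chunk itself is not echoed in the report; a minimal sketch, assuming the Kaggle CSV is stored locally as housing_prices.csv (the file name is an assumption; the object name house matches the ggcorr() call shown later):

```r
# Read the Kaggle "Housing Prices" data; the file name/path is an assumption.
house <- read.csv("housing_prices.csv")

head(house)       # first six rows, as shown below
colnames(house)   # column names, as shown in the next chunk
```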

##   Area Garage FirePlace Baths White.Marble Black.Marble Indian.Marble Floors
## 1  164      2         0     2            0            1             0      0
## 2   84      2         0     4            0            0             1      1
## 3  190      2         4     4            1            0             0      0
## 4   75      2         4     4            0            0             1      1
## 5  148      1         4     2            1            0             0      1
## 6  124      3         3     3            0            1             0      1
##   City Solar Electric Fiber Glass.Doors Swiming.Pool Garden Prices
## 1    3     1        1     1           1            0      0  43800
## 2    2     0        0     0           1            1      1  37550
## 3    2     0        0     1           0            0      0  49500
## 4    1     1        1     1           1            1      1  50075
## 5    2     1        0     0           1            1      1  52400
## 6    1     0        0     1           1            1      1  54300

Colnames

##  [1] "Area"          "Garage"        "FirePlace"     "Baths"        
##  [5] "White.Marble"  "Black.Marble"  "Indian.Marble" "Floors"       
##  [9] "City"          "Solar"         "Electric"      "Fiber"        
## [13] "Glass.Doors"   "Swiming.Pool"  "Garden"        "Prices"

Structure

## 'data.frame':    500000 obs. of  16 variables:
##  $ Area         : int  164 84 190 75 148 124 58 249 243 242 ...
##  $ Garage       : int  2 2 2 2 1 3 1 2 1 1 ...
##  $ FirePlace    : int  0 0 4 4 4 3 0 1 0 2 ...
##  $ Baths        : int  2 4 4 4 2 3 2 1 2 4 ...
##  $ White.Marble : int  0 0 1 0 1 0 0 1 0 0 ...
##  $ Black.Marble : int  1 0 0 0 0 1 0 0 0 0 ...
##  $ Indian.Marble: int  0 1 0 1 0 0 1 0 1 1 ...
##  $ Floors       : int  0 1 0 1 1 1 0 1 1 0 ...
##  $ City         : int  3 2 2 1 2 1 3 1 1 2 ...
##  $ Solar        : int  1 0 0 1 1 0 0 0 0 1 ...
##  $ Electric     : int  1 0 0 1 0 0 1 1 0 0 ...
##  $ Fiber        : int  1 0 1 1 0 1 1 0 0 0 ...
##  $ Glass.Doors  : int  1 1 0 1 1 1 1 1 0 0 ...
##  $ Swiming.Pool : int  0 1 0 1 1 1 0 1 1 1 ...
##  $ Garden       : int  0 1 0 1 1 1 1 0 0 0 ...
##  $ Prices       : int  43800 37550 49500 50075 52400 54300 34400 50425 29575 22300 ...

Chunk Commentary: The data has 500,000 rows and 16 columns. Our target variable is Prices; the remaining columns are predictors.

Variable Description

The following is an explanation of the variables and their corresponding data types:

  • Area: What is the area of the unit? | data type: integer
  • Garage: How many garages does the unit have? | data type: integer
  • FirePlace: How many fireplaces does the unit have? | data type: integer
  • Baths: How many baths does the unit have? | data type: integer
  • White.Marble: Does the unit use white marble? | data type: factor level
  • Black.Marble: Does the unit use black marble? | data type: factor level
  • Indian.Marble: Does the unit use Indian marble? | data type: factor level
  • Floors: What is the number of floors? | data type: integer
  • City: In which city is the unit located? | data type: factor
  • Solar: Does the unit use solar power? | data type: boolean
  • Electric: Does the unit use electricity? | data type: boolean
  • Fiber: Does the unit use fiber? | data type: boolean
  • Glass.Doors: Does the unit use glass doors? | data type: boolean
  • Swiming.Pool: Does the unit have a swimming pool? | data type: boolean
  • Garden: Does the unit have a garden? | data type: boolean
  • Prices: What is the unit price? | data type: integer
## [1] 2 4 3 1 5
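
The type-conversion chunk is not echoed in the report; a minimal sketch of one way it might be done with dplyr (the column choices and the "no"/"yes" labels are inferred from the converted structure output that follows):

```r
library(dplyr)

# Convert the 0/1 indicator columns to two-level factors labelled "no"/"yes".
house <- house %>%
  mutate(across(c(Floors, Solar, Electric, Fiber, Glass.Doors, Swiming.Pool, Garden),
                ~ factor(.x, levels = c(0, 1), labels = c("no", "yes"))))
```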

Let’s check the converted data types and confirm there are no missing values.

## 'data.frame':    500000 obs. of  16 variables:
##  $ Area         : int  164 84 190 75 148 124 58 249 243 242 ...
##  $ Garage       : int  2 2 2 2 1 3 1 2 1 1 ...
##  $ FirePlace    : int  0 0 4 4 4 3 0 1 0 2 ...
##  $ Baths        : int  2 4 4 4 2 3 2 1 2 4 ...
##  $ White.Marble : int  0 0 1 0 1 0 0 1 0 0 ...
##  $ Black.Marble : int  1 0 0 0 0 1 0 0 0 0 ...
##  $ Indian.Marble: int  0 1 0 1 0 0 1 0 1 1 ...
##  $ Floors       : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 1 2 2 1 ...
##  $ City         : int  3 2 2 1 2 1 3 1 1 2 ...
##  $ Solar        : Factor w/ 2 levels "no","yes": 2 1 1 2 2 1 1 1 1 2 ...
##  $ Electric     : Factor w/ 2 levels "no","yes": 2 1 1 2 1 1 2 2 1 1 ...
##  $ Fiber        : Factor w/ 2 levels "no","yes": 2 1 2 2 1 2 2 1 1 1 ...
##  $ Glass.Doors  : Factor w/ 2 levels "no","yes": 2 2 1 2 2 2 2 2 1 1 ...
##  $ Swiming.Pool : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 1 2 2 2 ...
##  $ Garden       : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 2 1 1 1 ...
##  $ Prices       : int  43800 37550 49500 50075 52400 54300 34400 50425 29575 22300 ...
##          Area        Garage     FirePlace         Baths  White.Marble 
##             0             0             0             0             0 
##  Black.Marble Indian.Marble        Floors          City         Solar 
##             0             0             0             0             0 
##      Electric         Fiber   Glass.Doors  Swiming.Pool        Garden 
##             0             0             0             0             0 
##        Prices 
##             0
  • The missing-value check above shows zero NA values in every column, so no imputation is needed.
  • Since White.Marble, Black.Marble, and Indian.Marble are really levels of a single factor, and from a business perspective we argue that the marble type affects Prices, we replace those three dummy columns with a single marbles variable.

Handling the marble columns:
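
The marble-handling code is not echoed; a minimal sketch, assuming the three dummy columns are mutually exclusive (each row in the earlier preview has exactly one marble dummy set to 1). Converting City to a factor here is also an assumption, based on the structure output shown later:

```r
library(dplyr)

# Collapse the three mutually exclusive marble dummies into a single column,
# then drop the dummies and the unused Solar column.
house <- house %>%
  mutate(marbles = case_when(
           White.Marble  == 1 ~ "White",
           Black.Marble  == 1 ~ "Black",
           Indian.Marble == 1 ~ "Indian"
         ),
         City = factor(City)) %>%
  select(-White.Marble, -Black.Marble, -Indian.Marble, -Solar)

head(house)
```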

##   Floors Fiber marbles Prices Glass.Doors City Baths FirePlace Garage Area
## 1     no   yes   Black  43800         yes    3     2         0      2  164
## 2    yes    no  Indian  37550         yes    2     4         0      2   84
## 3     no   yes   White  49500          no    2     4         4      2  190
## 4    yes   yes  Indian  50075         yes    1     4         4      2   75
## 5    yes    no   White  52400         yes    2     2         4      1  148
## 6    yes   yes   Black  54300         yes    1     3         3      3  124
##   Electric Swiming.Pool Garden
## 1      yes           no     no
## 2       no          yes    yes
## 3       no           no     no
## 4      yes          yes    yes
## 5       no          yes    yes
## 6       no          yes    yes

EXPLORATORY DATA ANALYSIS

Linearity Test

Exploratory data analysis is the phase where we explore the data variables and look for patterns that indicate correlation between variables.

Find the Pearson correlation between features.
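
The correlation plot is produced with ggcorr() from the GGally package; the call below is reconstructed from the warning message that follows:

```r
library(GGally)

# Pearson correlation heatmap; non-numeric columns are ignored,
# which triggers the warning shown below.
ggcorr(house, label = TRUE, hjust = 1, layout.exp = 1)
```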

## Warning in ggcorr(house, label = T, hjust = 1, layout.exp = 1): data in
## column(s) 'Floors', 'Fiber', 'marbles', 'Glass.Doors', 'City', 'Electric',
## 'Swiming.Pool', 'Garden' are not numeric and were ignored

  • Prices has only a low correlation (around 0.1) with Baths, FirePlace, Garage, and Area.

## 'data.frame':    500000 obs. of  13 variables:
##  $ Floors      : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 1 2 2 1 ...
##  $ Fiber       : Factor w/ 2 levels "no","yes": 2 1 2 2 1 2 2 1 1 1 ...
##  $ marbles     : chr  "Black" "Indian" "White" "Indian" ...
##  $ Prices      : int  43800 37550 49500 50075 52400 54300 34400 50425 29575 22300 ...
##  $ Glass.Doors : Factor w/ 2 levels "no","yes": 2 2 1 2 2 2 2 2 1 1 ...
##  $ City        : Factor w/ 3 levels "1","2","3": 3 2 2 1 2 1 3 1 1 2 ...
##  $ Baths       : int  2 4 4 4 2 3 2 1 2 4 ...
##  $ FirePlace   : int  0 0 4 4 4 3 0 1 0 2 ...
##  $ Garage      : int  2 2 2 2 1 3 1 2 1 1 ...
##  $ Area        : int  164 84 190 75 148 124 58 249 243 242 ...
##  $ Electric    : Factor w/ 2 levels "no","yes": 2 1 1 2 1 1 2 2 1 1 ...
##  $ Swiming.Pool: Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 1 2 2 2 ...
##  $ Garden      : Factor w/ 2 levels "no","yes": 1 2 1 2 2 2 2 1 1 1 ...

Data with Outliers

Chunk commentary:

  • Prices appears to be normally distributed.
  • It is observed that collections of random data from independent sources tend to be normally distributed; plotting the data gives a bell-shaped curve (see the sketch below).
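
The plotting code is not echoed; a minimal sketch of how the distribution and outliers of Prices might be inspected:

```r
# Histogram of the target variable; a roughly bell-shaped curve is expected.
hist(house$Prices, breaks = 50, main = "Distribution of Prices", xlab = "Prices")

# Boxplot to look for outliers in Prices.
boxplot(house$Prices, horizontal = TRUE, main = "Prices")
```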

MODELING

Train Test Splitting

Chunk commentary:

  • The split data are stored in house_train and house_test (a sketch of the split is shown below).
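
The splitting code is not echoed; a minimal sketch assuming an 80/20 split (the roughly 400,000 residual degrees of freedom reported in the model summaries imply about 400,000 training rows) and an arbitrary seed:

```r
set.seed(100)  # seed value is an assumption, included for reproducibility

idx <- sample(nrow(house), size = 0.8 * nrow(house))
house_train <- house[idx, ]    # ~80% of rows for training
house_test  <- house[-idx, ]   # remaining ~20% for testing
```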

Chosen predictors
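
The model formula matches the Call in the summary below, and the object name our.model matches the assumption-checking output later in the report:

```r
# Linear model with the chosen predictors.
our.model <- lm(Prices ~ Floors + Fiber + marbles + Glass.Doors +
                  City + Baths + FirePlace + Electric,
                data = house_train)

summary(our.model)
```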

## 
## Call:
## lm(formula = Prices ~ Floors + Fiber + marbles + Glass.Doors + 
##     City + Baths + FirePlace + Electric, data = house_train)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4754.9 -1564.8    -5.1  1571.6  4754.2 
## 
## Coefficients:
##                 Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)    15771.815     13.559  1163.2 <0.0000000000000002 ***
## Floorsyes      14994.281      6.885  2177.9 <0.0000000000000002 ***
## Fiberyes       11749.701      6.885  1706.6 <0.0000000000000002 ***
## marblesIndian  -5004.175      8.433  -593.4 <0.0000000000000002 ***
## marblesWhite    9006.261      8.439  1067.3 <0.0000000000000002 ***
## Glass.Doorsyes  4437.526      6.885   644.5 <0.0000000000000002 ***
## City2           3497.283      8.434   414.7 <0.0000000000000002 ***
## City3           6986.189      8.434   828.4 <0.0000000000000002 ***
## Baths           1245.565      2.434   511.7 <0.0000000000000002 ***
## FirePlace        751.936      2.435   308.9 <0.0000000000000002 ***
## Electricyes     1253.480      6.885   182.1 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2177 on 399988 degrees of freedom
## Multiple R-squared:  0.9677, Adjusted R-squared:  0.9677 
## F-statistic: 1.199e+06 on 10 and 399988 DF,  p-value: < 0.00000000000000022

Coefficient interpretation: each one-unit increase in Baths increases the price by 1245.565.

Each one-unit increase in FirePlace increases the price by 751.936.

Each unit that has Floors (Floorsyes) has a price higher by 14994.281.

Each unit with Indian marble (marblesIndian) has a price lower by 5004.175.

Stepwise predictor selection

All-predictor and no-predictor models

The model with all predictors is stored in all.model, and the model with no predictors (intercept only) is stored in none.model.
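
A minimal sketch of the stepwise setup using base R’s step(); the all-predictor Call matches the summary below, while the intercept-only formula and the step() arguments are assumptions:

```r
# Boundary models for the stepwise search.
all.model  <- lm(Prices ~ ., data = house_train)   # all predictors
none.model <- lm(Prices ~ 1, data = house_train)   # intercept only

# AIC-based stepwise selection in three directions.
backward.model <- step(all.model, direction = "backward", trace = 0)
forward.model  <- step(none.model, direction = "forward",
                       scope = list(lower = formula(none.model),
                                    upper = formula(all.model)), trace = 0)
both.model     <- step(none.model, direction = "both",
                       scope = list(lower = formula(none.model),
                                    upper = formula(all.model)), trace = 0)

summary(all.model)
```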

## 
## Call:
## lm(formula = Prices ~ ., data = house_train)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -127.1 -124.7 -122.6  125.3  127.7 
## 
## Coefficients:
##                     Estimate   Std. Error    t value             Pr(>|t|)    
## (Intercept)      9625.454893     1.018735   9448.441 < 0.0000000000000002 ***
## Floorsyes       14999.402578     0.395286  37945.693 < 0.0000000000000002 ***
## Fiberyes        11750.084214     0.395292  29725.078 < 0.0000000000000002 ***
## marblesIndian   -5000.565594     0.484159 -10328.358 < 0.0000000000000002 ***
## marblesWhite     8999.135046     0.484486  18574.622 < 0.0000000000000002 ***
## Glass.Doorsyes   4450.043568     0.395291  11257.642 < 0.0000000000000002 ***
## City2            3500.104138     0.484229   7228.205 < 0.0000000000000002 ***
## City3            6999.621004     0.484211  14455.729 < 0.0000000000000002 ***
## Baths            1249.945704     0.139769   8942.942 < 0.0000000000000002 ***
## FirePlace         749.999688     0.139780   5365.575 < 0.0000000000000002 ***
## Garage           1500.253660     0.241904   6201.856 < 0.0000000000000002 ***
## Area               25.000557     0.002753   9080.103 < 0.0000000000000002 ***
## Electricyes      1250.542534     0.395284   3163.660 < 0.0000000000000002 ***
## Swiming.Poolyes    -0.034082     0.395289     -0.086              0.93129    
## Gardenyes          -1.246035     0.395291     -3.152              0.00162 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 125 on 399984 degrees of freedom
## Multiple R-squared:  0.9999, Adjusted R-squared:  0.9999 
## F-statistic: 2.684e+08 on 14 and 399984 DF,  p-value: < 0.00000000000000022

Backward model

## 
## Call:
## lm(formula = Prices ~ Floors + Fiber + marbles + Glass.Doors + 
##     City + Baths + FirePlace + Garage + Area + Electric + Garden, 
##     data = house_train)
## 
## Coefficients:
##    (Intercept)       Floorsyes        Fiberyes   marblesIndian    marblesWhite  
##       9625.438       14999.403       11750.084       -5000.566        8999.135  
## Glass.Doorsyes           City2           City3           Baths       FirePlace  
##       4450.044        3500.104        6999.621        1249.946         750.000  
##         Garage            Area     Electricyes       Gardenyes  
##       1500.254          25.001        1250.543          -1.246
## 
## Call:
## lm(formula = Prices ~ Floors + Fiber + marbles + Glass.Doors + 
##     City + Baths + FirePlace + Garage + Area + Electric + Garden, 
##     data = house_train)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -127.1 -124.7 -122.6  125.3  127.7 
## 
## Coefficients:
##                    Estimate   Std. Error    t value             Pr(>|t|)    
## (Intercept)     9625.438006     0.999730   9628.042 < 0.0000000000000002 ***
## Floorsyes      14999.402624     0.395285  37945.775 < 0.0000000000000002 ***
## Fiberyes       11750.084045     0.395287  29725.479 < 0.0000000000000002 ***
## marblesIndian  -5000.565586     0.484158 -10328.371 < 0.0000000000000002 ***
## marblesWhite    8999.135104     0.484484  18574.663 < 0.0000000000000002 ***
## Glass.Doorsyes  4450.043544     0.395290  11257.659 < 0.0000000000000002 ***
## City2           3500.104127     0.484228   7228.214 < 0.0000000000000002 ***
## City3           6999.620964     0.484210  14455.753 < 0.0000000000000002 ***
## Baths           1249.945686     0.139769   8942.962 < 0.0000000000000002 ***
## FirePlace        749.999688     0.139780   5365.581 < 0.0000000000000002 ***
## Garage          1500.253647     0.241904   6201.865 < 0.0000000000000002 ***
## Area              25.000557     0.002753   9080.118 < 0.0000000000000002 ***
## Electricyes     1250.542530     0.395283   3163.664 < 0.0000000000000002 ***
## Gardenyes         -1.246040     0.395290     -3.152              0.00162 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 125 on 399985 degrees of freedom
## Multiple R-squared:  0.9999, Adjusted R-squared:  0.9999 
## F-statistic: 2.891e+08 on 13 and 399985 DF,  p-value: < 0.00000000000000022

Forward model

## 
## Call:
## lm(formula = Prices ~ Floors + Fiber + marbles + City + Glass.Doors + 
##     Area + Baths + Garage + FirePlace + Electric + Garden, data = house_train)
## 
## Coefficients:
##    (Intercept)       Floorsyes        Fiberyes   marblesIndian    marblesWhite  
##       9625.438       14999.403       11750.084       -5000.566        8999.135  
##          City2           City3  Glass.Doorsyes            Area           Baths  
##       3500.104        6999.621        4450.044          25.001        1249.946  
##         Garage       FirePlace     Electricyes       Gardenyes  
##       1500.254         750.000        1250.543          -1.246
## 
## Call:
## lm(formula = Prices ~ Floors + Fiber + marbles + City + Glass.Doors + 
##     Area + Baths + Garage + FirePlace + Electric + Garden, data = house_train)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -127.1 -124.7 -122.6  125.3  127.7 
## 
## Coefficients:
##                    Estimate   Std. Error    t value             Pr(>|t|)    
## (Intercept)     9625.438006     0.999730   9628.042 < 0.0000000000000002 ***
## Floorsyes      14999.402624     0.395285  37945.775 < 0.0000000000000002 ***
## Fiberyes       11750.084045     0.395287  29725.479 < 0.0000000000000002 ***
## marblesIndian  -5000.565586     0.484158 -10328.371 < 0.0000000000000002 ***
## marblesWhite    8999.135104     0.484484  18574.663 < 0.0000000000000002 ***
## City2           3500.104127     0.484228   7228.214 < 0.0000000000000002 ***
## City3           6999.620964     0.484210  14455.753 < 0.0000000000000002 ***
## Glass.Doorsyes  4450.043544     0.395290  11257.659 < 0.0000000000000002 ***
## Area              25.000557     0.002753   9080.118 < 0.0000000000000002 ***
## Baths           1249.945686     0.139769   8942.962 < 0.0000000000000002 ***
## Garage          1500.253647     0.241904   6201.865 < 0.0000000000000002 ***
## FirePlace        749.999688     0.139780   5365.581 < 0.0000000000000002 ***
## Electricyes     1250.542530     0.395283   3163.664 < 0.0000000000000002 ***
## Gardenyes         -1.246040     0.395290     -3.152              0.00162 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 125 on 399985 degrees of freedom
## Multiple R-squared:  0.9999, Adjusted R-squared:  0.9999 
## F-statistic: 2.891e+08 on 13 and 399985 DF,  p-value: < 0.00000000000000022

Both model

## 
## Call:
## lm(formula = Prices ~ Floors + Fiber + marbles + Glass.Doors + 
##     City + Baths + FirePlace + Garage + Area + Electric + Garden, 
##     data = house_train)
## 
## Coefficients:
##    (Intercept)       Floorsyes        Fiberyes   marblesIndian    marblesWhite  
##       9625.438       14999.403       11750.084       -5000.566        8999.135  
## Glass.Doorsyes           City2           City3           Baths       FirePlace  
##       4450.044        3500.104        6999.621        1249.946         750.000  
##         Garage            Area     Electricyes       Gardenyes  
##       1500.254          25.001        1250.543          -1.246
## 
## Call:
## lm(formula = Prices ~ Floors + Fiber + marbles + Glass.Doors + 
##     City + Baths + FirePlace + Garage + Area + Electric + Garden, 
##     data = house_train)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -127.1 -124.7 -122.6  125.3  127.7 
## 
## Coefficients:
##                    Estimate   Std. Error    t value             Pr(>|t|)    
## (Intercept)     9625.438006     0.999730   9628.042 < 0.0000000000000002 ***
## Floorsyes      14999.402624     0.395285  37945.775 < 0.0000000000000002 ***
## Fiberyes       11750.084045     0.395287  29725.479 < 0.0000000000000002 ***
## marblesIndian  -5000.565586     0.484158 -10328.371 < 0.0000000000000002 ***
## marblesWhite    8999.135104     0.484484  18574.663 < 0.0000000000000002 ***
## Glass.Doorsyes  4450.043544     0.395290  11257.659 < 0.0000000000000002 ***
## City2           3500.104127     0.484228   7228.214 < 0.0000000000000002 ***
## City3           6999.620964     0.484210  14455.753 < 0.0000000000000002 ***
## Baths           1249.945686     0.139769   8942.962 < 0.0000000000000002 ***
## FirePlace        749.999688     0.139780   5365.581 < 0.0000000000000002 ***
## Garage          1500.253647     0.241904   6201.865 < 0.0000000000000002 ***
## Area              25.000557     0.002753   9080.118 < 0.0000000000000002 ***
## Electricyes     1250.542530     0.395283   3163.664 < 0.0000000000000002 ***
## Gardenyes         -1.246040     0.395290     -3.152              0.00162 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 125 on 399985 degrees of freedom
## Multiple R-squared:  0.9999, Adjusted R-squared:  0.9999 
## F-statistic: 2.891e+08 on 13 and 399985 DF,  p-value: < 0.00000000000000022

Prediction

Based on the evaluation metrics we computed before continuing to “Checking Assumptions”, the chosen-predictor model has a higher error in RMSE, MAE, and MSE than the stepwise models (a sketch of the comparison is shown below).
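
A minimal sketch of the error comparison on the test set, computing the metrics by hand (the exact functions used in the original report are not shown):

```r
# Predictions on the hold-out set.
pred_our  <- predict(our.model, newdata = house_test)
pred_step <- predict(backward.model, newdata = house_test)  # forward/both give the same model

# Error metrics.
rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))
mae  <- function(actual, pred) mean(abs(actual - pred))
mse  <- function(actual, pred) mean((actual - pred)^2)

rmse(house_test$Prices, pred_our); rmse(house_test$Prices, pred_step)
mae(house_test$Prices, pred_our);  mae(house_test$Prices, pred_step)
mse(house_test$Prices, pred_our);  mse(house_test$Prices, pred_step)
```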

Stepwise Predictor RMSE

Comparing Adjusted R-squared and RMSE/MAE/MSE

Adjusted R-squared

Considering adjusted R-squared, the stepwise models appear to be better.

However, for the models suggested by stepwise selection (backward, forward, and both), there may be overfitting, since their R-squared values are almost a perfect 100%.
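
The adjusted R-squared values below can be read straight from each model summary:

```r
summary(our.model)$adj.r.squared       # chosen-predictor model
summary(backward.model)$adj.r.squared  # backward stepwise
summary(forward.model)$adj.r.squared   # forward stepwise
summary(both.model)$adj.r.squared      # both-direction stepwise
```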

Chosen Predictor Model

## [1] 0.9677117

backward.model

## [1] 0.9998936

forward.model

## [1] 0.9998936

both.model

## [1] 0.9998936

CHECKING ASSUMPTIONS

Normality

What should we do if the residuals are not normally distributed, as in the stepwise models?

  • Find a new model based on business insight and check it until it passes the assumption, making sure the residuals are normally distributed.
  • Add more data.

When building linear regression models, the expectation is that the resulting errors are normally distributed, which means most errors gather around 0. This assumption can be tested by visualizing a histogram of the residuals using the hist() function.

In the normality test, the distribution of the residuals of the stepwise models is clearly not normal because the distribution lies away from 0. our.model is better than the models recommended by stepwise selection because it fulfills all of the linear regression assumptions.

The Shapiro-Wilk test cannot be used here since the sample size is greater than 5,000 (R’s shapiro.test() accepts at most 5,000 observations), so we use the one-sample Kolmogorov-Smirnov test instead.
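
A minimal sketch of the normality check for the chosen model; the ks.test() call is reconstructed from the warning below, with the standard-deviation argument added as an assumption:

```r
# Visual check: residuals should be roughly bell-shaped around 0.
hist(our.model$residuals, breaks = 50, main = "Residuals of our.model")

# One-sample Kolmogorov-Smirnov test against a normal distribution
# parameterized by the residuals' own mean and standard deviation.
ks.test(our.model$residuals, "pnorm",
        mean = mean(our.model$residuals),
        sd   = sd(our.model$residuals))
```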

## Warning in ks.test(our.model$residuals, "pnorm", mean =
## mean(our.model$residuals), : ties should not be present for the Kolmogorov-
## Smirnov test
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  our.model$residuals
## D = 0.021994, p-value < 0.00000000000000022
## alternative hypothesis: two-sided

backward model

In the normality test, the distribution of the residuals of the backward stepwise model is not normal because the distribution lies away from 0.

H0: the residuals are normally distributed.
H1: the residuals are not normally distributed.

If the p-value < alpha (0.05), reject H0. Conclusion: reject H0; the residuals are declared not normal since the p-value is not > 0.05 (the assumption is not met).

Forward model

In the normality test, the distribution of the residuals of the forward stepwise model is not normal because the distribution lies away from 0.

H0: the residuals are normally distributed.
H1: the residuals are not normally distributed.

If the p-value < alpha (0.05), reject H0. Conclusion: reject H0; the residuals are declared not normal since the p-value is not > 0.05 (the assumption is not met).

Both model

H0: the residuals are normally distributed.
H1: the residuals are not normally distributed.

If the p-value < alpha (0.05), reject H0. Conclusion: reject H0; the residuals are declared not normal since the p-value is not > 0.05 (the assumption is not met).

Chosen Predictor

H0: the residuals are normally distributed.
H1: the residuals are not normally distributed.

If the p-value < alpha (0.05), reject H0.

Conclusion: the residuals are declared normal if we fail to reject H0 (p-value > 0.05), in which case the assumption is met.

## Warning in ks.test(our.model$residuals, "pnorm", mean =
## mean(our.model$residuals), : ties should not be present for the Kolmogorov-
## Smirnov test
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  our.model$residuals
## D = 0.021994, p-value < 0.00000000000000022
## alternative hypothesis: two-sided

Homoscedasticity

We use the Breusch-Pagan test from the lmtest package. Breusch-Pagan hypothesis test (the expectation is p-value > alpha):

H0: the error variance is constant (homoscedasticity).
H1: the error variance is not constant / forms a pattern (heteroscedasticity).

Conclusion: all of the models fail to reject H0, which means their errors are homoscedastic.
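
The studentized Breusch-Pagan tests below are produced with lmtest::bptest() on each fitted model:

```r
library(lmtest)

bptest(our.model)       # chosen-predictor model
bptest(backward.model)  # stepwise models
bptest(forward.model)
bptest(both.model)
```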

## 
##  studentized Breusch-Pagan test
## 
## data:  our.model
## BP = 6.7316, df = 10, p-value = 0.7505
## 
##  studentized Breusch-Pagan test
## 
## data:  backward.model
## BP = 10.455, df = 13, p-value = 0.6563
## 
##  studentized Breusch-Pagan test
## 
## data:  forward.model
## BP = 10.455, df = 13, p-value = 0.6563
## 
##  studentized Breusch-Pagan test
## 
## data:  both.model
## BP = 10.455, df = 13, p-value = 0.6563

Multicollinearity

Multicollinearity exists whenever an independent variable is highly correlated with one or more of the other independent variables in a multiple regression equation.

A VIF value greater than 10 indicates multicollinearity, so we hope to see VIF < 10.
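
The GVIF tables below come from car::vif() applied to each fitted model:

```r
library(car)

vif(our.model)        # chosen-predictor model
vif(backward.model)   # stepwise models
vif(forward.model)
vif(both.model)
```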

Chosen Model

##                 GVIF Df GVIF^(1/(2*Df))
## Floors      1.000013  1        1.000007
## Fiber       1.000028  1        1.000014
## marbles     1.000011  2        1.000003
## Glass.Doors 1.000020  1        1.000010
## City        1.000027  2        1.000007
## Baths       1.000026  1        1.000013
## FirePlace   1.000012  1        1.000006
## Electric    1.000008  1        1.000004

Stepwise Models

##                 GVIF Df GVIF^(1/(2*Df))
## Floors      1.000019  1        1.000009
## Fiber       1.000028  1        1.000014
## marbles     1.000028  2        1.000007
## City        1.000050  2        1.000013
## Glass.Doors 1.000046  1        1.000023
## Area        1.000032  1        1.000016
## Baths       1.000044  1        1.000022
## Garage      1.000026  1        1.000013
## FirePlace   1.000015  1        1.000007
## Electric    1.000009  1        1.000005
## Garden      1.000043  1        1.000021
##                 GVIF Df GVIF^(1/(2*Df))
## Floors      1.000019  1        1.000009
## Fiber       1.000028  1        1.000014
## marbles     1.000028  2        1.000007
## Glass.Doors 1.000046  1        1.000023
## City        1.000050  2        1.000013
## Baths       1.000044  1        1.000022
## FirePlace   1.000015  1        1.000007
## Garage      1.000026  1        1.000013
## Area        1.000032  1        1.000016
## Electric    1.000009  1        1.000005
## Garden      1.000043  1        1.000021
##                 GVIF Df GVIF^(1/(2*Df))
## Floors      1.000019  1        1.000009
## Fiber       1.000028  1        1.000014
## marbles     1.000028  2        1.000007
## City        1.000050  2        1.000013
## Glass.Doors 1.000046  1        1.000023
## Area        1.000032  1        1.000016
## Baths       1.000044  1        1.000022
## Garage      1.000026  1        1.000013
## FirePlace   1.000015  1        1.000007
## Electric    1.000009  1        1.000005
## Garden      1.000043  1        1.000021

Chunk Commentary:

  • There is no multicollinearity in any of the models; all GVIF values are close to 1.

CONCLUSION

From the models created and the evaluation tests conducted, the models formed by backward, forward, and both-direction stepwise regression satisfy the multicollinearity and homoscedasticity tests but do not satisfy the normality test. After examining the correlations from the linearity test, combined with our business insight, we finally chose predictors such as marbles, Fiber, and Floors.

Some business recommendations from the chosen model:

  • Indian marble makes the price cheaper than the other marble types.
  • Floors and Fiber raise the price far more than the other variables.
  • FirePlace raises the price less than the other variables.

Lucky Putranto

15/3/2020