Load Dataset

This study assesses the heating load and cooling load requirements of buildings (that is, their energy efficiency) as a function of building parameters.

library(readxl)
## Warning: package 'readxl' was built under R version 4.3.2
Energy <- read_xlsx("ENB2012_data.xlsx")  # 768 rows: predictors X1 to X8, responses Y1 and Y2
head(Energy)
## # A tibble: 6 × 10
##      X1    X2    X3    X4    X5    X6    X7    X8    Y1    Y2
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  0.98  514.  294   110.     7     2     0     0  15.6  21.3
## 2  0.98  514.  294   110.     7     3     0     0  15.6  21.3
## 3  0.98  514.  294   110.     7     4     0     0  15.6  21.3
## 4  0.98  514.  294   110.     7     5     0     0  15.6  21.3
## 5  0.9   564.  318.  122.     7     2     0     0  20.8  28.3
## 6  0.9   564.  318.  122.     7     3     0     0  21.5  25.4

This dataset contains 10 variables, 8 predictors and 2 responses:
  • X1: Relative Compactness
  • X2: Surface Area
  • X3: Wall Area
  • X4: Roof Area
  • X5: Overall Height
  • X6: Orientation
  • X7: Glazing Area
  • X8: Glazing Area Distribution
  • Y1: Heating Load (response)
  • Y2: Cooling Load (response)
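Three of these predictors are not independent. In the six rows printed by head() above, surface area equals wall area plus twice roof area (X2 = X3 + 2 * X4), which would explain why lm() reports X4 as NA in the fits below. The sketch checks that identity on those six rows, assuming their unrounded values are 514.5 / 294 / 110.25 and 563.5 / 318.5 / 122.5 (the tibble printout rounds them), and then reproduces the NA on simulated data. The variables `wall`, `roof`, `surface` and `y` are illustrative only and are not part of the original analysis.

```r
# Unrounded X2, X3, X4 for the six rows shown by head(Energy) (assumed values;
# the tibble printout displays e.g. 514.5 as "514.")
x2 <- c(514.5, 514.5, 514.5, 514.5, 563.5, 563.5)
x3 <- c(294.0, 294.0, 294.0, 294.0, 318.5, 318.5)
x4 <- c(110.25, 110.25, 110.25, 110.25, 122.5, 122.5)
stopifnot(isTRUE(all.equal(x2, x3 + 2 * x4)))  # surface = wall + 2 * roof

# Simulated illustration: when a column is an exact linear combination of
# earlier columns, lm() cannot estimate it and reports its coefficient as NA
set.seed(1)
n <- 50
wall <- runif(n, 200, 400)
roof <- runif(n, 100, 250)
surface <- wall + 2 * roof          # exact linear dependence
y <- 10 + 0.05 * wall + rnorm(n)
fit <- lm(y ~ surface + wall + roof)
print(coef(fit))                     # 'roof' is NA, like X4 in the Energy models
print(alias(fit)$Complete)           # shows roof expressed via surface and wall
```

On the real data, `all.equal(Energy$X2, Energy$X3 + 2 * Energy$X4)` would confirm whether the identity holds for every row.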

Fit Regression Models

Fit two multiple regression models, one predicting heating load (Y1) and one predicting cooling load (Y2), using all 8 predictors; then write out the fitted linear equations and interpret their coefficients.

  • General Multiple Regression Model: \[ Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \ldots + \beta_kX_k + \epsilon \]
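The model formulas below use R's dot shorthand: in `Y1 ~ . - Y2`, the `.` stands for every column of the data other than the response, and `- Y2` removes the other response, so all eight X columns enter the model. A toy data frame with dummy values (not the Energy data) shows the expansion:

```r
# Toy data frame with the same naming scheme as Energy; values are dummies
d <- data.frame(X1 = 1:3, X2 = 4:6, Y1 = 7:9, Y2 = 10:12)
# '.' expands to X1 + X2 + Y2 (all non-response columns); '- Y2' then drops Y2
tt <- terms(Y1 ~ . - Y2, data = d)
print(attr(tt, "term.labels"))
```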
HL <- lm(formula = Y1 ~ . - Y2, data = Energy)  # heating load on all predictors
summary(HL)
## 
## Call:
## lm(formula = Y1 ~ . - Y2, data = Energy)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.8965 -1.3196 -0.0252  1.3532  7.7052 
## 
## Coefficients: (1 not defined because of singularities)
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  84.013418  19.033613   4.414 1.16e-05 ***
## X1          -64.773432  10.289448  -6.295 5.19e-10 ***
## X2           -0.087289   0.017075  -5.112 4.04e-07 ***
## X3            0.060813   0.006648   9.148  < 2e-16 ***
## X4                  NA         NA      NA       NA    
## X5            4.169954   0.337990  12.338  < 2e-16 ***
## X6           -0.023330   0.094705  -0.246  0.80548    
## X7           19.932736   0.813986  24.488  < 2e-16 ***
## X8            0.203777   0.069918   2.915  0.00367 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.934 on 760 degrees of freedom
## Multiple R-squared:  0.9162, Adjusted R-squared:  0.9154 
## F-statistic:  1187 on 7 and 760 DF,  p-value: < 2.2e-16
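Reading the Estimate column (rounded, with the aliased X4 omitted) gives the fitted heating-load equation:

\[ \widehat{Y}_1 = 84.013 - 64.773X_1 - 0.0873X_2 + 0.0608X_3 + 4.170X_5 - 0.0233X_6 + 19.933X_7 + 0.2038X_8 \]

Each slope is the expected change in heating load for a one-unit increase in that predictor, holding the other predictors fixed. For example, one extra unit of overall height (X5) adds about 4.17 to the predicted heating load and a 0.1 increase in glazing area (X7) adds about 1.99, while higher relative compactness (X1) and surface area (X2) lower it. Orientation (X6) is not significant (p = 0.805).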
CL <- lm(formula = Y2 ~ . - Y1, data = Energy)  # cooling load on all predictors
summary(CL)
## 
## Call:
## lm(formula = Y2 ~ . - Y1, data = Energy)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.6940 -1.5606 -0.2668  1.3968 11.1775 
## 
## Coefficients: (1 not defined because of singularities)
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  97.245749  20.764711   4.683 3.34e-06 ***
## X1          -70.787707  11.225269  -6.306 4.85e-10 ***
## X2           -0.088245   0.018628  -4.737 2.59e-06 ***
## X3            0.044682   0.007253   6.161 1.17e-09 ***
## X4                  NA         NA      NA       NA    
## X5            4.283843   0.368730  11.618  < 2e-16 ***
## X6            0.121510   0.103318   1.176    0.240    
## X7           14.717068   0.888018  16.573  < 2e-16 ***
## X8            0.040697   0.076277   0.534    0.594    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.201 on 760 degrees of freedom
## Multiple R-squared:  0.8878, Adjusted R-squared:  0.8868 
## F-statistic: 859.1 on 7 and 760 DF,  p-value: < 2.2e-16
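The fitted cooling-load equation, read the same way, is:

\[ \widehat{Y}_2 = 97.246 - 70.788X_1 - 0.0882X_2 + 0.0447X_3 + 4.284X_5 + 0.1215X_6 + 14.717X_7 + 0.0407X_8 \]

The significant predictors carry the same signs as in the heating-load model, with a smaller glazing-area effect (about 1.47 per 0.1 increase in X7). Neither Orientation (X6, p = 0.240) nor Glazing Area Distribution (X8, p = 0.594) is significant here.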

Hypothesis Testing

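In the models above, X4 cannot be estimated (it is aliased) and some predictors are not significant at the 5% level: Orientation (X6) in both models and Glazing Area Distribution (X8) in the cooling-load model. Removing them should give simpler models with almost the same fit. For which predictors can we reject the null hypothesis H0: βj = 0?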

HL2 <- lm(formula = Y1 ~ . - Y2 - X8 - X6 - X4, data = Energy)  # reduced heating-load model
summary(HL2)
## 
## Call:
## lm(formula = Y1 ~ . - Y2 - X8 - X6 - X4, data = Energy)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.3862  -1.3667  -0.0142   1.3162   7.5555 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  84.386471  19.111765   4.415 1.15e-05 ***
## X1          -64.773432  10.333611  -6.268 6.11e-10 ***
## X2           -0.087289   0.017149  -5.090 4.51e-07 ***
## X3            0.060813   0.006676   9.109  < 2e-16 ***
## X5            4.169954   0.339441  12.285  < 2e-16 ***
## X7           20.437968   0.798727  25.588  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.947 on 762 degrees of freedom
## Multiple R-squared:  0.9153, Adjusted R-squared:  0.9147 
## F-statistic:  1646 on 5 and 762 DF,  p-value: < 2.2e-16
CL2 <- lm(formula = Y2 ~ . - Y1 - X8 - X6 - X4, data = Energy)  # reduced cooling-load model
summary(CL2)
## 
## Call:
## lm(formula = Y2 ~ . - Y1 - X8 - X6 - X4, data = Energy)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.7240 -1.6017 -0.2631  1.3417 11.3251 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  97.761848  20.756339   4.710 2.94e-06 ***
## X1          -70.787707  11.222822  -6.307 4.80e-10 ***
## X2           -0.088245   0.018624  -4.738 2.57e-06 ***
## X3            0.044682   0.007251   6.162 1.16e-09 ***
## X5            4.283843   0.368650  11.620  < 2e-16 ***
## X7           14.817971   0.867458  17.082  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.2 on 762 degrees of freedom
## Multiple R-squared:  0.8876, Adjusted R-squared:  0.8868 
## F-statistic:  1203 on 5 and 762 DF,  p-value: < 2.2e-16
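Whether dropping X6 and X8 is justified can be checked jointly with a partial F-test, which compares the fit of the nested models: F = [(R²_full − R²_reduced) / q] / [(1 − R²_full) / df_full]. Because X4 was already aliased, each reduced model loses q = 2 parameters (760 vs 762 residual degrees of freedom). The sketch computes an approximate version from the rounded R-squared values printed above; `partial_f` is a hypothetical helper, and `anova(HL2, HL)` and `anova(CL2, CL)` on the fitted objects would give the exact test.

```r
# Approximate partial F-test for removing q predictors, using R-squared values
# (both models share the same total sum of squares, so R^2 can replace RSS)
partial_f <- function(r2_full, r2_reduced, q, df_resid_full) {
  f <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_resid_full)
  p <- pf(f, df1 = q, df2 = df_resid_full, lower.tail = FALSE)
  c(F = f, p_value = p)
}

# Rounded values from summary(HL) / summary(HL2) and summary(CL) / summary(CL2)
hl_test <- partial_f(r2_full = 0.9162, r2_reduced = 0.9153, q = 2, df_resid_full = 760)
cl_test <- partial_f(r2_full = 0.8878, r2_reduced = 0.8876, q = 2, df_resid_full = 760)
print(round(rbind(heating = hl_test, cooling = cl_test), 4))
```

With these rounded inputs the heating-load p-value falls below 0.05, consistent with X8's significant t-test in the full heating-load model, while the cooling-load p-value is well above 0.05.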

Conclusion

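The variables significantly associated with energy efficiency, for both Heating Load (HL) and Cooling Load (CL), are Relative Compactness (X1), Surface Area (X2), Wall Area (X3), Overall Height (X5) and Glazing Area (X7); Glazing Area Distribution (X8) is also significant for HL (p = 0.0037) but not for CL. The direction of each variable's relationship with HL or CL is given by the sign of its estimate: holding the other predictors fixed, X3, X5 and X7 increase the load, while X1 and X2 decrease it.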
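The sign rule can be made explicit by tabulating the estimates of the five predictors kept in HL2 and CL2 (values copied from the summaries above) and mapping each sign to a direction. This only restates the estimates; with strongly related predictors such as X1 and X2, "holding the others fixed" should be read with care.

```r
# Estimates copied from the reduced-model summaries (HL2 and CL2)
est <- data.frame(
  variable = c("X1 Relative Compactness", "X2 Surface Area", "X3 Wall Area",
               "X5 Overall Height", "X7 Glazing Area"),
  heating  = c(-64.773432, -0.087289, 0.060813, 4.169954, 20.437968),
  cooling  = c(-70.787707, -0.088245, 0.044682, 4.283843, 14.817971)
)
# Positive estimate: load rises with the variable, other predictors held fixed
direction <- function(b) ifelse(b > 0, "increases load", "decreases load")
est$heating_effect <- direction(est$heating)
est$cooling_effect <- direction(est$cooling)
print(est[, c("variable", "heating_effect", "cooling_effect")])
```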