Week 3 Solutions

This is the published version of the Week 3 assignment.

Question 4 - Meaningful question for analysis: Please state at the beginning a meaningful question for analysis. Use the first three steps and anything else that would be helpful to answer the question you are posing from the data set you chose. Please write a brief conclusion paragraph in R markdown at the end.

I have selected a data set which reports simulated building heating and cooling loads based on a varying set of inputs (https://archive.ics.uci.edu/ml/datasets/Energy+efficiency). To provide some background, I am a mechanical engineer who specializes in building energy simulation, which is why I found this data set particularly interesting. Since I haven’t seen the actual energy simulation, I’m unsure of many input parameters which could greatly affect the results. Therefore, I’ll be treating this as just a data set with numbers rather than as a summary of a modeling analysis.

It appears this data was constructed in order to determine how two response variables, cooling load and heating load, will vary given changes in a number of tracked predictor variables. The main question I set out to answer is, which variables have the greatest impact on the cooling and heating loads for the modeled buildings?

First, I will take the data and import it into data tables. During my analysis, it seemed like some plotting functions (e.g. geom_histogram) did not work well with a mix of numerical and categorical data. To get the best plots possible, I created two data tables: one where all the predictor columns are character type, and the other with numerical types where applicable.

eedataurl <- "https://github.com/john-grando/Masters/raw/master/Workshop/R/Week3/assignment/ENB2012_data.csv"
eedata <- read.csv(file = eedataurl, 
                   header = TRUE, sep = ",", 
                   colClasses = c("character","character","character","character", "character", "character", "character", "character", "numeric", "numeric"),
                   col.names = c("relative.compactness","surface.area","wall.area","roof.area","overall.height","orientation","glazing.area","glazing.area.distribution","heating.load","cooling.load"), 
                   stringsAsFactors = FALSE)
eedatanumeric <- read.csv(file = eedataurl, 
                          header = TRUE, sep = ",", 
                          colClasses = c("numeric","numeric","numeric","numeric","numeric","character","numeric","character","numeric","numeric"),
                          col.names = c("relative.compactness","surface.area","wall.area","roof.area","overall.height","orientation","glazing.area","glazing.area.distribution","heating.load","cooling.load"), 
                          stringsAsFactors = FALSE)
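
A quick way to confirm that the two imports differ only in column type is to inspect the classes after reading. The sketch below is only an illustration: it uses a three-row inline CSV with made-up values (not the real ENB2012 data) and a shortened, hypothetical column list, so it runs without network access.

```r
# Inline stand-in for the CSV; values are illustrative only.
csv_text <- "X1,X5,Y1
0.98,7.00,15.55
0.90,7.00,20.84
0.62,3.50,12.74"

cols <- c("relative.compactness", "overall.height", "heating.load")

# Character version: predictors stay as text, handy for discrete fills.
toy_char <- read.csv(text = csv_text, header = TRUE, sep = ",",
                     colClasses = c("character", "character", "numeric"),
                     col.names = cols)

# Numeric version: predictors are numbers, suitable for lm().
toy_num <- read.csv(text = csv_text, header = TRUE, sep = ",",
                    colClasses = c("numeric", "numeric", "numeric"),
                    col.names = cols)

print(sapply(toy_char, class))
print(sapply(toy_num, class))
```

Reading a column as character keeps the text exactly as it appears in the file, which is why the later subsets can match on strings such as "3.50".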

Question 2 - Data wrangling: Please perform some basic transformations. They will need to make sense but could include column renaming, creating a subset of the data, replacing values, or creating new columns with derived data (for example – if it makes sense you could sum two columns together)

From the data provided, it appears that glazing area is actually the glazing ratio. We can then calculate an extra column which gives the actual glazing area.

eedata$glazing.area.actual <- formatC(eedatanumeric$glazing.area * eedatanumeric$wall.area)
eedatanumeric$glazing.area.actual <- eedatanumeric$glazing.area * eedatanumeric$wall.area
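
As a small worked example of the derived column, the sketch below multiplies a glazing ratio by a wall area for three toy rows. The numbers are illustrative, and it assumes, as the text above does, that the ratio is relative to wall area; the data set documentation should be checked for which area the ratio actually refers to.

```r
# Toy rows; wall areas and ratios are illustrative values.
toy <- data.frame(wall.area    = c(294.0, 318.5, 343.0),
                  glazing.area = c(0.00, 0.10, 0.25))

# Glazed area = glazing ratio x wall area.
toy$glazing.area.actual <- toy$glazing.area * toy$wall.area

print(toy)
```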

Question 1 - Data Exploration: This should include summary statistics, means, medians, quartiles, or any other relevant information about the data set. Please include some conclusions in the R Markdown text.

To start, I take a look at the histograms of the two response variables in this study (cooling load and heating load)

require(ggplot2)
## Loading required package: ggplot2
ggplot(eedata, aes(x = cooling.load)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")

ggplot(eedata, aes(x = heating.load)) + geom_histogram(binwidth = 1) + labs(x = "heating load")
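
Summary statistics give a numeric companion to the histograms. The sketch below is run on simulated values, not on the real loads: it draws two made-up clusters and prints the quartiles, selected percentiles, and standard deviation. With two clusters, the mean and median can land between them, where few observations actually lie.

```r
set.seed(1)

# Simulated response with two clusters centred at 15 and 32
# (made-up numbers standing in for a load column).
load <- c(rnorm(200, mean = 15, sd = 2), rnorm(200, mean = 32, sd = 4))

print(summary(load))                     # min, quartiles, median, mean, max
print(quantile(load, c(0.1, 0.5, 0.9)))  # selected percentiles
print(sd(load))                          # spread around the mean
```

The same three calls could be applied to eedatanumeric$cooling.load and eedatanumeric$heating.load.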

Upon inspection, it appears both of these response variables are bimodal. Therefore, I will check whether there are any inputs influencing this shape.

First, I’ll check the categorical fields, orientation and glazing area distribution, since a linear regression on them wouldn’t provide any meaningful results.

ggplot(eedata, aes(cooling.load, fill = orientation)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")

ggplot(eedata, aes(cooling.load, fill = glazing.area.distribution)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")

ggplot(eedata, aes(heating.load, fill = orientation)) + geom_histogram(binwidth = 1) + labs(x = "heating load")

ggplot(eedata, aes(heating.load, fill = glazing.area.distribution)) + geom_histogram(binwidth = 1) + labs(x = "heating load")

It does not appear that these attributes are influencing the bimodal nature of the cooling or heating response variables.
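
A numeric check can back up the visual impression: if a categorical field drove the shape, the per-level means would differ noticeably. The sketch below uses a toy table in which the load is, by construction, unrelated to orientation; the values and the level codes are made up for illustration.

```r
# Toy data: load does not depend on orientation by construction.
toy <- data.frame(
  orientation  = rep(c("2", "3", "4", "5"), times = 3),
  cooling.load = c(21.3, 21.4, 21.2, 21.5,
                   28.1, 28.0, 28.3, 28.2,
                   14.9, 15.0, 15.1, 14.8)
)

# Mean load for each orientation level.
group_means <- tapply(toy$cooling.load, toy$orientation, mean)
print(group_means)

# Spread of the group means; a small value suggests little influence.
print(diff(range(group_means)))
```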

I then check the numeric fields against the cooling response variable. For the following steps, I will be using the lm() function to check for correlation. I realize that a lot of the results show weak R-squared values, but I’m using this as a quick check rather than as proof of strong relationships.

testlm <- lm(cooling.load ~ relative.compactness, data = eedatanumeric)
summary(testlm)
## 
## Call:
## lm(formula = cooling.load ~ relative.compactness, data = eedatanumeric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.571  -5.632  -1.233   3.379  21.968 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -19.008      1.938  -9.809   <2e-16 ***
## relative.compactness   57.051      2.512  22.710   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.359 on 766 degrees of freedom
## Multiple R-squared:  0.4024, Adjusted R-squared:  0.4016 
## F-statistic: 515.8 on 1 and 766 DF,  p-value: < 2.2e-16
ggplot(eedata, aes(x = cooling.load, fill = relative.compactness)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")

testlm <- lm(cooling.load ~ surface.area, data = eedatanumeric)
summary(testlm)
## 
## Call:
## lm(formula = cooling.load ~ surface.area, data = eedatanumeric)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.6843  -5.3800  -0.5586   3.7048  20.9195 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  73.410157   1.955287   37.54   <2e-16 ***
## surface.area -0.072684   0.002886  -25.18   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.041 on 766 degrees of freedom
## Multiple R-squared:  0.4529, Adjusted R-squared:  0.4522 
## F-statistic: 634.2 on 1 and 766 DF,  p-value: < 2.2e-16
ggplot(eedata, aes(x = cooling.load, fill = surface.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")

testlm <- lm(cooling.load ~ wall.area, data = eedatanumeric)
summary(testlm)
## 
## Call:
## lm(formula = cooling.load ~ wall.area, data = eedatanumeric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.111  -6.664  -0.909   7.331  21.160 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -5.076775   2.290183  -2.217   0.0269 *  
## wall.area    0.093138   0.007124  13.074   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.608 on 766 degrees of freedom
## Multiple R-squared:  0.1824, Adjusted R-squared:  0.1814 
## F-statistic: 170.9 on 1 and 766 DF,  p-value: < 2.2e-16
ggplot(eedata, aes(x = cooling.load, fill = wall.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")

testlm <- lm(cooling.load ~ glazing.area, data = eedatanumeric)
summary(testlm)
## 
## Call:
## lm(formula = cooling.load ~ glazing.area, data = eedatanumeric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -12.462  -9.049  -1.536   8.127  21.151 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   21.1148     0.6803  31.036  < 2e-16 ***
## glazing.area  14.8180     2.5240   5.871 6.46e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.312 on 766 degrees of freedom
## Multiple R-squared:  0.04306,    Adjusted R-squared:  0.04181 
## F-statistic: 34.47 on 1 and 766 DF,  p-value: 6.457e-09
ggplot(eedata, aes(x = cooling.load, fill = glazing.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")

testlm <- lm(cooling.load ~ glazing.area.actual, data = eedatanumeric)
summary(testlm)
## 
## Call:
## lm(formula = cooling.load ~ glazing.area.actual, data = eedatanumeric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.290  -8.522  -1.286   7.942  21.364 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         19.736783   0.645142  30.593   <2e-16 ***
## glazing.area.actual  0.064984   0.007445   8.728   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.079 on 766 degrees of freedom
## Multiple R-squared:  0.09046,    Adjusted R-squared:  0.08927 
## F-statistic: 76.18 on 1 and 766 DF,  p-value: < 2.2e-16
ggplot(eedata, aes(x = cooling.load, fill = glazing.area.actual)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")

While some of these show a correlation between surface properties and cooling load, the overall height and roof area are clearly the dominant factors (R-squared of about 0.80 and 0.74, respectively, versus 0.45 or less for the fits above).

testlm <- lm(cooling.load ~ overall.height, data = eedatanumeric)
summary(testlm)
## 
## Call:
## lm(formula = cooling.load ~ overall.height, data = eedatanumeric)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.9441  -2.3539  -0.2664   2.0386  14.9259 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -0.96122    0.48283  -1.991   0.0469 *  
## overall.height  4.86647    0.08725  55.777   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.231 on 766 degrees of freedom
## Multiple R-squared:  0.8024, Adjusted R-squared:  0.8022 
## F-statistic:  3111 on 1 and 766 DF,  p-value: < 2.2e-16
ggplot(eedata, aes(x = cooling.load, fill = overall.height)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")

ggplot(eedata, aes(y = cooling.load, x = overall.height)) + geom_boxplot() + labs(x = "overall height", y = "cooling load")

testlm <- lm(cooling.load ~ roof.area, data = eedatanumeric)
summary(testlm)
## 
## Call:
## lm(formula = cooling.load ~ roof.area, data = eedatanumeric)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.3129  -2.4653  -0.6178   2.4377  18.0638 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 56.672891   0.701905   80.74   <2e-16 ***
## roof.area   -0.181678   0.003851  -47.18   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.817 on 766 degrees of freedom
## Multiple R-squared:  0.744,  Adjusted R-squared:  0.7437 
## F-statistic:  2226 on 1 and 766 DF,  p-value: < 2.2e-16
ggplot(eedata, aes(x = cooling.load, fill = roof.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")

ggplot(eedata, aes(y = cooling.load, x = roof.area)) + geom_boxplot() + labs(x = "roof area", y = "cooling load")
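
The single-predictor fits above can also be collected in one pass, which makes the R-squared values easier to compare side by side. The following is a minimal sketch on simulated data; the column names height, wall, glaze, and load are hypothetical stand-ins, and the response is built so that height dominates.

```r
set.seed(42)
n <- 100

# Simulated predictors and a response driven mostly by `height`.
toy <- data.frame(height = rep(c(3.5, 7.0), each = n / 2),
                  wall   = runif(n, 250, 400),
                  glaze  = runif(n, 0, 0.4))
toy$load <- 5 * toy$height + 0.02 * toy$wall + rnorm(n, sd = 2)

predictors <- c("height", "wall", "glaze")

# Fit load ~ predictor for each predictor and keep the R-squared.
r2 <- sapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = "load"), data = toy)
  summary(fit)$r.squared
})

# Rank the predictors from strongest to weakest single-variable fit.
print(sort(r2, decreasing = TRUE))
```

On the real data, predictors would hold the numeric column names of eedatanumeric and the response would be "cooling.load" or "heating.load".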

Next, I test the heating load.

testlm <- lm(heating.load ~ relative.compactness, data = eedatanumeric)
summary(testlm)
## 
## Call:
## lm(formula = heating.load ~ relative.compactness, data = eedatanumeric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -19.569  -6.332  -1.028   3.393  19.259 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           -23.053      2.081  -11.08   <2e-16 ***
## relative.compactness   59.359      2.698   22.00   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.904 on 766 degrees of freedom
## Multiple R-squared:  0.3872, Adjusted R-squared:  0.3864 
## F-statistic:   484 on 1 and 766 DF,  p-value: < 2.2e-16
ggplot(eedata, aes(x = heating.load, fill = relative.compactness)) + geom_histogram(binwidth = 1) + labs(x = "heating load")

testlm <- lm(heating.load ~ surface.area, data = eedatanumeric)
summary(testlm)
## 
## Call:
## lm(formula = heating.load ~ surface.area, data = eedatanumeric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.609  -5.524  -1.300   3.529  18.176 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  72.945382   2.111062   34.55   <2e-16 ***
## surface.area -0.075387   0.003116  -24.19   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.602 on 766 degrees of freedom
## Multiple R-squared:  0.4331, Adjusted R-squared:  0.4324 
## F-statistic: 585.3 on 1 and 766 DF,  p-value: < 2.2e-16
ggplot(eedata, aes(x = heating.load, fill = surface.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")

testlm <- lm(heating.load ~ wall.area, data = eedatanumeric)
summary(testlm)
## 
## Call:
## lm(formula = heating.load ~ wall.area, data = eedatanumeric)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.0213  -7.3937  -0.4882   7.5728  18.2107 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -11.259633   2.391321  -4.709 2.96e-06 ***
## wall.area     0.105390   0.007439  14.168  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.988 on 766 degrees of freedom
## Multiple R-squared:  0.2076, Adjusted R-squared:  0.2066 
## F-statistic: 200.7 on 1 and 766 DF,  p-value: < 2.2e-16
ggplot(eedata, aes(x = heating.load, fill = wall.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")

testlm <- lm(heating.load ~ glazing.area, data = eedatanumeric)
summary(testlm)
## 
## Call:
## lm(formula = heating.load ~ glazing.area, data = eedatanumeric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.272  -9.193  -3.054   7.253  17.699 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.5171     0.7103  24.662  < 2e-16 ***
## glazing.area  20.4379     2.6351   7.756  2.8e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.722 on 766 degrees of freedom
## Multiple R-squared:  0.07281,    Adjusted R-squared:  0.0716 
## F-statistic: 60.16 on 1 and 766 DF,  p-value: 2.796e-14
ggplot(eedata, aes(x = heating.load, fill = glazing.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")

testlm <- lm(heating.load ~ glazing.area.actual, data = eedatanumeric)
summary(testlm)
## 
## Call:
## lm(formula = heating.load ~ glazing.area.actual, data = eedatanumeric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.340  -9.141  -1.956   7.972  18.367 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         15.990050   0.666772   23.98   <2e-16 ***
## glazing.area.actual  0.084625   0.007695   11.00   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.383 on 766 degrees of freedom
## Multiple R-squared:  0.1364, Adjusted R-squared:  0.1352 
## F-statistic: 120.9 on 1 and 766 DF,  p-value: < 2.2e-16
ggplot(eedata, aes(x = heating.load, fill = glazing.area.actual)) + geom_histogram(binwidth = 1) + labs(x = "heating load")

Similarly, the heating response variable is greatly impacted by the overall height and roof area.

testlm <- lm(heating.load ~ overall.height, data = eedatanumeric)
summary(testlm)
## 
## Call:
## lm(formula = heating.load ~ overall.height, data = eedatanumeric)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.7259  -2.5929  -0.3085   2.0015  11.8241 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -4.59885    0.52661  -8.733   <2e-16 ***
## overall.height  5.12496    0.09516  53.857   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.615 on 766 degrees of freedom
## Multiple R-squared:  0.7911, Adjusted R-squared:  0.7908 
## F-statistic:  2901 on 1 and 766 DF,  p-value: < 2.2e-16
ggplot(eedata, aes(x = heating.load, fill = overall.height)) + geom_histogram(binwidth = 1) + labs(x = "heating load")

ggplot(eedata, aes(y = heating.load, x = overall.height)) + geom_boxplot() + labs(x = "overall height", y = "heating load")

testlm <- lm(heating.load ~ roof.area, data = eedatanumeric)
summary(testlm)
## 
## Call:
## lm(formula = heating.load ~ roof.area, data = eedatanumeric)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -19.5327  -2.6392  -0.3191   2.4997  15.0930 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 56.309643   0.746268   75.45   <2e-16 ***
## roof.area   -0.192535   0.004094  -47.03   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.121 on 766 degrees of freedom
## Multiple R-squared:  0.7427, Adjusted R-squared:  0.7424 
## F-statistic:  2212 on 1 and 766 DF,  p-value: < 2.2e-16
ggplot(eedata, aes(x = heating.load, fill = roof.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")

ggplot(eedata, aes(y = heating.load, x = roof.area)) + geom_boxplot() + labs(x = "roof area", y = "heating load")

However, we see that only one roof area is applied to buildings with an overall height of 3.50, and it appears to correlate with much lower heating and cooling loads.

ggplot(eedata, aes(x = cooling.load, fill = roof.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load") + facet_wrap(~overall.height)

ggplot(eedata, aes(x = heating.load, fill = roof.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load") + facet_wrap(~overall.height)
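
The overlap between roof area and overall height can also be checked with a cross-tabulation instead of reading it off the facets. The sketch below uses a six-row toy table with illustrative values, arranged so that every 3.50 row shares one roof area.

```r
# Toy stand-in: the 3.50 rows share one roof area, the 7.00 rows vary.
toy <- data.frame(
  overall.height = c("3.50", "3.50", "3.50", "7.00", "7.00", "7.00"),
  roof.area      = c("220.50", "220.50", "220.50", "110.25", "122.50", "147.00")
)

# Rows = height, columns = roof area, cells = number of buildings.
counts <- table(toy$overall.height, toy$roof.area)
print(counts)

# Number of distinct roof areas per height.
n_roof <- tapply(toy$roof.area, toy$overall.height,
                 function(v) length(unique(v)))
print(n_roof)
```

When one height level maps to a single roof area, the two predictors are confounded, so a strong fit on either one alone cannot be attributed to that variable by itself.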

So let’s split this data into two pairs of tables, one for the 3.50 tall buildings and the other for the 7.00 tall buildings; I’m assuming the unit is meters from here on out.

# Character tables: match the height as text; numeric tables: match it as a number
eedata3.50 <- subset(eedata, overall.height == "3.50")
eedata3.50numeric <- subset(eedatanumeric, overall.height == 3.5)
eedata7.00 <- subset(eedata, overall.height == "7.00")
eedata7.00numeric <- subset(eedatanumeric, overall.height == 7)
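
One thing to watch when subsetting is the type of the column being compared. When a character column is compared against a number, R converts the number to text and compares strings, which happens to work for "3.50" and "7.00" but not in general. The sketch below illustrates this with a hypothetical extra value, "10.00", which does not occur in this data set.

```r
# "10.00" is a hypothetical value added only to show the pitfall.
height_chr <- c("3.50", "7.00", "10.00")
height_num <- as.numeric(height_chr)

# Character vs number: 6 is coerced to "6" and the strings are compared,
# so "10.00" would typically sort before "6".
print(height_chr < 6)

# Numeric vs number: compares magnitudes as intended.
print(height_num < 6)
```

Filtering the numeric table on its own numeric column avoids the issue.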

Now we can look at the histograms for the 3.5m tall buildings.

ggplot(eedata3.50, aes(x = cooling.load)) + geom_histogram(binwidth = 1) + labs(x = "cooling load 3.5m")

ggplot(eedata3.50, aes(x = heating.load)) + geom_histogram(binwidth = 1) + labs(x = "heating load 3.5m")

Next, let’s re-run our cooling analysis on this subset.

testlm <- lm(cooling.load ~ relative.compactness, data = eedata3.50numeric)
summary(testlm)
## 
## Call:
## lm(formula = cooling.load ~ relative.compactness, data = eedata3.50numeric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.3043 -1.4466 -0.2292  1.6271  5.8349 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            31.271      1.886  16.582  < 2e-16 ***
## relative.compactness  -22.463      2.782  -8.075 8.93e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.24 on 382 degrees of freedom
## Multiple R-squared:  0.1458, Adjusted R-squared:  0.1436 
## F-statistic:  65.2 on 1 and 382 DF,  p-value: 8.926e-15
ggplot(eedata3.50, aes(x = cooling.load, fill = relative.compactness)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")

testlm <- lm(cooling.load ~ surface.area, data = eedata3.50numeric)
summary(testlm)
## 
## Call:
## lm(formula = cooling.load ~ surface.area, data = eedata3.50numeric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.3439 -1.4302 -0.1939  1.7036  5.8711 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.059885   2.054806   0.029    0.977    
## surface.area 0.021427   0.002746   7.804 5.79e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.251 on 382 degrees of freedom
## Multiple R-squared:  0.1375, Adjusted R-squared:  0.1353 
## F-statistic: 60.91 on 1 and 382 DF,  p-value: 5.789e-14
ggplot(eedata3.50, aes(x = cooling.load, fill = surface.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")

testlm <- lm(cooling.load ~ wall.area, data = eedata3.50numeric)
summary(testlm)
## 
## Call:
## lm(formula = cooling.load ~ wall.area, data = eedata3.50numeric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.3439 -1.4302 -0.1939  1.7036  5.8711 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 9.509323   0.848628  11.206  < 2e-16 ***
## wall.area   0.021427   0.002746   7.804 5.79e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.251 on 382 degrees of freedom
## Multiple R-squared:  0.1375, Adjusted R-squared:  0.1353 
## F-statistic: 60.91 on 1 and 382 DF,  p-value: 5.789e-14
ggplot(eedata3.50, aes(x = cooling.load, fill = wall.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")

testlm <- lm(cooling.load ~ glazing.area, data = eedata3.50numeric)
summary(testlm)
## 
## Call:
## lm(formula = cooling.load ~ glazing.area, data = eedata3.50numeric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2414 -1.1214 -0.6340 -0.0128  4.9535 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   13.5950     0.2037   66.75   <2e-16 ***
## glazing.area  10.5660     0.7557   13.98   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.971 on 382 degrees of freedom
## Multiple R-squared:  0.3385, Adjusted R-squared:  0.3368 
## F-statistic: 195.5 on 1 and 382 DF,  p-value: < 2.2e-16
ggplot(eedata3.50, aes(x = cooling.load, fill = glazing.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")

testlm <- lm(cooling.load ~ glazing.area.actual, data = eedata3.50numeric)
summary(testlm)
## 
## Call:
## lm(formula = cooling.load ~ glazing.area.actual, data = eedata3.50numeric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8375 -0.9960 -0.5046 -0.0055  4.7867 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         13.432060   0.186494   72.02   <2e-16 ***
## glazing.area.actual  0.036772   0.002238   16.43   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.856 on 382 degrees of freedom
## Multiple R-squared:  0.414,  Adjusted R-squared:  0.4125 
## F-statistic: 269.9 on 1 and 382 DF,  p-value: < 2.2e-16
ggplot(eedata3.50, aes(x = cooling.load, fill = glazing.area.actual)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")

Glazing area appears to have the biggest impact on the cooling response variable in this subset of data, but not by much.

ggplot(eedata3.50, aes(x = cooling.load)) + geom_histogram(binwidth = 1) + labs(x = "cooling load") + facet_wrap(~glazing.area)

When a facet wrap is applied, it can be seen that wall area may be a factor in the cooling load as well. It appears that higher glazing ratios and larger wall areas correlate with larger cooling loads; however, this is not yet clear.

# Test the two parameters together, including their interaction
# (note: this model is fit on the full data set, eedatanumeric)
testlm <- lm(cooling.load ~ glazing.area * wall.area, data = eedatanumeric)
summary(testlm)
## 
## Call:
## lm(formula = cooling.load ~ glazing.area * wall.area, data = eedatanumeric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.768  -6.904  -1.271   6.981  18.861 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            -7.05775    4.51852  -1.562    0.119    
## glazing.area            8.45215   16.76330   0.504    0.614    
## wall.area               0.08845    0.01406   6.293 5.24e-10 ***
## glazing.area:wall.area  0.01999    0.05215   0.383    0.702    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.388 on 764 degrees of freedom
## Multiple R-squared:  0.2256, Adjusted R-squared:  0.2226 
## F-statistic: 74.21 on 3 and 764 DF,  p-value: < 2.2e-16
ggplot(eedata3.50, aes(x = cooling.load, fill = wall.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load") + facet_wrap(~glazing.area)

# Reverse the wrapping: fill by glazing area, facet by wall area
ggplot(eedata3.50, aes(x = cooling.load, fill = glazing.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load") + facet_wrap(~wall.area)
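
For readers less familiar with R formulas: a * b in lm() expands to both main effects plus their interaction, a + b + a:b, which is why the summary above lists three coefficients besides the intercept. The following sketch shows the equivalence on simulated data; glaze, wall, and load are hypothetical column names.

```r
set.seed(7)
n <- 60

toy <- data.frame(glaze = runif(n, 0, 0.4),
                  wall  = runif(n, 250, 400))

# Simulated response that includes a genuine interaction term.
toy$load <- 10 + 0.03 * toy$wall + 0.05 * toy$glaze * toy$wall +
  rnorm(n, sd = 0.5)

fit_star <- lm(load ~ glaze * wall, data = toy)              # shorthand
fit_long <- lm(load ~ glaze + wall + glaze:wall, data = toy)  # spelled out

print(names(coef(fit_star)))
print(all.equal(coef(fit_star), coef(fit_long)))
```

The glaze:wall coefficient measures how much the effect of one variable changes as the other increases.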

Now we can run the heating analysis.

testlm <- lm(heating.load ~ relative.compactness, data = eedata3.50numeric)
summary(testlm)
## 
## Call:
## lm(formula = heating.load ~ relative.compactness, data = eedata3.50numeric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.7531 -1.5668  0.0219  1.5597  5.0494 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            34.231      1.981   17.28   <2e-16 ***
## relative.compactness  -30.876      2.922  -10.57   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.353 on 382 degrees of freedom
## Multiple R-squared:  0.2261, Adjusted R-squared:  0.2241 
## F-statistic: 111.6 on 1 and 382 DF,  p-value: < 2.2e-16
ggplot(eedata3.50, aes(x = heating.load, fill = relative.compactness)) + geom_histogram(binwidth = 1) + labs(x = "heating load")

testlm <- lm(heating.load ~ surface.area, data = eedata3.50numeric)
summary(testlm)
## 
## Call:
## lm(formula = heating.load ~ surface.area, data = eedata3.50numeric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.7554 -1.5060  0.0566  1.5781  5.0614 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -9.437223   2.144938   -4.40 1.41e-05 ***
## surface.area  0.030479   0.002866   10.63  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.35 on 382 degrees of freedom
## Multiple R-squared:  0.2284, Adjusted R-squared:  0.2264 
## F-statistic: 113.1 on 1 and 382 DF,  p-value: < 2.2e-16
ggplot(eedata3.50, aes(x = heating.load, fill = surface.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")

testlm <- lm(heating.load ~ wall.area, data = eedata3.50numeric)
summary(testlm)
## 
## Call:
## lm(formula = heating.load ~ wall.area, data = eedata3.50numeric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.7554 -1.5060  0.0566  1.5781  5.0614 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 4.004196   0.885852    4.52 8.25e-06 ***
## wall.area   0.030479   0.002866   10.63  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.35 on 382 degrees of freedom
## Multiple R-squared:  0.2284, Adjusted R-squared:  0.2264 
## F-statistic: 113.1 on 1 and 382 DF,  p-value: < 2.2e-16
ggplot(eedata3.50, aes(x = heating.load, fill = wall.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")

Again, glazing ratio appears to have the biggest impact, along with the wall area.

testlm <- lm(heating.load ~ glazing.area, data = eedata3.50numeric)
summary(testlm)
## 
## Call:
## lm(formula = heating.load ~ glazing.area, data = eedata3.50numeric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9184 -1.2102 -0.6159  1.0441  4.0366 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    9.9284     0.1901   52.23   <2e-16 ***
## glazing.area  14.5499     0.7052   20.63   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.84 on 382 degrees of freedom
## Multiple R-squared:  0.527,  Adjusted R-squared:  0.5258 
## F-statistic: 425.7 on 1 and 382 DF,  p-value: < 2.2e-16
ggplot(eedata3.50, aes(x = heating.load, fill = glazing.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")

Heating interaction model and facet wraps

testlm <- lm(heating.load ~ glazing.area * wall.area, data = eedatanumeric)
summary(testlm)
## 
## Call:
## lm(formula = heating.load ~ glazing.area * wall.area, data = eedatanumeric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -16.132  -7.872  -1.819   8.585  15.265 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            -12.68961    4.61763  -2.748  0.00614 ** 
## glazing.area             6.10124   17.13100   0.356  0.72183    
## wall.area                0.09484    0.01436   6.603 7.56e-11 ***
## glazing.area:wall.area   0.04501    0.05329   0.845  0.39855    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.572 on 764 degrees of freedom
## Multiple R-squared:  0.2811, Adjusted R-squared:  0.2783 
## F-statistic: 99.59 on 3 and 764 DF,  p-value: < 2.2e-16
ggplot(eedata3.50, aes(x = heating.load, fill = wall.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load") + facet_wrap(~glazing.area)

ggplot(eedata3.50, aes(x = heating.load, fill = glazing.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load") + facet_wrap(~wall.area)

Now let’s run the same analysis for the 7.0m buildings.

ggplot(eedata7.00, aes(x = cooling.load)) + geom_histogram(binwidth = 1)

ggplot(eedata7.00, aes(x = heating.load)) + geom_histogram(binwidth = 1)

Cooling analysis

testlm <- lm(cooling.load ~ relative.compactness, data = eedata7.00numeric)
summary(testlm)
## 
## Call:
## lm(formula = cooling.load ~ relative.compactness, data = eedata7.00numeric)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.9948  -3.0064   0.5992   3.4155  12.8798 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            61.362      2.929  20.950   <2e-16 ***
## relative.compactness  -33.180      3.427  -9.683   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.91 on 382 degrees of freedom
## Multiple R-squared:  0.1971, Adjusted R-squared:  0.195 
## F-statistic: 93.76 on 1 and 382 DF,  p-value: < 2.2e-16
ggplot(eedata7.00, aes(x = cooling.load, fill = relative.compactness)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")

testlm <- lm(cooling.load ~ surface.area, data = eedata7.00numeric)
summary(testlm)
## 
## Call:
## lm(formula = cooling.load ~ surface.area, data = eedata7.00numeric)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.8026  -2.9340   0.3618   3.2738  12.7796 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.767915   3.065312   0.577    0.564    
## surface.area 0.052563   0.005125  10.256   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.852 on 382 degrees of freedom
## Multiple R-squared:  0.2159, Adjusted R-squared:  0.2139 
## F-statistic: 105.2 on 1 and 382 DF,  p-value: < 2.2e-16
ggplot(eedata7.00, aes(x = cooling.load, fill = surface.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")

testlm <- lm(cooling.load ~ wall.area, data = eedata7.00numeric)
summary(testlm)
## 
## Call:
## lm(formula = cooling.load ~ wall.area, data = eedata7.00numeric)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.1890  -3.6865  -0.2242   2.8284  14.1709 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 12.717702   1.964366   6.474 2.93e-10 ***
## wall.area    0.061637   0.005892  10.461  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.831 on 382 degrees of freedom
## Multiple R-squared:  0.2227, Adjusted R-squared:  0.2206 
## F-statistic: 109.4 on 1 and 382 DF,  p-value: < 2.2e-16
ggplot(eedata7.00, aes(x = cooling.load, fill = wall.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")

testlm <- lm(cooling.load ~ roof.area, data = eedata7.00numeric)
summary(testlm)
## 
## Call:
## lm(formula = cooling.load ~ roof.area, data = eedata7.00numeric)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.4224  -3.9624   0.3876   3.6776  14.4476 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 28.66252    2.50184  11.457   <2e-16 ***
## roof.area    0.03347    0.01873   1.786   0.0748 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.457 on 382 degrees of freedom
## Multiple R-squared:  0.008285,   Adjusted R-squared:  0.005689 
## F-statistic: 3.191 on 1 and 382 DF,  p-value: 0.07482
ggplot(eedata7.00, aes(x = cooling.load, fill = roof.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")

Glazing area doesn’t appear to have the same level of impact on the 7.0m buildings, but the plots are still separated by glazing area to be consistent with the 3.50m findings.

testlm <- lm(cooling.load ~ glazing.area, data = eedata7.00numeric)
summary(testlm)
## 
## Call:
## lm(formula = cooling.load ~ glazing.area, data = eedata7.00numeric)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -8.3325 -3.8821 -0.5623  3.3757 12.7884 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   28.6346     0.5014   57.11   <2e-16 ***
## glazing.area  19.0699     1.8600   10.25   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.852 on 382 degrees of freedom
## Multiple R-squared:  0.2158, Adjusted R-squared:  0.2137 
## F-statistic: 105.1 on 1 and 382 DF,  p-value: < 2.2e-16
ggplot(eedata7.00, aes(x = cooling.load, fill = glazing.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")

ggplot(eedata7.00, aes(x = cooling.load)) + geom_histogram(binwidth = 1) + labs(x = "cooling load") + facet_wrap(~glazing.area)

#facet wrap
ggplot(eedata7.00, aes(x = cooling.load, fill = wall.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load") + facet_wrap(~glazing.area)

#reverse wrapping
ggplot(eedata7.00, aes(x = cooling.load, fill = glazing.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load") + facet_wrap(~wall.area)
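The cooling analysis above repeats the same lm() call once per predictor. As a rough sketch only, the repetition could be wrapped in a small helper that fits each single-predictor model and collects the slope, R-squared, and p-value into one table. The helper name single_predictor_fits and the demo data frame are hypothetical; the demo values are synthetic stand-ins for eedata7.00numeric, not the real data.

```r
# Hypothetical stand-in for eedata7.00numeric: synthetic values, NOT the real data.
set.seed(42)
n <- 60
demo <- data.frame(
  wall.area    = runif(n, 245, 420),
  glazing.area = sample(c(0, 0.10, 0.25, 0.40), n, replace = TRUE),
  roof.area    = runif(n, 110, 150)
)
# Synthetic response driven by wall and glazing area plus noise (a demo assumption).
demo$cooling.load <- 10 + 0.06 * demo$wall.area + 20 * demo$glazing.area +
  rnorm(n, sd = 2)

# Fit response ~ predictor once per predictor and collect the fit statistics.
single_predictor_fits <- function(data, response, predictors) {
  rows <- lapply(predictors, function(p) {
    fit <- lm(reformulate(p, response = response), data = data)
    s <- summary(fit)
    data.frame(
      predictor = p,
      slope     = unname(coef(fit)[2]),
      r.squared = s$r.squared,
      p.value   = unname(s$coefficients[2, "Pr(>|t|)"])
    )
  })
  out <- do.call(rbind, rows)
  # Sort so the predictor with the best fit comes first.
  out[order(-out$r.squared), ]
}

fits <- single_predictor_fits(
  demo, "cooling.load", c("wall.area", "glazing.area", "roof.area")
)
print(fits)
```

With the real data, the same call would presumably take eedata7.00numeric and the five predictor names used above, and the resulting table could be read alongside the individual summary() outputs.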

Perform a heating analysis on the 7.0m buildings

testlm <- lm(heating.load ~ relative.compactness, data = eedata7.00numeric)
summary(testlm)
## 
## Call:
## lm(formula = heating.load ~ relative.compactness, data = eedata7.00numeric)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -16.4257  -2.9207   0.9914   4.7448   9.6239 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            61.662      3.196  19.292   <2e-16 ***
## relative.compactness  -35.679      3.739  -9.542   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.358 on 382 degrees of freedom
## Multiple R-squared:  0.1925, Adjusted R-squared:  0.1904 
## F-statistic: 91.05 on 1 and 382 DF,  p-value: < 2.2e-16
ggplot(eedata7.00, aes(x = heating.load, fill = relative.compactness)) + geom_histogram(binwidth = 1) + labs(x = "heating load")

testlm <- lm(heating.load ~ surface.area, data = eedata7.00numeric)
summary(testlm)
## 
## Call:
## lm(formula = heating.load ~ surface.area, data = eedata7.00numeric)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -16.2286  -2.8192   0.9919   4.4033   9.4924 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.767643   3.336822  -0.829    0.407    
## surface.area  0.057104   0.005579  10.236   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.282 on 382 degrees of freedom
## Multiple R-squared:  0.2152, Adjusted R-squared:  0.2132 
## F-statistic: 104.8 on 1 and 382 DF,  p-value: < 2.2e-16
ggplot(eedata7.00, aes(x = heating.load, fill = surface.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")

testlm <- lm(heating.load ~ wall.area, data = eedata7.00numeric)
summary(testlm)
## 
## Call:
## lm(formula = heating.load ~ wall.area, data = eedata7.00numeric)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.4034  -3.9683  -0.2133   3.6467  10.9316 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.177783   2.081549   3.448 0.000627 ***
## wall.area   0.072859   0.006244  11.669  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.119 on 382 degrees of freedom
## Multiple R-squared:  0.2628, Adjusted R-squared:  0.2609 
## F-statistic: 136.2 on 1 and 382 DF,  p-value: < 2.2e-16
ggplot(eedata7.00, aes(x = heating.load, fill = wall.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")

testlm <- lm(heating.load ~ roof.area, data = eedata7.00numeric)
summary(testlm)
## 
## Call:
## lm(formula = heating.load ~ roof.area, data = eedata7.00numeric)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.4816  -4.4541   0.4076   4.8243  11.6384 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 29.55130    2.73217  10.816   <2e-16 ***
## roof.area    0.01300    0.02046   0.635    0.526    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.959 on 382 degrees of freedom
## Multiple R-squared:  0.001055,   Adjusted R-squared:  -0.00156 
## F-statistic: 0.4034 on 1 and 382 DF,  p-value: 0.5257
ggplot(eedata7.00, aes(x = heating.load, fill = roof.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")

Glazing area results

testlm <- lm(heating.load ~ glazing.area, data = eedata7.00numeric)
summary(testlm)
## 
## Call:
## lm(formula = heating.load ~ glazing.area, data = eedata7.00numeric)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.556 -3.507 -1.263  4.664  9.522 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   25.1058     0.4977   50.45   <2e-16 ***
## glazing.area  26.3258     1.8463   14.26   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.817 on 382 degrees of freedom
## Multiple R-squared:  0.3474, Adjusted R-squared:  0.3456 
## F-statistic: 203.3 on 1 and 382 DF,  p-value: < 2.2e-16
ggplot(eedata7.00, aes(x = heating.load, fill = glazing.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")

#facet wrap
ggplot(eedata7.00, aes(x = heating.load, fill = wall.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load") + facet_wrap(~glazing.area)

#reverse wrapping
ggplot(eedata7.00, aes(x = heating.load, fill = glazing.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load") + facet_wrap(~wall.area)
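To compare the single-predictor fits for the 7.0m buildings side by side, the Multiple R-squared values printed in the summary() outputs above can be gathered into one small table. This is only a convenience sketch: the numbers are copied by hand from those outputs, and the object names r2, r2_ranked, best_cooling, and best_heating are made up for illustration.

```r
# Multiple R-squared values copied from the summary() outputs above (7.0m buildings).
r2 <- data.frame(
  predictor = c("relative.compactness", "surface.area", "wall.area",
                "roof.area", "glazing.area"),
  cooling   = c(0.1971, 0.2159, 0.2227, 0.008285, 0.2158),
  heating   = c(0.1925, 0.2152, 0.2628, 0.001055, 0.3474)
)

# Rank predictors by how much heating-load variance each explains on its own.
r2_ranked <- r2[order(-r2$heating), ]
print(r2_ranked)

# Best single predictor for each response.
best_cooling <- r2$predictor[which.max(r2$cooling)]
best_heating <- r2$predictor[which.max(r2$heating)]
print(c(cooling = best_cooling, heating = best_heating))
```

Read this way, wall area gives the highest single-predictor R-squared for cooling load and glazing area gives the highest for heating load, while roof area explains almost none of the variance in either response.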

Question - 3 Graphics: Please make sure to display at least one scatter plot, box plot and histogram. Don’t be limited to this. Please explore the many other options in R packages such as ggplot2.

Summary

Let’s put together some summary graphs.

Cooling - scatter plot separating 3.50 vs. 7 meter buildings by wall area and glazing ratio.

ggplot(eedatanumeric, aes(y = cooling.load, x = wall.area)) + geom_point(aes(color = glazing.area)) + facet_wrap(~overall.height) + labs(x = "wall area", y = "cooling load")

ggplot(eedatanumeric, aes(y = cooling.load, x = glazing.area)) + geom_point(aes(color = wall.area)) + facet_wrap(~overall.height) + labs(x = "glazing area ratio", y = "cooling load")

Heating - scatter plot separating 3.50 vs. 7 meter buildings by wall area and glazing ratio.

ggplot(eedatanumeric, aes(y = heating.load, x = wall.area)) + geom_point(aes(color = glazing.area)) + facet_wrap(~overall.height) + labs(x = "wall area", y = "heating load")

ggplot(eedatanumeric, aes(y = heating.load, x = glazing.area)) + geom_point(aes(color = wall.area)) + facet_wrap(~overall.height) + labs(x = "glazing area ratio", y = "heating load")

As we can see from the summary graphs, the heating and cooling loads generally increase as the glazing area ratio and wall area increase. The same trend shows up when the points are colored by the actual glazing area (glazing area ratio * wall area) in the heating and cooling load plots. However, the quality of fit was low in many of the tests run in this analysis (for the 7.0m buildings, no single-predictor model reached an R-squared of 0.35), so there are likely more factors influencing the response variables.
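The actual glazing area mentioned above is the product of the glazing area ratio and the wall area. A minimal sketch of that calculation, assuming a toy data frame in place of eedatanumeric (the rows below are hypothetical and chosen only to illustrate the arithmetic):

```r
# Toy stand-in for eedatanumeric (hypothetical rows, for illustration only).
toy <- data.frame(
  wall.area    = c(245.0, 294.0, 318.5, 343.0),
  glazing.area = c(0.00, 0.10, 0.25, 0.40)  # glazing area as a ratio
)

# Actual glazing area = glazing area ratio * wall area, as described in the text.
toy$glazing.area.actual <- toy$glazing.area * toy$wall.area
print(toy)
```

For example, a wall area of 294.0 with a glazing ratio of 0.10 gives an actual glazing area of 29.4, in the same units as the wall area.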

ggplot(eedatanumeric, aes(y = cooling.load, x = glazing.area)) + geom_point(aes(color = glazing.area.actual)) + facet_wrap(~overall.height) + labs(x = "glazing area ratio", y = "cooling load")

ggplot(eedatanumeric, aes(y = heating.load, x = glazing.area)) + geom_point(aes(color = glazing.area.actual)) + facet_wrap(~overall.height) + labs(x = "glazing area ratio", y = "heating load")
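Since the summary points to more than one factor influencing the loads, one possible follow-up (not part of the analysis above) is to fit wall area and glazing area together and compare the R-squared with the single-predictor fits. The sketch below uses synthetic data as a hypothetical stand-in for eedata7.00numeric, so its numbers say nothing about the real data set; it only illustrates the comparison.

```r
# Synthetic demo data (an assumption for illustration, NOT the real data set).
set.seed(1)
n <- 80
demo <- data.frame(
  wall.area    = runif(n, 245, 420),
  glazing.area = sample(c(0, 0.10, 0.25, 0.40), n, replace = TRUE)
)
demo$heating.load <- 5 + 0.07 * demo$wall.area + 25 * demo$glazing.area +
  rnorm(n, sd = 2)

# Single-predictor fits, as in the analysis above, plus one combined fit.
fit_wall    <- lm(heating.load ~ wall.area, data = demo)
fit_glazing <- lm(heating.load ~ glazing.area, data = demo)
fit_both    <- lm(heating.load ~ wall.area + glazing.area, data = demo)

# Collect the R-squared of each model for comparison.
r2_compare <- c(
  wall.area    = summary(fit_wall)$r.squared,
  glazing.area = summary(fit_glazing)$r.squared,
  both         = summary(fit_both)$r.squared
)
print(round(r2_compare, 3))
```

Adding a predictor to a least-squares model never lowers R-squared, so the adjusted R-squared reported by summary() is usually the fairer number to compare across models of different size.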