This is the published version of the Week 3 assignment.
Question 4 - Meaningful question for analysis: Please state at the beginning a meaningful question for analysis. Use the first three steps and anything else that would be helpful to answer the question you are posing from the data set you chose. Please write a brief conclusion paragraph in R markdown at the end.
I have selected a data set that reports simulated building heating and cooling loads based on a varying set of inputs (https://archive.ics.uci.edu/ml/datasets/Energy+efficiency). For some background, I am a mechanical engineer who specializes in building energy simulation, which is why I found this data set particularly interesting. Since I haven’t seen the actual energy simulation, I’m unsure of many input parameters that could greatly affect the results. Therefore, I’ll treat this as a data set of numbers rather than as a summary of a modeling analysis.
It appears this data was constructed to determine how two response variables, cooling load and heating load, vary with changes in a number of tracked predictor variables. The main question I set out to answer is: which variables have the greatest impact on the cooling and heating loads of the modeled buildings?
First, I import the data into data frames. During my analysis, some plotting functions (e.g. geom_histogram with a fill aesthetic) did not work well when the grouping column was numeric rather than categorical. To get the best plots possible, I created two data frames: one where all the predictor columns are character type, and one with numeric types where applicable.
eedataurl <- "https://github.com/john-grando/Masters/raw/master/Workshop/R/Week3/assignment/ENB2012_data.csv"
# Shared column names for both copies of the data
eecolnames <- c("relative.compactness", "surface.area", "wall.area", "roof.area", "overall.height",
                "orientation", "glazing.area", "glazing.area.distribution", "heating.load", "cooling.load")
# Character copy: every predictor is read as text so it can drive fill/facet aesthetics
eedata <- read.csv(file = eedataurl,
                   header = TRUE, sep = ",",
                   colClasses = c(rep("character", 8), "numeric", "numeric"),
                   col.names = eecolnames,
                   stringsAsFactors = FALSE)
# Numeric copy: continuous predictors stay numeric for lm(); the two coded categories stay text
eedatanumeric <- read.csv(file = eedataurl,
                          header = TRUE, sep = ",",
                          colClasses = c("numeric", "numeric", "numeric", "numeric", "numeric",
                                         "character", "numeric", "character", "numeric", "numeric"),
                          col.names = eecolnames,
                          stringsAsFactors = FALSE)
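A possible alternative to reading the CSV twice is to read it once with numeric types and derive the character copy with lapply(). The sketch below uses a small hypothetical data frame (eedatanumeric_demo, with toy values) in place of eedatanumeric. One caveat: as.character() turns 3.5 into "3.5" and 7 into "7", which may not match the strings read directly from the file, so comparisons such as overall.height == "3.50" would need adjusting. Another option, not shown, is to keep only the numeric frame and write fill = factor(overall.height) inside aes().

```r
# Hypothetical stand-in for eedatanumeric (toy values, not from the data set)
eedatanumeric_demo <- data.frame(
  overall.height = c(3.5, 7, 7),
  glazing.area = c(0, 0.1, 0.25),
  cooling.load = c(11.5, 28.0, 30.5)
)

# Columns to treat as discrete for ggplot fill/facet aesthetics
discrete_cols <- c("overall.height", "glazing.area")

# Copy the numeric frame and convert only the chosen columns to character
eedata_demo <- eedatanumeric_demo
eedata_demo[discrete_cols] <- lapply(eedata_demo[discrete_cols], as.character)

str(eedata_demo)
```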
Question 2 - Data wrangling: Please perform some basic transformations. They will need to make sense but could include column renaming, creating a subset of the data, replacing values, or creating new columns with derived data (for example – if it makes sense you could sum two columns together)
From the data provided, it appears that glazing area is actually a glazing ratio (a fraction of the wall area rather than an area). We can therefore calculate an extra column that gives the actual glazing area.
# Character copy (for plotting as a discrete variable)
eedata$glazing.area.actual <- formatC(eedatanumeric$glazing.area * eedatanumeric$wall.area)
# Numeric copy (for regression)
eedatanumeric$glazing.area.actual <- eedatanumeric$glazing.area * eedatanumeric$wall.area
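The derived column rests on the assumption that glazing.area is a fraction of wall area. A small helper can make that assumption explicit and fail loudly if a value falls outside [0, 1]. The function name glazing_area_actual and the wall area of 294 are illustrative, not part of the original analysis.

```r
# Assumption (from the text above): glazing.area is a ratio of glazing to wall area
glazing_area_actual <- function(glazing_ratio, wall_area) {
  # Guard against values that cannot be ratios
  stopifnot(all(glazing_ratio >= 0 & glazing_ratio <= 1))
  glazing_ratio * wall_area
}

# Illustrative ratios and a single illustrative wall area
ratios <- c(0, 0.10, 0.25, 0.40)
glazing_area_actual(ratios, 294)
```

On the real data this would be eedatanumeric$glazing.area.actual <- glazing_area_actual(eedatanumeric$glazing.area, eedatanumeric$wall.area).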
Question 1 - Data Exploration: This should include summary statistics, means, medians, quartiles, or any other relevant information about the data set. Please include some conclusions in the R Markdown text.
To start, I look at the histograms of the two response variables in this study (cooling load and heating load).
require(ggplot2)
## Loading required package: ggplot2
ggplot(eedata, aes(x = cooling.load)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")
ggplot(eedata, aes(x = heating.load)) + geom_histogram(binwidth = 1) + labs(x = "heating load")
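Question 1 also asks for summary statistics, which can accompany the histograms. The sketch below uses a small hypothetical vector (load_demo) rather than the real loads; on the actual data the same calls would be summary(eedatanumeric$cooling.load) and quantile(eedatanumeric$heating.load). It also shows why the histogram matters: for a two-cluster vector, the mean and median can sit in the gap where no observations lie.

```r
# Hypothetical two-cluster load values (not from the data set)
load_demo <- c(10, 12, 14, 30, 32, 34)

summary(load_demo)                          # min, quartiles, median, mean, max
quartiles <- quantile(load_demo, c(0.25, 0.5, 0.75))
quartiles
mean(load_demo)                             # falls between the two clusters
```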
Upon inspection, it appears both of these response variables are bimodal. Therefore, I will check whether there are any inputs influencing this shape.
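Bimodality can also be checked numerically, if roughly. The sketch below counts local maxima of a kernel density estimate, ignoring bumps below a fraction of the tallest peak. This is a heuristic that depends on the bandwidth, not a formal test; the function name count_density_peaks, the min_height cutoff, and the synthetic two-cluster sample are all assumptions for illustration.

```r
# Count local maxima of a kernel density estimate: a rough bimodality heuristic
count_density_peaks <- function(x, min_height = 0.05, ...) {
  d <- stats::density(x, ...)
  y <- d$y
  n <- length(y)
  mid <- y[-c(1, n)]                        # interior grid points
  is_peak <- mid > y[-c(n - 1, n)] &        # higher than left neighbour
             mid > y[-c(1, 2)]              # higher than right neighbour
  # Ignore tiny bumps (e.g. numerical noise in the tails)
  sum(is_peak & mid >= min_height * max(y))
}

set.seed(42)
# Synthetic two-cluster sample, standing in for a load column
two_clusters <- c(rnorm(200, mean = 15, sd = 2), rnorm(200, mean = 32, sd = 3))
count_density_peaks(two_clusters)
```

On the real data the call would be count_density_peaks(eedatanumeric$cooling.load).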
First, I’ll check the categorical fields, orientation and glazing area distribution, since a simple linear regression on their numeric codes wouldn’t provide meaningful results.
ggplot(eedata, aes(cooling.load, fill = orientation)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")
ggplot(eedata, aes(cooling.load, fill = glazing.area.distribution)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")
ggplot(eedata, aes(heating.load, fill = orientation)) + geom_histogram(binwidth = 1) + labs(x = "heating load")
ggplot(eedata, aes(heating.load, fill = glazing.area.distribution)) + geom_histogram(binwidth = 1) + labs(x = "heating load")
It does not appear that these attributes are influencing the bimodal nature of the cooling or heating response variables.
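The histograms above are a visual check. For a numeric check, a categorical input can go into lm() as a factor: R then fits one coefficient per level (dummy coding), and anova() gives an F-test of whether the level means differ at all. The sketch uses synthetic data with a deliberately strong effect, so it says nothing about the real orientation results.

```r
set.seed(7)
# Synthetic data: a four-level categorical input and a response that depends
# strongly on it (stand-in for orientation or glazing.area.distribution)
demo <- data.frame(
  orientation = factor(rep(c("2", "3", "4", "5"), each = 30)),
  load = rnorm(120, mean = rep(c(10, 20, 30, 40), each = 30), sd = 1)
)

# With a factor predictor, lm() fits an intercept plus one coefficient per extra level
fit <- lm(load ~ orientation, data = demo)

# One-way ANOVA F-test: do the level means differ?
tab <- anova(fit)
tab
p_value <- tab[["Pr(>F)"]][1]
```

On the real data this would be anova(lm(cooling.load ~ factor(orientation), data = eedatanumeric)).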
Next, I check the numeric fields against the cooling response variable. For the following steps, I use the lm() function to check for correlation. I realize that many of the results show weak R-squared values, but I’m using these fits more as a check than as proof of strong relationships.
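Since the same single-predictor fit is repeated for every input, the R-squared values can be collected into one ranked table. The helper r_squared_table below is a hypothetical sketch, not part of the original analysis; it builds each formula with reformulate() and reads summary(fit)$r.squared. The toy data is synthetic.

```r
# Fit response ~ predictor for each predictor and rank by R-squared
r_squared_table <- function(data, response, predictors) {
  r2 <- vapply(predictors, function(p) {
    fit <- lm(reformulate(p, response = response), data = data)
    summary(fit)$r.squared
  }, numeric(1))
  out <- data.frame(predictor = predictors, r.squared = unname(r2),
                    stringsAsFactors = FALSE)
  out[order(out$r.squared, decreasing = TRUE), , drop = FALSE]
}

# Synthetic check: y depends on x1 only
set.seed(1)
toy <- data.frame(x1 = runif(100), x2 = runif(100))
toy$y <- 2 * toy$x1 + rnorm(100, sd = 0.1)
tab <- r_squared_table(toy, "y", c("x1", "x2"))
tab
```

On the real data a call such as r_squared_table(eedatanumeric, "cooling.load", c("relative.compactness", "surface.area", "wall.area", "roof.area", "overall.height", "glazing.area", "glazing.area.actual")) would summarize the fits below in one table.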
testlm <- lm(cooling.load ~ relative.compactness, data = eedatanumeric)
summary(testlm)
##
## Call:
## lm(formula = cooling.load ~ relative.compactness, data = eedatanumeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.571 -5.632 -1.233 3.379 21.968
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -19.008 1.938 -9.809 <2e-16 ***
## relative.compactness 57.051 2.512 22.710 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.359 on 766 degrees of freedom
## Multiple R-squared: 0.4024, Adjusted R-squared: 0.4016
## F-statistic: 515.8 on 1 and 766 DF, p-value: < 2.2e-16
ggplot(eedata, aes(x = cooling.load, fill = relative.compactness)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")
testlm <- lm(cooling.load ~ surface.area, data = eedatanumeric)
summary(testlm)
##
## Call:
## lm(formula = cooling.load ~ surface.area, data = eedatanumeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.6843 -5.3800 -0.5586 3.7048 20.9195
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 73.410157 1.955287 37.54 <2e-16 ***
## surface.area -0.072684 0.002886 -25.18 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.041 on 766 degrees of freedom
## Multiple R-squared: 0.4529, Adjusted R-squared: 0.4522
## F-statistic: 634.2 on 1 and 766 DF, p-value: < 2.2e-16
ggplot(eedata, aes(x = cooling.load, fill = surface.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")
testlm <- lm(cooling.load ~ wall.area, data = eedatanumeric)
summary(testlm)
##
## Call:
## lm(formula = cooling.load ~ wall.area, data = eedatanumeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17.111 -6.664 -0.909 7.331 21.160
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.076775 2.290183 -2.217 0.0269 *
## wall.area 0.093138 0.007124 13.074 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.608 on 766 degrees of freedom
## Multiple R-squared: 0.1824, Adjusted R-squared: 0.1814
## F-statistic: 170.9 on 1 and 766 DF, p-value: < 2.2e-16
ggplot(eedata, aes(x = cooling.load, fill = wall.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")
testlm <- lm(cooling.load ~ glazing.area, data = eedatanumeric)
summary(testlm)
##
## Call:
## lm(formula = cooling.load ~ glazing.area, data = eedatanumeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.462 -9.049 -1.536 8.127 21.151
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.1148 0.6803 31.036 < 2e-16 ***
## glazing.area 14.8180 2.5240 5.871 6.46e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.312 on 766 degrees of freedom
## Multiple R-squared: 0.04306, Adjusted R-squared: 0.04181
## F-statistic: 34.47 on 1 and 766 DF, p-value: 6.457e-09
ggplot(eedata, aes(x = cooling.load, fill = glazing.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")
testlm <- lm(cooling.load ~ glazing.area.actual, data = eedatanumeric)
summary(testlm)
##
## Call:
## lm(formula = cooling.load ~ glazing.area.actual, data = eedatanumeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.290 -8.522 -1.286 7.942 21.364
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 19.736783 0.645142 30.593 <2e-16 ***
## glazing.area.actual 0.064984 0.007445 8.728 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.079 on 766 degrees of freedom
## Multiple R-squared: 0.09046, Adjusted R-squared: 0.08927
## F-statistic: 76.18 on 1 and 766 DF, p-value: < 2.2e-16
ggplot(eedata, aes(x = cooling.load, fill = glazing.area.actual)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")
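While some of these show a correlation between surface properties and cooling load (R-squared between about 0.04 and 0.45), overall height and roof area are clearly the dominant factors, with R-squared values of about 0.80 and 0.74 in the fits that follow.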
testlm <- lm(cooling.load ~ overall.height, data = eedatanumeric)
summary(testlm)
##
## Call:
## lm(formula = cooling.load ~ overall.height, data = eedatanumeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.9441 -2.3539 -0.2664 2.0386 14.9259
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.96122 0.48283 -1.991 0.0469 *
## overall.height 4.86647 0.08725 55.777 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.231 on 766 degrees of freedom
## Multiple R-squared: 0.8024, Adjusted R-squared: 0.8022
## F-statistic: 3111 on 1 and 766 DF, p-value: < 2.2e-16
ggplot(eedata, aes(x = cooling.load, fill = overall.height)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")
ggplot(eedata, aes(y = cooling.load, x = overall.height)) + geom_boxplot() + labs(x = "overall height", y = "cooling load")
testlm <- lm(cooling.load ~ roof.area, data = eedatanumeric)
summary(testlm)
##
## Call:
## lm(formula = cooling.load ~ roof.area, data = eedatanumeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.3129 -2.4653 -0.6178 2.4377 18.0638
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 56.672891 0.701905 80.74 <2e-16 ***
## roof.area -0.181678 0.003851 -47.18 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.817 on 766 degrees of freedom
## Multiple R-squared: 0.744, Adjusted R-squared: 0.7437
## F-statistic: 2226 on 1 and 766 DF, p-value: < 2.2e-16
ggplot(eedata, aes(x = cooling.load, fill = roof.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")
ggplot(eedata, aes(y = cooling.load, x = roof.area)) + geom_boxplot() + labs(x = "roof area", y = "cooling load")
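If overall height and roof area are themselves strongly correlated (for example, if each height comes with its own set of roof areas), their single-variable R-squared values largely overlap rather than add up. A quick way to flag such pairs is to scan the predictor correlation matrix. The helper collinear_pairs, the 0.8 threshold, and the synthetic columns are assumptions for illustration.

```r
# List predictor pairs whose absolute Pearson correlation exceeds a threshold
collinear_pairs <- function(data, threshold = 0.8) {
  cm <- cor(data)
  idx <- which(abs(cm) > threshold & upper.tri(cm), arr.ind = TRUE)
  data.frame(var1 = rownames(cm)[idx[, "row"]],
             var2 = colnames(cm)[idx[, "col"]],
             r = cm[idx],
             stringsAsFactors = FALSE)
}

# Synthetic predictors: roof is nearly a mirror of height, other is unrelated
set.seed(2)
height <- rnorm(50)
toy <- data.frame(height = height,
                  roof = -height + rnorm(50, sd = 0.05),
                  other = rnorm(50))
pairs_found <- collinear_pairs(toy)
pairs_found
```

On the real data, collinear_pairs(eedatanumeric[, c("relative.compactness", "surface.area", "wall.area", "roof.area", "overall.height")]) would show which of these inputs move together.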
Next, I test the heating load.
testlm <- lm(heating.load ~ relative.compactness, data = eedatanumeric)
summary(testlm)
##
## Call:
## lm(formula = heating.load ~ relative.compactness, data = eedatanumeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.569 -6.332 -1.028 3.393 19.259
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -23.053 2.081 -11.08 <2e-16 ***
## relative.compactness 59.359 2.698 22.00 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.904 on 766 degrees of freedom
## Multiple R-squared: 0.3872, Adjusted R-squared: 0.3864
## F-statistic: 484 on 1 and 766 DF, p-value: < 2.2e-16
ggplot(eedata, aes(x = heating.load, fill = relative.compactness)) + geom_histogram(binwidth = 1) + labs(x = "heating load")
testlm <- lm(heating.load ~ surface.area, data = eedatanumeric)
summary(testlm)
##
## Call:
## lm(formula = heating.load ~ surface.area, data = eedatanumeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.609 -5.524 -1.300 3.529 18.176
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 72.945382 2.111062 34.55 <2e-16 ***
## surface.area -0.075387 0.003116 -24.19 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.602 on 766 degrees of freedom
## Multiple R-squared: 0.4331, Adjusted R-squared: 0.4324
## F-statistic: 585.3 on 1 and 766 DF, p-value: < 2.2e-16
ggplot(eedata, aes(x = heating.load, fill = surface.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")
testlm <- lm(heating.load ~ wall.area, data = eedatanumeric)
summary(testlm)
##
## Call:
## lm(formula = heating.load ~ wall.area, data = eedatanumeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.0213 -7.3937 -0.4882 7.5728 18.2107
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -11.259633 2.391321 -4.709 2.96e-06 ***
## wall.area 0.105390 0.007439 14.168 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.988 on 766 degrees of freedom
## Multiple R-squared: 0.2076, Adjusted R-squared: 0.2066
## F-statistic: 200.7 on 1 and 766 DF, p-value: < 2.2e-16
ggplot(eedata, aes(x = heating.load, fill = wall.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")
testlm <- lm(heating.load ~ glazing.area, data = eedatanumeric)
summary(testlm)
##
## Call:
## lm(formula = heating.load ~ glazing.area, data = eedatanumeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.272 -9.193 -3.054 7.253 17.699
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.5171 0.7103 24.662 < 2e-16 ***
## glazing.area 20.4379 2.6351 7.756 2.8e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.722 on 766 degrees of freedom
## Multiple R-squared: 0.07281, Adjusted R-squared: 0.0716
## F-statistic: 60.16 on 1 and 766 DF, p-value: 2.796e-14
ggplot(eedata, aes(x = heating.load, fill = glazing.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")
testlm <- lm(heating.load ~ glazing.area.actual, data = eedatanumeric)
summary(testlm)
##
## Call:
## lm(formula = heating.load ~ glazing.area.actual, data = eedatanumeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.340 -9.141 -1.956 7.972 18.367
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.990050 0.666772 23.98 <2e-16 ***
## glazing.area.actual 0.084625 0.007695 11.00 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.383 on 766 degrees of freedom
## Multiple R-squared: 0.1364, Adjusted R-squared: 0.1352
## F-statistic: 120.9 on 1 and 766 DF, p-value: < 2.2e-16
ggplot(eedata, aes(x = heating.load, fill = glazing.area.actual)) + geom_histogram(binwidth = 1) + labs(x = "heating load")
Similarly, the heating response variable is strongly affected by overall height and roof area.
testlm <- lm(heating.load ~ overall.height, data = eedatanumeric)
summary(testlm)
##
## Call:
## lm(formula = heating.load ~ overall.height, data = eedatanumeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.7259 -2.5929 -0.3085 2.0015 11.8241
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.59885 0.52661 -8.733 <2e-16 ***
## overall.height 5.12496 0.09516 53.857 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.615 on 766 degrees of freedom
## Multiple R-squared: 0.7911, Adjusted R-squared: 0.7908
## F-statistic: 2901 on 1 and 766 DF, p-value: < 2.2e-16
ggplot(eedata, aes(x = heating.load, fill = overall.height)) + geom_histogram(binwidth = 1) + labs(x = "heating load")
ggplot(eedata, aes(y = heating.load, x = overall.height)) + geom_boxplot() + labs(x = "overall height", y = "heating load")
testlm <- lm(heating.load ~ roof.area, data = eedatanumeric)
summary(testlm)
##
## Call:
## lm(formula = heating.load ~ roof.area, data = eedatanumeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -19.5327 -2.6392 -0.3191 2.4997 15.0930
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 56.309643 0.746268 75.45 <2e-16 ***
## roof.area -0.192535 0.004094 -47.03 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.121 on 766 degrees of freedom
## Multiple R-squared: 0.7427, Adjusted R-squared: 0.7424
## F-statistic: 2212 on 1 and 766 DF, p-value: < 2.2e-16
ggplot(eedata, aes(x = heating.load, fill = roof.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")
ggplot(eedata, aes(y = heating.load, x = roof.area)) + geom_boxplot() + labs(x = "roof area", y = "heating load")
However, only one roof area is used for buildings with a 3.50 overall height, and it appears to correspond to much lower heating and cooling loads.
ggplot(eedata, aes(x = cooling.load, fill = roof.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load") + facet_wrap(~overall.height)
ggplot(eedata, aes(x = heating.load, fill = roof.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load") + facet_wrap(~overall.height)
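The faceted histograms show this visually; a cross-tabulation confirms it directly by counting how many distinct roof areas appear at each height. The toy data frame below (demo, with hypothetical values) stands in for eedata; on the real data the call would be table(eedata$overall.height, eedata$roof.area).

```r
# Toy stand-in for eedata (hypothetical values)
demo <- data.frame(
  overall.height = c("3.50", "3.50", "7.00", "7.00", "7.00"),
  roof.area = c("220.50", "220.50", "110.25", "122.50", "147.00")
)

# Counts of each height / roof area combination
xt <- table(demo$overall.height, demo$roof.area)
xt

# Number of distinct roof areas used at each height
distinct_roofs <- rowSums(xt > 0)
distinct_roofs
```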
So let’s split the data into two new data frames, one for the 3.50-tall buildings and one for the 7.00-tall buildings. From here on out, I’m assuming the unit is meters.
# Character subsets (for plotting); overall.height was read as text
eedata3.50 <- subset(eedata, overall.height == "3.50")
eedata7.00 <- subset(eedata, overall.height == "7.00")
# Numeric subsets (for lm); filter on the numeric column of the same data frame
eedata3.50numeric <- subset(eedatanumeric, overall.height == 3.5)
eedata7.00numeric <- subset(eedatanumeric, overall.height == 7)
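As an alternative to one subset() call per height, split() returns every height group in a single named list, which also scales if more heights appear. The sketch uses a toy data frame (demo, hypothetical values) and computes the median cooling load per group; the names come from the numeric heights, so they read "3.5" and "7".

```r
# Toy stand-in for eedatanumeric (hypothetical values)
demo <- data.frame(overall.height = c(3.5, 3.5, 7, 7),
                   cooling.load = c(1, 3, 10, 14))

# One data frame per distinct height, in a named list
by_height <- split(demo, demo$overall.height)
names(by_height)

# Median cooling load within each height group
median_load <- vapply(by_height, function(d) median(d$cooling.load), numeric(1))
median_load
```

On the real data, split(eedatanumeric, eedatanumeric$overall.height) would give the same two subsets as above.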
Now we can look at the histograms for the 3.5m buildings.
ggplot(eedata3.50, aes(x = cooling.load)) + geom_histogram(binwidth = 1) + labs(x = "cooling load 3.5m")
ggplot(eedata3.50, aes(x = heating.load)) + geom_histogram(binwidth = 1) + labs(x = "heating load 3.5m")
Next, let’s re-run the cooling analysis on the 3.5m subset.
testlm <- lm(cooling.load ~ relative.compactness, data = eedata3.50numeric)
summary(testlm)
##
## Call:
## lm(formula = cooling.load ~ relative.compactness, data = eedata3.50numeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.3043 -1.4466 -0.2292 1.6271 5.8349
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 31.271 1.886 16.582 < 2e-16 ***
## relative.compactness -22.463 2.782 -8.075 8.93e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.24 on 382 degrees of freedom
## Multiple R-squared: 0.1458, Adjusted R-squared: 0.1436
## F-statistic: 65.2 on 1 and 382 DF, p-value: 8.926e-15
ggplot(eedata3.50, aes(x = cooling.load, fill = relative.compactness)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")
testlm <- lm(cooling.load ~ surface.area, data = eedata3.50numeric)
summary(testlm)
##
## Call:
## lm(formula = cooling.load ~ surface.area, data = eedata3.50numeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.3439 -1.4302 -0.1939 1.7036 5.8711
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.059885 2.054806 0.029 0.977
## surface.area 0.021427 0.002746 7.804 5.79e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.251 on 382 degrees of freedom
## Multiple R-squared: 0.1375, Adjusted R-squared: 0.1353
## F-statistic: 60.91 on 1 and 382 DF, p-value: 5.789e-14
ggplot(eedata3.50, aes(x = cooling.load, fill = surface.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")
testlm <- lm(cooling.load ~ wall.area, data = eedata3.50numeric)
summary(testlm)
##
## Call:
## lm(formula = cooling.load ~ wall.area, data = eedata3.50numeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.3439 -1.4302 -0.1939 1.7036 5.8711
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.509323 0.848628 11.206 < 2e-16 ***
## wall.area 0.021427 0.002746 7.804 5.79e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.251 on 382 degrees of freedom
## Multiple R-squared: 0.1375, Adjusted R-squared: 0.1353
## F-statistic: 60.91 on 1 and 382 DF, p-value: 5.789e-14
ggplot(eedata3.50, aes(x = cooling.load, fill = wall.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")
testlm <- lm(cooling.load ~ glazing.area, data = eedata3.50numeric)
summary(testlm)
##
## Call:
## lm(formula = cooling.load ~ glazing.area, data = eedata3.50numeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.2414 -1.1214 -0.6340 -0.0128 4.9535
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.5950 0.2037 66.75 <2e-16 ***
## glazing.area 10.5660 0.7557 13.98 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.971 on 382 degrees of freedom
## Multiple R-squared: 0.3385, Adjusted R-squared: 0.3368
## F-statistic: 195.5 on 1 and 382 DF, p-value: < 2.2e-16
ggplot(eedata3.50, aes(x = cooling.load, fill = glazing.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")
testlm <- lm(cooling.load ~ glazing.area.actual, data = eedata3.50numeric)
summary(testlm)
##
## Call:
## lm(formula = cooling.load ~ glazing.area.actual, data = eedata3.50numeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8375 -0.9960 -0.5046 -0.0055 4.7867
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.432060 0.186494 72.02 <2e-16 ***
## glazing.area.actual 0.036772 0.002238 16.43 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.856 on 382 degrees of freedom
## Multiple R-squared: 0.414, Adjusted R-squared: 0.4125
## F-statistic: 269.9 on 1 and 382 DF, p-value: < 2.2e-16
ggplot(eedata3.50, aes(x = cooling.load, fill = glazing.area.actual)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")
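Glazing appears to have the biggest impact on the cooling response variable in this subset (R-squared of about 0.34 for the glazing ratio and 0.41 for the actual glazing area), though neither single-variable fit explains even half of the variance.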
ggplot(eedata3.50, aes(x = cooling.load)) + geom_histogram(binwidth = 1) + labs(x = "cooling load") + facet_wrap(~glazing.area)
Faceting by glazing ratio suggests that wall area may also play a role in the cooling load. Higher glazing ratios and larger wall areas appear to correspond to larger cooling loads; however, the relationship is not yet clear.
# Test the two parameters together, including their interaction (note: this fit uses the full data set, eedatanumeric, not the 3.5m subset)
testlm <- lm(cooling.load ~ glazing.area * wall.area, data = eedatanumeric)
summary(testlm)
##
## Call:
## lm(formula = cooling.load ~ glazing.area * wall.area, data = eedatanumeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.768 -6.904 -1.271 6.981 18.861
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.05775 4.51852 -1.562 0.119
## glazing.area 8.45215 16.76330 0.504 0.614
## wall.area 0.08845 0.01406 6.293 5.24e-10 ***
## glazing.area:wall.area 0.01999 0.05215 0.383 0.702
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.388 on 764 degrees of freedom
## Multiple R-squared: 0.2256, Adjusted R-squared: 0.2226
## F-statistic: 74.21 on 3 and 764 DF, p-value: < 2.2e-16
ggplot(eedata3.50, aes(x = cooling.load, fill = wall.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load") + facet_wrap(~glazing.area)
# Reverse the faceting: facet by wall area, fill by glazing ratio
ggplot(eedata3.50, aes(x = cooling.load, fill = glazing.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load") + facet_wrap(~wall.area)
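In the cooling fit above, glazing.area * wall.area expands to glazing.area + wall.area + glazing.area:wall.area, and the interaction term has a p-value of 0.702, so that fit gives no evidence of an interaction. An explicit comparison of nested models is one way to ask the same question. The sketch below uses synthetic data with main effects only; the column names glazing and wall and their ranges are illustrative.

```r
set.seed(3)
demo <- data.frame(glazing = runif(60, 0, 0.4), wall = runif(60, 200, 400))
# Synthetic response with main effects only (no true interaction)
demo$load <- 5 + 20 * demo$glazing + 0.05 * demo$wall + rnorm(60, sd = 1)

# `glazing * wall` expands to glazing + wall + glazing:wall
design_cols <- colnames(model.matrix(load ~ glazing * wall, data = demo))
design_cols

additive         <- lm(load ~ glazing + wall, data = demo)
with_interaction <- lm(load ~ glazing * wall, data = demo)

# F-test of whether adding the interaction term improves the fit
comparison <- anova(additive, with_interaction)
comparison
```

On the real subset this would be anova(lm(cooling.load ~ glazing.area + wall.area, data = eedata3.50numeric), lm(cooling.load ~ glazing.area * wall.area, data = eedata3.50numeric)).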
Now we can run the heating analysis.
testlm <- lm(heating.load ~ relative.compactness, data = eedata3.50numeric)
summary(testlm)
##
## Call:
## lm(formula = heating.load ~ relative.compactness, data = eedata3.50numeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.7531 -1.5668 0.0219 1.5597 5.0494
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.231 1.981 17.28 <2e-16 ***
## relative.compactness -30.876 2.922 -10.57 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.353 on 382 degrees of freedom
## Multiple R-squared: 0.2261, Adjusted R-squared: 0.2241
## F-statistic: 111.6 on 1 and 382 DF, p-value: < 2.2e-16
ggplot(eedata3.50, aes(x = heating.load, fill = relative.compactness)) + geom_histogram(binwidth = 1) + labs(x = "heating load")
testlm <- lm(heating.load ~ surface.area, data = eedata3.50numeric)
summary(testlm)
##
## Call:
## lm(formula = heating.load ~ surface.area, data = eedata3.50numeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.7554 -1.5060 0.0566 1.5781 5.0614
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.437223 2.144938 -4.40 1.41e-05 ***
## surface.area 0.030479 0.002866 10.63 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.35 on 382 degrees of freedom
## Multiple R-squared: 0.2284, Adjusted R-squared: 0.2264
## F-statistic: 113.1 on 1 and 382 DF, p-value: < 2.2e-16
ggplot(eedata3.50, aes(x = heating.load, fill = surface.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")
testlm <- lm(heating.load ~ wall.area, data = eedata3.50numeric)
summary(testlm)
##
## Call:
## lm(formula = heating.load ~ wall.area, data = eedata3.50numeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.7554 -1.5060 0.0566 1.5781 5.0614
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.004196 0.885852 4.52 8.25e-06 ***
## wall.area 0.030479 0.002866 10.63 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.35 on 382 degrees of freedom
## Multiple R-squared: 0.2284, Adjusted R-squared: 0.2264
## F-statistic: 113.1 on 1 and 382 DF, p-value: < 2.2e-16
ggplot(eedata3.50, aes(x = heating.load, fill = wall.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")
Again, glazing ratio appears to have the biggest impact, along with the wall area.
testlm <- lm(heating.load ~ glazing.area, data = eedata3.50numeric)
summary(testlm)
##
## Call:
## lm(formula = heating.load ~ glazing.area, data = eedata3.50numeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9184 -1.2102 -0.6159 1.0441 4.0366
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.9284 0.1901 52.23 <2e-16 ***
## glazing.area 14.5499 0.7052 20.63 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.84 on 382 degrees of freedom
## Multiple R-squared: 0.527, Adjusted R-squared: 0.5258
## F-statistic: 425.7 on 1 and 382 DF, p-value: < 2.2e-16
ggplot(eedata3.50, aes(x = heating.load, fill = glazing.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")
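Heating facet wraps (as with cooling, the interaction model below is fit on the full data set rather than the 3.5m subset)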
testlm <- lm(heating.load ~ glazing.area * wall.area, data = eedatanumeric)
summary(testlm)
##
## Call:
## lm(formula = heating.load ~ glazing.area * wall.area, data = eedatanumeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.132 -7.872 -1.819 8.585 15.265
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -12.68961 4.61763 -2.748 0.00614 **
## glazing.area 6.10124 17.13100 0.356 0.72183
## wall.area 0.09484 0.01436 6.603 7.56e-11 ***
## glazing.area:wall.area 0.04501 0.05329 0.845 0.39855
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.572 on 764 degrees of freedom
## Multiple R-squared: 0.2811, Adjusted R-squared: 0.2783
## F-statistic: 99.59 on 3 and 764 DF, p-value: < 2.2e-16
ggplot(eedata3.50, aes(x = heating.load, fill = wall.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load") + facet_wrap(~glazing.area)
ggplot(eedata3.50, aes(x = heating.load, fill = glazing.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load") + facet_wrap(~wall.area)
Now let’s analyze the 7.0m buildings.
ggplot(eedata7.00, aes(x = cooling.load)) + geom_histogram(binwidth = 1) + labs(x = "cooling load 7.0m")
ggplot(eedata7.00, aes(x = heating.load)) + geom_histogram(binwidth = 1) + labs(x = "heating load 7.0m")
Cooling analysis
testlm <- lm(cooling.load ~ relative.compactness, data = eedata7.00numeric)
summary(testlm)
##
## Call:
## lm(formula = cooling.load ~ relative.compactness, data = eedata7.00numeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.9948 -3.0064 0.5992 3.4155 12.8798
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 61.362 2.929 20.950 <2e-16 ***
## relative.compactness -33.180 3.427 -9.683 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.91 on 382 degrees of freedom
## Multiple R-squared: 0.1971, Adjusted R-squared: 0.195
## F-statistic: 93.76 on 1 and 382 DF, p-value: < 2.2e-16
ggplot(eedata7.00, aes(x = cooling.load, fill = relative.compactness)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")
testlm <- lm(cooling.load ~ surface.area, data = eedata7.00numeric)
summary(testlm)
##
## Call:
## lm(formula = cooling.load ~ surface.area, data = eedata7.00numeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.8026 -2.9340 0.3618 3.2738 12.7796
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.767915 3.065312 0.577 0.564
## surface.area 0.052563 0.005125 10.256 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.852 on 382 degrees of freedom
## Multiple R-squared: 0.2159, Adjusted R-squared: 0.2139
## F-statistic: 105.2 on 1 and 382 DF, p-value: < 2.2e-16
ggplot(eedata7.00, aes(x = cooling.load, fill = surface.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")
testlm <- lm(cooling.load ~ wall.area, data = eedata7.00numeric)
summary(testlm)
##
## Call:
## lm(formula = cooling.load ~ wall.area, data = eedata7.00numeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.1890 -3.6865 -0.2242 2.8284 14.1709
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.717702 1.964366 6.474 2.93e-10 ***
## wall.area 0.061637 0.005892 10.461 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.831 on 382 degrees of freedom
## Multiple R-squared: 0.2227, Adjusted R-squared: 0.2206
## F-statistic: 109.4 on 1 and 382 DF, p-value: < 2.2e-16
ggplot(eedata7.00, aes(x = cooling.load, fill = wall.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")
testlm <- lm(cooling.load ~ roof.area, data = eedata7.00numeric)
summary(testlm)
##
## Call:
## lm(formula = cooling.load ~ roof.area, data = eedata7.00numeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.4224 -3.9624 0.3876 3.6776 14.4476
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.66252 2.50184 11.457 <2e-16 ***
## roof.area 0.03347 0.01873 1.786 0.0748 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.457 on 382 degrees of freedom
## Multiple R-squared: 0.008285, Adjusted R-squared: 0.005689
## F-statistic: 3.191 on 1 and 382 DF, p-value: 0.07482
ggplot(eedata7.00, aes(x = cooling.load, fill = roof.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")
Glazing area doesn’t appear to have the same level of impact on the 7.0m buildings (R-squared of about 0.22, versus 0.34 for the 3.5m buildings), but the glazing plots are kept separate for consistency with the 3.5m analysis.
testlm <- lm(cooling.load ~ glazing.area, data = eedata7.00numeric)
summary(testlm)
##
## Call:
## lm(formula = cooling.load ~ glazing.area, data = eedata7.00numeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.3325 -3.8821 -0.5623 3.3757 12.7884
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.6346 0.5014 57.11 <2e-16 ***
## glazing.area 19.0699 1.8600 10.25 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.852 on 382 degrees of freedom
## Multiple R-squared: 0.2158, Adjusted R-squared: 0.2137
## F-statistic: 105.1 on 1 and 382 DF, p-value: < 2.2e-16
ggplot(eedata7.00, aes(x = cooling.load, fill = glazing.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load")
ggplot(eedata7.00, aes(x = cooling.load)) + geom_histogram(binwidth = 1) + labs(x = "cooling load") + facet_wrap(~glazing.area)
#facet wrap
ggplot(eedata7.00, aes(x = cooling.load, fill = wall.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load") + facet_wrap(~glazing.area)
#reverse wrapping
ggplot(eedata7.00, aes(x = cooling.load, fill = glazing.area)) + geom_histogram(binwidth = 1) + labs(x = "cooling load") + facet_wrap(~wall.area)
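The faceted histograms compare cooling load across wall area and glazing area combinations visually. The same comparison can also be expressed as a small table of group means and counts. The sketch below runs on a hypothetical toy data frame (`toy_cells`, not the UCI data) that reuses the document's column names. With the real data you would pass `eedata7.00numeric` instead.

```r
# Hypothetical toy stand-in for eedata7.00numeric (NOT the UCI data);
# only the column names match the document.
toy_cells <- data.frame(
  wall.area    = c(245, 245, 245, 245, 318.5, 318.5, 318.5, 318.5),
  glazing.area = c(0.10, 0.10, 0.40, 0.40, 0.10, 0.10, 0.40, 0.40),
  cooling.load = c(24, 26, 30, 32, 33, 35, 39, 41)
)

# Mean cooling load for every wall area / glazing area combination,
# i.e. the numbers behind each panel of the faceted histograms.
cell_means <- aggregate(cooling.load ~ wall.area + glazing.area,
                        data = toy_cells, FUN = mean)

# Number of buildings in each combination.
cell_counts <- aggregate(cooling.load ~ wall.area + glazing.area,
                         data = toy_cells, FUN = length)
names(cell_counts)[3] <- "n"

summary_table <- merge(cell_means, cell_counts, by = c("wall.area", "glazing.area"))
print(summary_table)
```

Swapping `cooling.load` for `heating.load` gives the matching table for the heating histograms.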
Next, perform the same heating load analysis on the 7.0m buildings.
testlm <- lm(heating.load ~ relative.compactness, data = eedata7.00numeric)
summary(testlm)
##
## Call:
## lm(formula = heating.load ~ relative.compactness, data = eedata7.00numeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.4257 -2.9207 0.9914 4.7448 9.6239
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 61.662 3.196 19.292 <2e-16 ***
## relative.compactness -35.679 3.739 -9.542 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.358 on 382 degrees of freedom
## Multiple R-squared: 0.1925, Adjusted R-squared: 0.1904
## F-statistic: 91.05 on 1 and 382 DF, p-value: < 2.2e-16
ggplot(eedata7.00, aes(x = heating.load, fill = relative.compactness)) + geom_histogram(binwidth = 1) + labs(x = "heating load")
testlm <- lm(heating.load ~ surface.area, data = eedata7.00numeric)
summary(testlm)
##
## Call:
## lm(formula = heating.load ~ surface.area, data = eedata7.00numeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.2286 -2.8192 0.9919 4.4033 9.4924
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.767643 3.336822 -0.829 0.407
## surface.area 0.057104 0.005579 10.236 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.282 on 382 degrees of freedom
## Multiple R-squared: 0.2152, Adjusted R-squared: 0.2132
## F-statistic: 104.8 on 1 and 382 DF, p-value: < 2.2e-16
ggplot(eedata7.00, aes(x = heating.load, fill = surface.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")
testlm <- lm(heating.load ~ wall.area, data = eedata7.00numeric)
summary(testlm)
##
## Call:
## lm(formula = heating.load ~ wall.area, data = eedata7.00numeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.4034 -3.9683 -0.2133 3.6467 10.9316
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.177783 2.081549 3.448 0.000627 ***
## wall.area 0.072859 0.006244 11.669 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.119 on 382 degrees of freedom
## Multiple R-squared: 0.2628, Adjusted R-squared: 0.2609
## F-statistic: 136.2 on 1 and 382 DF, p-value: < 2.2e-16
ggplot(eedata7.00, aes(x = heating.load, fill = wall.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")
testlm <- lm(heating.load ~ roof.area, data = eedata7.00numeric)
summary(testlm)
##
## Call:
## lm(formula = heating.load ~ roof.area, data = eedata7.00numeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.4816 -4.4541 0.4076 4.8243 11.6384
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 29.55130 2.73217 10.816 <2e-16 ***
## roof.area 0.01300 0.02046 0.635 0.526
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.959 on 382 degrees of freedom
## Multiple R-squared: 0.001055, Adjusted R-squared: -0.00156
## F-statistic: 0.4034 on 1 and 382 DF, p-value: 0.5257
ggplot(eedata7.00, aes(x = heating.load, fill = roof.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")
Glazing area results
testlm <- lm(heating.load ~ glazing.area, data = eedata7.00numeric)
summary(testlm)
##
## Call:
## lm(formula = heating.load ~ glazing.area, data = eedata7.00numeric)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.556 -3.507 -1.263 4.664 9.522
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 25.1058 0.4977 50.45 <2e-16 ***
## glazing.area 26.3258 1.8463 14.26 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.817 on 382 degrees of freedom
## Multiple R-squared: 0.3474, Adjusted R-squared: 0.3456
## F-statistic: 203.3 on 1 and 382 DF, p-value: < 2.2e-16
ggplot(eedata7.00, aes(x = heating.load, fill = glazing.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load")
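Written out as equations, the two glazing-area fits for the 7.0m buildings give the predicted load at a given glazing area ratio. The coefficients are copied from the heating and cooling summaries above. The 0.40 ratio is only an illustrative input.

$$
\begin{aligned}
\widehat{\text{heating load}} &= 25.1058 + 26.3258 \times \text{glazing.area} \\
\widehat{\text{cooling load}} &= 28.6346 + 19.0699 \times \text{glazing.area} \\[4pt]
\text{glazing.area} = 0.40:\quad \widehat{\text{heating load}} &= 25.1058 + 26.3258(0.40) \approx 35.64 \\
\widehat{\text{cooling load}} &= 28.6346 + 19.0699(0.40) \approx 36.26
\end{aligned}
$$

Going from a ratio of 0 to 0.40 therefore adds roughly 10.5 to the fitted heating load but only about 7.6 to the fitted cooling load. This is in line with glazing area explaining more of the variance in heating (R-squared 0.3474) than in cooling (R-squared 0.2158) for these buildings.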
#facet wrap
ggplot(eedata7.00, aes(x = heating.load, fill = wall.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load") + facet_wrap(~glazing.area)
#reverse wrapping
ggplot(eedata7.00, aes(x = heating.load, fill = glazing.area)) + geom_histogram(binwidth = 1) + labs(x = "heating load") + facet_wrap(~wall.area)
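Each single-predictor regression above is run and read separately. To answer the main question (which variables have the greatest impact), it can help to collect the slope, p-value and R-squared of every one-predictor fit into a single ranked table. The helper `rank_predictors()` below is a hypothetical name introduced for this sketch. It uses only base R (`lm`, `summary`, `reformulate`) and is demonstrated on toy data (`toy_rank`, not the UCI data).

```r
# Hypothetical helper: fit one simple regression per predictor and
# return the fit statistics ranked by R-squared (highest first).
rank_predictors <- function(df, response, predictors) {
  rows <- lapply(predictors, function(p) {
    fit <- lm(reformulate(p, response = response), data = df)
    s <- summary(fit)
    data.frame(
      predictor = p,
      slope     = unname(coef(fit)[2]),           # change in load per unit of predictor
      p.value   = s$coefficients[2, "Pr(>|t|)"],  # same p-value as in summary()
      r.squared = s$r.squared,                    # share of variance explained
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, rows)
  out[order(out$r.squared, decreasing = TRUE), ]
}

# Hypothetical toy data (NOT the UCI data) with the document's column names.
toy_rank <- data.frame(
  wall.area    = c(245, 269.5, 294, 318.5, 343, 367.5),
  roof.area    = c(110.25, 147, 220.5, 110.25, 147, 220.5),
  heating.load = c(10, 13, 15, 19, 21, 24)
)

ranked <- rank_predictors(toy_rank, "heating.load", c("wall.area", "roof.area"))
print(ranked)
```

On the real data, a call such as `rank_predictors(eedata7.00numeric, "heating.load", c("relative.compactness", "surface.area", "wall.area", "roof.area", "glazing.area"))` should reproduce the R-squared values from the individual summaries above in one table.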
Question - 3 Graphics: Please make sure to display at least one scatter plot, box plot and histogram. Don’t be limited to this. Please explore the many other options in R packages such as ggplot2.
Let’s put together some summary graphs.
Cooling - scatter plot separating 3.50 vs. 7 meter buildings by wall area and glazing ratio.
ggplot(eedatanumeric, aes(y = cooling.load, x = wall.area)) + geom_point(aes(color = glazing.area)) + facet_wrap(~overall.height) + labs(x = "wall area", y = "cooling load")
ggplot(eedatanumeric, aes(y = cooling.load, x = glazing.area)) + geom_point(aes(color = wall.area)) + facet_wrap(~overall.height) + labs(x = "glazing area ratio", y = "cooling load")
Heating - scatter plot separating 3.50 vs. 7 meter buildings by wall area and glazing ratio.
ggplot(eedatanumeric, aes(y = heating.load, x = wall.area)) + geom_point(aes(color = glazing.area)) + facet_wrap(~overall.height) + labs(x = "wall area", y = "heating load")
ggplot(eedatanumeric, aes(y = heating.load, x = glazing.area)) + geom_point(aes(color = wall.area)) + facet_wrap(~overall.height) + labs(x = "glazing area ratio", y = "heating load")
As the summary graphs show, the heating and cooling loads generally increase as the glazing area ratio and wall area increase. The same trend is visible when the points are colored by the actual glazing area (glazing area ratio * wall area). However, because many of the single-variable fits in this analysis have low R-squared values, other factors likely influence the response variables as well.
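The actual glazing area described above is just the product of two existing columns. The sketch below derives it and compares how strongly each load correlates with the ratio alone versus the derived area. It assumes `glazing.area.actual` was built exactly as the text describes. The rows in `toy_glaze` are hypothetical (NOT the UCI data); on the real data you would apply the same lines to `eedatanumeric`.

```r
# Hypothetical toy rows (NOT the UCI data); column names follow the document.
toy_glaze <- data.frame(
  wall.area    = c(245, 245, 318.5, 318.5),
  glazing.area = c(0.10, 0.40, 0.10, 0.40),
  heating.load = c(12, 20, 18, 29),
  cooling.load = c(15, 22, 21, 30)
)

# Actual glazing area as described in the text: ratio multiplied by wall area.
toy_glaze$glazing.area.actual <- toy_glaze$glazing.area * toy_glaze$wall.area

# Pearson correlation of each load with the ratio alone and with the derived area.
glaze_cor <- sapply(c("heating.load", "cooling.load"), function(y) {
  c(ratio  = cor(toy_glaze$glazing.area, toy_glaze[[y]]),
    actual = cor(toy_glaze$glazing.area.actual, toy_glaze[[y]]))
})
print(round(glaze_cor, 3))
```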
ggplot(eedatanumeric, aes(y = cooling.load, x = glazing.area)) + geom_point(aes(color = glazing.area.actual)) + facet_wrap(~overall.height) + labs(x = "glazing area ratio", y = "cooling load")
ggplot(eedatanumeric, aes(y = heating.load, x = glazing.area)) + geom_point(aes(color = glazing.area.actual)) + facet_wrap(~overall.height) + labs(x = "glazing area ratio", y = "heating load")
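Since the single-variable fits leave much of the variance unexplained, one way to test whether wall area and glazing area work together is to fit them in the same model, including their interaction, and compare the result with the one-predictor models. Raw R-squared can only rise when terms are added, so the sketch also runs an `anova()` F-test on the nested models. The data in `toy_multi` are simulated and hypothetical (NOT the UCI data). With the real data you would substitute `eedata7.00numeric`.

```r
set.seed(1)  # reproducible toy data

# Hypothetical toy data (NOT the UCI data): load rises with wall area and glazing ratio.
toy_multi <- expand.grid(
  wall.area    = c(245, 269.5, 294, 318.5),
  glazing.area = c(0, 0.10, 0.25, 0.40)
)
toy_multi$heating.load <- 5 + 0.05 * toy_multi$wall.area +
  20 * toy_multi$glazing.area + rnorm(nrow(toy_multi), sd = 1)

fit_wall    <- lm(heating.load ~ wall.area, data = toy_multi)
fit_glazing <- lm(heating.load ~ glazing.area, data = toy_multi)
# wall.area * glazing.area expands to both main effects plus their interaction.
fit_both    <- lm(heating.load ~ wall.area * glazing.area, data = toy_multi)

r2 <- c(
  wall    = summary(fit_wall)$r.squared,
  glazing = summary(fit_glazing)$r.squared,
  both    = summary(fit_both)$r.squared
)
print(round(r2, 3))

# F-test: do glazing area and the interaction improve on wall area alone?
print(anova(fit_wall, fit_both))
```

If the combined model still leaves a large share of the variance unexplained, that supports the conclusion that other inputs (for example relative compactness or overall height) also matter.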