R notebooks
This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.
5 + 3
[1] 8
Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).
The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.
Now, let's solve the case study.
Required Packages
Before loading these packages, please make sure they are installed:
library(dplyr)
library(readxl)
library(ggplot2)
library(Hmisc)
library(RcmdrMisc)
and set the classic theme:
theme_set(theme_classic())
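If any of the library() calls above fails, the package is probably not installed. The chunk below is a minimal sketch for checking this using base R only; the variable names required and missing are chosen here for illustration, and the install line is left commented out so that nothing is installed by accident.

```r
# Packages used in this notebook
required <- c("dplyr", "readxl", "ggplot2", "Hmisc", "RcmdrMisc")

# requireNamespace() returns FALSE (instead of stopping) when a package is absent
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing <- required[!installed]

if (length(missing) > 0) {
  message("Missing packages: ", paste(missing, collapse = ", "))
  # install.packages(missing)  # uncomment to install them from CRAN
}
```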
Introduction
Problem context
Sustainability is the ability of humanity to coexist with nature without altering its homeostasis. One action that contributes enormously to achieving this is energy saving. For this reason, the energy performance of buildings is an active research field. Some official reports suggest that building energy consumption has increased over the past years because of the use of indoor heating, cooling and ventilation.
Analytical context
Suppose you have been hired as an engineer at a consulting company working for an Urban Planning Office, and your client is interested in studying the impact of building geometry on energy consumption for indoor air conditioning. With the results, they will draft policies and regulations to govern the construction of new buildings.
You are in charge of the EDA. Your objective will be to:
Extract the relevant information from the data. You will have to manipulate several data sets to obtain useful information for the case.
Conduct Exploratory Data Analysis. You will have to create meaningful plots and study the relationships between various features of existing buildings.
The Data
The dataset was obtained from simulations of buildings with a volume of 771.75 \(m^3\). It contains eight features and two response variables (outcomes), denoted by y1 and y2, as follows:
X1 Relative Compactness
X2 Surface Area
X3 Wall Area
X4 Roof Area
X5 Overall Height
X6 Orientation
X7 Glazing Area
X8 Glazing Area Distribution
y1 Heating Load
y2 Cooling Load
The data contain five different distribution scenarios for each glazing area: (1) uniform: 25% glazing on each side, (2) north: 55% on the north side and 15% on each of the other sides, (3) east: 55% on the east side and 15% on each of the other sides, (4) south: 55% on the south side and 15% on each of the other sides, and (5) west: 55% on the west side and 15% on each of the other sides. The value (0) appears to denote buildings with no glazing.
Let’s take a look at the data frame:
buildings <- read_xlsx(path='ENB2012_data.xlsx', sheet=1)
head(buildings)
and at its structure:
str(buildings)
Furthermore, we can see three types of glazing areas were used, which are expressed as percentages of the floor area: 10%, 25%, and 40%.
Exercise 0
Some categorical features have been interpreted by R as numeric. Convert them to categorical variables using the function factor(), and store the results in new columns.
# Code here
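As a reminder of how factor() works, here is a minimal sketch on a small made-up data frame. The name toy, its values, and the label names are assumptions for illustration; the labels follow the coding of X8 described in The Data section.

```r
# Toy data: made-up codes in the style of X8 (not the real data set)
toy <- data.frame(X8 = c(0, 1, 2, 3, 4, 5, 1, 3))

# Store the categorical version in a new column, keeping the numeric original
toy$X8_f <- factor(
  toy$X8,
  levels = 0:5,
  labels = c("none", "uniform", "north", "east", "south", "west")
)

str(toy)
```

Passing levels and labels explicitly keeps the categories in a known order, which later controls the order of groups in tables and plots.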
Exercise 1
A data set is balanced if it has the same (or at least a similar) number of samples for each value of a categorical variable. Look into the data and determine how many data points there are for each orientation, each glazing area and each glazing area distribution.
Is the data set balanced for the variable glazing area?
Hint: Use the count() function from the dplyr package.
# Code here
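To illustrate what count() returns, here is a minimal sketch on a made-up data frame. The name toy, the column glazing_area, and the values are assumptions for illustration only and say nothing about the real data.

```r
library(dplyr)

# Toy data: six made-up observations of a categorical variable
toy <- data.frame(glazing_area = factor(c(0.10, 0.10, 0.25, 0.25, 0.40, 0.40)))

# count() returns one row per category, with the group size in column n
counts <- count(toy, glazing_area)
counts

# A simple balance check: are all group sizes identical?
length(unique(counts$n)) == 1
```

In this toy case every category has the same number of rows, so the check returns TRUE; with real data you may prefer to compare the sizes by eye and allow small differences.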
Exercise 2
Use the result from Exercise 1 to plot the counts for the variable glazing area. Hint: you must decide which type of plot is appropriate.
# Code here
Let's do some marginal analysis, i.e., exploration in pairs.
Exercise 3
Which glazing area distribution is better? Compute summary statistics of the responses by group: take feature X8, filter the data frame for each of its values, and compute the summary statistics for each case. Use the numSummary() function from RcmdrMisc.
# Code here
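The idea of summarising responses by group can also be sketched in base R. The chunk below uses aggregate() rather than numSummary(), on a made-up data frame; the name toy and all values are assumptions for illustration, and only the mean is computed.

```r
# Toy data: two made-up groups with two responses each (not the real data set)
toy <- data.frame(
  X8 = factor(c("uniform", "uniform", "north", "north")),
  y1 = c(10, 20, 30, 50),
  y2 = c(12, 18, 33, 47)
)

# Mean of each response within each level of X8
by_group <- aggregate(cbind(y1, y2) ~ X8, data = toy, FUN = mean)
by_group
```

Replacing mean with another function (for example sd or median) gives the corresponding statistic per group.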
Bonus (optional): You can also use describe() from Hmisc to compute summary statistics. Hint: For further information about describe(), run ?describe after loading the Hmisc package.
# Code here
Exercise 4
How do the orientation and the glazing area influence energy consumption? Use boxplots to answer this question.
# Code here
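As a reminder of the formula interface of boxplot(), here is a minimal base-graphics sketch on made-up data. The name toy and the values are assumptions for illustration; X6 and y1 only mimic the column names of the real data set.

```r
# Toy data: one made-up response for two levels of a categorical feature
toy <- data.frame(
  X6 = factor(c(2, 2, 2, 3, 3, 3)),
  y1 = c(10, 12, 14, 11, 13, 15)
)

# response ~ group draws one box per level of the grouping factor
bp <- boxplot(y1 ~ X6, data = toy,
              xlab = "Orientation (X6)", ylab = "Heating load (y1)")

# The returned object holds the five-number summary per group;
# row 3 contains the medians
bp$stats[3, ]
```

Boxes that overlap almost completely suggest that the grouping variable has little effect on the response; clearly shifted boxes suggest the opposite.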
Now, use other plots to answer this question. Hint: in the lecture notes we explored basic graphics. Please be creative; it may be easier to use ggplot2.
Exercise 5
Which of the (geometric) features have little influence on the responses? Hint: use pairwise comparisons between the features and the responses to answer.
# Code here
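One possible way to compare features and responses pairwise is a correlation matrix. The chunk below is a minimal sketch on made-up data (the name toy and all values are assumptions for illustration); scatter plots, for example with pairs(), are an alternative that also reveals non-linear relationships.

```r
# Toy data: X1 is perfectly linearly related to y1, X6 is not
toy <- data.frame(
  X1 = c(0.6, 0.7, 0.8, 0.9, 1.0),
  X6 = c(2, 3, 4, 5, 2),
  y1 = c(22, 19, 16, 13, 10),
  y2 = c(25, 21, 18, 15, 12)
)

# Rows are features, columns are responses
cors <- cor(toy[, c("X1", "X6")], toy[, c("y1", "y2")])
round(cors, 2)
```

Correlations close to 0 point to features with little linear influence on a response, while values close to 1 or -1 point to a strong linear relationship.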
Conclusion
Based on your analysis, provide a short answer: which configuration results in a more efficient building?