For this DATA 606 project, we use the Energy Efficiency data set from the UCI Machine Learning Repository, available at: http://archive.ics.uci.edu/ml/datasets/Energy+efficiency#
Load the data set into R:
library(readxl)
library(ggplot2)
library(gridExtra)
energy.efficiency.df1 <- read_xlsx("ENB2012_data.xlsx", sheet = 1)
head(energy.efficiency.df1)
## # A tibble: 6 x 10
##      X1    X2    X3    X4    X5    X6    X7    X8    Y1    Y2
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  0.98  514.  294   110.     7     2     0     0  15.6  21.3
## 2  0.98  514.  294   110.     7     3     0     0  15.6  21.3
## 3  0.98  514.  294   110.     7     4     0     0  15.6  21.3
## 4  0.98  514.  294   110.     7     5     0     0  15.6  21.3
## 5  0.9   564.  318.  122.     7     2     0     0  20.8  28.3
## 6  0.9   564.  318.  122.     7     3     0     0  21.5  25.4
# Replace the generic X1..X8, Y1, Y2 column names with descriptive ones
names(energy.efficiency.df1) <- c("relative.compactness", "surface.area",
                                  "wall.area", "roof.area", "overall.height",
                                  "orientation", "glazing.area",
                                  "glazing.area.distribution",
                                  "heating.load", "cooling.load")
summary(energy.efficiency.df1)
##  relative.compactness  surface.area     wall.area       roof.area
##  Min.   :0.6200       Min.   :514.5   Min.   :245.0   Min.   :110.2
##  1st Qu.:0.6825       1st Qu.:606.4   1st Qu.:294.0   1st Qu.:140.9
##  Median :0.7500       Median :673.8   Median :318.5   Median :183.8
##  Mean   :0.7642       Mean   :671.7   Mean   :318.5   Mean   :176.6
##  3rd Qu.:0.8300       3rd Qu.:741.1   3rd Qu.:343.0   3rd Qu.:220.5
##  Max.   :0.9800       Max.   :808.5   Max.   :416.5   Max.   :220.5
##  overall.height  orientation    glazing.area    glazing.area.distribution
##  Min.   :3.50   Min.   :2.00   Min.   :0.0000   Min.   :0.000
##  1st Qu.:3.50   1st Qu.:2.75   1st Qu.:0.1000   1st Qu.:1.750
##  Median :5.25   Median :3.50   Median :0.2500   Median :3.000
##  Mean   :5.25   Mean   :3.50   Mean   :0.2344   Mean   :2.812
##  3rd Qu.:7.00   3rd Qu.:4.25   3rd Qu.:0.4000   3rd Qu.:4.000
##  Max.   :7.00   Max.   :5.00   Max.   :0.4000   Max.   :5.000
##   heating.load    cooling.load
##  Min.   : 6.01   Min.   :10.90
##  1st Qu.:12.99   1st Qu.:15.62
##  Median :18.95   Median :22.08
##  Mean   :22.31   Mean   :24.59
##  3rd Qu.:31.67   3rd Qu.:33.13
##  Max.   :43.10   Max.   :48.03
# Convert the numeric orientation codes to a factor with named levels
energy.efficiency.df1$orientation <- factor(energy.efficiency.df1$orientation,
                                            levels = c(2, 3, 4, 5),
                                            labels = c("north", "east", "south", "west"))
head(energy.efficiency.df1)
## # A tibble: 6 x 10
##   relative.compac~ surface.area wall.area roof.area overall.height
##              <dbl>        <dbl>     <dbl>     <dbl>          <dbl>
## 1             0.98         514.      294       110.              7
## 2             0.98         514.      294       110.              7
## 3             0.98         514.      294       110.              7
## 4             0.98         514.      294       110.              7
## 5             0.9          564.      318.      122.              7
## 6             0.9          564.      318.      122.              7
## # ... with 5 more variables: orientation <fct>, glazing.area <dbl>,
## #   glazing.area.distribution <dbl>, heating.load <dbl>,
## #   cooling.load <dbl>
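As a quick check of the recoding, the sketch below applies the same mapping to the six orientation codes printed by the first head() call and tabulates the result. The vector `orientation.codes` is a hypothetical stand-in for the real column, so the counts describe only those six rows.

```r
# Orientation codes from the six rows printed by the first head() call
# (hypothetical stand-in for energy.efficiency.df1$orientation).
orientation.codes <- c(2, 3, 4, 5, 2, 3)

# Map the numeric codes to named factor levels in one step.
orientation <- factor(orientation.codes,
                      levels = c(2, 3, 4, 5),
                      labels = c("north", "east", "south", "west"))

# Tabulate the labels; any code without a label would show up as NA.
orientation.counts <- table(orientation, useNA = "ifany")
print(orientation.counts)
```

Running the same table() call on the real column should show no NA entries if every code in the data is one of 2, 3, 4 or 5.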
Attribute Details:
Relative Compactness - Ratio
Surface Area - sq. meters
Wall Area - sq. meters
Roof Area - sq. meters
Overall Height - meters
Orientation - 2:North, 3:East, 4:South, 5:West
Glazing area (ratio) - 0.00, 0.10, 0.25, 0.40
Glazing area distribution - 0:No glazing, 1:Uniform, 2:North, 3:East, 4:South, 5:West
Heating Load - kWh/sq. meters
Cooling Load - kWh/sq. meters
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
Is the relative compactness of a building predictive of its heating load and cooling load?
Is the area of a building (wall area, roof area, or surface area) predictive of its heating or cooling load?
Is building orientation associated with the heating or cooling load of a building?
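One possible first look at the first two questions is a correlation between a predictor and each load. The sketch below is only an illustration: `toy.df` holds the six rows printed by head() (with the rounded values shown there), and `load.correlations` is a hypothetical helper, not part of the original analysis. Six rows say nothing reliable about the full data set, and a correlation only measures linear association.

```r
# First six rows as printed by head(); values are rounded by the tibble display.
# toy.df is a hypothetical stand-in for energy.efficiency.df1.
toy.df <- data.frame(
  relative.compactness = c(0.98, 0.98, 0.98, 0.98, 0.90, 0.90),
  heating.load         = c(15.6, 15.6, 15.6, 15.6, 20.8, 21.5),
  cooling.load         = c(21.3, 21.3, 21.3, 21.3, 28.3, 25.4)
)

# Hypothetical helper: Pearson correlation of one predictor with both loads.
load.correlations <- function(df, predictor) {
  c(heating.load = cor(df[[predictor]], df$heating.load),
    cooling.load = cor(df[[predictor]], df$cooling.load))
}

toy.cor <- load.correlations(toy.df, "relative.compactness")
print(round(toy.cor, 3))
```

On the real data the call would be load.correlations(energy.efficiency.df1, "relative.compactness"), and the same helper can be reused for surface.area, wall.area and roof.area.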
What are the cases, and how many are there?
Each case is one building in the sample, described by the ten variables listed above.
There are 768 cases in this data set.
Describe the method of data collection.
The dataset was created by Angeliki Xifara (angxifara ‘@’ gmail.com, Civil/Structural Engineer) and was processed by Athanasios Tsanas (tsanasthanasis ‘@’ gmail.com, Oxford Centre for Industrial and Applied Mathematics, University of Oxford, UK).
What type of study is this (observational/experiment)?
This is an observational study: the data were collected on buildings that already exist. It is not an experiment.
If you collected the data, state self-collected. If not, provide a citation/link.
The data were obtained from the UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets/Energy+efficiency#
What is the response variable? Is it quantitative or qualitative?
The response variables are heating.load and cooling.load. Both are quantitative.
You should have two independent variables, one quantitative and one qualitative.
In this case, we have the following independent variables:
relative.compactness - numerical
surface.area - numerical
wall.area - numerical
roof.area - numerical
overall.height - numerical
orientation - categorical
glazing.area - numerical
glazing.area.distribution - categorical
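Although glazing.area.distribution is listed as categorical, the summary() output earlier treats it as numeric (it reports a mean of 2.812). A minimal sketch of converting it to a factor is shown below. The vector `glazing.codes` is hypothetical, and the label `none` for code 0, which the summary output shows is present in the data, is an assumption.

```r
# Hypothetical stand-in for energy.efficiency.df1$glazing.area.distribution:
# one example of each code that can occur.
glazing.codes <- c(0, 1, 2, 3, 4, 5)

# Convert the codes to a factor so that R treats the variable as categorical.
# The label "none" for code 0 is an assumption.
glazing.area.distribution <- factor(
  glazing.codes,
  levels = 0:5,
  labels = c("none", "uniform", "north", "east", "south", "west")
)

# summary() on a factor now reports counts per level instead of quartiles.
print(summary(glazing.area.distribution))
```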
Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
Heating and cooling load against relative compactness
# Refer to columns by name inside aes(); ggplot looks them up in the data argument
a <- ggplot(energy.efficiency.df1, aes(x = relative.compactness, y = heating.load)) +
  geom_point()
b <- ggplot(energy.efficiency.df1, aes(x = relative.compactness, y = cooling.load)) +
  geom_point()
# Show the two scatter plots side by side
grid.arrange(a, b, nrow = 1)
Heating and cooling load against surface area
# Refer to columns by name inside aes(); ggplot looks them up in the data argument
a <- ggplot(energy.efficiency.df1, aes(x = surface.area, y = heating.load)) +
  geom_point()
b <- ggplot(energy.efficiency.df1, aes(x = surface.area, y = cooling.load)) +
  geom_point()
# Show the two scatter plots side by side
grid.arrange(a, b, nrow = 1)
Relationship between heating load and cooling load
# Scatter plot of the two response variables with a fitted linear trend line
ggplot(energy.efficiency.df1, aes(x = heating.load, y = cooling.load)) +
  geom_point() +
  stat_smooth(method = "lm")
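The plots above cover the numerical predictors but not the orientation question. A minimal sketch of a per-orientation summary and boxplot follows. It is only an illustration: `toy.df` is a hypothetical stand-in built from the six rows printed by head(), with rounded values, and the real analysis would use energy.efficiency.df1 instead.

```r
# Six rows as printed by head(), with orientation already recoded.
# toy.df is a hypothetical stand-in for energy.efficiency.df1.
toy.df <- data.frame(
  orientation  = factor(c("north", "east", "south", "west", "north", "east"),
                        levels = c("north", "east", "south", "west")),
  heating.load = c(15.6, 15.6, 15.6, 15.6, 20.8, 21.5)
)

# Mean heating load within each orientation.
mean.heating <- tapply(toy.df$heating.load, toy.df$orientation, mean)
print(mean.heating)

# Build a boxplot per orientation if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  orientation.plot <- ggplot(toy.df, aes(x = orientation, y = heating.load)) +
    geom_boxplot()
  # Call print(orientation.plot) to draw the figure.
}
```

Replacing heating.load with cooling.load gives the corresponding view for the second response variable.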