Energy Efficiency Data Set

For this Data 606 project, we work with the Energy efficiency data set from the UCI Machine Learning Repository, available at: http://archive.ics.uci.edu/ml/datasets/Energy+efficiency#

Data Preparation

We first load the data set into R so that it can be analysed.

library(readxl)
library(ggplot2)
library(gridExtra)

energy.efficiency.df1 <- read_xlsx("ENB2012_data.xlsx", sheet = 1)

head(energy.efficiency.df1)
## # A tibble: 6 x 10
##      X1    X2    X3    X4    X5    X6    X7    X8    Y1    Y2
##   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  0.98  514.  294   110.     7     2     0     0  15.6  21.3
## 2  0.98  514.  294   110.     7     3     0     0  15.6  21.3
## 3  0.98  514.  294   110.     7     4     0     0  15.6  21.3
## 4  0.98  514.  294   110.     7     5     0     0  15.6  21.3
## 5  0.9   564.  318.  122.     7     2     0     0  20.8  28.3
## 6  0.9   564.  318.  122.     7     3     0     0  21.5  25.4
names(energy.efficiency.df1) <- c("relative.compactness", "surface.area", "wall.area", "roof.area", "overall.height", "orientation", "glazing.area", "glazing.area.distribution", "heating.load", "cooling.load")

summary(energy.efficiency.df1)
##  relative.compactness  surface.area     wall.area       roof.area    
##  Min.   :0.6200       Min.   :514.5   Min.   :245.0   Min.   :110.2  
##  1st Qu.:0.6825       1st Qu.:606.4   1st Qu.:294.0   1st Qu.:140.9  
##  Median :0.7500       Median :673.8   Median :318.5   Median :183.8  
##  Mean   :0.7642       Mean   :671.7   Mean   :318.5   Mean   :176.6  
##  3rd Qu.:0.8300       3rd Qu.:741.1   3rd Qu.:343.0   3rd Qu.:220.5  
##  Max.   :0.9800       Max.   :808.5   Max.   :416.5   Max.   :220.5  
##  overall.height  orientation    glazing.area    glazing.area.distribution
##  Min.   :3.50   Min.   :2.00   Min.   :0.0000   Min.   :0.000            
##  1st Qu.:3.50   1st Qu.:2.75   1st Qu.:0.1000   1st Qu.:1.750            
##  Median :5.25   Median :3.50   Median :0.2500   Median :3.000            
##  Mean   :5.25   Mean   :3.50   Mean   :0.2344   Mean   :2.812            
##  3rd Qu.:7.00   3rd Qu.:4.25   3rd Qu.:0.4000   3rd Qu.:4.000            
##  Max.   :7.00   Max.   :5.00   Max.   :0.4000   Max.   :5.000            
##   heating.load    cooling.load  
##  Min.   : 6.01   Min.   :10.90  
##  1st Qu.:12.99   1st Qu.:15.62  
##  Median :18.95   Median :22.08  
##  Mean   :22.31   Mean   :24.59  
##  3rd Qu.:31.67   3rd Qu.:33.13  
##  Max.   :43.10   Max.   :48.03
# Recode the orientation codes (2 to 5) as a labelled factor
energy.efficiency.df1$orientation <- factor(energy.efficiency.df1$orientation,
                                            levels = c(2, 3, 4, 5),
                                            labels = c("north", "east", "south", "west"))

head(energy.efficiency.df1)
## # A tibble: 6 x 10
##   relative.compac~ surface.area wall.area roof.area overall.height
##              <dbl>        <dbl>     <dbl>     <dbl>          <dbl>
## 1             0.98         514.      294       110.              7
## 2             0.98         514.      294       110.              7
## 3             0.98         514.      294       110.              7
## 4             0.98         514.      294       110.              7
## 5             0.9          564.      318.      122.              7
## 6             0.9          564.      318.      122.              7
## # ... with 5 more variables: orientation <fct>, glazing.area <dbl>,
## #   glazing.area.distribution <dbl>, heating.load <dbl>,
## #   cooling.load <dbl>

Attribute Details:

Relative Compactness - Ratio

Surface Area - sq. meters

Wall Area - sq. meters

Roof Area - sq. meters

Overall Height - meters

Orientation - 2:North, 3:East, 4:South, 5:West

Glazing area (ratio) - 0.00, 0.10, 0.25, 0.40

Glazing area distribution - 0:No glazing, 1:Uniform, 2:North, 3:East, 4:South, 5:West

Heating Load - kWh/sq. meters

Cooling Load - kWh/sq. meters

Research question

You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.

  1. Is the relative compactness of a building predictive of its heating load and cooling load?

  2. Is the area of a building (wall area, roof area, or surface area) predictive of its heating or cooling load?

  3. Does the building orientation suggest anything about the heating or cooling efficiency of a building?
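As a minimal sketch of how question 1 might be approached, the chunk below regresses heating load on relative compactness and computes their correlation. To stay self-contained it uses only the six rows printed by head() above, with values rounded as the tibble displays them; the names first.rows, heating.fit, heating.slope and heating.cor are ours for illustration. Six rows are far too few to support any conclusion, so this shows the mechanics only.

```r
# Six rows copied from the head() output above, rounded as the tibble prints them.
# They only make the sketch runnable; they are not enough data for inference.
first.rows <- data.frame(
  relative.compactness = c(0.98, 0.98, 0.98, 0.98, 0.90, 0.90),
  heating.load         = c(15.6, 15.6, 15.6, 15.6, 20.8, 21.5),
  cooling.load         = c(21.3, 21.3, 21.3, 21.3, 28.3, 25.4)
)

# Simple linear regression of heating load on relative compactness
heating.fit <- lm(heating.load ~ relative.compactness, data = first.rows)

# The slope estimates the change in heating load per unit of compactness
heating.slope <- coef(heating.fit)[["relative.compactness"]]
print(heating.slope)

# Pearson correlation as a quick check of direction and strength
heating.cor <- cor(first.rows$relative.compactness, first.rows$heating.load)
print(heating.cor)
```

On the full data, the same calls would take data = energy.efficiency.df1, and cooling.load could replace heating.load to cover the second half of the question.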

Cases

What are the cases, and how many are there?

Describe the method of data collection.

The dataset was created by Angeliki Xifara (angxifara ‘@’ gmail.com, Civil/Structural Engineer) and was processed by Athanasios Tsanas (tsanasthanasis ‘@’ gmail.com, Oxford Centre for Industrial and Applied Mathematics, University of Oxford, UK).
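The number of cases can be read directly from the loaded data. The sketch below wraps that in a small helper, count.cases, which is a hypothetical name of ours and not part of any package. It is demonstrated on the six rows printed by head() above (values rounded as displayed), so the counts it prints describe those six rows only, not the full data set.

```r
# Hypothetical helper (ours, for illustration): number of cases, number of
# cases without missing values, and number of variables in a data frame.
count.cases <- function(df) {
  c(cases = nrow(df),
    complete = sum(complete.cases(df)),
    variables = ncol(df))
}

# Demonstration on the six rows shown by head() above (rounded as printed)
first.rows <- data.frame(
  relative.compactness = c(0.98, 0.98, 0.98, 0.98, 0.90, 0.90),
  orientation          = c(2, 3, 4, 5, 2, 3),
  heating.load         = c(15.6, 15.6, 15.6, 15.6, 20.8, 21.5),
  cooling.load         = c(21.3, 21.3, 21.3, 21.3, 28.3, 25.4)
)

case.counts <- count.cases(first.rows)
print(case.counts)
```

For the full data set the call would be count.cases(energy.efficiency.df1), where each row is one case.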

Type of study

What type of study is this (observational/experiment)?

This is an observational study: the data were collected for buildings that already exist, and no treatment was assigned. It is not an experiment.

Data Source

If you collected the data, state self-collected. If not, provide a citation/link.

The data come from the UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets/Energy+efficiency#

Dependent Variable

What is the response variable? Is it quantitative or qualitative?

The response variables are heating.load and cooling.load. Both are quantitative.

Independent Variable

You should have two independent variables, one quantitative and one qualitative.

In this case, we have the following independent variables:

relative.compactness - numerical

surface.area - numerical

wall.area - numerical

roof.area - numerical

overall.height - numerical

orientation - categorical

glazing.area - numerical

glazing.area.distribution - categorical
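Note that the second head() output above reports glazing.area.distribution as <dbl>, so R still stores it as a number even though we treat it as categorical. The sketch below shows one way to check how each column is stored and to convert the codes to a factor. The helper variable.type is a hypothetical name of ours, and the demonstration again uses the six rows printed by head() (rounded as displayed).

```r
# Hypothetical helper (ours, for illustration): report how R stores a column
variable.type <- function(column) {
  if (is.factor(column) || is.character(column)) "categorical" else "numerical"
}

# Six rows copied from the head() output above
first.rows <- data.frame(
  relative.compactness      = c(0.98, 0.98, 0.98, 0.98, 0.90, 0.90),
  orientation               = factor(c(2, 3, 4, 5, 2, 3),
                                     levels = c(2, 3, 4, 5),
                                     labels = c("north", "east", "south", "west")),
  glazing.area.distribution = c(0, 0, 0, 0, 0, 0)
)

# As loaded, glazing.area.distribution is still numeric
types.before <- sapply(first.rows, variable.type)
print(types.before)

# Converting the codes to a factor makes R treat the column as categorical
first.rows$glazing.area.distribution <- factor(first.rows$glazing.area.distribution)
types.after <- sapply(first.rows, variable.type)
print(types.after)
```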

Relevant summary statistics

Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

Heating and cooling load against relative compactness

# Scatter plots of each load against relative compactness.
# Columns are named directly inside aes(); ggplot looks them up in the data frame.
a <- ggplot(energy.efficiency.df1, aes(x = relative.compactness, y = heating.load)) +
  geom_point()

b <- ggplot(energy.efficiency.df1, aes(x = relative.compactness, y = cooling.load)) +
  geom_point()

# Show both plots side by side
grid.arrange(a, b, nrow = 1)

Heating and cooling load against surface area

# Scatter plots of each load against surface area
a <- ggplot(energy.efficiency.df1, aes(x = surface.area, y = heating.load)) +
  geom_point()

b <- ggplot(energy.efficiency.df1, aes(x = surface.area, y = cooling.load)) +
  geom_point()

# Show both plots side by side
grid.arrange(a, b, nrow = 1)

Relationship between heating load and cooling load

# Scatter plot of cooling load against heating load with a fitted linear trend
ggplot(energy.efficiency.df1, aes(x = heating.load, y = cooling.load)) +
  geom_point() + stat_smooth(method = "lm")
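
Heating and cooling load by orientation

Research question 3 involves a categorical predictor, so a per-group summary is a natural starting point. The sketch below computes the mean heating and cooling load for each orientation with base R's aggregate(). It runs on the six rows printed by head() above (rounded as displayed), which cover each orientation only once or twice, so the resulting means illustrate the mechanics and say nothing about the full data set. The names first.rows and load.by.orientation are ours.

```r
# Six rows copied from the head() output above, rounded as the tibble prints them
first.rows <- data.frame(
  orientation  = factor(c(2, 3, 4, 5, 2, 3),
                        levels = c(2, 3, 4, 5),
                        labels = c("north", "east", "south", "west")),
  heating.load = c(15.6, 15.6, 15.6, 15.6, 20.8, 21.5),
  cooling.load = c(21.3, 21.3, 21.3, 21.3, 28.3, 25.4)
)

# Mean of both response variables within each orientation level
load.by.orientation <- aggregate(cbind(heating.load, cooling.load) ~ orientation,
                                 data = first.rows, FUN = mean)
print(load.by.orientation)
```

On the full data, data = energy.efficiency.df1 would replace first.rows, and FUN could be swapped for median or sd to get other summaries.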