This report uses the Energy Efficiency dataset from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Energy+efficiency#), which provides the raw data. Our aim is to explore three variables of interest and visualize how they influence the energy load. The dataset contains 8 input variables and 2 output variables (see variable information section).
We found that Wall Area, Roof Area and Glazing Area are the key indicators that influence both energy loads (Heating and Cooling).
NOTE: To run this file from source, make sure the following R package is installed: ggplot2.
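The only package loaded by the code in this report is ggplot2. A minimal sketch for checking it before running anything is shown here; `missing_packages` is a hypothetical helper, not part of the original report.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# ggplot2 is the package this report loads; extend the vector if you add more.
needed <- missing_packages(c("ggplot2"))
if (length(needed) > 0) {
  message("Please install: ", paste(needed, collapse = ", "))
}
```

If anything is reported as missing, `install.packages("ggplot2")` installs it from CRAN.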
A cleansed version of the dataset is available on GitHub: https://raw.githubusercontent.com/StephenElston/DataScience350/master/Lecture1/EnergyEfficiencyData.csv
The steps below download the raw data from GitHub. The dataset contains only numerical data, so we convert a few useful variables to categorical ones: “Orientation”, “Glazing Area” and “Glazing Area Distribution (variance)”.
rm(list = ls())
SourceURL_Raw <- "https://raw.githubusercontent.com/StephenElston/DataScience350/master/Lecture1/EnergyEfficiencyData.csv"
energy.efficiency <- read.csv( SourceURL_Raw, header = TRUE)
require(ggplot2)
## Loading required package: ggplot2
energy.efficiency$Orientation <- as.factor(energy.efficiency$Orientation)
levels(energy.efficiency$Orientation) <- c("North", "East", "South", "West")
energy.efficiency$Glazing.Area.Distribution <- as.factor(energy.efficiency$Glazing.Area.Distribution)
levels(energy.efficiency$Glazing.Area.Distribution) <- c("UnKnown", "Uniform", "North", "East", "South", "West")
energy.efficiency$Glazing.Area <- as.factor(energy.efficiency$Glazing.Area)
levels(energy.efficiency$Glazing.Area) <- c("0%", "10%", "25%", "40%")
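Assigning `levels()` by position works only if the numeric codes sort into the expected order and none is missing. A more defensive variant is sketched below: it names the codes explicitly. The helper `label_codes` is hypothetical, and the orientation codes 2 to 5 are an assumption about the raw file that should be checked with `unique(energy.efficiency$Orientation)` before use.

```r
# Hypothetical helper (not part of the original report): map numeric codes to
# labels explicitly, so a missing or unexpected code cannot shift the labels.
label_codes <- function(x, codes, labels) {
  stopifnot(length(codes) == length(labels))
  factor(x, levels = codes, labels = labels)
}

# Toy vector standing in for energy.efficiency$Orientation; the codes 2..5
# are an assumption about the raw data.
orientation_codes <- c(2, 3, 4, 5, 3, 2)
orientation <- label_codes(orientation_codes,
                           codes  = 2:5,
                           labels = c("North", "East", "South", "West"))
table(orientation)
```

Any value outside the listed codes becomes `NA` instead of silently receiving the wrong label.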
Let's look at the summary of the energy.efficiency data.
summary(energy.efficiency)
##  Relative.Compactness  Surface.Area     Wall.Area       Roof.Area
##  Min.   :0.6200        Min.   :514.5   Min.   :245.0   Min.   :110.2
##  1st Qu.:0.6825        1st Qu.:606.4   1st Qu.:294.0   1st Qu.:140.9
##  Median :0.7500        Median :673.8   Median :318.5   Median :183.8
##  Mean   :0.7642        Mean   :671.7   Mean   :318.5   Mean   :176.6
##  3rd Qu.:0.8300        3rd Qu.:741.1   3rd Qu.:343.0   3rd Qu.:220.5
##  Max.   :0.9800        Max.   :808.5   Max.   :416.5   Max.   :220.5
##  Overall.Height  Orientation  Glazing.Area  Glazing.Area.Distribution
##  Min.   :3.50    North:192    0% : 48       UnKnown: 48
##  1st Qu.:3.50    East :192    10%:240       Uniform:144
##  Median :5.25    South:192    25%:240       North  :144
##  Mean   :5.25    West :192    40%:240       East   :144
##  3rd Qu.:7.00                               South  :144
##  Max.   :7.00                               West   :144
##   Heating.Load    Cooling.Load
##  Min.   : 6.01   Min.   :10.90
##  1st Qu.:12.99   1st Qu.:15.62
##  Median :18.95   Median :22.08
##  Mean   :22.31   Mean   :24.59
##  3rd Qu.:31.67   3rd Qu.:33.13
##  Max.   :43.10   Max.   :48.03
Let's use a scatter plot to see whether there is any relation between Roof Area, Surface Area and Glazing Area, and how the load is distributed.
ggplot(energy.efficiency, aes(x = Cooling.Load, y = Heating.Load)) +
  # alpha belongs in the geom; passed to ggplot() it is silently ignored
  geom_point(aes(colour = Roof.Area), alpha = 0.3) +
  facet_grid(Overall.Height + Glazing.Area ~ Surface.Area, space = "free") +
  ggtitle("Load distribution of energy by Roof Area and Surface Area \n by Glazing Area and Overall Height")
Let's visualize how the Wall Area influences the heating load, using a raster plot.
ggplot(energy.efficiency, aes( Surface.Area, Roof.Area)) +
geom_raster(aes(fill = Heating.Load), interpolate = TRUE) +
scale_fill_gradient(low = "steelblue", high = "red")+
facet_wrap(~Wall.Area, scales = "free" )+
ggtitle('Measuring Heating Load distribution \n by Wall Area, Surface Area and Roof Area') +
xlab('Surface Area') + ylab('Roof Area')
Similarly, let's do the same for the cooling load.
ggplot(energy.efficiency, aes(Surface.Area, Roof.Area)) +
geom_raster(aes(fill = Cooling.Load), interpolate = TRUE) +
scale_fill_gradient(low = "grey", high = "steelblue")+
facet_wrap(~Wall.Area, scales = "free" )+
ggtitle('Measuring Cooling Load distribution \n by Wall Area, Surface Area and Roof Area') +
xlab('Surface Area') + ylab('Roof Area')
So, the plots suggest that Wall Area plays a significant role in both the Heating and the Cooling Load.
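The raster plots give a visual impression only. A rough numeric cross-check is sketched below: it correlates one predictor with both loads. The helper `load_correlations` is hypothetical, and the toy data frame holds made-up values that only demonstrate the call; they are not results from the dataset.

```r
# Hypothetical helper: Pearson correlation of one predictor with both loads.
load_correlations <- function(df, predictor) {
  c(Heating = cor(df[[predictor]], df$Heating.Load),
    Cooling = cor(df[[predictor]], df$Cooling.Load))
}

# Toy data frame with made-up values, used only to show the call.
toy <- data.frame(Wall.Area    = c(245, 294, 318.5, 343, 416.5),
                  Heating.Load = c(10, 15, 20, 25, 40),
                  Cooling.Load = c(12, 16, 22, 27, 41))
toy_result <- load_correlations(toy, "Wall.Area")
toy_result
```

With the real data the call would be `load_correlations(energy.efficiency, "Wall.Area")`. A correlation captures only linear association, so it complements the plots and does not replace them.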
We saw more variation in the load data when the overall height is 7.0, so let's create a subset named energy.eff.sub7.0 that contains only the rows with overall height = 7.0.
Let's visualize whether Roof Area, Wall Area, Surface Area and Glazing Area influence the load.
energy.eff.sub7.0 <- energy.efficiency[energy.efficiency$Overall.Height == 7.0, ]
ggplot(energy.eff.sub7.0,
       aes(x = Cooling.Load, y = Heating.Load,
           group = factor(round(Wall.Area)),
           size = Glazing.Area,
           shape = factor(round(Wall.Area)))) +
  geom_point(aes(colour = factor(round(Surface.Area))), alpha = 0.3) +
  geom_smooth(method = "lm", se = TRUE) +
  facet_grid(~ Roof.Area) +
  ggtitle('Load efficiency by Roof Area, by Wall Area, \n by Surface Area and by Glazing Area')
## Warning: Using size for a discrete variable is not advised.
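The warning appears because `Glazing.Area` was converted to a factor earlier and is then mapped to `size`. One way to avoid it is sketched here, on the assumption that the factor labels are the percentage strings set above: recover a numeric column for the size aesthetic. The helper `pct_to_fraction` is hypothetical.

```r
# Hypothetical helper: turn labels such as "10%" back into fractions (0.10).
pct_to_fraction <- function(x) {
  as.numeric(sub("%", "", as.character(x), fixed = TRUE)) / 100
}

# Toy factor standing in for energy.eff.sub7.0$Glazing.Area.
glazing <- factor(c("0%", "10%", "25%", "40%"))
glazing_num <- pct_to_fraction(glazing)
glazing_num
```

The numeric values could then be stored in a new column (for example a hypothetical `Glazing.Area.Num`) and used as `size = Glazing.Area.Num` in `aes()`, which gives a continuous size scale.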