R Markdown

R is a free, open-source statistical analysis package that can process data and generate visualisations. openair is a succinct package created to quickly provide insights into air quality data. This post takes a look at the data from a simple loop electricity monitor and a HOBO CO2 monitor that sat on my desk for all of 2019.

A quick way to get an overview of the full year is to use the calendarPlot function in openair, but first the data needs to be read in.

##read in the office data
office_kWh <- read.csv("office.csv", stringsAsFactors = FALSE)
##read in the kitchen electricity data
kitchen_kWh <- read.csv("kitchen.csv", stringsAsFactors = FALSE)
##read in the CO2 data
CO2 <- read.csv("CO2.csv", stringsAsFactors = FALSE)

Then tell R what format the dates are in. For this we load in the ‘lubridate’ package. This is one of R’s most powerful features, allowing quick manipulation of data by date and time.

library(lubridate)
##tell R the date structure of both datasets
office_kWh$date <- ymd_hms(office_kWh$date)
kitchen_kWh$date <- dmy_hm(kitchen_kWh$date)
CO2$date <- ymd_hms(CO2$date)

The CO2 data was logged at 5 minute intervals, which is great for getting an accurate hourly average; however, in order to compare it with the other datasets we need to convert it to an hourly time series. openair has a very easy function for this: ‘timeAverage’. Because we have already defined the date and time format, R can easily manipulate the time series. The electricity data is already logged at hourly intervals, but some of the raw files used to compile the single file used here had a few duplicated days and hours, and the timeAverage function will effectively remove these duplicated hours. Conversely, if we needed the hourly data as a 30 or 15 minute time series, it can split the data down (a sketch of this follows the chunk below).

library(openair)
##average the 5 minute time series to 1 hour
CO2 <- timeAverage(CO2, "hour")
##average the time series to 1 hour to remove any duplicates
office_kWh <- timeAverage(office_kWh, "hour")
kitchen_kWh <- timeAverage(kitchen_kWh, "hour")
##give the headers unique names
names(office_kWh) <- c("date", "office_kWh")
names(kitchen_kWh) <- c("date", "kitchen_kWh")
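
As an aside, timeAverage can also go the other way: to my understanding, setting a shorter avg.time together with fill = TRUE expands hourly data into a finer series by repeating each value. The snippet below is a minimal sketch of that, not something this analysis needed.

##a sketch: expand the hourly office data into a 15 minute series,
##repeating each hourly value across its four 15 minute slots
office_15min <- timeAverage(office_kWh, avg.time = "15 min", fill = TRUE)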

Calendar Plots

As mentioned above, the calendarPlot function from openair gives a quick overview of the full year. This produces the diagram below. Straight away it is obvious that weekends are consistently the days with the lowest consumption, which is no surprise as not many people come into the office then.

##specify the column to be plotted, a title for the chart (main), a header for the key and the year to be plotted
calendarPlot(office_kWh, "office_kWh", key.header = "kWh", year = "2019", main = "Office electricity consumption (kWh)")

One of the most useful openair functions is timeVariation. This enables datasets to be broken down by time period or compared with other datasets. Below is the code and the output for the full electricity monitoring dataset grouped by year.

##plot the average electricity consumption for each year. key.columns specifies the layout of the legend/key
timeVariation(office_kWh, "office_kWh", group = "year", main = "Office Electricity Consumption (kWh)", ylab = "kWh", key.columns = 2)

Now that the data is on the same time series, the datasets can be combined to make analysis simpler. To do this we load the dplyr package, one of the most common R packages, used for data manipulation. The column common to both, “date”, is specified in the join.

library(dplyr)
##join the electricity and CO2 data
CO2_kWh <- left_join(CO2, office_kWh, by = "date")
CO2_kWh <- left_join(CO2_kWh, kitchen_kWh, by = "date")

We can use the timeVariation function again to see if electricity consumption relates to indoor CO2. Before we do this, let’s get rid of weekends and bank holidays, as these skew the daily averages. To do this we use the ‘cutData’ function in openair and the ‘holidayLONDON’ function in the package ‘timeDate’, which contains the dates of all public holidays for London (which of course are the same as Bristol’s) for past years.

##use the openair package to determine if a date is a weekend or a weekday and then keep only the weekdays
CO2_kWh <- CO2_kWh %>% 
  cutData(type = "weekend") %>% 
  filter(weekend == "weekday")

library(timeDate)
##find the bank holidays for England between 2017 and 2020.
Bank_Hols <- as.character(holidayLONDON(2017:2020))
##change the date to a character
CO2_kWh$day <- as.character(date(CO2_kWh$date))
##filter out the bank holidays days (filter using the ! will remove the specified variables)
CO2_kWh <- filter(CO2_kWh, !day %in% Bank_Hols)
##generate time variation plot. Use ylab to define y scale name and alpha for the transparency of the confidence intervals
timeVariation(CO2_kWh, "office_kWh", group = "CO2", main = "Office Electricity Consumption (kWh)", ylab = "kWh", alpha = 0.2)
## Warning: removing 1707 missing rows due to CO2

We can see that when office electricity consumption is high, CO2 levels are generally high too. As we are only using one year’s worth of data, the bins for each category are not that large, hence some spurious results (such as the peak on Tuesday afternoons when CO2 is low).
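
As a rough check on how thin those bins are (a sketch, assuming timeVariation splits a numeric group variable into four quantile bands, which is my understanding of its default), the number of valid hours in each CO2 band can be counted:

##a sketch: count the hours falling into each quartile band of CO2
CO2_bands <- cut(CO2_kWh$CO2,
                 breaks = quantile(CO2_kWh$CO2, probs = seq(0, 1, 0.25), na.rm = TRUE),
                 include.lowest = TRUE)
table(CO2_bands)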

Met Data

Another great feature of R is the ability not only to access historic weather data, but also to join it with our own datasets.

To access weather data we use the ‘worldmet’ package, also written by David Carslaw. This accesses the National Oceanic and Atmospheric Administration (NOAA) database, which logs weather data from sites all around the world.

Use the ‘getMeta’ function to bring up a map of the nearest weather stations.

library(worldmet)
library(mapview)
#use the getMeta function to find nearest weather stations to a location
met_info <- getMeta(lat = 51.45, lon = -2.5)

Now that Filton has closed down, the closest weather station to central Bristol is Lulsgate (Bristol Airport), shown on the map.

#copy and paste the site code from the map output and specify the years required. If rain data is required use the precip argument. Unfortunately precipitation data is not available for Bristol Airport from the NOAA database
IMPORT_met <- importNOAA(code = "037243-99999", year = 2019, precip = TRUE)

Once we’ve got our weather data, we can join it to our CO2 and electricity data using the ‘left_join’ function again.

##join the CO2 and electricity data with the downloaded weather data
CO2_kWh_met <- left_join(CO2_kWh, IMPORT_met, by = "date")

timeVariation(CO2_kWh_met, "office_kWh", main = "Office Electricity Consumption (kWh)", ylab = "kWh")
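
With the met data attached, consumption can also be broken down by a weather variable. The sketch below assumes the air temperature column returned by importNOAA is called air_temp (check names(IMPORT_met) if unsure); as a numeric group it will be binned automatically, just as CO2 was above.

##a sketch: break office consumption down by air temperature bands
##(air_temp is assumed to be the temperature column from importNOAA)
timeVariation(CO2_kWh_met, "office_kWh", group = "air_temp", ylab = "kWh")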

Another useful insight is how the indoor CO2 varies with wind speed and direction. The office is on an exposed bridge over the road, shown in the figure below.

Proximity of office (looking west)

The polarPlot and polarMap (openairmaps package) functions let us quickly see the variation in indoor CO2 with wind speed and direction as measured at Bristol Airport.

library(openairmaps)
##for polarMap to work correctly the latitude and longitude of the site needs to be specified
CO2_kWh_met$latitude <- 51.447726
CO2_kWh_met$longitude <- -2.596540
##define the column to be plotted (CO2), whether a key is required and the size of the plot on the map. The latter require some tweaking depending on the plot
polarMap(CO2_kWh_met, "CO2",latitude = "latitude", longitude = "longitude", key = TRUE, iconWidth = 250, iconHeight = 250, fig.width = 3, fig.height = 3)

The polar plot averages the data over the entire period, which in this case is 2019. During 2019 the office was refurbished and the single-glazed ‘leaky’ windows were replaced with tight-fitting double glazing. See the before and after shot below.

Comparison of the office in 2016 and 2019

Using the polarPlot function we can break the plot down by season to see if there is a discernible difference between the earlier part of the year and the later part.

polarPlot(CO2_kWh_met, "CO2", type = "season")
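
Seasons only approximate the change, so a more direct comparison would split the data at the refurbishment itself. The sketch below uses a purely illustrative completion date of 1 July 2019 (not the real date); because type can be any column in the data frame, a custom period column works with polarPlot.

##a sketch: split the year into before/after the refurbishment
##(1 July 2019 is an illustrative date, not the actual completion date)
CO2_kWh_met$period <- ifelse(CO2_kWh_met$date < as.POSIXct("2019-07-01", tz = "UTC"),
                             "before refurbishment", "after refurbishment")
polarPlot(CO2_kWh_met, "CO2", type = "period")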

Rainfall

A similar ‘loop’ clip was also attached to the kitchen electricity supply. I wondered if wet weather would mean more people would stop in and use the kitchen.

Rainfall data from the worldmet weather station archive is a bit hit and miss, and Bristol Lulsgate unfortunately doesn’t provide rainfall data. However, the Environment Agency has a network of rainfall monitoring stations. These are shown at https://blaise.shinyapps.io/rainfall/ (below). Hourly data since 2010 can be downloaded for each station using the link provided at the bottom of each pop-up.

<iframe height="800" width="100%" frameborder="no" src="https://blaise.shinyapps.io/rainfall/"> </iframe>

Before plotting, the rainfall data is read in, the same lubridate approach is applied to its date column and the rainfall values are binned.

##rainfall data from site 418367, Bristol Oakfield Road is imported
rainfall <- read.csv("418367.csv", stringsAsFactors = FALSE)
##date specified using lubridate
rainfall$date <- dmy_hms(rainfall$Time.stamp)
##rainfall measurements converted to numeric values
rainfall$Value.mm. <- as.numeric(as.character(rainfall$Value.mm.))
##trim the columns down to only those we are interested in, which are date and rainfall measurements
rainfall <- select(rainfall, date, Value.mm.)
##generate a new column with the rainfall bins which are defined using the 'cut' function
rainfall$rain <- cut(rainfall$Value.mm., breaks = c(-Inf, 0,1,2,10), labels = c("dry", "<1mm", "1-2mm",">2mm"))

##join electricity and rainfall data together
kitchen_rain <- left_join(CO2_kWh, rainfall, by = "date")
##plot electricity split by level of rainfall
p1 <- timeVariation(kitchen_rain, pollutant = "kitchen_kWh" , group = "rain", ci = TRUE, alpha = 0.2,
                    ylab = "kWh")
## Warning: removing 1096 missing rows due to rain

##we are only interested in the daily profile so subset this plot from the timeVariation output
p1 <- plot(p1, subset ="hour")

Plotting the electricity consumption of the kitchen split by rainfall shows there is no discernible difference between wet days and dry ones. In reality, if people bring in lunch to be cooked in the microwave or toaster, they probably don’t decide to after looking out of the window in the morning, and if they decide not to go out for lunch at the last minute they will buy from the sandwich lady instead, which requires no electricity from the kitchen.

This has been a quick overview of some of the functions available in R that can be used to broaden the analysis from a CO2 monitor and a couple of electricity monitors.