Overview:

New York City has an ordinance known as LL84 Benchmarking which requires all building types 50,000 sqft and above to report whole building (common area + tenant) annual energy data to the City annually. LL84 Benchmarking is a way for NYC to measure building energy performance and use the data to inform policies around energy conservation and decreasing Green House Gas (CO2) Emissions.

Beginning in 2018, NYC lowered the threshold requiring buildings 25,000 sqft and above to comply; however, 2018 LL84 Due 2019 data is not publicly available yet. This analysis is based on the most recent dataset 2017 LL84 Data Due 2018.

I wanted to analyze this dataset to get a better idea of where are the ‘energy hogs’ around New York City.

In this project I analyze:

  1. Top 5 Property Types in New York City required to comply with LL84
  2. Site Energy Use Internsity (Site EUI) of Multifamily properties by Borough
  3. Site EUI by Building Era for Multifamily properties in the Bronx

Import R Libraries

This code needs methods and data tools from tidyverse.

These are all the R packages installed for this analysis:

library(tidyverse)
library(janitor)
library(dplyr)
library(ggplot2)
#library(RcolorBrewer)- R Markdown does not like this library, it gives an error messaging saying library not found causing Knitting to stop, but the package is installed.

Import the Data and Clean Column Names using janitor

I downloaded the 2017 LL84 Due 20118 submission data from NYC Open Data

Using the function read_csv() allows me to import the data from the csv and make it available in a dataFrame called LL84. I also used the function clean_names from the janitor package to make reading the columns names easier.

LL84 <-read_csv("2017_LL84_Due_2018.csv") %>% 
  janitor::clean_names()

Simplify the Data

Since the dataset has many columns, I wanted to narrow some of the data down by selectings a few columns in dataframe based on borough, building identification Number (BIN), a few building characteristics, and energy metrics.

LL84Edits <- LL84 %>%
  select(borough, nyc_building_identification_number_bin, self_reported_gross_floor_area_ft, primary_property_type_self_selected, year_built, number_of_buildings, metered_areas_energy, energy_star_score, site_eui_k_btu_ft)

Remove Duplicate Building Submissions using distinct from dplyr package

Since the City accepts multiple submissions until they begin cleaning and analyzing the data, I realized I had to remove dupilicate properties based on the Building Identification Number (BIN) which is unique to each building in NYC.

LL84Edits2 <- distinct(LL84Edits,nyc_building_identification_number_bin, .keep_all= TRUE)

Summarize by Property Type and Select Top 5 using summarize, mutate and reorder

NYC has alot of buildings and I wanted to get an idea of how many of each building type using summarize, mutate, and reorder. I decide to analyze the top five primary property types since these buildings make up a majority of NYC’s building stock required to comply with LL84 at the time.

Top5prop <- LL84Edits2 %>% 
  group_by(primary_property_type_self_selected) %>% 
  summarize(
    n_property = n()
  ) %>% 
  top_n(5, n_property) %>% 
  mutate(
    primary_property_type_self_selected = fct_reorder(primary_property_type_self_selected, n_property)
  ) %>% 
  arrange(desc(primary_property_type_self_selected))

Plot Top 5 Property Types using geom_col() from the ggplot_2 package

Using ggplot2 I am able to visualize the Top 5 property types in NYC in a bar graph using geom_col(). The Rcolorbrewer package allows me to use a color gradient based on the number of properties types.

Here we can see there are significantly more Multifamily properties than any other building type.

Top5_plot <- ggplot(Top5prop) +
  aes(x = primary_property_type_self_selected, y =n_property, fill =  primary_property_type_self_selected) +
  scale_fill_brewer(palette = "Reds", name = "Primary Property Type")+
  geom_col(position = "dodge") +
  geom_text(aes(label = n_property), vjust = -0.5, position = position_dodge(0.9), size = 2.7) +
  theme_classic(base_size= 10) +
  theme(axis.text.x=element_blank()) +
  labs(
    title = "Top 5 Property Types in NYC Required to Comply with LL84",
      subtitle = "Based on 2017 LL84 Due 2018",
      y = "Number of Properties",
      x = "",
      caption= "Source: NYC 2017 LL84 Due 2018 via Open Data"
    )

Top5_plot

Select Columns in the Dataframe based on Multifamily Buildings and Energy Performance

Site Energy Use Intensity(Site EUI) is an energy industry strandard that helps measure building performance. Site EUIs is measured in kBTU/ sqft/ year; this translates to Total Energy per sqft per year. Site EUI allows us to get a more ‘apples to apples comparison’ when comparing energy performance across various building sizes.

LL84SiteEUI<- LL84 %>%
  select(borough, primary_property_type_self_selected, self_reported_gross_floor_area_ft, year_built, site_eui_k_btu_ft, energy_star_score, fuel_oil_number_2_use_k_btu, fuel_oil_number_4_use_k_btu, fuel_oil_number_5_6_use_k_btu, diesel_number_2_use_k_btu, district_steam_use_k_btu, natural_gas_use_k_btu, electricity_use_grid_purchase_k_btu)

Select Only Multifamily Properties using filter()

MF_SiteEUI <- LL84SiteEUI %>% filter(primary_property_type_self_selected== 'Multifamily Housing') 

Remove Null Site EUI values using !is.na()

MF_SiteEUI_NoNulls <- filter(MF_SiteEUI, !is.na(site_eui_k_btu_ft))

Filter out properties < 50k sqft

I filtered out properties less than 50,000 sqft because they are not required to comply with LL84 at the time. Since they are not required to comply, the City is less likely to be checking for complete data than for properties that are required to comply with the oridinance.

MFFilter <-filter(MF_SiteEUI_NoNulls, self_reported_gross_floor_area_ft > 49999)

Cleaning the Data of Extreme Outliers

According to Energy Star Portfolio Manager,which is the U.S Government’s public energy management tool, the US National Median Site EUI for a multifamily property is 59.6 kBTU/sqft/yr. Since properties are required to self report, there are often a lot of data entry errors when it comes to adding property information and collecting whole building energy data from the utilities.

Filter Out Extreme Outliers < 400 Site EUI using

While 400 kBTU/sqft/yr is still quite high, I decided to remove outliers that were more than 6x the US National Median Site EUI because I believe those LL84 submissions would have had insufficient or inaccurate data.

SiteEUIv1 <- filter(MFFilter, site_eui_k_btu_ft <400) 

Filter Out Extreme Outliers > 0 Energy per sqft

SiteEUIv2 <-filter(SiteEUIv1, site_eui_k_btu_ft > 5)

Graph Site EUI by Borough using geom_boxplot() in ggplot2

This box plot allows me to see the range, median, and variability of Site EUIs by borough.

SiteEUI_plot <-ggplot(SiteEUIv2) +
    aes(x = borough, y= site_eui_k_btu_ft) + 
    geom_boxplot() + 
    theme_bw() +
        labs(
        title = "Site Energy Use Intensity by Borough",
        subtitle = "Mutifamily Properties",
        y = "Site EUI (kBTU/sqft/yr)", 
        x = "Borough",
        caption = "Source: NYC 2017 LL84 Due 2018 via Open Data"
  )

SiteEUI_plot

Filter to Bronx Mutlifamily Properties

Since the Bronx properties seemed to have less variance in the Site EUIs reported and the maximum is slightly more than other boroughs, I wanted to look into those properties a bit more.

BronxMF <- filter(SiteEUIv2, borough == 'Bronx')

Group Buildings by Era of Construction using mutate and case_when in the dplyr package

There are many factors that can impact building energy performance. The era that a building was constructed impacts the age of the buildings, as well as the building systems set up. For example, some older buildings still have boilers and pipes that are set up to need to burn oil, while some newer buildings are all electric.

MFbyEra <- BronxMF %>% 
mutate(year_built= case_when(
   year_built < 1939 ~ "Pre-War
   Before 1939",
   year_built < 1990 ~ "Post-War
   1939-1990",
   year_built > 1991 ~ "Modern
   1991-2018"
 )
)

Plot Site EUI by Era Built using geom_point() from gggplot_2

This plot shows there are a lot of Pre-War multifamily properties in the Bronx. This plot also shows based on the density of the points both Post-War and Pre-War buildings have Site EUIs significantly above the US National Median Site EUI of 59.6 kBTU/sqft/yr which indicates opportunities for energy efficiency measures.

Energy Efficiency measures can help decrease the cost of utility bills, improve tenant comfort, and decrease Greenhouse Gas (CO2) Emissions.

There are also outliers where people did not include the year built in their submission. This is something the City should make a required field before people are allowed to report their building for LL84 compliance.

Bronx_plot <-ggplot(MFbyEra) +
    aes(x = year_built, y= site_eui_k_btu_ft, colour = site_eui_k_btu_ft) + 
    geom_point(position= "jitter", size = 0.5, alpha = 0.85) + 
    scale_color_gradient(name = "Site EUI (kBTU/sqft/yr)", low= 'gray', high = 'red') +
    theme_classic(base_size =10) +
        labs(
        title = "Site EUI of Bronx Multifamily Properties by Construction Era",
        subtitle = "Based on 2017 LL84 Due May 2018", 
        y = "Site EUI(kBTU/sqft/yr)", 
        x = "Era Built", size =1,
        caption = "Source: NYC 2017 LL84 Due 2018 via Open Data"
  )

Bronx_plot

Future Analysis

If I had more time I would have liked to analyze fuel type based on the era of construction. Looking at fuel type can give a better understanding of which building systems are using the most energy which can lead to targetted retrofits.