This report applies hypothesis testing to the Building Energy Benchmarking dataset to assess whether observed group differences are statistically significant or plausibly attributable to random sampling variation. Building on earlier exploratory analysis, it examines differences in Site Energy Use Intensity (Site EUI) between compliance-status groups and differences in GHG emissions intensity between older and newer buildings. The two tests are framed under different statistical frameworks: the Neyman–Pearson framework (decision-based, with a pre-chosen \(\alpha\)) and Fisher’s significance testing (evidence via a p-value). Together, the tests put the conclusions drawn from the exploratory work on firmer statistical footing.
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(janitor)
library(scales)
library(forcats)
data_path <- "Building_Energy_Benchmarking_Data__2015-Present.csv"
if (!file.exists(data_path)) data_path <- file.choose()
data <- read_csv(data_path, show_col_types = FALSE) %>%
clean_names()
glimpse(data)
## Rows: 34,699
## Columns: 46
## $ ose_building_id <dbl> 1, 2, 3, 5, 8, 9, 10, 11, 12, 13,…
## $ data_year <dbl> 2024, 2024, 2024, 2024, 2024, 202…
## $ building_name <chr> "MAYFLOWER PARK HOTEL", "PARAMOUN…
## $ building_type <chr> "NonResidential", "NonResidential…
## $ tax_parcel_identification_number <chr> "659000030", "659000220", "659000…
## $ address <chr> "405 OLIVE WAY", "724 PINE ST", "…
## $ city <chr> "SEATTLE", "SEATTLE", "SEATTLE", …
## $ state <chr> "WA", "WA", "WA", "WA", "WA", "WA…
## $ zip_code <dbl> 98101, 98101, 98101, 98101, 98121…
## $ latitude <dbl> 47.61220, 47.61307, 47.61367, 47.…
## $ longitude <dbl> -122.3380, -122.3336, -122.3382, …
## $ neighborhood <chr> "DOWNTOWN", "DOWNTOWN", "DOWNTOWN…
## $ council_district_code <dbl> 7, 7, 7, 7, 7, 7, 7, 7, 1, 1, 7, …
## $ year_built <dbl> 1927, 1996, 1969, 1926, 1980, 199…
## $ numberof_floors <dbl> 12, 11, 41, 10, 18, 2, 11, 8, 15,…
## $ numberof_buildings <dbl> 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ property_gfa_total <dbl> 88434, 103566, 956110, 61320, 175…
## $ property_gfa_buildings <dbl> 88434, 88502, 759392, 61320, 1135…
## $ property_gfa_parking <dbl> 0, 15064, 196718, 0, 62000, 37198…
## $ self_report_gfa_total <dbl> 115387, 103566, 947059, 61320, 20…
## $ self_report_gfa_buildings <dbl> 115387, 88502, 827566, 61320, 123…
## $ self_report_parking <dbl> 0, 15064, 119493, 0, 80497, 40971…
## $ energystar_score <dbl> 59, 85, 71, 50, 87, NA, 10, NA, 5…
## $ site_euiwn_k_btu_sf <dbl> 62.2, 71.9, 82.0, 87.2, 97.6, 168…
## $ site_eui_k_btu_sf <dbl> 61.7, 71.5, 81.7, 86.0, 97.1, 167…
## $ site_energy_use_k_btu <dbl> 7113958, 6330664, 67613264, 52739…
## $ site_energy_use_wn_k_btu <dbl> 7172158, 6362478, 67852608, 53463…
## $ source_euiwn_k_btu_sf <dbl> 122.9, 128.7, 171.8, 174.7, 167.6…
## $ source_eui_k_btu_sf <dbl> 121.4, 128.3, 171.5, 171.4, 167.2…
## $ epa_property_type <chr> "Hotel", "Hotel", "Hotel", "Hotel…
## $ largest_property_use_type <chr> "Hotel", "Hotel", "Hotel", "Hotel…
## $ largest_property_use_type_gfa <dbl> 115387, 88502, 827566, 61320, 123…
## $ second_largest_property_use_type <chr> NA, "Parking", "Parking", NA, "Pa…
## $ second_largest_property_use_type_gfa <dbl> NA, 15064, 117783, NA, 68009, 409…
## $ third_largest_property_use_type <chr> NA, NA, "Swimming Pool", NA, "Swi…
## $ third_largest_property_use_type_gfa <dbl> NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ electricity_k_wh <dbl> 1045040, 787838, 11279080, 796976…
## $ steam_use_k_btu <dbl> 1949686, NA, 23256386, 1389935, N…
## $ natural_gas_therms <dbl> 15986, 36426, 58726, 11648, 73811…
## $ compliance_status <chr> "Not Compliant", "Compliant", "Co…
## $ compliance_issue <chr> "Default Data", "No Issue", "No I…
## $ electricity_k_btu <dbl> 3565676, 2688104, 38484221, 27192…
## $ natural_gas_k_btu <dbl> 1598590, 3642560, 5872650, 116476…
## $ total_ghg_emissions <dbl> 263.3, 208.6, 2418.2, 190.1, 417.…
## $ ghg_emissions_intensity <dbl> 2.98, 2.36, 3.18, 3.10, 3.68, 2.8…
## $ demolished <lgl> FALSE, FALSE, FALSE, FALSE, FALSE…
df1 <- data %>%
mutate(
compliance_status = as.factor(compliance_status),
year_built = as.integer(year_built),
site_eui_k_btu_sf = as.numeric(site_eui_k_btu_sf),
ghg_emissions_intensity = as.numeric(ghg_emissions_intensity),
# Define an age-group variable for hypothesis testing
building_age_group = case_when(
year_built < 1980 ~ "Older",
year_built >= 1980 ~ "Newer",
TRUE ~ NA_character_
) %>% factor(levels = c("Older", "Newer"))
)
Do compliant buildings have a different mean Site EUI than non-compliant buildings?
This question suits the Neyman–Pearson framework: we fix \(\alpha = 0.05\) before looking at the data and then make a binary reject/fail-to-reject decision about H₀.
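For reference, the hypotheses and decision rule can be written out explicitly. The symbols \(\mu_C\) and \(\mu_{NC}\) (population mean Site EUI for compliant and non-compliant buildings) are notation introduced here for illustration, and the statistic shown is the Welch form, which is what t.test computes by default:

\[
H_0:\ \mu_C = \mu_{NC} \qquad \text{vs.} \qquad H_1:\ \mu_C \neq \mu_{NC}
\]

\[
t = \frac{\bar{x}_C - \bar{x}_{NC}}{\sqrt{s_C^2 / n_C + s_{NC}^2 / n_{NC}}}, \qquad \text{reject } H_0 \text{ if } p < \alpha .
\]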
test1_data <- df1 %>%
  filter(
    !is.na(site_eui_k_btu_sf),
    # The dataset labels this group "Not Compliant" (not "Non-Compliant")
    compliance_status %in% c("Compliant", "Not Compliant")
  ) %>%
  mutate(compliance_status = droplevels(compliance_status))
test1_data %>% count(compliance_status)
## # A tibble: 2 × 2
## compliance_status n
## <fct> <int>
## 1 Compliant 31938
## 2 Not Compliant 1486
test1_plot_df <- test1_data %>%
mutate(compliance_status = as.factor(compliance_status))
status_n <- test1_plot_df %>% count(compliance_status)
y_lim <- quantile(test1_plot_df$site_eui_k_btu_sf, c(0.02, 0.98), na.rm = TRUE)
ggplot(test1_plot_df, aes(x = compliance_status, y = site_eui_k_btu_sf, fill = compliance_status)) +
geom_boxplot(width = 0.6, outlier.alpha = 0.15) +
stat_summary(fun = median, geom = "point", size = 2, color = "black") +
coord_cartesian(ylim = y_lim) +
scale_x_discrete(labels = function(x) {
n_map <- setNames(status_n$n, status_n$compliance_status)
paste0(x, "\n(n=", n_map[x], ")")
}) +
scale_y_continuous(labels = comma) +
labs(
title = "Site EUI by Compliance Status",
subtitle = "Boxplots compare distributions; black dots show medians",
x = "Compliance Status",
y = "Site EUI (kBtu/sf)"
) +
theme_minimal(base_size = 12) +
theme(legend.position = "none")
df1 %>%
count(compliance_status, sort = TRUE)
## # A tibble: 2 × 2
## compliance_status n
## <fct> <int>
## 1 Compliant 32454
## 2 Not Compliant 2245
df1 <- df1 %>%
mutate(
compliance_status_std = compliance_status %>%
as.character() %>%
stringr::str_trim() %>%
stringr::str_to_lower()
)
df1 %>% count(compliance_status_std, sort = TRUE)
## # A tibble: 2 × 2
## compliance_status_std n
## <chr> <int>
## 1 compliant 32454
## 2 not compliant 2245
test1_data <- df1 %>%
  filter(
    # The standardized labels are "compliant" and "not compliant"
    compliance_status_std %in% c("compliant", "not compliant"),
    !is.na(site_eui_k_btu_sf)
  ) %>%
  mutate(
    compliance_status_std = factor(compliance_status_std,
                                   levels = c("compliant", "not compliant"))
  )
df1 %>%
mutate(cs = compliance_status %>% as.character() %>% stringr::str_trim()) %>%
count(cs, sort = TRUE)
## # A tibble: 2 × 2
## cs n
## <chr> <int>
## 1 Compliant 32454
## 2 Not Compliant 2245
df1 <- df1 %>%
  mutate(
    cs_raw = compliance_status %>% as.character() %>% stringr::str_trim() %>% stringr::str_to_lower(),
    # Check the negated labels first: "not compliant" also contains "compliant"
    compliance_2 = case_when(
      stringr::str_detect(cs_raw, "^no[nt][- ]*compliant") ~ "non-compliant",
      stringr::str_detect(cs_raw, "^compliant") ~ "compliant",
      TRUE ~ NA_character_
    ) %>% factor(levels = c("compliant", "non-compliant"))
  )
df1 %>% count(compliance_2, sort = TRUE)
## # A tibble: 2 × 2
## compliance_2 n
## <fct> <int>
## 1 compliant 32454
## 2 non-compliant 2245
test1_data <- df1 %>%
  filter(!is.na(site_eui_k_btu_sf), !is.na(compliance_2))
test1_data %>% count(compliance_2)
## # A tibble: 2 × 2
## compliance_2 n
## <fct> <int>
## 1 compliant 31938
## 2 non-compliant 1486
test1 <- t.test(site_eui_k_btu_sf ~ compliance_status_std, data = test1_data)
test1
##
## Welch Two Sample t-test
##
## data: site_eui_k_btu_sf by compliance_status_std
## t = 1.9372, df = 6096.7, p-value = 0.05277
## alternative hypothesis: true difference in means between group compliant and group not compliant is not equal to 0
## 95 percent confidence interval:
## -0.05415901 9.12107075
## sample estimates:
## mean in group compliant mean in group not compliant
## 56.08494 51.55148
This test evaluates whether the difference in mean Site EUI between compliant and non-compliant buildings is larger than random sampling variation alone would plausibly produce. Under the Neyman–Pearson framework, H₀ (equal means) is rejected only if the p-value falls below the pre-set \(\alpha = 0.05\). Here the Welch test gives t = 1.94 and p = 0.0528, just above \(\alpha\), so we fail to reject H₀; consistent with that decision, the 95% confidence interval for the difference in means (-0.05 to 9.12 kBtu/sf) includes zero. Compliant buildings have the higher sample mean (56.1 vs. 51.6 kBtu/sf), but the data do not establish a difference at the 5% level. This suggests that compliance status alone may not strongly separate energy intensity, and that other drivers such as building type, size, or year built may be more important.
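The fixed-\(\alpha\) decision rule can be sketched in a few lines of R. This is a minimal illustration rather than the analysis itself: the two vectors are simulated stand-ins (their means, spreads, and sizes are made-up values, not taken from the benchmarking dataset), and the names x, y, t_stat, df_w, and p_val are introduced here only for the example. The sketch computes the Welch statistic by hand and applies the reject/fail-to-reject rule, with t.test run alongside for comparison.

```r
# Simulated stand-in data (hypothetical values, not the benchmarking dataset)
set.seed(42)
x <- rnorm(200, mean = 56, sd = 40)  # stand-in for "compliant" Site EUI
y <- rnorm(60,  mean = 52, sd = 55)  # stand-in for "not compliant" Site EUI

alpha <- 0.05  # significance level fixed before looking at the data

# Welch statistic and Welch-Satterthwaite degrees of freedom, computed by hand
vx <- var(x) / length(x)
vy <- var(y) / length(y)
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
df_w   <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
p_val  <- 2 * pt(-abs(t_stat), df = df_w)  # two-sided p-value

# Same test via stats::t.test (Welch is the default, var.equal = FALSE)
res <- t.test(x, y)

# Neyman-Pearson decision: compare the p-value with the pre-set alpha
decision <- if (p_val < alpha) "reject H0" else "fail to reject H0"
print(decision)
```

The hand-computed t_stat, df_w, and p_val should agree with res$statistic, res$parameter, and res$p.value, which is a quick way to confirm what the printed test output is reporting.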
Do older buildings have a different mean GHG emissions intensity than newer buildings?
In Fisher’s approach, the emphasis is on the p-value as strength of evidence against H₀. Smaller p-values indicate stronger evidence that the observed difference is inconsistent with the null model.
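Written out, with \(\mu_{\text{Older}}\) and \(\mu_{\text{Newer}}\) denoting the population mean GHG emissions intensity of each age group (notation introduced here for illustration), the null hypothesis and the two-sided p-value are:

\[
H_0:\ \mu_{\text{Older}} = \mu_{\text{Newer}}, \qquad p = P\left(|T| \ge |t_{\text{obs}}| \,\middle|\, H_0\right),
\]

where \(T\) is the test statistic under the null model and \(t_{\text{obs}}\) is its observed value. No rejection threshold is fixed in advance; the p-value itself is reported as the measure of evidence.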
test2_data <- df1 %>%
filter(
!is.na(ghg_emissions_intensity),
!is.na(building_age_group)
)
test2_data %>% count(building_age_group)
## # A tibble: 2 × 2
## building_age_group n
## <fct> <int>
## 1 Older 16990
## 2 Newer 16832
library(dplyr)
library(forcats)
library(scales)
library(ggplot2)
# Ensure it's a factor with "Older" first (these are the levels defined in df1)
plot2_df <- test2_data %>%
  mutate(
    building_age_group = as.factor(building_age_group),
    building_age_group = fct_relevel(building_age_group, "Older", "Newer")
  )
# n per group for labeling
age_n <- plot2_df %>% count(building_age_group)
# Zoom to reduce extreme outlier compression (does NOT delete points)
y_lim <- quantile(plot2_df$ghg_emissions_intensity, c(0.02, 0.98), na.rm = TRUE)
ggplot(plot2_df, aes(x = building_age_group, y = ghg_emissions_intensity, fill = building_age_group)) +
geom_boxplot(width = 0.6, outlier.alpha = 0.15) +
stat_summary(fun = median, geom = "point", size = 2, color = "black") +
coord_cartesian(ylim = y_lim) +
scale_x_discrete(labels = function(x) {
n_map <- setNames(age_n$n, age_n$building_age_group)
paste0(x, "\n(n=", n_map[x], ")")
}) +
scale_y_continuous(labels = comma) +
labs(
title = "GHG Emissions Intensity by Building Age Group",
subtitle = "Boxplots compare distributions; black dots show medians (view zoomed to 2nd–98th percentile)",
x = "Building Age Group",
y = "GHG Emissions Intensity"
) +
theme_minimal(base_size = 12) +
theme(legend.position = "none")
# Run the test
test2 <- t.test(ghg_emissions_intensity ~ building_age_group, data = test2_data)
test2
##
## Welch Two Sample t-test
##
## data: ghg_emissions_intensity by building_age_group
## t = 2.705, df = 32356, p-value = 0.006833
## alternative hypothesis: true difference in means between group Older and group Newer is not equal to 0
## 95 percent confidence interval:
## 0.1470064 0.9205283
## sample estimates:
## mean in group Older mean in group Newer
## 1.666947 1.133180
This test asks how likely a difference in mean emissions intensity at least as large as the observed one would be if older and newer buildings did not actually differ. In Fisher’s framework the p-value is read as graded evidence against H₀, not as the probability that H₀ is true. Here p = 0.0068 (t = 2.71), which is fairly strong evidence against equal means: older buildings average 1.67 versus 1.13 for newer buildings, and the 95% confidence interval for the difference runs from 0.15 to 0.92. Because the data are observational, this is an association between building age and emissions intensity, not proof that age itself causes higher emissions. The finding still has practical significance: older buildings reflect the building codes and mechanical systems of their era, so evidence of higher emissions intensity supports prioritizing them for retrofits and efficiency improvements.
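As a rough consistency check on the output above, the confidence interval can be reconstructed from the reported means and t statistic. The standard error is not printed by t.test, so it is back-calculated here from rounded values, and 1.960 is the approximate two-sided 5% critical value for a df this large:

\[
\bar{x}_{\text{Older}} - \bar{x}_{\text{Newer}} = 1.666947 - 1.133180 = 0.533767
\]

\[
\widehat{SE} \approx \frac{0.533767}{2.705} \approx 0.1973
\]

\[
0.5338 \pm 1.960 \times 0.1973 \approx 0.5338 \pm 0.3867 = (0.1471,\ 0.9205)
\]

This agrees with the reported interval (0.1470, 0.9205) up to rounding.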
Both assessments use hypothesis testing as a standardized way to judge whether observed group differences reflect genuine patterns rather than sampling noise. Under the Neyman–Pearson fixed-\(\alpha\) decision rule, the difference in Site EUI by compliance status did not reach significance at \(\alpha = 0.05\) (p = 0.053), whereas Fisher’s significance testing found fairly strong evidence (p = 0.007) that GHG emissions intensity differs by building age. Future work could extend these analyses by (1) testing additional group definitions (e.g., by building type), (2) stratifying or controlling for confounders like floor area and use type, and (3) reporting effect sizes and confidence intervals to complement p-values.
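As one possible starting point for item (3), the sketch below computes a standardized effect size alongside a confidence interval. It assumes Cohen's d with a pooled standard deviation, which is only one of several effect-size conventions. The helper cohens_d and the vectors older and newer are hypothetical names, and the data are simulated rather than drawn from the benchmarking dataset.

```r
# Cohen's d with a pooled standard deviation (one common effect-size convention)
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Simulated stand-in data (hypothetical values, not the benchmarking dataset)
set.seed(1)
older <- rlnorm(500, meanlog = 0.2, sdlog = 0.8)
newer <- rlnorm(500, meanlog = 0.0, sdlog = 0.8)

d  <- cohens_d(older, newer)
ci <- t.test(older, newer)$conf.int  # 95% Welch CI for the difference in means

cat(sprintf("Cohen's d = %.2f; 95%% CI for difference: [%.2f, %.2f]\n",
            d, ci[1], ci[2]))
```

With samples as large as those in this report, even small differences can yield small p-values, so reporting d next to the p-value helps separate statistical significance from practical magnitude.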