The Heating Gas Consumption from 2010-2023 is from NYC Open Data. The variables I will be using for this Project are Development Name, Revenue Month, Service Start Day, Service End Date, Number of Days and Current Charges to explore how much gas does each organization need in order to maintain a their area.
To prepare the dataset for analysis, the following steps were taken:
- Missing Values: Checked for null or missing meter counts or boroughs. Rows with incomplete information were either removed or imputed as appropriate.
- Data Types: Verified and ensured all variables were in their appropriate formats: borough and rate class as categorical, meter counts as numerical.
I would like to explore the trend analysis from this dataset such: Are there noticeable peaks or dips in certain years? Is there a consistent relationship between gas consumption and costs over time? I chose this dataset specifically because in NYC, where winters can be severe and gas is the main source of heating for many households, heating expenses are a major concern. Understanding the effects on the economy and the environment is related to this issue. The analysis could contribute to energy-saving initiatives or better planning for heating infrastructure in NYC.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 230213 Columns: 25
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (20): Development Name, Borough, Account Name, Location, Meter AMR, Mete...
dbl (5): EDP, UMIS BILL ID, # days, Current Charges, Consumption (Therms)
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
library(highcharter)
Registered S3 method overwritten by 'quantmod':
method from
as.zoo.data.frame zoo
Development Name Borough Account Name Location
Length:230213 Length:230213 Length:230213 Length:230213
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
Meter AMR Meter Scope TDS # EDP
Length:230213 Length:230213 Length:230213 Min. :200.0
Class :character Class :character Class :character 1st Qu.:322.0
Mode :character Mode :character Mode :character Median :507.0
Mean :500.7
3rd Qu.:650.0
Max. :999.0
RC Code Funding Source AMP # Vendor Name
Length:230213 Length:230213 Length:230213 Length:230213
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
UMIS BILL ID Revenue Month Service Start Date Service End Date
Min. : 1755537 Length:230213 Length:230213 Length:230213
1st Qu.: 2414370 Class :character Class :character Class :character
Median : 5244017 Mode :character Mode :character Mode :character
Mean : 5721687
3rd Qu.: 8526052
Max. :12236444
# days Meter Number Estimated Current Charges
Min. :-335.00 Length:230213 Length:230213 Min. :-54559.0
1st Qu.: 29.00 Class :character Class :character 1st Qu.: 327.3
Median : 30.00 Mode :character Mode :character Median : 1176.5
Mean : 30.28 Mean : 10370.9
3rd Qu.: 31.00 3rd Qu.: 6121.9
Max. : 761.00 Max. :982291.3
NA's :9
Rate Class Bill Analyzed Consumption (Therms) ES Commodity
Length:230213 Length:230213 Min. : -1744 Length:230213
Class :character Class :character 1st Qu.: 583 Class :character
Mode :character Mode :character Median : 2157 Mode :character
Mean : 20164
3rd Qu.: 12564
Max. :2351667
Underlying Utility
Length:230213
Class :character
Mode :character
These cleaning procedures are essential for preserving data integrity and preventing mistakes during modeling and visualization.
Quantitative Variables
Distribution of Consumption (Therms)
ggplot(data, aes(x =`Consumption (Therms)`)) +geom_histogram(fill ="steelblue", color ="black", bins =30) +labs(title ="Distribution of Gas Consumption (Therms)", x ="Therms", y ="Frequency") +theme_minimal()
The histogram shows the distribution of gas consumption (in therms) among developments and also shows the frequency of specific consumption levels. By emphasizing basic patterns, dispersion, and potential skewness, it highlights outliers or irregularities and offers information on typical consumption ranges. Interesting patterns include gaps and clusters that display atypical usage patterns, skewness that shows most projects consume either low or large volumes of gas, and peaks that represent normal consumption levels. Surprises might happen if the distribution exhibits extreme outliers, unexpected homogeneity, or consumption levels that differ from economic or seasonal expectations.
Categorical Variables
Count of Developments by Borough
ggplot(data, aes(x = Borough, fill = Borough)) +geom_bar() +labs(title ="Number of Developments by Borough", x ="Borough", y ="Count") +theme_classic() +scale_fill_brewer(palette ="Set3") +theme(axis.text.x =element_text(angle =45, hjust =1))
This bar graph uses the Borough variable to aggregate and color-code the number of developments in each NYC borough. The number of developments is displayed on the y-axis, while the x-axis depicts the boroughs. Colors from the “Set3” palette and slanted axis labels for readability are used in the visualization’s simple, classic look. This graph illustrates the variations in the quantity of developments in each borough, which may indicate intriguing trends like a surprisingly uniform distribution or one borough having noticeably more developments than others. Unexpected outcomes might include boroughs with fewer projects than predicted or distinct regional patterns associated with the density of development.
I summarize the boroughs with the greatest average gas use along with a comparison of their total charges. Larger developments or higher energy demands may be represented by boroughs with high average consumption. Inefficiencies can be found by comparing average consumption to total charges. For example, a borough with high consumption but relatively low prices may be the result of price differentials or discounts.
I filtered to only include projects with gas usage over 1000 therms. Only focuses on significant or energy-intensive developments. Helps in identifying instances where energy use is noticeably higher than usual.
1st Visualization (For Grading)
ggplot(summary_data, aes(x =reorder(Borough, -Average_Consumption), y = Average_Consumption, fill = Borough)) +geom_bar(stat ="identity") +geom_text(aes(label =round(Average_Consumption, 2)), vjust =-0.5, size =2.5) +labs(title ="Average Gas Consumption by Borough",x ="Borough", y ="Average Consumption (Therms)",caption ="Sources: NYC Open Data") +scale_fill_brewer(palette ="Set3") +theme_minimal() +theme(axis.text.x =element_text(angle =35, hjust =1))
Observation
This bar graph displays the typical gas usage in each of NYC’s boroughs. Borough with the highest gas consumption:With an average gas usage of 43,420.44 therms, Queens has the greatest gas consumption. Notable Levels of Consumption: The next two largest users are Manhattan (21,500.56 therms) and Staten Island (27,574.83 therms). Minimal Utilization: Significantly lower gas consumption is shown by FHA (193.43 therms) and Non-Development Facility (1,348.22 therms), which may indicate that these facilities are either minor or anomalies. Middle Range: At 20,487.24 therms and 17,553.77 therms, respectively, the Bronx and Brooklyn have moderate levels of typical gas consumption.
Compared to other boroughs, Queens has a significantly higher level of consumption. There are many single-family houses in Queens, and these residences often require more gas for heating than apartment complexes. This might be the reason why Queens uses the most gas. On the other hand, cities such as Manhattan often have a greater number of apartment buildings, which frequently share heating systems (centralized systems) and use gas better per unit. It is noticeable how much less gas is used in the boroughs compared to the FHA or non-development facilities.Because non-development facilities and FHA are probably smaller, less energy-intensive establishments, such administrative buildings, temporary dwelling units, or underutilized government buildings, they use a lot less gas.
Borough-specific building styles, infrastructure, and population density may all have an impact on changes in gas use.
Statistic Analysis
Predicting Charges based on Consumption
model <-lm(`Current Charges`~`Consumption (Therms)`+ Borough, data = data)summary(model)
Call:
lm(formula = `Current Charges` ~ `Consumption (Therms)` + Borough,
data = data)
Residuals:
Min 1Q Median 3Q Max
-464464 -1357 -430 1525 745612
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.344e+03 4.218e+01 31.861 < 2e-16 ***
`Consumption (Therms)` 5.140e-01 4.272e-04 1203.182 < 2e-16 ***
BoroughBROOKLYN -2.908e+03 5.289e+01 -54.993 < 2e-16 ***
BoroughFHA -1.336e+03 1.641e+02 -8.143 3.87e-16 ***
BoroughMANHATTAN 4.847e+01 5.818e+01 0.833 0.405
BoroughNON DEVELOPMENT FACILITY -1.328e+03 1.930e+02 -6.882 5.90e-12 ***
BoroughQUEENS -1.912e+03 1.070e+02 -17.868 < 2e-16 ***
BoroughSTATEN ISLAND -3.972e+03 1.579e+02 -25.153 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 9981 on 230205 degrees of freedom
Multiple R-squared: 0.8654, Adjusted R-squared: 0.8654
F-statistic: 2.114e+05 on 7 and 230205 DF, p-value: < 2.2e-16
Current Charges are positively and significantly impacted by gas use. This makes obvious because increasing gas use naturally results in higher costs. Borough Differences: The majority of other boroughs, including Brooklyn, Queens, Staten Island, FHA, and Non-Development Facility, have far cheaper fees than the reference borough, which is probably Manhattan. The biggest fee drop (-3,972) is seen on Staten Island, with Brooklyn coming in second. Although the baseline costs in Manhattan seem to be the highest, this difference is not statistically significant.
The Adjusted R-squared is 0.8654 indicates that the predictors Borough and Consumption (Therms) account for 86.54% of the variation in Current Charges. This number is really high, suggesting that the model fits the data quite well.The high Adjusted R-squared indicates that even when using numerous borough categories, the model does not overfit.
`summarise()` has grouped output by 'Rate Class'. You can override using the
`.groups` argument.
print(summarized_rate_data)
# A tibble: 72 × 3
`Rate Class` Borough Meter_Count
<chr> <chr> <int>
1 120 Del&Sup,Res,Gen Use FHA 1
2 140 Del&Sup,Res,Home Heat FHA 1
3 158 Del,Com,Multidwel,Heat QUEENS 1
4 177 Del,Com,Heat QUEENS 1
5 250 Gas Non Resid General Use QUEENS 1
6 GS2 BRONX 2
7 GS2 MANHATTAN 1
8 GS2 QUEENS 1
9 GS2 General Non-Residential BRONX 1
10 GS2 General Non-Residential MANHATTAN 1
# ℹ 62 more rows
Data is grouped by Borough and Rate Class to illustrate the distinct meter counts. This enables the evaluation of the spatial and rate-type concentrations of gas consumption activities.The code is necessary to efficiently summarize and organize the data, producing a clear and useful dataset. This summary is a fundamental step in comprehending patterns of heating gas usage since it may guide decision-making, resource allocation, and additional analysis or visualizations.
`summarise()` has grouped output by 'Borough'. You can override using the
`.groups` argument.
hc <-hchart( heatmap_data, type ="heatmap", hcaes(x = Borough, y =`Rate Class`, value = Meter_Count)) %>%hc_title(text ="Heatmap: Meter Count Distribution by Borough and Rate Class") %>%hc_xAxis(title =list(text ="Borough")) %>%hc_yAxis(title =list(text ="Rate Class")) %>%hc_colorAxis(minColor ="#ffffcc", # Light color for low valuesmaxColor ="#ff0000", # Dark color for high valuesstops =list(list(0, "#ffffcc"),list(0.5, "#f03b20"),list(1, "#bd0026") ) ) %>%hc_tooltip(pointFormat ="<b>Borough:</b> {point.x}<br><b>Rate Class:</b> {point.y}<br><b>Meter Count:</b> {point.value}" ) %>%hc_add_annotation(labels =list(list(point =list(x =0.5, y =-0.1, xAxis =0, yAxis =0),text ="Sources: NYC Open Data",style =list(color ="gray", fontSize ="12px") ) ) )hc
High Concentration in Particular Rate Classes: In boroughs like the Bronx, Brooklyn, and Queens, some rate classes, such “TRMDHDF” and “T2-2 TRAN GENERAL,” display greater meter counts. Throughout several boroughs, the rate class “TRMDHDF” is very noticeable. Staten Island and Non-Development Facilities have extremely few rate classes with high meter counts. This indicates either a lower concentration of structures or fewer meters measured in certain regions. Queens and Manhattan show distinct patterns, with deeper shades clustered in specific rate classes, most likely due to differences in infrastructure or home heating demands. Manhattan has a wide yet low-intensity distribution, which might indicate a larger density of meters scattered across more buildings.Overall, the Bronx and Brooklyn appear to have more high-meter count clusters than the other boroughs, which might indicate denser growth and bigger structures. The heatmap clearly displays differences in meter count distributions between boroughs and rate classes. It identifies boroughs like the Bronx and Brooklyn as having larger numbers of meters, while other locations, such as Staten Island, have less. Patterns in various rate classes demonstrate how infrastructure or use rates vary around the city, offering useful information for resource planning and energy management.
Background Bibliography
Climate, building types, and population density are some of the elements that influence energy usage in New York State, which includes New York City. Natural gas is the main source of heating for New York’s residential structures, accounting for a sizeable amount of energy use, according to the U.S. Energy Information Administration (EIA). The paper points out that because older buildings are more common in urban regions like Manhattan and Brooklyn and require more energy to heat, household gas usage is greater there.When heating demands are at their highest during the winter, these places often see higher demand. On the other hand, average consumption is often lower in more suburban boroughs like Queens and Staten Island. Cold winters and other seasonal conditions make gas use even worse, particularly in places with high population densities where heating is necessary (U.S. EIA, n.d.). This site offers a thorough analysis of New York State’s energy consumption trends, with an emphasis on gas use in residential structures. It provides useful information on energy sources, consumption patterns, and the ways that urban infrastructure and climate affect regional differences in gas use.
Source: U.S. Energy Information Administration. (n.d.). State Energy Profile: New York. U.S. Department of Energy. Retrieved from https://www.eia.gov/state/print.php?sid=NY