Executive summary

Question. Did the gas company over-bill Young Israel for the meter readings recorded in March through August 2025?

Answer. Yes — and the evidence is strong.

  • The customer’s expected gas consumption is well-modeled as a non-heating baseline of about 57 therms/month (hot water and cooking, measured empirically from the prior summer) plus a heating-degree-day (HDD) term. Fit on the 13 non-disputed months with a valid prior-month HDD value, the model predicts roughly 6,021 therms for the six disputed months.
  • The customer was billed 9,824 therms across that window. The excess is ~3,803 therms (95% CI: 2,188 – 5,418 therms; one-sided t-test p ≈ 0.00015).
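  • The over-billing is concentrated in two months, April 2025 (5,476 therms) and May 2025 (2,998 therms), both far above their 95% prediction intervals given the weather in those months. April is flagged by every diagnostic (prediction interval, Cook’s distance, externally studentized residuals, robust-regression weights). May falls far outside its prediction interval and receives the second-smallest robust-regression weight, although its influence on the full-data fit stays below the Cook’s-distance and studentized-residual thresholds.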
  • Three consecutive zero readings in June, July, and August 2025 are physically implausible given the customer’s own prior-summer baseline. Under the primary model the June 2025 zero is also outside its 95% PI; July and August fall inside only because the model’s HDD-driven PI is wide enough to admit them. The joint probability of three real zero months under any reasonable noise model is essentially nil.
  • The pattern is the classic signature of estimated reads followed by a “true-up” correction that overshoots. Even granting the gas company every benefit of the doubt — refitting the model treating March 2025 and the three zeros as valid — April and May 2025 still fall far outside any reasonable prediction interval.

Monthly therms (black = normal reads, red = flagged) against the HDD-model prediction (dashed line) and its 95% prediction interval (shaded band). April and May 2025 sit dramatically above the band; March 2025 sits below it; the three Jun–Aug 2025 zeros sit at the bottom of the chart (June below the band, July and August inside it) but are physically implausible (see §4).

Cumulative actual billing (solid black) versus the counterfactual model-predicted billing (dashed blue), both plotted against cumulative HDD. The model line is anchored to actual at the last pre-suspect month (Feb 2025), so the two coincide by construction up to that point and the gap that opens afterwards is exactly the cumulative over-billing. The bracket at August 2025 marks the end of the disputed window: ~3,800 therms of excess. Lines run roughly parallel after that, indicating the model resumes tracking actual billing once normal reads resume.

Recommendation. Pursue a refund or rebilling correction for the March–August 2025 window, anchored on the ~3,803-therm aggregate excess (or ~2,188 therms if the gas company demands a conservative 95% lower bound).


Data

Monthly meter reads (therms) and Boston Logan heating degree days (base 65°F).
month     therms  hdd_65f  flagged
May 2024     205    236.4  no
Jun 2024      75     24.7  no
Jul 2024      62      1.1  no
Aug 2024      43     12.8  no
Sep 2024      48     63.7  no
Oct 2024     123    287.3  no
Nov 2024     422    495.9  no
Dec 2024    1396    922.4  no
Jan 2025    2405   1121.2  no
Feb 2025    2805    952.3  no
Mar 2025    1350    707.6  yes
Apr 2025    5476    448.9  yes
May 2025    2998    236.2  yes
Jun 2025       0     61.0  yes
Jul 2025       0      2.2  yes
Aug 2025       0     20.2  yes
Sep 2025     123     55.8  no
Oct 2025     130    299.8  no
Nov 2025     654    617.3  no
Dec 2025    1311   1030.6  no

Sources. Meter reads were provided by the customer. HDD data are from degreedays.net for KBOS (Boston Logan), base temperature 65°F.

Period covered. May 2024 through December 2025 (20 monthly observations). Six months are flagged on the source spreadsheet as candidate outliers: March, April, May, June, July, and August 2025.


Methodology

We model monthly therms as the sum of a non-heating baseline (hot water + cooking) plus a linear function of heating degree days in the current and previous calendar months. The lag term captures gas burned in one calendar month that is not recorded until the following month’s meter read:

\[ \text{therms}_t \;=\; x_{\text{baseline}} + \beta_1\,\text{HDD}_t + \beta_2\,\text{HDD}_{t-1} + \varepsilon_t \]

where \(x_{\text{baseline}} = 57\) therms/month is fixed at the customer’s empirical prior-summer mean (Jun–Sep 2024), and \(\beta_1, \beta_2\) are estimated from the data. Pinning the intercept to the empirical baseline does three useful things:

  1. It encodes a physically sensible constraint: non-heating gas demand must be non-negative, and it is well measured from a four-month window in which space heating was negligible.
  2. It prevents the model from “borrowing” implausible summer credits to explain the disputed zero readings (an unconstrained OLS fit produces slightly negative summer point predictions, which would artificially inflate the over-billing estimate).
  3. It matches the natural mental model of a gas-bill auditor: total use equals fixed loads (hot water, cooking) plus weather-driven heating.

The model is fit only on the 13 non-suspect months, so the disputed months play no role in determining what “normal” looks like — they are then evaluated as held-out observations against the clean-month fit. (Fitting on the contaminated data would let a single large outlier like April 2025 drag the regression line and partly mask itself.)
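
The pinned fit can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy and the values typed in from the Data table; the variable names are ours, not the report's. Because the intercept is fixed at 57, the model reduces to a no-intercept least-squares regression of (therms − 57) on current and lagged HDD, fit on the 13 clean months only.

```python
import numpy as np

BASELINE = 57.0  # therms/month: Jun-Sep 2024 mean, held fixed rather than estimated

# Data table values, May 2024 .. Dec 2025 (20 months).
THERMS = np.array([205, 75, 62, 43, 48, 123, 422, 1396, 2405, 2805,
                   1350, 5476, 2998, 0, 0, 0, 123, 130, 654, 1311], dtype=float)
HDD = np.array([236.4, 24.7, 1.1, 12.8, 63.7, 287.3, 495.9, 922.4, 1121.2, 952.3,
                707.6, 448.9, 236.2, 61.0, 2.2, 20.2, 55.8, 299.8, 617.3, 1030.6])
SUSPECT = np.zeros(20, dtype=bool)
SUSPECT[10:16] = True  # Mar 2025 .. Aug 2025

# May 2024 has no prior-month HDD, so the 19 modelable rows start at Jun 2024.
y = THERMS[1:]
X = np.column_stack([HDD[1:], HDD[:-1]])  # columns: HDD_t, HDD_{t-1}
clean = ~SUSPECT[1:]                      # 13 non-suspect rows

# Pinned intercept: regress (therms - baseline) on the HDD columns, no constant term.
beta, rss, _, _ = np.linalg.lstsq(X[clean], y[clean] - BASELINE, rcond=None)
n, p = X[clean].shape
sigma = np.sqrt(rss[0] / (n - p))  # residual standard error on n - p = 11 df

print(f"n={n}  beta_hdd={beta[0]:.3f}  beta_lag1={beta[1]:.3f}  sigma={sigma:.0f}")
```

On these data the sketch should land on the coefficients reported in Findings §1 (about −0.014 and 2.367) with a residual σ of about 219 therms.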

Diagnostics reported below:

  • 95% prediction intervals on each suspect month — flags points whose realized value is statistically incompatible with the HDD-driven baseline.
  • Cook’s distance and externally studentized residuals (computed on the full-data fit) — quantify the influence and outlier-ness of each observation. We use a Bonferroni-corrected threshold for the studentized residuals to control the family-wise error rate across all 19 modelable months.
  • Robust regression (MASS::rlm, Huber loss) — an independent estimator that automatically down-weights outliers. If the suspect months are truly anomalous, the robust fit’s coefficients should resemble the clean-only fit’s coefficients, and the suspect-month observations should receive small weights.
  • Aggregate-window test — sums the residuals over the entire six-month disputed window and tests whether that sum is significantly different from zero. This is the appropriate framing if the gas company shifted consumption between months via estimated reads and a true-up: only the total over the window is physically meaningful.

Findings

1. The baseline-pinned fit explains about 95% of the variance

Primary model: (therms − 57) ~ 0 + HDD + lag1(HDD), fit on the 13 non-suspect months. Adj. R² = 0.963 (this value measures R² about zero, which is R’s convention for a no-intercept fit; the conventional R², measured about the mean, is about 0.95). Residual σ = 219 therms.
term      estimate  std_err  t_value  p_value
hdd65       -0.014    0.281    -0.05    0.961
hdd_lag1     2.367    0.360     6.57  4.0e-05

The previous-month HDD coefficient (≈ 2.37 therms per HDD) carries essentially all the heating signal: most of the gas burned during a given calendar month shows up on the next month’s meter read, consistent with monthly read dates trailing the calendar month. The current-month HDD coefficient is statistically indistinguishable from zero. The fit is tight (residual σ ≈ 219 therms against winter readings of 1,400–2,800 therms), and because the floor is the empirical baseline rather than wherever an unconstrained OLS intercept happens to land, the predictions are physically sensible at every point in the year.

2. Per-month prediction intervals: four of the six suspect months are outside

Suspect-month predictions against actual, with 95% prediction intervals from the primary (baseline-pinned) fit.
month     actual  predicted  pi_95           residual  outside_pi
Mar 2025    1350       2301  [1,689, 2,913]      -951  yes
Apr 2025    5476       1725  [1,149, 2,302]      3751  yes
May 2025    2998       1116  [585, 1,648]        1882  yes
Jun 2025       0        615  [110, 1,120]        -615  yes
Jul 2025       0        201  [-282, 685]         -201  no
Aug 2025       0         62  [-420, 543]          -62  no
  • April 2025: actual 5,476 therms vs. predicted 1,725 (PI [1,149, 2,302]). The bill exceeds the upper PI by ~3,174 therms.
  • May 2025: actual 2,998 vs. predicted 1,116 (PI [585, 1,648]). Exceeds the upper PI by ~1,350 therms.
  • March 2025: actual 1,350 vs. predicted 2,301 (PI [1,689, 2,913]). The bill is ~339 therms below the lower PI, i.e., March looks under-billed, consistent with a meter that began drifting before the large correction.
  • June 2025: zero reading vs. predicted 615 (PI [110, 1,120]). Outside the lower PI: the model says this month should have been at least ~110 therms.
  • July & August 2025: the zero readings fall inside the (wide) summer prediction intervals. The PI brackets themselves extend below zero because the residual SD is large relative to summer’s small heating signal, so the statistical model is uninformative here; see §4 for the physical-implausibility argument that closes the loop on these two months.
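
A sketch of how the per-month intervals can be computed, assuming NumPy and the Data table values (names are illustrative). With the baseline fixed, the only estimation error comes from the two slopes, so the half-width is t × σ × √(1 + x′(XᵀX)⁻¹x). The 95% t critical value for 11 df (2.201) is hard-coded from standard tables rather than computed.

```python
import numpy as np

BASELINE = 57.0
T975_DF11 = 2.201  # two-sided 95% Student-t critical value for 11 df (from tables)

# Data table values, May 2024 .. Dec 2025 (20 months).
THERMS = np.array([205, 75, 62, 43, 48, 123, 422, 1396, 2405, 2805,
                   1350, 5476, 2998, 0, 0, 0, 123, 130, 654, 1311], dtype=float)
HDD = np.array([236.4, 24.7, 1.1, 12.8, 63.7, 287.3, 495.9, 922.4, 1121.2, 952.3,
                707.6, 448.9, 236.2, 61.0, 2.2, 20.2, 55.8, 299.8, 617.3, 1030.6])
SUSPECT = np.zeros(20, dtype=bool)
SUSPECT[10:16] = True  # Mar 2025 .. Aug 2025
MONTHS = [f"{m} {yr}"
          for yr, names in ((2024, "May Jun Jul Aug Sep Oct Nov Dec"),
                            (2025, "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"))
          for m in names.split()]

y = THERMS[1:]
X = np.column_stack([HDD[1:], HDD[:-1]])  # HDD_t, HDD_{t-1}; May 2024 dropped (no lag)
clean = ~SUSPECT[1:]

# Fit on the clean months only.
Xc = X[clean]
beta, rss, _, _ = np.linalg.lstsq(Xc, y[clean] - BASELINE, rcond=None)
sigma = np.sqrt(rss[0] / (Xc.shape[0] - Xc.shape[1]))
xtx_inv = np.linalg.inv(Xc.T @ Xc)

# Evaluate the held-out suspect months against that fit.
Xs = X[~clean]
pred = BASELINE + Xs @ beta
quad = np.einsum("ij,jk,ik->i", Xs, xtx_inv, Xs)  # x'(X'X)^-1 x for each row
half = T975_DF11 * sigma * np.sqrt(1.0 + quad)
lower, upper = pred - half, pred + half
actual = y[~clean]
outside = (actual < lower) | (actual > upper)

suspect_months = [m for m, s in zip(MONTHS[1:], SUSPECT[1:]) if s]
for m, a, f, lo, hi, o in zip(suspect_months, actual, pred, lower, upper, outside):
    print(f"{m}: actual {a:5.0f}  predicted {f:5.0f}  PI [{lo:6.0f}, {hi:6.0f}]  outside={o}")
```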
Monthly therms with 95% prediction interval (shaded) from the clean-only HDD model. Red points: flagged months. April and May 2025 sit far above the upper PI; March and June 2025 sit below the lower PI.

3. Influence diagnostics independently flag April 2025 — and Cook’s D flags March too

Top 8 most influential observations from the full-data fit, sorted by |rstudent|. Bonferroni-corrected outlier threshold: |rstudent| > 3.60. Cook’s D threshold (rule of thumb): 4/n = 0.211.
month     therms  rstudent  cooks_d  dffits  rstud_flag  cooks_flag
Apr 2025    5476      5.72    0.846    2.75  yes         yes
Mar 2025    1350     -2.17    0.463   -1.31  no          yes
May 2025    2998      1.49    0.099    0.57  no          no
Jun 2025       0     -0.99    0.044   -0.36  no          no
Jun 2024      75     -0.97    0.050   -0.39  no          no
Feb 2025    2805     -0.78    0.100   -0.54  no          no
Dec 2024    1396      0.43    0.021    0.24  no          no
Jul 2025       0     -0.33    0.004   -0.11  no          no

April 2025 is the only point that exceeds the Bonferroni-corrected threshold for externally studentized residuals (|rstudent| = 5.72 > 3.60). It also has by far the largest Cook’s distance (0.85), confirming that even with the suspect months included in the fit, the regression cannot absorb it. March 2025 also exceeds the Cook’s-D threshold despite a more modest studentized residual, because its high-HDD position gives it above-average leverage; this is consistent with the under-billing story.
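
Both statistics follow from the hat matrix. In this sketch we assume the "full-data fit" is ordinary least squares with a free intercept on all 19 modelable months. That is our reading of the report, not something it states, though a Bonferroni cutoff of 3.60 is consistent with the 15 residual df such a fit implies. The cutoff is copied from the table above rather than recomputed; NumPy is assumed.

```python
import numpy as np

RSTUDENT_CUTOFF = 3.60  # Bonferroni cutoff quoted in the report's table, not recomputed

# Data table values, May 2024 .. Dec 2025 (20 months).
THERMS = np.array([205, 75, 62, 43, 48, 123, 422, 1396, 2405, 2805,
                   1350, 5476, 2998, 0, 0, 0, 123, 130, 654, 1311], dtype=float)
HDD = np.array([236.4, 24.7, 1.1, 12.8, 63.7, 287.3, 495.9, 922.4, 1121.2, 952.3,
                707.6, 448.9, 236.2, 61.0, 2.2, 20.2, 55.8, 299.8, 617.3, 1030.6])
MONTHS = [f"{m} {yr}"
          for yr, names in ((2024, "May Jun Jul Aug Sep Oct Nov Dec"),
                            (2025, "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"))
          for m in names.split()]

y = THERMS[1:]
months = MONTHS[1:]
# Assumed full-data specification: OLS with a free intercept on all 19 modelable rows.
X = np.column_stack([np.ones(len(y)), HDD[1:], HDD[:-1]])
n, p = X.shape

xtx_inv = np.linalg.inv(X.T @ X)
h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)  # leverages: diagonal of the hat matrix
e = y - X @ (xtx_inv @ (X.T @ y))            # OLS residuals
rss = float(e @ e)
s2 = rss / (n - p)

# Cook's distance, built from the internally studentized residuals.
r_int = e / np.sqrt(s2 * (1.0 - h))
cooks = r_int**2 * h / (p * (1.0 - h))

# Externally studentized residuals: residual variance re-estimated with row i left out.
s2_loo = (rss - e**2 / (1.0 - h)) / (n - p - 1)
rstudent = e / np.sqrt(s2_loo * (1.0 - h))

for i in np.argsort(-np.abs(rstudent))[:8]:
    print(f"{months[i]}  rstudent {rstudent[i]:6.2f}  cooks_d {cooks[i]:.3f}  "
          f"rstud_flag={abs(rstudent[i]) > RSTUDENT_CUTOFF}  cooks_flag={cooks[i] > 4 / n}")
```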

4. Three consecutive zero readings are physically impossible

The customer’s pre-disputed-window summer usage gives an empirical floor for non-heating gas demand.
statistic value
Prior-summer (Jun–Sep 2024) baseline mean (therms/mo) 57.0
Prior-summer (Jun–Sep 2024) baseline SD 14.4
Z-score of a single 0 reading vs. baseline -3.95
Pr(single zero | Normal baseline) 3.97e-05
Pr(three consecutive zeros | indep. Normal) 6.28e-14
Pr(three consecutive zeros | Poisson, rate=mean) 5.44e-75

The prior summer (June–September 2024) shows a tight baseline of ~57 therms/month for hot water and cooking. A single zero reading is ~4 SDs below that baseline. The joint probability of three independent zero readings is about 6 × 10⁻¹⁴ under a Normal model (taking Pr(reading ≤ 0), since a Normal model allows some negative noise) and about 5 × 10⁻⁷⁵ under a Poisson model with the same mean. Either figure is far beyond any reasonable rejection threshold. The customer cannot have actually consumed exactly zero therms in three consecutive summer months.
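
The figures in the table above can be reproduced with the Python standard library alone. This sketch uses the sample SD (n − 1 denominator), the Normal lower tail Pr(X ≤ 0) computed via math.erfc, and the Poisson point mass Pr(X = 0) = e^(−μ). Treating the three months as independent is the same assumption the table makes.

```python
import math
import statistics

summer_2024 = [75, 62, 43, 48]        # Jun-Sep 2024 therms, from the Data table
mu = statistics.mean(summer_2024)
sd = statistics.stdev(summer_2024)    # sample SD (n - 1 denominator)

z = (0 - mu) / sd
# Normal CDF at z via the complementary error function: Pr(X <= 0).
p_single_normal = 0.5 * math.erfc(-z / math.sqrt(2))
p_three_normal = p_single_normal ** 3  # three independent months

# Poisson with rate mu: Pr(X = 0) = exp(-mu) per month, so exp(-3 mu) for three months.
p_three_poisson = math.exp(-3 * mu)

print(f"mean {mu:.1f}, SD {sd:.1f}, z {z:.2f}")
print(f"single zero (Normal): {p_single_normal:.3g}")
print(f"three zeros (Normal): {p_three_normal:.3g}, (Poisson): {p_three_poisson:.3g}")
```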

5. Aggregate-window test: ~3,803 therms over-billed (p ≈ 0.00015)

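If the gas company shifted real consumption between months via estimated reads, the only physically meaningful quantity is the total therms delivered across the disputed window. We test the sum of the residuals over that window. Its standard error combines the month-to-month noise of six new observations (6σ²) with the uncertainty in the fitted coefficients, propagated through their full covariance matrix (the six predictions share the same estimated coefficients, so their errors are correlated):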

Aggregate-window reconciliation. The excess therms billed are statistically distinguishable from zero at any reasonable threshold.
quantity value
Sum of actual therms billed (Mar–Aug 2025) 9,824
Sum of model-predicted therms 6,021
Excess billed (actual − predicted) 3,803
Standard error of the excess 734
95% CI on the excess [2,188, 5,418]
Test statistic t = 5.18 on 11 df
One-sided p-value (H0: excess ≤ 0) 0.00015

Even the lower bound of the 95% confidence interval — ~2,188 therms — is substantial. The upper-bound prediction-interval sum across all six suspect months (9,211 therms) is still below the actual 9,824 therms billed, meaning the bill exceeds what the model would tolerate even in its most generous month-by-month interpretation.
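
A sketch of the aggregate-window calculation, assuming NumPy and the Data table values. The variance of the summed prediction error is 6σ² + aᵀ Cov(β̂) a, where a is the vector of column sums of the six suspect-month HDD rows. The 95% t critical value for 11 df (2.201) is hard-coded from tables; the one-sided p-value needs a t CDF, for example scipy.stats.t.sf if SciPy is available.

```python
import numpy as np

BASELINE = 57.0
T975_DF11 = 2.201  # two-sided 95% Student-t critical value for 11 df (from tables)

# Data table values, May 2024 .. Dec 2025 (20 months).
THERMS = np.array([205, 75, 62, 43, 48, 123, 422, 1396, 2405, 2805,
                   1350, 5476, 2998, 0, 0, 0, 123, 130, 654, 1311], dtype=float)
HDD = np.array([236.4, 24.7, 1.1, 12.8, 63.7, 287.3, 495.9, 922.4, 1121.2, 952.3,
                707.6, 448.9, 236.2, 61.0, 2.2, 20.2, 55.8, 299.8, 617.3, 1030.6])
SUSPECT = np.zeros(20, dtype=bool)
SUSPECT[10:16] = True  # Mar 2025 .. Aug 2025

y = THERMS[1:]
X = np.column_stack([HDD[1:], HDD[:-1]])  # HDD_t, HDD_{t-1}
clean = ~SUSPECT[1:]

Xc = X[clean]
n, p = Xc.shape
beta, rss, _, _ = np.linalg.lstsq(Xc, y[clean] - BASELINE, rcond=None)
sigma2 = rss[0] / (n - p)
cov_beta = sigma2 * np.linalg.inv(Xc.T @ Xc)  # full covariance of the two slopes

Xs = X[~clean]
k = Xs.shape[0]       # six disputed months
a = Xs.sum(axis=0)    # the window total depends on beta only through these column sums
predicted_total = k * BASELINE + a @ beta
actual_total = y[~clean].sum()
excess = actual_total - predicted_total

# Var(actual total - predicted total) = k * sigma^2  (noise in the six new months)
#                                     + a' Cov(beta) a (shared coefficient error).
se = np.sqrt(k * sigma2 + a @ cov_beta @ a)
t_stat = excess / se
ci = (excess - T975_DF11 * se, excess + T975_DF11 * se)

print(f"actual {actual_total:,.0f}  predicted {predicted_total:,.0f}  excess {excess:,.0f}")
print(f"SE {se:.0f}  t = {t_stat:.2f} on {n - p} df  95% CI [{ci[0]:,.0f}, {ci[1]:,.0f}]")
```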


Sensitivity checks

S1. Unconstrained OLS (intercept free to wander)

The primary model in §1 pins the non-heating intercept at the customer’s empirical prior-summer baseline (57 therms/month) instead of estimating it from the data. The natural sensitivity check is to drop that pin and let ordinary least squares estimate the intercept freely from the clean months, both because OLS is the methodologically conventional default and because we want to confirm that the headline excess doesn’t hinge on the modeling choice. If the two fits roughly agree, the baseline pinning is doing what it’s supposed to do (excluding physically impossible negative summer predictions) without distorting the substantive answer.

Aggregate excess under the two model specs. The unconstrained fit produces a larger headline number (by about 730 therms): its free intercept lands below zero, which lowers every predicted month and yields physically implausible negative summer point predictions, and those low predictions inflate the measured excess.
quantity          Baseline-pinned (primary)  Unconstrained OLS
Sum predicted     6,021                      5,293
Excess billed     3,803                      4,531
95% CI on excess  [2,188, 5,418]             [2,830, 6,231]

The two specifications agree within their confidence intervals. The baseline-pinned model is the more defensible headline — it forecloses the “you got free credit for the impossible zero readings” objection.
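
Dropping the pin only adds a column of ones to the design matrix. A minimal sketch, assuming NumPy and the Data table values, with the 10-df t critical value (2.228) hard-coded from tables:

```python
import numpy as np

T975_DF10 = 2.228  # two-sided 95% Student-t critical value for 10 df (from tables)

# Data table values, May 2024 .. Dec 2025 (20 months).
THERMS = np.array([205, 75, 62, 43, 48, 123, 422, 1396, 2405, 2805,
                   1350, 5476, 2998, 0, 0, 0, 123, 130, 654, 1311], dtype=float)
HDD = np.array([236.4, 24.7, 1.1, 12.8, 63.7, 287.3, 495.9, 922.4, 1121.2, 952.3,
                707.6, 448.9, 236.2, 61.0, 2.2, 20.2, 55.8, 299.8, 617.3, 1030.6])
SUSPECT = np.zeros(20, dtype=bool)
SUSPECT[10:16] = True  # Mar 2025 .. Aug 2025

y = THERMS[1:]
X = np.column_stack([np.ones(len(y)), HDD[1:], HDD[:-1]])  # free intercept, HDD_t, HDD_{t-1}
clean = ~SUSPECT[1:]

Xc = X[clean]
n, p = Xc.shape
beta, rss, _, _ = np.linalg.lstsq(Xc, y[clean], rcond=None)
sigma2 = rss[0] / (n - p)
cov_beta = sigma2 * np.linalg.inv(Xc.T @ Xc)

Xs = X[~clean]
pred = Xs @ beta  # nothing keeps these non-negative once the intercept is free
a = Xs.sum(axis=0)
predicted_total = pred.sum()
excess = y[~clean].sum() - predicted_total
se = np.sqrt(len(pred) * sigma2 + a @ cov_beta @ a)
ci = (excess - T975_DF10 * se, excess + T975_DF10 * se)

print(f"intercept {beta[0]:.0f}  HDD {beta[1]:.2f}  lag1 HDD {beta[2]:.2f}")
print(f"predicted {predicted_total:,.0f}  excess {excess:,.0f}  95% CI [{ci[0]:,.0f}, {ci[1]:,.0f}]")
```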

S2. Robust regression on the full sample

Eight most-down-weighted observations under MASS::rlm Huber regression on the full sample. The suspect months Apr, May, Mar, and Jun 2025 are flagged automatically without any prior knowledge of the disputed window.
month therms suspect weight
Apr 2025 5476 TRUE 0.071
May 2025 2998 TRUE 0.138
Mar 2025 1350 TRUE 0.283
Jun 2025 0 TRUE 0.528
Jun 2024 75 FALSE 0.623
Jul 2024 62 FALSE 1.000
Aug 2024 43 FALSE 1.000
Sep 2024 48 FALSE 1.000

The robust fit’s coefficients (intercept −67, HDD 0.09, lag1 HDD 2.42) are close to the unconstrained clean fit’s (intercept −97, HDD 0.21, lag1 HDD 2.31). A completely different estimator, told nothing about which months we suspected, reaches the same conclusion about which months are anomalous.
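
A rough Python analogue of the robust fit is iteratively reweighted least squares with Huber weights. This sketch assumes MASS::rlm's defaults as we understand them (Huber k = 1.345, scale re-estimated each pass as median(|residual|)/0.6745, least-squares starting values) and an intercept + HDD + lag-1 HDD design on all 19 modelable months. It is not the rlm implementation, so its weights may differ slightly from the table; NumPy is assumed.

```python
import numpy as np

HUBER_K = 1.345  # Huber tuning constant (the psi.huber default in MASS)

# Data table values, May 2024 .. Dec 2025 (20 months).
THERMS = np.array([205, 75, 62, 43, 48, 123, 422, 1396, 2405, 2805,
                   1350, 5476, 2998, 0, 0, 0, 123, 130, 654, 1311], dtype=float)
HDD = np.array([236.4, 24.7, 1.1, 12.8, 63.7, 287.3, 495.9, 922.4, 1121.2, 952.3,
                707.6, 448.9, 236.2, 61.0, 2.2, 20.2, 55.8, 299.8, 617.3, 1030.6])
MONTHS = [f"{m} {yr}"
          for yr, names in ((2024, "May Jun Jul Aug Sep Oct Nov Dec"),
                            (2025, "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"))
          for m in names.split()]

y = THERMS[1:]
months = MONTHS[1:]
X = np.column_stack([np.ones(len(y)), HDD[1:], HDD[:-1]])  # all 19 rows, suspect or not


def huber_weights(resid, scale, k=HUBER_K):
    """IRLS weights psi(u)/u for Huber loss: 1 for |u| <= k, k/|u| beyond."""
    u = np.abs(resid / scale)
    return np.where(u <= k, 1.0, k / np.maximum(u, 1e-12))


def robust_scale(resid):
    """MAD-type scale about zero, rescaled to be consistent for Normal errors."""
    return np.median(np.abs(resid)) / 0.6745


beta = np.linalg.lstsq(X, y, rcond=None)[0]  # ordinary least-squares start
for _ in range(200):
    resid = y - X @ beta
    sw = np.sqrt(huber_weights(resid, robust_scale(resid)))
    new_beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]  # weighted LS step
    done = np.max(np.abs(new_beta - beta)) < 1e-8
    beta = new_beta
    if done:
        break

resid = y - X @ beta
weights = huber_weights(resid, robust_scale(resid))
for i in np.argsort(weights)[:8]:
    print(f"{months[i]}  weight {weights[i]:.3f}")
print(f"intercept {beta[0]:.0f}  HDD {beta[1]:.2f}  lag1 HDD {beta[2]:.2f}")
```

Nothing in the loop knows which months are disputed; the down-weighting comes entirely from the size of each residual relative to the robust scale.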

S3. Refit excluding only April and May 2025

The most conservative possible reading: take March 2025 and the three summer zeros at face value, exclude only the two largest spikes from the training set, and ask whether Apr/May 2025 are still statistically extreme.

Baseline-pinned model refit excluding only Apr and May 2025 (n = 17). Adj. R^2 = 0.919.
term estimate std_err p_value
hdd65 0.536 0.334 0.129
hdd_lag1 1.502 0.391 0.002
Apr/May 2025 predictions under the conservative refit.
month actual predicted pi_95 outside_pi
Apr 2025 5476 1361 [651, 2,070] yes
May 2025 2998 858 [185, 1,531] yes

Even under this maximally charitable refit, Apr 2025 (5,476) exceeds the upper PI by ~3,406 therms and May 2025 (2,998) exceeds it by ~1,467 therms. The case for over-billing on these two months does not depend on the treatment of the other suspect observations.
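
The conservative refit is the same pinned regression with a different training mask. A sketch assuming NumPy and the Data table values, with the 15-df t critical value (2.131) hard-coded from tables:

```python
import numpy as np

BASELINE = 57.0
T975_DF15 = 2.131  # two-sided 95% Student-t critical value for 15 df (from tables)

# Data table values, May 2024 .. Dec 2025 (20 months).
THERMS = np.array([205, 75, 62, 43, 48, 123, 422, 1396, 2405, 2805,
                   1350, 5476, 2998, 0, 0, 0, 123, 130, 654, 1311], dtype=float)
HDD = np.array([236.4, 24.7, 1.1, 12.8, 63.7, 287.3, 495.9, 922.4, 1121.2, 952.3,
                707.6, 448.9, 236.2, 61.0, 2.2, 20.2, 55.8, 299.8, 617.3, 1030.6])
MONTHS = [f"{m} {yr}"
          for yr, names in ((2024, "May Jun Jul Aug Sep Oct Nov Dec"),
                            (2025, "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"))
          for m in names.split()]

y = THERMS[1:]
months = MONTHS[1:]
X = np.column_stack([HDD[1:], HDD[:-1]])  # HDD_t, HDD_{t-1}
held_out = np.array([m in ("Apr 2025", "May 2025") for m in months])
train = ~held_out  # Mar 2025 and the three summer zeros are kept as if valid

Xt = X[train]
n, p = Xt.shape
beta, rss, _, _ = np.linalg.lstsq(Xt, y[train] - BASELINE, rcond=None)
sigma = np.sqrt(rss[0] / (n - p))
xtx_inv = np.linalg.inv(Xt.T @ Xt)

Xh = X[held_out]
pred = BASELINE + Xh @ beta
half = T975_DF15 * sigma * np.sqrt(1.0 + np.einsum("ij,jk,ik->i", Xh, xtx_inv, Xh))
upper = pred + half
actual = y[held_out]

held_months = [m for m, h in zip(months, held_out) if h]
for m, a, f, hi in zip(held_months, actual, pred, upper):
    print(f"{m}: actual {a:.0f}, predicted {f:.0f}, upper PI {hi:.0f}, over PI by {a - hi:.0f}")
```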

S4. Single-predictor model (lag-1 HDD only)

Parsimonious baseline-pinned model with only lag-1 HDD (n = 13). Adj. R^2 = 0.966.
term estimate std_err p_value
hdd_lag1 2.35 0.121 2.1e-10

Dropping the (insignificant) current-month HDD term still leaves a model with R² ≈ 0.97 and the same qualitative conclusions for Apr/May 2025.
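
With one predictor and no intercept, the least-squares slope has the closed form Σxy / Σx². The sketch below assumes NumPy and the Data table values. It computes R² about zero, as R does for models without an intercept, so that it is comparable with the reported Adj. R² = 0.966.

```python
import numpy as np

BASELINE = 57.0

# Data table values, May 2024 .. Dec 2025 (20 months).
THERMS = np.array([205, 75, 62, 43, 48, 123, 422, 1396, 2405, 2805,
                   1350, 5476, 2998, 0, 0, 0, 123, 130, 654, 1311], dtype=float)
HDD = np.array([236.4, 24.7, 1.1, 12.8, 63.7, 287.3, 495.9, 922.4, 1121.2, 952.3,
                707.6, 448.9, 236.2, 61.0, 2.2, 20.2, 55.8, 299.8, 617.3, 1030.6])
SUSPECT = np.zeros(20, dtype=bool)
SUSPECT[10:16] = True  # Mar 2025 .. Aug 2025

clean = ~SUSPECT[1:]
x = HDD[:-1][clean]                # lag-1 HDD only
yc = THERMS[1:][clean] - BASELINE  # therms above the pinned baseline
n = len(yc)

slope = (x @ yc) / (x @ x)         # closed-form no-intercept least-squares slope
resid = yc - slope * x
rss = resid @ resid
se = np.sqrt(rss / (n - 1) / (x @ x))

# R-style R^2 for a no-intercept model: measured about zero, adjusted with n / (n - 1).
r2 = 1.0 - rss / (yc @ yc)
adj_r2 = 1.0 - (1.0 - r2) * n / (n - 1)

pred_apr = BASELINE + slope * HDD[10]  # Apr 2025's lag-1 HDD is March 2025 (index 10)
print(f"slope {slope:.2f} (SE {se:.3f})  adj R^2 {adj_r2:.3f}  Apr 2025 prediction {pred_apr:.0f}")
```

The April 2025 prediction from this one-variable model stays far below the 5,476 therms billed.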


Residual diagnostics

Residuals vs. fitted values, clean-only model. The flagged points are visually distinguishable from the cluster of well-behaved residuals around zero.


Caveats

  • Sample size. Twenty months of data is a modest panel; coefficient estimates carry meaningful uncertainty, which is why we report 95% intervals rather than point claims. The conclusions hold across multiple model specifications and an independent robust estimator, which is the appropriate guard against overfitting to a small sample.
  • HDD station. KBOS (Boston Logan) is used as the weather reference. If the customer’s building is materially warmer or cooler than the airport microclimate, the absolute level of predicted therms would shift, but the per-month residual pattern — and therefore the over-billing argument — would not.
  • Non-heating baseline. We pin the intercept at the customer’s empirical prior-summer mean (57 therms/month). If true baseline demand has drifted between summer 2024 and 2025, this number could be off by a few therms either way — but the magnitude of the disputed over-billing dwarfs any plausible shift. As a cross-check, the unconstrained OLS sensitivity (S1) lands in the same neighborhood.

Conclusion

Three independent lines of evidence point to over-billing during March–August 2025:

  1. Per-month tests. April and May 2025 fall outside any reasonable prediction interval on any model specification we tried, and June 2025 also fails the per-month PI test under the baseline-pinned primary model.
  2. Aggregate-window test. The total billed in the disputed window exceeds the model-implied total by ~3,803 therms, with a 95% lower bound of ~2,188 therms and a vanishingly small p-value.
  3. Physical implausibility. Three consecutive zero summer readings are incompatible with the customer’s own prior-summer baseline.

The pattern matches a meter that was reading low (or being estimated low) in March 2025, followed by one or two large “catch-up” estimated bills in April and May that substantially overshot true consumption, and then three months of further estimated reads at zero. Real metered service appears to resume from September 2025 onward, with usage tracking the HDD model cleanly through year-end.