── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): Month
dbl (1): U.S. Natural Gas Residential Consumption Million Cubic Feet
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
1.1 Data Inspection
We quickly inspect the structure and missingness to confirm the data was read correctly.
We regress consumption on month indicators. This mirrors an Excel regression with all months included.
reg<-lm(data =df_dummies, formula =Consumption~.)# str(reg)summary(reg)
Call:
lm(formula = Consumption ~ ., data = df_dummies)
Residuals:
Min 1Q Median 3Q Max
-206820 -24199 -1414 21833 220508
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 693832 7387 93.922 < 2e-16 ***
Month_name_Jan 172866 10398 16.625 < 2e-16 ***
Month_name_Feb 63970 10398 6.152 1.37e-09 ***
Month_name_Mar -83862 10398 -8.065 3.81e-15 ***
Month_name_Apr -296965 10398 -28.560 < 2e-16 ***
Month_name_May -457522 10398 -44.001 < 2e-16 ***
Month_name_Jun -540075 10447 -51.695 < 2e-16 ***
Month_name_Jul -567629 10447 -54.333 < 2e-16 ***
Month_name_Aug -577021 10447 -55.232 < 2e-16 ***
Month_name_Sep -565792 10447 -54.157 < 2e-16 ***
Month_name_Oct -473732 10447 -45.345 < 2e-16 ***
Month_name_Nov -270796 10447 -25.920 < 2e-16 ***
Month_name_Dec NA NA NA NA
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 53270 on 617 degrees of freedom
Multiple R-squared: 0.9612, Adjusted R-squared: 0.9605
F-statistic: 1388 on 11 and 617 DF, p-value: < 2.2e-16
3 Verifying Excel vs R Calculations
These quick checks confirm that R’s fitted values match Excel’s month-based calculations.
693832-577021# Aug R coefficient calc matches Aug Excel coefficient below (116881)
[1] 116811
693832+172866# Jan R coefficient calc matches
[1] 866698
116810+749887# Jan Excel coefficient below
[1] 866697
Excel’s Output
3.1 Plotting Actual vs Fitted Values
Finally, we plot the actual consumption against the fitted values from our regression model.
df_plot<-df%>%mutate( Month =tsibble::yearmonth(Month), Fitted =fitted(reg))ggplot(df_plot, aes(x =Month))+geom_line(aes(y =`U.S. Natural Gas Residential Consumption Million Cubic Feet`, color ="Actual"), linewidth =1)+geom_line(aes(y =Fitted, color ="Fitted"), linewidth =.8, linetype ="dashed")+labs( title ="Actual vs Fitted Residential Natural Gas Consumption", x ="Time", y ="Consumption (Million Cubic Feet)", color ="")+theme_minimal()
3.2 Excel Comparison
The figure below shows the corresponding Excel chart used for replication.
Figure 1: Excel chart.
ImportantKey Takeaway
This exercise demonstrates how a month-dummy regression reproduces Excel’s seasonal logic exactly, while providing a clearer and more extensible framework in R.
TipReplicate the regression using TSLM() from fpp3