This report analyzes NovaBrew Coffee Roasters’ monthly revenue data from January 2007 to August 2023 (200 observations) using linear and quadratic trend models.
The goal is to identify which model better captures the long-run growth pattern and provides more reliable forecasts for decision-making, particularly for a major capital investment.
The analysis covers data preparation, exploratory data analysis, model fitting, residual diagnostics, model comparison, and forecasting.
# Data preparation
## Rows: 200
## Columns: 2
## $ Month <mth> 2007 Jan, 2007 Feb, 2007 Mar, 2007 Apr, 2007 May, 2007 Jun, 20…
## $ Revenue <dbl> 17.8, 25.0, 32.3, 27.2, 54.2, 34.3, 14.8, 28.7, 42.6, 26.8, 32…
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 14.8 113.0 329.3 413.6 669.8 1141.3
## [1] 329.0079
The time series shows a clear upward trend. Revenue increases over time,
and the growth appears to be accelerating, not constant. There is no
obvious seasonal pattern.
The month-to-month changes are not constant and tend to increase over time. This suggests that revenue is growing faster as time goes on, so a quadratic model is likely to be more appropriate than a linear model.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 14.8 113.0 329.3 413.6 669.8 1141.3
## [1] 329.0079
The standard deviation (329.0) is large relative to the mean (413.6), indicating substantial variability in revenue over the sample period.
## Coefficient of Variation: 79.6 %
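As a quick arithmetic check, the coefficient of variation is the standard deviation divided by the mean, expressed as a percentage. The sketch below uses Python purely as a calculator and plugs in the rounded summary values printed above; the function name is illustrative. With these rounded inputs the result is about 79.5%, which agrees with the reported 79.6% up to rounding of the mean.

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """Return the coefficient of variation as a percentage."""
    if mean == 0:
        raise ValueError("mean must be non-zero")
    return 100.0 * std_dev / mean


# Rounded values taken from the summary output above.
REVENUE_SD = 329.0079
REVENUE_MEAN = 413.6

cv = coefficient_of_variation(REVENUE_SD, REVENUE_MEAN)
print(f"Coefficient of variation: {cv:.1f}%")
```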
## Series: Revenue
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -110.01 -63.53 -25.80 56.27 177.67
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -142.02980 10.89674 -13.03 <2e-16 ***
## trend() 5.52831 0.09402 58.80 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 76.76 on 198 degrees of freedom
## Multiple R-squared: 0.9458, Adjusted R-squared: 0.9456
## F-statistic: 3458 on 1 and 198 DF, p-value: < 2.22e-16
\[ \hat{Y}_t = -142.02980 + 5.52831t \]
The slope implies that NovaBrew's revenue grows by about $5.53 thousand per month on average, that is, roughly $5,528 of additional revenue each month.
The R² is 0.946, meaning the model explains about 94.6% of the variation in revenue. However, a high R² alone does not guarantee a good model, so residuals still need to be checked.
The residuals display a distinct U-shaped pattern over time, starting positive, dipping negative in the middle, and rising positive again toward the end. This pattern suggests that the linear model fails to capture the curvature in the data, indicating that a quadratic trend would fit better.
Many of the ACF bars rise above the 95% significance bounds, especially at the early lags. This implies that the residuals are not random and remain autocorrelated, showing that the linear model has not fully captured the time-related structure in the data.
The residual histogram is slightly skewed and not perfectly centered, indicating deviations from ideal random errors.
The U-shape in the residuals and significant ACF spikes indicate the linear model is misspecified. It cannot be trusted for forecasting, as it leaves systematic patterns and autocorrelation unaccounted for.
## Series: Revenue
## Model: TSLM
##
## Residuals:
## Min 1Q Median 3Q Max
## -61.2796 -10.5152 0.2241 10.4967 46.9041
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.700e+01 3.667e+00 7.363 4.78e-12 ***
## trend() 5.077e-01 8.423e-02 6.027 8.06e-09 ***
## I(trend()^2) 2.498e-02 4.059e-04 61.543 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17.11 on 197 degrees of freedom
## Multiple R-squared: 0.9973, Adjusted R-squared: 0.9973
## F-statistic: 3.668e+04 on 2 and 197 DF, p-value: < 2.22e-16
\[ \hat{Y}_t = 27.00 + 0.5077t + 0.02498t^2 \]
The positive quadratic term indicates accelerating growth, with each month adding more revenue than the previous one.
The p-value for the quadratic term is below \(2 \times 10^{-16}\), so the term is highly significant, confirming that the curvature in the revenue trend is real.
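To make the accelerating growth concrete, the month-to-month increment implied by the fitted quadratic is \(\hat{Y}_{t+1} - \hat{Y}_t = 0.5077 + 0.02498(2t + 1)\), which rises with \(t\). The sketch below evaluates it with the rounded coefficients from the fitted equation; the function names are illustrative, and Python is used only for the arithmetic.

```python
# Rounded coefficients from the fitted quadratic equation above.
B0, B1, B2 = 27.00, 0.5077, 0.02498


def quadratic_fit(t: int) -> float:
    """Fitted revenue at month index t (t = 1 is January 2007)."""
    return B0 + B1 * t + B2 * t**2


def monthly_increment(t: int) -> float:
    """Change in fitted revenue from month t to month t + 1."""
    return B1 + B2 * (2 * t + 1)


for t in (1, 100, 200):
    print(f"t = {t:3d}: increment to next month = {monthly_increment(t):.2f}")
```

Under these coefficients the increment at the end of the sample (\(t = 200\)) is roughly 10.5, nearly twice the constant 5.53 per month implied by the linear model.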
## # A tibble: 2 × 4
## .model r_squared adj_r_squared AICc
## <chr> <dbl> <dbl> <dbl>
## 1 quadratic 0.997 0.997 1141.
## 2 linear 0.946 0.946 1740.
The quadratic model outperforms the linear model across all key metrics. It has a higher \(R^2\) and adjusted \(R^2\), indicating it explains more variation in revenue, and a much lower AICc, showing a better balance between fit and model complexity. Overall, the quadratic model is clearly the preferred choice.
The U-shape is no longer visible in the residuals, which suggests that the quadratic term has captured the curvature in the data.
The residual histogram is likewise centered near 0 and roughly bell-shaped.
## # A tibble: 2 × 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 linear 1396. 0
## 2 quadratic 11.3 0.334
The Ljung-Box test shows the linear model is not adequate since its residuals still have autocorrelation (p ≈ 0). In contrast, the quadratic model is adequate because its residuals behave like white noise (p = 0.334).
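The p-values follow from comparing each Ljung-Box statistic with a chi-square distribution. The sketch below assumes the test used 10 lags with no degrees-of-freedom adjustment; the report does not state this, so treat it as an assumption. It relies on the closed-form upper tail of a chi-square distribution with even degrees of freedom, so only the standard library is needed.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


ASSUMED_LAGS = 10  # assumption: not stated in the report

# Statistics taken from the Ljung-Box table above.
p_quadratic = chi2_sf_even_df(11.3, ASSUMED_LAGS)
p_linear = chi2_sf_even_df(1396.0, ASSUMED_LAGS)

print(f"quadratic: p = {p_quadratic:.3f}")
print(f"linear:    p = {p_linear:.3g}")
```

Under that assumption the quadratic statistic gives a p-value of about 0.33, in line with the reported 0.334, while the linear statistic gives a p-value that is zero for all practical purposes.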
Adding variables always increases R² or leaves it unchanged.
This is why adjusted R² and AICc are useful: they penalize unnecessary complexity.
Residual diagnostics such as U-shapes, ACF, and the Ljung-Box test are equally important.
A model with higher R² but patterned residuals is worse than a model with slightly lower R² but random residuals.
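The penalty in adjusted \(R^2\) can be written as \(\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - k - 1}\), where \(n\) is the number of observations and \(k\) the number of predictors. The sketch below applies it to the rounded \(R^2\) values from the two model summaries (\(n = 200\)); the function name is illustrative.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R-squared for n observations and k predictors (excluding the intercept)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


N_OBS = 200

# Rounded R-squared values from the model summaries.
adj_linear = adjusted_r2(0.9458, N_OBS, k=1)
adj_quadratic = adjusted_r2(0.9973, N_OBS, k=2)

print(f"linear:    adjusted R^2 = {adj_linear:.4f}")
print(f"quadratic: adjusted R^2 = {adj_quadratic:.4f}")
```

Because the inputs are rounded, the results match the reported 0.9456 and 0.9973 only to about three or four decimals. With 200 observations the penalty for one extra term is tiny, so the quadratic model's advantage survives the adjustment.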
## # A tibble: 2 × 3
## .model Month .mean
## <chr> <mth> <dbl>
## 1 linear 2024 Mar 1002.
## 2 quadratic 2024 Mar 1202.
The gap between the two forecasts gets larger over time because the linear model follows a straight line, while the quadratic model curves upward. As time goes on, the \(t^2\) term has a bigger impact on the forecast.
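The widening gap can be reproduced from the two fitted equations. The sketch below uses the rounded coefficients reported earlier and evaluates both models at \(t = 207\) (March 2024) and \(t = 240\) (December 2026); the function names are illustrative, and small differences from the software output are expected because of coefficient rounding.

```python
def linear_forecast(t: int) -> float:
    """Point forecast from the fitted linear trend."""
    return -142.02980 + 5.52831 * t


def quadratic_forecast(t: int) -> float:
    """Point forecast from the fitted quadratic trend."""
    return 27.00 + 0.5077 * t + 0.02498 * t**2


for label, t in (("2024 Mar", 207), ("2026 Dec", 240)):
    lin = linear_forecast(t)
    quad = quadratic_forecast(t)
    print(f"{label} (t = {t}): linear = {lin:.0f}, "
          f"quadratic = {quad:.0f}, gap = {quad - lin:.0f}")
```

At \(t = 207\) the two values round to 1002 and 1202, matching the forecast table, and the gap is larger at \(t = 240\) than at \(t = 207\).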
September 2023 is \(t = 201\), and December 2026 is \(t = 240\), so the forecast horizon is 40 months.
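The time index and horizon can be checked with a small helper that counts months from January 2007 (\(t = 1\)). This is a sketch for verification only; the helper name is hypothetical.

```python
START_YEAR, START_MONTH = 2007, 1  # t = 1 corresponds to January 2007
LAST_OBSERVED_T = 200              # August 2023, the final observation


def month_index(year: int, month: int) -> int:
    """Return the trend index t for a calendar month, with January 2007 as t = 1."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return (year - START_YEAR) * 12 + (month - START_MONTH) + 1


first_forecast_t = month_index(2023, 9)
last_forecast_t = month_index(2026, 12)
horizon = last_forecast_t - LAST_OBSERVED_T

print(first_forecast_t, last_forecast_t, horizon)
```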
## # A tibble: 3 × 2
## Month .mean
## <mth> <dbl>
## 1 2026 May 1501.
## 2 2026 Jun 1514.
## 3 2026 Jul 1526.
## # A tibble: 2 × 10
## .model .type ME RMSE MAE MPE MAPE MASE RMSSE ACF1
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 linear Training -3.41e-15 76.4 65.3 34.7 59.7 0.955 0.979 0.920
## 2 quadratic Training 2.94e-15 17.0 13.2 -2.78 9.36 0.193 0.218 -0.120
Step 1: Model fit. The quadratic model performs better than the linear model: it has a lower AICc and a higher adjusted \(R^2\), meaning it fits the data better overall.
Step 2: Residuals. The quadratic model also passes the residual check. The Ljung-Box test gives \(p = 0.334\), which is above 0.05, so the residuals are consistent with white noise.
Step 3: Bias. The training mean error is essentially zero, which is expected for any least-squares fit with an intercept. The small MPE (-2.78, compared with 34.7 for the linear model) likewise suggests little systematic bias in the fitted values.
Step 4: Forecast risk. Even though the quadratic model fits the data well, it may not be reliable for long-term forecasts because the \(t^2\) term makes projected revenue grow ever faster. This is risky when the forecast drives a large decision such as opening a new facility.
The quadratic model is recommended for short- to medium-term forecasting because it has the best statistical fit and passes the residual diagnostics. However, it should be applied with caution for long-term projections, since extrapolating a quadratic trend far into the future can produce overstated forecasts.