Introduction

This report analyzes NovaBrew Coffee Roasters’ monthly revenue data from January 2007 to August 2023 (200 observations) using linear and quadratic trend models.

The goal is to identify which model better captures the long-run growth pattern and provides more reliable forecasts for decision-making, particularly for a major capital investment.

The analysis includes data preparation, exploratory data analysis, fitting models, residual diagnostics, model comparison, and forecasting.

Data Preparation

## Rows: 200
## Columns: 2
## $ Month   <mth> 2007 Jan, 2007 Feb, 2007 Mar, 2007 Apr, 2007 May, 2007 Jun, 20…
## $ Revenue <dbl> 17.8, 25.0, 32.3, 27.2, 54.2, 34.3, 14.8, 28.7, 42.6, 26.8, 32…
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    14.8   113.0   329.3   413.6   669.8  1141.3
## [1] 329.0079

Exploratory Data Analysis

Q1: Time Series Plot

The time series shows a clear upward trend. Revenue increases over time, and the growth appears to be accelerating, not constant. There is no obvious seasonal pattern.

Q2: Month-over-Month Change

The month-to-month changes are not constant and tend to increase over time. This suggests that revenue is growing faster as time goes on, so a quadratic model is more appropriate than a linear model.

Q3: Summary Statistics

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    14.8   113.0   329.3   413.6   669.8  1141.3
## [1] 329.0079

The standard deviation (329.0) is large relative to the mean (413.6), showing substantial variability in revenue over time.

Coefficient of Variation

## Coefficient of Variation: 79.6 %
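The coefficient of variation is the standard deviation expressed as a percentage of the mean. As a quick check, here is a minimal Python sketch (not the report's own R code; the variable names are illustrative) that recomputes it from the rounded summary statistics printed above.

```python
# Summary statistics as printed in the report (the mean is rounded to one decimal).
mean_revenue = 413.6
sd_revenue = 329.0079

# Coefficient of variation: standard deviation as a percentage of the mean.
cv_percent = sd_revenue / mean_revenue * 100

print(f"Coefficient of variation: {cv_percent:.1f} %")
```

Because the mean is rounded here, the result can differ from the reported 79.6 % in the last digit.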

Linear Trend Model

## Series: Revenue 
## Model: TSLM 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -110.01  -63.53  -25.80   56.27  177.67 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -142.02980   10.89674  -13.03   <2e-16 ***
## trend()        5.52831    0.09402   58.80   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 76.76 on 198 degrees of freedom
## Multiple R-squared: 0.9458,  Adjusted R-squared: 0.9456
## F-statistic:  3458 on 1 and 198 DF, p-value: < 2.22e-16

Q4: Linear Regression Equation

\[ \hat{Y}_t = -142.02980 + 5.52831t \]

  • \(b_0 = -142.02980\)
  • \(b_1 = 5.52831\)

Q5: Business Interpretation of \(b_1\)

On average, NovaBrew’s monthly revenue rises by about $5.53 thousand from one month to the next, so the company adds roughly $5,528 in monthly revenue each month.

The R² is 0.946, meaning the model explains about 94.6% of the variation in revenue. However, a high R² alone does not guarantee a good model, so residuals still need to be checked.

Residual Diagnostics: Linear Model

Q7: Residual Time Plot

The residuals display a distinct U-shaped pattern over time, starting positive, dipping negative in the middle, and rising positive again toward the end. This pattern suggests that the linear model fails to capture the curvature in the data, indicating that a quadratic trend would fit better.
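The U-shape can be reproduced with a small Python sketch. It fits a straight line by ordinary least squares to noise-free stand-in data generated from the quadratic equation reported later in Q11. The real revenue series is not used here, so this illustrates the mechanism rather than re-running the report's analysis.

```python
# Noise-free stand-in data: the report's fitted quadratic curve for t = 1..200.
# (Assumption: the real revenue series is not available here.)
n = 200
t = list(range(1, n + 1))
y = [27.00 + 0.5077 * ti + 0.02498 * ti**2 for ti in t]

# Ordinary least squares for a straight line y = b0 + b1 * t.
t_mean = sum(t) / n
y_mean = sum(y) / n
b1 = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / sum(
    (ti - t_mean) ** 2 for ti in t
)
b0 = y_mean - b1 * t_mean

# Residuals from the straight-line fit.
residuals = [yi - (b0 + b1 * ti) for ti, yi in zip(t, y)]

# Positive at both ends and negative in the middle: the U-shape.
for label, idx in [("start", 0), ("middle", n // 2 - 1), ("end", n - 1)]:
    print(f"{label:>6}: t = {t[idx]:3d}, residual = {residuals[idx]:8.2f}")
```

The slope recovered from this curved data is close to the 5.53 estimated by the linear model, which suggests that the straight line is averaging over a trend that is really bending upward.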

Q8: Residual ACF

Many of the ACF bars extend beyond the 95% confidence bounds, especially at the early lags. This implies that the residuals are not random and remain autocorrelated, showing that the linear model has not fully captured the data’s underlying time-related structure.
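For reference, the bounds on an ACF plot are approximately ±1.96/√n for white noise. The Python sketch below computes sample autocorrelations and compares them with those bounds. The residuals are a hypothetical smooth U-shaped stand-in rather than the report's actual residuals, and `sample_acf` is an illustrative helper, not a library function.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a sequence."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[i] * dev[i - k] for i in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]

# Hypothetical residuals with a smooth U-shape (stand-in, not the report's data).
n = 200
residuals = [0.025 * ((t - 100.5) ** 2 - 3333.25) for t in range(1, n + 1)]

bound = 1.96 / math.sqrt(n)  # approximate 95% bounds for white noise
acf = sample_acf(residuals, max_lag=5)

for lag, r in enumerate(acf, start=1):
    flag = "outside" if abs(r) > bound else "inside"
    print(f"lag {lag}: r = {r:.3f} ({flag} +/-{bound:.3f})")
```

With 200 observations the bounds sit near ±0.14, so a slowly varying pattern like the U-shape produces autocorrelations far outside them at the early lags.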

Q9: Residual Histogram

The residual histogram is skewed to the right and not centered at zero (the residuals range from −110.0 to 177.7 with a median of −25.8), indicating deviations from ideal random errors.

Q10: Overall Assessment

The U-shape in the residuals and significant ACF spikes indicate the linear model is misspecified. It cannot be trusted for forecasting, as it leaves systematic patterns and autocorrelation unaccounted for.

Quadratic Trend Model

## Series: Revenue 
## Model: TSLM 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -61.2796 -10.5152   0.2241  10.4967  46.9041 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.700e+01  3.667e+00   7.363 4.78e-12 ***
## trend()      5.077e-01  8.423e-02   6.027 8.06e-09 ***
## I(trend()^2) 2.498e-02  4.059e-04  61.543  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.11 on 197 degrees of freedom
## Multiple R-squared: 0.9973,  Adjusted R-squared: 0.9973
## F-statistic: 3.668e+04 on 2 and 197 DF, p-value: < 2.22e-16

Q11: Quadratic Regression Equation

\[ \hat{Y}_t = 27.00 + 0.5077t + 0.02498t^2 \]

  • \(b_0 = 27.00\)
  • \(b_1 = 0.5077\)
  • \(b_2 = 0.02498\)

Q12: Interpretation of \(b_2\)

The positive quadratic term indicates accelerating growth, with each month adding more revenue than the previous one.
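To make the acceleration concrete, the month-over-month change implied by the fitted equation in Q11 can be worked out directly from the reported coefficients (values rounded to two decimals, in the same units as Revenue):

\[ \hat{Y}_{t+1} - \hat{Y}_t = b_1 + b_2(2t + 1) = 0.5077 + 0.02498(2t + 1) \]

At \(t = 1\) the implied change is about 0.58, while at \(t = 199\), the last in-sample step, it is about 10.47. Each month’s increase exceeds the previous month’s by \(2b_2 \approx 0.05\).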

Q13: Significance of \(b_2\)

The p-value (< 2 × 10⁻¹⁶) shows the quadratic term is highly significant, confirming that the curvature in the revenue trend is real rather than noise.

Model Comparison

Fitting Both Models

Q14: Model Comparison Table

## # A tibble: 2 × 4
##   .model    r_squared adj_r_squared  AICc
##   <chr>         <dbl>         <dbl> <dbl>
## 1 quadratic     0.997         0.997 1141.
## 2 linear        0.946         0.946 1740.

The quadratic model outperforms the linear model across all key metrics. It has a higher \(R^2\) and adjusted \(R^2\), indicating it explains more variation in revenue, and a much lower AICc, showing a better balance between fit and model complexity. Overall, the quadratic model is clearly the preferred choice.
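The AICc values can be approximately recovered from the residual standard errors in the two model summaries. The Python sketch below assumes the least-squares definition AIC = n·log(SSE/n) + 2(k + 2), with the small-sample correction 2(k + 2)(k + 3)/(n − k − 3), where k is the number of predictors. Whether the report's software uses exactly this definition is an assumption, and `aicc_from_rse` is an illustrative helper.

```python
import math

def aicc_from_rse(rse, n, k):
    """AICc for a least-squares fit, from its residual standard error.

    n is the number of observations, k the number of predictors
    (excluding the intercept).
    """
    sse = rse**2 * (n - k - 1)                 # residual sum of squares
    aic = n * math.log(sse / n) + 2 * (k + 2)  # assumed least-squares AIC
    return aic + 2 * (k + 2) * (k + 3) / (n - k - 3)

n = 200
aicc_linear = aicc_from_rse(76.76, n, k=1)      # linear model summary
aicc_quadratic = aicc_from_rse(17.11, n, k=2)   # quadratic model summary

print(f"linear AICc    ~ {aicc_linear:.1f}")
print(f"quadratic AICc ~ {aicc_quadratic:.1f}")
```

Under these assumptions the results land within about one unit of the tabulated 1740 and 1141. The penalty terms differ by only a couple of units, so the gap of roughly 600 comes almost entirely from the smaller residual sum of squares.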

Q15: Quadratic Residual Diagnostics

The U-shape is no longer visible in the residuals, which suggests that the quadratic term has captured the curvature in the data.

The histogram is also centered near zero and roughly bell-shaped.

Q16: Ljung-Box Test for Both Models

## # A tibble: 2 × 3
##   .model    lb_stat lb_pvalue
##   <chr>       <dbl>     <dbl>
## 1 linear     1396.      0    
## 2 quadratic    11.3     0.334

The Ljung-Box test rejects the white-noise hypothesis for the linear model (p ≈ 0), so its residuals still contain autocorrelation and the model is not adequate. For the quadratic model the test does not reject that hypothesis (p = 0.334), so its residuals are consistent with white noise and the model is adequate.
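The link between the statistic and its p-value can be sketched in Python. The report does not state the number of lags or degrees of freedom, so the value of 10 used below is an assumption, chosen because it approximately reproduces the reported p-value. Both helper functions are illustrative, and the chi-square tail uses a closed form that is valid only for even degrees of freedom.

```python
import math

def ljung_box_stat(acf, n):
    """Ljung-Box Q from autocorrelations r_1..r_h of n residuals."""
    return n * (n + 2) * sum(r**2 / (n - k) for k, r in enumerate(acf, start=1))

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(df // 2))

# Assumption: 10 lags and 10 degrees of freedom (the report does not state them).
p_quadratic = chi2_sf_even_df(11.3, df=10)
p_linear = chi2_sf_even_df(1396.0, df=10)

print(f"quadratic: Q = 11.3, p = {p_quadratic:.3f}")
print(f"linear:    Q = 1396, p = {p_linear:.3g}")
```

The small difference from the tabulated 0.334 is expected because the statistic is entered here as the rounded value 11.3. `ljung_box_stat` shows how Q itself is built from the residual autocorrelations.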

Q17: Conceptual Note

  • Adding variables always increases R² or leaves it unchanged.

  • This is why adjusted R² and AICc are useful: they penalize unnecessary complexity.

  • Residual diagnostics such as U-shapes, ACF, and the Ljung-Box test are equally important.

  • A model with higher R² but patterned residuals is worse than a model with slightly lower R² but random residuals.
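The penalty mentioned in the second point can be written out with the standard adjusted \(R^2\) formula, where \(n\) is the number of observations and \(k\) the number of predictors (here \(n = 200\), with \(k = 1\) for the linear model and \(k = 2\) for the quadratic model):

\[ \bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} \]

Because the factor \((n - 1)/(n - k - 1)\) grows with \(k\), an extra predictor raises \(\bar{R}^2\) only if it shrinks \(1 - R^2\) by enough to offset it. For the quadratic model, \(1 - (1 - 0.9973)(199/197) \approx 0.9973\), so the penalty for the added term is negligible next to the gain in fit.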

Forecasts

Q18: 12-Month Forecast

## # A tibble: 2 × 3
##   .model       Month .mean
##   <chr>        <mth> <dbl>
## 1 linear    2024 Mar 1002.
## 2 quadratic 2024 Mar 1202.

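The table shows the forecasts for March 2024 (\(t = 207\)), where the two models already differ by about 200. The gap between the two forecasts gets larger over time because the linear model follows a straight line, while the quadratic model curves upward. As time goes on, the \(t^2\) term has a bigger impact on the forecast.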
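The widening gap can be checked by evaluating the two fitted equations from Q4 and Q11 at a few future time points. This is a minimal Python sketch using the coefficients as rounded in the report, so the results match the tabulated forecasts only up to rounding. The function names are illustrative.

```python
def linear_forecast(t):
    """Fitted linear trend from Q4."""
    return -142.02980 + 5.52831 * t

def quadratic_forecast(t):
    """Fitted quadratic trend from Q11 (coefficients as rounded in the report)."""
    return 27.00 + 0.5077 * t + 0.02498 * t**2

# t = 201 is September 2023, t = 207 is March 2024, and t = 212 is
# August 2024, the end of a 12-month horizon.
gaps = {}
for t in (201, 207, 212):
    gaps[t] = quadratic_forecast(t) - linear_forecast(t)
    print(f"t = {t}: linear {linear_forecast(t):7.1f}, "
          f"quadratic {quadratic_forecast(t):7.1f}, gap {gaps[t]:6.1f}")
```

At \(t = 207\) the two equations give roughly 1002 and 1202, in line with the table, and the gap increases at every later step.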

Q19: Extend Forecast to December 2026

September 2023 is \(t = 201\), and December 2026 is \(t = 240\), so the forecast horizon is 40 months.

## # A tibble: 3 × 2
##      Month .mean
##      <mth> <dbl>
## 1 2026 May 1501.
## 2 2026 Jun 1514.
## 3 2026 Jul 1526.
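The time-index arithmetic and the rows shown above can be checked with a short Python sketch. It assumes the table comes from the quadratic model, which the report does not state explicitly. The tabulated values do agree with the Q11 equation. `time_index` is an illustrative helper.

```python
def time_index(year, month, start_year=2007, start_month=1):
    """Trend index t, with the first observation (January 2007) at t = 1."""
    return (year - start_year) * 12 + (month - start_month) + 1

def quadratic_forecast(t):
    """Fitted quadratic trend from Q11 (coefficients as rounded in the report)."""
    return 27.00 + 0.5077 * t + 0.02498 * t**2

first = time_index(2023, 9)    # first forecast month
last = time_index(2026, 12)    # final forecast month
horizon = last - first + 1

print(f"t runs from {first} to {last}, horizon = {horizon} months")
for month in (5, 6, 7):
    t = time_index(2026, month)
    print(f"2026-{month:02d} (t = {t}): {quadratic_forecast(t):.0f}")
```

May, June and July 2026 correspond to \(t = 233\), 234 and 235, the 33rd to 35th steps of the 40-month horizon.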

Accuracy Summary

## # A tibble: 2 × 10
##   .model    .type           ME  RMSE   MAE   MPE  MAPE  MASE RMSSE   ACF1
##   <chr>     <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 linear    Training -3.41e-15  76.4  65.3 34.7  59.7  0.955 0.979  0.920
## 2 quadratic Training  2.94e-15  17.0  13.2 -2.78  9.36 0.193 0.218 -0.120
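The error measures in this table follow standard definitions based on the error e = actual − fitted. The Python sketch below applies them to made-up numbers (hypothetical, not the NovaBrew data) to show that a mean error of zero can coexist with a non-zero mean percentage error, as in the linear model's row.

```python
import math

# Hypothetical actual and fitted values (illustration only).
actual = [100.0, 200.0, 300.0, 400.0]
fitted = [110.0, 190.0, 310.0, 390.0]

errors = [a - f for a, f in zip(actual, fitted)]
n = len(errors)

me = sum(errors) / n                                   # mean error
rmse = math.sqrt(sum(e**2 for e in errors) / n)        # root mean squared error
mae = sum(abs(e) for e in errors) / n                  # mean absolute error
mpe = 100 * sum(e / a for e, a in zip(errors, actual)) / n        # mean % error
mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n  # mean abs % error

print(f"ME = {me:.2f}, RMSE = {rmse:.2f}, MAE = {mae:.2f}")
print(f"MPE = {mpe:.2f} %, MAPE = {mape:.2f} %")
```

Percentage measures weight errors on small values more heavily, which is one reason a model that fits the early, low-revenue months poorly can show a large MPE and MAPE despite a mean error of zero.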

Q20: Deployment Decision Checklist

  • Step 1: Model fit. The quadratic model performs better than the linear model: it has a lower AICc and a higher adjusted \(R^2\), meaning it fits the data better overall.

  • Step 2: Residuals. The quadratic model also passes the residual check. The Ljung-Box test gives \(p = 0.334\), which is above 0.05, so the residuals are consistent with white noise.

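  • Step 3: Bias. The training mean error is essentially zero (2.94e-15). This holds by construction for a least-squares fit with an intercept, so the small mean percentage error (−2.78%) is the more informative sign that the fitted values carry little systematic bias.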

  • Step 4: Forecast risk. Even though the quadratic model fits the data well, it may not be reliable for long-term forecasts: the \(t^2\) term keeps accelerating growth indefinitely, which actual revenue may not sustain. This is risky when making large decisions such as opening a new facility.

Conclusion

The quadratic model is recommended for short- to medium-term forecasting because it has the best statistical fit and passes the residual diagnostics. However, it should be applied with caution for long-term projections, since extrapolating a quadratic trend too far into the future can produce overstated predictions.