Introduction

This report evaluates linear and quadratic trend regression models for NovaBrew Coffee Roasters’ monthly revenue data. The objective is to compare the two models and identify the one that most appropriately captures the long-run growth pattern for forecasting and for informing a major capital investment decision.

The analysis includes data preparation, exploratory data analysis, model fitting, residual diagnostics, model comparison, and forecast generation. The final recommendation is based on both statistical performance and practical considerations for deployment.

The dataset contains 200 monthly observations of NovaBrew’s revenue from January 2007 to August 2023.

Data Setup

## Rows: 200
## Columns: 2
## $ Month   <mth> 2007 Jan, 2007 Feb, 2007 Mar, 2007 Apr, 2007 May, 2007 Jun, 20…
## $ Revenue <dbl> 17.8, 25.0, 32.3, 27.2, 54.2, 34.3, 14.8, 28.7, 42.6, 26.8, 32…
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    14.8   113.0   329.3   413.6   669.8  1141.3
## [1] 329.0079

Exploratory Data Analysis

Q1: Time Series Plot

The plot shows a strong upward trend in monthly revenue. Growth appears to accelerate slightly over time, suggesting that a nonlinear trend may be present.

Q2: Month-over-Month Change

The month-over-month changes fluctuate around a roughly constant level near zero, with no clear long-run pattern. Changes that are stable on average are what a linear trend implies, so a linear trend is a reasonable initial model, although some curvature is still possible.

Q3: Summary Statistics

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    14.8   113.0   329.3   413.6   669.8  1141.3
## [1] 329.0079

Coefficient of Variation

## Coefficient of Variation: 79.6 %
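As a rough check, the coefficient of variation can be reproduced from the summary output, assuming the unlabeled value 329.0079 is the sample standard deviation \(s\) of revenue (the report does not label it):

\[ \text{CV} = \frac{s}{\bar{Y}} \times 100\% \approx \frac{329.0}{413.6} \times 100\% \approx 79.5\% \]

The small difference from the reported 79.6% is presumably because the mean printed above is rounded to one decimal place.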

Linear Trend Model

## Series: Revenue 
## Model: TSLM 
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -110.01  -63.53  -25.80   56.27  177.67 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -142.02980   10.89674  -13.03   <2e-16 ***
## trend()        5.52831    0.09402   58.80   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 76.76 on 198 degrees of freedom
## Multiple R-squared: 0.9458,  Adjusted R-squared: 0.9456
## F-statistic:  3458 on 1 and 198 DF, p-value: < 2.22e-16

Q4: Linear Regression Equation

\[ \hat{Y}_t = -142.02980 + 5.52831t \]

  • \(b_0 = -142.02980\)
  • \(b_1 = 5.52831\)

Q5: Business Interpretation of \(b_1\)

NovaBrew’s monthly revenue increases by about $5.53 thousand per month on average. In other words, under the linear model each month adds roughly $5,528 to the previous month's revenue.

The model has \(R^2 = 0.9458\), meaning time explains about 94.6% of the variation in revenue. That is very high, but a high \(R^2\) alone does not guarantee that the model is correctly specified.

Residual Diagnostics: Linear Model

Q7: Residual Time Plot

The residuals form a clear U-shape over time: positive at the beginning, negative in the middle, and positive again near the end. This indicates that the linear model is missing curvature, so a quadratic trend is likely more appropriate.
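The following sketch illustrates why a straight line fitted to convex data leaves U-shaped residuals. It is not the report's R code: it generates noise-free synthetic values from the quadratic equation reported later in this document (an assumption made purely for illustration) and fits a line by ordinary least squares in plain Python.

```python
# Noise-free illustration: synthetic data from the report's quadratic equation,
# fitted with a straight line by ordinary least squares.
B0, B1, B2 = 27.00, 0.5077, 0.02498   # quadratic coefficients quoted in the report
N = 200                               # number of monthly observations

t = list(range(1, N + 1))
y = [B0 + B1 * ti + B2 * ti ** 2 for ti in t]   # synthetic, not the real revenue data

# Closed-form simple linear regression: slope = Sxy / Sxx.
t_bar = sum(t) / N
y_bar = sum(y) / N
sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
sxx = sum((ti - t_bar) ** 2 for ti in t)
slope = sxy / sxx
intercept = y_bar - slope * t_bar

residuals = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]

print(f"fitted line: {intercept:.2f} + {slope:.4f} t")
for month in (1, 50, 100, 150, 200):
    print(f"t = {month:3d}  residual = {residuals[month - 1]:8.2f}")
```

The residuals are positive at both ends and negative in the middle, and the fitted slope comes out close to the 5.53 estimated by the linear model, which is consistent with the linear fit averaging over an accelerating trend.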

Q8: Residual ACF

Most of the ACF bars fall outside the 95% confidence bands, especially at early lags. This shows that the residuals are not white noise and still contain autocorrelation. In other words, the linear model has not captured all of the time structure in the data.

Q9: Residual Histogram

The histogram is not perfectly centered at zero and is not fully bell-shaped. It shows some asymmetry and spread, which suggests that the residuals are not ideal random errors.

Q10: Overall Assessment

Taken together, the U-shape in the residual plot and the significant ACF spikes show that the linear model is misspecified. I would not trust this linear model for forecasting because it leaves systematic pattern and autocorrelation in the residuals, which means important information is still being missed.

Quadratic Trend Model

## Series: Revenue 
## Model: TSLM 
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -61.2796 -10.5152   0.2241  10.4967  46.9041 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.700e+01  3.667e+00   7.363 4.78e-12 ***
## trend()      5.077e-01  8.423e-02   6.027 8.06e-09 ***
## I(trend()^2) 2.498e-02  4.059e-04  61.543  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17.11 on 197 degrees of freedom
## Multiple R-squared: 0.9973,  Adjusted R-squared: 0.9973
## F-statistic: 3.668e+04 on 2 and 197 DF, p-value: < 2.22e-16

Q11: Quadratic Regression Equation

\[ \hat{Y}_t = 27.00 + 0.5077t + 0.02498t^2 \]

  • \(b_0 = 27.00\)
  • \(b_1 = 0.5077\)
  • \(b_2 = 0.02498\)

Q12: Interpretation of \(b_2\)

The positive quadratic coefficient means NovaBrew’s growth is accelerating over time. Each new month adds slightly more revenue than the month before, so the business is not only growing, but growing faster.
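The acceleration can be made concrete by differencing the fitted quadratic equation. Using the rounded coefficients from the model output, the fitted month-over-month change is:

\[ \hat{Y}_{t+1} - \hat{Y}_t = b_1 + b_2(2t + 1) = 0.5077 + 0.02498(2t + 1) \]

At \(t = 1\) this is about \(0.58\), and at \(t = 199\) it is about \(10.47\), in the same units as revenue. Consecutive monthly increments differ by \(2b_2 \approx 0.05\), which is the sense in which each month adds slightly more than the one before.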

Q13: Significance of \(b_2\)

The p-value for \(I(\text{trend}()^2)\) is less than \(2 \times 10^{-16}\), so the quadratic term is statistically significant. This provides very strong evidence that the curvature in the revenue trend is real.

Model Comparison

Fitting Both Models

Q14: Model Comparison Table

## # A tibble: 2 × 4
##   .model    r_squared adj_r_squared  AICc
##   <chr>         <dbl>         <dbl> <dbl>
## 1 quadratic     0.997         0.997 1141.
## 2 linear        0.946         0.946 1740.

The quadratic model performs better on all three metrics. It has a higher \(R^2\), a higher adjusted \(R^2\), and a much lower AICc than the linear model.

Based on this table, the quadratic model is strongly preferred: it explains more variation and has a substantially lower AICc (about 1141 versus 1740, a difference of roughly 600).
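The AICc values can be approximately reconstructed from the training RMSE values reported in the Accuracy Summary. The sketch below assumes the common least-squares form \(\text{AIC} = n \ln(\text{SSE}/n) + 2(k + 2)\) with the usual small-sample correction, where \(k\) counts predictors excluding the intercept. Whether the software used here applies exactly this form is an assumption; the function name is hypothetical.

```python
import math

def aicc_from_rmse(rmse: float, n: int, k: int) -> float:
    """AICc for a least-squares fit; k counts predictors, intercept excluded.

    Assumed form: AIC = n*ln(SSE/n) + 2(k+2), AICc = AIC + 2(k+2)(k+3)/(n-k-3).
    """
    sse = n * rmse ** 2                      # because RMSE = sqrt(SSE / n)
    aic = n * math.log(sse / n) + 2 * (k + 2)
    return aic + 2 * (k + 2) * (k + 3) / (n - k - 3)

# Training RMSE values as printed in the report (rounded to one decimal place).
linear_aicc = aicc_from_rmse(76.4, n=200, k=1)
quadratic_aicc = aicc_from_rmse(17.0, n=200, k=2)

print(f"linear AICc    ~ {linear_aicc:.1f}")
print(f"quadratic AICc ~ {quadratic_aicc:.1f}")
print(f"difference     ~ {linear_aicc - quadratic_aicc:.1f}")
```

Because the RMSE inputs are rounded, the results agree with the table only to within about one unit. The point of the sketch is that the extra penalty for the quadratic term is tiny compared with the improvement in fit.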

Q15: Quadratic Residual Diagnostics

The quadratic term has absorbed the curvature, so the U-shape no longer appears in the residuals.

The histogram is also approximately centered at zero (the median residual is 0.22) and is roughly bell-shaped.

Q16: Ljung-Box Test for Both Models

## # A tibble: 2 × 3
##   .model    lb_stat lb_pvalue
##   <chr>       <dbl>     <dbl>
## 1 linear     1396.      0    
## 2 quadratic    11.3     0.334

The Ljung-Box test indicates that the linear model is inadequate because its residuals still show significant autocorrelation \((p \approx 0)\).

In contrast, the quadratic model appears adequate since its residuals are consistent with white noise \((p = 0.334)\).
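The p-values follow from comparing each Ljung-Box statistic with a chi-square distribution. The report does not state the number of lags, so the sketch below assumes 10 lags with no degrees-of-freedom adjustment, an assumption chosen because it reproduces the reported \(p = 0.334\). It uses the closed-form chi-square tail that holds for even degrees of freedom; the helper name is hypothetical.

```python
import math

def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for chi-square; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

ASSUMED_LAG = 10  # assumption: not stated in the report

p_quadratic = chi2_sf_even_df(11.3, ASSUMED_LAG)    # statistic from the table
p_linear = chi2_sf_even_df(1396.0, ASSUMED_LAG)     # statistic from the table

print(f"quadratic p-value ~ {p_quadratic:.3f}")
print(f"linear p-value    ~ {p_linear:.3g}")
```

Under this assumption the quadratic p-value lands near the reported 0.334, and the linear p-value is indistinguishable from zero.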

Q17: Conceptual Note

  • \(R^2\) always increases, or stays the same, when variables are added to a model.

  • This is why adjusted \(R^2\) and AICc are useful: they penalize unnecessary complexity.

  • Residual diagnostics such as U-shapes, ACF, and the Ljung-Box test are equally important, because they reveal misspecification that fit statistics alone can hide.

  • A model with higher \(R^2\) but patterned residuals is worse than a model with slightly lower \(R^2\) and random residuals.
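The penalty in adjusted \(R^2\) can be written out explicitly. With \(n = 200\) observations and \(k\) predictors (excluding the intercept), the standard definition is:

\[ \bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} \]

For the quadratic model (\(k = 2\)), \(1 - 0.0027 \times 199/197 \approx 0.9973\). For the linear model (\(k = 1\)), \(1 - 0.0542 \times 199/198 \approx 0.9455\), which matches the reported 0.9456 up to rounding of \(R^2\). With 200 observations the penalty for one extra term is very small, so the diagnostics carry most of the weight in this comparison.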

Forecasts

Q18: 12-Month Forecast

## # A tibble: 2 × 3
##   .model       Month .mean
##   <chr>        <mth> <dbl>
## 1 linear    2024 Mar 1002.
## 2 quadratic 2024 Mar 1202.

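The table shows the March 2024 forecasts (\(t = 207\)), where the quadratic forecast already exceeds the linear one by about 200. The gap between the two forecasts widens over time because the linear model extrapolates a straight line, while the quadratic model curves upward.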

As the forecast horizon increases, the \(t^2\) term becomes more influential.
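A minimal sketch of how the gap grows with the horizon, using the rounded coefficients from the two model summaries. The function names are hypothetical, and results may differ slightly from the software output because of coefficient rounding.

```python
def linear_trend(t: int) -> float:
    """Linear point forecast from the report's rounded coefficients."""
    return -142.02980 + 5.52831 * t

def quadratic_trend(t: int) -> float:
    """Quadratic point forecast from the report's rounded coefficients."""
    return 27.00 + 0.5077 * t + 0.02498 * t ** 2

LAST_OBSERVED_T = 200  # August 2023 is the final observation

gaps = {}
for h in (1, 7, 12):                      # months ahead
    t = LAST_OBSERVED_T + h
    gaps[h] = quadratic_trend(t) - linear_trend(t)
    print(f"h = {h:2d} (t = {t}): linear = {linear_trend(t):7.1f}, "
          f"quadratic = {quadratic_trend(t):7.1f}, gap = {gaps[h]:6.1f}")
```

The row for \(h = 7\) corresponds to March 2024 and reproduces the table values to the nearest unit.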

Q19: Extend Forecast to December 2026

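September 2023 is \(t = 201\), and December 2026 is \(t = 240\), so the forecast horizon is 40 months. The rows shown below are an excerpt covering May to July 2026 (\(t = 233\) to \(235\)); the values match the quadratic equation evaluated at those months.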

## # A tibble: 3 × 2
##      Month .mean
##      <mth> <dbl>
## 1 2026 May 1501.
## 2 2026 Jun 1514.
## 3 2026 Jul 1526.
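The mapping from calendar month to trend index, and the forecasts in the table, can be checked with a short sketch. It assumes January 2007 is \(t = 1\), as implied by the indices quoted above, and uses the rounded quadratic coefficients; the helper names are hypothetical.

```python
def month_index(year: int, month: int, start_year: int = 2007) -> int:
    """Trend index with January of start_year as t = 1 (assumed convention)."""
    return (year - start_year) * 12 + month

def quadratic_trend(t: int) -> float:
    """Quadratic point forecast from the report's rounded coefficients."""
    return 27.00 + 0.5077 * t + 0.02498 * t ** 2

# Months ahead from the last observation (August 2023) to December 2026.
horizon = month_index(2026, 12) - month_index(2023, 8)
print("forecast horizon in months:", horizon)

for year, month in [(2026, 5), (2026, 6), (2026, 7), (2026, 12)]:
    t = month_index(year, month)
    print(f"{year}-{month:02d}  t = {t}  forecast = {quadratic_trend(t):.1f}")
```

The May to July 2026 values agree with the table after rounding. The December 2026 value is only as accurate as the rounded coefficients allow.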

Accuracy Summary

## # A tibble: 2 × 10
##   .model    .type           ME  RMSE   MAE   MPE  MAPE  MASE RMSSE   ACF1
##   <chr>     <chr>        <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 linear    Training -9.12e-15  76.4  65.3 34.7  59.7  0.955 0.979  0.920
## 2 quadratic Training  1.54e-15  17.0  13.2 -2.78  9.36 0.193 0.218 -0.120
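The training RMSE values are consistent with the residual standard errors in the model summaries. The two differ only in the divisor: the residual standard error \(s_e\) divides the sum of squared residuals by the degrees of freedom \(n - k - 1\), while RMSE divides by \(n\):

\[ \text{RMSE} = s_e \sqrt{\frac{n - k - 1}{n}} \]

For the linear model, \(76.76 \times \sqrt{198/200} \approx 76.4\). For the quadratic model, \(17.11 \times \sqrt{197/200} \approx 17.0\). Both match the table.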

Q20: Deployment Decision Checklist

  • Step 1: Model fit. The quadratic model is better than the linear model because it has the lower AICc and the higher adjusted \(R^2\), so it explains the data more effectively without relying only on raw fit.

  • Step 2: Residuals. The quadratic model also passes the residual check because the Ljung-Box result had \(p = 0.334\), which is above 0.05, so the residuals are consistent with white noise.

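  • Step 3: Bias. The quadratic model's training-set mean error is essentially zero (ME ≈ 1.54e-15), so there is no systematic in-sample bias. Because a least-squares fit with an intercept always has zero mean residual, this confirms the fit is well behaved but does not by itself establish that future forecasts are unbiased.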

  • Step 4: Forecast risk. Even though the quadratic model fits best, quadratic forecasts can become unrealistic far into the future. This is a concern for capital decisions, especially beyond the observed 2007–2023 range.

Recommendation

The quadratic model should be used for short- to medium-term forecasting because it provides the best statistical fit and passes residual diagnostics.

However, it should be applied cautiously for long-range forecasting, since extrapolated quadratic trends may become unrealistic over time.

References

  • Regression Trend Code from the file Regression_Trend

  • Data from NovaBrew_Revenue.csv

  • R Markdown formatting guidance: Perplexity reference

  • YAML code is taken from the above source to format and clean the report - Line 14:47