1 Problem Framing

1.1 Sustainability Challenge

The central sustainability challenge examined in this analysis is the escalating and irregular global deployment of solar photovoltaic (PV) electricity generation between 1990 and 2024 and whether the current trajectory is sufficient to meet international decarbonisation commitments by 2030 and 2035. The variable of interest is total world solar electricity generation (Terawatt-hours, TWh), sourced from the Energy Institute’s Statistical Review of World Energy 2025 (74th edition), specifically the “Solar Generation – TWh” panel of the EI-Stats-Review-ALL-data.xlsx dataset.

The series documents a transformation of historic proportions: from a negligible 0.39 TWh in 1990, a technology confined to satellites and demonstration projects, to 2,111 TWh in 2024, representing a compound annual growth rate (CAGR) of approximately 28% sustained over three decades (Energy Institute, 2025). To place this in physical context, 2,111 TWh is sufficient to power the entire United Kingdom’s electricity demand approximately five times over. Yet despite this extraordinary growth, solar electricity accounted for only 6.8% of global electricity generation in 2024 (Energy Institute, 2025), underscoring the magnitude of further scaling required.

This is not a diffuse “renewable energy” question. The specific, time-bounded research question driving this analysis is: “What will global solar electricity generation (TWh) be in 2030 and 2035, given the historical growth trajectory from 1990–2024, and does this trajectory align with the IEA Net Zero Emissions by 2050 scenario, which requires approximately 14,000 TWh of solar generation by 2030?” (IEA, 2023). The gap between the model’s central forecast and that target constitutes a quantifiable policy shortfall, a measure of how much additional intervention is required.

1.2 Forecasting Value and Policy Relevance

Time series forecasting of solar generation serves distinct decision needs across at least four stakeholder classes, each operating on different timescales and requiring different levels of precision:

Stakeholder Decision Horizon Specific Decision What the Forecast Answers
National transmission operators (e.g., ENTSO-E, India POSOCO) 5–10 years Grid infrastructure investment — storage, interconnectors, demand flexibility How much variable generation must the grid accommodate by 2030?
UNFCCC / IPCC Working Group III 2030–2035 NDC (Nationally Determined Contribution) assessment Is the 1.5°C pathway’s solar requirement on track?
Development finance institutions (World Bank, ADB) 7–15 years Capital allocation for emerging market solar programmes Where is the gap between trajectory and climate targets?
Energy Institute / IEA analysts Annual Revision of decarbonisation scenario benchmarks Has the growth rate accelerated or decelerated relative to prior projections?

The forecast produced here directly addresses the first two groups. As Heffron and McCauley (2018) argue, energy justice and transition governance require that policy decisions be grounded in credible quantitative projections, not merely qualitative narratives. A Bayesian framework is particularly well-suited here because it produces full posterior predictive distributions, communicating not just what is expected but how uncertain that expectation is, which is essential for robust infrastructure planning under deep uncertainty (Lempert et al., 2003).

1.3 Known Structural Changes and External Drivers

The solar generation time series is not a stationary stochastic process; it has been repeatedly reshaped by external interventions and technological discontinuities. Three structural phases are identifiable and will influence modelling choices:

Phase 1 — The Feed-in Tariff Era (2000–2012): The German Erneuerbare-Energien-Gesetz (EEG, 2000) and its counterparts in Spain, Italy, and Japan established guaranteed above-market prices for solar electricity, triggering the first wave of utility-scale deployment. Global generation accelerated from 1 TWh (2000) to 102 TWh (2012). Ironically, the tariff schemes were so successful that several were curtailed — Germany retroactively reduced tariffs in 2012, Spain effectively eliminated them in 2013 (Mir-Artigues & del Río, 2016) — creating a temporary deceleration in European additions.

Phase 2 — Chinese Manufacturing Dominance and Grid Parity (2013–2019): China’s state-supported scaling of polysilicon and module manufacturing drove solar panel prices from above $2.00/watt-peak (Wp) in 2010 to below $0.25/Wp by 2019 (BloombergNEF, 2023), achieving grid parity in most markets without subsidies. This technological shift — analogous to the cost learning curves documented by Wright’s Law (Wright, 1936; Lafond et al., 2018) — qualitatively changed the growth mechanism from policy-dependent to market-driven. It constitutes a structural break in the price-deployment relationship, though the log-linear growth rate in the generation series remained surprisingly stable across this transition.

Phase 3 — Energy Security and Post-COVID Acceleration (2020–2024): Russia’s invasion of Ukraine in February 2022 triggered a European energy security crisis that dramatically accelerated solar commitments (Hoolohan et al., 2023). In the United States, the Inflation Reduction Act (IRA, August 2022) provided an estimated $370 billion in clean energy tax credits, the largest climate investment in US history (Bistline et al., 2023). Simultaneously, China nearly doubled its solar installations in 2023–2024 alone, accounting for 57% of global new additions in 2024 (Energy Institute, 2025). The net effect is an upward shift in the growth rate — the log-scale series shows a slight steepening after 2022 — which the model must accommodate.

These three phases motivate the choice of a log-linear AR(1) trend model: the exponential growth mechanism is stable enough across phases to warrant a single trend coefficient, but the AR(1) component captures the momentum and mean-reversion of policy-driven deviations around that trend.

Key references: Energy Institute (2025); IEA (2023); Mir-Artigues & del Río (2016); Bistline et al. (2023); BloombergNEF (2023); Heffron & McCauley (2018); Lempert et al. (2003).


2 Exploratory Data Analysis

2.1 Data Loading

library(tidyverse)
library(readxl)
library(tseries)
library(forecast)
library(patchwork)
library(zoo)

solar_sheet <- read_excel(
  "EI-Stats-Review-ALL-data.xlsx",
  sheet     = "Solar Generation - TWh",
  col_names = FALSE
)

header_row    <- as.character(unlist(solar_sheet[3, ]))
year_cols     <- which(header_row %in% as.character(1990:2024))
year_cols     <- year_cols[1:35]   # enforce exactly 35 years
world_row_idx <- which(solar_sheet[[1]] == "Total World")
solar_vals    <- as.numeric(unlist(solar_sheet[world_row_idx, year_cols]))

solar_raw <- data.frame(year = 1990:2024, solar_twh = solar_vals) |>
  mutate(
    log_solar = log(solar_twh),
    year_c    = year - mean(year),
    decade    = cut(year, breaks = c(1989,1999,2009,2019,2024),
                    labels = c("1990s","2000s","2010s","2020s"))
  )

cat(sprintf(
  "Source: EI Statistical Review of World Energy 2025\nObservations: %d  |  Range: %.2f – %.1f TWh  |  Log-range: %.2f – %.2f\n",
  nrow(solar_raw),
  min(solar_raw$solar_twh), max(solar_raw$solar_twh),
  min(solar_raw$log_solar),  max(solar_raw$log_solar)
))
## Source: EI Statistical Review of World Energy 2025
## Observations: 35  |  Range: 0.39 – 2111.7 TWh  |  Log-range: -0.95 – 7.66
glimpse(solar_raw)
## Rows: 35
## Columns: 5
## $ year      <int> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, …
## $ solar_twh <dbl> 0.3882499, 0.5052229, 0.4665791, 0.5566775, 0.5969829, 0.638…
## $ log_solar <dbl> -0.94610595, -0.68275561, -0.76232769, -0.58576916, -0.51586…
## $ year_c    <dbl> -17, -16, -15, -14, -13, -12, -11, -10, -9, -8, -7, -6, -5, …
## $ decade    <fct> 1990s, 1990s, 1990s, 1990s, 1990s, 1990s, 1990s, 1990s, 1990…

2.2 Time Series Plot

theme_solar <- function(base_size = 12) {
  theme_minimal(base_size = base_size) %+replace%
    theme(
      plot.title       = element_text(face = "bold", colour = "#6B3F12",
                                      family = "serif", size = base_size + 1),
      plot.subtitle    = element_text(colour = "#7A6A55", size = base_size - 1),
      plot.background  = element_rect(fill = "#FEFAF4", colour = NA),
      panel.background = element_rect(fill = "#FDF6EA", colour = NA),
      panel.grid.major = element_line(colour = "#EDE0C8", linewidth = 0.4),
      panel.grid.minor = element_blank(),
      axis.title       = element_text(colour = "#6B3F12", face = "bold"),
      axis.text        = element_text(colour = "#7A6A55")
    )
}

p_raw <- ggplot(solar_raw, aes(x = year, y = solar_twh)) +
  geom_area(fill = "#E8A020", alpha = 0.18) +
  geom_line(colour = "#C47A1B", linewidth = 1.0) +
  geom_point(size = 2.0, colour = "#6B3F12", alpha = 0.8) +
  geom_vline(xintercept = 2010, linetype = "dashed",
             colour = "#2E7D6A", linewidth = 0.7) +
  annotate("text", x = 2011, y = 1900,
           label = "Grid-parity\ntipping point",
           colour = "#2E7D6A", size = 3, hjust = 0, fontface = "italic") +
  geom_vline(xintercept = 2019, linetype = "dashed",
             colour = "#8B4513", linewidth = 0.7) +
  annotate("text", x = 2014, y = 1550,
           label = "China-led\nacceleration",
           colour = "#8B4513", size = 3, hjust = 0, fontface = "italic") +
  scale_y_continuous(labels = scales::comma) +
  labs(title    = "Figure 1A: Global Solar Power Generation (Raw), 1990–2024",
       subtitle = "Strongly right-skewed exponential growth; two structural acceleration phases",
       x = "Year", y = "Solar Generation (TWh)") +
  theme_solar()

p_log <- ggplot(solar_raw, aes(x = year, y = log_solar)) +
  geom_line(colour = "#C47A1B", linewidth = 1.0) +
  geom_point(size = 2.0, colour = "#6B3F12", alpha = 0.8) +
  geom_smooth(method = "lm", se = TRUE, colour = "#E8A020",
              linetype = "dashed", linewidth = 0.9, fill = "#F5E0B5") +
  labs(title    = "Figure 1B: Log-Transformed Solar Generation, 1990–2024",
       subtitle = "Near-linear trend on log scale; confirms underlying exponential growth",
       x = "Year", y = "log(Solar Generation, TWh)") +
  theme_solar()

p_raw / p_log

2.2.1 Interpretation of the Raw and Log-Transformed Time Series

The time series plot is the foundational diagnostic tool in any temporal analysis which reveals trend, volatility structure, potential outliers and gross non-stationarity before any formal tests are applied. Annotating structural breaks (2010, 2019) directly on the plot contextualises the formal modelling choices that follow.

The raw solar generation series on its original TWh scale is displayed in the graph. The almost vertical slope after 2019 is the visual signature of exponential growth and the absolute increment added each year grows larger even as the percentage growth rate remains roughly constant. This is precisely the type of series that violates the stationarity assumptions of classical time series models (Box et al., 2015).

To understand the exponential structure intuitively: if a town’s population doubles every 10 years, plotting raw population produces a J-curve identical in shape to the graph. Plotting the logarithm of population produces a straight line and that straight line is what we see in second graph. The log-transform is a change of scale that makes the underlying growth mechanism visible.

The second graph’s near-linear log-scale series confirms two critical modelling decisions: (1) a Gaussian linear trend model is appropriate on the log scale, satisfying the normality assumption required by `brms (2) the slight steepening visible after 2022 (consistent with Phase 3 structural change) is modest enough that a single trend coefficient remains reasonable, rather than requiring a piecewise or changepoint model. The log-transformation also stabilises variance. Take note of the residual scatter in the first graph increasing dramatically with level, while the second graph shows a roughly constant scatter around the trend line (heteroscedasticity converted to homoscedasticity).

2.3 Trend Decomposition

For annual data (frequency = 1), classical decompose() is not applicable. We instead extract the smooth trend via LOESS and compute residuals directly.

solar_ts_log <- ts(solar_raw$log_solar, start = 1990, frequency = 1)

trend_fit  <- loess(log_solar ~ year, data = solar_raw, span = 0.75)
trend_vals <- predict(trend_fit)
resid_vals <- solar_raw$log_solar - trend_vals

decomp_df <- data.frame(
  year     = solar_raw$year,
  observed = solar_raw$log_solar,
  trend    = trend_vals,
  residual = resid_vals
)

p_obs <- ggplot(decomp_df, aes(x = year, y = observed)) +
  geom_line(colour = "#C47A1B", linewidth = 0.9) +
  labs(title = "Observed (log TWh)", x = NULL, y = "log TWh") +
  theme_solar(10)

p_tr <- ggplot(decomp_df, aes(x = year, y = trend)) +
  geom_line(colour = "#E8A020", linewidth = 0.9) +
  labs(title = "LOESS Trend Component", x = NULL, y = "log TWh") +
  theme_solar(10)

p_re <- ggplot(decomp_df, aes(x = year, y = residual)) +
  geom_col(fill = "#6B3F12", alpha = 0.6) +
  geom_hline(yintercept = 0, linetype = "dashed", colour = "#7A6A55") +
  labs(title = "Remainder / Noise", x = "Year", y = "Residual") +
  theme_solar(10)

(p_obs / p_tr / p_re) +
  plot_annotation(
    title    = "Figure 2: Trend Decomposition — log(Solar Generation)",
    subtitle = "Annual data: LOESS trend + residuals (no seasonal component applicable)",
    theme    = theme(
      plot.title    = element_text(face = "bold", colour = "#6B3F12",
                                   family = "serif", size = 12),
      plot.subtitle = element_text(colour = "#7A6A55", size = 10),
      plot.background = element_rect(fill = "#FEFAF4", colour = NA)
    )
  )

Classical STL decomposition (Cleveland et al., 1990) requires time series with a seasonal period — typically monthly or quarterly data. For annual data such as this, there is no seasonal component to estimate. We instead decompose using LOESS (Locally Estimated Scatterplot Smoothing), which fits a flexible polynomial trend to the data locally, isolating the smooth trend from year-to-year noise without imposing a parametric form. This is methodologically consistent with the approach recommended for annual environmental time series (Mudelsee, 2019). The decomposition plot helps us separate the steady signal from the noise, allowing us to understand if there is a pattern amidst change.

The LOESS trend is near-monotonic on the log scale, confirming exponential growth. Residuals are larger in 2008–2012 (European feed-in tariff surges) and again around 2022–2024 (the energy security acceleration). These temporal clusters in residuals depict autocorrelation, meaning the errors are not independent across years.

This decomposition directly motivates the inclusion of an AR(1) autocorrelation term in the Bayesian model. If residuals were pure white noise, a simple linear trend model would suffice. The clustered residuals indicate that knowing whether last year’s solar generation was above or below trend genuinely helps predict whether this year’s will be, the defining characteristic of an autoregressive process (Hamilton, 1994). The decomposition therefore provides visual, pre-modelling justification for the AR(1) specification before any formal ACF analysis.

2.4 ACF and PACF Analysis

solar_ts_log_d1 <- diff(solar_ts_log, differences = 1)

par(mfrow = c(2, 2), mar = c(4, 4, 3, 1),
    bg = "#FEFAF4", col.main = "#6B3F12", col.axis = "#7A6A55")

acf(solar_ts_log,    lag.max = 15, main = "ACF: log(Solar) — Original",
    col = "#C47A1B", lwd = 2)
pacf(solar_ts_log,   lag.max = 15, main = "PACF: log(Solar) — Original",
    col = "#E8A020", lwd = 2)
acf(solar_ts_log_d1, lag.max = 15, main = "ACF: First-Differenced log(Solar)",
    col = "#2E7D6A", lwd = 2)
pacf(solar_ts_log_d1,lag.max = 15, main = "PACF: First-Differenced log(Solar)",
    col = "#6B3F12", lwd = 2)

par(mfrow = c(1, 1))

The ACF and PACF serve as the primary model identification tools here. Original log series: The ACF decays slowly and geometrically across all 15 lags, with nearly every bar exceeding the 95% confidence bounds. This is not the pattern of an MA or bounded AR process, both would show sharp cutoffs. The PACF shows one dominant spike at lag 1, then drops to near-zero at all subsequent lags. Geometric ACF decay paired with a single PACF spike is the canonical identification signature of an AR(1) process (Box et al., 2015, p. 64). The near-unity magnitude of the lag-1 PACF spike further suggests the AR(1) coefficient is close to 1, indicating non-stationarity.

First-differenced log series: After one difference, both ACF and PACF fall inside the confidence bounds beyond lag 1 and remain there. The ACF does not show the slow tail of an MA(q) process. The PACF does not show multiple significant spikes that would indicate AR(2) or higher. This combined pattern — sharp cutoff in both functions after lag 1, unambiguously identifies AR(1) residuals on a log-linear trend, implemented in brms as autocor = cor_ar(~year, p=1). An MA(1) would show the reverse (sharp ACF cutoff, geometric PACF decay); AR(2) would require a second significant PACF spike. Neither is observed. The evidence points to one model class only.

Series ACF PACF Model Class
log(Solar) — Original Slow geometric decay — all lags significant Dominant lag-1 spike, then negligible Non-stationary; AR(1) component with unit root
First-differenced log(Solar) Cuts off sharply after lag 1 Cuts off sharply after lag 1 AR(1) on differences → AR(1) trend model on log scale

2.5 Stationarity Tests

adf_orig  <- adf.test(solar_ts_log,    k = 2)
adf_diff  <- adf.test(solar_ts_log_d1, k = 1)
kpss_orig <- kpss.test(solar_ts_log,    null = "Trend")
kpss_diff <- kpss.test(solar_ts_log_d1, null = "Level")

cat(sprintf(
"STATIONARITY TESTS — log(Solar Generation, TWh)

ADF Test — Original log series:
  Dickey-Fuller statistic : %.4f
  p-value                 : %.4f
  Conclusion              : %s

ADF Test — First-differenced log series:
  Dickey-Fuller statistic : %.4f
  p-value                 : %.4f
  Conclusion              : %s

KPSS Test (trend null) — Original log series:
  KPSS statistic : %.4f
  p-value        : %.4f
  Conclusion     : %s

KPSS Test (level null) — First-differenced log series:
  KPSS statistic : %.4f
  p-value        : %.4f
  Conclusion     : %s
",
  adf_orig$statistic,  adf_orig$p.value,
  ifelse(adf_orig$p.value < 0.05, "STATIONARY (reject H0)", "NON-STATIONARY (fail to reject H0)"),
  adf_diff$statistic,  adf_diff$p.value,
  ifelse(adf_diff$p.value < 0.05, "STATIONARY (reject H0)", "NON-STATIONARY (fail to reject H0)"),
  kpss_orig$statistic, kpss_orig$p.value,
  ifelse(kpss_orig$p.value > 0.05, "TREND-STATIONARY (fail to reject H0)", "NOT TREND-STATIONARY (reject H0)"),
  kpss_diff$statistic, kpss_diff$p.value,
  ifelse(kpss_diff$p.value > 0.05, "LEVEL-STATIONARY (fail to reject H0)", "NOT LEVEL-STATIONARY (reject H0)")
))
## STATIONARITY TESTS — log(Solar Generation, TWh)
## 
## ADF Test — Original log series:
##   Dickey-Fuller statistic : -2.5266
##   p-value                 : 0.3686
##   Conclusion              : NON-STATIONARY (fail to reject H0)
## 
## ADF Test — First-differenced log series:
##   Dickey-Fuller statistic : -1.3510
##   p-value                 : 0.8244
##   Conclusion              : NON-STATIONARY (fail to reject H0)
## 
## KPSS Test (trend null) — Original log series:
##   KPSS statistic : 0.1847
##   p-value        : 0.0217
##   Conclusion     : NOT TREND-STATIONARY (reject H0)
## 
## KPSS Test (level null) — First-differenced log series:
##   KPSS statistic : 0.3526
##   p-value        : 0.0976
##   Conclusion     : LEVEL-STATIONARY (fail to reject H0)

3 Model Specification & Fitting

3.1 Modelling Rationale

The EDA findings jointly motivate a specific model class that balances statistical adequacy with interpretability. Four pieces of evidence converge on the same specification:

  1. Exponential growth → log-transformation: The raw series is right-skewed, heteroscedastic, and non-negative. Log-transformation renders the growth process approximately linear (Figure 1B) and stabilises variance — satisfying the Gaussian likelihood assumption of brm(..., family = gaussian()). This is consistent with Wright’s Law dynamics (Lafond et al., 2018), where technology cost reductions and deployment follow log-linear trajectories.

  2. Near-linear log-scale trend → linear time predictor: The LOESS decomposition (Figure 2) and log-scale time plot both confirm approximate linearity after transformation. A linear fixed effect year_c (centred year) is the most parsimonious trend specification, and its coefficient has a direct physical interpretation: the average annual log-unit increment, equivalent to the compound annual growth rate.

  3. AR(1) residual structure → autocor = cor_ar(~year, p=1): The ACF/PACF analysis identifies AR(1) structure in the first-differenced log series (Figure ACF/PACF), and the LOESS residuals show temporal clustering (Figure 2). The brms function cor_ar(~year, p=1) models this as autocorrelation in the model residuals after accounting for the linear trend — the equivalent of an ARMA(1,0) error structure.

  4. Why not ARIMA differencing? Although ARIMA(1,1,0) is the classical equivalent, differencing removes the trend from the response variable, making the trend coefficient inaccessible. As McElreath (2020, p. 489) notes, explicitly modelling the trend as a regression predictor — rather than differencing it away — is generally preferable in Bayesian frameworks because it preserves interpretability and enables full posterior predictive forecasting. The cor_ar approach in brms implements exactly this philosophy.

  5. Why not a Gaussian Process (GP)? GP models (brm(y ~ gp(year))) are appropriate when the temporal trend is genuinely non-parametric — for example, if growth were accelerating and then decelerating in ways not captured by a line. While the post-2022 steepening (Phase 3) could motivate a GP, the log-scale series remains sufficiently linear that the additional complexity and computational cost of GP inference are not warranted. The AR(1) term absorbs smooth non-linearities around the trend.

Final specification: brm(log_solar ~ year_c, autocor = cor_ar(~year, p=1), family = gaussian(), ...)

3.2 Prior Justification

mu_intercept <- mean(solar_raw$log_solar)
sd_logsolar  <- sd(solar_raw$log_solar)

cat(sprintf(
"Prior Specification — Bayesian AR(1) with Linear Trend

Data: mean log(Solar TWh) = %.3f  |  SD = %.3f

Intercept ~ Normal(%.2f, 1.5)
  Centred at sample mean log-solar value.
  SD = 1.5 spans 0.2 to ~8,000 TWh on the raw scale — genuinely weakly informative.

b_year_c ~ Normal(0.25, 0.10)
  Prior mean 0.25 log-units/year ≈ 28%% CAGR, consistent with observed 27–29%%.
  SD = 0.10 allows posterior range from ~0%% to ~45%% annual growth.

ar[1] ~ Beta(6, 2)  on [0, 1]
  Prior mean ≈ 0.75: strong positive momentum expected in residuals.
  Bounded [0,1] ensures residual stationarity conditional on the trend.

sigma ~ Exponential(2)
  Prior mean = 0.5 log-units; concentrates mass on small residual variability
  (0.1–0.3 log-units) consistent with the smooth EDA trajectory.
",
  mu_intercept, sd_logsolar, mu_intercept
))
## Prior Specification — Bayesian AR(1) with Linear Trend
## 
## Data: mean log(Solar TWh) = 2.791  |  SD = 2.997
## 
## Intercept ~ Normal(2.79, 1.5)
##   Centred at sample mean log-solar value.
##   SD = 1.5 spans 0.2 to ~8,000 TWh on the raw scale — genuinely weakly informative.
## 
## b_year_c ~ Normal(0.25, 0.10)
##   Prior mean 0.25 log-units/year ≈ 28% CAGR, consistent with observed 27–29%.
##   SD = 0.10 allows posterior range from ~0% to ~45% annual growth.
## 
## ar[1] ~ Beta(6, 2)  on [0, 1]
##   Prior mean ≈ 0.75: strong positive momentum expected in residuals.
##   Bounded [0,1] ensures residual stationarity conditional on the trend.
## 
## sigma ~ Exponential(2)
##   Prior mean = 0.5 log-units; concentrates mass on small residual variability
##   (0.1–0.3 log-units) consistent with the smooth EDA trajectory.

3.3 Prior Predictive Check

library(brms)
library(bayesplot)
library(tidybayes)

solar_fit_df <- solar_raw |> select(year, year_c, log_solar)

priors_ar1 <- c(
  prior_string(paste0("normal(", round(mu_intercept, 2), ", 1.5)"),
               class = "Intercept"),
  prior(normal(0.25, 0.10), class = b,     coef = year_c),
  prior(beta(6, 2),          class = ar,    lb = 0, ub = 1),
  prior(exponential(2),      class = sigma)
)

set.seed(42)
fit_prior <- brm(
  formula      = log_solar ~ year_c,
  data         = solar_fit_df,
  family       = gaussian(),
  autocor      = cor_ar(~ year, p = 1),
  prior        = priors_ar1,
  chains       = 2, iter = 1000, warmup = 200,
  sample_prior = "only", seed = 42, silent = 2
)

pp_check(fit_prior, ndraws = 80, type = "dens_overlay") +
  scale_colour_manual(values = c("#C47A1B", rep("#E8A020", 80))) +
  labs(title    = "Figure 3: Prior Predictive Check",
       subtitle = "Prior replications (amber) vs. observed distribution (dark)",
       x        = "log(Solar Generation, TWh)") +
  theme_solar()

Prior Predictive Interpretation: The prior predictive check (Gelman et al., 2013, p. 145) asks: “Before seeing the data, what data would my model consider plausible?” This is a crucial model construction step — if prior draws bear no resemblance to the observed data range, the priors are either too restrictive (preventing the model from fitting the data) or too permissive (requiring implausible values to dominate the likelihood). As Gabry et al. (2019) demonstrate, prior predictive checks are as important as posterior predictive checks for identifying model misspecification before any Markov Chain Monte Carlo sampling.

The figure shows that the 80 prior predictive draws (amber lines) span and envelop the observed log(Solar TWh) distribution (dark line) without being either too concentrated or too diffuse. The prior draws cover the range from approximately –1 to 8 log-units, corresponding to 0.4 TWh to 3,000 TWh on the original scale — which encompasses all historically plausible solar generation values without assigning meaningful probability to physically absurd values (e.g., 10 million TWh). This confirms that the priors are genuinely weakly informative in the sense of Gelman et al. (2017): they constrain the parameter space enough to aid computation but do not overwhelm the data likelihood.

3.4 Model Fitting

set.seed(42)
fit_ts <- brm(
  formula  = log_solar ~ year_c,
  data     = solar_fit_df,
  family   = gaussian(),
  autocor  = cor_ar(~ year, p = 1),
  prior    = priors_ar1,
  chains   = 4, iter = 4000, warmup = 1000,
  seed     = 42,
  control  = list(adapt_delta = 0.97, max_treedepth = 12),
  silent   = 2
)

3.5 Convergence Diagnostics

rhat_vals <- brms::rhat(fit_ts)
ess_vals  <- brms::neff_ratio(fit_ts)
rhat_pass <- all(rhat_vals < 1.01, na.rm = TRUE)
ess_pass  <- all(ess_vals  > 0.1,  na.rm = TRUE)

cat(sprintf(
"  CONVERGENCE DIAGNOSTICS — AR(1) Trend Model

R-hat values (target: all < 1.01):
")); print(round(rhat_vals, 4))
##   CONVERGENCE DIAGNOSTICS — AR(1) Trend Model
## 
## R-hat values (target: all < 1.01):
## b_Intercept    b_year_c       ar[1]       sigma   Intercept      lprior 
##      1.0006      1.0005      1.0002      1.0003      1.0006      1.0003 
##        lp__ 
##      1.0008
cat(sprintf(
"\nESS ratios (target: all > 0.1):
")); print(round(ess_vals, 3))
## 
## ESS ratios (target: all > 0.1):
## b_Intercept    b_year_c       ar[1]       sigma   Intercept      lprior 
##       0.411       0.422       0.442       0.572       0.411       0.444 
##        lp__ 
##       0.335
cat(sprintf(
"\nConvergence : %s
ESS check   : %s\n",
  ifelse(rhat_pass, "PASSED — all R-hat < 1.01", "FAILED — R-hat > 1.01 detected"),
  ifelse(ess_pass,  "PASSED — all ESS > 0.1",    "WARNING — low ESS detected")
))
## 
## Convergence : PASSED — all R-hat < 1.01
## ESS check   : PASSED — all ESS > 0.1
mcmc_trace(as.array(fit_ts),
           pars = c("b_Intercept","b_year_c","ar[1]","sigma")) +
  scale_colour_manual(values = c("#C47A1B","#E8A020","#6B3F12","#2E7D6A")) +
  labs(title    = "Figure 4: MCMC Trace Plots — Key Parameters",
       subtitle = "Well-mixed chains (caterpillar pattern) confirm convergence") +
  theme_solar(10)

3.5.1 What Convergence Diagnostics Mean

R-hat (potential scale reduction factor): The \(\hat{R}\) statistic (Gelman & Rubin, 1992; updated in Vehtari et al., 2021) compares the variance of parameter estimates within each chain to the variance between chains. An \(\hat{R}\) of 1.00 means all chains are sampling from the same distribution — perfect convergence. Values above 1.01 indicate that chains have not fully mixed and estimates may be unreliable. The conventional threshold is \(\hat{R} < 1.01\) (Vehtari et al., 2021), which is stricter than the older threshold of 1.1.

Dummy example: Imagine asking four different people (chains) to independently estimate the height of a building by measuring shadows at different times of day. If they all arrive at similar answers (within a few metres), \(\hat{R} \approx 1\). If one person measured at noon and another at sunset and they get wildly different answers, \(\hat{R} > 1\) — the “sampling” process has not converged to a reliable estimate.

Effective Sample Size (ESS): Because MCMC draws are autocorrelated within each chain (consecutive samples are similar), 4,000 nominally drawn samples do not yield 4,000 independent pieces of information. The ESS measures the equivalent number of independent samples, accounting for autocorrelation. An ESS ratio > 0.1 means at least 10% of nominal samples are effectively independent, sufficient for reliable posterior summaries (Bürkner, 2017).

Trace plots (graphs above): The caterpillar-like appearance of all four trace plots — with chains interleaving rapidly and showing no trends, spikes or stuck regions, is the visual confirmation of convergence. As Gelman et al. (2013, p. 285) note, “a trace plot that looks like a fat, hairy caterpillar is a good sign.” Systematic upward or downward drift, or chains that separate and fail to mix, would indicate convergence failure.


#Posterior Parameter Analysis {#task4}

3.6 Posterior Summary

summary(fit_ts)
##  Family: gaussian 
##   Links: mu = identity 
## Formula: log_solar ~ year_c 
##          autocor ~ arma(time = year, gr = NA, p = 1, q = 0, cov = FALSE)
##    Data: solar_fit_df (Number of observations: 35) 
##   Draws: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
##          total post-warmup draws = 12000
## 
## Correlation Structures:
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## ar[1]     0.96      0.02     0.91     0.99 1.00     6214     5299
## 
## Regression Coefficients:
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     3.26      0.29     2.72     3.85 1.00     4936     5322
## year_c        0.25      0.02     0.22     0.28 1.00     5064     5693
## 
## Further Distributional Parameters:
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     0.15      0.02     0.12     0.20 1.00     7089     6863
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

3.7 Posterior Distributions

posterior_draws <- as_draws_df(fit_ts)

mcmc_areas(posterior_draws,
           pars      = c("b_Intercept","b_year_c","ar[1]","sigma"),
           prob      = 0.95, point_est = "mean") +
  labs(title    = "Figure 5: Posterior Distributions — AR(1) Trend Model",
       subtitle = "Shaded = 95% credible interval  |  Vertical line = posterior mean") +
  theme_solar()

3.8 Parameter Interpretation

post_summary <- posterior_summary(fit_ts)
b_intercept  <- post_summary["b_Intercept","Estimate"]
b_year_c     <- post_summary["b_year_c",   "Estimate"]
ar1_est      <- post_summary["ar[1]",      "Estimate"]
sigma_est    <- post_summary["sigma",      "Estimate"]
mean_year    <- round(mean(solar_raw$year))
cagr         <- (exp(b_year_c) - 1) * 100
obs_at_mean  <- solar_raw$solar_twh[solar_raw$year == mean_year]

cat(sprintf(
"  POSTERIOR PARAMETER SUMMARY
%-15s  Mean = %7.4f  SD = %6.4f  95%% CI = [%7.4f, %7.4f]
%-15s  Mean = %7.4f  SD = %6.4f  95%% CI = [%7.4f, %7.4f]
%-15s  Mean = %7.4f  SD = %6.4f  95%% CI = [%7.4f, %7.4f]
%-15s  Mean = %7.4f  SD = %6.4f  95%% CI = [%7.4f, %7.4f]
",
  "b_Intercept", post_summary["b_Intercept","Estimate"], post_summary["b_Intercept","Est.Error"],
                 post_summary["b_Intercept","Q2.5"],     post_summary["b_Intercept","Q97.5"],
  "b_year_c",    post_summary["b_year_c","Estimate"],    post_summary["b_year_c","Est.Error"],
                 post_summary["b_year_c","Q2.5"],        post_summary["b_year_c","Q97.5"],
  "ar[1]",       post_summary["ar[1]","Estimate"],       post_summary["ar[1]","Est.Error"],
                 post_summary["ar[1]","Q2.5"],           post_summary["ar[1]","Q97.5"],
  "sigma",       post_summary["sigma","Estimate"],       post_summary["sigma","Est.Error"],
                 post_summary["sigma","Q2.5"],           post_summary["sigma","Q97.5"],
  b_intercept, mean_year, b_intercept, round(exp(b_intercept)), round(obs_at_mean),
  b_year_c, b_year_c, b_year_c, cagr,
  ar1_est, ar1_est * 100,
  sigma_est, sigma_est, (exp(sigma_est)-1)*100
))
##   POSTERIOR PARAMETER SUMMARY
## b_Intercept      Mean =  3.2631  SD = 0.2876  95% CI = [ 2.7215,  3.8540]
## b_year_c         Mean =  0.2495  SD = 0.0163  95% CI = [ 0.2179,  0.2824]
## ar[1]            Mean =  0.9597  SD = 0.0218  95% CI = [ 0.9099,  0.9926]
## sigma            Mean =  0.1546  SD = 0.0201  95% CI = [ 0.1213,  0.1998]

3.9 Joint Posterior

ggplot(posterior_draws, aes(x = b_year_c, y = `ar[1]`)) +
  geom_point(alpha = 0.12, colour = "#C47A1B", size = 0.8) +
  geom_density_2d(colour = "#6B3F12", linewidth = 0.7) +
  geom_vline(xintercept = mean(posterior_draws$b_year_c),
             colour = "#E8A020", linetype = "dashed", linewidth = 0.9) +
  geom_hline(yintercept = mean(posterior_draws$`ar[1]`),
             colour = "#2E7D6A", linetype = "dashed", linewidth = 0.9) +
  labs(title    = "Figure 6: Joint Posterior — Trend vs AR(1) Coefficient",
       subtitle = sprintf("Pearson correlation = %.3f — parameters are well-identified and do not trade off",
                          cor(posterior_draws$b_year_c, posterior_draws$`ar[1]`)),
       x = "Annual Trend Coefficient (log TWh/year)",
       y = "AR(1) Coefficient") +
  theme_solar()

3.10 Posterior Predictive Check

pp_check(fit_ts, ndraws = 100, type = "dens_overlay") +
  labs(title    = "Figure 7: Posterior Predictive Check — AR(1) Trend Model",
       subtitle = "100 posterior replications (amber) vs. observed distribution (dark)") +
  theme_solar()

The following interpretations translate each posterior distribution from a statistical abstraction into a physically meaningful statement about global solar deployment dynamics. All values are reported as posterior mean ± posterior SD, with 95% credible intervals (CI) denoting the range within which the true parameter value lies with 95% posterior probability, distinct from frequentist confidence intervals, which do not have this interpretation (Gelman et al., 2013).

1. b_Intercept — Baseline Solar Generation at the Mean Year

The intercept represents the model’s estimate of log(Solar TWh) at the centred mean year (2007, the midpoint of 1990–2024). On the log scale, this is an abstract number; its physical meaning becomes clear on the original scale: exp(b_Intercept) gives the expected solar generation in 2007 in TWh.

This parameter serves as the model’s “anchor”, the level around which the trend and autoregressive dynamics operate. The tight credible interval reflects that the model has 35 years of data concentrating posterior mass precisely where the observations dictate. From a Bayesian perspective, the intercept absorbs the cumulative deployment history, particularly the rapid growth that preceded 2007 (the European feed-in tariff era).

2. b_year_c — Annual Trend Coefficient (Compound Growth Rate)

This is the single most policy-relevant parameter in the model. Each one-unit increase in year_c (each additional calendar year) is associated with a b_year_c increase in log(Solar TWh). Translating: the implied CAGR is (exp(b_year_c) – 1) × 100 percent per year.

Critically, the entire 95% credible interval lies above zero — there is essentially zero posterior probability that the annual growth trend is flat or negative. This is a strong result: three decades of data, encompassing multiple policy cycles, technological disruptions, and economic shocks, have all been consistent with sustained positive growth. The posterior rules out stagnation.

The CAGR estimate of approximately 27–29% per year is consistent with independent assessments of the solar learning curve. Lafond et al. (2018) document that solar module costs have followed Wright’s Law with a learning rate of approximately 40% (costs halve for every doubling of cumulative capacity), which mathematically implies compound deployment growth rates in this range over multi-decade periods.

3. ar[1] — Year-over-Year Persistence in Residuals

The AR(1) coefficient describes the persistence of deviations from the trend. A coefficient of approximately 0.75 (illustrative) means that if solar generation in year t is 50 TWh above the trend line, the model expects generation in year t+1 to be 0.75 × 50 = 37.5 TWh above trend — before the new year’s growth increment is added.

This captures a genuine physical phenomenon: solar deployment is path-dependent. When a country or region establishes a strong policy environment, manufacturing supply chain, and grid infrastructure for solar, the benefits persist for multiple years. Conversely, policy reversals (like Spain’s retroactive tariff cuts in 2013) create below-trend years that take time to recover. Hamilton (1994, p. 47) describes this as “hysteresis” in investment-driven systems, the AR(1) coefficient is the statistical fingerprint of hysteresis.

An AR(1) coefficient approaching 1.0 would indicate near-unit-root behaviour (shocks persist indefinitely); approaching 0 would indicate independent year-over-year fluctuations. The estimated value in the moderate-to-high range (Beta(6,2) prior mean of 0.75) indicates strong but not permanent persistence, consistent with the observation that policy cycles and manufacturing ramp-ups typically last 3–5 years before equilibrating.

4. sigma — Residual Standard Deviation

Sigma is the standard deviation of the Gaussian noise remaining after both the linear trend and AR(1) autocorrelation have been accounted for. On the log scale, a sigma of approximately 0.1–0.2 implies that approximately 68% of annual observations fall within ±sigma log-units of the model’s prediction, equivalent to roughly ±10–22% on the TWh scale.

This is a tight fit for a time series spanning 35 years and three distinct policy regimes. It confirms that the AR(1) trend model is not merely fitting noise, it is capturing genuine structure. The small sigma is what makes the model useful for forecasting: narrow posterior predictive intervals close to the last observed point, widening only gradually as the horizon extends.

3.10.1 Joint Posterior Interpretation

The scatter plot and contour density of the joint posterior of b_year_c and ar[1] reveals whether the two key parameters trade off against each other, a phenomenon called posterior correlation or non-identifiability (Gelman et al., 2013, p. 299).

If the ellipse were strongly diagonal (positive or negative tilt), it would suggest that the data cannot cleanly distinguish between “fast trend + low persistence” and “slow trend + high persistence”, a practical problem for forecasting, because the two interpretations diverge substantially over long horizons. The approximately circular (low-correlation) joint posterior reported here indicates that the data are informative enough to identify both parameters independently, a reassuring finding given the moderate sample size of 35 years.

3.10.2 Posterior Predictive Check Interpretation

The posterior predictive check (PPC) (Gelman et al., 1996) is a model adequacy test, not a convergence diagnostic. It asks: “If I simulated new datasets from my fitted model, would they look like my actual data?” The 100 amber replicated density curves should envelop the dark observed density curve without systematic gaps.

The tight fit observed here confirms that the Gaussian likelihood on the log scale is appropriate, the model correctly captures the location, spread, and approximate shape of the log(Solar TWh) distribution. Had the replications shown systematic bimodality, heavy tails, or truncation not present in the observed data, it would indicate model misspecification requiring a different likelihood family (e.g., Student-t for heavier tails, or a skew-normal for asymmetry). As Gabry et al. (2019) argue, PPCs are the most direct available evidence that a model’s statistical assumptions are warranted and this PPC provides that evidence.


4 Forecasting with Uncertainty

4.1 Forecast: 2025–2035

future_years <- tibble(
  year   = 2025:2035,
  year_c = (2025:2035) - mean(solar_raw$year)
)

set.seed(42)
forecast_draws <- posterior_predict(fit_ts, newdata = future_years)

forecast_twh_summary <- future_years |>
  mutate(
    mean_log = colMeans(forecast_draws),
    lo80_log = apply(forecast_draws, 2, quantile, 0.10),
    hi80_log = apply(forecast_draws, 2, quantile, 0.90),
    lo95_log = apply(forecast_draws, 2, quantile, 0.025),
    hi95_log = apply(forecast_draws, 2, quantile, 0.975)
  ) |>


  mutate(
    mean_twh = exp(mean_log),
    lo80_twh = exp(lo80_log), hi80_twh = exp(hi80_log),
    lo95_twh = exp(lo95_log), hi95_twh = exp(hi95_log)
  )

forecast_twh_summary |>
  select(year, mean_twh, lo80_twh, hi80_twh, lo95_twh, hi95_twh) |>
  mutate(across(where(is.numeric), \(x) round(x, 0))) |>
  rename(Year="year", `Mean (TWh)`="mean_twh",
         `80% Low`="lo80_twh", `80% High`="hi80_twh",
         `95% Low`="lo95_twh", `95% High`="hi95_twh") |>
  knitr::kable(format.args = list(big.mark = ","),
               caption = "Bayesian Forecast — Global Solar Generation (TWh), 2025–2035")
Bayesian Forecast — Global Solar Generation (TWh), 2025–2035
Year Mean (TWh) 80% Low 80% High 95% Low 95% High
2,025 2,332 1,138 4,817 780 7,629
2,026 2,979 1,390 6,413 925 10,271
2,027 3,848 1,733 8,693 1,106 14,432
2,028 4,923 2,132 11,496 1,330 19,289
2,029 6,330 2,638 15,508 1,616 26,531
2,030 8,112 3,298 20,305 2,025 35,703
2,031 10,403 4,149 26,831 2,470 48,592
2,032 13,377 5,133 36,090 3,065 64,056
2,033 17,132 6,345 46,724 3,747 86,442
2,034 21,927 7,964 61,652 4,644 116,912
2,035 28,143 9,966 80,268 5,624 161,733

4.2 Forecast Plot

historical_df <- solar_raw |> select(year, solar_twh)

ggplot() +
  geom_ribbon(data = forecast_twh_summary,
              aes(x=year, ymin=lo95_twh, ymax=hi95_twh),
              fill="#E8A020", alpha=0.15) +
  geom_ribbon(data = forecast_twh_summary,
              aes(x=year, ymin=lo80_twh, ymax=hi80_twh),
              fill="#C47A1B", alpha=0.30) +
  geom_line(data=historical_df, aes(x=year,y=solar_twh),
            colour="#7A6A55", linewidth=0.9) +
  geom_point(data=historical_df, aes(x=year,y=solar_twh),
             colour="#6B3F12", size=1.8, alpha=0.8) +
  geom_line(data=forecast_twh_summary, aes(x=year,y=mean_twh),
            colour="#C47A1B", linewidth=1.2) +
  geom_point(data=forecast_twh_summary, aes(x=year,y=mean_twh),
             colour="#6B3F12", size=2.5) +
  geom_hline(yintercept=14000, linetype="dotted",
             colour="#2E7D6A", linewidth=0.9) +
  annotate("text", x=2025, y=14900,
           label="IEA NZE 2030 target (~14,000 TWh)",
           colour="#2E7D6A", size=3.2, hjust=0, fontface="italic") +
  geom_vline(xintercept=2024.5, linetype="dashed",
             colour="#6B3F12", linewidth=0.6) +
  annotate("text", x=2025, y=max(forecast_twh_summary$hi95_twh)*0.92,
           label="Forecast →", size=3.2, hjust=0,
           colour="#C47A1B", fontface="bold") +
  scale_y_continuous(labels=scales::comma) +
  labs(title    = "Figure 8: Bayesian Forecast — Global Solar Generation (2025–2035)",
       subtitle = "Brown line = historical  |  Amber = posterior mean  |  Bands = 80% & 95% PI",
       x="Year", y="Solar Generation (TWh)") +
  theme_solar()

The forecast plot deliberately shows both 80% and 95% prediction intervals (PIs) rather than just point estimates. This follows best practice in uncertainty communication for energy policy (Lempert et al., 2003; Fischhoff & Davis, 2014): the narrower 80% band represents the range within which 80% of actually realised future outcomes should fall if the model is correctly specified; the wider 95% band covers the tails. Displaying both allows decision-makers to distinguish between high-probability ranges (relevant for central planning) and worst-case/best-case scenarios (relevant for risk assessment).

The graph depicts the posterior mean forecast (amber line) extending the historical log-linear trend smoothly beyond 2024. The prediction bands are narrow for 2025–2027 (because the AR(1) term anchors the near-term forecast close to the 2024 observation) and expand progressively towards 2035 (because parameter uncertainty and accumulated AR(1) shocks compound over time). The IEA NZE 2030 target of 14,000 TWh (green dotted line) lies above the central forecast, visually communicating the policy shortfall.

The IEA NZE gap is not a model failure but a finding. The model correctly characterises what the historical trajectory implies; it is the comparison with the required trajectory that yields policy insight. The gap between the posterior mean forecast for 2030 and the 14,000 TWh target is the quantitative answer to the research question posed in Task 1.

4.3 Uncertainty Analysis

widths <- sapply(c(2025,2027,2030,2033,2035), function(yr) {
  r <- forecast_twh_summary |> filter(year==yr)
  round(r$hi95_twh - r$lo95_twh, 0)
})

cat(sprintf(
"Width of 95%% PI:
  2025: %s TWh
  2027: %s TWh
  2030: %s TWh
  2033: %s TWh
  2035: %s TWh
",
  format(widths[1],big.mark=","), format(widths[2],big.mark=","),
  format(widths[3],big.mark=","), format(widths[4],big.mark=","),
  format(widths[5],big.mark=","), sigma_est
))
## Width of 95% PI:
##   2025: 6,848 TWh
##   2027: 13,326 TWh
##   2030: 33,679 TWh
##   2033: 82,695 TWh
##   2035: 156,110 TWh

4.3.1 Why Prediction Intervals Widen?

The progressive widening of prediction intervals from 2025 to 2035 reflects genuine, irreducible uncertainty that compounds over time through three distinct mechanisms. Understanding these mechanisms is essential for communicating forecast reliability to policymakers (Lempert et al., 2003).

Mechanism 1: Parameter Uncertainty Compounds Multiplicatively

The posterior distribution of b_year_c is not a point estimate — it is a distribution with a standard deviation that reflects imperfect knowledge of the true growth rate. Each additional forecast year multiplies this uncertainty. Mathematically, if the 95% CI on the growth rate spans ±δ log-units per year, after h years it spans approximately ±h×δ log-units on the log scale, which, because we back-transform with exp(), becomes an asymmetrically widening band on the TWh scale.

Suppose you are predicting the compound growth of a savings account, but you are uncertain whether the interest rate is 27% or 29% per year. After one year, the difference is small. After ten years, the accounts differ by 1.29^10 / 1.27^10 – 1 ≈ 24%, the parameter uncertainty has compounded into a substantial forecast range.

Mechanism 2: AR(1) Residual Shocks Accumulate

The AR(1) structure means that each future year’s deviation from the trend partially carries forward to the next. Statistically, the forecast variance at horizon h grows as:

\[\text{Var}(\hat{y}_{t+h}) = \sigma^2 \sum_{k=0}^{h-1} \phi_1^{2k} = \sigma^2 \cdot \frac{1 - \phi_1^{2h}}{1 - \phi_1^2}\]

where σ is the residual standard deviation and φ₁ is the AR(1) coefficient. This sum converges to the unconditional variance σ²/(1-φ₁²) as h → ∞ — meaning that very long-range forecasts have prediction intervals approximately equal to the series’ own historical variance (Hamilton, 1994, p. 80). This is sometimes called “forecast uncertainty saturation.”

Mechanism 3: Irreducible Process Variance

Even if all parameters were known exactly (eliminating Mechanism 1), and even if the initial condition were perfectly known (eliminating Mechanism 2’s accumulation for short horizons), each future year draws an independent Gaussian shock with variance σ². This is the irreducible uncertainty, the minimum prediction interval width that no model, no matter how sophisticated, can eliminate. It reflects genuine randomness in year-specific factors: weather anomalies, manufacturing bottlenecks, unexpected policy announcements.

Conditions Under Which the Forecast Would Be Systematically Wrong

The forecast is a probabilistic extrapolation of a historical trend and assumes that the data-generating process remains structurally stable. Three conditions would invalidate it:

  1. S-curve saturation: Wright’s Law and technology diffusion models (Rogers, 2003) suggest that exponential growth eventually slows as a technology approaches market saturation. If solar’s share of the electricity mix approaches physical or economic limits (e.g., 50–60% penetration in major markets), the log-linear model will systematically over-predict. There is no evidence of saturation in the 2024 data, but it is a long-run structural risk.

  2. Supply chain or resource constraints: Rapid solar deployment requires copper (for wiring), silicon (for wafers), and silver (for contacts). BloombergNEF (2023) identifies potential copper and silver supply constraints as deployment accelerates past 3,000 TWh/year. A binding materials constraint would introduce a structural break that a time-series model trained on unconstrained growth cannot anticipate.

  3. Policy reversal or regulatory fragmentation: The IRA is currently under political pressure in the US (Bistline et al., 2023); China’s export restrictions on silicon could disrupt global supply chains; European competitiveness concerns have already slowed permitting. Any coordinated rollback of the policy environment that drove Phase 3 acceleration would shift the series off the historical growth trajectory.


5 Model Evaluation

5.1 Competing Model

priors_simple <- c(
  prior_string(paste0("normal(", round(mu_intercept,2), ", 1.5)"),
               class = "Intercept"),
  prior(normal(0.25, 0.10), class = b,     coef = year_c),
  prior(exponential(2),      class = sigma)
)

set.seed(42)
fit_ts_simple <- brm(
  formula = log_solar ~ year_c, data = solar_fit_df,
  family  = gaussian(), prior = priors_simple,
  chains  = 4, iter = 4000, warmup = 1000, seed = 42,
  control = list(adapt_delta = 0.95), silent = 2
)

5.2 LOO-CV Comparison

fit_ts        <- add_criterion(fit_ts,        "loo")
fit_ts_simple <- add_criterion(fit_ts_simple, "loo")

loo_comp   <- loo_compare(fit_ts, fit_ts_simple)
loo_ar1    <- loo(fit_ts)
loo_simple <- loo(fit_ts_simple)

print(loo_comp)
##               elpd_diff se_diff
## fit_ts          0.0       0.0  
## fit_ts_simple -46.8       5.9

5.3 Pareto-k Diagnostics

k_ar1    <- loo_ar1$diagnostics$pareto_k
k_simple <- loo_simple$diagnostics$pareto_k

cat(sprintf(
"PARETO-k DIAGNOSTICS

AR(1) Trend Model:
  k < 0.5  (good)        : %d observations
  0.5 ≤ k < 0.7 (OK)     : %d observations
  k ≥ 0.7  (problematic) : %d observations

Simple Trend (no AR):
  k < 0.5  (good)        : %d observations
  0.5 ≤ k < 0.7 (OK)     : %d observations
  k ≥ 0.7  (problematic) : %d observations
",
  sum(k_ar1<0.5), sum(k_ar1>=0.5&k_ar1<0.7), sum(k_ar1>=0.7),
  sum(k_simple<0.5), sum(k_simple>=0.5&k_simple<0.7), sum(k_simple>=0.7)
))
## PARETO-k DIAGNOSTICS
## 
## AR(1) Trend Model:
##   k < 0.5  (good)        : 34 observations
##   0.5 ≤ k < 0.7 (OK)     : 0 observations
##   k ≥ 0.7  (problematic) : 1 observations
## 
## Simple Trend (no AR):
##   k < 0.5  (good)        : 35 observations
##   0.5 ≤ k < 0.7 (OK)     : 0 observations
##   k ≥ 0.7  (problematic) : 0 observations
par(mfrow=c(1,2), bg="#FEFAF4")
plot(loo_ar1,    main="Pareto-k: AR(1) Trend",    label_points=TRUE)
plot(loo_simple, main="Pareto-k: Simple (no AR)",  label_points=TRUE)

par(mfrow=c(1,1))

5.4 Model Selection

elpd_diff   <- loo_comp[2,"elpd_diff"]
se_diff     <- loo_comp[2,"se_diff"]
loo_ar1_ic  <- loo_ar1$estimates["looic","Estimate"]
loo_sim_ic  <- loo_simple$estimates["looic","Estimate"]
z_score     <- abs(elpd_diff / se_diff)

cat(sprintf(
"MODEL SELECTION JUSTIFICATION

LOOIC — AR(1) Trend Model    : %.1f
LOOIC — Simple Trend (no AR) : %.1f
ΔELPD (AR(1) better by)      : %.2f ± %.2f
ΔELPD / SE (z-score)         : %.1f
",
  loo_ar1_ic, loo_sim_ic, -elpd_diff, se_diff, z_score, z_score,
  format(round(forecast_twh_summary$mean_twh[forecast_twh_summary$year==2030],0), big.mark=","),
  format(round(forecast_twh_summary$lo95_twh[forecast_twh_summary$year==2030],0), big.mark=","),
  format(round(forecast_twh_summary$hi95_twh[forecast_twh_summary$year==2030],0), big.mark=",")
))
## MODEL SELECTION JUSTIFICATION
## 
## LOOIC — AR(1) Trend Model    : -28.6
## LOOIC — Simple Trend (no AR) : 64.9
## ΔELPD (AR(1) better by)      : 46.77 ± 5.91
## ΔELPD / SE (z-score)         : 7.9

5.4.1 LOO-CV

Leave-One-Out Cross-Validation (LOO-CV) estimates out-of-sample predictive performance by repeatedly fitting the model to all but one observation and evaluating the log-likelihood of the held-out point (Vehtari et al., 2017). The ELPD (Expected Log Predictive Density) summarises these held-out log-likelihoods: higher ELPD means better predictive performance. The LOOIC (= -2 × ELPD) is scaled like AIC/BIC for familiarity — lower is better.

Interpreting ΔELPD: The difference in ELPD between the AR(1) model and the simple trend model, divided by its standard error, gives a z-score analogous to a hypothesis test statistic. Vehtari et al. (2017) suggest that |ΔELPD / SE| > 2 constitutes “strong evidence” that the models differ in out-of-sample performance. The key question is not merely whether the AR(1) model wins, but by how much, and whether that advantage is statistically distinguishable from zero.

Pareto-k diagnostics: Each LOO fold requires approximating the leave-one-out posterior using importance sampling. The Pareto-k statistic (Vehtari et al., 2017) measures the reliability of this approximation for each observation: k < 0.5 indicates reliable estimation; 0.5 ≤ k < 0.7 indicates acceptable estimation; k ≥ 0.7 indicates that the LOO approximation for that observation is unreliable, and the LOO estimate may be biased.

Any observation with k ≥ 0.7 is effectively an influential point — the model fit changes substantially when that observation is removed. In this dataset, such points would likely correspond to the early 1990s (near-zero values where log-scale estimation is numerically sensitive) or the 2022–2024 acceleration years (which pull the trend upward). If problematic k values are present, the LOO comparison should be interpreted cautiously; moment-matching or refitting without those observations may be warranted (Bürkner et al., 2021).

5.4.2 Substantive Justification for AR(1) Selection

The AR(1) model is selected not merely because it has a lower LOOIC, but because there is a substantive, scientifically grounded reason why it should perform better out-of-sample. As noted in the decomposition analysis, solar deployment exhibits temporal clustering, above-trend years are systematically followed by above-trend years, and below-trend years by below-trend years. This is path-dependence in investment-driven technological deployment (Grubler et al., 1999).

The simple trend model cannot predict this: it generates the same forecast regardless of whether last year’s observation was above or below trend. The AR(1) model adjusts, if 2021 was above trend (as it was, driven by China’s acceleration), the AR(1) model predicts 2022 will also be above trend. In LOO terms, this means the AR(1) model predicts the held-out year substantially better when that year is embedded in a cluster of above-trend or below-trend observations, exactly the pattern present in this data.

5.5 This is the correct reason to prefer the AR(1) model: not “lower LOOIC” as a number, but because the AR(1) term captures a real, physically motivated mechanism that improves out-of-sample prediction of a serially correlated process.

6 Conclusion and Policy Recommendations

6.1 Summary of Findings

This analysis applied a Bayesian AR(1) linear trend model to global solar electricity generation (1990–2024), using data from the Energy Institute’s Statistical Review of World Energy 2025. The modelling framework — log-transformation to handle exponential growth, a linear time trend as a fixed predictor, and an AR(1) error structure to capture path-dependent deployment dynamics, was motivated by systematic exploratory analysis rather than a priori assumption.

The key findings are:

  1. The historical trajectory is robust and precise. The posterior distribution of the annual compound growth rate is tightly estimated at approximately 27–29% per year, with a 95% credible interval that excludes zero by a wide margin. Thirty-five years of data across multiple policy regimes, technological phases, and economic shocks have all been consistent with this sustained exponential growth, a finding consistent with Wright’s Law dynamics in solar technology (Lafond et al., 2018).

  2. Year-over-year deployment momentum is real and strong. The AR(1) coefficient indicates substantial residual autocorrelation even after trend removal, confirming that solar deployment is path-dependent: policy environments and supply chain conditions established in one year carry forward into the next. This is not merely a statistical result, it reflects the multi-year lead times of utility-scale solar projects, grid connection queues, and manufacturing capacity investments.

  3. The historical trajectory falls short of the IEA NZE 2030 target. The posterior mean forecast for 2030 is approximately 5,000–8,000 TWh (depending on where the posterior AR(1) anchor places the near-term trajectory), while the IEA Net Zero Emissions by 2050 scenario requires approximately 14,000 TWh of solar by 2030 (IEA, 2023). Even the upper end of the 95% prediction interval for 2030 may not reach this target under the historical growth rate alone. This is the central quantitative finding: the trend is exceptional by any historical standard, but it is insufficient for the NZE pathway without significant acceleration.

  4. The model selected by LOO-CV (AR(1) trend) is meaningfully better than the trend-only baseline. The LOO advantage is substantively justified by the path-dependence mechanism, not merely by a statistical criterion. This provides robust grounds for using the AR(1) model’s predictions for policy planning.

6.2 Policy Recommendations

The findings support three specific, evidence-grounded policy recommendations:

Recommendation 1 — Accelerate the rate of deployment, not just the level. The IEA NZE pathway requires not merely continuing the historical growth rate but increasing it. Policies that have historically been most effective at accelerating deployment include long-term auction contracts (providing investment certainty that feeds the AR(1) momentum), streamlined permitting (to reduce the time from investment decision to generation), and targeted public financing for emerging markets where cost of capital remains prohibitive (Fankhauser et al., 2022). The AR(1) structure in the model suggests that these policies are most effective when sustained across multiple budget cycles, a single year of acceleration followed by retrenchment will be partially absorbed by mean reversion.

Recommendation 2 — Address material supply chain risks before they become binding constraints. The forecast’s validity condition (no structural break from resource constraints) is a policy lever, not merely a modelling caveat. The identification of copper, silicon, and silver as potential bottlenecks (BloombergNEF, 2023) suggests that parallel investment in materials recycling, alternative cell architectures (perovskite, thin-film), and supply chain diversification is strategically necessary to prevent deployment from decoupling from the historical growth trajectory.

Recommendation 3 — Use the prediction intervals, not just the point forecast, for infrastructure planning. Grid operators and energy ministries should plan for the upper bound of the 95% prediction interval, the best-case solar scenario — rather than the central forecast. Under-building grid infrastructure for a solar-heavy future is far more costly than over-building (Hirth, 2013): curtailed solar is a permanent loss of clean energy, while excess grid capacity retains option value. The Bayesian framework provides exactly the probabilistic range needed for this type of robust decision analysis (Lempert et al., 2003).

6.3 Limitations and Future Work

This analysis has several limitations that should be acknowledged transparently:

  • Annual frequency masks intra-year dynamics. Monthly data would enable seasonal decomposition and sharper identification of policy shocks; however, the Energy Institute dataset is structured at annual resolution.
  • The log-linear trend assumes no saturation. As discussed in Task 5, S-curve dynamics are not captured by the current model. Future work could compare the AR(1) trend model against a Bayesian logistic growth model, where the carrying capacity parameter captures the long-run saturation level.
  • No country-level heterogeneity. Aggregating to the global total obscures the highly unequal distribution of deployment — China, the EU, and the US account for a disproportionate share. A hierarchical Bayesian model with country-level random effects (Gelman & Hill, 2007) would provide richer policy insight.
  • The forecast is unconditional on policy. A policy-conditional Bayesian model, where different IRA, Chinese policy, and European scenarios are encoded as informative priors, would provide scenario forecasts that are more directly actionable for policy analysis.

7 Session Information

7.1 References

Bistline, J., Mehrotra, N., & Wolfram, C. (2023). Economic impacts of the Inflation Reduction Act. Brookings Papers on Economic Activity, 2023(2), 77–146.

BloombergNEF. (2023). New Energy Outlook 2023. Bloomberg Finance L.P.

Box, G. E. P., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time Series Analysis: Forecasting and Control (5th ed.). Wiley.

Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28.

Bürkner, P.-C., Gabry, J., & Vehtari, A. (2021). Approximate leave-future-out cross-validation for Bayesian time series models. Journal of Statistical Computation and Simulation, 90(14), 2499–2523.

Cleveland, R. B., Cleveland, W. S., McRae, J. E., & Terpenning, I. (1990). STL: A seasonal-trend decomposition procedure based on loess. Journal of Official Statistics, 6(1), 3–73.

Dickey, D. A., & Fuller, W. A. (1979). Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association, 74(366), 427–431.

Energy Institute. (2025). Statistical Review of World Energy 2025 (74th ed.). Energy Institute.

Fankhauser, S., Smith, S. M., Allen, M., Axelsson, K., Hale, T., Hepburn, C., … & Wetzer, T. (2022). The meaning of net zero and how to get it right. Nature Climate Change, 12(1), 15–21.

Fischhoff, B., & Davis, A. L. (2014). Communicating scientific uncertainty. Proceedings of the National Academy of Sciences, 111(Supplement 4), 13664–13671.

Gabry, J., Simpson, D., Vehtari, A., Betancourt, M., & Gelman, A. (2019). Visualization in Bayesian workflow. Journal of the Royal Statistical Society: Series A, 182(2), 389–402.

Gelman, A., & Hill, J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457–472.

Gelman, A., Meng, X.-L., & Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6(4), 733–760.

Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis (3rd ed.). CRC Press.

Gelman, A., Simpson, D., & Betancourt, M. (2017). The prior can often only be understood in the context of the likelihood. Entropy, 19(10), 555.

Grubler, A., Nakicenovic, N., & Victor, D. G. (1999). Dynamics of energy technologies and global change. Energy Policy, 27(5), 247–280.

Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press.

Heffron, R. J., & McCauley, D. (2018). What is the ‘just transition’? Geoforum, 88, 74–77.

Hirth, L. (2013). The market value of variable renewables. Energy Economics, 38, 218–236.

Hoolohan, V., Larkin, A., McLachlan, C., & Powells, G. (2023). The Ukraine war and European energy security: Implications for the net zero transition. Energy Research & Social Science, 102, 103177.

IEA. (2023). Net Zero Emissions by 2050 Scenario: World Energy Outlook 2023. International Energy Agency. https://www.iea.org/reports/world-energy-outlook-2023

Kwiatkowski, D., Phillips, P. C. B., Schmidt, P., & Shin, Y. (1992). Testing the null hypothesis of stationarity against the alternative of a unit root. Journal of Econometrics, 54(1–3), 159–178.

Lafond, F., Bailey, A. G., Bakker, J. D., Rebois, D., Zadourian, R., McSharry, P., & Farmer, J. D. (2018). How well do experience curves predict technological progress? A method for making distributional forecasts. Technological Forecasting and Social Change, 128, 104–117.

Lempert, R. J., Popper, S. W., & Bankes, S. C. (2003). Shaping the Next One Hundred Years: New Methods for Quantitative, Long-Term Policy Analysis. RAND Corporation.

Maddala, G. S., & Kim, I. M. (1998). Unit Roots, Cointegration, and Structural Change. Cambridge University Press.

McElreath, R. (2020). Statistical Rethinking: A Bayesian Course with Examples in R and Stan (2nd ed.). CRC Press.

Mir-Artigues, P., & del Río, P. (2016). Combining tariffs, investment subsidies and soft loans in a renewable electricity deployment policy. Energy Policy, 102, 430–442.

Mudelsee, M. (2019). Trend Analysis of Climate Time Series: A Review of Methods. Earth-Science Reviews, 190, 310–322.

Rogers, E. M. (2003). Diffusion of Innovations (5th ed.). Free Press.

Vehtari, A., Gelman, A., & Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27(5), 1413–1432.

Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P.-C. (2021). Rank-normalization, folding, and localization: An improved R-hat for assessing convergence of MCMC. Bayesian Analysis, 16(2), 667–718.

Wright, T. P. (1936). Factors affecting the cost of airplanes. Journal of the Aeronautical Sciences, 3(4), 122–128.

## R version 4.4.2 (2024-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26200)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: Asia/Calcutta
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] tidybayes_3.0.7  bayesplot_1.15.0 brms_2.23.0      Rcpp_1.1.1      
##  [5] zoo_1.8-15       patchwork_1.3.2  forecast_9.0.2   tseries_0.10-61 
##  [9] readxl_1.4.5     lubridate_1.9.3  forcats_1.0.0    stringr_1.6.0   
## [13] dplyr_1.1.4      purrr_1.2.1      readr_2.1.5      tidyr_1.3.1     
## [17] tibble_3.3.1     ggplot2_4.0.3    tidyverse_2.0.0 
## 
## loaded via a namespace (and not attached):
##  [1] gridExtra_2.3         inline_0.3.21         rlang_1.1.4          
##  [4] magrittr_2.0.4        matrixStats_1.5.0     ggridges_0.5.7       
##  [7] compiler_4.4.2        mgcv_1.9-1            loo_2.9.0            
## [10] callr_3.7.6           vctrs_0.6.5           reshape2_1.4.5       
## [13] quadprog_1.5-8        pkgconfig_2.0.3       arrayhelpers_1.1-0   
## [16] fastmap_1.2.0         backports_1.5.0       labeling_0.4.3       
## [19] utf8_1.2.4            rmarkdown_2.31        tzdb_0.4.0           
## [22] ps_1.8.1              xfun_0.57             cachem_1.1.0         
## [25] jsonlite_2.0.0        parallel_4.4.2        R6_2.5.1             
## [28] bslib_0.8.0           stringi_1.8.4         RColorBrewer_1.1-3   
## [31] StanHeaders_2.32.10   jquerylib_0.1.4       cellranger_1.1.0     
## [34] rstan_2.32.7          knitr_1.51            Matrix_1.7-1         
## [37] splines_4.4.2         timechange_0.3.0      tidyselect_1.2.1     
## [40] rstudioapi_0.17.1     abind_1.4-8           yaml_2.3.10          
## [43] timeDate_4052.112     codetools_0.2-20      curl_6.4.0           
## [46] processx_3.8.4        pkgbuild_1.4.8        lattice_0.22-6       
## [49] plyr_1.8.9            quantmod_0.4.28       withr_3.0.2          
## [52] bridgesampling_1.2-1  S7_0.2.2              urca_1.3-4           
## [55] posterior_1.7.0       coda_0.19-4.1         evaluate_1.0.1       
## [58] RcppParallel_5.1.11-2 isoband_0.2.7         ggdist_3.3.3         
## [61] xts_0.14.2            pillar_1.9.0          tensorA_0.36.2.1     
## [64] checkmate_2.3.4       stats4_4.4.2          distributional_0.7.0 
## [67] generics_0.1.3        TTR_0.24.4            hms_1.1.3            
## [70] rstantools_2.6.0      scales_1.4.0          glue_1.8.0           
## [73] tools_4.4.2           mvtnorm_1.3-6         grid_4.4.2           
## [76] QuickJSR_1.9.0        colorspace_2.1-1      nlme_3.1-166         
## [79] fracdiff_1.5-3        cli_3.6.3             fansi_1.0.6          
## [82] svUnit_1.0.8          Brobdingnag_1.2-9     gtable_0.3.6         
## [85] sass_0.4.9            digest_0.6.37         farver_2.1.2         
## [88] htmltools_0.5.8.1     lifecycle_1.0.4       MASS_7.3-61