Data Prepare

1. Weather

2. Climate Indices

3. Data Preview

date yiel year doy bio5 bio11 bio10 id jci moi ntg ogs10 wki ws wsdi xtg hi hd17 tn90p tr tx10p vwd prcptot cdd cwd r10mm r20mm rx1day rx5d sdii r95tot r99tot fd su csd stx32 mi fg fgcalm fg6bft mai mfi ssd ssp snd wci utci periodicity dom_wdirhk dphk rh2hk_dailymean srhk_dailymean t2hk_dailymax t2hk_dailymean t2hk_dailymin wsphk_dailymax wsphk_dailymean
2001-01-01 2001 1 30.63180 19.28192 28.04827 0 27.49678 1.690107 15.88942 0 0 15.88942 0 15.88942 15 1.1105785 0 0 0 0 0 1 0 0 0 0e+00 NA 0 0 0 0 0 0 0 0 2.695067 0 0 71.76713 0e+00 315.8694 1316.1224 0 24.95087 -123.72464 monthly 67.5 0e+00 68.01212 315.8694 17.15674 15.88942 14.341217 4.613824 2.695067
2004-01-01 2004 1 30.52903 18.26718 28.55370 0 35.05992 1.459625 18.11393 0 0 18.11393 0 18.11393 18 0.0000000 0 0 0 0 0 1 0 0 0 0e+00 NA 0 0 0 0 0 0 0 0 3.756847 0 0 27.26053 0e+00 207.0936 862.8902 0 24.79248 -85.42874 monthly 67.5 0e+00 66.78092 207.0936 20.62766 18.11393 15.581024 5.635863 3.756847
2006-01-01 5961.38 2006 1 30.46052 19.12600 28.23819 0 28.71821 1.648079 20.09398 0 0 20.09398 0 20.09398 20 0.0000000 0 0 0 0 0 1 0 0 0 0e+00 NA 0 0 0 0 0 0 0 0 2.746764 0 0 70.00171 0e+00 202.2794 842.8307 0 25.21861 -67.40000 monthly 67.5 0e+00 71.20263 202.2794 22.73886 20.09398 17.799957 4.530318 2.746764
2019-01-01 20492.00 2019 1 30.77942 20.42255 28.77577 0 27.54951 1.688249 14.73851 0 0 14.73851 0 14.73851 14 2.2614946 0 0 0 0 0 1 0 0 0 8e-07 NA 0 0 0 0 0 0 0 0 3.800786 0 0 45.32357 8e-07 194.4447 810.1861 0 25.83674 -56.88514 monthly 45.0 8e-07 59.67574 194.4447 18.49609 14.73851 9.629028 7.122638 3.800786
2024-01-01 21523.00 2024 1 31.36483 19.75575 29.12367 0 33.91850 1.490297 19.13332 0 0 19.13332 0 19.13332 19 0.0000000 0 0 0 0 0 1 0 0 0 0e+00 NA 0 0 0 0 0 0 0 0 5.000105 0 0 49.57399 0e+00 201.9484 841.4518 0 25.89776 -58.87297 monthly 67.5 0e+00 70.21452 201.9484 21.24637 19.13332 17.068298 6.053437 5.000105
2012-01-01 12890.86 2012 1 30.62774 18.61248 28.07238 0 33.01736 1.515438 16.75205 0 0 16.75205 0 16.75205 16 0.2479477 0 0 0 0 0 1 0 0 0 0e+00 NA 0 0 0 0 0 0 0 0 3.079289 0 0 85.96466 0e+00 204.3631 851.5130 0 24.86616 -48.07879 monthly 67.5 0e+00 65.90116 204.3631 20.31604 16.75205 14.298065 4.742382 3.079289
2022-01-01 22177.00 2022 1 31.14147 18.89542 28.81193 0 36.15213 1.431435 17.31522 0 0 17.31522 0 17.31522 17 0.0000000 0 0 0 0 0 1 0 0 0 0e+00 NA 0 0 0 0 0 0 0 0 3.174846 0 0 61.11502 0e+00 203.2604 846.9184 0 24.94639 -72.01952 monthly 67.5 0e+00 70.77531 203.2604 20.43677 17.31522 14.415924 4.915345 3.174846
2016-01-01 16531.50 2016 1 31.42702 17.73512 28.94781 0 36.23008 1.429464 17.59181 0 0 17.59181 0 17.59181 17 0.0000000 0 0 0 0 0 1 0 0 0 0e+00 NA 0 0 0 0 0 0 0 0 3.891066 0 0 44.90873 0e+00 203.7141 848.8086 0 25.30550 -21.61573 monthly 67.5 0e+00 57.14178 203.7141 20.94125 17.59181 14.697449 6.208151 3.891066
2010-01-01 9756.29 2010 1 30.52635 19.64243 28.40109 0 28.87433 1.642858 16.58170 0 0 16.58170 0 16.58170 16 0.4183025 0 0 0 0 0 1 0 0 0 0e+00 NA 0 0 0 0 0 0 0 0 4.488943 0 0 90.56621 0e+00 192.3433 801.4303 0 24.87201 -50.86973 monthly 67.5 0e+00 79.20745 192.3433 18.27646 16.58170 14.640808 5.967861 4.488943
2018-01-01 20289.25 2018 1 30.54786 18.12408 28.45458 0 34.54116 1.473407 16.48720 0 0 16.48720 0 16.48720 16 0.5128040 0 0 0 0 0 1 0 0 0 0e+00 NA 0 0 0 0 0 0 0 0 3.972923 0 0 66.34983 0e+00 203.3076 847.1151 0 25.08855 -47.56812 monthly 67.5 0e+00 70.02759 203.3076 19.21570 16.48720 13.822662 5.564889 3.972923

3.1 Preliminary Analysis

##          date       bio5  ...  wsphk_dailymax  wsphk_dailymean
## 0  2001-01-01  30.631801  ...        4.613824         2.695067
## 1  2001-01-02        NaN  ...        4.357427         2.921393
## 2  2001-01-03        NaN  ...        4.911554         3.461336
## 3  2001-01-04        NaN  ...        5.330330         3.840347
## 4  2001-01-05        NaN  ...        5.145489         4.542647
## 
## [5 rows x 54 columns]
## <class 'pandas.core.frame.DataFrame'>

## [{'mean': np.float64(24.07278709845588), 'std': np.float64(4.691036119980351), 'kurtosis': np.float64(nan), 'skewness': np.float64(nan), 'ks_stat': np.float64(nan), 'ks_pval': np.float64(nan), 'jb_stat': np.float64(nan), 'jb_pval': np.float64(nan)}]

4. Temp - CAR(p) Continous-Time Autoregressive Model

Concept and Methodology Link

4.2 Indices CDD, HDD, CAT/PRIM for Contract Structure

The typical seasons of temperature contracts correspond to when heating demands or cooling demands are high, thus the HDD index usually measures the temperature from October to April whereas the CDD index measures the temperature from April to October. The CAT index substitutes the CDD index for the European cities traded on the CME.

Hence, indices created are of types -

monthly HDD, CAT:

HDD for heating demand heavy months and CDD (substituted by CAT in European market convention)

OCT, NOV, DEC, JAN, FEB, MAR: HDD APR, MAY, JUN, JUL, AUG, SEP, OCT: CAT

3 and 6 month strips

HDD OCT-JAN, JAN-APR, OCT-APR CAT APR-JUL, JUL-OCT, APR-OCT

Actual Indices and Payoff Calculations -

  1. Using Euler–scheme CAR(p) simulator -

  2. Defining the payoff indices (baseline temp - 18 degree celcius) -

\[\begin{equation} HDD(\tau_{1},\tau_{2}) = \int_{\tau_{1}}^{\tau_{2}} \max(c - T_{u}, 0) du, \end{equation}\]

\[\begin{equation} CDD(\tau_{1},\tau_{2}) = \int_{\tau_{1}}^{\tau_{2}} \max(T_{u}-c, 0) du, \end{equation}\]

\[\begin{equation} CAT(\tau_{1},\tau_{2}) = \int_{\tau_{1}}^{\tau_{2}} T_{u} du, \end{equation}\]

with HDD-CDD parity

\[\begin{equation} CDD(\tau_{1},\tau_{2})- HDD(\tau_{1},\tau_{2})= CAT(\tau_{1},\tau_{2}) - c(\tau_{2}-\tau_{1}). \end{equation}\]

  1. Monthly grouping and rolling calculation - Daily indices (HDD, CDD, CAT, OT) are calculated as a weighted average of model simulation and observed data.

OT (Overall Temperature) index is placeholder for month-dependent index selection (months 5,6,7,8,9 - CAT else HDD)

  1. Payoff - Daily payoff is obtained by first difference index values which resets to 0 at the beginning of teh month (assuming a monthly boundary/holding period).

Model Equations

The fitted AR(3) model is
\[ y_t \;=\; \phi_1\,y_{t-1} \;+\; \phi_2\,y_{t-2} \;+\; \phi_3\,y_{t-3} \;+\; \varepsilon_t,\quad \varepsilon_t \sim \mathcal{N}(0, \sigma_t^2). \]

The seasonal mean function uses a Fourier series:
\[ m(t) \;=\; a_0 + a_1\,t + a_2\cos(\omega t) + a_3\sin(\omega t), \quad \omega = \frac{2\pi}{365.25}. \]

The time‐varying variance is
\[ \sigma_t^2 \;=\; d_1\cos(\omega t) + d_2\sin(\omega t). \] ## A) Fitted AutoReg Model

                            AutoReg Model Results                             
==============================================================================
Dep. Variable:           seasonalized   No. Observations:                17898
Model:                     AutoReg(3)   Log Likelihood              -29377.318
Method:               Conditional MLE   S.D. of innovations              1.249
Date:                Thu, 19 Jun 2025   AIC                          58762.635
Time:                        10:56:30   BIC                          58793.804
Sample:                    01-04-2001   HQIC                         58772.888
                         - 01-01-2050                                         
===================================================================================
                      coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------
seasonalized.L1     1.0183      0.007    138.167      0.000       1.004       1.033
seasonalized.L2    -0.4146      0.010    -40.925      0.000      -0.434      -0.395
seasonalized.L3     0.1675      0.007     22.725      0.000       0.153       0.182
                                    Roots                                    
=============================================================================
                  Real          Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            1.3114           -0.0000j            1.3114           -0.0000
AR.2            0.5821           -2.0528j            2.1338           -0.2060
AR.3            0.5821           +2.0528j            2.1338            0.2060
-----------------------------------------------------------------------------

B) Deseasonalized & Residuals (head)

date seasonalized residuals_cv fitted_variance residuals_tv
2001-01-01 00:00:00 -2.12601 nan 3.25328 nan
2001-01-02 00:00:00 -0.498213 nan 3.25678 nan
2001-01-03 00:00:00 0.435505 nan 3.25974 nan
2001-01-04 00:00:00 -0.572836 -0.866796 3.26221 -0.479911
2001-01-05 00:00:00 0.22907 1.07639 3.26422 0.595771

C) All estimated parameters

parameter estimate
a₀ -19.7378
a₁ 5.92484e-05
a₂ -5.9183
a₃ 0.0843081
d₁ 5.0438
d₂ -146.533
seasonalized.L1 1.01827
seasonalized.L2 -0.414631
seasonalized.L3 0.167486
a0 1.56136
a1 1.52839
a2 0.326036
a3 -0.0423424
a4 -0.108629
b1 0.130793
b2 0.0333528
b3 -0.0694925
b4 -0.0160239

D) Sample of computed indices

    date  year  ...  strip_hdd_continuous_payoff  strip_ot_continuous_payoff

0 2001-01-01 2001 … 0.000000 0.000000 1 2001-01-02 2001 … 0.000000 0.000000 2 2001-01-03 2001 … 10.287056 10.287056 3 2001-01-04 2001 … 0.000000 0.000000 4 2001-01-05 2001 … 12.942390 12.942390

[5 rows x 49 columns]

4.1 Temperature Dynamics for Time Series Decomposition

## \begin{tabular}{lrrrrrrrrr}
##  & \multicolumn{6}{c}{Seasonal Component} & \multicolumn{3}{c}{Autoregressive Component} \\
##  & a_0 & a_1 & a_2 & a_3 & d_1 & d_2 & seasonalized.L1 & seasonalized.L2 & seasonalized.L3 \\
## station_id &  &  &  &  &  &  &  &  &  \\
## CW0063 & -19.738 & 0.059 & -5.918 & 0.084 & 5.044 & -146.533 & 1.018 & -0.415 & 0.167 \\
## \end{tabular}

The K-S test statistic measures the largest distance between the EDF x) and the theoretical function (x), measured in a vertical direction (Kolmogorov as cited in Stephens 1992).

  mean std kurtosis skewness ks_stat ks_pval jb_stat jb_pval
station_id                
CW0063 -0.01 1.00 6.32 -0.88 0.07 0.00 10525.65 0.00
## <pandas.io.formats.style.Styler object at 0x00000295E6C1BD90>

## []

4.3 Weather Station Results of the CAR(p) Model

Model Output Results

5. Temp - Alaton Model - Seasonal SDE

data <- combined_df
plot_tavg <- create_plot("t2hk_dailymean")
plot_prcp <- create_plot("dphk")
plot_wspd <- create_plot("wsphk_dailymax")
plot_pres <- create_plot("rh2hk_dailymean")

grid.arrange(plot_tavg, plot_prcp, plot_wspd, plot_pres, ncol = 2, top = "Empiric vs Theoric Distribution for Weather Variables")

## 
## Call:
## lm(formula = Temp ~ t + sin(Omega * t) + cos(Omega * t), data = df_model)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.3164  -1.0493   0.0734   1.4042   5.5610 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     2.355e+01  3.152e-02  747.12   <2e-16 ***
## t               5.928e-05  3.050e-06   19.44   <2e-16 ***
## sin(Omega * t) -2.571e+00  2.229e-02 -115.33   <2e-16 ***
## cos(Omega * t) -5.318e+00  2.227e-02 -238.75   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.108 on 17894 degrees of freedom
##   (334 observations deleted due to missingness)
## Multiple R-squared:  0.7981, Adjusted R-squared:  0.7981 
## F-statistic: 2.358e+04 on 3 and 17894 DF,  p-value: < 2.2e-16
Estimated coefficients for seasonal OLS model
term estimate std.error statistic p.value
(Intercept) 23.546 0.032 747.120 0
t 0.000 0.000 19.437 0
sin(Omega * t) -2.571 0.022 -115.330 0
cos(Omega * t) -5.318 0.022 -238.748 0
Estimated parameters for the seasonal model
Model a1 a2 a3 a4 r.squared sigma
\(T_t^{m}\) (OLS) 23.5463 5.93e-05 -2.57103 -5.317991 0.7981256 2.10788
Coefficients of the regression of Tm
Model A B C Phi
\(T_m\) 23.5463 5.93e-05 5.90688 -2.021124

6. Wind - univariate OU based on wind speed

## [{'station_id': 'CW0063', 'theta_ols': np.float64(4.49450916924405), 'kappa_ols': np.float64(2.444947015974006), 'sigma_ols': np.float64(1.3217379144842238), 'theta_mle': np.float64(4.494258052517118), 'kappa_mle': np.float64(2.4425439461097915), 'sigma_mle': np.float64(2.6914355377269192), 'mle_2_mu': np.float64(4.494510995381748), 'mle_2_eta': array([1.]), 'mle_2_sigma_sq': array([5.98216371])}]

7. Wind - Multivariate OU (Mean reverting SDE) based on wind speed and wind direction

Long‐term mean vector (\(\mu\))
Value
wsphk_dailymax 4.4945
dom_wdirhk 116.3609
Drift matrix (\(\Theta\))
θ_11 θ_12
wsphk_dailymax 1 0.4958 -0.0007
wsphk_dailymax 2 -3.0215 0.3359
Diffusion matrix (\(\Sigma\))
σ_11 σ_12
dom_wdirhk 1 1.4605 -3.5294
dom_wdirhk 2 -3.5294 3120.8882

7.1 Wind Speed - Seasonal model

Wind Dynamics - Continous time transformation: Benth (2010) employed a Box-Cox transformation to achieve symmetry in wind speed data before fitting an AR(4) process with seasonal mean and volatility. General transformation of wind speed data when defining continuous time dynamics - \[\begin{equation} W(t)= \begin{cases} \big [ \lambda (\Lambda(t) + Y(t)) + 1 \big ]^{1/\lambda} &, \lambda \neq 0, \\ \exp (\Lambda(t) + Y(t)) &, \lambda=0, \end{cases} \end{equation}\]

for a constant and (t) seasonal function as in the previous section. A problem with this transformation is that for we may theoretically obtain negative wind speeds. The research of Benth (2010) argued that 1/ performed best.

8. Precipitation - Wilks Occurance Model

Concept and Methodology Link

Using Multisite Rainfall Generation Methodology - generate daily precipitation at multiple stations by following method:

  1. Rain/No‑Rain Occurrence

For each station \(s\) and day \(t\), define the wet‑day indicator

\[ I_{s,t} \;=\; \begin{cases} 1, & \text{if station $s$ on day $t$ has precipitation}\;\ge v_{\min},\\ 0, & \text{otherwise}. \end{cases} \]

(a) Marginal Logistic (or Probit) Model

For each station \(s\), fit a generalized linear model

\[ \Pr(I_{s,t}=1 \mid \mathbf{x}_t) = g^{-1}\!\bigl(\beta_{s,0} + \boldsymbol\beta_s^\top\,\mathbf{x}_t\bigr), \]

where:

  • \(\mathbf{x}_t\) are exogenous covariates (e.g. daily \(\mathrm{T_{max}}-\mathrm{T_{min}}\)),
  • \(g^{-1}\) is the inverse link (logistic: \(g^{-1}(z)=1/(1+e^{-z})\), or probit: \(g^{-1}(z)=\Phi(z)\)).

This yields fitted marginal probabilities

\[ p_{s,t} \;=\;\Pr(I_{s,t}=1\mid\mathbf{x}_t)\,. \]

(b) Copula‐Based Multisite Dependence (Wilks 1998)

  1. Gaussian latent: transform each marginal \(p_{s,t}\) to a standard normal quantile

    \[ z_{s,t} \;=\;\Phi^{-1}(p_{s,t})\,. \]

  2. Estimate the \(S\times S\) residual correlation matrix \(\Sigma\) from the \(\{z_{s,t}\}\).

  3. Simulate

    \[ \mathbf Z_t \;\sim\;\mathcal N(\mathbf0,\Sigma), \]

    and recover occurrence by thresholding:

    \[ \tilde I_{s,t} = \mathbf1\bigl\{Z_{s,t} > \Phi^{-1}(u)\bigr\},\quad u\approx0.5. \]

This preserves both each station’s marginal wet‐day frequency and the inter‐site correlation structure.

2. Non‑Zero Amounts

Conditional on a wet day (\(\tilde I_{s,t}=1\)), model the positive rainfall depth \(Y_{s,t}\). A common choice is a Gamma‐GLM:

\[ Y_{s,t} \;\big|\;I_{s,t}=1 \;\sim\; \Gamma\bigl(\alpha_s,\;\theta_s(\mathbf{x}_t)\bigr), \]

where one can link the scale via

\[ \log\bigl(\theta_s(\mathbf{x}_t)\bigr) = \gamma_{s,0} + \boldsymbol\gamma_s^\top\,\mathbf{x}_t. \]

#### Sampling

  1. Draw \(u\sim U(0,1)\).
  2. Invert the fitted marginal CDF:

    \[ Y_{s,t} \;=\; F^{-1}_{s}(u)\,. \]

This two‐stage “occurrence-amount” approach, coupled via a Gaussian copula, produces spatially coherent daily rainfall series.