Marketing Models
Designed Experiments
Time Series Regression
Tweedie Distribution
Fat-Tailed Regression
Modeling Discussion
Suppose we are sending out a direct-mail campaign to raise money for charity. Then, our expected profit is as follows:
\[ \begin{aligned} E[\$|person_i] &= P(Accept=0|person_i)\cdot E[Profit|Accept=0,person_i] + P(Accept=1|person_i)\cdot E[Profit|Accept=1,person_i]\\ &= P(Accept=0|person_i)\cdot 0 + P(Accept=1|person_i)\cdot E[Profit|Accept=1,person_i] \end{aligned} \]
This two-part model is generally better than a single model that predicts expected profit directly from the person's features. We could also have used a Tweedie, which behaves like a zero-inflated gamma: a point mass at zero plus a continuous right-skewed part.
Another example: if sending each piece of mail costs \(c\) dollars and each conversion brings in \(d\) dollars of profit, then we should target every customer for whom:
\[P(Convert=1|person_i)\cdot d > c.\]
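As a minimal sketch of this rule (my own illustration, not from the source), suppose we already have per-person conversion probabilities from some classifier; the probabilities, \(c\), and \(d\) below are all made-up values:

    import numpy as np

    # Hypothetical per-person conversion probabilities from any classifier
    # (e.g., a logistic regression); every number below is made up.
    p_convert = np.array([0.02, 0.10, 0.35, 0.60])

    c = 0.50    # assumed cost of sending one piece of mail, in dollars
    d = 5.00    # assumed profit per conversion, in dollars

    expected_profit = p_convert * d       # E[$ | person_i] under the two-part model
    mail_these = expected_profit > c      # target only people worth more than the cost

    print(expected_profit)                # roughly [0.1, 0.5, 1.75, 3.0]
    print(mail_these)                     # [False, False, True, True]

For the charity example above, the constant \(d\) would be replaced by a second model's prediction of \(E[Profit|Accept=1,person_i]\).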
Some notes from Edwin Chen’s blog:
Ask causal questions: What is the effect of __ on __?
What is the effect of advertising on customer behavior?
What is the effect of treatment on outcome?
The effect of a treatment would be measured as:
\[effect_{treated}=E[Y|person_i,treated=1]-E[Y|person_i,treated=0].\]
or we can take ratios: \[effect_{treated}=\frac{E[Y|person_i,treated=1]}{E[Y|person_i,treated=0]}.\]
However, the quantity above is often difficult to estimate since there are confounding variables. One way to deal with this is propensity modeling. Suppose we have variables \(A\), \(W\), and \(Y\) corresponding to treatment, confounders, and response.
The joint distribution is: \[P(Y,W,A)=P(W)P(A|W)P(Y|A,W).\]
First, we would like to marginalize out the treatment to get the following: \[P(Y,W)=P(W)P(A=0|W)P(Y|A=0,W)+P(W)P(A=1|W)P(Y|A=1,W)\]
and then assume that, conditioned on \(W\), the treatment assignment \(A\) is independent of the potential outcomes; that is, there are no unmeasured confounders.
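As a rough sketch of how this assumption gets used (my own illustration, not from the blog), we can fit a propensity model for \(P(A=1|W)\) and reweight outcomes by its inverse; the simulated data, the use of scikit-learn's LogisticRegression, and the effect size of 2.0 are all assumptions:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Simulated observational data: W = confounders, A = treatment, Y = response.
    rng = np.random.default_rng(0)
    n = 5000
    W = rng.normal(size=(n, 2))
    true_propensity = 1 / (1 + np.exp(-(W[:, 0] - 0.5 * W[:, 1])))
    A = rng.binomial(1, true_propensity)
    Y = 2.0 * A + W[:, 0] + rng.normal(size=n)       # true treatment effect = 2.0

    # Step 1: propensity model for P(A = 1 | W).
    e = LogisticRegression().fit(W, A).predict_proba(W)[:, 1]

    # Step 2: inverse-propensity-weighted contrast of treated vs. untreated outcomes.
    effect = np.mean(A * Y / e) - np.mean((1 - A) * Y / (1 - e))
    print(effect)      # close to 2.0 when there are no unmeasured confounders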
Some notes from mathbabe.org:
The linear regression assumption of (1) fixed coefficients is often violated and can be checked by computing moving-window estimates over the training data. If a coefficient's sign is consistently flipping, then a good estimate for it is zero, since it is very unstable. This suggests that the coefficients are not fixed but are always evolving.
We would also like to keep running estimates of other descriptive statistics, such as the mean, variance, and correlations between variables, to see how the relationships change with time.
Additionally, for financial data, log returns are usually used, that is:
\[\log\left(\frac{f_t}{f_{t-1}}\right).\]
It is helpful to use a gains chart where you plot the running cumulative sum of \(\hat{y}_t\cdot y_t\) over time.
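A minimal sketch of such a gains chart on simulated data (the correlation between \(\hat{y}\) and \(y\) is synthetic, purely so the chart has something to show):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)
    y = rng.normal(scale=0.01, size=500)                  # realized returns
    yhat = 0.3 * y + rng.normal(scale=0.01, size=500)     # stand-in "predictions"

    gains = np.cumsum(yhat * y)   # rises when predictions tend to get the sign/size right

    plt.plot(gains)
    plt.xlabel("time")
    plt.ylabel("cumulative $\\hat{y} \\cdot y$")
    plt.show()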
Next, we fit a regression on the detrended series. That is, we apply some transformation to \(y\) to get an (approximately) white-noise series \(e_t\) and fit a regression of it on the covariates:
\[e_t=x_t^T\beta + \epsilon_t.\]
I haven't tried this yet, but we can also fit a time series model, remove the autoregressive effects to get white noise, and then run the regression on that.
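Here is a rough sketch of that untried variant on simulated data, using statsmodels: fit an AR(1) to \(y\), treat its residuals as the approximately white series \(e_t\), and regress them on \(x_t\). The AR order and all parameter values are assumptions:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.tsa.ar_model import AutoReg

    # Simulated data: y has an AR(1) component plus a linear effect of x.
    rng = np.random.default_rng(2)
    n = 1000
    x = rng.normal(size=n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = 0.7 * y[t - 1] + 0.5 * x[t] + rng.normal(scale=0.1)

    # Step 1: fit an AR(1) and keep its residuals as the (approximately) white series e_t.
    e = AutoReg(y, lags=1).fit().resid            # length n - 1; one point lost to the lag

    # Step 2: regress the whitened series on the covariates.
    ols = sm.OLS(e, sm.add_constant(x[1:])).fit()
    print(ols.params)                             # slope estimate should be near 0.5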
Next, we would need to:
Normalize returns by volatility.
Exponentially downweight old data with a decay factor.
For point (2), we essentially use weighted least squares, with the weight on older observations being some function of a decay factor \(s\). For example, let \(i\) be the number of intervals back from the newest (most recent) point; then the exponentially weighted variance of returns is:
\[V_{old}=\frac{\sum_i r_i^2 s^i}{\sum_i s^i}=\frac{\sum_i r_i^2 s^i}{\frac{1}{1-s}}=(1-s)\sum_i r_i^2 s^i,\] where the denominator \(\frac{1}{1-s}=\sum_i s^i\) is just the sum of the weights.
This implies that updating the estimate when a new return \(r_0\) arrives only requires: \[V_{new}=sV_{old}+(1-s)r_0^2.\]
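A small sketch of this recursive update; the decay factor 0.94 and the simulated returns are assumed values, not ones given in the notes:

    import numpy as np

    def ewma_variance(returns, s=0.94):
        # Implements V_new = s * V_old + (1 - s) * r^2, walking from the
        # oldest return toward the newest.
        v = returns[0] ** 2                  # crude initialization from the oldest point
        for r in returns[1:]:
            v = s * v + (1 - s) * r ** 2
        return v

    rng = np.random.default_rng(3)
    r = rng.normal(scale=0.02, size=250)     # hypothetical daily returns
    print(np.sqrt(ewma_variance(r)))         # EWMA volatility, roughly 0.02 here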
Some priors are that “new data is more important than old data, so weight it more heavily” and that “coefficients vary smoothly”.
Tweedie GLMs are used for distributions that have a large mass at zero. This occurs frequently in insurance data, where claim frequency and severity variables typically have a large spike at 0 (often >90% of observations). For the frequency model, the book mentions that zero-inflated negative binomials or Poissons are often used. The model has two parts:
Model the probability of claim or severity being equal to zero.
Model the loss ($) given that claim or severity is greater than zero.
Typically, for frequency-severity models, we have a two-part model of the following form (a short code sketch follows this list):
Use a count regression model with \(N_i\), the number of claims or accidents, as the response.
Conditioned on \(N_i>0\), model \(S_i/N_i\) (the average loss per claim) as the response using a gamma or log-normal with weight \(1/N_i\).
Or we can model each loss individually. That is, model \(S_{ij}\) using gamma, log-normal, or a mixed linear model.
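A rough sketch of the two-part fit using statsmodels on simulated data; all variable names and parameter values are made up, and the \(1/N_i\) weighting mentioned above is omitted to keep the sketch short:

    import numpy as np
    import statsmodels.api as sm

    # Simulated policy-level data.
    rng = np.random.default_rng(4)
    n = 2000
    X = sm.add_constant(rng.normal(size=(n, 2)))                      # intercept + two rating variables
    n_claims = rng.poisson(np.exp(X @ np.array([-1.0, 0.3, -0.2])))   # N_i

    # Part 1: count (frequency) model with N_i as the response.
    freq = sm.GLM(n_claims, X, family=sm.families.Poisson()).fit()

    # Part 2: severity model fit only where N_i > 0; the average claim S_i / N_i
    # (a gamma stand-in here) is modeled with a gamma GLM and log link.
    has_claim = n_claims > 0
    avg_severity = rng.gamma(shape=2.0, scale=500.0, size=has_claim.sum())
    sev = sm.GLM(avg_severity, X[has_claim],
                 family=sm.families.Gamma(link=sm.families.links.Log())).fit()

    # Expected pure premium per policy = expected frequency * expected severity.
    pure_premium = freq.predict(X) * sev.predict(X)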
Finally, we can use the Tweedie GLM to model total loss. The Tweedie has a point mass at zero and is right-skewed. It is defined as a Poisson sum of gamma random variables: let \(N\) (the number of claims) be Poisson with mean \(\lambda\), and let the \(y_j\) be i.i.d., independent of \(N\), with each \(y_j \sim Gamma(\alpha,\gamma)\). Then \(S_N=y_1+y_2+...+y_N\) is a Poisson sum of gammas. In English: we draw \(N\) from a Poisson and then look at the distribution of the sum of \(N\) independent gammas (\(S_N\)), marginalized over \(N\). More specifically, we want
\[Pr(S_N=s_n)=\sum_{n=0}^{\infty}Pr(N=n)\,Pr(S_N=s_n|N=n).\]
The probability of zero claims is: \[Pr(S_N=0)=Pr(N=0)=e^{-\lambda}.\]
Then the cumulative distribution function of the random variable \(S_N\) is: \[Pr(S_N\leq y)=e^{-\lambda}+\sum_{n=1}^{\infty}Pr(N=n)\,Pr(S_n\leq y), \quad y\geq 0,\]
where \(S_n \sim Gamma(n\alpha,\gamma)\) is the sum of \(n\) of the gammas. This implies that the density of \(S_N\) on \(y>0\) is: \[f_S(y)=\sum_{n=1}^{\infty}e^{-\lambda}\frac{\lambda^n}{n!}\frac{\gamma^{n\alpha}}{\Gamma (n\alpha)}y^{n\alpha -1}e^{-y\gamma}.\]
Notice that, inside the sum, the first two factors are the Poisson probability mass (the sum appears because we are marginalizing out the Poisson part), and the last three factors are the gamma density.
It can be shown that this is a member of the exponential (dispersion) family with mean \(\mu\) and variance \(\phi\mu^p\), where \(1<p<2\) for this compound Poisson-gamma case, and its distribution function can be computed with the R function ptweedie from the tweedie package.
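As a sanity check on the definition above, here is a small simulation of the compound Poisson-gamma construction (the parameter values are made up); it verifies the point mass \(Pr(S_N=0)=e^{-\lambda}\) empirically:

    import numpy as np

    rng = np.random.default_rng(5)
    lam, alpha, rate = 1.2, 2.0, 0.01        # assumed Poisson mean and gamma (shape, rate)

    def simulate_total_loss(size):
        # S_N = y_1 + ... + y_N with N ~ Poisson(lam) and y_j ~ Gamma(alpha, rate).
        n_claims = rng.poisson(lam, size=size)
        totals = np.zeros(size)
        pos = n_claims > 0
        # A sum of n i.i.d. Gamma(alpha, rate) draws is Gamma(n * alpha, rate).
        totals[pos] = rng.gamma(shape=n_claims[pos] * alpha, scale=1.0 / rate)
        return totals

    s = simulate_total_loss(100_000)
    print((s == 0).mean(), np.exp(-lam))     # both should be close to exp(-1.2), about 0.30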
Fat-tailed situations often arise in financial or insurance contexts. A fat tail means extra probability mass in the tail ends of a distribution relative to the normal, with common fat-tailed distributions being the following:
Gamma
Weibull
Pareto
These distributions are important to study since events in the fat tails often cause huge losses, with huge-loss events occurring more frequently than a normal distribution would suggest.
There are several methods for modeling fat tails with regression models.
We could transform the response (e.g., take its log) and fit a normal model to the transformed response, which amounts to treating the original response as log-normal.
We could use a GLM or regression with a fat-tailed distribution, common choices being the gamma (within the exponential family) and the generalized beta.
We could use a nonparametric median regression.
For approach (1), the method is popular since it is easy and straightforward to implement. The downsides are that we get a multiplicative model and that \(E[g(u)]\neq g(E[u])\), so naively back-transforming predictions gives biased estimates on the original scale.
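A minimal sketch of approach (1) and of this retransformation issue on simulated data: fit OLS on \(\log y\), then correct the naive back-transform with Duan's smearing estimator (the smearing step is my own addition here, not something the text prescribes):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    n = 2000
    x = rng.normal(size=n)
    y = np.exp(1.0 + 0.5 * x + rng.normal(scale=0.8, size=n))   # simulated right-skewed response

    X = sm.add_constant(x)
    fit = sm.OLS(np.log(y), X).fit()          # approach (1): model the log of the response

    # Naive back-transform understates E[y | x] because E[exp(u)] > exp(E[u]) (Jensen's inequality).
    naive = np.exp(fit.predict(X))

    # Duan's smearing estimator: scale by the average of exp(residuals).
    corrected = naive * np.exp(fit.resid).mean()

    print(y.mean(), naive.mean(), corrected.mean())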
There are other transformations as well; namely,
Box-Cox
Signed power
Modulus
Inverse hyperbolic sine
In terms of tail strength (where stronger means higher kurtosis), the gamma is weaker than the inverse Gaussian. The Pareto is a common distribution for modeling liability insurance and excess-of-loss reinsurance. In other circumstances, the data are modeled with a combination of two distributions so that the fit is closer to the empirical distribution.
We now introduce the generalized beta of the second kind (GB2), which uses four parameters to specify the distribution.
Let \(z\sim B(\phi_1, \phi_2)\) be a beta random variable; then we can write a GB2 random variable \(Y\) as:
\[\ln(Y)=\mu +\sigma \ln\frac{B(\phi_1, \phi_2)}{1-B(\phi_1, \phi_2)}=\mu +\sigma \ln\frac{z}{1-z},\]
where \(\frac{z}{1-z}\) follows the beta prime distribution, also known as the beta distribution of the second kind.
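A small sketch of simulating from a GB2 via this representation; the parameter values are made up:

    import numpy as np

    rng = np.random.default_rng(7)
    mu, sigma, phi1, phi2 = 5.0, 0.7, 2.0, 3.0      # assumed GB2 parameters

    z = rng.beta(phi1, phi2, size=100_000)          # z ~ Beta(phi1, phi2)
    y = np.exp(mu + sigma * np.log(z / (1 - z)))    # ln(Y) = mu + sigma * ln(z / (1 - z))

    # GB2 has power-law (Pareto-type) right tails, which show up in the high quantiles.
    print(np.quantile(y, [0.50, 0.99, 0.999]))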
The models presented in the book ultimately end up being mathematical models built on statistical ideas. For example, regression is just a function mapping \(x\) to \(y_{mean}\), where \(y_{mean}\) is constant and fixed since it is an expected value.
Some more interesting material on understanding functions and optimization can be found in the book A Primer for Financial Mathematics, since it covers partial differential equations, some nonlinear optimization, linear programming with constraints, and more time series topics.