Overview

Part of this semester’s coursework includes a project that incorporates the concepts from the course. As you may have noticed, from week to week, the homework builds on each other. Over the course of the semester, you will have created a few applications using various techniques. The project will give you the opportunity to create something similar, but explore deeper into topics you are interested in. This is an individual project (unless you have a very compelling argument for some other arrangement).

It must use some kind of data, such as the data we used in class. Data can be acquired from public sources such as data.gov.
It will need to have a visualization element. That could be graphs or images.
It should be analytic in nature.
It should be different than what we have done in class.

Most of the specifics of design and implementation will be determined by you. Grading will be determined by how well it meets the requirements, how innovative the software is, and the quality of the source code.

Description

I will look at insurance premiums. It will involve calculations previously done with C/C++ in an undergraduate stochastics class, but translated to Python. Translating the calculations will help in better understanding Python and allow me to make comparisons on the calculation speed of both languages. The analysis will involve calculating the fair premiums (the premium that makes net present value of the cashflow equal to zero) on simulated life expectancies. It will also be a good refresher since it was done a few years back. The data being used are from the Social Security Administration Actuarial Life Tables and the visualizations/analysis will be of life expectancies and premiums.

Outline

The Social Security Actuarial Life Table gives the probability of death and expected number of years of life left for a person of age X given their gender. For example, if you look at age 18 on the current 2013 period life table, males have a 0.000735 probability of dying in the year of their life which they are 18, but are expected to continue living for 59.05 years from the point they turned 18.

The Python program is a .py file. The program imports the Social Security Actuarial Life Table from the Social Security website and then creates a probability distribution. This distribution is then be inverted. This inverted distribution is then used to simulate the expected lifespan for a given gender and date of birth. The simulation is be repeated n times until a remaining lifespan estimate within a specified margin of error is obtained. With the remaining lifespan estimate, a fair life insurance premium is calculated. This is the premium the annual premium that makes the expected net present value of the cashflow generated by the policy exactly zero for the given benefit amount and interest rate. Correlated lifespans are derived using a Guassian copula and the premiums are derived using common financial mathematics. Each scenario includes visualizations of lifespan and premium distributions.

Scenario 1

Five people, Bill (age 30), Jim (age 35), Mike (age 40), Jill (age 37), and Martha (age 42) enter into a contract with Risky Life Insurance Company whereby each of them pays a fixed premium (different from person to person) at the beginning of each year they survive to (starting immediately). At the time of the penultimate death, Risky Life pays $1 million to the sole survivor. For example, if Bill lives 30.3 more years, Jim 5.8, Mike 20.9, Jill 31.2, and Martha 30.1, Jill will get paid $1 million in 30.3 years. Jim would make 6 payments of his premium: at times 0, (now) 1, 2, 3, 4, and 5 years.

What is the fair annual premium for all of them? Use a 5% annual continuously compounding discount rate and assume their lifespans are independent.

Scenario 2

A couple, both age 30, purchase a policy from Risky Life whereby they pay an annual premium, starting immediately, at the beginning of each year they both survive to. Beginning at the time of the first death, an annual benefit of $50 thousand is paid to the survivor until the death of that survivor. For example, if one of them lives 40.2 more years and the other 50.6 years, there are 41 payments of the annual premium (begining at time 0 and ending at time 40) and 11 annual benefit payments (beginning at time 40.2 and ending at time 50.2).

What is a fair premium assuming the lives are independent? Use a 5% annual continuously compounding discount rate. Using a Gaussian copula, what is the fair premium if the Gaussian correlation $\rho$ is 0.25? How about 0.50? How about 0.75?

Scenario 3

Big Company, Inc., has 10 recent retirees, 6 male and 4 female, all at age 65. They are all to receive pension benefits of $30 thousand to be paid at the end of each year they survive through. Big Company wishes to fund this pension liability immediately by putting a sum of money into an account earning 5% per year compounded continuously. It wishes to put sufficient funds into the account so that with 90% probability the account and its earnings are sufficient to cover the pension payments.

How much should it put into the account assuming the remaining lifespans are independent?

Because the retirees have all participated in Big Company’s health plan for the last 20 years, an actuary determines that their remaining lifespans are not independent. The actuary models their remaining lifespans with a Gaussian copula where any two of the underlying normals have correlation $\rho$ = .25. Based on this model, how much funding is required?

Computations

import bs4, urllib2, pandas, datetime, random, math, numpy
import matplotlib.pyplot as plt
from matplotlib.pyplot import *

# SOCIAL SECURITY ACTUARIAL LIFE TABLE
url = "https://www.ssa.gov/oact/STATS/table4c6.html"
page = urllib2.urlopen(url)
soup = bs4.BeautifulSoup(page.read(), 'html.parser')
table  = soup.find("table")#.get_text()
Life_List, j = ([], 0)
for tr in table.findAll("tr")[4:]:
    row = []
    td = tr.findAll("td")
    for columns in range(0 , 7):
        string = td[columns].text.replace(',', '')
        value = pandas.to_numeric(string)
        row.append(value)
    Life_List.append(row)
names = ["Age", "P(M)", "n(M)", "E(M)", "P(F)", "n(F)", "E(F)"]
Life_Table = pandas.DataFrame(Life_List, columns=names)
# For computational convenience, no one lives to more than 120 years
Life_Table.set_value(119, "P(M)", 1.0)
Life_Table.set_value(119, "P(F)", 1.0)
plt.figure(1)
for i in (1, 3, 4, 6):
    plt.subplot(211 + i % 3)
    if i == 1 or i == 3:
        p = plt.plot(Life_Table[names[0]], Life_Table[names[i]], 'b-', label="Male")
    else:
        p = plt.plot(Life_Table[names[0]], Life_Table[names[i]], 'r-', label="Female")
    if i == 1 or i == 4:
        plt.ylabel('Probability of Death')
        legend()
    else:
        plt.ylabel('Life Expectancy')
        legend()
plt.savefig(".\DATA_602_Project_1.png")
# plt.show()
plt.clf()

# GET DATE VARIABLES NEEDED FOR CALCULATIONS AND OUTPUT TEXT
def date_vars(DOB, L=0):
    birthdate = pandas.to_datetime(DOB, format='%m/%d/%Y')
    today = datetime.date.today()
    birthday = datetime.date(today.year, birthdate.month, birthdate.day)
    birthday_last = datetime.date(today.year - 1, birthdate.month, birthdate.day)
    year_int = (today.year if birthday <= today else today.year - 1) - birthdate.year
    date_diff = (today - birthday) if birthday <= today else (today - birthday_last)
    year_length = birthday - birthday_last
    year_dec = float(date_diff.days) / float(year_length.days)
    death_year = datetime.date(today.year + int(L), 1, 1)
    death_year_length = datetime.date(today.year + int(L) + 1, 1, 1) - death_year
    death_year_days = (L - int(L)) * death_year_length.days
    death_date = death_year + datetime.timedelta(days=death_year_days)
    return today, year_int, year_dec, death_date

# GENERATE CORRELATED STANDARD NORMALS USING THE POLAR METHOD, THEN CONVERT TO CORRELATED UNIFORMS
def CorrelatedUniforms(n, rho):
    if rho == 0:
        X = numpy.random.uniform(0.0, 1.0, n)
    else:
        rho = numpy.sign(rho) if (abs(rho) > 1) else rho
        accepted, U = (0, numpy.zeros(n))
        while not (accepted):
            U = numpy.random.uniform(0.0, 1.0, n)
            V = [-1.0 + 2.0 * u for u in U]
            S = sum([v * v for v in V])
            accepted = (S < 1)
        W = math.sqrt(-2 * math.log(S) / S)
        X = [v * W for v in V]
        for i in range(2, n):
            X[i] = rho * X[1] + math.sqrt(1.0 - rho**2) * X[i]
        A = (8 * (math.pi - 3)) / (3 * math.pi * (4 - math.pi))
        B = 4 / math.pi
        Y = [x * x * 0.5 for x in X]
        erf = [math.sqrt(1.0 - math.exp(-y * (B + A*y) / (1.0 + A*y))) for y in Y]
        X = [(0.5 + 0.5 * e) if (X >= 0) else (0.5 - 0.5 * e) for e in erf]
    return X

# CREATE CUMULATIVE PROBABILITY DISTRIBUTION 0<=F(120-a)<=1 WHERE a IS EXACT AGE
def InvertF(Gender, DOB):
    today, year_int, year_dec, not_used = date_vars(DOB)
    t, F = ([0], [0]) # Cumulative probability of death at time 0.
    t.append(1 - year_dec) # First period is for a partial year of length t[1]
    F.append(t[1] * Life_Table["P(" + Gender + ")"][year_int])
    # Now use that P[T <= t+1] = P[T <= t] + P[T <= t+1 | T > t] * P[T > t].
    for k in range(year_int + 1, 120):
        t.append(t[k - year_int] + 1.0)
        F.append(F[k - year_int] + Life_Table["P(F)"][k] * (1.0 - F[k - year_int]))
    du, H, k = (1.0 / 1000, [0], 0) # Invert the function starting with u = 0.
    for i in range(1, 1001):
        u = i * du
        while not (F[k] < u and u <= F[k + 1]):
            k += 1 # Locate the inversion interval.
        w = (u - F[k]) / (F[k + 1] - F[k]) # Invert "u" (interpolate between t[k] and t[k+1]).
        H.append((1.0 - w) * t[k] + w * t[k + 1])
    return H

# INVERSE TRANSFORM MAPS 0<=U<=1 TO 0<=H(U)<=(120-a) WHERE a IS EXACT AGE
def HofU(H, U):
    du = 1.0 / 1000
    i = int(1000 * U) # Determine the appropriate interpolation interval so that (i*du)<=U <=((i+1)*du)
    w = (U - i * du) / du # Determine the appropriate interpolation weights
    L = (1.0 - w) * H[i]  +  w * H[i + 1] # Interpolate between H[i] and H[i+1].
    return L

# MONTE CARLO SIMULATION OF REMAINING LIFESPAN
def Simulate(Insured, Policy, rho):
    persons = len(Insured)
    # Gender = [sublist[0:1][0] for sublist in Insured]
    # DOB = [sublist[1:2][0] for sublist in Insured]
    H = [InvertF(sublist[0:1][0], sublist[1:2][0])for sublist in Insured]
    limit, r = (Policy[0], Policy[2])
    Term = 120 if (Policy[1] == None) else Policy[1]
    Lifespan_Data, Lbar, L2bar, epsilon_1, done, n = ([], 0, 0, 0.01, 0, 0)
    Premium_Data, pihat, Pbar, P2bar, Bbar, B2bar, BPbar, benefit = ([], 0, 0, 0, 0, 0, 0, 0)
    start = datetime.datetime.now()
    while not done:
        U = CorrelatedUniforms(persons, rho)
        LS_Estimates, LS_Antithetic, L, P, B = ([], [], 0, 0, 0)
        Lifespans, Premiums, Benefits = ([], [], [])
        for i in range(0, persons):
            LS_Estimates.append(HofU(H[i], U[i]))
            LS_Antithetic.append(HofU(H[i], 1.0 - U[i]))
            P1, P2 = (0, 0)
            for t in range(0, int(LS_Estimates[i]) if LS_Estimates[i] < Term else Term - 1):
                P1 += math.exp(-r * t) # PV of premium payments (at $1/year) received
            for t in range(0, int(LS_Antithetic[i]) if LS_Antithetic[i] < Term else Term - 1):
                P2 += math.exp(-r * t) # PV of premium payments (at $1/year) received
            benefit = limit if (Policy[3] == "Lump") else (limit * int(LS_Estimates[i]))
            B1 = benefit * math.exp(-r * LS_Estimates[i]) # PV of benefit payments
            benefit = limit if (Policy[3] == "Lump") else (limit * int(LS_Antithetic[i]))
            B2 = benefit * math.exp(-r * LS_Antithetic[i]) # PV of benefit payments
            Lifespans.append((LS_Estimates[i] + LS_Antithetic[i]) / 2.0) # Average the two realizations
            Premiums.append((P1 + P2) / 2.0) # Average the two realizations
            Benefits.append((B1 + B2) / 2.0) # Average the two realizations
        if Policy[3] == "Lump":
            L = sorted(Lifespans)[-2] # penultimate life for end of premium payments and lump benefit payment
            P = sum([sorted(Premiums)[-2] if p == max(Premiums) else p for p in Premiums])
            B = sorted(Benefits)[-2]
        if Policy[3] == "Annuity":
            L = max(Lifespans) # penultimate life for end of premium payments, max life for annuity begin/end
            P = sum([sorted(Premiums)[-2] if p == max(Premiums) else p for p in Premiums])
            B = Benefits[numpy.argmax(numpy.max(Lifespans, axis=0))]
        if Policy[3] == "Pension":
            L = max(Lifespans) # max life for end of both all premium payments and benefit payments
            P = Premiums[numpy.argmax(numpy.max(Lifespans, axis=0))]
            B = sum(Benefits)
        n += 1 # Update sample moments
        Lbar = ((n - 1) * Lbar + L) / n
        L2bar = ((n - 1) * L2bar + L * L) / n
        Pbar  = ((n-1) * Pbar + P) / n
        P2bar = ((n-1) * P2bar + P*P) / n
        Bbar  = ((n-1) * Bbar + B) / n
        B2bar = ((n-1) * B2bar + B*B) / n
        BPbar = ((n-1) * BPbar + B*P) / n
        Lifespan_Data.append(Lbar)
        Premium_Data.append(Bbar / Pbar)
        if (n % 100000 == 0): # Test if error is acceptable, and report results to this point
            pihat = Bbar / Pbar # estimator of the "fair" premium
            epsilon_2 = pihat * 0.001
            varB = B2bar - Bbar*Bbar
            varP = P2bar - Pbar*Pbar
            covBP = BPbar - Bbar*Pbar
            s_pihat = pihat * math.sqrt((varB/(Bbar*Bbar) - 2 * covBP/(Bbar*Pbar) + varP/(Pbar*Pbar))/n)
            s_Lbar = math.sqrt((L2bar - Lbar ** 2) / n)
            t = datetime.datetime.now() - start # Computing elapsed time and estimated time of completion
            t_star_1 = t.total_seconds() * (1.96 * s_Lbar / epsilon_1)**2
            t_star_2 = t.total_seconds() * (1.96 * s_pihat / epsilon_2)**2
            if n == 100000:
                print "\nFAIR ANNUAL PREMIUM FOR YEARS REMAINING BASED ON  ACTUARIAL LIFE EXPECTANCIES\n" \
                    '%-12s%-11s%-10s%-12s%-10s%-13s%-15s' % ("SIMULATIONS", "YEARS", \
                    "+/-", "PREMIUM", "+/-", "ELAPSED TIME", "ESTIMATED TIME")
            print '%-12i%-11f%-10f%-12.2f%-10.2f%-13f%-15f' % \
                  (n, Lbar, 1.96 * s_Lbar, pihat, 1.96 * s_pihat, t.total_seconds(), max(t_star_1, t_star_2))
            if (1.96 * s_Lbar <= epsilon_1 and 1.96 * s_pihat <= epsilon_2):
                done = 1  # When error tolerances are met
    return Lbar, Lifespan_Data, pihat, Premium_Data

# VISUALIZATIONS
def Plot_Simulation(Lbar, Lifespan_Data, pihat, Premium_Data, save):
    plt.title('Monte Carlo Simulation Results')
    plt.xlabel('Years of Life Remaining')
    plt.ylabel('Fair Annual Premium')
    plt.plot(Lifespan_Data, Premium_Data, 'b.')
    plt.axhline(y=pihat, xmin=0, xmax=Lbar, color='k', linestyle='-')
    plt.axvline(x=Lbar, ymin=0, ymax=pihat, color='k', linestyle='-')
    plt.plot(Lbar, pihat, 'ro')
    plt.savefig(".\DATA_602_Project_" + str(save) + ".png")
    # plt.show()
    plt.clf()

# SCENARIO 1 (n premium payment sources, 1 benefit payment)
Person1 = ["M", "01/01/1987"] # Bill
Person2 = ["M", "04/02/1982"] # Jim
Person3 = ["M", "07/02/1977"] # Mike
Person4 = ["F", "10/01/1980"] # Jill
Person5 = ["F", "12/31/1975"] # Martha
Insured = [Person1, Person2, Person3, Person4, Person5]
Policy = [1000000, None, 0.05, "Lump"]
Correlation = [0.00]

print "\n", "=" * 40, "SCENARIO 1", "=" * 40
for rho in Correlation:
    w, x, y, z = Simulate(Insured, Policy, rho)
    Plot_Simulation(w, x, y, z, Correlation.index(rho) + 2)

# SCENARIO 2 (1 premium payment source, n benefit payments)
Person1 = ["M", "01/01/1987"]
Person2 = ["F", "06/01/1987"]
Insured = [Person1, Person2]
Policy = [50000, None, 0.05, "Annuity"]
Correlation = [0.00, 0.25, 0.5, 0.75]

print "\n", "=" * 40, "SCENARIO 2", "=" * 40
for rho in Correlation:
    w, x, y, z = Simulate(Insured, Policy, rho)
    Plot_Simulation(w, x, y, z, Correlation.index(rho) + 3)

# SCENARIO 3 (n premium payment sources, n benefit payments)
Person1 = ["M", "01/01/1952"]
Person2 = ["M", "03/14/1952"]
Person3 = ["M", "05/26/1952"]
Person4 = ["M", "08/07/1952"]
Person5 = ["M", "10/19/1952"]
Person6 = ["M", "12/31/1952"]
Person7 = ["F", "01/01/1952"]
Person8 = ["F", "05/02/1952"]
Person9 = ["F", "08/31/1952"]
Person10 = ["F", "12/31/1952"]
Insured = [Person1, Person2, Person3, Person4, Person5, Person6, Person7, Person8, Person9, Person10]
Policy = [30000, None, 0.05, "Pension"]
Correlation = [0.00, 0.25]

print "\n", "=" * 40, "SCENARIO 3", "=" * 40
for rho in Correlation:
    w, x, y, z = Simulate(Insured, Policy, rho)
    Plot_Simulation(w, x, y, z, Correlation.index(rho) + 7)

## 
## ======================================== SCENARIO 1 ========================================
## 
## FAIR ANNUAL PREMIUM FOR YEARS REMAINING BASED ON  ACTUARIAL LIFE EXPECTANCIES
## SIMULATIONS YEARS      +/-       PREMIUM     +/-       ELAPSED TIME ESTIMATED TIME 
## 100000      47.830834  0.009865  1648.74     3.46      34.163000    150.316005     
## 200000      47.828671  0.006977  1648.62     2.43      67.983000    147.485560     
## 300000      47.829729  0.005694  1648.49     1.98      101.356000   146.349658     
## 400000      47.830530  0.004933  1648.94     1.72      134.644000   146.279512     
## 500000      47.832187  0.004407  1649.23     1.54      167.955000   145.860186     
## 
## ======================================== SCENARIO 2 ========================================
## 
## FAIR ANNUAL PREMIUM FOR YEARS REMAINING BASED ON  ACTUARIAL LIFE EXPECTANCIES
## SIMULATIONS YEARS      +/-       PREMIUM     +/-       ELAPSED TIME ESTIMATED TIME 
## 100000      53.482531  0.009986  5397.34     4.66      19.165000    19.111011      
## 
## FAIR ANNUAL PREMIUM FOR YEARS REMAINING BASED ON  ACTUARIAL LIFE EXPECTANCIES
## SIMULATIONS YEARS      +/-       PREMIUM     +/-       ELAPSED TIME ESTIMATED TIME 
## 100000      53.491450  0.009900  5397.99     4.67      21.625000    21.193019      
## 
## FAIR ANNUAL PREMIUM FOR YEARS REMAINING BASED ON  ACTUARIAL LIFE EXPECTANCIES
## SIMULATIONS YEARS      +/-       PREMIUM     +/-       ELAPSED TIME ESTIMATED TIME 
## 100000      53.489898  0.009932  5400.86     4.67      23.246000    22.930260      
## 
## FAIR ANNUAL PREMIUM FOR YEARS REMAINING BASED ON  ACTUARIAL LIFE EXPECTANCIES
## SIMULATIONS YEARS      +/-       PREMIUM     +/-       ELAPSED TIME ESTIMATED TIME 
## 100000      53.485239  0.010029  5399.57     4.67      21.788000    21.914962      
## 200000      53.484151  0.007104  5399.99     3.32      42.814000    21.606459      
## 
## ======================================== SCENARIO 3 ========================================
## 
## FAIR ANNUAL PREMIUM FOR YEARS REMAINING BASED ON  ACTUARIAL LIFE EXPECTANCIES
## SIMULATIONS YEARS      +/-       PREMIUM     +/-       ELAPSED TIME ESTIMATED TIME 
## 100000      21.507399  0.003263  158752.48   98.28     44.730000    17.142009      
## 
## FAIR ANNUAL PREMIUM FOR YEARS REMAINING BASED ON  ACTUARIAL LIFE EXPECTANCIES
## SIMULATIONS YEARS      +/-       PREMIUM     +/-       ELAPSED TIME ESTIMATED TIME 
## 100000      21.656014  0.000282  164740.68   9.87      1010.883000  3.629664

Actuarial Life Experience	Scenario 1, $\rho=0.00$	Scenario 2, $\rho=0.00$	Scenario 2, $\rho=0.25$

Scenario 2, $\rho=0.50$	Scenario 2, $\rho=0.75$	Scenario 3, $\rho=0.00$	Scenario 2, $\rho=0.25$

Observations

Python is much slower than C/C++. Yet it is easier to work with than C/C++ and has more help available online. Speed could be improved by using Python alone without PyCharm which would reduce overhead on system resources. Speed could also be improved with more specific functions, as the above were generalized to allow for execution of all three scenarios.