Part of this semester’s coursework includes a project that incorporates the concepts from the course. As you may have noticed, from week to week, the homework builds on each other. Over the course of the semester, you will have created a few applications using various techniques. The project will give you the opportunity to create something similar, but explore deeper into topics you are interested in. This is an individual project (unless you have a very compelling argument for some other arrangement).
Most of the specifics of design and implementation will be determined by you. Grading will be determined by how well it meets the requirements, how innovative the software is, and the quality of the source code.
I will look at insurance premiums. It will involve calculations previously done with C/C++
in an undergraduate stochastics class, but translated to Python
. Translating the calculations will help in better understanding Python
and allow me to make comparisons on the calculation speed of both languages. The analysis will involve calculating the fair premiums (the premium that makes net present value of the cashflow equal to zero) on simulated life expectancies. It will also be a good refresher since it was done a few years back. The data being used are from the Social Security Administration Actuarial Life Tables and the visualizations/analysis will be of life expectancies and premiums.
The Social Security Actuarial Life Table gives the probability of death and expected number of years of life left for a person of age X given their gender. For example, if you look at age 18 on the current 2013 period life table, males have a 0.000735 probability of dying in the year of their life which they are 18, but are expected to continue living for 59.05 years from the point they turned 18.
The Python
program is a .py
file. The program imports the Social Security Actuarial Life Table from the Social Security website and then creates a probability distribution. This distribution is then be inverted. This inverted distribution is then used to simulate the expected lifespan for a given gender and date of birth. The simulation is be repeated n
times until a remaining lifespan estimate within a specified margin of error is obtained. With the remaining lifespan estimate, a fair life insurance premium is calculated. This is the premium the annual premium that makes the expected net present value of the cashflow generated by the policy exactly zero for the given benefit amount and interest rate. Correlated lifespans are derived using a Guassian copula and the premiums are derived using common financial mathematics. Each scenario includes visualizations of lifespan and premium distributions.
Five people, Bill (age 30), Jim (age 35), Mike (age 40), Jill (age 37), and Martha (age 42) enter into a contract with Risky Life Insurance Company whereby each of them pays a fixed premium (different from person to person) at the beginning of each year they survive to (starting immediately). At the time of the penultimate death, Risky Life pays $1 million to the sole survivor. For example, if Bill lives 30.3 more years, Jim 5.8, Mike 20.9, Jill 31.2, and Martha 30.1, Jill will get paid $1 million in 30.3 years. Jim would make 6 payments of his premium: at times 0, (now) 1, 2, 3, 4, and 5 years.
What is the fair annual premium for all of them? Use a 5% annual continuously compounding discount rate and assume their lifespans are independent.
A couple, both age 30, purchase a policy from Risky Life whereby they pay an annual premium, starting immediately, at the beginning of each year they both survive to. Beginning at the time of the first death, an annual benefit of $50 thousand is paid to the survivor until the death of that survivor. For example, if one of them lives 40.2 more years and the other 50.6 years, there are 41 payments of the annual premium (begining at time 0 and ending at time 40) and 11 annual benefit payments (beginning at time 40.2 and ending at time 50.2).
What is a fair premium assuming the lives are independent? Use a 5% annual continuously compounding discount rate. Using a Gaussian copula, what is the fair premium if the Gaussian correlation \(\rho\) is 0.25? How about 0.50? How about 0.75?
Big Company, Inc., has 10 recent retirees, 6 male and 4 female, all at age 65. They are all to receive pension benefits of $30 thousand to be paid at the end of each year they survive through. Big Company wishes to fund this pension liability immediately by putting a sum of money into an account earning 5% per year compounded continuously. It wishes to put sufficient funds into the account so that with 90% probability the account and its earnings are sufficient to cover the pension payments.
How much should it put into the account assuming the remaining lifespans are independent?
Because the retirees have all participated in Big Company’s health plan for the last 20 years, an actuary determines that their remaining lifespans are not independent. The actuary models their remaining lifespans with a Gaussian copula where any two of the underlying normals have correlation \(\rho\) = .25. Based on this model, how much funding is required?
import bs4, urllib2, pandas, datetime, random, math, numpy
import matplotlib.pyplot as plt
from matplotlib.pyplot import *
# SOCIAL SECURITY ACTUARIAL LIFE TABLE
url = "https://www.ssa.gov/oact/STATS/table4c6.html"
page = urllib2.urlopen(url)
soup = bs4.BeautifulSoup(page.read(), 'html.parser')
table = soup.find("table")#.get_text()
Life_List, j = ([], 0)
for tr in table.findAll("tr")[4:]:
row = []
td = tr.findAll("td")
for columns in range(0 , 7):
string = td[columns].text.replace(',', '')
value = pandas.to_numeric(string)
row.append(value)
Life_List.append(row)
names = ["Age", "P(M)", "n(M)", "E(M)", "P(F)", "n(F)", "E(F)"]
Life_Table = pandas.DataFrame(Life_List, columns=names)
# For computational convenience, no one lives to more than 120 years
Life_Table.set_value(119, "P(M)", 1.0)
Life_Table.set_value(119, "P(F)", 1.0)
plt.figure(1)
for i in (1, 3, 4, 6):
plt.subplot(211 + i % 3)
if i == 1 or i == 3:
p = plt.plot(Life_Table[names[0]], Life_Table[names[i]], 'b-', label="Male")
else:
p = plt.plot(Life_Table[names[0]], Life_Table[names[i]], 'r-', label="Female")
if i == 1 or i == 4:
plt.ylabel('Probability of Death')
legend()
else:
plt.ylabel('Life Expectancy')
legend()
plt.savefig(".\DATA_602_Project_1.png")
# plt.show()
plt.clf()
# GET DATE VARIABLES NEEDED FOR CALCULATIONS AND OUTPUT TEXT
def date_vars(DOB, L=0):
birthdate = pandas.to_datetime(DOB, format='%m/%d/%Y')
today = datetime.date.today()
birthday = datetime.date(today.year, birthdate.month, birthdate.day)
birthday_last = datetime.date(today.year - 1, birthdate.month, birthdate.day)
year_int = (today.year if birthday <= today else today.year - 1) - birthdate.year
date_diff = (today - birthday) if birthday <= today else (today - birthday_last)
year_length = birthday - birthday_last
year_dec = float(date_diff.days) / float(year_length.days)
death_year = datetime.date(today.year + int(L), 1, 1)
death_year_length = datetime.date(today.year + int(L) + 1, 1, 1) - death_year
death_year_days = (L - int(L)) * death_year_length.days
death_date = death_year + datetime.timedelta(days=death_year_days)
return today, year_int, year_dec, death_date
# GENERATE CORRELATED STANDARD NORMALS USING THE POLAR METHOD, THEN CONVERT TO CORRELATED UNIFORMS
def CorrelatedUniforms(n, rho):
if rho == 0:
X = numpy.random.uniform(0.0, 1.0, n)
else:
rho = numpy.sign(rho) if (abs(rho) > 1) else rho
accepted, U = (0, numpy.zeros(n))
while not (accepted):
U = numpy.random.uniform(0.0, 1.0, n)
V = [-1.0 + 2.0 * u for u in U]
S = sum([v * v for v in V])
accepted = (S < 1)
W = math.sqrt(-2 * math.log(S) / S)
X = [v * W for v in V]
for i in range(2, n):
X[i] = rho * X[1] + math.sqrt(1.0 - rho**2) * X[i]
A = (8 * (math.pi - 3)) / (3 * math.pi * (4 - math.pi))
B = 4 / math.pi
Y = [x * x * 0.5 for x in X]
erf = [math.sqrt(1.0 - math.exp(-y * (B + A*y) / (1.0 + A*y))) for y in Y]
X = [(0.5 + 0.5 * e) if (X >= 0) else (0.5 - 0.5 * e) for e in erf]
return X
# CREATE CUMULATIVE PROBABILITY DISTRIBUTION 0<=F(120-a)<=1 WHERE a IS EXACT AGE
def InvertF(Gender, DOB):
today, year_int, year_dec, not_used = date_vars(DOB)
t, F = ([0], [0]) # Cumulative probability of death at time 0.
t.append(1 - year_dec) # First period is for a partial year of length t[1]
F.append(t[1] * Life_Table["P(" + Gender + ")"][year_int])
# Now use that P[T <= t+1] = P[T <= t] + P[T <= t+1 | T > t] * P[T > t].
for k in range(year_int + 1, 120):
t.append(t[k - year_int] + 1.0)
F.append(F[k - year_int] + Life_Table["P(F)"][k] * (1.0 - F[k - year_int]))
du, H, k = (1.0 / 1000, [0], 0) # Invert the function starting with u = 0.
for i in range(1, 1001):
u = i * du
while not (F[k] < u and u <= F[k + 1]):
k += 1 # Locate the inversion interval.
w = (u - F[k]) / (F[k + 1] - F[k]) # Invert "u" (interpolate between t[k] and t[k+1]).
H.append((1.0 - w) * t[k] + w * t[k + 1])
return H
# INVERSE TRANSFORM MAPS 0<=U<=1 TO 0<=H(U)<=(120-a) WHERE a IS EXACT AGE
def HofU(H, U):
du = 1.0 / 1000
i = int(1000 * U) # Determine the appropriate interpolation interval so that (i*du)<=U <=((i+1)*du)
w = (U - i * du) / du # Determine the appropriate interpolation weights
L = (1.0 - w) * H[i] + w * H[i + 1] # Interpolate between H[i] and H[i+1].
return L
# MONTE CARLO SIMULATION OF REMAINING LIFESPAN
def Simulate(Insured, Policy, rho):
persons = len(Insured)
# Gender = [sublist[0:1][0] for sublist in Insured]
# DOB = [sublist[1:2][0] for sublist in Insured]
H = [InvertF(sublist[0:1][0], sublist[1:2][0])for sublist in Insured]
limit, r = (Policy[0], Policy[2])
Term = 120 if (Policy[1] == None) else Policy[1]
Lifespan_Data, Lbar, L2bar, epsilon_1, done, n = ([], 0, 0, 0.01, 0, 0)
Premium_Data, pihat, Pbar, P2bar, Bbar, B2bar, BPbar, benefit = ([], 0, 0, 0, 0, 0, 0, 0)
start = datetime.datetime.now()
while not done:
U = CorrelatedUniforms(persons, rho)
LS_Estimates, LS_Antithetic, L, P, B = ([], [], 0, 0, 0)
Lifespans, Premiums, Benefits = ([], [], [])
for i in range(0, persons):
LS_Estimates.append(HofU(H[i], U[i]))
LS_Antithetic.append(HofU(H[i], 1.0 - U[i]))
P1, P2 = (0, 0)
for t in range(0, int(LS_Estimates[i]) if LS_Estimates[i] < Term else Term - 1):
P1 += math.exp(-r * t) # PV of premium payments (at $1/year) received
for t in range(0, int(LS_Antithetic[i]) if LS_Antithetic[i] < Term else Term - 1):
P2 += math.exp(-r * t) # PV of premium payments (at $1/year) received
benefit = limit if (Policy[3] == "Lump") else (limit * int(LS_Estimates[i]))
B1 = benefit * math.exp(-r * LS_Estimates[i]) # PV of benefit payments
benefit = limit if (Policy[3] == "Lump") else (limit * int(LS_Antithetic[i]))
B2 = benefit * math.exp(-r * LS_Antithetic[i]) # PV of benefit payments
Lifespans.append((LS_Estimates[i] + LS_Antithetic[i]) / 2.0) # Average the two realizations
Premiums.append((P1 + P2) / 2.0) # Average the two realizations
Benefits.append((B1 + B2) / 2.0) # Average the two realizations
if Policy[3] == "Lump":
L = sorted(Lifespans)[-2] # penultimate life for end of premium payments and lump benefit payment
P = sum([sorted(Premiums)[-2] if p == max(Premiums) else p for p in Premiums])
B = sorted(Benefits)[-2]
if Policy[3] == "Annuity":
L = max(Lifespans) # penultimate life for end of premium payments, max life for annuity begin/end
P = sum([sorted(Premiums)[-2] if p == max(Premiums) else p for p in Premiums])
B = Benefits[numpy.argmax(numpy.max(Lifespans, axis=0))]
if Policy[3] == "Pension":
L = max(Lifespans) # max life for end of both all premium payments and benefit payments
P = Premiums[numpy.argmax(numpy.max(Lifespans, axis=0))]
B = sum(Benefits)
n += 1 # Update sample moments
Lbar = ((n - 1) * Lbar + L) / n
L2bar = ((n - 1) * L2bar + L * L) / n
Pbar = ((n-1) * Pbar + P) / n
P2bar = ((n-1) * P2bar + P*P) / n
Bbar = ((n-1) * Bbar + B) / n
B2bar = ((n-1) * B2bar + B*B) / n
BPbar = ((n-1) * BPbar + B*P) / n
Lifespan_Data.append(Lbar)
Premium_Data.append(Bbar / Pbar)
if (n % 100000 == 0): # Test if error is acceptable, and report results to this point
pihat = Bbar / Pbar # estimator of the "fair" premium
epsilon_2 = pihat * 0.001
varB = B2bar - Bbar*Bbar
varP = P2bar - Pbar*Pbar
covBP = BPbar - Bbar*Pbar
s_pihat = pihat * math.sqrt((varB/(Bbar*Bbar) - 2 * covBP/(Bbar*Pbar) + varP/(Pbar*Pbar))/n)
s_Lbar = math.sqrt((L2bar - Lbar ** 2) / n)
t = datetime.datetime.now() - start # Computing elapsed time and estimated time of completion
t_star_1 = t.total_seconds() * (1.96 * s_Lbar / epsilon_1)**2
t_star_2 = t.total_seconds() * (1.96 * s_pihat / epsilon_2)**2
if n == 100000:
print "\nFAIR ANNUAL PREMIUM FOR YEARS REMAINING BASED ON ACTUARIAL LIFE EXPECTANCIES\n" \
'%-12s%-11s%-10s%-12s%-10s%-13s%-15s' % ("SIMULATIONS", "YEARS", \
"+/-", "PREMIUM", "+/-", "ELAPSED TIME", "ESTIMATED TIME")
print '%-12i%-11f%-10f%-12.2f%-10.2f%-13f%-15f' % \
(n, Lbar, 1.96 * s_Lbar, pihat, 1.96 * s_pihat, t.total_seconds(), max(t_star_1, t_star_2))
if (1.96 * s_Lbar <= epsilon_1 and 1.96 * s_pihat <= epsilon_2):
done = 1 # When error tolerances are met
return Lbar, Lifespan_Data, pihat, Premium_Data
# VISUALIZATIONS
def Plot_Simulation(Lbar, Lifespan_Data, pihat, Premium_Data, save):
plt.title('Monte Carlo Simulation Results')
plt.xlabel('Years of Life Remaining')
plt.ylabel('Fair Annual Premium')
plt.plot(Lifespan_Data, Premium_Data, 'b.')
plt.axhline(y=pihat, xmin=0, xmax=Lbar, color='k', linestyle='-')
plt.axvline(x=Lbar, ymin=0, ymax=pihat, color='k', linestyle='-')
plt.plot(Lbar, pihat, 'ro')
plt.savefig(".\DATA_602_Project_" + str(save) + ".png")
# plt.show()
plt.clf()
# SCENARIO 1 (n premium payment sources, 1 benefit payment)
Person1 = ["M", "01/01/1987"] # Bill
Person2 = ["M", "04/02/1982"] # Jim
Person3 = ["M", "07/02/1977"] # Mike
Person4 = ["F", "10/01/1980"] # Jill
Person5 = ["F", "12/31/1975"] # Martha
Insured = [Person1, Person2, Person3, Person4, Person5]
Policy = [1000000, None, 0.05, "Lump"]
Correlation = [0.00]
print "\n", "=" * 40, "SCENARIO 1", "=" * 40
for rho in Correlation:
w, x, y, z = Simulate(Insured, Policy, rho)
Plot_Simulation(w, x, y, z, Correlation.index(rho) + 2)
# SCENARIO 2 (1 premium payment source, n benefit payments)
Person1 = ["M", "01/01/1987"]
Person2 = ["F", "06/01/1987"]
Insured = [Person1, Person2]
Policy = [50000, None, 0.05, "Annuity"]
Correlation = [0.00, 0.25, 0.5, 0.75]
print "\n", "=" * 40, "SCENARIO 2", "=" * 40
for rho in Correlation:
w, x, y, z = Simulate(Insured, Policy, rho)
Plot_Simulation(w, x, y, z, Correlation.index(rho) + 3)
# SCENARIO 3 (n premium payment sources, n benefit payments)
Person1 = ["M", "01/01/1952"]
Person2 = ["M", "03/14/1952"]
Person3 = ["M", "05/26/1952"]
Person4 = ["M", "08/07/1952"]
Person5 = ["M", "10/19/1952"]
Person6 = ["M", "12/31/1952"]
Person7 = ["F", "01/01/1952"]
Person8 = ["F", "05/02/1952"]
Person9 = ["F", "08/31/1952"]
Person10 = ["F", "12/31/1952"]
Insured = [Person1, Person2, Person3, Person4, Person5, Person6, Person7, Person8, Person9, Person10]
Policy = [30000, None, 0.05, "Pension"]
Correlation = [0.00, 0.25]
print "\n", "=" * 40, "SCENARIO 3", "=" * 40
for rho in Correlation:
w, x, y, z = Simulate(Insured, Policy, rho)
Plot_Simulation(w, x, y, z, Correlation.index(rho) + 7)
##
## ======================================== SCENARIO 1 ========================================
##
## FAIR ANNUAL PREMIUM FOR YEARS REMAINING BASED ON ACTUARIAL LIFE EXPECTANCIES
## SIMULATIONS YEARS +/- PREMIUM +/- ELAPSED TIME ESTIMATED TIME
## 100000 47.830834 0.009865 1648.74 3.46 34.163000 150.316005
## 200000 47.828671 0.006977 1648.62 2.43 67.983000 147.485560
## 300000 47.829729 0.005694 1648.49 1.98 101.356000 146.349658
## 400000 47.830530 0.004933 1648.94 1.72 134.644000 146.279512
## 500000 47.832187 0.004407 1649.23 1.54 167.955000 145.860186
##
## ======================================== SCENARIO 2 ========================================
##
## FAIR ANNUAL PREMIUM FOR YEARS REMAINING BASED ON ACTUARIAL LIFE EXPECTANCIES
## SIMULATIONS YEARS +/- PREMIUM +/- ELAPSED TIME ESTIMATED TIME
## 100000 53.482531 0.009986 5397.34 4.66 19.165000 19.111011
##
## FAIR ANNUAL PREMIUM FOR YEARS REMAINING BASED ON ACTUARIAL LIFE EXPECTANCIES
## SIMULATIONS YEARS +/- PREMIUM +/- ELAPSED TIME ESTIMATED TIME
## 100000 53.491450 0.009900 5397.99 4.67 21.625000 21.193019
##
## FAIR ANNUAL PREMIUM FOR YEARS REMAINING BASED ON ACTUARIAL LIFE EXPECTANCIES
## SIMULATIONS YEARS +/- PREMIUM +/- ELAPSED TIME ESTIMATED TIME
## 100000 53.489898 0.009932 5400.86 4.67 23.246000 22.930260
##
## FAIR ANNUAL PREMIUM FOR YEARS REMAINING BASED ON ACTUARIAL LIFE EXPECTANCIES
## SIMULATIONS YEARS +/- PREMIUM +/- ELAPSED TIME ESTIMATED TIME
## 100000 53.485239 0.010029 5399.57 4.67 21.788000 21.914962
## 200000 53.484151 0.007104 5399.99 3.32 42.814000 21.606459
##
## ======================================== SCENARIO 3 ========================================
##
## FAIR ANNUAL PREMIUM FOR YEARS REMAINING BASED ON ACTUARIAL LIFE EXPECTANCIES
## SIMULATIONS YEARS +/- PREMIUM +/- ELAPSED TIME ESTIMATED TIME
## 100000 21.507399 0.003263 158752.48 98.28 44.730000 17.142009
##
## FAIR ANNUAL PREMIUM FOR YEARS REMAINING BASED ON ACTUARIAL LIFE EXPECTANCIES
## SIMULATIONS YEARS +/- PREMIUM +/- ELAPSED TIME ESTIMATED TIME
## 100000 21.656014 0.000282 164740.68 9.87 1010.883000 3.629664
Actuarial Life Experience | Scenario 1, \(\rho=0.00\) | Scenario 2, \(\rho=0.00\) | Scenario 2, \(\rho=0.25\) |
---|---|---|---|
Scenario 2, \(\rho=0.50\) | Scenario 2, \(\rho=0.75\) | Scenario 3, \(\rho=0.00\) | Scenario 2, \(\rho=0.25\) |
---|---|---|---|
Python
is much slower than C/C++
. Yet it is easier to work with than C/C++
and has more help available online. Speed could be improved by using Python
alone without PyCharm
which would reduce overhead on system resources. Speed could also be improved with more specific functions, as the above were generalized to allow for execution of all three scenarios.
Prof. Howard, Baruch College CUNY, MATH 4135 (Computational Methods in Probability)
http://www.acooke.org/random.pdf
https://www.tutorialspoint.com/compile_c_online.php
https://www.accountingcoach.com/blog/net-present-value
https://docs.scipy.org/doc/numpy/reference/generated/numpy.empty.html
https://docs.scipy.org/doc/numpy/reference/generated/numpy.zeros.html
http://srome.github.io/Parsing-HTML-Tables-in-Python-with-BeautifulSoup-and-pandas/
http://stackoverflow.com/questions/31771619/html-table-to-pandas-table-info-inside-html-tags