Title: “Analyze Death Age Difference of Right Handers with Left Handers”
Author: Vandana Tanwar
Date: 29/02/2024


This report uses two datasets: death distribution data for the United States from the year 1999, and rates of left-handedness digitized from a figure in a 1992 paper by Gilbert and Wysocki.

This project is written in Python and run from an R Markdown document. The Python libraries pandas, NumPy and matplotlib.pyplot are used to load, analyze and plot the data for each task.

import pandas as pd
import matplotlib.pyplot as plt

# **TASK_01**
# **Load handedness data from the National Geographic survey and create a scatter plot.**

# ... CODE FOR TASK 1 ...
data_url_1 = "https://gist.githubusercontent.com/mbonsma/8da0990b71ba9a09f7de395574e54df1/raw/aec88b30af87fad8d45da7e774223f91dad09e88/lh_data.csv"
lefthanded_data = pd.read_csv(data_url_1)

fig, ax = plt.subplots()
ax.plot('Age', 'Female', data = lefthanded_data, marker = 'o')
ax.plot('Age', 'Male', data = lefthanded_data, marker = 'x')
ax.legend()
ax.set_xlabel('<- Age ->')
ax.set_ylabel('<- Left-handed rate ->')
plt.title('Left-handed rate by Age')
plt.show()

# **TASK_02**
# **Add two new columns, then plot the mean as a function of birth year.**

# ... CODE FOR TASK 2 ...
lefthanded_data['Birth_year'] = 1986 - lefthanded_data['Age']
lefthanded_data['Mean_lh'] = lefthanded_data[['Male', 'Female']].mean(axis=1)

fig, ax = plt.subplots()
ax.plot('Birth_year','Mean_lh', data=lefthanded_data)
ax.set_xlim(1890,1990)
## (1890.0, 1990.0)
ax.set_ylim(4,15)
## (4.0, 15.0)
ax.legend()
ax.set_xlabel('<-Birth_year ->')
ax.set_ylabel('<- Mean_lh ->')
plt.title('Left-handedness by birth year')
plt.show()

# **TASK_03**
# **Create a function that will return P(LH | A)**

# ... CODE FOR TASK 3 ...
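import numpy as np
def P_lh_given_A(ages_of_death, study_year = 1990):
    """ Probability of being left-handed given age at death, P(LH | A)
    Input: numpy array of ages of death (sorted ascending), study year
    Output: numpy array of probabilities, one per age """
    # Rates are percentages. The last 10 rows are the oldest respondents
    # (born earliest) and the first 10 rows are the youngest.
    early_1900s_rate = lefthanded_data['Mean_lh'][-10:].mean()
    late_1900s_rate = lefthanded_data['Mean_lh'][:10].mean()
    middle_rates = lefthanded_data.loc[lefthanded_data['Birth_year'].isin(study_year - ages_of_death)]['Mean_lh']
    # The table covers ages 10 to 86 as of 1986; shift that range to the study year
    youngest_age = study_year - 1986 + 10
    oldest_age = study_year - 1986 + 86

    P_return = np.zeros(ages_of_death.shape)
    # Outside the surveyed range, use the averaged edge rates
    P_return[ages_of_death > oldest_age] = early_1900s_rate / 100
    P_return[ages_of_death < youngest_age] = late_1900s_rate / 100
    # Inside the range, use the rate for the matching birth year; values are
    # assigned in table order, so ages_of_death is expected in ascending order
    P_return[np.logical_and((ages_of_death <= oldest_age), (ages_of_death >= youngest_age))] = middle_rates / 100

    return P_return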
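
The age cut-offs in P_lh_given_A come from shifting the surveyed age range (10 to 86 in 1986) forward to the study year. A minimal sketch of that arithmetic follows; the helper name `survey_age_window` is hypothetical and not part of the report's code:

```python
def survey_age_window(study_year, survey_year=1986, min_age=10, max_age=86):
    """Return (youngest_age, oldest_age) covered by the survey in study_year.

    Hypothetical helper for illustration; the defaults mirror the constants
    hard-coded in P_lh_given_A.
    """
    shift = study_year - survey_year
    return min_age + shift, max_age + shift


print(survey_age_window(1990))  # (14, 90)
print(survey_age_window(2018))  # (42, 118)
```

Ages outside the returned window fall back to the averaged edge rates, so with a 2018 study year every death before age 42 is assigned the late-1900s rate.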


# **TASK_04**
# **Load death distribution data for the United States and plot it.**

data_url_2 = "https://gist.githubusercontent.com/mbonsma/2f4076aab6820ca1807f4e29f75f18ec/raw/62f3ec07514c7e31f5979beeca86f19991540796/cdc_vs00199_table310.tsv"

# ... CODE FOR TASK 4 ...
death_distribution_data = pd.read_csv(data_url_2, sep='\t', skiprows=[1])

death_distribution_data = death_distribution_data.dropna(subset = ['Both Sexes'])

fig, ax = plt.subplots()
ax.plot('Age', 'Both Sexes', data = death_distribution_data, marker='o') 
ax.legend()
plt.title('Death distribution data for the United States in 1999')
ax.set_xlabel('<- Age ->') 
ax.set_ylabel('<- Both Sexes ->')
plt.show()
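
The `skiprows=[1]` and `dropna` steps can be tried on a small in-memory table. The sketch below uses made-up numbers, and the extra first data row is an assumption about why a row is skipped, not a description of the CDC file:

```python
import io

import pandas as pd

# Made-up tab-separated text: a header, one hypothetical summary row,
# then per-age rows, one of which has no death count.
raw = "Age\tBoth Sexes\nAll ages\t600\n0\t100\n1\t\n2\t500\n"

# skiprows=[1] drops the second line of the file (the row right after the header).
toy_deaths = pd.read_csv(io.StringIO(raw), sep="\t", skiprows=[1])

# Remove ages with a missing count; the surviving rows keep their old labels.
toy_deaths = toy_deaths.dropna(subset=["Both Sexes"])
print(toy_deaths)
```

Dropping rows leaves gaps in the row labels (here only 0 and 2 remain), which matters for any later lookup by label.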

# **TASK_05**
# **Create a function called P_lh() for overall probability of left-handedness.**

# ... CODE FOR TASK 5 ...
def P_lh(death_distribution_data, study_year = 1990):
    """ Overall probability of being left-handed if you died in the study year
    Input: dataframe of death distribution data, study year
    Output: P(LH), a single floating point number """
    p_list = death_distribution_data['Both Sexes'] * P_lh_given_A(death_distribution_data['Age'], study_year) 
    p = np.sum(p_list)
    return p / np.sum(death_distribution_data['Both Sexes'])

print(P_lh(death_distribution_data))
## 0.07766387615350638
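
P_lh() is a weighted average: the left-handedness rate at each age is weighted by the number of deaths at that age. This is the law of total probability, P(LH) = sum over A of P(LH | A) * P(A). A small sketch with made-up numbers for two ages (not the report's data):

```python
import numpy as np

# Hypothetical toy inputs.
deaths = np.array([100.0, 300.0])        # deaths recorded at two ages
p_lh_given_age = np.array([0.12, 0.06])  # P(LH | A) at those ages

p_age = deaths / deaths.sum()            # P(A) = [0.25, 0.75]
p_lh = np.sum(p_lh_given_age * p_age)    # 0.12 * 0.25 + 0.06 * 0.75
print(round(p_lh, 4))  # 0.075
```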

# **TASK_06**
# **Write a function to calculate P_A_given_lh().**

# ... CODE FOR TASK 6 ...
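def P_A_given_lh(ages_of_death, death_distribution_data, study_year = 1990):
    """ Probability of dying at each given age if you were left-handed (Bayes' rule)
    Input: array of ages of death, dataframe of death distribution data, study year
    Output: P(A | LH), one value per age """
    # P(A): share of all deaths at each age. Rows are looked up by index label,
    # so this relies on the row labels matching the ages.
    P_A = death_distribution_data['Both Sexes'][ages_of_death] / np.sum(death_distribution_data['Both Sexes'])
    P_left = P_lh(death_distribution_data, study_year)  # P(LH)
    P_lh_A = P_lh_given_A(ages_of_death, study_year)  # P(LH | A)
    return P_lh_A*P_A/P_left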

# **TASK_07**
# **Write a function to calculate P_A_given_rh()**

# ... CODE FOR TASK 7 ...
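def P_A_given_rh(ages_of_death, death_distribution_data, study_year = 1990):
    """ Probability of dying at each given age if you were right-handed (Bayes' rule)
    Input: array of ages of death, dataframe of death distribution data, study year
    Output: P(A | RH), one value per age """
    # P(A): share of all deaths at each age, looked up by index label
    P_A = death_distribution_data['Both Sexes'][ages_of_death] / np.sum(death_distribution_data['Both Sexes'])
    P_right = 1 - P_lh(death_distribution_data, study_year)  # P(RH) = 1 - P(LH)
    P_rh_A = 1 - P_lh_given_A(ages_of_death, study_year)  # P(RH | A) = 1 - P(LH | A)
    return P_rh_A*P_A/P_right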
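
Both functions apply Bayes' rule, P(A | LH) = P(LH | A) * P(A) / P(LH), and the right-handed version uses the complements. As a sanity check, each result should sum to 1 when evaluated over every age in the distribution. A minimal sketch with hypothetical values for three ages:

```python
import numpy as np

# Hypothetical toy inputs (not the report's data).
p_age = np.array([0.2, 0.3, 0.5])               # P(A), sums to 1
p_lh_given_age = np.array([0.12, 0.10, 0.06])   # P(LH | A)

p_lh = np.sum(p_lh_given_age * p_age)           # P(LH)

# Bayes' rule for each handedness.
p_age_given_lh = p_lh_given_age * p_age / p_lh
p_age_given_rh = (1 - p_lh_given_age) * p_age / (1 - p_lh)

# Each conditional distribution over age should be normalized.
print(np.isclose(p_age_given_lh.sum(), 1.0))  # True
print(np.isclose(p_age_given_rh.sum(), 1.0))  # True
```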

# **TASK_08**
# **Plot the probability of being a certain age at death given that**
# **you're left- or right-handed for a range of ages.**

# ... CODE FOR TASK 8 ...
ages = np.arange(6, 115, 1)

left_handed_probability = P_A_given_lh(ages, death_distribution_data)
right_handed_probability = P_A_given_rh(ages, death_distribution_data)

fig, ax = plt.subplots()
ax.plot(ages, left_handed_probability, label = "Left-handed")
ax.plot(ages, right_handed_probability, label = 'Right-handed')
ax.legend()
plt.title('Probability of being a certain age at death')
ax.set_xlabel("<- Age at death ->")
ax.set_ylabel("<- Probability of being age A at death ->")
plt.show()

# **TASK_09**
# **Find the mean age at death for left-handers and right-handers.**

# ... CODE FOR TASK 9 ...
average_lh_age =  np.nansum(ages*np.array(left_handed_probability))
average_rh_age =  np.nansum(ages*np.array(right_handed_probability))

print("Average age at death of left-handers: " + str(average_lh_age))
## Average age at death of left-handers: 67.24503662801027
print("Average age at death of right-handers: " + str(average_rh_age))
## Average age at death of right-handers: 72.79171936526477
print("The difference in average ages is: " + str(round(average_rh_age - average_lh_age, 1)) + " years.")
## The difference in average ages is: 5.5 years.
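
The means above are expected values, the sum of A * P(A | handedness), with `np.nansum` skipping ages whose probability is missing. That sum equals a true mean only if the probabilities add up to 1 over the ages used. Otherwise, dividing by their total gives the normalized mean. A sketch with made-up values (whether this changes the report's figures was not checked):

```python
import numpy as np

# Hypothetical toy inputs: one probability is missing, so the rest sum to 0.8.
toy_ages = np.array([50, 60, 70])
toy_probs = np.array([0.2, np.nan, 0.6])

# Unnormalized sum, as in the report: 50 * 0.2 + 70 * 0.6 = 52
raw_mean = np.nansum(toy_ages * toy_probs)

# Dividing by the total probability actually present: 52 / 0.8 = 65
normalized_mean = raw_mean / np.nansum(toy_probs)

print(round(raw_mean, 6), round(normalized_mean, 6))
```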

# **TASK_10**
# **Redo the calculations from Tasks 8 and 9, setting the study_year parameter to 2018.**

# ... CODE FOR TASK 10 ...
left_handed_probability_2018 = P_A_given_lh(ages, death_distribution_data, 2018)
right_handed_probability_2018 = P_A_given_rh(ages, death_distribution_data, 2018)

average_lh_age_2018 = np.nansum(ages*np.array(left_handed_probability_2018))
average_rh_age_2018 = np.nansum(ages*np.array(right_handed_probability_2018))

print("The difference in average ages is " + 
      str(round(average_rh_age_2018 - average_lh_age_2018, 1)) + " years.")
## The difference in average ages is 2.3 years.

CONCLUSION

The analysis does not support the widely repeated claim that left-handers die younger than right-handers because of their handedness. The Bayesian calculation applies the same death distribution to both groups and lets only the rate of left-handedness vary with birth year, yet it still yields a gap in mean age at death of 5.5 years for a 1990 study year. The gap therefore follows from the changing rates of left-handedness across generations rather than from any difference in lifespan, and it narrows to 2.3 years when the study year is set to 2018. These results show that cohort effects need to be accounted for before any causal link between handedness and longevity is drawn from age-at-death data.
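
The calculation in this report gives both groups the same death distribution and lets only the left-handedness rate vary with age. A minimal sketch with entirely made-up inputs (a bell-shaped death distribution and a linearly falling rate, both assumptions for illustration) shows how that alone separates the two mean ages at death:

```python
import numpy as np

toy_ages = np.arange(20, 101)

# Hypothetical death distribution shared by both groups: bell shape around age 75.
p_age = np.exp(-0.5 * ((toy_ages - 75) / 12.0) ** 2)
p_age /= p_age.sum()

# Hypothetical P(LH | A): older cohorts have lower left-handedness rates.
p_lh_given_age = np.linspace(0.13, 0.05, toy_ages.size)

# Law of total probability, then Bayes' rule for each handedness.
p_lh = np.sum(p_lh_given_age * p_age)
p_age_given_lh = p_lh_given_age * p_age / p_lh
p_age_given_rh = (1 - p_lh_given_age) * p_age / (1 - p_lh)

mean_lh = np.sum(toy_ages * p_age_given_lh)
mean_rh = np.sum(toy_ages * p_age_given_rh)

# Left-handers appear to die younger although mortality is identical by construction.
print(mean_rh > mean_lh)  # True
```

The size of the gap in this sketch depends on the invented inputs and should not be compared with the 5.5 and 2.3 year figures above.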

Thank you,
Vandana Tanwar,
Trainee, MedTourEasy