Introduction

This document uses the Nobel Prize API data on Nobel Prize winners’ ages and the Our world in Data data on life expectancies, to calculate the “relative” ages of the winners, and supplements the Oxford Institute of Population Ageing blog post of 12.10.2016

HT to Neil Saunders who’s blog post alerted me to the Nobel API and prompted this back of the envelope calculatio. His very useful code for analysing the API data is in this github repo here, and I stole from it liberally - all additional kludginess is purely my own.

Getting the data

# preliminaries
library(jsonlite)
library(dplyr)

Nobel prize laureates - and some errors?

We use the Nobel Prize API to fetch laureate data in JSON format. (at one point I got really confused (violenlty so) with the nested data.frames produced by fromJSON, and found this post on tidyjson which looks great, TODO!)

nobel <- fromJSON("http://api.nobelprize.org/v1/laureate.json")

Top level is nrow(nobel$laureates)=909 laureates. Six of which have more than one prize, which you can tell by looking at how many rows their nobel$laureates$prizes variables have:

cnt <- sapply(nobel$laureates$prizes, function(x) nrow(x))
multi <- which(cnt > 1)
nobel.multi.df <- cbind(bind_rows(nobel$laureates[multi,])[rep(1:length(cnt[multi]),cnt[multi]),1:2],
            bind_rows(nobel$laureates$prizes[multi])[1:2])
nobel.multi.df
##     id
## 1    6
## 2    6
## 3   66
## 4   66
## 5  217
## 6  217
## 7  222
## 8  222
## 9  482
## 10 482
## 11 482
## 12 515
## 13 515
##                                                                            firstname
## 1                                                                              Marie
## 2                                                                              Marie
## 3                                                                               John
## 4                                                                               John
## 5                                                                         Linus Carl
## 6                                                                         Linus Carl
## 7                                                                          Frederick
## 8                                                                          Frederick
## 9  Comité international de la Croix Rouge (International Committee of the Red Cross)
## 10 Comité international de la Croix Rouge (International Committee of the Red Cross)
## 11 Comité international de la Croix Rouge (International Committee of the Red Cross)
## 12               Office of the United Nations High Commissioner for Refugees (UNHCR)
## 13               Office of the United Nations High Commissioner for Refugees (UNHCR)
##    year  category
## 1  1903   physics
## 2  1911 chemistry
## 3  1956   physics
## 4  1972   physics
## 5  1954 chemistry
## 6  1962     peace
## 7  1958 chemistry
## 8  1980 chemistry
## 9  1917     peace
## 10 1944     peace
## 11 1963     peace
## 12 1954     peace
## 13 1981     peace

That’s all good and well, but when you do the whole table for all the prizes, it turns out that there are in fact six entries in the original table that should not be there. They are empty rows - but for the gender of these laureates being recorded as male.

nobel$laureates[is.na(nobel$laureates$firstname),]
##      id firstname surname       born       died bornCountry
## 870 897      <NA>    <NA> 0000-00-00 0000-00-00        <NA>
## 871 898      <NA>    <NA> 0000-00-00 0000-00-00        <NA>
## 872 899      <NA>    <NA> 0000-00-00 0000-00-00        <NA>
## 873 900      <NA>    <NA> 0000-00-00 0000-00-00        <NA>
## 874 901      <NA>    <NA> 0000-00-00 0000-00-00        <NA>
## 875 902      <NA>    <NA> 0000-00-00 0000-00-00        <NA>
##     bornCountryCode bornCity diedCountry diedCountryCode diedCity gender
## 870            <NA>     <NA>        <NA>            <NA>     <NA>   male
## 871            <NA>     <NA>        <NA>            <NA>     <NA>   male
## 872            <NA>     <NA>        <NA>            <NA>     <NA>   male
## 873            <NA>     <NA>        <NA>            <NA>     <NA>   male
## 874            <NA>     <NA>        <NA>            <NA>     <NA>   male
## 875            <NA>     <NA>        <NA>            <NA>     <NA>   male
##     prizes
## 870   NULL
## 871   NULL
## 872   NULL
## 873   NULL
## 874   NULL
## 875   NULL

So that’s not good, these need to be removed and someone possibly needs to be alerted about this (TODO: open issue with neilfws, he’s got the same error; also have a look at Noble API contacts..)

OK, so now we want our dataframe with a single row per prize, with the recipients’ names, DOB, year awarded and category.

#merge into final df
nobel.df <- cbind(bind_rows(nobel$laureates)[rep(1:length(cnt),cnt),c(1:4,12)],
            bind_rows(nobel$laureates$prizes)[1:2])
# remove weird six rows
nobel.df <- nobel.df[!is.na(nobel.df$firstname),]

Now as od 11.10. this data was not teriffically up to date, since the new 10 recipients’ birthdays were not picked up yet, although their names etc. had been. So these need to be added manually, but also in a way that if this is rerun in the future it doesn’t overwrite actual data pulled from the API.

# add new 2016 one's birthdays 
# https://en.wikipedia.org/wiki/Yoshinori_Ohsumi
# http://starsunfolded.com/michael-kosterlitz/ (only one not precise, assume bday before 10.10.)
# https://en.wikipedia.org/wiki/Duncan_Haldane
# https://en.wikipedia.org/wiki/David_J._Thouless
# https://en.wikipedia.org/wiki/Fraser_Stoddart
# https://en.wikipedia.org/wiki/Jean-Pierre_Sauvage
# https://en.wikipedia.org/wiki/Ben_Feringa
# https://en.wikipedia.org/wiki/Juan_Manuel_Santos
# https://en.wikipedia.org/wiki/Oliver_Hart_(economist)
# https://en.wikipedia.org/wiki/Bengt_R._Holmstr%C3%B6m

nobel.df %>% 
  mutate(born = ifelse(surname == "Ohsumi" & born =="0000-00-00","1945-02-09", born)) %>%
  mutate(born = ifelse(surname == "Kosterlitz" & born =="0000-00-00", "1942-01-01", born)) %>%
  mutate(born = ifelse(surname == "Haldane" & born =="0000-00-00", "1951-09-14", born)) %>%
  mutate(born = ifelse(surname == "Thouless" & born =="0000-00-00", "1934-09-21", born)) %>%
  mutate(born = ifelse(surname == "Stoddart" & born =="0000-00-00", "1942-05-24", born)) %>%
  mutate(born = ifelse(surname == "Sauvage" & born =="0000-00-00", "1944-10-21", born)) %>%
  mutate(born = ifelse(surname == "Feringa" & born =="0000-00-00", "1951-05-18", born)) %>%
  mutate(born = ifelse(surname == "Santos" & born =="0000-00-00", "1951-08-10", born)) %>%
  mutate(born = ifelse(surname == "Hart" & born =="0000-00-00", "1948-10-09", born)) %>%
  mutate(born = ifelse(surname == "Holmström" & born =="0000-00-00", "1949-04-18", born)) ->
  nobel.df

Now just need to calculate the age at the time of being awarded the prize, and remove all the organizations, keeping humans only

nobel.df %>%
  mutate(age = as.numeric(as.Date(paste(year, "12-31", sep = "-"), 
                                  "%Y-%m-%d") - as.Date(born,"%Y-%m-%d"))/365,
         year = as.numeric(year)) %>%
  filter(gender != "org") -> nobel.df
           
rm(cnt, nobel, nobel.multi.df)

Life expectancy

An earlier version had me parsing the whole Human Mortalitdy Database Ex data, but not only would the country mathcing be a pain, it also makes sense conceptually to look at the gloal life expectancy when measuring the relative age of mankinds biggest contributors :)

But HMD doesn’t have world data, Max does. The data points are a bit sparse early on.. I use zoo spline to interpolate the missing values.

## Joining by: "Year"

And there we have it, full.lexp is the full table of 884 instances of humans being awarded a Nobel peace prise, with their age at award and the (estimated) average global (period) life expectancy at the time of award.