This report analyses Nobel Prize winners in each field since the first Nobel Prizes were awarded in 1901. It illustrates some ways to analyse data from the Nobel Prize API using the R programming language. The analysis covers multiple prize winners, the gender gap, gender by category (Chemistry, Economics, Literature, Physiology or Medicine, Peace, Physics), gender over time, Nobel Prize shares, the distribution by age, and the distribution by country.
Thankfully, the Nobel Foundation has created exactly such a database, with information on every Nobel Prize since 1901, including the laureates' biographies, Nobel Lectures, interviews, photos, articles, video clips and press releases. Nobelprize.org, a registered trademark produced, managed and maintained by Nobel Media, provides comprehensive, first-hand information about the Nobel Prize and the Nobel Laureates in Physics, Chemistry, Physiology or Medicine, Literature and Peace from 1901 onwards, as well as about the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel and its laureates from 1969 onwards.
I would like to thank the Nobel Foundation, which created exactly this dataset right before the 2016 Nobel Prize announcements. I turned to the Nobel Foundation because no other source had the data as nicely structured and informative.
After the 2016 Nobel Prize announcements, I “only” needed to add the 2016 prizes. Again, the Nobel Foundation helped out by supplying a dataset with all of the Nobel Laureates since 1901.
Next, we load the required packages.
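# Packages used in this analysis; plyr is attached before dplyr so that dplyr's verbs take precedence
pkgs <- c("jsonlite", "ggplot2", "plyr", "dplyr", "xtable")
# require() attaches each package and returns FALSE if it is not installed
loaded <- vapply(pkgs, require, logical(1), character.only = TRUE)
if (!all(loaded)) {
  stop("Required package(s) not installed: ", paste(pkgs[!loaded], collapse = ", "))
}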
## Loading required package: jsonlite
## Warning: package 'jsonlite' was built under R version 3.2.5
## Loading required package: ggplot2
## Loading required package: plyr
## Warning: package 'plyr' was built under R version 3.2.5
## Loading required package: dplyr
## Warning: package 'dplyr' was built under R version 3.2.5
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Loading required package: xtable
Before I even started searching online, I already had an idea of the data I wanted to play with this month. Thanks to the Nobel Foundation, a nice API is available at http://api.nobelprize.org/, which is where I got the data.
# Version 1 of the Nobel Prize API returns all laureates as a single JSON document
theData <- "http://api.nobelprize.org/v1/laureate.json"
# fromJSON() simplifies the JSON into nested lists and data frames
nobels <- fromJSON(theData)
names(nobels)
## [1] "laureates"
names(nobels$laureates)
## [1] "id" "firstname" "surname"
## [4] "born" "died" "bornCountry"
## [7] "bornCountryCode" "bornCity" "diedCountry"
## [10] "diedCountryCode" "diedCity" "gender"
## [13] "prizes"
names(nobels$laureates$prizes[[1]])
## [1] "year" "category" "share" "motivation"
## [5] "affiliations"
The variable nobels is a list with one named element, laureates. laureates is a data frame with 13 columns and one row per laureate. Its last column, prizes, is a list of data frames: one per laureate, with one row per prize.
Note that analyses based on prizes may count some laureates more than once. However, only four individuals (plus two organizations) have won more than one prize, which makes little difference to these charts.
Now let's analyse the data acquired from the Nobel API.
We can retrieve the laureates who won more than one prize by selecting the records where nobels$laureates$prizes has more than one row.
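To see why fromJSON() produces this nested shape, here is a minimal sketch on a tiny, hand-written JSON string that imitates the layout of the API response. The two records and their id values are illustrative placeholders, not data taken from the API, and the sketch runs without a network connection.

```r
library(jsonlite)

# Hypothetical JSON imitating the layout of laureate.json (ids are placeholders)
json <- '{"laureates": [
  {"id": "1", "firstname": "Marie", "surname": "Curie", "gender": "female",
   "prizes": [{"year": "1903", "category": "physics"},
              {"year": "1911", "category": "chemistry"}]},
  {"id": "2", "firstname": "Albert", "surname": "Einstein", "gender": "male",
   "prizes": [{"year": "1921", "category": "physics"}]}
]}'

# Default simplification turns arrays of records into data frames
toy <- fromJSON(json)

names(toy)                    # one element: "laureates"
class(toy$laureates)          # a data frame, one row per laureate
class(toy$laureates$prizes)   # a plain list, i.e. a list column
toy$laureates$prizes[[1]]     # Curie's prizes as a 2-row data frame
```

Because each laureate's prizes array becomes its own data frame, counting rows in each element of the list column is all it takes to count prizes per laureate.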
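# Number of prizes per laureate = number of rows in that laureate's prizes data frame
multi <- which(sapply(nobels$laureates$prizes, nrow) > 1)
winners <- nobels$laureates[multi, c("firstname", "surname", "born", "bornCountry")]
# Render the selected laureates as an HTML table
print(xtable(winners), type = "html", comment = FALSE, include.rownames = FALSE)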
| firstname | surname | born | bornCountry |
|---|---|---|---|
| Marie | Curie, née Sklodowska | 1867-11-07 | Russian Empire (now Poland) |
| John | Bardeen | 1908-05-23 | USA |
| Linus Carl | Pauling | 1901-02-28 | USA |
| Frederick | Sanger | 1918-08-13 | United Kingdom |
| Comité international de la Croix Rouge (International Committee of the Red Cross) | | 0000-00-00 | |
| Office of the United Nations High Commissioner for Refugees (UNHCR) | | 0000-00-00 | |
# Result of Analysis.
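# Four individuals have won two prizes:
# 1. Marie Curie (physics, chemistry);
# 2. John Bardeen (physics, twice);
# 3. Linus Pauling (chemistry, peace);
# 4. Frederick Sanger (chemistry, twice).
# Two organizations have also won more than once: the International Committee of the Red Cross and UNHCR (both for peace).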
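The selection rule is simply "the prizes data frame has more than one row". The sketch below applies it to a hypothetical three-row stand-in for nobels$laureates, built by hand so that it runs without calling the API. It uses vapply() instead of sapply(), which guarantees an integer result.

```r
# Hypothetical stand-in for nobels$laureates with a list column of prize tables
laureates <- data.frame(
  firstname = c("Marie", "Albert", "Linus Carl"),
  surname   = c("Curie", "Einstein", "Pauling"),
  stringsAsFactors = FALSE
)
laureates$prizes <- list(
  data.frame(year = c("1903", "1911"), category = c("physics", "chemistry"),
             stringsAsFactors = FALSE),
  data.frame(year = "1921", category = "physics", stringsAsFactors = FALSE),
  data.frame(year = c("1954", "1962"), category = c("chemistry", "peace"),
             stringsAsFactors = FALSE)
)

# Number of prizes per laureate = number of rows in each prize table
n_prizes <- vapply(laureates$prizes, nrow, integer(1))

# Keep only the laureates with more than one prize
multi_winners <- laureates[n_prizes > 1, c("firstname", "surname")]
print(multi_winners)
```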
Counting laureates by gender reveals the huge gender gap among Nobel Laureates.
gender <- as.data.frame(table(nobels$laureates$gender), stringsAsFactors = FALSE)
ggplot(gender) + geom_bar(aes(Var1, Freq), stat = "identity", fill = "skyblue3") +
theme_bw() +
labs(x = "Gender", y = "Count", title = "All Nobel Prizes by Gender")
# Result of Analysis.
# Nobel Prizes awarded to women
# The Nobel Prize and the Prize in Economic Sciences were awarded to women 49 times between 1901 and 2015.
# Only one woman, Marie Curie, has been honoured twice, with the 1903 Nobel Prize in Physics and the 1911 Nobel Prize in Chemistry.
# This means that 48 women in total were awarded the Nobel Prize between 1901 and 2015.
# Of the remaining laureates, 26 are organizations and the rest are men.
Breaking the counts down by category (Chemistry, Economics, Literature, Physiology or Medicine, Peace, Physics) requires one row per prize rather than one per laureate, so we first flatten the prizes column.
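The gap between 49 awards and 48 women is the gap between counting prize rows and counting distinct people. A minimal sketch with a hypothetical three-row prize table (the id values are placeholders, not API ids) makes the distinction explicit:

```r
# Hypothetical prize rows: laureate "1" (Marie Curie) appears once per prize
prizes <- data.frame(
  id     = c("1", "1", "2"),
  year   = c("1903", "1911", "1921"),
  gender = c("female", "female", "male"),
  stringsAsFactors = FALSE
)

# Counting rows counts awards
awards_to_women <- sum(prizes$gender == "female")

# Counting distinct ids counts people
women <- length(unique(prizes$id[prizes$gender == "female"]))

cat("Awards to women:", awards_to_women, "| distinct women:", women, "\n")
```

The same distinction applies per category: a statement such as "four women have won for chemistry" counts distinct ids within that category.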
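# Number of prizes won by each laureate
cnt <- sapply(nobels$laureates$prizes, nrow)
# Stack the per-laureate prize tables into one data frame, one row per prize
prizes <- ldply(nobels$laureates$prizes, as.data.frame)
# Repeat each laureate's id and gender once per prize so they line up with the stacked rows
prizes$id <- rep(nobels$laureates$id, cnt)
prizes$gender <- rep(nobels$laureates$gender, cnt)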
pg <- as.data.frame(table(prizes$category, prizes$gender), stringsAsFactors = FALSE)
ggplot(pg) + geom_bar(aes(Var1, Freq), stat = "identity", fill = "skyblue3") +
theme_bw() +
facet_grid(Var2 ~ .) + labs(x = "Category", y = "Count", title = "All Nobel Prizes by Gender and Category")
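The flattening step works because rep(x, cnt) repeats each laureate's value once per prize, in the same order in which the prize tables are stacked. The sketch below reproduces it on hypothetical data (ids are placeholders) and, as a swapped-in alternative to plyr's ldply(), stacks the tables with base R's do.call(rbind, ...).

```r
# Hypothetical stand-in for nobels$laureates (ids are placeholders)
laureates <- data.frame(id = c("1", "2"), gender = c("female", "male"),
                        stringsAsFactors = FALSE)
laureates$prizes <- list(
  data.frame(year = c("1903", "1911"), category = c("physics", "chemistry"),
             stringsAsFactors = FALSE),
  data.frame(year = "1921", category = "physics", stringsAsFactors = FALSE)
)

# One entry per laureate: how many prize rows each contributes
cnt <- vapply(laureates$prizes, nrow, integer(1))

# Stack the per-laureate tables into one data frame, one row per prize
prizes <- do.call(rbind, laureates$prizes)

# Repeat id and gender cnt times each so they line up with the stacked rows
prizes$id     <- rep(laureates$id, cnt)
prizes$gender <- rep(laureates$gender, cnt)

# Cross-tabulate prizes by category and gender
pg <- as.data.frame(table(prizes$category, prizes$gender), stringsAsFactors = FALSE)
print(pg)
```

Note that table() also emits zero counts (here, chemistry/male), which is what lets the faceted bar chart show empty categories.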
Is there any indication of an increase in female laureates over time?
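# Count prizes per year and gender (years in which no prizes were awarded do not appear)
p5 <- as.data.frame(table(prizes$year, prizes$gender), stringsAsFactors = FALSE)
colnames(p5) <- c("year", "gender", "Freq")
# Running total within each gender; table() output is already in year order within each gender
p5.1 <- mutate(group_by(p5, gender), cumsum = cumsum(Freq))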
ggplot(subset(p5.1, gender != "org")) + geom_point(aes(year, log(cumsum), color = gender)) +
theme_bw() +
scale_x_discrete(breaks = seq(1900, 2015, 10)) +
scale_color_manual(values = c("darkorange", "skyblue3")) +
labs(x = "Year", y = "log(cumulative sum) of laureates", title = "Cumulative Sum of Nobel Laureates by Gender over Time")
Note: there is some indication that since about 1975, women have won prizes at a faster rate than in the preceding years.
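The running totals rely on two details: rows must be in year order within each gender before cumsum() is applied, and log() of a zero count is -Inf, so years before a group's first laureate have no finite value on the log scale. A minimal base-R sketch with made-up counts, in which ave() stands in for dplyr's group_by() plus mutate():

```r
# Made-up yearly prize counts per gender (not real data)
p5 <- data.frame(
  year   = c("1901", "1902", "1903", "1901", "1902", "1903"),
  gender = c("female", "female", "female", "male", "male", "male"),
  Freq   = c(0, 0, 1, 6, 5, 4),
  stringsAsFactors = FALSE
)

# Sort by gender, then year, so each running total accumulates chronologically
p5 <- p5[order(p5$gender, p5$year), ]

# Running total within each gender
p5$cumsum <- ave(p5$Freq, p5$gender, FUN = cumsum)

# log(0) is -Inf: in this toy data the female series is finite only from 1903
p5$log_cumsum <- log(p5$cumsum)
print(p5)
```

If the early zero years should stay visible, one option is log1p(), which plots log(1 + cumsum) instead; this changes the quantity on the y axis, so the axis label would need to change too.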
What if we subset by category?
p6 <- as.data.frame(table(prizes$year, prizes$category, prizes$gender), stringsAsFactors = FALSE)
colnames(p6) <- c("year", "category", "gender", "Freq")
p6.1 <- mutate(group_by(p6, category, gender), cumsum = cumsum(Freq))
ggplot(subset(p6.1, gender != "org")) + geom_point(aes(year, log(cumsum), color = gender)) +
facet_grid(category ~ .) +
theme_bw() +
scale_x_discrete(breaks = seq(1900, 2015, 10)) +
scale_color_manual(values = c("darkorange", "skyblue3")) +
labs(x = "Year", y = "log(cumulative sum) of laureates",
title = "Cumulative Sum of Nobel Laureates by Gender and Category over Time")
Conclusion of the gender analysis: there is some indication that since about 1975, women have won prizes in medicine and peace at a faster rate than in the preceding years. The rate of awards to women for literature also rises after about 1990.
To date, only one woman has won the prize for economics, two women have won for physics and four have won for chemistry.
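The "since about 1975" reading comes from the slopes of the log-scale charts. A more direct check is to tabulate prizes to women by decade. The sketch below uses made-up prize rows; substituting the real prizes data frame would run the actual check.

```r
# Made-up prize rows (year, gender); not real data
prizes <- data.frame(
  year   = c("1903", "1911", "1947", "1964", "1976", "1977", "1979", "1983"),
  gender = c("female", "female", "female", "female", "male", "female", "female", "female"),
  stringsAsFactors = FALSE
)

# Map each year to the start of its decade, e.g. 1977 -> 1970
prizes$decade <- as.integer(prizes$year) %/% 10L * 10L

# Number of prizes awarded to women in each decade
women_per_decade <- table(prizes$decade[prizes$gender == "female"])
print(women_per_decade)
```

Adding the category column to the table() call would give the same per-decade view for each category.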