This analysis uses the Nobel Prize organization’s public API to explore structured data on laureates and prizes. The goal is to answer four data-driven questions:
How has the share of female laureates changed across decades and categories?
Which countries have “lost” the most laureates to other nations (i.e., born in one country but affiliated elsewhere at award time)?
Which countries have produced the most Nobel laureates by birth?
How has the number of prizes per category evolved over time?
We will first establish a connection to the Nobel Prize API endpoints (`/laureates` and `/nobelPrizes`). The raw JSON responses will be saved into tibbles, then cleaned and joined to create two primary data frames for analysis: `laureate_prizes` and `prizes_df`.
Set Up the API Connection
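```r
library(httr2)     # HTTP requests to the Nobel Prize API
library(jsonlite)  # JSON parsing
library(dplyr)     # data manipulation
library(tidyr)     # pivot_wider()
library(forcats)   # fct_reorder()
library(ggplot2)   # plotting
library(scales)    # label_percent()
library(knitr)     # kable()
```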
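The chunk above only loads packages. The code that fetches and flattens the API responses is not shown. The sketch below illustrates one way to do it, but it rests on several assumptions. It assumes the v2.1 base URL `https://api.nobelprize.org/2.1` and a `limit` query parameter. It also assumes the JSON field names used in the helpers: `nobelPrizes`, `awardYear`, `category$en`, `birth$place$country$en` and `affiliations[[1]]$country$en`. Check all of these against the official API documentation. `fetch_nobel()`, `pluck_chr()` and `laureate_rows()` are hypothetical helpers, not functions from any package. A missing `gender` is mapped to `"org"` to mirror the `gender != "org"` filter used later in this analysis.

```r
library(dplyr)

# Assumed base URL of the Nobel Prize API v2.1 (verify against the official docs)
nobel_base_url <- "https://api.nobelprize.org/2.1"

# Hypothetical helper: GET one endpoint ("laureates" or "nobelPrizes") and
# return the parsed JSON as nested lists. The `limit` value is an assumption.
fetch_nobel <- function(endpoint, limit = 1000) {
  httr2::request(nobel_base_url) |>
    httr2::req_url_path_append(endpoint) |>
    httr2::req_url_query(limit = limit) |>
    httr2::req_perform() |>
    httr2::resp_body_json(simplifyVector = FALSE)
}

# Hypothetical helper: walk a path of names/indices into a nested list,
# returning NA instead of failing when any level is missing
pluck_chr <- function(x, ...) {
  for (key in list(...)) {
    if (!is.list(x)) return(NA_character_)
    if (is.numeric(key) && length(x) < key) return(NA_character_)
    x <- x[[key]]
    if (is.null(x)) return(NA_character_)
  }
  if (length(x) == 0) NA_character_ else as.character(x[[1]])
}

# Hypothetical helper: one row per (laureate, prize) pair.
# Field names are assumptions about the v2.1 JSON layout.
laureate_rows <- function(laureate) {
  prizes <- laureate$nobelPrizes
  if (length(prizes) == 0) return(NULL)
  bind_rows(lapply(prizes, function(p) tibble(
    laureate_id   = pluck_chr(laureate, "id"),
    gender        = coalesce(pluck_chr(laureate, "gender"), "org"),  # organizations lack gender
    birth_country = pluck_chr(laureate, "birth", "place", "country", "en"),
    prize_year    = as.integer(pluck_chr(p, "awardYear")),
    category      = pluck_chr(p, "category", "en"),
    award_country = pluck_chr(p, "affiliations", 1, "country", "en")
  )))
}

# Offline check with two hand-made records (not real API output)
toy_laureates <- list(
  list(id = "1", gender = "female",
       birth = list(place = list(country = list(en = "Country A"))),
       nobelPrizes = list(list(awardYear = "1911", category = list(en = "Chemistry"),
                               affiliations = list(list(country = list(en = "Country B")))))),
  list(id = "2",
       nobelPrizes = list(list(awardYear = "1917", category = list(en = "Peace"))))
)
toy_laureate_prizes <- bind_rows(lapply(toy_laureates, laureate_rows))
```

Against the live API, you would call `raw <- fetch_nobel("laureates")` and then `bind_rows(lapply(raw$laureates, laureate_rows))`. This assumes the response stores its records under a top-level `laureates` key.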
Q1. How has the share of female laureates changed across decades and categories?
To answer this, we group laureates by decade (based on `prize_year`) and category, then compute the percentage of female recipients. We also calculate an overall trend line across all categories to see whether progress is universal or field-dependent.
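The code that builds `gender_decade` and `gender_overall` is not shown. The following is a minimal sketch of one way to compute them. It assumes `laureate_prizes` has `prize_year`, `category` and `gender` columns, with `gender` taking the values `"female"`, `"male"` and `"org"`. It also assumes organizations are left out of the denominator. The percentage is kept on a 0 to 100 scale to match `label_percent(scale = 1)` in the plot. `add_decade()`, `share_women()` and the toy rows are hypothetical.

```r
library(dplyr)

# Invented toy rows standing in for laureate_prizes (not real data)
toy_prizes <- tibble(
  prize_year = c(1903, 1903, 1905, 1909, 1911, 1911),
  category   = c("Physics", "Physics", "Peace", "Peace", "Chemistry", "Chemistry"),
  gender     = c("female", "male", "female", "org", "female", "male")
)

# Keep individuals only (an assumption) and bin years into decades
add_decade <- function(df) {
  df |>
    filter(gender != "org") |>
    mutate(decade = (prize_year %/% 10) * 10)
}

# Share of women per group, on a 0-100 scale
share_women <- function(df, ...) {
  df |>
    group_by(...) |>
    summarise(
      total     = n(),
      n_women   = sum(gender == "female"),
      pct_women = 100 * n_women / total,
      .groups   = "drop"
    )
}

# Per decade and category (coloured lines) and pooled per decade (black trend line)
gender_decade  <- toy_prizes |> add_decade() |> share_women(decade, category)
gender_overall <- toy_prizes |> add_decade() |> share_women(decade)
```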
```r
ggplot(
  gender_decade |> filter(total >= 3),   # drop decade-category cells with fewer than 3 laureates
  aes(decade, pct_women, colour = category, group = category)
) +
  geom_line(linewidth = 0.9) +
  geom_point(aes(size = n_women), alpha = 0.8) +
  # Overall trend across all categories (black LOESS line)
  geom_smooth(
    data = gender_overall,
    aes(decade, pct_women, group = 1),
    colour = "black", linewidth = 1.2,
    method = "loess", formula = y ~ x, se = TRUE, inherit.aes = FALSE
  ) +
  scale_size_continuous(range = c(1, 6), guide = "none") +
  scale_y_continuous(labels = label_percent(scale = 1)) +
  labs(
    title = "Share of Nobel Prizes Awarded to Women, by Decade",
    x = "Decade",
    y = "% of laureates who are women",
    colour = "Category",
    caption = "Source: Nobel Prize API"
  ) +
  theme(legend.position = "bottom")
```
Q2. Which country has “lost” the most Nobel laureates, i.e., laureates born there but affiliated with another country when they received the prize?
We define “lost” laureates as individuals whose birth country differs from their `award_country` (the country of their affiliation at prize time). Organizations are excluded to focus on individual mobility. The resulting `brain_drain` table ranks birth countries by the number of laureates who won while affiliated elsewhere and records each country's most common destination.
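The code that builds `brain_drain` is not shown. The following is a minimal sketch of one way to derive it. It assumes `laureate_prizes` carries a laureate identifier (called `laureate_id` here, a hypothetical name) alongside `gender`, `birth_country` and `award_country`. Each laureate is counted once per birth country, even if they won more than one prize. The top destination is the most frequent `award_country` among that country's lost laureates, and `slice_max(with_ties = FALSE)` keeps only the first row when destinations tie. `build_brain_drain()` and the toy rows are invented for illustration.

```r
library(dplyr)

# Invented toy rows standing in for laureate_prizes (laureate 1 won twice)
toy_prizes <- tibble(
  laureate_id   = c(1, 1, 2, 3, 4, 5, 6),
  gender        = c("male", "male", "male", "female", "male", "org", "male"),
  birth_country = c("Country A", "Country A", "Country A", "Country A",
                    "Country B", "Country C", "Country D"),
  award_country = c("Country X", "Country X", "Country X", "Country Y",
                    "Country Y", "Country X", "Country D")
)

build_brain_drain <- function(df, n_top = 10) {
  # Individuals whose birth country differs from their award-time affiliation
  lost <- df |>
    filter(gender != "org",
           !is.na(birth_country), !is.na(award_country),
           birth_country != award_country) |>
    distinct(laureate_id, birth_country, award_country)

  # Most common destination per birth country (first row kept on ties)
  top_dest <- lost |>
    count(birth_country, award_country, name = "n_dest") |>
    group_by(birth_country) |>
    slice_max(n_dest, n = 1, with_ties = FALSE) |>
    ungroup() |>
    select(birth_country, top_destination = award_country)

  # Number of distinct laureates lost per birth country
  lost |>
    group_by(birth_country) |>
    summarise(laureates_lost = n_distinct(laureate_id), .groups = "drop") |>
    left_join(top_dest, by = "birth_country") |>
    arrange(desc(laureates_lost)) |>
    slice_head(n = n_top)
}

brain_drain_toy <- build_brain_drain(toy_prizes)
```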
```r
brain_drain |>
  mutate(birth_country = fct_reorder(birth_country, laureates_lost)) |>
  ggplot(aes(laureates_lost, birth_country, fill = top_destination)) +
  geom_col() +
  geom_text(aes(label = laureates_lost), hjust = -0.2, size = 3.5) +
  scale_x_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(
    title = "Countries That 'Lost' the Most Nobel Laureates",
    x = "Laureates affiliated elsewhere at award time",
    y = "Birth country",
    fill = "Most common\ndestination",
    caption = "Source: Nobel Prize API"
  ) +
  theme(legend.position = "right")
```
Q3. Which countries have produced the most Nobel laureates?
This question ignores later migration and simply counts laureates by birth country. Organizations are excluded. The top 15 countries are visualized to show which nations have historically produced the most Nobel-winning talent.
```r
top_countries <- laureate_prizes |>
  filter(!is.na(birth_country), gender != "org") |>
  count(birth_country, name = "n_laureates") |>
  arrange(desc(n_laureates)) |>
  slice_head(n = 15)

top_countries |>
  kable(col.names = c("Birth Country", "Number of Laureates"))
```
| Birth Country | Number of Laureates |
|:--|--:|
| USA | 298 |
| United Kingdom | 94 |
| Germany | 78 |
| France | 60 |
| Sweden | 30 |
| Japan | 27 |
| Canada | 22 |
| the Netherlands | 20 |
| Switzerland | 19 |
| Italy | 18 |
| Russia | 18 |
| Russian Empire | 16 |
| Austria | 15 |
| Austria-Hungary | 13 |
| Norway | 13 |
```r
#| fig-height: 5
top_countries |>
  mutate(birth_country = fct_reorder(birth_country, n_laureates)) |>
  ggplot(aes(n_laureates, birth_country, fill = n_laureates)) +
  geom_col() +
  geom_text(aes(label = n_laureates), hjust = -0.2, size = 3.5) +
  scale_fill_gradient(low = "#c6dbef", high = "#084594", guide = "none") +
  scale_x_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(
    title = "Top 15 Countries by Nobel Laureate Birth Country",
    x = "Number of laureates",
    y = NULL,
    caption = "Source: Nobel Prize API"
  )
```
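`count(birth_country)` counts rows. If `laureate_prizes` holds one row per laureate-prize pair (an assumption about its layout), a laureate with two prizes is counted twice. The variant below counts people instead of rows. It uses a hypothetical `laureate_id` column and invented toy rows, so treat it as a sketch.

```r
library(dplyr)

# Invented toy rows: laureate "a" won two prizes
toy_prizes <- tibble(
  laureate_id   = c("a", "a", "b", "c", "d"),
  gender        = c("female", "female", "male", "male", "org"),
  birth_country = c("Country A", "Country A", "Country B", "Country B", "Country C")
)

individuals <- toy_prizes |>
  filter(!is.na(birth_country), gender != "org")

# Row-based count, as in the Q3 chunk: one unit per laureate-prize row
rows_per_country <- individuals |>
  count(birth_country, name = "n_laureates")

# Person-based count: each laureate contributes once per birth country
people_per_country <- individuals |>
  distinct(laureate_id, birth_country) |>
  count(birth_country, name = "n_laureates") |>
  arrange(desc(n_laureates))
```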
Q4. How many prizes have been awarded per category per decade?
We count prizes (not laureates) by decade and category. A stacked bar chart makes category composition easy to compare across decades.
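The construction of `prizes_df` is not shown. The following is one possible sketch. It assumes each entry of a parsed `/nobelPrizes` response has `awardYear`, `category$en` and an optional `laureates` array. `prize_rows()` and the `n_laureates` column are hypothetical additions. Keeping the number of recipients lets you drop year-category entries with no laureates before counting, if the endpoint lists such entries. The decade binning `(year %/% 10) * 10` is the same integer division used in the chunk that follows, so 1901 to 1909 fall in the 1900 bin.

```r
library(dplyr)

# Hypothetical helper: one row per prize entry; field names are assumptions
prize_rows <- function(prize) {
  tibble(
    year        = as.integer(prize$awardYear),
    category    = as.character(prize$category$en),
    n_laureates = length(prize$laureates)  # 0 when the array is absent
  )
}

# Invented entries, not real API output
toy_nobel_prizes <- list(
  list(awardYear = "1914", category = list(en = "Peace")),
  list(awardYear = "1921", category = list(en = "Physics"),
       laureates = list(list(id = "x1"))),
  list(awardYear = "1929", category = list(en = "Physics"),
       laureates = list(list(id = "x2"), list(id = "x3")))
)

prizes_toy <- bind_rows(lapply(toy_nobel_prizes, prize_rows)) |>
  mutate(decade = (year %/% 10) * 10)

# Count only entries that had at least one laureate
awarded_by_decade <- prizes_toy |>
  filter(n_laureates > 0) |>
  count(decade, category, name = "n_prizes")
```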
```r
prizes_by_decade <- prizes_df |>
  mutate(decade = (year %/% 10) * 10) |>
  group_by(decade, category) |>
  summarise(n_prizes = n(), .groups = "drop")

prizes_by_decade |>
  pivot_wider(
    names_from = category,
    values_from = n_prizes,
    values_fill = 0
  ) |>
  arrange(decade) |>
  kable(caption = "Number of prizes awarded per decade by category")
```
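Number of prizes awarded per decade by category

| Decade | Chemistry | Literature | Peace | Physics | Physiology or Medicine | Economic Sciences |
|--:|--:|--:|--:|--:|--:|--:|
| 1900 | 9 | 9 | 9 | 9 | 9 | 0 |
| 1910 | 10 | 10 | 10 | 10 | 10 | 0 |
| 1920 | 10 | 10 | 10 | 10 | 10 | 0 |
| 1930 | 10 | 10 | 10 | 10 | 10 | 0 |
| 1940 | 10 | 10 | 10 | 10 | 10 | 0 |
| 1950 | 10 | 10 | 10 | 10 | 10 | 0 |
| 1960 | 10 | 10 | 10 | 10 | 10 | 1 |
| 1970 | 10 | 10 | 10 | 10 | 10 | 10 |
| 1980 | 10 | 10 | 10 | 10 | 10 | 10 |
| 1990 | 10 | 10 | 10 | 10 | 10 | 10 |
| 2000 | 10 | 10 | 10 | 10 | 10 | 10 |
| 2010 | 10 | 10 | 10 | 10 | 10 | 10 |
| 2020 | 6 | 6 | 6 | 6 | 6 | 6 |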
```r
prizes_by_decade |>
  ggplot(aes(decade, n_prizes, fill = category)) +
  geom_col(position = "stack") +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Nobel Prizes Awarded per Decade by Category",
    x = "Decade",
    y = "Number of prizes",
    fill = "Category",
    caption = "Source: Nobel Prize API"
  )
```
Conclusion
This analysis uncovered several trends in the Nobel Prize data:
Gender gap: The share of female laureates has increased slowly but unevenly across disciplines. The overall upward trend (black line) is driven largely by the Peace and Literature categories, while Physics and Chemistry remain male‑dominated.
Brain drain: Germany has “lost” the most laureates (born there but affiliated with another country at award time), with the United States as the most common destination.
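Birth country: The United States leads by a wide margin with 298 laureates, followed by the United Kingdom (94) and Germany (78). Historical names such as Russian Empire and Austria-Hungary are counted separately from Russia and Austria, which splits those totals.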
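Prize frequency: Each of the five original categories shows about 10 prizes per decade, one per year. The exceptions are the 1900s, with 9 because the first prizes were awarded in 1901, and the 2020s, with 6 so far. Economic Sciences, first awarded in 1969, raises the total from about 50 to about 60 per decade from the 1970s onward. These counts show 10 entries per category even in the 1910s and 1940s, when several prizes were not awarded. They therefore appear to count prize years rather than prizes actually conferred.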
Together, these findings show how data from a simple public API can be transformed into meaningful sociological and historical insights and communicated through visualizations.