For this assignment, I will be using the Nobel Prize API to answer the following questions:
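We will use the following libraries:
library(httr)        # GET(), stop_for_status() and content() for the API request
library(jsonlite)    # fromJSON() to parse the JSON response
library(dplyr)       # as_tibble(), filter(), count(), slice_max(), rename(), select()
library(tidyr)       # unnest_longer(), unnest_wider(), drop_na()
library(ggplot2)     # plots
library(knitr)       # kable()
library(kableExtra)  # kable_styling()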
The Nobel Prize website has this to say about the number of Laureates over the years:
“Between 1901 and 2025, the Nobel Prizes and the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel were awarded 633 times to 1,026 people and organisations. With some receiving the Nobel Prize more than once, this makes a total of 990 individuals and 28 organisations.”
The Nobel Prize API does not have a limit (so far as I have found) on the number of results you can ask for. Since the number of laureate records should not exceed 1,026 (according to the Nobel Prize website), I will not ask for more than that when retrieving the data.
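The bound can be checked with a little arithmetic on the figures quoted above. The laureates endpoint returns one record per laureate, however many prizes that laureate won, so the number of records should equal the number of unique individuals and organisations, which is below the 1,026 awards. A minimal base-R sketch using only the quoted figures:

```r
# Figures quoted from the Nobel Prize website (1901-2025)
awards_to_laureates <- 1026  # awards to people and organisations, repeat winners counted again
individuals <- 990
organisations <- 28

# Each laureate appears once in the API, so this is the expected number of records
unique_laureates <- individuals + organisations
print(unique_laureates)

# limit = 1026 is therefore an upper bound that covers every laureate record
print(unique_laureates <= awards_to_laureates)
```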
Let’s first retrieve our data from the Nobel Prize API.
url <- "https://api.nobelprize.org/2.1/laureates?limit=1026"
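response <- GET(url)
# stopping early if the request did not succeed
stop_for_status(response)
#extracting our response and saving the raw JSON text (UTF-8 keeps accented names intact)
raw_data <- content(response, as = "text", encoding = "UTF-8")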
#using jsonlite to parse our raw JSON data
json_data <- fromJSON(raw_data, flatten = FALSE)
names(json_data)
## [1] "laureates" "meta" "links"
Now we can save the laureates results as a tibble. We want to do a bit of exploring of the data so that we can extract the information we’re interested in:
# converting to a tibble
laureates2 <- as_tibble(json_data$laureates)
names(laureates2)
## [1] "id" "knownName" "givenName"
## [4] "familyName" "fullName" "fileName"
## [7] "gender" "birth" "wikipedia"
## [10] "wikidata" "sameAs" "links"
## [13] "nobelPrizes" "death" "orgName"
## [16] "acronym" "founded" "nativeName"
## [19] "penName" "penNameOf" "foundedCountry"
## [22] "foundedCountryNow" "foundedContinent"
laureates2
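Columns such as birth are not plain vectors: with flatten = FALSE, fromJSON() turns nested JSON objects into data-frame columns, which is why the code below reaches into birth$place$city$en. The sketch below rebuilds that nested shape in base R with a hypothetical two-row mock (laureates_mock and its values are illustrative, not API data) to show how the chained $ access resolves to an ordinary character vector.

```r
# Hypothetical two-laureate mock of the nested shape that
# fromJSON(flatten = FALSE) gives the laureates data (illustrative values)
city <- data.frame(en = c("Paris", NA), stringsAsFactors = FALSE)
place <- data.frame(row.names = 1:2)
place$city <- city            # a data frame stored as a column of another data frame
birth <- data.frame(row.names = 1:2)
birth$place <- place
laureates_mock <- data.frame(id = c("1", "2"), stringsAsFactors = FALSE)
laureates_mock$birth <- birth

# Chained `$` walks down the nested data frames to a plain character vector
birth_cities <- laureates_mock$birth$place$city$en
print(birth_cities)
print(is.data.frame(laureates_mock$birth))

# Laureates with a known birth city, as kept by filter(!is.na(...))
print(sum(!is.na(birth_cities)))
```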
Let’s now find which city has had the most Nobel Prize laureates born there over the years.
#getting a count of the laureates per birth city
top_birth_city <- laureates2 %>%
  filter(!is.na(birth$place$city$en)) %>%
  count(birth$place$city$en, sort = TRUE) %>%
  slice_max(n, n = 10)
top_birth_city <- top_birth_city %>%
  rename(Birth_City = `birth$place$city$en`,
         Count = n)
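slice_max(n, n = 10) keeps ties by default (with_ties = TRUE), so the "top ten" can contain more than ten cities if several share the tenth-highest count. A base-R sketch of the same count-then-top-n logic, on a hypothetical vector of birth cities (the helper top_n_with_ties is illustrative, not part of dplyr), makes that tie behaviour visible:

```r
# Hypothetical birth cities, one entry per laureate (illustrative only)
cities <- c("Paris", "Paris", "Paris", "London", "London", "Vienna", "Vienna", "Berlin")

# Count laureates per city, largest first (like count(..., sort = TRUE))
counts <- sort(table(cities), decreasing = TRUE)
print(counts)

# Keep every city whose count reaches the n-th largest value, ties included
top_n_with_ties <- function(counts, n) {
  cutoff <- sort(as.vector(counts), decreasing = TRUE)[n]
  counts[counts >= cutoff]
}

# Asking for the top 2 returns 3 cities because London and Vienna tie
top2 <- top_n_with_ties(counts, 2)
print(top2)
```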
Now we can visualize the ten cities where the most Nobel Prize laureates were born.
ggplot(top_birth_city, aes(x = Birth_City, y = Count, fill = Birth_City)) +
  geom_col() +
  labs(
    title = "Most Popular Cities of Birth of Nobel Prize Winners",
    x = "City",
    y = "Count of Nobel Winners"
  ) +
  theme(axis.title.x = element_text(vjust = 0.5),
        axis.text.x = element_text(angle = 45, vjust = 0.55))
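In this chart ggplot2 draws the character x values in alphabetical order, so the bars are sorted by city name rather than by count. Reordering the levels by count changes that; in the plot one could write aes(x = reorder(Birth_City, -Count), ...). The base-R sketch below, with hypothetical counts, shows the level order reorder() would produce:

```r
# Hypothetical top-city counts (illustrative only)
top_cities <- data.frame(
  Birth_City = c("Berlin", "New York", "Paris"),
  Count = c(20L, 50L, 30L),
  stringsAsFactors = FALSE
)

# Default: character values become factor levels in alphabetical order
print(levels(factor(top_cities$Birth_City)))

# reorder() sorts levels by the second argument (mean per level);
# negating Count puts the largest count first
ordered_levels <- levels(reorder(top_cities$Birth_City, -top_cities$Count))
print(ordered_levels)
```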
We can see that the city that has produced the most Nobel Prize laureates is New York City, with over 50 laureates born there. Also, note that most of the top ten cities are American cities.
Let’s now find which country, other than the United States, has been the birthplace of the most female Nobel Prize laureates over the years.
top_birth_countries <- laureates2 %>%
  filter(!is.na(birth$place$country$en) & birth$place$country$en != "USA") %>%
  filter(gender == "female") %>%
  count(birth$place$country$en, sort = TRUE) %>%
  slice_max(n, n = 5)
top_birth_countries <- top_birth_countries %>%
  rename(Birth_Country = `birth$place$country$en`,
         Count_of_Awardees = n)
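The !is.na() guard matters because comparing a missing value with != gives NA rather than TRUE or FALSE. dplyr's filter() drops rows whose condition is NA, so there the guard is a safeguard, but with base-R subsetting an NA index yields an NA element instead of being dropped. A small sketch with hypothetical countries and genders:

```r
# Hypothetical birth countries and genders (illustrative only)
country <- c("France", NA, "USA", "Poland")
gender  <- c("female", "female", "female", "male")

# A missing country makes the comparison itself missing
print(country != "USA")

# Guarding with !is.na() turns that NA into FALSE (FALSE & NA is FALSE)
keep <- !is.na(country) & country != "USA" & gender == "female"
print(keep)
print(country[keep])
```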
ggplot(top_birth_countries, aes(x = Birth_Country, y = Count_of_Awardees, fill = Birth_Country)) +
  geom_col() +
  labs(
    title = "Countries of Birth with the Most Female Nobel Prize Winners",
    subtitle = "Excluding the USA",
    x = "Country",
    y = "Count of Female Nobel Winners"
  ) +
  theme(axis.title.x = element_text(vjust = 0.5),
        axis.text.x = element_text(angle = 45, vjust = 0.55))
We can see that, excluding the United States, France has been the birthplace of the most female Nobel Prize laureates over the years.
If we want to find out which Nobel Prize category has awarded the most Nobel Prizes to women, we must first find and extract the category information for each Nobel Prize recipient.
To do this, we will first have to take a closer look at the laureates data, expanding any list columns (in my case, their contents have become data frames).
laureates2 %>%
  unnest_longer(nobelPrizes) %>%
  unnest_wider(nobelPrizes, names_sep = "_") %>%
  unnest_longer(nobelPrizes_category)
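Conceptually, unnest_longer(nobelPrizes) turns each laureate's list of prizes into one row per prize, repeating the laureate-level columns. The base-R sketch below reproduces that reshaping on a hypothetical two-laureate input modelled on this structure (laureates_mock is illustrative, not API output); Marie Curie's two prizes (Physics 1903, Chemistry 1911) show how one laureate becomes two rows.

```r
# Hypothetical input: one row per laureate, with a list column holding
# a small data frame of that laureate's prizes (mirrors nobelPrizes)
laureates_mock <- data.frame(
  fullName = c("Marie Curie", "Elinor Ostrom"),
  gender = c("female", "female"),
  stringsAsFactors = FALSE
)
laureates_mock$nobelPrizes <- list(
  data.frame(awardYear = c("1903", "1911"), category = c("Physics", "Chemistry"),
             stringsAsFactors = FALSE),
  data.frame(awardYear = "2009", category = "Economic Sciences",
             stringsAsFactors = FALSE)
)

# Repeat each laureate's columns once per prize, then stack the prize rows
n_prizes <- vapply(laureates_mock$nobelPrizes, nrow, integer(1))
long <- data.frame(
  fullName = rep(laureates_mock$fullName, n_prizes),
  gender = rep(laureates_mock$gender, n_prizes),
  do.call(rbind, laureates_mock$nobelPrizes),
  stringsAsFactors = FALSE
)
print(long)
```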
Looking through the laureate data, I am only interested in extracting the category and award year columns from nobelPrizes. Thus, I will extract the id, gender, and full name information at the laureate level, and then extract the award year and category information from nobelPrizes.
laureates_trim2 <- laureates2 %>%
  select(id, gender, fullName, nobelPrizes) %>%
  unnest_longer(nobelPrizes) %>%
  unnest_wider(nobelPrizes, names_sep = "_") %>%
  select(id, gender, fullName, nobelPrizes_awardYear, nobelPrizes_category)
names(laureates_trim2)
## [1] "id" "gender" "fullName"
## [4] "nobelPrizes_awardYear" "nobelPrizes_category"
Now we can visualize the number of Nobel Prizes awarded to women in each category.
categories <- laureates_trim2 %>%
  filter(gender == "female") %>%
  count(nobelPrizes_category$en, sort = TRUE) %>%
  arrange(n)
categories <- categories %>%
  rename(Category = `nobelPrizes_category$en`,
         Count = n)
# reorder() sorts the bars by count; arrange() alone does not change ggplot's axis order
ggplot(categories, aes(x = reorder(Category, Count), y = Count)) +
  geom_col(fill = "lightblue", color = "white") +
  coord_flip() +
  labs(
    title = "Number of Nobel Prizes by Category",
    subtitle = "Female Recipients",
    x = "Category",
    y = "Count"
  ) +
  theme_dark()
As we can see, the Peace category has awarded the most Nobel Prizes to women.
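Because laureates_trim2 has one row per prize rather than one per person, count() here tallies prizes awarded to women: a laureate with two prizes, such as Marie Curie (Physics 1903, Chemistry 1911), contributes to two categories. The same kind of count can be reproduced in base R with table(); the sketch below uses the categories of the five 2009 female laureates listed at the end of this section.

```r
# Categories of the five female laureates of 2009 (from the final table)
category_2009 <- c("Chemistry", "Physiology or Medicine", "Economic Sciences",
                   "Physiology or Medicine", "Literature")

# One entry per prize, so table() counts prizes per category
per_category <- sort(table(category_2009), decreasing = TRUE)
print(per_category)
```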
Given that the number of female recipients is low, it may be interesting to compare the number of Nobel Prizes awarded to men with the number awarded to women.
categories_gm <- laureates_trim2 %>%
  # organisations have no gender, so drop_na() removes them
  drop_na(gender) %>%
  group_by(gender) %>%
  count(nobelPrizes_category$en, sort = TRUE)
categories_gm %>%
  kable(
    format = "html",
    caption = "Male vs Female Nobel Laureates",
    col.names = c("Gender", "Nobel Prize Category", "Count")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE)
| Gender | Nobel Prize Category | Count |
|---|---|---|
| male | Physics | 225 |
| male | Physiology or Medicine | 218 |
| male | Chemistry | 192 |
| male | Literature | 104 |
| male | Economic Sciences | 96 |
| male | Peace | 92 |
| female | Peace | 20 |
| female | Literature | 18 |
| female | Physiology or Medicine | 14 |
| female | Chemistry | 8 |
| female | Physics | 5 |
| female | Economic Sciences | 3 |
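The difference between the number of male and female Nobel Prize laureates is staggering: men outnumber women in every category, most starkly in Physics, with 225 prizes awarded to men against 5 to women.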
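The gap can be quantified directly from the table above. A short base-R sketch that copies those counts into a matrix, totals each gender's prizes, and computes the female share and the male-to-female ratio per category:

```r
# Prize counts copied from the table above (rows: category; columns: gender)
prizes <- matrix(
  c(225, 5,
    218, 14,
    192, 8,
    104, 18,
    96, 3,
    92, 20),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Physics", "Physiology or Medicine", "Chemistry",
      "Literature", "Economic Sciences", "Peace"),
    c("male", "female")
  )
)

# Total prizes per gender
totals <- colSums(prizes)
print(totals)

# Share of all prizes to individuals that went to women, as a percentage
female_share <- totals[["female"]] / sum(totals)
print(round(100 * female_share, 1))

# Male-to-female ratio within each category
print(round(prizes[, "male"] / prizes[, "female"], 1))
```

With these counts, 927 prizes went to men and 68 to women, so women received roughly 6.8% of the prizes awarded to individuals.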
Apart from finding the year in which the most women received a Nobel Prize, we may also want to know the names of the women who won that year.
top_year <- laureates_trim2 %>%
  filter(gender == "female") %>%
  count(nobelPrizes_awardYear, sort = TRUE) %>%
  slice_max(n, n = 1)
# saving the year (slice_max() keeps ties, so this could hold more than one year)
female_year <- top_year$nobelPrizes_awardYear
# names of the recipients; %in% also handles a tie between years
top_year_f <- laureates_trim2 %>%
  filter(gender == "female", nobelPrizes_awardYear %in% female_year) %>%
  select(fullName, nobelPrizes_awardYear, nobelPrizes_category)
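female_year comes from slice_max(n, n = 1), which keeps ties, so if two years tied it would be a vector of length two. A test written as nobelPrizes_awardYear == female_year would then compare element-wise with recycling instead of testing membership, silently dropping valid rows, while %in% checks each year against the whole set. A base-R sketch with hypothetical years shows the difference:

```r
# Hypothetical award years, one per prize awarded to a woman (illustrative only)
years <- c("2009", "2009", "2020", "2020")

# Suppose slice_max() had returned two tied top years
tied_years <- c("2009", "2020")

# `==` recycles tied_years element-wise, so rows 2 and 3 are wrongly FALSE
print(years == tied_years)

# `%in%` checks each year against the whole set of tied years
print(years %in% tied_years)
```

Passing with_ties = FALSE to slice_max() is the other option, at the cost of arbitrarily keeping only one of the tied years.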
The year in which the most women were awarded the Nobel Prize was 2009.
The women who won the Nobel Prize in 2009 are listed in the table below:
top_year_f %>%
  kable(
    format = "html",
    caption = "Female Nobel Laureates in the Year with the Most Female Nobel Recipients",
    col.names = c("Name", "Award Year", "Category")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE)
| Name | Award Year | Category |
|---|---|---|
| Ada E. Yonath | 2009 | Chemistry |
| Carol W. Greider | 2009 | Physiology or Medicine |
| Elinor Ostrom | 2009 | Economic Sciences |
| Elizabeth H. Blackburn | 2009 | Physiology or Medicine |
| Herta Müller | 2009 | Literature |