Introduction

For this assignment, I will be using the Nobel Prize API to answer the following questions:

  1. Which Cities Produced the Most Nobel Prize Winners Overall?
  2. Which Country Had the Most Female Nobel Prize Winners Excluding U.S. Recipients?
  3. Which Category Has Awarded the Most Nobel Prizes to Women?
  4. In Which Year Did the Most Women Receive the Nobel Prize?

We will use the following libraries:

  • The httr library
  • The jsonlite library
  • The tidyverse library
  • The kableExtra library
  • The reshape2 library

The Nobel Prize website has this to say about the number of Laureates over the years:

“Between 1901 and 2025, the Nobel Prizes and the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel were awarded 633 times to 1,026 people and organisations. With some receiving the Nobel Prize more than once, this makes a total of 990 individuals and 28 organisations.”

The Nobel Prize API does not appear to limit (so far as I have found) the number of results you can request. Given that the number of laureate records should not exceed 1,026 (according to the Nobel Prize website), I will not ask for more than that when retrieving the data.

Retrieving the Data

Let’s first retrieve our data from the Nobel Prize API.

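# loading the libraries used in this analysis
library(httr)
library(jsonlite)
library(tidyverse)
library(kableExtra)
library(reshape2)

url <- "https://api.nobelprize.org/2.1/laureates?limit=1026"

response <- GET(url)
stop_for_status(response)  # stop early if the request failed

# extracting our response and saving the raw JSON text
raw_data <- content(response, as = "text", encoding = "UTF-8")

# using jsonlite to parse our raw JSON data
json_data <- fromJSON(raw_data, flatten = FALSE)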


names(json_data)
## [1] "laureates" "meta"      "links"
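
Since the request relies on the limit being large enough, it may be worth checking that nothing was cut off. The sketch below is one way to do that, not part of the original analysis: `check_complete()` is a hypothetical helper, and it assumes the `meta` element carries a `count` field with the total number of matching laureates (inspect `names(json_data$meta)` to confirm). It is shown on a small made-up list rather than the live response.

```r
# Hypothetical helper (not from the original analysis).
# `parsed` is a list shaped like the output of jsonlite::fromJSON().
check_complete <- function(parsed) {
  retrieved <- nrow(parsed$laureates)
  reported  <- parsed$meta$count  # assumption: `meta` has a `count` field
  if (is.null(reported)) {
    warning("No `count` field found in `meta`; cannot verify completeness.")
    return(NA)
  }
  # TRUE when every laureate reported by the API was retrieved
  retrieved == reported
}

# Toy stand-in for the parsed response (made-up values)
toy <- list(
  laureates = data.frame(id = c("1", "2", "3")),
  meta      = list(offset = 0, limit = 1026, count = 3)
)

check_complete(toy)
```

With the real data, the call would be `check_complete(json_data)`; a `FALSE` result would suggest raising the limit or paging through the results.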

Now we can save the laureates results as a tibble. We want to explore the data a bit so that we can extract the information we’re interested in:

# converting to a tibble
laureates2 <- as_tibble(json_data$laureates)
names(laureates2)
##  [1] "id"                "knownName"         "givenName"        
##  [4] "familyName"        "fullName"          "fileName"         
##  [7] "gender"            "birth"             "wikipedia"        
## [10] "wikidata"          "sameAs"            "links"            
## [13] "nobelPrizes"       "death"             "orgName"          
## [16] "acronym"           "founded"           "nativeName"       
## [19] "penName"           "penNameOf"         "foundedCountry"   
## [22] "foundedCountryNow" "foundedContinent"
laureates2

Q1: Which Cities Produced the Most Nobel Prize Winners Overall?

Let’s now find which cities have had the most Nobel Prize laureates born there over the years.

# getting a count of the laureates per birth city

top_birth_city <- laureates2 %>%
  filter(!is.na(birth$place$city$en)) %>% 
  count(birth$place$city$en, sort = TRUE) %>%
  slice_max(n, n = 10)  


top_birth_city <- top_birth_city %>%
  rename(Birth_City = `birth$place$city$en`,
         Count = n )

Now we can visualize the ten cities where the most Nobel Prize laureates were born.

ggplot(top_birth_city, aes(x= Birth_City, y = Count, fill= Birth_City)) +
  geom_col() +
  labs(
    title = "Most Popular Cities of Birth of Nobel Prize Winners",
    x ="City",
    y = "Count of Nobel Winners"
  )+
  theme(axis.title.x = element_text(vjust = 0.5),
        axis.text.x = element_text(angle = 45, vjust = 0.55))

We can see that the city that has produced the most Nobel Prize laureates is New York City, with over 50 laureates born there. Also, note that most of the top ten cities are American.
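
Two details of the top-ten step are easy to miss: `slice_max()` keeps ties by default, so it can return more than the requested number of rows, and `ggplot()` orders a character x-axis alphabetically rather than by count. The following is a minimal sketch on made-up data (the city names and counts are placeholders, not results) showing `with_ties = FALSE` and `reorder()` as one possible way to handle both.

```r
library(dplyr)
library(ggplot2)

# Made-up counts for illustration only
toy_cities <- tibble(
  Birth_City = c("A", "B", "C", "D"),
  Count      = c(5, 9, 5, 2)
)

# Default behaviour: cities tied at the cutoff are all kept
with_ties_default <- toy_cities %>% slice_max(Count, n = 2)

# with_ties = FALSE caps the result at exactly n rows
top2 <- toy_cities %>% slice_max(Count, n = 2, with_ties = FALSE)

# reorder() sorts the bars by Count (descending) instead of alphabetically
p <- ggplot(top2, aes(x = reorder(Birth_City, -Count), y = Count)) +
  geom_col() +
  labs(x = "City", y = "Count of Nobel Winners")
```

Whether ties should be dropped is a judgment call; keeping them is arguably the more honest choice when several cities share the same count.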

Q2: Which Country Had the Most Female Nobel Prize Winners Excluding U.S. Recipients?

Let’s now find which country has produced the most female Nobel Prize laureates over the years, excluding those born in the United States.

top_birth_countries <- laureates2 %>%
  filter(!is.na(birth$place$country$en) & birth$place$country$en != "USA" ) %>%
  filter(gender == "female") %>%
  count(birth$place$country$en, sort = TRUE) %>%
  slice_max(n, n = 5)  

top_birth_countries <- top_birth_countries %>%
  rename(Birth_Country = `birth$place$country$en` ,
         Count_of_Awardees = n )

ggplot(top_birth_countries, aes(x = Birth_Country, y = Count_of_Awardees, fill = Birth_Country)) +
  geom_col() +
  labs(
    title = "Countries with the Most Female Nobel Prize Winners",
    subtitle = "Excluding the USA; Countries of Birth of Winners",
    x = "Country",
    y = "Count of Female Nobel Winners"
  ) +
  theme(axis.title.x = element_text(vjust = 0.5),
        axis.text.x = element_text(angle = 45, vjust = 0.55))

We can see that, excluding the United States, France has produced the most female Nobel Prize laureates over the years.

Q3: Which Category Has Awarded the Most Nobel Prizes to Women?

If we want to find out which Nobel Prize category has awarded the most Nobel Prizes to women, we must first find and extract the category information for each Nobel Prize recipient.

To do this, we will first have to take a closer look at the laureates data, expanding out any lists (in my case they have become data frames).

laureates2 %>%
  unnest_longer(nobelPrizes) %>%
  unnest_wider(nobelPrizes, names_sep = "_") %>%
  unnest_longer(nobelPrizes_category)

Looking through the laureate data, I am only interested in extracting the category and award year columns from nobelPrizes. Thus, I will extract the id, gender, and full name information from the laureate level, then extract the award year and the category information from nobelPrizes.

laureates_trim2 <- laureates2 %>%
  select(id, gender, fullName, nobelPrizes) %>%
  unnest_longer(nobelPrizes) %>%
  unnest_wider(nobelPrizes, names_sep = "_") %>%
  select(id, gender, fullName, nobelPrizes_awardYear, nobelPrizes_category)

names(laureates_trim2)
## [1] "id"                    "gender"                "fullName"             
## [4] "nobelPrizes_awardYear" "nobelPrizes_category"
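
To make the two unnesting steps easier to follow, here is a minimal sketch on a small made-up tibble. The `toy` data and its values are placeholders, and its nested column is a plain list of records rather than the data frames that `fromJSON()` produced above, so treat it as an illustration of the idea rather than a replica of the real structure.

```r
library(dplyr)
library(tidyr)

# Made-up data: one row per laureate, prizes nested in a list-column
toy <- tibble(
  id = c("1", "2"),
  nobelPrizes = list(
    list(list(awardYear = "1903", category = "Physics"),
         list(awardYear = "1911", category = "Chemistry")),
    list(list(awardYear = "2009", category = "Literature"))
  )
)

flat <- toy %>%
  unnest_longer(nobelPrizes) %>%              # one row per laureate-prize pair
  unnest_wider(nobelPrizes, names_sep = "_")  # one column per prize field

flat
```

`unnest_longer()` turns each prize into its own row, which is why a laureate with two prizes appears twice, and `unnest_wider()` then spreads the fields of each prize into columns prefixed with `nobelPrizes_`.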

Now we can visualize the number of women to receive the Nobel Prize for each category.

categories <- laureates_trim2 %>%
  filter(gender == "female") %>%
  count(nobelPrizes_category$en, sort = TRUE) %>%
  arrange(n)

categories <- categories %>%
  rename(Category = `nobelPrizes_category$en`,
         Count = n) 

ggplot(categories, aes(x = Category, y = Count))+
  geom_col(fill = "lightblue", color ="white") +
  coord_flip()+
  labs(
    title = "Number of Nobel Prizes by Category",
    subtitle = "Female Recipients",
    x = "Category",
    y = "Count"
  ) +
  theme_dark()

As we can see, the category that has awarded the most Nobel Prizes to women is Peace.

Male vs Female Laureates

Since the number of female recipients is low, it may be interesting to compare the number of Nobel Prize recipients who have been men to the number who have been women.

categories_gm <- laureates_trim2 %>%
  drop_na(gender) %>%
  group_by(gender) %>%
  count(nobelPrizes_category$en, sort = TRUE)

categories_gm %>% kable(
  format = "html",
  caption = "Male vs Female Nobel Laureates",
  col.names = c("Gender", "Nobel Prize Category", "Count")
) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE)

Male vs Female Nobel Laureates

Gender   Nobel Prize Category     Count
male     Physics                    225
male     Physiology or Medicine     218
male     Chemistry                  192
male     Literature                 104
male     Economic Sciences           96
male     Peace                       92
female   Peace                       20
female   Literature                  18
female   Physiology or Medicine      14
female   Chemistry                    8
female   Physics                      5
female   Economic Sciences            3

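The difference between the number of male and female Nobel Prize laureates is staggering: summing the table above, men account for 927 awards and women for only 68, or roughly 7% of the total.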
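
Raw counts can hide how the gap varies by category, so a share per category may be a useful complement. The sketch below re-enters the counts from the table above by hand (in the analysis itself they would come from `categories_gm`) and computes the female share with `pivot_wider()`; the `counts` and `shares` names are introduced here for illustration only.

```r
library(tidyverse)

# Counts copied from the table above
counts <- tribble(
  ~gender,  ~category,                ~n,
  "male",   "Physics",                225,
  "male",   "Physiology or Medicine", 218,
  "male",   "Chemistry",              192,
  "male",   "Literature",             104,
  "male",   "Economic Sciences",       96,
  "male",   "Peace",                   92,
  "female", "Peace",                   20,
  "female", "Literature",              18,
  "female", "Physiology or Medicine",  14,
  "female", "Chemistry",                8,
  "female", "Physics",                  5,
  "female", "Economic Sciences",        3
)

shares <- counts %>%
  pivot_wider(names_from = gender, values_from = n) %>%  # one row per category
  mutate(female_share = female / (female + male)) %>%    # share among individuals
  arrange(desc(female_share))

shares
```

Sorted this way, Peace comes first, which agrees with the category chart, while Physics has the smallest female share.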

Q4: In Which Year Did the Most Women Receive the Nobel Prize?

Apart from knowing which year had the most women receive a Nobel Prize, we may also want to know the names of the women who won that year.

top_year <- laureates_trim2 %>%
  filter(gender == "female") %>%
  count(nobelPrizes_awardYear, sort = TRUE) %>%
  slice_max(n, n = 1)

# saving the year
female_year <- top_year$nobelPrizes_awardYear

# names of the recipients
top_year_f <- laureates_trim2 %>%
  # %in% rather than == so the filter still works if several years tie for first
  filter(gender == "female", nobelPrizes_awardYear %in% female_year) %>%
  select(fullName, nobelPrizes_awardYear, nobelPrizes_category)

The year in which the most women were awarded the Nobel Prize was 2009.

The women who won the Nobel Prize in 2009 are listed in the table below:

top_year_f %>% kable(
  format = "html",
  caption = "Female Nobel Laureates in the Year with the Most Female Nobel Recipients",
  col.names = c("Name", "Award Year", "Category")
) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE)

Female Nobel Laureates in the Year with the Most Female Nobel Recipients

Name                     Award Year   Category
Ada E. Yonath            2009         Chemistry
Carol W. Greider         2009         Physiology or Medicine
Elinor Ostrom            2009         Economic Sciences
Elizabeth H. Blackburn   2009         Physiology or Medicine
Herta Müller             2009         Literature