assignment_10B

Approach

Brainstorm

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.2.0
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(jsonlite)

Attaching package: 'jsonlite'

The following object is masked from 'package:purrr':

    flatten
df <- fromJSON("https://api.nobelprize.org/2.1/nobelPrizes")
glimpse(df)
List of 3
 $ nobelPrizes:'data.frame':    25 obs. of  8 variables:
  ..$ awardYear          : chr [1:25] "1901" "1901" "1901" "1901" ...
  ..$ category           :'data.frame': 25 obs. of  3 variables:
  .. ..$ en: chr [1:25] "Chemistry" "Literature" "Peace" "Physics" ...
  .. ..$ no: chr [1:25] "Kjemi" "Litteratur" "Fred" "Fysikk" ...
  .. ..$ se: chr [1:25] "Kemi" "Litteratur" "Fred" "Fysik" ...
  ..$ categoryFullName   :'data.frame': 25 obs. of  3 variables:
  .. ..$ en: chr [1:25] "The Nobel Prize in Chemistry" "The Nobel Prize in Literature" "The Nobel Peace Prize" "The Nobel Prize in Physics" ...
  .. ..$ no: chr [1:25] "Nobelprisen i kjemi" "Nobelprisen i litteratur" "Nobels fredspris" "Nobelprisen i fysikk" ...
  .. ..$ se: chr [1:25] "Nobelpriset i kemi" "Nobelpriset i litteratur" "Nobels fredspris" "Nobelpriset i fysik" ...
  ..$ dateAwarded        : chr [1:25] "1901-11-12" "1901-11-14" "1901-12-10" "1901-11-12" ...
  ..$ prizeAmount        : int [1:25] 150782 150782 150782 150782 150782 141847 141847 141847 141847 141847 ...
  ..$ prizeAmountAdjusted: int [1:25] 10833458 10833458 10833458 10833458 10833458 10191492 10191492 10191492 10191492 10191492 ...
  ..$ links              :List of 25
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  .. ..$ :'data.frame': 1 obs. of  4 variables:
  ..$ laureates          :List of 25
  .. ..$ :'data.frame': 1 obs. of  7 variables:
  .. ..$ :'data.frame': 1 obs. of  7 variables:
  .. ..$ :'data.frame': 2 obs. of  7 variables:
  .. ..$ :'data.frame': 1 obs. of  7 variables:
  .. ..$ :'data.frame': 1 obs. of  7 variables:
  .. ..$ :'data.frame': 1 obs. of  7 variables:
  .. ..$ :'data.frame': 1 obs. of  7 variables:
  .. ..$ :'data.frame': 2 obs. of  7 variables:
  .. ..$ :'data.frame': 2 obs. of  7 variables:
  .. ..$ :'data.frame': 1 obs. of  7 variables:
  .. ..$ :'data.frame': 1 obs. of  7 variables:
  .. ..$ :'data.frame': 1 obs. of  7 variables:
  .. ..$ :'data.frame': 1 obs. of  7 variables:
  .. ..$ :'data.frame': 3 obs. of  7 variables:
  .. ..$ :'data.frame': 1 obs. of  7 variables:
  .. ..$ :'data.frame': 1 obs. of  7 variables:
  .. ..$ :'data.frame': 2 obs. of  7 variables:
  .. ..$ :'data.frame': 1 obs. of  7 variables:
  .. ..$ :'data.frame': 1 obs. of  7 variables:
  .. ..$ :'data.frame': 1 obs. of  7 variables:
  .. ..$ :'data.frame': 1 obs. of  7 variables:
  .. ..$ :'data.frame': 1 obs. of  7 variables:
  .. ..$ :'data.frame': 1 obs. of  7 variables:
  .. ..$ :'data.frame': 1 obs. of  7 variables:
  .. ..$ :'data.frame': 1 obs. of  7 variables:
 $ meta       :List of 6
  ..$ offset    : int 0
  ..$ limit     : int 25
  ..$ count     : int 682
  ..$ terms     : chr "https://www.nobelprize.org/about/terms-of-use-for-api-nobelprize-org-and-data-nobelprize-org/"
  ..$ license   : chr "https://www.nobelprize.org/about/terms-of-use-for-api-nobelprize-org-and-data-nobelprize-org/#licence"
  ..$ disclaimer: chr "https://www.nobelprize.org/about/terms-of-use-for-api-nobelprize-org-and-data-nobelprize-org/#disclaimer"
 $ links      :List of 4
  ..$ first: chr "http://nobel-external-api-app.azurewebsites.net/2.1/nobelPrizes?offset=0&limit=25"
  ..$ self : chr "http://nobel-external-api-app.azurewebsites.net/2.1/nobelPrizes?offset=0&limit=25"
  ..$ next : chr "http://nobel-external-api-app.azurewebsites.net/2.1/nobelPrizes?offset=25&limit=25"
  ..$ last : chr "http://nobel-external-api-app.azurewebsites.net/2.1/nobelPrizes?offset=675&limit=25"

Lot of nested dataframes and lists. Seems better to read the documentation where it’s better to create an https request. Unless the request is to use https://api.nobelprize.org/2.1/nobelPrizes?

Questions

After understanding the data, I can come up with the following questions:

  • Did Friedrich Nietzsche ever win a Nobel Prize?
  • What was Betrand Russell’s motivation for winning a prize?
  • Who won the Nobel Prize in Literature in 2010-2020?
  • What is the overall ratio of male to female Nobel Prize laureates? Among the five countries with the most birthed laureates, what is the male-to-female ratio in each country?

Nietzsche

Bool response for the query: http://api.nobelprize.org/2.0/laureates?name=Friedrich%20Nietzsche

Betrand Russell

Return motivation columns from the query: http://api.nobelprize.org/2.0/laureates?name=Betrand%20Russell

Nobel Prize Literature 2010-2020

http://api.nobelprize.org/2.0/laureates?nobelPrizeYear=2010&yearTo=2020&nobelPrizeCategory=lit

Ratio of male to Female Nobel Prize laureates and country outliers.

We would use the full laureates dataset to calculate the overall male-to-female ratio among Nobel Prize laureates. Then we would group laureates by birthContinent, select the five continents with the most laureates, and calculate the ratio for each continent.

http://api.nobelprize.org/2.0/laureates

Codeurl

packages

library(tidyverse)
library(jsonlite)
library(tidyr)
library(stringr)

set up

url <- "https://api.nobelprize.org/2.1"

# calling function makes it easier to answer the different questions
get_api <- function(path) {
  fromJSON(path, flatten = TRUE)
}

Q1: Did Friedrich Nietzsche ever win a Nobel Prize?

nietzsche <- get_api(
  paste0(url, "/laureates?name=Friedrich%20Nietzsche")
)

if (length(nietzsche$laureates) > 0) "Yes" else "No"
[1] "No"

Q2: What was he motivation for Bertrand Russell’s Nobel Prize?

russell <- get_api(
  paste0(url, "/laureates?name=Bertrand%20Russell")
)

russell$laureates|>
  select(fullName.en, nobelPrizes) |>
  unnest(cols = nobelPrizes) |>
  select(fullName.en, awardYear, category.en, motivation.en) |>
  pull(motivation.en)
[1] "in recognition of his varied and significant writings in which he champions humanitarian ideals and freedom of thought"

Q3: Who won the Nobel Prize in Literature from 2010 to 2020?

lit <- get_api(
  paste0(url, "/nobelPrizes?nobelPrizeCategory=literature&awardYear=2010&yearTo=2020")
)

lit_winners <- lit$nobelPrizes|>
  select(awardYear, laureates)|>
  unnest(cols = laureates) |>
  transmute(
    year = awardYear,
    winner = coalesce(knownName.en, fullName.en)
  )

lit_winners
# A tibble: 31 × 2
   year  winner                
   <chr> <chr>                 
 1 1901  Jacobus H. van 't Hoff
 2 1901  Sully Prudhomme       
 3 1901  Henry Dunant          
 4 1901  Frédéric Passy        
 5 1901  Wilhelm Conrad Röntgen
 6 1901  Emil von Behring      
 7 1902  Emil Fischer          
 8 1902  Theodor Mommsen       
 9 1902  Élie Ducommun         
10 1902  Albert Gobat          
# ℹ 21 more rows

Q4: Overall male-to-female ratio, and the ratio in the 5 birth countries with the most laureates

all_laureates <- get_api(
  paste0(url, "/laureates?limit=2000")
)

laureates_df <- all_laureates$laureates|>
  transmute(
    name = coalesce(knownName.en, fullName.en),
    gender,
    birth_country = birth.place.country.en
  )

overall_ratio <- laureates_df |>
  filter(gender %in% c("male", "female")) |>
  count(gender) |>
  pivot_wider(names_from = gender, values_from = n) |>
  mutate(male_to_female_ratio = male / female)

overall_ratio
# A tibble: 1 × 3
  female  male male_to_female_ratio
   <int> <int>                <dbl>
1     67   923                 13.8
top5_countries <- laureates_df |>
  filter(!is.na(birth_country)) |>
  count(birth_country, sort = TRUE) |>
  slice_head(n = 5)

top5_countries
   birth_country   n
1            USA 296
2 United Kingdom  94
3        Germany  80
4         France  60
5          Japan  30
country_ratios <- laureates_df |>
  filter(birth_country %in% top5_countries$birth_country,
         gender %in% c("male", "female")) |>
  count(birth_country, gender) |>
  pivot_wider(
    names_from = gender,
    values_from = n,
    values_fill = 0
  )

if (!"male" %in% names(country_ratios)) {
  country_ratios$male <- 0L
}

if (!"female" %in% names(country_ratios)) {
  country_ratios$female <- 0L
}

country_ratios <- country_ratios |>
  mutate(
    total = male + female,
    male_to_female_ratio = if_else(female == 0, NA_real_, male / female),
    ratio_text = if_else(
      female == 0,
      "No women; 100% male",
      paste0(round(male / female, 2), ":1")
    )
  ) |>
  arrange(desc(total))

country_ratios |>
  arrange(male_to_female_ratio)
# A tibble: 5 × 6
  birth_country  female  male total male_to_female_ratio ratio_text         
  <chr>           <int> <int> <int>                <dbl> <chr>              
1 France              6    54    60                  9   9:1                
2 USA                18   278   296                 15.4 15.44:1            
3 Germany             3    77    80                 25.7 25.67:1            
4 United Kingdom      0    94    94                 NA   No women; 100% male
5 Japan               0    30    30                 NA   No women; 100% male
country_ratios |>
  select(birth_country, male, female) |>
  pivot_longer(cols = c(male, female),
               names_to = "gender",
               values_to = "count") |>
  ggplot(aes(x = birth_country, y = count, fill = gender)) +
  geom_col(position = "dodge") +
  coord_flip() +
  labs(
    title = "Male and Female Nobel Laureates by Birth Country",
    x = "Birth Country",
    y = "Count"
  ) +
  theme_minimal()

France has the strongest female representation relative to men, while the United States has the highest raw counts of both female and male laureates. In our sample, countries such as Japan and the U.K. had no female laureates. These results are based on birth_country, not the country where a laureate later lived or grew up.