The four questions concern: 1) the gender distribution of laureates; 2) age at award (age range, average age, etc.); 3) which Nobel Prize category has the largest number of laureates; and 4) which laureates were born in one country but awarded as citizens of another.
Retrieve Nobel Prize data via an API. After converting the raw data and completing data cleaning and organization, conduct exploratory data analysis to answer four specific questions.
2. What data challenges do I anticipate?
The data covers a long period, and the completeness of records may vary across different time periods; therefore, some variables may have missing values or incomplete information.
Countries may have undergone name changes, divisions, or mergers over time, which can affect further organization and comparative analysis.
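Both challenges can be checked early in the cleaning step. The sketch below is illustrative only: the `laureates` tibble, its column names, and the entries in `country_lookup` are hypothetical placeholders, not the mapping used in this analysis. It counts missing values per column and maps historical country names onto present-day ones.

```r
library(dplyr)
library(tibble)

# Hypothetical example rows; the real data would come from the API.
laureates <- tibble(
  id = c("x1", "x2", "x3", "x4"),
  birth_country = c("Prussia", "USA", "USSR", NA),
  gender = c("male", NA, "female", "male")
)

# Count missing values in every column.
missing_summary <- laureates |>
  summarise(across(everything(), ~ sum(is.na(.x))))

# Hypothetical lookup: historical name -> present-day name.
country_lookup <- c("Prussia" = "Germany", "USSR" = "Russia")

# Replace names found in the lookup; keep all other values unchanged.
laureates_std <- laureates |>
  mutate(
    birth_country_std = coalesce(
      unname(country_lookup[birth_country]),
      birth_country
    )
  )
```

Keeping the original column next to the standardized one makes it easy to audit which rows were changed.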
3. Load the data to work with APIs
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.1 ✔ stringr 1.5.2
✔ ggplot2 4.0.2 ✔ tibble 3.3.0
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.1.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(httr2)
library(jsonlite)
Attaching package: 'jsonlite'
The following object is masked from 'package:purrr':
flatten
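The retrieval and conversion step is not shown in this report, so the following is only a sketch of how it might look. It assumes the public Nobel Prize API v2.1 endpoint and its field names (`awardYear`, `category$en`, `laureates`, `knownName$en`, `motivation$en`, `prizeAmount`); the helper names `fetch_prizes()` and `flatten_prize()` are hypothetical. The request function is defined but not called here, and the flattening is demonstrated on a hand-written, abbreviated record.

```r
library(httr2)
library(purrr)
library(tibble)

# Assumed endpoint: Nobel Prize API v2.1. Returns the parsed JSON as a nested list.
fetch_prizes <- function(limit = 25, offset = 0) {
  request("https://api.nobelprize.org/2.1/nobelPrizes") |>
    req_url_query(limit = limit, offset = offset) |>
    req_perform() |>
    resp_body_json()
}

# Turn one prize (a nested list) into one row per laureate.
flatten_prize <- function(prize) {
  rows <- map(prize$laureates %||% list(), function(l) {
    tibble(
      award_year = as.integer(prize$awardYear),
      category = prize$category$en,
      id = l$id,
      laureate_name = l$knownName$en %||% l$orgName$en %||% NA_character_,
      motivation = l$motivation$en %||% NA_character_,
      prize_amount = as.integer(prize$prizeAmount)
    )
  })
  list_rbind(rows)
}

# Hand-written, abbreviated record in the assumed shape (1901 Peace prize).
sample_prize <- list(
  awardYear = "1901",
  category = list(en = "Peace"),
  prizeAmount = 150782,
  laureates = list(
    list(id = "462", knownName = list(en = "Henry Dunant")),
    list(id = "463", knownName = list(en = "Frédéric Passy"))
  )
)

sample_rows <- flatten_prize(sample_prize)
```

With a live connection, something like `map(fetch_prizes()$nobelPrizes, flatten_prize) |> list_rbind()` would apply the same flattening to a page of results.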
# A tibble: 6 × 7
award_year category id laureate_name motivation prize_amount
<int> <chr> <chr> <chr> <chr> <int>
1 1901 Chemistry 160 Jacobus H. va… in recogn… 150782
2 1901 Literature 569 Sully Prudhom… in specia… 150782
3 1901 Peace 462 Henry Dunant for his h… 150782
4 1901 Peace 463 Frédéric Passy for his l… 150782
5 1901 Physics 1 Wilhelm Conra… in recogn… 150782
6 1901 Physiology or Medicine 293 Emil von Behr… for his w… 150782
# ℹ 1 more variable: prize_amount_adjusted <int>
gender_count |>
  ggplot(aes(x = gender, y = n, fill = gender)) +
  geom_col(width = 0.5) +
  geom_text(aes(label = n), vjust = -0.3) +
  labs(
    title = "Number of Nobel Prize Laureates by Gender",
    x = "Gender",
    y = "Number of Laureates"
  ) +
  theme_minimal()
1. How are Nobel Prize laureates distributed by gender?
The results show that male laureates appear much more frequently than female laureates in the Nobel Prize data.
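The code that builds `gender_count` is not shown, so here is one way it could be derived, with shares added to quantify the gap. This is a sketch under assumptions: the input tibble name `laureates` is hypothetical, and dropping rows without a recorded gender (for example organizations) is an assumed cleaning choice. The three example rows are laureates named elsewhere in this report.

```r
library(dplyr)
library(tibble)

# Tiny example input; the full data would have one row per laureate.
laureates <- tibble(
  laureate_name = c("Malala Yousafzai", "John B. Goodenough", "Henry Dunant"),
  gender = c("female", "male", "male")
)

gender_count <- laureates |>
  filter(!is.na(gender)) |>      # assumed: rows without a gender are excluded
  count(gender) |>               # creates the column n used in the plot
  mutate(share = n / sum(n))     # proportion of laureates per gender
```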
# A tibble: 2 × 4
laureate_name award_year category age_at_award
<chr> <int> <chr> <int>
1 Malala Yousafzai 2014 Peace 17
2 John B. Goodenough 2019 Chemistry 97
# Average age by decade
average_age_decade <- age_info |>
  mutate(decade = floor(award_year / 10) * 10) |>
  group_by(decade) |>
  summarise(average_age = mean(age_at_award), .groups = "drop")
ggplot(
  average_age_decade,
  aes(x = factor(decade), y = average_age, fill = decade)
) +
  geom_col(width = 0.6) +
  geom_text(aes(label = round(average_age, 1)), vjust = -0.3) +
  labs(
    title = "Average Age of Nobel Prize Laureates by Decade",
    x = "Decade",
    y = "Average Age at Award"
  ) +
  theme_minimal()
2. What are the youngest and oldest ages at which Nobel laureates received their prizes, and how has the average age at award changed by decade?
The youngest laureate in this data is Malala Yousafzai, who received the Peace prize in 2014 at age 17. The oldest laureate is John B. Goodenough, who received the Chemistry prize in 2019 at age 97.
The bar chart shows that the average age at award has generally increased over time. In earlier decades the average age was mostly in the 50s, while in more recent decades it is mostly in the 60s; the highest average, about 68.5, appears in the 2020s.
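How `age_info` and the youngest/oldest table were produced is not shown; the sketch below is one plausible approach, not necessarily the one used. It assumes a `birth_date` column (a hypothetical name) and approximates age as award year minus birth year, which ignores whether the birthday had passed by the award date. The two example rows are the laureates reported above.

```r
library(dplyr)
library(tibble)
library(lubridate)

laureates <- tibble(
  laureate_name = c("Malala Yousafzai", "John B. Goodenough"),
  award_year = c(2014L, 2019L),
  category = c("Peace", "Chemistry"),
  birth_date = as.Date(c("1997-07-12", "1922-07-25"))  # assumed column name
)

# Approximate age at award from the two years.
age_info <- laureates |>
  filter(!is.na(birth_date)) |>
  mutate(age_at_award = as.integer(award_year - year(birth_date)))

# Youngest row first, oldest row second.
age_extremes <- bind_rows(
  slice_min(age_info, age_at_award, n = 1),
  slice_max(age_info, age_at_award, n = 1)
) |>
  select(laureate_name, award_year, category, age_at_award)
```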
ggplot(
  category_count,
  aes(
    x = reorder(category, number_of_laureates),
    y = number_of_laureates,
    fill = category
  )
) +
  geom_col(width = 0.7) +
  geom_text(aes(label = number_of_laureates), hjust = -0.3) +
  coord_flip() +
  labs(
    title = "Number of Nobel Prize Laureates by Category",
    x = "Category",
    y = "Number of Laureates"
  ) +
  theme_minimal()
3. Which Nobel Prize category has the most laureates?
The category with the most laureates is Physiology or Medicine, with 231 laureates. The next highest categories are Physics with 224 laureates and Chemistry with 196 laureates. The category with the fewest laureates is Economic Sciences, with 99 laureates.
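The derivation of `category_count` is not shown. A minimal sketch, assuming a tibble named `prizes` (hypothetical) with one row per laureate and prize, is given below using the `id` and `category` values from the six 1901 rows printed earlier. Whether repeat laureates were de-duplicated in the actual analysis is not stated; `distinct()` here is an assumed choice that counts each laureate once per category.

```r
library(dplyr)
library(tibble)

# id and category taken from the 1901 rows shown earlier in this report.
prizes <- tibble(
  id = c("160", "569", "462", "463", "1", "293"),
  category = c(
    "Chemistry", "Literature", "Peace", "Peace",
    "Physics", "Physiology or Medicine"
  )
)

category_count <- prizes |>
  distinct(id, category) |>   # assumed: one count per laureate and category
  count(category, name = "number_of_laureates", sort = TRUE)
```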
ggplot(
  country_pairs,
  aes(x = reorder(country_pair, n), y = n, fill = n)
) +
  geom_col(width = 0.6) +
  geom_text(aes(label = n), hjust = -0.2) +
  coord_flip() +
  labs(
    title = "Top 10 Birth Country and Affiliation Country Pairs",
    x = "Birth Country to Affiliation Country",
    y = "Number of Laureates"
  ) +
  theme_minimal()
4. Which birth-country and affiliation-country pairs appear most often among Nobel laureates?
The most common pairs are United Kingdom to USA and Germany to USA, both with 21 laureates.
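The construction of `country_pairs` is not shown; one way to build it is sketched below. The column names `birth_country` and `affiliation_country` are assumptions, and the four rows are made-up placeholders used only to show the mechanics, not real laureates or real counts.

```r
library(dplyr)
library(tibble)

# Made-up rows for illustration only.
laureates <- tibble(
  id = c("a", "b", "c", "d"),
  birth_country = c("United Kingdom", "United Kingdom", "Germany", "USA"),
  affiliation_country = c("USA", "USA", "USA", "USA")
)

country_pairs <- laureates |>
  # Keep only laureates whose two countries are known and differ.
  filter(
    !is.na(birth_country),
    !is.na(affiliation_country),
    birth_country != affiliation_country
  ) |>
  mutate(country_pair = paste(birth_country, "to", affiliation_country)) |>
  count(country_pair, sort = TRUE) |>
  slice_max(n, n = 10, with_ties = FALSE)   # top 10 pairs
```

Setting `with_ties = FALSE` caps the result at exactly ten rows; with ties allowed, pairs sharing the tenth-place count would all be kept.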