For this assignment, I used the public APIs provided by the Nobel Prize organization to investigate questions about the Nobel Prize that interest me personally.
My chosen questions are:
What are the category breakdowns for Nobel Prizes won by people born in the US?
What countries have the greatest rate of improvement over the last decade compared to the previous decade?
What countries have the greatest rate of decline over the last decade compared to the previous decade?
How do the West and East compare with regard to the number of awards in the last decade?
In order to answer these questions, I will tidy and transform the data retrieved from the Nobel Prize APIs. From there, I will work on displaying the results in a clear format.
Reading in the data
I used fromJSON to read in the Nobel Prize data from their public API. I used unnest and filter to tidy the raw data.
library(tidyverse)
library(jsonlite)
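The reading and tidying step described above can be sketched roughly as follows. This is a minimal sketch, not the exact code used for this assignment: it parses a small made-up JSON string shaped like a laureates response instead of calling the live API, and the filter condition (dropping rows with no birth country) is an assumption. The names sample_json, raw, laureates, and tidy_prizes are illustrative.

```r
library(tidyverse)
library(jsonlite)

# Made-up stand-in for an API response (illustrative values, not real data)
sample_json <- '{
  "laureates": [
    {"id": "1",
     "birth": {"place": {"country": {"en": "USA"}}},
     "nobelPrizes": [
       {"awardYear": "2019", "category": {"en": "Physics"}},
       {"awardYear": "2021", "category": {"en": "Chemistry"}}
     ]},
    {"id": "2",
     "birth": {"place": {"country": {"en": "Japan"}}},
     "nobelPrizes": [
       {"awardYear": "2018", "category": {"en": "Literature"}}
     ]}
  ]
}'

# flatten = TRUE turns nested objects into dot-separated columns,
# for example birth.place.country.en
raw <- fromJSON(sample_json, flatten = TRUE)
laureates <- as_tibble(raw$laureates)

tidy_prizes <- laureates %>%
  # one row per prize; unnested columns get a "nobelPrizes_" prefix
  unnest(nobelPrizes, names_sep = "_") %>%
  # assumed filter: drop rows without a recorded birth country
  # (none are dropped in this sample)
  filter(!is.na(birth.place.country.en))
```

Against the live API, the JSON string would be replaced by the endpoint URL passed to fromJSON.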
What are the category breakdowns for Nobel Prizes won by people born in the US?
To answer this question, I filtered the tidy dataset to rows whose birth.place.country.en value matches the pattern USA|United States. I then grouped the rows by nobelPrizes_category.en, used summarise to count the prizes in each category, and used arrange to sort the counts in descending order.
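The steps in this paragraph can be sketched as below. This is a minimal sketch on a made-up four-row table rather than the real data; only the two column names and the USA|United States pattern come from the text, while tidy_prizes, us_categories, and prizes are assumed names.

```r
library(dplyr)
library(stringr)

# Made-up rows for illustration only (not real Nobel Prize data)
tidy_prizes <- tibble::tribble(
  ~birth.place.country.en, ~nobelPrizes_category.en,
  "USA",                   "Physics",
  "USA",                   "Physics",
  "United States",         "Chemistry",
  "Japan",                 "Physics"
)

us_categories <- tidy_prizes %>%
  # keep laureates whose birth country matches either spelling
  filter(str_detect(birth.place.country.en, "USA|United States")) %>%
  # count prizes per category
  group_by(nobelPrizes_category.en) %>%
  summarise(prizes = n(), .groups = "drop") %>%
  # largest categories first
  arrange(desc(prizes))
```

Using a regular expression with str_detect keeps both spellings of the country in a single condition, which is why the pattern contains the | alternation.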
Next, I removed the prizes labeled with a region of Other and used summarise to count the prizes per year for the West and East regions.
# Filter out 'Other' regions and summarize the annual prize counts
regional_comparison <- regional_labeled %>%
  filter(region != "Other") %>%
  group_by(year, region) %>%
  summarise(prizes = n(), .groups = "drop")
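The code above starts from regional_labeled, whose construction is not shown in this section. One way such a table could be built is sketched below, purely as an assumption: the country groupings in west_countries and east_countries are hypothetical placeholders, not the lists used for this assignment, and the prizes table and its country column are made up for illustration.

```r
library(dplyr)

# Hypothetical groupings for illustration only
west_countries <- c("USA", "United Kingdom", "France", "Germany")
east_countries <- c("Japan", "China", "India")

# Made-up rows (not real Nobel Prize data)
prizes <- tibble::tribble(
  ~year, ~country,
  2019,  "USA",
  2019,  "Japan",
  2020,  "France",
  2020,  "Brazil"
)

regional_labeled <- prizes %>%
  mutate(region = case_when(
    country %in% west_countries ~ "West",
    country %in% east_countries ~ "East",
    TRUE                        ~ "Other"  # everything else is excluded later
  ))
```

Labeling unmatched countries as Other, rather than dropping them at this stage, keeps the labeling and the filtering as separate, inspectable steps.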
Here I used ggplot to show the comparison as a grouped bar chart with one pair of bars per year. Based on the chart, the West has a far higher count of Nobel Prizes in the last decade than the East.
library(ggplot2)

ggplot(regional_comparison, aes(x = factor(year), y = prizes, fill = region)) +
  geom_col(position = "dodge") +
  theme_minimal() +
  labs(
    title = "Nobel Prizes: West vs. East (Last Decade)",
    x = "Year", y = "Total Prizes", fill = "Region"
  )
Conclusion
The Nobel Prize public APIs can be used to gather data related to the Nobel Prize. It took some time initially to figure out how to read and tidy the raw data, but once that was done, answering the chosen questions was fairly straightforward.