#Research Goal
“The Nobel Prize is a set of annual international awards bestowed in several categories by Swedish and Norwegian institutions in recognition of academic, cultural, or scientific advances.” - Wikipedia.
The aim of this project is to create a visualization about Nobel Prize Winners and their different publications. This project will be using Rstudio. The Dataset can be found in TidyTuesday’s Github(https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-05-14). I hope you enjoy this. :)
##What is the Nobel Prize?
“On 27 November 1895, Alfred Nobel signed his last will and testament, giving the largest share of his fortune to a series of prizes in physics, chemistry, physiology or medicine, literature and peace – the Nobel Prizes. In 1968, Sveriges Riksbank (Sweden’s central bank) established The Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel. Learn more about the Nobel Prize laureates here.” - NobelPrize.org. Alfred Nobel gave SEK 31 million of his own money, around SEK 1,7 billion now, for investments in “safe securities”, where the income will be distributed to the future winners. Nobel Prize is created for those with the greatest benefit to mankind.
For those worthy, the nobel prize bounty can be divided from two to three persons. The Nobel Prize started around 1901 up til now, my data set will be exploring the 1901 winners and their publications up til 2016. There will be some years when the Nobel Prizes were not awarded due to the World War I (1914-1918) and II (1939-1945), and the prizes here will be reserved for the future awards.
##Why are they called “Nobel Laureates”?
Nobel Laureate
Apparently, this stems from Greek Mythology, and a Greek god named Apollo. According to Wikipedia, Apollo is one of the most important and complex god in Greek mythology and he is recognized as a god of archery, music and dance, truth and prophecy, healing and diseases, the Sun and light, poetry, and more. Apollo wears a laurel wreath on his head, it’s made of leaves and branches of the bay laurel turned into a circular crown. In Ancient Greece, this laurel wreath is given and awarded to the champions and victors of the athletic and literary competitions.
##What will the Nobel Laureates receive once they are chosen?
Nobel Diploma
First they will get a Nobel Prize diploma, made especially for them. They will get unique artworks created by the best Swedish and Norweigian artists and calligraphers. All handmade and fully customized. Second they will get a Nobel Prize Medal handmade with 18 carat recycled gold. This medal has the portrait of Alfred Nobel, and his lifetime years. Third, they will be getting the Nobel Prize document indicating the amount they will get.
The aim of this project is to create a visualization about Nobel Prize Winners and their different publications. This project will be using RstudIO. The Dataset can be found in TidyTuesday’s Github(https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-05-14).
#Setting up the environment
##Libraries
library(extrafont)
library(data.table)
library(tidytuesdayR)
library(lubridate)
library(ggplot2)
#library(hrbrthemes)
library(dplyr)
library(tidyverse)
library(scales)
library(readr)
library(igraph)
library(ggraph)
library(snakecase)
library(tm)
library(factoextra)
library(rpubs)
library(kableExtra)
library(gganimate)
library(tidytext)
library(viridis)
library(widyr)
library(tidygraph)
library(plotly)
library(modelsummary)
library(tidylo)
library(ggthemes)
##Loading the Data
#Exploring the data
Below is the list of the different variables, their class, and a short description found in the two data sets.
For the Nobel Prize Winners, there are 969 rows and 18 columns.
For the Nobel Prize Publications, there are 93,394 rows and 11 columns.
count_rows_win <- count(nobel_winner_all_pubs)
count_cols_win <- ncol(nobel_winner_all_pubs)
#Defining the theme
##Defining the theme
theme_nobel <- function(){
font <- "Times New Roman" #assign font family up front
theme_wsj() %+replace% #replace elements we want to change
theme(
legend.position = "top", #legend position
panel.background = element_blank(), # panel background to NA
panel.border = element_blank(), # setting panel border
#grid elements
panel.grid.major = element_blank(), #remove major gridlines
panel.grid.minor = element_blank(), #remove minor gridlines
axis.ticks = element_blank(), #remove axis ticks
#plot elements
plot.title = element_text( #title
family = font, #set font family
size = 12L, #set font size
face = 'bold', #bold typeface
hjust = 0.5, #left align
vjust = 2), #raise slightly
plot.caption = element_text( #caption
family = font, #font family
size = 8, #font size
hjust = 0.5), #right align
#axis elements
axis.text = element_text( #axis text
family = font, #axis famuly
size = 10), #font size
axis.title = element_text( #axis titles
family = font, #font family
size = 10), #font size
)
}
#Data Cleaning
To clean the data, I transformed the birth_date column to a year, month, day format. I also did the same for the death_date column. I also added the birth_year column from the birth_date. I used this to create the age column which is the difference of the prize_year and the birth_year.
# # creating new data table of countries to be used for further mapping
# countries <- nobel_winners[, .(count = .N), by = .(birth_country, prize_year)]
#
# # extracting each country and the year their content added to the Netflix
# countries <- countries %>%
# ungroup()%>%
# separate_rows(birth_country,sep = ",")%>%
# mutate(
# birth_country=str_trim(birth_country)
# )%>%
# group_by(prize_year,birth_country)%>%
# summarize(
# count=n()
# )%>%
# ungroup()%>%
# filter(birth_country!='NA',birth_country!="" )%>%
# arrange(prize_year,desc(count))
#
#
# # Create function to drop null values
# row.has.na <- apply(countries, 1, function(x){any(is.na(x))})
# sum(row.has.na)
#Descriptive Summary
23.4% of the Nobel Prizes awarded went to Medicine. Followed by Physics (22.9%), Chemistry (20%), Peace (13%), Literature (12%),and Economics (5%).
| Category | N | Percent |
|---|---|---|
| Chemistry | 194 | 20.02 |
| Economics | 83 | 8.57 |
| Literature | 113 | 11.66 |
| Medicine | 227 | 23.43 |
| Peace | 130 | 13.42 |
| Physics | 222 | 22.91 |
| Prize Sharing | N | Percent |
|---|---|---|
| 1/1 | 351 | 36.22 |
| 1/2 | 328 | 33.85 |
| 1/3 | 225 | 23.22 |
| 1/4 | 65 | 6.71 |
| Laureate Type | N | Percent |
|---|---|---|
| Individual | 939 | 96.90 |
| Organization | 30 | 3.10 |
| Birth Country | N | Percent |
|---|---|---|
| Argentina | 4 | 0.41 |
| Australia | 11 | 1.14 |
| Austria | 15 | 1.55 |
| Austria-Hungary (Austria) | 2 | 0.21 |
| Austria-Hungary (Bosnia and Herzegovina) | 1 | 0.10 |
| Austria-Hungary (Croatia) | 1 | 0.10 |
| Austria-Hungary (Czech Republic) | 4 | 0.41 |
| Austria-Hungary (Hungary) | 3 | 0.31 |
| Austria-Hungary (Poland) | 1 | 0.10 |
| Austria-Hungary (Slovenia) | 1 | 0.10 |
| Austria-Hungary (Ukraine) | 1 | 0.10 |
| Austrian Empire (Austria) | 2 | 0.21 |
| Austrian Empire (Czech Republic) | 1 | 0.10 |
| Austrian Empire (Italy) | 1 | 0.10 |
| Bavaria (Germany) | 1 | 0.10 |
| Belgium | 9 | 0.93 |
| Bosnia (Bosnia and Herzegovina) | 1 | 0.10 |
| Brazil | 1 | 0.10 |
| British India (Bangladesh) | 1 | 0.10 |
| British India (India) | 1 | 0.10 |
| British Mandate of Palestine (Israel) | 5 | 0.52 |
| British Protectorate of Palestine (Israel) | 1 | 0.10 |
| British West Indies (Saint Lucia) | 1 | 0.10 |
| Bulgaria | 1 | 0.10 |
| Burma (Myanmar) | 1 | 0.10 |
| Canada | 18 | 1.86 |
| Chile | 2 | 0.21 |
| China | 12 | 1.24 |
| Colombia | 2 | 0.21 |
| Costa Rica | 1 | 0.10 |
| Crete (Greece) | 1 | 0.10 |
| Cyprus | 1 | 0.10 |
| Czechoslovakia (Czech Republic) | 1 | 0.10 |
| Denmark | 11 | 1.14 |
| East Friesland (Germany) | 1 | 0.10 |
| East Timor | 2 | 0.21 |
| Egypt | 6 | 0.62 |
| Faroe Islands (Denmark) | 1 | 0.10 |
| Finland | 2 | 0.21 |
| France | 53 | 5.47 |
| Free City of Danzig (Poland) | 1 | 0.10 |
| French Algeria (Algeria) | 3 | 0.31 |
| German-occupied Poland (Poland) | 1 | 0.10 |
| Germany | 70 | 7.22 |
| Germany (France) | 3 | 0.31 |
| Germany (Poland) | 10 | 1.03 |
| Germany (Russia) | 3 | 0.31 |
| Gold Coast (Ghana) | 1 | 0.10 |
| Guadeloupe Island | 1 | 0.10 |
| Guatemala | 2 | 0.21 |
| Hesse-Kassel (Germany) | 1 | 0.10 |
| Hungary | 6 | 0.62 |
| Hungary (Slovakia) | 1 | 0.10 |
| Iceland | 1 | 0.10 |
| India | 7 | 0.72 |
| India (Pakistan) | 3 | 0.31 |
| Iran | 1 | 0.10 |
| Ireland | 5 | 0.52 |
| Italy | 18 | 1.86 |
| Japan | 29 | 2.99 |
| Java, Dutch East Indies (Indonesia) | 1 | 0.10 |
| Kenya | 1 | 0.10 |
| Korea (South Korea) | 2 | 0.21 |
| Liberia | 2 | 0.21 |
| Lithuania | 1 | 0.10 |
| Luxembourg | 2 | 0.21 |
| Madagascar | 1 | 0.10 |
| Mecklenburg (Germany) | 1 | 0.10 |
| Mexico | 3 | 0.31 |
| Morocco | 2 | 0.21 |
| Netherlands | 19 | 1.96 |
| New Zealand | 3 | 0.31 |
| Nigeria | 1 | 0.10 |
| Northern Ireland | 5 | 0.52 |
| Norway | 13 | 1.34 |
| Ottoman Empire (Republic of Macedonia) | 1 | 0.10 |
| Ottoman Empire (Turkey) | 1 | 0.10 |
| Pakistan | 1 | 0.10 |
| Persia (Iran) | 1 | 0.10 |
| Peru | 1 | 0.10 |
| Poland | 6 | 0.62 |
| Poland (Belarus) | 1 | 0.10 |
| Poland (Lithuania) | 1 | 0.10 |
| Poland (Ukraine) | 1 | 0.10 |
| Portugal | 3 | 0.31 |
| Prussia (Germany) | 7 | 0.72 |
| Prussia (Poland) | 6 | 0.62 |
| Prussia (Russia) | 1 | 0.10 |
| Romania | 5 | 0.52 |
| Russia | 20 | 2.06 |
| Russian Empire (Azerbaijan) | 1 | 0.10 |
| Russian Empire (Belarus) | 2 | 0.21 |
| Russian Empire (Finland) | 3 | 0.31 |
| Russian Empire (Latvia) | 1 | 0.10 |
| Russian Empire (Lithuania) | 1 | 0.10 |
| Russian Empire (Poland) | 5 | 0.52 |
| Russian Empire (Russia) | 2 | 0.21 |
| Russian Empire (Ukraine) | 2 | 0.21 |
| Saint Lucia | 1 | 0.10 |
| Schleswig (Germany) | 2 | 0.21 |
| Scotland | 9 | 0.93 |
| South Africa | 9 | 0.93 |
| Southern Rhodesia (Zimbabwe) | 1 | 0.10 |
| Spain | 7 | 0.72 |
| Sweden | 30 | 3.10 |
| Switzerland | 17 | 1.75 |
| Taiwan | 1 | 0.10 |
| Tibet (People’s Republic of China) | 1 | 0.10 |
| Trinidad | 1 | 0.10 |
| Turkey | 2 | 0.21 |
| Tuscany (Italy) | 1 | 0.10 |
| Ukraine | 1 | 0.10 |
| Union of Soviet Socialist Republics (Belarus) | 1 | 0.10 |
| Union of Soviet Socialist Republics (Russia) | 4 | 0.41 |
| United Kingdom | 88 | 9.08 |
| United States of America | 276 | 28.48 |
| Venezuela | 1 | 0.10 |
| Vietnam | 1 | 0.10 |
| Württemberg (Germany) | 1 | 0.10 |
| West Germany (Germany) | 5 | 0.52 |
| Yemen | 1 | 0.10 |
| Gender | N | Percent |
|---|---|---|
| Female | 50 | 5.16 |
| Male | 893 | 92.16 |
| Category | N | Percent |
|---|---|---|
| chemistry | 42657 | 45.67 |
| medicine | 29233 | 31.30 |
| physics | 21504 | 23.03 |
| Min | Max | Mean | Median | N | |
|---|---|---|---|---|---|
| Age | 17.00 | 90.00 | 59.48 | 60.00 | 938 |
#Distributions ## Distribution for Laureate Age Upon Awarding
## function (data = NULL, mapping = aes(), ..., environment = parent.frame())
## {
## UseMethod("ggplot")
## }
## <bytecode: 0x7fb3e5c7d100>
## <environment: namespace:ggplot2>
#How has the awarding of Nobel Prize Changed since 1901 til now? Here we can see that when the Nobel Prize Started, usually 1-2 awards was given per category. During 1930’s to 1960’s we started seeing more awards given , around 2-3 awards in the various categories. And for the early 2000’s we started seeing even more awards, around 4-5 awards.
#How has the awarding of Nobel Price Changed since 1901?
nobel_winners <- as.data.table(nobel_winners)
a1<-ggplot(nobel_winners[, .(count = .N), by = .(category, prize_year)], aes(x=prize_year, y=count, group = category, color=category)) +
geom_line() +
geom_point() +
scale_color_viridis(discrete = TRUE) +
ggtitle("Nobel Prize Per Year") +
ylab("Number of Awards") +
xlab("Prize") +
labs(color = "Category", group = "Category") +
theme(legend.position = "top", panel.background = element_rect(fill = NA),
panel.border = element_blank(), axis.text=element_text(size=8),
plot.title = element_text(size = 12L, face = "bold", hjust = 0.5) ) +
transition_reveal(prize_year) +
theme_nobel()+scale_colour_wsj("colors6") +
enter_grow() +
enter_fade() +
ease_aes("back-in")
anim_save(filename="a1.gif", animation=a1, "/Users/abigailchristinechen/data_visualization_2")
##How has the awarding of Nobel Price for Medicine Changed since 1901? We can see that some Nobel Prizes were not awarded due to the World War I (1914-1918). During the 1940’s the awards and development for medicine started picking up especially for the early 2000’s.
nobel_winners <- as.data.table(nobel_winners)
a2<-ggplot(nobel_winners[category == "Medicine"][, .(count = .N), by = .(category, prize_year)], aes(x=prize_year, y=count, group = category, color=category)) +
geom_line() +
geom_point() +
scale_color_viridis(discrete = TRUE) +
ggtitle("Nobel Prize (Medicine) Per Year") +
ylab("Number of Awards") +
xlab("Laureates in Medicine per year") +
labs(color = "Category", group = "Category") +
theme(legend.position = "top", panel.background = element_rect(fill = NA),
panel.border = element_blank(), axis.text=element_text(size=8),
plot.title = element_text(size = 12L, face = "bold", hjust = 0.5) ) +
transition_reveal(prize_year) +
theme_nobel()+scale_colour_wsj("colors6") +
enter_grow() +
enter_fade() +
ease_aes("back-in")
anim_save(filename="a2.gif", animation=a2, "/Users/abigailchristinechen/data_visualization_2")
##How has the awarding of Nobel Price for Chemistry Changed since 1901? We can see that some Nobel Prizes were not awarded due to the World War I (1914-1918). During the 1920’s to 1970’s the awards and development for medicine are stable receiving 2-3 awards. In the early 2000’s, we’ve been seeing years with 5 awards.
nobel_winners <- as.data.table(nobel_winners)
a3<-ggplot(nobel_winners[category == "Chemistry"][, .(count = .N), by = .(category, prize_year)], aes(x=prize_year, y=count, group = category, color=category)) +
geom_line() +
geom_point() +
scale_color_viridis(discrete = TRUE) +
ggtitle("Nobel Prize (Chemistry) Per Year") +
ylab("Number of Awards") +
xlab("Laureates in Chemistry per year") +
labs(color = "Category", group = "Category") +
theme(legend.position = "top", panel.background = element_rect(fill = NA),
panel.border = element_blank(), axis.text=element_text(size=8),
plot.title = element_text(size = 12L, face = "bold", hjust = 0.5) ) +
transition_reveal(prize_year) +
theme_nobel()+scale_colour_wsj("colors6") +
enter_grow() +
enter_fade() +
ease_aes("back-in")
anim_save(filename="a3.gif", animation=a3, "/Users/abigailchristinechen/data_visualization_2")
##How has the awarding of Nobel Price for Physics Changed since 1901? We can see that after the World War 2, the yearly awards for the field of Physics increased to 3, and like other fields picked up on the late 2000’s too.
nobel_winners <- as.data.table(nobel_winners)
a4<-ggplot(nobel_winners[category == "Physics"][, .(count = .N), by = .(category, prize_year)], aes(x=prize_year, y=count, group = category, color=category)) +
geom_line() +
geom_point() +
scale_color_viridis(discrete = TRUE) +
ggtitle("Nobel Prize (Physics) Per Year") +
ylab("Number of Awards") +
xlab("Laureates in Physics per year") +
labs(color = "Category", group = "Category") +
theme(legend.position = "top", panel.background = element_rect(fill = NA),
panel.border = element_blank(), axis.text=element_text(size=8),
plot.title = element_text(size = 12L, face = "bold", hjust = 0.5) ) +
transition_reveal(prize_year) +
theme_nobel()+scale_colour_wsj("colors6") +
enter_grow() +
enter_fade() +
ease_aes("back-in")
anim_save(filename="a4.gif", animation=a4, "/Users/abigailchristinechen/data_visualization_2")
##How has the awarding of Nobel Price for Peace Changed since 1901? Here, we can see that the period after the World War 1, received more constant Peace awards. The same also happened after World War 2, where Peace awards were awarded more.
nobel_winners <- as.data.table(nobel_winners)
a5<-ggplot(nobel_winners[category == "Peace"][, .(count = .N), by = .(category, prize_year)], aes(x=prize_year, y=count, group = category, color=category)) +
geom_line() +
geom_point() +
scale_color_viridis(discrete = TRUE) +
ggtitle("Nobel Prize (Peace) Per Year") +
ylab("Number of Awards") +
xlab("Laureates in Peace per year") +
labs(color = "Category", group = "Category") +
theme(legend.position = "top", panel.background = element_rect(fill = NA),
panel.border = element_blank(), axis.text=element_text(size=8),
plot.title = element_text(size = 12L, face = "bold", hjust = 0.5) ) +
transition_reveal(prize_year) +
theme_nobel()+scale_colour_wsj("colors6") +
enter_grow() +
enter_fade() +
ease_aes("back-in")
anim_save(filename="a5.gif", animation=a5, "/Users/abigailchristinechen/data_visualization_2")
##How has the awarding of Nobel Price for Literature Changed since 1901? In Literature, we can see that there were many consecutive years where 1 award was just given. And then there were just 4 years when 5 awards were given at the same time.
nobel_winners <- as.data.table(nobel_winners)
a6<-ggplot(nobel_winners[category == "Literature"][, .(count = .N), by = .(category, prize_year)], aes(x=prize_year, y=count, group = category, color=category)) +
geom_line() +
geom_point() +
scale_color_viridis(discrete = TRUE) +
ggtitle("Nobel Prize (Literature) Per Year") +
ylab("Number of Awards") +
xlab("Laureates in Literature per year") +
labs(color = "Category", group = "Category") +
theme(legend.position = "top", panel.background = element_rect(fill = NA),
panel.border = element_blank(), axis.text=element_text(size=8),
plot.title = element_text(size = 12L, face = "bold", hjust = 0.5) ) +
transition_reveal(prize_year) +
theme_nobel()+scale_colour_wsj("colors6") +
enter_grow() +
enter_fade() +
ease_aes("back-in")
anim_save(filename="a6.gif", animation=a6, "/Users/abigailchristinechen/data_visualization_2")
##How has the awarding of Nobel Price for Economics Changed since 1901? Here, we can see that the award for economics started around 1970’s. Then, the awards for Economics started picking up to 3 awards per year during the 1990’s.
nobel_winners <- as.data.table(nobel_winners)
a7<-ggplot(nobel_winners[category == "Economics"][, .(count = .N), by = .(category, prize_year)], aes(x=prize_year, y=count, group = category, color=category)) +
geom_line() +
geom_point() +
scale_color_viridis(discrete = TRUE) +
ggtitle("Nobel Prize (Economics) Per Year") +
ylab("Number of Awards") +
xlab("Laureates in Economics per year") +
labs(color = "Category", group = "Category") +
theme(legend.position = "top", panel.background = element_rect(fill = NA),
panel.border = element_blank(), axis.text=element_text(size=8),
plot.title = element_text(size = 12L, face = "bold", hjust = 0.5) ) +
transition_reveal(prize_year) +
theme_nobel()+scale_colour_wsj("colors6") +
enter_grow() +
enter_fade() +
ease_aes("back-in")
anim_save(filename="a7.gif", animation=a7, "/Users/abigailchristinechen/data_visualization_2")
Here we can see that the average age of the Laureates ranges from 50-60 years old during the 1900’s-1960’s. After that the age of Laureates got higher from 60-75 years old.
#How has the average age changed across the years?
a8 <- ggplot(nobel_winners, aes(prize_year, age, group = prize_year, fill = age )) +
xlab("Prize year") +
ylab("Age") +
labs(fill = "Age", group = "Prize Year", title = "Laureate Age across the Years") +
geom_boxplot() +
theme(legend.position = "top", panel.background = element_rect(fill = NA),
panel.border = element_blank(), axis.text=element_text(size=8),
plot.title = element_text(size = 12L, face = "bold", hjust = 0.5) ) +
scale_fill_viridis() +
transition_reveal(prize_year)+
theme_nobel()+scale_colour_wsj("colors6") +
enter_grow() +
enter_fade() +
ease_aes("back-in")
anim_save(filename="a8.gif", animation=a8, "/Users/abigailchristinechen/data_visualization_2")
##How many females are awarded across the years? Here, we can see that not a lot of females receive awards, but in the recent years it has slowly been increasing.
#How has the average age changed across the years?
a9 <- ggplot(nobel_winners[gender == "Female", .(count = .N), by = .(prize_year)], aes(x=prize_year, y=count)) +
geom_point() +
ggtitle("Female Laureates Per Year") +
ylab("Number of Awards") +
xlab("Year") +
labs(color = "Gender", group = "Gender") +
theme(legend.position = "top", panel.background = element_rect(fill = NA),
panel.border = element_blank(), axis.text=element_text(size=8),
plot.title = element_text(size = 12L, face = "bold", hjust = 0.5) ) +
theme_nobel()+scale_colour_wsj("colors6")
a9
##Which countries received the most awards?
worldmap <- map_data('world')
library(tidygeocoder)
geocodes <- tidygeocoder::geocode(df2, 'birth_country')
ggplot() +
geom_polygon(
data = worldmap,
aes(x = long, y = lat, group = group),
fill = 'gray', color = 'black') +
geom_point(
data = geocodes,
aes(long, lat, size = V1),
color = 'orange') +
theme_nobel() +
theme(legend.position = 'top') +
xlab('') + ylab('') +
coord_fixed(1.3)