#install.packages("dslabs")
#load the 'tidyverse' library for ggplot2 and dplyr 
#load the 'plotly' library to create an interactive graph

library("dslabs")
library("tidyverse")
library(plotly)
#select the dataset 'gapminder' in the 'dslabs' package.
data(gapminder)
#data cleaning
#keep only 2011 and drop rows with NA values using !is.na()
#create a new variable, GDP_per_capita, using mutate()
df <- gapminder %>% filter(year == 2011) %>%
  filter(!is.na(infant_mortality) & !is.na(gdp) & !is.na(population) & !is.na(fertility)) %>%
  mutate(GDP_per_capita = round(gdp/population))
 
head(df)
##               country year infant_mortality life_expectancy fertility
## 1             Albania 2011             14.3            77.4      1.75
## 2             Algeria 2011             22.8            76.1      2.83
## 3              Angola 2011            106.8            58.1      6.10
## 4 Antigua and Barbuda 2011              7.2            75.9      2.12
## 5           Argentina 2011             12.7            76.0      2.20
## 6             Armenia 2011             15.3            73.5      1.50
##   population          gdp continent          region GDP_per_capita
## 1    2886010   6321690864    Europe Southern Europe           2190
## 2   36717132  81143448101    Africa Northern Africa           2210
## 3   21942296  27013935821    Africa   Middle Africa           1231
## 4      88152    801787943  Americas       Caribbean           9096
## 5   41655616 472935255184  Americas   South America          11353
## 6    2967984   4290990647      Asia    Western Asia           1446
#load libraries to apply different color palettes and themes
#The viridis palette is colorblind friendly.
library(hrbrthemes)
library(viridis)
# more information about ipsum theme:  https://www.rdocumentation.org/packages/hrbrthemes/versions/0.8.0/topics/theme_ipsum
# About the bubble chart:  https://r-graph-gallery.com/bubble_chart_interactive_ggplotly.html
# The default tooltip doesn't include the country names, so add a text aesthetic (text = paste(...)) inside aes()
# In ggplot(aes(.)), the text aesthetic sets the contents of the tooltips
# use scale_color_viridis instead of scale_fill_viridis because the points are mapped to the color aesthetic
# wrap the plot in ggplotly() to make an interactive graph
# tooltip = "text" in ggplotly() shows the text aesthetic, so each bubble displays its country
# remove the legend by adding legend.position = "none" in theme()
# apply the ipsum theme from the 'hrbrthemes' library and the viridis palette from the 'viridis' library for a prettier graph.
# In viridis, set discrete = TRUE to generate a discrete palette.

gr <- df %>%
  ggplot(aes(x = fertility, y = infant_mortality, size = GDP_per_capita,
             text = paste("Country:", country,
                          "\nGDP per capita:", GDP_per_capita,
                          "\nInfant Mortality:", infant_mortality,
                          "\nFertility:", fertility))) +
  geom_point(alpha = 0.6, aes(color = continent)) +
  labs(title = "Fertility and Infant Mortality by Continent (2011)",
       subtitle = "Does poverty lead to a higher birth rate?",
       x = "Fertility (Average number of children per woman)",
       y = "Infant Mortality (per 1000)") +
  theme_ipsum(plot_title_family ="Times New Roman", 
              plot_title_face = "bold", 
              axis_title_size = 13, 
              strip_text_family = "Georgia") +
  facet_wrap(~continent) +
  theme(legend.position = "none") +
  #scale_color_ipsum() +
  scale_color_viridis(discrete = TRUE)

ggplotly(gr, tooltip ="text")
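# Note: the subtitle does not appear in the interactive plot. This is most likely because ggplotly() does not carry over ggplot2 subtitles, rather than an effect of the ipsum theme.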
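One common workaround for the missing subtitle is to rebuild the title as HTML and set it through plotly's layout(). The sketch below is only an illustration on the built-in mtcars data, not the plot above. The object names (p, title_html, interactive) and the title strings are placeholders, and it assumes the subtitle is lost in the ggplotly() conversion.

```r
library(ggplot2)
library(plotly)

# Toy plot on the built-in mtcars data, used only to illustrate the idea
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "Main title", subtitle = "Subtitle text")

# Rebuild title and subtitle as one HTML string:
# <br> starts a new line and <sup> renders the subtitle in smaller text
title_html <- paste0("Main title", "<br><sup>", "Subtitle text", "</sup>")

# Convert to an interactive plot, then overwrite the title in the plotly layout
interactive <- ggplotly(p) %>%
  layout(title = list(text = title_html))

interactive
```

The same layout() call could be chained after ggplotly(gr, tooltip = "text"), with the title and subtitle strings taken from labs().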
I selected the 'gapminder' dataset in the dslabs package. In the last class we saw a negative correlation between fertility and life expectancy, so here I created a graph for each continent showing the relationship between infant mortality and fertility. I wanted to analyze the most recent year available, but I chose 2011 because the years 2012 through 2016 are missing nearly all infant_mortality and gdp values. I used facet_wrap() to split the graph by continent so that the continents can be compared side by side, and ggplotly() to identify which country each point represents: when you hover the mouse pointer over a point, a text box appears with information about that country. I also added a 'GDP per capita' variable, since infant mortality is expected to be related to the state of a country's economy. GDP per capita is shown as the size of each bubble, which lets us explore the relationship among the three variables in a single graph.
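The choice of 2011 can be checked by counting missing values per year. This is a minimal sketch, assuming the gapminder data from dslabs as loaded above. The name missing_by_year and the 2009 cutoff are my own choices for illustration.

```r
library(dslabs)
library(dplyr)

data(gapminder)

# For each recent year, count the rows (countries) and how many of them
# are missing the variables used in the plot
missing_by_year <- gapminder %>%
  filter(year >= 2009) %>%
  group_by(year) %>%
  summarise(
    n_countries = n(),
    na_infant_mortality = sum(is.na(infant_mortality)),
    na_gdp = sum(is.na(gdp)),
    na_fertility = sum(is.na(fertility)),
    .groups = "drop"
  )

print(missing_by_year)
```

The most recent year whose NA counts are small relative to n_countries is the latest year that can be plotted with all three variables.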

As shown in the graph, fertility and infant mortality are strongly positively correlated: countries with lower fertility rates also tend to have lower infant mortality rates. African countries span a wide range, from low fertility rates to high ones. In the Americas, Asia, and Oceania, most bubbles cluster around a fertility rate of 2. In Europe, both fertility and infant mortality are very low. Now consider GDP per capita, which is shown as the size of each bubble: the countries with the lowest infant mortality mostly have the larger bubbles.
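The visual impression can be backed up with correlation coefficients. This is a minimal sketch, assuming the same 2011 filter as the code above. The names overall_cor, cor_by_continent, and gdp_cor are introduced here for illustration. Using Spearman's rank correlation for GDP per capita is my own choice, made because income distributions across countries tend to be skewed.

```r
library(dslabs)
library(dplyr)

data(gapminder)

# Same cleaning as in the main analysis: 2011 only, complete rows
df <- gapminder %>%
  filter(year == 2011,
         !is.na(infant_mortality), !is.na(gdp),
         !is.na(population), !is.na(fertility)) %>%
  mutate(GDP_per_capita = round(gdp / population))

# Pearson correlation between fertility and infant mortality, all countries
overall_cor <- cor(df$fertility, df$infant_mortality)

# The same correlation within each continent, with the number of countries
cor_by_continent <- df %>%
  group_by(continent) %>%
  summarise(
    n = n(),
    cor_fertility_mortality = cor(fertility, infant_mortality),
    .groups = "drop"
  )

# Rank correlation between GDP per capita and infant mortality
gdp_cor <- cor(df$GDP_per_capita, df$infant_mortality, method = "spearman")

print(overall_cor)
print(cor_by_continent)
print(gdp_cor)
```

Continents with only a few countries left after filtering give unstable coefficients, so the n column should be read alongside each value.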

In summary, low-income countries tend to have high infant mortality rates, which I attribute to poor health care facilities. Sustained high fertility without a reduction in infant mortality can be a serious problem. All countries, especially developed ones, should share health care resources as humanitarian support to help developing countries reduce infant mortality.