library("dslabs")
## Warning: package 'dslabs' was built under R version 4.0.4
data(package="dslabs")
list.files(system.file("script", package = "dslabs"))
##  [1] "make-admissions.R"                   
##  [2] "make-brca.R"                         
##  [3] "make-brexit_polls.R"                 
##  [4] "make-death_prob.R"                   
##  [5] "make-divorce_margarine.R"            
##  [6] "make-gapminder-rdas.R"               
##  [7] "make-greenhouse_gases.R"             
##  [8] "make-historic_co2.R"                 
##  [9] "make-mnist_27.R"                     
## [10] "make-movielens.R"                    
## [11] "make-murders-rda.R"                  
## [12] "make-na_example-rda.R"               
## [13] "make-nyc_regents_scores.R"           
## [14] "make-olive.R"                        
## [15] "make-outlier_example.R"              
## [16] "make-polls_2008.R"                   
## [17] "make-polls_us_election_2016.R"       
## [18] "make-reported_heights-rda.R"         
## [19] "make-research_funding_rates.R"       
## [20] "make-stars.R"                        
## [21] "make-temp_carbon.R"                  
## [22] "make-tissue-gene-expression.R"       
## [23] "make-trump_tweets.R"                 
## [24] "make-weekly_us_contagious_diseases.R"
## [25] "save-gapminder-example-csv.R"
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.0.4
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.6     v dplyr   1.0.3
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(RColorBrewer)
library(readr)
library(ggplot2)
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
library(dplyr)
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 4.0.4
library(ggrepel)
## Warning: package 'ggrepel' was built under R version 4.0.4
library(highcharter)
## Warning: package 'highcharter' was built under R version 4.0.4
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## 
## Attaching package: 'highcharter'
## The following object is masked from 'package:dslabs':
## 
##     stars

Visualization of the Birth Rate.

The “nations” dataset contains socio-economic variables such as population, GDP per capita, birth rate, neonatal mortality rate, and income level. The data are reported by country, region, and year.

I wanted to visualize how the birth rate differs across world regions and how it changed between 1990 and 2014.

The multivariable graph shows years on the x-axis and regions on the y-axis, with the birth rate mapped to fill color.

The regions are ordered on the plot by their mean birth rate. The Middle East & North Africa region has the highest birth rate and is located at the bottom, followed by the regions with intermediate birth rates. At the top is the North America region, which has the lowest birth rate.
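As a minimal sketch of how this ordering works, the toy example below applies reorder() to a small data frame. The region labels and values are hypothetical and are not taken from nations.csv. By default reorder() sorts factor levels by the group mean in increasing order; na.rm = TRUE is passed through to mean() so that a missing value does not turn a whole group's score into NA.

```r
# Toy data with hypothetical values (not from nations.csv)
toy <- data.frame(
  region = c("A", "A", "B", "B", "C", "C"),
  birth_rate = c(30, 34, 12, NA, 20, 22),
  stringsAsFactors = FALSE
)

# Order the region levels by mean birth rate, ignoring missing values
ordered_region <- reorder(toy$region, toy$birth_rate, FUN = mean, na.rm = TRUE)

# Group means are A = 32, B = 12, C = 21, so the levels run from lowest to highest
levels(ordered_region)
# [1] "B" "C" "A"
```

Without na.rm = TRUE, group "B" would get an NA score and be placed last, which may be worth checking when a dataset has missing birth rates.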

I used scale_fill_gradientn to bring out the contrast between years across 1990 - 2014. For instance, the largest changes in the birth rate over this period were in the Middle East & North Africa and Latin America & Caribbean regions, while the birth rate in North America barely changed.

I chose the “PRGn” palette because its colors are associated with the topic of fertility.
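The plot code passes trans = "sqrt" to scale_fill_gradientn, which changes where values fall along the color gradient. The sketch below only illustrates the arithmetic of a square-root rescaling in base R. The rates are hypothetical, and this is a simplified illustration rather than the exact internals of ggplot2.

```r
# Hypothetical birth rates spanning a plausible range
rates <- c(10, 20, 30, 40, 50)

# Position of each rate on a 0-1 gradient under a linear mapping
linear_pos <- (rates - min(rates)) / (max(rates) - min(rates))

# Position of each rate after a square-root transformation
sqrt_pos <- (sqrt(rates) - sqrt(min(rates))) /
  (sqrt(max(rates)) - sqrt(min(rates)))

# Compare the two mappings side by side
data.frame(rates, linear_pos, sqrt_pos = round(sqrt_pos, 2))
```

Under the square-root mapping the lower rates are spread over a wider part of the gradient, so differences among low birth rates become easier to see.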

nations <- read_csv("nations.csv")
## 
## -- Column specification --------------------------------------------------------
## cols(
##   iso2c = col_character(),
##   iso3c = col_character(),
##   country = col_character(),
##   year = col_double(),
##   gdp_percap = col_double(),
##   population = col_double(),
##   birth_rate = col_double(),
##   neonat_mortal_rate = col_double(),
##   region = col_character(),
##   income = col_character()
## )
nations %>%
  mutate(region = reorder(region, birth_rate)) %>% # order regions by mean birth rate
  ggplot(aes(year, region,  fill = birth_rate)) +
  geom_tile(color = "grey50") +
  scale_x_continuous(expand=c(0,0)) +
  scale_fill_gradientn(colors = brewer.pal(11, "PRGn"), trans = "sqrt") +
  geom_vline(xintercept = 1989, col = "black") +
  theme_classic() +  theme(panel.grid = element_blank()) +
  ggtitle('The Birth Rate in Regions in 1990 - 2014') +
  ylab("Regions") +
  xlab("Years")

Visualization of the probability of death by gender.

For this analysis, I used the death_prob dataset from the dslabs package.

data('death_prob')
view(death_prob)

The visualization is split by gender, so I chose the “Set1” color set from brewer.pal. The red and blue colors from this set are clear and contrasting.

cols <- brewer.pal(3, "Set1")

I visualized the general trend in the probability of death by gender. The plot shows that the probabilities for men and women are almost the same before 60 years of age.

highchart() %>%
  hc_add_series(data = death_prob, type = "area", hcaes(x = age, y = prob, group = sex))  %>%
  hc_colors(cols) %>% # applying colorset
  hc_xAxis(title = list(text="Age")) %>%
  hc_yAxis(title = list(text="Death Probability")) %>%
  hc_plotOptions(series = list(marker = list(symbol = "circle"))) %>%
  hc_legend(align = "right", 
            verticalAlign = "top") %>% # changing the legend position
  hc_title(
    text = "The Probability of Death by Gender",
    align = "center",
    style = list(color = '#336196')) # name the title, changing the title position, color the title

I created a new dataset for the 60+ age group (ages above 60) to look more closely at the difference in probabilities for this group.

death_prob1 <- death_prob %>%
  filter(age > 60)

For the new 60+ dataset, I created a line chart with the chalk theme to highlight the contrast.

highchart() %>%
  hc_add_series(data = death_prob1, type = "line", hcaes(x = age, y = prob, group = sex))  %>%
  hc_colors(cols) %>%
  hc_xAxis(title = list(text="Age")) %>%
  hc_yAxis(title = list(text="Death Probability")) %>%
  hc_plotOptions(series = list(marker = list(symbol = "circle"))) %>%
  hc_legend(align = "right", 
            verticalAlign = "top") %>%
  hc_title(
    text = "The Probability of Death by Gender, Ages 60+",
    align = "center") %>%
  hc_add_theme(hc_theme_chalk())

To summarize:

  1. the probabilities of death for men and women are almost the same before the age of 60, and the difference becomes visible after 60;
  2. the probability of death for men is higher at all ages from 60 onward.
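One way to check conclusions like these numerically, rather than by eye, is to compute the male minus female gap at each age. The sketch below uses a toy data frame with the same column names as death_prob (age, sex, prob). The values are hypothetical, and the sex labels "Male" and "Female" are an assumption about how the dataset codes gender.

```r
# Toy data with hypothetical probabilities (not from death_prob)
toy <- data.frame(
  age  = rep(c(40, 70), each = 2),
  sex  = rep(c("Male", "Female"), times = 2),
  prob = c(0.002, 0.001, 0.030, 0.020)
)

# Arrange probabilities in an age-by-sex table
by_age_sex <- with(toy, tapply(prob, list(age, sex), mean))

# Gap between men and women at each age (positive means higher for men)
gap <- by_age_sex[, "Male"] - by_age_sex[, "Female"]
gap
```

Replacing toy with death_prob would give the gap for every age in the real data, assuming the labels match.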