TFR

Harold Nelson

Introduction: The Questions

In his talk, Hans Rosling noted that some population-based outcomes showed a general trend of improvement up to 2003. Across countries, the values also converged.

We want to look past 2003 and see if these trends have continued.

Setup

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

Get Data

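# Read the World Bank total fertility rate file, skipping the metadata lines above the header.
# Empty cells mark missing values, so treat "" (and "NA") as missing.
TFR <- read_csv("API_SP.DYN.TFRT.IN_DS2_en_csv_v2_4898766.csv", na = c("", "NA"), skip = 4) %>% 
  janitor::clean_names()   # e.g. "Country Name" -> country_name, "1960" -> x1960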
## New names:
## • `` -> `...67`
## Warning: One or more parsing issues, see `problems()` for details
## Rows: 266 Columns: 67
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (4): Country Name, Country Code, Indicator Name, Indicator Code
## dbl (61): 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, ...
## lgl  (2): 2021, ...67
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
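The `na` argument of `read_csv()` lists the strings to read as missing. readr's default is `c("", "NA")`. The sketch below runs on a tiny hypothetical file, with made-up country names, laid out like the World Bank download. It shows that empty cells become `NA`, and that a year column with no data at all is guessed as logical, as happened with 2021 above. It assumes readr 2.x.

```r
library(readr)

# Hypothetical miniature of the World Bank layout (country names are made up):
# empty cells mark missing years, and the last year has no data at all.
path <- tempfile(fileext = ".csv")
writeLines(c("Country Name,1960,1961",
             "Country A,4.82,",
             "Country B,,"),
           path)

# Treat empty strings (and the literal "NA") as missing, which is readr's default.
wb <- read_csv(path, na = c("", "NA"), show_col_types = FALSE)
print(wb)

# A column with no non-missing values is guessed as logical, like 2021 above.
sapply(wb, class)
```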

Pivot

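Each year is currently its own column (x1960 to x2021 after clean_names()). We need year and TFR to be variables, with one row per country and year. How do we do this?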

Solution

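TFR_long = TFR %>% 
  # stack the year columns x1960, ..., x2021 into year/TFR pairs
  pivot_longer(cols = x1960:x2021,
               names_to = "year",
               values_to = "TFR") %>% 
  select(country_name,country_code,year,TFR) %>% 
  mutate(year = parse_number(year)) %>%   # "x1960" -> 1960
  filter(year < 2021)                      # 2021 has no data in this file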
glimpse(TFR_long)
## Rows: 16,226
## Columns: 4
## $ country_name <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Ar…
## $ country_code <chr> "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "…
## $ year         <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 196…
## $ TFR          <dbl> 4.820, 4.655, 4.471, 4.271, 4.059, 3.842, 3.625, 3.417, 3…
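To see what `pivot_longer()` and `parse_number()` do, here is the same reshaping on a tiny hypothetical table with two countries and two years. The column names imitate the `x1960` style produced by `clean_names()`.

```r
library(dplyr)
library(tidyr)
library(readr)

# Hypothetical wide table after clean_names(): one column per year, prefixed with "x".
wide <- tibble(
  country_name = c("Country A", "Country B"),
  x1960 = c(4.8, 6.1),
  x1961 = c(4.7, 6.0)
)

long <- wide %>%
  pivot_longer(cols = x1960:x1961,     # the year columns to stack
               names_to = "year",      # old column names go here ("x1960", ...)
               values_to = "TFR") %>%  # cell values go here
  mutate(year = parse_number(year))    # "x1960" -> 1960

print(long)
```

Each country now contributes one row per year, so two countries and two year columns give four rows.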

Scatterplot

Make a jittered scatterplot of TFR against year. Keep only every tenth year (1960, 1970, ..., 2020).

Solution

TFR_long %>% 
  filter(year %in% c(1960,1970,1980,1990,2000,2010,2020)) %>% 
  ggplot(aes(year,TFR)) +
  geom_jitter(size = .2)
## Warning: Removed 115 rows containing missing values (geom_point).
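The solution lists the decade years explicitly. The modulo operator `%%` selects the same years more compactly, and changing the divisor to 5 keeps every fifth year instead. The sketch uses a hypothetical one-country stand-in for `TFR_long`.

```r
library(dplyr)

# Hypothetical stand-in for TFR_long: a single country, one row per year.
toy <- tibble(year = 1960:2020, TFR = seq(6, 2, length.out = 61))

# Keeping years divisible by 10 selects 1960, 1970, ..., 2020,
# the same years as the explicit c(1960, 1970, ..., 2020) vector.
decades <- toy %>% filter(year %% 10 == 0)

# Changing 10 to 5 would keep every fifth year instead.
every_fifth <- toy %>% filter(year %% 5 == 0)
```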

Do we see improvement and convergence?

Post 2003

Repeat the graphic with the post-2003 data, this time using every year.

Solution

g = TFR_long %>% 
  filter(year > 2003) %>% 
  ggplot(aes(year,TFR,group = country_name)) +
  geom_jitter(size = .2)
ggplotly(g)

Summary Statistics

For every year, compute summary statistics of TFR across countries, such as the mean and standard deviation.

Solution

summary_level = TFR_long %>% 
  group_by(year) %>% 
  summarize(mean = mean(TFR, na.rm = TRUE),
            sd = sd(TFR, na.rm = TRUE),
            median = median(TFR, na.rm = TRUE),
            max = max(TFR, na.rm = TRUE),
            min = min(TFR, na.rm = TRUE),
            range = max - min) 
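The sketch below runs the same `group_by()` and `summarize()` pattern on a small hypothetical data set that includes one missing value. It shows two things. `na.rm = TRUE` lets the missing value be skipped. A later argument such as `range = max - min` refers to the `max` and `min` columns created earlier in the same `summarize()` call.

```r
library(dplyr)

# Hypothetical long data: two years, three countries, one missing value.
toy_long <- tibble(
  year = c(2000, 2000, 2000, 2001, 2001, 2001),
  TFR  = c(2, 4, NA, 1.5, 2.5, 3.5)
)

toy_summary <- toy_long %>%
  group_by(year) %>%
  summarize(mean = mean(TFR, na.rm = TRUE),
            sd = sd(TFR, na.rm = TRUE),
            max = max(TFR, na.rm = TRUE),
            min = min(TFR, na.rm = TRUE),
            range = max - min)   # uses the max and min columns just created

print(toy_summary)
```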

Do a scatterplot of year and standard deviation.

Solution

summary_level %>% 
  ggplot(aes(x = year,y = sd)) +
  geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Do a scatterplot of year and mean.

Solution

summary_level %>% 
  ggplot(aes(x = year,y = mean)) +
  geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
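The plots answer the convergence question by eye. One simple way to put a number on it is the slope of a least-squares line through the yearly standard deviations after 2003, where a negative slope means the spread across countries is shrinking. This is an added technique, not part of the original analysis. The sketch uses a hypothetical summary table rather than the real data.

```r
# Hypothetical yearly summary with a steadily shrinking standard deviation.
toy_summary <- data.frame(
  year = 2004:2008,
  sd   = c(1.50, 1.45, 1.40, 1.35, 1.30)
)

# Least-squares line of sd on year over the post-2003 years;
# a negative slope is one simple signal of convergence.
fit <- lm(sd ~ year, data = subset(toy_summary, year > 2003))
slope <- unname(coef(fit)["year"])
slope
```

To apply the same idea to the real data, replace `toy_summary` with `summary_level`.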

Examine Countries

Look at individual countries and compare the range of TFR values each one has taken over the period.

Solution

TFR_range = TFR_long %>% 
  filter(!is.na(TFR)) %>%   # drop missing values so no country is left with an empty group
  group_by(country_name) %>% 
  summarize(max = max(TFR),
            min = min(TFR),
            range = max - min) %>% 
  filter(range > 0) %>% 
  arrange(range)
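Some rows in the file have no TFR values at all. After `na.rm = TRUE` removes the missing values, `max()` and `min()` receive an empty vector. They then return `-Inf` and `Inf` with a warning, so `range` becomes `-Inf`. The sketch below uses hypothetical data to show this behavior. It also shows that dropping missing rows before grouping avoids the problem entirely.

```r
library(dplyr)

# Hypothetical data: "Country B" has no recorded TFR values at all.
toy <- tibble(
  country_name = c("Country A", "Country A", "Country B", "Country B"),
  TFR = c(5.0, 3.0, NA, NA)
)

# max() of an empty vector is -Inf (with a warning); removing the NAs
# leaves nothing for Country B, which is where such warnings come from.
suppressWarnings(max(c(NA_real_, NA_real_), na.rm = TRUE))  # -Inf

# Dropping missing values before grouping means every group has data,
# so max() and min() never see an empty vector.
toy_range <- toy %>%
  filter(!is.na(TFR)) %>%
  group_by(country_name) %>%
  summarize(max = max(TFR), min = min(TFR), range = max - min)
```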

Compare Max and Range

Plot each country's range against its maximum TFR.

Solution

g = TFR_range %>% 
  ggplot(aes(max, range,group = country_name)) +
  geom_point(size = .5)
ggplotly(g)

Filter and Look Again

Solution

g = TFR_range %>% 
  filter(range < 20) %>% 
  ggplot(aes(max, range,group = country_name)) +
  geom_point(size = .5)
ggplotly(g)

Save the Data

save(TFR_long,file = "TFR_long.Rdata")
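A later script can restore the data with `load("TFR_long.Rdata")`, which recreates the object under its original name, `TFR_long`. The round trip below uses a hypothetical small data frame and a temporary file. It loads into a fresh environment so that you can compare the restored object with the original.

```r
# Hypothetical small object standing in for TFR_long.
toy_long <- data.frame(year = c(2019, 2020), TFR = c(2.4, 2.3))

path <- file.path(tempdir(), "toy_long.Rdata")
save(toy_long, file = path)

# load() restores objects under their saved names into the given environment
# and invisibly returns those names.
restored_env <- new.env()
loaded_names <- load(path, envir = restored_env)
loaded_names                                 # "toy_long"
identical(restored_env$toy_long, toy_long)   # TRUE
```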