Harold Nelson
In his talk, Hans Rosling noted that some population-based outcomes showed a general trend of improvement up to 2003. The values for different countries were also converging.
We want to look past 2003 and see if these trends have continued.
library(tidyverse)
library(janitor)
library(plotly)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
TFR <- read_csv("API_SP.DYN.TFRT.IN_DS2_en_csv_v2_4898766.csv", na = "empty", skip = 4) %>%
  janitor::clean_names()
## New names:
## • `` -> `...67`
## Warning: One or more parsing issues, see `problems()` for details
## Rows: 266 Columns: 67
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): Country Name, Country Code, Indicator Name, Indicator Code
## dbl (61): 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, ...
## lgl (2): 2021, ...67
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
We need year and TFR to be variables. How do we do this?
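# Reshape from one column per year to one row per country and year
TFR_long = TFR %>%
  pivot_longer(cols = x1960:x2021,
               names_to = "year",
               values_to = "TFR") %>%
  select(country_name, country_code, year, TFR) %>%
  # Column names look like "x1960"; parse_number() extracts the numeric year
  mutate(year = parse_number(year)) %>%
  # 2021 was read as a logical column, so drop it
  filter(year < 2021)
glimpse(TFR_long)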
## Rows: 16,226
## Columns: 4
## $ country_name <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Ar…
## $ country_code <chr> "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "…
## $ year <dbl> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 196…
## $ TFR <dbl> 4.820, 4.655, 4.471, 4.271, 4.059, 3.842, 3.625, 3.417, 3…
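To see what the reshape does on a small scale, here is a minimal sketch on a made-up table. The name `toy_wide` and its values are hypothetical and are not taken from the World Bank file.

```r
library(dplyr)
library(tidyr)
library(readr)

# Hypothetical wide table: one row per country, one column per year
toy_wide <- tibble(
  country_name = c("A", "B"),
  x1960 = c(6.0, 3.0),
  x1961 = c(5.9, 2.9)
)

# One row per country and year; the old column names become the year variable
toy_long <- toy_wide %>%
  pivot_longer(cols = x1960:x1961,
               names_to = "year",
               values_to = "TFR") %>%
  mutate(year = parse_number(year))

toy_long
```

Two countries and two year columns give four rows, just as 266 rows and 61 usable year columns give the 16,226 rows reported by `glimpse()`.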
Do a jittered scatterplot of TFR against year. Keep only every tenth year.
TFR_long %>%
  filter(year %in% c(1960, 1970, 1980, 1990, 2000, 2010, 2020)) %>%
  ggplot(aes(year, TFR)) +
  geom_jitter(size = .2)
## Warning: Removed 115 rows containing missing values (geom_point).
Do we see improvement and convergence?
Repeat the graphic with post-2003 data. Use every year.
g = TFR_long %>%
  filter(year > 2003) %>%
  ggplot(aes(year, TFR, group = country_name)) +
  geom_jitter(size = .2)
ggplotly(g)
Create summary-level variables such as the mean and standard deviation for every year.
summary_level = TFR_long %>%
  group_by(year) %>%
  summarize(mean = mean(TFR, na.rm = TRUE),
            sd = sd(TFR, na.rm = TRUE),
            median = median(TFR, na.rm = TRUE),
            max = max(TFR, na.rm = TRUE),
            min = min(TFR, na.rm = TRUE),
            range = max - min)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
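The code that produced the two `geom_smooth()` messages above is not shown. One plausible reconstruction, offered only as a guess, plots a yearly summary against year with a loess smoother, once for the mean and once for the standard deviation. The helper `plot_summary()` and the stand-in data `toy_summary` are hypothetical. With the real data you would pass `summary_level` instead.

```r
library(ggplot2)

# Hypothetical helper: plot one summary column against year with a loess smoother
plot_summary <- function(df, var) {
  ggplot(df, aes(x = year, y = .data[[var]])) +
    geom_point(size = .5) +
    geom_smooth(method = "loess", formula = y ~ x) +
    labs(y = var)
}

# Made-up stand-in for summary_level; the values are illustrative only
toy_summary <- data.frame(
  year = 1960:2020,
  mean = seq(5.5, 2.5, length.out = 61),
  sd = seq(1.9, 1.2, length.out = 61)
)

p_mean <- plot_summary(toy_summary, "mean")  # falling mean suggests improvement
p_sd <- plot_summary(toy_summary, "sd")      # falling sd suggests convergence
```

A declining mean would indicate a continuing downward trend in TFR, and a declining standard deviation would indicate that countries are still converging.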
Look at individual countries and compare the range of values.
TFR_range = TFR_long %>%
  group_by(country_name) %>%
  summarize(max = max(TFR, na.rm = TRUE),
            min = min(TFR, na.rm = TRUE),
            range = max - min) %>%
  filter(range > 0) %>%
  arrange(range)
## Warning in max(TFR, na.rm = T): no non-missing arguments to max; returning -Inf
## Warning in max(TFR, na.rm = T): no non-missing arguments to max; returning -Inf
## Warning in max(TFR, na.rm = T): no non-missing arguments to max; returning -Inf
## Warning in max(TFR, na.rm = T): no non-missing arguments to max; returning -Inf
## Warning in max(TFR, na.rm = T): no non-missing arguments to max; returning -Inf
## Warning in max(TFR, na.rm = T): no non-missing arguments to max; returning -Inf
## Warning in min(TFR, na.rm = T): no non-missing arguments to min; returning Inf
## Warning in min(TFR, na.rm = T): no non-missing arguments to min; returning Inf
## Warning in min(TFR, na.rm = T): no non-missing arguments to min; returning Inf
## Warning in min(TFR, na.rm = T): no non-missing arguments to min; returning Inf
## Warning in min(TFR, na.rm = T): no non-missing arguments to min; returning Inf
## Warning in min(TFR, na.rm = T): no non-missing arguments to min; returning Inf
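The warnings above come from countries whose TFR values are all missing. For those groups `max()` returns `-Inf` and `min()` returns `Inf`, so the range is `-Inf` and the `filter(range > 0)` step removes them. A minimal sketch of an alternative that avoids the warnings drops the missing values before grouping. The table `toy_long` below is made up for illustration.

```r
library(dplyr)

# Hypothetical long table; country "C" has no observations at all
toy_long <- tibble(
  country_name = c("A", "A", "B", "B", "C", "C"),
  year = c(1960, 1961, 1960, 1961, 1960, 1961),
  TFR = c(6.0, 5.0, 3.0, 2.9, NA, NA)
)

toy_range <- toy_long %>%
  filter(!is.na(TFR)) %>%          # all-missing countries vanish here
  group_by(country_name) %>%
  summarize(max = max(TFR),
            min = min(TFR),
            range = max - min) %>%
  arrange(range)

toy_range
```

Country "C" never reaches `summarize()`, so no warning is raised and no infinite values need to be filtered out afterwards.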
g = TFR_range %>%
  filter(range < 20) %>%
  ggplot(aes(max, range, group = country_name)) +
  geom_point(size = .5)
ggplotly(g)