607_Project2B_DylanGold

Codebase #2

Importing data

For my second dataset, I will clean and analyze the World Bank crude birth rate data published on Kaggle: https://www.kaggle.com/datasets/aungdev/birth-rate-of-countries-world-bank-data?utm_source=chatgpt.com&select=API_SP.DYN.CBRT.IN_DS2_en_csv_v2_5607611.csv.
This dataset was picked by Brandon Chanderban in the 5A discussion posts. I first uploaded the CSV to my GitHub repository so that it can be read from a raw URL.
Then I import it.

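library(tidyverse) # also attaches readr, tidyr, dplyr and ggplot2
library(dplyr)
library(ggplot2)
library(gt)

# Raw URL of the CSV stored in my GitHub repository
url <- "https://raw.githubusercontent.com/DylanGoldJ/607-Project-2/refs/heads/main/FileB/API_SP.DYN.CBRT.IN_DS2_en_csv_v2_5607611.csv"

# The first rows of the file hold metadata, so read everything without a header for now
df <- read_csv(
  file = url,
  col_names = FALSE
)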
head(df, 8)
# A tibble: 8 × 67
  X1          X2    X3    X4        X5     X6     X7     X8     X9    X10    X11
  <chr>       <chr> <chr> <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 Data Source Worl… <NA>  <NA>    NA     NA     NA     NA     NA     NA     NA  
2 <NA>        <NA>  <NA>  <NA>    NA     NA     NA     NA     NA     NA     NA  
3 Last Updat… 2023… <NA>  <NA>    NA     NA     NA     NA     NA     NA     NA  
4 <NA>        <NA>  <NA>  <NA>    NA     NA     NA     NA     NA     NA     NA  
5 Country Na… Coun… Indi… Indi… 1960   1961   1962   1963   1964   1965   1966  
6 Aruba       ABW   Birt… SP.D…   33.9   32.8   31.6   30.4   29.1   27.9   26.7
7 Africa Eas… AFE   Birt… SP.D…   47.4   47.5   47.6   47.6   47.6   47.7   47.7
8 Afghanistan AFG   Birt… SP.D…   50.3   50.4   50.6   50.7   50.8   50.9   51.0
# ℹ 56 more variables: X12 <dbl>, X13 <dbl>, X14 <dbl>, X15 <dbl>, X16 <dbl>,
#   X17 <dbl>, X18 <dbl>, X19 <dbl>, X20 <dbl>, X21 <dbl>, X22 <dbl>,
#   X23 <dbl>, X24 <dbl>, X25 <dbl>, X26 <dbl>, X27 <dbl>, X28 <dbl>,
#   X29 <dbl>, X30 <dbl>, X31 <dbl>, X32 <dbl>, X33 <dbl>, X34 <dbl>,
#   X35 <dbl>, X36 <dbl>, X37 <dbl>, X38 <dbl>, X39 <dbl>, X40 <dbl>,
#   X41 <dbl>, X42 <dbl>, X43 <dbl>, X44 <dbl>, X45 <dbl>, X46 <dbl>,
#   X47 <dbl>, X48 <dbl>, X49 <dbl>, X50 <dbl>, X51 <dbl>, X52 <dbl>, …
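If the metadata block at the top of the file always occupies exactly four lines, as the output above suggests, readr can skip it while reading so that line 5 becomes the header directly. The sketch below is an alternative, not the approach used in this project. It writes a small synthetic stand-in for the World Bank file to a temporary file and reads it with skip = 4. The placeholder second and fourth lines are an assumption: in the real file those lines come through as empty rows, and whether skip counts them the same way is worth confirming by checking names() on the result.

```r
library(readr)

# Synthetic stand-in for the World Bank CSV: four metadata lines,
# then the real header row and two data rows (values from the output above).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Data Source,World Development Indicators",
  "Note,placeholder",
  "Last Updated Date,2023-06-29",
  "Note,placeholder",
  "Country Name,Country Code,1960,1961",
  "Aruba,ABW,33.9,32.8",
  "Afghanistan,AFG,50.3,50.4"
), tmp)

# skip = 4 discards the metadata lines, so line 5 is used as the header.
wb <- read_csv(tmp, skip = 4, show_col_types = FALSE)
wb
```

For the real data, the same call would take url instead of tmp.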

We can see that the data is not formatted correctly: the actual column headers are in the 5th row, and the rows above them hold file metadata. First we will fix this.

# Use the values in the 5th row as the column names
header <- as.character(unlist(df[5, ]))
colnames(df) <- header
head(df, 8)
# A tibble: 8 × 67
  `Country Name`  `Country Code` `Indicator Name` `Indicator Code` `1960` `1961`
  <chr>           <chr>          <chr>            <chr>             <dbl>  <dbl>
1 Data Source     World Develop… <NA>             <NA>               NA     NA  
2 <NA>            <NA>           <NA>             <NA>               NA     NA  
3 Last Updated D… 2023-06-29     <NA>             <NA>               NA     NA  
4 <NA>            <NA>           <NA>             <NA>               NA     NA  
5 Country Name    Country Code   Indicator Name   Indicator Code   1960   1961  
6 Aruba           ABW            Birth rate, cru… SP.DYN.CBRT.IN     33.9   32.8
7 Africa Eastern… AFE            Birth rate, cru… SP.DYN.CBRT.IN     47.4   47.5
8 Afghanistan     AFG            Birth rate, cru… SP.DYN.CBRT.IN     50.3   50.4
# ℹ 61 more variables: `1962` <dbl>, `1963` <dbl>, `1964` <dbl>, `1965` <dbl>,
#   `1966` <dbl>, `1967` <dbl>, `1968` <dbl>, `1969` <dbl>, `1970` <dbl>,
#   `1971` <dbl>, `1972` <dbl>, `1973` <dbl>, `1974` <dbl>, `1975` <dbl>,
#   `1976` <dbl>, `1977` <dbl>, `1978` <dbl>, `1979` <dbl>, `1980` <dbl>,
#   `1981` <dbl>, `1982` <dbl>, `1983` <dbl>, `1984` <dbl>, `1985` <dbl>,
#   `1986` <dbl>, `1987` <dbl>, `1988` <dbl>, `1989` <dbl>, `1990` <dbl>,
#   `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, …

Now let's drop the first 5 rows; they contain only metadata and the header row we just used for the column names.

birthrates <- df[-c(1:5), ]
head(birthrates, 8)
# A tibble: 8 × 67
  `Country Name`  `Country Code` `Indicator Name` `Indicator Code` `1960` `1961`
  <chr>           <chr>          <chr>            <chr>             <dbl>  <dbl>
1 Aruba           ABW            Birth rate, cru… SP.DYN.CBRT.IN     33.9   32.8
2 Africa Eastern… AFE            Birth rate, cru… SP.DYN.CBRT.IN     47.4   47.5
3 Afghanistan     AFG            Birth rate, cru… SP.DYN.CBRT.IN     50.3   50.4
4 Africa Western… AFW            Birth rate, cru… SP.DYN.CBRT.IN     47.3   47.4
5 Angola          AGO            Birth rate, cru… SP.DYN.CBRT.IN     51.0   51.3
6 Albania         ALB            Birth rate, cru… SP.DYN.CBRT.IN     41.1   40.3
7 Andorra         AND            Birth rate, cru… SP.DYN.CBRT.IN     NA     NA  
8 Arab World      ARB            Birth rate, cru… SP.DYN.CBRT.IN     47.6   47.6
# ℹ 61 more variables: `1962` <dbl>, `1963` <dbl>, `1964` <dbl>, `1965` <dbl>,
#   `1966` <dbl>, `1967` <dbl>, `1968` <dbl>, `1969` <dbl>, `1970` <dbl>,
#   `1971` <dbl>, `1972` <dbl>, `1973` <dbl>, `1974` <dbl>, `1975` <dbl>,
#   `1976` <dbl>, `1977` <dbl>, `1978` <dbl>, `1979` <dbl>, `1980` <dbl>,
#   `1981` <dbl>, `1982` <dbl>, `1983` <dbl>, `1984` <dbl>, `1985` <dbl>,
#   `1986` <dbl>, `1987` <dbl>, `1988` <dbl>, `1989` <dbl>, `1990` <dbl>,
#   `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, …
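The two steps above (promote row 5 to the header, then drop rows 1 to 5) can be wrapped in one small function. promote_header() below is a hypothetical helper written for illustration, not part of any package, and it is run on a toy tibble shaped like the raw read rather than on the real data.

```r
library(tibble)

# Hypothetical helper: use row `header_row` as the column names and
# drop that row together with every row above it.
promote_header <- function(df, header_row) {
  new_names <- as.character(unlist(df[header_row, ], use.names = FALSE))
  out <- df[-seq_len(header_row), ]
  names(out) <- new_names
  out
}

# Toy input: two metadata rows, the header row, then one data row.
raw_df <- tibble(
  X1 = c("Data Source", NA, "Country Name", "Aruba"),
  X2 = c("World Development Indicators", NA, "Country Code", "ABW"),
  X3 = c(NA, NA, 1960, 33.9)
)

promoted <- promote_header(raw_df, header_row = 3)
promoted
```

On the real data this would be promote_header(df, header_row = 5).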

I will use glimpse() to get an overview of the columns and their types.

glimpse(birthrates)
Rows: 266
Columns: 67
$ `Country Name`   <chr> "Aruba", "Africa Eastern and Southern", "Afghanistan"…
$ `Country Code`   <chr> "ABW", "AFE", "AFG", "AFW", "AGO", "ALB", "AND", "ARB…
$ `Indicator Name` <chr> "Birth rate, crude (per 1,000 people)", "Birth rate, …
$ `Indicator Code` <chr> "SP.DYN.CBRT.IN", "SP.DYN.CBRT.IN", "SP.DYN.CBRT.IN",…
$ `1960`           <dbl> 33.88300, 47.43855, 50.34000, 47.32548, 51.02600, 41.…
$ `1961`           <dbl> 32.83100, 47.53055, 50.44300, 47.42105, 51.28200, 40.…
$ `1962`           <dbl> 31.64900, 47.59756, 50.57000, 47.52922, 51.31600, 39.…
$ `1963`           <dbl> 30.41600, 47.63614, 50.70300, 47.53103, 51.32300, 38.…
$ `1964`           <dbl> 29.14700, 47.64548, 50.83100, 47.51192, 51.28200, 36.…
$ `1965`           <dbl> 27.88900, 47.66766, 50.87200, 47.46857, 51.28200, 35.…
$ `1966`           <dbl> 26.66300, 47.69789, 50.98600, 47.44364, 51.29500, 34.…
$ `1967`           <dbl> 25.50300, 47.69133, 51.08100, 47.42593, 51.31400, 33.…
$ `1968`           <dbl> 24.59200, 47.69102, 51.14800, 47.42235, 51.34800, 33.…
$ `1969`           <dbl> 23.73500, 47.72112, 51.19500, 47.41269, 51.35300, 33.…
$ `1970`           <dbl> 22.97400, 47.67313, 51.12200, 47.41411, 51.26700, 32.…
$ `1971`           <dbl> 22.31300, 47.64967, 51.16300, 47.52970, 50.69800, 31.…
$ `1972`           <dbl> 21.76600, 47.47074, 51.10900, 47.57899, 50.47400, 31.…
$ `1973`           <dbl> 21.49200, 47.22113, 51.11400, 47.63283, 50.46700, 30.…
$ `1974`           <dbl> 21.38200, 47.07547, 51.13500, 47.81713, 50.47200, 30.…
$ `1975`           <dbl> 21.39300, 46.95020, 51.01800, 47.91150, 50.46900, 29.…
$ `1976`           <dbl> 21.48500, 46.79183, 50.93500, 47.86907, 50.51400, 29.…
$ `1977`           <dbl> 21.73900, 46.63214, 50.92100, 47.96894, 50.52300, 28.…
$ `1978`           <dbl> 21.92000, 46.51202, 50.81600, 48.03727, 50.61600, 27.…
$ `1979`           <dbl> 21.99300, 46.47196, 50.73700, 47.93830, 50.73200, 27.…
$ `1980`           <dbl> 21.93100, 46.33961, 50.48200, 47.77071, 50.89200, 26.…
$ `1981`           <dbl> 21.73000, 46.23755, 50.26400, 47.51406, 51.10900, 26.…
$ `1982`           <dbl> 21.47600, 46.15826, 50.13800, 47.25192, 51.30700, 26.…
$ `1983`           <dbl> 21.38500, 46.13473, 50.13900, 47.11112, 51.61000, 26.…
$ `1984`           <dbl> 21.18300, 46.13520, 50.23500, 46.70656, 51.93500, 26.…
$ `1985`           <dbl> 20.91900, 46.14379, 50.55300, 46.20665, 52.13600, 26.…
$ `1986`           <dbl> 20.61700, 46.06749, 50.72800, 45.72924, 52.19000, 25.…
$ `1987`           <dbl> 20.26900, 45.83058, 50.84500, 45.34627, 52.14600, 25.…
$ `1988`           <dbl> 19.82100, 45.33736, 50.98000, 45.00171, 51.97300, 25.…
$ `1989`           <dbl> 19.18400, 44.81301, 51.16200, 44.92848, 51.69900, 24.…
$ `1990`           <dbl> 18.66200, 44.23072, 51.42300, 44.67619, 51.34400, 24.…
$ `1991`           <dbl> 17.72200, 43.84232, 51.78800, 44.47423, 50.92600, 23.…
$ `1992`           <dbl> 16.44300, 43.34168, 51.94800, 44.30932, 50.37400, 23.…
$ `1993`           <dbl> 16.12600, 42.96601, 52.03800, 44.16810, 49.89300, 22.…
$ `1994`           <dbl> 15.43100, 42.53329, 52.17400, 43.94269, 49.55000, 22.…
$ `1995`           <dbl> 15.99100, 42.48572, 52.07300, 43.73024, 49.18500, 21.…
$ `1996`           <dbl> 16.15300, 42.13563, 51.87300, 43.49103, 48.86000, 20.…
$ `1997`           <dbl> 16.38800, 41.57346, 51.40000, 43.21922, 48.41200, 19.…
$ `1998`           <dbl> 15.07800, 41.12879, 50.88000, 43.02697, 48.00900, 18.…
$ `1999`           <dbl> 14.36100, 40.89482, 50.35100, 43.17424, 47.77300, 17.…
$ `2000`           <dbl> 14.42700, 40.52824, 49.66400, 43.19955, 47.64700, 17.…
$ `2001`           <dbl> 13.73900, 40.34121, 48.97900, 43.07550, 47.57400, 16.…
$ `2002`           <dbl> 12.99200, 40.04732, 48.20100, 42.92712, 47.44800, 15.…
$ `2003`           <dbl> 12.62100, 39.75014, 47.35000, 42.74688, 47.22600, 14.…
$ `2004`           <dbl> 11.92100, 39.57589, 46.33000, 42.50272, 47.09900, 13.…
$ `2005`           <dbl> 12.34800, 39.40739, 45.26300, 42.42154, 46.94400, 13.…
$ `2006`           <dbl> 13.05500, 39.23711, 44.72100, 42.19330, 46.64300, 12.…
$ `2007`           <dbl> 12.96200, 39.00052, 43.85800, 41.94301, 46.29000, 12.…
$ `2008`           <dbl> 12.74800, 38.85169, 41.50600, 41.75479, 45.88900, 11.…
$ `2009`           <dbl> 12.35000, 38.36494, 41.15700, 41.50376, 45.49500, 11.…
$ `2010`           <dbl> 12.19300, 37.94026, 40.60200, 41.21963, 44.97000, 11.…
$ `2011`           <dbl> 12.24600, 37.48399, 39.85500, 40.89424, 44.36400, 12.…
$ `2012`           <dbl> 12.72300, 36.92130, 40.00900, 40.41643, 43.86000, 12.…
$ `2013`           <dbl> 13.31600, 36.44714, 39.60100, 39.85651, 43.28200, 12.…
$ `2014`           <dbl> 13.53300, 36.02832, 39.10500, 39.33535, 42.67600, 12.…
$ `2015`           <dbl> 12.42800, 35.61331, 38.80300, 38.85921, 42.02000, 11.…
$ `2016`           <dbl> 12.30000, 35.18902, 37.93600, 38.39310, 41.37700, 11.…
$ `2017`           <dbl> 11.53000, 34.89254, 37.34200, 37.88166, 40.81000, 10.…
$ `2018`           <dbl> 9.88100, 34.61102, 36.92700, 37.44709, 40.23600, 10.5…
$ `2019`           <dbl> 9.13800, 34.34145, 36.46600, 37.02783, 39.72500, 10.3…
$ `2020`           <dbl> 8.10200, 33.91675, 36.05100, 36.61573, 39.27100, 10.2…
$ `2021`           <dbl> 7.19300, 33.54627, 35.84200, 36.23703, 38.80900, 10.2…
$ `2022`           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…

The Indicator Name and Indicator Code columns look redundant. We can check whether each of them holds a single value throughout by using unique().

indicator_names <- unique(birthrates$`Indicator Name`)
indicator_codes <- unique(birthrates$`Indicator Code`)
indicator_names
[1] "Birth rate, crude (per 1,000 people)"
indicator_codes
[1] "SP.DYN.CBRT.IN"

Each of these columns contains only one value, so they add no information and we can drop them.

birthrates <- birthrates %>% select(-`Indicator Name`, -`Indicator Code`) # drop by name rather than position
birthrates
# A tibble: 266 × 65
   `Country Name`       `Country Code` `1960` `1961` `1962` `1963` `1964` `1965`
   <chr>                <chr>           <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
 1 Aruba                ABW              33.9   32.8   31.6   30.4   29.1   27.9
 2 Africa Eastern and … AFE              47.4   47.5   47.6   47.6   47.6   47.7
 3 Afghanistan          AFG              50.3   50.4   50.6   50.7   50.8   50.9
 4 Africa Western and … AFW              47.3   47.4   47.5   47.5   47.5   47.5
 5 Angola               AGO              51.0   51.3   51.3   51.3   51.3   51.3
 6 Albania              ALB              41.1   40.3   39.2   38.1   36.8   35.4
 7 Andorra              AND              NA     NA     NA     NA     NA     NA  
 8 Arab World           ARB              47.6   47.6   47.9   47.6   47.3   46.9
 9 United Arab Emirates ARE              41.8   41.4   41.1   40.6   40.0   39.4
10 Argentina            ARG              23.8   23.6   23.8   23.7   23.4   23.3
# ℹ 256 more rows
# ℹ 57 more variables: `1966` <dbl>, `1967` <dbl>, `1968` <dbl>, `1969` <dbl>,
#   `1970` <dbl>, `1971` <dbl>, `1972` <dbl>, `1973` <dbl>, `1974` <dbl>,
#   `1975` <dbl>, `1976` <dbl>, `1977` <dbl>, `1978` <dbl>, `1979` <dbl>,
#   `1980` <dbl>, `1981` <dbl>, `1982` <dbl>, `1983` <dbl>, `1984` <dbl>,
#   `1985` <dbl>, `1986` <dbl>, `1987` <dbl>, `1988` <dbl>, `1989` <dbl>,
#   `1990` <dbl>, `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>, …
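A more general version of this check finds every character column that holds a single value, instead of inspecting columns one at a time. The sketch below runs on toy data with placeholder country names. It deliberately looks only at character columns: a numeric year column that is entirely NA (2022 appears to be one) would also have a single distinct value, and we do not want to drop it silently at this stage.

```r
library(dplyr)
library(tibble)

# Toy data: the two indicator columns repeat the same value in every row.
toy <- tibble(
  `Country Name`   = c("Country A", "Country B"),
  `Indicator Name` = c("Birth rate, crude (per 1,000 people)", "Birth rate, crude (per 1,000 people)"),
  `Indicator Code` = c("SP.DYN.CBRT.IN", "SP.DYN.CBRT.IN"),
  `1960`           = c(30.0, 45.0)
)

# Names of character columns whose values are all identical.
constant_chr <- toy %>%
  select(where(~ is.character(.x) && n_distinct(.x) == 1)) %>%
  names()
constant_chr

# Drop them by name.
toy_trimmed <- toy %>% select(-all_of(constant_chr))
```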

Our data is now in a better format, but it is still wide: each year has its own column. We can use pivot_longer() to turn the year columns into rows, giving one row per country and year.

birthrates_longer <- birthrates %>%
  pivot_longer(
    cols = !(c('Country Name', 'Country Code')),
    names_to = "year",
    values_to = "birthrate_per_1000"
    )
head(birthrates_longer, 8)
# A tibble: 8 × 4
  `Country Name` `Country Code` year  birthrate_per_1000
  <chr>          <chr>          <chr>              <dbl>
1 Aruba          ABW            1960                33.9
2 Aruba          ABW            1961                32.8
3 Aruba          ABW            1962                31.6
4 Aruba          ABW            1963                30.4
5 Aruba          ABW            1964                29.1
6 Aruba          ABW            1965                27.9
7 Aruba          ABW            1966                26.7
8 Aruba          ABW            1967                25.5
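To see exactly what pivot_longer() does to the shape of the data, here is a toy version with two countries and three year columns (values taken from the output above). Every country and year pair becomes one row, so 2 countries and 3 years give 6 rows, and the former column names land in year as character strings.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy wide table: 2 countries x 3 year columns.
wide <- tibble(
  `Country Name` = c("Aruba", "Afghanistan"),
  `Country Code` = c("ABW", "AFG"),
  `1960` = c(33.9, 50.3),
  `1961` = c(32.8, 50.4),
  `1962` = c(31.6, 50.6)
)

long <- wide %>%
  pivot_longer(
    cols = !c(`Country Name`, `Country Code`),
    names_to = "year",
    values_to = "birthrate_per_1000"
  )

# 2 countries * 3 years = 6 rows
nrow(long)
```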

The year column is stored as character (chr), so we convert it to a numeric type.

birthrates_longer$year <- as.numeric(birthrates_longer$year)
head(birthrates_longer, 8)
# A tibble: 8 × 4
  `Country Name` `Country Code`  year birthrate_per_1000
  <chr>          <chr>          <dbl>              <dbl>
1 Aruba          ABW             1960               33.9
2 Aruba          ABW             1961               32.8
3 Aruba          ABW             1962               31.6
4 Aruba          ABW             1963               30.4
5 Aruba          ABW             1964               29.1
6 Aruba          ABW             1965               27.9
7 Aruba          ABW             1966               26.7
8 Aruba          ABW             1967               25.5
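As an alternative, pivot_longer() can convert the year labels while it reshapes, through its names_transform argument (available in recent tidyr versions). The sketch below uses toy data and as.integer, which is my choice here since years are whole numbers; the project itself uses as.numeric, which also works.

```r
library(dplyr)
library(tidyr)
library(tibble)

wide <- tibble(
  `Country Name` = c("Aruba", "Afghanistan"),
  `1960` = c(33.9, 50.3),
  `1961` = c(32.8, 50.4)
)

# names_transform converts the year labels during the pivot,
# so no separate as.numeric() step is needed afterwards.
long <- wide %>%
  pivot_longer(
    cols = -`Country Name`,
    names_to = "year",
    values_to = "birthrate_per_1000",
    names_transform = list(year = as.integer)
  )
long
```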

The head of the data frame shows that the years have moved from columns into individual rows. We can also check whether the data frame contains NA values.

colSums(is.na(birthrates_longer))
      Country Name       Country Code               year birthrate_per_1000 
                 0                  0                  0                721 

There are 721 NA values, all in the birthrate_per_1000 column. Let's display them.

na_birthrates <- birthrates_longer %>%
  filter(is.na(birthrate_per_1000))
na_birthrates 
# A tibble: 721 × 4
   `Country Name`              `Country Code`  year birthrate_per_1000
   <chr>                       <chr>          <dbl>              <dbl>
 1 Aruba                       ABW             2022                 NA
 2 Africa Eastern and Southern AFE             2022                 NA
 3 Afghanistan                 AFG             2022                 NA
 4 Africa Western and Central  AFW             2022                 NA
 5 Angola                      AGO             2022                 NA
 6 Albania                     ALB             2022                 NA
 7 Andorra                     AND             1960                 NA
 8 Andorra                     AND             1961                 NA
 9 Andorra                     AND             1962                 NA
10 Andorra                     AND             1963                 NA
# ℹ 711 more rows
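Scrolling through 721 rows makes it hard to tell which countries are missing everything and which are only missing a year or two. A per-country summary separates the two cases. The sketch runs on toy data with placeholder names; on the real data, long would be birthrates_longer before the NA rows are dropped.

```r
library(dplyr)
library(tibble)

# Toy long data: Country A is missing only its last year, Country B is missing everything.
long <- tibble(
  `Country Name` = c(rep("Country A", 3), rep("Country B", 3)),
  year = rep(c(2020, 2021, 2022), 2),
  birthrate_per_1000 = c(10.1, 9.8, NA, NA, NA, NA)
)

na_summary <- long %>%
  group_by(`Country Name`) %>%
  summarise(
    n_missing   = sum(is.na(birthrate_per_1000)),
    n_years     = n(),
    all_missing = n_missing == n_years,
    .groups = "drop"
  ) %>%
  arrange(desc(n_missing))
na_summary
```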

Looking through the rows with NA values, the data for some countries appears to be missing entirely, and many countries are also missing 2022 data. I believe the best way to deal with these values is to drop them. I don't think filling them in with the country's average or with the overall average for that year would be useful.

birthrates_longer <- birthrates_longer %>% drop_na()
colSums(is.na(birthrates_longer))
      Country Name       Country Code               year birthrate_per_1000 
                 0                  0                  0                  0 
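Dropping the NA rows after pivoting gives the same result as asking pivot_longer() to drop them during the reshape with values_drop_na = TRUE. The toy comparison below, with placeholder names, shows the two options side by side. In this dataset only the value column can be NA after the pivot, so drop_na() with no arguments removes exactly the same rows.

```r
library(dplyr)
library(tidyr)
library(tibble)

wide <- tibble(
  `Country Name` = c("Country A", "Country B"),
  `2021` = c(9.8, NA_real_),
  `2022` = c(NA_real_, NA_real_)
)

# Option 1: pivot, then drop rows with NA (the approach used above).
long_then_drop <- wide %>%
  pivot_longer(-`Country Name`, names_to = "year", values_to = "birthrate_per_1000") %>%
  drop_na()

# Option 2: let pivot_longer() drop NA values while reshaping.
long_drop_na <- wide %>%
  pivot_longer(-`Country Name`, names_to = "year", values_to = "birthrate_per_1000",
               values_drop_na = TRUE)
long_drop_na
```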

We now have a tidy form of our data that we can work to analyze.

Analysis

We can try to answer some of the questions our classmate asked in their discussion post, starting with a comparison of birth rate trends over time for selected countries or regions. First, I will compute the overall average birthrate for every country and region in the dataset to see which typically have higher birthrates.

birthrate_avg <- birthrates_longer %>%
  group_by(`Country Name`) %>%
  summarise(avg_birthrate_per_1000 = mean(birthrate_per_1000)) %>%
  arrange(desc(avg_birthrate_per_1000))

head(birthrate_avg)
# A tibble: 6 × 2
  `Country Name` avg_birthrate_per_1000
  <chr>                           <dbl>
1 Niger                            53.3
2 Chad                             48.6
3 Angola                           48.4
4 Mali                             47.8
5 South Sudan                      47.8
6 Afghanistan                      47.7
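Because the NA rows were dropped, these averages do not all cover the same years: a country with data only for part of 1960 to 2021 is averaged over fewer years. Reporting how many years went into each mean, and which span they cover, makes the comparison easier to judge. The sketch below uses toy data with placeholder names; on the real data, long would be birthrates_longer.

```r
library(dplyr)
library(tibble)

# Toy data: Country A has three years, Country B only one.
long <- tibble(
  `Country Name` = c("Country A", "Country A", "Country A", "Country B"),
  year = c(1960, 1961, 1962, 1962),
  birthrate_per_1000 = c(40, 30, 20, 20)
)

birthrate_avg_cov <- long %>%
  group_by(`Country Name`) %>%
  summarise(
    avg_birthrate_per_1000 = mean(birthrate_per_1000),
    n_years    = n(),
    first_year = min(year),
    last_year  = max(year),
    .groups = "drop"
  ) %>%
  arrange(desc(avg_birthrate_per_1000))
birthrate_avg_cov
```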

We can display this information as a table using gt.

birthrate_avg %>%
  gt() %>%
  cols_label(
    `Country Name` = "Country",
    avg_birthrate_per_1000 = "Average Birthrate(per 1000 people)"
  ) %>%
  tab_header(title = md("Average Birthrate by Country")) %>%
  # Set container height and container overflow to add scrolling for large tables.
  tab_options(container.height = 500, container.overflow.y = TRUE)
Average Birthrate by Country
Country Average Birthrate(per 1000 people)
Niger 53.262226
Chad 48.574339
Angola 48.387597
Mali 47.800435
South Sudan 47.768903
Afghanistan 47.665823
Somalia 47.444919
Malawi 47.080839
Uganda 46.981435
Yemen, Rep. 46.349387
Cote d'Ivoire 46.255532
Congo, Dem. Rep. 45.670984
Zambia 45.665081
Burkina Faso 45.387387
Burundi 45.176210
Ethiopia 44.821129
Pre-demographic dividend 44.622297
Mozambique 44.595452
Tanzania 44.409097
Gambia, The 44.355048
Africa Western and Central 44.232591
Heavily indebted poor countries (HIPC) 44.211938
Nigeria 44.126452
Sierra Leone 43.919290
Benin 43.666129
Liberia 43.634435
Low income 43.566217
Central African Republic 43.545710
Kenya 43.409887
Sub-Saharan Africa (excluding high income) 43.376251
Sub-Saharan Africa 43.374105
Sub-Saharan Africa (IDA & IBRD countries) 43.374105
Rwanda 43.244419
Guinea 43.011968
Africa Eastern and Southern 42.792442
Senegal 42.783839
Cameroon 42.589081
Sudan 42.357016
Madagascar 41.929258
Guinea-Bissau 41.922452
Togo 41.713694
Least developed countries: UN classification 41.354148
Equatorial Guinea 41.104403
Mauritania 40.801661
Eritrea 40.646290
IDA blend 40.626757
IDA total 40.453477
IDA only 40.356718
Comoros 40.115403
Ghana 39.709435
Zimbabwe 39.586129
Marshall Islands 39.576306
Congo, Rep. 39.451484
Eswatini 39.124887
Pakistan 38.954677
Fragile and conflict affected situations 38.606717
Sao Tome and Principe 38.282548
Iraq 38.275323
Timor-Leste 38.254258
Solomon Islands 37.992968
Jordan 37.270194
Tajikistan 37.201565
Guatemala 37.195677
Lao PDR 37.140032
West Bank and Gaza 37.085625
Vanuatu 37.030758
Oman 36.885516
Djibouti 36.754565
Honduras 36.703177
Namibia 36.509935
Papua New Guinea 36.129242
Botswana 36.025177
Syrian Arab Republic 36.024323
Arab World 35.873112
Nicaragua 35.511839
Saudi Arabia 34.934597
Maldives 34.928935
Bangladesh 34.904177
Haiti 34.812694
Lesotho 34.772000
Gabon 34.473629
Nepal 34.461323
Cambodia 34.448016
Kiribati 34.370097
Samoa 34.318403
Bolivia 34.143903
Egypt, Arab Rep. 34.008323
Middle East & North Africa (excluding high income) 33.996087
Middle East & North Africa (IDA & IBRD countries) 33.934523
Cabo Verde 33.893419
Micronesia, Fed. Sts. 33.683403
Bhutan 33.651210
Middle East & North Africa 33.600741
Algeria 33.523161
Belize 33.087258
Pacific island small states 32.968387
Philippines 32.967194
Lower middle income 32.837176
Libya 32.419097
Turkmenistan 32.227258
El Salvador 32.209032
South Asia 32.173309
South Asia (IDA & IBRD) 32.173309
Early-demographic dividend 31.991848
Tonga 31.940339
Morocco 31.814032
Mongolia 31.686629
Uzbekistan 31.323226
Dominican Republic 31.278855
Paraguay 31.245210
Peru 30.900403
Other small states 30.774410
India 30.731710
Iran, Islamic Rep. 30.613419
Mexico 30.606984
Ecuador 30.373306
South Africa 29.928097
Small states 29.638167
Kuwait 29.190274
Fiji 29.102984
Guyana 29.073097
Kyrgyz Republic 29.009113
Venezuela, RB 28.937645
Suriname 28.682274
St. Martin (French part) 28.652774
Tunisia 28.648661
Indonesia 28.514806
Nauru 28.507323
Low & middle income 28.406261
Myanmar 28.384468
IDA & IBRD total 28.198101
Panama 28.160823
Turkiye 27.741306
Latin America & Caribbean (excluding high income) 27.721713
Latin America & the Caribbean (IDA & IBRD countries) 27.711546
Tuvalu 27.681645
Latin America & Caribbean 27.369910
Middle income 27.169328
Colombia 27.149484
St. Lucia 26.968661
Bahrain 26.887484
Malaysia 26.815774
Brunei Darussalam 26.536339
French Polynesia 26.475613
Vietnam 26.439226
Qatar 26.392468
Kosovo 26.246468
Lebanon 26.030194
Guam 25.960710
St. Vincent and the Grenadines 25.929548
Brazil 25.900274
Turks and Caicos Islands 25.537548
Costa Rica 25.537500
World 25.523428
Jamaica 25.217661
Grenada 25.016258
Caribbean small states 24.961436
IBRD only 24.928587
Azerbaijan 24.463177
New Caledonia 24.287903
United Arab Emirates 24.101790
St. Kitts and Nevis 23.517452
Kazakhstan 23.280726
Sri Lanka 23.125032
Israel 23.112903
Albania 22.847694
Thailand 22.536710
Bahamas, The 22.408435
East Asia & Pacific (IDA & IBRD countries) 22.304976
East Asia & Pacific (excluding high income) 22.294827
Upper middle income 22.121672
Dominica 22.116371
Seychelles 21.990196
Trinidad and Tobago 21.927129
Virgin Islands (U.S.) 21.858065
Korea, Dem. People's Rep. 21.536403
East Asia & Pacific 21.302650
Chile 21.124532
Mauritius 21.069871
Argentina 20.968613
Antigua and Barbuda 20.731516
Armenia 20.685645
Late-demographic dividend 20.392122
Sint Maarten (Dutch part) 20.238871
China 19.923548
Curacao 18.907952
Korea, Rep. 18.430758
Puerto Rico 18.423871
Aruba 18.377113
Europe & Central Asia (excluding high income) 18.042320
Cuba 17.770226
Greenland 17.638776
Iceland 17.600000
North Macedonia 17.553076
British Virgin Islands 17.512081
Europe & Central Asia (IDA & IBRD countries) 17.443324
Moldova 17.431097
American Samoa 17.420000
Ireland 17.379032
Uruguay 17.299371
Cyprus 17.168016
Georgia 17.163726
New Zealand 17.040645
Montenegro 16.864935
Northern Mariana Islands 16.850000
Singapore 16.806452
Barbados 16.756935
Gibraltar 16.754113
OECD members 16.032054
Faroe Islands 15.798077
Bosnia and Herzegovina 15.681758
Australia 15.617742
Palau 15.450000
United States 15.375806
North America 15.275795
Europe & Central Asia 15.032399
Hong Kong SAR, China 14.803629
Slovak Republic 14.796774
Cayman Islands 14.605405
High income 14.438668
Canada 14.390323
Malta 14.354839
Macao SAR, China 14.348887
Poland 14.287097
Portugal 14.253226
Romania 14.201613
France 14.138710
Belarus 13.879597
Liechtenstein 13.803226
Post-demographic dividend 13.715831
Lithuania 13.688710
Russian Federation 13.655113
Norway 13.643548
Bermuda 13.545000
Central Europe and the Baltics 13.521889
Netherlands 13.487097
United Kingdom 13.482258
Spain 13.301613
Estonia 12.887097
European Union 12.784393
Slovenia 12.730645
Finland 12.695161
Denmark 12.682258
Czechia 12.609677
Serbia 12.580032
Euro area 12.575640
Belgium 12.524194
Ukraine 12.503403
Switzerland 12.490323
Sweden 12.393548
Channel Islands 12.388819
Luxembourg 12.335484
Greece 12.300000
Croatia 12.275694
Latvia 12.269355
Bulgaria 12.229032
Isle of Man 12.187919
Austria 12.040323
Japan 12.037097
Hungary 11.920968
Monaco 11.700000
Italy 11.669355
Germany 11.001613
Andorra 10.768966
San Marino 8.761111
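The table prints six decimal places, which is more precision than a crude rate per 1,000 people needs. gt's fmt_number() rounds only the displayed values and leaves the data unchanged. The sketch below applies it to two rows from the table above; rounding to one decimal is my choice, not something the project specifies.

```r
library(dplyr)
library(gt)
library(tibble)

avg_tbl <- tibble(
  `Country Name` = c("Niger", "Chad"),
  avg_birthrate_per_1000 = c(53.262226, 48.574339)
)

# fmt_number() changes only how values are displayed; the data keep full precision.
tbl <- avg_tbl %>%
  gt() %>%
  cols_label(
    `Country Name` = "Country",
    avg_birthrate_per_1000 = "Average birthrate (per 1,000 people)"
  ) %>%
  fmt_number(columns = avg_birthrate_per_1000, decimals = 1)
tbl
```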

The countries with the highest average birthrates include Niger, Chad, Angola and Mali, each with an average above 47 births per 1,000 people. That is more than 3 times the United States, whose average birthrate is around 15.
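The table also mixes individual countries with World Bank aggregates such as "Sub-Saharan Africa", "Low income" and "World", which are not countries. One way to keep only countries is to join against a list of economies. The sketch below assumes a metadata table in which each economy has a Region and aggregates have none; World Bank bulk downloads often ship such a file, but I have not checked whether this Kaggle copy includes one, so country_meta here is a small hypothetical stand-in. It also assumes the averages were grouped by Country Code as well as Country Name.

```r
library(dplyr)
library(tibble)

birthrate_by_code <- tibble(
  `Country Name` = c("Niger", "Sub-Saharan Africa", "World"),
  `Country Code` = c("NER", "SSF", "WLD"),
  avg_birthrate_per_1000 = c(53.262226, 43.374105, 25.523428)
)

# Hypothetical metadata: economies have a Region, aggregates have NA.
country_meta <- tibble(
  `Country Code` = c("NER", "SSF", "WLD"),
  Region = c("Sub-Saharan Africa", NA, NA)
)

# Keep only rows whose code belongs to an economy with a Region.
countries_only <- birthrate_by_code %>%
  semi_join(filter(country_meta, !is.na(Region)), by = "Country Code")
countries_only
```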

To compare birth rate trends, we can plot birthrate over time. Because there are too many countries to display at once, I selected a handful arbitrarily.

countries <- c("United Kingdom", "United States", "Germany", "Japan", "Italy", "China", "Niger", "Chad")
displayed_birthrates <- birthrates_longer %>%
  filter(`Country Name` %in% countries)

ggplot(data = displayed_birthrates, aes(x = year, y = birthrate_per_1000, color = `Country Name`)) +
  geom_line() +
  ggtitle("Selected Countries Birthrates Over time") +
  xlab("Year") +
  ylab("Birthrate (Per 1000 people)") +
  labs(color = "Country")

Chad and Niger have significantly higher birthrates than the other selected countries throughout most of the period covered (1960 to 2021). Most of the other countries I selected follow a similar shape. Interestingly, China starts with a very high birthrate, but around the 1970s it decreases significantly, to levels similar to countries like the United States. Japan has the lowest birthrate of the selected countries in 2020.
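Reading the lowest line off a plot with eight overlapping series can be error-prone, so a claim like the one about 2020 can be checked directly. The sketch uses toy data with placeholder names; on the real data, displayed would be displayed_birthrates.

```r
library(dplyr)
library(tibble)

displayed <- tibble(
  `Country Name` = c("Country A", "Country B", "Country C", "Country A"),
  year = c(2020, 2020, 2020, 2021),
  birthrate_per_1000 = c(7.0, 9.5, 12.0, 6.5)
)

# Country with the lowest birthrate among the selected countries in 2020.
lowest_2020 <- displayed %>%
  filter(year == 2020) %>%
  slice_min(birthrate_per_1000, n = 1)
lowest_2020
```

Swapping filter(year == 2020) for group_by(year) gives the lowest country for every year at once.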

After seeing this, I was interested in the global average birthrate. The long format makes it easy to group by an attribute such as year and plot the result.

global_birthrate <- birthrates_longer %>%
  group_by(year) %>%
  summarize(average_global_birthrate = mean(birthrate_per_1000))

ggplot(data = global_birthrate, aes(x = year, y = average_global_birthrate)) +
  geom_line() +
  ggtitle("Global Average Birthrate") +
  xlab("Year") +
  ylab("Birthrate (Per 1000 people)")
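This average is an unweighted mean over every row for each year. It therefore gives a small state the same weight as a large country, and it also includes aggregate rows such as regions, income groups and "World" itself. The dataset already contains the World Bank's own "World" figure (code WLD), which is a more direct measure of the global rate; I believe World Bank aggregates for this indicator are population-weighted, but that is worth confirming in the indicator's metadata. The toy sketch below, with placeholder countries, shows that the two can differ.

```r
library(dplyr)
library(tibble)

long <- tibble(
  `Country Name` = c("Country A", "Country B", "World", "Country A", "Country B", "World"),
  `Country Code` = c("AAA", "BBB", "WLD", "AAA", "BBB", "WLD"),
  year = c(2000, 2000, 2000, 2001, 2001, 2001),
  birthrate_per_1000 = c(10, 40, 20, 10, 30, 18)
)

# Unweighted mean of all rows per year, aggregate row included.
unweighted <- long %>%
  group_by(year) %>%
  summarise(avg = mean(birthrate_per_1000), .groups = "drop")

# The World Bank's own "World" series.
world <- long %>%
  filter(`Country Code` == "WLD") %>%
  select(year, birthrate_per_1000)

unweighted
world
```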

We can see that the average global birthrate has decreased from the 1960s to an all-time low around 2020.

Summarizing

To summarize, we used pivot_longer() to reshape the data into a longer, tidy dataset and dropped the NA values. Because the analysis relies on averages, dropping these values did not hurt our data too significantly. We displayed a table of the average birthrate of each country, compared the birthrate trends of several countries, and plotted the global average birthrate over time.