607_Project2B_DylanGold

Codebase #2

Importing data

For my second dataset, I will clean and analyze the World Bank birth rate data from https://www.kaggle.com/datasets/aungdev/birth-rate-of-countries-world-bank-data?utm_source=chatgpt.com&select=API_SP.DYN.CBRT.IN_DS2_en_csv_v2_5607611.csv.
This dataset was picked by Brandon Chanderban in the 5A discussion posts. I first upload the CSV to my GitHub repository so that it can be read from a raw URL.
Then I import it.

library(tidyverse)
library(dplyr)
library(ggplot2)
library(gt)

url <- "https://raw.githubusercontent.com/DylanGoldJ/607-Project-2/refs/heads/main/FileB/API_SP.DYN.CBRT.IN_DS2_en_csv_v2_5607611.csv"

df <- read_csv(
  file = url,
  col_names = FALSE
)
head(df, 8)

# A tibble: 8 × 67
  X1          X2    X3    X4        X5     X6     X7     X8     X9    X10    X11
  <chr>       <chr> <chr> <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 Data Source Worl… <NA>  <NA>    NA     NA     NA     NA     NA     NA     NA  
2 <NA>        <NA>  <NA>  <NA>    NA     NA     NA     NA     NA     NA     NA  
3 Last Updat… 2023… <NA>  <NA>    NA     NA     NA     NA     NA     NA     NA  
4 <NA>        <NA>  <NA>  <NA>    NA     NA     NA     NA     NA     NA     NA  
5 Country Na… Coun… Indi… Indi… 1960   1961   1962   1963   1964   1965   1966  
6 Aruba       ABW   Birt… SP.D…   33.9   32.8   31.6   30.4   29.1   27.9   26.7
7 Africa Eas… AFE   Birt… SP.D…   47.4   47.5   47.6   47.6   47.6   47.7   47.7
8 Afghanistan AFG   Birt… SP.D…   50.3   50.4   50.6   50.7   50.8   50.9   51.0
# ℹ 56 more variables: X12 <dbl>, X13 <dbl>, X14 <dbl>, X15 <dbl>, X16 <dbl>,
#   X17 <dbl>, X18 <dbl>, X19 <dbl>, X20 <dbl>, X21 <dbl>, X22 <dbl>,
#   X23 <dbl>, X24 <dbl>, X25 <dbl>, X26 <dbl>, X27 <dbl>, X28 <dbl>,
#   X29 <dbl>, X30 <dbl>, X31 <dbl>, X32 <dbl>, X33 <dbl>, X34 <dbl>,
#   X35 <dbl>, X36 <dbl>, X37 <dbl>, X38 <dbl>, X39 <dbl>, X40 <dbl>,
#   X41 <dbl>, X42 <dbl>, X43 <dbl>, X44 <dbl>, X45 <dbl>, X46 <dbl>,
#   X47 <dbl>, X48 <dbl>, X49 <dbl>, X50 <dbl>, X51 <dbl>, X52 <dbl>, …

We can see that the data is not formatted correctly. The real column headers are in the 5th row, below four rows of file metadata. First we will fix this.

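# Promote row 5 (the real header row) to column names as a plain character vector
header <- as.character(unlist(df[5, ]))
colnames(df) <- header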
head(df, 8)

# A tibble: 8 × 67
  `Country Name`  `Country Code` `Indicator Name` `Indicator Code` `1960` `1961`
  <chr>           <chr>          <chr>            <chr>             <dbl>  <dbl>
1 Data Source     World Develop… <NA>             <NA>               NA     NA  
2 <NA>            <NA>           <NA>             <NA>               NA     NA  
3 Last Updated D… 2023-06-29     <NA>             <NA>               NA     NA  
4 <NA>            <NA>           <NA>             <NA>               NA     NA  
5 Country Name    Country Code   Indicator Name   Indicator Code   1960   1961  
6 Aruba           ABW            Birth rate, cru… SP.DYN.CBRT.IN     33.9   32.8
7 Africa Eastern… AFE            Birth rate, cru… SP.DYN.CBRT.IN     47.4   47.5
8 Afghanistan     AFG            Birth rate, cru… SP.DYN.CBRT.IN     50.3   50.4
# ℹ 61 more variables: `1962` <dbl>, `1963` <dbl>, `1964` <dbl>, `1965` <dbl>,
#   `1966` <dbl>, `1967` <dbl>, `1968` <dbl>, `1969` <dbl>, `1970` <dbl>,
#   `1971` <dbl>, `1972` <dbl>, `1973` <dbl>, `1974` <dbl>, `1975` <dbl>,
#   `1976` <dbl>, `1977` <dbl>, `1978` <dbl>, `1979` <dbl>, `1980` <dbl>,
#   `1981` <dbl>, `1982` <dbl>, `1983` <dbl>, `1984` <dbl>, `1985` <dbl>,
#   `1986` <dbl>, `1987` <dbl>, `1988` <dbl>, `1989` <dbl>, `1990` <dbl>,
#   `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, …

Now let's drop the first 5 rows. They hold only file metadata and the header row we just copied, so they contain no relevant information for us.

birthrates <- df[-c(1:5), ]
head(birthrates, 8)

# A tibble: 8 × 67
  `Country Name`  `Country Code` `Indicator Name` `Indicator Code` `1960` `1961`
  <chr>           <chr>          <chr>            <chr>             <dbl>  <dbl>
1 Aruba           ABW            Birth rate, cru… SP.DYN.CBRT.IN     33.9   32.8
2 Africa Eastern… AFE            Birth rate, cru… SP.DYN.CBRT.IN     47.4   47.5
3 Afghanistan     AFG            Birth rate, cru… SP.DYN.CBRT.IN     50.3   50.4
4 Africa Western… AFW            Birth rate, cru… SP.DYN.CBRT.IN     47.3   47.4
5 Angola          AGO            Birth rate, cru… SP.DYN.CBRT.IN     51.0   51.3
6 Albania         ALB            Birth rate, cru… SP.DYN.CBRT.IN     41.1   40.3
7 Andorra         AND            Birth rate, cru… SP.DYN.CBRT.IN     NA     NA  
8 Arab World      ARB            Birth rate, cru… SP.DYN.CBRT.IN     47.6   47.6
# ℹ 61 more variables: `1962` <dbl>, `1963` <dbl>, `1964` <dbl>, `1965` <dbl>,
#   `1966` <dbl>, `1967` <dbl>, `1968` <dbl>, `1969` <dbl>, `1970` <dbl>,
#   `1971` <dbl>, `1972` <dbl>, `1973` <dbl>, `1974` <dbl>, `1975` <dbl>,
#   `1976` <dbl>, `1977` <dbl>, `1978` <dbl>, `1979` <dbl>, `1980` <dbl>,
#   `1981` <dbl>, `1982` <dbl>, `1983` <dbl>, `1984` <dbl>, `1985` <dbl>,
#   `1986` <dbl>, `1987` <dbl>, `1988` <dbl>, `1989` <dbl>, `1990` <dbl>,
#   `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, …
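The two steps above (promoting row 5 to the header and dropping the first five rows) can also be handled at import time by skipping the metadata lines. The following is a minimal sketch, not the approach used in this project: it uses base R's read.csv() on a small inline stand-in for the file, and the names raw_text and toy are made up for illustration. readr's read_csv() accepts a skip argument as well, but the right skip count for the real file depends on how the reader treats its blank lines, so the result should be checked with head().

```r
# Toy stand-in for the file layout: four metadata lines, then the header row.
# The text is illustrative; only the layout mirrors the output shown above.
raw_text <- paste(
  "Data Source,World Development Indicators",
  ",",
  "Last Updated Date,2023-06-29",
  ",",
  "Country Name,Country Code,1960,1961",
  "Aruba,ABW,33.9,32.8",
  "Afghanistan,AFG,50.3,50.4",
  sep = "\n"
)

# skip = 4 discards the metadata lines, so line 5 becomes the header.
# check.names = FALSE keeps names such as "Country Name" and "1960" unchanged.
toy <- read.csv(
  text = raw_text,
  skip = 4,
  check.names = FALSE,
  stringsAsFactors = FALSE
)

names(toy)
```

With this approach the year columns are parsed as numbers directly, and no rows need to be removed afterwards.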

I will use glimpse() to get an idea of the columns and data.

glimpse(birthrates)

Rows: 266
Columns: 67
$ `Country Name`   <chr> "Aruba", "Africa Eastern and Southern", "Afghanistan"…
$ `Country Code`   <chr> "ABW", "AFE", "AFG", "AFW", "AGO", "ALB", "AND", "ARB…
$ `Indicator Name` <chr> "Birth rate, crude (per 1,000 people)", "Birth rate, …
$ `Indicator Code` <chr> "SP.DYN.CBRT.IN", "SP.DYN.CBRT.IN", "SP.DYN.CBRT.IN",…
$ `1960`           <dbl> 33.88300, 47.43855, 50.34000, 47.32548, 51.02600, 41.…
$ `1961`           <dbl> 32.83100, 47.53055, 50.44300, 47.42105, 51.28200, 40.…
$ `1962`           <dbl> 31.64900, 47.59756, 50.57000, 47.52922, 51.31600, 39.…
$ `1963`           <dbl> 30.41600, 47.63614, 50.70300, 47.53103, 51.32300, 38.…
$ `1964`           <dbl> 29.14700, 47.64548, 50.83100, 47.51192, 51.28200, 36.…
$ `1965`           <dbl> 27.88900, 47.66766, 50.87200, 47.46857, 51.28200, 35.…
$ `1966`           <dbl> 26.66300, 47.69789, 50.98600, 47.44364, 51.29500, 34.…
$ `1967`           <dbl> 25.50300, 47.69133, 51.08100, 47.42593, 51.31400, 33.…
$ `1968`           <dbl> 24.59200, 47.69102, 51.14800, 47.42235, 51.34800, 33.…
$ `1969`           <dbl> 23.73500, 47.72112, 51.19500, 47.41269, 51.35300, 33.…
$ `1970`           <dbl> 22.97400, 47.67313, 51.12200, 47.41411, 51.26700, 32.…
$ `1971`           <dbl> 22.31300, 47.64967, 51.16300, 47.52970, 50.69800, 31.…
$ `1972`           <dbl> 21.76600, 47.47074, 51.10900, 47.57899, 50.47400, 31.…
$ `1973`           <dbl> 21.49200, 47.22113, 51.11400, 47.63283, 50.46700, 30.…
$ `1974`           <dbl> 21.38200, 47.07547, 51.13500, 47.81713, 50.47200, 30.…
$ `1975`           <dbl> 21.39300, 46.95020, 51.01800, 47.91150, 50.46900, 29.…
$ `1976`           <dbl> 21.48500, 46.79183, 50.93500, 47.86907, 50.51400, 29.…
$ `1977`           <dbl> 21.73900, 46.63214, 50.92100, 47.96894, 50.52300, 28.…
$ `1978`           <dbl> 21.92000, 46.51202, 50.81600, 48.03727, 50.61600, 27.…
$ `1979`           <dbl> 21.99300, 46.47196, 50.73700, 47.93830, 50.73200, 27.…
$ `1980`           <dbl> 21.93100, 46.33961, 50.48200, 47.77071, 50.89200, 26.…
$ `1981`           <dbl> 21.73000, 46.23755, 50.26400, 47.51406, 51.10900, 26.…
$ `1982`           <dbl> 21.47600, 46.15826, 50.13800, 47.25192, 51.30700, 26.…
$ `1983`           <dbl> 21.38500, 46.13473, 50.13900, 47.11112, 51.61000, 26.…
$ `1984`           <dbl> 21.18300, 46.13520, 50.23500, 46.70656, 51.93500, 26.…
$ `1985`           <dbl> 20.91900, 46.14379, 50.55300, 46.20665, 52.13600, 26.…
$ `1986`           <dbl> 20.61700, 46.06749, 50.72800, 45.72924, 52.19000, 25.…
$ `1987`           <dbl> 20.26900, 45.83058, 50.84500, 45.34627, 52.14600, 25.…
$ `1988`           <dbl> 19.82100, 45.33736, 50.98000, 45.00171, 51.97300, 25.…
$ `1989`           <dbl> 19.18400, 44.81301, 51.16200, 44.92848, 51.69900, 24.…
$ `1990`           <dbl> 18.66200, 44.23072, 51.42300, 44.67619, 51.34400, 24.…
$ `1991`           <dbl> 17.72200, 43.84232, 51.78800, 44.47423, 50.92600, 23.…
$ `1992`           <dbl> 16.44300, 43.34168, 51.94800, 44.30932, 50.37400, 23.…
$ `1993`           <dbl> 16.12600, 42.96601, 52.03800, 44.16810, 49.89300, 22.…
$ `1994`           <dbl> 15.43100, 42.53329, 52.17400, 43.94269, 49.55000, 22.…
$ `1995`           <dbl> 15.99100, 42.48572, 52.07300, 43.73024, 49.18500, 21.…
$ `1996`           <dbl> 16.15300, 42.13563, 51.87300, 43.49103, 48.86000, 20.…
$ `1997`           <dbl> 16.38800, 41.57346, 51.40000, 43.21922, 48.41200, 19.…
$ `1998`           <dbl> 15.07800, 41.12879, 50.88000, 43.02697, 48.00900, 18.…
$ `1999`           <dbl> 14.36100, 40.89482, 50.35100, 43.17424, 47.77300, 17.…
$ `2000`           <dbl> 14.42700, 40.52824, 49.66400, 43.19955, 47.64700, 17.…
$ `2001`           <dbl> 13.73900, 40.34121, 48.97900, 43.07550, 47.57400, 16.…
$ `2002`           <dbl> 12.99200, 40.04732, 48.20100, 42.92712, 47.44800, 15.…
$ `2003`           <dbl> 12.62100, 39.75014, 47.35000, 42.74688, 47.22600, 14.…
$ `2004`           <dbl> 11.92100, 39.57589, 46.33000, 42.50272, 47.09900, 13.…
$ `2005`           <dbl> 12.34800, 39.40739, 45.26300, 42.42154, 46.94400, 13.…
$ `2006`           <dbl> 13.05500, 39.23711, 44.72100, 42.19330, 46.64300, 12.…
$ `2007`           <dbl> 12.96200, 39.00052, 43.85800, 41.94301, 46.29000, 12.…
$ `2008`           <dbl> 12.74800, 38.85169, 41.50600, 41.75479, 45.88900, 11.…
$ `2009`           <dbl> 12.35000, 38.36494, 41.15700, 41.50376, 45.49500, 11.…
$ `2010`           <dbl> 12.19300, 37.94026, 40.60200, 41.21963, 44.97000, 11.…
$ `2011`           <dbl> 12.24600, 37.48399, 39.85500, 40.89424, 44.36400, 12.…
$ `2012`           <dbl> 12.72300, 36.92130, 40.00900, 40.41643, 43.86000, 12.…
$ `2013`           <dbl> 13.31600, 36.44714, 39.60100, 39.85651, 43.28200, 12.…
$ `2014`           <dbl> 13.53300, 36.02832, 39.10500, 39.33535, 42.67600, 12.…
$ `2015`           <dbl> 12.42800, 35.61331, 38.80300, 38.85921, 42.02000, 11.…
$ `2016`           <dbl> 12.30000, 35.18902, 37.93600, 38.39310, 41.37700, 11.…
$ `2017`           <dbl> 11.53000, 34.89254, 37.34200, 37.88166, 40.81000, 10.…
$ `2018`           <dbl> 9.88100, 34.61102, 36.92700, 37.44709, 40.23600, 10.5…
$ `2019`           <dbl> 9.13800, 34.34145, 36.46600, 37.02783, 39.72500, 10.3…
$ `2020`           <dbl> 8.10200, 33.91675, 36.05100, 36.61573, 39.27100, 10.2…
$ `2021`           <dbl> 7.19300, 33.54627, 35.84200, 36.23703, 38.80900, 10.2…
$ `2022`           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…

We can see that the columns Indicator Name and Indicator Code seem redundant. We can use unique() to check whether each column holds the same value in every row.

indicator_names <- unique(birthrates$`Indicator Name`)
indicator_codes <- unique(birthrates$`Indicator Code`)
indicator_names

[1] "Birth rate, crude (per 1,000 people)"

indicator_codes

[1] "SP.DYN.CBRT.IN"

There is only one value in each of these columns, so they carry no information that distinguishes rows. We can drop them.

birthrates <- birthrates[-c(3:4)]
birthrates

# A tibble: 266 × 65
   `Country Name`       `Country Code` `1960` `1961` `1962` `1963` `1964` `1965`
   <chr>                <chr>           <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
 1 Aruba                ABW              33.9   32.8   31.6   30.4   29.1   27.9
 2 Africa Eastern and … AFE              47.4   47.5   47.6   47.6   47.6   47.7
 3 Afghanistan          AFG              50.3   50.4   50.6   50.7   50.8   50.9
 4 Africa Western and … AFW              47.3   47.4   47.5   47.5   47.5   47.5
 5 Angola               AGO              51.0   51.3   51.3   51.3   51.3   51.3
 6 Albania              ALB              41.1   40.3   39.2   38.1   36.8   35.4
 7 Andorra              AND              NA     NA     NA     NA     NA     NA  
 8 Arab World           ARB              47.6   47.6   47.9   47.6   47.3   46.9
 9 United Arab Emirates ARE              41.8   41.4   41.1   40.6   40.0   39.4
10 Argentina            ARG              23.8   23.6   23.8   23.7   23.4   23.3
# ℹ 256 more rows
# ℹ 57 more variables: `1966` <dbl>, `1967` <dbl>, `1968` <dbl>, `1969` <dbl>,
#   `1970` <dbl>, `1971` <dbl>, `1972` <dbl>, `1973` <dbl>, `1974` <dbl>,
#   `1975` <dbl>, `1976` <dbl>, `1977` <dbl>, `1978` <dbl>, `1979` <dbl>,
#   `1980` <dbl>, `1981` <dbl>, `1982` <dbl>, `1983` <dbl>, `1984` <dbl>,
#   `1985` <dbl>, `1986` <dbl>, `1987` <dbl>, `1988` <dbl>, `1989` <dbl>,
#   `1990` <dbl>, `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>, …
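The unique() check above can be generalized to scan every column at once and to drop constant columns by name instead of by position. This is a small base R sketch on made-up toy data; toy, n_unique, constant_cols and toy_trimmed are illustrative names. One caveat: a column that is entirely NA would also be flagged as constant, so the flagged names should be reviewed before dropping anything.

```r
# Toy table shaped like `birthrates` before the indicator columns were dropped
toy <- data.frame(
  `Country Name` = c("Aruba", "Afghanistan", "Angola"),
  `Indicator Name` = rep("Birth rate, crude (per 1,000 people)", 3),
  `Indicator Code` = rep("SP.DYN.CBRT.IN", 3),
  `1960` = c(33.9, 50.3, 51.0),
  check.names = FALSE
)

# Count the distinct values in each column
n_unique <- sapply(toy, function(col) length(unique(col)))

# Columns with exactly one distinct value cannot distinguish rows
constant_cols <- names(n_unique)[n_unique == 1]

# Drop them by name, which is safer than positions if the column order changes
toy_trimmed <- toy[, !(names(toy) %in% constant_cols), drop = FALSE]

constant_cols
names(toy_trimmed)
```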

Our data is now in a better format, but it is still wide: each year has its own column. To make it tidy, each row should hold one country, one year, and one birthrate. We can use pivot_longer() for this.

birthrates_longer <- birthrates %>%
  pivot_longer(
    cols = !(c('Country Name', 'Country Code')),
    names_to = "year",
    values_to = "birthrate_per_1000"
    )
head(birthrates_longer, 8)

# A tibble: 8 × 4
  `Country Name` `Country Code` year  birthrate_per_1000
  <chr>          <chr>          <chr>              <dbl>
1 Aruba          ABW            1960                33.9
2 Aruba          ABW            1961                32.8
3 Aruba          ABW            1962                31.6
4 Aruba          ABW            1963                30.4
5 Aruba          ABW            1964                29.1
6 Aruba          ABW            1965                27.9
7 Aruba          ABW            1966                26.7
8 Aruba          ABW            1967                25.5

Our years are stored as the chr data type because they came from column names, so we convert them to a numeric type.

birthrates_longer$year <- as.numeric(birthrates_longer$year)
head(birthrates_longer, 8)

# A tibble: 8 × 4
  `Country Name` `Country Code`  year birthrate_per_1000
  <chr>          <chr>          <dbl>              <dbl>
1 Aruba          ABW             1960               33.9
2 Aruba          ABW             1961               32.8
3 Aruba          ABW             1962               31.6
4 Aruba          ABW             1963               30.4
5 Aruba          ABW             1964               29.1
6 Aruba          ABW             1965               27.9
7 Aruba          ABW             1966               26.7
8 Aruba          ABW             1967               25.5
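The reshape and the type conversion can also be done in one call, because pivot_longer() has a names_transform argument that is applied to the new names column. The sketch below runs on a two-country toy table (toy_wide and toy_long are illustrative names, with values taken from the output above) and converts the years to integers instead of doubles, which is an optional choice.

```r
library(tidyr)

# Toy wide table with the same shape as `birthrates`
toy_wide <- data.frame(
  `Country Name` = c("Aruba", "Afghanistan"),
  `Country Code` = c("ABW", "AFG"),
  `1960` = c(33.9, 50.3),
  `1961` = c(32.8, 50.4),
  check.names = FALSE
)

toy_long <- pivot_longer(
  toy_wide,
  cols = !c(`Country Name`, `Country Code`),
  names_to = "year",
  names_transform = list(year = as.integer),  # "1960" becomes 1960L while pivoting
  values_to = "birthrate_per_1000"
)

toy_long
```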

We can see from the head of the data frame that the years went from columns to individual rows. We can also check to see if we have NA values in our dataframe.

colSums(is.na(birthrates_longer))

      Country Name       Country Code               year birthrate_per_1000 
                 0                  0                  0                721

There are NA values, all in the birthrate column. Let's display the rows that contain them.

na_birthrates <- birthrates_longer %>%
  filter(is.na(birthrate_per_1000))
na_birthrates

# A tibble: 721 × 4
   `Country Name`              `Country Code`  year birthrate_per_1000
   <chr>                       <chr>          <dbl>              <dbl>
 1 Aruba                       ABW             2022                 NA
 2 Africa Eastern and Southern AFE             2022                 NA
 3 Afghanistan                 AFG             2022                 NA
 4 Africa Western and Central  AFW             2022                 NA
 5 Angola                      AGO             2022                 NA
 6 Albania                     ALB             2022                 NA
 7 Andorra                     AND             1960                 NA
 8 Andorra                     AND             1961                 NA
 9 Andorra                     AND             1962                 NA
10 Andorra                     AND             1963                 NA
# ℹ 711 more rows

Looking through the rows that have NA values, it appears that the data for some countries is missing entirely, and many countries are also missing 2022 data. I believe the best way to deal with these values is to drop them. I don’t think supplementing them with the country's average or the overall average for the year would be useful.

birthrates_longer <- birthrates_longer %>% drop_na()
colSums(is.na(birthrates_longer))

      Country Name       Country Code               year birthrate_per_1000 
                 0                  0                  0                  0
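Before dropping rows, it can help to quantify where the missing values sit instead of scanning them by eye. The sketch below counts NA values per year and per country on invented toy data; the countries A, B and C and all of their values are made up for illustration, and only the column name birthrate_per_1000 comes from the analysis above.

```r
library(dplyr)

# Invented toy data: B has no data at all, and everyone is missing 2022
toy_long <- tibble(
  country = c("A", "A", "B", "B", "C", "C"),
  year = c(2021, 2022, 2021, 2022, 2021, 2022),
  birthrate_per_1000 = c(12.5, NA, NA, NA, 30.1, NA)
)

# How many values are missing in each year?
na_by_year <- toy_long %>%
  filter(is.na(birthrate_per_1000)) %>%
  count(year, name = "n_missing")

# Which countries have no data in any year?
na_by_country <- toy_long %>%
  group_by(country) %>%
  summarise(
    n_missing = sum(is.na(birthrate_per_1000)),
    n_years = n(),
    all_missing = n_missing == n_years
  )

na_by_year
na_by_country
```

A year whose n_missing equals the number of countries, or a country with all_missing equal to TRUE, contributes nothing to the averages computed later, so dropping those rows loses no information.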

We now have a tidy form of our data that we can analyze.

Analysis

We can try to answer some of the questions our classmate asked in their discussion post. One of them is to compare the trends in birth rates over time for selected countries or regions. I will start by computing the overall average birthrate for every country to see which countries typically have higher birthrates.

birthrate_avg <- birthrates_longer %>%
  group_by(`Country Name`) %>%
  summarise(avg_birthrate_per_1000 = mean(birthrate_per_1000)) %>%
  arrange(desc(avg_birthrate_per_1000))

head(birthrate_avg)

# A tibble: 6 × 2
  `Country Name` avg_birthrate_per_1000
  <chr>                           <dbl>
1 Niger                            53.3
2 Chad                             48.6
3 Angola                           48.4
4 Mali                             47.8
5 South Sudan                      47.8
6 Afghanistan                      47.7

We can display this information as a table using the gt package.

# Set the container height and enable vertical overflow so the large table scrolls.
birthrate_avg %>%
  gt() %>%
  cols_label(
    `Country Name` = "Country",
    avg_birthrate_per_1000 = "Average Birthrate(per 1000 people)"
  ) %>%
  tab_header(title = md("Average Birthrate by Country")) %>%
  tab_options(container.height = 500, container.overflow.y = TRUE)

Average Birthrate by Country
Country	Average Birthrate(per 1000 people)
Niger	53.262226
Chad	48.574339
Angola	48.387597
Mali	47.800435
South Sudan	47.768903
Afghanistan	47.665823
Somalia	47.444919
Malawi	47.080839
Uganda	46.981435
Yemen, Rep.	46.349387
Cote d'Ivoire	46.255532
Congo, Dem. Rep.	45.670984
Zambia	45.665081
Burkina Faso	45.387387
Burundi	45.176210
Ethiopia	44.821129
Pre-demographic dividend	44.622297
Mozambique	44.595452
Tanzania	44.409097
Gambia, The	44.355048
Africa Western and Central	44.232591
Heavily indebted poor countries (HIPC)	44.211938
Nigeria	44.126452
Sierra Leone	43.919290
Benin	43.666129
Liberia	43.634435
Low income	43.566217
Central African Republic	43.545710
Kenya	43.409887
Sub-Saharan Africa (excluding high income)	43.376251
Sub-Saharan Africa	43.374105
Sub-Saharan Africa (IDA & IBRD countries)	43.374105
Rwanda	43.244419
Guinea	43.011968
Africa Eastern and Southern	42.792442
Senegal	42.783839
Cameroon	42.589081
Sudan	42.357016
Madagascar	41.929258
Guinea-Bissau	41.922452
Togo	41.713694
Least developed countries: UN classification	41.354148
Equatorial Guinea	41.104403
Mauritania	40.801661
Eritrea	40.646290
IDA blend	40.626757
IDA total	40.453477
IDA only	40.356718
Comoros	40.115403
Ghana	39.709435
Zimbabwe	39.586129
Marshall Islands	39.576306
Congo, Rep.	39.451484
Eswatini	39.124887
Pakistan	38.954677
Fragile and conflict affected situations	38.606717
Sao Tome and Principe	38.282548
Iraq	38.275323
Timor-Leste	38.254258
Solomon Islands	37.992968
Jordan	37.270194
Tajikistan	37.201565
Guatemala	37.195677
Lao PDR	37.140032
West Bank and Gaza	37.085625
Vanuatu	37.030758
Oman	36.885516
Djibouti	36.754565
Honduras	36.703177
Namibia	36.509935
Papua New Guinea	36.129242
Botswana	36.025177
Syrian Arab Republic	36.024323
Arab World	35.873112
Nicaragua	35.511839
Saudi Arabia	34.934597
Maldives	34.928935
Bangladesh	34.904177
Haiti	34.812694
Lesotho	34.772000
Gabon	34.473629
Nepal	34.461323
Cambodia	34.448016
Kiribati	34.370097
Samoa	34.318403
Bolivia	34.143903
Egypt, Arab Rep.	34.008323
Middle East & North Africa (excluding high income)	33.996087
Middle East & North Africa (IDA & IBRD countries)	33.934523
Cabo Verde	33.893419
Micronesia, Fed. Sts.	33.683403
Bhutan	33.651210
Middle East & North Africa	33.600741
Algeria	33.523161
Belize	33.087258
Pacific island small states	32.968387
Philippines	32.967194
Lower middle income	32.837176
Libya	32.419097
Turkmenistan	32.227258
El Salvador	32.209032
South Asia	32.173309
South Asia (IDA & IBRD)	32.173309
Early-demographic dividend	31.991848
Tonga	31.940339
Morocco	31.814032
Mongolia	31.686629
Uzbekistan	31.323226
Dominican Republic	31.278855
Paraguay	31.245210
Peru	30.900403
Other small states	30.774410
India	30.731710
Iran, Islamic Rep.	30.613419
Mexico	30.606984
Ecuador	30.373306
South Africa	29.928097
Small states	29.638167
Kuwait	29.190274
Fiji	29.102984
Guyana	29.073097
Kyrgyz Republic	29.009113
Venezuela, RB	28.937645
Suriname	28.682274
St. Martin (French part)	28.652774
Tunisia	28.648661
Indonesia	28.514806
Nauru	28.507323
Low & middle income	28.406261
Myanmar	28.384468
IDA & IBRD total	28.198101
Panama	28.160823
Turkiye	27.741306
Latin America & Caribbean (excluding high income)	27.721713
Latin America & the Caribbean (IDA & IBRD countries)	27.711546
Tuvalu	27.681645
Latin America & Caribbean	27.369910
Middle income	27.169328
Colombia	27.149484
St. Lucia	26.968661
Bahrain	26.887484
Malaysia	26.815774
Brunei Darussalam	26.536339
French Polynesia	26.475613
Vietnam	26.439226
Qatar	26.392468
Kosovo	26.246468
Lebanon	26.030194
Guam	25.960710
St. Vincent and the Grenadines	25.929548
Brazil	25.900274
Turks and Caicos Islands	25.537548
Costa Rica	25.537500
World	25.523428
Jamaica	25.217661
Grenada	25.016258
Caribbean small states	24.961436
IBRD only	24.928587
Azerbaijan	24.463177
New Caledonia	24.287903
United Arab Emirates	24.101790
St. Kitts and Nevis	23.517452
Kazakhstan	23.280726
Sri Lanka	23.125032
Israel	23.112903
Albania	22.847694
Thailand	22.536710
Bahamas, The	22.408435
East Asia & Pacific (IDA & IBRD countries)	22.304976
East Asia & Pacific (excluding high income)	22.294827
Upper middle income	22.121672
Dominica	22.116371
Seychelles	21.990196
Trinidad and Tobago	21.927129
Virgin Islands (U.S.)	21.858065
Korea, Dem. People's Rep.	21.536403
East Asia & Pacific	21.302650
Chile	21.124532
Mauritius	21.069871
Argentina	20.968613
Antigua and Barbuda	20.731516
Armenia	20.685645
Late-demographic dividend	20.392122
Sint Maarten (Dutch part)	20.238871
China	19.923548
Curacao	18.907952
Korea, Rep.	18.430758
Puerto Rico	18.423871
Aruba	18.377113
Europe & Central Asia (excluding high income)	18.042320
Cuba	17.770226
Greenland	17.638776
Iceland	17.600000
North Macedonia	17.553076
British Virgin Islands	17.512081
Europe & Central Asia (IDA & IBRD countries)	17.443324
Moldova	17.431097
American Samoa	17.420000
Ireland	17.379032
Uruguay	17.299371
Cyprus	17.168016
Georgia	17.163726
New Zealand	17.040645
Montenegro	16.864935
Northern Mariana Islands	16.850000
Singapore	16.806452
Barbados	16.756935
Gibraltar	16.754113
OECD members	16.032054
Faroe Islands	15.798077
Bosnia and Herzegovina	15.681758
Australia	15.617742
Palau	15.450000
United States	15.375806
North America	15.275795
Europe & Central Asia	15.032399
Hong Kong SAR, China	14.803629
Slovak Republic	14.796774
Cayman Islands	14.605405
High income	14.438668
Canada	14.390323
Malta	14.354839
Macao SAR, China	14.348887
Poland	14.287097
Portugal	14.253226
Romania	14.201613
France	14.138710
Belarus	13.879597
Liechtenstein	13.803226
Post-demographic dividend	13.715831
Lithuania	13.688710
Russian Federation	13.655113
Norway	13.643548
Bermuda	13.545000
Central Europe and the Baltics	13.521889
Netherlands	13.487097
United Kingdom	13.482258
Spain	13.301613
Estonia	12.887097
European Union	12.784393
Slovenia	12.730645
Finland	12.695161
Denmark	12.682258
Czechia	12.609677
Serbia	12.580032
Euro area	12.575640
Belgium	12.524194
Ukraine	12.503403
Switzerland	12.490323
Sweden	12.393548
Channel Islands	12.388819
Luxembourg	12.335484
Greece	12.300000
Croatia	12.275694
Latvia	12.269355
Bulgaria	12.229032
Isle of Man	12.187919
Austria	12.040323
Japan	12.037097
Hungary	11.920968
Monaco	11.700000
Italy	11.669355
Germany	11.001613
Andorra	10.768966
San Marino	8.761111

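We can see that the countries with the highest average birthrates include Niger, Chad, Angola, and Mali. All of these have an average birthrate greater than 47 per 1,000 people. This is more than 3 times that of the United States, which has an average birthrate of around 15. Note that the table also contains regional and income-group aggregates (for example "World" and "Low income") alongside individual countries.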
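Because the ranking mixes individual countries with aggregate rows, one option is to filter the aggregates out before ranking. The sketch below assumes a hand-written list of aggregate names; aggregate_names is illustrative and deliberately incomplete, containing only three of the aggregate rows visible in the table, and the toy averages are rounded from that table.

```r
library(dplyr)

# Toy slice of the averages table (values rounded from the table above)
toy_avg <- tibble(
  `Country Name` = c("Niger", "Chad", "Pre-demographic dividend",
                     "Low income", "World", "United States"),
  avg_birthrate_per_1000 = c(53.3, 48.6, 44.6, 43.6, 25.5, 15.4)
)

# Illustrative, incomplete list of aggregate rows; the real data has many more
aggregate_names <- c("World", "Low income", "Pre-demographic dividend")

# Keep only the rows that are not listed as aggregates
countries_only <- toy_avg %>%
  filter(!`Country Name` %in% aggregate_names) %>%
  arrange(desc(avg_birthrate_per_1000))

countries_only
```

For the full dataset the list would need to cover every regional and income-group row, so the filtered result should be checked against the table before it is used.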

To best compare different birth rate trends, we can create a time plot to visualize our data. I will select a few countries arbitrarily, because there are too many countries to display at once.

countries <- c("United Kingdom", "United States", "Germany", "Japan", "Italy", "China", "Niger", "Chad")
displayed_birthrates <- birthrates_longer %>%
  filter(`Country Name` %in% countries)

ggplot(
  data = displayed_birthrates,
  aes(x = year, y = birthrate_per_1000, color = `Country Name`)
) +
  geom_line() +
  ggtitle("Selected Countries Birthrates Over time") +
  xlab("Year") +
  ylab("Birthrate (Per 1000 people)") +
  labs(color = "Country")

We can see that Chad and Niger both have significantly higher birthrates than the other selected countries throughout most of the period covered by the data (1960 to 2021). Most countries I selected have a similar shape. Interestingly, China starts with a very high birthrate, but around the 1970s it decreased significantly to a level similar to countries like the United States. Among the selected countries, Japan has the lowest birthrate in 2020.

After seeing this, I was interested in the global average birthrate. With the longer format, it is very easy to group by an attribute such as year and plot the result.

global_birthrate <- birthrates_longer %>%
  group_by(year) %>%
  summarize(average_global_birthrate = mean(birthrate_per_1000))

ggplot(
  data = global_birthrate,
  aes(x = year, y = average_global_birthrate)
) +
  geom_line() +
  ggtitle("Global Average Birthrate") +
  xlab("Year") +
  ylab("Birthrate (Per 1000 people)")

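We can see that the average global birthrate has decreased from the 1960s to 2020, reaching its lowest level in the period covered by the data. Note that this average is an unweighted mean over every row in the dataset, including the regional and income-group aggregates, so it is not a population-weighted world figure.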
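The dataset also has a row named "World", which is presumably the World Bank's own aggregate. A possible cross-check is to put the unweighted mean next to that row for each year. The sketch below uses invented numbers; Country A, Country B and every value are made up, and only the "World" row name and the column names come from the data above.

```r
library(dplyr)

# Invented toy data: two countries plus the "World" aggregate row
toy_long <- tibble(
  `Country Name` = c("Country A", "Country B", "World",
                     "Country A", "Country B", "World"),
  year = c(1960, 1960, 1960, 2020, 2020, 2020),
  birthrate_per_1000 = c(50, 20, 32, 35, 10, 17)
)

# Unweighted mean across the non-aggregate rows, one value per year
unweighted <- toy_long %>%
  filter(`Country Name` != "World") %>%
  group_by(year) %>%
  summarise(unweighted_mean = mean(birthrate_per_1000))

# The aggregate row as published in the dataset
world_row <- toy_long %>%
  filter(`Country Name` == "World") %>%
  select(year, world_aggregate = birthrate_per_1000)

# Side-by-side comparison per year
comparison <- left_join(unweighted, world_row, by = "year")

comparison
```

The two columns will generally differ, because the unweighted mean counts a small country as much as a large one.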

Summarizing

To summarize, we used pivot_longer() to reshape the dataset into a longer format that is easier to work with. We dropped NA values. Because the analysis relies on averages, dropping those values should not have hurt our results too significantly. We compared the birthrate trends of several countries. We also displayed a table of the average birthrate of each country and a plot of the overall global birthrate.