library(tidyverse)
library(dplyr)
library(ggplot2)
library(gt)
607_Project2B_DylanGold
Codebase #2
Importing data
For my second dataset, I will clean and analyze https://www.kaggle.com/datasets/aungdev/birth-rate-of-countries-world-bank-data?utm_source=chatgpt.com&select=API_SP.DYN.CBRT.IN_DS2_en_csv_v2_5607611.csv.
This dataset was picked by Brandon Chanderban in the 5A discussion posts. I first upload it to my GitHub repository so that it can be read from a raw URL.
Then I import it.
url <- "https://raw.githubusercontent.com/DylanGoldJ/607-Project-2/refs/heads/main/FileB/API_SP.DYN.CBRT.IN_DS2_en_csv_v2_5607611.csv"
df <- read_csv(
file = url,
col_names = FALSE
)
head(df, 8)
# A tibble: 8 × 67
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Data Source Worl… <NA> <NA> NA NA NA NA NA NA NA
2 <NA> <NA> <NA> <NA> NA NA NA NA NA NA NA
3 Last Updat… 2023… <NA> <NA> NA NA NA NA NA NA NA
4 <NA> <NA> <NA> <NA> NA NA NA NA NA NA NA
5 Country Na… Coun… Indi… Indi… 1960 1961 1962 1963 1964 1965 1966
6 Aruba ABW Birt… SP.D… 33.9 32.8 31.6 30.4 29.1 27.9 26.7
7 Africa Eas… AFE Birt… SP.D… 47.4 47.5 47.6 47.6 47.6 47.7 47.7
8 Afghanistan AFG Birt… SP.D… 50.3 50.4 50.6 50.7 50.8 50.9 51.0
# ℹ 56 more variables: X12 <dbl>, X13 <dbl>, X14 <dbl>, X15 <dbl>, X16 <dbl>,
# X17 <dbl>, X18 <dbl>, X19 <dbl>, X20 <dbl>, X21 <dbl>, X22 <dbl>,
# X23 <dbl>, X24 <dbl>, X25 <dbl>, X26 <dbl>, X27 <dbl>, X28 <dbl>,
# X29 <dbl>, X30 <dbl>, X31 <dbl>, X32 <dbl>, X33 <dbl>, X34 <dbl>,
# X35 <dbl>, X36 <dbl>, X37 <dbl>, X38 <dbl>, X39 <dbl>, X40 <dbl>,
# X41 <dbl>, X42 <dbl>, X43 <dbl>, X44 <dbl>, X45 <dbl>, X46 <dbl>,
# X47 <dbl>, X48 <dbl>, X49 <dbl>, X50 <dbl>, X51 <dbl>, X52 <dbl>, …
We can see that the data is not formatted correctly: the column headers are in the 5th row. First we will fix this.
header <- df[5,]
colnames(df) <- header
head(df, 8)
# A tibble: 8 × 67
`Country Name` `Country Code` `Indicator Name` `Indicator Code` `1960` `1961`
<chr> <chr> <chr> <chr> <dbl> <dbl>
1 Data Source World Develop… <NA> <NA> NA NA
2 <NA> <NA> <NA> <NA> NA NA
3 Last Updated D… 2023-06-29 <NA> <NA> NA NA
4 <NA> <NA> <NA> <NA> NA NA
5 Country Name Country Code Indicator Name Indicator Code 1960 1961
6 Aruba ABW Birth rate, cru… SP.DYN.CBRT.IN 33.9 32.8
7 Africa Eastern… AFE Birth rate, cru… SP.DYN.CBRT.IN 47.4 47.5
8 Afghanistan AFG Birth rate, cru… SP.DYN.CBRT.IN 50.3 50.4
# ℹ 61 more variables: `1962` <dbl>, `1963` <dbl>, `1964` <dbl>, `1965` <dbl>,
# `1966` <dbl>, `1967` <dbl>, `1968` <dbl>, `1969` <dbl>, `1970` <dbl>,
# `1971` <dbl>, `1972` <dbl>, `1973` <dbl>, `1974` <dbl>, `1975` <dbl>,
# `1976` <dbl>, `1977` <dbl>, `1978` <dbl>, `1979` <dbl>, `1980` <dbl>,
# `1981` <dbl>, `1982` <dbl>, `1983` <dbl>, `1984` <dbl>, `1985` <dbl>,
# `1986` <dbl>, `1987` <dbl>, `1988` <dbl>, `1989` <dbl>, `1990` <dbl>,
# `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, …
Now let's drop the first 5 rows, which contain no relevant information for us.
birthrates <- df[-c(1:5), ]
head(birthrates, 8)
# A tibble: 8 × 67
`Country Name` `Country Code` `Indicator Name` `Indicator Code` `1960` `1961`
<chr> <chr> <chr> <chr> <dbl> <dbl>
1 Aruba ABW Birth rate, cru… SP.DYN.CBRT.IN 33.9 32.8
2 Africa Eastern… AFE Birth rate, cru… SP.DYN.CBRT.IN 47.4 47.5
3 Afghanistan AFG Birth rate, cru… SP.DYN.CBRT.IN 50.3 50.4
4 Africa Western… AFW Birth rate, cru… SP.DYN.CBRT.IN 47.3 47.4
5 Angola AGO Birth rate, cru… SP.DYN.CBRT.IN 51.0 51.3
6 Albania ALB Birth rate, cru… SP.DYN.CBRT.IN 41.1 40.3
7 Andorra AND Birth rate, cru… SP.DYN.CBRT.IN NA NA
8 Arab World ARB Birth rate, cru… SP.DYN.CBRT.IN 47.6 47.6
# ℹ 61 more variables: `1962` <dbl>, `1963` <dbl>, `1964` <dbl>, `1965` <dbl>,
# `1966` <dbl>, `1967` <dbl>, `1968` <dbl>, `1969` <dbl>, `1970` <dbl>,
# `1971` <dbl>, `1972` <dbl>, `1973` <dbl>, `1974` <dbl>, `1975` <dbl>,
# `1976` <dbl>, `1977` <dbl>, `1978` <dbl>, `1979` <dbl>, `1980` <dbl>,
# `1981` <dbl>, `1982` <dbl>, `1983` <dbl>, `1984` <dbl>, `1985` <dbl>,
# `1986` <dbl>, `1987` <dbl>, `1988` <dbl>, `1989` <dbl>, `1990` <dbl>,
# `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, …
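As a possible shortcut for the header fix and row drop above, the metadata lines can be skipped at import time. This is a minimal sketch, assuming the four lines above the header row can be skipped with `skip = 4`; `toy_csv` is a hypothetical inline stand-in for the real file so that the example runs on its own.

```r
library(readr)

# Hypothetical stand-in that mimics the file layout: two metadata lines,
# each followed by a blank line, then the real header row.
toy_csv <- paste(
  '"Data Source","World Development Indicators"',
  '',
  '"Last Updated Date","2023-06-29"',
  '',
  '"Country Name","Country Code","1960","1961"',
  '"Aruba","ABW",33.883,32.831',
  '"Andorra","AND",,',
  sep = "\n"
)

# skip = 4 drops the four lines above the header, so the fifth line
# supplies the column names directly.
toy <- read_csv(I(toy_csv), skip = 4, show_col_types = FALSE)
names(toy)
```

With this approach the year columns are named at import, so no separate renaming or row removal is needed.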
I will use glimpse to get an idea of the columns and data.
glimpse(birthrates)
Rows: 266
Columns: 67
$ `Country Name` <chr> "Aruba", "Africa Eastern and Southern", "Afghanistan"…
$ `Country Code` <chr> "ABW", "AFE", "AFG", "AFW", "AGO", "ALB", "AND", "ARB…
$ `Indicator Name` <chr> "Birth rate, crude (per 1,000 people)", "Birth rate, …
$ `Indicator Code` <chr> "SP.DYN.CBRT.IN", "SP.DYN.CBRT.IN", "SP.DYN.CBRT.IN",…
$ `1960` <dbl> 33.88300, 47.43855, 50.34000, 47.32548, 51.02600, 41.…
$ `1961` <dbl> 32.83100, 47.53055, 50.44300, 47.42105, 51.28200, 40.…
$ `1962` <dbl> 31.64900, 47.59756, 50.57000, 47.52922, 51.31600, 39.…
$ `1963` <dbl> 30.41600, 47.63614, 50.70300, 47.53103, 51.32300, 38.…
$ `1964` <dbl> 29.14700, 47.64548, 50.83100, 47.51192, 51.28200, 36.…
$ `1965` <dbl> 27.88900, 47.66766, 50.87200, 47.46857, 51.28200, 35.…
$ `1966` <dbl> 26.66300, 47.69789, 50.98600, 47.44364, 51.29500, 34.…
$ `1967` <dbl> 25.50300, 47.69133, 51.08100, 47.42593, 51.31400, 33.…
$ `1968` <dbl> 24.59200, 47.69102, 51.14800, 47.42235, 51.34800, 33.…
$ `1969` <dbl> 23.73500, 47.72112, 51.19500, 47.41269, 51.35300, 33.…
$ `1970` <dbl> 22.97400, 47.67313, 51.12200, 47.41411, 51.26700, 32.…
$ `1971` <dbl> 22.31300, 47.64967, 51.16300, 47.52970, 50.69800, 31.…
$ `1972` <dbl> 21.76600, 47.47074, 51.10900, 47.57899, 50.47400, 31.…
$ `1973` <dbl> 21.49200, 47.22113, 51.11400, 47.63283, 50.46700, 30.…
$ `1974` <dbl> 21.38200, 47.07547, 51.13500, 47.81713, 50.47200, 30.…
$ `1975` <dbl> 21.39300, 46.95020, 51.01800, 47.91150, 50.46900, 29.…
$ `1976` <dbl> 21.48500, 46.79183, 50.93500, 47.86907, 50.51400, 29.…
$ `1977` <dbl> 21.73900, 46.63214, 50.92100, 47.96894, 50.52300, 28.…
$ `1978` <dbl> 21.92000, 46.51202, 50.81600, 48.03727, 50.61600, 27.…
$ `1979` <dbl> 21.99300, 46.47196, 50.73700, 47.93830, 50.73200, 27.…
$ `1980` <dbl> 21.93100, 46.33961, 50.48200, 47.77071, 50.89200, 26.…
$ `1981` <dbl> 21.73000, 46.23755, 50.26400, 47.51406, 51.10900, 26.…
$ `1982` <dbl> 21.47600, 46.15826, 50.13800, 47.25192, 51.30700, 26.…
$ `1983` <dbl> 21.38500, 46.13473, 50.13900, 47.11112, 51.61000, 26.…
$ `1984` <dbl> 21.18300, 46.13520, 50.23500, 46.70656, 51.93500, 26.…
$ `1985` <dbl> 20.91900, 46.14379, 50.55300, 46.20665, 52.13600, 26.…
$ `1986` <dbl> 20.61700, 46.06749, 50.72800, 45.72924, 52.19000, 25.…
$ `1987` <dbl> 20.26900, 45.83058, 50.84500, 45.34627, 52.14600, 25.…
$ `1988` <dbl> 19.82100, 45.33736, 50.98000, 45.00171, 51.97300, 25.…
$ `1989` <dbl> 19.18400, 44.81301, 51.16200, 44.92848, 51.69900, 24.…
$ `1990` <dbl> 18.66200, 44.23072, 51.42300, 44.67619, 51.34400, 24.…
$ `1991` <dbl> 17.72200, 43.84232, 51.78800, 44.47423, 50.92600, 23.…
$ `1992` <dbl> 16.44300, 43.34168, 51.94800, 44.30932, 50.37400, 23.…
$ `1993` <dbl> 16.12600, 42.96601, 52.03800, 44.16810, 49.89300, 22.…
$ `1994` <dbl> 15.43100, 42.53329, 52.17400, 43.94269, 49.55000, 22.…
$ `1995` <dbl> 15.99100, 42.48572, 52.07300, 43.73024, 49.18500, 21.…
$ `1996` <dbl> 16.15300, 42.13563, 51.87300, 43.49103, 48.86000, 20.…
$ `1997` <dbl> 16.38800, 41.57346, 51.40000, 43.21922, 48.41200, 19.…
$ `1998` <dbl> 15.07800, 41.12879, 50.88000, 43.02697, 48.00900, 18.…
$ `1999` <dbl> 14.36100, 40.89482, 50.35100, 43.17424, 47.77300, 17.…
$ `2000` <dbl> 14.42700, 40.52824, 49.66400, 43.19955, 47.64700, 17.…
$ `2001` <dbl> 13.73900, 40.34121, 48.97900, 43.07550, 47.57400, 16.…
$ `2002` <dbl> 12.99200, 40.04732, 48.20100, 42.92712, 47.44800, 15.…
$ `2003` <dbl> 12.62100, 39.75014, 47.35000, 42.74688, 47.22600, 14.…
$ `2004` <dbl> 11.92100, 39.57589, 46.33000, 42.50272, 47.09900, 13.…
$ `2005` <dbl> 12.34800, 39.40739, 45.26300, 42.42154, 46.94400, 13.…
$ `2006` <dbl> 13.05500, 39.23711, 44.72100, 42.19330, 46.64300, 12.…
$ `2007` <dbl> 12.96200, 39.00052, 43.85800, 41.94301, 46.29000, 12.…
$ `2008` <dbl> 12.74800, 38.85169, 41.50600, 41.75479, 45.88900, 11.…
$ `2009` <dbl> 12.35000, 38.36494, 41.15700, 41.50376, 45.49500, 11.…
$ `2010` <dbl> 12.19300, 37.94026, 40.60200, 41.21963, 44.97000, 11.…
$ `2011` <dbl> 12.24600, 37.48399, 39.85500, 40.89424, 44.36400, 12.…
$ `2012` <dbl> 12.72300, 36.92130, 40.00900, 40.41643, 43.86000, 12.…
$ `2013` <dbl> 13.31600, 36.44714, 39.60100, 39.85651, 43.28200, 12.…
$ `2014` <dbl> 13.53300, 36.02832, 39.10500, 39.33535, 42.67600, 12.…
$ `2015` <dbl> 12.42800, 35.61331, 38.80300, 38.85921, 42.02000, 11.…
$ `2016` <dbl> 12.30000, 35.18902, 37.93600, 38.39310, 41.37700, 11.…
$ `2017` <dbl> 11.53000, 34.89254, 37.34200, 37.88166, 40.81000, 10.…
$ `2018` <dbl> 9.88100, 34.61102, 36.92700, 37.44709, 40.23600, 10.5…
$ `2019` <dbl> 9.13800, 34.34145, 36.46600, 37.02783, 39.72500, 10.3…
$ `2020` <dbl> 8.10200, 33.91675, 36.05100, 36.61573, 39.27100, 10.2…
$ `2021` <dbl> 7.19300, 33.54627, 35.84200, 36.23703, 38.80900, 10.2…
$ `2022` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
We can see that the columns Indicator Name and Indicator Code seem redundant. We can use unique to check whether each column holds the same value in every row.
indicator_names <- unique(birthrates$`Indicator Name`)
indicator_codes <- unique(birthrates$`Indicator Code`)
indicator_names
[1] "Birth rate, crude (per 1,000 people)"
indicator_codes
[1] "SP.DYN.CBRT.IN"
There is only one value in each of these columns, so they carry no information and we can drop them.
birthrates <- birthrates[-c(3:4)]
birthrates
# A tibble: 266 × 65
`Country Name` `Country Code` `1960` `1961` `1962` `1963` `1964` `1965`
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Aruba ABW 33.9 32.8 31.6 30.4 29.1 27.9
2 Africa Eastern and … AFE 47.4 47.5 47.6 47.6 47.6 47.7
3 Afghanistan AFG 50.3 50.4 50.6 50.7 50.8 50.9
4 Africa Western and … AFW 47.3 47.4 47.5 47.5 47.5 47.5
5 Angola AGO 51.0 51.3 51.3 51.3 51.3 51.3
6 Albania ALB 41.1 40.3 39.2 38.1 36.8 35.4
7 Andorra AND NA NA NA NA NA NA
8 Arab World ARB 47.6 47.6 47.9 47.6 47.3 46.9
9 United Arab Emirates ARE 41.8 41.4 41.1 40.6 40.0 39.4
10 Argentina ARG 23.8 23.6 23.8 23.7 23.4 23.3
# ℹ 256 more rows
# ℹ 57 more variables: `1966` <dbl>, `1967` <dbl>, `1968` <dbl>, `1969` <dbl>,
# `1970` <dbl>, `1971` <dbl>, `1972` <dbl>, `1973` <dbl>, `1974` <dbl>,
# `1975` <dbl>, `1976` <dbl>, `1977` <dbl>, `1978` <dbl>, `1979` <dbl>,
# `1980` <dbl>, `1981` <dbl>, `1982` <dbl>, `1983` <dbl>, `1984` <dbl>,
# `1985` <dbl>, `1986` <dbl>, `1987` <dbl>, `1988` <dbl>, `1989` <dbl>,
# `1990` <dbl>, `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>, …
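The positional drop above works, but it depends on the column order. As a hedged alternative, the same check and drop can be written by column name. The sketch below uses a small made-up tibble (`toy`, a hypothetical name) rather than the real data.

```r
library(dplyr)
library(tibble)

# Hypothetical two-row stand-in with the same column layout as the real data.
toy <- tibble(
  `Country Name`   = c("Aruba", "Angola"),
  `Country Code`   = c("ABW", "AGO"),
  `Indicator Name` = "Birth rate, crude (per 1,000 people)",
  `Indicator Code` = "SP.DYN.CBRT.IN",
  `1960`           = c(33.883, 51.026)
)

# Confirm each indicator column holds a single value before dropping it.
stopifnot(
  n_distinct(toy$`Indicator Name`) == 1,
  n_distinct(toy$`Indicator Code`) == 1
)

# Drop by name rather than by position.
toy_trimmed <- toy %>%
  select(-`Indicator Name`, -`Indicator Code`)
names(toy_trimmed)
```

Dropping by name keeps working even if columns are later reordered or added.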
Our data is now in a better format, but it needs to be made longer because each year has its own column. We can use pivot_longer for this.
birthrates_longer <- birthrates %>%
pivot_longer(
cols = !(c('Country Name', 'Country Code')),
names_to = "year",
values_to = "birthrate_per_1000"
)
head(birthrates_longer, 8)
# A tibble: 8 × 4
`Country Name` `Country Code` year birthrate_per_1000
<chr> <chr> <chr> <dbl>
1 Aruba ABW 1960 33.9
2 Aruba ABW 1961 32.8
3 Aruba ABW 1962 31.6
4 Aruba ABW 1963 30.4
5 Aruba ABW 1964 29.1
6 Aruba ABW 1965 27.9
7 Aruba ABW 1966 26.7
8 Aruba ABW 1967 25.5
Our years are stored as the chr data type, so we convert them to a numeric type.
birthrates_longer$year <- as.numeric(birthrates_longer$year)
head(birthrates_longer, 8)
# A tibble: 8 × 4
`Country Name` `Country Code` year birthrate_per_1000
<chr> <chr> <dbl> <dbl>
1 Aruba ABW 1960 33.9
2 Aruba ABW 1961 32.8
3 Aruba ABW 1962 31.6
4 Aruba ABW 1963 30.4
5 Aruba ABW 1964 29.1
6 Aruba ABW 1965 27.9
7 Aruba ABW 1966 26.7
8 Aruba ABW 1967 25.5
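The reshape and the type conversion can also be combined into one call. This is a sketch on a made-up two-country tibble (`toy`, a hypothetical name), using the `names_transform` argument of `pivot_longer` to convert the year labels while pivoting. Integer is one reasonable choice of type here, not the only one.

```r
library(tidyr)
library(tibble)

# Hypothetical wide stand-in: one column per year.
toy <- tibble(
  `Country Name` = c("Aruba", "Angola"),
  `Country Code` = c("ABW", "AGO"),
  `1960` = c(33.883, 51.026),
  `1961` = c(32.831, 51.282)
)

# names_transform converts the former column names to integers as part of
# the pivot, so no separate as.numeric() step is needed afterwards.
toy_long <- pivot_longer(
  toy,
  cols = -c(`Country Name`, `Country Code`),
  names_to = "year",
  names_transform = list(year = as.integer),
  values_to = "birthrate_per_1000"
)
toy_long
```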
We can see from the head of the data frame that the years went from columns to individual rows. We can also check to see if we have NA values in our dataframe.
colSums(is.na(birthrates_longer))
Country Name Country Code year birthrate_per_1000
0 0 0 721
There are NA values, so let's display the rows that contain them.
na_birthrates <- birthrates_longer %>%
filter(is.na(birthrate_per_1000))
na_birthrates
# A tibble: 721 × 4
`Country Name` `Country Code` year birthrate_per_1000
<chr> <chr> <dbl> <dbl>
1 Aruba ABW 2022 NA
2 Africa Eastern and Southern AFE 2022 NA
3 Afghanistan AFG 2022 NA
4 Africa Western and Central AFW 2022 NA
5 Angola AGO 2022 NA
6 Albania ALB 2022 NA
7 Andorra AND 1960 NA
8 Andorra AND 1961 NA
9 Andorra AND 1962 NA
10 Andorra AND 1963 NA
# ℹ 711 more rows
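With 721 rows it is hard to tell by scrolling which entries are missing every year and which are missing only a few. One way to summarise this is to count the missing values per country. The sketch below runs on made-up data; `toy_long`, "Country A" and "Country B" are hypothetical and the values are not from the real dataset.

```r
library(dplyr)
library(tibble)

# Hypothetical long-format stand-in: Country A is missing one year,
# Country B is missing every year.
toy_long <- tibble(
  `Country Name` = rep(c("Country A", "Country B"), each = 3),
  year = rep(c(1960, 1961, 2022), times = 2),
  birthrate_per_1000 = c(33.9, 32.8, NA, NA, NA, NA)
)

# Count missing values per country and flag countries with no data at all.
na_summary <- toy_long %>%
  group_by(`Country Name`) %>%
  summarise(
    n_missing = sum(is.na(birthrate_per_1000)),
    n_years = n(),
    all_missing = n_missing == n_years
  ) %>%
  arrange(desc(n_missing))
na_summary
```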
I looked through the rows that have NA values. It looks like the data for some countries is missing entirely, and many countries are also missing 2022 data. I believe the best way to deal with these values is to drop them. I don't think supplementing them with the country's average or the overall average for the year would be useful.
birthrates_longer <- birthrates_longer %>% drop_na()
colSums(is.na(birthrates_longer))
Country Name Country Code year birthrate_per_1000
0 0 0 0
We now have a tidy form of our data that we can work to analyze.
Analysis
We can try to answer some of the questions our classmate asked in their discussion post. We can start off with comparing the trends in birth rates over time for different selected countries or regions. I will start off by looking at the highest overall average birthrates in all countries to see which countries typically have higher birthrates.
birthrate_avg <- birthrates_longer %>%
group_by(`Country Name`) %>%
summarise(avg_birthrate_per_1000 = mean(birthrate_per_1000)) %>%
arrange(desc(avg_birthrate_per_1000))
head(birthrate_avg)
# A tibble: 6 × 2
`Country Name` avg_birthrate_per_1000
<chr> <dbl>
1 Niger 53.3
2 Chad 48.6
3 Angola 48.4
4 Mali 47.8
5 South Sudan 47.8
6 Afghanistan 47.7
We can display this information in a table using gt.
birthrate_avg %>%
gt() %>%
cols_label(
`Country Name` = "Country",
avg_birthrate_per_1000 = "Average Birthrate(per 1000 people)"
) %>%
tab_header(title = md("Average Birthrate by Country")) %>%
tab_options(container.height = 500, container.overflow.y = TRUE) # Set container height and container overflow to add scrolling for large tables.
Average Birthrate by Country
| Country | Average Birthrate(per 1000 people) |
|---|---|
| Niger | 53.262226 |
| Chad | 48.574339 |
| Angola | 48.387597 |
| Mali | 47.800435 |
| South Sudan | 47.768903 |
| Afghanistan | 47.665823 |
| Somalia | 47.444919 |
| Malawi | 47.080839 |
| Uganda | 46.981435 |
| Yemen, Rep. | 46.349387 |
| Cote d'Ivoire | 46.255532 |
| Congo, Dem. Rep. | 45.670984 |
| Zambia | 45.665081 |
| Burkina Faso | 45.387387 |
| Burundi | 45.176210 |
| Ethiopia | 44.821129 |
| Pre-demographic dividend | 44.622297 |
| Mozambique | 44.595452 |
| Tanzania | 44.409097 |
| Gambia, The | 44.355048 |
| Africa Western and Central | 44.232591 |
| Heavily indebted poor countries (HIPC) | 44.211938 |
| Nigeria | 44.126452 |
| Sierra Leone | 43.919290 |
| Benin | 43.666129 |
| Liberia | 43.634435 |
| Low income | 43.566217 |
| Central African Republic | 43.545710 |
| Kenya | 43.409887 |
| Sub-Saharan Africa (excluding high income) | 43.376251 |
| Sub-Saharan Africa | 43.374105 |
| Sub-Saharan Africa (IDA & IBRD countries) | 43.374105 |
| Rwanda | 43.244419 |
| Guinea | 43.011968 |
| Africa Eastern and Southern | 42.792442 |
| Senegal | 42.783839 |
| Cameroon | 42.589081 |
| Sudan | 42.357016 |
| Madagascar | 41.929258 |
| Guinea-Bissau | 41.922452 |
| Togo | 41.713694 |
| Least developed countries: UN classification | 41.354148 |
| Equatorial Guinea | 41.104403 |
| Mauritania | 40.801661 |
| Eritrea | 40.646290 |
| IDA blend | 40.626757 |
| IDA total | 40.453477 |
| IDA only | 40.356718 |
| Comoros | 40.115403 |
| Ghana | 39.709435 |
| Zimbabwe | 39.586129 |
| Marshall Islands | 39.576306 |
| Congo, Rep. | 39.451484 |
| Eswatini | 39.124887 |
| Pakistan | 38.954677 |
| Fragile and conflict affected situations | 38.606717 |
| Sao Tome and Principe | 38.282548 |
| Iraq | 38.275323 |
| Timor-Leste | 38.254258 |
| Solomon Islands | 37.992968 |
| Jordan | 37.270194 |
| Tajikistan | 37.201565 |
| Guatemala | 37.195677 |
| Lao PDR | 37.140032 |
| West Bank and Gaza | 37.085625 |
| Vanuatu | 37.030758 |
| Oman | 36.885516 |
| Djibouti | 36.754565 |
| Honduras | 36.703177 |
| Namibia | 36.509935 |
| Papua New Guinea | 36.129242 |
| Botswana | 36.025177 |
| Syrian Arab Republic | 36.024323 |
| Arab World | 35.873112 |
| Nicaragua | 35.511839 |
| Saudi Arabia | 34.934597 |
| Maldives | 34.928935 |
| Bangladesh | 34.904177 |
| Haiti | 34.812694 |
| Lesotho | 34.772000 |
| Gabon | 34.473629 |
| Nepal | 34.461323 |
| Cambodia | 34.448016 |
| Kiribati | 34.370097 |
| Samoa | 34.318403 |
| Bolivia | 34.143903 |
| Egypt, Arab Rep. | 34.008323 |
| Middle East & North Africa (excluding high income) | 33.996087 |
| Middle East & North Africa (IDA & IBRD countries) | 33.934523 |
| Cabo Verde | 33.893419 |
| Micronesia, Fed. Sts. | 33.683403 |
| Bhutan | 33.651210 |
| Middle East & North Africa | 33.600741 |
| Algeria | 33.523161 |
| Belize | 33.087258 |
| Pacific island small states | 32.968387 |
| Philippines | 32.967194 |
| Lower middle income | 32.837176 |
| Libya | 32.419097 |
| Turkmenistan | 32.227258 |
| El Salvador | 32.209032 |
| South Asia | 32.173309 |
| South Asia (IDA & IBRD) | 32.173309 |
| Early-demographic dividend | 31.991848 |
| Tonga | 31.940339 |
| Morocco | 31.814032 |
| Mongolia | 31.686629 |
| Uzbekistan | 31.323226 |
| Dominican Republic | 31.278855 |
| Paraguay | 31.245210 |
| Peru | 30.900403 |
| Other small states | 30.774410 |
| India | 30.731710 |
| Iran, Islamic Rep. | 30.613419 |
| Mexico | 30.606984 |
| Ecuador | 30.373306 |
| South Africa | 29.928097 |
| Small states | 29.638167 |
| Kuwait | 29.190274 |
| Fiji | 29.102984 |
| Guyana | 29.073097 |
| Kyrgyz Republic | 29.009113 |
| Venezuela, RB | 28.937645 |
| Suriname | 28.682274 |
| St. Martin (French part) | 28.652774 |
| Tunisia | 28.648661 |
| Indonesia | 28.514806 |
| Nauru | 28.507323 |
| Low & middle income | 28.406261 |
| Myanmar | 28.384468 |
| IDA & IBRD total | 28.198101 |
| Panama | 28.160823 |
| Turkiye | 27.741306 |
| Latin America & Caribbean (excluding high income) | 27.721713 |
| Latin America & the Caribbean (IDA & IBRD countries) | 27.711546 |
| Tuvalu | 27.681645 |
| Latin America & Caribbean | 27.369910 |
| Middle income | 27.169328 |
| Colombia | 27.149484 |
| St. Lucia | 26.968661 |
| Bahrain | 26.887484 |
| Malaysia | 26.815774 |
| Brunei Darussalam | 26.536339 |
| French Polynesia | 26.475613 |
| Vietnam | 26.439226 |
| Qatar | 26.392468 |
| Kosovo | 26.246468 |
| Lebanon | 26.030194 |
| Guam | 25.960710 |
| St. Vincent and the Grenadines | 25.929548 |
| Brazil | 25.900274 |
| Turks and Caicos Islands | 25.537548 |
| Costa Rica | 25.537500 |
| World | 25.523428 |
| Jamaica | 25.217661 |
| Grenada | 25.016258 |
| Caribbean small states | 24.961436 |
| IBRD only | 24.928587 |
| Azerbaijan | 24.463177 |
| New Caledonia | 24.287903 |
| United Arab Emirates | 24.101790 |
| St. Kitts and Nevis | 23.517452 |
| Kazakhstan | 23.280726 |
| Sri Lanka | 23.125032 |
| Israel | 23.112903 |
| Albania | 22.847694 |
| Thailand | 22.536710 |
| Bahamas, The | 22.408435 |
| East Asia & Pacific (IDA & IBRD countries) | 22.304976 |
| East Asia & Pacific (excluding high income) | 22.294827 |
| Upper middle income | 22.121672 |
| Dominica | 22.116371 |
| Seychelles | 21.990196 |
| Trinidad and Tobago | 21.927129 |
| Virgin Islands (U.S.) | 21.858065 |
| Korea, Dem. People's Rep. | 21.536403 |
| East Asia & Pacific | 21.302650 |
| Chile | 21.124532 |
| Mauritius | 21.069871 |
| Argentina | 20.968613 |
| Antigua and Barbuda | 20.731516 |
| Armenia | 20.685645 |
| Late-demographic dividend | 20.392122 |
| Sint Maarten (Dutch part) | 20.238871 |
| China | 19.923548 |
| Curacao | 18.907952 |
| Korea, Rep. | 18.430758 |
| Puerto Rico | 18.423871 |
| Aruba | 18.377113 |
| Europe & Central Asia (excluding high income) | 18.042320 |
| Cuba | 17.770226 |
| Greenland | 17.638776 |
| Iceland | 17.600000 |
| North Macedonia | 17.553076 |
| British Virgin Islands | 17.512081 |
| Europe & Central Asia (IDA & IBRD countries) | 17.443324 |
| Moldova | 17.431097 |
| American Samoa | 17.420000 |
| Ireland | 17.379032 |
| Uruguay | 17.299371 |
| Cyprus | 17.168016 |
| Georgia | 17.163726 |
| New Zealand | 17.040645 |
| Montenegro | 16.864935 |
| Northern Mariana Islands | 16.850000 |
| Singapore | 16.806452 |
| Barbados | 16.756935 |
| Gibraltar | 16.754113 |
| OECD members | 16.032054 |
| Faroe Islands | 15.798077 |
| Bosnia and Herzegovina | 15.681758 |
| Australia | 15.617742 |
| Palau | 15.450000 |
| United States | 15.375806 |
| North America | 15.275795 |
| Europe & Central Asia | 15.032399 |
| Hong Kong SAR, China | 14.803629 |
| Slovak Republic | 14.796774 |
| Cayman Islands | 14.605405 |
| High income | 14.438668 |
| Canada | 14.390323 |
| Malta | 14.354839 |
| Macao SAR, China | 14.348887 |
| Poland | 14.287097 |
| Portugal | 14.253226 |
| Romania | 14.201613 |
| France | 14.138710 |
| Belarus | 13.879597 |
| Liechtenstein | 13.803226 |
| Post-demographic dividend | 13.715831 |
| Lithuania | 13.688710 |
| Russian Federation | 13.655113 |
| Norway | 13.643548 |
| Bermuda | 13.545000 |
| Central Europe and the Baltics | 13.521889 |
| Netherlands | 13.487097 |
| United Kingdom | 13.482258 |
| Spain | 13.301613 |
| Estonia | 12.887097 |
| European Union | 12.784393 |
| Slovenia | 12.730645 |
| Finland | 12.695161 |
| Denmark | 12.682258 |
| Czechia | 12.609677 |
| Serbia | 12.580032 |
| Euro area | 12.575640 |
| Belgium | 12.524194 |
| Ukraine | 12.503403 |
| Switzerland | 12.490323 |
| Sweden | 12.393548 |
| Channel Islands | 12.388819 |
| Luxembourg | 12.335484 |
| Greece | 12.300000 |
| Croatia | 12.275694 |
| Latvia | 12.269355 |
| Bulgaria | 12.229032 |
| Isle of Man | 12.187919 |
| Austria | 12.040323 |
| Japan | 12.037097 |
| Hungary | 11.920968 |
| Monaco | 11.700000 |
| Italy | 11.669355 |
| Germany | 11.001613 |
| Andorra | 10.768966 |
| San Marino | 8.761111 |
We can see that the countries with the highest average birthrates include Niger, Chad, Angola, and Mali. All of these countries have an average birthrate greater than 47 per 1,000 people. This is more than 3 times that of the United States, which has an average birthrate of around 15.
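The table above mixes individual countries with aggregate rows such as "World", "Low income", and "Sub-Saharan Africa". If only countries are of interest, the aggregates can be filtered out before ranking. This is a minimal sketch: `aggregate_names` is a hypothetical, hand-picked list (the real data has many more aggregate rows), and `toy_avg` holds a few averages from the table rounded to one decimal.

```r
library(dplyr)
library(tibble)

# A few rows taken from the table above, rounded to one decimal place.
toy_avg <- tibble(
  `Country Name` = c("Niger", "Chad", "Low income", "World", "United States"),
  avg_birthrate_per_1000 = c(53.3, 48.6, 43.6, 25.5, 15.4)
)

# Hypothetical hand-picked list of aggregate rows to exclude.
aggregate_names <- c("Low income", "World")

# Remove aggregates, then keep the two highest averages.
top_countries <- toy_avg %>%
  filter(!`Country Name` %in% aggregate_names) %>%
  slice_max(avg_birthrate_per_1000, n = 2)
top_countries
```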
To best compare different birth rate trends, we can create a time plot to visualize our data. I will select a few countries arbitrarily, since there are too many countries to display at once.
countries = c("United Kingdom", "United States", "Germany", "Japan", "Italy", "China", "Niger", "Chad")
displayed_birthrates <- birthrates_longer %>%
filter(`Country Name` %in% countries)
ggplot(data = displayed_birthrates, aes(x = year, y = birthrate_per_1000, color = `Country Name`)
) +
geom_line() +
ggtitle("Selected Countries Birthrates Over time") +
xlab("Year") +
ylab("Birthrate (Per 1000 people)") +
labs(color = "Country")
We can see that Chad and Niger both have significantly higher birthrates than the other selected countries throughout the period covered by the data. Most of the countries I selected have a similar shape. Interestingly, China starts with a very high birthrate, but around the 1970s it fell sharply to a level similar to countries like the United States. Among the countries plotted, Japan appears to have the lowest birthrate in 2020.
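Readings taken from a line plot can be confirmed directly from the long-format data. The sketch below shows one way to find the lowest birthrate in a given year. The data is made up for illustration (`toy_long` and the "Country A/B/C" names are hypothetical), so its result says nothing about the real dataset.

```r
library(dplyr)
library(tibble)

# Made-up long-format data for three hypothetical countries.
toy_long <- tibble(
  `Country Name` = rep(c("Country A", "Country B", "Country C"), each = 2),
  year = rep(c(1960, 2020), times = 3),
  birthrate_per_1000 = c(40, 12, 18, 7, 25, 9)
)

# Keep only the year of interest, then take the row with the smallest value.
lowest_2020 <- toy_long %>%
  filter(year == 2020) %>%
  slice_min(birthrate_per_1000, n = 1)
lowest_2020
```

Applied to the real data, the same pattern would be run on `displayed_birthrates` to check the reading from the plot.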
After seeing this, I was interested in the global average birthrate, computed here as the unweighted mean across all rows in the dataset for each year. With the longer format it is very easy to create graphs and group by certain attributes.
global_birthrate <- birthrates_longer %>%
group_by(year) %>%
summarize(average_global_birthrate = mean(birthrate_per_1000))
ggplot(data = global_birthrate, aes(x = year, y = average_global_birthrate)
) +
geom_line() +
ggtitle("Global Average Birthrate") +
xlab("Year") +
ylab("Birthrate (Per 1000 people)")
We can see that the average global birthrate has decreased steadily since the 1960s, reaching an all-time low around 2020.
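The dataset also contains a "World" row (it appears in the average table above), which offers a point of comparison for a mean taken across rows. The following sketch places an unweighted per-year mean next to the "World" value. The numbers are made up and the "Country A/B" names are hypothetical, so only the pattern carries over to the real data.

```r
library(dplyr)
library(tibble)

# Made-up long-format data: two hypothetical countries plus a "World" row.
toy_long <- tibble(
  `Country Name` = rep(c("Country A", "Country B", "World"), each = 2),
  year = rep(c(1960, 2020), times = 3),
  birthrate_per_1000 = c(40, 30, 20, 10, 32, 18)
)

# Unweighted mean across the non-World rows for each year.
unweighted <- toy_long %>%
  filter(`Country Name` != "World") %>%
  group_by(year) %>%
  summarise(unweighted_mean = mean(birthrate_per_1000))

# The value reported in the "World" row for each year.
world_row <- toy_long %>%
  filter(`Country Name` == "World") %>%
  select(year, world_rate = birthrate_per_1000)

# Place the two series side by side.
comparison <- left_join(unweighted, world_row, by = "year")
comparison
```

An unweighted mean treats every row equally regardless of population, so the two columns generally differ.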
Summarizing
To summarize, we used pivot_longer to reshape the data into a longer format that is easier to work with. We dropped the NA values; because the analysis relies on averages, dropping them did not hurt our results too significantly. We compared the birthrate trends of several countries, displayed a table of the average birthrate of each country, and plotted the overall global birthrate.