Project 2 — Data Transformations

Author

Brandon Chanderban

Published

March 7, 2026

Introduction/Approach

The objective of this Project Two Data Transformations exercise lies in preparing different datasets (three, specifically) for analysis work to be conducted further downstream. Moreover, these three datasets must be drawn from within the confines of the DATA 607 Week 5A Discussion thread, wherein the class’ students were tasked with identifying untidy datasets and suggesting possible analytical endeavors that may be performed once the datasets have been tidied. In choosing three of these provided datasets, a key requirement of the project sis that only distinct wide-format untidy datasets must be selected.

Once these three datasets are selected, imported and cleaned within RStudio, the specific analyses requested within the original discussion posts will then be executed, including features such as summary tables, visualizations, and brief explanatory narratives where applicable.

Prospective Datasets and their Corresponding Analyses

  1. The first of the three untidy datasets to be examined was outlined in the discussion post that I myself submitted. This Kaggle dataset contains data pertaining to the crude birth rates of countries and maybe viewed here. The dataset meets the requirement of being formatted in a wide structure, with the variable of year (1960 to 2021) spread across several columns. Within each record, there are several identifiers such as the Country Name, Country Code, Indicator Name, and Indicator Code, followed by the year-based data columns.
  • The potential analyses to be executed for this crude birth rate dataset are:

    • comparing the trends in birth rates over time across different regions/countries,

    • visualizing birth rate trends over time, and

    • examining the declines and increases in birth rates to identify any widely shared population changes (and then attempting to associate such changes with relevant real-world health and economic events).

  1. The second dataset to be utilized was posted by classmate Sinem Kilicdere and is another Kaggle dataset; this one examining GDP by country for the years 1960 to 2022. Similar to the prior dataset, this one is also arranged in a wide format, owing to the horizontal spread of year-based columns. The identifier columns for this dataset are Country name and Country code.
  • The prospective analyses to be conducted using this GDP by Country dataset (as set out by Sinem herself) are:

    • comparing GDP growth between countries (potentially using a time series graph),

    • calculating the average GDP across the included years, and

    • determining the top-performing countries in terms of economic output.

  1. The third and final dataset (viewed here) that will be employed within this Project Two was posted by classmate Ciara Bonnett-Jones and contains information pertaining to the renewable energy consumption of countries (represented as a percentage of their final energy consumption). As with both prior datasets, this one is arranged within a wide format, due to the time variable (year) which is spread out horizontally over several columns. In addition to the year columns, this dataset also contains the following variables: Country Name, Country Code, Indicator Name, and Indicator Code.
  • The analysis to be conducted as set out by Ciara’s discussion post centers on comparing the growth in renewable energy usage across different geographic regions over a ten year period.

Proposed Plan

For each of the three selected datasets, I will first obtain a raw CSV version of the data that preserves its original wide formatting, and then commit it to my personal GitHub repository. Each dataset will then be imported into RStudio via the read_csv() function, after which a reproducible tidying pipeline will be applied using tidyr and dplyr. The main transformation will involve reshaping the wide year-based columns into a tidy long format using pivot_longer(), followed by general standardization and cleaning such as consistent variable naming and well-document handling of missing or inconsistent values. Finally, I will complete the required Discussion 5A analyses for each dataset using only the tidied version, supported by summary tables, visualizations, and brief narrative interpretations.

Potential Challenges

One expected challenge involves missing values within the datasets, especially within the earlier examined years. In this, whether rows are retained (with missing values kept as NAs) or filtered out specific analyses will be noted and documented. Additionally, because these selected datasets span a wide range of countries and years, the resulting visualizations may become cluttered if every if plotted at once. As such, there may be cases where only a subset of countries (for instance, the top or bottom performers based on a calculated metric) will be focused on for clarity, while still maintaining a reproducible workflow.

Dataset One: Crude Birth Rates

Step 3.1: Raw Data Construction

Within the confines of this specific step, the dataset corresponding to the first crude birth rate analyses was downloaded from its discovered location (Kaggle), and pushed to a personal GitHub repository. The file itself contains all information from the original crude birth rates dataset and its wide format has been preserved. The dataset will be imported into RStudio for tidying, and further analyses, in the subsequent step.

Step 3.2: Data Import and Tidying

Now that the raw wide format CSV file has been committed and made accessible, the next step is to import it into RStudio and apply a reproducible tidying pipeline. Since the year values are currently spread across multiple columns, the principal transformation will involve reshaping the dataset from wide to long format through the use of pivot_longer(). Thereafter, the variable names will be standardized, relevant identifier columns retained, and the value column converted into a numeric format suitable for analysis.

Code
library(tidyverse)
library(janitor)
library(dplyr)
library(gt)
Code
birth_rate_url <- "https://raw.githubusercontent.com/bkchanderban/CUNY_SPS/refs/heads/main/DATA607/DATA607/Project%20Two%20Assignment/crude_birth_rate_dataset.csv"

birth_rate_raw <- read_csv(birth_rate_url, show_col_types = FALSE)

glimpse(birth_rate_raw)
Rows: 270
Columns: 67
$ `Data Source`                  <chr> NA, "Last Updated Date", NA, "Country N…
$ `World Development Indicators` <chr> NA, "29/06/2023", NA, "Country Code", "…
$ ...3                           <chr> NA, NA, NA, "Indicator Name", "Birth ra…
$ ...4                           <chr> NA, NA, NA, "Indicator Code", "SP.DYN.C…
$ ...5                           <dbl> NA, NA, NA, 1960.00000, 33.88300, 47.43…
$ ...6                           <dbl> NA, NA, NA, 1961.00000, 32.83100, 47.53…
$ ...7                           <dbl> NA, NA, NA, 1962.00000, 31.64900, 47.59…
$ ...8                           <dbl> NA, NA, NA, 1963.00000, 30.41600, 47.63…
$ ...9                           <dbl> NA, NA, NA, 1964.00000, 29.14700, 47.64…
$ ...10                          <dbl> NA, NA, NA, 1965.00000, 27.88900, 47.66…
$ ...11                          <dbl> NA, NA, NA, 1966.00000, 26.66300, 47.69…
$ ...12                          <dbl> NA, NA, NA, 1967.00000, 25.50300, 47.69…
$ ...13                          <dbl> NA, NA, NA, 1968.00000, 24.59200, 47.69…
$ ...14                          <dbl> NA, NA, NA, 1969.00000, 23.73500, 47.72…
$ ...15                          <dbl> NA, NA, NA, 1970.00000, 22.97400, 47.67…
$ ...16                          <dbl> NA, NA, NA, 1971.00000, 22.31300, 47.64…
$ ...17                          <dbl> NA, NA, NA, 1972.00000, 21.76600, 47.47…
$ ...18                          <dbl> NA, NA, NA, 1973.00000, 21.49200, 47.22…
$ ...19                          <dbl> NA, NA, NA, 1974.00000, 21.38200, 47.07…
$ ...20                          <dbl> NA, NA, NA, 1975.00000, 21.39300, 46.95…
$ ...21                          <dbl> NA, NA, NA, 1976.00000, 21.48500, 46.79…
$ ...22                          <dbl> NA, NA, NA, 1977.00000, 21.73900, 46.63…
$ ...23                          <dbl> NA, NA, NA, 1978.00000, 21.92000, 46.51…
$ ...24                          <dbl> NA, NA, NA, 1979.00000, 21.99300, 46.47…
$ ...25                          <dbl> NA, NA, NA, 1980.00000, 21.93100, 46.33…
$ ...26                          <dbl> NA, NA, NA, 1981.00000, 21.73000, 46.23…
$ ...27                          <dbl> NA, NA, NA, 1982.00000, 21.47600, 46.15…
$ ...28                          <dbl> NA, NA, NA, 1983.00000, 21.38500, 46.13…
$ ...29                          <dbl> NA, NA, NA, 1984.00000, 21.18300, 46.13…
$ ...30                          <dbl> NA, NA, NA, 1985.00000, 20.91900, 46.14…
$ ...31                          <dbl> NA, NA, NA, 1986.00000, 20.61700, 46.06…
$ ...32                          <dbl> NA, NA, NA, 1987.00000, 20.26900, 45.83…
$ ...33                          <dbl> NA, NA, NA, 1988.00000, 19.82100, 45.33…
$ ...34                          <dbl> NA, NA, NA, 1989.00000, 19.18400, 44.81…
$ ...35                          <dbl> NA, NA, NA, 1990.00000, 18.66200, 44.23…
$ ...36                          <dbl> NA, NA, NA, 1991.00000, 17.72200, 43.84…
$ ...37                          <dbl> NA, NA, NA, 1992.00000, 16.44300, 43.34…
$ ...38                          <dbl> NA, NA, NA, 1993.00000, 16.12600, 42.96…
$ ...39                          <dbl> NA, NA, NA, 1994.00000, 15.43100, 42.53…
$ ...40                          <dbl> NA, NA, NA, 1995.00000, 15.99100, 42.48…
$ ...41                          <dbl> NA, NA, NA, 1996.00000, 16.15300, 42.13…
$ ...42                          <dbl> NA, NA, NA, 1997.00000, 16.38800, 41.57…
$ ...43                          <dbl> NA, NA, NA, 1998.00000, 15.07800, 41.12…
$ ...44                          <dbl> NA, NA, NA, 1999.00000, 14.36100, 40.89…
$ ...45                          <dbl> NA, NA, NA, 2000.00000, 14.42700, 40.52…
$ ...46                          <dbl> NA, NA, NA, 2001.00000, 13.73900, 40.34…
$ ...47                          <dbl> NA, NA, NA, 2002.00000, 12.99200, 40.04…
$ ...48                          <dbl> NA, NA, NA, 2003.00000, 12.62100, 39.75…
$ ...49                          <dbl> NA, NA, NA, 2004.00000, 11.92100, 39.57…
$ ...50                          <dbl> NA, NA, NA, 2005.00000, 12.34800, 39.40…
$ ...51                          <dbl> NA, NA, NA, 2006.00000, 13.05500, 39.23…
$ ...52                          <dbl> NA, NA, NA, 2007.00000, 12.96200, 39.00…
$ ...53                          <dbl> NA, NA, NA, 2008.00000, 12.74800, 38.85…
$ ...54                          <dbl> NA, NA, NA, 2009.00000, 12.35000, 38.36…
$ ...55                          <dbl> NA, NA, NA, 2010.00000, 12.19300, 37.94…
$ ...56                          <dbl> NA, NA, NA, 2011.00000, 12.24600, 37.48…
$ ...57                          <dbl> NA, NA, NA, 2012.00000, 12.72300, 36.92…
$ ...58                          <dbl> NA, NA, NA, 2013.00000, 13.31600, 36.44…
$ ...59                          <dbl> NA, NA, NA, 2014.00000, 13.53300, 36.02…
$ ...60                          <dbl> NA, NA, NA, 2015.00000, 12.42800, 35.61…
$ ...61                          <dbl> NA, NA, NA, 2016.00000, 12.30000, 35.18…
$ ...62                          <dbl> NA, NA, NA, 2017.00000, 11.53000, 34.89…
$ ...63                          <dbl> NA, NA, NA, 2018.00000, 9.88100, 34.611…
$ ...64                          <dbl> NA, NA, NA, 2019.00000, 9.13800, 34.341…
$ ...65                          <dbl> NA, NA, NA, 2020.00000, 8.10200, 33.916…
$ ...66                          <dbl> NA, NA, NA, 2021.00000, 7.19300, 33.546…
$ ...67                          <dbl> NA, NA, NA, 2022, NA, NA, NA, NA, NA, N…
Code
head(birth_rate_raw)
# A tibble: 6 × 67
  `Data Source`   World Development In…¹ ...3  ...4    ...5   ...6   ...7   ...8
  <chr>           <chr>                  <chr> <chr>  <dbl>  <dbl>  <dbl>  <dbl>
1 <NA>            <NA>                   <NA>  <NA>    NA     NA     NA     NA  
2 Last Updated D… 29/06/2023             <NA>  <NA>    NA     NA     NA     NA  
3 <NA>            <NA>                   <NA>  <NA>    NA     NA     NA     NA  
4 Country Name    Country Code           Indi… Indi… 1960   1961   1962   1963  
5 Aruba           ABW                    Birt… SP.D…   33.9   32.8   31.6   30.4
6 Africa Eastern… AFE                    Birt… SP.D…   47.4   47.5   47.6   47.6
# ℹ abbreviated name: ¹​`World Development Indicators`
# ℹ 59 more variables: ...9 <dbl>, ...10 <dbl>, ...11 <dbl>, ...12 <dbl>,
#   ...13 <dbl>, ...14 <dbl>, ...15 <dbl>, ...16 <dbl>, ...17 <dbl>,
#   ...18 <dbl>, ...19 <dbl>, ...20 <dbl>, ...21 <dbl>, ...22 <dbl>,
#   ...23 <dbl>, ...24 <dbl>, ...25 <dbl>, ...26 <dbl>, ...27 <dbl>,
#   ...28 <dbl>, ...29 <dbl>, ...30 <dbl>, ...31 <dbl>, ...32 <dbl>,
#   ...33 <dbl>, ...34 <dbl>, ...35 <dbl>, ...36 <dbl>, ...37 <dbl>, …

Now that we have the crude birth rates dataset imported, we can now conduct some feature engineering, beginning with the removal of the first three metadata rows of the CSV file.

Code
birth_rate_cleaned <- birth_rate_raw %>%
  slice(-1,-2,-3) #remove the first three rows

Our next step is take the new first row (which contains strings such as Country Name, Country Code, etc.) and make these the column/variable names instead.

Code
colnames(birth_rate_cleaned) <- as.character(birth_rate_cleaned[1,])

birth_rate_cleaned <- birth_rate_cleaned %>%
  slice(-1) #remove the first row that was used to created the  headers

head(birth_rate_cleaned)
# A tibble: 6 × 67
  `Country Name`  `Country Code` `Indicator Name` `Indicator Code` `1960` `1961`
  <chr>           <chr>          <chr>            <chr>             <dbl>  <dbl>
1 Aruba           ABW            Birth rate, cru… SP.DYN.CBRT.IN     33.9   32.8
2 Africa Eastern… AFE            Birth rate, cru… SP.DYN.CBRT.IN     47.4   47.5
3 Afghanistan     AFG            Birth rate, cru… SP.DYN.CBRT.IN     50.3   50.4
4 Africa Western… AFW            Birth rate, cru… SP.DYN.CBRT.IN     47.3   47.4
5 Angola          AGO            Birth rate, cru… SP.DYN.CBRT.IN     51.0   51.3
6 Albania         ALB            Birth rate, cru… SP.DYN.CBRT.IN     41.1   40.3
# ℹ 61 more variables: `1962` <dbl>, `1963` <dbl>, `1964` <dbl>, `1965` <dbl>,
#   `1966` <dbl>, `1967` <dbl>, `1968` <dbl>, `1969` <dbl>, `1970` <dbl>,
#   `1971` <dbl>, `1972` <dbl>, `1973` <dbl>, `1974` <dbl>, `1975` <dbl>,
#   `1976` <dbl>, `1977` <dbl>, `1978` <dbl>, `1979` <dbl>, `1980` <dbl>,
#   `1981` <dbl>, `1982` <dbl>, `1983` <dbl>, `1984` <dbl>, `1985` <dbl>,
#   `1986` <dbl>, `1987` <dbl>, `1988` <dbl>, `1989` <dbl>, `1990` <dbl>,
#   `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, …

And now we can clean our column names for best practices.

Code
birth_rate_cleaned <- birth_rate_cleaned %>%
  clean_names()

glimpse(birth_rate_cleaned)
Rows: 266
Columns: 67
$ country_name   <chr> "Aruba", "Africa Eastern and Southern", "Afghanistan", …
$ country_code   <chr> "ABW", "AFE", "AFG", "AFW", "AGO", "ALB", "AND", "ARB",…
$ indicator_name <chr> "Birth rate, crude (per 1,000 people)", "Birth rate, cr…
$ indicator_code <chr> "SP.DYN.CBRT.IN", "SP.DYN.CBRT.IN", "SP.DYN.CBRT.IN", "…
$ x1960          <dbl> 33.88300, 47.43855, 50.34000, 47.32548, 51.02600, 41.06…
$ x1961          <dbl> 32.83100, 47.53055, 50.44300, 47.42105, 51.28200, 40.33…
$ x1962          <dbl> 31.64900, 47.59756, 50.57000, 47.52922, 51.31600, 39.21…
$ x1963          <dbl> 30.41600, 47.63614, 50.70300, 47.53103, 51.32300, 38.14…
$ x1964          <dbl> 29.14700, 47.64548, 50.83100, 47.51192, 51.28200, 36.82…
$ x1965          <dbl> 27.88900, 47.66766, 50.87200, 47.46857, 51.28200, 35.36…
$ x1966          <dbl> 26.66300, 47.69789, 50.98600, 47.44364, 51.29500, 34.35…
$ x1967          <dbl> 25.50300, 47.69133, 51.08100, 47.42593, 51.31400, 33.89…
$ x1968          <dbl> 24.59200, 47.69102, 51.14800, 47.42235, 51.34800, 33.51…
$ x1969          <dbl> 23.73500, 47.72112, 51.19500, 47.41269, 51.35300, 33.07…
$ x1970          <dbl> 22.97400, 47.67313, 51.12200, 47.41411, 51.26700, 32.20…
$ x1971          <dbl> 22.31300, 47.64967, 51.16300, 47.52970, 50.69800, 31.74…
$ x1972          <dbl> 21.76600, 47.47074, 51.10900, 47.57899, 50.47400, 31.23…
$ x1973          <dbl> 21.49200, 47.22113, 51.11400, 47.63283, 50.46700, 30.69…
$ x1974          <dbl> 21.38200, 47.07547, 51.13500, 47.81713, 50.47200, 30.29…
$ x1975          <dbl> 21.39300, 46.95020, 51.01800, 47.91150, 50.46900, 29.98…
$ x1976          <dbl> 21.48500, 46.79183, 50.93500, 47.86907, 50.51400, 29.35…
$ x1977          <dbl> 21.73900, 46.63214, 50.92100, 47.96894, 50.52300, 28.58…
$ x1978          <dbl> 21.92000, 46.51202, 50.81600, 48.03727, 50.61600, 27.88…
$ x1979          <dbl> 21.99300, 46.47196, 50.73700, 47.93830, 50.73200, 27.20…
$ x1980          <dbl> 21.93100, 46.33961, 50.48200, 47.77071, 50.89200, 26.76…
$ x1981          <dbl> 21.73000, 46.23755, 50.26400, 47.51406, 51.10900, 26.65…
$ x1982          <dbl> 21.47600, 46.15826, 50.13800, 47.25192, 51.30700, 26.79…
$ x1983          <dbl> 21.38500, 46.13473, 50.13900, 47.11112, 51.61000, 26.72…
$ x1984          <dbl> 21.18300, 46.13520, 50.23500, 46.70656, 51.93500, 26.62…
$ x1985          <dbl> 20.91900, 46.14379, 50.55300, 46.20665, 52.13600, 26.29…
$ x1986          <dbl> 20.61700, 46.06749, 50.72800, 45.72924, 52.19000, 25.93…
$ x1987          <dbl> 20.26900, 45.83058, 50.84500, 45.34627, 52.14600, 25.68…
$ x1988          <dbl> 19.82100, 45.33736, 50.98000, 45.00171, 51.97300, 25.36…
$ x1989          <dbl> 19.18400, 44.81301, 51.16200, 44.92848, 51.69900, 24.86…
$ x1990          <dbl> 18.66200, 44.23072, 51.42300, 44.67619, 51.34400, 24.41…
$ x1991          <dbl> 17.72200, 43.84232, 51.78800, 44.47423, 50.92600, 23.72…
$ x1992          <dbl> 16.44300, 43.34168, 51.94800, 44.30932, 50.37400, 23.14…
$ x1993          <dbl> 16.12600, 42.96601, 52.03800, 44.16810, 49.89300, 22.47…
$ x1994          <dbl> 15.43100, 42.53329, 52.17400, 43.94269, 49.55000, 22.24…
$ x1995          <dbl> 15.99100, 42.48572, 52.07300, 43.73024, 49.18500, 21.80…
$ x1996          <dbl> 16.15300, 42.13563, 51.87300, 43.49103, 48.86000, 20.98…
$ x1997          <dbl> 16.38800, 41.57346, 51.40000, 43.21922, 48.41200, 19.81…
$ x1998          <dbl> 15.07800, 41.12879, 50.88000, 43.02697, 48.00900, 18.80…
$ x1999          <dbl> 14.36100, 40.89482, 50.35100, 43.17424, 47.77300, 17.98…
$ x2000          <dbl> 14.42700, 40.52824, 49.66400, 43.19955, 47.64700, 17.07…
$ x2001          <dbl> 13.73900, 40.34121, 48.97900, 43.07550, 47.57400, 16.12…
$ x2002          <dbl> 12.99200, 40.04732, 48.20100, 42.92712, 47.44800, 15.20…
$ x2003          <dbl> 12.62100, 39.75014, 47.35000, 42.74688, 47.22600, 14.66…
$ x2004          <dbl> 11.92100, 39.57589, 46.33000, 42.50272, 47.09900, 13.97…
$ x2005          <dbl> 12.34800, 39.40739, 45.26300, 42.42154, 46.94400, 13.23…
$ x2006          <dbl> 13.05500, 39.23711, 44.72100, 42.19330, 46.64300, 12.53…
$ x2007          <dbl> 12.96200, 39.00052, 43.85800, 41.94301, 46.29000, 12.07…
$ x2008          <dbl> 12.74800, 38.85169, 41.50600, 41.75479, 45.88900, 11.86…
$ x2009          <dbl> 12.35000, 38.36494, 41.15700, 41.50376, 45.49500, 11.84…
$ x2010          <dbl> 12.19300, 37.94026, 40.60200, 41.21963, 44.97000, 11.92…
$ x2011          <dbl> 12.24600, 37.48399, 39.85500, 40.89424, 44.36400, 12.03…
$ x2012          <dbl> 12.72300, 36.92130, 40.00900, 40.41643, 43.86000, 12.22…
$ x2013          <dbl> 13.31600, 36.44714, 39.60100, 39.85651, 43.28200, 12.27…
$ x2014          <dbl> 13.53300, 36.02832, 39.10500, 39.33535, 42.67600, 12.15…
$ x2015          <dbl> 12.42800, 35.61331, 38.80300, 38.85921, 42.02000, 11.71…
$ x2016          <dbl> 12.30000, 35.18902, 37.93600, 38.39310, 41.37700, 11.28…
$ x2017          <dbl> 11.53000, 34.89254, 37.34200, 37.88166, 40.81000, 10.87…
$ x2018          <dbl> 9.88100, 34.61102, 36.92700, 37.44709, 40.23600, 10.517…
$ x2019          <dbl> 9.13800, 34.34145, 36.46600, 37.02783, 39.72500, 10.343…
$ x2020          <dbl> 8.10200, 33.91675, 36.05100, 36.61573, 39.27100, 10.285…
$ x2021          <dbl> 7.19300, 33.54627, 35.84200, 36.23703, 38.80900, 10.240…
$ x2022          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
Code
head(birth_rate_cleaned)
# A tibble: 6 × 67
  country_name      country_code indicator_name indicator_code x1960 x1961 x1962
  <chr>             <chr>        <chr>          <chr>          <dbl> <dbl> <dbl>
1 Aruba             ABW          Birth rate, c… SP.DYN.CBRT.IN  33.9  32.8  31.6
2 Africa Eastern a… AFE          Birth rate, c… SP.DYN.CBRT.IN  47.4  47.5  47.6
3 Afghanistan       AFG          Birth rate, c… SP.DYN.CBRT.IN  50.3  50.4  50.6
4 Africa Western a… AFW          Birth rate, c… SP.DYN.CBRT.IN  47.3  47.4  47.5
5 Angola            AGO          Birth rate, c… SP.DYN.CBRT.IN  51.0  51.3  51.3
6 Albania           ALB          Birth rate, c… SP.DYN.CBRT.IN  41.1  40.3  39.2
# ℹ 60 more variables: x1963 <dbl>, x1964 <dbl>, x1965 <dbl>, x1966 <dbl>,
#   x1967 <dbl>, x1968 <dbl>, x1969 <dbl>, x1970 <dbl>, x1971 <dbl>,
#   x1972 <dbl>, x1973 <dbl>, x1974 <dbl>, x1975 <dbl>, x1976 <dbl>,
#   x1977 <dbl>, x1978 <dbl>, x1979 <dbl>, x1980 <dbl>, x1981 <dbl>,
#   x1982 <dbl>, x1983 <dbl>, x1984 <dbl>, x1985 <dbl>, x1986 <dbl>,
#   x1987 <dbl>, x1988 <dbl>, x1989 <dbl>, x1990 <dbl>, x1991 <dbl>,
#   x1992 <dbl>, x1993 <dbl>, x1994 <dbl>, x1995 <dbl>, x1996 <dbl>, …

The next step will now comprise executing the pivot_longer() functionality, converting dataset from its wide format into a long one.

Code
birth_rate_tidy <- birth_rate_cleaned %>%
  pivot_longer(
    cols = -c(country_name, country_code, indicator_name, indicator_code),
    names_to = "year",
    values_to = "birth_rate"
  ) %>%
mutate(
  year = str_extract(year, "\\d{4}") %>% as.integer(),
  birth_rate = as.numeric(birth_rate)
)

glimpse(birth_rate_tidy)
Rows: 16,758
Columns: 6
$ country_name   <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "…
$ country_code   <chr> "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW",…
$ indicator_name <chr> "Birth rate, crude (per 1,000 people)", "Birth rate, cr…
$ indicator_code <chr> "SP.DYN.CBRT.IN", "SP.DYN.CBRT.IN", "SP.DYN.CBRT.IN", "…
$ year           <int> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1…
$ birth_rate     <dbl> 33.883, 32.831, 31.649, 30.416, 29.147, 27.889, 26.663,…
Code
head(birth_rate_tidy)
# A tibble: 6 × 6
  country_name country_code indicator_name       indicator_code  year birth_rate
  <chr>        <chr>        <chr>                <chr>          <int>      <dbl>
1 Aruba        ABW          Birth rate, crude (… SP.DYN.CBRT.IN  1960       33.9
2 Aruba        ABW          Birth rate, crude (… SP.DYN.CBRT.IN  1961       32.8
3 Aruba        ABW          Birth rate, crude (… SP.DYN.CBRT.IN  1962       31.6
4 Aruba        ABW          Birth rate, crude (… SP.DYN.CBRT.IN  1963       30.4
5 Aruba        ABW          Birth rate, crude (… SP.DYN.CBRT.IN  1964       29.1
6 Aruba        ABW          Birth rate, crude (… SP.DYN.CBRT.IN  1965       27.9

At this stage, the dataset has now been converted into a tidy format, with one row representing one country-year observation. Any missing birth rate values will be retained as NAs (a function of pivot_longer() ) for now, and will subsequently only be filtered out where necessary for a given analysis, so as not to compromise the original informational structure of the dataset.

We will then proceed onto the next step for this first crude birth rates dataset.

Step 3.3: Analysis

The first analysis will involve examining broad birth rate trends over time across all countries.

Summary Table: Average Crude Birth Rate By Year

One prospective starting point is to commence by calculating the average birth rate across all included countries for each year.

Code
birth_rate_summary <- birth_rate_tidy %>%
  group_by(year) %>%
  summarise(
    average_birth_rate = mean(birth_rate, na.rm = TRUE),
    .groups = "drop"
  )

birth_rate_summary %>%
  gt() %>%
  tab_header(
    title = "Average Birth Rate by Year",
    subtitle = "Summary of annual birth rates"
  ) %>%
  cols_label(
    year = "Year",
    average_birth_rate = "Average Birth Rate"
  ) %>%
  fmt_number(
    columns = average_birth_rate,
    decimals = 2
  )
Average Birth Rate by Year
Summary of annual birth rates
Year Average Birth Rate
1960 38.10
1961 37.72
1962 37.83
1963 37.71
1964 37.18
1965 36.66
1966 36.08
1967 35.73
1968 35.37
1969 34.99
1970 34.55
1971 34.30
1972 33.95
1973 33.47
1974 33.18
1975 32.84
1976 32.49
1977 32.23
1978 32.05
1979 32.00
1980 31.91
1981 31.82
1982 31.68
1983 31.46
1984 31.14
1985 30.90
1986 30.55
1987 30.20
1988 29.86
1989 29.33
1990 28.97
1991 28.49
1992 27.99
1993 27.44
1994 26.86
1995 26.36
1996 25.89
1997 25.47
1998 24.98
1999 24.61
2000 24.27
2001 23.88
2002 23.54
2003 23.28
2004 23.07
2005 22.88
2006 22.73
2007 22.66
2008 22.63
2009 22.47
2010 22.24
2011 22.08
2012 21.86
2013 21.54
2014 21.30
2015 20.89
2016 20.50
2017 20.11
2018 19.66
2019 19.32
2020 18.99
2021 18.77
2022 NaN

The summary table exhibits a broad overview of how crude birth rates have changed over time on average across all of the included countries. We can take the above a step further and attempt to construct a visualization from these insights.

Visualization: Average Crude Birth Rate Over Time

Code
ggplot(data = birth_rate_summary, aes(x = year, y = average_birth_rate)) +
  geom_line() +
  labs(
    title = "Average Crude Birth Over Time",
    x = "Year",
    y = "Average Crude Birth Rate"
  ) +
  theme(plot.title = element_text(hjust = 0.5))

In acknowledging the afore-presented plot of average crude birth over time, we can notice, perhaps somewhat alarmingly, that on a holistic average, worldwide crude birth rates have evidently decreased rather noticeably within the time frame of the utilized data (from 1960 to 2021). This is evidenced by the negative gradient of the plotted line of values.

Country-Level Comparison

In addition to the overall trend, it may also be beneficial to compare selected countries in terms of how their birth rates have changed over time. For lucidity, the following analysis shall be limited to a chosen number (5) of countries.

Code
selected_countries <- c("Guyana", "United States", "India", "Brazil", "Nigeria")

birth_rate_selected <- birth_rate_tidy %>%
  filter(country_name %in% selected_countries)

ggplot(birth_rate_selected, aes(x = year, y = birth_rate, color = country_name)) +
  geom_line() +
  labs(
    title = "Crude Birth Rate for Selected Countries",
    x = "Year",
    y = "Crude Birth Rate",
    color = "Country"
  ) +
  theme(plot.title = element_text(hjust = 0.5))

This visual comparison above allows us to easier interpret and compare the numeric findings from the tidied birth rate dataframe. In this, we can see that of the five examined countries, the United States observed the lowest crude birth rate figures throughout the entirety of the year range. Inversely, Nigeria, throughout all of the years, maintained the highest birth rates. The remaining three countries (Guyana, India, and Brazil) commenced at almost the same level as Nigeria’s had, but in the course of time, these rates fell to almost the same level as that of the USA. On a whole however, all five countries do follow the general declining trend, which was initially observed in the average plot of crude birth rates (across all of the countries within the initial dataset).

Brief Interpretation/Dataset Conclusion

From the analyses conducted upon the tidied crude birth rate dataset, a clear downward trend in birth rates over time can be observed. The computed yearly averages indicate that, on a global scale, crude birth rates have steadily declined from the early 1960s through to the most recent years of available data. This pattern is also reflected within the country-level comparison, where each of the five examined countries demonstrates a similar long-term decline, albeit at different magnitudes. Nigeria consistently maintains the highest crude birth rates throughout the observed time frame, while the United States exhibits the lowest overall values. Meanwhile, countries such as Guyana, India, and Brazil begin the period with relatively high birth rates but gradually converge toward lower levels over time. Collectively, these findings suggest that declining fertility trends are not isolated to specific countries, but instead represent a broader demographic shift that has occurred globally over the past several decades.

Dataset Two: GDP by Country

Step 3.1: Raw Data Construction

At this stage, the CSV file pertaining to the second GDP by Country dataset was first downloaded from its discerned location (Kaggle, which was indicated by classmate Sinem Kilicdere), and pushed to a personal GitHub repository. The file preserved the original data structure, wherein each year is represented as a separate column, and will now be imported into RStudio for tidying and subsequent analysis.

Step 3.2: Data Import and Tidying

Now that the raw wide format GDP dataset has been committed and made accessible, the next step is to import it into RStudio and apply a reproducible tidying pipeline. Since the year values are already spread across several columns, the principal transformation here will again involve reshaping the dataset from a wide to a long format using pivot_longer(). Thereafter, the variable names will be standardized and the GDP figures converted into a numeric format suitable for downstream analysis.

Code
gdp_url <- "https://raw.githubusercontent.com/bkchanderban/CUNY_SPS/refs/heads/main/DATA607/DATA607/Project%20Two%20Assignment/country_gdp_dataset.csv"

gdp_raw <- read_csv(gdp_url, show_col_types = FALSE)

glimpse(gdp_raw)
Rows: 266
Columns: 65
$ Country        <chr> "Aruba", "Africa Eastern and Southern", "Afghanistan", …
$ `Country Code` <chr> "ABW", "AFE", "AFG", "AFW", "AGO", "ALB", "AND", "ARB",…
$ `1960`         <dbl> NA, 21125015452, 537777811, 10447637853, NA, NA, NA, NA…
$ `1961`         <dbl> NA, 21616228139, 548888896, 11173212080, NA, NA, NA, NA…
$ `1962`         <dbl> NA, 23506279900, 546666678, 11990534018, NA, NA, NA, NA…
$ `1963`         <dbl> NA, 28048360188, 751111191, 12727688165, NA, NA, NA, NA…
$ `1964`         <dbl> NA, 25920665260, 800000044, 13898109284, NA, NA, NA, NA…
$ `1965`         <dbl> NA, 29472103270, 1006666638, 14929792388, NA, NA, NA, N…
$ `1966`         <dbl> NA, 32014368121, 1399999967, 15910837742, NA, NA, NA, N…
$ `1967`         <dbl> NA, 33269509510, 1673333418, 14510579889, NA, NA, NA, N…
$ `1968`         <dbl> NA, 36327785495, 1373333367, 14968235782, NA, NA, NA, 3…
$ `1969`         <dbl> NA, 41638967621, 1408888922, 16979315745, NA, NA, NA, 3…
$ `1970`         <dbl> NA, 44629891649, 1748886596, 23596163865, NA, NA, 78617…
$ `1971`         <dbl> NA, 49173371529, 1831108971, 20936358634, NA, NA, 89406…
$ `1972`         <dbl> NA, 53123459912, 1595555476, 25386169423, NA, NA, 11341…
$ `1973`         <dbl> NA, 69482723444, 1733333264, 31975594565, NA, NA, 15084…
$ `1974`         <dbl> NA, 85380645042, 2155555498, 44416677335, NA, NA, 18655…
$ `1975`         <dbl> NA, 90835426418, 2366666616, 51667190242, NA, NA, 22011…
$ `1976`         <dbl> NA, 90212747243, 2555555567, 62351622300, NA, NA, 22728…
$ `1977`         <dbl> NA, 102241000000, 2953333418, 65595122956, NA, NA, 2539…
$ `1978`         <dbl> NA, 116085000000, 3300000109, 71496496574, NA, NA, 3080…
$ `1979`         <dbl> NA, 134257000000, 3697940410, 88948338390, NA, NA, 4115…
$ `1980`         <dbl> NA, 171218000000, 3641723322, 112439000000, 5930503401,…
$ `1981`         <dbl> NA, 175859000000, 3478787909, 211338000000, 5550483036,…
$ `1982`         <dbl> NA, 168096000000, NA, 187449000000, 5550483036, NA, 375…
$ `1983`         <dbl> NA, 175565000000, NA, 138384000000, 5784341596, NA, 327…
$ `1984`         <dbl> NA, 160647000000, NA, 114516000000, 6131475065, 1857338…
$ `1985`         <dbl> NA, 136759000000, NA, 116777000000, 7554065410, 1897050…
$ `1986`         <dbl> 405586592, 153050000000, NA, 107887000000, 7072536109, …
$ `1987`         <dbl> 487709497, 186658000000, NA, 110729000000, 8084412414, …
$ `1988`         <dbl> 596648045, 204766000000, NA, 109439000000, 8769836769, …
$ `1989`         <dbl> 695530726, 218242000000, NA, 102255000000, 10201780977,…
$ `1990`         <dbl> 764804469, 254062000000, NA, 122387000000, 11229515599,…
$ `1991`         <dbl> 872067039, 276857000000, NA, 118040000000, 12704558517,…
$ `1992`         <dbl> 958659218, 246088000000, NA, 118893000000, 15114352005,…
$ `1993`         <dbl> 1083240223, 242926000000, NA, 99272180411, 11051939102,…
$ `1994`         <dbl> 1245810056, 239611000000, NA, 86636400266, 3390500000, …
$ `1995`         <dbl> 1320670391, 270327000000, NA, 108691000000, 5561222222,…
$ `1996`         <dbl> 1379888268, 269491000000, NA, 126287000000, 7526963964,…
$ `1997`         <dbl> 1531843575, 283446000000, NA, 127602000000, 7648377413,…
$ `1998`         <dbl> 1665363128, 266652000000, NA, 130678000000, 6506229607,…
$ `1999`         <dbl> 1722905028, 263025000000, NA, 138086000000, 6152922943,…
$ `2000`         <dbl> 1873184358, 284759000000, NA, 140946000000, 9129594819,…
$ `2001`         <dbl> 1896648045, 259643000000, NA, 148530000000, 8936079253,…
$ `2002`         <dbl> 1962011173, 266529000000, 3854235264, 177201000000, 152…
$ `2003`         <dbl> 2044134078, 354177000000, 4539496563, 205214000000, 178…
$ `2004`         <dbl> 2.254749e+09, 4.404820e+11, 5.220825e+09, 2.542650e+11,…
$ `2005`         <dbl> 2.359777e+09, 5.139420e+11, 6.226199e+09, 3.108900e+11,…
$ `2006`         <dbl> 2.469832e+09, 5.775870e+11, 6.971383e+09, 3.969210e+11,…
$ `2007`         <dbl> 2.677654e+09, 6.628680e+11, 9.715765e+09, 4.654860e+11,…
$ `2008`         <dbl> 2.843017e+09, 7.105360e+11, 1.024977e+10, 5.677910e+11,…
$ `2009`         <dbl> 2.553631e+09, 7.219010e+11, 1.215484e+10, 5.083630e+11,…
$ `2010`         <dbl> 2.453631e+09, 8.635200e+11, 1.563384e+10, 5.985220e+11,…
$ `2011`         <dbl> 2.637989e+09, 9.678250e+11, 1.819041e+10, 6.820160e+11,…
$ `2012`         <dbl> 2.615084e+09, 9.753550e+11, 2.020357e+10, 7.375890e+11,…
$ `2013`         <dbl> 2.727933e+09, 9.859870e+11, 2.056449e+10, 8.339480e+11,…
$ `2014`         <dbl> 2.791061e+09, 1.006530e+12, 2.055058e+10, 8.943220e+11,…
$ `2015`         <dbl> 2.963128e+09, 9.273480e+11, 1.999814e+10, 7.686450e+11,…
$ `2016`         <dbl> 2.983799e+09, 8.851760e+11, 1.801955e+10, 6.913630e+11,…
$ `2017`         <dbl> 3.092179e+09, 1.021040e+12, 1.889635e+10, 6.848990e+11,…
$ `2018`         <dbl> 3.276188e+09, 1.007200e+12, 1.841886e+10, 7.670260e+11,…
$ `2019`         <dbl> 3.395794e+09, 1.000830e+12, 1.890450e+10, 8.225380e+11,…
$ `2020`         <dbl> 2.610039e+09, 9.275930e+11, 2.014345e+10, 7.864600e+11,…
$ `2021`         <dbl> 3.126019e+09, 1.082000e+12, 1.458314e+10, 8.444600e+11,…
$ `2022`         <dbl> NA, 1.169480e+12, NA, 8.778630e+11, 1.067140e+11, 1.888…
Code
head(gdp_raw)
# A tibble: 6 × 65
  Country   `Country Code`   `1960`   `1961`   `1962`   `1963`   `1964`   `1965`
  <chr>     <chr>             <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1 Aruba     ABW            NA       NA       NA       NA       NA       NA      
2 Africa E… AFE             2.11e10  2.16e10  2.35e10  2.80e10  2.59e10  2.95e10
3 Afghanis… AFG             5.38e 8  5.49e 8  5.47e 8  7.51e 8  8.00e 8  1.01e 9
4 Africa W… AFW             1.04e10  1.12e10  1.20e10  1.27e10  1.39e10  1.49e10
5 Angola    AGO            NA       NA       NA       NA       NA       NA      
6 Albania   ALB            NA       NA       NA       NA       NA       NA      
# ℹ 57 more variables: `1966` <dbl>, `1967` <dbl>, `1968` <dbl>, `1969` <dbl>,
#   `1970` <dbl>, `1971` <dbl>, `1972` <dbl>, `1973` <dbl>, `1974` <dbl>,
#   `1975` <dbl>, `1976` <dbl>, `1977` <dbl>, `1978` <dbl>, `1979` <dbl>,
#   `1980` <dbl>, `1981` <dbl>, `1982` <dbl>, `1983` <dbl>, `1984` <dbl>,
#   `1985` <dbl>, `1986` <dbl>, `1987` <dbl>, `1988` <dbl>, `1989` <dbl>,
#   `1990` <dbl>, `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>,
#   `1995` <dbl>, `1996` <dbl>, `1997` <dbl>, `1998` <dbl>, `1999` <dbl>, …

As can be observed, the dataset is already structured with the first row containing the relevant variable names, so no metadata-row removal is required in this case. The next step is to clean the column names and convert the dataset into a long format.

Code
gdp_tidy <- gdp_raw %>%
  clean_names() %>%
  pivot_longer(
    cols = -c(country, country_code),
    names_to = "year",
    values_to = "gdp"
  ) %>%
  mutate(
    year = str_extract(year, "\\d{4}") %>% as.integer(),
    gdp = as.numeric(gdp)
  )

glimpse(gdp_tidy)
Rows: 16,758
Columns: 4
$ country      <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Ar…
$ country_code <chr> "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "…
$ year         <int> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 196…
$ gdp          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
Code
head(gdp_tidy)
# A tibble: 6 × 4
  country country_code  year   gdp
  <chr>   <chr>        <int> <dbl>
1 Aruba   ABW           1960    NA
2 Aruba   ABW           1961    NA
3 Aruba   ABW           1962    NA
4 Aruba   ABW           1963    NA
5 Aruba   ABW           1964    NA
6 Aruba   ABW           1965    NA

At this stage, the GDP dataset has now been transformed into a tidy format, with one row representing one country-year observation. Missing GDP values will be retained as NAs for now and only filtered out where necessary for specific analyses, so as not to prematurely reduce the informational content of the original data.

We may now proceed on to the analytical step for this second dataset.

Step 3.3: Analysis

The first analysis to be conducted upon the tidied GDP dataset will involve examining broad GDP trends over time across all included countries.

Summary Table: Average GDP by Year

Code
gdp_summary <- gdp_tidy %>%
  group_by(year) %>%
  summarise(
    average_gdp = mean(gdp, na.rm = TRUE),
    .groups = "drop"
  )

gdp_summary %>%
  gt() %>%
  tab_header(
    title = "Average GDP by Year",
    subtitle = "Summary of annual GDP values (USD billions)"
  ) %>%
  cols_label(
    year = "Year",
    average_gdp = "Average GDP (Billion USD)"
  ) %>%
  fmt_number(
    columns = average_gdp,
    decimals = 2,
    scale_by = 1e-9
  )
Average GDP by Year
Summary of annual GDP values (USD billions)
Year Average GDP (Billion USD)
1960 71.27
1961 72.75
1962 76.40
1963 82.36
1964 90.48
1965 91.83
1966 102.17
1967 105.98
1968 111.11
1969 122.87
1970 132.99
1971 144.00
1972 165.88
1973 204.76
1974 238.06
1975 262.66
1976 282.67
1977 313.95
1978 370.46
1979 428.40
1980 453.33
1981 463.51
1982 454.57
1983 454.35
1984 466.44
1985 482.14
1986 558.57
1987 619.86
1988 681.47
1989 710.55
1990 755.93
1991 778.78
1992 826.52
1993 831.35
1994 885.75
1995 953.09
1996 982.13
1997 981.06
1998 970.39
1999 993.18
2000 1,009.81
2001 998.93
2002 1,018.48
2003 1,149.29
2004 1,306.61
2005 1,431.44
2006 1,565.92
2007 1,794.23
2008 1,997.00
2009 1,895.74
2010 2,097.25
2011 2,322.63
2012 2,402.47
2013 2,478.71
2014 2,540.06
2015 2,401.52
2016 2,435.92
2017 2,611.09
2018 2,773.99
2019 2,826.37
2020 2,739.71
2021 3,211.35
2022 3,599.79

The above table provides a visual representation of how GDP has generally changed over time across all included countries. In this, we can observe substantial increases, from an average of 71.27 Billion USD in 1960 to approximately 3,599.79 Billion USD in 2022.

Country Level Comparison

In addition to the overall trend, it may also be useful to compare GDP growth across a selected number of countries. For clarity, the following comparison will be limited to six countries.

Code
selected_gdp_countries <- c("United States", "Guyana", "China", "India", "Brazil", "Nigeria")

gdp_selected <- gdp_tidy %>%
  filter(country %in% selected_gdp_countries)

ggplot(gdp_selected, aes(x = year, y = gdp, color = country)) +
  geom_line() +
  scale_y_log10() +
  labs(
    title = "GDP Trends for Selected Countries",
    x = "Year",
    y = "GDP",
    color = "Country"
  ) +
  theme(plot.title = element_text(hjust = 0.5))

The visual comparison above allows us to more easily interpret and compare the GDP growth trends of the selected countries over time. In general, all six countries exhibit an overall upward trend in GDP, though at very different magnitudes. China demonstrates the most pronounced acceleration in growth, especially from the early 2000s onward. India and Brazil also show sustained long-term growth, Nigeria displays a generally rising but somewhat more variable pattern, and Guyana, though remaining the smallest economy among the selected countries, still shows noticeable expansion in the later years.

It should also be noted that the y-axis has been transformed to a logarithmic (base-10) scale, meaning that the plotted values no longer represent equal absolute increases, but rather proportional ones; that is, equal vertical distances now correspond to multiplicative changes in GDP, which makes it easier to compare the growth trajectories of both larger and smaller economies within the same visualization.

Summary Table: Top-Performing Countries by Economic Output

The final analytical endeavor for this second dataset, and the last of classmate Sinem’s question to be addressed is that of determining the top-performing countries in terms of economic output.

Code
library(countrycode)

valid_codes <- unique(countrycode::codelist$iso3c)

top_gdp_countries <- gdp_tidy %>%
  filter(country_code %in% valid_codes) %>%
  filter(year == max(year, na.rm = TRUE)) %>%
  drop_na(gdp) %>%
  arrange(desc(gdp)) %>%
  slice_head(n = 10) %>%
  mutate(gdp_billions_usd = gdp / 1e9)

top_gdp_countries %>%
  select(country, country_code, year, gdp_billions_usd) %>%
  gt() %>%
  tab_header(
    title = "Top-Performing Countries by GDP",
    subtitle = "Highest GDP values in the most recent year available"
  ) %>%
  cols_label(
    country = "Country",
    country_code = "Country Code",
    year = "Year",
    gdp_billions_usd = "GDP (Billions USD)"
  ) %>%
  fmt_number(
    columns = gdp_billions_usd,
    decimals = 2
  )
Top-Performing Countries by GDP
Highest GDP values in the most recent year available
Country Country Code Year GDP (Billions USD)
United States USA 2022 25,462.70
China CHN 2022 17,963.20
Japan JPN 2022 4,231.14
Germany DEU 2022 4,072.19
India IND 2022 3,385.09
United Kingdom GBR 2022 3,070.67
France FRA 2022 2,782.91
Russian Federation RUS 2022 2,240.42
Canada CAN 2022 2,139.84
Italy ITA 2022 2,010.43

The above summary table highlights the countries with the highest levels of economic output in the most recent year available within the dataset. From this ranking, it is evident that the United States and China occupy the top two positions by a substantial margin, reflecting the scale of their respective economies. The remaining countries (such as Japan, Germany, India, the United Kingdom, etc.) also demonstrate significant economic output, though at levels notably below the two leading economies. Overall, the table illustrates how a relatively small number of countries account for a large share of global GDP, emphasizing the concentration of economic activity among the world’s largest economies.

Brief Interpretation/Dataset Conclusion

From the analyses conducted on the tidied GDP dataset, a clear long-term expansion in economic output can be observed across the examined period. The average GDP values show a steady increase from the early decades of the dataset to the most recent year available, indicating substantial global economic growth over time. The country-level comparison further illustrates that while many countries experience upward economic trajectories, the magnitude of growth differs considerably between them. Larger economies such as the United States and China dominate the global GDP figures, while other countries exhibit notable but comparatively smaller levels of economic output. Overall, these results suggest that although economic growth has occurred broadly across countries, global economic activity remains concentrated among a relatively small number of the world’s largest economies.

Dataset Three: Renewable Energy Consumption

Step 3.1: Raw Data Construction

At this stage, the renewable energy consumption dataset was obtained from the World Bank source originally identified in classmate Ciara Bonnett-Jones’ discussion post, and then committed to my personal GitHub repository in its original wide format. The file preserves the dataset’s original structure, wherein the year values are spread across multiple columns, and will now be imported into RStudio for tidying and subsequent analysis.

Step 3.2: Data Import and Tidying

Now that the raw wide format renewable energy dataset has been committed and made accessible, the next step is to import it into RStudio and apply a reproducible tidying pipeline. As with the crude birth rate dataset, this file contains metadata rows before the true variable names begin, so these rows must first be removed. Thereafter, the dataset can be reshaped from wide to long format using pivot_longer(), with the year values converted into a tidy variable and the renewable energy figures converted into numeric form for analysis.

Code
renewable_url <- "https://raw.githubusercontent.com/bkchanderban/CUNY_SPS/refs/heads/main/DATA607/DATA607/Project%20Two%20Assignment/renewable_energy_consumption_dataset.csv"

renewable_raw <- read_csv(renewable_url, col_names = FALSE, show_col_types = FALSE)

glimpse(renewable_raw)
Rows: 271
Columns: 70
$ X1  <chr> "Data Source", NA, "Last Updated Date", NA, "Country Name", "Aruba…
$ X2  <chr> "World Development Indicators", NA, "24/02/2026", NA, "Country Cod…
$ X3  <chr> NA, NA, NA, NA, "Indicator Name", "Renewable energy consumption (%…
$ X4  <chr> NA, NA, NA, NA, "Indicator Code", "EG.FEC.RNEW.ZS", "EG.FEC.RNEW.Z…
$ X5  <dbl> NA, NA, NA, NA, 1960, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X6  <dbl> NA, NA, NA, NA, 1961, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X7  <dbl> NA, NA, NA, NA, 1962, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X8  <dbl> NA, NA, NA, NA, 1963, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X9  <dbl> NA, NA, NA, NA, 1964, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X10 <dbl> NA, NA, NA, NA, 1965, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X11 <dbl> NA, NA, NA, NA, 1966, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X12 <dbl> NA, NA, NA, NA, 1967, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X13 <dbl> NA, NA, NA, NA, 1968, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X14 <dbl> NA, NA, NA, NA, 1969, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X15 <dbl> NA, NA, NA, NA, 1970, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X16 <dbl> NA, NA, NA, NA, 1971, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X17 <dbl> NA, NA, NA, NA, 1972, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X18 <dbl> NA, NA, NA, NA, 1973, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X19 <dbl> NA, NA, NA, NA, 1974, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X20 <dbl> NA, NA, NA, NA, 1975, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X21 <dbl> NA, NA, NA, NA, 1976, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X22 <dbl> NA, NA, NA, NA, 1977, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X23 <dbl> NA, NA, NA, NA, 1978, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X24 <dbl> NA, NA, NA, NA, 1979, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X25 <dbl> NA, NA, NA, NA, 1980, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X26 <dbl> NA, NA, NA, NA, 1981, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X27 <dbl> NA, NA, NA, NA, 1982, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X28 <dbl> NA, NA, NA, NA, 1983, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X29 <dbl> NA, NA, NA, NA, 1984, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X30 <dbl> NA, NA, NA, NA, 1985, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X31 <dbl> NA, NA, NA, NA, 1986, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X32 <dbl> NA, NA, NA, NA, 1987, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X33 <dbl> NA, NA, NA, NA, 1988, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X34 <dbl> NA, NA, NA, NA, 1989, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X35 <dbl> NA, NA, NA, NA, 1990.000000, 0.300000, 60.842404, 23.000000, 85.89…
$ X36 <dbl> NA, NA, NA, NA, 1991.000000, 0.200000, 62.133295, 23.700000, 85.26…
$ X37 <dbl> NA, NA, NA, NA, 1992.000000, 0.200000, 63.940789, 27.400000, 83.81…
$ X38 <dbl> NA, NA, NA, NA, 1993.000000, 0.200000, 64.651965, 28.500000, 85.10…
$ X39 <dbl> NA, NA, NA, NA, 1994.000000, 0.200000, 65.145067, 30.100000, 86.55…
$ X40 <dbl> NA, NA, NA, NA, 1995.000000, 0.200000, 64.721535, 31.800000, 86.14…
$ X41 <dbl> NA, NA, NA, NA, 1996.000000, 0.200000, 63.929272, 33.900000, 85.11…
$ X42 <dbl> NA, NA, NA, NA, 1997.000000, 0.200000, 63.241293, 36.400000, 83.91…
$ X43 <dbl> NA, NA, NA, NA, 1998.000000, 0.200000, 63.962096, 38.000000, 84.63…
$ X44 <dbl> NA, NA, NA, NA, 1999.000000, 0.200000, 65.058401, 43.400000, 84.00…
$ X45 <dbl> NA, NA, NA, NA, 2000.000000, 0.200000, 65.385867, 45.000000, 83.10…
$ X46 <dbl> NA, NA, NA, NA, 2001.000000, 0.200000, 65.817363, 45.600000, 81.93…
$ X47 <dbl> NA, NA, NA, NA, 2002.000000, 0.200000, 65.099281, 37.800000, 81.73…
$ X48 <dbl> NA, NA, NA, NA, 2003.000000, 0.200000, 64.615124, 36.700000, 80.71…
$ X49 <dbl> NA, NA, NA, NA, 2004.000000, 0.200000, 63.678035, 44.200000, 80.79…
$ X50 <dbl> NA, NA, NA, NA, 2005.000000, 0.200000, 63.109098, 33.900000, 80.93…
$ X51 <dbl> NA, NA, NA, NA, 2006.000000, 0.200000, 62.738124, 31.900000, 81.87…
$ X52 <dbl> NA, NA, NA, NA, 2007.000000, 0.200000, 61.505169, 28.800000, 82.33…
$ X53 <dbl> NA, NA, NA, NA, 2008.000000, 0.200000, 61.908027, 21.200000, 81.21…
$ X54 <dbl> NA, NA, NA, NA, 2009.000000, 0.300000, 61.304601, 16.500000, 82.39…
$ X55 <dbl> NA, NA, NA, NA, 2010.000000, 5.500000, 62.876387, 15.200000, 80.68…
$ X56 <dbl> NA, NA, NA, NA, 2011.000000, 5.700000, 62.805279, 12.600000, 79.28…
$ X57 <dbl> NA, NA, NA, NA, 2012.000000, 6.900000, 61.409964, 15.400000, 78.48…
$ X58 <dbl> NA, NA, NA, NA, 2013.000000, 6.900000, 60.801799, 16.900000, 76.70…
$ X59 <dbl> NA, NA, NA, NA, 2014.000000, 6.900000, 61.235882, 19.100000, 75.74…
$ X60 <dbl> NA, NA, NA, NA, 2015.000000, 6.700000, 61.670278, 17.700000, 76.18…
$ X61 <dbl> NA, NA, NA, NA, 2016.000000, 7.000000, 61.822884, 20.200000, 75.79…
$ X62 <dbl> NA, NA, NA, NA, 2017.000000, 6.800000, 61.426950, 19.500000, 76.10…
$ X63 <dbl> NA, NA, NA, NA, 2018.000000, 8.600000, 61.587530, 18.300000, 75.47…
$ X64 <dbl> NA, NA, NA, NA, 2019.000000, 8.300000, 62.690710, 18.900000, 75.07…
$ X65 <dbl> NA, NA, NA, NA, 2020.000000, 9.100000, 65.782380, 18.200000, 75.84…
$ X66 <dbl> NA, NA, NA, NA, 2021.0, 8.8, NA, 20.0, NA, 52.9, 41.9, 18.4, NA, 1…
$ X67 <dbl> NA, NA, NA, NA, 2022.0, 8.8, NA, 20.0, NA, NA, NA, 18.7, NA, NA, N…
$ X68 <dbl> NA, NA, NA, NA, 2023, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X69 <dbl> NA, NA, NA, NA, 2024, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X70 <dbl> NA, NA, NA, NA, 2025, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
Code
head(renewable_raw)
# A tibble: 6 × 70
  X1     X2    X3    X4       X5    X6    X7    X8    X9   X10   X11   X12   X13
  <chr>  <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Data … Worl… <NA>  <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
2 <NA>   <NA>  <NA>  <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
3 Last … 24/0… <NA>  <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
4 <NA>   <NA>  <NA>  <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
5 Count… Coun… Indi… Indi…  1960  1961  1962  1963  1964  1965  1966  1967  1968
6 Aruba  ABW   Rene… EG.F…    NA    NA    NA    NA    NA    NA    NA    NA    NA
# ℹ 57 more variables: X14 <dbl>, X15 <dbl>, X16 <dbl>, X17 <dbl>, X18 <dbl>,
#   X19 <dbl>, X20 <dbl>, X21 <dbl>, X22 <dbl>, X23 <dbl>, X24 <dbl>,
#   X25 <dbl>, X26 <dbl>, X27 <dbl>, X28 <dbl>, X29 <dbl>, X30 <dbl>,
#   X31 <dbl>, X32 <dbl>, X33 <dbl>, X34 <dbl>, X35 <dbl>, X36 <dbl>,
#   X37 <dbl>, X38 <dbl>, X39 <dbl>, X40 <dbl>, X41 <dbl>, X42 <dbl>,
#   X43 <dbl>, X44 <dbl>, X45 <dbl>, X46 <dbl>, X47 <dbl>, X48 <dbl>,
#   X49 <dbl>, X50 <dbl>, X51 <dbl>, X52 <dbl>, X53 <dbl>, X54 <dbl>, …

Upon import, the dataset contains metadata rows above the true headers. These first four rows must therefore be removed, and the new first row must be promoted to the variable names.

Code
renewable_cleaned <- renewable_raw %>%
  slice(-1, -2, -3, -4)

colnames(renewable_cleaned) <- as.character(renewable_cleaned[1,])

renewable_cleaned <- renewable_cleaned %>%
  slice(-1)

head(renewable_cleaned)
# A tibble: 6 × 70
  `Country Name`  `Country Code` `Indicator Name` `Indicator Code` `1960` `1961`
  <chr>           <chr>          <chr>            <chr>             <dbl>  <dbl>
1 Aruba           ABW            Renewable energ… EG.FEC.RNEW.ZS       NA     NA
2 Africa Eastern… AFE            Renewable energ… EG.FEC.RNEW.ZS       NA     NA
3 Afghanistan     AFG            Renewable energ… EG.FEC.RNEW.ZS       NA     NA
4 Africa Western… AFW            Renewable energ… EG.FEC.RNEW.ZS       NA     NA
5 Angola          AGO            Renewable energ… EG.FEC.RNEW.ZS       NA     NA
6 Albania         ALB            Renewable energ… EG.FEC.RNEW.ZS       NA     NA
# ℹ 64 more variables: `1962` <dbl>, `1963` <dbl>, `1964` <dbl>, `1965` <dbl>,
#   `1966` <dbl>, `1967` <dbl>, `1968` <dbl>, `1969` <dbl>, `1970` <dbl>,
#   `1971` <dbl>, `1972` <dbl>, `1973` <dbl>, `1974` <dbl>, `1975` <dbl>,
#   `1976` <dbl>, `1977` <dbl>, `1978` <dbl>, `1979` <dbl>, `1980` <dbl>,
#   `1981` <dbl>, `1982` <dbl>, `1983` <dbl>, `1984` <dbl>, `1985` <dbl>,
#   `1986` <dbl>, `1987` <dbl>, `1988` <dbl>, `1989` <dbl>, `1990` <dbl>,
#   `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, …

The cleaned dataset can now have its column names standardized prior to reshaping.

Code
renewable_cleaned <- renewable_cleaned %>%
  clean_names()

glimpse(renewable_cleaned)
Rows: 266
Columns: 70
$ country_name   <chr> "Aruba", "Africa Eastern and Southern", "Afghanistan", …
$ country_code   <chr> "ABW", "AFE", "AFG", "AFW", "AGO", "ALB", "AND", "ARB",…
$ indicator_name <chr> "Renewable energy consumption (% of total final energy …
$ indicator_code <chr> "EG.FEC.RNEW.ZS", "EG.FEC.RNEW.ZS", "EG.FEC.RNEW.ZS", "…
$ x1960          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1961          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1962          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1963          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1964          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1965          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1966          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1967          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1968          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1969          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1970          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1971          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1972          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1973          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1974          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1975          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1976          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1977          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1978          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1979          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1980          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1981          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1982          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1983          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1984          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1985          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1986          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1987          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1988          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1989          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1990          <dbl> 0.300000, 60.842404, 23.000000, 85.895442, 72.300000, 2…
$ x1991          <dbl> 0.200000, 62.133295, 23.700000, 85.262074, 71.900000, 3…
$ x1992          <dbl> 0.200000, 63.940789, 27.400000, 83.818324, 72.700000, 4…
$ x1993          <dbl> 0.200000, 64.651965, 28.500000, 85.103232, 71.300000, 5…
$ x1994          <dbl> 0.200000, 65.145067, 30.100000, 86.557685, 72.200000, 5…
$ x1995          <dbl> 0.200000, 64.721535, 31.800000, 86.148094, 73.800000, 5…
$ x1996          <dbl> 0.200000, 63.929272, 33.900000, 85.118648, 72.900000, 5…
$ x1997          <dbl> 0.200000, 63.241293, 36.400000, 83.911059, 73.400000, 5…
$ x1998          <dbl> 0.200000, 63.962096, 38.000000, 84.634537, 76.400000, 4…
$ x1999          <dbl> 0.200000, 65.058401, 43.400000, 84.005039, 72.800000, 4…
$ x2000          <dbl> 0.200000, 65.385867, 45.000000, 83.101512, 73.800000, 4…
$ x2001          <dbl> 0.200000, 65.817363, 45.600000, 81.939243, 72.800000, 3…
$ x2002          <dbl> 0.200000, 65.099281, 37.800000, 81.735528, 70.800000, 3…
$ x2003          <dbl> 0.200000, 64.615124, 36.700000, 80.710450, 65.600000, 3…
$ x2004          <dbl> 0.200000, 63.678035, 44.200000, 80.790537, 62.300000, 3…
$ x2005          <dbl> 0.200000, 63.109098, 33.900000, 80.934034, 68.900000, 3…
$ x2006          <dbl> 0.200000, 62.738124, 31.900000, 81.873310, 62.600000, 3…
$ x2007          <dbl> 0.200000, 61.505169, 28.800000, 82.330015, 58.900000, 3…
$ x2008          <dbl> 0.200000, 61.908027, 21.200000, 81.211170, 55.400000, 3…
$ x2009          <dbl> 0.300000, 61.304601, 16.500000, 82.390213, 53.000000, 3…
$ x2010          <dbl> 5.500000, 62.876387, 15.200000, 80.687806, 53.100000, 3…
$ x2011          <dbl> 5.700000, 62.805279, 12.600000, 79.282287, 51.700000, 3…
$ x2012          <dbl> 6.900000, 61.409964, 15.400000, 78.481230, 49.000000, 3…
$ x2013          <dbl> 6.900000, 60.801799, 16.900000, 76.701499, 46.800000, 4…
$ x2014          <dbl> 6.900000, 61.235882, 19.100000, 75.741009, 47.500000, 3…
$ x2015          <dbl> 6.700000, 61.670278, 17.700000, 76.181671, 47.100000, 3…
$ x2016          <dbl> 7.000000, 61.822884, 20.200000, 75.796840, 48.100000, 3…
$ x2017          <dbl> 6.800000, 61.426950, 19.500000, 76.101681, 52.500000, 3…
$ x2018          <dbl> 8.600000, 61.587530, 18.300000, 75.476578, 52.500000, 3…
$ x2019          <dbl> 8.300000, 62.690710, 18.900000, 75.078788, 51.000000, 4…
$ x2020          <dbl> 9.100000, 65.782380, 18.200000, 75.846980, 60.100000, 4…
$ x2021          <dbl> 8.8, NA, 20.0, NA, 52.9, 41.9, 18.4, NA, 1.0, 9.2, 9.1,…
$ x2022          <dbl> 8.8, NA, 20.0, NA, NA, NA, 18.7, NA, NA, NA, NA, 0.4, 0…
$ x2023          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x2024          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x2025          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
Code
head(renewable_cleaned)
# A tibble: 6 × 70
  country_name      country_code indicator_name indicator_code x1960 x1961 x1962
  <chr>             <chr>        <chr>          <chr>          <dbl> <dbl> <dbl>
1 Aruba             ABW          Renewable ene… EG.FEC.RNEW.ZS    NA    NA    NA
2 Africa Eastern a… AFE          Renewable ene… EG.FEC.RNEW.ZS    NA    NA    NA
3 Afghanistan       AFG          Renewable ene… EG.FEC.RNEW.ZS    NA    NA    NA
4 Africa Western a… AFW          Renewable ene… EG.FEC.RNEW.ZS    NA    NA    NA
5 Angola            AGO          Renewable ene… EG.FEC.RNEW.ZS    NA    NA    NA
6 Albania           ALB          Renewable ene… EG.FEC.RNEW.ZS    NA    NA    NA
# ℹ 63 more variables: x1963 <dbl>, x1964 <dbl>, x1965 <dbl>, x1966 <dbl>,
#   x1967 <dbl>, x1968 <dbl>, x1969 <dbl>, x1970 <dbl>, x1971 <dbl>,
#   x1972 <dbl>, x1973 <dbl>, x1974 <dbl>, x1975 <dbl>, x1976 <dbl>,
#   x1977 <dbl>, x1978 <dbl>, x1979 <dbl>, x1980 <dbl>, x1981 <dbl>,
#   x1982 <dbl>, x1983 <dbl>, x1984 <dbl>, x1985 <dbl>, x1986 <dbl>,
#   x1987 <dbl>, x1988 <dbl>, x1989 <dbl>, x1990 <dbl>, x1991 <dbl>,
#   x1992 <dbl>, x1993 <dbl>, x1994 <dbl>, x1995 <dbl>, x1996 <dbl>, …

Now, the dataset may be converted from wide to long format.

Code
renewable_tidy <- renewable_cleaned %>%
  pivot_longer(
    cols = -c(country_name, country_code, indicator_name, indicator_code),
    names_to = "year",
    values_to = "renewable_energy_pct"
  ) %>%
  mutate(
    year = str_extract(year, "\\d{4}") %>% as.integer(),
    renewable_energy_pct = as.numeric(renewable_energy_pct)
  )

glimpse(renewable_tidy)
Rows: 17,556
Columns: 6
$ country_name         <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aru…
$ country_code         <chr> "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", …
$ indicator_name       <chr> "Renewable energy consumption (% of total final e…
$ indicator_code       <chr> "EG.FEC.RNEW.ZS", "EG.FEC.RNEW.ZS", "EG.FEC.RNEW.…
$ year                 <int> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1…
$ renewable_energy_pct <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
Code
head(renewable_tidy)
# A tibble: 6 × 6
  country_name country_code indicator_name                  indicator_code  year
  <chr>        <chr>        <chr>                           <chr>          <int>
1 Aruba        ABW          Renewable energy consumption (… EG.FEC.RNEW.ZS  1960
2 Aruba        ABW          Renewable energy consumption (… EG.FEC.RNEW.ZS  1961
3 Aruba        ABW          Renewable energy consumption (… EG.FEC.RNEW.ZS  1962
4 Aruba        ABW          Renewable energy consumption (… EG.FEC.RNEW.ZS  1963
5 Aruba        ABW          Renewable energy consumption (… EG.FEC.RNEW.ZS  1964
6 Aruba        ABW          Renewable energy consumption (… EG.FEC.RNEW.ZS  1965
# ℹ 1 more variable: renewable_energy_pct <dbl>

At this section within the Dataset Three segment, the dataset has now been converted into a tidy format, with one row representing one country/region per year observation. Missing renewable energy values will be retained as NAs for now and only filtered out where necessary for the specific analysis.

Step 3.3: Analysis

The analysis requested for this dataset centers on comparing the growth rate in renewable energy usage across different geographic regions over a ten-year period. As such, the first step is to isolate the regional groupings from the larger dataset and restrict the dataset to a recent ten-year window.

Code
selected_regions <- c(
  "Africa Eastern and Southern",
  "East Asia & Pacific",
  "Europe & Central Asia",
  "Latin America & Caribbean",
  "North America",
  "South Asia"
)

renewable_regions <- renewable_tidy %>%
  filter(country_name %in% selected_regions) %>%
  filter(year >= 2013, year <= 2022) %>%
    rename(
      region = country_name,
      region_code = country_code
    )

renewable_regions
# A tibble: 60 × 6
   region   region_code indicator_name indicator_code  year renewable_energy_pct
   <chr>    <chr>       <chr>          <chr>          <int>                <dbl>
 1 Africa … AFE         Renewable ene… EG.FEC.RNEW.ZS  2013                 60.8
 2 Africa … AFE         Renewable ene… EG.FEC.RNEW.ZS  2014                 61.2
 3 Africa … AFE         Renewable ene… EG.FEC.RNEW.ZS  2015                 61.7
 4 Africa … AFE         Renewable ene… EG.FEC.RNEW.ZS  2016                 61.8
 5 Africa … AFE         Renewable ene… EG.FEC.RNEW.ZS  2017                 61.4
 6 Africa … AFE         Renewable ene… EG.FEC.RNEW.ZS  2018                 61.6
 7 Africa … AFE         Renewable ene… EG.FEC.RNEW.ZS  2019                 62.7
 8 Africa … AFE         Renewable ene… EG.FEC.RNEW.ZS  2020                 65.8
 9 Africa … AFE         Renewable ene… EG.FEC.RNEW.ZS  2021                 NA  
10 Africa … AFE         Renewable ene… EG.FEC.RNEW.ZS  2022                 NA  
# ℹ 50 more rows

The following visualization can then be constructed in order to compare the growth trajectories of the six selected geographic regions.

Code
ggplot(renewable_regions, aes(x = year, y = renewable_energy_pct, color = region)) +
  geom_line(linewidth = 1) +
  geom_point(size = 1.3) +
  scale_y_continuous(limits = c(9, 67)) +
  labs(
    title = "Renewable Energy Usage Across Selected Regions\n(2013–2022)",
    x = "Year",
    y = "Renewable Energy Consumption (%)",
    color = "Region"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16),
    legend.position = "bottom",
    legend.title = element_text(size = 12),
    legend.text = element_text(size = 10)
  )

The above visualization illustrates the changes in renewable energy consumption across the selected geographical regions over the 2013 to 2022 period. In general, most regions exhibit a gradual upward trend, suggesting an increasing share of renewable energy within overall energy consumption. Africa Eastern and Southern consistently maintains the highest proportion of renewable energy usage, while Latin America and the Caribbean also appears to show steady growth over the period. Europe and Central Asia, East Asia and the Pacific, and North America remain comparatively lower, though each demonstrates modest increases over time. South Asia, while experiencing a slight decline in the earlier years, appears to stabilize and rise again toward the end of the observed period. Overall, the figure highlights a gradual but observable shift toward greater renewable energy adoption across multiple regions.

Brief Interpretation/Dataset Conclusion

From the analysis conducted upon this tidied renewable energy dataset, it can be observed that the selected geographic regions generally experienced either stable or gradually increasing levels of renewable energy consumption over the examined ten-year period. While the magnitude of usage differs notably between regions, the overall pattern suggests a broad movement toward greater reliance on renewable energy sources. Africa Eastern and Southern remained the highest throughout the period, while regions such as Latin America and the Caribbean also showed visible growth. Collectively, these findings suggest that the transition toward renewable energy is occurring across multiple parts of the world, though at uneven rates and from different starting points.

Overall Project Conclusion

Across the three datasets analyzed in this project, the results illustrate how different types of global indicators (for instance, demographic, economic, and environmental) can be explored through a consistent data tidying and analytical workflow in R. By transforming each dataset into a tidy format and applying simple exploratory analyses and visualizations, it becomes possible to observe meaningful patterns such as changes in birth rates, long-term economic growth across countries, and even gradual increases in renewable energy adoption across regions. Overall, the exercise demonstrates how reproducible data preparation and visualization techniques can help reveal broader global trends from publicly available datasets.

LLM Used