Project 2 — Data Transformations

Author

Brandon Chanderban

Published

March 7, 2026

Introduction/Approach

The objective of this Project Two Data Transformations exercise lies in preparing different datasets (three, specifically) for analysis work to be conducted further downstream. Moreover, these three datasets must be drawn from within the confines of the DATA 607 Week 5A Discussion thread, wherein the class’ students were tasked with identifying untidy datasets and suggesting possible analytical endeavors that may be performed once the datasets have been tidied. In choosing three of these provided datasets, a key requirement of the project sis that only distinct wide-format untidy datasets must be selected.

Once these three datasets are selected, imported and cleaned within RStudio, the specific analyses requested within the original discussion posts will then be executed, including features such as summary tables, visualizations, and brief explanatory narratives where applicable.

Prospective Datasets and their Corresponding Analyses

The first of the three untidy datasets to be examined was outlined in the discussion post that I myself submitted. This Kaggle dataset contains data pertaining to the crude birth rates of countries and maybe viewed here. The dataset meets the requirement of being formatted in a wide structure, with the variable of year (1960 to 2021) spread across several columns. Within each record, there are several identifiers such as the Country Name, Country Code, Indicator Name, and Indicator Code, followed by the year-based data columns.

The potential analyses to be executed for this crude birth rate dataset are:
- comparing the trends in birth rates over time across different regions/countries,
- visualizing birth rate trends over time, and
- examining the declines and increases in birth rates to identify any widely shared population changes (and then attempting to associate such changes with relevant real-world health and economic events).

The second dataset to be utilized was posted by classmate Sinem Kilicdere and is another Kaggle dataset; this one examining GDP by country for the years 1960 to 2022. Similar to the prior dataset, this one is also arranged in a wide format, owing to the horizontal spread of year-based columns. The identifier columns for this dataset are Country name and Country code.

The prospective analyses to be conducted using this GDP by Country dataset (as set out by Sinem herself) are:
- comparing GDP growth between countries (potentially using a time series graph),
- calculating the average GDP across the included years, and
- determining the top-performing countries in terms of economic output.

The third and final dataset (viewed here) that will be employed within this Project Two was posted by classmate Ciara Bonnett-Jones and contains information pertaining to the renewable energy consumption of countries (represented as a percentage of their final energy consumption). As with both prior datasets, this one is arranged within a wide format, due to the time variable (year) which is spread out horizontally over several columns. In addition to the year columns, this dataset also contains the following variables: Country Name, Country Code, Indicator Name, and Indicator Code.

The analysis to be conducted as set out by Ciara’s discussion post centers on comparing the growth in renewable energy usage across different geographic regions over a ten year period.

Proposed Plan

For each of the three selected datasets, I will first obtain a raw CSV version of the data that preserves its original wide formatting, and then commit it to my personal GitHub repository. Each dataset will then be imported into RStudio via the read_csv() function, after which a reproducible tidying pipeline will be applied using tidyr and dplyr. The main transformation will involve reshaping the wide year-based columns into a tidy long format using pivot_longer(), followed by general standardization and cleaning such as consistent variable naming and well-document handling of missing or inconsistent values. Finally, I will complete the required Discussion 5A analyses for each dataset using only the tidied version, supported by summary tables, visualizations, and brief narrative interpretations.

Potential Challenges

One expected challenge involves missing values within the datasets, especially within the earlier examined years. In this, whether rows are retained (with missing values kept as NAs) or filtered out specific analyses will be noted and documented. Additionally, because these selected datasets span a wide range of countries and years, the resulting visualizations may become cluttered if every if plotted at once. As such, there may be cases where only a subset of countries (for instance, the top or bottom performers based on a calculated metric) will be focused on for clarity, while still maintaining a reproducible workflow.

Dataset One: Crude Birth Rates

Step 3.1: Raw Data Construction

Within the confines of this specific step, the dataset corresponding to the first crude birth rate analyses was downloaded from its discovered location (Kaggle), and pushed to a personal GitHub repository. The file itself contains all information from the original crude birth rates dataset and its wide format has been preserved. The dataset will be imported into RStudio for tidying, and further analyses, in the subsequent step.

Step 3.2: Data Import and Tidying

Now that the raw wide format CSV file has been committed and made accessible, the next step is to import it into RStudio and apply a reproducible tidying pipeline. Since the year values are currently spread across multiple columns, the principal transformation will involve reshaping the dataset from wide to long format through the use of pivot_longer(). Thereafter, the variable names will be standardized, relevant identifier columns retained, and the value column converted into a numeric format suitable for analysis.

Code

library(tidyverse)
library(janitor)
library(dplyr)
library(gt)

Code

birth_rate_url <- "https://raw.githubusercontent.com/bkchanderban/CUNY_SPS/refs/heads/main/DATA607/DATA607/Project%20Two%20Assignment/crude_birth_rate_dataset.csv"

birth_rate_raw <- read_csv(birth_rate_url, show_col_types = FALSE)

glimpse(birth_rate_raw)

Rows: 270
Columns: 67
$ `Data Source`                  <chr> NA, "Last Updated Date", NA, "Country N…
$ `World Development Indicators` <chr> NA, "29/06/2023", NA, "Country Code", "…
$ ...3                           <chr> NA, NA, NA, "Indicator Name", "Birth ra…
$ ...4                           <chr> NA, NA, NA, "Indicator Code", "SP.DYN.C…
$ ...5                           <dbl> NA, NA, NA, 1960.00000, 33.88300, 47.43…
$ ...6                           <dbl> NA, NA, NA, 1961.00000, 32.83100, 47.53…
$ ...7                           <dbl> NA, NA, NA, 1962.00000, 31.64900, 47.59…
$ ...8                           <dbl> NA, NA, NA, 1963.00000, 30.41600, 47.63…
$ ...9                           <dbl> NA, NA, NA, 1964.00000, 29.14700, 47.64…
$ ...10                          <dbl> NA, NA, NA, 1965.00000, 27.88900, 47.66…
$ ...11                          <dbl> NA, NA, NA, 1966.00000, 26.66300, 47.69…
$ ...12                          <dbl> NA, NA, NA, 1967.00000, 25.50300, 47.69…
$ ...13                          <dbl> NA, NA, NA, 1968.00000, 24.59200, 47.69…
$ ...14                          <dbl> NA, NA, NA, 1969.00000, 23.73500, 47.72…
$ ...15                          <dbl> NA, NA, NA, 1970.00000, 22.97400, 47.67…
$ ...16                          <dbl> NA, NA, NA, 1971.00000, 22.31300, 47.64…
$ ...17                          <dbl> NA, NA, NA, 1972.00000, 21.76600, 47.47…
$ ...18                          <dbl> NA, NA, NA, 1973.00000, 21.49200, 47.22…
$ ...19                          <dbl> NA, NA, NA, 1974.00000, 21.38200, 47.07…
$ ...20                          <dbl> NA, NA, NA, 1975.00000, 21.39300, 46.95…
$ ...21                          <dbl> NA, NA, NA, 1976.00000, 21.48500, 46.79…
$ ...22                          <dbl> NA, NA, NA, 1977.00000, 21.73900, 46.63…
$ ...23                          <dbl> NA, NA, NA, 1978.00000, 21.92000, 46.51…
$ ...24                          <dbl> NA, NA, NA, 1979.00000, 21.99300, 46.47…
$ ...25                          <dbl> NA, NA, NA, 1980.00000, 21.93100, 46.33…
$ ...26                          <dbl> NA, NA, NA, 1981.00000, 21.73000, 46.23…
$ ...27                          <dbl> NA, NA, NA, 1982.00000, 21.47600, 46.15…
$ ...28                          <dbl> NA, NA, NA, 1983.00000, 21.38500, 46.13…
$ ...29                          <dbl> NA, NA, NA, 1984.00000, 21.18300, 46.13…
$ ...30                          <dbl> NA, NA, NA, 1985.00000, 20.91900, 46.14…
$ ...31                          <dbl> NA, NA, NA, 1986.00000, 20.61700, 46.06…
$ ...32                          <dbl> NA, NA, NA, 1987.00000, 20.26900, 45.83…
$ ...33                          <dbl> NA, NA, NA, 1988.00000, 19.82100, 45.33…
$ ...34                          <dbl> NA, NA, NA, 1989.00000, 19.18400, 44.81…
$ ...35                          <dbl> NA, NA, NA, 1990.00000, 18.66200, 44.23…
$ ...36                          <dbl> NA, NA, NA, 1991.00000, 17.72200, 43.84…
$ ...37                          <dbl> NA, NA, NA, 1992.00000, 16.44300, 43.34…
$ ...38                          <dbl> NA, NA, NA, 1993.00000, 16.12600, 42.96…
$ ...39                          <dbl> NA, NA, NA, 1994.00000, 15.43100, 42.53…
$ ...40                          <dbl> NA, NA, NA, 1995.00000, 15.99100, 42.48…
$ ...41                          <dbl> NA, NA, NA, 1996.00000, 16.15300, 42.13…
$ ...42                          <dbl> NA, NA, NA, 1997.00000, 16.38800, 41.57…
$ ...43                          <dbl> NA, NA, NA, 1998.00000, 15.07800, 41.12…
$ ...44                          <dbl> NA, NA, NA, 1999.00000, 14.36100, 40.89…
$ ...45                          <dbl> NA, NA, NA, 2000.00000, 14.42700, 40.52…
$ ...46                          <dbl> NA, NA, NA, 2001.00000, 13.73900, 40.34…
$ ...47                          <dbl> NA, NA, NA, 2002.00000, 12.99200, 40.04…
$ ...48                          <dbl> NA, NA, NA, 2003.00000, 12.62100, 39.75…
$ ...49                          <dbl> NA, NA, NA, 2004.00000, 11.92100, 39.57…
$ ...50                          <dbl> NA, NA, NA, 2005.00000, 12.34800, 39.40…
$ ...51                          <dbl> NA, NA, NA, 2006.00000, 13.05500, 39.23…
$ ...52                          <dbl> NA, NA, NA, 2007.00000, 12.96200, 39.00…
$ ...53                          <dbl> NA, NA, NA, 2008.00000, 12.74800, 38.85…
$ ...54                          <dbl> NA, NA, NA, 2009.00000, 12.35000, 38.36…
$ ...55                          <dbl> NA, NA, NA, 2010.00000, 12.19300, 37.94…
$ ...56                          <dbl> NA, NA, NA, 2011.00000, 12.24600, 37.48…
$ ...57                          <dbl> NA, NA, NA, 2012.00000, 12.72300, 36.92…
$ ...58                          <dbl> NA, NA, NA, 2013.00000, 13.31600, 36.44…
$ ...59                          <dbl> NA, NA, NA, 2014.00000, 13.53300, 36.02…
$ ...60                          <dbl> NA, NA, NA, 2015.00000, 12.42800, 35.61…
$ ...61                          <dbl> NA, NA, NA, 2016.00000, 12.30000, 35.18…
$ ...62                          <dbl> NA, NA, NA, 2017.00000, 11.53000, 34.89…
$ ...63                          <dbl> NA, NA, NA, 2018.00000, 9.88100, 34.611…
$ ...64                          <dbl> NA, NA, NA, 2019.00000, 9.13800, 34.341…
$ ...65                          <dbl> NA, NA, NA, 2020.00000, 8.10200, 33.916…
$ ...66                          <dbl> NA, NA, NA, 2021.00000, 7.19300, 33.546…
$ ...67                          <dbl> NA, NA, NA, 2022, NA, NA, NA, NA, NA, N…

Code

head(birth_rate_raw)

# A tibble: 6 × 67
  `Data Source`   World Development In…¹ ...3  ...4    ...5   ...6   ...7   ...8
  <chr>           <chr>                  <chr> <chr>  <dbl>  <dbl>  <dbl>  <dbl>
1 <NA>            <NA>                   <NA>  <NA>    NA     NA     NA     NA  
2 Last Updated D… 29/06/2023             <NA>  <NA>    NA     NA     NA     NA  
3 <NA>            <NA>                   <NA>  <NA>    NA     NA     NA     NA  
4 Country Name    Country Code           Indi… Indi… 1960   1961   1962   1963  
5 Aruba           ABW                    Birt… SP.D…   33.9   32.8   31.6   30.4
6 Africa Eastern… AFE                    Birt… SP.D…   47.4   47.5   47.6   47.6
# ℹ abbreviated name: ¹`World Development Indicators`
# ℹ 59 more variables: ...9 <dbl>, ...10 <dbl>, ...11 <dbl>, ...12 <dbl>,
#   ...13 <dbl>, ...14 <dbl>, ...15 <dbl>, ...16 <dbl>, ...17 <dbl>,
#   ...18 <dbl>, ...19 <dbl>, ...20 <dbl>, ...21 <dbl>, ...22 <dbl>,
#   ...23 <dbl>, ...24 <dbl>, ...25 <dbl>, ...26 <dbl>, ...27 <dbl>,
#   ...28 <dbl>, ...29 <dbl>, ...30 <dbl>, ...31 <dbl>, ...32 <dbl>,
#   ...33 <dbl>, ...34 <dbl>, ...35 <dbl>, ...36 <dbl>, ...37 <dbl>, …

Now that we have the crude birth rates dataset imported, we can now conduct some feature engineering, beginning with the removal of the first three metadata rows of the CSV file.

Code

birth_rate_cleaned <- birth_rate_raw %>%
  slice(-1,-2,-3) #remove the first three rows

Our next step is take the new first row (which contains strings such as Country Name, Country Code, etc.) and make these the column/variable names instead.

Code

colnames(birth_rate_cleaned) <- as.character(birth_rate_cleaned[1,])

birth_rate_cleaned <- birth_rate_cleaned %>%
  slice(-1) #remove the first row that was used to created the  headers

head(birth_rate_cleaned)

# A tibble: 6 × 67
  `Country Name`  `Country Code` `Indicator Name` `Indicator Code` `1960` `1961`
  <chr>           <chr>          <chr>            <chr>             <dbl>  <dbl>
1 Aruba           ABW            Birth rate, cru… SP.DYN.CBRT.IN     33.9   32.8
2 Africa Eastern… AFE            Birth rate, cru… SP.DYN.CBRT.IN     47.4   47.5
3 Afghanistan     AFG            Birth rate, cru… SP.DYN.CBRT.IN     50.3   50.4
4 Africa Western… AFW            Birth rate, cru… SP.DYN.CBRT.IN     47.3   47.4
5 Angola          AGO            Birth rate, cru… SP.DYN.CBRT.IN     51.0   51.3
6 Albania         ALB            Birth rate, cru… SP.DYN.CBRT.IN     41.1   40.3
# ℹ 61 more variables: `1962` <dbl>, `1963` <dbl>, `1964` <dbl>, `1965` <dbl>,
#   `1966` <dbl>, `1967` <dbl>, `1968` <dbl>, `1969` <dbl>, `1970` <dbl>,
#   `1971` <dbl>, `1972` <dbl>, `1973` <dbl>, `1974` <dbl>, `1975` <dbl>,
#   `1976` <dbl>, `1977` <dbl>, `1978` <dbl>, `1979` <dbl>, `1980` <dbl>,
#   `1981` <dbl>, `1982` <dbl>, `1983` <dbl>, `1984` <dbl>, `1985` <dbl>,
#   `1986` <dbl>, `1987` <dbl>, `1988` <dbl>, `1989` <dbl>, `1990` <dbl>,
#   `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, …

And now we can clean our column names for best practices.

Code

birth_rate_cleaned <- birth_rate_cleaned %>%
  clean_names()

glimpse(birth_rate_cleaned)

Rows: 266
Columns: 67
$ country_name   <chr> "Aruba", "Africa Eastern and Southern", "Afghanistan", …
$ country_code   <chr> "ABW", "AFE", "AFG", "AFW", "AGO", "ALB", "AND", "ARB",…
$ indicator_name <chr> "Birth rate, crude (per 1,000 people)", "Birth rate, cr…
$ indicator_code <chr> "SP.DYN.CBRT.IN", "SP.DYN.CBRT.IN", "SP.DYN.CBRT.IN", "…
$ x1960          <dbl> 33.88300, 47.43855, 50.34000, 47.32548, 51.02600, 41.06…
$ x1961          <dbl> 32.83100, 47.53055, 50.44300, 47.42105, 51.28200, 40.33…
$ x1962          <dbl> 31.64900, 47.59756, 50.57000, 47.52922, 51.31600, 39.21…
$ x1963          <dbl> 30.41600, 47.63614, 50.70300, 47.53103, 51.32300, 38.14…
$ x1964          <dbl> 29.14700, 47.64548, 50.83100, 47.51192, 51.28200, 36.82…
$ x1965          <dbl> 27.88900, 47.66766, 50.87200, 47.46857, 51.28200, 35.36…
$ x1966          <dbl> 26.66300, 47.69789, 50.98600, 47.44364, 51.29500, 34.35…
$ x1967          <dbl> 25.50300, 47.69133, 51.08100, 47.42593, 51.31400, 33.89…
$ x1968          <dbl> 24.59200, 47.69102, 51.14800, 47.42235, 51.34800, 33.51…
$ x1969          <dbl> 23.73500, 47.72112, 51.19500, 47.41269, 51.35300, 33.07…
$ x1970          <dbl> 22.97400, 47.67313, 51.12200, 47.41411, 51.26700, 32.20…
$ x1971          <dbl> 22.31300, 47.64967, 51.16300, 47.52970, 50.69800, 31.74…
$ x1972          <dbl> 21.76600, 47.47074, 51.10900, 47.57899, 50.47400, 31.23…
$ x1973          <dbl> 21.49200, 47.22113, 51.11400, 47.63283, 50.46700, 30.69…
$ x1974          <dbl> 21.38200, 47.07547, 51.13500, 47.81713, 50.47200, 30.29…
$ x1975          <dbl> 21.39300, 46.95020, 51.01800, 47.91150, 50.46900, 29.98…
$ x1976          <dbl> 21.48500, 46.79183, 50.93500, 47.86907, 50.51400, 29.35…
$ x1977          <dbl> 21.73900, 46.63214, 50.92100, 47.96894, 50.52300, 28.58…
$ x1978          <dbl> 21.92000, 46.51202, 50.81600, 48.03727, 50.61600, 27.88…
$ x1979          <dbl> 21.99300, 46.47196, 50.73700, 47.93830, 50.73200, 27.20…
$ x1980          <dbl> 21.93100, 46.33961, 50.48200, 47.77071, 50.89200, 26.76…
$ x1981          <dbl> 21.73000, 46.23755, 50.26400, 47.51406, 51.10900, 26.65…
$ x1982          <dbl> 21.47600, 46.15826, 50.13800, 47.25192, 51.30700, 26.79…
$ x1983          <dbl> 21.38500, 46.13473, 50.13900, 47.11112, 51.61000, 26.72…
$ x1984          <dbl> 21.18300, 46.13520, 50.23500, 46.70656, 51.93500, 26.62…
$ x1985          <dbl> 20.91900, 46.14379, 50.55300, 46.20665, 52.13600, 26.29…
$ x1986          <dbl> 20.61700, 46.06749, 50.72800, 45.72924, 52.19000, 25.93…
$ x1987          <dbl> 20.26900, 45.83058, 50.84500, 45.34627, 52.14600, 25.68…
$ x1988          <dbl> 19.82100, 45.33736, 50.98000, 45.00171, 51.97300, 25.36…
$ x1989          <dbl> 19.18400, 44.81301, 51.16200, 44.92848, 51.69900, 24.86…
$ x1990          <dbl> 18.66200, 44.23072, 51.42300, 44.67619, 51.34400, 24.41…
$ x1991          <dbl> 17.72200, 43.84232, 51.78800, 44.47423, 50.92600, 23.72…
$ x1992          <dbl> 16.44300, 43.34168, 51.94800, 44.30932, 50.37400, 23.14…
$ x1993          <dbl> 16.12600, 42.96601, 52.03800, 44.16810, 49.89300, 22.47…
$ x1994          <dbl> 15.43100, 42.53329, 52.17400, 43.94269, 49.55000, 22.24…
$ x1995          <dbl> 15.99100, 42.48572, 52.07300, 43.73024, 49.18500, 21.80…
$ x1996          <dbl> 16.15300, 42.13563, 51.87300, 43.49103, 48.86000, 20.98…
$ x1997          <dbl> 16.38800, 41.57346, 51.40000, 43.21922, 48.41200, 19.81…
$ x1998          <dbl> 15.07800, 41.12879, 50.88000, 43.02697, 48.00900, 18.80…
$ x1999          <dbl> 14.36100, 40.89482, 50.35100, 43.17424, 47.77300, 17.98…
$ x2000          <dbl> 14.42700, 40.52824, 49.66400, 43.19955, 47.64700, 17.07…
$ x2001          <dbl> 13.73900, 40.34121, 48.97900, 43.07550, 47.57400, 16.12…
$ x2002          <dbl> 12.99200, 40.04732, 48.20100, 42.92712, 47.44800, 15.20…
$ x2003          <dbl> 12.62100, 39.75014, 47.35000, 42.74688, 47.22600, 14.66…
$ x2004          <dbl> 11.92100, 39.57589, 46.33000, 42.50272, 47.09900, 13.97…
$ x2005          <dbl> 12.34800, 39.40739, 45.26300, 42.42154, 46.94400, 13.23…
$ x2006          <dbl> 13.05500, 39.23711, 44.72100, 42.19330, 46.64300, 12.53…
$ x2007          <dbl> 12.96200, 39.00052, 43.85800, 41.94301, 46.29000, 12.07…
$ x2008          <dbl> 12.74800, 38.85169, 41.50600, 41.75479, 45.88900, 11.86…
$ x2009          <dbl> 12.35000, 38.36494, 41.15700, 41.50376, 45.49500, 11.84…
$ x2010          <dbl> 12.19300, 37.94026, 40.60200, 41.21963, 44.97000, 11.92…
$ x2011          <dbl> 12.24600, 37.48399, 39.85500, 40.89424, 44.36400, 12.03…
$ x2012          <dbl> 12.72300, 36.92130, 40.00900, 40.41643, 43.86000, 12.22…
$ x2013          <dbl> 13.31600, 36.44714, 39.60100, 39.85651, 43.28200, 12.27…
$ x2014          <dbl> 13.53300, 36.02832, 39.10500, 39.33535, 42.67600, 12.15…
$ x2015          <dbl> 12.42800, 35.61331, 38.80300, 38.85921, 42.02000, 11.71…
$ x2016          <dbl> 12.30000, 35.18902, 37.93600, 38.39310, 41.37700, 11.28…
$ x2017          <dbl> 11.53000, 34.89254, 37.34200, 37.88166, 40.81000, 10.87…
$ x2018          <dbl> 9.88100, 34.61102, 36.92700, 37.44709, 40.23600, 10.517…
$ x2019          <dbl> 9.13800, 34.34145, 36.46600, 37.02783, 39.72500, 10.343…
$ x2020          <dbl> 8.10200, 33.91675, 36.05100, 36.61573, 39.27100, 10.285…
$ x2021          <dbl> 7.19300, 33.54627, 35.84200, 36.23703, 38.80900, 10.240…
$ x2022          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

Code

head(birth_rate_cleaned)

# A tibble: 6 × 67
  country_name      country_code indicator_name indicator_code x1960 x1961 x1962
  <chr>             <chr>        <chr>          <chr>          <dbl> <dbl> <dbl>
1 Aruba             ABW          Birth rate, c… SP.DYN.CBRT.IN  33.9  32.8  31.6
2 Africa Eastern a… AFE          Birth rate, c… SP.DYN.CBRT.IN  47.4  47.5  47.6
3 Afghanistan       AFG          Birth rate, c… SP.DYN.CBRT.IN  50.3  50.4  50.6
4 Africa Western a… AFW          Birth rate, c… SP.DYN.CBRT.IN  47.3  47.4  47.5
5 Angola            AGO          Birth rate, c… SP.DYN.CBRT.IN  51.0  51.3  51.3
6 Albania           ALB          Birth rate, c… SP.DYN.CBRT.IN  41.1  40.3  39.2
# ℹ 60 more variables: x1963 <dbl>, x1964 <dbl>, x1965 <dbl>, x1966 <dbl>,
#   x1967 <dbl>, x1968 <dbl>, x1969 <dbl>, x1970 <dbl>, x1971 <dbl>,
#   x1972 <dbl>, x1973 <dbl>, x1974 <dbl>, x1975 <dbl>, x1976 <dbl>,
#   x1977 <dbl>, x1978 <dbl>, x1979 <dbl>, x1980 <dbl>, x1981 <dbl>,
#   x1982 <dbl>, x1983 <dbl>, x1984 <dbl>, x1985 <dbl>, x1986 <dbl>,
#   x1987 <dbl>, x1988 <dbl>, x1989 <dbl>, x1990 <dbl>, x1991 <dbl>,
#   x1992 <dbl>, x1993 <dbl>, x1994 <dbl>, x1995 <dbl>, x1996 <dbl>, …

The next step will now comprise executing the pivot_longer() functionality, converting dataset from its wide format into a long one.

Code

birth_rate_tidy <- birth_rate_cleaned %>%
  pivot_longer(
    cols = -c(country_name, country_code, indicator_name, indicator_code),
    names_to = "year",
    values_to = "birth_rate"
  ) %>%
mutate(
  year = str_extract(year, "\\d{4}") %>% as.integer(),
  birth_rate = as.numeric(birth_rate)
)

glimpse(birth_rate_tidy)

Rows: 16,758
Columns: 6
$ country_name   <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "…
$ country_code   <chr> "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW",…
$ indicator_name <chr> "Birth rate, crude (per 1,000 people)", "Birth rate, cr…
$ indicator_code <chr> "SP.DYN.CBRT.IN", "SP.DYN.CBRT.IN", "SP.DYN.CBRT.IN", "…
$ year           <int> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1…
$ birth_rate     <dbl> 33.883, 32.831, 31.649, 30.416, 29.147, 27.889, 26.663,…

Code

head(birth_rate_tidy)

# A tibble: 6 × 6
  country_name country_code indicator_name       indicator_code  year birth_rate
  <chr>        <chr>        <chr>                <chr>          <int>      <dbl>
1 Aruba        ABW          Birth rate, crude (… SP.DYN.CBRT.IN  1960       33.9
2 Aruba        ABW          Birth rate, crude (… SP.DYN.CBRT.IN  1961       32.8
3 Aruba        ABW          Birth rate, crude (… SP.DYN.CBRT.IN  1962       31.6
4 Aruba        ABW          Birth rate, crude (… SP.DYN.CBRT.IN  1963       30.4
5 Aruba        ABW          Birth rate, crude (… SP.DYN.CBRT.IN  1964       29.1
6 Aruba        ABW          Birth rate, crude (… SP.DYN.CBRT.IN  1965       27.9

At this stage, the dataset has now been converted into a tidy format, with one row representing one country-year observation. Any missing birth rate values will be retained as NAs (a function of pivot_longer() ) for now, and will subsequently only be filtered out where necessary for a given analysis, so as not to compromise the original informational structure of the dataset.

We will then proceed onto the next step for this first crude birth rates dataset.

Step 3.3: Analysis

The first analysis will involve examining broad birth rate trends over time across all countries.

Summary Table: Average Crude Birth Rate By Year

One prospective starting point is to commence by calculating the average birth rate across all included countries for each year.

Code

birth_rate_summary <- birth_rate_tidy %>%
  group_by(year) %>%
  summarise(
    average_birth_rate = mean(birth_rate, na.rm = TRUE),
    .groups = "drop"
  )

birth_rate_summary %>%
  gt() %>%
  tab_header(
    title = "Average Birth Rate by Year",
    subtitle = "Summary of annual birth rates"
  ) %>%
  cols_label(
    year = "Year",
    average_birth_rate = "Average Birth Rate"
  ) %>%
  fmt_number(
    columns = average_birth_rate,
    decimals = 2
  )

Year	Average Birth Rate
Average Birth Rate by Year
Summary of annual birth rates
1960	38.10
1961	37.72
1962	37.83
1963	37.71
1964	37.18
1965	36.66
1966	36.08
1967	35.73
1968	35.37
1969	34.99
1970	34.55
1971	34.30
1972	33.95
1973	33.47
1974	33.18
1975	32.84
1976	32.49
1977	32.23
1978	32.05
1979	32.00
1980	31.91
1981	31.82
1982	31.68
1983	31.46
1984	31.14
1985	30.90
1986	30.55
1987	30.20
1988	29.86
1989	29.33
1990	28.97
1991	28.49
1992	27.99
1993	27.44
1994	26.86
1995	26.36
1996	25.89
1997	25.47
1998	24.98
1999	24.61
2000	24.27
2001	23.88
2002	23.54
2003	23.28
2004	23.07
2005	22.88
2006	22.73
2007	22.66
2008	22.63
2009	22.47
2010	22.24
2011	22.08
2012	21.86
2013	21.54
2014	21.30
2015	20.89
2016	20.50
2017	20.11
2018	19.66
2019	19.32
2020	18.99
2021	18.77
2022	NaN

The summary table exhibits a broad overview of how crude birth rates have changed over time on average across all of the included countries. We can take the above a step further and attempt to construct a visualization from these insights.

Visualization: Average Crude Birth Rate Over Time

Code

ggplot(data = birth_rate_summary, aes(x = year, y = average_birth_rate)) +
  geom_line() +
  labs(
    title = "Average Crude Birth Over Time",
    x = "Year",
    y = "Average Crude Birth Rate"
  ) +
  theme(plot.title = element_text(hjust = 0.5))

In acknowledging the afore-presented plot of average crude birth over time, we can notice, perhaps somewhat alarmingly, that on a holistic average, worldwide crude birth rates have evidently decreased rather noticeably within the time frame of the utilized data (from 1960 to 2021). This is evidenced by the negative gradient of the plotted line of values.

Country-Level Comparison

In addition to the overall trend, it may also be beneficial to compare selected countries in terms of how their birth rates have changed over time. For lucidity, the following analysis shall be limited to a chosen number (5) of countries.

Code

selected_countries <- c("Guyana", "United States", "India", "Brazil", "Nigeria")

birth_rate_selected <- birth_rate_tidy %>%
  filter(country_name %in% selected_countries)

ggplot(birth_rate_selected, aes(x = year, y = birth_rate, color = country_name)) +
  geom_line() +
  labs(
    title = "Crude Birth Rate for Selected Countries",
    x = "Year",
    y = "Crude Birth Rate",
    color = "Country"
  ) +
  theme(plot.title = element_text(hjust = 0.5))

This visual comparison above allows us to easier interpret and compare the numeric findings from the tidied birth rate dataframe. In this, we can see that of the five examined countries, the United States observed the lowest crude birth rate figures throughout the entirety of the year range. Inversely, Nigeria, throughout all of the years, maintained the highest birth rates. The remaining three countries (Guyana, India, and Brazil) commenced at almost the same level as Nigeria’s had, but in the course of time, these rates fell to almost the same level as that of the USA. On a whole however, all five countries do follow the general declining trend, which was initially observed in the average plot of crude birth rates (across all of the countries within the initial dataset).

Brief Interpretation/Dataset Conclusion

From the analyses conducted upon the tidied crude birth rate dataset, a clear downward trend in birth rates over time can be observed. The computed yearly averages indicate that, on a global scale, crude birth rates have steadily declined from the early 1960s through to the most recent years of available data. This pattern is also reflected within the country-level comparison, where each of the five examined countries demonstrates a similar long-term decline, albeit at different magnitudes. Nigeria consistently maintains the highest crude birth rates throughout the observed time frame, while the United States exhibits the lowest overall values. Meanwhile, countries such as Guyana, India, and Brazil begin the period with relatively high birth rates but gradually converge toward lower levels over time. Collectively, these findings suggest that declining fertility trends are not isolated to specific countries, but instead represent a broader demographic shift that has occurred globally over the past several decades.

Dataset Two: GDP by Country

Step 3.1: Raw Data Construction

At this stage, the CSV file pertaining to the second GDP by Country dataset was first downloaded from its discerned location (Kaggle, which was indicated by classmate Sinem Kilicdere), and pushed to a personal GitHub repository. The file preserved the original data structure, wherein each year is represented as a separate column, and will now be imported into RStudio for tidying and subsequent analysis.

Step 3.2: Data Import and Tidying

Now that the raw wide format GDP dataset has been committed and made accessible, the next step is to import it into RStudio and apply a reproducible tidying pipeline. Since the year values are already spread across several columns, the principal transformation here will again involve reshaping the dataset from a wide to a long format using pivot_longer(). Thereafter, the variable names will be standardized and the GDP figures converted into a numeric format suitable for downstream analysis.

Code

gdp_url <- "https://raw.githubusercontent.com/bkchanderban/CUNY_SPS/refs/heads/main/DATA607/DATA607/Project%20Two%20Assignment/country_gdp_dataset.csv"

gdp_raw <- read_csv(gdp_url, show_col_types = FALSE)

glimpse(gdp_raw)

Rows: 266
Columns: 65
$ Country        <chr> "Aruba", "Africa Eastern and Southern", "Afghanistan", …
$ `Country Code` <chr> "ABW", "AFE", "AFG", "AFW", "AGO", "ALB", "AND", "ARB",…
$ `1960`         <dbl> NA, 21125015452, 537777811, 10447637853, NA, NA, NA, NA…
$ `1961`         <dbl> NA, 21616228139, 548888896, 11173212080, NA, NA, NA, NA…
$ `1962`         <dbl> NA, 23506279900, 546666678, 11990534018, NA, NA, NA, NA…
$ `1963`         <dbl> NA, 28048360188, 751111191, 12727688165, NA, NA, NA, NA…
$ `1964`         <dbl> NA, 25920665260, 800000044, 13898109284, NA, NA, NA, NA…
$ `1965`         <dbl> NA, 29472103270, 1006666638, 14929792388, NA, NA, NA, N…
$ `1966`         <dbl> NA, 32014368121, 1399999967, 15910837742, NA, NA, NA, N…
$ `1967`         <dbl> NA, 33269509510, 1673333418, 14510579889, NA, NA, NA, N…
$ `1968`         <dbl> NA, 36327785495, 1373333367, 14968235782, NA, NA, NA, 3…
$ `1969`         <dbl> NA, 41638967621, 1408888922, 16979315745, NA, NA, NA, 3…
$ `1970`         <dbl> NA, 44629891649, 1748886596, 23596163865, NA, NA, 78617…
$ `1971`         <dbl> NA, 49173371529, 1831108971, 20936358634, NA, NA, 89406…
$ `1972`         <dbl> NA, 53123459912, 1595555476, 25386169423, NA, NA, 11341…
$ `1973`         <dbl> NA, 69482723444, 1733333264, 31975594565, NA, NA, 15084…
$ `1974`         <dbl> NA, 85380645042, 2155555498, 44416677335, NA, NA, 18655…
$ `1975`         <dbl> NA, 90835426418, 2366666616, 51667190242, NA, NA, 22011…
$ `1976`         <dbl> NA, 90212747243, 2555555567, 62351622300, NA, NA, 22728…
$ `1977`         <dbl> NA, 102241000000, 2953333418, 65595122956, NA, NA, 2539…
$ `1978`         <dbl> NA, 116085000000, 3300000109, 71496496574, NA, NA, 3080…
$ `1979`         <dbl> NA, 134257000000, 3697940410, 88948338390, NA, NA, 4115…
$ `1980`         <dbl> NA, 171218000000, 3641723322, 112439000000, 5930503401,…
$ `1981`         <dbl> NA, 175859000000, 3478787909, 211338000000, 5550483036,…
$ `1982`         <dbl> NA, 168096000000, NA, 187449000000, 5550483036, NA, 375…
$ `1983`         <dbl> NA, 175565000000, NA, 138384000000, 5784341596, NA, 327…
$ `1984`         <dbl> NA, 160647000000, NA, 114516000000, 6131475065, 1857338…
$ `1985`         <dbl> NA, 136759000000, NA, 116777000000, 7554065410, 1897050…
$ `1986`         <dbl> 405586592, 153050000000, NA, 107887000000, 7072536109, …
$ `1987`         <dbl> 487709497, 186658000000, NA, 110729000000, 8084412414, …
$ `1988`         <dbl> 596648045, 204766000000, NA, 109439000000, 8769836769, …
$ `1989`         <dbl> 695530726, 218242000000, NA, 102255000000, 10201780977,…
$ `1990`         <dbl> 764804469, 254062000000, NA, 122387000000, 11229515599,…
$ `1991`         <dbl> 872067039, 276857000000, NA, 118040000000, 12704558517,…
$ `1992`         <dbl> 958659218, 246088000000, NA, 118893000000, 15114352005,…
$ `1993`         <dbl> 1083240223, 242926000000, NA, 99272180411, 11051939102,…
$ `1994`         <dbl> 1245810056, 239611000000, NA, 86636400266, 3390500000, …
$ `1995`         <dbl> 1320670391, 270327000000, NA, 108691000000, 5561222222,…
$ `1996`         <dbl> 1379888268, 269491000000, NA, 126287000000, 7526963964,…
$ `1997`         <dbl> 1531843575, 283446000000, NA, 127602000000, 7648377413,…
$ `1998`         <dbl> 1665363128, 266652000000, NA, 130678000000, 6506229607,…
$ `1999`         <dbl> 1722905028, 263025000000, NA, 138086000000, 6152922943,…
$ `2000`         <dbl> 1873184358, 284759000000, NA, 140946000000, 9129594819,…
$ `2001`         <dbl> 1896648045, 259643000000, NA, 148530000000, 8936079253,…
$ `2002`         <dbl> 1962011173, 266529000000, 3854235264, 177201000000, 152…
$ `2003`         <dbl> 2044134078, 354177000000, 4539496563, 205214000000, 178…
$ `2004`         <dbl> 2.254749e+09, 4.404820e+11, 5.220825e+09, 2.542650e+11,…
$ `2005`         <dbl> 2.359777e+09, 5.139420e+11, 6.226199e+09, 3.108900e+11,…
$ `2006`         <dbl> 2.469832e+09, 5.775870e+11, 6.971383e+09, 3.969210e+11,…
$ `2007`         <dbl> 2.677654e+09, 6.628680e+11, 9.715765e+09, 4.654860e+11,…
$ `2008`         <dbl> 2.843017e+09, 7.105360e+11, 1.024977e+10, 5.677910e+11,…
$ `2009`         <dbl> 2.553631e+09, 7.219010e+11, 1.215484e+10, 5.083630e+11,…
$ `2010`         <dbl> 2.453631e+09, 8.635200e+11, 1.563384e+10, 5.985220e+11,…
$ `2011`         <dbl> 2.637989e+09, 9.678250e+11, 1.819041e+10, 6.820160e+11,…
$ `2012`         <dbl> 2.615084e+09, 9.753550e+11, 2.020357e+10, 7.375890e+11,…
$ `2013`         <dbl> 2.727933e+09, 9.859870e+11, 2.056449e+10, 8.339480e+11,…
$ `2014`         <dbl> 2.791061e+09, 1.006530e+12, 2.055058e+10, 8.943220e+11,…
$ `2015`         <dbl> 2.963128e+09, 9.273480e+11, 1.999814e+10, 7.686450e+11,…
$ `2016`         <dbl> 2.983799e+09, 8.851760e+11, 1.801955e+10, 6.913630e+11,…
$ `2017`         <dbl> 3.092179e+09, 1.021040e+12, 1.889635e+10, 6.848990e+11,…
$ `2018`         <dbl> 3.276188e+09, 1.007200e+12, 1.841886e+10, 7.670260e+11,…
$ `2019`         <dbl> 3.395794e+09, 1.000830e+12, 1.890450e+10, 8.225380e+11,…
$ `2020`         <dbl> 2.610039e+09, 9.275930e+11, 2.014345e+10, 7.864600e+11,…
$ `2021`         <dbl> 3.126019e+09, 1.082000e+12, 1.458314e+10, 8.444600e+11,…
$ `2022`         <dbl> NA, 1.169480e+12, NA, 8.778630e+11, 1.067140e+11, 1.888…

Code

head(gdp_raw)

# A tibble: 6 × 65
  Country   `Country Code`   `1960`   `1961`   `1962`   `1963`   `1964`   `1965`
  <chr>     <chr>             <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
1 Aruba     ABW            NA       NA       NA       NA       NA       NA      
2 Africa E… AFE             2.11e10  2.16e10  2.35e10  2.80e10  2.59e10  2.95e10
3 Afghanis… AFG             5.38e 8  5.49e 8  5.47e 8  7.51e 8  8.00e 8  1.01e 9
4 Africa W… AFW             1.04e10  1.12e10  1.20e10  1.27e10  1.39e10  1.49e10
5 Angola    AGO            NA       NA       NA       NA       NA       NA      
6 Albania   ALB            NA       NA       NA       NA       NA       NA      
# ℹ 57 more variables: `1966` <dbl>, `1967` <dbl>, `1968` <dbl>, `1969` <dbl>,
#   `1970` <dbl>, `1971` <dbl>, `1972` <dbl>, `1973` <dbl>, `1974` <dbl>,
#   `1975` <dbl>, `1976` <dbl>, `1977` <dbl>, `1978` <dbl>, `1979` <dbl>,
#   `1980` <dbl>, `1981` <dbl>, `1982` <dbl>, `1983` <dbl>, `1984` <dbl>,
#   `1985` <dbl>, `1986` <dbl>, `1987` <dbl>, `1988` <dbl>, `1989` <dbl>,
#   `1990` <dbl>, `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>,
#   `1995` <dbl>, `1996` <dbl>, `1997` <dbl>, `1998` <dbl>, `1999` <dbl>, …

As can be observed, the dataset is already structured with the first row containing the relevant variable names, so no metadata-row removal is required in this case. The next step is to clean the column names and convert the dataset into a long format.

Code

gdp_tidy <- gdp_raw %>%
  clean_names() %>%
  pivot_longer(
    cols = -c(country, country_code),
    names_to = "year",
    values_to = "gdp"
  ) %>%
  mutate(
    year = str_extract(year, "\\d{4}") %>% as.integer(),
    gdp = as.numeric(gdp)
  )

glimpse(gdp_tidy)

Rows: 16,758
Columns: 4
$ country      <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Ar…
$ country_code <chr> "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "…
$ year         <int> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 196…
$ gdp          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…

Code

head(gdp_tidy)

# A tibble: 6 × 4
  country country_code  year   gdp
  <chr>   <chr>        <int> <dbl>
1 Aruba   ABW           1960    NA
2 Aruba   ABW           1961    NA
3 Aruba   ABW           1962    NA
4 Aruba   ABW           1963    NA
5 Aruba   ABW           1964    NA
6 Aruba   ABW           1965    NA

At this stage, the GDP dataset has now been transformed into a tidy format, with one row representing one country-year observation. Missing GDP values will be retained as NAs for now and only filtered out where necessary for specific analyses, so as not to prematurely reduce the informational content of the original data.

We may now proceed on to the analytical step for this second dataset.

Step 3.3: Analysis

The first analysis to be conducted upon the tidied GDP dataset will involve examining broad GDP trends over time across all included countries.

Summary Table: Average GDP by Year

Code

gdp_summary <- gdp_tidy %>%
  group_by(year) %>%
  summarise(
    average_gdp = mean(gdp, na.rm = TRUE),
    .groups = "drop"
  )

gdp_summary %>%
  gt() %>%
  tab_header(
    title = "Average GDP by Year",
    subtitle = "Summary of annual GDP values (USD billions)"
  ) %>%
  cols_label(
    year = "Year",
    average_gdp = "Average GDP (Billion USD)"
  ) %>%
  fmt_number(
    columns = average_gdp,
    decimals = 2,
    scale_by = 1e-9
  )

Year	Average GDP (Billion USD)
Average GDP by Year
Summary of annual GDP values (USD billions)
1960	71.27
1961	72.75
1962	76.40
1963	82.36
1964	90.48
1965	91.83
1966	102.17
1967	105.98
1968	111.11
1969	122.87
1970	132.99
1971	144.00
1972	165.88
1973	204.76
1974	238.06
1975	262.66
1976	282.67
1977	313.95
1978	370.46
1979	428.40
1980	453.33
1981	463.51
1982	454.57
1983	454.35
1984	466.44
1985	482.14
1986	558.57
1987	619.86
1988	681.47
1989	710.55
1990	755.93
1991	778.78
1992	826.52
1993	831.35
1994	885.75
1995	953.09
1996	982.13
1997	981.06
1998	970.39
1999	993.18
2000	1,009.81
2001	998.93
2002	1,018.48
2003	1,149.29
2004	1,306.61
2005	1,431.44
2006	1,565.92
2007	1,794.23
2008	1,997.00
2009	1,895.74
2010	2,097.25
2011	2,322.63
2012	2,402.47
2013	2,478.71
2014	2,540.06
2015	2,401.52
2016	2,435.92
2017	2,611.09
2018	2,773.99
2019	2,826.37
2020	2,739.71
2021	3,211.35
2022	3,599.79

The above table provides a visual representation of how GDP has generally changed over time across all included countries. In this, we can observe substantial increases, from an average of 71.27 Billion USD in 1960 to approximately 3,599.79 Billion USD in 2022.

Country Level Comparison

In addition to the overall trend, it may also be useful to compare GDP growth across a selected number of countries. For clarity, the following comparison will be limited to six countries.

Code

selected_gdp_countries <- c("United States", "Guyana", "China", "India", "Brazil", "Nigeria")

gdp_selected <- gdp_tidy %>%
  filter(country %in% selected_gdp_countries)

ggplot(gdp_selected, aes(x = year, y = gdp, color = country)) +
  geom_line() +
  scale_y_log10() +
  labs(
    title = "GDP Trends for Selected Countries",
    x = "Year",
    y = "GDP",
    color = "Country"
  ) +
  theme(plot.title = element_text(hjust = 0.5))

The visual comparison above allows us to more easily interpret and compare the GDP growth trends of the selected countries over time. In general, all six countries exhibit an overall upward trend in GDP, though at very different magnitudes. China demonstrates the most pronounced acceleration in growth, especially from the early 2000s onward. India and Brazil also show sustained long-term growth, Nigeria displays a generally rising but somewhat more variable pattern, and Guyana, though remaining the smallest economy among the selected countries, still shows noticeable expansion in the later years.

It should also be noted that the y-axis has been transformed to a logarithmic (base-10) scale, meaning that the plotted values no longer represent equal absolute increases, but rather proportional ones; that is, equal vertical distances now correspond to multiplicative changes in GDP, which makes it easier to compare the growth trajectories of both larger and smaller economies within the same visualization.

Summary Table: Top-Performing Countries by Economic Output

The final analytical endeavor for this second dataset, and the last of classmate Sinem’s question to be addressed is that of determining the top-performing countries in terms of economic output.

Code

library(countrycode)

valid_codes <- unique(countrycode::codelist$iso3c)

top_gdp_countries <- gdp_tidy %>%
  filter(country_code %in% valid_codes) %>%
  filter(year == max(year, na.rm = TRUE)) %>%
  drop_na(gdp) %>%
  arrange(desc(gdp)) %>%
  slice_head(n = 10) %>%
  mutate(gdp_billions_usd = gdp / 1e9)

top_gdp_countries %>%
  select(country, country_code, year, gdp_billions_usd) %>%
  gt() %>%
  tab_header(
    title = "Top-Performing Countries by GDP",
    subtitle = "Highest GDP values in the most recent year available"
  ) %>%
  cols_label(
    country = "Country",
    country_code = "Country Code",
    year = "Year",
    gdp_billions_usd = "GDP (Billions USD)"
  ) %>%
  fmt_number(
    columns = gdp_billions_usd,
    decimals = 2
  )

Country	Country Code	Year	GDP (Billions USD)
Top-Performing Countries by GDP
Highest GDP values in the most recent year available
United States	USA	2022	25,462.70
China	CHN	2022	17,963.20
Japan	JPN	2022	4,231.14
Germany	DEU	2022	4,072.19
India	IND	2022	3,385.09
United Kingdom	GBR	2022	3,070.67
France	FRA	2022	2,782.91
Russian Federation	RUS	2022	2,240.42
Canada	CAN	2022	2,139.84
Italy	ITA	2022	2,010.43

The above summary table highlights the countries with the highest levels of economic output in the most recent year available within the dataset. From this ranking, it is evident that the United States and China occupy the top two positions by a substantial margin, reflecting the scale of their respective economies. The remaining countries (such as Japan, Germany, India, the United Kingdom, etc.) also demonstrate significant economic output, though at levels notably below the two leading economies. Overall, the table illustrates how a relatively small number of countries account for a large share of global GDP, emphasizing the concentration of economic activity among the world’s largest economies.

Brief Interpretation/Dataset Conclusion

From the analyses conducted on the tidied GDP dataset, a clear long-term expansion in economic output can be observed across the examined period. The average GDP values show a steady increase from the early decades of the dataset to the most recent year available, indicating substantial global economic growth over time. The country-level comparison further illustrates that while many countries experience upward economic trajectories, the magnitude of growth differs considerably between them. Larger economies such as the United States and China dominate the global GDP figures, while other countries exhibit notable but comparatively smaller levels of economic output. Overall, these results suggest that although economic growth has occurred broadly across countries, global economic activity remains concentrated among a relatively small number of the world’s largest economies.

Dataset Three: Renewable Energy Consumption

Step 3.1: Raw Data Construction

At this stage, the renewable energy consumption dataset was obtained from the World Bank source originally identified in classmate Ciara Bonnett-Jones’ discussion post, and then committed to my personal GitHub repository in its original wide format. The file preserves the dataset’s original structure, wherein the year values are spread across multiple columns, and will now be imported into RStudio for tidying and subsequent analysis.

Step 3.2: Data Import and Tidying

Now that the raw wide format renewable energy dataset has been committed and made accessible, the next step is to import it into RStudio and apply a reproducible tidying pipeline. As with the crude birth rate dataset, this file contains metadata rows before the true variable names begin, so these rows must first be removed. Thereafter, the dataset can be reshaped from wide to long format using pivot_longer(), with the year values converted into a tidy variable and the renewable energy figures converted into numeric form for analysis.

Code

renewable_url <- "https://raw.githubusercontent.com/bkchanderban/CUNY_SPS/refs/heads/main/DATA607/DATA607/Project%20Two%20Assignment/renewable_energy_consumption_dataset.csv"

renewable_raw <- read_csv(renewable_url, col_names = FALSE, show_col_types = FALSE)

glimpse(renewable_raw)

Rows: 271
Columns: 70
$ X1  <chr> "Data Source", NA, "Last Updated Date", NA, "Country Name", "Aruba…
$ X2  <chr> "World Development Indicators", NA, "24/02/2026", NA, "Country Cod…
$ X3  <chr> NA, NA, NA, NA, "Indicator Name", "Renewable energy consumption (%…
$ X4  <chr> NA, NA, NA, NA, "Indicator Code", "EG.FEC.RNEW.ZS", "EG.FEC.RNEW.Z…
$ X5  <dbl> NA, NA, NA, NA, 1960, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X6  <dbl> NA, NA, NA, NA, 1961, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X7  <dbl> NA, NA, NA, NA, 1962, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X8  <dbl> NA, NA, NA, NA, 1963, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X9  <dbl> NA, NA, NA, NA, 1964, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X10 <dbl> NA, NA, NA, NA, 1965, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X11 <dbl> NA, NA, NA, NA, 1966, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X12 <dbl> NA, NA, NA, NA, 1967, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X13 <dbl> NA, NA, NA, NA, 1968, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X14 <dbl> NA, NA, NA, NA, 1969, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X15 <dbl> NA, NA, NA, NA, 1970, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X16 <dbl> NA, NA, NA, NA, 1971, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X17 <dbl> NA, NA, NA, NA, 1972, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X18 <dbl> NA, NA, NA, NA, 1973, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X19 <dbl> NA, NA, NA, NA, 1974, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X20 <dbl> NA, NA, NA, NA, 1975, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X21 <dbl> NA, NA, NA, NA, 1976, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X22 <dbl> NA, NA, NA, NA, 1977, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X23 <dbl> NA, NA, NA, NA, 1978, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X24 <dbl> NA, NA, NA, NA, 1979, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X25 <dbl> NA, NA, NA, NA, 1980, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X26 <dbl> NA, NA, NA, NA, 1981, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X27 <dbl> NA, NA, NA, NA, 1982, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X28 <dbl> NA, NA, NA, NA, 1983, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X29 <dbl> NA, NA, NA, NA, 1984, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X30 <dbl> NA, NA, NA, NA, 1985, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X31 <dbl> NA, NA, NA, NA, 1986, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X32 <dbl> NA, NA, NA, NA, 1987, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X33 <dbl> NA, NA, NA, NA, 1988, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X34 <dbl> NA, NA, NA, NA, 1989, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X35 <dbl> NA, NA, NA, NA, 1990.000000, 0.300000, 60.842404, 23.000000, 85.89…
$ X36 <dbl> NA, NA, NA, NA, 1991.000000, 0.200000, 62.133295, 23.700000, 85.26…
$ X37 <dbl> NA, NA, NA, NA, 1992.000000, 0.200000, 63.940789, 27.400000, 83.81…
$ X38 <dbl> NA, NA, NA, NA, 1993.000000, 0.200000, 64.651965, 28.500000, 85.10…
$ X39 <dbl> NA, NA, NA, NA, 1994.000000, 0.200000, 65.145067, 30.100000, 86.55…
$ X40 <dbl> NA, NA, NA, NA, 1995.000000, 0.200000, 64.721535, 31.800000, 86.14…
$ X41 <dbl> NA, NA, NA, NA, 1996.000000, 0.200000, 63.929272, 33.900000, 85.11…
$ X42 <dbl> NA, NA, NA, NA, 1997.000000, 0.200000, 63.241293, 36.400000, 83.91…
$ X43 <dbl> NA, NA, NA, NA, 1998.000000, 0.200000, 63.962096, 38.000000, 84.63…
$ X44 <dbl> NA, NA, NA, NA, 1999.000000, 0.200000, 65.058401, 43.400000, 84.00…
$ X45 <dbl> NA, NA, NA, NA, 2000.000000, 0.200000, 65.385867, 45.000000, 83.10…
$ X46 <dbl> NA, NA, NA, NA, 2001.000000, 0.200000, 65.817363, 45.600000, 81.93…
$ X47 <dbl> NA, NA, NA, NA, 2002.000000, 0.200000, 65.099281, 37.800000, 81.73…
$ X48 <dbl> NA, NA, NA, NA, 2003.000000, 0.200000, 64.615124, 36.700000, 80.71…
$ X49 <dbl> NA, NA, NA, NA, 2004.000000, 0.200000, 63.678035, 44.200000, 80.79…
$ X50 <dbl> NA, NA, NA, NA, 2005.000000, 0.200000, 63.109098, 33.900000, 80.93…
$ X51 <dbl> NA, NA, NA, NA, 2006.000000, 0.200000, 62.738124, 31.900000, 81.87…
$ X52 <dbl> NA, NA, NA, NA, 2007.000000, 0.200000, 61.505169, 28.800000, 82.33…
$ X53 <dbl> NA, NA, NA, NA, 2008.000000, 0.200000, 61.908027, 21.200000, 81.21…
$ X54 <dbl> NA, NA, NA, NA, 2009.000000, 0.300000, 61.304601, 16.500000, 82.39…
$ X55 <dbl> NA, NA, NA, NA, 2010.000000, 5.500000, 62.876387, 15.200000, 80.68…
$ X56 <dbl> NA, NA, NA, NA, 2011.000000, 5.700000, 62.805279, 12.600000, 79.28…
$ X57 <dbl> NA, NA, NA, NA, 2012.000000, 6.900000, 61.409964, 15.400000, 78.48…
$ X58 <dbl> NA, NA, NA, NA, 2013.000000, 6.900000, 60.801799, 16.900000, 76.70…
$ X59 <dbl> NA, NA, NA, NA, 2014.000000, 6.900000, 61.235882, 19.100000, 75.74…
$ X60 <dbl> NA, NA, NA, NA, 2015.000000, 6.700000, 61.670278, 17.700000, 76.18…
$ X61 <dbl> NA, NA, NA, NA, 2016.000000, 7.000000, 61.822884, 20.200000, 75.79…
$ X62 <dbl> NA, NA, NA, NA, 2017.000000, 6.800000, 61.426950, 19.500000, 76.10…
$ X63 <dbl> NA, NA, NA, NA, 2018.000000, 8.600000, 61.587530, 18.300000, 75.47…
$ X64 <dbl> NA, NA, NA, NA, 2019.000000, 8.300000, 62.690710, 18.900000, 75.07…
$ X65 <dbl> NA, NA, NA, NA, 2020.000000, 9.100000, 65.782380, 18.200000, 75.84…
$ X66 <dbl> NA, NA, NA, NA, 2021.0, 8.8, NA, 20.0, NA, 52.9, 41.9, 18.4, NA, 1…
$ X67 <dbl> NA, NA, NA, NA, 2022.0, 8.8, NA, 20.0, NA, NA, NA, 18.7, NA, NA, N…
$ X68 <dbl> NA, NA, NA, NA, 2023, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X69 <dbl> NA, NA, NA, NA, 2024, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ X70 <dbl> NA, NA, NA, NA, 2025, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …

Code

head(renewable_raw)

# A tibble: 6 × 70
  X1     X2    X3    X4       X5    X6    X7    X8    X9   X10   X11   X12   X13
  <chr>  <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Data … Worl… <NA>  <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
2 <NA>   <NA>  <NA>  <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
3 Last … 24/0… <NA>  <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
4 <NA>   <NA>  <NA>  <NA>     NA    NA    NA    NA    NA    NA    NA    NA    NA
5 Count… Coun… Indi… Indi…  1960  1961  1962  1963  1964  1965  1966  1967  1968
6 Aruba  ABW   Rene… EG.F…    NA    NA    NA    NA    NA    NA    NA    NA    NA
# ℹ 57 more variables: X14 <dbl>, X15 <dbl>, X16 <dbl>, X17 <dbl>, X18 <dbl>,
#   X19 <dbl>, X20 <dbl>, X21 <dbl>, X22 <dbl>, X23 <dbl>, X24 <dbl>,
#   X25 <dbl>, X26 <dbl>, X27 <dbl>, X28 <dbl>, X29 <dbl>, X30 <dbl>,
#   X31 <dbl>, X32 <dbl>, X33 <dbl>, X34 <dbl>, X35 <dbl>, X36 <dbl>,
#   X37 <dbl>, X38 <dbl>, X39 <dbl>, X40 <dbl>, X41 <dbl>, X42 <dbl>,
#   X43 <dbl>, X44 <dbl>, X45 <dbl>, X46 <dbl>, X47 <dbl>, X48 <dbl>,
#   X49 <dbl>, X50 <dbl>, X51 <dbl>, X52 <dbl>, X53 <dbl>, X54 <dbl>, …

Upon import, the dataset contains metadata rows above the true headers. These first four rows must therefore be removed, and the new first row must be promoted to the variable names.

Code

renewable_cleaned <- renewable_raw %>%
  slice(-1, -2, -3, -4)

colnames(renewable_cleaned) <- as.character(renewable_cleaned[1,])

renewable_cleaned <- renewable_cleaned %>%
  slice(-1)

head(renewable_cleaned)

# A tibble: 6 × 70
  `Country Name`  `Country Code` `Indicator Name` `Indicator Code` `1960` `1961`
  <chr>           <chr>          <chr>            <chr>             <dbl>  <dbl>
1 Aruba           ABW            Renewable energ… EG.FEC.RNEW.ZS       NA     NA
2 Africa Eastern… AFE            Renewable energ… EG.FEC.RNEW.ZS       NA     NA
3 Afghanistan     AFG            Renewable energ… EG.FEC.RNEW.ZS       NA     NA
4 Africa Western… AFW            Renewable energ… EG.FEC.RNEW.ZS       NA     NA
5 Angola          AGO            Renewable energ… EG.FEC.RNEW.ZS       NA     NA
6 Albania         ALB            Renewable energ… EG.FEC.RNEW.ZS       NA     NA
# ℹ 64 more variables: `1962` <dbl>, `1963` <dbl>, `1964` <dbl>, `1965` <dbl>,
#   `1966` <dbl>, `1967` <dbl>, `1968` <dbl>, `1969` <dbl>, `1970` <dbl>,
#   `1971` <dbl>, `1972` <dbl>, `1973` <dbl>, `1974` <dbl>, `1975` <dbl>,
#   `1976` <dbl>, `1977` <dbl>, `1978` <dbl>, `1979` <dbl>, `1980` <dbl>,
#   `1981` <dbl>, `1982` <dbl>, `1983` <dbl>, `1984` <dbl>, `1985` <dbl>,
#   `1986` <dbl>, `1987` <dbl>, `1988` <dbl>, `1989` <dbl>, `1990` <dbl>,
#   `1991` <dbl>, `1992` <dbl>, `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, …

The cleaned dataset can now have its column names standardized prior to reshaping.

Code

renewable_cleaned <- renewable_cleaned %>%
  clean_names()

glimpse(renewable_cleaned)

Rows: 266
Columns: 70
$ country_name   <chr> "Aruba", "Africa Eastern and Southern", "Afghanistan", …
$ country_code   <chr> "ABW", "AFE", "AFG", "AFW", "AGO", "ALB", "AND", "ARB",…
$ indicator_name <chr> "Renewable energy consumption (% of total final energy …
$ indicator_code <chr> "EG.FEC.RNEW.ZS", "EG.FEC.RNEW.ZS", "EG.FEC.RNEW.ZS", "…
$ x1960          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1961          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1962          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1963          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1964          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1965          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1966          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1967          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1968          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1969          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1970          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1971          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1972          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1973          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1974          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1975          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1976          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1977          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1978          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1979          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1980          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1981          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1982          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1983          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1984          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1985          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1986          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1987          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1988          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1989          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x1990          <dbl> 0.300000, 60.842404, 23.000000, 85.895442, 72.300000, 2…
$ x1991          <dbl> 0.200000, 62.133295, 23.700000, 85.262074, 71.900000, 3…
$ x1992          <dbl> 0.200000, 63.940789, 27.400000, 83.818324, 72.700000, 4…
$ x1993          <dbl> 0.200000, 64.651965, 28.500000, 85.103232, 71.300000, 5…
$ x1994          <dbl> 0.200000, 65.145067, 30.100000, 86.557685, 72.200000, 5…
$ x1995          <dbl> 0.200000, 64.721535, 31.800000, 86.148094, 73.800000, 5…
$ x1996          <dbl> 0.200000, 63.929272, 33.900000, 85.118648, 72.900000, 5…
$ x1997          <dbl> 0.200000, 63.241293, 36.400000, 83.911059, 73.400000, 5…
$ x1998          <dbl> 0.200000, 63.962096, 38.000000, 84.634537, 76.400000, 4…
$ x1999          <dbl> 0.200000, 65.058401, 43.400000, 84.005039, 72.800000, 4…
$ x2000          <dbl> 0.200000, 65.385867, 45.000000, 83.101512, 73.800000, 4…
$ x2001          <dbl> 0.200000, 65.817363, 45.600000, 81.939243, 72.800000, 3…
$ x2002          <dbl> 0.200000, 65.099281, 37.800000, 81.735528, 70.800000, 3…
$ x2003          <dbl> 0.200000, 64.615124, 36.700000, 80.710450, 65.600000, 3…
$ x2004          <dbl> 0.200000, 63.678035, 44.200000, 80.790537, 62.300000, 3…
$ x2005          <dbl> 0.200000, 63.109098, 33.900000, 80.934034, 68.900000, 3…
$ x2006          <dbl> 0.200000, 62.738124, 31.900000, 81.873310, 62.600000, 3…
$ x2007          <dbl> 0.200000, 61.505169, 28.800000, 82.330015, 58.900000, 3…
$ x2008          <dbl> 0.200000, 61.908027, 21.200000, 81.211170, 55.400000, 3…
$ x2009          <dbl> 0.300000, 61.304601, 16.500000, 82.390213, 53.000000, 3…
$ x2010          <dbl> 5.500000, 62.876387, 15.200000, 80.687806, 53.100000, 3…
$ x2011          <dbl> 5.700000, 62.805279, 12.600000, 79.282287, 51.700000, 3…
$ x2012          <dbl> 6.900000, 61.409964, 15.400000, 78.481230, 49.000000, 3…
$ x2013          <dbl> 6.900000, 60.801799, 16.900000, 76.701499, 46.800000, 4…
$ x2014          <dbl> 6.900000, 61.235882, 19.100000, 75.741009, 47.500000, 3…
$ x2015          <dbl> 6.700000, 61.670278, 17.700000, 76.181671, 47.100000, 3…
$ x2016          <dbl> 7.000000, 61.822884, 20.200000, 75.796840, 48.100000, 3…
$ x2017          <dbl> 6.800000, 61.426950, 19.500000, 76.101681, 52.500000, 3…
$ x2018          <dbl> 8.600000, 61.587530, 18.300000, 75.476578, 52.500000, 3…
$ x2019          <dbl> 8.300000, 62.690710, 18.900000, 75.078788, 51.000000, 4…
$ x2020          <dbl> 9.100000, 65.782380, 18.200000, 75.846980, 60.100000, 4…
$ x2021          <dbl> 8.8, NA, 20.0, NA, 52.9, 41.9, 18.4, NA, 1.0, 9.2, 9.1,…
$ x2022          <dbl> 8.8, NA, 20.0, NA, NA, NA, 18.7, NA, NA, NA, NA, 0.4, 0…
$ x2023          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x2024          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ x2025          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

Code

head(renewable_cleaned)

# A tibble: 6 × 70
  country_name      country_code indicator_name indicator_code x1960 x1961 x1962
  <chr>             <chr>        <chr>          <chr>          <dbl> <dbl> <dbl>
1 Aruba             ABW          Renewable ene… EG.FEC.RNEW.ZS    NA    NA    NA
2 Africa Eastern a… AFE          Renewable ene… EG.FEC.RNEW.ZS    NA    NA    NA
3 Afghanistan       AFG          Renewable ene… EG.FEC.RNEW.ZS    NA    NA    NA
4 Africa Western a… AFW          Renewable ene… EG.FEC.RNEW.ZS    NA    NA    NA
5 Angola            AGO          Renewable ene… EG.FEC.RNEW.ZS    NA    NA    NA
6 Albania           ALB          Renewable ene… EG.FEC.RNEW.ZS    NA    NA    NA
# ℹ 63 more variables: x1963 <dbl>, x1964 <dbl>, x1965 <dbl>, x1966 <dbl>,
#   x1967 <dbl>, x1968 <dbl>, x1969 <dbl>, x1970 <dbl>, x1971 <dbl>,
#   x1972 <dbl>, x1973 <dbl>, x1974 <dbl>, x1975 <dbl>, x1976 <dbl>,
#   x1977 <dbl>, x1978 <dbl>, x1979 <dbl>, x1980 <dbl>, x1981 <dbl>,
#   x1982 <dbl>, x1983 <dbl>, x1984 <dbl>, x1985 <dbl>, x1986 <dbl>,
#   x1987 <dbl>, x1988 <dbl>, x1989 <dbl>, x1990 <dbl>, x1991 <dbl>,
#   x1992 <dbl>, x1993 <dbl>, x1994 <dbl>, x1995 <dbl>, x1996 <dbl>, …

Now, the dataset may be converted from wide to long format.

Code

renewable_tidy <- renewable_cleaned %>%
  pivot_longer(
    cols = -c(country_name, country_code, indicator_name, indicator_code),
    names_to = "year",
    values_to = "renewable_energy_pct"
  ) %>%
  mutate(
    year = str_extract(year, "\\d{4}") %>% as.integer(),
    renewable_energy_pct = as.numeric(renewable_energy_pct)
  )

glimpse(renewable_tidy)

Rows: 17,556
Columns: 6
$ country_name         <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aru…
$ country_code         <chr> "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", …
$ indicator_name       <chr> "Renewable energy consumption (% of total final e…
$ indicator_code       <chr> "EG.FEC.RNEW.ZS", "EG.FEC.RNEW.ZS", "EG.FEC.RNEW.…
$ year                 <int> 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1…
$ renewable_energy_pct <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…

Code

head(renewable_tidy)

# A tibble: 6 × 6
  country_name country_code indicator_name                  indicator_code  year
  <chr>        <chr>        <chr>                           <chr>          <int>
1 Aruba        ABW          Renewable energy consumption (… EG.FEC.RNEW.ZS  1960
2 Aruba        ABW          Renewable energy consumption (… EG.FEC.RNEW.ZS  1961
3 Aruba        ABW          Renewable energy consumption (… EG.FEC.RNEW.ZS  1962
4 Aruba        ABW          Renewable energy consumption (… EG.FEC.RNEW.ZS  1963
5 Aruba        ABW          Renewable energy consumption (… EG.FEC.RNEW.ZS  1964
6 Aruba        ABW          Renewable energy consumption (… EG.FEC.RNEW.ZS  1965
# ℹ 1 more variable: renewable_energy_pct <dbl>

At this section within the Dataset Three segment, the dataset has now been converted into a tidy format, with one row representing one country/region per year observation. Missing renewable energy values will be retained as NAs for now and only filtered out where necessary for the specific analysis.

Step 3.3: Analysis

The analysis requested for this dataset centers on comparing the growth rate in renewable energy usage across different geographic regions over a ten-year period. As such, the first step is to isolate the regional groupings from the larger dataset and restrict the dataset to a recent ten-year window.

Code

selected_regions <- c(
  "Africa Eastern and Southern",
  "East Asia & Pacific",
  "Europe & Central Asia",
  "Latin America & Caribbean",
  "North America",
  "South Asia"
)

renewable_regions <- renewable_tidy %>%
  filter(country_name %in% selected_regions) %>%
  filter(year >= 2013, year <= 2022) %>%
    rename(
      region = country_name,
      region_code = country_code
    )

renewable_regions

# A tibble: 60 × 6
   region   region_code indicator_name indicator_code  year renewable_energy_pct
   <chr>    <chr>       <chr>          <chr>          <int>                <dbl>
 1 Africa … AFE         Renewable ene… EG.FEC.RNEW.ZS  2013                 60.8
 2 Africa … AFE         Renewable ene… EG.FEC.RNEW.ZS  2014                 61.2
 3 Africa … AFE         Renewable ene… EG.FEC.RNEW.ZS  2015                 61.7
 4 Africa … AFE         Renewable ene… EG.FEC.RNEW.ZS  2016                 61.8
 5 Africa … AFE         Renewable ene… EG.FEC.RNEW.ZS  2017                 61.4
 6 Africa … AFE         Renewable ene… EG.FEC.RNEW.ZS  2018                 61.6
 7 Africa … AFE         Renewable ene… EG.FEC.RNEW.ZS  2019                 62.7
 8 Africa … AFE         Renewable ene… EG.FEC.RNEW.ZS  2020                 65.8
 9 Africa … AFE         Renewable ene… EG.FEC.RNEW.ZS  2021                 NA  
10 Africa … AFE         Renewable ene… EG.FEC.RNEW.ZS  2022                 NA  
# ℹ 50 more rows

The following visualization can then be constructed in order to compare the growth trajectories of the six selected geographic regions.

Code

ggplot(renewable_regions, aes(x = year, y = renewable_energy_pct, color = region)) +
  geom_line(linewidth = 1) +
  geom_point(size = 1.3) +
  scale_y_continuous(limits = c(9, 67)) +
  labs(
    title = "Renewable Energy Usage Across Selected Regions\n(2013–2022)",
    x = "Year",
    y = "Renewable Energy Consumption (%)",
    color = "Region"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16),
    legend.position = "bottom",
    legend.title = element_text(size = 12),
    legend.text = element_text(size = 10)
  )

The above visualization illustrates the changes in renewable energy consumption across the selected geographical regions over the 2013 to 2022 period. In general, most regions exhibit a gradual upward trend, suggesting an increasing share of renewable energy within overall energy consumption. Africa Eastern and Southern consistently maintains the highest proportion of renewable energy usage, while Latin America and the Caribbean also appears to show steady growth over the period. Europe and Central Asia, East Asia and the Pacific, and North America remain comparatively lower, though each demonstrates modest increases over time. South Asia, while experiencing a slight decline in the earlier years, appears to stabilize and rise again toward the end of the observed period. Overall, the figure highlights a gradual but observable shift toward greater renewable energy adoption across multiple regions.

Brief Interpretation/Dataset Conclusion

From the analysis conducted upon this tidied renewable energy dataset, it can be observed that the selected geographic regions generally experienced either stable or gradually increasing levels of renewable energy consumption over the examined ten-year period. While the magnitude of usage differs notably between regions, the overall pattern suggests a broad movement toward greater reliance on renewable energy sources. Africa Eastern and Southern remained the highest throughout the period, while regions such as Latin America and the Caribbean also showed visible growth. Collectively, these findings suggest that the transition toward renewable energy is occurring across multiple parts of the world, though at uneven rates and from different starting points.

Overall Project Conclusion

Across the three datasets analyzed in this project, the results illustrate how different types of global indicators (for instance, demographic, economic, and environmental) can be explored through a consistent data tidying and analytical workflow in R. By transforming each dataset into a tidy format and applying simple exploratory analyses and visualizations, it becomes possible to observe meaningful patterns such as changes in birth rates, long-term economic growth across countries, and even gradual increases in renewable energy adoption across regions. Overall, the exercise demonstrates how reproducible data preparation and visualization techniques can help reveal broader global trends from publicly available datasets.

LLM Used

OpenAI. (2026). ChatGPT (Version 4o) [Large language model]. https://chat.openai.com . Accessed March 07, 2026.