Erasmus Teaching Mobility

Introduction to the Course: “Statistical Research Methods and Data Exploration”

Dear Students,

Welcome to the world of statistical research methods and data exploration! In this course, we embark on a journey through the realms of statistical analysis, drawing inspiration from the intricacies of the modern data landscape and its profound impact on global trends and economic insights.

Our guiding light for this academic odyssey is the lens of data, insightfully showcased in the article titled “What is happening to productivity in the World? Productivity Gini”. This article serves as a beacon, illuminating the significance of productivity trends and their implications across nations.

As the author of that article and your guide throughout this course, I aim to take you on a compelling exploration through the multifaceted world of statistics and data analysis. We’ll delve into diverse datasets and scenarios, unraveling the mysteries concealed within the numbers, and uncovering the untold stories that data can narrate.

Our expedition commences with an emphasis on the methods behind statistical research. We will dissect the core concepts of statistical analysis, honing our skills in exploring datasets, interpreting distributions, formulating hypotheses, and using specialized criteria for analysis. These fundamental tools will be our compass, guiding us through the terrain of data interpretation and understanding.

Our itinerary doesn’t halt solely at statistical theory. Rather, it extends its boundaries to real-world applications and hands-on experiences. We’ll unravel the predictive powers of data and statistical models, exploring scenarios such as the Spaceship Titanic collision—a Kaggle competition setting—where predictive modeling and research planning take center stage.

Throughout our journey, we aim to equip you with a versatile set of tools—skills in data acquisition, exploration, hypothesis testing, predictive modeling, and research planning. These skills are not only vital in the world of academia but are also indispensable in the practical realms of industry, governance, and global economics.

So, fasten your seatbelts, prepare your analytical compasses, and get ready to navigate the enthralling landscapes of statistical research and data exploration. This course is a platform for us to delve into the captivating world of numbers, patterns, and stories that data conceals.

Together, let us embark on this voyage of discovery and enlightenment, where statistical research methods and data exploration lead us to a deeper understanding of our ever-evolving world.

Welcome aboard!

Reach the Data Available

Exploring World Development Indicators: A Gateway to Productivity Insights

In our quest to decipher the enigmatic landscape of productivity trends across the globe, our first port of call is an exploration into the realm of World Development Indicators (WDI). These indicators, encapsulated within the World Bank’s vast repository, are our stepping stones to understanding the economic pulse of nations and their productivity dynamics.

In this segment, we embark on an immersive expedition through the WDI—a gateway to a trove of economic data, metrics, and developmental insights. Our primary objective is to empower you with the fundamental skills needed to access, dissect, and glean valuable insights from this rich tapestry of information.

To initiate this journey, we will engage in a simple yet profound demonstration, showcasing how to access and navigate the WDI database using accessible tools, primarily the WDI package in R. Through hands-on exploration, we will highlight the seamless process of fetching pertinent economic indicators, such as GDP, population metrics, and other crucial developmental measures encapsulated within the WDI.

As we dive into the WDI dataset, we aim to illustrate the steps involved in extracting, summarizing, and visualizing key indicators. This immersive experience will equip you with the rudimentary skills to harness the potential of publicly available datasets, particularly the WDI, to unravel trends, patterns, and insights regarding global productivity.

This session acts as your compass, guiding you through the initial steps in your journey toward deciphering the complex tapestry of economic data. By the end of this segment, you will emerge with the prowess to navigate the WDI terrain, laying the groundwork for deeper explorations into the productivity narratives prevalent across the globe.

Join us on this inaugural step as we set sail to explore the WDI, laying the foundation for a deeper understanding of the global economic landscape and its productivity dynamics.

Understanding Packages in R: Your Tools for Data Exploration

In the realm of R, packages act as your trusty tools and resources, offering a multitude of functions and capabilities that augment the core functionalities of the programming language. They are instrumental in extending R’s abilities to encompass diverse data manipulation, visualization, and statistical analysis.

What Are R Packages?

R packages are collections of functions, datasets, and other supplementary materials bundled together to serve a specific purpose. Each package serves as a specialized toolkit, tailored to fulfill distinct analytical or computational needs. These packages are the gears in the machinery of R, enabling you to perform various tasks efficiently and effectively.

Why Do Packages Matter?

As you venture into the realm of statistical research and data exploration, understanding and utilizing packages become integral. They equip you with an arsenal of functions, enabling you to perform tasks ranging from data acquisition and cleaning to statistical analysis and visualization. Using packages saves time and effort by providing pre-written code for complex operations.

How to Access and Use Packages?

In R, accessing and using packages is a straightforward process. The key steps involve installation, loading, and utilization. The process begins by installing a package, which is a one-time operation. Once installed, the package needs to be loaded into your R session to access its functions and datasets. After loading, you can use the functions provided by the package to perform various tasks, adding powerful tools to your analytical toolkit.

Introduction to Package Usage

Throughout our course, we will introduce and utilize specific packages, such as the WDI package for World Bank data retrieval and analysis. We will guide you through the steps of installing, loading, and employing these packages, ensuring you are equipped to access and harness the functionalities they offer.

In your journey of statistical exploration, packages will be your trusted companions, enriching your experience and broadening the horizons of your data analysis endeavors.

Quick Overview of the WDI Package in R

The WDI package in R is a powerful tool designed to access the World Bank’s extensive World Development Indicators (WDI) repository. It serves as a specialized toolkit, simplifying the retrieval of key economic and social data.

Key Features:

Enables retrieval of diverse indicators such as GDP, population metrics, and education statistics.
Accesses time-series data, allowing trend analysis over time.
Provides an intuitive interface to navigate and retrieve specific indicators for countries or regions.

Importance:

Essential for understanding global economic trends and productivity insights.
Simplifies complex data retrieval, enabling analysis of critical economic indicators across nations.

Throughout our course, we’ll explore the WDI package, guiding you through its installation, usage, and the extraction of essential indicators. This tool will be pivotal in our journey towards understanding productivity dynamics in various countries.

Quick Overview of the dplyr Package in R

The dplyr package in R is a fundamental toolkit designed for efficient and intuitive data manipulation and transformation. It stands as a versatile set of functions essential for working with data frames, enabling streamlined operations for data analysis and cleaning.

Key Features:

Provides a collection of functions for common data manipulation tasks: filtering, selecting, arranging, summarizing, and mutating data.
Enhances data frame operations, allowing for seamless data manipulation using a consistent and easy-to-understand syntax.
Offers a cohesive set of verbs designed for intuitive and efficient data transformation and analysis.

Importance:

Simplifies complex data manipulations, making tasks such as subsetting, summarizing, and arranging data frames more straightforward and coherent.
Crucial for data cleaning, transformation, and preparation, enabling efficient and effective data analysis workflows.

Throughout our course, we’ll delve into the capabilities of the dplyr package, showcasing its functions and guiding you through its usage. This package will serve as a cornerstone in your journey toward mastering essential data manipulation skills and fostering efficient data analysis practices.
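As a small illustration of the verbs listed above, the sketch below applies filter(), select(), mutate(), arrange(), and summarise() to a made-up data frame. The data frame toy and all of its values are invented for this example and are not part of the course data.

```r
library(dplyr)

# Made-up data, for illustration only
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2020, 2021, 2020, 2021),
  gdp     = c(100, 110, 50, 40),
  pop     = c(10, 10, 5, 5)
)

# Keep one year, keep three columns, add a derived column, sort descending
recent <- toy %>%
  filter(year == 2021) %>%
  select(country, gdp, pop) %>%
  mutate(gdp_per_person = gdp / pop) %>%
  arrange(desc(gdp_per_person))

# One summary row per country
by_country <- toy %>%
  group_by(country) %>%
  summarise(mean_gdp = mean(gdp), .groups = "drop")

print(recent)
print(by_country)
```

Each verb takes a data frame and returns a data frame, which is what makes it possible to chain them in this way.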

Downloading and Using Packages in R

To download and install a package (e.g., “dplyr”), use the install.packages() function, for example install.packages("dplyr"). This operation is required only once for each package you wish to use.

To use the installed package’s functions, load it into your current R session with the library() function, for example library(dplyr). Loading a package makes its functions and datasets available in the current session. Each time you start a new R session, you need to reload the packages you plan to use.

Throughout our course, we’ll emphasize the importance of installing the necessary packages once and loading them with library() in every session, so that you can access the full suite of tools and functions for your data analysis and statistical exploration.
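One possible way to combine the two steps is a small helper that installs a package only when it is missing and then loads it. The function name ensure_package is hypothetical and introduced only for this sketch; it is not part of R or of any package used in the course.

```r
# Hypothetical helper: install a package only if it is missing, then load it
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

# "stats" ships with R, so this call skips the installation and only loads it
ensure_package("stats")
```

Calling ensure_package("dplyr") would work the same way, downloading dplyr first if it is not yet installed.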

Exploring Productivity Trends: Analytical Journey Using R

In our quest to understand global productivity dynamics, we embark on a journey steeped in data analysis and statistical exploration. Our foundational steps are guided by an article titled “What is happening to productivity in the World? Productivity Gini”. This article serves as a launchpad to delve into productivity trends across nations, utilizing data from the World Bank’s repository.

Downloading Relevant World Bank Data

Visit the World Bank Data Catalog:

Go to the World Bank Data Catalog https://data.worldbank.org/ or World Bank Data page https://databank.worldbank.org/home.

Search or Explore Indicators:

Use the search bar or explore the indicators section to find the data of interest. For instance, you might search for indicators like GDP, population, inflation, etc.

Select Indicators:

Click on the specific indicator of interest. This will lead to a page where the data and its details are provided.

Note the Indicator Code:

The page for each indicator usually provides a code that represents it. For example, “NY.GDP.MKTP.CD” represents GDP (current US$).

Understand Indicator Descriptions:

Read the indicator descriptions to make sure you are using the correct variables for your analysis.

Leveraging the WDI package in R, we access critical World Development Indicators, gathering economic and social metrics necessary for our analysis.

Note for students: If you haven’t installed the necessary packages before, please uncomment the code below (remove the #) for installation and then re-comment it after the installation process to ensure the code works.

# Install required packages (uncomment and run if needed)
# install.packages("WDI") # If not installed previously
library(WDI) # Load WDI package

The World Bank’s database comprises over 1,400 time series indicators for 217 economies and more than 40 country groups, covering more than 50 years of data. When you visit https://data.worldbank.org/, you can explore the data by ‘country’ or by ‘indicator’. Under the ‘Browse by Country or Indicator’ option, the ‘indicator’ section gives access to the time series under different headings. The first headings are ‘Agriculture and Rural Development,’ ‘Aid Effectiveness,’ and ‘Climate Change’; in total, the section contains 20 headings and over 1,400 time series.

To find the code for the dataset you want, open the indicator list at https://data.worldbank.org/indicator and click through to the page of the indicator you need. For instance, if you are interested in Gross Domestic Product (GDP) and total population for various countries, you will find GDP (current US$) under the ‘Economy & Growth’ heading and Population, total under the ‘Climate Change’ heading. Clicking an indicator opens its own page.

The codes ‘NY.GDP.MKTP.CD’, ‘NY.GDP.DEFL.ZS’ and ‘SP.POP.TOTL’ that appear in the middle of these page URLs are the identifiers of the corresponding indicators.
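Indicator codes can also be looked up from within R. The sketch below uses WDIsearch() from the WDI package, which by default searches a list of series bundled with the package, so it may lag behind the website. The object name gdp_hits is introduced only for this example.

```r
library(WDI)

# Search indicator names in the list bundled with the package (no download)
gdp_hits <- WDIsearch(string = "gdp", field = "name")
head(gdp_hits)

# Look up a code directly to confirm what it measures
WDIsearch(string = "SP.POP.TOTL", field = "indicator")
```

The search string is treated as a regular expression, so a short keyword such as "gdp" can return a long list that is worth narrowing down before choosing a code.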

# Download data from World Bank using WDI package
df <- WDI(country = "all", indicator = c("NY.GDP.MKTP.CD", "SP.POP.TOTL"))

First, I will use the str() function. In R, str() reveals the structure of an object. Applied to a data frame such as df, str(df) gives a concise overview: the number of observations and variables, then the name and data type of each column (numeric, integer, character, factor, and so on), followed by the first few values of that column. This preview shows actual data, which helps you understand what is stored in the dataset. Missing values (‘NA’) among those first values are visible at a glance, although str() does not count them.

str(df)

## 'data.frame':    16758 obs. of  6 variables:
##  $ country       : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ iso2c         : chr  "AF" "AF" "AF" "AF" ...
##  $ iso3c         : chr  "AFG" "AFG" "AFG" "AFG" ...
##  $ year          : int  1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 ...
##  $ NY.GDP.MKTP.CD: num  5.38e+08 5.49e+08 5.47e+08 7.51e+08 8.00e+08 ...
##   ..- attr(*, "label")= chr "GDP (current US$)"
##  $ SP.POP.TOTL   : num  8622466 8790140 8969047 9157465 9355514 ...
##   ..- attr(*, "label")= chr "Population, total"
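Because str() only previews the first values, a separate count of missing values is a useful companion. The following minimal sketch uses a made-up data frame named toy (not the course data) to show both steps in base R.

```r
# Made-up data frame with the same kinds of columns as df
toy <- data.frame(
  country = c("A", "A", "B"),
  year    = c(2020L, 2021L, 2020L),
  gdp     = c(1.5e9, NA, 2.0e8),
  stringsAsFactors = FALSE
)

# Structure: column names, types, and first values
str(toy)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)
```

The same colSums(is.na(...)) call can be applied to df to see how many values each indicator is missing.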

Employing R for data manipulation using packages like dplyr and tidyr, we ensure the data is structured and ready for analysis.

# Install required packages (uncomment and run if needed)
# install.packages("dplyr")
# install.packages("tidyr") # If not installed previously
library(dplyr)
library(tidyr)

I will also use the explore package. The explore package in R is a toolset designed to facilitate data exploration and analysis, providing various functions to delve into and understand your dataset more comprehensively. It offers utilities to assist in initial data investigation and descriptive statistics.

Note for students: If you haven’t installed the necessary packages before, please use the install.packages() function. I will not repeat the install.packages() calls in the rest of this lesson.

The describe_all() function, a part of the explore package, summarizes all columns (variables) in a dataset at once. When applied to a dataset, it reports for each column its type, the number and percentage of missing values, the number of unique values, and the minimum, mean, and maximum (the last three for numeric columns only).

The pipe operator (%>%) in R, often associated with the magrittr package, allows for chaining multiple operations sequentially. It enhances code readability and efficiency by passing the output of one function as the input to the next function in the sequence.

With df %>% describe_all(), the %>% operator chains the dataset df into the describe_all() function, generating comprehensive statistics for all columns in the dataset.
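To see what the pipe does, it can help to compare a nested call with its piped equivalent. The sketch below uses a made-up numeric vector named values; both versions compute the same result.

```r
library(magrittr)

values <- c(4, 9, 16)

# Nested calls are read from the inside out
nested <- round(mean(sqrt(values)), 1)

# The same steps with the pipe are read from left to right
piped <- values %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)
```

Loading dplyr also makes %>% available, so a separate library(magrittr) call is not needed in the course scripts. Recent versions of R (4.1 and later) additionally offer a native pipe written |>.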

# Load the "explore" package
library(explore)

# Describe all columns in the dataset using describe_all() function
df %>% describe_all()

## # A tibble: 6 × 8
##   variable       type     na na_pct unique      min     mean      max
##   <chr>          <chr> <int>  <dbl>  <int>    <dbl>    <dbl>    <dbl>
## 1 country        chr       0    0      266      NA  NA       NA      
## 2 iso2c          chr       0    0      266      NA  NA       NA      
## 3 iso3c          chr       0    0      262      NA  NA       NA      
## 4 year           int       0    0       63    1960   1.99e 3  2.02e 3
## 5 NY.GDP.MKTP.CD dbl    3393   20.2  13243 8824746.  1.21e12  1.01e14
## 6 SP.POP.TOTL    dbl      93    0.6  16460    2646   2.16e 8  7.95e 9

Running df %>% describe_all() executes the describe_all() function from the explore package on the dataset df, providing a comprehensive summary of descriptive statistics for each column, offering insights into the dataset’s characteristics.

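The dataset contains information for 266 unique country entries over the period from 1960 to 2022. As the list further below shows, some of these entries are regional or income-group aggregates rather than individual countries. Let’s explore the details for the available variables: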

country is represented as character (text).
iso2c and iso3c are coded as character strings.
year is indicated as an integer, covering the years from 1960 to 2022.
NY.GDP.MKTP.CD and SP.POP.TOTL are represented in double (numeric) format.
About 20.2% of the observations for NY.GDP.MKTP.CD (GDP) are missing.
Around 0.6% of observations for SP.POP.TOTL (total population) are not available.

The dataset provides a broad view of countries’ GDP and total population over several decades. Despite some missing values, it offers a wide range of data to explore and analyze.
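The percentages above follow directly from the counts in the two outputs. As a quick check, the sketch below redoes the arithmetic using only numbers printed by str(df) and describe_all(); the variable names are introduced for this example.

```r
n_obs  <- 16758   # rows reported by str(df)
na_gdp <- 3393    # missing NY.GDP.MKTP.CD values
na_pop <- 93      # missing SP.POP.TOTL values

# Share of missing values, in percent, rounded to one decimal
gdp_na_pct <- round(100 * na_gdp / n_obs, 1)
pop_na_pct <- round(100 * na_pop / n_obs, 1)

# Rows per entry in the country column (266 unique entries)
rows_per_entry <- n_obs / 266

c(gdp_na_pct, pop_na_pct, rows_per_entry)
```

The last figure matches the 63 unique years from 1960 to 2022, which indicates one row per country entry and year. On the real data, mean(is.na(df$NY.GDP.MKTP.CD)) * 100 gives the missing share directly.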

266 unique countries

# Extract unique countries in the dataset
countries <- unique(df$country)

The unique() function in R is used to extract unique elements from a vector, data frame, or array.

knitr::kable(countries)

x
Afghanistan
Africa Eastern and Southern
Africa Western and Central
Albania
Algeria
American Samoa
Andorra
Angola
Antigua and Barbuda
Arab World
Argentina
Armenia
Aruba
Australia
Austria
Azerbaijan
Bahamas, The
Bahrain
Bangladesh
Barbados
Belarus
Belgium
Belize
Benin
Bermuda
Bhutan
Bolivia
Bosnia and Herzegovina
Botswana
Brazil
British Virgin Islands
Brunei Darussalam
Bulgaria
Burkina Faso
Burundi
Cabo Verde
Cambodia
Cameroon
Canada
Caribbean small states
Cayman Islands
Central African Republic
Central Europe and the Baltics
Chad
Channel Islands
Chile
China
Colombia
Comoros
Congo, Dem. Rep.
Congo, Rep.
Costa Rica
Cote d’Ivoire
Croatia
Cuba
Curacao
Cyprus
Czechia
Denmark
Djibouti
Dominica
Dominican Republic
Early-demographic dividend
East Asia & Pacific (excluding high income)
East Asia & Pacific (IDA & IBRD countries)
East Asia & Pacific
Ecuador
Egypt, Arab Rep.
El Salvador
Equatorial Guinea
Eritrea
Estonia
Eswatini
Ethiopia
Euro area
Europe & Central Asia (excluding high income)
Europe & Central Asia (IDA & IBRD countries)
Europe & Central Asia
European Union
Faroe Islands
Fiji
Finland
Fragile and conflict affected situations
France
French Polynesia
Gabon
Gambia, The
Georgia
Germany
Ghana
Gibraltar
Greece
Greenland
Grenada
Guam
Guatemala
Guinea-Bissau
Guinea
Guyana
Haiti
Heavily indebted poor countries (HIPC)
High income
Honduras
Hong Kong SAR, China
Hungary
IBRD only
Iceland
IDA & IBRD total
IDA blend
IDA only
IDA total
India
Indonesia
Iran, Islamic Rep.
Iraq
Ireland
Isle of Man
Israel
Italy
Jamaica
Japan
Jordan
Kazakhstan
Kenya
Kiribati
Korea, Dem. People’s Rep.
Korea, Rep.
Kosovo
Kuwait
Kyrgyz Republic
Lao PDR
Late-demographic dividend
Latin America & Caribbean (excluding high income)
Latin America & Caribbean
Latin America & the Caribbean (IDA & IBRD countries)
Latvia
Least developed countries: UN classification
Lebanon
Lesotho
Liberia
Libya
Liechtenstein
Lithuania
Low & middle income
Low income
Lower middle income
Luxembourg
Macao SAR, China
Madagascar
Malawi
Malaysia
Maldives
Mali
Malta
Marshall Islands
Mauritania
Mauritius
Mexico
Micronesia, Fed. Sts.
Middle East & North Africa (excluding high income)
Middle East & North Africa (IDA & IBRD countries)
Middle East & North Africa
Middle income
Moldova
Monaco
Mongolia
Montenegro
Morocco
Mozambique
Myanmar
Namibia
Nauru
Nepal
Netherlands
New Caledonia
New Zealand
Nicaragua
Niger
Nigeria
North America
North Macedonia
Northern Mariana Islands
Norway
Not classified
OECD members
Oman
Other small states
Pacific island small states
Pakistan
Palau
Panama
Papua New Guinea
Paraguay
Peru
Philippines
Poland
Portugal
Post-demographic dividend
Pre-demographic dividend
Puerto Rico
Qatar
Romania
Russian Federation
Rwanda
Samoa
San Marino
Sao Tome and Principe
Saudi Arabia
Senegal
Serbia
Seychelles
Sierra Leone
Singapore
Sint Maarten (Dutch part)
Slovak Republic
Slovenia
Small states
Solomon Islands
Somalia
South Africa
South Asia (IDA & IBRD)
South Asia
South Sudan
Spain
Sri Lanka
St. Kitts and Nevis
St. Lucia
St. Martin (French part)
St. Vincent and the Grenadines
Sub-Saharan Africa (excluding high income)
Sub-Saharan Africa (IDA & IBRD countries)
Sub-Saharan Africa
Sudan
Suriname
Sweden
Switzerland
Syrian Arab Republic
Tajikistan
Tanzania
Thailand
Timor-Leste
Togo
Tonga
Trinidad and Tobago
Tunisia
Turkiye
Turkmenistan
Turks and Caicos Islands
Tuvalu
Uganda
Ukraine
United Arab Emirates
United Kingdom
United States
Upper middle income
Uruguay
Uzbekistan
Vanuatu
Venezuela, RB
Viet Nam
Virgin Islands (U.S.)
West Bank and Gaza
World
Yemen, Rep.
Zambia
Zimbabwe

In the dataset we’re using, there are entries that do not represent actual countries but contain descriptors like Not classified, OECD members, or Other small states. To ensure accuracy in our analysis, we need to exclude these non-country entries. The WDI package provides additional information, including a dataset named WDI_data$country. By leveraging this supplementary data, we can enrich our existing dataset with more detailed country information. Using the combination of our dataset and the WDI_data$country dataset, we can identify and subsequently remove these non-country entries to refine and focus our analysis on legitimate country-specific data points for a more precise assessment.

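# Country metadata shipped with the WDI package (region, income group, ...)
WDI_extra <- as.data.frame(WDI_data$country)

# Join on the three identifier columns that both tables share
df <- left_join(df, WDI_extra, by = c("country", "iso2c", "iso3c"))

# Keep individual countries only; rows whose region is NA are dropped as well
df <- df %>%
  filter(region != "Aggregates")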

df %>% describe_all()

## # A tibble: 12 × 8
##    variable       type     na na_pct unique      min          mean      max
##    <chr>          <chr> <int>  <dbl>  <int>    <dbl>         <dbl>    <dbl>
##  1 country        chr       0    0      215      NA            NA  NA      
##  2 iso2c          chr       0    0      215      NA            NA  NA      
##  3 iso3c          chr       0    0      215      NA            NA  NA      
##  4 year           int       0    0       63    1960          1991   2.02e 3
##  5 NY.GDP.MKTP.CD dbl    3068   22.7  10473 8824746. 190626974855.  2.55e13
##  6 SP.POP.TOTL    dbl      30    0.2  13467    2646      24760308.  1.42e 9
##  7 region         chr       0    0        7      NA            NA  NA      
##  8 capital        chr       0    0      210      NA            NA  NA      
##  9 longitude      chr       0    0      210      NA            NA  NA      
## 10 latitude       chr       0    0      210      NA            NA  NA      
## 11 income         chr       0    0        5      NA            NA  NA      
## 12 lending        chr       0    0        4      NA            NA  NA
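One detail of the filtering step is easy to miss: filter() keeps only rows where the condition is TRUE, so rows without a match in the metadata (region is NA) are removed together with the aggregates. The sketch below shows this on made-up tables named obs and meta, which are not part of the course data.

```r
library(dplyr)

# Made-up observations and metadata; "XXX" has no metadata entry
obs  <- data.frame(iso3c = c("FRA", "WLD", "XXX"), value = c(1, 2, 3))
meta <- data.frame(
  iso3c  = c("FRA", "WLD"),
  region = c("Europe & Central Asia", "Aggregates")
)

joined <- left_join(obs, meta, by = "iso3c")

# Drops the aggregate and also the unmatched row (region is NA)
kept <- joined %>% filter(region != "Aggregates")

# Variant that keeps unmatched rows for manual inspection
kept_with_na <- joined %>% filter(is.na(region) | region != "Aggregates")

print(kept)
print(kept_with_na)
```

Which variant is appropriate depends on the analysis; for this course, dropping unmatched entries is the behavior the code above relies on.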

In our dataset, we’ve observed missing data for certain variables such as total population (SP.POP.TOTL) and Gross Domestic Product (NY.GDP.MKTP.CD). It’s essential to handle missing data to conduct comprehensive analyses.

# Group by 'country' and count missing values within each group
missing_data_count <- df %>%
  group_by(country) %>%
  summarize(missing_count = sum(is.na(SP.POP.TOTL)), .groups = 'drop')

# Identify countries with all missing data or more than 20 missing points
countries_with_all_missing <- missing_data_count %>%
  filter(missing_count == 63)  # 63 = observations per country (years 1960 to 2022)
countries_with_more_than_20_missing <- missing_data_count %>%
  filter(missing_count > 20)  # Change 20 to your desired threshold

# View countries with all missing data or more than 20 missing points
print(countries_with_all_missing)

## # A tibble: 0 × 2
## # ℹ 2 variables: country <chr>, missing_count <int>

print(countries_with_more_than_20_missing)

## # A tibble: 1 × 2
##   country            missing_count
##   <chr>                      <int>
## 1 West Bank and Gaza            30

# Filter out West Bank and Gaza
df <- df %>%
  filter(!(country %in% c("West Bank and Gaza")))

df %>% describe_all()

## # A tibble: 12 × 8
##    variable       type     na na_pct unique      min          mean      max
##    <chr>          <chr> <int>  <dbl>  <int>    <dbl>         <dbl>    <dbl>
##  1 country        chr       0    0      214      NA            NA  NA      
##  2 iso2c          chr       0    0      214      NA            NA  NA      
##  3 iso3c          chr       0    0      214      NA            NA  NA      
##  4 year           int       0    0       63    1960          1991   2.02e 3
##  5 NY.GDP.MKTP.CD dbl    3034   22.5  10444 8824746. 191130628001.  2.55e13
##  6 SP.POP.TOTL    dbl       0    0    13433    2646      24812448.  1.42e 9
##  7 region         chr       0    0        7      NA            NA  NA      
##  8 capital        chr       0    0      210      NA            NA  NA      
##  9 longitude      chr       0    0      210      NA            NA  NA      
## 10 latitude       chr       0    0      210      NA            NA  NA      
## 11 income         chr       0    0        5      NA            NA  NA      
## 12 lending        chr       0    0        4      NA            NA  NA

# Subset the dataset 'df' to keep the years from 2003 onward
dff <- df %>%
  filter(year >= 2003)

# Group by 'country' and count missing values within each group
missing_data_count <- dff %>%
  group_by(country) %>%
  summarize(missing_count = sum(is.na(NY.GDP.MKTP.CD)), .groups = 'drop')

dff %>% describe_all()

## # A tibble: 12 × 8
##    variable       type     na na_pct unique       min          mean      max
##    <chr>          <chr> <int>  <dbl>  <int>     <dbl>         <dbl>    <dbl>
##  1 country        chr       0    0      214       NA            NA  NA      
##  2 iso2c          chr       0    0      214       NA            NA  NA      
##  3 iso3c          chr       0    0      214       NA            NA  NA      
##  4 year           int       0    0       20     2003          2012.  2.02e 3
##  5 NY.GDP.MKTP.CD dbl     192    4.5   4089 19456336. 344289278960.  2.55e13
##  6 SP.POP.TOTL    dbl       0    0     4278     9668      32980495.  1.42e 9
##  7 region         chr       0    0        7       NA            NA  NA      
##  8 capital        chr       0    0      210       NA            NA  NA      
##  9 longitude      chr       0    0      210       NA            NA  NA      
## 10 latitude       chr       0    0      210       NA            NA  NA      
## 11 income         chr       0    0        5       NA            NA  NA      
## 12 lending        chr       0    0        4       NA            NA  NA

# Keep only the countries with no missing values for 'NY.GDP.MKTP.CD'
countries_to_keep <- missing_data_count %>%
  filter(missing_count == 0) %>%
  pull(country)

# Subset 'dff' to keep only the selected countries
dff <- dff %>%
  filter(country %in% countries_to_keep)

dff %>% describe_all()

## # A tibble: 12 × 8
##    variable       type     na na_pct unique       min          mean      max
##    <chr>          <chr> <int>  <dbl>  <int>     <dbl>         <dbl>    <dbl>
##  1 country        chr       0      0    178       NA            NA  NA      
##  2 iso2c          chr       0      0    178       NA            NA  NA      
##  3 iso3c          chr       0      0    178       NA            NA  NA      
##  4 year           int       0      0     20     2003          2012.  2.02e 3
##  5 NY.GDP.MKTP.CD dbl       0      0   3560 19456336. 392621361626.  2.55e13
##  6 SP.POP.TOTL    dbl       0      0   3559     9668      38612129.  1.42e 9
##  7 region         chr       0      0      7       NA            NA  NA      
##  8 capital        chr       0      0    176       NA            NA  NA      
##  9 longitude      chr       0      0    178       NA            NA  NA      
## 10 latitude       chr       0      0    178       NA            NA  NA      
## 11 income         chr       0      0      4       NA            NA  NA      
## 12 lending        chr       0      0      4       NA            NA  NA

Assessing Global Productivity: Understanding Country Contributions in the World Economy

In this segment of our course, we will explore how to gauge a country’s productivity in the global landscape. We aim to calculate world GDP by summing the GDPs (in current US$) of all countries in our sample, alongside the total world population. This allows us to establish a foundational perspective of the world’s economic output.

To assess each country’s contribution, we’ll compare its share of world GDP with its share of world population. Imagine a country that accounts for 1% of the world’s population and generates 1% of the world’s total production. Dividing the GDP share by the population share gives 1. This ratio is a simple yet effective metric that we name verim, our naive productivity measure.

When verim exceeds one, it suggests that a country is proportionally contributing more to the global output than its share of the population. Conversely, a verim below one indicates an area for potential improvement in productivity. This analysis provides a fundamental understanding of how countries perform concerning their population size and economic output within the global context.
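A small worked example may make the measure concrete. The sketch below builds a made-up three-country “world” (all figures invented) and computes verim as GDP share divided by population share. It also shows that the same number results from dividing a country’s GDP per person by the world’s GDP per person.

```r
# Made-up figures for a three-country "world"
gdp <- c(A = 600, B = 300, C = 100)
pop <- c(A = 10,  B = 30,  C = 60)

gdp_share <- gdp / sum(gdp)
pop_share <- pop / sum(pop)
verim <- gdp_share / pop_share

# Equivalent view: GDP per person relative to the world average
relative_gdp_per_person <- (gdp / pop) / (sum(gdp) / sum(pop))

round(verim, 2)
all.equal(verim, relative_gdp_per_person)
```

Country B holds 30% of both GDP and population, so its verim is exactly one; A lies above one and C below.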

# Calculate world totals and world GDP per person for each year
df_ratios <- dff %>%
  group_by(year) %>%
  summarise(
    world_gdp = sum(NY.GDP.MKTP.CD),
    world_population = sum(SP.POP.TOTL),
    world_gdp_perp = world_gdp/world_population
  )

# Plotting total production per person after 2000
library(ggplot2)

ggplot(df_ratios, aes(x = year, y = world_gdp_perp)) +
  geom_line() +
  labs(
    title = "Total Production per Person after 2000",
    x = "Year",
    y = "Total Production per Person"
  )

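Merge df_ratios with dff.

dff <- left_join(dff, df_ratios, by = 'year')

# Each country's share of world GDP and of world population (in percent),
# and the ratio of the two shares, verim
dff <- dff %>%
  mutate(
    country_ratio = NY.GDP.MKTP.CD / world_gdp * 100,
    population_ratio = SP.POP.TOTL / world_population * 100,
    verim = country_ratio / population_ratio
  )

# Percentage change in verim relative to each country's first year in the sample
dff <- dff %>%
  arrange(country, year) %>%
  group_by(country) %>%
  mutate(cumulative_change = (verim / first(verim) - 1) * 100)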
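The cumulative_change column depends on the rows being sorted by year within each country, because first() simply takes the first row of each group. The sketch below illustrates this on a made-up table named toy, with rows deliberately out of order and invented verim values.

```r
library(dplyr)

# Made-up verim values; country A's rows are deliberately out of order
toy <- data.frame(
  country = c("A", "A", "A", "B", "B", "B"),
  year    = c(2005, 2003, 2004, 2003, 2004, 2005),
  verim   = c(0.60, 0.50, 0.55, 2.0, 1.8, 1.5)
)

toy_change <- toy %>%
  arrange(country, year) %>%   # sort first so first() is the earliest year
  group_by(country) %>%
  mutate(cumulative_change = (verim / first(verim) - 1) * 100) %>%
  ungroup()

print(toy_change)
```

In this toy data, country A rises from 0.50 to 0.60 (a 20% gain on its first year) while country B falls from 2.0 to 1.5 (a 25% loss).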

df_TR <- dff %>% filter(country=="Turkiye")

ggplot(df_TR, aes(x = year, y = verim)) +
  geom_line() +
  labs(
    title = "verim in Turkiye after 2000",
    x = "Year",
    y = "Verim"
  )

df_LV <- dff %>% filter(country=="Latvia")

ggplot(df_LV, aes(x = year, y = verim)) +
  geom_line() +
  labs(
    title = "verim in Latvia after 2000",
    x = "Year",
    y = "Verim"
  )

# Compare the total population of France and Turkiye over time
df %>%
  filter(country %in% c("France", "Turkiye")) %>%
  ggplot(aes(x = year, y = SP.POP.TOTL, col = country)) +
  geom_line()

library(tidyverse)

Subregion

library(plotly)

library(readxl)

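# Read the 'Data' sheet of pwt1001.xlsx (presumably Penn World Table 10.01);
# adjust the path to where the file is saved on your machine
pwt <- read_excel("C:/Users/hutku/Downloads/pwt1001.xlsx", sheet = "Data")

# No 'by' is given, so the join uses every column name the two tables share
dffk <- dff %>% left_join(pwt)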

Note: the world map code is adapted from Statistics Guides with Dr Paul Christiansen.

libraries <- c(
    "tidyverse", "sf", "rnaturalearth",
    "wbstats", "gganimate", "classInt"
)

invisible(lapply(libraries, library, character.only = TRUE))

crs <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"

world_sf <- ne_countries(
    type = "countries", scale = "small"
) %>%
    sf::st_as_sf() %>%
    sf::st_transform(crs)

world_sf_no_antartica <- world_sf %>%
    dplyr::filter(region_un != "Antarctica")

dffk <- dplyr::left_join(
    world_sf_no_antartica, dffk,
    by = c("iso_a2" = "iso2c")
)

countries_Melanesia <- dffk %>%
  filter(year ==2022, subregion == "Melanesia") %>%
  pull(country)

countries_SouthernEurope <- dffk %>%
  filter(year ==2022, subregion == "Southern Europe") %>%
  pull(country)

countries_NorthernAfrica <- dffk %>%
  filter(year ==2022) %>%
  filter(subregion == "Northern Africa" ) %>%
  pull(country)

countries_NA <- dffk %>%
  filter(year ==2022) %>%
  filter(is.na(subregion)) %>%
  pull(country)

countries_MiddleAfrica <- dffk %>%
  filter(year ==2022) %>%
  filter(subregion == "Middle Africa" ) %>%
  pull(country)

countries_SouthAmerica <- dffk %>%
  filter(year ==2022) %>%
  filter(subregion == "South America" ) %>%
  pull(country)

countries_WesternAsia <- dffk %>%
  filter(year ==2022) %>%
  filter(subregion == "Western Asia" ) %>%
  pull(country)

countries_AustraliaandNewZealand <- dffk %>%
  filter(year ==2022) %>%
  filter(subregion == "Australia and New Zealand" ) %>%
  pull(country)

countries_WesternEurope <- dffk %>%
  filter(year ==2022) %>%
  filter(subregion == "Western Europe" ) %>%
  pull(country)

countries_Caribbean <- dffk %>%
  filter(year ==2022) %>%
  filter(subregion == "Caribbean" ) %>%
  pull(country)

countries_SouthernAsia <- dffk %>%
  filter(year ==2022) %>%
  filter(subregion == "Southern Asia" ) %>%
  pull(country)

countries_EasternEurope <- dffk %>%
  filter(year ==2022) %>%
  filter(subregion == "Eastern Europe" ) %>%
  pull(country)

countries_CentralAmerica <- dffk %>%
  filter(year ==2022) %>%
  filter(subregion == "Central America" ) %>%
  pull(country)

countries_WesternAfrica <- dffk %>%
  filter(year ==2022) %>%
  filter(subregion == "Western Africa" ) %>%
  pull(country)

countries_SouthernAfrica <- dffk %>%
  filter(year ==2022) %>%
  filter(subregion == "Southern Africa" ) %>%
  pull(country)

countries_SouthEasternAsia <- dffk %>%
  filter(year ==2022) %>%
  filter(subregion == "South-Eastern Asia" ) %>%
  pull(country)

countries_EasternAfrica <- dffk %>%
  filter(year ==2022) %>%
  filter(subregion == "Eastern Africa" ) %>%
  pull(country)

countries_NorthernAmerica <- dffk %>%
  filter(year ==2022) %>%
  filter(subregion == "Northern America" ) %>%
  pull(country)

countries_EasternAsia <- dffk %>%
  filter(year ==2022) %>%
  filter(subregion == "Eastern Asia" ) %>%
  pull(country)

countries_NorthernEurope <- dffk %>%
  filter(year ==2022) %>%
  filter(subregion == "Northern Europe" ) %>%
  pull(country)

countries_CentralAsia <- dffk %>%
  filter(year ==2022) %>%
  filter(subregion == "Central Asia" ) %>%
  pull(country)
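The subregion vectors above all follow one pattern, so the same lookup can be sketched with a single `split()` call. The example below uses a small made-up table rather than `dffk`; the names `countries_by_subregion` and `countries_no_subregion` are hypothetical, and rows with a missing subregion are handled separately because `split()` drops `NA` groups.

```r
# Toy stand-in for dffk: one row per country for the year of interest
toy <- data.frame(
  country   = c("Fiji", "Vanuatu", "Spain", "Italy", "Unmatched"),
  year      = 2022,
  subregion = c("Melanesia", "Melanesia", "Southern Europe",
                "Southern Europe", NA)
)

latest <- toy[toy$year == 2022, ]

# Named list: one character vector of countries per subregion
countries_by_subregion <- split(latest$country, latest$subregion)

# Countries without a subregion (split() silently drops NA groups)
countries_no_subregion <- latest$country[is.na(latest$subregion)]

print(countries_by_subregion[["Melanesia"]])
print(countries_no_subregion)
```

With such a list, `countries_by_subregion[["Western Asia"]]` would play the role of `countries_WesternAsia` in the plots that follow.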

phi <- dffk %>% filter(country %in% countries_WesternAsia) %>%
ggplot(aes(x = year,
           y = verim,
           col = country)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~country, scales = "free") +
  transition_reveal(year)  +
  labs(title = "Year: {frame_along}")

animate(plot = phi,
        nframes = 30)

Western Asia

df_WA <- dff %>% 
  filter(
    country %in% countries_WesternAsia,
    year %in% c(2003, 2022),
  ) %>%
  mutate(year = factor(year)) %>% 
  select(country, year, verim)

df_WA <- as_tibble(df_WA)  %>% 
  arrange(country, year) %>% 
  mutate(
    change_verim = diff(verim), 
    order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
    .by = country
  )  %>% 
  mutate(country = fct_reorder(country, order_dumbbells))

df_WA %>% 
  ggplot(aes(x = verim, y = country)) +
  geom_path(
    aes(color = (change_verim < 0)),
    linewidth = 1,
    arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
  ) +
  geom_vline(xintercept=1, linetype='dotted', col = 'red')+
  annotate("text", x = 1, y = "Armenia", label = "Self-Sufficient Threshold", angle=90) +
  labs(
    title = 'Western Asia',
    x = 'Verim (2003 - 2022)', 
    y = element_blank(),
    fill = 'Year'
  ) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  )
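The ordering key used for the dumbbell plots can be checked on a small example. The sketch below uses made-up verim values for three hypothetical countries and base R instead of dplyr; it computes the change between 2003 and 2022 and the same signed key, so countries whose verim fell sort below those whose verim rose.

```r
# Toy data: two observations (2003 and 2022) per hypothetical country
toy <- data.frame(
  country = rep(c("A", "B", "C"), each = 2),
  year    = rep(c(2003, 2022), times = 3),
  verim   = c(0.5, 0.8,   # A rises
              1.4, 1.1,   # B falls
              0.9, 1.6)   # C rises
)
toy <- toy[order(toy$country, toy$year), ]

# Per country: the 2022 value and the change between the two years
by_country <- split(toy$verim, toy$country)
end_value  <- sapply(by_country, function(v) v[2])
change     <- sapply(by_country, diff)

# Negative key for decliners, positive key for risers
order_key <- ifelse(change < 0, -1, 1) * end_value

print(sort(order_key))
```

Note that `diff()` yields a single value only when a country has exactly two rows, so a country missing either 2003 or 2022 would likely need to be dropped before this step.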

Southern Asia

df_SA <- dff %>% 
  filter(
    country %in% countries_SouthernAsia,
    year %in% c(2003, 2022),
  ) %>%
  mutate(year = factor(year)) %>% 
  select(country, year, verim)

df_SA <- as_tibble(df_SA)  %>% 
  arrange(country, year) %>% 
  mutate(
    change_verim = diff(verim), 
    order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
    .by = country
  )  %>% 
  mutate(country = fct_reorder(country, order_dumbbells))

df_SA %>% 
  ggplot(aes(x = verim, y = country)) +
  geom_path(
    aes(color = (change_verim < 0)),
    linewidth = 1,
    arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
  ) +
  geom_vline(xintercept=1, linetype='dotted', col = 'red')+
  annotate("text", x = 1, y = 3, label = "Self-Sufficient Threshold", angle=90) +
  labs(
    title = 'Southern Asia',
    x = 'Verim (2003 - 2022)', 
    y = element_blank(),
    fill = 'Year'
  ) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  )

phi <- dffk %>% filter(country %in% countries_SouthernAsia) %>%
ggplot(aes(x = year,
           y = verim,
           col = country)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~country, scales = "free") +
  transition_reveal(year)  +
  labs(title = "Year: {frame_along}")

animate(plot = phi,
        nframes = 30)

South-Eastern Asia

df_SEA <- dff %>% 
  filter(
    country %in% countries_SouthEasternAsia,
    year %in% c(2003, 2022),
  ) %>%
  mutate(year = factor(year)) %>% 
  select(country, year, verim)

df_SEA <- as_tibble(df_SEA)  %>% 
  arrange(country, year) %>% 
  mutate(
    change_verim = diff(verim), 
    order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
    .by = country
  )  %>% 
  mutate(country = fct_reorder(country, order_dumbbells))

df_SEA %>% 
  ggplot(aes(x = verim, y = country)) +
  geom_path(
    aes(color = (change_verim < 0)),
    linewidth = 1,
    arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
  ) +
  geom_vline(xintercept=1, linetype='dotted', col = 'red')+
  annotate("text", x = 1, y = 3, label = "Self-Sufficient Threshold", angle=90) +
  labs(
    title = 'South-Eastern Asia',
    x = 'Verim (2003 - 2022)', 
    y = element_blank(),
    fill = 'Year'
  ) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  )

phi <- dffk %>% filter(country %in% countries_SouthEasternAsia) %>%
ggplot(aes(x = year,
           y = verim,
           col = country)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~country, scales = "free") +
  transition_reveal(year)  +
  labs(title = "Year: {frame_along}")

animate(plot = phi,
        nframes = 30)

Central Asia

df_CA <- dff %>% 
  filter(
    country %in% countries_CentralAsia,
    year %in% c(2003, 2022),
  ) %>%
  mutate(year = factor(year)) %>% 
  select(country, year, verim)

df_CA <- as_tibble(df_CA)  %>% 
  arrange(country, year) %>% 
  mutate(
    change_verim = diff(verim), 
    order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
    .by = country
  )  %>% 
  mutate(country = fct_reorder(country, order_dumbbells))

df_CA %>% 
  ggplot(aes(x = verim, y = country)) +
  geom_path(
    aes(color = (change_verim < 0)),
    linewidth = 1,
    arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
  ) +
  geom_vline(xintercept=1, linetype='dotted', col = 'red')+
  annotate("text", x = 1, y = 3, label = "Self-Sufficient Threshold", angle=90) +
  labs(
    title = 'Central Asia',
    x = 'Verim (2003 - 2022)', 
    y = element_blank(),
    fill = 'Year'
  ) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  )

phi <- dffk %>% filter(country %in% countries_CentralAsia) %>%
ggplot(aes(x = year,
           y = verim,
           col = country)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~country, scales = "free") +
  transition_reveal(year)  +
  labs(title = "Year: {frame_along}")

animate(plot = phi,
        nframes = 30)

Eastern Asia

df_EA <- dff %>% 
  filter(
    country %in% countries_EasternAsia,
    year %in% c(2003, 2022),
  ) %>%
  mutate(year = factor(year)) %>% 
  select(country, year, verim)

df_EA <- as_tibble(df_EA)  %>% 
  arrange(country, year) %>% 
  mutate(
    change_verim = diff(verim), 
    order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
    .by = country
  )  %>% 
  mutate(country = fct_reorder(country, order_dumbbells))

df_EA %>% 
  ggplot(aes(x = verim, y = country)) +
  geom_path(
    aes(color = (change_verim < 0)),
    linewidth = 1,
    arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
  ) +
  geom_vline(xintercept=1, linetype='dotted', col = 'red')+
  annotate("text", x = 1, y = 2.5, label = "Self-Sufficient Threshold", angle=90) +
  labs(
    title = 'Eastern Asia',
    x = 'Verim (2003 - 2022)', 
    y = element_blank(),
    fill = 'Year'
  ) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  )

phi <- dffk %>% filter(country %in% countries_EasternAsia) %>%
ggplot(aes(x = year,
           y = verim,
           col = country)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~country, scales = "free") +
  transition_reveal(year)  +
  labs(title = "Year: {frame_along}")

animate(plot = phi,
        nframes = 30)

Melanesia

df_MAL <- dff %>% 
  filter(
    country %in% countries_Melanesia,
    year %in% c(2003, 2022),
  ) %>%
  mutate(year = factor(year)) %>% 
  select(country, year, verim)

df_MAL <- as_tibble(df_MAL)  %>% 
  arrange(country, year) %>% 
  mutate(
    change_verim = diff(verim), 
    order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
    .by = country
  )  %>% 
  mutate(country = fct_reorder(country, order_dumbbells))

df_MAL %>% 
  ggplot(aes(x = verim, y = country)) +
  geom_path(
    aes(color = (change_verim < 0)),
    linewidth = 1,
    arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
  ) +
  geom_vline(xintercept=1, linetype='dotted', col = 'red')+
  annotate("text", x = 1, y = 3, label = "Self-Sufficient Threshold", angle=90) +
  labs(
    title = 'Melanesia',
    x = 'Verim (2003 - 2022)', 
    y = element_blank(),
    fill = 'Year'
  ) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  )

phi <- dffk %>% filter(country %in% countries_Melanesia) %>%
ggplot(aes(x = year,
           y = verim,
           col = country)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~country, scales = "free") +
  transition_reveal(year)  +
  labs(title = "Year: {frame_along}")

animate(plot = phi,
        nframes = 30)

Southern Europe

df_SE <- dff %>% 
  filter(
    country %in% countries_SouthernEurope,
    year %in% c(2003, 2022),
  ) %>%
  mutate(year = factor(year)) %>% 
  select(country, year, verim)

df_SE <- as_tibble(df_SE)  %>% 
  arrange(country, year) %>% 
  mutate(
    change_verim = diff(verim), 
    order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
    .by = country
  )  %>% 
  mutate(country = fct_reorder(country, order_dumbbells))

df_SE %>% 
  ggplot(aes(x = verim, y = country)) +
  geom_path(
    aes(color = (change_verim < 0)),
    linewidth = 1,
    arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
  ) +
  geom_vline(xintercept=1, linetype='dotted', col = 'red')+
  annotate("text", x = 1, y = 6, label = "Self-Sufficient Threshold", angle=90) +
  labs(
    title = 'Southern Europe',
    x = 'Verim (2003 - 2022)', 
    y = element_blank(),
    fill = 'Year'
  ) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  )

phi <- dffk %>% filter(country %in% countries_SouthernEurope) %>%
ggplot(aes(x = year,
           y = verim,
           col = country)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~country, scales = "free") +
  transition_reveal(year)  +
  labs(title = "Year: {frame_along}")

animate(plot = phi,
        nframes = 30)

Western Europe

df_WE <- dff %>% 
  filter(
    country %in% countries_WesternEurope,
    year %in% c(2003, 2022),
  ) %>%
  mutate(year = factor(year)) %>% 
  select(country, year, verim)

df_WE <- as_tibble(df_WE)  %>% 
  arrange(country, year) %>% 
  mutate(
    change_verim = diff(verim), 
    order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
    .by = country
  )  %>% 
  mutate(country = fct_reorder(country, order_dumbbells))

df_WE %>% 
  ggplot(aes(x = verim, y = country)) +
  geom_path(
    aes(color = (change_verim < 0)),
    linewidth = 1,
    arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
  ) +
  geom_vline(xintercept=1, linetype='dotted', col = 'red')+
  annotate("text", x = 1, y = 4, label = "Self-Sufficient Threshold", angle=90) +
  labs(
    title = 'Western Europe',
    x = 'Verim (2003 - 2022)', 
    y = element_blank(),
    fill = 'Year'
  ) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  )

phi <- dffk %>% filter(country %in% countries_WesternEurope) %>%
ggplot(aes(x = year,
           y = verim,
           col = country)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~country, scales = "free") +
  transition_reveal(year)  +
  labs(title = "Year: {frame_along}")

animate(plot = phi,
        nframes = 30)

Eastern Europe

df_EE <- dff %>% 
  filter(
    country %in% countries_EasternEurope,
    year %in% c(2003, 2022),
  ) %>%
  mutate(year = factor(year)) %>% 
  select(country, year, verim)

df_EE <- as_tibble(df_EE)  %>% 
  arrange(country, year) %>% 
  mutate(
    change_verim = diff(verim), 
    order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
    .by = country
  )  %>% 
  mutate(country = fct_reorder(country, order_dumbbells))

df_EE %>% 
  ggplot(aes(x = verim, y = country)) +
  geom_path(
    aes(color = (change_verim < 0)),
    linewidth = 1,
    arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
  ) +
  geom_vline(xintercept=1, linetype='dotted', col = 'red')+
  annotate("text", x = 1, y = 5, label = "Self-Sufficient Threshold", angle=90) +
  labs(
    title = 'Eastern Europe',
    x = 'Verim (2003 - 2022)', 
    y = element_blank(),
    fill = 'Year'
  ) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  )

phi <- dffk %>% filter(country %in% countries_EasternEurope) %>%
ggplot(aes(x = year,
           y = verim,
           col = country)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~country, scales = "free") +
  transition_reveal(year)  +
  labs(title = "Year: {frame_along}")

animate(plot = phi,
        nframes = 30)

Northern Europe

df_NE <- dff %>% 
  filter(
    country %in% countries_NorthernEurope,
    year %in% c(2003, 2022),
  ) %>%
  mutate(year = factor(year)) %>% 
  select(country, year, verim)

df_NE <- as_tibble(df_NE)  %>% 
  arrange(country, year) %>% 
  mutate(
    change_verim = diff(verim), 
    order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
    .by = country
  )  %>% 
  mutate(country = fct_reorder(country, order_dumbbells))

df_NE %>% 
  ggplot(aes(x = verim, y = country)) +
  geom_path(
    aes(color = (change_verim < 0)),
    linewidth = 1,
    arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
  ) +
  geom_vline(xintercept=1, linetype='dotted', col = 'red')+
  annotate("text", x = 1, y = 5, label = "Self-Sufficient Threshold", angle=90) +
  labs(
    title = 'Northern Europe',
    x = 'Verim (2003 - 2022)', 
    y = element_blank(),
    fill = 'Year'
  ) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  )

phi <- dffk %>% filter(country %in% countries_NorthernEurope) %>%
ggplot(aes(x = year,
           y = verim,
           col = country)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~country, scales = "free") +
  transition_reveal(year)  +
  labs(title = "Year: {frame_along}")

animate(plot = phi,
        nframes = 30)

Northern Africa

df_NA <- dff %>% 
  filter(
    country %in% countries_NorthernAfrica,
    year %in% c(2003, 2022),
  ) %>%
  mutate(year = factor(year)) %>% 
  select(country, year, verim)

df_NA <- as_tibble(df_NA)  %>% 
  arrange(country, year) %>% 
  mutate(
    change_verim = diff(verim), 
    order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
    .by = country
  )  %>% 
  mutate(country = fct_reorder(country, order_dumbbells))

df_NA %>% 
  ggplot(aes(x = verim, y = country)) +
  geom_path(
    aes(color = (change_verim < 0)),
    linewidth = 1,
    arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
  ) +
  geom_vline(xintercept=1, linetype='dotted', col = 'red')+
  annotate("text", x = 1, y = 3, label = "Self-Sufficient Threshold", angle=90) +
  labs(
    title = 'Northern Africa',
    x = 'Verim (2003 - 2022)', 
    y = element_blank(),
    fill = 'Year'
  ) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  )

phi <- dffk %>% filter(country %in% countries_NorthernAfrica) %>%
ggplot(aes(x = year,
           y = verim,
           col = country)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~country, scales = "free") +
  transition_reveal(year)  +
  labs(title = "Year: {frame_along}")

animate(plot = phi,
        nframes = 30)

Middle Africa

df_MA <- dff %>% 
  filter(
    country %in% countries_MiddleAfrica,
    year %in% c(2003, 2022),
  ) %>%
  mutate(year = factor(year)) %>% 
  select(country, year, verim)

df_MA <- as_tibble(df_MA)  %>% 
  arrange(country, year) %>% 
  mutate(
    change_verim = diff(verim), 
    order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
    .by = country
  )  %>% 
  mutate(country = fct_reorder(country, order_dumbbells))

df_MA %>% 
  ggplot(aes(x = verim, y = country)) +
  geom_path(
    aes(color = (change_verim < 0)),
    linewidth = 1,
    arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
  ) +
  geom_vline(xintercept=1, linetype='dotted', col = 'red')+
  annotate("text", x = 1, y = 5, label = "Self-Sufficient Threshold", angle=90) +
  labs(
    title = 'Middle Africa',
    x = 'Verim (2003 - 2022)', 
    y = element_blank(),
    fill = 'Year'
  ) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  )

phi <- dffk %>% filter(country %in% countries_MiddleAfrica) %>%
ggplot(aes(x = year,
           y = verim,
           col = country)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~country, scales = "free") +
  transition_reveal(year)  +
  labs(title = "Year: {frame_along}")

animate(plot = phi,
        nframes = 30)

Western Africa

df_WA <- dff %>% 
  filter(
    country %in% countries_WesternAfrica,
    year %in% c(2003, 2022),
  ) %>%
  mutate(year = factor(year)) %>% 
  select(country, year, verim)

df_WA <- as_tibble(df_WA)  %>% 
  arrange(country, year) %>% 
  mutate(
    change_verim = diff(verim), 
    order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
    .by = country
  )  %>% 
  mutate(country = fct_reorder(country, order_dumbbells))

df_WA %>% 
  ggplot(aes(x = verim, y = country)) +
  geom_path(
    aes(color = (change_verim < 0)),
    linewidth = 1,
    arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
  ) +
  geom_vline(xintercept=1, linetype='dotted', col = 'red')+
  annotate("text", x = 1, y = 5, label = "Self-Sufficient Threshold", angle=90) +
  labs(
    title = 'Western Africa',
    x = 'Verim (2003 - 2022)', 
    y = element_blank(),
    fill = 'Year'
  ) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  )

phi <- dffk %>% filter(country %in% countries_WesternAfrica) %>%
ggplot(aes(x = year,
           y = verim,
           col = country)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~country, scales = "free") +
  transition_reveal(year)  +
  labs(title = "Year: {frame_along}")

animate(plot = phi,
        nframes = 30)

Southern Africa

df_SA <- dff %>% 
  filter(
    country %in% countries_SouthernAfrica,
    year %in% c(2003, 2022),
  ) %>%
  mutate(year = factor(year)) %>% 
  select(country, year, verim)

df_SA <- as_tibble(df_SA)  %>% 
  arrange(country, year) %>% 
  mutate(
    change_verim = diff(verim), 
    order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
    .by = country
  )  %>% 
  mutate(country = fct_reorder(country, order_dumbbells))

df_SA %>% 
  ggplot(aes(x = verim, y = country)) +
  geom_path(
    aes(color = (change_verim < 0)),
    linewidth = 1,
    arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
  ) +
  geom_vline(xintercept=1, linetype='dotted', col = 'red')+
  annotate("text", x = 1, y = 3, label = "Self-Sufficient Threshold", angle=90) +
  labs(
    title = 'Southern Africa',
    x = 'Verim (2003 - 2022)', 
    y = element_blank(),
    fill = 'Year'
  ) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  )

phi <- dffk %>% filter(country %in% countries_SouthernAfrica) %>%
ggplot(aes(x = year,
           y = verim,
           col = country)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~country, scales = "free") +
  transition_reveal(year)  +
  labs(title = "Year: {frame_along}")

animate(plot = phi,
        nframes = 30)

Eastern Africa

df_EA <- dff %>% 
  filter(
    country %in% countries_EasternAfrica,
    year %in% c(2003, 2022),
  ) %>%
  mutate(year = factor(year)) %>% 
  select(country, year, verim)

df_EA <- as_tibble(df_EA)  %>% 
  arrange(country, year) %>% 
  mutate(
    change_verim = diff(verim), 
    order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
    .by = country
  )  %>% 
  mutate(country = fct_reorder(country, order_dumbbells))

df_EA %>% 
  ggplot(aes(x = verim, y = country)) +
  geom_path(
    aes(color = (change_verim < 0)),
    linewidth = 1,
    arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
  ) +
  geom_vline(xintercept=1, linetype='dotted', col = 'red')+
  annotate("text", x = 1, y = 5, label = "Self-Sufficient Threshold", angle=90) +
  labs(
    title = 'Eastern Africa',
    x = 'Verim (2003 - 2022)', 
    y = element_blank(),
    fill = 'Year'
  ) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  )

phi <- dffk %>% filter(country %in% countries_EasternAfrica) %>%
ggplot(aes(x = year,
           y = verim,
           col = country)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~country, scales = "free") +
  transition_reveal(year)  +
  labs(title = "Year: {frame_along}")

animate(plot = phi,
        nframes = 30)

South America

df_SA <- dff %>% 
  filter(
    country %in% countries_SouthAmerica,
    year %in% c(2003, 2022),
  ) %>%
  mutate(year = factor(year)) %>% 
  select(country, year, verim)

df_SA <- as_tibble(df_SA)  %>% 
  arrange(country, year) %>% 
  mutate(
    change_verim = diff(verim), 
    order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
    .by = country
  )  %>% 
  mutate(country = fct_reorder(country, order_dumbbells))

df_SA %>% 
  ggplot(aes(x = verim, y = country)) +
  geom_path(
    aes(color = (change_verim < 0)),
    linewidth = 1,
    arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
  ) +
  geom_vline(xintercept=1, linetype='dotted', col = 'red')+
  annotate("text", x = 1, y = 5, label = "Self-Sufficient Threshold", angle=90) +
  labs(
    title = 'South America',
    x = 'Verim (2003 - 2022)', 
    y = element_blank(),
    fill = 'Year'
  ) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  )

phi <- dffk %>% filter(country %in% countries_SouthAmerica) %>%
ggplot(aes(x = year,
           y = verim,
           col = country)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~country, scales = "free") +
  transition_reveal(year)  +
  labs(title = "Year: {frame_along}")

animate(plot = phi,
        nframes = 30)

Central America

df_CA <- dff %>% 
  filter(
    country %in% countries_CentralAmerica,
    year %in% c(2003, 2022),
  ) %>%
  mutate(year = factor(year)) %>% 
  select(country, year, verim)

df_CA <- as_tibble(df_CA)  %>% 
  arrange(country, year) %>% 
  mutate(
    change_verim = diff(verim), 
    order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
    .by = country
  )  %>% 
  mutate(country = fct_reorder(country, order_dumbbells))

df_CA %>% 
  ggplot(aes(x = verim, y = country)) +
  geom_path(
    aes(color = (change_verim < 0)),
    linewidth = 1,
    arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
  ) +
  geom_vline(xintercept=1, linetype='dotted', col = 'red')+
  annotate("text", x = 1, y = 5, label = "Self-Sufficient Threshold", angle=90) +
  labs(
    title = 'Central America',
    x = 'Verim (2003 - 2022)', 
    y = element_blank(),
    fill = 'Year'
  ) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  )

phi <- dffk %>% filter(country %in% countries_CentralAmerica) %>%
ggplot(aes(x = year,
           y = verim,
           col = country)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~country, scales = "free") +
  transition_reveal(year)  +
  labs(title = "Year: {frame_along}")

animate(plot = phi,
        nframes = 30)

Northern America, Australia and New Zealand

df_NA <- dff %>% 
  filter(
    country %in% c(countries_NorthernAmerica,countries_AustraliaandNewZealand),
    year %in% c(2003, 2022),
  ) %>%
  mutate(year = factor(year)) %>% 
  select(country, year, verim)

df_NA <- as_tibble(df_NA)  %>% 
  arrange(country, year) %>% 
  mutate(
    change_verim = diff(verim), 
    order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
    .by = country
  )  %>% 
  mutate(country = fct_reorder(country, order_dumbbells))

df_NA %>% 
  ggplot(aes(x = verim, y = country)) +
  geom_path(
    aes(color = (change_verim < 0)),
    linewidth = 1,
    arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
  ) +
  geom_vline(xintercept=1, linetype='dotted', col = 'red')+
  annotate("text", x = 1, y = 3, label = "Self-Sufficient Threshold", angle=90) +
  labs(
    title = 'Northern America, Australia & New Zealand',
    x = 'Verim (2003 - 2022)', 
    y = element_blank(),
    fill = 'Year'
  ) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  )

phi <- dffk %>% filter(country %in% c(countries_NorthernAmerica,countries_AustraliaandNewZealand)) %>%
ggplot(aes(x = year,
           y = verim,
           col = country)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~country, scales = "free") +
  transition_reveal(year)  +
  labs(title = "Year: {frame_along}")

animate(plot = phi,
        nframes = 30)

Caribbean

df_CAR <- dff %>% 
  filter(
    country %in% countries_Caribbean,
    year %in% c(2003, 2022),
  ) %>%
  mutate(year = factor(year)) %>% 
  select(country, year, verim)

df_CAR <- as_tibble(df_CAR)  %>% 
  arrange(country, year) %>% 
  mutate(
    change_verim = diff(verim), 
    order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
    .by = country
  )  %>% 
  mutate(country = fct_reorder(country, order_dumbbells))

df_CAR %>% 
  ggplot(aes(x = verim, y = country)) +
  geom_path(
    aes(color = (change_verim < 0)),
    linewidth = 1,
    arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
  ) +
  geom_vline(xintercept=1, linetype='dotted', col = 'red')+
  annotate("text", x = 1, y = 3, label = "Self-Sufficient Threshold", angle=90) +
  labs(
    title = 'Caribbean',
    x = 'Verim (2003 - 2022)', 
    y = element_blank(),
    fill = 'Year'
  ) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  )

phi <- dffk %>% filter(country %in% countries_Caribbean) %>%
ggplot(aes(x = year,
           y = verim,
           col = country)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~country, scales = "free") +
  transition_reveal(year)  +
  labs(title = "Year: {frame_along}")

animate(plot = phi,
        nframes = 30)

Some Other Countries

df_OC <- dff %>% 
  filter(
    country %in% countries_NA,
    year %in% c(2003, 2022),
  ) %>%
  mutate(year = factor(year)) %>% 
  select(country, year, verim)

df_OC <- as_tibble(df_OC)  %>% 
  arrange(country, year) %>% 
  mutate(
    change_verim = diff(verim), 
    order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
    .by = country
  )  %>% 
  mutate(country = fct_reorder(country, order_dumbbells))

df_OC %>% 
  ggplot(aes(x = verim, y = country)) +
  geom_path(
    aes(color = (change_verim < 0)),
    linewidth = 1,
    arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
  ) +
  geom_vline(xintercept=1, linetype='dotted', col = 'red')+
  annotate("text", x = 1, y = 12, label = "Self-Sufficient Threshold", angle=90) +
  labs(
    title = 'Other Countries',
    x = 'Verim (2003 - 2022)', 
    y = element_blank(),
    fill = 'Year'
  ) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.x = element_blank(),
    legend.position = 'none'
  )

phi <- dffk %>%
  # countries_NA (countries with no subregion) is assumed here, matching df_OC above
  filter(country %in% countries_NA) %>%
ggplot(aes(x = year,
           y = verim,
           col = country)) +
  geom_line(show.legend = FALSE) +
  facet_wrap(~country, scales = "free") +
  transition_reveal(year)  +
  labs(title = "Year: {frame_along}")

animate(plot = phi,
        nframes = 30)

Note: The world map code is adapted from Statistics Guides with Dr Paul Christiansen.

# Robinson
robinson_crs <- "+proj=robin +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs"
dffk_robinson <- dffk %>%
    sf::st_transform(robinson_crs)
vmin <- min(dffk$verim, na.rm = TRUE)
vmax <- max(dffk$verim, na.rm = TRUE)

brk <- round(
    classIntervals(dffk$verim, n = 77, style = "fisher")$brks,
    1
) %>%
    head(-1) %>%
    tail(-1) %>%
    append(vmax)
breaks <- c(vmin, brk)
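How the break vector is assembled is easier to see on plain numbers. The sketch below substitutes a made-up set of rounded class boundaries for the `classIntervals()` result (the values and the names `raw_brks`, `vmin_toy`, `vmax_toy` and `breaks_toy` are illustrative only): the first and last boundaries are dropped and replaced with the exact minimum and maximum, presumably so that rounded endpoints cannot fall outside the `limits` of the fill scale.

```r
# Made-up rounded class boundaries standing in for classIntervals()$brks
raw_brks <- c(0.04, 0.5, 1.0, 2.0, 6.02)

# Made-up exact minimum and maximum of the data
vmin_toy <- 0.0412
vmax_toy <- 6.0187

# Drop the first and last boundary, keeping only the inner ones
inner <- tail(head(raw_brks, -1), -1)

# Re-attach the exact data range as the outer breaks
breaks_toy <- c(vmin_toy, inner, vmax_toy)
print(breaks_toy)
```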

# Number of colors needed
num_colors <- 78

# Generate a color palette
new_cols <- rainbow(num_colors)

# Reverse the order if needed
new_cols <- rev(new_cols)

# Print or use the new color palette
print(new_cols)

##  [1] "#FF0014" "#FF0027" "#FF003B" "#FF004E" "#FF0062" "#FF0076" "#FF0089"
##  [8] "#FF009D" "#FF00B1" "#FF00C4" "#FF00D8" "#FF00EB" "#FF00FF" "#EB00FF"
## [15] "#D800FF" "#C400FF" "#B100FF" "#9D00FF" "#8900FF" "#7600FF" "#6200FF"
## [22] "#4E00FF" "#3B00FF" "#2700FF" "#1400FF" "#0000FF" "#0014FF" "#0027FF"
## [29] "#003BFF" "#004EFF" "#0062FF" "#0076FF" "#0089FF" "#009DFF" "#00B1FF"
## [36] "#00C4FF" "#00D8FF" "#00EBFF" "#00FFFF" "#00FFEB" "#00FFD8" "#00FFC4"
## [43] "#00FFB1" "#00FF9D" "#00FF89" "#00FF76" "#00FF62" "#00FF4E" "#00FF3B"
## [50] "#00FF27" "#00FF14" "#00FF00" "#14FF00" "#27FF00" "#3BFF00" "#4EFF00"
## [57] "#62FF00" "#76FF00" "#89FF00" "#9DFF00" "#B1FF00" "#C4FF00" "#D8FF00"
## [64] "#EBFF00" "#FFFF00" "#FFEB00" "#FFD800" "#FFC400" "#FFB100" "#FF9D00"
## [71] "#FF8900" "#FF7600" "#FF6200" "#FF4E00" "#FF3B00" "#FF2700" "#FF1400"
## [78] "#FF0000"

cols <- rev(new_cols)

animated_map <- function() {
    world_map <- ggplot(
        data = dffk,
        aes(fill = verim)
    ) +
        geom_sf(color = "white", linewidth = 0.05) +
        scale_fill_gradientn(
            name = "",
            colours = cols,
            breaks = breaks,
            labels = round(breaks, 1),
            limits = c(vmin, vmax),
            na.value = "grey70"
        ) +
        coord_sf(crs = robinson_crs) +
        guides(fill = guide_legend(
            direction = "vertical",
            keyheight = unit(1, units = "mm"),
            keywidth = unit(1, units = "mm"),
            title.position = "top",
            title.hjust = .5,
            label.hjust = .5,
            nrow = 6,
            byrow = TRUE,
            reverse = FALSE,
            label.position = "right"
        )) +
        theme_minimal() +
        theme(
            axis.line = element_blank(),
            axis.text.x = element_blank(),
            axis.text.y = element_blank(),
            axis.ticks = element_blank(),
            axis.title.x = element_blank(),
            axis.title.y = element_blank(),
            legend.position = c(.5, -.015),
            legend.text = element_text(size = 5, color = "grey10"),
            panel.grid.major = element_line(color = "white", size = .2),
            panel.grid.minor = element_blank(),
            plot.title = element_text(
                face = "bold", size = 20,
                color = "grey10", hjust = .5, vjust = -3
            ),
            plot.subtitle = element_text(
                size = 40, color = "#c43c4e",
                hjust = .5, vjust = -1
            ),
            plot.caption = element_text(
                size = 8, color = "grey10",
                hjust = .5, vjust = -10
            ),
            plot.margin = unit(c(t = -4, r = -4, b = -4, l = -4), "lines"),
            plot.background = element_rect(fill = "white", color = NA),
            panel.background = element_rect(fill = "white", color = NA),
            legend.background = element_rect(fill = "white", color = NA),
            panel.border = element_blank()
        ) +
        labs(
            x = "",
            y = "",
            title = "Verim",
            subtitle = "Year: {as.integer(closest_state)}",
            caption = ""
        )

    return(world_map)
}


world_map <- animated_map()
print(world_map)

timelapse_world_map <- world_map +
    transition_states(year) +
    enter_fade() +
    exit_fade() +
    ease_aes("quadratic-in-out", interval = .2)

animated_world <- gganimate::animate(
    timelapse_world_map,
    nframes = 120,
    duration = 22,
    start_pause = 3,
    end_pause = 30,
    height = 6,
    width = 7.15,
    res = 300,
    units = "in",
    fps = 15,
    renderer = gifski_renderer(loop = TRUE)
)

animated_world

library(showtext)
library(ggtext)
library(ggrepel)

# Cross-section for 2019, keeping only countries with a human capital value
data <- dffk %>% filter(year == 2019 & !is.na(hc))

hcverim <- data %>%
  ggplot(aes(x= hc, y=verim)) + 
  geom_point() +
  geom_text(data= data, aes(y = verim + .25, label=iso_a2, colour = region),
            size = 2) +
  geom_hline(yintercept=1, linetype='dotted', col = 'black') +
  geom_vline(xintercept=2.5, linetype='dotted', col = 'black') +
  geom_smooth(data=subset(data,verim>1 & hc>2.5),
               method=lm,se=FALSE) +
  labs(title = "Verim vs Human Capital (2019)",
       subtitle = NULL,
       tag = NULL, 
       x = "Human Capital",
       y= "Verim",
       color = NULL) +
  theme(
    axis.title.x = element_markdown(),
    axis.title.y = element_markdown(),
    axis.ticks = element_blank(),
    axis.line = element_line(),
    panel.background = element_rect(fill="#FFFFFF")
  )

Source for the equation-label helper below: https://groups.google.com/forum/#!topic/ggplot2/1TgH-kG5XMA

# Build a plotmath label with the fitted line and R^2 for verim ~ hc
lm_eqn <- function(df) {
    m <- lm(verim ~ hc, data = df)
    eq <- substitute(
        verim == a + b %.% hc * "," ~ r^2 ~ "=" ~ r2,
        list(
            a = format(unname(coef(m)[1]), digits = 2),
            b = format(unname(coef(m)[2]), digits = 2),
            r2 = format(summary(m)$r.squared, digits = 3)
        )
    )
    as.character(as.expression(eq))
}
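To see what this helper returns before placing it on a plot, it can be tried on a small made-up data frame. The sketch below is self-contained and uses base R only; `toy` and `lm_eqn_demo` are hypothetical names, and the numbers are invented for illustration, not taken from the course data. It checks that the result is a single character string that parses back into an expression, which is what `parse = TRUE` relies on.

```r
# Made-up data; the column names mirror those used in the course code
toy <- data.frame(
    hc    = c(1, 2, 3, 4, 5),
    verim = c(1.1, 1.9, 3.2, 3.8, 5.1)
)

# Same construction as the helper above, repeated so the block runs alone
lm_eqn_demo <- function(df) {
    m <- lm(verim ~ hc, data = df)
    eq <- substitute(
        verim == a + b %.% hc * "," ~ r^2 ~ "=" ~ r2,
        list(
            a = format(unname(coef(m)[1]), digits = 2),
            b = format(unname(coef(m)[2]), digits = 2),
            r2 = format(summary(m)$r.squared, digits = 3)
        )
    )
    as.character(as.expression(eq))
}

label <- lm_eqn_demo(toy)
print(label)

# The label must parse as an R expression for plotmath rendering
parsed <- parse(text = label)
```

Because the coefficients are inserted as formatted strings, a negative slope would appear as `a + -b` in the label; that is a cosmetic limitation of this construction.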

# annotate() draws each label once; geom_text() with fixed x and y would
# redraw the same label for every row of the data
hcverim +
  annotate("text", x = 4, y = 3,
           label = lm_eqn(data %>% filter(hc > 2.5, verim > 1)),
           size = 4, parse = TRUE) +
  annotate("text", x = 1.2, y = 8, label = "A", size = 6) +
  annotate("text", x = 1.2, y = 0, label = "B", size = 6) +
  annotate("text", x = 4.2, y = 8, label = "C", size = 6) +
  annotate("text", x = 4.2, y = 0, label = "D", size = 6)

govverim <- data %>%
  ggplot(aes(x= csh_g, y=verim)) + 
  geom_point() +
  geom_text(data= data, aes(y = verim + .25, label=iso_a2, colour = region),
            size = 2) +
  geom_hline(yintercept=1, linetype='dotted', col = 'black')  +
  geom_smooth(data=subset(data,verim>1),method=lm,se=FALSE) +
  labs(title = "Verim vs Government spending share",
       subtitle = NULL,
       tag = NULL, 
       x = "Government spending share",
       y= "Verim",
       color = NULL) +
  theme(
    axis.title.x = element_markdown(),
    axis.title.y = element_markdown(),
    axis.ticks = element_blank(),
    axis.line = element_line(),
    panel.background = element_rect(fill="#FFFFFF")
  )

govverim

# Build a plotmath label with the fitted line and R^2 for verim ~ csh_g
lm_eqn2 <- function(df) {
    m <- lm(verim ~ csh_g, data = df)
    eq <- substitute(
        verim == a + b %.% csh_g * "," ~ r^2 ~ "=" ~ r2,
        list(
            a = format(unname(coef(m)[1]), digits = 2),
            b = format(unname(coef(m)[2]), digits = 2),
            r2 = format(summary(m)$r.squared, digits = 3)
        )
    )
    as.character(as.expression(eq))
}

# annotate() draws the label once instead of once per data row
govverim +
  annotate("text", x = 0.09, y = 2.5,
           label = lm_eqn2(data %>% filter(verim > 1)),
           size = 3, parse = TRUE)