Erasmus Teaching Mobility
Introduction to the Course: “Statistical Research Methods and Data Exploration”
Dear Students,
Welcome to the world of statistical research methods and data exploration! In this course, we embark on a journey through the realms of statistical analysis, drawing inspiration from the intricacies of the modern data landscape and its profound impact on global trends and economic insights.
Our guiding light for this academic odyssey is the lens of data, insightfully showcased in the article titled “What is happening to productivity in the World? Productivity Gini”. This article serves as a beacon, illuminating the significance of productivity trends and their implications across nations.
As the author of that article and your guide throughout this course, I aim to take you on a compelling exploration through the multifaceted world of statistics and data analysis. We’ll delve into diverse datasets and scenarios, unraveling the mysteries concealed within the numbers, and uncovering the untold stories that data can narrate.
Our expedition commences with an emphasis on the methods behind statistical research. We will dissect the core concepts of statistical analysis, honing our skills in exploring datasets, interpreting distributions, formulating hypotheses, and using specialized criteria for analysis. These fundamental tools will be our compass, guiding us through the terrain of data interpretation and understanding.
Our itinerary doesn’t halt solely at statistical theory. Rather, it extends its boundaries to real-world applications and hands-on experiences. We’ll unravel the predictive powers of data and statistical models, exploring scenarios such as the Spaceship Titanic collision—a Kaggle competition setting—where predictive modeling and research planning take center stage.
Throughout our journey, we aim to equip you with a versatile set of tools—skills in data acquisition, exploration, hypothesis testing, predictive modeling, and research planning. These skills are not only vital in the world of academia but are also indispensable in the practical realms of industry, governance, and global economics.
So, fasten your seatbelts, prepare your analytical compasses, and get ready to navigate the enthralling landscapes of statistical research and data exploration. This course is a platform for us to delve into the captivating world of numbers, patterns, and stories that data conceals.
Together, let us embark on this voyage of discovery and enlightenment, where statistical research methods and data exploration lead us to a deeper understanding of our ever-evolving world.
Welcome aboard!
Accessing the Available Data
Exploring World Development Indicators: A Gateway to Productivity Insights
In our quest to decipher the enigmatic landscape of productivity trends across the globe, our first port of call is an exploration into the realm of World Development Indicators (WDI). These indicators, encapsulated within the World Bank’s vast repository, are our stepping stones to understanding the economic pulse of nations and their productivity dynamics.
In this segment, we embark on an immersive expedition through the WDI—a gateway to a trove of economic data, metrics, and developmental insights. Our primary objective is to empower you with the fundamental skills needed to access, dissect, and glean valuable insights from this rich tapestry of information.
To initiate this journey, we will engage in a simple yet profound demonstration, showcasing how to access and navigate the WDI database using accessible tools, primarily the WDI package in R. Through hands-on exploration, we will highlight the seamless process of fetching pertinent economic indicators, such as GDP, population metrics, and other crucial developmental measures encapsulated within the WDI.
As we dive into the WDI dataset, we aim to illustrate the steps involved in extracting, summarizing, and visualizing key indicators. This immersive experience will equip you with the rudimentary skills to harness the potential of publicly available datasets, particularly the WDI, to unravel trends, patterns, and insights regarding global productivity.
This session acts as your compass, guiding you through the initial steps in your journey toward deciphering the complex tapestry of economic data. By the end of this segment, you will emerge with the prowess to navigate the WDI terrain, laying the groundwork for deeper explorations into the productivity narratives prevalent across the globe.
Join us on this inaugural step as we set sail to explore the WDI, laying the foundation for a deeper understanding of the global economic landscape and its productivity dynamics.
Understanding Packages in R: Your Tools for Data Exploration
In the realm of R, packages act as your trusty tools and resources, offering a multitude of functions and capabilities that augment the core functionalities of the programming language. They are instrumental in extending R’s abilities to encompass diverse data manipulation, visualization, and statistical analysis.
What Are R Packages?
R packages are collections of functions, datasets, and other supplementary materials bundled together to serve a specific purpose. Each package serves as a specialized toolkit, tailored to fulfill distinct analytical or computational needs. These packages are the gears in the machinery of R, enabling you to perform various tasks efficiently and effectively.
Why Packages Matter?
As you venture into the realm of statistical research and data exploration, understanding and utilizing packages become integral. They equip you with an arsenal of functions, enabling you to perform tasks ranging from data acquisition and cleaning to statistical analysis and visualization. Using packages saves time and effort by providing pre-written code for complex operations.
How to Access and Use Packages?
In R, accessing and using packages is a straightforward process. The key steps involve installation, loading, and utilization. The process begins by installing a package, which is a one-time operation. Once installed, the package needs to be loaded into your R session to access its functions and datasets. After loading, you can use the functions provided by the package to perform various tasks, adding powerful tools to your analytical toolkit.
Introduction to Package Usage
Throughout our course, we will introduce and utilize specific packages, such as the WDI package for World Bank data retrieval and analysis. We will guide you through the steps of installing, loading, and employing these packages, ensuring you are equipped to access and harness the functionalities they offer.
In your journey of statistical exploration, packages will be your trusted companions, enriching your experience and broadening the horizons of your data analysis endeavors.
Quick Overview of the WDI Package in R
The WDI package in R is a powerful tool designed to access the World Bank’s extensive World Development Indicators (WDI) repository. It serves as a specialized toolkit, simplifying the retrieval of key economic and social data.
Key Features:
- Enables retrieval of diverse indicators such as GDP, population metrics, and education statistics.
- Accesses time-series data, allowing trend analysis over time.
- Provides an intuitive interface to navigate and retrieve specific indicators for countries or regions.
Importance:
- Essential for understanding global economic trends and productivity insights.
- Simplifies complex data retrieval, enabling analysis of critical economic indicators across nations.
Throughout our course, we’ll explore the WDI package, guiding you through its installation, usage, and the extraction of essential indicators. This tool will be pivotal in our journey towards understanding productivity dynamics in various countries.
Quick Overview of the dplyr Package in R
The dplyr package in R is a fundamental toolkit designed for efficient and intuitive data manipulation and transformation. It stands as a versatile set of functions essential for working with data frames, enabling streamlined operations for data analysis and cleaning.
Key Features:
- Provides a collection of functions for common data manipulation tasks: filtering, selecting, arranging, summarizing, and mutating data.
- Enhances data frame operations, allowing for seamless data manipulation using a consistent and easy-to-understand syntax.
- Offers a cohesive set of verbs designed for intuitive and efficient data transformation and analysis.
Importance:
- Simplifies complex data manipulations, making tasks such as subsetting, summarizing, and arranging data frames more straightforward and coherent.
- Crucial for data cleaning, transformation, and preparation, enabling efficient and effective data analysis workflows.
Throughout our course, we’ll delve into the capabilities of the dplyr package, showcasing its functions and guiding you through its usage. This package will serve as a cornerstone in your journey toward mastering essential data manipulation skills and fostering efficient data analysis practices.
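To make the dplyr verbs listed above more concrete, here is a minimal sketch on a small toy data frame. The data frame toy and all of its values are invented purely for illustration; only the dplyr functions themselves come from the package.

```r
library(dplyr)

# Hypothetical toy data; the values are invented for illustration only
toy <- data.frame(
  country = c("A", "A", "B", "B", "C", "C"),
  year = c(2021, 2022, 2021, 2022, 2021, 2022),
  gdp = c(100, 110, 50, 60, 20, 30),
  population = c(10, 10, 5, 5, 4, 5)
)

toy_summary <- toy %>%
  filter(year == 2022) %>%                      # keep the rows of one year
  select(country, gdp, population) %>%          # keep the columns of interest
  mutate(gdp_per_person = gdp / population) %>% # add a derived column
  arrange(desc(gdp_per_person))                 # sort from highest to lowest

# Summarise: one row per country with its average GDP over the years
average_gdp <- toy %>%
  group_by(country) %>%
  summarise(mean_gdp = mean(gdp), .groups = "drop")

print(toy_summary)
print(average_gdp)
```

Each verb takes a data frame and returns a data frame, which is why the steps can be chained one after another.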
Downloading and Using Packages in R
To download and install a package (e.g., “dplyr”), use the install.packages() function. For example, install.packages("dplyr"). The install.packages() function is used to download and install a package onto your system. This operation is required only once for each package you wish to use.
To use the installed package’s functions, load it into your current R session using the library() function. For example, library(dplyr). Loading a package via library() makes its functions and capabilities available for use in your current R session. Each time you start a new R session, you need to reload the packages you plan to use.
In short, install.packages() is run once per package, while library() is run at the start of every R session. Throughout our course, we’ll emphasize installing the necessary packages and loading them with library(), so that you can access the full suite of tools and functions for your data analysis and statistical exploration.
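One possible way to avoid reinstalling packages on every run is to check first which ones are missing. The sketch below uses the base R function requireNamespace(); the helper name missing_packages() is hypothetical and introduced only for this illustration, and the install step is left commented out so that nothing is downloaded unintentionally.

```r
# Hypothetical helper: return the packages from 'pkgs' that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

# Packages used in this lesson
course_packages <- c("WDI", "dplyr", "tidyr", "explore")
to_install <- missing_packages(course_packages)
print(to_install)

# Uncomment to install only what is missing (a one-time operation):
# if (length(to_install) > 0) install.packages(to_install)

# Loading is still needed in every new session, for example:
# library(dplyr)
```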
Exploring Productivity Trends: Analytical Journey Using R
In our quest to understand global productivity dynamics, we embark on a journey steeped in data analysis and statistical exploration. Our foundational steps are guided by an article titled What is happening to productivity in the World? Productivity Gini. This article serves as a launchpad to delve into productivity trends across nations, utilizing data from the World Bank’s repository.
Downloading Relevant World Bank Data
- Visit the World Bank Data Catalog:
Go to the World Bank Data Catalog https://data.worldbank.org/ or World Bank Data page https://databank.worldbank.org/home.
- Search or Explore Indicators:
Use the search bar or explore the indicators section to find the data of interest. For instance, you might search for indicators like GDP, population, inflation, etc.
- Select Indicators:
Click on the specific indicator of interest. This will lead to a page where the data and its details are provided.
- Note the Indicator Code:
The page for each indicator usually provides a code that represents it. For example, “NY.GDP.MKTP.CD” represents GDP.
- Understand Indicator Descriptions:
Read the indicator descriptions to make sure you are using the correct variables for your analysis.
Leveraging the WDI package in R, we access critical World Development Indicators, gathering economic and social metrics necessary for our analysis.
Note for students: If you haven’t installed the necessary packages before, please uncomment the code below (remove the #) for installation and then re-comment it after the installation process to ensure the code works.
# Install required packages (uncomment and run if needed)
# install.packages("WDI") # If not installed previously
library(WDI) # Load WDI package
The World Bank’s database comprises over 1,400 time series indicators for 217 economies and more than 40 country groups, spanning data over 50 years. When you visit https://data.worldbank.org/, you can explore the data based on ‘country’ and ‘indicator’. Under the ‘Browse by Country or Indicator’ option, delving into the ‘indicator’ section allows access to numerous time series under different headings. The initial headings cover various aspects, such as ‘Agriculture and Rural Development,’ ‘Aid Effectiveness,’ and ‘Climate Change,’ containing a total of 20 headings and 1,400 time series.
To find the keyword for the dataset you want, go to the ‘indicator’ section at https://data.worldbank.org/indicator and open the page of the specific dataset. For instance, if you’re curious about Gross Domestic Product (GDP) and Total Population data for various countries, you will find GDP (current US$) under the ‘Economy & Growth’ category and Population, total under the ‘Climate Change’ category. Clicking an indicator takes you to a page like one of these:
https://data.worldbank.org/indicator/NY.GDP.MKTP.CD?view=chart
https://data.worldbank.org/indicator/NY.GDP.DEFL.ZS?view=chart
Codes such as ‘NY.GDP.MKTP.CD’, ‘NY.GDP.DEFL.ZS’ and ‘SP.POP.TOTL’, which appear in the middle of such URLs, are the keyword identifiers for these specific indicators.
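As a small illustration of where the code sits inside the URL, the following sketch pulls it out with a regular expression in base R. The helper name extract_indicator_code() is hypothetical, and it assumes the URL follows the .../indicator/CODE?view=chart pattern shown above.

```r
# The two indicator pages shown above
indicator_urls <- c(
  "https://data.worldbank.org/indicator/NY.GDP.MKTP.CD?view=chart",
  "https://data.worldbank.org/indicator/NY.GDP.DEFL.ZS?view=chart"
)

# Hypothetical helper: keep the part between "/indicator/" and the "?"
extract_indicator_code <- function(url) {
  sub("^.*/indicator/([^?]+).*$", "\\1", url)
}

indicator_codes <- extract_indicator_code(indicator_urls)
print(indicator_codes)
```

The resulting character vector has the same form as the one passed to the indicator argument of WDI().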
# Download data from World Bank using WDI package
df <- WDI(country = "all", indicator = c("NY.GDP.MKTP.CD", "SP.POP.TOTL"))
First, I will use the str() function. In R, str() is a function used to reveal the structure of a dataset. When applied to a dataset, such as df in our case, str(df) provides a concise overview of its composition: the number of observations and variables, and the name and data type of each column (variable), such as numeric, character (text), or factor (categorical data). It also shows the first few values of each column, a preview of the actual data that helps to understand what is stored in the dataset. Missing values (‘NA’) that fall among these first values are visible too, which gives an early hint for data cleaning and analysis.
## 'data.frame': 16758 obs. of 6 variables:
## $ country : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ iso2c : chr "AF" "AF" "AF" "AF" ...
## $ iso3c : chr "AFG" "AFG" "AFG" "AFG" ...
## $ year : int 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 ...
## $ NY.GDP.MKTP.CD: num 5.38e+08 5.49e+08 5.47e+08 7.51e+08 8.00e+08 ...
## ..- attr(*, "label")= chr "GDP (current US$)"
## $ SP.POP.TOTL : num 8622466 8790140 8969047 9157465 9355514 ...
## ..- attr(*, "label")= chr "Population, total"
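Because str() previews only the first few values of each column, it can help to complement it with an explicit count of missing values. The following is a minimal sketch on an invented toy data frame (the values are not World Bank data), using only base R.

```r
# Hypothetical toy data; the values are invented for illustration only
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  year = c(2021L, 2022L, 2021L, 2022L),
  gdp = c(1.5e9, NA, 3.2e8, 3.4e8),
  stringsAsFactors = FALSE
)

# Structure: dimensions, column names, types, and first values
str(toy)

# Data type of each column as a named character vector
column_types <- vapply(toy, function(x) class(x)[1], character(1))
print(column_types)

# Number of missing values in each column, over all rows
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)
```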
Employing R for data manipulation using packages like dplyr and tidyr, we ensure the data is structured and ready for analysis.
# Install required packages (uncomment and run if needed)
# install.packages("dplyr")
# install.packages("tidyr") # If not installed previously
library(dplyr)
library(tidyr)
I will also use the explore package. The explore package in R is a toolset designed to facilitate data exploration and analysis, providing various functions to delve into and understand your dataset more comprehensively. It offers utilities to assist in initial data investigation and descriptive statistics.
Note for students: If you haven’t installed the necessary packages before, please use the install.packages() function. I will not use the install.packages() function again in this lesson.
The describe_all() function, a part of the explore package, is used to generate a descriptive overview of all columns (variables) in a dataset simultaneously. When applied to a dataset, it reports, for each column, the data type, the number and percentage of missing values, the number of unique values, and, for numeric columns, the minimum, mean, and maximum.
The pipe operator (%>%) in R, provided by the magrittr package and also made available by dplyr, allows for chaining multiple operations sequentially. It enhances code readability by passing the output of one function as the input to the next function in the sequence.
With df %>% describe_all(), the %>% operator passes the dataset df into the describe_all() function, generating the overview for all columns in the dataset.
# Load the "explore" package
library(explore)
# Describe all columns in the dataset using describe_all() function
df %>% describe_all()
## # A tibble: 6 × 8
## variable type na na_pct unique min mean max
## <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl>
## 1 country chr 0 0 266 NA NA NA
## 2 iso2c chr 0 0 266 NA NA NA
## 3 iso3c chr 0 0 262 NA NA NA
## 4 year int 0 0 63 1960 1.99e 3 2.02e 3
## 5 NY.GDP.MKTP.CD dbl 3393 20.2 13243 8824746. 1.21e12 1.01e14
## 6 SP.POP.TOTL dbl 93 0.6 16460 2646 2.16e 8 7.95e 9
Running df %>% describe_all() executes the describe_all() function from the explore package on the dataset df, providing a summary of each column and offering insights into the dataset’s characteristics.
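To see what the columns of this overview mean, the same kind of summary can be approximated by hand in base R. This is only a sketch under assumptions: the helper summarise_column() is hypothetical, the toy values are invented, and details such as whether NA counts as a unique value may differ from what the explore package does internally.

```r
# Hypothetical toy data; the values are invented for illustration only
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  year = c(2021L, 2022L, 2021L, 2022L),
  gdp = c(10, NA, 30, 40)
)

# Hypothetical helper: one row of summary statistics for a single column
summarise_column <- function(x) {
  is_num <- is.numeric(x)
  data.frame(
    na = sum(is.na(x)),                      # number of missing values
    na_pct = round(100 * mean(is.na(x)), 1), # share of missing values in %
    unique = length(unique(x)),              # distinct values (NA counted here)
    min = if (is_num) as.numeric(min(x, na.rm = TRUE)) else NA_real_,
    mean = if (is_num) mean(x, na.rm = TRUE) else NA_real_,
    max = if (is_num) as.numeric(max(x, na.rm = TRUE)) else NA_real_
  )
}

# Apply the helper to every column and stack the results
overview <- do.call(rbind, lapply(toy, summarise_column))
overview <- cbind(variable = names(toy), overview)
print(overview)
```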
The dataset contains information on various factors for 266 unique countries over the period from 1960 to 2022. Let’s explore the details for the available variables:
- country is represented as character (text).
- iso2c and iso3c are coded as character strings.
- year is indicated as an integer, covering the years from 1960 to 2022.
- NY.GDP.MKTP.CD and SP.POP.TOTL are represented in double (numeric) format.
- About 20.2% of the observations for NY.GDP.MKTP.CD (GDP) are missing.
- Around 0.6% of observations for SP.POP.TOTL (total population) are not available.
The dataset provides a broad view of countries’ GDP and total population over several decades. Despite some missing values, it offers a wide range of data to explore and analyze.
266 unique countries
The unique() function in R is used to extract unique elements from a vector, data frame, or array.
x |
---|
Afghanistan |
Africa Eastern and Southern |
Africa Western and Central |
Albania |
Algeria |
American Samoa |
Andorra |
Angola |
Antigua and Barbuda |
Arab World |
Argentina |
Armenia |
Aruba |
Australia |
Austria |
Azerbaijan |
Bahamas, The |
Bahrain |
Bangladesh |
Barbados |
Belarus |
Belgium |
Belize |
Benin |
Bermuda |
Bhutan |
Bolivia |
Bosnia and Herzegovina |
Botswana |
Brazil |
British Virgin Islands |
Brunei Darussalam |
Bulgaria |
Burkina Faso |
Burundi |
Cabo Verde |
Cambodia |
Cameroon |
Canada |
Caribbean small states |
Cayman Islands |
Central African Republic |
Central Europe and the Baltics |
Chad |
Channel Islands |
Chile |
China |
Colombia |
Comoros |
Congo, Dem. Rep. |
Congo, Rep. |
Costa Rica |
Cote d’Ivoire |
Croatia |
Cuba |
Curacao |
Cyprus |
Czechia |
Denmark |
Djibouti |
Dominica |
Dominican Republic |
Early-demographic dividend |
East Asia & Pacific (excluding high income) |
East Asia & Pacific (IDA & IBRD countries) |
East Asia & Pacific |
Ecuador |
Egypt, Arab Rep. |
El Salvador |
Equatorial Guinea |
Eritrea |
Estonia |
Eswatini |
Ethiopia |
Euro area |
Europe & Central Asia (excluding high income) |
Europe & Central Asia (IDA & IBRD countries) |
Europe & Central Asia |
European Union |
Faroe Islands |
Fiji |
Finland |
Fragile and conflict affected situations |
France |
French Polynesia |
Gabon |
Gambia, The |
Georgia |
Germany |
Ghana |
Gibraltar |
Greece |
Greenland |
Grenada |
Guam |
Guatemala |
Guinea-Bissau |
Guinea |
Guyana |
Haiti |
Heavily indebted poor countries (HIPC) |
High income |
Honduras |
Hong Kong SAR, China |
Hungary |
IBRD only |
Iceland |
IDA & IBRD total |
IDA blend |
IDA only |
IDA total |
India |
Indonesia |
Iran, Islamic Rep. |
Iraq |
Ireland |
Isle of Man |
Israel |
Italy |
Jamaica |
Japan |
Jordan |
Kazakhstan |
Kenya |
Kiribati |
Korea, Dem. People’s Rep. |
Korea, Rep. |
Kosovo |
Kuwait |
Kyrgyz Republic |
Lao PDR |
Late-demographic dividend |
Latin America & Caribbean (excluding high income) |
Latin America & Caribbean |
Latin America & the Caribbean (IDA & IBRD countries) |
Latvia |
Least developed countries: UN classification |
Lebanon |
Lesotho |
Liberia |
Libya |
Liechtenstein |
Lithuania |
Low & middle income |
Low income |
Lower middle income |
Luxembourg |
Macao SAR, China |
Madagascar |
Malawi |
Malaysia |
Maldives |
Mali |
Malta |
Marshall Islands |
Mauritania |
Mauritius |
Mexico |
Micronesia, Fed. Sts. |
Middle East & North Africa (excluding high income) |
Middle East & North Africa (IDA & IBRD countries) |
Middle East & North Africa |
Middle income |
Moldova |
Monaco |
Mongolia |
Montenegro |
Morocco |
Mozambique |
Myanmar |
Namibia |
Nauru |
Nepal |
Netherlands |
New Caledonia |
New Zealand |
Nicaragua |
Niger |
Nigeria |
North America |
North Macedonia |
Northern Mariana Islands |
Norway |
Not classified |
OECD members |
Oman |
Other small states |
Pacific island small states |
Pakistan |
Palau |
Panama |
Papua New Guinea |
Paraguay |
Peru |
Philippines |
Poland |
Portugal |
Post-demographic dividend |
Pre-demographic dividend |
Puerto Rico |
Qatar |
Romania |
Russian Federation |
Rwanda |
Samoa |
San Marino |
Sao Tome and Principe |
Saudi Arabia |
Senegal |
Serbia |
Seychelles |
Sierra Leone |
Singapore |
Sint Maarten (Dutch part) |
Slovak Republic |
Slovenia |
Small states |
Solomon Islands |
Somalia |
South Africa |
South Asia (IDA & IBRD) |
South Asia |
South Sudan |
Spain |
Sri Lanka |
St. Kitts and Nevis |
St. Lucia |
St. Martin (French part) |
St. Vincent and the Grenadines |
Sub-Saharan Africa (excluding high income) |
Sub-Saharan Africa (IDA & IBRD countries) |
Sub-Saharan Africa |
Sudan |
Suriname |
Sweden |
Switzerland |
Syrian Arab Republic |
Tajikistan |
Tanzania |
Thailand |
Timor-Leste |
Togo |
Tonga |
Trinidad and Tobago |
Tunisia |
Turkiye |
Turkmenistan |
Turks and Caicos Islands |
Tuvalu |
Uganda |
Ukraine |
United Arab Emirates |
United Kingdom |
United States |
Upper middle income |
Uruguay |
Uzbekistan |
Vanuatu |
Venezuela, RB |
Viet Nam |
Virgin Islands (U.S.) |
West Bank and Gaza |
World |
Yemen, Rep. |
Zambia |
Zimbabwe |
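A listing like the one above can be produced by applying unique() to the country column. The sketch below shows the idea on a short invented vector rather than on the downloaded data.

```r
# Hypothetical stand-in for a 'country' column with repeated entries
country_column <- c("Albania", "Albania", "World", "Latvia", "World", "Latvia")

# unique() keeps the first occurrence of each value, in order of appearance
unique_entries <- unique(country_column)
print(unique_entries)

# Counting the unique entries
n_unique <- length(unique_entries)
print(n_unique)
```

With the course dataset, the equivalent call would be unique(df$country).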
In the dataset we’re using, there are entries that do not represent actual countries but contain descriptors like Not classified, OECD members, or Other small states. To ensure accuracy in our analysis, we need to exclude these non-country entries. The WDI package provides additional information, including a dataset named WDI_data$country. By leveraging this supplementary data, we can enrich our existing dataset with more detailed country information.
Using the combination of our dataset and the WDI_data$country dataset, we can identify and subsequently remove these non-country entries to refine and focus our analysis on legitimate country-specific data points for a more precise assessment.
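The enrichment and filtering described here can be sketched as a join followed by a filter. The example below uses two small invented tables so that it runs without downloading anything. With the real data, the lookup table would presumably be WDI_data$country converted to a data frame, and the assumption that aggregate entries carry the region label "Aggregates" should be checked by inspecting the unique values of the region column.

```r
library(dplyr)

# Hypothetical toy version of the downloaded data (population in made-up units)
toy_df <- data.frame(
  country = c("Latvia", "Turkiye", "World", "OECD members"),
  iso3c = c("LVA", "TUR", "WLD", "OED"),
  SP.POP.TOTL = c(2, 85, 7900, 1400)
)

# Hypothetical toy lookup table standing in for WDI_data$country
toy_country_info <- data.frame(
  iso3c = c("LVA", "TUR", "WLD", "OED"),
  region = c("Europe & Central Asia", "Europe & Central Asia",
             "Aggregates", "Aggregates")
)

toy_countries_only <- toy_df %>%
  left_join(toy_country_info, by = "iso3c") %>% # add the country details
  filter(region != "Aggregates")                # drop non-country entries

print(toy_countries_only)
```

Joining on the ISO code rather than on the country name avoids mismatches caused by different spellings of the same country.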
## # A tibble: 12 × 8
## variable type na na_pct unique min mean max
## <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl>
## 1 country chr 0 0 215 NA NA NA
## 2 iso2c chr 0 0 215 NA NA NA
## 3 iso3c chr 0 0 215 NA NA NA
## 4 year int 0 0 63 1960 1991 2.02e 3
## 5 NY.GDP.MKTP.CD dbl 3068 22.7 10473 8824746. 190626974855. 2.55e13
## 6 SP.POP.TOTL dbl 30 0.2 13467 2646 24760308. 1.42e 9
## 7 region chr 0 0 7 NA NA NA
## 8 capital chr 0 0 210 NA NA NA
## 9 longitude chr 0 0 210 NA NA NA
## 10 latitude chr 0 0 210 NA NA NA
## 11 income chr 0 0 5 NA NA NA
## 12 lending chr 0 0 4 NA NA NA
In our dataset, we’ve observed missing data for certain variables such as total population (SP.POP.TOTL) and Gross Domestic Product (NY.GDP.MKTP.CD). It’s essential to handle missing data to conduct comprehensive analyses.
# Group by 'country' and count missing values within each group
missing_data_count <- df %>%
  group_by(country) %>%
  summarize(missing_count = sum(is.na(SP.POP.TOTL)), .groups = 'drop')
# Identify countries with all values missing
# (63 yearly observations per country, 1960 to 2022)
countries_with_all_missing <- missing_data_count %>%
  filter(missing_count == 63) # Adjust to the total number of observations per country
# Identify countries with more than 20 missing points
countries_with_more_than_20_missing <- missing_data_count %>%
  filter(missing_count > 20) # Change 20 to your desired threshold
# View countries with all missing data or more than 20 missing points
print(countries_with_all_missing)
print(countries_with_more_than_20_missing)
## # A tibble: 0 × 2
## # ℹ 2 variables: country <chr>, missing_count <int>
## # A tibble: 1 × 2
## country missing_count
## <chr> <int>
## 1 West Bank and Gaza 30
## # A tibble: 12 × 8
## variable type na na_pct unique min mean max
## <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl>
## 1 country chr 0 0 214 NA NA NA
## 2 iso2c chr 0 0 214 NA NA NA
## 3 iso3c chr 0 0 214 NA NA NA
## 4 year int 0 0 63 1960 1991 2.02e 3
## 5 NY.GDP.MKTP.CD dbl 3034 22.5 10444 8824746. 191130628001. 2.55e13
## 6 SP.POP.TOTL dbl 0 0 13433 2646 24812448. 1.42e 9
## 7 region chr 0 0 7 NA NA NA
## 8 capital chr 0 0 210 NA NA NA
## 9 longitude chr 0 0 210 NA NA NA
## 10 latitude chr 0 0 210 NA NA NA
## 11 income chr 0 0 5 NA NA NA
## 12 lending chr 0 0 4 NA NA NA
# Group by 'country' and count missing values within each group
missing_data_count <- dff %>%
group_by(country) %>%
summarize(missing_count = sum(is.na(NY.GDP.MKTP.CD)), .groups = 'drop')
## # A tibble: 12 × 8
## variable type na na_pct unique min mean max
## <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl>
## 1 country chr 0 0 214 NA NA NA
## 2 iso2c chr 0 0 214 NA NA NA
## 3 iso3c chr 0 0 214 NA NA NA
## 4 year int 0 0 20 2003 2012. 2.02e 3
## 5 NY.GDP.MKTP.CD dbl 192 4.5 4089 19456336. 344289278960. 2.55e13
## 6 SP.POP.TOTL dbl 0 0 4278 9668 32980495. 1.42e 9
## 7 region chr 0 0 7 NA NA NA
## 8 capital chr 0 0 210 NA NA NA
## 9 longitude chr 0 0 210 NA NA NA
## 10 latitude chr 0 0 210 NA NA NA
## 11 income chr 0 0 5 NA NA NA
## 12 lending chr 0 0 4 NA NA NA
# Keep only countries with no missing values for 'NY.GDP.MKTP.CD'
countries_to_keep <- missing_data_count %>%
filter(missing_count == 0) %>%
pull(country)
# Subset the dataset 'dff' to keep only the selected countries
dff <- dff %>%
filter(country %in% countries_to_keep)
## # A tibble: 12 × 8
## variable type na na_pct unique min mean max
## <chr> <chr> <int> <dbl> <int> <dbl> <dbl> <dbl>
## 1 country chr 0 0 178 NA NA NA
## 2 iso2c chr 0 0 178 NA NA NA
## 3 iso3c chr 0 0 178 NA NA NA
## 4 year int 0 0 20 2003 2012. 2.02e 3
## 5 NY.GDP.MKTP.CD dbl 0 0 3560 19456336. 392621361626. 2.55e13
## 6 SP.POP.TOTL dbl 0 0 3559 9668 38612129. 1.42e 9
## 7 region chr 0 0 7 NA NA NA
## 8 capital chr 0 0 176 NA NA NA
## 9 longitude chr 0 0 178 NA NA NA
## 10 latitude chr 0 0 178 NA NA NA
## 11 income chr 0 0 4 NA NA NA
## 12 lending chr 0 0 4 NA NA NA
Assessing Global Productivity: Understanding Country Contributions in the World Economy
In this segment of our course, we will explore how to gauge a country’s productivity in the global landscape. We aim to calculate the world’s GDP (in current US$) by summing the individual GDPs of all countries, alongside the total world population. This allows us to establish a foundational perspective of the world’s economic output.
To assess each country’s contribution, we compare its share of world GDP with its share of world population. Imagine a country that accounts for 1% of the world’s population and generates 1% of the world’s total production. Dividing the GDP share by the population share gives a simple yet effective metric named verim, our naive productivity measure; for this country it equals one.
When verim exceeds one, it suggests that a country contributes proportionally more to global output than its share of the population. Conversely, a verim below one indicates an area for potential improvement in productivity. This analysis provides a fundamental understanding of how countries perform relative to their population size and economic output within the global context.
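A small worked example may help to fix the idea. The sketch below computes the measure for an invented "world" of three countries in base R; the column names gdp and population and all values are assumptions chosen only to keep the arithmetic easy to follow.

```r
# Hypothetical world of three countries; the values are invented
toy <- data.frame(
  country = c("A", "B", "C"),
  gdp = c(600, 300, 100),
  population = c(20, 30, 50)
)

# Each country's share of world GDP and of world population, in percent
toy$gdp_share <- toy$gdp / sum(toy$gdp) * 100
toy$pop_share <- toy$population / sum(toy$population) * 100

# verim: GDP share divided by population share
toy$verim <- toy$gdp_share / toy$pop_share

print(toy)
```

Country A produces 60% of output with 20% of the population, so its verim is 3. Country B has equal shares, so its verim is 1. Country C produces 10% of output with 50% of the population, so its verim is 0.2.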
# Calculate world totals and world GDP per person for each year
df_ratios <- dff %>%
group_by(year) %>%
summarise(
world_gdp = sum(NY.GDP.MKTP.CD),
world_population = sum(SP.POP.TOTL),
world_gdp_perp = world_gdp/world_population
)
# Plotting total production per person after 2000
library(ggplot2)
ggplot(df_ratios, aes(x = year, y = world_gdp_perp)) +
geom_line() +
labs(
title = "Total Production per Person after 2000",
x = "Year",
y = "Total Production per Person"
)
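Merge df_ratios with dff by year, then compute each country’s shares and verim.
dff <- dff %>%
  left_join(df_ratios, by = "year") %>% # Adds world_gdp and world_population
  mutate(
    country_ratio = NY.GDP.MKTP.CD / world_gdp * 100,
    population_ratio = SP.POP.TOTL / world_population * 100,
    verim = country_ratio / population_ratio
  )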
dff <- dff %>%
arrange(country, year) %>%
group_by(country) %>%
mutate(cumulative_change = (verim / first(verim) - 1) * 100)
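# Subsets for single countries (assumed definitions of df_TR and df_LV)
df_TR <- dff %>% filter(country == "Turkiye")
df_LV <- dff %>% filter(country == "Latvia")
ggplot(df_TR, aes(x = year, y = verim)) +
  geom_line() +
  labs(
    title = "verim in Turkiye after 2000",
    x = "Year",
    y = "Verim"
  )
ggplot(df_LV, aes(x = year, y = verim)) +
  geom_line() +
  labs(
    title = "verim in Latvia after 2000",
    x = "Year",
    y = "Verim"
  )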
df %>% filter(country %in% c("France", "Turkiye")) %>%
ggplot(aes(x = year,
y = SP.POP.TOTL,
col = country)) +
geom_line()
Subregion
Note: The world map code is adapted from Statistics Guides with Dr Paul Christiansen.
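library(rnaturalearth) # Provides ne_countries()
library(sf)
# 'crs' (the target coordinate reference system) must be defined before this step
world_sf <- ne_countries(
  type = "countries", scale = "small"
) %>%
  sf::st_as_sf() %>%
  sf::st_transform(crs)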
countries_SouthernEurope <- dffk %>%
filter(year ==2022, subregion == "Southern Europe") %>%
pull(country)
countries_NorthernAfrica <- dffk %>%
filter(year ==2022) %>%
filter(subregion == "Northern Africa" ) %>%
pull(country)
countries_MiddleAfrica <- dffk %>%
filter(year ==2022) %>%
filter(subregion == "Middle Africa" ) %>%
pull(country)
countries_SouthAmerica <- dffk %>%
filter(year ==2022) %>%
filter(subregion == "South America" ) %>%
pull(country)
countries_WesternAsia <- dffk %>%
filter(year ==2022) %>%
filter(subregion == "Western Asia" ) %>%
pull(country)
countries_AustraliaandNewZealand <- dffk %>%
filter(year ==2022) %>%
filter(subregion == "Australia and New Zealand" ) %>%
pull(country)
countries_WesternEurope <- dffk %>%
filter(year ==2022) %>%
filter(subregion == "Western Europe" ) %>%
pull(country)
countries_Caribbean <- dffk %>%
filter(year ==2022) %>%
filter(subregion == "Caribbean" ) %>%
pull(country)
countries_SouthernAsia <- dffk %>%
filter(year ==2022) %>%
filter(subregion == "Southern Asia" ) %>%
pull(country)
countries_EasternEurope <- dffk %>%
filter(year ==2022) %>%
filter(subregion == "Eastern Europe" ) %>%
pull(country)
countries_CentralAmerica <- dffk %>%
filter(year ==2022) %>%
filter(subregion == "Central America" ) %>%
pull(country)
countries_WesternAfrica <- dffk %>%
filter(year ==2022) %>%
filter(subregion == "Western Africa" ) %>%
pull(country)
countries_SouthernAfrica <- dffk %>%
filter(year ==2022) %>%
filter(subregion == "Southern Africa" ) %>%
pull(country)
countries_SouthEasternAsia <- dffk %>%
filter(year ==2022) %>%
filter(subregion == "South-Eastern Asia" ) %>%
pull(country)
countries_EasternAfrica <- dffk %>%
filter(year ==2022) %>%
filter(subregion == "Eastern Africa" ) %>%
pull(country)
countries_NorthernAmerica <- dffk %>%
filter(year ==2022) %>%
filter(subregion == "Northern America" ) %>%
pull(country)
countries_EasternAsia <- dffk %>%
filter(year ==2022) %>%
filter(subregion == "Eastern Asia" ) %>%
pull(country)
countries_NorthernEurope <- dffk %>%
filter(year ==2022) %>%
filter(subregion == "Northern Europe" ) %>%
pull(country)
countries_CentralAsia <- dffk %>%
filter(year ==2022) %>%
filter(subregion == "Central Asia" ) %>%
pull(country)
library(gganimate) # Provides transition_reveal()
phi <- dffk %>% filter(country %in% countries_WesternAsia) %>%
ggplot(aes(x = year,
y = verim,
col = country)) +
geom_line(show.legend = FALSE) +
facet_wrap(~country, scales = "free") +
transition_reveal(year) +
labs(title = "Year: {frame_along}")
Western Asia
df_WA <- dff %>%
filter(
country %in% countries_WesternAsia,
year %in% c(2003, 2022),
) %>%
mutate(year = factor(year)) %>%
select(country, year, verim)
df_WA <- as_tibble(df_WA) %>%
arrange(country, year) %>%
mutate(
change_verim = diff(verim),
order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
.by = country
) %>%
mutate(country = fct_reorder(country, order_dumbbells))
df_WA %>%
ggplot(aes(x = verim, y = country)) +
geom_path(
aes(color = (change_verim < 0)),
linewidth = 1,
arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
) +
geom_vline(xintercept=1, linetype='dotted', col = 'red')+
annotate("text", x = 1, y = "Armenia", label = "Self-Sufficiency Threshold", angle = 90) +
labs(
title = 'Western Asia',
x = 'Verim (2003 - 2022)',
y = element_blank(),
fill = 'Year'
) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = 'none'
)
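The data preparation for the dumbbell chart above is repeated unchanged for each subregion, so it may help to see it as one function applied to a small example. This is a sketch only: `prep_dumbbell()` and the `toy` values are hypothetical and not part of the original code, and it assumes dplyr 1.1 or later (for `.by`) and exactly two rows per country.

```r
library(dplyr)
library(forcats)

# Hypothetical helper: same steps as the per-region blocks in this section
prep_dumbbell <- function(df, countries, years = c(2003, 2022)) {
  df %>%
    as_tibble() %>%
    filter(country %in% countries, year %in% years) %>%
    mutate(year = factor(year)) %>%
    select(country, year, verim) %>%
    arrange(country, year) %>%
    mutate(
      # change between the first and the last year, repeated on both rows
      change_verim = diff(verim),
      # sort key: last-year value, negated when the value fell
      order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
      .by = country
    ) %>%
    mutate(country = fct_reorder(country, order_dumbbells))
}

# Made-up values: A rises from 0.8 to 1.2, B falls from 1.5 to 1.1
toy <- tibble(
  country = rep(c("A", "B"), each = 2),
  year    = rep(c(2003, 2022), times = 2),
  verim   = c(0.8, 1.2, 1.5, 1.1)
)

out <- prep_dumbbell(toy, c("A", "B"))
```

Countries whose value fell get a negative sort key, so they are grouped at the bottom of the y axis and the risers at the top.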
Southern Asia
df_SA <- dff %>%
filter(
country %in% countries_SouthernAsia,
year %in% c(2003, 2022),
) %>%
mutate(year = factor(year)) %>%
select(country, year, verim)
df_SA <- as_tibble(df_SA) %>%
arrange(country, year) %>%
mutate(
change_verim = diff(verim),
order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
.by = country
) %>%
mutate(country = fct_reorder(country, order_dumbbells))
df_SA %>%
ggplot(aes(x = verim, y = country)) +
geom_path(
aes(color = (change_verim < 0)),
linewidth = 1,
arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
) +
geom_vline(xintercept=1, linetype='dotted', col = 'red')+
annotate("text", x = 1, y = 3, label = "Self-Sufficiency Threshold", angle = 90) +
labs(
title = 'Southern Asia',
x = 'Verim (2003 - 2022)',
y = element_blank(),
fill = 'Year'
) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = 'none'
)
phi <- dffk %>% filter(country %in% countries_SouthernAsia) %>%
ggplot(aes(x = year,
y = verim,
col = country)) +
geom_line(show.legend = FALSE) +
facet_wrap(~country, scales = "free") +
transition_reveal(year) +
labs(title = "Year: {frame_along}")
South-Eastern Asia
df_SEA <- dff %>%
filter(
country %in% countries_SouthEasternAsia,
year %in% c(2003, 2022),
) %>%
mutate(year = factor(year)) %>%
select(country, year, verim)
df_SEA <- as_tibble(df_SEA) %>%
arrange(country, year) %>%
mutate(
change_verim = diff(verim),
order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
.by = country
) %>%
mutate(country = fct_reorder(country, order_dumbbells))
df_SEA %>%
ggplot(aes(x = verim, y = country)) +
geom_path(
aes(color = (change_verim < 0)),
linewidth = 1,
arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
) +
geom_vline(xintercept=1, linetype='dotted', col = 'red')+
annotate("text", x = 1, y = 3, label = "Self-Sufficiency Threshold", angle = 90) +
labs(
title = 'South-Eastern Asia',
x = 'Verim (2003 - 2022)',
y = element_blank(),
fill = 'Year'
) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = 'none'
)
phi <- dffk %>% filter(country %in% countries_SouthEasternAsia) %>%
ggplot(aes(x = year,
y = verim,
col = country)) +
geom_line(show.legend = FALSE) +
facet_wrap(~country, scales = "free") +
transition_reveal(year) +
labs(title = "Year: {frame_along}")
Central Asia
df_CA <- dff %>%
filter(
country %in% countries_CentralAsia,
year %in% c(2003, 2022),
) %>%
mutate(year = factor(year)) %>%
select(country, year, verim)
df_CA <- as_tibble(df_CA) %>%
arrange(country, year) %>%
mutate(
change_verim = diff(verim),
order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
.by = country
) %>%
mutate(country = fct_reorder(country, order_dumbbells))
df_CA %>%
ggplot(aes(x = verim, y = country)) +
geom_path(
aes(color = (change_verim < 0)),
linewidth = 1,
arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
) +
geom_vline(xintercept=1, linetype='dotted', col = 'red')+
annotate("text", x = 1, y = 3, label = "Self-Sufficiency Threshold", angle = 90) +
labs(
title = 'Central Asia',
x = 'Verim (2003 - 2022)',
y = element_blank(),
fill = 'Year'
) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = 'none'
)
phi <- dffk %>% filter(country %in% countries_CentralAsia) %>%
ggplot(aes(x = year,
y = verim,
col = country)) +
geom_line(show.legend = FALSE) +
facet_wrap(~country, scales = "free") +
transition_reveal(year) +
labs(title = "Year: {frame_along}")
Eastern Asia
df_EA <- dff %>%
filter(
country %in% countries_EasternAsia,
year %in% c(2003, 2022),
) %>%
mutate(year = factor(year)) %>%
select(country, year, verim)
df_EA <- as_tibble(df_EA) %>%
arrange(country, year) %>%
mutate(
change_verim = diff(verim),
order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
.by = country
) %>%
mutate(country = fct_reorder(country, order_dumbbells))
df_EA %>%
ggplot(aes(x = verim, y = country)) +
geom_path(
aes(color = (change_verim < 0)),
linewidth = 1,
arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
) +
geom_vline(xintercept=1, linetype='dotted', col = 'red')+
annotate("text", x = 1, y = 2.5, label = "Self-Sufficiency Threshold", angle = 90) +
labs(
title = 'Eastern Asia',
x = 'Verim (2003 - 2022)',
y = element_blank(),
fill = 'Year'
) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = 'none'
)
phi <- dffk %>% filter(country %in% countries_EasternAsia) %>%
ggplot(aes(x = year,
y = verim,
col = country)) +
geom_line(show.legend = FALSE) +
facet_wrap(~country, scales = "free") +
transition_reveal(year) +
labs(title = "Year: {frame_along}")
Melanesia
df_MAL <- dff %>%
filter(
country %in% countries_Melanesia,
year %in% c(2003, 2022),
) %>%
mutate(year = factor(year)) %>%
select(country, year, verim)
df_MAL <- as_tibble(df_MAL) %>%
arrange(country, year) %>%
mutate(
change_verim = diff(verim),
order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
.by = country
) %>%
mutate(country = fct_reorder(country, order_dumbbells))
df_MAL %>%
ggplot(aes(x = verim, y = country)) +
geom_path(
aes(color = (change_verim < 0)),
linewidth = 1,
arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
) +
geom_vline(xintercept=1, linetype='dotted', col = 'red')+
annotate("text", x = 1, y = 3, label = "Self-Sufficiency Threshold", angle = 90) +
labs(
title = 'Melanesia',
x = 'Verim (2003 - 2022)',
y = element_blank(),
fill = 'Year'
) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = 'none'
)
phi <- dffk %>% filter(country %in% countries_Melanesia) %>%
ggplot(aes(x = year,
y = verim,
col = country)) +
geom_line(show.legend = FALSE) +
facet_wrap(~country, scales = "free") +
transition_reveal(year) +
labs(title = "Year: {frame_along}")
Southern Europe
df_SE <- dff %>%
filter(
country %in% countries_SouthernEurope,
year %in% c(2003, 2022),
) %>%
mutate(year = factor(year)) %>%
select(country, year, verim)
df_SE <- as_tibble(df_SE) %>%
arrange(country, year) %>%
mutate(
change_verim = diff(verim),
order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
.by = country
) %>%
mutate(country = fct_reorder(country, order_dumbbells))
df_SE %>%
ggplot(aes(x = verim, y = country)) +
geom_path(
aes(color = (change_verim < 0)),
linewidth = 1,
arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
) +
geom_vline(xintercept=1, linetype='dotted', col = 'red')+
annotate("text", x = 1, y = 6, label = "Self-Sufficiency Threshold", angle = 90) +
labs(
title = 'Southern Europe',
x = 'Verim (2003 - 2022)',
y = element_blank(),
fill = 'Year'
) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = 'none'
)
phi <- dffk %>% filter(country %in% countries_SouthernEurope) %>%
ggplot(aes(x = year,
y = verim,
col = country)) +
geom_line(show.legend = FALSE) +
facet_wrap(~country, scales = "free") +
transition_reveal(year) +
labs(title = "Year: {frame_along}")
Western Europe
df_WE <- dff %>%
filter(
country %in% countries_WesternEurope,
year %in% c(2003, 2022),
) %>%
mutate(year = factor(year)) %>%
select(country, year, verim)
df_WE <- as_tibble(df_WE) %>%
arrange(country, year) %>%
mutate(
change_verim = diff(verim),
order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
.by = country
) %>%
mutate(country = fct_reorder(country, order_dumbbells))
df_WE %>%
ggplot(aes(x = verim, y = country)) +
geom_path(
aes(color = (change_verim < 0)),
linewidth = 1,
arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
) +
geom_vline(xintercept=1, linetype='dotted', col = 'red')+
annotate("text", x = 1, y = 4, label = "Self-Sufficiency Threshold", angle = 90) +
labs(
title = 'Western Europe',
x = 'Verim (2003 - 2022)',
y = element_blank(),
fill = 'Year'
) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = 'none'
)
phi <- dffk %>% filter(country %in% countries_WesternEurope) %>%
ggplot(aes(x = year,
y = verim,
col = country)) +
geom_line(show.legend = FALSE) +
facet_wrap(~country, scales = "free") +
transition_reveal(year) +
labs(title = "Year: {frame_along}")
Eastern Europe
df_EE <- dff %>%
filter(
country %in% countries_EasternEurope,
year %in% c(2003, 2022),
) %>%
mutate(year = factor(year)) %>%
select(country, year, verim)
df_EE <- as_tibble(df_EE) %>%
arrange(country, year) %>%
mutate(
change_verim = diff(verim),
order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
.by = country
) %>%
mutate(country = fct_reorder(country, order_dumbbells))
df_EE %>%
ggplot(aes(x = verim, y = country)) +
geom_path(
aes(color = (change_verim < 0)),
linewidth = 1,
arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
) +
geom_vline(xintercept=1, linetype='dotted', col = 'red')+
annotate("text", x = 1, y = 5, label = "Self-Sufficiency Threshold", angle = 90) +
labs(
title = 'Eastern Europe',
x = 'Verim (2003 - 2022)',
y = element_blank(),
fill = 'Year'
) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = 'none'
)
phi <- dffk %>% filter(country %in% countries_EasternEurope) %>%
ggplot(aes(x = year,
y = verim,
col = country)) +
geom_line(show.legend = FALSE) +
facet_wrap(~country, scales = "free") +
transition_reveal(year) +
labs(title = "Year: {frame_along}")
Northern Europe
df_NE <- dff %>%
filter(
country %in% countries_NorthernEurope,
year %in% c(2003, 2022),
) %>%
mutate(year = factor(year)) %>%
select(country, year, verim)
df_NE <- as_tibble(df_NE) %>%
arrange(country, year) %>%
mutate(
change_verim = diff(verim),
order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
.by = country
) %>%
mutate(country = fct_reorder(country, order_dumbbells))
df_NE %>%
ggplot(aes(x = verim, y = country)) +
geom_path(
aes(color = (change_verim < 0)),
linewidth = 1,
arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
) +
geom_vline(xintercept=1, linetype='dotted', col = 'red')+
annotate("text", x = 1, y = 5, label = "Self-Sufficiency Threshold", angle = 90) +
labs(
title = 'Northern Europe',
x = 'Verim (2003 - 2022)',
y = element_blank(),
fill = 'Year'
) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = 'none'
)
phi <- dffk %>% filter(country %in% countries_NorthernEurope) %>%
ggplot(aes(x = year,
y = verim,
col = country)) +
geom_line(show.legend = FALSE) +
facet_wrap(~country, scales = "free") +
transition_reveal(year) +
labs(title = "Year: {frame_along}")
Northern Africa
df_NA <- dff %>%
filter(
country %in% countries_NorthernAfrica,
year %in% c(2003, 2022),
) %>%
mutate(year = factor(year)) %>%
select(country, year, verim)
df_NA <- as_tibble(df_NA) %>%
arrange(country, year) %>%
mutate(
change_verim = diff(verim),
order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
.by = country
) %>%
mutate(country = fct_reorder(country, order_dumbbells))
df_NA %>%
ggplot(aes(x = verim, y = country)) +
geom_path(
aes(color = (change_verim < 0)),
linewidth = 1,
arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
) +
geom_vline(xintercept=1, linetype='dotted', col = 'red')+
annotate("text", x = 1, y = 3, label = "Self-Sufficiency Threshold", angle = 90) +
labs(
title = 'Northern Africa',
x = 'Verim (2003 - 2022)',
y = element_blank(),
fill = 'Year'
) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = 'none'
)
phi <- dffk %>% filter(country %in% countries_NorthernAfrica) %>%
ggplot(aes(x = year,
y = verim,
col = country)) +
geom_line(show.legend = FALSE) +
facet_wrap(~country, scales = "free") +
transition_reveal(year) +
labs(title = "Year: {frame_along}")
Middle Africa
df_MA <- dff %>%
filter(
country %in% countries_MiddleAfrica,
year %in% c(2003, 2022),
) %>%
mutate(year = factor(year)) %>%
select(country, year, verim)
df_MA <- as_tibble(df_MA) %>%
arrange(country, year) %>%
mutate(
change_verim = diff(verim),
order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
.by = country
) %>%
mutate(country = fct_reorder(country, order_dumbbells))
df_MA %>%
ggplot(aes(x = verim, y = country)) +
geom_path(
aes(color = (change_verim < 0)),
linewidth = 1,
arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
) +
geom_vline(xintercept=1, linetype='dotted', col = 'red')+
annotate("text", x = 1, y = 5, label = "Self-Sufficiency Threshold", angle = 90) +
labs(
title = 'Middle Africa',
x = 'Verim (2003 - 2022)',
y = element_blank(),
fill = 'Year'
) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = 'none'
)
phi <- dffk %>% filter(country %in% countries_MiddleAfrica) %>%
ggplot(aes(x = year,
y = verim,
col = country)) +
geom_line(show.legend = FALSE) +
facet_wrap(~country, scales = "free") +
transition_reveal(year) +
labs(title = "Year: {frame_along}")
Western Africa
df_WA <- dff %>%
filter(
country %in% countries_WesternAfrica,
year %in% c(2003, 2022),
) %>%
mutate(year = factor(year)) %>%
select(country, year, verim)
df_WA <- as_tibble(df_WA) %>%
arrange(country, year) %>%
mutate(
change_verim = diff(verim),
order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
.by = country
) %>%
mutate(country = fct_reorder(country, order_dumbbells))
df_WA %>%
ggplot(aes(x = verim, y = country)) +
geom_path(
aes(color = (change_verim < 0)),
linewidth = 1,
arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
) +
geom_vline(xintercept=1, linetype='dotted', col = 'red')+
annotate("text", x = 1, y = 5, label = "Self-Sufficiency Threshold", angle = 90) +
labs(
title = 'Western Africa',
x = 'Verim (2003 - 2022)',
y = element_blank(),
fill = 'Year'
) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = 'none'
)
phi <- dffk %>% filter(country %in% countries_WesternAfrica) %>%
ggplot(aes(x = year,
y = verim,
col = country)) +
geom_line(show.legend = FALSE) +
facet_wrap(~country, scales = "free") +
transition_reveal(year) +
labs(title = "Year: {frame_along}")
Southern Africa
df_SA <- dff %>%
filter(
country %in% countries_SouthernAfrica,
year %in% c(2003, 2022),
) %>%
mutate(year = factor(year)) %>%
select(country, year, verim)
df_SA <- as_tibble(df_SA) %>%
arrange(country, year) %>%
mutate(
change_verim = diff(verim),
order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
.by = country
) %>%
mutate(country = fct_reorder(country, order_dumbbells))
df_SA %>%
ggplot(aes(x = verim, y = country)) +
geom_path(
aes(color = (change_verim < 0)),
linewidth = 1,
arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
) +
geom_vline(xintercept=1, linetype='dotted', col = 'red')+
annotate("text", x = 1, y = 3, label = "Self-Sufficiency Threshold", angle = 90) +
labs(
title = 'Southern Africa',
x = 'Verim (2003 - 2022)',
y = element_blank(),
fill = 'Year'
) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = 'none'
)
phi <- dffk %>% filter(country %in% countries_SouthernAfrica) %>%
ggplot(aes(x = year,
y = verim,
col = country)) +
geom_line(show.legend = FALSE) +
facet_wrap(~country, scales = "free") +
transition_reveal(year) +
labs(title = "Year: {frame_along}")
Eastern Africa
df_EA <- dff %>%
filter(
country %in% countries_EasternAfrica,
year %in% c(2003, 2022),
) %>%
mutate(year = factor(year)) %>%
select(country, year, verim)
df_EA <- as_tibble(df_EA) %>%
arrange(country, year) %>%
mutate(
change_verim = diff(verim),
order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
.by = country
) %>%
mutate(country = fct_reorder(country, order_dumbbells))
df_EA %>%
ggplot(aes(x = verim, y = country)) +
geom_path(
aes(color = (change_verim < 0)),
linewidth = 1,
arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
) +
geom_vline(xintercept=1, linetype='dotted', col = 'red')+
annotate("text", x = 1, y = 5, label = "Self-Sufficiency Threshold", angle = 90) +
labs(
title = 'Eastern Africa',
x = 'Verim (2003 - 2022)',
y = element_blank(),
fill = 'Year'
) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = 'none'
)
phi <- dffk %>% filter(country %in% countries_EasternAfrica) %>%
ggplot(aes(x = year,
y = verim,
col = country)) +
geom_line(show.legend = FALSE) +
facet_wrap(~country, scales = "free") +
transition_reveal(year) +
labs(title = "Year: {frame_along}")
South America
df_SA <- dff %>%
filter(
country %in% countries_SouthAmerica,
year %in% c(2003, 2022),
) %>%
mutate(year = factor(year)) %>%
select(country, year, verim)
df_SA <- as_tibble(df_SA) %>%
arrange(country, year) %>%
mutate(
change_verim = diff(verim),
order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
.by = country
) %>%
mutate(country = fct_reorder(country, order_dumbbells))
df_SA %>%
ggplot(aes(x = verim, y = country)) +
geom_path(
aes(color = (change_verim < 0)),
linewidth = 1,
arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
) +
geom_vline(xintercept=1, linetype='dotted', col = 'red')+
annotate("text", x = 1, y = 5, label = "Self-Sufficiency Threshold", angle = 90) +
labs(
title = 'South America',
x = 'Verim (2003 - 2022)',
y = element_blank(),
fill = 'Year'
) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = 'none'
)
phi <- dffk %>% filter(country %in% countries_SouthAmerica) %>%
ggplot(aes(x = year,
y = verim,
col = country)) +
geom_line(show.legend = FALSE) +
facet_wrap(~country, scales = "free") +
transition_reveal(year) +
labs(title = "Year: {frame_along}")
Central America
df_CA <- dff %>%
filter(
country %in% countries_CentralAmerica,
year %in% c(2003, 2022),
) %>%
mutate(year = factor(year)) %>%
select(country, year, verim)
df_CA <- as_tibble(df_CA) %>%
arrange(country, year) %>%
mutate(
change_verim = diff(verim),
order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
.by = country
) %>%
mutate(country = fct_reorder(country, order_dumbbells))
df_CA %>%
ggplot(aes(x = verim, y = country)) +
geom_path(
aes(color = (change_verim < 0)),
linewidth = 1,
arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
) +
geom_vline(xintercept=1, linetype='dotted', col = 'red')+
annotate("text", x = 1, y = 5, label = "Self-Sufficiency Threshold", angle = 90) +
labs(
title = 'Central America',
x = 'Verim (2003 - 2022)',
y = element_blank(),
fill = 'Year'
) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = 'none'
)
phi <- dffk %>% filter(country %in% countries_CentralAmerica) %>%
ggplot(aes(x = year,
y = verim,
col = country)) +
geom_line(show.legend = FALSE) +
facet_wrap(~country, scales = "free") +
transition_reveal(year) +
labs(title = "Year: {frame_along}")
Northern America, Australia and New Zealand
df_NA <- dff %>%
filter(
country %in% c(countries_NorthernAmerica,countries_AustraliaandNewZealand),
year %in% c(2003, 2022),
) %>%
mutate(year = factor(year)) %>%
select(country, year, verim)
df_NA <- as_tibble(df_NA) %>%
arrange(country, year) %>%
mutate(
change_verim = diff(verim),
order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
.by = country
) %>%
mutate(country = fct_reorder(country, order_dumbbells))
df_NA %>%
ggplot(aes(x = verim, y = country)) +
geom_path(
aes(color = (change_verim < 0)),
linewidth = 1,
arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
) +
geom_vline(xintercept=1, linetype='dotted', col = 'red')+
annotate("text", x = 1, y = 3, label = "Self-Sufficiency Threshold", angle = 90) +
labs(
title = 'Northern America, Australia & New Zealand',
x = 'Verim (2003 - 2022)',
y = element_blank(),
fill = 'Year'
) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = 'none'
)
phi <- dffk %>% filter(country %in% c(countries_NorthernAmerica,countries_AustraliaandNewZealand)) %>%
ggplot(aes(x = year,
y = verim,
col = country)) +
geom_line(show.legend = FALSE) +
facet_wrap(~country, scales = "free") +
transition_reveal(year) +
labs(title = "Year: {frame_along}")
Caribbean
df_CAR <- dff %>%
filter(
country %in% countries_Caribbean,
year %in% c(2003, 2022),
) %>%
mutate(year = factor(year)) %>%
select(country, year, verim)
df_CAR <- as_tibble(df_CAR) %>%
arrange(country, year) %>%
mutate(
change_verim = diff(verim),
order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
.by = country
) %>%
mutate(country = fct_reorder(country, order_dumbbells))
df_CAR %>%
ggplot(aes(x = verim, y = country)) +
geom_path(
aes(color = (change_verim < 0)),
linewidth = 1,
arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
) +
geom_vline(xintercept=1, linetype='dotted', col = 'red')+
annotate("text", x = 1, y = 3, label = "Self-Sufficiency Threshold", angle = 90) +
labs(
title = 'Caribbean',
x = 'Verim (2003 - 2022)',
y = element_blank(),
fill = 'Year'
) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = 'none'
)
phi <- dffk %>% filter(country %in% countries_Caribbean) %>%
ggplot(aes(x = year,
y = verim,
col = country)) +
geom_line(show.legend = FALSE) +
facet_wrap(~country, scales = "free") +
transition_reveal(year) +
labs(title = "Year: {frame_along}")
Some Other Countries
df_OC <- dff %>%
filter(
country %in% countries_NA,
year %in% c(2003, 2022),
) %>%
mutate(year = factor(year)) %>%
select(country, year, verim)
df_OC <- as_tibble(df_OC) %>%
arrange(country, year) %>%
mutate(
change_verim = diff(verim),
order_dumbbells = if_else(change_verim < 0, -1, 1) * verim[2],
.by = country
) %>%
mutate(country = fct_reorder(country, order_dumbbells))
df_OC %>%
ggplot(aes(x = verim, y = country)) +
geom_path(
aes(color = (change_verim < 0)),
linewidth = 1,
arrow = arrow(length = unit(0.3, 'cm'), type = 'closed')
) +
geom_vline(xintercept=1, linetype='dotted', col = 'red')+
annotate("text", x = 1, y = 12, label = "Self-Sufficiency Threshold", angle = 90) +
labs(
title = 'Other Countries',
x = 'Verim (2003 - 2022)',
y = element_blank(),
fill = 'Year'
) +
theme(
panel.grid.major.y = element_blank(),
panel.grid.minor.x = element_blank(),
legend.position = 'none'
)
phi <- dffk %>% filter(country %in% countr) %>%
ggplot(aes(x = year,
y = verim,
col = country)) +
geom_line(show.legend = FALSE) +
facet_wrap(~country, scales = "free") +
transition_reveal(year) +
labs(title = "Year: {frame_along}")
animate(plot = phi, nframes = 30)
Note: The world map code is adapted from Statistics Guides with Dr Paul Christiansen.
# Robinson
robinson_crs <- "+proj=robin +lon_0=0 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs"
dffk_robinson <- dffk %>%
sf::st_transform(robinson_crs)
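vmin <- min(dffk$verim, na.rm = TRUE)
vmax <- max(dffk$verim, na.rm = TRUE)
# Fisher-Jenks class boundaries, rounded to one decimal; the two outer
# boundaries are dropped and replaced by the exact minimum and maximum
brk <- round(classIntervals(
dffk$verim,
n = 77,
style = "fisher"
)$brks, 1) %>%
head(-1) %>%
tail(-1) %>%
append(vmax)
breaks <- c(vmin, brk)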
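The trimming in the break construction above is easier to follow on a handful of numbers. This is a small sketch with made-up boundaries standing in for the `classIntervals()` output, so `raw_brks` and its values are assumptions for illustration only. It shows that rounding can push the outer boundaries outside the data range, which is presumably why they are replaced by the exact `vmin` and `vmax`.

```r
# Made-up class boundaries for n = 4 classes (stand-in for classIntervals(...)$brks)
raw_brks <- c(0.14, 0.52, 1.01, 2.04, 3.66)
vmin <- min(raw_brks)
vmax <- max(raw_brks)

# Rounding moves the ends to 0.1 and 3.7, outside [vmin, vmax]
rounded <- round(raw_brks, 1)

# Drop the last and the first rounded boundary, then put the exact maximum back
brk <- append(tail(head(rounded, -1), -1), vmax)

# Put the exact minimum back in front: n + 1 breaks in total
breaks <- c(vmin, brk)
breaks
```

With `n = 77` the same steps give 78 breaks, which matches the 78 colours generated for the palette.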
# Number of colors needed
num_colors <- 78
# Generate a color palette
new_cols <- rainbow(num_colors)
# Reverse the order if needed
new_cols <- rev(new_cols)
# Print or use the new color palette
print(new_cols)
## [1] "#FF0014" "#FF0027" "#FF003B" "#FF004E" "#FF0062" "#FF0076" "#FF0089"
## [8] "#FF009D" "#FF00B1" "#FF00C4" "#FF00D8" "#FF00EB" "#FF00FF" "#EB00FF"
## [15] "#D800FF" "#C400FF" "#B100FF" "#9D00FF" "#8900FF" "#7600FF" "#6200FF"
## [22] "#4E00FF" "#3B00FF" "#2700FF" "#1400FF" "#0000FF" "#0014FF" "#0027FF"
## [29] "#003BFF" "#004EFF" "#0062FF" "#0076FF" "#0089FF" "#009DFF" "#00B1FF"
## [36] "#00C4FF" "#00D8FF" "#00EBFF" "#00FFFF" "#00FFEB" "#00FFD8" "#00FFC4"
## [43] "#00FFB1" "#00FF9D" "#00FF89" "#00FF76" "#00FF62" "#00FF4E" "#00FF3B"
## [50] "#00FF27" "#00FF14" "#00FF00" "#14FF00" "#27FF00" "#3BFF00" "#4EFF00"
## [57] "#62FF00" "#76FF00" "#89FF00" "#9DFF00" "#B1FF00" "#C4FF00" "#D8FF00"
## [64] "#EBFF00" "#FFFF00" "#FFEB00" "#FFD800" "#FFC400" "#FFB100" "#FF9D00"
## [71] "#FF8900" "#FF7600" "#FF6200" "#FF4E00" "#FF3B00" "#FF2700" "#FF1400"
## [78] "#FF0000"
animated_map <- function() {
world_map <- ggplot(
data = dffk,
aes(fill = verim)
) +
geom_sf(color = "white", linewidth = 0.05) +
scale_fill_gradientn(
name = "",
colours = new_cols,
breaks = breaks,
labels = round(breaks, 1),
limits = c(vmin, vmax),
na.value = "grey70"
) +
coord_sf(crs = robinson_crs) +
guides(fill = guide_legend(
direction = "vertical",
keyheight = unit(1, units = "mm"),
keywidth = unit(1, units = "mm"),
title.position = "top",
title.hjust = .5,
label.hjust = .5,
nrow = 6,
byrow = TRUE,
reverse = FALSE,
label.position = "right"
)) +
theme_minimal() +
theme(
axis.line = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank(),
legend.position = c(.5, -.015),
legend.text = element_text(size = 5, color = "grey10"),
panel.grid.major = element_line(color = "white", linewidth = .2),
panel.grid.minor = element_blank(),
plot.title = element_text(
face = "bold", size = 20,
color = "grey10", hjust = .5, vjust = -3
),
plot.subtitle = element_text(
size = 40, color = "#c43c4e",
hjust = .5, vjust = -1
),
plot.caption = element_text(
size = 8, color = "grey10",
hjust = .5, vjust = -10
),
plot.margin = unit(c(t = -4, r = -4, b = -4, l = -4), "lines"),
plot.background = element_rect(fill = "white", color = NA),
panel.background = element_rect(fill = "white", color = NA),
legend.background = element_rect(fill = "white", color = NA),
panel.border = element_blank()
) +
labs(
x = "",
y = "",
title = "Verim",
subtitle = "Year: {as.integer(closest_state)}",
caption = ""
)
return(world_map)
}
# world_map exists only inside animated_map(), so call the function here
timelapse_world_map <- animated_map() +
transition_states(year) +
enter_fade() +
exit_fade() +
ease_aes("quadratic-in-out", interval = .2)
animated_world <- gganimate::animate(
timelapse_world_map,
nframes = 120,
duration = 22,
start_pause = 3,
end_pause = 30,
height = 6,
width = 7.15,
res = 300,
units = "in",
fps = 15,
renderer = gifski_renderer(loop = TRUE)
)
hcverim <- data %>%
ggplot(aes(x= hc, y=verim)) +
geom_point() +
geom_text(data= data, aes(y = verim + .25, label=iso_a2, colour = region),
size = 2) +
geom_hline(yintercept=1, linetype='dotted', col = 'black') +
geom_vline(xintercept=2.5, linetype='dotted', col = 'black') +
geom_smooth(data=subset(data,verim>1 & hc>2.5),
method=lm,se=FALSE) +
labs(title = "Verim vs Human Capital (2022)",
subtitle = NULL,
tag = NULL,
x = "Human Capital",
y= "Verim",
color = NULL) +
theme(
axis.title.x = element_markdown(),
axis.title.y = element_markdown(),
axis.ticks = element_blank(),
axis.line = element_line(),
panel.background = element_rect(fill="#FFFFFF")
)
Source: https://groups.google.com/forum/#!topic/ggplot2/1TgH-kG5XMA
lm_eqn <- function(df){
m <- lm(verim ~ hc, df);
eq <- substitute(verim == a + b %.% hc*","~r^2~"="~r2,
list(a = format(unname(coef(m)[1]), digits = 2),
b = format(unname(coef(m)[2]), digits = 2),
r2 = format(summary(m)$r.squared, digits = 3)))
as.character(as.expression(eq));
}
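To see what `lm_eqn()` hands to the plot, it can help to run the same construction on a few points and check that the result is a single plotmath string that R can parse. The sketch below repeats the function body under the hypothetical name `lm_eqn_demo()` so that it runs on its own; the `toy` values are made up.

```r
# Same construction as lm_eqn(), repeated here so the example is self-contained
lm_eqn_demo <- function(df) {
  m <- lm(verim ~ hc, data = df)
  eq <- substitute(
    verim == a + b %.% hc * "," ~ r^2 ~ "=" ~ r2,
    list(
      a  = format(unname(coef(m)[1]), digits = 2),
      b  = format(unname(coef(m)[2]), digits = 2),
      r2 = format(summary(m)$r.squared, digits = 3)
    )
  )
  # One character string holding the plotmath expression
  as.character(as.expression(eq))
}

# Made-up data, for illustration only
toy <- data.frame(hc = c(2.6, 3.0, 3.4, 3.8), verim = c(1.1, 1.5, 1.8, 2.4))

label  <- lm_eqn_demo(toy)
parsed <- parse(text = label)  # succeeds only if the label is valid R syntax
label
```

Because the label is a plotmath string, the text layer that draws it needs `parse = TRUE`.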
# annotate() draws each label once; geom_text() with fixed x and y would
# redraw it for every row of the data
hcverim +
annotate("text", x = 4, y = 3, label = lm_eqn(data %>% filter(hc > 2.5, verim > 1)), size = 4, parse = TRUE) +
annotate("text", x = 1.2, y = 8, label = "A", size = 6) +
annotate("text", x = 1.2, y = 0, label = "B", size = 6) +
annotate("text", x = 4.2, y = 8, label = "C", size = 6) +
annotate("text", x = 4.2, y = 0, label = "D", size = 6)
govverim <- data %>%
ggplot(aes(x= csh_g, y=verim)) +
geom_point() +
geom_text(data= data, aes(y = verim + .25, label=iso_a2, colour = region),
size = 2) +
geom_hline(yintercept=1, linetype='dotted', col = 'black') +
geom_smooth(data=subset(data,verim>1),method=lm,se=FALSE) +
labs(title = "Verim vs Government spending share",
subtitle = NULL,
tag = NULL,
x = "Government spending share",
y= "Verim",
color = NULL) +
theme(
axis.title.x = element_markdown(),
axis.title.y = element_markdown(),
axis.ticks = element_blank(),
axis.line = element_line(),
panel.background = element_rect(fill="#FFFFFF")
)
lm_eqn2 <- function(df){
m <- lm(verim ~ csh_g, df);
eq <- substitute(verim == a + b %.% csh_g*","~r^2~"="~r2,
list(a = format(unname(coef(m)[1]), digits = 2),
b = format(unname(coef(m)[2]), digits = 2),
r2 = format(summary(m)$r.squared, digits = 3)))
as.character(as.expression(eq));
}
govverim +
annotate("text", x = 0.09, y = 2.5, label = lm_eqn2(data %>% filter(verim > 1)), size = 3, parse = TRUE)