This report analyzes global economic stress by examining three key dimensions: headline consumer price inflation, income inequality, and cost of living. Data from multiple international datasets is cleaned, merged, and visualized to identify patterns and relationships across countries and continents.
library(readxl)
library(ggplot2)
library(dplyr)
library(tidyr)
library(reshape2)
Inflation_Dataset <- read_xls('Datasets/Global Dataset of Inflation.xls')
Income_Inequality <- read_xls('Datasets/Inequality in Income.xls')
cost_of_living <- read_xlsx('Datasets/Comparison of worldwide cost of living.xlsx')
This chunk loads the raw inflation dataset and prepares it for analysis. It removes duplicate rows, keeps only the “Headline Consumer Price Inflation” rows, and drops unnecessary columns. The data is then reshaped from wide format — where years are stored as separate columns — to long format, where each row represents a single country and year combination. Finally, the year and inflation values are converted to the correct data types, and any rows with missing inflation values are removed.
inflation_clean <- Inflation_Dataset %>%
  distinct() %>%
  filter(!is.na(Country), `Series Name` == "Headline Consumer Price Inflation") %>%
  select(-`Country Code`, -`IMF Country Code`, -`Indicator Type`,
         -`Series Name`, -Note, -`...60`) %>%
  pivot_longer(
    cols = -Country,
    names_to = "year",
    values_to = "headline_inflation"
  ) %>%
  mutate(
    year = as.integer(year),
    headline_inflation = as.numeric(headline_inflation)
  ) %>%
  filter(!is.na(headline_inflation))
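The wide-to-long reshape described above can be tried on a toy table. This is only a sketch: the data frame `wide` and its values are made up for illustration, and it assumes the dplyr and tidyr packages are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per country, one column per year
wide <- tibble(
  Country = c("A", "B"),
  `2010`  = c(1.5, NA),
  `2011`  = c(2.0, 3.1)
)

long <- wide %>%
  pivot_longer(
    cols      = -Country,            # every column except Country is a year
    names_to  = "year",
    values_to = "headline_inflation"
  ) %>%
  mutate(year = as.integer(year)) %>%      # column names arrive as text
  filter(!is.na(headline_inflation))       # drop the missing B/2010 cell

long
```

Each remaining row is one country-year observation, which is the shape the later join on Country and year relies on.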
This chunk cleans the income inequality dataset by removing duplicate
rows and filtering out entries with missing country names. It selects
the relevant columns — Country, ISO3 code, Continent, and all columns
related to inequality in income — then reshapes the data from wide to
long format. The year is extracted from the column names using
gsub() to strip out extra text, and rows with missing
inequality values are filtered out.
inequality_clean <- Income_Inequality %>%
  distinct() %>%
  filter(!is.na(Country)) %>%
  select(Country, ISO3, Continent, starts_with("Inequality in income")) %>%
  pivot_longer(
    cols = -c(Country, ISO3, Continent),
    names_to = "year",
    values_to = "inequality_index"
  ) %>%
  mutate(
    year = as.integer(gsub("Inequality in income \\(|\\)", "", year)),
    inequality_index = as.numeric(inequality_index)
  ) %>%
  filter(!is.na(inequality_index))
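The year extraction can be checked in isolation. The column names below are hypothetical; they are assumed to follow the pattern implied by the gsub() call rather than copied from the dataset.

```r
# Hypothetical column names following the pattern the pipeline expects
cols <- c("Inequality in income (2010)", "Inequality in income (2021)")

# Remove the literal prefix "Inequality in income (" and the closing ")"
years <- as.integer(gsub("Inequality in income \\(|\\)", "", cols))

years
```

If a column name deviated from this pattern, as.integer() would return NA with a warning, so a quick look at the result is a cheap sanity check.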
This chunk prepares the cost of living dataset by removing duplicate rows and entries with missing country names. It selects and renames four key columns — country, cost index, monthly income in USD, and purchasing power index — into cleaner, consistent names that align with the other datasets for easy merging.
cost_clean <- cost_of_living %>%
  distinct() %>%
  filter(!is.na(country)) %>%
  select(
    Country = country,
    cost_index,
    monthly_income_usd = `usd_monthly income`,
    purchasing_power_index
  )
This chunk combines all three cleaned datasets into a single unified
dataframe called economic_stress. First, an inner join is
performed between the inflation and inequality datasets on both Country
and year, so only country-year pairs present in both datasets are
retained. Then a left join adds the cost of living data by Country,
preserving all rows even if cost of living data is unavailable for some
countries. Because the cost of living table has no year column, each
country's cost values are repeated for every year. Finally, columns are
reordered for clarity and rows are sorted alphabetically by country and
then chronologically by year.
economic_stress <- inflation_clean %>%
  inner_join(inequality_clean, by = c("Country", "year")) %>%
  left_join(cost_clean, by = "Country") %>%
  select(
    Country, ISO3, Continent, year,
    headline_inflation,
    inequality_index,
    cost_index,
    monthly_income_usd,
    purchasing_power_index
  ) %>%
  arrange(Country, year)
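The different effects of the two joins are easier to see on a small example. The tables `infl`, `ineq`, and `cost` below are made up for illustration and assume dplyr is installed.

```r
library(dplyr)

# Made-up inputs, not the report's data
infl <- tibble(Country = c("A", "A", "B", "C"),
               year    = c(2010L, 2011L, 2010L, 2010L),
               headline_inflation = c(1.0, 2.0, 3.0, 4.0))
ineq <- tibble(Country = c("A", "A", "B"),
               year    = c(2010L, 2011L, 2010L),
               inequality_index = c(20, 21, 30))
cost <- tibble(Country = "A", cost_index = 50)

merged <- infl %>%
  inner_join(ineq, by = c("Country", "year")) %>%  # drops C (no inequality data)
  left_join(cost, by = "Country")                  # keeps B with cost_index = NA

merged

# Countries that survive the inner join but have no cost-of-living match
unmatched <- merged %>%
  filter(is.na(cost_index)) %>%
  distinct(Country)

unmatched
```

Listing the unmatched countries this way can help distinguish genuinely missing cost data from country names that are merely spelled differently across sources.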
This chunk provides a quick overview of the merged dataset.
summary() displays statistical summaries for each column
such as minimum, maximum, and mean values, while nrow()
confirms the total number of rows in the final dataset. These checks
help verify that the merging process worked correctly.
summary(economic_stress)
## Country ISO3 Continent year
## Length:1578 Length:1578 Length:1578 Min. :2010
## Class :character Class :character Class :character 1st Qu.:2013
## Mode :character Mode :character Mode :character Median :2016
## Mean :2016
## 3rd Qu.:2019
## Max. :2021
##
## headline_inflation inequality_index cost_index monthly_income_usd
## Min. : -3.230 Min. : 5.845 Min. : 24.80 Min. : 88.0
## 1st Qu.: 1.350 1st Qu.:16.316 1st Qu.: 36.50 1st Qu.: 295.0
## Median : 3.015 Median :21.214 Median : 46.60 Median : 635.5
## Mean : 16.682 Mean :23.442 Mean : 60.77 Mean :1666.5
## 3rd Qu.: 5.790 3rd Qu.:28.893 3rd Qu.: 94.17 3rd Qu.:3499.5
## Max. :17087.720 Max. :68.337 Max. :142.90 Max. :7329.0
## NA's :628 NA's :628
## purchasing_power_index
## Min. : 5.30
## 1st Qu.: 14.90
## Median : 24.70
## Mean : 37.00
## 3rd Qu.: 58.67
## Max. :111.30
## NA's :628
nrow(economic_stress)
## [1] 1578
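Beyond summary() and nrow(), one further check is whether any Country-year pair appears more than once after merging. The sketch below uses a made-up table `toy` with a deliberate duplicate and assumes dplyr is installed.

```r
library(dplyr)

# Made-up merged table with one deliberately duplicated Country-year pair
toy <- tibble(
  Country = c("A", "A", "B", "B"),
  year    = c(2010L, 2011L, 2010L, 2010L),
  headline_inflation = c(1.0, 2.0, 3.0, 3.0)
)

# Any Country-year pair that appears more than once
dupes <- toy %>%
  count(Country, year) %>%
  filter(n > 1)

dupes
```

An empty result would suggest that the joins did not multiply rows.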
The following four charts each filter the
economic_stress dataset for a specific country using its
ISO3 code and plot a bar chart showing how headline inflation changed
year by year. The fill = headline_inflation argument
applies a color gradient to the bars based on the inflation value,
making it easy to identify years of high and low inflation at a
glance.
economic_stress %>%
  filter(ISO3 == 'USA') %>%
  ggplot(aes(x = year, y = headline_inflation, fill = headline_inflation)) +
  geom_bar(stat = 'identity') +
  labs(title = 'Rate of Inflation in the United States',
       x = 'Year',
       y = 'Inflation Rate')
economic_stress %>%
  filter(ISO3 == 'RUS') %>%
  ggplot(aes(x = year, y = headline_inflation, fill = headline_inflation)) +
  geom_bar(stat = 'identity') +
  labs(title = 'Rate of Inflation in Russia',
       x = 'Year',
       y = 'Inflation Rate')
economic_stress %>%
  filter(ISO3 == 'CHN') %>%
  ggplot(aes(x = year, y = headline_inflation, fill = headline_inflation)) +
  geom_bar(stat = 'identity') +
  labs(title = 'Rate of Inflation in China',
       x = 'Year',
       y = 'Inflation Rate')
economic_stress %>%
  filter(ISO3 == 'IND') %>%
  ggplot(aes(x = year, y = headline_inflation, fill = headline_inflation)) +
  geom_bar(stat = 'identity') +
  labs(title = 'Rate of Inflation in India',
       x = 'Year',
       y = 'Inflation Rate')
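Since the four charts differ only in the ISO3 code and the title, the pattern could be wrapped in a small function. The helper `plot_inflation()` below is hypothetical and not part of the original report. The data frame `toy` is made up, and geom_col() is used as the equivalent of geom_bar(stat = 'identity').

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: bar chart of yearly inflation for one ISO3 code
plot_inflation <- function(data, iso3, country_label) {
  data %>%
    filter(ISO3 == iso3) %>%
    ggplot(aes(x = year, y = headline_inflation, fill = headline_inflation)) +
    geom_col() +
    labs(title = paste("Rate of Inflation in", country_label),
         x = "Year", y = "Inflation Rate")
}

# Made-up data standing in for economic_stress
toy <- tibble(
  ISO3 = c("USA", "USA", "IND", "IND"),
  year = c(2010L, 2011L, 2010L, 2011L),
  headline_inflation = c(1.6, 3.1, 10.5, 9.5)
)

p <- plot_inflation(toy, "IND", "India")
```

Printing `p` draws the chart. Calling the helper with another code and label produces the matching chart for that country.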
This chunk revisits the merged dataset to check its overall structure
and identify any missing values. summary() provides
descriptive statistics for all columns, and
colSums(is.na()) counts how many NA values exist in each
column. This is an important step before running any correlation or
regression analysis to ensure data quality.
summary(economic_stress)
## Country ISO3 Continent year
## Length:1578 Length:1578 Length:1578 Min. :2010
## Class :character Class :character Class :character 1st Qu.:2013
## Mode :character Mode :character Mode :character Median :2016
## Mean :2016
## 3rd Qu.:2019
## Max. :2021
##
## headline_inflation inequality_index cost_index monthly_income_usd
## Min. : -3.230 Min. : 5.845 Min. : 24.80 Min. : 88.0
## 1st Qu.: 1.350 1st Qu.:16.316 1st Qu.: 36.50 1st Qu.: 295.0
## Median : 3.015 Median :21.214 Median : 46.60 Median : 635.5
## Mean : 16.682 Mean :23.442 Mean : 60.77 Mean :1666.5
## 3rd Qu.: 5.790 3rd Qu.:28.893 3rd Qu.: 94.17 3rd Qu.:3499.5
## Max. :17087.720 Max. :68.337 Max. :142.90 Max. :7329.0
## NA's :628 NA's :628
## purchasing_power_index
## Min. : 5.30
## 1st Qu.: 14.90
## Median : 24.70
## Mean : 37.00
## 3rd Qu.: 58.67
## Max. :111.30
## NA's :628
colSums(is.na(economic_stress))
## Country ISO3 Continent
## 0 0 0
## year headline_inflation inequality_index
## 0 0 0
## cost_index monthly_income_usd purchasing_power_index
## 628 628 628
This chunk first computes a correlation matrix between four numeric
variables (headline inflation, inequality index, cost index, and
purchasing power index) using only rows with no missing values in any of
them. The melt() function from the reshape2 library then
converts the matrix into a long format suitable for ggplot2, which
renders it as a color-coded heatmap using geom_tile().
With ggplot2's default continuous fill scale, lighter tiles indicate
higher correlation coefficients and darker tiles indicate lower or
negative ones.
cor_matrix <- economic_stress %>%
  select(headline_inflation, inequality_index, cost_index, purchasing_power_index) %>%
  cor(use = "complete.obs")
melted_cor <- melt(cor_matrix)
ggplot(melted_cor, aes(Var1, Var2, fill = value)) +
  geom_tile() +
  labs(title = "Correlation Heatmap")
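The matrix-to-long step can also be sketched with base R alone, which shows what melt() produces. The data frame `toy` is made up. The column names Var1, Var2, and value are set by hand here to mirror the names used in the heatmap call.

```r
# Made-up numeric data; column names mirror the report's variables
toy <- data.frame(
  headline_inflation = c(1, 2, 3, 4, NA),
  inequality_index   = c(10, 20, 25, 40, 50),
  cost_index         = c(90, 70, 60, 30, 20)
)

# Pearson correlations over rows with no missing values
cor_matrix <- cor(toy, use = "complete.obs")

# Base-R alternative to reshape2::melt(): one row per variable pair
cor_long <- as.data.frame(as.table(cor_matrix))
names(cor_long) <- c("Var1", "Var2", "value")

cor_long
```

A diverging palette, for example scale_fill_gradient2(limits = c(-1, 1)), would centre the colors at zero and make positive and negative correlations easier to tell apart than the default scale does.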
These two chunks calculate the average inflation and average inequality index for each country across all years. The top 10 countries with the highest averages are selected and stored in separate variables, which are then used in the following bar charts for comparison.
top10_inflation <- economic_stress %>%
  group_by(Country) %>%
  summarise(avg_inflation = mean(headline_inflation, na.rm = TRUE)) %>%
  slice_max(avg_inflation, n = 10)  # ten highest averages, sorted descending
top10_inequality <- economic_stress %>%
  group_by(Country) %>%
  summarise(avg_inequality = mean(inequality_index, na.rm = TRUE)) %>%
  slice_max(avg_inequality, n = 10)
This chart plots a horizontal bar chart for the top 10 countries with
the highest average headline inflation. reorder() ensures
the bars are sorted from highest to lowest, and
coord_flip() rotates the chart so that country names are
displayed on the y-axis and are easy to read.
ggplot(top10_inflation, aes(x = reorder(Country, avg_inflation), y = avg_inflation)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 10 Countries by Average Inflation",
       x = "Country", y = "Avg Headline Inflation") +
  theme_minimal()
This chart plots a horizontal bar chart for the top 10 countries with
the highest average inequality index. Similar to the inflation chart,
reorder() and coord_flip() are used to produce
a clean, ranked horizontal layout for easy comparison across
countries.
ggplot(top10_inequality, aes(x = reorder(Country, avg_inequality), y = avg_inequality)) +
  geom_col(fill = "tomato") +
  coord_flip() +
  labs(title = "Top 10 Countries by Inequality Index",
       x = "Country", y = "Avg Inequality Index") +
  theme_minimal()
This chunk filters data for India specifically and plots a line graph showing how its headline inflation changed over the years. A line chart is the most appropriate choice here because it clearly highlights trends and fluctuations in inflation over time for a single country.
ggplot(economic_stress %>% filter(ISO3 == "IND"),
       aes(x = year, y = headline_inflation)) +
  geom_line(color = "blue") +
  labs(title = "India Inflation Trend")
This chunk creates a boxplot showing the spread and distribution of headline inflation across different continents. The boxplot reveals the median, interquartile range, and any outliers in inflation values for each continent, making it easy to compare regional economic patterns at a glance.
ggplot(economic_stress, aes(x = Continent, y = headline_inflation)) +
  geom_boxplot() +
  labs(title = "Inflation Distribution by Continent")
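The summary output earlier reports a maximum inflation value of 17087.72 against a third quartile of 5.79, so a single extreme observation can compress the boxes into a thin line. One possible way to keep them readable, sketched here on made-up data, is to zoom the y-axis with coord_cartesian(). It changes only the visible range, so the box statistics are still computed from all rows. The limits c(-5, 30) are an arbitrary assumption.

```r
library(ggplot2)

# Made-up data with one extreme value in continent "X"
toy <- data.frame(
  Continent = rep(c("X", "Y"), each = 5),
  headline_inflation = c(1, 2, 3, 4, 5000, 2, 3, 4, 5, 6)
)

p <- ggplot(toy, aes(x = Continent, y = headline_inflation)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(-5, 30)) +   # zoom only; no rows are removed
  labs(title = "Inflation Distribution by Continent (zoomed)")
```

Printing `p` draws the zoomed plot. The extreme point is still part of the data but falls outside the visible range.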
This chunk plots a scatter plot with a linear regression line to
explore the relationship between the inequality index and headline
inflation across all countries and years. The
geom_smooth(method = "lm") layer adds a best-fit regression
line along with a shaded confidence interval, helping to identify
whether higher inequality tends to be associated with higher
inflation.
ggplot(economic_stress,
       aes(x = inequality_index, y = headline_inflation)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Inflation vs Inequality")
This chunk builds a multiple linear regression model where headline
inflation is the outcome variable and inequality index, cost index, and
purchasing power index are the predictor variables.
summary(model) prints detailed results including regression
coefficients, p-values, and the R-squared value, which together indicate
how well these three factors explain variation in inflation across
countries.
model <- lm(headline_inflation ~ inequality_index + cost_index + purchasing_power_index,
            data = economic_stress)
summary(model)
##
## Call:
## lm(formula = headline_inflation ~ inequality_index + cost_index +
## purchasing_power_index, data = economic_stress)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.064 -2.154 -0.444 1.222 53.099
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.388070 0.494076 12.929 < 2e-16 ***
## inequality_index 0.051543 0.014501 3.554 0.000397 ***
## cost_index -0.068475 0.008656 -7.911 7.12e-15 ***
## purchasing_power_index 0.014017 0.009496 1.476 0.140239
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.244 on 946 degrees of freedom
## (628 observations deleted due to missingness)
## Multiple R-squared: 0.1813, Adjusted R-squared: 0.1787
## F-statistic: 69.84 on 3 and 946 DF, p-value: < 2.2e-16
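The line "628 observations deleted due to missingness" reflects lm()'s default handling of NA values: rows with a missing value in any model variable are dropped before fitting. The output above is consistent with that, since 1578 - 628 = 950 rows remain and four estimated coefficients leave 946 degrees of freedom. The sketch below reproduces the behaviour on made-up data; `toy` and its columns are hypothetical.

```r
# Made-up data with missing predictor values
toy <- data.frame(
  y  = c(1.0, 2.1, 2.9, 4.2, 5.1, 6.0),
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(2, 1, NA, 3, NA, 5)
)

fit <- lm(y ~ x1 + x2, data = toy)

n_used    <- nobs(fit)                 # rows actually used in the fit
n_dropped <- nrow(toy) - n_used        # rows removed because of NA
r_squared <- summary(fit)$r.squared    # share of variance explained

c(used = n_used, dropped = n_dropped)
```

In the report's model, this means the regression describes only the countries that have cost of living data, not the full merged dataset.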
This chunk calculates the average inflation and average inequality
index for each continent and stores the results in
continent_summary. Only the average inflation is then plotted as a bar
chart, giving a high-level regional comparison of which continents
experience the highest inflation on average.
continent_summary <- economic_stress %>%
  group_by(Continent) %>%
  summarise(
    avg_inflation = mean(headline_inflation, na.rm = TRUE),
    avg_inequality = mean(inequality_index, na.rm = TRUE)
  )
ggplot(continent_summary, aes(x = Continent, y = avg_inflation)) +
  geom_col(fill = "steelblue") +
  labs(title = "Average Inflation by Continent",
       x = "Continent", y = "Avg Inflation") +
  theme_minimal()
This chunk plots a histogram showing how headline inflation values are distributed across the entire dataset. With 30 bins, it reveals whether inflation values are normally distributed, right-skewed, or concentrated around specific ranges, providing an overall picture of inflation patterns globally.
ggplot(economic_stress, aes(x = headline_inflation)) +
  geom_histogram(bins = 30, fill = "skyblue") +
  labs(title = "Distribution of Inflation")
This chunk filters data for three major economies — India, the USA, and China — and plots their inflation trends on a single line graph with a different color assigned to each country. This multi-line chart makes it easy to compare how these economies performed relative to each other over the same time period and to spot diverging or converging trends.
economic_stress %>%
  filter(ISO3 %in% c("IND", "USA", "CHN")) %>%
  ggplot(aes(x = year, y = headline_inflation, color = ISO3)) +
  geom_line() +
labs(title = "Inflation Comparison — India, USA, China")
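This chunk groups the data by year and calculates the global average inflation for each year, then sorts the results in descending order. This gives a quick overview of which years experienced the highest average inflation worldwide. The 2020 average (about 133) is far above every other year, which is consistent with the dataset maximum of 17087.72 pulling up the mean, so the ranking should be read with that outlier in mind. A second chunk plots the same yearly averages as a line chart.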
economic_stress %>%
  group_by(year) %>%
  summarise(avg_inflation = mean(headline_inflation, na.rm = TRUE)) %>%
  arrange(desc(avg_inflation))
## # A tibble: 12 × 2
## year avg_inflation
## <int> <dbl>
## 1 2020 133.
## 2 2021 9.05
## 3 2016 6.78
## 4 2011 6.35
## 5 2019 6.25
## 6 2017 5.88
## 7 2012 5.85
## 8 2018 5.02
## 9 2010 4.38
## 10 2013 4.18
## 11 2015 4.07
## 12 2014 3.99
economic_stress %>%
  group_by(year) %>%
  summarise(avg_inflation = mean(headline_inflation, na.rm = TRUE)) %>%
  ggplot(aes(x = year, y = avg_inflation)) +
  geom_line(color = "steelblue") +
  geom_point() +
  labs(title = "Global Average Inflation Over Time",
       x = "Year", y = "Avg Inflation") +
  theme_minimal()
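Because a yearly mean is sensitive to extreme values, reporting the median alongside it can show how much a single observation drives the result. This is a sketch on made-up numbers, not a re-analysis of the report's data. The table `toy` and the column `median_inflation` are introduced here for illustration.

```r
library(dplyr)

# Made-up data: 2020 contains one extreme value
toy <- tibble(
  year = c(2019L, 2019L, 2019L, 2020L, 2020L, 2020L),
  headline_inflation = c(2, 3, 4, 2, 3, 1000)
)

yearly <- toy %>%
  group_by(year) %>%
  summarise(
    avg_inflation    = mean(headline_inflation, na.rm = TRUE),
    median_inflation = median(headline_inflation, na.rm = TRUE)
  )

yearly
```

In this toy case the mean for 2020 jumps to 335 while the median stays at 3 in both years.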
This visualization tracks the average inequality_index across years, providing a macro-level view of how income inequality has shifted globally. By calculating the mean across countries for each year, we can identify broader socioeconomic patterns that individual country-level data might obscure.
inequality_trend <- economic_stress %>%
  group_by(year) %>%
  summarise(avg_inequality = mean(inequality_index, na.rm = TRUE))
ggplot(inequality_trend, aes(x = year, y = avg_inequality)) +
  geom_line(color = "tomato", linewidth = 1.2) +  # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  geom_point(color = "tomato", size = 2) +
  labs(title = "Global Average Income Inequality Over Time",
       subtitle = "Mean Inequality Index across all recorded countries",
       x = "Year",
       y = "Avg Inequality Index") +
  theme_minimal()
## Average Cost of Living Over Time
This analysis plots the average cost_index by year. Because the cost of
living data is joined by Country only and carries no year column, each
country's cost_index is the same in every year. Changes in the yearly
average therefore reflect which countries have inflation and inequality
data in a given year, not changes in the cost of living itself, so the
chart should not be read as a trend in affordability.
cost_trend <- economic_stress %>%
  group_by(year) %>%
  summarise(avg_cost = mean(cost_index, na.rm = TRUE))
ggplot(cost_trend, aes(x = year, y = avg_cost)) +
  geom_line(color = "darkgreen", linewidth = 1.2) +  # linewidth replaces size for lines (ggplot2 >= 3.4.0)
  geom_point(color = "darkgreen", size = 2) +
  labs(title = "Global Average Cost of Living Over Time",
       subtitle = "Relative cost index trends based on available global data",
       x = "Year",
       y = "Avg Cost Index") +
  theme_minimal()
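The cleaned cost of living table keeps only Country, cost_index, monthly_income_usd, and purchasing_power_index, and it is joined by Country alone, so cost_index carries no year-specific information. The sketch below uses made-up tables `panel` and `cost` to show how a yearly average can still move in that situation, purely because the set of countries differs between years.

```r
library(dplyr)

# Made-up panel: country B only appears in 2011
panel <- tibble(
  Country = c("A", "A", "B"),
  year    = c(2010L, 2011L, 2011L)
)

# Made-up cost table with one row per country and no year column
cost <- tibble(Country = c("A", "B"), cost_index = c(40, 100))

cost_trend <- panel %>%
  left_join(cost, by = "Country") %>%   # A's value of 40 repeats in both years
  group_by(year) %>%
  summarise(avg_cost = mean(cost_index, na.rm = TRUE))

cost_trend
```

Here the average rises from 40 to 70 although neither country's cost index changed, which is why a line like the one above is better read as a change in sample composition.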