Introduction

This report analyzes global economic stress by examining three key dimensions: headline consumer price inflation, income inequality, and cost of living. Data from multiple international datasets is cleaned, merged, and visualized to identify patterns and relationships across countries and continents.

Loading Libraries

library(readxl)
library(ggplot2)
library(dplyr)
library(tidyr)
library(reshape2)

Loading the Datasets

Inflation_Dataset <- read_xls('Datasets/Global Dataset of Inflation.xls')
Income_Inequality <- read_xls('Datasets/Inequality in Income.xls')
cost_of_living    <- read_xlsx('Datasets/Comparison of worldwide cost of living.xlsx')

Cleaning the Inflation Dataset

This chunk takes the raw inflation dataset loaded above and prepares it for analysis. It removes duplicate rows, keeps only the “Headline Consumer Price Inflation” rows, and drops unnecessary columns. The data is then reshaped from wide format, where years are stored as separate columns, to long format, where each row represents a single country and year combination. Finally, the year and inflation values are converted to the correct data types, and any rows with missing inflation values are removed.

inflation_clean <- Inflation_Dataset %>%
  distinct() %>%
  filter(!is.na(Country), `Series Name` == "Headline Consumer Price Inflation") %>%
  select(-`Country Code`, -`IMF Country Code`, -`Indicator Type`,
         -`Series Name`, -Note, -`...60`) %>%
  pivot_longer(
    cols      = -Country,
    names_to  = "year",
    values_to = "headline_inflation"
  ) %>%
  mutate(
    year               = as.integer(year),
    headline_inflation = as.numeric(headline_inflation)
  ) %>%
  filter(!is.na(headline_inflation))
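
The wide-to-long reshape can be illustrated on a toy table. This is a minimal sketch rather than the report's data: the `wide` tibble and its values are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per country, one column per year.
# Values are stored as text, as can happen with spreadsheet imports.
wide <- tibble::tibble(
  Country = c("A", "B"),
  `2010`  = c("1.5", "2.0"),
  `2011`  = c("3.0", NA)
)

long <- wide %>%
  pivot_longer(cols = -Country, names_to = "year",
               values_to = "headline_inflation") %>%
  mutate(year               = as.integer(year),
         headline_inflation = as.numeric(headline_inflation)) %>%
  filter(!is.na(headline_inflation))   # drops the B / 2011 row

long
```

Each remaining row is one country and year combination, which is the shape the later joins and plots expect.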

Cleaning the Income Inequality Dataset

This chunk cleans the income inequality dataset by removing duplicate rows and filtering out entries with missing country names. It selects the relevant columns — Country, ISO3 code, Continent, and all columns related to inequality in income — then reshapes the data from wide to long format. The year is extracted from the column names using gsub() to strip out extra text, and rows with missing inequality values are filtered out.

inequality_clean <- Income_Inequality %>%
  distinct() %>%
  filter(!is.na(Country)) %>%
  select(Country, ISO3, Continent, starts_with("Inequality in income")) %>%
  pivot_longer(
    cols      = -c(Country, ISO3, Continent),
    names_to  = "year",
    values_to = "inequality_index"
  ) %>%
  mutate(
    year             = as.integer(gsub("Inequality in income \\(|\\)", "", year)),
    inequality_index = as.numeric(inequality_index)
  ) %>%
  filter(!is.na(inequality_index))
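
As a small check of the year extraction, the same gsub() pattern can be applied to a couple of column names. The two names below are assumed examples of the "Inequality in income (YYYY)" pattern implied by the code, not a listing of the actual file.

```r
# Assumed example column names following the pattern used in the chunk above
cols <- c("Inequality in income (2010)", "Inequality in income (2021)")

# The pattern removes either the leading text up to "(" or the closing ")"
years <- as.integer(gsub("Inequality in income \\(|\\)", "", cols))

years
```

If a column name deviates from the pattern, as.integer() returns NA with a warning, which is a useful signal that the pattern needs adjusting.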

Cleaning the Cost of Living Dataset

This chunk prepares the cost of living dataset by removing duplicate rows and entries with missing country names. It selects and renames four key columns — country, cost index, monthly income in USD, and purchasing power index — into cleaner, consistent names that align with the other datasets for easy merging.

cost_clean <- cost_of_living %>%
  distinct() %>%
  filter(!is.na(country)) %>%
  select(
    Country              = country,
    cost_index,
    monthly_income_usd   = `usd_monthly income`,
    purchasing_power_index
  )
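
The select-and-rename step can be tried on a toy table. This is only a sketch: the `raw` tibble, its values, and the `extra` column are hypothetical.

```r
library(dplyr)

# Hypothetical raw table; `usd_monthly income` contains a space,
# so it has to be wrapped in backticks when referenced.
raw <- tibble::tibble(
  country                = c("A", "B"),
  cost_index             = c(50, 100),
  `usd_monthly income`   = c(400, 2500),
  purchasing_power_index = c(20, 60),
  extra                  = c("x", "y")   # not selected, so it is dropped
)

clean <- raw %>%
  select(
    Country            = country,               # new_name = old_name
    cost_index,
    monthly_income_usd = `usd_monthly income`,
    purchasing_power_index
  )

names(clean)
```

Renaming `country` to `Country` here is what allows the later join to match on a single key name.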

Merging All 3 Datasets into 1 Dataset

This chunk combines all three cleaned datasets into a single unified dataframe called economic_stress. First, an inner join is performed between the inflation and inequality datasets on both Country and year, ensuring only countries with data in both datasets are retained. Then a left join adds the cost of living data by Country, preserving all rows even if cost of living data is unavailable for some countries. Because cost_clean has no year column, each country's cost of living values are repeated on every one of its yearly rows. Finally, columns are reordered for clarity and rows are sorted alphabetically by country and then chronologically by year.

economic_stress <- inflation_clean %>%
  inner_join(inequality_clean, by = c("Country", "year")) %>%
  left_join(cost_clean, by = "Country") %>%
  select(
    Country, ISO3, Continent, year,
    headline_inflation,
    inequality_index,
    cost_index,
    monthly_income_usd,
    purchasing_power_index
  ) %>%
  arrange(Country, year)
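
Joins on country names only match when the spelling is identical in both tables, so it can help to list the rows that fail to match before merging. The sketch below uses dplyr's anti_join() on made-up tables; the country names and values are hypothetical.

```r
library(dplyr)

# Hypothetical tables in which one country is spelled differently
inflation_toy <- tibble::tibble(
  Country            = c("Alphaland", "Betaland"),
  year               = c(2015L, 2015L),
  headline_inflation = c(1, 2)
)
inequality_toy <- tibble::tibble(
  Country          = c("Alphaland", "Republic of Betaland"),
  year             = c(2015L, 2015L),
  inequality_index = c(10, 20)
)

# Rows of inflation_toy that have no partner in inequality_toy
unmatched <- anti_join(inflation_toy, inequality_toy,
                       by = c("Country", "year"))

unmatched
```

Any country listed here would be dropped by the inner join, so the result shows whether names need harmonizing first.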

Viewing and Checking the Merged Dataset

This chunk provides a quick overview of the merged dataset. summary() displays statistical summaries for each column such as minimum, maximum, and mean values, while nrow() confirms the total number of rows in the final dataset. These checks help verify that the merging process worked correctly.

summary(economic_stress)
##    Country              ISO3            Continent              year     
##  Length:1578        Length:1578        Length:1578        Min.   :2010  
##  Class :character   Class :character   Class :character   1st Qu.:2013  
##  Mode  :character   Mode  :character   Mode  :character   Median :2016  
##                                                           Mean   :2016  
##                                                           3rd Qu.:2019  
##                                                           Max.   :2021  
##                                                                         
##  headline_inflation  inequality_index   cost_index     monthly_income_usd
##  Min.   :   -3.230   Min.   : 5.845   Min.   : 24.80   Min.   :  88.0    
##  1st Qu.:    1.350   1st Qu.:16.316   1st Qu.: 36.50   1st Qu.: 295.0    
##  Median :    3.015   Median :21.214   Median : 46.60   Median : 635.5    
##  Mean   :   16.682   Mean   :23.442   Mean   : 60.77   Mean   :1666.5    
##  3rd Qu.:    5.790   3rd Qu.:28.893   3rd Qu.: 94.17   3rd Qu.:3499.5    
##  Max.   :17087.720   Max.   :68.337   Max.   :142.90   Max.   :7329.0    
##                                       NA's   :628      NA's   :628       
##  purchasing_power_index
##  Min.   :  5.30        
##  1st Qu.: 14.90        
##  Median : 24.70        
##  Mean   : 37.00        
##  3rd Qu.: 58.67        
##  Max.   :111.30        
##  NA's   :628
nrow(economic_stress)
## [1] 1578

Graphical Representation — Inflation by Country

The following four charts each filter the economic_stress dataset for a specific country using its ISO3 code and plot a bar chart showing how headline inflation changed year by year. The fill = headline_inflation argument applies a color gradient to the bars based on the inflation value, making it easy to identify years of high and low inflation at a glance.

United States

economic_stress %>%
  filter(ISO3 == 'USA') %>%
  ggplot(aes(x = year, y = headline_inflation, fill = headline_inflation)) +
  geom_bar(stat = 'identity') +
  labs(title = 'Rate of Inflation in the United States',
       x = 'Year',
       y = 'Inflation Rate')

Russia

economic_stress %>%
  filter(ISO3 == 'RUS') %>%
  ggplot(aes(x = year, y = headline_inflation, fill = headline_inflation)) +
  geom_bar(stat = 'identity') +
  labs(title = 'Rate of Inflation in Russia',
       x = 'Year',
       y = 'Inflation Rate')

China

economic_stress %>%
  filter(ISO3 == 'CHN') %>%
  ggplot(aes(x = year, y = headline_inflation, fill = headline_inflation)) +
  geom_bar(stat = 'identity') +
  labs(title = 'Rate of Inflation in China',
       x = 'Year',
       y = 'Inflation Rate')

India

economic_stress %>%
  filter(ISO3 == 'IND') %>%
  ggplot(aes(x = year, y = headline_inflation, fill = headline_inflation)) +
  geom_bar(stat = 'identity') +
  labs(title = 'Rate of Inflation in India',
       x = 'Year',
       y = 'Inflation Rate')
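
The four per-country charts could also be drawn from one chunk with facet_wrap(). This is a sketch of that alternative, not the report's method; the `toy` data frame is a stand-in for economic_stress with placeholder values.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for economic_stress (placeholder values, not real data)
toy <- expand.grid(ISO3 = c("USA", "RUS", "CHN", "IND"),
                   year = 2010:2012,
                   stringsAsFactors = FALSE)
toy$headline_inflation <- seq_len(nrow(toy)) / 2

p <- toy %>%
  filter(ISO3 %in% c("USA", "RUS", "CHN", "IND")) %>%
  ggplot(aes(x = year, y = headline_inflation, fill = headline_inflation)) +
  geom_col() +            # equivalent to geom_bar(stat = 'identity')
  facet_wrap(~ ISO3) +    # one panel per country
  labs(title = 'Rate of Inflation by Country',
       x = 'Year',
       y = 'Inflation Rate')

p
```

By default the panels share one y-axis, which makes levels comparable across countries; facet_wrap(~ ISO3, scales = "free_y") gives each panel its own scale instead.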

Descriptive Statistics and Missing Value Check

This chunk revisits the merged dataset to check its overall structure and identify any missing values. summary() provides descriptive statistics for all columns, and colSums(is.na()) counts how many NA values exist in each column. This is an important step before running any correlation or regression analysis to ensure data quality.

summary(economic_stress)
##    Country              ISO3            Continent              year     
##  Length:1578        Length:1578        Length:1578        Min.   :2010  
##  Class :character   Class :character   Class :character   1st Qu.:2013  
##  Mode  :character   Mode  :character   Mode  :character   Median :2016  
##                                                           Mean   :2016  
##                                                           3rd Qu.:2019  
##                                                           Max.   :2021  
##                                                                         
##  headline_inflation  inequality_index   cost_index     monthly_income_usd
##  Min.   :   -3.230   Min.   : 5.845   Min.   : 24.80   Min.   :  88.0    
##  1st Qu.:    1.350   1st Qu.:16.316   1st Qu.: 36.50   1st Qu.: 295.0    
##  Median :    3.015   Median :21.214   Median : 46.60   Median : 635.5    
##  Mean   :   16.682   Mean   :23.442   Mean   : 60.77   Mean   :1666.5    
##  3rd Qu.:    5.790   3rd Qu.:28.893   3rd Qu.: 94.17   3rd Qu.:3499.5    
##  Max.   :17087.720   Max.   :68.337   Max.   :142.90   Max.   :7329.0    
##                                       NA's   :628      NA's   :628       
##  purchasing_power_index
##  Min.   :  5.30        
##  1st Qu.: 14.90        
##  Median : 24.70        
##  Mean   : 37.00        
##  3rd Qu.: 58.67        
##  Max.   :111.30        
##  NA's   :628
colSums(is.na(economic_stress))
##                Country                   ISO3              Continent 
##                      0                      0                      0 
##                   year     headline_inflation       inequality_index 
##                      0                      0                      0 
##             cost_index     monthly_income_usd purchasing_power_index 
##                    628                    628                    628
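
In the output above, 628 of 1578 rows (about 40%) lack cost of living values. A small base R sketch shows how to express the missingness as a share and list the affected countries; the `toy` data frame and its values are hypothetical.

```r
# Hypothetical data: country B has no cost of living data
toy <- data.frame(
  Country            = c("A", "A", "B", "B"),
  headline_inflation = c(1.2, 2.3, 4.0, 3.1),
  cost_index         = c(50, 50, NA, NA)
)

na_counts <- colSums(is.na(toy))    # number of NA values per column
na_share  <- colMeans(is.na(toy))   # proportion of NA values per column

# Countries that have at least one missing cost_index value
missing_countries <- unique(toy$Country[is.na(toy$cost_index)])

na_share
missing_countries
```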

Correlation Heatmap

This chunk first computes a correlation matrix between four numeric variables (headline inflation, inequality index, cost index, and purchasing power index) using only rows where all four are present. The melt() function from the reshape2 library then converts the matrix into a long format suitable for ggplot2, which renders it as a heatmap using geom_tile(). The fill color encodes the correlation coefficient; with ggplot2's default continuous scale, lighter tiles correspond to higher values, and the diagonal is always 1.

cor_matrix <- economic_stress %>%
  select(headline_inflation, inequality_index, cost_index, purchasing_power_index) %>%
  cor(use = "complete.obs")

melted_cor <- melt(cor_matrix)

ggplot(melted_cor, aes(Var1, Var2, fill = value)) +
  geom_tile() +
  labs(title = "Correlation Heatmap")
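
Because correlations range from -1 to 1, a diverging color scale centered at zero with printed coefficients can make the heatmap easier to read. The sketch below is one possible variant, not the report's chart: it uses the built-in mtcars dataset purely as a stand-in, and base R's as.table() in place of melt().

```r
library(ggplot2)

# mtcars is a built-in dataset, used here only as stand-in data
cor_matrix <- cor(mtcars[, c("mpg", "hp", "wt")], use = "complete.obs")

# Base R alternative to reshape2::melt(): matrix -> long data frame
melted <- as.data.frame(as.table(cor_matrix))
names(melted) <- c("Var1", "Var2", "value")

p <- ggplot(melted, aes(Var1, Var2, fill = value)) +
  geom_tile() +
  geom_text(aes(label = round(value, 2))) +          # print each coefficient
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-1, 1)) +
  labs(title = "Correlation Heatmap", x = NULL, y = NULL)

p
```

Fixing the limits at -1 and 1 keeps the color meaning stable across different sets of variables.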

Top 10 Countries by Average Inflation and Inequality

These two chunks calculate the average inflation and average inequality index for each country across all years. The top 10 countries with the highest averages are selected and stored in separate variables, which are then used in the following bar charts for comparison.

top10_inflation <- economic_stress %>%
  group_by(Country) %>%
  summarise(avg_inflation = mean(headline_inflation, na.rm = TRUE)) %>%
  slice_max(avg_inflation, n = 10)    # ten highest averages, sorted descending

top10_inequality <- economic_stress %>%
  group_by(Country) %>%
  summarise(avg_inequality = mean(inequality_index, na.rm = TRUE)) %>%
  slice_max(avg_inequality, n = 10)   # ten highest averages, sorted descending

Bar Chart — Top 10 Countries by Average Inflation

This chunk draws a horizontal bar chart of the 10 countries with the highest average headline inflation. reorder() orders the countries by avg_inflation, and coord_flip() rotates the chart so that country names sit on the vertical axis, where they are easy to read, with the highest average at the top.

ggplot(top10_inflation, aes(x = reorder(Country, avg_inflation), y = avg_inflation)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 10 Countries by Average Inflation",
       x = "Country", y = "Avg Headline Inflation") +
  theme_minimal()

Bar Chart — Top 10 Countries by Inequality Index

This chart plots a horizontal bar chart for the top 10 countries with the highest average inequality index. Similar to the inflation chart, reorder() and coord_flip() are used to produce a clean, ranked horizontal layout for easy comparison across countries.

ggplot(top10_inequality, aes(x = reorder(Country, avg_inequality), y = avg_inequality)) +
  geom_col(fill = "tomato") +
  coord_flip() +
  labs(title = "Top 10 Countries by Inequality Index",
       x = "Country", y = "Avg Inequality Index") +
  theme_minimal()

India Inflation Trend Over Time

This chunk filters data for India specifically and plots a line graph showing how its headline inflation changed over the years. A line chart is the most appropriate choice here because it clearly highlights trends and fluctuations in inflation over time for a single country.

ggplot(economic_stress %>% filter(ISO3 == "IND"),
       aes(x = year, y = headline_inflation)) +
  geom_line(color = "blue") +
  labs(title = "India Inflation Trend")

Inflation Distribution by Continent

This chunk creates a boxplot showing the spread and distribution of headline inflation across different continents. The boxplot reveals the median, interquartile range, and any outliers in inflation values for each continent, making it easy to compare regional economic patterns at a glance.

ggplot(economic_stress, aes(x = Continent, y = headline_inflation)) +
  geom_boxplot() +
  labs(title = "Inflation Distribution by Continent")

Inflation vs Inequality — Scatter Plot

This chunk plots a scatter plot with a linear regression line to explore the relationship between the inequality index and headline inflation across all countries and years. The geom_smooth(method = "lm") layer adds a best-fit regression line along with a shaded confidence interval, helping to identify whether higher inequality tends to be associated with higher inflation.

ggplot(economic_stress,
       aes(x = inequality_index, y = headline_inflation)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Inflation vs Inequality")

Multiple Linear Regression Model

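This chunk builds a multiple linear regression model where headline inflation is the outcome variable and inequality index, cost index, and purchasing power index are the predictor variables. summary(model) prints the regression coefficients, p-values, and the R-squared value, which together indicate how well these three factors explain variation in inflation. lm() drops rows with missing predictors, so the model is fitted only on rows that have cost of living data. In the output, inequality_index and cost_index are statistically significant, purchasing_power_index is not (p = 0.14), and the R-squared of 0.18 means the model explains about 18% of the variation in inflation.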

model <- lm(headline_inflation ~ inequality_index + cost_index + purchasing_power_index,
            data = economic_stress)

summary(model)
## 
## Call:
## lm(formula = headline_inflation ~ inequality_index + cost_index + 
##     purchasing_power_index, data = economic_stress)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.064 -2.154 -0.444  1.222 53.099 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             6.388070   0.494076  12.929  < 2e-16 ***
## inequality_index        0.051543   0.014501   3.554 0.000397 ***
## cost_index             -0.068475   0.008656  -7.911 7.12e-15 ***
## purchasing_power_index  0.014017   0.009496   1.476 0.140239    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.244 on 946 degrees of freedom
##   (628 observations deleted due to missingness)
## Multiple R-squared:  0.1813, Adjusted R-squared:  0.1787 
## F-statistic: 69.84 on 3 and 946 DF,  p-value: < 2.2e-16
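
The output above reports 628 observations deleted due to missingness, which leaves 1578 - 628 = 950 rows (946 residual degrees of freedom plus 4 estimated coefficients). The sketch below shows on made-up data how lm() drops incomplete rows and how to read the sample size and R-squared from the fitted object; `toy`, `fit`, and the values are hypothetical.

```r
# Hypothetical data with missing values in one predictor
toy <- data.frame(
  y  = c(1.0, 2.3, 2.9, 4.4, 4.8, 6.1, 7.3, 7.9),
  x1 = 1:8,
  x2 = c(2, NA, 1, 4, NA, 3, NA, 5)
)

fit <- lm(y ~ x1 + x2, data = toy)   # default na.action omits incomplete rows

n_used    <- nobs(fit)               # rows actually used in the fit
n_dropped <- nrow(toy) - n_used      # rows removed because of NA
r_squared <- summary(fit)$r.squared  # same value summary() prints

c(used = n_used, dropped = n_dropped)
r_squared
```

Checking nobs() against nrow() makes it explicit which share of the data a model's conclusions actually rest on.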

Average Inflation and Inequality by Continent

This chunk calculates the average inflation and average inequality index for each continent and stores the results in continent_summary. Only the average inflation is then plotted as a bar chart, giving a high-level regional comparison that shows which continents experience the highest inflation on average.

continent_summary <- economic_stress %>%
  group_by(Continent) %>%
  summarise(
    avg_inflation  = mean(headline_inflation, na.rm = TRUE),
    avg_inequality = mean(inequality_index, na.rm = TRUE)
  )

ggplot(continent_summary, aes(x = Continent, y = avg_inflation)) +
  geom_col(fill = "steelblue") +
  labs(title = "Average Inflation by Continent",
       x = "Continent", y = "Avg Inflation") +
  theme_minimal()

Distribution of Inflation — Histogram

This chunk plots a histogram of headline inflation across the entire dataset using 30 bins. The summary statistics above show a median of 3.015 and a third quartile of 5.790 against a maximum of 17087.720, so the distribution is heavily right-skewed: with equal-width bins over that range, nearly all observations fall into the lowest bin.

ggplot(economic_stress, aes(x = headline_inflation)) +
  geom_histogram(bins = 30, fill = "skyblue") +
  labs(title = "Distribution of Inflation")
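
With a maximum of 17087.72 against a median of about 3, a histogram on the raw scale is dominated by a few extreme values. One option, offered here as a sketch and not as the report's method, is to plot the inverse hyperbolic sine, asinh(), which behaves like a logarithm for large values but is also defined for zero and negative inflation. The `toy` values are made up.

```r
library(ggplot2)

# Hypothetical inflation values including a negative and an extreme one
toy <- data.frame(headline_inflation = c(-3, 0.5, 1.4, 3, 5.8, 12, 250, 17000))

# asinh(x) = log(x + sqrt(x^2 + 1)): keeps the sign, compresses large values
toy$inflation_asinh <- asinh(toy$headline_inflation)

p <- ggplot(toy, aes(x = inflation_asinh)) +
  geom_histogram(bins = 30, fill = "skyblue") +
  labs(title = "Distribution of Inflation (asinh scale)",
       x = "asinh(headline inflation)")

p
```

The transform preserves the ordering of the values, so the shape of the bulk of the distribution becomes visible without discarding the extreme observations.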

Inflation Comparison — India, USA, and China

This chunk filters data for three major economies — India, the USA, and China — and plots their inflation trends on a single line graph with a different color assigned to each country. This multi-line chart makes it easy to compare how these economies performed relative to each other over the same time period and to spot diverging or converging trends.

economic_stress %>%
  filter(ISO3 %in% c("IND", "USA", "CHN")) %>%
  ggplot(aes(x = year, y = headline_inflation, color = ISO3)) +
  geom_line() +
  labs(title = "Inflation Comparison — India, USA, China")

Global Average Inflation by Year

This chunk groups the data by year and calculates the global average inflation for each year, then sorts the results in descending order to show which years had the highest average inflation worldwide. The 2020 average of about 133 is far above every other year; since the mean is sensitive to extreme values such as the 17087.72 maximum seen in the summary, it should be read with caution. The same yearly averages are then plotted as a line chart.

economic_stress %>%
  group_by(year) %>%
  summarise(avg_inflation = mean(headline_inflation, na.rm = TRUE)) %>%
  arrange(desc(avg_inflation))
## # A tibble: 12 × 2
##     year avg_inflation
##    <int>         <dbl>
##  1  2020        133.  
##  2  2021          9.05
##  3  2016          6.78
##  4  2011          6.35
##  5  2019          6.25
##  6  2017          5.88
##  7  2012          5.85
##  8  2018          5.02
##  9  2010          4.38
## 10  2013          4.18
## 11  2015          4.07
## 12  2014          3.99
economic_stress %>%
  group_by(year) %>%
  summarise(avg_inflation = mean(headline_inflation, na.rm = TRUE)) %>%
  ggplot(aes(x = year, y = avg_inflation)) +
  geom_line(color = "steelblue") +
  geom_point() +
  labs(title = "Global Average Inflation Over Time",
       x = "Year", y = "Avg Inflation") +
  theme_minimal()
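
Since a yearly mean can be pulled far from typical values by a single extreme observation, it can be informative to report the median alongside it. The sketch below uses made-up numbers to show the effect; `toy` and `by_year` are hypothetical names.

```r
library(dplyr)

# Hypothetical data: 2020 contains one extreme value
toy <- tibble::tibble(
  year               = c(2019, 2019, 2019, 2020, 2020, 2020),
  headline_inflation = c(2, 3, 4, 2, 3, 1000)
)

by_year <- toy %>%
  group_by(year) %>%
  summarise(
    avg_inflation    = mean(headline_inflation, na.rm = TRUE),
    median_inflation = median(headline_inflation, na.rm = TRUE)
  )

by_year
```

In this toy example the 2020 mean is 335 while the median stays at 3, the same value as in 2019, which shows how differently the two statistics react to one outlier.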

Average Income Inequality Over Time

This visualization tracks the average inequality_index across years, providing a macro-level view of how income inequality has shifted globally. Averaging across countries for each year reveals broader socioeconomic patterns that individual country-level observations can obscure.

inequality_trend <- economic_stress %>%
  group_by(year) %>%
  summarise(avg_inequality = mean(inequality_index, na.rm = TRUE))

ggplot(inequality_trend, aes(x = year, y = avg_inequality)) +
  geom_line(color = "tomato", linewidth = 1.2) +
  geom_point(color = "tomato", size = 2) +
  labs(title = "Global Average Income Inequality Over Time",
       subtitle = "Mean Inequality Index across all recorded countries",
       x = "Year", 
       y = "Avg Inequality Index") +
  theme_minimal()

Average Cost of Living Over Time

This chunk averages cost_index by year. Because the cost of living data was joined by Country only, each country's cost_index is constant across years. Changes in the yearly average therefore reflect which countries have inflation, inequality, and cost of living data in a given year, not changes in living costs over time, and the chart should be read as a check on sample composition, not as a price trend.

cost_trend <- economic_stress %>%
  group_by(year) %>%
  summarise(avg_cost = mean(cost_index, na.rm = TRUE))
ggplot(cost_trend, aes(x = year, y = avg_cost)) +
  geom_line(color = "darkgreen", linewidth = 1.2) +
  geom_point(color = "darkgreen", size = 2) +
  labs(title = "Global Average Cost of Living Over Time",
       subtitle = "Relative cost index trends based on available global data",
       x = "Year", 
       y = "Avg Cost Index") +
  theme_minimal()
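
Because the cost of living table has no year column and is joined by Country only, a yearly average of cost_index can change even when no country's value changes. The sketch below reproduces this composition effect on made-up data; `panel`, `cost`, and the values are hypothetical.

```r
library(dplyr)

# Hypothetical panel: country B only has data from 2011 onward
panel <- tibble::tibble(
  Country = c("A", "A", "B"),
  year    = c(2010L, 2011L, 2011L)
)

# One time-invariant cost value per country
cost <- tibble::tibble(
  Country    = c("A", "B"),
  cost_index = c(50, 100)
)

trend <- panel %>%
  left_join(cost, by = "Country") %>%   # the same value is repeated per year
  group_by(year) %>%
  summarise(avg_cost = mean(cost_index, na.rm = TRUE))

trend
```

The average moves from 50 in 2010 to 75 in 2011 solely because country B enters the sample, which is why such a trend line describes sample coverage, not price changes.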