Final Project Stats

Abstract

This project analyzes global monthly food price inflation using an international dataset to examine overall inflation patterns, volatility, and extreme price events. The analysis explores the distribution of food price inflation across countries, compares inflation behavior between stable and unstable markets, and tests whether average inflation differs between a high-inflation country and a low-inflation country. Using summary statistics, visualizations, correlation analysis, and hypothesis testing, the results show that the distribution of food price inflation is highly right-skewed, with rare but extreme inflation events concentrated in a small number of countries. These findings highlight substantial differences in price stability across countries and the economic risks associated with persistent high inflation.

Introduction

Food price inflation plays a critical role in economic stability, household purchasing power, and food security, particularly in countries experiencing economic or political instability. Large swings in food prices can disproportionately affect vulnerable populations and create uncertainty for businesses and policymakers. This project examines global food price inflation to better understand how inflation levels and volatility vary across countries. The primary objective is to identify overall inflation patterns, assess extreme inflation events, and determine whether statistically significant differences exist between high-inflation and low-inflation markets.

Introduction to the Data

Where did you find it?

The dataset was obtained from a publicly available international economic database that reports monthly food price inflation statistics for countries around the world.

Who or what organization uploaded it?

The data was published and maintained by an international economic organization that tracks inflation and price indices across global markets.

When was it last updated?

The dataset includes observations through recent years, with update timing varying by country based on reporting availability.

Time-frame and geographical-frame

The data covers monthly observations beginning in 2001 and spans a global geographical frame, including over 200 countries and regions worldwide.

How many columns and rows?

The dataset contains 59,839 rows and 4 columns.

What does each row represent?

Each row represents a single country’s monthly food price inflation rate for a specific month and year.

Data Cleaning and Preparation

Several data preparation steps were required before analysis. The time variable was converted from a character format into a Date format to allow for time-series analysis. Variables were renamed for clarity, with the inflation measure labeled as Inflation_Rate and country names labeled as Region. Observations with missing inflation values were removed to ensure the accuracy of summary statistics, visualizations, and statistical tests.

Creating new variables or calculating new fields

New variables were created to improve readability and usability, including renamed inflation and region fields derived from the original dataset.

Binning or re-coding

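No categorical binning of the inflation values was required. Percentile-based thresholds were used to identify extreme inflation outliers, and the two markets selected for the hypothesis test were labeled "High Risk" and "Low Risk" in a Category field.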
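The percentile-based threshold can be sketched on toy data as follows. This is a minimal illustration rather than the project's code: the vector `toy_rates` and its values are made up, and only base R's `quantile()` is used.

```r
# Toy data (hypothetical values, not from the dataset):
# 99 ordinary observations and one extreme one.
toy_rates <- c(1:99, 1000)

# 99th percentile threshold (R's default quantile type 7)
threshold <- quantile(toy_rates, 0.99, names = FALSE)

# Flag observations at or above the threshold as extreme
extreme <- toy_rates[toy_rates >= threshold]

print(threshold)
print(extreme)
```

Because the threshold is computed from the data itself, roughly 1% of observations are flagged however large or small the values are.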

Data type conversions

The time variable was converted to a Date class, and inflation values were confirmed to be numeric for statistical analysis.
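A short sketch of how such a conversion might be checked is shown below. The sample strings and the names `raw_dates` and `toy_values` are hypothetical; the format string `"%m/%d/%Y"` is assumed from the month/day/year layout of the TIME_PERIOD column.

```r
# Hypothetical sample of the month/day/year strings seen in TIME_PERIOD
raw_dates <- c("1/1/2001", "10/1/2001", "2/1/2019")

# Parse using an explicit format; unparseable strings become NA
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

# Confirm the class and that nothing failed to parse
stopifnot(inherits(parsed, "Date"), !anyNA(parsed))

# Confirm a value column is numeric before computing statistics
toy_values <- c(22.9, 12.5, 371537.5)
stopifnot(is.numeric(toy_values))

print(parsed)
```

Checking for `NA` after parsing matters because `as.Date()` returns `NA` for strings that do not match the format instead of raising an error.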

Business Questions

This analysis addresses the following questions:

What is the overall distribution of monthly food price inflation across global markets?

Which countries experience the most extreme food price inflation events?

Is there a relationship between a country’s average food price inflation and its inflation volatility?

Is there a statistically significant difference in average monthly food price inflation between a high-inflation country and a low-inflation country?

Analysis

The analysis includes summary statistics and a histogram to examine the global distribution of food price inflation. Time-series plots compare inflation trends across selected markets. An outlier analysis identifies extreme inflation events using the 99th percentile as the threshold. A correlation analysis evaluates the relationship between average inflation and inflation volatility across countries. Finally, a Welch two-sample t-test compares average monthly inflation rates between Venezuela and Switzerland, the markets with the highest and lowest average inflation in the dataset.

Results

The results indicate that global food price inflation is heavily right-skewed: the median observation is 4.09% and the third quartile is 8.59%, yet the mean is 45.48% because of a small number of extreme inflation events. Venezuela accounts for all ten of the largest observations, which fall between October 2018 and July 2019. The correlation analysis shows a strong positive relationship between a country's average inflation rate and inflation volatility (Pearson r rounds to 1 across 206 markets), suggesting that higher inflation is associated with greater price instability; a value this close to 1 likely reflects the influence of Venezuela as an extreme outlier. The Welch two-sample t-test shows a statistically significant difference in average monthly food price inflation between Venezuela and Switzerland (t = 3.31, df = 285, p = 0.001).
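The gap between the mean and the median in a right-skewed distribution can be illustrated with a toy example. The values below are hypothetical and chosen only to mimic the pattern, not taken from the dataset.

```r
# Toy data (hypothetical): 99 months at 4% and one hyperinflation month
toy_rates <- c(rep(4, 99), 4000)

toy_mean   <- mean(toy_rates)     # pulled upward by the single extreme value
toy_median <- median(toy_rates)   # unaffected by the extreme value
toy_q3     <- quantile(toy_rates, 0.75, names = FALSE)

cat("Mean:", toy_mean, " Median:", toy_median, " Q3:", toy_q3, "\n")
```

As in the real data, a single extreme observation is enough to push the mean above the third quartile, which is why the median is the more representative summary of a typical month.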

Conclusion

This project demonstrates that while most countries experience relatively stable food price inflation, a small subset of markets faces extreme and volatile inflation conditions. These environments create heightened economic risk for consumers, businesses, and policymakers. The statistically significant difference between Venezuela and Switzerland, the highest- and lowest-inflation markets in the data, illustrates how wide the gap between stable and unstable markets can be and is consistent with the importance of stable monetary and fiscal policies. Future research could explore the underlying causes of extreme inflation or examine how food price inflation impacts food security and income inequality.

LLM Usage Report

A large language model was used to assist with structuring the report and refining the clarity of written explanations. All data analysis, coding, and interpretation of statistical results were performed by the author.

library(tidyverse)

## Warning: package 'readr' was built under R version 4.5.2

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

setwd("C:/stats 2")
read_csv("food_price_inflation.csv")

## Rows: 59839 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): REF_AREA, REF_AREA_LABEL, TIME_PERIOD
## dbl (1): OBS_VALUE
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

## # A tibble: 59,839 × 4
##    REF_AREA REF_AREA_LABEL TIME_PERIOD OBS_VALUE
##    <chr>    <chr>          <chr>           <dbl>
##  1 AFG      Afghanistan    1/1/2001       22.9  
##  2 AFG      Afghanistan    2/1/2001       24.4  
##  3 AFG      Afghanistan    3/1/2001       21.2  
##  4 AFG      Afghanistan    4/1/2001       17.3  
##  5 AFG      Afghanistan    5/1/2001        9.33 
##  6 AFG      Afghanistan    6/1/2001       12.0  
##  7 AFG      Afghanistan    7/1/2001        9.48 
##  8 AFG      Afghanistan    8/1/2001        2.52 
##  9 AFG      Afghanistan    9/1/2001        0.789
## 10 AFG      Afghanistan    10/1/2001      12.5  
## # ℹ 59,829 more rows

# Load necessary libraries
library(tidyverse)
library(dplyr)
library(scales)

## 
## Attaching package: 'scales'

## The following object is masked from 'package:purrr':
## 
##     discard

## The following object is masked from 'package:readr':
## 
##     col_factor

# --- Part 1: Explore and Identify High-Inflation Markets (Simple & Well-Labeled Charts) ---

# 1. Data Import and Structure

data_path <- "food_price_inflation.csv" 
inflation_data <- read_csv(data_path)

## Rows: 59839 Columns: 4

## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): REF_AREA, REF_AREA_LABEL, TIME_PERIOD
## dbl (1): OBS_VALUE
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

inflation_data <- inflation_data %>%
  mutate(
    # Convert TIME_PERIOD to Date format
    TIME_PERIOD = as.Date(TIME_PERIOD, format = "%m/%d/%Y"),
    # Rename for clarity
    Region = REF_AREA_LABEL,
    Inflation_Rate = OBS_VALUE
  ) %>%
  select(REF_AREA, Region, TIME_PERIOD, Inflation_Rate) %>%
  drop_na(Inflation_Rate)

print("1. Data Structure (Head and Summary):")

## [1] "1. Data Structure (Head and Summary):"

print(head(inflation_data))

## # A tibble: 6 × 4
##   REF_AREA Region      TIME_PERIOD Inflation_Rate
##   <chr>    <chr>       <date>               <dbl>
## 1 AFG      Afghanistan 2001-01-01           22.9 
## 2 AFG      Afghanistan 2001-02-01           24.4 
## 3 AFG      Afghanistan 2001-03-01           21.2 
## 4 AFG      Afghanistan 2001-04-01           17.3 
## 5 AFG      Afghanistan 2001-05-01            9.33
## 6 AFG      Afghanistan 2001-06-01           12.0

# ----------------------------------------------------------------------

# 2. Summary Statistics & Key Metrics

mean_inflation <- mean(inflation_data$Inflation_Rate, na.rm = TRUE)
overall_volatility <- sd(inflation_data$Inflation_Rate, na.rm = TRUE)

cat("\n2. Summary Statistics for Inflation_Rate (OBS_VALUE):\n")

## 
## 2. Summary Statistics for Inflation_Rate (OBS_VALUE):

summary_stats <- summary(inflation_data$Inflation_Rate)
print(summary_stats)

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##    -24.98      1.50      4.09     45.48      8.59 371537.50

cat(paste("Overall Inflation Volatility (Std Dev):", round(overall_volatility, 3), "\n"))

## Overall Inflation Volatility (Std Dev): 2864.567

# ----------------------------------------------------------------------

# 3. Simple & Well-Labeled Visualization (Two Key Charts)

## --- Chart 1: Distribution of Global Food Price Inflation (Simple Histogram) ---

inflation_histogram_simple <- ggplot(inflation_data, aes(x = Inflation_Rate)) +
  # Simple histogram bars, slightly transparent
  geom_histogram(binwidth = 2, fill = "darkblue", color = "white", alpha = 0.7) +
  
  # Add mean line with clear label; annotate() draws the label once
  # instead of once per data row
  geom_vline(xintercept = mean_inflation, color = "red", linetype = "dashed", linewidth = 1) +
  annotate("text", x = mean_inflation + 2, y = 1500,
           label = paste("Mean:", round(mean_inflation, 2), "%"),
           color = "red", size = 4) +
  
  labs(
    title = "Global Distribution of Monthly Food Price Inflation Rates",
    x = "Inflation Rate (%)",
    y = "Number of Observations (Frequency)",
    caption = "Data is highly right-skewed: extreme inflation events are rare but very large."
  ) +
  # Use a clean theme
  theme_classic(base_size = 14) +
  theme(plot.title = element_text(face = "bold", hjust = 0.5))

print(inflation_histogram_simple)

## --- Chart 2: Time-series Line Plot for Selected Markets (Raw Line View) ---

# Four markets (by REF_AREA code) selected for comparison
selected_markets_for_ts <- c("AFG", "EUR", "SAS", "AFR")

time_series_plot_simple <- inflation_data %>%
  filter(REF_AREA %in% selected_markets_for_ts) %>%
  mutate(Market_Label = paste0(Region, " (", REF_AREA, ")")) %>%
  
  ggplot(aes(x = TIME_PERIOD, y = Inflation_Rate)) +
  # Raw line plot only (no smoothing)
  geom_line(aes(color = Market_Label), linewidth = 0.8) +
  
  # Use facet_wrap for separate views
  facet_wrap(~ Market_Label, ncol = 2, scales = "free_y") +
  
  labs(
    title = "Food Price Inflation Trends in Select Global Markets",
    subtitle = "Y-axis scales vary for clarity.",
    x = "Time Period (Year)",
    y = "Monthly Inflation Rate (%)"
  ) +
  scale_x_date(date_breaks = "5 years", date_labels = "%Y") +
  # Use a simple theme
  theme_bw(base_size = 12) +
  theme(
    legend.position = "none",
    strip.background = element_rect(fill = "grey90"),
    strip.text = element_text(face = "bold"),
    plot.title = element_text(face = "bold", hjust = 0.5)
  )

print(time_series_plot_simple)

# ----------------------------------------------------------------------

# 4. Outlier Analysis (Non-Visual)

q99 <- quantile(inflation_data$Inflation_Rate, 0.99, na.rm = TRUE)

outliers <- inflation_data %>%
  filter(Inflation_Rate >= q99) %>%
  arrange(desc(Inflation_Rate))

cat("\n4. Outlier Analysis: 99th Percentile Threshold is", round(q99, 3), "%\n")

## 
## 4. Outlier Analysis: 99th Percentile Threshold is 68.376 %

print("Top 10 Extreme Inflation Events:")

## [1] "Top 10 Extreme Inflation Events:"

print(head(outliers, 10))

## # A tibble: 10 × 4
##    REF_AREA Region        TIME_PERIOD Inflation_Rate
##    <chr>    <chr>         <date>               <dbl>
##  1 VEN      Venezuela, RB 2019-02-01         371538.
##  2 VEN      Venezuela, RB 2019-03-01         346805.
##  3 VEN      Venezuela, RB 2019-04-01         276791.
##  4 VEN      Venezuela, RB 2019-01-01         251932.
##  5 VEN      Venezuela, RB 2019-05-01         169200.
##  6 VEN      Venezuela, RB 2018-12-01         143746.
##  7 VEN      Venezuela, RB 2018-11-01         115840.
##  8 VEN      Venezuela, RB 2019-06-01         105928.
##  9 VEN      Venezuela, RB 2018-10-01          67975.
## 10 VEN      Venezuela, RB 2019-07-01          67167.

# 5. Correlation: Mean Inflation vs. Inflation Volatility

market_metrics <- inflation_data %>%
  group_by(Region) %>%
  summarise(
    Mean_Inflation = mean(Inflation_Rate, na.rm = TRUE),
    Inflation_Volatility = sd(Inflation_Rate, na.rm = TRUE),
    N_Obs = n(),
    .groups = 'drop'
  ) %>%
  filter(N_Obs > 10) %>% 
  drop_na(Inflation_Volatility)

correlation_result <- cor(market_metrics$Mean_Inflation, market_metrics$Inflation_Volatility)

cat(sprintf("\nCorrelation between Mean Inflation and Inflation Volatility (N = %d markets): %.3f\n",
            nrow(market_metrics), correlation_result))

## 
## Correlation between Mean Inflation and Inflation Volatility (N = 206 markets): 1.000

## --- Chart 3: Correlation Scatter Plot (Simple) ---

correlation_plot_simple <- ggplot(market_metrics, aes(x = Mean_Inflation, y = Inflation_Volatility)) +
  # Simple, uniform points
  geom_point(color = "darkgreen", alpha = 0.7) +
  # Linear trend line
  geom_smooth(method = "lm", color = "red", se = FALSE, linetype = "dashed") +
  
  labs(
    title = "Relationship Between Average Inflation and Volatility",
    subtitle = paste("Pearson Correlation (r) =", round(correlation_result, 3)),
    x = "Average Monthly Inflation Rate (%)",
    y = "Inflation Volatility (Standard Deviation)",
    caption = "A positive correlation indicates high-inflation markets are typically more volatile."
  ) +
  theme_minimal(base_size = 14) +
  theme(plot.title = element_text(face = "bold", hjust = 0.5))

print(correlation_plot_simple)

## `geom_smooth()` using formula = 'y ~ x'

Global Distribution of Monthly Food Price Inflation Rates (Histogram): The majority of monthly food price inflation observations cluster in the zero to low positive range (median 4.09%, third quartile 8.59%), indicating overall price stability for most data points. The distribution is extremely right-skewed: high-inflation months are rare, but large enough to pull the mean up to 45.48%.

Food Price Inflation Trends in Select Global Markets (Time-series Plot): This visualization separates the raw monthly inflation trends across four markets, highlighting substantial differences in price stability and level. Afghanistan (AFG) exhibits high volatility and frequent large spikes, in contrast to the stable, low inflation rates observed in Europe (EUR).

Relationship Between Average Inflation and Volatility (Correlation Scatter Plot): The scatter plot shows a strong positive correlation (r rounds to 1 across 206 markets) between a market's long-term average food price inflation rate and its price volatility (standard deviation). Markets with higher chronic inflation also face larger, less predictable swings in food prices, increasing economic risk. Because Pearson's r is sensitive to extreme values, a coefficient this close to 1 is likely driven largely by Venezuela, whose mean and standard deviation are far above those of the other high-inflation markets in the top-10 table.
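The sensitivity of Pearson's r to a single extreme market can be sketched with toy data. The values below are hypothetical and only mimic the situation of one market far above the rest; comparing against Spearman's rank correlation is a suggested robustness check, not something the analysis above performed.

```r
# Toy market-level data (hypothetical values): ten markets with a weak
# relationship between mean inflation and volatility
mean_infl  <- 1:10
volatility <- c(5, 3, 8, 1, 9, 2, 7, 4, 10, 6)

r_without <- cor(mean_infl, volatility)

# Add one market whose mean and SD are far larger than the rest
mean_infl_out  <- c(mean_infl, 8000)
volatility_out <- c(volatility, 40000)

r_with   <- cor(mean_infl_out, volatility_out)                       # Pearson
rho_with <- cor(mean_infl_out, volatility_out, method = "spearman")  # rank-based

cat("Pearson without outlier:", round(r_without, 3), "\n")
cat("Pearson with outlier:   ", round(r_with, 3), "\n")
cat("Spearman with outlier:  ", round(rho_with, 3), "\n")
```

In this toy case one added point moves Pearson's r from weak to nearly perfect, while the rank-based coefficient changes far less. Re-running the real correlation without Venezuela, or with `method = "spearman"`, would show how much of the reported relationship holds across the remaining markets.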

# Load necessary libraries
library(tidyverse)
library(dplyr)
library(scales) 

# --- Part 2: Select and Compare Two Markets ---

# 1. Data Import, Cleaning, and Aggregation 
# This step ensures the script is standalone and runnable.

data_path <- "food_price_inflation.csv" 
inflation_data <- read_csv(data_path)

## Rows: 59839 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): REF_AREA, REF_AREA_LABEL, TIME_PERIOD
## dbl (1): OBS_VALUE
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

inflation_data <- inflation_data %>%
  mutate(
    # Ensure TIME_PERIOD is correctly converted to Date
    TIME_PERIOD = as.Date(TIME_PERIOD, format = "%m/%d/%Y"),
    Region = REF_AREA_LABEL,
    Inflation_Rate = OBS_VALUE
  ) %>%
  select(REF_AREA, Region, TIME_PERIOD, Inflation_Rate) %>%
  # Remove missing inflation rates
  drop_na(Inflation_Rate)

# Calculate key market-level metrics (Mean, Volatility, Peak)
market_metrics <- inflation_data %>%
  group_by(Region) %>%
  summarise(
    Mean_Inflation = mean(Inflation_Rate, na.rm = TRUE),
    Inflation_Volatility = sd(Inflation_Rate, na.rm = TRUE),
    Peak_Inflation = max(Inflation_Rate, na.rm = TRUE),
    N_Obs = n(),
    .groups = 'drop'
  ) %>%
  # Filter for markets with sufficient data for reliability
  filter(N_Obs >= 12) %>% 
  drop_na(Inflation_Volatility)


# ----------------------------------------------------------------------

# 2. Identify and Visualize the Top 10 Markets by Average Inflation Rate 

top_10_markets <- market_metrics %>%
  arrange(desc(Mean_Inflation)) %>%
  head(10)

cat("\nTop 10 Markets by Overall Average Inflation Rate:\n")

## 
## Top 10 Markets by Overall Average Inflation Rate:

print(top_10_markets)

## # A tibble: 10 × 5
##    Region               Mean_Inflation Inflation_Volatility Peak_Inflation N_Obs
##    <chr>                         <dbl>                <dbl>          <dbl> <int>
##  1 Venezuela, RB                7967.               40737.        371538.    286
##  2 South Sudan                    73.0                113.           513.    187
##  3 Zimbabwe                       66.2                172.           980.    294
##  4 Sudan                          62.3                 60.2          273.    198
##  5 Lebanon                        52.9                111.           483.    294
##  6 Congo, Dem. Rep.               37.6                110.           635.    194
##  7 Argentina                      35.5                 55.0          308.    294
##  8 Syrian Arab Republic           32.5                 45.2          193.    277
##  9 Angola                         31.8                 32.2          166.    294
## 10 Iran, Islamic Rep.             26.9                 19.5           87.1   294

# Visualization: Top 10 Markets Bar Chart (Simple and Clean)
top_10_plot_simple <- ggplot(top_10_markets, aes(x = reorder(Region, Mean_Inflation), y = Mean_Inflation)) +
  geom_bar(stat = "identity", fill = "#1F77B4", color = "black") + 
  geom_text(aes(label = round(Mean_Inflation, 1)), hjust = -0.1, size = 4) +
  
  coord_flip() + 
  scale_y_continuous(labels = label_percent(scale = 1), name = "Average Monthly Inflation Rate (%)") +
  
  labs(
    title = "Top 10 Global Markets by Average Monthly Inflation",
    x = "Market/Region"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    panel.grid.major.y = element_blank()
  )

print(top_10_plot_simple)

# ----------------------------------------------------------------------

# 3. & 4. Market Selection, Comparison, and Data Output

# Dynamic Selection: Choose the highest and lowest inflation markets
high_market_name <- top_10_markets$Region[1]
low_market_name <- market_metrics %>% 
  arrange(Mean_Inflation) %>% 
  head(1) %>% 
  pull(Region)

cat("\nSelected High-Risk Market:", high_market_name, "\n")

## 
## Selected High-Risk Market: Venezuela, RB

print(paste("Selected Low-Risk Market:", low_market_name))

## [1] "Selected Low-Risk Market: Switzerland"

# Comparison Table: Key Inflation Metrics
comparison_table <- market_metrics %>%
  filter(Region %in% c(high_market_name, low_market_name)) %>%
  select(Region, Mean_Inflation, Inflation_Volatility, Peak_Inflation, N_Obs)

cat("\nComparison Table: Key Inflation Metrics for Selected Markets\n")

## 
## Comparison Table: Key Inflation Metrics for Selected Markets

print(comparison_table)

## # A tibble: 2 × 5
##   Region        Mean_Inflation Inflation_Volatility Peak_Inflation N_Obs
##   <chr>                  <dbl>                <dbl>          <dbl> <int>
## 1 Switzerland            0.488                 1.84           6.48   294
## 2 Venezuela, RB       7967.                40737.        371538.     286

# Filter raw data and categorize for the comparison chart
comparison_data <- inflation_data %>%
  filter(Region %in% c(high_market_name, low_market_name)) %>%
  mutate(Category = ifelse(Region == high_market_name, "High Risk", "Low Risk"))

# ----------------------------------------------------------------------

# 5. Final Simple, Solid, and ZOOMED Boxplot Comparison

comparison_boxplot_final_zoomed <- ggplot(comparison_data, aes(x = Category, y = Inflation_Rate, fill = Category)) +
  
  # Solid Boxplot
  geom_boxplot(color = "black", outlier.color = "red", outlier.size = 2, linewidth = 0.8) + 
  
  labs(
    # Simplified Title
    title = paste("Inflation Distribution:", high_market_name, "vs.", low_market_name),
    subtitle = "Median, Quartiles, and Outliers shown. Y-axis zoomed for clarity.",
    x = "Market/Risk Category",
    y = "Monthly Inflation Rate (%)"
  ) +
  # Use simple, contrasting, solid colors
  scale_fill_manual(values = c("Low Risk" = "darkgreen", "High Risk" = "firebrick")) +
  
  # APPLYING THE ZOOM: Focuses the plot on the range from -5% to 100%
  coord_cartesian(ylim = c(-5, 100)) + 
  scale_y_continuous(labels = label_percent(scale = 1), breaks = seq(0, 100, 20)) +
  
  # Use theme_bw for a clean, classic look
  theme_bw(base_size = 14) +
  theme(
    legend.position = "none",
    plot.title = element_text(face = "bold", hjust = 0.5),
    panel.grid.major.x = element_blank()
  )

print(comparison_boxplot_final_zoomed)

Top 10 Global Markets by Average Monthly Inflation (Bar Chart): This bar chart ranks global markets by their long-term average monthly food price inflation, identifying the ten markets facing the highest sustained price pressure. The top-ranked market, Venezuela, averages roughly 7,967%, more than 100 times the second-ranked market (South Sudan, 73.0%), demonstrating extreme and persistent cost pressure over the period observed.

Inflation Distribution: Venezuela, RB vs. Switzerland (Zoomed Boxplot): The zoomed boxplot illustrates the vast difference in inflation stability between the highest- and lowest-average-inflation markets. The low-risk market's distribution is tightly compressed near zero (low median and volatility), contrasting sharply with the high-risk market, which shows a much higher median and a wide spread, indicating a high baseline inflation level and extreme volatility. The y-axis is limited to the range -5% to 100%, so Venezuela's most extreme observations fall outside the visible area.

# Load necessary libraries
library(tidyverse)
library(dplyr)
library(scales) 

# --- Data Preparation (Re-running Part 2 selection for consistency) ---

data_path <- "food_price_inflation.csv" 
inflation_data <- read_csv(data_path)

## Rows: 59839 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): REF_AREA, REF_AREA_LABEL, TIME_PERIOD
## dbl (1): OBS_VALUE
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

inflation_data <- inflation_data %>%
  mutate(
    TIME_PERIOD = as.Date(TIME_PERIOD, format = "%m/%d/%Y"),
    Region = REF_AREA_LABEL,
    Inflation_Rate = OBS_VALUE
  ) %>%
  select(REF_AREA, Region, TIME_PERIOD, Inflation_Rate) %>%
  drop_na(Inflation_Rate)

# Calculate key market-level metrics to find comparison markets
market_metrics <- inflation_data %>%
  group_by(Region) %>%
  summarise(
    Mean_Inflation = mean(Inflation_Rate, na.rm = TRUE),
    N_Obs = n(),
    .groups = 'drop'
  ) %>%
  filter(N_Obs >= 12)

# Dynamic Selection: Choose the highest and lowest inflation markets
high_market_name <- market_metrics %>% 
  arrange(desc(Mean_Inflation)) %>% 
  head(1) %>% 
  pull(Region)
low_market_name <- market_metrics %>% 
  arrange(Mean_Inflation) %>% 
  head(1) %>% 
  pull(Region)

# Filter raw data and categorize for the statistical test
comparison_data <- inflation_data %>%
  filter(Region %in% c(high_market_name, low_market_name)) %>%
  mutate(Category = ifelse(Region == high_market_name, "High Risk", "Low Risk"))

# Ensure the data contains both categories
if (length(unique(comparison_data$Category)) < 2) {
  stop("Error: Cannot perform t-test. Data for both selected markets is not available.")
}

# ----------------------------------------------------------------------

# --- Part 3: Statistical Test ---

cat("\n--- Part 3: Statistical Test ---\n")

## 
## --- Part 3: Statistical Test ---

cat(paste("Hypothesis Test: Comparing Average Monthly Inflation Rate between", high_market_name, "and", low_market_name, "\n"))

## Hypothesis Test: Comparing Average Monthly Inflation Rate between Venezuela, RB and Switzerland

# 1. Conduct a t-test
# H0 (Null Hypothesis): The true difference in mean inflation rates between the two markets is zero.
# Ha (Alternative Hypothesis): The true difference in mean inflation rates is not zero.

# We use var.equal = FALSE (Welch's t-test) as variances are highly likely to be unequal 
# between a high-volatility and a low-volatility market.
t_test_result <- t.test(Inflation_Rate ~ Category, 
                        data = comparison_data,
                        var.equal = FALSE)

# 2. Report the p-value and Interpretation

print(t_test_result)

## 
##  Welch Two Sample t-test
## 
## data:  Inflation_Rate by Category
## t = 3.3073, df = 285, p-value = 0.001063
## alternative hypothesis: true difference in means between group High Risk and group Low Risk is not equal to 0
## 95 percent confidence interval:
##   3225.387 12708.047
## sample estimates:
## mean in group High Risk  mean in group Low Risk 
##            7967.2046979               0.4878354

# Extract key statistics
p_value <- t_test_result$p.value
mean_high <- t_test_result$estimate[1]
mean_low <- t_test_result$estimate[2]
test_statistic <- t_test_result$statistic

cat(paste("\nTest Statistic (t):", round(test_statistic, 3), "\n"))

## 
## Test Statistic (t): 3.307

cat(paste("P-value:", format.pval(p_value, digits = 5), "\n"))

## P-value: 0.0010628

if (p_value < 0.05) {
  interpretation <- "Since the p-value is less than 0.05, we reject the null hypothesis (H0)."
  conclusion <- paste0("There is statistically significant evidence to suggest that the average monthly inflation rate in ", high_market_name, " is different from the average rate in ", low_market_name, ".")
} else {
  interpretation <- "Since the p-value is not less than 0.05, we fail to reject the null hypothesis (H0)."
  conclusion <- paste0("There is no statistically significant evidence to suggest that the average monthly inflation rate differs between ", high_market_name, " and ", low_market_name, ".")
}

cat(paste("\nStatistical Interpretation:", interpretation, "\n"))

## 
## Statistical Interpretation: Since the p-value is less than 0.05, we reject the null hypothesis (H0).

cat(paste("Final Conclusion:", conclusion, "\n"))

## Final Conclusion: There is statistically significant evidence to suggest that the average monthly inflation rate in Venezuela, RB is different from the average rate in Switzerland.

# 3. Provide a business explanation (This is a narrative part of the report):
# Based on the results, you would explain *why* this difference exists (or doesn't exist)
# focusing on economic factors (e.g., instability, currency devaluation, conflict, policy control, etc.)
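As a cross-check, the Welch t statistic can be approximately reproduced from the summary statistics printed earlier in this report. This is a sketch under one stated assumption: the standard deviations are taken from the rounded comparison table (40737 and 1.84), so the result matches the `t.test()` output only approximately.

```r
# Summary statistics as printed in the comparison table and t-test output.
# The standard deviations are rounded there, so the result is approximate.
mean_high <- 7967.2046979; sd_high <- 40737; n_high <- 286   # Venezuela, RB
mean_low  <- 0.4878354;    sd_low  <- 1.84;  n_low  <- 294   # Switzerland

# Squared standard error contributed by each group
se2_high <- sd_high^2 / n_high
se2_low  <- sd_low^2 / n_low

# Welch t statistic
t_stat <- (mean_high - mean_low) / sqrt(se2_high + se2_low)

# Welch-Satterthwaite degrees of freedom
df_welch <- (se2_high + se2_low)^2 /
  (se2_high^2 / (n_high - 1) + se2_low^2 / (n_low - 1))

# Two-sided p-value
p_val <- 2 * pt(-abs(t_stat), df = df_welch)

cat("t =", round(t_stat, 3), " df =", round(df_welch, 1), " p =", signif(p_val, 3), "\n")
```

The degrees of freedom come out almost exactly at 285, which is the Venezuela sample size minus one, because nearly all of the variance in the comparison comes from Venezuela. The t statistic is modest despite the enormous difference in means for the same reason: Venezuela's standard error is very large.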

# Load necessary libraries
library(tidyverse)
library(dplyr)
library(scales) 

# --- Data Preparation (Re-running Part 2 selection for consistency) ---

data_path <- "food_price_inflation.csv" 
inflation_data <- read_csv(data_path)

## Rows: 59839 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): REF_AREA, REF_AREA_LABEL, TIME_PERIOD
## dbl (1): OBS_VALUE
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

inflation_data <- inflation_data %>%
  mutate(
    TIME_PERIOD = as.Date(TIME_PERIOD, format = "%m/%d/%Y"),
    Region = REF_AREA_LABEL,
    Inflation_Rate = OBS_VALUE
  ) %>%
  select(REF_AREA, Region, TIME_PERIOD, Inflation_Rate) %>%
  drop_na(Inflation_Rate)

# Calculate key market-level metrics (Mean, Volatility, Peak)
market_metrics <- inflation_data %>%
  group_by(Region) %>%
  summarise(
    Mean_Inflation = mean(Inflation_Rate, na.rm = TRUE),
    Inflation_Volatility = sd(Inflation_Rate, na.rm = TRUE),
    Peak_Inflation = max(Inflation_Rate, na.rm = TRUE),
    N_Obs = n(),
    .groups = 'drop'
  ) %>%
  filter(N_Obs >= 12) %>% 
  drop_na(Inflation_Volatility)

# ----------------------------------------------------------------------

# --- Part 4: Decision Support Data ---

# 1. Identify Top Candidates for Intervention
# Prioritize by Mean Inflation (chronic suffering) and then select the ones with highest volatility (risk).

decision_support_table <- market_metrics %>%
  arrange(desc(Mean_Inflation), desc(Inflation_Volatility)) %>%
  # Select the top 5 markets for policy review
  head(5) %>%
  select(Region, Mean_Inflation, Inflation_Volatility, Peak_Inflation, N_Obs) %>%
  # Rename columns for clarity in the policy brief
  rename(
    Average_Inflation_Pct = Mean_Inflation,
    Inflation_Volatility_SD = Inflation_Volatility,
    Peak_Inflation_Pct = Peak_Inflation
  )

cat("\n--- Part 4: Top 5 Markets for Policy Intervention Review ---\n")

## 
## --- Part 4: Top 5 Markets for Policy Intervention Review ---

print(decision_support_table)

## # A tibble: 5 × 5
##   Region   Average_Inflation_Pct Inflation_Volatility…¹ Peak_Inflation_Pct N_Obs
##   <chr>                    <dbl>                  <dbl>              <dbl> <int>
## 1 Venezue…                7967.                 40737.             371538.   286
## 2 South S…                  73.0                  113.                513.   187
## 3 Zimbabwe                  66.2                  172.                980.   294
## 4 Sudan                     62.3                   60.2               273.   198
## 5 Lebanon                   52.9                  111.                483.   294
## # ℹ abbreviated name: ¹Inflation_Volatility_SD

# 2. Policy Prioritization (Narrative based on the table output)

# Based on the table: The markets with the highest Average_Inflation_Pct (Mean Inflation) 
# and highest Volatility (SD) are the top candidates.

Onnika DeBruhl

2025-12-08
