Karim M Data 110 Final

Data 110 Final

Image source: https://www.latimes.com/world-nation/story/2020-08-25/beirut-blast-unanswered-questions-conspiracy-theories

Intro:

I will be doing my final project on Lebanese product prices from the year 2013 to the current year 2025 using data sourced from the World Food Programme (wfp.org). In the final project I will be looking at specific time frames in the last decade, and how they’ve affected the value of products in the Lebanese economy. I will particularly look at the years 2020 (beginning of Covid) to the present year 2025. The data was collected using surveys, market monitoring, among other tools to monitor global and country wide economic standards. I chose this topic because ethnically I am Lebanese, it’s where I’m from and it will always be a piece of me. I wanted to explore the data set and see how prices of goods were impacted by major events in the past 5 years in Lebanon. Last time I was there it was 2014, and prices were viable and often affordable, with products being very accessible. Since the collapse of the banking system, the Beirut Port explosion, Covid, subsidy losses, etc. prices have skyrocketed and have made basic goods impossible to find or way too expensive for a large part of the population.

# Load data set and libraries, and removes the first row that has repetitive column labels.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(plotly)
Warning: package 'plotly' was built under R version 4.4.3

Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
library(ggfortify)
Warning: package 'ggfortify' was built under R version 4.4.3
setwd("C:/Users/kmard/Desktop/Data Mine")
leb_food_prices <- read_csv("wfp_food_prices_lbn.csv", skip = 1, show_col_types = FALSE)
#Check Column names before renaming them.
colnames(leb_food_prices)
 [1] "#date"            "#adm1+name"       "#adm2+name"       "#loc+market+name"
 [5] "#loc+market+code" "#geo+lat"         "#geo+lon"         "#item+type"      
 [9] "#item+name"       "#item+code"       "#item+unit"       "#item+price+flag"
[13] "#item+price+type" "#currency+code"   "#value"           "#value+usd"      
#Renamed the Columns.
leb_food_prices <- leb_food_prices |>
  rename(Date = `#date`, Governance = `#adm1+name`, District = `#adm2+name`, Market = `#loc+market+name`, `Market Code` = `#loc+market+code`, Latitude = `#geo+lat`, Longitude = `#geo+lon`, `Item Type` = `#item+type`, `Item Name` = `#item+name`, `Item Code` = `#item+code`, `Item Unit` = `#item+unit`, `Price Flag` = `#item+price+flag`, `Price Retail` = `#item+price+type`, Currency = `#currency+code`, `LBP Value` = `#value`, `USD Value` = `#value+usd`)
# Look at unique Item Types.
unique(leb_food_prices$`Item Type`)
[1] "meat, fish and eggs"   "milk and dairy"        "oil and fats"         
[4] "vegetables and fruits" "non-food"              "pulses and nuts"      
[7] "cereals and tubers"    "miscellaneous food"   
# Look at unique Item Names and sorted in alphabetic order.
unique(leb_food_prices$`Item Name`)
 [1] "Meat (beef, canned)"               "Cheese (picon)"                   
 [3] "Oil (sunflower)"                   "Eggs"                             
 [5] "Fish (tuna, canned)"               "Fish (sardine, canned)"           
 [7] "Cabbage"                           "Lettuce"                          
 [9] "Meat (chicken, whole, frozen)"     "Cucumbers (greenhouse)"           
[11] "Spinach"                           "Fuel (diesel)"                    
[13] "Fuel (petrol-gasoline, 95 octane)" "Lentils (green)"                  
[15] "Lentils (red)"                     "Fuel (gas)"                       
[17] "Bread (pita)"                      "Pasta (spaghetti)"                
[19] "Wheat flour"                       "Milk (powder)"                    
[21] "Sugar (white)"                     "Tomatoes (paste)"                 
[23] "Beans (white)"                     "Chickpeas"                        
[25] "Bulgur (brown)"                    "Salt"                             
[27] "Rice (imported, Egyptian)"         "Tea"                              
[29] "Lentils"                           "Apples"                           
[31] "Carrots"                           "Potatoes"                         
[33] "Exchange rate (unofficial)"       
sort(unique(leb_food_prices$`Item Name`))
 [1] "Apples"                            "Beans (white)"                    
 [3] "Bread (pita)"                      "Bulgur (brown)"                   
 [5] "Cabbage"                           "Carrots"                          
 [7] "Cheese (picon)"                    "Chickpeas"                        
 [9] "Cucumbers (greenhouse)"            "Eggs"                             
[11] "Exchange rate (unofficial)"        "Fish (sardine, canned)"           
[13] "Fish (tuna, canned)"               "Fuel (diesel)"                    
[15] "Fuel (gas)"                        "Fuel (petrol-gasoline, 95 octane)"
[17] "Lentils"                           "Lentils (green)"                  
[19] "Lentils (red)"                     "Lettuce"                          
[21] "Meat (beef, canned)"               "Meat (chicken, whole, frozen)"    
[23] "Milk (powder)"                     "Oil (sunflower)"                  
[25] "Pasta (spaghetti)"                 "Potatoes"                         
[27] "Rice (imported, Egyptian)"         "Salt"                             
[29] "Spinach"                           "Sugar (white)"                    
[31] "Tea"                               "Tomatoes (paste)"                 
[33] "Wheat flour"                      
#Separated the descriptors from the Product Names.
leb_food_prices <- leb_food_prices |>
  separate(`Item Name`, into = c("Product", "Product Detail"), sep = " \\(", fill = "right") |>
  mutate(`Product Detail` = str_remove(`Product Detail`, "\\)$"))
# Source on how to split the character columns: https://tidyr.tidyverse.org/reference/separate.html
# Arranges the Product column alphabetically
unique(leb_food_prices$Product)
 [1] "Meat"          "Cheese"        "Oil"           "Eggs"         
 [5] "Fish"          "Cabbage"       "Lettuce"       "Cucumbers"    
 [9] "Spinach"       "Fuel"          "Lentils"       "Bread"        
[13] "Pasta"         "Wheat flour"   "Milk"          "Sugar"        
[17] "Tomatoes"      "Beans"         "Chickpeas"     "Bulgur"       
[21] "Salt"          "Rice"          "Tea"           "Apples"       
[25] "Carrots"       "Potatoes"      "Exchange rate"
leb_food_prices <- leb_food_prices |>
  arrange(Product)
# Check for any NAs.
colSums(is.na(leb_food_prices))
          Date     Governance       District         Market    Market Code 
             0              0              0              0              0 
      Latitude      Longitude      Item Type        Product Product Detail 
             0              0              0              0           5918 
     Item Code      Item Unit     Price Flag   Price Retail       Currency 
             0              0              0              0              0 
     LBP Value      USD Value 
             0              0 
# Cleans NAs in the Product Detail column.
lfp_clean <- leb_food_prices |>
  filter(!is.na(`Product Detail`))
# Check the data sets date range.
range(lfp_clean$Date)
[1] "1/15/2013" "9/15/2024"
# Convert characters in Date to Numerical.
lfp_2 <- lfp_clean |>
  mutate(Date = as.Date(Date, format = "%m/%d/%Y")) |>
  filter(Date >= as.Date("2020-01-15") & Date <= as.Date("2025-03-15"))
# Source: https://stat.ethz.ch/R-manual/R-devel/library/base/html/as.Date.html
# Filter for 4 cheapest food products in Lebanon.
cheapest_goods_lb <- lfp_2 |>
  filter(Product %in% c("Bread", "Bulgur", "Cheese", "Cucumbers"))
# Line graph that shows cheapest product inflation from 2020 - 2025 in Lebanon.
ggplot(cheapest_goods_lb, aes(x = Date, y = `USD Value`, color = Product)) +
  geom_line(alpha = 0.6, size = 0.8) +
  scale_color_manual(values = c("Bread" = "#0000CC","Bulgur" = "#0099CC","Cheese" = "#00CC66","Cucumbers" = "#FF00FF")) +
  labs(title = "Cheapest Food Products in USD (2020–2025)", x = "Year", y = "USD Value", color = "Cheap Foods",
       caption = "Source: WFP: World Food Programme") +
  theme_bw(base_size = 10) +
  theme(plot.title = element_text(size = 16, hjust = 0.5))
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

Chart 1

I chose the “cheapest” foods from the data set because these products are commonly eaten foods throughout Lebanon. Although these are common breakfast items that are widely cultivated and eaten in Lebanon, the combination of inflation and lack of financial support from the government has made these common household foods very expensive.

Linear Model Analysis

  • Model Equation:

    • USD Value = -746.6 (Intercept) + 0.0398 (Cents per Day) x Date + Product (Type)
  • The product type effects the model based on the value of said food/product. For example: things like meat, milk, fuel and oil are more valuable, thus altering the model.

  • From the linear regression model below we can see that the p_value is essentially 0.0000… making it extremely statistically significant. There is a clear positive correlation between the dates and increase in food prices, those confounding factors are in relation to the countries, economic collapse, the Beirut port explosion in 2020, among other factors.

  • R^2 Value: 0.3201, explains 32% of the variation is related to the type of food and the date.

#Linear Regression
model <- lm(`USD Value` ~ Date + Product, data = lfp_2)
summary(model)

Call:
lm(formula = `USD Value` ~ Date + Product, data = lfp_2)

Residuals:
    Min      1Q  Median      3Q     Max 
-162.30  -37.39   -0.03   13.19  482.60 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)          -7.466e+02  3.648e+01 -20.466  < 2e-16 ***
Date                  3.983e-02  1.889e-03  21.086  < 2e-16 ***
ProductBread         -1.598e+01  4.587e+00  -3.484 0.000496 ***
ProductBulgur        -1.001e+01  4.580e+00  -2.185 0.028876 *  
ProductCheese         6.204e+00  4.681e+00   1.325 0.185043    
ProductCucumbers     -5.509e+00  5.762e+00  -0.956 0.339074    
ProductExchange rate -1.139e+01  1.255e+01  -0.908 0.364114    
ProductFish          -1.292e+00  4.023e+00  -0.321 0.748125    
ProductFuel           1.260e+02  3.460e+00  36.415  < 2e-16 ***
ProductLentils        1.468e+01  4.030e+00   3.643 0.000270 ***
ProductMeat           1.267e+01  4.356e+00   2.908 0.003646 ** 
ProductMilk           5.265e+01  4.667e+00  11.281  < 2e-16 ***
ProductOil            9.844e+01  4.267e+00  23.069  < 2e-16 ***
ProductPasta         -1.187e+01  4.580e+00  -2.592 0.009555 ** 
ProductRice          -9.521e+00  4.671e+00  -2.038 0.041551 *  
ProductSugar          4.085e+01  4.454e+00   9.172  < 2e-16 ***
ProductTomatoes      -4.996e+00  4.720e+00  -1.058 0.289901    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 81.64 on 11884 degrees of freedom
Multiple R-squared:  0.3201,    Adjusted R-squared:  0.3191 
F-statistic: 349.6 on 16 and 11884 DF,  p-value: < 2.2e-16
#Linear Regression Model showing the increase in food items prices (USD) based on events from 2020-2025
ggplot(lfp_2, aes(x = Date, y = `USD Value`)) +
  geom_point(alpha = 0.3, color = "#5cd11a") +
  geom_smooth(method = "lm", se = TRUE, color = "red", linetype = "dashed") +
  
  geom_vline(xintercept = as.Date("2020-08-04"), color = "#178ae5") +
  annotate("text", x = as.Date("2020-08-04"), y = 500, label = "Beirut Port Explosion", angle = 90, vjust = -0.5, size = 3.5) +
  
  geom_vline(xintercept = as.Date("2021-03-01"), color = "#178ae5") +
  annotate("text", x = as.Date("2021-03-01"), y = 500, label = "LBP Currency Crisis", angle = 90, vjust = -0.5, hjust = 0.5, size = 3.5) +
  
  geom_vline(xintercept = as.Date("2021-07-01"), color = "#178ae5") +
  annotate("text", x = as.Date("2021-07-01"), y = 500, label = "Subsidy Removal", angle = 90, vjust = -0.5, size = 3.5) +
  
  geom_vline(xintercept = as.Date("2022-06-01"), color = "#178ae5") +
  annotate("text", x = as.Date("2022-06-01"), y = 500, label = "Hyperinflation", angle = 90, vjust = -0.5, size = 3.5) +
  
  geom_vline(xintercept = as.Date("2023-02-01"), color = "#178ae5") +
  annotate("text", x = as.Date("2023-02-01"), y = 500, label = "Peak Inflation", angle = 90, vjust = -0.5, size = 3.5) +
  
  labs(title = "Product Prices in Lebanon 2020-2025 ($)", x = "Year", y = "USD Value", caption = "Source: WFP: World Food Programme", subtitle = ) +
  theme_bw(base_size = 8) +
  theme(plot.title = element_text(size = 16, hjust = 0.5))
`geom_smooth()` using formula = 'y ~ x'

Chart 2

I used a scatter plot with a trend line to explain the relation with price increases based on the time frame (2020-2025). Based on the linear regression model it explains that everyday prices raised on average 40 cents per day, which varies depending on the type of product. In the chart we can see a clear positive correlation between the two variables, and I’ve marked on that chart specific time frames were important events have occurred that would cause an influx in product prices.

par(mfrow = c(2, 2))
autoplot(model)

Diagnostic Plot Analysis

The diagnostic plots don’t really have an linearity due to the massive confounding factors that come into play when considering the correlation between time and the costs of foods. Certain products that are commonly produced in Lebanon were not greatly affected. Things like fuel which is imported, became massive outliers, while basic foods like bread, cheese, and cucumbers were not heavily affected (although still expensive for the average Lebanese person).

ggplotly(ggplot(lfp_2, aes(x = Date, y = `USD Value`)) +
    geom_point(aes(color = Product), alpha = 0.5) +
    labs(title = "Food Prices in Lebanon 2020-2025 ($)", x = "Year", y = "USD Value", caption = "Source: WFP: World Food Programme"))

Chart 3

The third chart is simply an interactive version of the second chart. It shows the specific products and their price per unit (weight) in correlation to the same time period of 2020 to 2025. Fuel for example was hit the hardest when the Lebanese government removed subsides causing a massive price hike that would also affect agriculture and food production. Due to the LBP losing it’s value, a lot of goods entered the black market making the USD way more valuable. Gasoline, cooking oil, electronics, etc. all were sold on the BM at way higher prices.

Essay

  1. The data set I analyzed is the Lebanon Food Prices data set from the WFP: World Food Programme data base. The data set is complied of dates collected in monthly intervals (15th of every month) from the year 2013 to March 15th 2025. The variables of the data set are, Lebanese governance, district, district code, market, market code, longitude, latitude, item type, product, product characteristics (detail), item code, item weight (unit), retail and actual price, currency, and the Lebanese and USD value. I will look to first rename and organize the columns so that the data is readable, I will split the item+name category into individual items and then a descriptor column. I will then filter for the specific time frame that I am looking for and then perform a linear regression analysis with a diagnostic plot. I chose this data set because I am Lebanese and have experienced the countries market when it was stable. I haven’t been back since 2014 and was curious to see how the recent tragic events have affected the countries economy.
  1. In the past 5 years Lebanon has faced one of the worst economic collapses in the countries history. Lebanon has dealt with an extraordinary amount of external and internal conflicts that continue to cripple the country. Lebanon contains nearly 600 to 900 thousand Syrian refugees from the Arab Spring, and an additional 90 thousand Palestinian refugees creating a massive bottle neck on the food that is available in circulation. In August of 2020 Beirut’s main port had exploded (how the explosion occurred is still unknown), which created a price hike on essential goods. In March of 2021 the Lebanese Pound went from 1500 LBP to around 10000 LBP ($1 USD = 10000 LBPs) making import costs increase at rates that were not sustainable. In 2021 the Lebanese government had removed subsidy support from essentials like fuel, medication, and certain basic produce that created a massive price spike that increased that cost of living once again. 2022 and 2023 saw the worst inflation in the countries history with rates of a nearly 50% increase per month, and in the current year although prices are still high for a majority of the population the price of goods has stabilized.

  2. The visualizations simply represent the price increase in accordance to time. I chose 2020 to the current date because those are the most prolific events in recent time (the Beirut Port explosion was the largest non-nuclear explosion in modern history according to the NIH. My visuals figuratively show an economic explosion of sorts, in the linear regression chart I used the Lebanese flag colors with the scatter plot being green like the Lebanese Cedar tree on the center of the flag. The last visual is simply an interactive plot that you can easily navigate to see price changes on specific goods.

source: https://en.wikipedia.org/wiki/Flag_of_Lebanon

Sources:

https://www.wfp.org/news/food-insecurity-deepens-lebanon-following-conflict-new-report-shows

https://www.hrw.org/report/2021/08/03/they-killed-us-inside/investigation-august-4-beirut-blast

https://pmc.ncbi.nlm.nih.gov/articles/PMC7985624/#:~:text=The%20largest%20non%2Dnuclear%20blast,%2D19)%20pandemic%20were%20massive.

https://lebanon.un.org/en/230731-lebanon-emergency-response-plan-2023?afd_azwaf_tok=eyJhbGciOiJSUzI1NiJ9.eyJhdWQiOiJsZWJhbm9uLnVuLm9yZyIsImV4cCI6MTc0NzA5NDU0NiwiaWF0IjoxNzQ3MDk0NTM2LCJpc3MiOiJ0aWVyMS04NTlkNjk3ODY5LXB3ajliIiwic3ViIjoiMTI2OjUyMDE6ODA0ZjpkMDM1Ojc3MzA6YTU2OTo3ODg0OmIwYTYiLCJkYXRhIjp7InR5cGUiOiJpc3N1ZWQiLCJyZWYiOiIyMDI1MDUxM1QwMDAyMTZaLTE4NTlkNjk3ODY5cHdqOWJoQzFCTDE5cTU4MDAwMDAwMDlwMDAwMDAwMDAwMDI0MyIsImIiOiJiNlJDelJwUjZuSFlFbjJQckZYSWdiYWQyN2tRaW5CbkQtVUpoSkxSR0pFIiwiaCI6InRYTjliSkRBYXI1Yk9UNnRKdkhyNGt3b3F0YldycDhkWmR3c1NoX3pYMFEifX0.aPXqf5m6w7LbfsjhheieWscVoSAWwTEiy-eO-0X0vU0oKEFnN7EFOFza9-FS4VR0NAr6hxM6hNPn9sVkglLSk_U9QJgqmSUTPSNedn6fIoJKvK_Cdu67NxYN2XBFpln5WTG36OxiCnCX4fzi0efq2RKddcug8KVwND6_3y0CEbLhLlTb61MBPpGATMCI6qQGFVM027e_2X-s28mKALLAMLsj26nQXpbLkirf7Ws-2ovFmVVpGBBa2p0FaycyH3epmkRKK7JUXY9BsVQYSba9Q0gPGkTv2i4jvKX6rjXHTrqNwenS0-3ob07m1WmFvqRG1PF8aNSrjnG8Lx5FwS825w.WF3obl2IDtqgvMFRqVdYkD5s

Data for project and resources to performing certain cleanings:

https://data.humdata.org/dataset/wfp-food-prices-for-lebanon

https://stat.ethz.ch/R-manual/R-devel/library/base/html/as.Date.html

https://tidyr.tidyverse.org/reference/separate.html