Post-Harvest Loss and Agripreneur Profitability Analysis

Author

Zephania Mwangi

Code
library(readxl)
library(tidyverse)
Code
library(knitr)
library(scales)

Code
# Load the data set
data <- read_excel("Nigeria_agri_dataset.xlsx")
dim(data)
[1] 5000   20
Code
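# Derived variables
data <- data %>%
  mutate(
    # Share of the harvest lost to spoilage, in percent
    PHL_Percent = (Spoilage_Amount_kg / Total_Harvest_kg) * 100,
    # Revenue from unspoiled produce, spread over every kg harvested
    Revenue_per_kg = (Market_Price_NGN_per_kg * (Total_Harvest_kg - Spoilage_Amount_kg)) / Total_Harvest_kg,
    # Flag transport distances above 50 km
    long_distance = ifelse(Transport_Distance_km > 50, "Long", "Short"),
    # Market value recovered if spoilage were cut by 10%
    projected_savings = Spoilage_Amount_kg * 0.10 * Market_Price_NGN_per_kg
  )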

1 Project Overview

This project analyzes post-harvest loss, profitability, and operational efficiency among youth agripreneurs in Nigeria. It uses real-world agricultural data to identify patterns and recommend interventions to reduce spoilage and increase revenue.

2 Objectives

  • To identify factors contributing to post-harvest losses.
  • To assess the impact of storage, transport, and training on spoilage.
  • To evaluate how training and technology influence revenue.
  • To provide regional insights for targeted policy interventions.

3 Data Description

  • Source: Nigerian agricultural field survey on youth agripreneurs.
  • Variables: Includes crop type, storage and transport methods, spoilage, revenue, training status, and environmental conditions.
  • Time Range: Not explicitly provided.
  • Size: 5,000 observations and 20 variables in the raw file (before the derived variables are added).

4 Data Cleaning & Preparation

  • Calculated post-harvest loss percentage (PHL_Percent).
  • Calculated adjusted revenue per kg (Revenue_per_kg).
  • Created indicators for long-distance transport and potential savings.
  • Excluded NA values from numeric summaries where needed (for example, na.rm = TRUE in the group means).
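
To make the derived variables concrete, the sketch below applies the same formulas to two made-up records. The column names follow the report, but the values are hypothetical and chosen only so the arithmetic is easy to check by hand.

```r
library(dplyr)

# Hypothetical toy records (not survey data), using the report's column names
toy <- data.frame(
  Total_Harvest_kg        = c(1000, 500),
  Spoilage_Amount_kg      = c(200, 50),
  Market_Price_NGN_per_kg = c(150, 300),
  Transport_Distance_km   = c(60, 12)
)

toy_derived <- toy %>%
  mutate(
    # 200 / 1000 = 20% and 50 / 500 = 10%
    PHL_Percent = Spoilage_Amount_kg / Total_Harvest_kg * 100,
    # Revenue from the unspoiled share, per kg harvested
    Revenue_per_kg = Market_Price_NGN_per_kg *
      (Total_Harvest_kg - Spoilage_Amount_kg) / Total_Harvest_kg,
    # Same 50 km cut-off as the report
    long_distance = ifelse(Transport_Distance_km > 50, "Long", "Short"),
    # Value of recovering 10% of the spoiled produce
    projected_savings = Spoilage_Amount_kg * 0.10 * Market_Price_NGN_per_kg
  )

print(toy_derived)
```

In the first record, a 20% loss reduces the effective revenue from 150 to 120 NGN per kg harvested, and recovering a tenth of the spoiled 200 kg would be worth 3,000 NGN.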

5 Exploratory Data Analysis (EDA)

5.1 Summary Statistics

Code
summary(select(data, PHL_Percent, Revenue_per_kg, Transport_Distance_km, Revenue_Loss_NGN))
  PHL_Percent      Revenue_per_kg   Transport_Distance_km Revenue_Loss_NGN
 Min.   : 0.9623   Min.   : 33.29   Min.   :  1.000       Min.   :   16   
 1st Qu.:14.4958   1st Qu.: 86.09   1st Qu.:  5.675       1st Qu.: 2926   
 Median :19.2300   Median :143.61   Median : 13.900       Median : 5870   
 Mean   :19.2509   Mean   :173.62   Mean   : 19.921       Mean   : 8283   
 3rd Qu.:24.0003   3rd Qu.:255.00   3rd Qu.: 27.100       3rd Qu.:11359   
 Max.   :48.3473   Max.   :479.65   Max.   :169.300       Max.   :64215   

5.2 Top Crop–Storage Combinations with Lowest Loss

Code
data %>%
  group_by(Crop_Type, Storage_Method) %>%
  summarise(avg_phl = mean(PHL_Percent, na.rm = TRUE), .groups = "drop") %>%
  arrange(avg_phl) %>%
  head(5) %>%
  kable()
Crop_Type   Storage_Method   avg_phl
Yam         Crates           18.43386
Tomato      Crates           18.48072
Tomato      Open shed        18.49088
Cassava     Crates           18.61356
Maize       Cold Storage     18.61755

5.3 Spoilage vs Storage Duration

Code
ggplot(data, aes(x = Storage_Duration_Days, y = PHL_Percent)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "loess") +
  labs(title = "PHL vs Storage Duration", x = "Storage Days", y = "PHL (%)")
`geom_smooth()` using formula = 'y ~ x'

5.4 Revenue Loss by Transport Distance

Code
ggplot(data, aes(x = Transport_Distance_km, y = Revenue_Loss_NGN)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", color = "red") +
  labs(title = "Revenue Loss vs Transport Distance", x = "Distance (km)", y = "Loss (NGN)")
`geom_smooth()` using formula = 'y ~ x'

5.5 Market Access and Spoilage

Code
ggplot(data, aes(x = Market_Access, y = PHL_Percent)) +
  geom_boxplot(fill = "skyblue") +
  labs(title = "PHL by Market Access", x = "Market Access", y = "PHL (%)")

6 Modeling

Code
# Multiple linear regression: PHL ~ storage duration + humidity + temperature
model <- lm(PHL_Percent ~ Storage_Duration_Days + Humidity_Percent + Temperature_C, data = data)
summary(model)

Call:
lm(formula = PHL_Percent ~ Storage_Duration_Days + Humidity_Percent + 
    Temperature_C, data = data)

Residuals:
     Min       1Q   Median       3Q      Max 
-20.8894  -3.4220   0.0153   3.3712  17.3315 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)           -29.77685    0.80549  -36.97   <2e-16 ***
Storage_Duration_Days   0.92034    0.02298   40.05   <2e-16 ***
Humidity_Percent        0.16678    0.00495   33.70   <2e-16 ***
Temperature_C           0.99363    0.02343   42.41   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 5.029 on 4996 degrees of freedom
Multiple R-squared:  0.4729,    Adjusted R-squared:  0.4726 
F-statistic:  1494 on 3 and 4996 DF,  p-value: < 2.2e-16
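
One way to read these coefficients is to plug a scenario into the fitted equation. The sketch below does this by hand with the estimates printed above. The scenario values (10 days, 70% humidity, 30 °C) are hypothetical and serve only as an illustration.

```r
# Coefficients copied from the summary(model) output above
coefs <- c(
  intercept             = -29.77685,
  Storage_Duration_Days =   0.92034,
  Humidity_Percent      =   0.16678,
  Temperature_C         =   0.99363
)

# Hypothetical storage scenario (illustrative values, not survey data)
scenario <- c(
  Storage_Duration_Days = 10,
  Humidity_Percent      = 70,
  Temperature_C         = 30
)

# Linear prediction: intercept plus the sum of coefficient * value
predicted_phl <- unname(
  coefs["intercept"] + sum(coefs[names(scenario)] * scenario)
)

round(predicted_phl, 2)  # 20.91
```

With the fitted object available, predict(model, newdata = ...) returns the same kind of estimate without copying coefficients by hand. The model explains about 47% of the variance in PHL, so individual predictions carry substantial error (residual standard error of about 5 percentage points).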

7 Results & Discussion

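  • Post-harvest loss rises with storage time: the regression in Section 6 estimates about 0.92 percentage points of additional loss per storage day, holding humidity and temperature constant, and the increase is sharpest after 30 days of storage.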
  • Long-distance transport (>50 km) is associated with higher revenue loss.
  • Regions differ significantly in spoilage and revenue efficiency.
  • Trained farmers and those using technology generally experience lower losses.
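
A group comparison such as the one in the last bullet could be computed along the following lines. This is a minimal sketch on hypothetical toy rows: the column names Training_Received and Tech_Used come from the data set, but the "Yes"/"No" coding and all values are assumptions.

```r
library(dplyr)

# Hypothetical toy rows; the "Yes"/"No" coding is an assumption
toy <- data.frame(
  Training_Received = c("Yes", "Yes", "No", "No", "Yes", "No"),
  Tech_Used         = c("Yes", "No", "Yes", "No", "Yes", "No"),
  PHL_Percent       = c(12, 16, 18, 24, 14, 22)
)

# Average loss and group size for each training/technology combination
phl_by_group <- toy %>%
  group_by(Training_Received, Tech_Used) %>%
  summarise(
    n       = n(),
    avg_phl = mean(PHL_Percent, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(avg_phl)

print(phl_by_group)
```

Reporting n alongside the mean shows how many records support each average. A formal claim that groups differ would also need a test, for example t.test() or aov(), which this sketch does not run.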

8 Limitations

  • No clear time reference in the dataset (harvest year or month missing).
  • Some missing values in critical numeric fields.
  • Potential underreporting or misclassification in Tech_Used and Training_Received.
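
The extent of missing data can be quantified per column before deciding how to handle it. The sketch below uses a hypothetical toy frame with gaps. On the real data, the same calls would run on the full data frame.

```r
# Hypothetical toy frame with gaps (not survey data)
toy <- data.frame(
  Spoilage_Amount_kg = c(200, NA, 50, NA),
  Total_Harvest_kg   = c(1000, 800, NA, 600),
  Tech_Used          = c("Yes", NA, "No", "No")
)

# Number and percentage of missing values in each column
na_counts <- colSums(is.na(toy))
na_share  <- round(100 * colMeans(is.na(toy)), 1)

# Rows with no missing value in any column
n_complete <- sum(complete.cases(toy))

print(data.frame(na_counts, na_share))
print(n_complete)
```

Comparing n_complete with the total row count shows how much data a listwise deletion would discard, which is worth knowing before fitting models that silently drop incomplete rows.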

9 Conclusion & Recommendations

  • Invest in improved, region-specific storage methods for high-loss crops.
  • Encourage training and technology adoption among youth farmers.
  • Prioritize transport infrastructure in regions where produce travels long distances (>50 km) to market, to reduce spoilage.
  • Use insights from this report to design dashboards and mobile alerts for farmers.

10 References

  • Data set: Nigeria Youth Agripreneur Survey
  • R packages: readxl, tidyverse (including dplyr and ggplot2), knitr, scales
  • Methodologies: EDA, multiple linear regression