Technical Methodology & Model Construction

Data Provenance: Baseline market benchmarks were synthesized from the USDA AgTransport Open Data Portal.

Variable Engineering: I engineered proprietary variables to simulate full, actualized operational data used to reduce costing.

1. Executive Summary

This report identifies “System Leakage” within FFE operations by pinpointing “False Wins”—loads that look profitable but result in a net loss. By auditing internal costs by month and Rate Per Mile by destination, we isolate the specific market ceilings and high-burn months eroding our margins. The goal is to pivot toward high-density freight to ensure every trailer maximizes value.


2. Data Preparation

Cleaning The Data

Identified Oxnard Routes as variance outliers. The variance for these are all higher than 10.

# Load and clean data
FFE <- read_excel("C:FFE_DATA_R.xlsx") %>% 
  na.omit()

# Define the Audit Status to categorize load performance
FFE_clean <- FFE %>%
  filter(Variance < 10) %>% 
  mutate(
    Audit_Status = case_when(
      Operating_Ratio <= 100 ~ "Profitable/Healthy",
      `City_Specific_Variance_Percentage` > 0 ~ "System Leakage (False Win)",
      TRUE ~ "Known Loss (Below Target)"
    )
  )



FFE_clean %>%
  select(Date, Origin, Destination, Distance, `Rate Per Mile`, Operating_Ratio) %>%
  head(5) %>%
  kable(
    caption = "Snapshot of FFE Logistics Data (First 5 Rows)",
    col.names = c("Date", "Origin", "Destination", "Distance (Miles)", "RPM ($)", "Operating Ratio (%)"),
    align = "l"
  )
Snapshot of FFE Logistics Data (First 5 Rows)
Date Origin Destination Distance (Miles) RPM ($) Operating Ratio (%)
2024-01-09 IMPERIAL, COACHELLA VALLEYS CA, CENTRAL AND WESTERN AZ, MEXICO CROSSINGS THROUGH CALEXICO AND SAN LUIS ATLANTA 2100 2.571429 95.83333
2024-01-09 IMPERIAL, COACHELLA VALLEYS CA, CENTRAL AND WESTERN AZ, MEXICO CROSSINGS THROUGH CALEXICO AND SAN LUIS BALTIMORE 2600 2.615385 92.64706
2024-01-09 IMPERIAL, COACHELLA VALLEYS CA, CENTRAL AND WESTERN AZ, MEXICO CROSSINGS THROUGH CALEXICO AND SAN LUIS BOSTON 2900 2.534483 94.89796
2024-01-09 IMPERIAL, COACHELLA VALLEYS CA, CENTRAL AND WESTERN AZ, MEXICO CROSSINGS THROUGH CALEXICO AND SAN LUIS CHICAGO 2030 2.192118 112.75281
2024-01-09 IMPERIAL, COACHELLA VALLEYS CA, CENTRAL AND WESTERN AZ, MEXICO CROSSINGS THROUGH CALEXICO AND SAN LUIS DALLAS 1300 2.692308 96.42857

3. Visualizing the “Leakage Zone”

The following visualization isolates loads based on their pricing variance vs. their profitability. The Red Zone (Upper Right) is our primary concern: these are loads where we charged more than the city average, yet we still lost money. This suggests that high-cost factors—such are outstripping our pricing premiums. In these cases, the “win” on price is being negated by operational leakage.

# Generate the Leakage Visual
ggplot(FFE_clean, aes(x = `City_Specific_Variance_Percentage`, y = Operating_Ratio, color = Audit_Status)) +
  geom_point(alpha = 0.5, size = 2.5) +
  
  # Crosshair lines for benchmarks
  geom_hline(yintercept = 100, linetype = "dashed", color = "black", size = 0.8) + 
  geom_vline(xintercept = 0, linetype = "dashed", color = "blue", size = 0.8) +
  
  # The "Leakage" Zone (Upper Right Quadrant)
  annotate("rect", xmin = 0, xmax = Inf, ymin = 100, ymax = Inf, 
           fill = "red", alpha = 0.2) +
  
  # Audit Colors
  scale_color_manual(values = c("System Leakage (False Win)" = "red", 
                                "Known Loss (Below Target)" = "orange", 
                                "Profitable/Healthy" = "darkgreen")) +
  
  # Formatting X-axis as %
  scale_x_continuous(labels = scales::percent) +
  
  labs(title = "FFE Audit: City-Specific Margin Leakage",
       subtitle = "Red Zone isolates loads priced above city average with an Operating Ratio > 100",
       x = "City Specific Variance (%)",
       y = "Operating Ratio (%)",
       caption = "Dataset: FFE | Filtered for Variance < 10 | Target = City-Specific Average RPM") +
  
  theme_minimal() +
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold", size = 14),
        legend.title = element_blank())

Key Finding: 100% of the identified ‘System Leakage’ loads are concentrated in the Chicago lane, suggesting a specific operational cost or terminal efficiency issue rather than a pricing error
System_Leakage_Results <- FFE_clean %>%
filter(Audit_Status == "System Leakage (False Win)")

System_Leakage_Results %>%
  head(11) %>%
  select(Date, Origin, Destination, Distance, Operating_Ratio, City_Specific_Variance_Percentage) %>%
  kable(
    caption = "Top 11 System Leakage Loads (Chicago Lane)",
    col.names = c("Date", "Origin", "Destination", "Distance (Miles)", "Operating Ratio (%)", "Pricing Variance (%)")
  )
Top 11 System Leakage Loads (Chicago Lane)
Date Origin Destination Distance (Miles) Operating Ratio (%) Pricing Variance (%)
2024-01-16 IMPERIAL, COACHELLA VALLEYS CA, CENTRAL AND WESTERN AZ, MEXICO CROSSINGS THROUGH CALEXICO AND SAN LUIS CHICAGO 2030 102.3980 0.0096005
2024-01-16 KERN DISTRICT CALIFORNIA CHICAGO 2030 100.3500 0.0302046
2024-01-16 OXNARD DISTRICT CALIFORNIA CHICAGO 2070 102.1500 0.0102972
2024-03-12 IMPERIAL, COACHELLA VALLEYS CA, CENTRAL AND WESTERN AZ, MEXICO CROSSINGS THROUGH CALEXICO AND SAN LUIS CHICAGO 2030 102.3980 0.0096005
2024-03-12 KERN DISTRICT CALIFORNIA CHICAGO 2030 101.3636 0.0199025
2024-03-19 IMPERIAL, COACHELLA VALLEYS CA, CENTRAL AND WESTERN AZ, MEXICO CROSSINGS THROUGH CALEXICO AND SAN LUIS CHICAGO 2030 102.3980 0.0096005
2024-03-19 KERN DISTRICT CALIFORNIA CHICAGO 2030 101.3636 0.0199025
2024-08-13 SAN JOAQUIN VALLEY CALIFORNIA CHICAGO 2200 100.0000 0.0266457
2024-08-20 SAN JOAQUIN VALLEY CALIFORNIA CHICAGO 2200 100.0000 0.0266457
2024-08-27 SAN JOAQUIN VALLEY CALIFORNIA CHICAGO 2200 100.0000 0.0266457
2024-09-03 SAN JOAQUIN VALLEY CALIFORNIA CHICAGO 2200 100.0000 0.0266457

4. Visualization: Looking at Distance as a Variable

The data reveals a clear “Economy of Scale” for our long-haul business. For trips over 500 miles, the trend is undeniable: as the distance of a load increases, our profit margin improves.

By targeting long-haul shipments, we spread our fixed operational costs over a wider revenue base, which structurally lowers our Operating Ratio. This makes every additional mile driven beyond the 1,000-mile mark a direct contribution to our net profitability.

# Core OTR Insight: Showing how longer distances drive the Operating Ratio down.
# Filtering out the short-haul regional outliers that begin in Oxnard (Distance < 500).
ggplot(FFE %>% filter(Distance > 500), 
       aes(x = Distance, y = Operating_Ratio)) +
  geom_point(alpha = 0.3, color = '#2c3e50') +
  
  # Regression line showing the efficiency trend (Negative Slope)
  geom_smooth(method = "lm", color = "#e74c3c", size = 1.5, se = TRUE) +
  
  # The "Break-even" Threshold
  geom_hline(yintercept = 100, linetype = "dashed", color = "black", alpha = 0.6) +
  
  labs(title = "Distance vs. Efficiency: The Core Trend",
       subtitle = "Once regional outliers are removed, OTR efficiency improves over distance.",
       x = "Trip Distance (Miles)", 
       y = "Operating Ratio (%)",
       caption = "Regression Slope is Negative | Long-haul loads are more efficient than mid-haul.") +
  
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 18))

5. Rates By City

This boxplot compares the Rate Per Mile (RPM) across our major long-haul destinations. It reveals a clear pricing hierarchy: Miami and Dallas consistently command the highest premiums, while Boston sits at the bottom of the OTR scale.

The wide “boxes” in cities like Dallas show we have more room to negotiate higher spot rates, whereas the tight distribution in Seattle suggests we are stuck at a fixed market ceiling.

One question to consider:

  • Should we stop hauling “light” freight to Chicago and only take high-density loads to force our revenue-per-trailer higher?
# --- Pricing Heat Check: Boxplots of Rates by City ---

# Filter for long-haul OTR (Distance > 500)
# Sort destinations by median Rate Per Mile
city_order <- FFE %>%
  filter(Distance > 500) %>%
  group_by(Destination) %>%
  summarise(median_rpm = median(`Rate Per Mile`)) %>%
  arrange(desc(median_rpm)) %>%
  pull(Destination)

ggplot(FFE %>% filter(Distance > 500), 
       aes(x = `Rate Per Mile`, y = factor(Destination, levels = city_order), fill = Destination)) +
  geom_boxplot(alpha = 0.7, outlier.shape = NA) +
  scale_fill_viridis_d(option = "mako") +
  
  labs(title = "Pricing Heat Check: Rate Per Mile by Destination",
       subtitle = "Comparing distribution and median rates across major OTR lanes.",
       x = "Rate Per Mile ($)",
       y = "Destination Market") +
  
  theme_minimal() +
  theme(legend.position = "none", plot.title = element_text(face = "bold", size = 16))

6. Monthly Expenditure Audit: Operating Costs

This chart breaks down exactly what it costs to run our trucks each month. By adding up the Linehaul (miles) and Pickup and Delivery (terminal fees), we can see exactly when we’re burning the most cash. If our contract rates are locked in, these high-spend months are the “danger zones” where we might be working hard just to break even.

One Questions To Consider

  • Cost vs. Revenue: During our highest spending months (like April or October), are our customer rates increasing enough to cover the spike in Pickup and Delivery costs?
# 1. Clean and Calculate
FFE_monthly <- FFE %>%
  mutate(
    Month_Num = as.numeric(format(Date, "%m")),
    # Convert number to full name (January, February, etc.)
    Month_Name = factor(month.name[Month_Num], levels = month.name),
    # Linehaul + Pickup and Delivery (P&D)
    Linehaul_Cost = Distance * 2.25,
    PD_Cost = 250 + (Terminal_Touches * 75),
    Total_Cost = Linehaul_Cost + PD_Cost
  ) %>%
  group_by(Month_Name) %>%
  summarise(Total_Operating_Cost = sum(Total_Cost, na.rm = TRUE))

# 2. Plot
ggplot(FFE_monthly, aes(x = Month_Name, y = Total_Operating_Cost)) +
  geom_bar(stat = "identity", fill = "#2980b9") +
  geom_text(aes(label = scales::dollar(Total_Operating_Cost)), 
            vjust = -0.5, size = 3, fontface = "bold") +
  labs(title = "Monthly Expenditure Audit: Total Operating Cost",
       subtitle = "Total spend using Linehaul costs and Pickup and Delivery costs.",
       x = "Month",
       y = "Total Expenditure ($)") +
  scale_y_continuous(labels = scales::dollar, expand = expansion(mult = c(0, 0.15))) +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 16),
        axis.text.x = element_text(angle = 45, hjust = 1))

7. Conclusion

The data confirms that volume does not guarantee profit. While long-haul routes show structural efficiency, the “Chicago Leakage” proves that even high-rate loads can result in a net loss.

This audit underscores the importance of a high-density strategy. By prioritizing heavy freight and identifying “False Win” loads, FFE ensures that operational costs don’t outpace revenue.

Maintaining this margin-first focus is the only way to turn high-burn months into consistent bottom-line wins.