Train Loading Statistics

Relationship Between Volume and Load Time

Author

Eric Clugy

Published

April 29, 2026

Introduction

This project analyzes train tracking data to examine the relationship between shipment volume and total load time. The dataset includes operational records such as the number of barrels (BBLS) loaded onto trains and the total hours required to complete each load. Understanding this relationship is important because load time is a key factor in operational efficiency and scheduling within transportation systems.

It is reasonable to expect that larger shipment volumes require more time to load and process. As the number of barrels increases, additional time may be needed for handling, coordination, and equipment usage. This aligns with general operational principles where increased workload typically results in longer processing times.

If my model is correct, there will be a positive linear relationship between shipment volume (BBLS) and total hours.

Descriptive Statistics

Summary Statistics
Variable         Mean        Median      Variance         SD
Hours            24.58       24          45.90            6.77
Volume (BBLS)    69,406.22   71,343.35   28,919,438.00    5,377.68
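As a quick consistency check on the table above, the standard deviation should equal the square root of the variance. The sketch below uses only the values reported in the table; the variable names are illustrative and not part of the original analysis.

```python
import math

# Values copied from the summary statistics table.
reported = {
    "Hours": {"variance": 45.90, "sd": 6.77},
    "Volume (BBLS)": {"variance": 28_919_438.00, "sd": 5_377.68},
}

# Recompute each SD from its reported variance, rounded to two decimals
# to match the precision of the table.
recomputed_sd = {
    name: round(math.sqrt(values["variance"]), 2)
    for name, values in reported.items()
}

for name, values in reported.items():
    print(f"{name}: reported SD = {values['sd']}, sqrt(variance) = {recomputed_sd[name]}")
```

Both recomputed values agree with the reported SDs to two decimal places, so the variance and SD columns are internally consistent.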

Plots

Dependent Variable: Hours

The histogram of load time shows that most loads take between approximately 20 and 30 hours, with a concentration around the mid-20s. The distribution is slightly right-skewed, with a few higher-duration observations, including one extreme outlier.

The histogram of shipment volume, the independent variable, indicates that volumes are concentrated between approximately 65,000 and 72,000 BBLS. The distribution is slightly left-skewed, with a few lower-volume observations that appear as outliers.
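The skew directions read off the two histograms can be cross-checked against the summary statistics. A mean above the median usually suggests right skew, and a mean below the median usually suggests left skew. This is only a rule of thumb, and the helper name below is hypothetical; the numbers are the reported means and medians.

```python
def skew_direction(mean: float, median: float) -> str:
    """Rough heuristic: guess the skew direction from mean versus median."""
    if mean > median:
        return "right"
    if mean < median:
        return "left"
    return "none"


# Reported mean and median for each variable.
hours_skew = skew_direction(mean=24.58, median=24.0)
volume_skew = skew_direction(mean=69_406.22, median=71_343.35)

print("Hours:", hours_skew)
print("Volume (BBLS):", volume_skew)
```

The heuristic agrees with the histograms: load time leans right and shipment volume leans left.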

The scatterplot shows a positive relationship between shipment volume and total load time, as indicated by the upward-sloping regression line. However, the data points are widely dispersed, suggesting that the relationship is relatively weak.

Correlation and Covariance

Correlation and Covariance
Measure        Value
Correlation    0.1767
Covariance     6,437.1680

The correlation between shipment volume and total load time is 0.1767, indicating a weak positive relationship. As volume increases, load time tends to increase, but the relationship is not strong.

The covariance is 6,437.17, which also indicates a positive relationship, though it is less directly interpretable because it depends on the scale of the data.
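The link between the two measures can be made explicit: the correlation is the covariance divided by the product of the two standard deviations, which removes the dependence on scale. The sketch below recomputes the correlation from the reported covariance and variances; it is a check on the reported figures, not the code used to produce them.

```python
import math

# Reported values from the tables above.
cov_xy = 6437.1680          # covariance of volume and hours
var_hours = 45.90           # variance of hours
var_volume = 28_919_438.00  # variance of volume (BBLS)

# Pearson correlation: r = cov(x, y) / (sd_x * sd_y)
r = cov_xy / math.sqrt(var_hours * var_volume)

print(f"Recomputed correlation: {r:.4f}")
```

The result matches the reported correlation of 0.1767 to four decimal places.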

This analysis uses a single independent variable, so no additional plots for multiple variables are included.

Regression Analysis

Regression Results: Volume vs Load Time
                       Dependent variable:
                       Hours
Volume (BBLS)          0.0002**
                       (0.0001)
Constant               9.1290
                       (7.6900)
Observations           128
R²                     0.0312
Adjusted R²            0.0235
Residual Std. Error    6.6947 (df = 126)
F Statistic            4.0601** (df = 1; 126)
Note: *p<0.1; **p<0.05; ***p<0.01

The regression results show that shipment volume has a statistically significant positive association with total load time at the 5% level (p = 0.046).

The coefficient for volume is 0.0002226 hours per barrel, indicating that load time increases as shipment volume increases. An increase of 1,000 BBLS is associated with an increase of approximately 0.22 hours, or about 13 minutes.
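To make the coefficient concrete, the fitted line can be written as a small function. This is a minimal sketch using the reported intercept (9.1290) and slope (0.0002226); the function name is hypothetical, and predictions are only meaningful for volumes within the range observed in the data.

```python
INTERCEPT = 9.1290   # reported constant, in hours
SLOPE = 0.0002226    # reported coefficient, in hours per barrel


def predicted_hours(volume_bbls: float) -> float:
    """Predicted load time in hours from the fitted simple regression line."""
    return INTERCEPT + SLOPE * volume_bbls


# Change in predicted load time for an extra 1,000 BBLS.
extra_hours_per_1000 = SLOPE * 1000
extra_minutes_per_1000 = extra_hours_per_1000 * 60

# Prediction at the reported mean volume.
at_mean_volume = predicted_hours(69_406.22)

print(f"Extra time per 1,000 BBLS: {extra_hours_per_1000:.4f} h "
      f"({extra_minutes_per_1000:.1f} min)")
print(f"Predicted hours at mean volume: {at_mean_volume:.2f}")
```

The prediction at the mean volume comes out very close to the mean load time of 24.58 hours, as expected because a least-squares line passes through the point of means.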

However, the R² value of 0.031 indicates that only about 3.1% of the variation in load time is explained by shipment volume. This suggests that while volume has a measurable impact, other factors likely play a larger role in determining load time.
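For a regression with a single predictor, the key figures in the regression table follow directly from the summary statistics, which is why a correlation of 0.1767 translates into an R² of about 3.1%. The sketch below recomputes them from the reported values as a cross-check. It is not the original estimation code, and small differences from the table are due to rounding in the inputs.

```python
# Reported inputs.
n = 128                      # observations
cov_xy = 6437.1680           # covariance of volume and hours
var_volume = 28_919_438.00   # variance of volume (BBLS)
mean_hours = 24.58
mean_volume = 69_406.22
r = 0.1767                   # correlation

# Simple linear regression identities.
slope = cov_xy / var_volume                       # b1 = cov(x, y) / var(x)
intercept = mean_hours - slope * mean_volume      # b0 = mean(y) - b1 * mean(x)
r_squared = r ** 2                                # R^2 = r^2 with one predictor
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)
f_stat = r_squared / (1 - r_squared) * (n - 2)    # F with (1, n - 2) df

print(f"slope = {slope:.7f}, intercept = {intercept:.3f}")
print(f"R^2 = {r_squared:.4f}, adjusted R^2 = {adj_r_squared:.4f}, F = {f_stat:.2f}")
```

The recomputed values line up with the reported slope, constant, R², adjusted R², and F statistic to within rounding.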

Conclusion

The results partially support the hypothesis of a positive relationship between shipment volume and load time. While the relationship is statistically significant, it is weak, indicating that volume explains only a small portion of the variation in load time. This suggests that other operational factors play a larger role. An extreme outlier in load time was also observed and may have influenced the results. Overall, volume contributes to load time, but it is not the primary driver.

Works Cited:

Clugy, Eric. Train Tracking Data. Unpublished dataset. Harvest Midstream, 2026.