This report provides an in-depth analysis of TxDOT bid tab data, focused on evaluating vendor competitiveness and predicting pricing. It includes exploratory data analysis, win rate comparison, predictive modeling, spatial visualizations, and strategic insights to guide pricing and bid strategy.
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.0 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## Linking to GEOS 3.11.2, GDAL 3.8.2, PROJ 9.3.1; sf_use_s2() is TRUE
##
## To enable caching of data, set `options(tigris_use_cache = TRUE)`
## in your R script or .Rprofile.
##
## Loading required package: lattice
##
##
## Attaching package: 'caret'
##
##
## The following object is masked from 'package:purrr':
##
## lift
##
##
## corrplot 0.94 loaded
##
## randomForest 4.7-1.2
##
## Type rfNews() to see new features/changes/bug fixes.
##
##
## Attaching package: 'randomForest'
##
##
## The following object is masked from 'package:dplyr':
##
## combine
##
##
## The following object is masked from 'package:ggplot2':
##
## margin
##
##
## Rows: 378 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (11): DISTRICT, CCSJ #, COUNTY, COORDINATES, PROJECT LET DATE, PRICE PER...
## dbl (4): MMM MILES, VULCAN MILES, TONS, CONTRACT DAYS
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## PRICE PER TON BID TOTAL AMOUNT % OVER / UNDER TONS
## Min. : 80.0 Min. : 45448 Min. :-20.3200 Min. : 325
## 1st Qu.:110.9 1st Qu.: 267128 1st Qu.: -2.7375 1st Qu.: 2025
## Median :127.8 Median : 315867 Median : 0.0000 Median : 2690
## Mean :129.5 Mean : 501055 Mean : 0.2854 Mean : 3941
## 3rd Qu.:144.7 3rd Qu.: 614485 3rd Qu.: 1.6475 3rd Qu.: 5000
## Max. :182.8 Max. :3714805 Max. : 37.5300 Max. :21000
## CONTRACT DAYS
## Min. :120
## 1st Qu.:180
## Median :180
## Mean :214
## 3rd Qu.:183
## Max. :365
Price per ton ranges widely; moderate correlations found with tonnage and % over/under estimates
Summary: Most contracts are small to mid-sized in tonnage, with haul distances clustering below 300 miles
##
## Call:
## lm(formula = price_per_ton ~ tons + contract_days + pct_over_under,
## data = model_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -52.908 -14.783 -2.119 14.926 67.510
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.544e+02 3.147e+00 49.073 < 2e-16 ***
## tons -5.569e-04 3.415e-04 -1.631 0.10375
## contract_days -1.070e-01 1.329e-02 -8.048 1.13e-14 ***
## pct_over_under 4.616e-01 1.647e-01 2.803 0.00533 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.22 on 374 degrees of freedom
## Multiple R-squared: 0.1802, Adjusted R-squared: 0.1736
## F-statistic: 27.4 on 3 and 374 DF, p-value: 4.882e-16
##
## Call:
## randomForest(formula = price_per_ton ~ tons + contract_days + pct_over_under, data = model_data, importance = TRUE)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 1
##
## Mean of squared residuals: 164.8794
## % Var explained: 72.34
## [1] 10.24695
## [1] 0.8507934
Random forest model identified tonnage and estimate deviation as key price drivers with good predictive performance (high R²). The random forest model achieved an R² of 0.72, indicating that the predictors explain 72% of the variation in price per ton.
## # A tibble: 2 × 5
## `VENDOR NAME` Count Avg_Price Avg_Tons Avg_Contract_Days
## <chr> <int> <dbl> <dbl> <dbl>
## 1 MARTIN MARIETTA MATERIALS, INC. 54 136. 3092. 196.
## 2 VULCAN CONSTRUCTION MATERIALS, LLC 72 121. 4577. 228.
Martin’s contracts tend to be larger, but Vulcan shows tighter pricing.
Win rates vary significantly by county, with each vendor showing regional advantages.
Vulcan dominates southern/coastal counties; Martin Marietta leads inland.
Awarded prices are higher in remote or lower-density counties, consistent with increased logistics costs.
Tonnage-weighted prices highlight true economic load, emphasizing cost differentials in higher-volume counties.
Martin Marietta and Vulcan tend to price lower on larger contracts and shorter distances; pricing rises with complexity.
Prices have shown gradual increases, suggesting inflationary or market trend effects over time.
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:randomForest':
##
## combine
## The following object is masked from 'package:dplyr':
##
## combine
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Average project sizes are increasing while % Over/Under variance is stabilizing, suggesting more consistent bidding behavior.
This section outlines data-driven actions Martin Marietta can take to increase win rates and pricing competitiveness across Texas regions.
To improve bid success and pricing efficiency, Martin Marietta should consider the following actions based on observed trends: