The contents have been reviewed and validated by Arnaud Amzallag on Friday, November 07, 2025.
Introduction
I logged receipts for my gas purchases around 2011-2012, and I have a spreadsheet with gallons and price. I also continued logging odometer readings until 2024, so I know the mileage driven during this period. I use both spreadsheets to estimate how much I spent on gas during the last decade.
Therefore, this report analyzes fuel consumption data for a Toyota Sienna minivan using two data sources:
sienna_gas2.csv: Individual gas fill-up records with dates, volume, odometer readings, and prices
odometer_sienna.csv: Periodic odometer readings over multiple years
The analysis calculates the vehicle’s actual fuel economy (MPG) and estimates annual fuel consumption and costs.
Data Preparation
Loading Gas Fill-up Data
Code
#| message: false
# to publish: rsconnect::rpubsUpload("Gas Costs of Arnaud's Sienna - 2011-2024", "sienna_fuel_analysis.html", "sienna_fuel_analysis.qmd")
# to update: rsconnect::rpubsUpload("Gas Costs of Arnaud's Sienna - 2011-2024", "sienna_fuel_analysis.html", "sienna_fuel_analysis.qmd", id = "https://api.rpubs.com/api/v1/document/1364883/83efae3e43f847c491802d1cdc1b718e")
library(tidyverse)
Code
library(lubridate)
library(gt)

# Load the gas fill-up data
gas_data <- read_csv("data/sienna_gas2.csv")
Rows: 14 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): date
dbl (3): volume, miles, gallon.price
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
# Fix any date issues (replace "0011" with "2011" if present)
gas_data_clean <- gas_data |>
  # mutate(date = str_replace_all(date, "0011", "2011")) |>
  mutate(date = mdy(date)) |>
  arrange(date) |>
  filter(!is.na(miles)) |>
  mutate(miles_driven = miles - lag(miles)) |>
  mutate(mpg = miles_driven / volume) |>
  mutate(days_since_last_fillup = as.numeric(date - lag(date)))
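The MPG column uses the tank-to-tank method: miles driven since the previous fill-up divided by the gallons pumped at the current one. This is only valid when every fill-up tops the tank off and none goes unrecorded. The code above does not show how the "reliable" fill-ups were selected, so the base-R sketch below assumes, hypothetically, that a fill-up is reliable when its MPG falls inside a plausible 8-30 MPG window. The numbers are toy values, not the real sienna_gas2.csv.

```r
# Toy fill-up log (illustrative values only, not the real sienna_gas2.csv)
fills <- data.frame(
  miles  = c(120000, 120310, 120620, 121400, 121700),  # odometer at each fill-up
  volume = c(18, 20, 19.5, 12, 19)                      # gallons pumped
)

# Tank-to-tank MPG: miles since the previous fill-up / gallons pumped now
# (equivalent to miles - lag(miles) in the dplyr pipeline)
fills$miles_driven <- c(NA, diff(fills$miles))
fills$mpg <- fills$miles_driven / fills$volume

# Hypothetical plausibility window; values outside it usually mean a
# skipped log entry or a partial fill
reliable <- !is.na(fills$mpg) & fills$mpg >= 8 & fills$mpg <= 30

median_mpg <- median(fills$mpg[reliable])
sum(reliable)  # number of fill-ups kept
median_mpg
```

A median is a natural summary here: one unlogged fill-up (the 65 MPG row in the toy data) would pull a mean far from the typical value.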
Key Finding: The vehicle achieves a median fuel economy of 15.6 MPG based on 6 reliable fill-up records.
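The code that loads `odo` from odometer_sienna.csv is not shown in this report. A minimal base-R sketch, assuming the file has a `date` column in ISO yyyy-mm-dd format and a `mileage` column (the column names are inferred from how `odo` is used in the LOESS code; the date format and the helper name `load_odometer` are assumptions, and a different date format would need a `format =` argument to `as.Date()`):

```r
# Hypothetical loader: assumes columns `date` (yyyy-mm-dd) and `mileage`
load_odometer <- function(path) {
  odo <- read.csv(path, stringsAsFactors = FALSE)
  odo$date <- as.Date(odo$date)                          # parse ISO date strings
  odo <- odo[!is.na(odo$date) & !is.na(odo$mileage), ]   # drop incomplete rows
  odo <- odo[order(odo$date), ]                          # chronological order
  rownames(odo) <- NULL
  odo
}

# Usage with a small throwaway file (illustrative values only)
path <- tempfile(fileext = ".csv")
writeLines(c("date,mileage",
             "2016-05-01,92000",
             "2015-03-10,80000",
             "2017-01-15,99000",
             "2018-02-01,"), path)
odo_demo <- load_odometer(path)
odo_demo
```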
Yearly Usage Analysis
Methodology: LOESS Curve Fitting
To estimate annual mileage, we fit a LOESS (locally estimated scatterplot smoothing) curve with span = 0.15 to the odometer data. Because the readings were taken at irregular intervals, the fitted curve is used to predict mileage on specific dates within the observed range.
Annual mileage is calculated as the difference between predicted mileage on January 1st and December 31st of each year.
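The Jan 1 / Dec 31 differencing does not depend on the smoother itself. To keep the arithmetic easy to check, the sketch below swaps LOESS for plain linear interpolation (base R `approx()`) on toy readings; this is not the method used in this report, and the dates, mileages, and the helper name `mileage_on` are hypothetical.

```r
# Toy odometer readings, one per year (illustrative values only)
odo_toy <- data.frame(
  date    = as.Date(c("2015-07-01", "2016-07-01", "2017-07-01")),
  mileage = c(50000, 56000, 62000)
)

# Mileage on any date inside the observed range, by straight-line
# interpolation between the two nearest readings (stand-in for the LOESS fit)
mileage_on <- function(d) {
  approx(x = as.numeric(odo_toy$date), y = odo_toy$mileage,
         xout = as.numeric(as.Date(d)))$y
}

# Annual mileage = predicted Dec 31 reading - predicted Jan 1 reading
miles_2016 <- mileage_on("2016-12-31") - mileage_on("2016-01-01")
round(miles_2016)
```

In this toy case the result is about 5,992 miles. Because the stretch from December 31 to the next January 1 is never counted, this differencing leaves out one day of driving per year (about 16 miles here); differencing consecutive January 1 predictions would avoid that small gap.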
Code
# Convert dates to numeric for loess (days since first observation)
odo_with_numeric <- odo |>
  mutate(days_since_start = as.numeric(date - min(date)))

# Fit loess model
loess_model <- loess(mileage ~ days_since_start, data = odo_with_numeric, span = 0.15)

# Get the range of years in the data
year_range <- year(min(odo$date)):year(max(odo$date))

# For each year, predict mileage on Jan 1 and Dec 31
yearly_predictions <- tibble(year = year_range) |>
  mutate(
    jan1_date = ymd(paste0(year, "-01-01")),
    dec31_date = ymd(paste0(year, "-12-31")),
    jan1_days = as.numeric(jan1_date - min(odo$date)),
    dec31_days = as.numeric(dec31_date - min(odo$date))
  ) |>
  filter(jan1_date >= min(odo$date), dec31_date <= max(odo$date)) |>
  mutate(
    jan1_mileage = predict(loess_model, newdata = data.frame(days_since_start = jan1_days)),
    dec31_mileage = predict(loess_model, newdata = data.frame(days_since_start = dec31_days)),
    miles_driven = dec31_mileage - jan1_mileage
  )
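LOESS fits local polynomials and does not constrain the curve to be non-decreasing, even though an odometer can only go up. With a small span such as 0.15 and sparse readings, the fit could dip locally, which would understate some yearly totals. A quick sanity check, sketched in base R with a hypothetical helper name and toy readings:

```r
# Hypothetical helper: TRUE if a fitted mileage curve never decreases
# on a daily grid from day 0 to max_day
is_non_decreasing_fit <- function(model, max_day) {
  grid <- data.frame(days_since_start = seq(0, max_day, by = 1))
  pred <- predict(model, newdata = grid)
  all(diff(pred) >= 0, na.rm = TRUE)
}

# Toy readings at irregular intervals, steady 16 miles/day (illustrative only)
toy <- data.frame(days_since_start = c(0, 200, 380, 700, 900, 1100, 1500, 1800, 2200, 2600))
toy$mileage <- 60000 + 16 * toy$days_since_start

# Default span here because the toy set has only 10 points
toy_fit <- loess(mileage ~ days_since_start, data = toy)
is_non_decreasing_fit(toy_fit, max(toy$days_since_start))
```

On the report's data the call would be `is_non_decreasing_fit(loess_model, max(odo_with_numeric$days_since_start))`; if it returned FALSE, one option would be a larger `span`.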
LOESS Fit Visualization
Code
# Generate predictions for smooth curve
pred_data <- tibble(days_since_start = seq(0, max(odo_with_numeric$days_since_start), by = 10)) |>
  mutate(
    date = min(odo$date) + days_since_start,
    mileage_pred = predict(loess_model, newdata = data.frame(days_since_start = days_since_start))
  )

# Create points for Jan 1 and Dec 31 predictions
year_points <- yearly_predictions |>
  pivot_longer(cols = c(jan1_mileage, dec31_mileage),
               names_to = "type",
               values_to = "mileage") |>
  mutate(
    date = if_else(type == "jan1_mileage", jan1_date, dec31_date),
    label = if_else(type == "jan1_mileage", "Jan 1", "Dec 31")
  )

ggplot() +
  geom_point(data = odo, aes(x = date, y = mileage), size = 2, alpha = 0.5) +
  geom_line(data = pred_data, aes(x = date, y = mileage_pred), color = "blue", linewidth = 1) +
  # Triangles for the Jan 1 / Dec 31 predictions; this layer supplies the
  # colour mapping that scale_color_manual() and the subtitle refer to
  geom_point(data = year_points, aes(x = date, y = mileage, color = label),
             size = 3, shape = 17) +
  scale_color_manual(values = c("Jan 1" = "darkgreen", "Dec 31" = "darkred")) +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = "Odometer Readings with LOESS Fit",
    subtitle = "Triangles show predicted mileage on Jan 1 (green) and Dec 31 (red) of each year",
    x = "Date",
    y = "Mileage",
    color = "Prediction Point"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
Calculate Yearly Metrics
For fuel consumption and cost estimates, we assume a fuel economy of 15 MPG (slightly below the measured median of 15.6 MPG) and calculate costs using a price range of $3.00 to $4.00 per gallon.
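The yearly gallons and cost range follow directly from the mileage estimates: gallons = miles / MPG, and cost = gallons × price. A base-R sketch with placeholder mileages (in the report they would come from `yearly_predictions$miles_driven`), using the 15 MPG assumption and the $3.00-$4.00 price range on which the figures below are based:

```r
# Placeholder yearly mileages (illustrative; not the report's actual values)
yearly <- data.frame(miles_driven = c(4500, 6000))

mpg_assumed <- 15    # assumed fuel economy
price_low   <- 3.00  # $/gallon, low end of the range
price_high  <- 4.00  # $/gallon, high end of the range

yearly$gallons   <- yearly$miles_driven / mpg_assumed
yearly$cost_low  <- yearly$gallons * price_low
yearly$cost_high <- yearly$gallons * price_high
yearly
```

The 6,000-mile row gives $1,200-$1,600, which matches the typical annual cost quoted for moderate-usage years.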
Typical usage: Most years show 4,500-6,500 miles annually
Fuel Cost Estimates (at $3.00–$4.00/gallon):
Typical annual cost: $1,200–$1,600 for moderate usage years
High-usage years (2016, 2019, 2022): $1,388–$2,170
Low-usage year (2021): $659–$878
Total mileage: The vehicle had reached approximately 136,530 miles by the end of 2022.
Vehicle Performance
The Toyota Sienna's real-world fuel economy, measured at roughly 15-16 MPG (median 15.6 MPG across the reliable fill-ups), is the basis for the cost estimates above. The LOESS-based analysis gives a better picture of annual usage than differencing raw odometer readings, because it accounts for the irregular intervals between observations and yields mileage estimates on exact calendar dates.