Toyota Sienna Fuel Consumption Analysis

Author

Arnaud Amzallag

Published

November 7, 2025

Note

The contents have been reviewed and validated by Arnaud Amzallag on Friday, November 07, 2025.

Introduction

I logged my gas receipts around 2011-2012 in a spreadsheet that records gallons and price. I also kept logging odometer readings until 2024, so I know the mileage I drove during this period. I use both spreadsheets to estimate how much I spent on gas over the last decade.

Therefore, this report analyzes fuel consumption data for a Toyota Sienna minivan using two data sources:

  • sienna_gas2.csv: Individual gas fill-up records with dates, volume, odometer readings, and prices
  • odometer_sienna.csv: Periodic odometer readings over multiple years

The analysis calculates the vehicle’s actual fuel economy (MPG) and estimates annual fuel consumption and costs.

Data Preparation

Loading Gas Fill-up Data

Code
# to publish: rsconnect::rpubsUpload("Gas Costs of Arnaud's Sienna - 2011-2024", "sienna_fuel_analysis.html", "sienna_fuel_analysis.qmd")
# to update: rsconnect::rpubsUpload("Gas Costs of Arnaud's Sienna - 2011-2024", "sienna_fuel_analysis.html", "sienna_fuel_analysis.qmd", id = "https://api.rpubs.com/api/v1/document/1364883/83efae3e43f847c491802d1cdc1b718e")
#| message: false
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
library(lubridate)
library(gt)

# Load the gas fill-up data
gas_data <- read_csv("data/sienna_gas2.csv")
Rows: 14 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): date
dbl (3): volume, miles, gallon.price

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
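# Parse dates and derive per-fill-up metrics
# (the "0011" -> "2011" date fix below is disabled; re-enable it if such typos are present)
gas_data_clean <- gas_data |>
  # mutate(date = str_replace_all(date, "0011", "2011")) |>
  mutate(date = mdy(date)) |>
  arrange(date) |>
  filter(!is.na(miles)) |>                      # drop fill-ups with no odometer reading
  mutate(miles_driven = miles - lag(miles)) |>  # miles since the previous recorded fill-up
  mutate(mpg = miles_driven / volume) |>        # valid only if each fill-up replaces the fuel burned
  mutate(days_since_last_fillup = as.numeric(date - lag(date)))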

Loading Odometer Data

Code
# Load odometer data
odo <- read_csv("data/odometer_sienna.csv", 
                skip = 0, 
                col_names = c("empty", "date", "mileage"))

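# Clean the data: col_names is supplied with skip = 0, so the first row
# (presumably the file's own header) is read as data and is dropped here
odo <- odo |>
  slice(-1) |>
  select(date, mileage) |>
  mutate(
    date = as.Date(date),                                    # expects YYYY-MM-DD or YYYY/MM/DD
    mileage = as.numeric(str_replace_all(mileage, ",", ""))  # strip thousands separators
  )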

Gas Fill-up Analysis

Data Quality Assessment

Not all fill-up records are reliable for calculating MPG. We identify problematic entries based on:

  • Long gaps: More than 45 days between fill-ups (may indicate missing data)
  • Partial fill-ups: Less than 10 gallons (incomplete tank fill)
  • Unusual MPG: Outside the range of 10-30 MPG
Code
# Classify data quality
gas_data_plot <- gas_data_clean |>
  filter(!is.na(days_since_last_fillup)) |>
  mutate(
    flag = case_when(
      days_since_last_fillup > 45 ~ "Long gap",
      volume < 10 ~ "Partial fill-up",
      mpg > 30 | mpg < 10 ~ "Unusual MPG",
      TRUE ~ "OK"
    )
  )

Fill-up Records with Quality Flags

Code
# Show all entries with quality flags
gas_data_plot |>
  select(date, days_since_last_fillup, miles_driven, volume, mpg, flag) |>
  # arrange(desc(days_since_last_fillup)) |>
  mutate(
    date = format(date, "%Y-%m-%d"),
    days_since_last_fillup = round(days_since_last_fillup, 0),
    miles_driven = round(miles_driven, 0),
    volume = round(volume, 2),
    mpg = round(mpg, 1)
  ) |>
  rename(
    Date = date,
    `Days Since Last Fill` = days_since_last_fillup,
    `Miles Driven` = miles_driven,
    `Volume (gal)` = volume,
    `MPG` = mpg,
    `Quality Flag` = flag
  ) |>
  gt() |>
  data_color(
    columns = `Quality Flag`,
    method = "factor",
    palette = c("OK" = "#E6FFE6", "Long gap" = "#FFE6E6", 
                "Partial fill-up" = "#FFF4E6", "Unusual MPG" = "#F3E6FF")
  )

All Fill-up Records with Data Quality Assessment

Date        Days Since Last Fill  Miles Driven  Volume (gal)    MPG  Quality Flag
2011-04-23                    69           578          2.86  202.2  Long gap
2011-05-11                    18           134         17.40    7.7  Unusual MPG
2011-06-05                    25           253         19.02   13.3  OK
2011-07-26                    51           519         18.08   28.7  Long gap
2011-08-25                    30           259         18.59   13.9  OK
2011-09-10                    16           189         14.25   13.3  OK
2011-09-26                    16           342         19.77   17.3  OK
2011-11-25                    60           598         18.61   32.1  Long gap
2011-12-22                    27           230          8.86   26.0  Partial fill-up
2012-01-21                    30           344         18.75   18.4  OK
2012-03-03                    42           434         19.45   22.3  OK

Key issues identified:

  • April 23, 2011: 69-day gap with only 2.86 gallons filled, resulting in an unrealistic 202 MPG
  • May 11, 2011: Unusually low 7.7 MPG
  • July 26 and November 25, 2011: 51-day and 60-day gaps, suggesting missing fill-ups
  • December 22, 2011: Partial fill-up of only 8.86 gallons
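
The flags depend on rule order: `case_when()` returns the first matching label, so a row that meets several criteria is reported under only one of them. As a rough illustration, the base R sketch below re-applies the three thresholds to the rounded values shown in the table, not the raw CSV. The helper `classify_fillup()` and the data frame `fillups` are hypothetical names introduced here, not part of the original analysis.

```r
# Rounded values copied from the fill-up table (not the raw CSV)
fillups <- data.frame(
  date   = c("2011-04-23", "2011-05-11", "2011-06-05", "2011-07-26",
             "2011-08-25", "2011-09-10", "2011-09-26", "2011-11-25",
             "2011-12-22", "2012-01-21", "2012-03-03"),
  days   = c(69, 18, 25, 51, 30, 16, 16, 60, 27, 30, 42),
  volume = c(2.86, 17.40, 19.02, 18.08, 18.59, 14.25, 19.77, 18.61,
             8.86, 18.75, 19.45),
  mpg    = c(202.2, 7.7, 13.3, 28.7, 13.9, 13.3, 17.3, 32.1, 26.0, 18.4, 22.3)
)

# Hypothetical helper: same thresholds and same precedence as the case_when() call
classify_fillup <- function(days, volume, mpg) {
  ifelse(days > 45, "Long gap",
  ifelse(volume < 10, "Partial fill-up",
  ifelse(mpg > 30 | mpg < 10, "Unusual MPG", "OK")))
}

fillups$flag <- classify_fillup(fillups$days, fillups$volume, fillups$mpg)

# Count how many of the three criteria each row meets
fillups$n_criteria <- (fillups$days > 45) + (fillups$volume < 10) +
  (fillups$mpg > 30 | fillups$mpg < 10)

print(fillups[, c("date", "flag", "n_criteria")])
```

In this sketch the April 23 row meets all three criteria and the November 25 row meets two, yet each carries the single label "Long gap". Reordering the rules would change the labels but not which rows count as "OK".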

Fuel Economy Results

Visualization: All Fill-ups vs. Reliable Data

Code
library(ggplot2)

median_mpg_ok <- median(gas_data_plot$mpg[gas_data_plot$flag == "OK"])

ggplot(filter(gas_data_plot, mpg < 100), aes(x = date, y = mpg, color = flag, shape = flag)) +
  geom_point(size = 4) +
  geom_hline(yintercept = median_mpg_ok, 
             linetype = "dashed", color = "blue", linewidth = 1) +
  annotate("text", x = min(gas_data_plot$date), y = median_mpg_ok, 
           label = paste0("Median MPG (OK only): ", round(median_mpg_ok, 1)), 
           hjust = 0, vjust = -0.5, color = "blue", size = 3.5) +
  scale_color_manual(values = c("OK" = "darkgreen", 
                                "Partial fill-up" = "orange",
                                "Long gap" = "red",
                                "Unusual MPG" = "purple")) +
  labs(
    title = "MPG Over Time: All Fill-ups vs. Reliable Data",
    subtitle = "Highlighting problematic entries that should be excluded from average calculations",
    x = "Date",
    y = "Miles Per Gallon (MPG)",
    color = "Data Quality",
    shape = "Data Quality"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

MPG Statistics

Using only reliable fill-up data (flagged as “OK”):

Code
mpg_stats <- gas_data_plot |>
  filter(flag == "OK") |>
  summarise(
    median_mpg = median(mpg),
    mean_mpg = mean(mpg),
    n_observations = n(),
    min_mpg = min(mpg),
    max_mpg = max(mpg)
  )

mpg_stats
# A tibble: 1 × 5
  median_mpg mean_mpg n_observations min_mpg max_mpg
       <dbl>    <dbl>          <int>   <dbl>   <dbl>
1       15.6     16.4              6    13.3    22.3

Key Finding: The vehicle achieves a median fuel economy of 15.6 MPG based on 6 reliable fill-up records.
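
To make the choice of statistic concrete, the sketch below recomputes the summary from the rounded MPG values in the fill-up table. The vector names are illustrative only, and because the inputs are rounded the results may differ from the tibble above in the last digit.

```r
# Rounded MPG values and flags copied from the fill-up table
mpg_all <- c(202.2, 7.7, 13.3, 28.7, 13.9, 13.3, 17.3, 32.1, 26.0, 18.4, 22.3)
flag    <- c("Long gap", "Unusual MPG", "OK", "Long gap", "OK", "OK", "OK",
             "Long gap", "Partial fill-up", "OK", "OK")

mpg_ok <- mpg_all[flag == "OK"]

# With an even number of values, the median is the mean of the two middle ones
median_ok <- median(mpg_ok)
mean_ok   <- mean(mpg_ok)

# For contrast: the same statistics with no filtering at all
median_all <- median(mpg_all)
mean_all   <- mean(mpg_all)

cat("OK rows only: median", round(median_ok, 1), "mean", round(mean_ok, 1), "\n")
cat("All rows:     median", round(median_all, 1), "mean", round(mean_all, 1), "\n")
```

Without filtering, the single 202 MPG row pulls the mean well above any plausible value for this vehicle, which is why the report relies on the median of the "OK" rows.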

Yearly Usage Analysis

Methodology: LOESS Curve Fitting

To accurately estimate annual mileage, we fit a LOESS (locally estimated scatterplot smoothing) curve to the odometer data. This accounts for irregular observation intervals and provides smooth predictions for specific dates.

Annual mileage is calculated as the difference between predicted mileage on January 1st and December 31st of each year.
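
The differencing idea can be tried in isolation on made-up data. The sketch below is only an illustration under stated assumptions: the odometer log is synthetic (about 15 miles per day with a small wobble), `toy_odo`, `toy_fit`, and `predict_on()` are hypothetical names, and only the model form and `span = 0.15` are taken from the analysis.

```r
# Synthetic odometer log (made-up values) with irregular gaps between readings
days    <- cumsum(rep(c(9, 23, 14, 31, 17), times = 20))  # 100 readings
start   <- as.Date("2014-01-01")
toy_odo <- data.frame(
  days_since_start = days,
  mileage = 88000 + 15 * days + 30 * sin(days / 50)        # ~15 miles/day plus a wobble
)

# Same model form and span as the analysis
toy_fit <- loess(mileage ~ days_since_start, data = toy_odo, span = 0.15)

# Hypothetical helper: predicted odometer value on a given calendar date
predict_on <- function(date) {
  d <- as.numeric(as.Date(date) - start)
  unname(predict(toy_fit, newdata = data.frame(days_since_start = d)))
}

# Annual mileage = prediction on Dec 31 minus prediction on Jan 1
miles_2015 <- predict_on("2015-12-31") - predict_on("2015-01-01")

# A default loess fit does not extrapolate: dates outside the observed
# range come back as NA, so only fully covered years can be estimated
outside <- predict_on("2013-06-01")

print(round(miles_2015))
print(outside)
```

The `NA` for a date before the first reading is the reason a year can only be estimated when both its January 1 and December 31 fall inside the range of recorded readings.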

Code
# Convert dates to numeric for loess (days since first observation)
odo_with_numeric <- odo |>
  mutate(days_since_start = as.numeric(date - min(date)))

# Fit loess model
loess_model <- loess(mileage ~ days_since_start, data = odo_with_numeric, span = 0.15)

# Get the range of years in the data
year_range <- year(min(odo$date)):year(max(odo$date))

# For each year, predict mileage on Jan 1 and Dec 31
yearly_predictions <- tibble(year = year_range) |>
  mutate(
    jan1_date = ymd(paste0(year, "-01-01")),
    dec31_date = ymd(paste0(year, "-12-31")),
    jan1_days = as.numeric(jan1_date - min(odo$date)),
    dec31_days = as.numeric(dec31_date - min(odo$date))
  ) |>
  filter(jan1_date >= min(odo$date), dec31_date <= max(odo$date)) |>
  mutate(
    jan1_mileage = predict(loess_model, newdata = data.frame(days_since_start = jan1_days)),
    dec31_mileage = predict(loess_model, newdata = data.frame(days_since_start = dec31_days)),
    miles_driven = dec31_mileage - jan1_mileage
  )

LOESS Fit Visualization

Code
# Generate predictions for smooth curve
pred_data <- tibble(
  days_since_start = seq(0, max(odo_with_numeric$days_since_start), by = 10)
) |>
  mutate(
    date = min(odo$date) + days_since_start,
    mileage_pred = predict(loess_model, newdata = data.frame(days_since_start = days_since_start))
  )

# Create points for Jan 1 and Dec 31 predictions
year_points <- yearly_predictions |>
  pivot_longer(cols = c(jan1_mileage, dec31_mileage),
               names_to = "type",
               values_to = "mileage") |>
  mutate(
    date = if_else(type == "jan1_mileage", jan1_date, dec31_date),
    label = if_else(type == "jan1_mileage", "Jan 1", "Dec 31")
  )

ggplot() +
  geom_point(data = odo, aes(x = date, y = mileage), size = 2, alpha = 0.5) +
  geom_line(data = pred_data, aes(x = date, y = mileage_pred), 
            color = "blue", linewidth = 1) +
  # Jan 1 / Dec 31 prediction markers are disabled; to re-enable them, restore
  # this layer together with its manual colour scale:
  # geom_point(data = year_points, aes(x = date, y = mileage, color = label), 
  #            size = 3, shape = 17) +
  # scale_color_manual(values = c("Jan 1" = "darkgreen", "Dec 31" = "darkred")) +
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = "Odometer Readings with LOESS Fit",
    subtitle = "Points are recorded odometer readings; the line is the LOESS fit",
    x = "Date",
    y = "Mileage"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

Calculate Yearly Metrics

For fuel consumption and cost estimates, we use an assumed fuel economy of 15 MPG (slightly below the measured median of 15.6 MPG) and calculate costs using a price range of $3.00 to $4.00 per gallon.
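
As a worked example of this arithmetic, the sketch below applies it to the 2014 mileage reported in the Annual Summary Table (5,282 miles, rounded). The helper `annual_fuel_cost()` is a hypothetical name used for illustration; its defaults are the MPG and price bounds set in the code for this section.

```r
# Hypothetical helper: gallons and cost bounds for one year of driving
annual_fuel_cost <- function(miles, mpg = 15, price_low = 3.00, price_high = 4.00) {
  gallons <- miles / mpg
  c(gallons = gallons, cost_low = gallons * price_low, cost_high = gallons * price_high)
}

# 2014 mileage from the summary table (rounded to whole miles)
cost_2014 <- annual_fuel_cost(5282)
print(round(cost_2014, 1))

# Sensitivity check: the measured median of 15.6 MPG instead of the assumed 15
cost_2014_measured <- annual_fuel_cost(5282, mpg = 15.6)
print(round(cost_2014_measured, 1))
```

Using 15 rather than 15.6 MPG slightly overstates fuel use, so the cost ranges in this report lean toward the high side.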

Code
mpg_assumed <- 15
price_low <- 3.00
price_high <- 4.00

yearly_summary <- yearly_predictions |>
  mutate(
    consumption_gallons = miles_driven / mpg_assumed,
    # `!!` injects the scalar prices defined above, so they are not confused
    # with the new columns of the same name
    price_low = consumption_gallons * !!price_low,
    price_high = consumption_gallons * !!price_high
  ) |>
  select(year, jan1_mileage, dec31_mileage, miles_driven, consumption_gallons, price_low, price_high)

Annual Summary Table

Code
yearly_summary |>
  mutate(
    price_range = paste0("$", scales::comma(round(price_low)), " - $", 
                        scales::comma(round(price_high))),
    jan1_mileage = scales::comma(round(jan1_mileage)),
    dec31_mileage = scales::comma(round(dec31_mileage)),
    miles_driven = scales::comma(round(miles_driven)),
    consumption_gallons = round(consumption_gallons, 1)
  ) |>
  select(year, jan1_mileage, dec31_mileage, miles_driven, consumption_gallons, price_range) |>
  rename(
    Year = year,
    `Jan 1 Mileage` = jan1_mileage,
    `Dec 31 Mileage` = dec31_mileage,
    `Miles Driven` = miles_driven,
    `Fuel (gallons)` = consumption_gallons,
    `Annual Cost Range` = price_range
  ) |>
  gt()

Annual Vehicle Usage Summary

Year  Jan 1 Mileage  Dec 31 Mileage  Miles Driven  Fuel (gallons)  Annual Cost Range
2014         88,652          93,934         5,282           352.1  $1,056 - $1,409
2015         93,945          99,488         5,543           369.5  $1,109 - $1,478
2016         99,500         106,439         6,939           462.6  $1,388 - $1,850
2017        106,453         109,774         3,321           221.4  $664 - $886
2018        109,790         115,881         6,091           406.1  $1,218 - $1,624
2019        115,891         123,033         7,142           476.1  $1,428 - $1,904
2020        123,044         125,077         2,033           135.5  $407 - $542
2021        125,081         128,374         3,293           219.6  $659 - $878
2022        128,392         136,530         8,138           542.5  $1,628 - $2,170
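
Since the stated goal is an estimate of total spending, the yearly rows can also be summed. The sketch below does this from the rounded "Miles Driven" values in the table, under the same assumptions (15 MPG, $3.00 to $4.00 per gallon). It is an illustrative total for 2014-2022 only, and the variable names are not part of the original analysis.

```r
# Rounded yearly mileage copied from the summary table
miles_by_year <- c(`2014` = 5282, `2015` = 5543, `2016` = 6939, `2017` = 3321,
                   `2018` = 6091, `2019` = 7142, `2020` = 2033, `2021` = 3293,
                   `2022` = 8138)

mpg_assumed   <- 15
total_miles   <- sum(miles_by_year)
total_gallons <- total_miles / mpg_assumed
total_cost    <- total_gallons * c(low = 3.00, high = 4.00)

cat("Miles:", total_miles, " Gallons:", round(total_gallons, 1), "\n")
cat("Cost range: $", round(total_cost[["low"]]), " - $",
    round(total_cost[["high"]]), "\n", sep = "")
```

Because each year is measured from January 1 to December 31 and the inputs are rounded, this total can differ slightly from the simple difference between the first and last odometer predictions.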

Conclusions

Key Findings

  1. Actual Fuel Economy: Based on reliable fill-up data, the vehicle achieves 15.6 MPG (median) across 6 measurements.

  2. Data Quality: Of 11 fill-up records analyzed, 6 were reliable, with issues including:

    • Long gaps suggesting missing fill-ups
    • Partial fill-ups that skew MPG calculations
    • Anomalous readings
  3. Annual Usage Patterns (using LOESS-fitted predictions):

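    • High-usage years: 2022 (8,138 miles), 2019 (7,142 miles), and 2016 (6,939 miles)
    • Low-usage years: 2020 (2,033 miles), 2021 (3,293 miles), and 2017 (3,321 miles); the 2020-2021 dip is likely pandemic-related
    • Typical usage: The median year is 5,543 miles; 2014, 2015, and 2018 fall between 5,000 and 6,500 miles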
  4. Fuel Cost Estimates (at $3.00–$4.00/gallon):

    • Moderate-usage years (2014, 2015, 2018): $1,056–$1,624
    • High-usage years (2016, 2019, 2022): $1,388–$2,170
    • Low-usage years (2017, 2020, 2021): $407–$886
  5. Total Mileage: Vehicle reached approximately 136,530 miles by the end of 2022.

Vehicle Performance

The measured 15-16 MPG is a plausible figure for a minivan, but this report does not compare it against the manufacturer's rating for the model year, and the estimate rests on only 6 reliable fill-ups from 2011-2012. The LOESS-based analysis gives a more consistent picture of annual usage than differencing raw readings would, because it accounts for irregular observation intervals in the odometer data.