Main purpose and central question
My Shiny app gives an interactive overview of the electric vehicle (EV)
market as recorded in the Washington State EV registration data. Its
main goal is to show how the EV landscape has grown and changed over
time in terms of brands, model years, and driving range. In particular,
the app helps users explore questions such as:
“Which brands currently dominate the EV market, how has the
number of EVs and models changed across model years, and how fast is
typical driving range improving over time for different EV
types?”
By combining dashboards, an interactive prediction module, and a
SQL-style query interface, the app turns a static CSV file into a tool
for exploring trends in EV adoption and technology improvement.
Interactivity
Interactivity greatly improves how this dataset is communicated
because the EV market is multi-dimensional and cannot be fully
understood through static charts alone. The data vary across time,
brand, vehicle type, and driving range, and different users may care
about different subsets of the data. By allowing users to filter,
explore, and model the data themselves, the dashboard shifts from a
one-directional presentation into an exploratory analysis tool.
For example, sliders and dropdown menus let users instantly narrow
the dataset to a specific brand, EV type, or year range, making it
easier to detect patterns that may not be visible in an overall summary.
Instead of reading a fixed chart about “average EV range,” users can
directly compare how ranges differ between Battery Electric Vehicles
(BEVs) and Plug-in Hybrids (PHEVs), or how the distribution changes over
time.
The prediction module further improves communication by connecting
historical data to future expectations. Rather than simply displaying
trends, the app allows users to input a model year and see a predicted
range based on historical behavior. This turns abstract relationships
into something concrete and interpretable, making the implications of
the data clearer.
Finally, the SQL query panel enhances transparency and data literacy.
Users are not only given results, but also shown the exact SQL query
used to generate them. This makes the data filtering logic explicit and
helps users understand how analytical results are produced, rather than
treating the system as a “black box.”
Overall, interactivity makes the data more understandable,
personalized, and engaging by letting users ask their own questions
instead of passively viewing predefined plots.
Key interactive components by tab
The dashboard is organized into multiple tabs, each designed to
support a different type of analysis and learning.
Overview Tab
This tab answers the question: “What does the EV market
look like at a glance?”
The Overview tab provides a high-level summary of
the EV market using value boxes and interactive charts.
- Value boxes display key statistics such as the total number of EV
records, the total number of manufacturers, and the average driving
range across the dataset. These metrics let users immediately
understand the scale and scope of the dataset.
- The Top Brands bar chart shows which manufacturers appear most
frequently in the dataset. This helps users identify market dominance
and compare participation across brands.
- The Model Year distribution chart shows how EV models are distributed
over time. Users can see whether EV adoption is growing and which years
have the highest concentration of models.
- The Top-N selector lets users control how many brands are displayed.
This makes it easier to focus on major manufacturers or explore the
long tail of smaller ones.
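The Top Brands chart and the Top-N selector amount to counting records per manufacturer and keeping the n largest counts; the Model Year chart is the same count keyed by year. A minimal Python sketch of that logic follows. The app itself is built in R Shiny, so this is only an illustration: the sample records, field names, and function names are hypothetical.

```python
from collections import Counter

# Hypothetical sample records; the real app reads the full registration CSV.
records = [
    {"make": "TESLA", "model_year": 2020},
    {"make": "TESLA", "model_year": 2022},
    {"make": "TESLA", "model_year": 2022},
    {"make": "NISSAN", "model_year": 2018},
    {"make": "NISSAN", "model_year": 2020},
    {"make": "CHEVROLET", "model_year": 2019},
]


def top_brands(rows, n):
    """Return the n most frequent makes as (make, count) pairs."""
    counts = Counter(row["make"] for row in rows)
    return counts.most_common(n)


def model_year_distribution(rows):
    """Return (model_year, count) pairs sorted by year."""
    counts = Counter(row["model_year"] for row in rows)
    return sorted(counts.items())


# The Top-N selector would simply change the value of n passed here.
top_two = top_brands(records, 2)
by_year = model_year_distribution(records)
```

Raising n reveals more of the long tail of smaller manufacturers without recomputing the counts differently.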
Range Explorer Tab
This tab answers: “How has EV range evolved,
and how does it differ across types and brands?”
The Range Explorer focuses on understanding how
driving range varies across time, brands, and vehicle types.
- Brand filter allows users to isolate specific manufacturers.
- Vehicle type filter lets users compare Battery EVs and Plug-in
Hybrids.
- Year range slider gives control over historical windows of interest.
Two interactive visualizations support analysis:
- Range distribution (histogram) shows how driving ranges are spread
within the selected subset.
- Range vs. year scatterplot reveals how expected range changes over
time.
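Conceptually, the three filters combine into a single condition applied to every record, and the histogram is a count of the surviving range values per bin. The Python sketch below illustrates this under stated assumptions: the rows are made-up values, and the field names, function names, and 50-mile bin width are hypothetical choices, not taken from the app.

```python
# Made-up rows for illustration only.
rows = [
    {"make": "TESLA", "ev_type": "BEV", "year": 2018, "range": 215},
    {"make": "TESLA", "ev_type": "BEV", "year": 2020, "range": 291},
    {"make": "NISSAN", "ev_type": "BEV", "year": 2016, "range": 84},
    {"make": "TOYOTA", "ev_type": "PHEV", "year": 2017, "range": 25},
    {"make": "CHEVROLET", "ev_type": "PHEV", "year": 2019, "range": 53},
]


def filter_rows(rows, make=None, ev_type=None, year_range=None):
    """Keep rows matching every filter that is set (None means no filter)."""
    selected = []
    for row in rows:
        if make is not None and row["make"] != make:
            continue
        if ev_type is not None and row["ev_type"] != ev_type:
            continue
        if year_range is not None:
            first, last = year_range
            if not first <= row["year"] <= last:
                continue
        selected.append(row)
    return selected


def histogram(values, bin_width):
    """Count values per bin [start, start + bin_width), keyed by bin start."""
    counts = {}
    for value in values:
        start = int(value // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))


# BEVs from model years 2017 through 2021, binned into 50-mile intervals.
subset = filter_rows(rows, ev_type="BEV", year_range=(2017, 2021))
bins = histogram([row["range"] for row in subset], 50)
```

The scatterplot uses the same filtered subset, plotting each row's range against its year.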
Prediction Model Tab
The Prediction Model tab implements two modeling approaches to
estimate electric vehicle (EV) driving range as a function of model
year: a log-linear regression model and a LOESS smoother. These models
allow users to examine long-term growth trends as well as local
nonlinear patterns in range evolution. This tab allows users to
explore: “Given historical trends, what might EV range look
like in a future year?”
Log-linear model
To model technological improvement in EV range, a log-linear
regression is used: \[
\log(1 + Y) = \beta_0 + \beta_1 X + \varepsilon
\]
In this case:
- \(Y\) is electric range (miles)
- \(X\) is model year
- \(\beta_0\) is the intercept
- \(\beta_1\) is the growth rate parameter
- \(\varepsilon\) is the error term
The transformation \(\log(1 + Y)\)
stays well defined even when \(Y = 0\), reduces skewness, and allows
exponential growth to be modeled linearly.
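As an illustration of how such a model can be fitted, the Python sketch below estimates \(\beta_0\) and \(\beta_1\) by ordinary least squares on \(\log(1 + Y)\). It assumes a plain least-squares fit, which the write-up does not confirm; the function names are hypothetical and the data are synthetic, constructed so that \(1 + Y\) grows by exactly 10% per year.

```python
import math


def fit_log_linear(years, ranges):
    """Least-squares fit of log(1 + Y) on X; returns (beta0, beta1)."""
    z = [math.log1p(y) for y in ranges]  # log(1 + Y)
    n = len(years)
    x_bar = sum(years) / n
    z_bar = sum(z) / n
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxz = sum((x - x_bar) * (zi - z_bar) for x, zi in zip(years, z))
    beta1 = sxz / sxx
    beta0 = z_bar - beta1 * x_bar
    return beta0, beta1


def fitted_range(beta0, beta1, year):
    """Back-transform the fitted log-scale value: exp(beta0 + beta1 * X) - 1."""
    return math.expm1(beta0 + beta1 * year)


# Synthetic data (not from the EV dataset): 1 + Y = 50 * 1.1 ** (year - 2012).
years = list(range(2012, 2021))
ranges = [50 * 1.1 ** (year - 2012) - 1 for year in years]

beta0, beta1 = fit_log_linear(years, ranges)
```

Because the synthetic data follow the model exactly, the fit recovers the slope \(\log(1.1)\) and reproduces each input range up to floating-point error; real registration data would scatter around the fitted curve.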
Interpretation
The coefficient \(\beta_1\) is
interpreted as a yearly growth rate: \[
\text{Growth rate} = e^{\beta_1} - 1
\] This represents the approximate proportional increase in electric
range per year (multiply by 100 for a percentage); strictly, it is the
yearly growth in \(1 + Y\).
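To see where this expression comes from, compare the fitted values for two consecutive model years, ignoring the error term: \[
\frac{1 + Y_{X+1}}{1 + Y_{X}} = \frac{e^{\beta_0 + \beta_1 (X + 1)}}{e^{\beta_0 + \beta_1 X}} = e^{\beta_1}
\] Each additional model year therefore multiplies \(1 + Y\) by \(e^{\beta_1}\). As a purely illustrative value (not an estimate from the app), \(\beta_1 = 0.05\) would give \(e^{0.05} - 1 \approx 0.0513\), or roughly 5.1% per year.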
Prediction
For a user-selected model year \(X^*\), the predicted range is computed as:
\[
\hat{Y} = e^{(\beta_0 + \beta_1 X^*)} - 1
\] A 95% prediction interval \([\ell, u]\) for \(\log(1 + Y)\) at
\(X^*\) is constructed on the log scale and back-transformed: \[
\left[ e^{\ell} - 1,\; e^{u} - 1 \right]
\] This interval reflects uncertainty in both the fitted model
and future observations.
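For reference, the usual textbook form of such an interval in simple linear regression is sketched here. It assumes ordinary least squares with normally distributed errors; the write-up does not state how the app computes its interval, so the symbols \(n\), \(s\), \(\bar{X}\), and \(t_{0.975,\,n-2}\) are introduced only for illustration. With \(\beta_0\) and \(\beta_1\) standing for the fitted coefficients, the log-scale endpoints are: \[
(\beta_0 + \beta_1 X^*) \pm t_{0.975,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(X^* - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}
\] where \(n\) is the number of observations, \(s\) is the residual standard error, \(\bar{X}\) is the mean model year, and \(t_{0.975,\,n-2}\) is the 97.5th percentile of a t distribution with \(n - 2\) degrees of freedom. The leading 1 under the square root accounts for the variability of a new observation, and the remaining terms account for uncertainty in the fitted line, which grows as \(X^*\) moves away from \(\bar{X}\).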
LOESS smoother
The LOESS option provides a nonparametric alternative that fits
localized regressions: \[
\hat{Y}(x_0) = \sum_{i=1}^{n} w_i(x_0) \cdot Y_i
\] where \(n\) is the number of observations and the weights
\(w_i(x_0)\) shrink as the model year of observation \(i\) moves away
from the target year \(x_0\). Unlike the log-linear
model, LOESS does not assume exponential growth and is primarily used
for visualization and pattern discovery rather than forecasting.
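The idea of a distance-weighted local fit can be sketched in a few lines of Python. This is a simplified, single-pass local linear version with tricube weights, written for illustration; it is not the app's smoother, and production LOESS implementations may differ (for example in the default span, the polynomial degree, or added robustness iterations). The function name and the `span` default are assumptions.

```python
import math


def loess_point(xs, ys, x0, span=0.75):
    """Estimate Y at x0 with a tricube-weighted local linear fit."""
    n = len(xs)
    k = min(n, max(2, math.ceil(span * n)))  # number of neighbours used
    dists = sorted(abs(x - x0) for x in xs)
    h = dists[k - 1]  # bandwidth: distance to the k-th nearest point
    if h == 0:
        # All k nearest points sit exactly at x0: average their responses.
        at_x0 = [y for x, y in zip(xs, ys) if x == x0]
        return sum(at_x0) / len(at_x0)
    weights = []
    for x in xs:
        u = abs(x - x0) / h
        weights.append((1 - u ** 3) ** 3 if u < 1 else 0.0)
    sw = sum(weights)
    if sw == 0:
        # Every neighbour lies exactly on the window edge: plain average.
        near = [y for x, y in zip(xs, ys) if abs(x - x0) <= h]
        return sum(near) / len(near)
    # Weighted least-squares line through the neighbourhood of x0.
    x_bar = sum(w * x for w, x in zip(weights, xs)) / sw
    y_bar = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    if sxx == 0:
        return y_bar
    sxy = sum(w * (x - x_bar) * (y - y_bar)
              for w, x, y in zip(weights, xs, ys))
    return y_bar + (sxy / sxx) * (x0 - x_bar)


# Made-up (year, range) points: slow early growth, faster later growth.
demo_years = [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]
demo_ranges = [70, 72, 75, 80, 84, 110, 150, 210, 240, 260]
smoothed = [loess_point(demo_years, demo_ranges, y, span=0.5)
            for y in demo_years]
```

Evaluating the function over a grid of years traces out the smoothed curve. A smaller `span` follows local bumps more closely, while a larger one gives a smoother trend.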
Explanation of Modeling
The purpose of the modeling component is not to “predict the future
perfectly,” but to provide interpretable and reasonable ways to describe
how electric vehicle driving range has changed over time. I applied a
log transformation to the driving range before modeling because raw
range values are strictly positive and strongly right skewed, which
means most vehicles cluster at lower ranges, while a few newer models
have extremely high range values. If I modeled raw ranges
directly, these high values would dominate the fit and distort overall
trends. Taking the logarithm compresses large values and spreads smaller
ones, making the relationship with year easier to model and
interpret.
More importantly, technological growth is rarely linear in real-world
systems. Improvements in battery technology typically behave
multiplicatively rather than additively: capacity grows by a
percentage each year rather than by a fixed number of miles. Using a log
transformation converts this type of exponential growth into a linear
relationship, allowing us to estimate a yearly growth rate instead of
just a raw slope. This makes the model easier to interpret and more
realistic.
I also included LOESS as a second modeling option, not for prediction
but for visualization and exploration. Unlike the log-linear
regression, LOESS does not assume any global formula. Instead, it fits
many small local regressions and lets the data decide the shape of the
curve. This is useful when the trend is not smooth or when the pace of
improvement varies across decades. For example, early EV development
may have been slow, followed by rapid breakthroughs in later years.
LOESS makes such nonlinear patterns
visible without forcing a rigid model structure.
In short, the log-linear model tells the user the long-term growth
rate, while LOESS shows how that growth behaves across different
periods. Together they provide both explanation and visual insight,
rather than just prediction.
SQL Query Tab
This tab answers: “How is the data actually stored and
queried behind the scenes?” The SQL Query Panel runs real
database queries instead of simulated filtering. The original CSV
dataset is loaded into a SQLite database stored locally as ev.sqlite,
and all user interactions in this tab are translated into actual SQL
statements that are executed on the database backend.
User selections from the UI (make, year range, and EV type) are
dynamically converted into a SQL WHERE clause. For example, when a user
selects Tesla, BEV vehicles, and model years between 2015 and 2022, the
application constructs the following query in real time:
SELECT * FROM electric_vehicle WHERE Make = 'Tesla' AND
"Electric.Vehicle.Type" = 'BEV' AND "Model.Year" BETWEEN 2015 AND 2022;
The returned result set is rendered immediately as a data table in
the dashboard, and the SQL command itself is printed live to allow the
user to see exactly how their filter selections translate into database
logic. This design demonstrates how modern dashboards operate as a
bridge between user interfaces and backend data engines. Instead of
treating SQL as a static query language, the app exposes SQL as an
interactive system that responds in real time. Users do not simply view
results; they control the query logic itself through the UI.
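The translation from UI selections to a WHERE clause can be sketched as follows. This is an illustrative Python version using the standard `sqlite3` module, not the app's R code. An in-memory database stands in for ev.sqlite, the `Electric.Range` column and the sample rows are hypothetical, and binding values through `?` placeholders is a design choice of this sketch (it avoids pasting user input directly into the SQL string); the write-up does not say how the app assembles its query text.

```python
import sqlite3


def build_query(make=None, ev_type=None, year_range=None):
    """Turn optional UI selections into (sql, params) with ? placeholders."""
    clauses, params = [], []
    if make is not None:
        clauses.append('"Make" = ?')
        params.append(make)
    if ev_type is not None:
        # Dotted column names must be quoted to be read as one identifier.
        clauses.append('"Electric.Vehicle.Type" = ?')
        params.append(ev_type)
    if year_range is not None:
        clauses.append('"Model.Year" BETWEEN ? AND ?')
        params.extend(year_range)
    sql = "SELECT * FROM electric_vehicle"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# In-memory stand-in for the ev.sqlite file, filled with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE electric_vehicle ("Make" TEXT, '
    '"Electric.Vehicle.Type" TEXT, "Model.Year" INTEGER, '
    '"Electric.Range" INTEGER)'
)
conn.executemany(
    "INSERT INTO electric_vehicle VALUES (?, ?, ?, ?)",
    [
        ("Tesla", "BEV", 2018, 215),
        ("Tesla", "BEV", 2013, 208),
        ("Nissan", "BEV", 2016, 84),
        ("Toyota", "PHEV", 2017, 25),
    ],
)

sql, params = build_query("Tesla", "BEV", (2015, 2022))
result = conn.execute(sql, params).fetchall()
```

Leaving a selection unset simply drops its condition, so an empty form produces a plain `SELECT * FROM electric_vehicle`.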
AI and LLM Acknowledgement
This project used ChatGPT 5.1 (OpenAI) to assist with debugging Shiny
code, clarifying modeling logic, and drafting explanatory text for the
write-up. ChatGPT was used as a support tool for conceptual guidance and
error diagnosis. All analytical decisions, implementation, and
interpretation are the responsibility of the author.