Main purpose and central question

My Shiny app is designed to give an interactive overview of the electric vehicle (EV) market, as captured in the Washington State EV registration data. Its main message is how the EV landscape has grown and changed over time in terms of brands, model years, and driving range. In particular, the app helps users explore questions such as:

“Which brands currently dominate the EV market, how has the number of EVs and models changed across model years, and how fast is typical driving range improving over time for different EV types?”

By combining dashboards, an interactive prediction module, and a SQL query interface backed by a real database, the app turns a static CSV file into a tool for exploring trends in EV adoption and technology improvement.

Interactivity

Interactivity greatly improves how this dataset is communicated because the EV market is multi-dimensional and cannot be fully understood through static charts alone. The data vary across time, brand, vehicle type, and driving range, and different users may care about different subsets of the data. By allowing users to filter, explore, and model the data themselves, the dashboard shifts from a one-directional presentation into an exploratory analysis tool.

For example, sliders and dropdown menus let users instantly narrow the dataset to a specific brand, EV type, or year range, making it easier to detect patterns that may not be visible in an overall summary. Instead of reading a fixed chart about “average EV range,” users can directly compare how ranges differ between Battery Electric Vehicles (BEVs) and Plug-in Hybrids (PHEVs), or how the distribution changes over time.

The prediction module further improves communication by connecting historical data to future expectations. Rather than simply displaying trends, the app allows users to input a model year and see a predicted range based on historical behavior. This turns abstract relationships into something concrete and interpretable, making the implications of the data clearer.

Finally, the SQL query panel enhances transparency and data literacy. Users are not only given results, but also shown the exact SQL query used to generate them. This makes the data filtering logic explicit and helps users understand how analytical results are produced, rather than treating the system as a “black box.”

Overall, interactivity makes the data more understandable, personalized, and engaging by letting users ask their own questions instead of passively viewing predefined plots.

Key interactive components by tab

The dashboard is organized into multiple tabs, each designed to support a different type of analysis and learning.

Overview Tab

This tab answers the question: “What does the EV market look like at a glance?”

The Overview tab provides a high-level summary of the EV market using value boxes and interactive charts.

Value boxes display key statistics such as the total number of EV records, the total number of manufacturers, and the average driving range across the dataset. These metrics let users immediately understand the scale and scope of the dataset.

The Top Brands bar chart shows which manufacturers appear most frequently in the dataset, helping users identify market dominance and compare participation across brands.

The Model Year distribution chart shows how EV models are distributed across model years. Users can see whether EV adoption is growing and which years have the highest concentration of models.

The Top-N selector controls how many brands are displayed. This makes it easier to focus on major manufacturers or to explore the long tail of smaller ones.
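As a rough illustration of what the value boxes and the two Overview charts summarize, the Python sketch below computes the same quantities from a list of records. It is not the app's own code: the helper name `overview_stats` and the field names `Make`, `Model.Year`, and `Electric.Range` are assumptions modeled on R-style column names, and the toy records are made up.

```python
from collections import Counter
from statistics import mean


def overview_stats(records, top_n=5):
    """Compute value-box metrics and chart inputs from a list of dict records.

    Hypothetical helper; field names mimic R-style column names.
    """
    makes = [r["Make"] for r in records]
    return {
        # value boxes
        "total_records": len(records),
        "total_manufacturers": len(set(makes)),
        "average_range": mean(r["Electric.Range"] for r in records),
        # Top Brands bar chart, limited by the Top-N selector
        "top_brands": Counter(makes).most_common(top_n),
        # Model Year distribution chart: (year, count) pairs in year order
        "year_counts": sorted(Counter(r["Model.Year"] for r in records).items()),
    }


if __name__ == "__main__":
    toy = [  # made-up records, for illustration only
        {"Make": "Tesla", "Model.Year": 2020, "Electric.Range": 250},
        {"Make": "Tesla", "Model.Year": 2021, "Electric.Range": 300},
        {"Make": "Nissan", "Model.Year": 2020, "Electric.Range": 150},
    ]
    print(overview_stats(toy, top_n=1))
```

`Counter.most_common(top_n)` does the work of the Top-N selector: raising `top_n` simply lengthens the ranked brand list that feeds the bar chart.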

Range Explorer Tab

This tab answers the question: “How has EV range evolved, and how does it differ across types and brands?”

The Range Explorer focuses on how driving range varies across time, brands, and vehicle types. A brand filter isolates specific manufacturers, a vehicle type filter lets users compare Battery EVs and Plug-in Hybrids, and a year range slider restricts the analysis to a historical window of interest.

Two interactive visualizations support the analysis:

  • Range distribution (histogram): shows how driving ranges are spread within the selected subset.
  • Range vs. year scatterplot: reveals how driving range changes across model years.
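The filtering behind these controls, and the binning behind the histogram, can be sketched in a few lines of Python. This is a hypothetical illustration rather than the app's implementation: the function names, the convention that an empty selection means "no restriction", the 50-mile bin width, and the field names are all assumptions.

```python
from collections import Counter


def filter_records(records, brands=None, ev_types=None, year_min=None, year_max=None):
    """Apply the brand filter, vehicle type filter, and year range slider.

    A None or empty selection means "no restriction" (hypothetical convention).
    """
    return [
        r for r in records
        if (not brands or r["Make"] in brands)
        and (not ev_types or r["Electric.Vehicle.Type"] in ev_types)
        and (year_min is None or r["Model.Year"] >= year_min)
        and (year_max is None or r["Model.Year"] <= year_max)
    ]


def histogram_counts(ranges, bin_width=50):
    """Count ranges per bin; each bin is keyed by its lower edge in miles."""
    counts = Counter(int(r // bin_width) * bin_width for r in ranges)
    return sorted(counts.items())


if __name__ == "__main__":
    toy = [  # made-up records, for illustration only
        {"Make": "Tesla", "Electric.Vehicle.Type": "BEV", "Model.Year": 2018, "Electric.Range": 215},
        {"Make": "Toyota", "Electric.Vehicle.Type": "PHEV", "Model.Year": 2020, "Electric.Range": 25},
    ]
    subset = filter_records(toy, ev_types={"BEV"}, year_min=2015, year_max=2022)
    print(histogram_counts([r["Electric.Range"] for r in subset]))     # histogram input
    print([(r["Model.Year"], r["Electric.Range"]) for r in subset])    # scatterplot points
```

Both charts read from the same filtered subset, which is why changing any one control updates the histogram and the scatterplot together.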

Prediction Model Tab

The Prediction Model tab implements two modeling approaches to estimate electric vehicle (EV) driving range as a function of model year: a log-linear regression model and a LOESS smoother. These models allow users to examine long-term growth trends as well as local nonlinear patterns in range evolution. This tab allows users to explore: “Given historical trends, what might EV range look like in a future year?”

Log-linear model

To model technological improvement in EV range, a log-linear regression is used: \[ \log(1 + Y) = \beta_0 + \beta_1 X + \varepsilon \]

In this case:

  • Y is electric range (miles)
  • X is model year
  • \(\beta_0\) is the intercept
  • \(\beta_1\) is the growth rate parameter
  • \(\varepsilon\) is the error term

The transformation \(\log(1 + Y)\) stays defined when a recorded range is zero (that is the role of the +1 offset), reduces right skewness, and turns exponential growth in range into a linear trend in model year.
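Fitting this model amounts to ordinary least squares on the transformed response. The stdlib-only Python sketch below shows one way to compute \(\hat{\beta}_0\), \(\hat{\beta}_1\), and the implied growth rate; it assumes a plain OLS fit, and the function name `fit_log_linear` and the sample values are hypothetical.

```python
import math


def fit_log_linear(years, ranges):
    """Ordinary least squares fit of log(1 + range) on model year.

    Returns (b0, b1, growth_rate) where growth_rate = exp(b1) - 1.
    """
    z = [math.log1p(y) for y in ranges]           # log(1 + Y), defined for Y = 0
    n = len(years)
    x_bar = sum(years) / n
    z_bar = sum(z) / n
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxz = sum((x - x_bar) * (zi - z_bar) for x, zi in zip(years, z))
    b1 = sxz / sxx                                 # slope: change in log(1 + Y) per year
    b0 = z_bar - b1 * x_bar                        # intercept at calendar year 0
    return b0, b1, math.expm1(b1)                  # expm1(b1) = e^b1 - 1


if __name__ == "__main__":
    years = [2012, 2014, 2016, 2018, 2020, 2022]
    ranges = [80, 90, 110, 150, 200, 260]          # made-up values
    b0, b1, g = fit_log_linear(years, ranges)
    print(f"b1 = {b1:.4f}, growth = {100 * g:.1f}% per year")
```

Because \(X\) is the raw calendar year, \(\hat{\beta}_0\) is an extrapolation to year 0 and is typically a large negative number. Centering the year (for example using \(X - 2015\)) would make the intercept interpretable without changing \(\hat{\beta}_1\) or the growth rate.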

Interpretation

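The coefficient \(\beta_1\) is the yearly change on the log scale, and it converts to a proportional growth rate per model year: \[ \text{Growth rate} = e^{\beta_1} - 1 \] Multiplied by 100, this is the percentage increase in electric range per year (strictly, the increase in \(1 + Y\), which is nearly identical once ranges reach tens of miles).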
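To make the conversion concrete, suppose (hypothetically; this is not a result from the data) that the fitted slope were \(\hat{\beta}_1 = 0.10\). Then \[ e^{0.10} - 1 \approx 1.1052 - 1 = 0.1052 \approx 10.5\% \text{ per model year}, \qquad \text{doubling time} = \frac{\ln 2}{\hat{\beta}_1} \approx \frac{0.6931}{0.10} \approx 6.9 \text{ years}. \] A slope of 0.10 on the log scale therefore corresponds to range growing by about 10.5% per year, compounding to roughly a doubling every seven years.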

Prediction

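For a user-selected model year \(X^*\), the fitted value on the log scale is \(\hat{\eta}^* = \hat{\beta}_0 + \hat{\beta}_1 X^*\), and the predicted range is its back-transform: \[ \hat{Y} = e^{\hat{\eta}^*} - 1 = e^{(\hat{\beta}_0 + \hat{\beta}_1 X^*)} - 1 \] Because the back-transformation is nonlinear, \(\hat{Y}\) estimates the median rather than the mean range at \(X^*\) when the log-scale errors are symmetric. A 95% prediction interval is constructed on the log scale, with bounds \(\hat{\eta}^*_{\text{lower}}\) and \(\hat{\eta}^*_{\text{upper}}\), and back-transformed: \[ \left[ e^{\hat{\eta}^*_{\text{lower}}} - 1,\; e^{\hat{\eta}^*_{\text{upper}}} - 1 \right] \] This interval reflects uncertainty in both the fitted model and future observations.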
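The log-scale bounds can be written out explicitly. Assuming the model is fit by ordinary least squares with independent, normally distributed, constant-variance errors on the log scale, the standard prediction interval for a new observation at \(X^*\) is \[ \hat{\eta}^* \pm t_{0.975,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(X^* - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}}, \qquad \hat{\eta}^* = \hat{\beta}_0 + \hat{\beta}_1 X^*, \] with the minus sign giving the lower bound and the plus sign the upper bound, where \(n\) is the number of vehicles used in the fit and \(s\) is the residual standard error on the log scale. The leading 1 under the square root accounts for the scatter of a new observation, and the remaining two terms account for uncertainty in the fitted line, which is why the interval covers both sources. Exponentiating each bound and subtracting 1 gives the interval in miles.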

LOESS smoother

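The LOESS option provides a nonparametric alternative that fits a separate weighted regression around each target year \(x_0\). For a local linear fit, the local coefficients minimize \[ \sum_{i=1}^{n} w_i(x_0) \left( Y_i - a - b\,(X_i - x_0) \right)^2, \qquad \hat{Y}(x_0) = \hat{a}, \] where the weights \(w_i(x_0)\) decrease with the distance \(|X_i - x_0|\) and are zero outside a neighborhood set by the span. Because the solution is linear in the responses, \(\hat{Y}(x_0)\) is still a weighted combination \(\sum_i \ell_i(x_0)\, Y_i\) of the observed ranges, with effective weights \(\ell_i(x_0)\) derived from the \(w_i(x_0)\). Unlike the log-linear model, LOESS does not assume exponential growth and is used mainly for visualization and pattern discovery rather than forecasting.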
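To show what a single local fit involves, here is a minimal Python sketch that evaluates a local linear LOESS estimate at one target year using tricube weights, the weight function used by Cleveland (1979). It assumes degree-1 local fits and a window containing a fraction `span` of the points; the app's actual settings are not stated here (R's `loess()`, for instance, defaults to `span = 0.75` with `degree = 2`). The function name `loess_at` is hypothetical, and the sketch omits the optional robustness iterations of the full method.

```python
import math


def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 inside the window, 0 outside."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def loess_at(xs, ys, x0, span=0.75):
    """Local linear (degree-1) LOESS estimate of y at x0.

    Hypothetical helper: one target point, no robustness iterations.
    """
    n = len(xs)
    q = min(n, max(2, math.ceil(span * n)))              # points in the window
    # Distance to the q-th nearest point, widened slightly so that point keeps a small weight.
    h = sorted(abs(x - x0) for x in xs)[q - 1] * 1.001
    if h == 0:
        h = 1e-9  # q or more points sit exactly at x0 (tied model years)
    w = [tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, xs)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, xs))
    if sxx == 0:
        return y_bar  # all weighted points share one year: fall back to a weighted mean
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, xs, ys))
    return y_bar + (sxy / sxx) * (x0 - x_bar)            # weighted least-squares line at x0


if __name__ == "__main__":
    years = [2012, 2013, 2015, 2016, 2018, 2019, 2021, 2022]
    ranges = [75, 80, 85, 100, 180, 220, 250, 270]       # made-up values
    print([round(loess_at(years, ranges, x0), 1) for x0 in range(2012, 2023, 2)])
```

Repeating `loess_at` over a grid of years traces out the smooth curve. A smaller `span` follows local bends more closely, while a larger one approaches a single straight line.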

Explanation of Modeling

The purpose of the modeling component is not to “predict the future perfectly,” but to provide interpretable and reasonable ways to describe how EV driving range has changed over time. I applied a log transformation to driving range before modeling because raw range values cannot be negative and are strongly right-skewed: most vehicles cluster at lower ranges, while a few newer models have very high ranges. If I modeled raw ranges directly, these high values would dominate the fit and distort the overall trend. Taking the logarithm compresses large values and spreads out smaller ones, making the relationship with year easier to model and interpret.

More importantly, technological growth is rarely linear in real-world systems. Improvements in battery technology tend to compound multiplicatively rather than add up, meaning that capacity, and with it range, grows by a roughly constant percentage each year rather than by a fixed number of miles. A log transformation turns this kind of exponential growth into a linear relationship, so the slope can be read as a yearly growth rate instead of a raw change in miles. This makes the model easier to interpret and more realistic for a compounding process.
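The link between compounding growth and the log-linear model takes only a short derivation. Suppose, as a simplifying assumption, that \(1 + Y\) grows by a constant fraction \(g\) each model year, starting from a reference year \(X_0\): \[ 1 + Y_X = (1 + Y_{X_0})(1 + g)^{X - X_0} \quad\Longrightarrow\quad \log(1 + Y_X) = \underbrace{\log(1 + Y_{X_0}) - X_0 \log(1 + g)}_{\beta_0} + \underbrace{\log(1 + g)}_{\beta_1} X. \] The slope of the log-linear model is therefore \(\beta_1 = \log(1 + g)\), which inverts to \(g = e^{\beta_1} - 1\), the growth-rate formula used in the Prediction Model tab.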

I also included LOESS as a second modeling option, not for prediction but for visualization and exploration. Unlike the log-linear regression, LOESS does not assume a global formula. Instead, it fits many small local regressions and lets the data determine the shape of the curve. This is useful when the trend is not smooth or when the speed of improvement varies across periods. For example, early EV development may have been slow, followed by rapid breakthroughs in later years. LOESS makes such nonlinear patterns visible without forcing a rigid model structure.

In short, the log-linear model gives the user a long-term growth rate, while LOESS shows how that growth behaves across different periods. Together they provide both explanation and visual insight, rather than just prediction.

SQL Query Tab

This tab answers: “How is the data actually stored and queried behind the scenes?” The SQL Query Panel implements real database querying instead of simulated filtering. The original CSV dataset is loaded into a SQLite database stored locally as ev.sqlite, and all user interactions in this tab are translated into actual SQL statements that are executed on the database backend.

User selections from the UI (make, year range, and EV type) are dynamically converted into a SQL WHERE clause. For example, when a user selects Tesla, BEV vehicles, and model years between 2015 and 2022, the application constructs the following query in real time:

SELECT * FROM electric_vehicle WHERE Make = 'Tesla' AND "Electric.Vehicle.Type" = 'BEV' AND "Model.Year" BETWEEN 2015 AND 2022;

The returned result set is rendered immediately as a data table in the dashboard, and the SQL command itself is printed live so the user can see exactly how their filter selections translate into database logic. This design demonstrates how modern dashboards act as a bridge between user interfaces and backend data engines. Instead of treating SQL as a static query language, the app exposes SQL as an interactive system that responds in real time. Users do not simply view results; they control the query logic itself through the UI.
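A parameterized version of this UI-to-SQL translation can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions, not the app's code: the helper name `build_query` and the three-column toy table are hypothetical, and the table and column names follow the example query in this section. The dotted column names are double-quoted because SQLite would otherwise read the dots as schema and table separators.

```python
import sqlite3


def build_query(make=None, ev_type=None, year_range=None):
    """Turn UI selections into a SELECT statement plus bound parameters.

    Any selection left as None is simply omitted from the WHERE clause.
    """
    clauses, params = [], []
    if make:
        clauses.append("Make = ?")
        params.append(make)
    if ev_type:
        clauses.append('"Electric.Vehicle.Type" = ?')  # quoted: dots are part of the name
        params.append(ev_type)
    if year_range:
        clauses.append('"Model.Year" BETWEEN ? AND ?')
        params.extend(year_range)
    sql = "SELECT * FROM electric_vehicle"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + ";", params


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")  # stand-in for ev.sqlite
    con.execute('CREATE TABLE electric_vehicle (Make TEXT, "Electric.Vehicle.Type" TEXT, "Model.Year" INTEGER)')
    con.executemany(
        "INSERT INTO electric_vehicle VALUES (?, ?, ?)",
        [("Tesla", "BEV", 2018), ("Tesla", "BEV", 2012), ("Nissan", "BEV", 2019)],  # toy rows
    )
    sql, params = build_query("Tesla", "BEV", (2015, 2022))
    print(sql, params)
    print(con.execute(sql, params).fetchall())
```

Binding the selections as `?` parameters, rather than pasting them into the SQL string, keeps values such as brand names containing apostrophes from breaking the query and guards against SQL injection. A dashboard can still display the query with the chosen values filled in for readability.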

AI and LLM Acknowledgement

This project used ChatGPT 5.1 (OpenAI) to assist with debugging Shiny code, clarifying modeling logic, and drafting explanatory text for the write-up. ChatGPT was used as a support tool for conceptual guidance and error diagnosis. All analytical decisions, implementation, and interpretation are the responsibility of the author.

Citation:

OpenAI. (2025). ChatGPT (Version 5.1) [Large language model]. https://openai.com/

Statistical Modeling & Transformation References

This project applies a log transformation to electric vehicle (EV) range before regression modeling in order to handle right-skewed, nonnegative data and to obtain predictions that respect the multiplicative growth pattern on the original mile scale.

The regression model is expressed as: \[ \log(1 + Y) = \beta_0 + \beta_1 X + \varepsilon \]

Back-transforming the fitted line gives an exponential growth form on the original scale: \[ \hat{Y} = e^{(\hat{\beta}_0 + \hat{\beta}_1 X^*)} - 1 \] where \(\hat{Y}\) is the predicted electric range, \(X^*\) is the model year of interest, \(\hat{\beta}_0\) is the estimated intercept, and \(\hat{\beta}_1\) is the estimated growth coefficient.

The annual proportional growth rate is computed as: \[ \text{Growth Rate} = e^{\beta_1} - 1 \] This modeling choice follows standard practice for handling right-skewed outcomes, heteroskedasticity, and multiplicative growth processes, as discussed in the references below.

Log Transformation References

UVA Library Research Data Services. (n.d.). Interpreting log-transformations in linear models. https://library.virginia.edu/data/articles/interpreting-log-transformations-in-a-linear-model

Gelman, A. (2019). You should usually log-transform your positive data. Statistical Modeling, Causal Inference, and Social Science. https://statmodeling.stat.columbia.edu/2019/08/21/you-should-usually-log-transform-your-positive-data/

LOESS / Nonparametric Smoothing References

Mangiafico, S. (2023). Loess and LOWESS. In R Companion Handbook for Biological Statistics. https://rcompanion.org/handbook/I_12.html

Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74(368), 829–836. https://doi.org/10.1080/01621459.1979.10481038