Install and load packages (run install only once)

install.packages("tidyverse")
library(tidyverse)

Load the dataset (CSV version)

data <- read_csv("Statistics_Dataset.csv")

Clean the Price column by removing commas and turning it into a number

data$Price <- as.numeric(gsub(",", "", data$Price))
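
As a quick check of the cleaning step, the sketch below applies the same gsub/as.numeric pattern to a few made-up price strings. The values and the names raw_price and clean_price are hypothetical and do not come from the dataset. Any string that is still not numeric after the commas are removed becomes NA, so it is worth counting those before rows with missing values are dropped.

```r
# Hypothetical price strings, for illustration only (not from the dataset)
raw_price <- c("35,000", "120,500", "48900", "n/a")

# Remove thousands separators, then convert to numeric.
# "n/a" cannot be parsed and becomes NA (the coercion warning is suppressed here).
clean_price <- suppressWarnings(as.numeric(gsub(",", "", raw_price)))

print(clean_price)

# Number of values that failed to convert
print(sum(is.na(clean_price)))
```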

Convert Brand and Drive into categorical variables

data$Brand <- factor(data$Brand)
data$Drive <- factor(data$Drive)

Remove rows with missing values

data <- na.omit(data)

Create a new variable: price per km of range

data$PricePerKm <- data$Price / data$Range

Take a quick look at the data

head(data)

Build the regression model including brand and drive as factors

model <- lm(Price ~ BatterySize + Acceleration + Range + Brand + Drive + ChargeSpeed, data = data)

Show the summary of the model

summary(model)

Find the top 10 most common brands

top_brands <- data %>%
  count(Brand, sort = TRUE) %>%
  slice_max(order_by = n, n = 10, with_ties = FALSE) %>%  # exactly 10 brands, even if counts tie
  pull(Brand)

Create a boxplot to compare price across the top brands

ggplot(data %>% filter(Brand %in% top_brands), aes(x = Brand, y = Price)) +
  geom_boxplot(fill = "skyblue") +
  labs(title = "Price by Top 10 Brands", x = "Brand", y = "Price (€)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Scatterplot to see the relationship between battery size and price

ggplot(data, aes(x = BatterySize, y = Price)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Battery Size vs Price", x = "Battery Size (kWh)", y = "Price (€)") +
  theme_minimal()

Add predicted prices from the model

data$PredictedPrice <- predict(model)

Plot actual price vs predicted price to see how well the model fits

ggplot(data, aes(x = Price, y = PredictedPrice)) +
  geom_point(alpha = 0.6) +
  geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") +
  labs(title = "Actual vs Predicted Price", x = "Actual Price (€)", y = "Predicted Price (€)") +
  theme_minimal()
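
The actual-vs-predicted plot can be backed up with two numbers: the root mean squared error (RMSE) and R-squared. The sketch below is a minimal illustration on simulated data. The data frame toy, the model toy_model, and all coefficients in it are assumptions made up for the example and are not results from the EV dataset. The same two formulas could be applied to data$Price and data$PredictedPrice.

```r
# Simulated stand-in data (hypothetical); the real script would use `data` and `model`
set.seed(1)
toy <- data.frame(BatterySize = runif(50, 40, 100))
toy$Price <- 20000 + 600 * toy$BatterySize + rnorm(50, sd = 5000)

toy_model <- lm(Price ~ BatterySize, data = toy)
toy$PredictedPrice <- predict(toy_model)

# RMSE: typical size of a prediction error, in the same units as Price
rmse <- sqrt(mean((toy$Price - toy$PredictedPrice)^2))

# R-squared computed by hand: 1 - residual sum of squares / total sum of squares
r2 <- 1 - sum((toy$Price - toy$PredictedPrice)^2) /
  sum((toy$Price - mean(toy$Price))^2)

print(c(RMSE = rmse, R2 = r2))
```

The hand-computed R-squared should agree with summary(toy_model)$r.squared, which is a useful sanity check that the predictions and the model belong together.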

Conclusion

We built a linear regression model to understand what drives electric vehicle prices in this dataset. The predictors are battery size, acceleration, range, charge speed, drive type, and brand.

Battery size and acceleration were statistically significant numeric predictors. Range added little once battery size was in the model, probably because the two are closely related: a larger battery generally means a longer range, so the two variables carry overlapping information.
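
One way to check whether two predictors overlap is to look at their correlation and at the variance inflation factor (VIF). The sketch below uses simulated data in which range is deliberately constructed from battery size. The data frame toy and every number in it are hypothetical and are not taken from the dataset. The VIF is computed by hand as 1 / (1 - R²), where R² comes from regressing Range on the other predictor.

```r
# Simulated data (hypothetical): Range is built to depend strongly on BatterySize
set.seed(42)
n <- 200
BatterySize <- runif(n, 40, 100)
Range <- 5 * BatterySize + rnorm(n, sd = 20)
Price <- 15000 + 700 * BatterySize + rnorm(n, sd = 6000)
toy <- data.frame(Price, BatterySize, Range)

# Pairwise correlation between the two predictors
r <- cor(toy$BatterySize, toy$Range)

# VIF for Range: 1 / (1 - R^2) from regressing Range on the other predictor(s)
r2_range <- summary(lm(Range ~ BatterySize, data = toy))$r.squared
vif_range <- 1 / (1 - r2_range)

print(c(correlation = r, VIF_Range = vif_range))
```

A high correlation and a large VIF mean the model cannot cleanly separate the two effects, which would be consistent with range looking unimportant once battery size is included.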

Brand had a clear effect: some brands, such as Porsche and Lightyear, are consistently more expensive even after accounting for technical specifications.
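
Whether brand matters beyond the technical specifications can be tested by comparing two nested models, one with and one without Brand, using an F-test via anova(). The sketch below is an illustration on simulated data. The brands "A", "B", "C", the brand_premium values, and the data frame toy are all hypothetical. On the real data the same comparison would use the full set of predictors.

```r
# Simulated data (hypothetical): brand "C" carries a large built-in price premium
set.seed(7)
n <- 150
toy <- data.frame(
  BatterySize = runif(n, 40, 100),
  Brand = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
brand_premium <- c(A = 0, B = 5000, C = 30000)
toy$Price <- 20000 + 600 * toy$BatterySize +
  brand_premium[as.character(toy$Brand)] + rnorm(n, sd = 4000)

# Nested models: the second adds Brand to the first
without_brand <- lm(Price ~ BatterySize, data = toy)
with_brand <- lm(Price ~ BatterySize + Brand, data = toy)

# F-test: does adding Brand significantly reduce the residual sum of squares?
comparison <- anova(without_brand, with_brand)
print(comparison)
```

A small p-value in the second row of the table indicates that brand explains price variation that the technical predictor alone does not.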

Overall, battery size and brand are the strongest drivers of price in this model. Management should therefore focus on brand positioning and battery-related features when pricing EVs.