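# Install once if needed, then load the tidyverse (readr, dplyr, ggplot2)
install.packages("tidyverse")
library(tidyverse)
data <- read_csv("Statistics_Dataset.csv")
# Strip thousands separators so Price parses as a number
data$Price <- as.numeric(gsub(",", "", data$Price))
data$Brand <- factor(data$Brand)
data$Drive <- factor(data$Drive)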
data <- na.omit(data)
data$PricePerKm <- data$Price / data$Range
head(data)
model <- lm(Price ~ BatterySize + Acceleration + Range + Brand + Drive + ChargeSpeed, data = data)
summary(model)
top_brands <- data %>%
  count(Brand, sort = TRUE) %>%
  slice_max(order_by = n, n = 10) %>%
  pull(Brand)
ggplot(data %>% filter(Brand %in% top_brands), aes(x = Brand, y = Price)) +
  geom_boxplot(fill = "skyblue") +
  labs(title = "Price by Top 10 Brands", x = "Brand", y = "Price (€)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(data, aes(x = BatterySize, y = Price)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Battery Size vs Price", x = "Battery Size (kWh)", y = "Price (€)") +
  theme_minimal()
data$PredictedPrice <- predict(model)
ggplot(data, aes(x = Price, y = PredictedPrice)) +
  geom_point(alpha = 0.6) +
  geom_abline(slope = 1, intercept = 0, color = "red", linetype = "dashed") +
  labs(title = "Actual vs Predicted Price", x = "Actual Price (€)", y = "Predicted Price (€)") +
  theme_minimal()
We fitted a linear regression model to understand what drives electric vehicle prices in this dataset. The predictors are battery size, acceleration, range, charge speed, drive type, and brand.
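Battery size and acceleration were statistically significant numeric predictors. Range added little once battery size was in the model, probably because the two are closely related: a larger battery generally means a longer range, so range carries little information that battery size does not already capture.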
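One way to check the suspected overlap between battery size and range is to look at their correlation and at a variance inflation factor (VIF) for range. The sketch below is only illustrative: the `ev` data frame is synthetic and its numbers are invented, so it shows the mechanics rather than results from the actual dataset. The VIF is computed by hand as 1 / (1 - R²) from an auxiliary regression, which avoids depending on an extra package.

```r
# Synthetic stand-in for the EV data; all values are invented for illustration.
set.seed(42)
n <- 200
ev <- data.frame(
  BatterySize  = runif(n, 30, 110),   # kWh
  Acceleration = runif(n, 3, 12)      # seconds
)
# Assumption: range is driven mostly by battery size, plus some noise
ev$Range <- 5.5 * ev$BatterySize + rnorm(n, sd = 25)

# Pairwise correlation between the two suspect predictors
r_battery_range <- cor(ev$BatterySize, ev$Range)

# VIF for Range: regress it on the other numeric predictors,
# then compute 1 / (1 - R^2) from that auxiliary model
aux <- lm(Range ~ BatterySize + Acceleration, data = ev)
vif_range <- 1 / (1 - summary(aux)$r.squared)

print(r_battery_range)
print(vif_range)
```

On the real data, the same two checks could be run on `data` with the full set of numeric predictors in the auxiliary regression. A VIF well above roughly 5 to 10 is a common rule of thumb for collinearity strong enough to blur individual coefficients.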
Brand had a clear effect: some brands, such as Porsche and Lightyear, are consistently more expensive even after accounting for technical specifications.
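Whether brand matters "after accounting for technical specs" can be tested formally by comparing a specs-only model against one that also includes brand, using a nested F-test via `anova()`. This is a minimal sketch on synthetic data: the brand labels `A`, `B`, `C` and the premiums are hypothetical, not taken from the dataset.

```r
# Synthetic stand-in for the EV data; brands and premiums are hypothetical.
set.seed(1)
n <- 150
ev <- data.frame(
  BatterySize = runif(n, 30, 110),
  Brand       = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
brand_premium <- c(A = 0, B = 5000, C = 20000)
ev$Price <- 15000 + 600 * ev$BatterySize +
  unname(brand_premium[as.character(ev$Brand)]) +
  rnorm(n, sd = 4000)

# Nested models: technical specs only vs. specs plus brand
specs_only <- lm(Price ~ BatterySize, data = ev)
with_brand <- lm(Price ~ BatterySize + Brand, data = ev)

# F-test of whether adding Brand reduces the residual sum of squares
# by more than chance would explain
brand_test <- anova(specs_only, with_brand)
print(brand_test)
```

A small p-value in the second row of the table indicates that brand explains price variation beyond what the specifications already capture. On the real data, the two models would use the full set of predictors from `model`, with and without `Brand`.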
In conclusion, battery size and brand are the biggest drivers of price in this model. Management should focus on brand positioning and battery-related features when pricing EVs.
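To back up a ranking of "biggest drivers", one option is to compare how much each term contributes when it is removed from the full model. The sketch below uses base R's `drop1()` with an F-test for this. It is an assumption that this matches how the ranking was reached, and the data are again synthetic with hypothetical brands.

```r
# Synthetic stand-in for the EV data; all values are invented for illustration.
set.seed(7)
n <- 150
ev <- data.frame(
  BatterySize  = runif(n, 30, 110),
  Acceleration = runif(n, 3, 12),
  Brand        = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
premium <- c(A = 0, B = 5000, C = 20000)
ev$Price <- 15000 + 600 * ev$BatterySize - 500 * ev$Acceleration +
  unname(premium[as.character(ev$Brand)]) +
  rnorm(n, sd = 4000)

full <- lm(Price ~ BatterySize + Acceleration + Brand, data = ev)

# Single-term deletions: for each term, the increase in residual sum of
# squares (and the F-test) when that term alone is dropped from the model
term_table <- drop1(full, test = "F")

# Sort terms by the sum of squares they account for, largest first
print(term_table[order(-term_table[["Sum of Sq"]]), ])
```

Terms with the largest sum of squares are the ones the model relies on most. Because correlated predictors share explanatory power, this ordering should be read alongside the collinearity between battery size and range.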