Final Report
Electric vehicles are becoming increasingly prevalent in everyday life. Even though owning a Tesla is still generally perceived as a luxury only a lucky few can afford, the technologies which enable the transition from fossil fuels to renewable energy have marched steadily forward. Nonetheless, challenges to EV ownership persist. In this report I examine EV trends in Washington State, which signed into law a bill setting a 2030 target for all new cars registered in the state to be electric. I analyze relationships among several important variables and segment the Washington EV market into several segments with distinct characteristics.
My focus is the relationship between the range of an EV and its expected price. Unlike fossil-fuel vehicles, which are designed to take people from point A to point B, electric vehicles are better suited for commuters given their relatively lower range. I use the dataset to argue that the electric range of an EV is a main selling point for manufacturers. My primary research question is whether there is a statistically significant relationship between the Electric Range and Expected Price variables. A secondary research objective is to analyze the data through the lens of a high-end EV manufacturer interested in expanding its presence in wealthier areas. I assess the saturation of EV ownership of certain luxury brands in order to best direct marketing efforts.
I define a luxury brand as one which offers higher levels of comfort, amenities and quality than moderately priced cars. Since the term can be subjective, I use Washington State’s luxury car tax to create a threshold of $38,000 for luxury vehicles. According to the WA Department of Revenue, cars which cost over $38k are subject to a 5% luxury tax. This regulation sets them apart from moderately priced vehicles and I use it as the main differentiator for the purposes of this report.
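The threshold described above can be applied as a simple filter. The following is a minimal sketch rather than the code used to build the groups in this report: the data frame `ev` and its toy values are assumptions made for illustration, while the column name `Expected Price ($1k)` and the group names `lux_ev` and `mod_ev` follow the code shown later in the report.

```r
library(dplyr)

# Toy data standing in for the EV dataset (hypothetical values);
# prices are expressed in thousands of dollars
ev <- tibble::tibble(
  Make = c("TESLA", "NISSAN", "PORSCHE", "FIAT"),
  `Expected Price ($1k)` = c(69, 21.5, 80, 38)
)

luxury_threshold <- 38  # $38k, the luxury-tax threshold cited above

# Vehicles priced strictly above the threshold form the LUX group;
# everything at or below it forms the MOD group
lux_ev <- ev %>% filter(`Expected Price ($1k)` > luxury_threshold)
mod_ev <- ev %>% filter(`Expected Price ($1k)` <= luxury_threshold)
```

Placing vehicles priced at exactly $38k in the MOD group matches the MOD maximum of 38 reported in the summary statistics.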
This dataset focuses predominantly on EVs in Washington State.
While giving a high-level overview of the data, these visualizations examine
relationships between variables. For instance, performing more thorough
analysis on the zip codes or cities to provide a better understanding of
the specific models preferred by those populations offers an insight
into a particular market. It is important to note that this dataset’s
intended use is for training a machine learning model to predict
expected prices which would then be tested independently on previously
unseen data. It is unlikely that this data accurately reflects real
world instances.
Segmenting the vehicles into luxury (LUX) and moderately priced (MOD) groups reveals the main differences between the two. Luxury EVs tend to have a higher range and a median expected price more than double that of moderately priced EVs.
# Summary statistics of electric range per make, for each price group
el_range_lux.summ_stat <- lux_ev %>%
  group_by(Make) %>%
  summarize(el_range.mean=mean(`Electric Range`, na.rm = TRUE), el_range.median=median(`Electric Range`, na.rm = TRUE), el_range.min=min(`Electric Range`, na.rm = TRUE), el_range.max=max(`Electric Range`, na.rm = TRUE), el_range.sd=sd(`Electric Range`, na.rm = TRUE))
el_range_mod.summ_stat <- mod_ev %>%
  group_by(Make) %>%
  summarize(el_range.mean=mean(`Electric Range`, na.rm = TRUE), el_range.median=median(`Electric Range`, na.rm = TRUE), el_range.min=min(`Electric Range`, na.rm = TRUE), el_range.max=max(`Electric Range`, na.rm = TRUE), el_range.sd=sd(`Electric Range`, na.rm = TRUE))
# Get the top 5 luxury makes by median electric range
top5_lux_by_range <- el_range_lux.summ_stat %>%
  filter(rank(desc(el_range.median))<=5)
# Get the top 5 moderately priced makes by median electric range
top5_mod_by_range <- el_range_mod.summ_stat %>%
  filter(rank(desc(el_range.median))<=5)
top5_lux_by_range
# A tibble: 5 × 6
Make el_range.mean el_range.median el_range.min el_range.max
<chr> <dbl> <dbl> <dbl> <dbl>
1 CHEVROLET 238 238 238 238
2 JAGUAR 217. 234 0 234
3 NISSAN 160. 149 0 215
4 POLESTAR 116. 116. 0 233
5 TESLA 150. 215 0 337
# … with 1 more variable: el_range.sd <dbl>
top5_mod_by_range
# A tibble: 5 × 6
Make el_range.mean el_range.median el_range.min el_range.max
<chr> <dbl> <dbl> <dbl> <dbl>
1 FIAT 85.8 87 84 87
2 TESLA 214. 208 208 265
3 TH!NK 100 100 100 100
4 VOLKSWAGEN 107. 125 83 125
5 WHEEGO ELEC… 100 100 100 100
# … with 1 more variable: el_range.sd <dbl>
# Summary statistics of expected price per make, for each price group
exp_price_lux.summ_stat <- lux_ev %>%
  group_by(Make) %>%
  summarize(exp_price.mean=mean(`Expected Price ($1k)`, na.rm = TRUE), exp_price.median=median(`Expected Price ($1k)`, na.rm = TRUE), exp_price.min=min(`Expected Price ($1k)`, na.rm = TRUE), exp_price.max=max(`Expected Price ($1k)`, na.rm = TRUE), exp_price.sd=sd(`Expected Price ($1k)`, na.rm = TRUE))
exp_price_mod.summ_stat <- mod_ev %>%
  group_by(Make) %>%
  summarize(exp_price.mean=mean(`Expected Price ($1k)`, na.rm = TRUE), exp_price.median=median(`Expected Price ($1k)`, na.rm = TRUE), exp_price.min=min(`Expected Price ($1k)`, na.rm = TRUE), exp_price.max=max(`Expected Price ($1k)`, na.rm = TRUE), exp_price.sd=sd(`Expected Price ($1k)`, na.rm = TRUE))
# Get the top 5 luxury makes by median expected price
top5_lux_by_price <- exp_price_lux.summ_stat %>%
  filter(rank(desc(exp_price.median))<=5)
# Get the top 5 moderately priced makes by median expected price
top5_mod_by_price <- exp_price_mod.summ_stat %>%
  filter(rank(desc(exp_price.median))<=5)
top5_lux_by_price
# A tibble: 5 × 6
Make exp_price.mean exp_price.median exp_price.min exp_price.max
<chr> <dbl> <dbl> <dbl> <dbl>
1 BENTLEY 90 90 90 90
2 DODGE 600 600 600 600
3 LAND RO… 73.0 79.6 54 106
4 PORSCHE 81.0 80 44 845
5 TESLA 68.7 69 46 142
# … with 1 more variable: exp_price.sd <dbl>
top5_mod_by_price
# A tibble: 5 × 6
Make exp_price.mean exp_price.median exp_price.min exp_price.max
<chr> <dbl> <dbl> <dbl> <dbl>
1 CHRYSLER 34.3 36.6 30 38
2 LAND RO… 33.5 33.5 33.5 33.5
3 LINCOLN 36.7 36.1 36.1 38
4 MERCEDE… 28.5 34.5 13 36
5 TESLA 32.3 33.9 26.4 35
# … with 1 more variable: exp_price.sd <dbl>
# Median expected price of the top 5 makes: luxury in blue, moderately priced in red
ggplot() +
  geom_point(data = top5_lux_by_price, aes(x = exp_price.median, y=Make), color = "blue") +
  geom_point(data = top5_mod_by_price, aes(x = exp_price.median, y=Make), color = "red") +
  ggtitle("Scatterplot of median price of EVs in thousands") + # for the main title
  scale_x_continuous(name ="Expected price (in thousands)", breaks=seq(0,600,50)) +
  theme_fivethirtyeight()
# Median electric range of the top 5 makes: luxury in blue, moderately priced in red
ggplot() +
  geom_point(data = top5_lux_by_range, aes(x = el_range.median, y=Make), color = "blue") +
  geom_point(data = top5_mod_by_range, aes(x = el_range.median, y=Make), color = "red") +
  ggtitle("Scatterplot of median range of EVs in miles") + # for the main title
  labs(x="Electric range (miles)") +
  theme_fivethirtyeight()
# Alternative calculation of summary statistics by creating a custom function
# Returns the summary statistics of a particular column
print_stats <- function(x) {
print(paste("Min: ",as.character(min(x, na.rm = TRUE))))
print(paste("Max: ",as.character(max(x, na.rm = TRUE))))
print(paste("Median: ",as.character(median(x, na.rm = TRUE))))
print(paste("Standard Deviation: ",as.character(sd(x, na.rm = TRUE))))
print(paste("Mean: ",as.character(mean(x, na.rm = TRUE))))
print(paste("Interquartile Range",as.character(IQR(x, na.rm = TRUE))))
}
#Calculating the summary statistics for the electric range of electric vehicles
el_range_summary_stats <- print_stats(lux_ev$`Electric Range`)
[1] "Min: 0"
[1] "Max: 337"
[1] "Median: 200"
[1] "Standard Deviation: 120.495862960754"
[1] "Mean: 130.300425781824"
[1] "Interquartile Range 220"
#Calculating the summary statistics for the expected price of electric vehicles
expected_price_summary_stats <- print_stats(mod_ev$`Expected Price ($1k)`)
[1] "Min: 0"
[1] "Max: 38"
[1] "Median: 21.998"
[1] "Standard Deviation: 6.99973049344048"
[1] "Mean: 23.6728561461794"
[1] "Interquartile Range 11.624"
The median range of the top luxury brands exceeds that of the lower-priced models. Tesla, which sells a variety of EVs, also offers a higher range in its high-end vehicles. While the median prices of the five most expensive makes in the MOD group are near the $38k threshold, those of the five most expensive makes in the LUX group are approximately twice as high, barring the Dodge outlier of $600k.
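The top-5 selections above rely on `filter(rank(desc(...)) <= 5)`, which keeps the rows in their original order. As a hedged alternative, `dplyr::slice_max()` selects the same rows and also sorts them from highest to lowest. The table `by_make` below is a hypothetical stand-in with made-up makes and values; it is not taken from the dataset.

```r
library(dplyr)

# Hypothetical per-make medians (illustrative values only)
by_make <- tibble::tibble(
  Make = c("A", "B", "C", "D", "E", "F", "G"),
  median_range = c(238, 87, 215, 100, 234, 149, 125)
)

# Approach used in this report: keeps the original row order
top5_rank <- by_make %>% filter(rank(desc(median_range)) <= 5)

# Alternative: same rows, sorted from highest to lowest median
top5_slice <- by_make %>% slice_max(median_range, n = 5)
```

Both approaches keep tied values by default, so either can return more than five rows when medians are equal.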
VIN (1-10) - The first 10 characters of each vehicle’s Vehicle Identification Number (VIN).
County - The county in which the registered owner resides.
City - The city in which the registered owner resides.
State - The state in which the registered owner resides.
ZIP Code - The 5-digit ZIP code in which the registered owner resides.
Model Year - The model year of the vehicle, determined by decoding the Vehicle Identification Number (VIN).
Make - The manufacturer of the vehicle, determined by decoding the Vehicle Identification Number (VIN).
Model - The model of the vehicle, determined by decoding the Vehicle Identification Number (VIN).
Electric Vehicle Type - This distinguishes the vehicle as all-electric or a plug-in hybrid.
Clean Alternative Fuel Vehicle (CAFV) Eligibility - This categorizes vehicles as Clean Alternative Fuel Vehicles (CAFVs) based on the fuel requirement and electric-only range requirement.
Electric Range - Describes how far a vehicle can travel purely on its electric charge.
Base MSRP - This is the lowest Manufacturer’s Suggested Retail Price (MSRP) for any trim level of the model in question.
Legislative District - The specific section of Washington State that the vehicle’s owner resides in, as represented in the state legislature.
DOL Vehicle ID - Unique number assigned to each vehicle by the Department of Licensing for identification purposes.
Vehicle Location - The center of the ZIP Code for the registered vehicle.
Electric Utility - This is the electric power retail service territory serving the address of the registered vehicle.
Expected Price - This is the expected price of the vehicle.
The summary statistics tell us that, on average, the range of an electric vehicle in the MOD group is just shy of 75 miles on a single charge. However, the standard deviation of 104 miles tells us that the data are widely dispersed around the mean. The summary statistics of the expected price paint a somewhat different picture. The mean and median are very similar and the standard deviation is only $25,000. The maximum of $1.1M tells us that there are some expensive outliers in the data. I visualize the primary variables analyzed in this report. From the plots, we can get a sense that while the median expected price differs by less than $10k, the expected range of the vehicles varies by 14 miles.
Battery electric vehicles outnumber plug-in hybrids in the dataset by almost 3 to 1. This is an interesting observation, especially given that PHEVs tend to have smaller batteries and a lower upfront cost. This may indicate that the owners represented here are investing for the long haul with their EV purchase.
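The ratio quoted above can be reproduced from the frequency tables reported in this document. The short sketch below only re-enters those published counts by hand; the variable names are chosen for illustration.

```r
# Counts copied from the frequency tables reported in this document
bev  <- c(lux = 30800, mod = 16950)
phev <- c(lux = 3255,  mod = 13150)

bev_total  <- sum(bev)    # 47750 battery electric vehicles
phev_total <- sum(phev)   # 16405 plug-in hybrids

# Overall BEV-to-PHEV ratio, roughly 2.91
ratio <- bev_total / phev_total

# Share of BEVs within each price group
share_bev <- bev / (bev + phev)
```

The per-group shares show that the imbalance is driven mainly by the luxury group, where about 90% of vehicles are BEVs, compared with about 56% in the moderately priced group.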
Additionally, we can draw some conclusions about the electric range and expected price of luxury brands: excluding outliers, Teslas tend to have a higher range than other luxury electric vehicles. However, they are also overrepresented in this dataset.
I ran linear regression models of expected price on electric range, separately for the luxury and non-luxury groups. Both slopes are statistically significant (p = 0.027 for the luxury group and p < 2e-16 for the moderately priced group), but with more than 30,000 observations in each group even a negligible effect reaches significance. The R-squared values of about 0.0001 and 0.014 show that electric range explains almost none of the variation in expected price, so there is no practically meaningful linear relationship between the explanatory and response variables despite the low p-values for both tests. While some
Finally, I explore the characteristics of wealthier areas to determine where luxury brands are mostly present and which cities would benefit from luxury brand EV marketing. I have identified Redmond, Seattle, Bellevue and Medina as wealthier areas, based on Washington State data, where luxury brands may be interested in advertising more. While Tesla is, again, over-represented, the other 3 manufacturers may be interested in expanding their presence in those areas in an effort to claw back market share from the EV pioneer Tesla. However, the regression for these cities shows no statistically significant relationship at the 5% level (p = 0.096), and its \(R^2\) value is approximately 0.0003.
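The regression summaries in this report pair small p-values with near-zero R-squared values, which can look contradictory. The sketch below uses simulated data, not the EV dataset, to illustrate how a very weak slope becomes statistically significant once the sample is large. The variable names, the slope of 0.01 and the noise level are assumptions chosen only to resemble the scale of the MOD group.

```r
set.seed(42)

n <- 30000                          # similar in size to the groups analyzed here
range_mi <- runif(n, 0, 300)        # simulated electric range in miles

# Simulated price in $1k: a tiny true slope buried in noise
price_k <- 23 + 0.01 * range_mi + rnorm(n, sd = 7)

fit <- lm(price_k ~ range_mi)
s <- summary(fit)

# p-value of the slope and share of variance explained
p_value   <- s$coefficients["range_mi", "Pr(>|t|)"]
r_squared <- s$r.squared
```

In this simulation the slope is clearly distinguishable from zero, yet range accounts for only a small fraction of the variance in price, which is the same pattern seen in the model output reported here.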
# Display EV type proportions per category
lux_ev %>%
count(`Electric Vehicle Type`) %>%
mutate(perc = n / nrow(lux_ev)) -> lux_ev_type_percent
ggplot(data=lux_ev_type_percent, aes(x=`Electric Vehicle Type`, y=perc)) +
geom_bar(stat="identity") +
scale_x_discrete(name ="Electric Vehicle Type Breakdown") +
scale_y_continuous(name ="Percent") +
ggtitle("Breakdown of luxury electric vehicles by type") +
theme_economist()
# Plot the summary statistics of electric range and expected price using a boxplot
# Examine the two variables individually as well as their relationship using a scatter plot
ggplot(data=lux_ev, aes(x=`Electric Range`)) +
coord_flip() +
geom_boxplot(outlier.colour="black", outlier.shape=8,
outlier.size=2, notch=FALSE) +
scale_x_continuous(name ="Electric range (miles)", breaks=seq(0,350,50)) +
ggtitle("Range of luxury electric vehicles") +
theme_economist()
# Plot the summary statistics of the electric range and expected price for more affordable EVs
ggplot(data=mod_ev, aes(x=`Electric Range`)) +
coord_flip() +
geom_boxplot(outlier.colour="black", outlier.shape=8,
outlier.size=2, notch=FALSE) +
scale_x_continuous(name ="Electric range (miles)", breaks=seq(0,350,50)) +
ggtitle("Range of moderately priced vehicles") +
theme_economist()
# Boxplot of the expected price of MOD and LUX EVs
# (coord_flip() sets the orientation; geom_boxplot() has no `horizontal` argument)
ggplot(data=mod_ev, aes(x=`Expected Price ($1k)`)) +
  coord_flip() +
  geom_boxplot(outlier.colour="blue", outlier.shape=8,
               outlier.size=2, notch=FALSE) +
  scale_x_continuous(name ="Expected price (thousands)", limits=c(0,75), breaks=seq(0,75,15)) +
  ggtitle("Expected price of EVs in the MOD group") +
  theme_economist()
ggplot(data=lux_ev, aes(x=`Expected Price ($1k)`)) +
  coord_flip() +
  geom_boxplot(outlier.colour="blue", outlier.shape=8,
               outlier.size=2, notch=FALSE) +
  scale_x_continuous(name ="Expected price (thousands)", limits=c(0,150), breaks=seq(0,150,30)) +
  ggtitle("Expected price of EVs in the LUX group") +
  theme_economist()
# Boxplot of the expected price of luxury brands, with a wider price axis
ggplot(data=lux_ev, aes(x=`Expected Price ($1k)`)) +
  coord_flip() +
  geom_boxplot(outlier.colour="blue", outlier.shape=8,
               outlier.size=2, notch=FALSE) +
  scale_x_continuous(name ="Expected price (thousands)", limits=c(0,300), breaks=seq(0,300,50)) +
  ggtitle("Expected price of luxury brands") +
  theme_economist()
# Scatter plot of expected price against electric range for moderately priced EVs
ggplot(data=mod_ev, aes(x=`Electric Range`, y=`Expected Price ($1k)`)) +
  geom_point() +
  scale_x_continuous(name="Electric range (miles)",breaks=seq(0,350,50)) +
  scale_y_continuous(name="Expected price (thousands)",limits=c(0,200),breaks=seq(0,200,50))
# Facet wraps of electric range and expected price
ggplot(lux_ev, aes(`Electric Range`)) +
geom_histogram(binwidth = 5) +
labs(title = "Luxury brand electric range") +
theme_bw() +
facet_wrap(vars(`Electric Vehicle Type`)) +
theme_economist()
ggplot(lux_ev, aes(`Expected Price ($1k)`)) +
geom_histogram(binwidth = 100) +
labs(title = "Luxury brand expected price") +
theme_bw() +
facet_wrap(vars(Make)) +
theme_economist()
# Calculate frequencies for the following categorical variables: EV Type, Make, Electric Utility and County
table(lux_ev$`Electric Vehicle Type`)
Battery Electric Vehicle (BEV)
30800
Plug-in Hybrid Electric Vehicle (PHEV)
3255
table(mod_ev$`Electric Vehicle Type`)
Battery Electric Vehicle (BEV)
16950
Plug-in Hybrid Electric Vehicle (PHEV)
13150
prop.table(table(lux_ev$`Electric Vehicle Type`))
Battery Electric Vehicle (BEV)
0.90441932
Plug-in Hybrid Electric Vehicle (PHEV)
0.09558068
prop.table(table(mod_ev$`Electric Vehicle Type`))
Battery Electric Vehicle (BEV)
0.5631229
Plug-in Hybrid Electric Vehicle (PHEV)
0.4368771
Call:
lm(formula = `Expected Price ($1k)` ~ `Electric Range`, data = lux_ev)
Residuals:
Min 1Q Median 3Q Max
-25.98 -7.87 0.15 8.52 1035.42
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.448e+01 1.426e-01 452.087 <2e-16 ***
`Electric Range` 1.772e-03 8.036e-04 2.206 0.0274 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 17.87 on 34053 degrees of freedom
Multiple R-squared: 0.0001428, Adjusted R-squared: 0.0001135
F-statistic: 4.865 on 1 and 34053 DF, p-value: 0.02742
lux_ev %>% filter(`Expected Price ($1k)`< 150) %>%
ggplot(aes(`Electric Range`, `Expected Price ($1k)`)) +
geom_point() +
scale_y_continuous(name ="Expected price (in thousands)", breaks=seq(0,150,25)) +
theme_economist()
pred <- predict(range.p1)
lux_ev %>% ggplot(mapping=aes(x=`Electric Range`)) +
geom_point(aes(y=`Expected Price ($1k)`)) +
geom_jitter(aes(y=pred), color="red") +
scale_y_continuous(name ="Expected price (in thousands)", breaks=seq(0,250,50)) +
theme_economist()
Call:
lm(formula = `Expected Price ($1k)` ~ `Electric Range`, data = mod_ev)
Residuals:
Min 1Q Median 3Q Max
-22.770 -5.440 -1.973 5.824 15.072
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.277e+01 5.959e-02 382.10 <2e-16 ***
`Electric Range` 1.122e-02 5.483e-04 20.46 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.952 on 30098 degrees of freedom
Multiple R-squared: 0.01372, Adjusted R-squared: 0.01368
F-statistic: 418.6 on 1 and 30098 DF, p-value: < 2.2e-16
mod_ev %>%
ggplot(aes(`Electric Range`, `Expected Price ($1k)`)) +
geom_point() +
theme_economist()
pred <- predict(range.p1)
mod_ev %>% ggplot(mapping=aes(x=`Electric Range`)) +
geom_point(aes(y=`Expected Price ($1k)`)) +
geom_jitter(aes(y=pred), color="red") +
theme_economist()
Call:
lm(formula = `Expected Price ($1k)` ~ `Electric Range`, data = lux_ev)
Residuals:
Min 1Q Median 3Q Max
-25.98 -7.87 0.15 8.52 1035.42
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.448e+01 1.426e-01 452.087 <2e-16 ***
`Electric Range` 1.772e-03 8.036e-04 2.206 0.0274 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 17.87 on 34053 degrees of freedom
Multiple R-squared: 0.0001428, Adjusted R-squared: 0.0001135
F-statistic: 4.865 on 1 and 34053 DF, p-value: 0.02742
lux_ev %>%
ggplot(aes(`Electric Range`, `Expected Price ($1k)`)) +
geom_point()
pred <- predict(range.lb)
lux_ev %>% ggplot(mapping=aes(x=`Electric Range`)) +
geom_point(aes(y=`Expected Price ($1k)`)) +
geom_jitter(aes(y=pred), color="red") +
theme_economist()
# Keep luxury EVs registered in the four wealthier cities
evs_rich_zips <- lux_ev %>%
  filter(City %in% c("REDMOND", "SEATTLE", "BELLEVUE", "MEDINA")) %>%
  select(`ZIP Code`, City, Make, Model, `Electric Vehicle Type`, `Electric Range`, `Expected Price ($1k)`)
range.p2 <- lm(`Expected Price ($1k)`~`Electric Range`, data=evs_rich_zips)
summary(range.p2)
Call:
lm(formula = `Expected Price ($1k)` ~ `Electric Range`, data = evs_rich_zips)
Residuals:
Min 1Q Median 3Q Max
-27.061 -8.094 -0.115 7.439 76.439
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 65.560588 0.228780 286.566 <2e-16 ***
`Electric Range` -0.002120 0.001274 -1.664 0.0962 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15.65 on 10424 degrees of freedom
Multiple R-squared: 0.0002655, Adjusted R-squared: 0.0001696
F-statistic: 2.769 on 1 and 10424 DF, p-value: 0.09616
evs_rich_zips %>%
ggplot(aes(`Electric Range`, `Expected Price ($1k)`)) +
geom_point()
pred <- predict(range.p2)
evs_rich_zips %>% ggplot(mapping=aes(x=`Electric Range`)) +
geom_point(aes(y=`Expected Price ($1k)`)) +
geom_jitter(aes(y=pred), color="red") +
theme_economist()
# Univariate barplot showing the type of EVs in Washington's most expensive cities
ggplot(data=evs_rich_zips, aes(x=`Electric Vehicle Type`)) +
geom_bar() +
theme_economist()
#Bivariate plot telling us which EV makes are present in each of the 4 wealthier cities
ggplot(data=evs_rich_zips, aes(x=City, y=Make)) +
geom_point() +
theme_economist()
#Bivariate plot giving an overview of the electric range of EVs in the high-end cities
ggplot(data=evs_rich_zips, aes(x=`Electric Range`,y=Make)) +
geom_point() +
scale_x_continuous(name="Electric range (miles)",breaks=seq(0,350,50)) +
theme_economist()
# Bar chart of luxury EV counts by make
ggplot(data=lux_ev, aes(x=Make)) +
  geom_bar() +
  theme_economist()
# Makes of the three most expensive luxury EVs
lux_rich_zips <- lux_ev %>% filter(rank(desc(`Expected Price ($1k)`))<=3)
ggplot(data=lux_rich_zips, aes(x=Make)) +
  geom_bar() +
  theme_economist()
In this report, I have examined a dataset of electric vehicles in Washington State and found no meaningful linear relationship between the range of an EV and its expected price, for luxury as well as moderately priced models. This could be attributed to a more complicated relationship which cannot be distilled to a simple linear regression. By running a linear model on previously identified high-income cities, I wanted to explore regions where luxury brands may be interested in ramping up their advertising efforts. No relationship was found between the make of a vehicle in the LUX group and the communities where certain luxury models are more likely to be present.
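One way to probe whether a relationship is monotonic but not linear is to compare the Pearson and Spearman correlations. The sketch below is a possible follow-up under stated assumptions, not part of the analysis in this report: the data are synthetic, and the exponential shape is chosen only to show how the two measures can disagree.

```r
# Synthetic, strictly increasing but strongly non-linear relationship
x <- seq(1, 300, length.out = 200)   # stand-in for electric range
y <- exp(x / 40)                     # stand-in for price, growing exponentially

# Pearson measures linear association; Spearman measures monotonic association
pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")
```

Here the Spearman correlation is 1 because the relationship is perfectly monotonic, while the Pearson correlation is noticeably lower. Applying the same comparison to `Electric Range` and `Expected Price ($1k)` would indicate whether a non-linear model is worth pursuing.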
https://electrek.co/2022/03/25/washington-passes-bill-targeting-all-electric-car-sales-by-2030-for-real-this-time/
https://cars.usnews.com/cars-trucks/rankings/luxury-electric-cars
https://dor.wa.gov/education/industry-guides/auto-dealers/federal-taxes
https://www.bloomberg.com/news/articles/2022-01-14/the-cost-of-a-new-car-won-t-be-dropping-anytime-soon
https://www.kaggle.com/datasets/rithurajnambiar/electric-vehicle-data
https://ofm.wa.gov/washington-data-research/economy-and-labor-force/median-household-income-estimates