Over the years, the use of electric vehicles (EVs) has skyrocketed. This analysis explores how much the EV population has grown and where EVs are used most. Specifically, it identifies four things: the most popular EV makes and their average electric ranges, the model years with the largest EV populations, the Clean Alternative Fuel Vehicle (CAFV) eligibility of these EVs, and the states with the largest EV populations.
The data set consists of 138,779 observations and 17 variables. Six of them were used in this analysis: Make, Electric Range, State, Model Year, Clean Alternative Fuel Vehicle (CAFV) Eligibility, and City. The data set covers model years 1997-2024, but most of this analysis focuses on 2016-2023. Besides the U.S. states and Washington, D.C., the data set also includes locations in other countries. Only the U.S. states and Washington, D.C. were examined here.
The following tabs provide visualizations that address these questions. The charts look at EV counts by make, model year, CAFV eligibility, and state. They also compare each make's average electric range with its EV population.
The first visualization is a dual-axis bar chart that compares the ten most common EV makes with their average electric ranges. It only includes vehicles whose battery range has been researched, because records with an Electric Range of 0 are excluded. As a result, the counts here are lower than in later charts. Tesla is the most common make, with 25,603 vehicles, and it also has the highest average electric range, at 240.25. This is plausible, since consumers are likely to prefer an EV that can travel farther on a charge. By contrast, Kia EVs have a relatively long average range of 95.21 but rank only seventh in count, with 3,298 vehicles. Chevrolet shows a similar, though less extreme, pattern: it has 8,704 vehicles and an average electric range of 128.04.
import os
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'C:/ProgramData/Anaconda3/Library/plugins/platforms'
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
import plotly.graph_objects as go
import seaborn as sns
from matplotlib.ticker import FuncFormatter
warnings.filterwarnings("ignore")
path = "U:/"
filename = path + 'Electric_Vehicle_Population_Data.csv'
df = pd.read_csv(filename,
usecols = ['Make', 'Electric Range', 'State', 'Model Year',
'Clean Alternative Fuel Vehicle (CAFV) Eligibility', 'City'])
x = df[df['Electric Range'] != 0]
x = x.groupby(['Make']).agg({'Make':['count'], 'Electric Range':['mean']}).reset_index()
x.columns = ['Make', 'MakeCount', "AverRange"]
x = x.sort_values('MakeCount', ascending = False)
x.reset_index(inplace=True, drop=True)
x['Make'] = x['Make'].str.title()
bottom1 = 0
top1 = 9
d1 = x.loc[bottom1:top1]
def autolabel(these_bars, this_ax, place_of_decimals):
    # Write each bar's height just above the bar, using the given format spec
    for each_bar in these_bars:
        height = each_bar.get_height()
        this_ax.text(each_bar.get_x() + each_bar.get_width()/2, height*1.01, format(height, place_of_decimals),
                     fontsize=9, color='black', ha='center', va='bottom')
fig = plt.figure(figsize=(18,10))
ax1 = fig.add_subplot(1, 1, 1)
ax2 = ax1.twinx()
bar_width = 0.4
fig.suptitle('Only Includes EVs Where Battery Range Has Been Researched:', fontsize=18, fontweight='bold')
x_pos = np.arange(10)
count_bars = ax1.bar(x_pos-(0.5*bar_width), d1.MakeCount, bar_width, color='red', edgecolor='black', label='EV Count')
aver_range_bars = ax2.bar(x_pos+(0.5*bar_width), d1.AverRange, bar_width, color='blue', edgecolor='black', label='Average Range')
ax1.set_xlabel('EV Name', fontsize=18)
ax1.set_ylabel('Count of EV Vehicles', fontsize=18, labelpad=2)
ax2.set_ylabel('Average Range', fontsize=18, rotation=270, labelpad=20)
ax1.tick_params(axis='y', labelsize=14)
ax2.tick_params(axis='y', labelsize=14)
plt.title('EV Count and Average Range Analysis\n Top 10 Most Frequently Used EVs', fontsize=18)
ax1.set_xticks(x_pos)
ax1.set_xticklabels(d1.Make, fontsize=14)
ax1.yaxis.set_major_formatter(FuncFormatter( lambda i, pos:f'{int(i):,}'))
count_color, count_label = ax1.get_legend_handles_labels()
range_color, range_label = ax2.get_legend_handles_labels()
legend = ax1.legend(count_color + range_color, count_label + range_label, loc='upper right', frameon = True, ncol = 1, shadow = True,
borderpad=1, fontsize=14)
autolabel(count_bars, ax1,',.0f')
autolabel(aver_range_bars, ax2, '.2f')
plt.show()
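The aggregation behind this chart has four steps: drop records with an Electric Range of 0, count the remaining records per make, average their ranges, and keep the ten largest counts. The sketch below shows that logic without pandas. It is a minimal, dependency-free version that assumes each record is a `(make, electric_range)` pair. The function name `top_makes_by_count` is hypothetical and not part of the original notebook.

```python
from collections import defaultdict

def top_makes_by_count(records, n=10):
    """Hypothetical helper: [(make, count, average_range), ...] for the n most common makes.

    records: iterable of (make, electric_range) pairs (an assumed input shape).
    Records with a range of 0 (range not researched) are skipped entirely.
    """
    counts = defaultdict(int)
    range_sums = defaultdict(float)
    for make, electric_range in records:
        if electric_range == 0:
            continue  # excluded from both the count and the average
        counts[make] += 1
        range_sums[make] += electric_range
    summary = [(make.title(), counts[make], range_sums[make] / counts[make])
               for make in counts]
    # Largest count first; the sort is stable, so ties keep first-seen order
    summary.sort(key=lambda row: row[1], reverse=True)
    return summary[:n]

sample = [("TESLA", 220), ("TESLA", 260), ("KIA", 0), ("KIA", 90), ("NISSAN", 150)]
print(top_makes_by_count(sample, n=2))  # [('Tesla', 2, 240.0), ('Kia', 1, 90.0)]
```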
The second visualization is a scatter chart of EV counts by make for model years 2019-2023. Within each model year, any make with fewer than 200 vehicles is grouped into a category labeled “Other,” so that makes with small counts do not draw attention away from the dominant ones. Star size and color both scale with the vehicle count, and the large stars show that Tesla remains the most common make. The chart also shows how much EV usage grew between 2019 and 2023: far more makes are missing from 2019-2021 than from 2022-2023. A blank space next to a make means that it had fewer than 200 vehicles (or none) for that model year. It does not necessarily mean the make had not yet produced an EV. In 2023, Porsche is the only make without a marker, whereas in 2019, 12 makes have no marker.
# Count vehicles per (model year, make), limited to 2019-2023
x1 = df.groupby(['Model Year', 'Make']).size().reset_index(name='count')
x1 = x1[(x1['Model Year'] >= 2019) & (x1['Model Year'] <= 2023)].copy()
# Fold makes with fewer than 200 vehicles in a given year into 'Other',
# then sum so that each year has a single 'Other' point instead of several overlapping ones
x1['Make'] = x1['Make'].where(x1['count'] >= 200, 'Other')
x1 = x1.groupby(['Model Year', 'Make'], as_index=False)['count'].sum()
x1['count_tens'] = round(x1['count']/10, 0)
x1 = x1.sort_values('count', ascending=False).reset_index(drop=True)
x1['Make'] = x1['Make'].str.title()
plt.figure(figsize=(18,10))
plt.scatter(x1['Model Year'], x1['Make'], marker='*', cmap='plasma',
c=x1['count_tens'], s=x1['count_tens'], edgecolors='black')
plt.title('Total EV Makes Between 2019-2023', fontsize=20)
plt.xlabel('Year', fontsize=20)
plt.ylabel('Make', fontsize=20)
cbar = plt.colorbar()
# Colors encode count/10, so label each tick with the vehicle count it represents
cbar.set_label('Number of EVs', rotation=270, fontsize=20, color='black', labelpad=30)
my_colorbar_ticks = [*range(100, int(x1['count_tens'].max()), 100)]
cbar.set_ticks(my_colorbar_ticks)
my_colorbar_tick_labels = ['{:,}'.format(each * 10) for each in my_colorbar_ticks]
cbar.set_ticklabels(my_colorbar_tick_labels, fontsize=14)
my_x_ticks = [*range( x1['Model Year'].min(), x1['Model Year'].max()+1, 1 )]
plt.xticks(my_x_ticks, fontsize=14, color='black')
plt.yticks(fontsize=14, color='black')
plt.gca().invert_yaxis()
plt.show()
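The 200-vehicle threshold is applied to each (model year, make) cell, so several small makes can map to “Other” in the same year. Summing after relabeling gives one “Other” value per year. The sketch below shows that step in plain Python, assuming the counts are held in a dictionary keyed by `(model_year, make)`. The helper name `fold_small_makes` is hypothetical.

```python
from collections import Counter

def fold_small_makes(counts, threshold=200, other="Other"):
    """Hypothetical helper: relabel cells below `threshold` as `other`, then sum them.

    counts: {(model_year, make): vehicle_count} (an assumed input shape).
    """
    folded = Counter()
    for (year, make), n in counts.items():
        label = make if n >= threshold else other
        folded[(year, label)] += n  # several small makes in one year add up here
    return dict(folded)

cells = {(2019, "TESLA"): 5000, (2019, "FIAT"): 50, (2019, "SMART"): 120, (2020, "TESLA"): 6000}
print(fold_small_makes(cells))
# {(2019, 'TESLA'): 5000, (2019, 'Other'): 170, (2020, 'TESLA'): 6000}
```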
The third visualization is a stacked bar chart that shows, for each make, how many vehicles are eligible for Clean Alternative Fuel Vehicle (CAFV) status, how many are not eligible, and how many have unknown eligibility. Tesla again leads, with a total EV count above 60,000. It also has the most CAFV-eligible vehicles, at about 20,000. Nissan and Chevrolet have the next-largest numbers of eligible vehicles. Jeep and Toyota have more EVs that are ineligible because of low battery range than EVs that are eligible. Notably, although Tesla dominates the EV population, more of its vehicles have unknown eligibility (about 40,000) than confirmed eligibility (about 20,000), because their battery ranges have not been researched.
x2 = df.groupby(['Make', 'Clean Alternative Fuel Vehicle (CAFV) Eligibility']).agg({'Make':['count']}).reset_index()
x2.columns = ['Make', 'CAFV_Eligible', "TotalCount"]
x2.reset_index(inplace=True, drop=True)
x2['Make'] = x2['Make'].str.title()
stacked_df = x2.pivot(index='Make', columns='CAFV_Eligible', values='TotalCount')
CAFV_order = ['Clean Alternative Fuel Vehicle Eligible','Eligibility unknown as battery range has not been researched','Not eligible due to low battery range']
stacked_df = stacked_df.reindex(columns=reversed(CAFV_order))
fig = plt.figure(figsize=(18,10))
ax = fig.add_subplot(1, 1, 1)
stacked_df.plot(kind='bar', stacked=True, ax=ax)
plt.ylabel('Total EV Count', fontsize=18, labelpad=10)
plt.title('Clean Alternative Fuel Vehicle Eligibility by EV Make', fontsize=18)
plt.yticks(fontsize=14)
plt.xticks(fontsize=14)
ax.set_xlabel('EV Make', fontsize=18)
# Reverse the legend so it lists the categories in CAFV_order (the stack was built in reverse)
handles, labels = ax.get_legend_handles_labels()
plt.legend(handles[::-1], labels[::-1], loc='best')
ax.yaxis.set_major_formatter(FuncFormatter( lambda i, pos:f'{int(i):,}'))
fig.subplots_adjust(bottom=0.4)
plt.show()
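Statements such as “more unknown than eligible” are easier to compare across makes as within-make shares. The minimal stdlib sketch below turns `(make, eligibility)` records into each make's proportion in each category. `eligibility_shares` is a hypothetical helper, and the record shape is an assumption. The category strings are the ones used in the chart code above.

```python
from collections import Counter, defaultdict

def eligibility_shares(records):
    """Hypothetical helper: share of each make's vehicles in each CAFV category.

    records: iterable of (make, eligibility_label) pairs (an assumed input shape).
    Returns {make: {label: fraction}}; each make's fractions sum to 1.
    """
    by_make = defaultdict(Counter)
    for make, label in records:
        by_make[make][label] += 1
    shares = {}
    for make, counter in by_make.items():
        total = sum(counter.values())
        shares[make] = {label: n / total for label, n in counter.items()}
    return shares

ELIGIBLE = "Clean Alternative Fuel Vehicle Eligible"
UNKNOWN = "Eligibility unknown as battery range has not been researched"
sample = [("TESLA", ELIGIBLE), ("TESLA", UNKNOWN), ("TESLA", UNKNOWN), ("NISSAN", ELIGIBLE)]
print(eligibility_shares(sample)["TESLA"][UNKNOWN])  # 0.6666666666666666
```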
The fourth visualization is a waterfall chart showing the change in EV counts between model years 2022 and 2023. It includes only makes with data in both years; makes missing from either year were dropped. Overall, the number of EVs fell by 2,210 from 2022 to 2023. Tesla held the top spot in the previous visualizations, yet its count fell by 1,033 vehicles over the same period. One potential reason for Tesla's decline is the arrival of other makes. For a long time, Tesla dominated the EV industry because it was one of the few companies producing EVs. As more makes enter the EV market, this dominance is likely to lessen as consumers explore other options. Also, because counts are grouped by model year and the data set already contains some 2024 models, the 2023 figures may have been incomplete when the data was collected.
# Keep only model years 2022 and 2023 (.copy() avoids modifying a view of df)
x3 = df.loc[df['Model Year'].isin([2022, 2023]), ['Make', 'Model Year']].copy()
x3 = x3.groupby(['Make', 'Model Year']).agg({'Make':['count']}).reset_index()
x3.columns = ['Make', 'ModelYear', "TotalCount"]
x3.reset_index(inplace=True, drop=True)
x3 = x3[x3['Make'] != 'BENTLEY']
x3 = x3[x3['Make'] != 'CADILLAC']
x3 = x3[x3['Make'] != 'GENESIS']
x3 = x3[x3['Make'] != 'LAND ROVER']
x3 = x3[x3['Make'] != 'SUBARU']
x3['Make'] = x3['Make'].str.title()
# Count each make per year, then align the two years on Make rather than on row position
col1 = x3[x3['ModelYear'] == 2022].groupby('Make')['TotalCount'].sum().reset_index(name='TotalCount2022')
col2 = x3[x3['ModelYear'] == 2023].groupby('Make')['TotalCount'].sum().reset_index(name='TotalCount2023')
wf_df = col1.merge(col2, on='Make', how='inner')  # inner join keeps makes present in both years
wf_df['TotalSum'] = wf_df.TotalCount2022 + wf_df.TotalCount2023
wf_df['Deviation'] = wf_df.TotalCount2023 - wf_df.TotalCount2022
wf_df['Percentage'] = wf_df.Deviation / wf_df.TotalCount2022
# Append the summary row; its Percentage is the overall change, not a sum of per-make percentages
total_row = pd.DataFrame([{'Make': 'Total',
                           'TotalCount2022': wf_df.TotalCount2022.sum(),
                           'TotalCount2023': wf_df.TotalCount2023.sum(),
                           'TotalSum': wf_df.TotalSum.sum(),
                           'Deviation': wf_df.Deviation.sum(),
                           'Percentage': wf_df.Deviation.sum() / wf_df.TotalCount2022.sum()}])
wf_df = pd.concat([wf_df, total_row], ignore_index=True)
# Color the final (total) bar by the sign of the overall change
total_deviation = wf_df['Deviation'].iloc[-1]
if total_deviation > 0:
    end_color = 'black'
elif total_deviation < 0:
    end_color = 'red'
else:
    end_color = 'blue'
fig = go.Figure(go.Waterfall( name = '', orientation = 'v', x = wf_df['Make'], textposition = 'outside',
measure = ['relative'] * (len(wf_df) - 1) + ['total'],  # one 'relative' bar per make, then the total
y= wf_df['Deviation'],
text = ['{:,.0f}'.format(each) for each in wf_df['TotalSum']],
decreasing = {'marker':{'color':'red'}},
increasing = {'marker':{'color':'green'}},
totals = {'marker':{'color':end_color}},
hovertemplate = 'Cumulative Deviation to Date: ' + '%{y:,.0f}' + '<br>' +
'Total Count in %{x}: %{text}',
textfont=dict(family='Arial', size=9, color='black')))
fig.layout = go.Layout(yaxis=dict(tickformat = ',.0f', zeroline=True))
fig.update_layout(
xaxis = dict(title=dict(text='Make', font=dict(family='Arial', size=12, color='black'))),
yaxis = dict(title=dict(text='Total EV Count', font=dict(family='Arial', size=12, color='black'))),
title=dict(
text='EV Deviation Between 2022 and 2023<br>Surpluses Appear in Green, Deficits Appear in Red',
font=dict(family='Arial', size=12, color='black')),
template='simple_white',
title_x=0.5,
showlegend=False,
autosize=True,
margin=dict(l=30, r=30, t=60, b=30))
#fig.show()
import plotly.io as pio
pio.write_html(fig, path + "plotly_result.html", auto_open=False)
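The per-make deviations and the final total bar can be computed directly. The sketch below uses a hypothetical helper, `year_over_year`, in plain Python, and assumes each year's counts are given as a `{make: count}` dictionary. Only makes present in both years are compared, which mirrors the step that drops makes missing from either year. The overall percentage change is computed from the totals, because adding per-make percentages together does not produce a meaningful overall rate.

```python
def year_over_year(counts_a, counts_b):
    """Hypothetical helper: per-make change from year A to year B plus an overall summary.

    counts_a, counts_b: {make: count} for the earlier and later year (assumed shape).
    Returns (rows, total_deviation, overall_pct), where rows is a list of
    (make, deviation, pct_change) for makes present in both years.
    """
    shared = sorted(counts_a.keys() & counts_b.keys())  # inner join on make
    if not shared:
        raise ValueError("no makes appear in both years")
    rows = []
    for make in shared:
        change = counts_b[make] - counts_a[make]
        rows.append((make, change, change / counts_a[make]))
    total_a = sum(counts_a[m] for m in shared)
    total_b = sum(counts_b[m] for m in shared)
    overall_pct = (total_b - total_a) / total_a  # ratio of totals, not a sum of rates
    return rows, total_b - total_a, overall_pct

rows, total_dev, overall = year_over_year(
    {"TESLA": 1000, "KIA": 100, "CADILLAC": 5},
    {"TESLA": 900, "KIA": 150, "FISKER": 40},
)
print(rows)       # [('KIA', 50, 0.5), ('TESLA', -100, -0.1)]
print(total_dev)  # -50
```

CADILLAC and FISKER each appear in only one year, so they are left out of both the bars and the total.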
The fifth visualization is a heatmap of EV counts by make and model year for 2016-2023. Makes without data in every one of those years are omitted. Tesla again has the largest EV population. The heatmap also shows the EV population growing over time, as the cell colors shift from red and orange toward blue and purple in later years. The largest value is Tesla's 13,912 EVs in 2022, and the smallest is Mitsubishi's single EV in 2017.
x4 = df.groupby(['Model Year','Make']).agg({'Make':['count']}).reset_index()
x4.columns = ['ModelYear', 'Make', "TotalCount"]
x4.reset_index(inplace=True, drop=True)
x4 = x4[(x4['ModelYear'] >= 2016) & (x4['ModelYear'] <= 2023)]
x4['Make'] = x4['Make'].str.title()
hm_df = x4.pivot(index='Make', columns='ModelYear', values='TotalCount')
hm_df = hm_df.dropna()
fig = plt.figure(figsize=(15,15))
ax = fig.add_subplot(1, 1, 1)
comma_fmt = FuncFormatter(lambda x, p: format(int(x), ','))
ax=sns.heatmap(hm_df, linewidth=0.2, annot=True, cmap='Spectral', fmt=',.0f', square = True, annot_kws={'size': 11},
cbar_kws = {'format': comma_fmt, 'orientation': 'vertical'})
plt.title('Number of EVs by Year', fontsize=18, pad=15)
plt.xlabel('Year', fontsize=18, labelpad=10)
plt.ylabel('Make', fontsize=18, labelpad=10)
plt.yticks(rotation=0, size=14)
plt.xticks(size=14)
ax.invert_yaxis()
cbar = ax.collections[0].colorbar
max_count = hm_df.to_numpy().max()
my_colorbar_ticks = [*range(0, int(max_count), 1000)]
cbar.set_ticks(my_colorbar_ticks)
my_colorbar_tick_labels = ['{:,}'.format(each) for each in my_colorbar_ticks]
cbar.set_ticklabels(my_colorbar_tick_labels)
cbar.set_label('Number of EVs', rotation=270, fontsize=14, color='black', labelpad=20)
plt.show()
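The sketch below shows how to drop makes with a missing year and then locate the largest and smallest cells. It assumes the counts are held in a nested dictionary rather than the pivoted DataFrame. `complete_rows_extremes` is a hypothetical helper. Keeping only makes that have every year matches what `dropna()` does to the pivoted table.

```python
def complete_rows_extremes(table, years):
    """Hypothetical helper: keep makes with a count for every year, then find extremes.

    table: {make: {year: count}} (an assumed input shape).
    Returns ((make, year, count) of the largest cell, (make, year, count) of the smallest).
    Raises ValueError (from max/min) if no make has data for every year.
    """
    complete = {make: row for make, row in table.items()
                if all(year in row for year in years)}
    cells = [(make, year, row[year]) for make, row in complete.items() for year in years]
    largest = max(cells, key=lambda cell: cell[2])
    smallest = min(cells, key=lambda cell: cell[2])
    return largest, smallest

table = {
    "TESLA": {2016: 3000, 2017: 2500},
    "MITSUBISHI": {2016: 40, 2017: 1},
    "BIGCO": {2017: 99999},  # missing 2016, so it is dropped despite its large count
}
print(complete_rows_extremes(table, [2016, 2017]))
# (('TESLA', 2016, 3000), ('MITSUBISHI', 2017, 1))
```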
The sixth visualization is a bump chart showing how the yearly ranking of EV makes changed between 2016 and 2023. It includes only makes with data in every year from 2016 to 2023. Tesla fluctuated the least, holding the top position in every year except 2017. No other make stays consistently near the top or the bottom, and their rankings shift from year to year. Apart from Tesla, then, there is no stable split between dominant and less dominant makes.
bump_df = df.groupby(['Model Year','Make']).agg({'Make':['count']}).reset_index()
bump_df.columns = ['ModelYear', 'Make', "TotalCount"]
bump_df.reset_index(inplace=True, drop=True)
bump_df = bump_df[(bump_df['ModelYear'] >= 2016) & (bump_df['ModelYear'] <= 2023)]
bump_df['Make'] = bump_df['Make'].str.title()
bump_df = bump_df.pivot(index='Make', columns='ModelYear', values='TotalCount')
bump_df = bump_df.dropna()
bump_df_ranked = bump_df.rank(0, ascending=False, method='min')
bump_df_ranked = bump_df_ranked.T
fig = plt.figure(figsize=(18,10))
ax = fig.add_subplot(1, 1, 1)
bump_df_ranked.plot(kind='line', ax=ax, marker='o', markeredgewidth = 1, linewidth = 6,
markersize = 18, markerfacecolor = 'white')
ax.invert_yaxis()
num_rows = bump_df_ranked.shape[0]
num_cols = bump_df_ranked.shape[1]
plt.ylabel('Yearly Ranking', fontsize=18, labelpad=10)
plt.title('Ranking of Total EVs by Year', fontsize=18, pad=15)
plt.yticks(range(1, num_cols+1, 1), fontsize=14)
plt.xticks(fontsize=14)
ax.set_xlabel('Year', fontsize=18)
# Order legend entries by each make's rank in the final year so the legend lines up with the line ends
handles, labels = ax.get_legend_handles_labels()
final_rank = bump_df_ranked.iloc[-1]
order = sorted(range(len(labels)), key=lambda i: final_rank[labels[i]])
ax.legend([handles[i] for i in order], [labels[i] for i in order],
          bbox_to_anchor=(1.01, 1.01), fontsize=14)
plt.show()
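The bump chart's y-values come from `rank(ascending=False, method='min')`. This gives the largest count rank 1, and tied makes share the best (lowest) rank. The plain-Python sketch below applies that ranking within each year. `min_rank` and `yearly_ranks` are hypothetical helper names, and the nested-dictionary input is an assumption.

```python
def min_rank(counts):
    """Hypothetical helper: rank makes by count, largest = 1, ties share the lowest rank.

    Example: {A: 10, B: 10, C: 5} -> {A: 1, B: 1, C: 3}.
    """
    ordered = sorted(counts.values(), reverse=True)
    # Position of a value's first occurrence = number of strictly larger values
    return {make: 1 + ordered.index(n) for make, n in counts.items()}

def yearly_ranks(table):
    """Hypothetical helper: {year: {make: count}} -> {year: {make: rank}}."""
    return {year: min_rank(counts) for year, counts in table.items()}

ranks = yearly_ranks({2016: {"TESLA": 900, "NISSAN": 900, "BMW": 300},
                      2017: {"TESLA": 800, "NISSAN": 950, "BMW": 300}})
print(ranks[2016])  # {'TESLA': 1, 'NISSAN': 1, 'BMW': 3}
print(ranks[2017])  # {'TESLA': 2, 'NISSAN': 1, 'BMW': 3}
```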
The final visualization is a donut-style pie chart of EV counts by U.S. state. California, Virginia, Maryland, and Texas have the largest EV populations, at 29.17%, 10.26%, 9.29%, and 5.77% of the charted total, respectively. States with fewer than 7 EVs were grouped into an “Other” category because they are immaterial relative to the data set as a whole. Washington State was excluded from the chart because its EV count is so high that it would have been an outlier and overwhelmed the remaining data. Finally, Washington, D.C. is included even though it is not a state.
x5 = df[['Make', 'State', 'City']]
x5 = x5.dropna()
x5 = x5[x5['State'] != 'WA']
x5 = x5.groupby(['State']).agg({'Make':['count']}).reset_index()
x5.columns = ['State', 'TotalCount']
x5.reset_index(inplace=True, drop=True)
x5['State'] = x5.apply(lambda row: 'Other' if row['TotalCount'] < 7 else row['State'], axis=1)
x5 = x5.groupby(['State']).agg({'TotalCount':['sum']}).reset_index()
x5.columns = ['State', 'TotalCount']
x5.reset_index(inplace=True, drop=True)
number_outside_colors = len(x5.State.unique())
outside_color_ref_number = np.arange(number_outside_colors)
fig = plt.figure(figsize=(7,7))
ax = fig.add_subplot(1, 1, 1)
colormap = plt.get_cmap("Paired")
outer_colors = colormap(outside_color_ref_number)
all_evs = x5.TotalCount.sum()
x5.groupby(['State'])['TotalCount'].sum().plot(
kind='pie', radius=1, colors=outer_colors, pctdistance = 0.85, labeldistance = 1.05,
wedgeprops = dict(edgecolor='white'),
textprops={'fontsize':7}, autopct = lambda p: '{:.2f}%\n({:.0f})'.format(p,(p/100)*all_evs), startangle=90)
# Draw a white circle in the center to turn the pie into a donut
ax.add_artist(plt.Circle((0, 0), 0.3, fc='white'))
ax.yaxis.set_visible(False)
plt.title('Total EVs by U.S. State (Including DC and Excluding WA)', fontsize = 14)
ax.axis('equal')
plt.tight_layout()
ax.text(0, 0, 'Total Count\n' + str(all_evs), size = 14, ha='center', va='center')
plt.show()
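The chart's percentages come from three steps: drop Washington, fold states with fewer than 7 EVs into “Other,” and divide each count by the remaining total. The minimal stdlib sketch below assumes one state code per vehicle. `state_shares` is a hypothetical name.

```python
from collections import Counter

def state_shares(states, exclude=("WA",), threshold=7, other="Other"):
    """Hypothetical helper: {label: (count, percent)} after exclusion and folding.

    states: iterable of state codes, one per vehicle (an assumed input shape).
    States with fewer than `threshold` vehicles are summed into `other`.
    """
    counts = Counter(s for s in states if s not in exclude)
    folded = Counter()
    for state, n in counts.items():
        folded[state if n >= threshold else other] += n
    total = sum(folded.values())
    return {label: (n, 100 * n / total) for label, n in folded.items()}

sample = ["CA"] * 30 + ["VA"] * 10 + ["WA"] * 500 + ["AK"] * 3 + ["HI"] * 2
shares = state_shares(sample)
print({label: count for label, (count, _) in shares.items()})
# {'CA': 30, 'VA': 10, 'Other': 5}
```

The percentages are taken over the 45 non-Washington vehicles, so the large Washington count does not affect them.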
With the analysis complete, the questions raised in the introduction can now be addressed.
The EV population grew dramatically between 2016 and 2023, with the largest counts in model years 2022 and 2023, as the scatter chart and heatmap illustrate. However, the waterfall chart shows an overall drop of 2,210 EVs from 2022 to 2023 among makes with data in both years. So although EV usage surged in the late 2010s and early 2020s, growth may be starting to level out. One possible explanation is that the spike in EV purchases between 2016 and 2022 left fewer prospective buyers in the market, which would contribute to the decline between 2022 and 2023.
Tesla is the most successful EV make in this data set. Its lead appears in the dual-axis bar chart, scatter chart, stacked bar chart, heatmap, and bump chart. It consistently has the highest count and holds the number 1 ranking in more years than any other make.
In addition to being the most popular make, Tesla has the highest average electric range. The dual-axis bar chart suggests a positive association between EV population and average electric range: makes with more vehicles generally have longer average ranges. There are exceptions. Kia ranks 7th in EV population but 4th in average electric range, and Chevrolet ranks 3rd in EV population but 2nd in average electric range. Even so, the chart suggests that average electric range is positively associated with EV population. This association was judged visually from ten makes, however, and was not tested statistically.
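One way to check this association numerically is Spearman's rank correlation. It asks whether makes that rank higher in count also rank higher in average range, and it is computed as the Pearson correlation of the two rank lists, with tied values sharing their average rank. Below is a minimal plain-Python sketch. The helper names `average_ranks` and `spearman` are hypothetical. The sample values are made up for illustration and are not taken from the data set.

```python
def average_ranks(values):
    """Hypothetical helper: ascending ranks (1 = smallest); ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Hypothetical helper: Spearman's rho as the Pearson correlation of average ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

counts = [500, 300, 200, 100]        # illustrative EV counts per make
ranges = [250.0, 90.0, 120.0, 60.0]  # illustrative average ranges
print(round(spearman(counts, ranges), 2))  # 0.8
```

A value near 1 would support the visual impression, although with only ten makes any estimate would be rough.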
The stacked bar chart shows that Tesla has the most CAFV-eligible vehicles. Nissan and Chevrolet also have relatively large numbers of eligible vehicles compared with their total counts. The chart reinforces Tesla's dominance of the EV industry. It also shows, however, that many Tesla vehicles have unknown CAFV eligibility because their battery ranges have not been researched.
Finally, the pie chart shows that California has the largest number of EVs, about 29% of the EVs charted (all states except Washington, plus Washington, D.C.). This is understandable: California is the most populous U.S. state, so it has a larger pool of potential buyers. Setting aside the “Other” category, California is followed by Virginia, Maryland, and Texas.