First, the libraries needed to run the code below are imported, and the data set is read in from a local file path.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import matplotlib.colors as mcolors
from matplotlib.ticker import FuncFormatter
import folium
import math
import seaborn as sns
file = 'Electric Vehicle Population Data.csv'
path = '/Users/andrewkadish/Desktop/Loyola/DS736/Python Project Files/'
evs = pd.read_csv(path+file)
evs
The data set used for this analysis is ‘Electric Vehicle Population Data’, and was pulled from Kaggle.com. The data set contains information about electric vehicles, including the year the model was produced, location information for where it is registered, and attributes like range, EV type, and clean energy status.
Using a line chart, we will explore the top electric vehicle manufacturers in terms of overall counts produced. The chart will plot the top 5 manufacturers over the last 10 complete model years, spanning from 2015 to 2024. This will provide an indication of how quickly the top overall EV manufacturers are growing in terms of their EV production.
# how many EVs does each manufacturer produce?
make_counts = evs['Make'].value_counts().reset_index()
# grab top 5 makers
my_makes = list(make_counts.head(5)['Make'])
# how many created in each year?
yr_counts = evs['Model Year'].value_counts().reset_index().sort_values('Model Year', ascending=False).reset_index(drop=True)
# limit to most recent 10 years, excluding the latest (2025) as the data is incomplete
most_current = yr_counts['Model Year'].max()
my_yrs = list(np.arange(most_current-10, most_current))
# combine years and makes into a new data frame
mycols = ['Make','Model Year']
mk_yr_df = evs[mycols].value_counts().reset_index().sort_values(mycols).reset_index(drop=True)
# updates mk_yr_df with only the makes and years of interest
mk_yr_df = mk_yr_df.loc[mk_yr_df['Make'].isin(my_makes)]
mk_yr_df = mk_yr_df.loc[mk_yr_df['Model Year'].isin(my_yrs)].reset_index(drop=True)
# let's make the counts cumulative for each manufacturer
makes_plus_df = mk_yr_df.copy()
i = 0
for mk in list(makes_plus_df['Make']):
    if i != 0:
        if list(makes_plus_df['Make'])[i] == list(makes_plus_df['Make'])[i-1]:
            # if conditions met, increments current count by last index count
            makes_plus_df.loc[i, 'count'] += makes_plus_df.loc[i-1, 'count']
    i += 1
#----------------------------------------------------------------------------------------------------------------#
# make a line chart plotting numbers over time
from matplotlib.ticker import FuncFormatter
fig = plt.figure(figsize=(18,10))
ax = fig.add_subplot()
# setting up colors for each make
color_list = ['gray', 'goldenrod', 'darkblue', 'darkred', 'darkgreen']
colors = {} # initializing dict
for i in np.arange(len(my_makes)):
    colors[my_makes[i]] = color_list[i]
for key, grp in makes_plus_df.groupby('Make'):
    grp.plot(ax=ax, x='Model Year', y='count', label=key, color=colors[key], marker='o')
ax.set_title('Registered EV Counts for Top 5 EV Manufacturers\n(US - Cumulative by Year)', fontsize=20)
ax.set_xticks(np.arange(max(my_yrs)-10, max(my_yrs)+1)) # making space for labels
ax.set_xlabel('Year', fontsize=16)
ax.set_ylabel('EVs Registered', fontsize=16)
ax.tick_params(axis='y', labelsize=12)
ax.legend(title="Make", fontsize=14, title_fontsize=18)
# for loop to grab the top count & make for each set of yearly data
for i, yr in enumerate(my_yrs):
    yr_rows = makes_plus_df[makes_plus_df['Model Year'] == yr]
    top = yr_rows.groupby('Make').agg({'count':'sum'}).reset_index()
    top_count = top['count'].max()
    top_name = top.loc[top['count'] == top_count, 'Make'].iloc[0]
    # output text with defined values
    ax.text(yr+0.02, top_count+1500,
            'Top: ' + top_name + '\n' + 'EVs: ' + str(round(top_count/1000,1)) + "k\n|",
            fontsize=12, ha='right')
ax.set_yticks(np.arange(0,top_count+10000,10000))
plt.ylim(0,top_count+10000)
ax.yaxis.set_major_formatter(FuncFormatter(lambda x, pos: f'{x/1000:.1f}k'))
plt.show()
Based on the line chart above, Tesla clearly dominates the EV market in terms of EVs produced overall. While Nissan appeared to have the edge in the mid-2010s, by 2017 Tesla had taken over, and it broadened its lead over the competition in each year that followed. While the other top 5 EV makers peaked at around 10-20K EVs produced, Tesla’s strong and steady growth saw it reaching almost 100K registered EVs by the end of 2024. This is likely due to Tesla’s status as an EV-only manufacturer, whereas the other top EV makers primarily produce gasoline-powered vehicles, with EVs making up just a portion of their inventory. While Chevrolet appears to be the top cumulative EV producer after Tesla, it holds only a slight edge over the rest of the competition.
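The running total built by the loop in the cell above can also be expressed as a grouped cumulative sum. The sketch below uses a small made-up frame (`toy`, with invented counts and a new `cumulative` column that is not part of the notebook) in the same shape as `mk_yr_df`. It is an illustrative alternative, not the code that produced the chart.

```python
import pandas as pd

# Toy stand-in for mk_yr_df: one row per (Make, Model Year) with a yearly count.
# The values are invented for illustration and are not taken from the data set.
toy = pd.DataFrame({
    'Make': ['NISSAN', 'NISSAN', 'NISSAN', 'TESLA', 'TESLA', 'TESLA'],
    'Model Year': [2015, 2016, 2017, 2015, 2016, 2017],
    'count': [10, 5, 7, 8, 12, 30],
})

# Sort so each make's rows are in chronological order, then take a
# running total within each make.
toy = toy.sort_values(['Make', 'Model Year']).reset_index(drop=True)
toy['cumulative'] = toy.groupby('Make')['count'].cumsum()

print(toy)
```

Because the sum restarts for every group, no check against the previous row's make is needed, which is the job the nested `if` performs in the loop version.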
Using a horizontal bar chart, we will look at which vehicles have the highest electric ranges on average. This analysis will focus primarily on EV models, but the bars are also colored by make to show which manufacturers produce the vehicles with the highest ranges.
# what makes and models have the highest electric range on average? (remove items with 0 ranges)
model_range_df = evs[evs['Electric Range'] != 0].groupby(['Make','Model']).agg({'Electric Range':'mean'}).reset_index()
model_range_df = model_range_df.sort_values('Electric Range', ascending=False).reset_index(drop=True)
# taking the top 10
top_ranges_df = model_range_df.head(10)
# how many unique makes? will be used to define colors
range_makes = list(np.unique(top_ranges_df.Make))
# 6 makes were grabbed, creating a 6 element color list
make_colors = ['royalblue', 'forestgreen', 'darkred', 'goldenrod', 'mediumpurple', 'gray']
# create a dictionary for colors
range_colors = {}
for i in np.arange(len(range_makes)):
    range_colors[range_makes[i]] = make_colors[i]
#----------------------------------------------------------------------------------------------------------------#
# plot this with horizontal bars
top_ranges_df = top_ranges_df.sort_values('Electric Range')
barfig = plt.figure(figsize=(18,10))
axb = barfig.add_subplot()
# list comprehension for bar colors and legend handles
bar_colors = [range_colors[make] for make in top_ranges_df.Make] # creates a list of colors for order of makes
handles = [mpatches.Patch(color=range_colors[make], label=make) for make in range_colors]
axb.barh(top_ranges_df.Model, top_ranges_df['Electric Range'], color=bar_colors)
axb.legend(loc='lower right', handles=handles, fontsize=11, title='Make', title_fontsize=15)
axb.set_xlabel('Electric Range (Miles)', fontsize=16)
axb.set_ylabel('Model Name', fontsize=16)
axb.set_xticks(list(range(0, round(max(top_ranges_df['Electric Range']))+50, 25)))
axb.set_xticklabels(list(range(0, round(max(top_ranges_df['Electric Range']))+50, 25)))
axb.set_title('Top 10 Average Electric Ranges\nby EV Make and Model', fontsize=20)
for i, avg_range in enumerate(top_ranges_df['Electric Range']):
    axb.text(avg_range+1, i, 'Avg Range: \n' + str(round(avg_range)) + ' Miles', va='center')
plt.xlim(0, max(top_ranges_df['Electric Range'])+25)
plt.show()
From the above bar chart, it is clear that Tesla once again dominates the scoreboard. While the top average electric range went to Porsche with the Macan, its range is fairly comparable to that of the runner-up, the Tesla Model Y, and four more of the top 10 ranges belonged to other Tesla models. This reinforces the idea that Tesla has established itself as one of the most consistent EV manufacturers, if not the most consistent, with registered vehicles in the US.
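The averages in this chart are computed only over rows with a nonzero `Electric Range`. A minimal sketch of why that filter matters follows, using invented values and hypothetical make and model names (`A`/`X`, `B`/`Y`), and assuming that a 0 acts as a placeholder rather than a true range.

```python
import pandas as pd

# Invented rows: model X has one real range and two zero placeholders.
toy = pd.DataFrame({
    'Make': ['A', 'A', 'A', 'B', 'B'],
    'Model': ['X', 'X', 'X', 'Y', 'Y'],
    'Electric Range': [300, 0, 0, 200, 220],
})

# Mean over every row, zeros included.
mean_all = toy.groupby(['Make', 'Model'])['Electric Range'].mean()

# Mean over nonzero rows only, mirroring the filter used for the bar chart.
mean_nonzero = (
    toy[toy['Electric Range'] != 0]
    .groupby(['Make', 'Model'])['Electric Range']
    .mean()
)

print(mean_all)
print(mean_nonzero)
```

With the zeros included, model X averages 100 miles and ranks below model Y; with them removed, it averages 300 miles and ranks first. The ranking in a top 10 list can therefore depend heavily on how zero entries are treated.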
Electric vehicles in the data set are split into two categories: Battery Electric Vehicles (BEVs) and Plug-In Hybrid Electric Vehicles (PHEVs). PHEVs are a reasonable alternative to BEVs for those looking for alternative energy cars because, while they can run on alternative fuel, they can also use gasoline. This may make them more accessible for those looking for a soft entry into the alternative energy game. That said, we will explore the rise of PHEV production among manufacturers that produce both PHEVs and BEVs. To do so, we will generate a heat map that displays the rate of PHEV registration vs BEVs over the last 10 years for manufacturers that produced both during that time period.
# new df to count types for make and model
types_df = evs[['Model Year','Make','Electric Vehicle Type']].groupby(['Model Year','Make']).value_counts().reset_index()
# which car makers make both BEVs and PHEVs?
bevs = set(evs[evs['Electric Vehicle Type'] == 'Battery Electric Vehicle (BEV)'].Make.unique())
phevs = set(evs[evs['Electric Vehicle Type'] == 'Plug-in Hybrid Electric Vehicle (PHEV)'].Make.unique())
makes_both = list(bevs & phevs) # look for items in both sets, converting to a list
makes_both.sort()
# update types_df to be specific to these makes
types_df = types_df[types_df.Make.isin(makes_both)].reset_index(drop=True)
# to make things a bit easier going forward...
BEV = 'Battery Electric Vehicle (BEV)'
PHEV = 'Plug-in Hybrid Electric Vehicle (PHEV)'
# get BEV and PHEV rates for each manufacturer
rates_df = [] # starting off with a list to be converted
rate_yrs = list(range(2016,2026)) # limit to the 10 model years from 2016 through 2025
for year in rate_yrs:
    for make in makes_both:
        BEVs = types_df[(types_df['Electric Vehicle Type']==BEV)&(types_df.Make==make)&(types_df['Model Year']==year)]['count'].sum()
        PHEVs = types_df[(types_df['Electric Vehicle Type']==PHEV)&(types_df.Make==make)&(types_df['Model Year']==year)]['count'].sum()
        # skip make/year pairs with no vehicles at all (avoids dividing by zero)
        if (BEVs != 0) | (PHEVs != 0):
            bev_rate = BEVs/(PHEVs+BEVs)*100
            phev_rate = PHEVs/(PHEVs+BEVs)*100
            row = {'Model Year':year, 'Make':make, 'BEVs':BEVs, 'PHEVs':PHEVs, 'BEV Rate':bev_rate, 'PHEV Rate':phev_rate}
            rates_df.append(row)
rates_df = pd.DataFrame(rates_df) # convert list to data frame
# pivot data and remove NAs
rates_map = pd.pivot_table(rates_df, columns='Model Year', index='Make', values='PHEV Rate').dropna()
#----------------------------------------------------------------------------------------------------------------#
# plot the heat map
hmap = plt.figure(figsize=(10,10))
h_ax = sns.heatmap(rates_map, cmap='Wistia', annot=rates_map.applymap(lambda a: '{:.0f}%'.format(a)),
square=True, fmt='', cbar_kws={'shrink': 0.75})
plt.title('Heat Map:\n'+PHEV+' Rate by Make', fontsize=16, pad=20)
plt.ylabel('Vehicle Make', fontsize=14, labelpad=10)
plt.xlabel('Model Year', fontsize=14, labelpad=10)
plt.yticks(rotation=0)
cbar = h_ax.collections[0].colorbar
cbar.set_ticks(list(range(0,110,10)))
cbar.set_ticklabels(['{:.0f}%'.format(t) for t in list(range(0,110,10))])
cbar.set_label('Rate of PHEVs Produced (%)', rotation=270, fontsize=14, color='black', labelpad=20)
plt.show()
From the above heat map, there appears to be a large variance between manufacturers in terms of their PHEV production rates over time. Mitsubishi appears to be the only manufacturer on this list that came into the game with a full focus on plug-in hybrids. Most car makers saw some sort of decrease from start to finish, with many beginning in 2016 with PHEV production rates at or close to 100%, and then slowly (and sometimes sharply) cutting production in favor of BEVs. This signals that most companies are aiming to focus their EV lineup on more battery power and less gasoline, which could be due to environmental considerations or to increased public familiarity and comfort with solely battery-powered vehicles driving demand over time.
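The PHEV rate in each heat map cell is the PHEV count divided by the combined BEV and PHEV count for that make and model year. As a rough sketch, the same grid can be derived with `pd.crosstab` and `normalize='index'` instead of nested loops. The frame below is invented toy data with hypothetical makes `A` and `B`; `shares` and `phev_grid` are illustrative names, not variables from the notebook.

```python
import pandas as pd

BEV = 'Battery Electric Vehicle (BEV)'
PHEV = 'Plug-in Hybrid Electric Vehicle (PHEV)'

# Invented registrations: one row per vehicle.
toy = pd.DataFrame({
    'Make': ['A'] * 8 + ['B'] * 4,
    'Model Year': [2016] * 4 + [2017] * 4 + [2016] * 2 + [2017] * 2,
    'Electric Vehicle Type': [PHEV, PHEV, PHEV, BEV,   # A, 2016: 3 of 4 are PHEVs
                              PHEV, BEV, BEV, BEV,     # A, 2017: 1 of 4
                              PHEV, PHEV,              # B, 2016: 2 of 2
                              BEV, PHEV],              # B, 2017: 1 of 2
})

# Share of each vehicle type within every (Make, Model Year) group, in percent.
shares = pd.crosstab(
    index=[toy['Make'], toy['Model Year']],
    columns=toy['Electric Vehicle Type'],
    normalize='index',
) * 100

# Reshape the PHEV share into a Make x Model Year grid, like rates_map.
phev_grid = shares[PHEV].unstack('Model Year')
print(phev_grid)
```

Make/year pairs with no vehicles never appear in the crosstab, so they surface as missing values after `unstack`, which plays the same role as the zero check in the loop version.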
Using a pie chart, we will look to break down the environmental impact of different EV types through their clean energy eligibility. The data set contains information about Clean Alternative Fuel Vehicle (CAFV) Eligibility, with the categories eligible, non-eligible, and unknown (more research required). The chart will provide insight into the distribution of each vehicle type by CAFV data.
# new starting df for the pie chart
CAFV = 'Clean Alternative Fuel Vehicle (CAFV) Eligibility'
pie_df = evs[['Electric Vehicle Type',CAFV]]
# collect new data for different pie layers - EV types, CAFV status, and total count
cafv_pie = pie_df[CAFV].value_counts().reset_index()
type_pie = pie_df[[CAFV,'Electric Vehicle Type']].value_counts().reset_index()
total_count = pie_df.groupby(CAFV).count()['Electric Vehicle Type'].sum()
#----------------------------------------------------------------------------------------------------------------#
# begin plotting the pie chart
pie = plt.figure(figsize=(9,9))
p_ax = pie.add_subplot()
# function to remove pct if it's too small
def too_small(pct):
    return f'{pct:.2f}%' if pct > 0.05 else ''
# defining colors
inner_colors = ['goldenrod', '#d62728', '#2ca02c'] # y, r, g
bev_colors = [mcolors.to_rgba(col, alpha=0.6) for col in inner_colors] # lightens the shade
phev_colors = [mcolors.to_rgba(col, alpha=0.8) for col in inner_colors]
outer_colors = [bev_colors[0], bev_colors[1], phev_colors[1], phev_colors[2], bev_colors[2]]
# plotting the outer pie
type_pie['count'].plot(kind='pie', radius=0.95, startangle=60, explode=[0.05]*len(type_pie),
colors=outer_colors, wedgeprops=dict(edgecolor='black'),
labels=type_pie['Electric Vehicle Type'].apply(lambda t: t[-5:-1].replace('(','')),
#type_pie.apply(
# lambda t: t['Electric Vehicle Type'][-5:-1].replace('(','') if too_small(t['count']/total_count) != '' else '',
# axis=1 # row-wise
#),
labeldistance=1.05, pctdistance=0.77, textprops={'fontsize':11}, autopct=too_small)
# creating a white "hole"
hole = plt.Circle((0,0), 0.5, fc='w')
plot_hole = plt.gcf() # get it?
plot_hole.gca().add_artist(hole)
# plotting the inner pie
cafv_pie['count'].plot(kind='pie', radius=0.45, pctdistance=0.6, autopct='%1.1f%%', wedgeprops=dict(edgecolor='black'),
explode=[0.02]*len(cafv_pie), startangle=60, textprops={'fontsize':11},
labels=['']*len(cafv_pie), colors=inner_colors)
p_ax.yaxis.set_visible(False)
plt.title(
'Clean Alternative Fuel Vehicle Eligibility for\n Battery Electric Vehicles (BEVs) & Plug-in Hybrid Electric Vehicles (PHEVs)'
)
unk, ne, e = [mpatches.Patch(label='Unknown - Research Required', color=inner_colors[0]),
mpatches.Patch(label='Not Eligible', color=inner_colors[1]),
mpatches.Patch(label='Eligible', color=inner_colors[2])]
p_ax.legend(handles=[unk, ne, e], loc='lower left', title='Eligibility')
pie.patch.set_facecolor('w')
plt.show()
Interestingly, from the above pie chart, it becomes clear that the vast majority of CAFV-eligible vehicles are actually plug-in hybrids. BEVs occupied an essentially negligible portion of eligible EVs, and made up the majority of those ineligible. Additionally, BEVs seem to be the only vehicle type for which eligibility is unknown. This could be due to the fact that BEVs appear to make up the vast majority of EVs in the data set, and that it is easier to assess eligibility for the smaller number of PHEVs. This does, however, indicate that PHEVs are, at this time, more likely to be certified as ‘clean’ than their battery-powered counterparts.
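Readings taken from pie wedges can be cross-checked numerically with a two-way table of eligibility against vehicle type. The sketch below uses invented rows and shortened, hypothetical labels (`Eligible`, `Not eligible`, `Unknown`, `BEV`, `PHEV`) rather than the data set's full strings, so the numbers say nothing about the real distribution.

```python
import pandas as pd

CAFV = 'Clean Alternative Fuel Vehicle (CAFV) Eligibility'

# Invented rows with shortened labels, for illustration only.
toy = pd.DataFrame({
    'Electric Vehicle Type': ['BEV', 'BEV', 'BEV', 'PHEV', 'PHEV', 'PHEV'],
    CAFV: ['Unknown', 'Unknown', 'Eligible', 'Eligible', 'Eligible', 'Not eligible'],
})

# Raw counts of each vehicle type within each eligibility category.
counts = pd.crosstab(toy[CAFV], toy['Electric Vehicle Type'])

# Row-wise shares: within each eligibility category, what percent is BEV vs PHEV?
row_share = pd.crosstab(toy[CAFV], toy['Electric Vehicle Type'], normalize='index') * 100

print(counts)
print(row_share.round(1))
```

Reading `row_share` across a row answers questions like "what share of eligible vehicles are PHEVs?", while `normalize='columns'` would instead answer "what share of PHEVs are eligible?". The two are easy to conflate when reading a nested pie chart.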
Using a map, we will look at a breakdown of Clean Alternative Fuel Vehicle (CAFV) Eligibility by location. Using latitude and longitude data from the data set, we will plot these points in such a way that they can be filtered by CAFV eligibility. For the purposes of this visualization, the state of Washington (WA) is being removed, as it has an excessively large number of data points compared to other states in the data set, which slows down the visual and detracts from the rest of the data.
# create lat and long columns
for ind, coord in enumerate(evs['Vehicle Location']):
    if isinstance(coord, str): # skip missing locations
        # the text inside the parentheses is '<long> <lat>'
        x = coord.split('(')[1].replace(')','').split()
        evs.loc[ind, 'Long'] = '%.5f' % float(x[0])
        evs.loc[ind, 'Lat'] = '%.5f' % float(x[1])
# state totals
state_count = evs['State'].value_counts().reset_index()
state_count = state_count[state_count.State != 'WA'] # too many points, slows down plot and negates other data
CAFV = 'Clean Alternative Fuel Vehicle (CAFV) Eligibility'
has_loc = evs[evs.Lat.notna()].reset_index(drop=True)
has_loc = has_loc[has_loc.State != 'WA'].reset_index(drop=True)
us_geo = path + 'us-state-boundaries.geojson'
us_center = [39.8333, -98.5833]
ev_map = folium.Map(location=us_center, zoom_start=4, tiles='cartodbpositron')
ch_map = folium.Choropleth(geo_data=us_geo, name='choropleth', data=state_count, columns=['State','count'],
                           key_on='feature.properties.stusab', fill_color='PuBu', fill_opacity=0.9, line_opacity=0.4,
                           nan_fill_color='lightgray', highlight=True, legend_name='EVs Registered In-State').add_to(ev_map)
ch_map.geojson.add_child(
folium.features.GeoJsonTooltip(fields=['stusab'], aliases=['State:'],
labels=True, style=('background-color: black; color: white;'))
)
# create groups for filtering
e_group = folium.FeatureGroup(name='Eligible')
ne_group = folium.FeatureGroup(name='Not Eligible')
unk_group = folium.FeatureGroup(name='Eligibility Unknown')
e_group.add_to(ev_map)
ne_group.add_to(ev_map)
unk_group.add_to(ev_map)
# add points to the map, coloring based on status
for i, status in enumerate(has_loc[CAFV]):
    if status == 'Clean Alternative Fuel Vehicle Eligible':
        color = 'green'
        group = e_group
    elif status == 'Not eligible due to low battery range':
        color = 'red'
        group = ne_group
    elif status == 'Eligibility unknown as battery range has not been researched':
        color = 'goldenrod'
        group = unk_group
    else:
        continue # skip any status outside the three known categories
    folium.Circle(location=[has_loc.loc[i, 'Lat'],has_loc.loc[i, 'Long']],
                  color=color, fill=True, fill_color=color, radius=5000,fill_opacity=0.4,
                  tooltip='Make: {}<br>State: {}'.format(has_loc.loc[i, 'Make'], has_loc.loc[i,'State'])).add_to(group)
folium.LayerControl(collapsed=False).add_to(ev_map)
ev_map.save(path + 'ev_dots.html')
ev_map
From this map, it is apparent that EV registrations are relatively scattered throughout the different states of the US. Many registrations appear to be centered around marked cities on the map, with what appear to be the largest quantities in Maryland, Virginia, California, and Texas. Unknown eligibility is reinforced here as the largest of all the categories, and eligible vehicles appear to be more widely spread than non-eligible vehicles. Toward the center of the US, there are seemingly fewer registered EVs, and the larger concentrations appear to be on the west coast and in the northeast.
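The row-by-row loop that builds the `Lat` and `Long` columns can be slow on a large frame. As a rough sketch, assuming every location string has the form `POINT (longitude latitude)` as the split logic in the map cell implies, the same columns could be extracted in one vectorized pass with `Series.str.extract`. The sample coordinates below are invented, and the columns are stored as floats rather than formatted strings.

```python
import pandas as pd

# Invented sample values; the None row stands in for a missing location.
toy = pd.DataFrame({
    'Vehicle Location': ['POINT (-122.30839 47.61036)', None, 'POINT (-76.6122 39.2904)'],
})

# Capture the two numbers inside the parentheses: longitude first, then latitude.
coords = toy['Vehicle Location'].str.extract(
    r'\((?P<Long>-?\d+(?:\.\d+)?)\s+(?P<Lat>-?\d+(?:\.\d+)?)\)'
).astype(float)

# Attach the parsed columns, rounded to 5 decimal places like the loop version.
toy = toy.join(coords.round(5))
print(toy)
```

Rows with a missing or non-matching location end up with `NaN` in both columns, so a later `notna()` filter on `Lat` keeps working the same way.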