This analysis examines used cars listed for sale as of a single date in 2022, using a dataset of vehicle attributes supplemented with geographical data. The primary dataset includes vehicle-specific details such as price, engine type, mileage, and consumer ratings, among others. These attributes help explain how cars are valued and what consumers prefer in the used car market.
Joining this dataset with latitude and longitude information for ZIP codes enables a geographical breakdown of car listings, so that market distribution and trends can be examined region by region.
The analysis aims to identify patterns and preferences that influence the sale and purchase of used cars, giving stakeholders a clearer understanding of the factors that drive consumer choices and market dynamics for used vehicles.
The core attributes analyzed in the dataset include:
The reliability ratings, which are a key component of a few of the visuals, are based on a statistical model that estimates problem rates within the first five years of ownership.
This map shows the distribution of car listings across ZIP codes in the United States. Each ZIP code with listings is drawn as a blue circle whose radius is proportional to the square root of the number of cars listed there, multiplied by 2 for visibility. Because a circle's area grows with the square of its radius, each circle's area is proportional to the listing count.
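As a rough illustration of the sizing rule described above, the following sketch computes a marker radius from a listing count. The function name `marker_radius` and its `scale` parameter are hypothetical, introduced here only for illustration; the rule itself (square root of the count, times 2) is the one stated above.

```python
import math


def marker_radius(car_count, scale=2.0):
    """Return a circle radius proportional to the square root of the count.

    `marker_radius` and `scale` are illustrative names, not part of the
    analysis code; scale=2.0 mirrors the factor of 2 described in the text.
    """
    return math.sqrt(car_count) * scale


# Quadrupling the listing count only doubles the radius
for count in (1, 4, 16, 100):
    print(count, marker_radius(count))
```

Square-root scaling keeps ZIP codes with very many listings from visually swamping their neighbors while still ranking locations by volume.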
import pandas as pd
import folium as f
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import plotly.express as px
import plotly.io as pio
import os
from branca.element import Figure
from matplotlib.ticker import FuncFormatter
from matplotlib.colors import LinearSegmentedColormap
os.environ['QT_QPA_PLATFORM_PLUGIN_PATH'] = 'C:/ProgramData/Anaconda3/Library/plugins/platforms'
path = 'C:/Users/GregoryNeeley/Desktop/Python Visualizations/'
# Load the dataset for car listings
df = pd.read_csv(path + 'cars_raw.csv')
# Drop all Tesla rows, as the data has proven erroneous and unreliable in review
df = df[df['Make'] != 'Tesla']
# Load the dataset with latitude and longitude data for zip codes
zip_lat_long_df = pd.read_csv(path + 'zip_lat_long.csv')
### Data Preparation
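# Drop rows without price data and convert prices to float
df = df[df['Price'] != 'Not Priced'].copy()
# regex=False treats '$' as a literal character; as a regex it is the end-of-string anchor and removes nothing
df['Price'] = df['Price'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)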
# Convert the ZIP code columns to string
df['Zipcode'] = df['Zipcode'].astype(str)
zip_lat_long_df['ZIP'] = zip_lat_long_df['ZIP'].astype(str)
### Car Listings by Zip Code
## Data Preparation
# Copy main dataframe for transformations
location_df = df[['Zipcode']].copy()
# Count the number of cars in each ZIP code
car_counts = location_df['Zipcode'].value_counts().reset_index()
car_counts.columns = ['Zipcode', 'CarCount']
# Merge the car counts back with the original dataframe
location_df = pd.merge(location_df, car_counts, on='Zipcode', how='left')
# Attach latitude/longitude coordinates for each ZIP code
location_df = pd.merge(location_df, zip_lat_long_df, left_on='Zipcode', right_on='ZIP', how='left')
## Plot
map_title = "Car Listings by Zip Code"
title_html = f'<h1 style="position:absolute;z-index:100000;left:40vw" >{map_title}</h1>'
# Initialize the map at a central location
m = f.Map(location=[37.0902, -95.7129], zoom_start=4)
m.get_root().html.add_child(f.Element(title_html))
# Add points to the map for each car listing, scaled by car count
for idx, row in location_df.iterrows():
    if not pd.isnull(row['LAT']) and not pd.isnull(row['LNG']) and not pd.isnull(row['CarCount']):
        radius = row['CarCount'] ** 0.5 * 2
        f.CircleMarker(
            location=[row['LAT'], row['LNG']],
            radius=radius,
            fill=True,
            fill_color='blue',
            fill_opacity=0.1,
            color='blue',
            opacity=0.05,
            tooltip=f"Zip Code: {row['Zipcode']}, Cars: {row['CarCount']}"
        ).add_to(m)
m.save(path + 'Car_Listings_by_Zip_Code.html')
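One caveat with the ZIP-code join in the data preparation above: if either CSV stores ZIP codes as numbers, casting to string drops leading zeros (for example, `02134` becomes `2134`), which is harmless only if both files lose them in the same way. The sketch below shows one possible way to normalize values to five-digit strings before merging. It assumes standard five-digit US ZIP codes, and `normalize_zip` is a hypothetical helper, not part of the original analysis.

```python
def normalize_zip(value):
    """Return a five-character, zero-padded ZIP string (hypothetical helper).

    Assumes five-digit US ZIP codes; any ZIP+4 suffix is dropped.
    """
    text = str(value).strip()
    # Floats read from a CSV look like '2134.0'; keep only the integer part
    if text.endswith('.0'):
        text = text[:-2]
    # Drop a ZIP+4 suffix such as '-1234'
    text = text.split('-')[0]
    return text.zfill(5)


print(normalize_zip(2134))          # integer that lost its leading zero
print(normalize_zip('90210-1234'))  # ZIP+4 form
```

With pandas, a helper like this could be applied to both key columns, for example `df['Zipcode'].map(normalize_zip)`, before the merge.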