I am analyzing the most popular SUVs on the market (according to car rating website Edmunds.com) with the intention of finding the best one to purchase after graduation. I will be examining variables such as price, MPG, value, technology, and overall rating to gain a comprehensive understanding of each vehicle. There were more variables available to extract on the website, but these were the qualities I prioritized most.
Data Preparation
Question 1: Which brand has the most cars on the list?
Because the list consists of the top 3 SUVs in each sub-category (such as Small 3 Row and Midsize Luxury), the brand with the most cars on the list is likely a brand that consistently produces quality vehicles.
Analysis
Mercedes is the clear front-runner with 7 cars on the list, compared to 4 for Audi, the next highest. I found it more useful to compare the number of cars each brand placed on the list than the mean or median rating, because every car on the list is considered among the best in its sub-category. As a result, the mean and median Edmunds total ratings differ very little from brand to brand.
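The brand tally described above can be sketched in base R with a keyword lookup instead of a long chain of conditions. This is only a minimal sketch: the car names are made-up placeholders rather than the Edmunds data, `brand_lookup` covers just a few brands, and `extract_brand()` is a hypothetical helper that assumes each name contains one known brand keyword.

```r
# Made-up car names for illustration only; not the scraped Edmunds data.
car_name <- c("Mercedes SUV A", "Audi SUV B", "Kia SUV C",
              "Mercedes SUV D", "Land Rover SUV E")

# Keyword to search for (names) and brand label to report (values).
# Assumed subset of the brands on the list.
brand_lookup <- c(Mercedes = "Mercedes", Audi = "Audi",
                  Kia = "Kia", Rover = "Land Rover")

# Hypothetical helper: label of the first keyword found in a name, else NA.
extract_brand <- function(name, lookup = brand_lookup) {
  found <- vapply(names(lookup), grepl, logical(1), x = name, fixed = TRUE)
  hit <- names(lookup)[found]
  if (length(hit) == 0) NA_character_ else unname(lookup[hit[1]])
}

brand <- vapply(car_name, extract_brand, character(1), USE.NAMES = FALSE)

# Count cars per brand, most frequent first.
brand_counts <- sort(table(brand), decreasing = TRUE)
print(brand_counts)
```

A lookup vector keeps the brand list in one place, so adding a brand means adding one entry rather than another condition.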
Question 2: Is the overall rating from Edmunds experts aligned with the owner ratings?
While the experts at Edmunds likely have a lot of technical knowledge of what makes a “good” car, I am not a car enthusiast and likely don’t prioritize all the same features that experts do. I feel that the opinions of everyday people who drive the cars regularly would more accurately predict how I might rate a car.
Analysis
At an aggregate level, the median owner rating and the median expert rating are virtually the same (8 vs. 8.1), but the owner ratings vary far more. This makes sense, as everyday consumers are likely to differ more in their standards and preferences than experts do. Additionally, there are more owner reviews than expert reviews, which leaves more room for variation among owners; at the same time, as the law of large numbers suggests, a median drawn from a larger sample is more likely to reflect the true median.
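The difference in spread can also be put into numbers rather than read off the box plot. The following is a minimal sketch in base R with made-up ratings (not the Edmunds data); it assumes owner stars are on a 5-point scale and expert ratings on a 10-point scale, which is why the owner stars are doubled before comparing.

```r
# Made-up ratings for illustration only.
owner_stars  <- c(3.0, 4.5, 4.0, 5.0, 3.5, 4.0)  # assumed 5-point scale
total_rating <- c(8.0, 8.2, 7.9, 8.3, 8.1, 8.0)  # assumed 10-point scale

# Put owner ratings on the same 10-point scale as the expert ratings.
owner_stars_2x <- owner_stars * 2

# Median for the center, IQR and standard deviation for the spread.
spread_summary <- data.frame(
  metric = c("owner_stars_2x", "total_rating"),
  median = c(median(owner_stars_2x, na.rm = TRUE),
             median(total_rating, na.rm = TRUE)),
  iqr    = c(IQR(owner_stars_2x, na.rm = TRUE),
             IQR(total_rating, na.rm = TRUE)),
  sd     = c(sd(owner_stars_2x, na.rm = TRUE),
             sd(total_rating, na.rm = TRUE))
)
print(spread_summary)
```

The IQR matches the height of each box in a box plot, so it is the most direct numeric counterpart to the visual comparison.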
Question 3: Which car has the best value and how much does it cost?
The four cars tied for the highest value all have a value rating one full point above the median and prices well below the median. It’s also worth noting that two of the cars tied for best value are Kias, suggesting that Kia might be a more budget-friendly alternative to Mercedes, which has the most total cars on the list.
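The comparison against the medians can be made explicit by computing how far each top-value car sits from them. This is a minimal sketch in base R; the rows are made-up placeholders, and the column names `value_above_median` and `price_below_median` are introduced here for illustration.

```r
# Made-up example rows; names, ratings, and prices are placeholders.
suvs <- data.frame(
  car_name   = c("SUV A", "SUV B", "SUV C", "SUV D", "SUV E"),
  value_rate = c(9, 8, 9, 7.5, 8),
  car_price  = c(30000, 55000, 35000, 80000, 60000)
)

median_price <- median(suvs$car_price, na.rm = TRUE)
median_value <- median(suvs$value_rate, na.rm = TRUE)

# Keep every car tied for the highest value rating.
top_value <- max(suvs$value_rate, na.rm = TRUE)
best_value <- suvs[which(suvs$value_rate == top_value), ]

# Distance from the medians: positive numbers mean better value / cheaper.
best_value$value_above_median <- best_value$value_rate - median_value
best_value$price_below_median <- median_price - best_value$car_price
print(best_value)
```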
Question 4: Is there a correlation between MPG and price?
Analysis:
Yes, there is a negative correlation between price and MPG. This is likely due to the fact that performance vehicles (which tend to be more expensive) often prioritize power over fuel efficiency.
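The direction and strength of this relationship can be checked numerically as well as visually. The sketch below uses base R's `cor()` and `cor.test()` on made-up numbers (not the Edmunds data), so the printed values say nothing about the actual list; it only illustrates the calculation.

```r
# Made-up MPG and price values for illustration only.
mpg       <- c(18, 20, 22, 25, 28, 30)
car_price <- c(90000, 75000, 60000, 48000, 40000, 35000)

# Pearson correlation coefficient; negative means price falls as MPG rises.
r <- cor(mpg, car_price, use = "complete.obs")

# Significance test for the same correlation.
result <- cor.test(mpg, car_price)

cat("Pearson r:", round(r, 2), "\n")
cat("p-value:", signif(result$p.value, 3), "\n")
```

A coefficient near -1 would indicate a strong negative linear relationship, while a value near 0 would indicate little linear relationship. Neither says anything about cause.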
Question 5: Is there a correlation between tech rating and price?
Analysis:
Yes, it appears that the better the tech is in a car, the higher the price. However, it is worth noting that cars with a tech rating of 9 have a wide range of prices, meaning that it is possible to get a car with high-quality tech without breaking the bank.
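Because the tech rating is a score with few distinct values, a rank-based (Spearman) correlation is one reasonable way to quantify this relationship, and the price range within each rating shows how much prices overlap. This is a minimal sketch in base R with made-up values, not the Edmunds data.

```r
# Made-up tech ratings and prices for illustration only.
tech_rate <- c(7, 7.5, 8, 8, 8.5, 9, 9, 9)
car_price <- c(30000, 35000, 42000, 50000, 65000, 38000, 90000, 120000)

# Spearman correlation works on ranks, so it suits ordinal ratings.
rho <- cor(tech_rate, car_price, method = "spearman")

# Price spread (max minus min) within each tech rating.
price_range_by_tech <- tapply(car_price, tech_rate,
                              function(p) max(p) - min(p))

cat("Spearman rho:", round(rho, 2), "\n")
print(price_range_by_tech)
```

A positive `rho` together with a wide price range at the top rating would match the pattern described above: better tech tends to cost more, yet some highly rated cars remain comparatively affordable.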
Source Code
---
title: "SUV Analysis"
author: "Emma Black"
editor: visual
toc: true # Generates an automatic table of contents.
format: # Options related to formatting.
  html: # Options related to HTML output.
    code-tools: TRUE # Allow the code tools option showing in the output.
    embed-resources: TRUE # Embeds all components into a single HTML file.
execute: # Options related to the execution of code chunks.
  warning: FALSE # FALSE: Code chunk warnings are hidden by default.
  message: FALSE # FALSE: Code chunk messages are hidden by default.
  echo: FALSE # FALSE: Code is hidden in the output.
---

## Introduction

I am analyzing the most popular SUVs on the market (according to car rating website Edmunds.com) with the intention of finding the best one to purchase after graduation. I will be examining variables such as price, MPG, value, technology, and overall rating to gain a comprehensive understanding of each vehicle. There were more variables available to extract on the website, but these were the qualities I prioritized most.

```{r}
#| label: load libraries
#| include: FALSE
#| message: false
library(tidyverse) # The tidyverse collection of packages
library(httr)      # Useful for web authentication
library(rvest)     # Useful tools for working with HTML and XML
library(lubridate) # Working with dates
library(magrittr)
library(chromote)  # Allows for live view of web pages
library(ggplot2)
```

## Data Preparation

```{r}
#| label: load the data
#| include: FALSE
#| message: false
all_suvs <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/blacke6_xavier_edu/Eb16DCSE_dhEgA12H1R6BkYB24IpNnfeLL1M284T286NPQ?download=1")

# Clean the data
all_suvs <- all_suvs %>%
  # With only cars pulled from each sub category, this information was not
  # helpful (the only values were 1, 2, and 3)
  select(-rank_in_sub_cat) %>%
  mutate(
    cost_to_drive = cost_to_drive %>%
      str_replace_all("\\$", "") %>%
      str_replace_all("/mo", "") %>%
      str_trim() %>%
      as.numeric(),
    owner_stars = owner_stars %>%
      str_remove_all("out of 5 stars") %>%
      as.numeric(),
    num_owner_reviews = num_owner_reviews %>%
      str_remove_all("Owner Reviews") %>%
      as.numeric(),
    mpg = mpg %>%
      str_replace_all("[^0-9.]", "") %>%
      na_if("") %>%
      as.numeric(),
    # Price ranges such as "40000-50000" are replaced by their midpoint
    car_price = car_price %>%
      str_replace_all("\\$", "") %>%
      str_replace_all(",", "") %>%
      str_trim() %>%
      str_replace_all(" - ", "-") %>%
      map_chr(~ifelse(str_detect(., "-"), mean(as.numeric(str_split(., "-")[[1]])), .)) %>%
      as.numeric() %>%
      round()
  )
```

## Question 1: Which brand has the most cars on the list?

Because the list consists of the top 3 SUVs in each sub-category (such as Small 3 Row and Midsize Luxury), the brand with the most cars on the list is likely a brand that consistently produces quality vehicles.

```{r}
#| label: most cars on list
all_suvs <- all_suvs %>%
  mutate(brand = case_when(
    str_detect(car_name, "Cadillac") ~ "Cadillac",
    str_detect(car_name, "BMW") ~ "BMW",
    str_detect(car_name, "Mercedes") ~ "Mercedes",
    str_detect(car_name, "Audi") ~ "Audi",
    str_detect(car_name, "Porsche") ~ "Porsche",
    str_detect(car_name, "Tesla") ~ "Tesla",
    str_detect(car_name, "Rover") ~ "Land Rover",
    str_detect(car_name, "Bentley") ~ "Bentley",
    str_detect(car_name, "Lincoln") ~ "Lincoln",
    str_detect(car_name, "Acura") ~ "Acura",
    str_detect(car_name, "Lexus") ~ "Lexus",
    str_detect(car_name, "Genesis") ~ "Genesis",
    str_detect(car_name, "Ford") ~ "Ford",
    str_detect(car_name, "Chevy") ~ "Chevy",
    str_detect(car_name, "GMC") ~ "GMC",
    str_detect(car_name, "Toyota") ~ "Toyota",
    str_detect(car_name, "Hyundai") ~ "Hyundai",
    str_detect(car_name, "Kia") ~ "Kia",
    str_detect(car_name, "Jeep") ~ "Jeep",
    str_detect(car_name, "Honda") ~ "Honda",
    str_detect(car_name, "Mazda") ~ "Mazda",
    str_detect(car_name, "Buick") ~ "Buick"
  ))

all_suvs %>%
  count(brand, name = "count") %>% # Count occurrences of each brand
  ggplot(aes(x = reorder(brand, -count), y = count)) + # Order brands by count
  geom_bar(stat = "identity", fill = "steelblue") + # Create bar chart
  labs(
    title = "Number of Cars by Brand",
    x = "Brand",
    y = "Number of Cars"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels
```

### Analysis

Mercedes is the clear front runner with 7 cars mentioned on the list, compared to the next highest of Audi at 4 cars. I found it more useful to compare the number of cars from each brand that made the list rather than the mean or median rating because all of the cars on the list are considered the best in their respective sub-categories. Therefore, there is not much difference in the mean and median values of their total rating by Edmunds.

## Question 2: Is the overall rating from Edmunds experts aligned with the owner ratings?

While the experts at Edmunds likely have a lot of technical knowledge of what makes a "good" car, I myself am not a car enthusiast and likely don't prioritize all the same features in a car that experts do. I feel that the opinions of common people who drive the cars regularly would more accurately predict how I might rate a car.

```{r}
#| label: owner vs edmunds rating
# Put owner stars (out of 5) on the same scale as the expert rating
all_suvs <- all_suvs %>%
  mutate(owner_stars_2x = owner_stars * 2)

# Create the box plot
all_suvs %>%
  select(owner_stars_2x, total_rating) %>%
  pivot_longer(cols = everything(), names_to = "metric", values_to = "value") %>%
  ggplot(aes(x = metric, y = value, fill = metric)) +
  geom_boxplot() +
  # Label each box with its median
  stat_summary(
    fun = median, geom = "text", aes(label = round(after_stat(y), 1)),
    color = "black", vjust = -0.5, size = 3.5
  ) +
  labs(
    title = "Owner vs Expert Rating",
    x = "Review Type",
    y = "Value"
  ) +
  theme_minimal() +
  theme(legend.position = "none")
```

### Analysis

It appears that at an aggregate level, while the medians of the owner reviews and the expert reviews are virtually the same (8 vs 8.1), the owner reviews have vastly more variation. This makes sense as common consumers are likely to have more variation in their standards and preferences than experts. Additionally, there are more total owner reviews than expert reviews, meaning there is more opportunity for variation with owner reviews, but, as the law of large numbers suggests, a greater likelihood that the median of this larger sample will more accurately reflect the true median.

## Question 3: Which car has the best value and how much does it cost?

```{r}
#| label: best value
# Find the highest value rate
highest_value_rate <- max(all_suvs$value_rate, na.rm = TRUE)

# Filter the data for cars with the highest value rate
highest_value_cars <- all_suvs %>%
  filter(value_rate == highest_value_rate) %>%
  select(car_name, value_rate, car_price)

# Calculate median car price and value rate
median_values <- data.frame(
  median_car_price = median(all_suvs$car_price, na.rm = TRUE),
  median_value_rate = median(all_suvs$value_rate, na.rm = TRUE)
)

# Display the data frames
highest_value_cars
median_values
```

### Analysis:

The four cars tied for the highest value all have a value rating one full point above the median and prices well below the median. It's also worth noting that two of the cars tied for best value are Kias, suggesting that this might be a more budget-friendly alternative to Mercedes, which has the most total cars on the list.

## Question 4: Is there a correlation between MPG and price?

```{r}
#| label: mpg vs price
# Create a scatter plot comparing mpg and car price
all_suvs %>%
  ggplot(aes(x = mpg, y = car_price)) +
  geom_point(color = "steelblue") +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(
    title = "Scatter Plot of MPG vs Car Price",
    x = "Miles Per Gallon (MPG)",
    y = "Car Price"
  ) +
  theme_minimal()
```

### Analysis:

Yes, there is a negative correlation between price and MPG. This is likely due to the fact that performance vehicles (which tend to be more expensive) often prioritize power over fuel efficiency.

## Question 5: Is there a correlation between tech rating and price?

```{r}
# Create a box plot with tech_rate on the x-axis and car_price on the y-axis, grouped by tech_rate
all_suvs %>%
  ggplot(aes(x = factor(tech_rate), y = car_price, fill = factor(tech_rate))) +
  geom_boxplot() +
  labs(
    title = "Box Plot of Car Price by Tech Rate",
    x = "Tech Rate",
    y = "Car Price"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels for readability
```

### Analysis:

Yes, it appears that the better the tech is in a car, the higher the price. However, it is worth noting that cars with a tech rating of 9 have a wide range of prices, meaning that it is possible to get a car with high-quality tech without breaking the bank.