I am analyzing the most popular SUVs on the market (according to car rating website Edmunds.com) with the intention of finding the best one to purchase after graduation. I will be examining variables such as price, MPG, value, technology, owner ratings, and expert ratings to gain a comprehensive understanding of each vehicle.
Data Dictionary
Click here to access the specific data set I obtained for this analysis.
| Variable Name | Description |
|---|---|
| car_name | year, make, and model of a car |
| car_price | an average of the given range of MSRP values, reflecting different trims |
| cost_to_drive | the estimated monthly fuel cost, assuming the car is driven primarily in Ohio for 15,000 miles per year, with 55% of those miles in the city and 45% on the highway |
| owner_stars | the average star rating out of 5 given in reviews by real owners |
| num_owner_reviews | the number of owner reviews posted for the car |
| total_rating | the rating out of 10 that the experts at Edmunds assigned to the overall vehicle |
| mpg | miles per gallon |
| tech_rate | the rating out of 10 that the experts at Edmunds assigned to the vehicle’s technology features |
| interior_rate | the rating out of 10 that the experts at Edmunds assigned to the vehicle’s interior quality |
| value_rate | the rating out of 10 that the experts at Edmunds assigned based on the quality and number of features for the given vehicle price |
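To make the car_price definition concrete, here is a minimal sketch of how a price range can be collapsed to the average of its endpoints. The helper name `price_midpoint` and the example strings are my own illustrations, not values from the data set, and I am assuming the raw prices look like "$30,000 - $40,000".

```r
# Hypothetical helper: average the endpoints of an MSRP range string.
price_midpoint <- function(x) {
  # Drop dollar signs, commas, and whitespace: "$30,000 - $40,000" -> "30000-40000"
  cleaned <- gsub("[$,[:space:]]", "", x)
  # Split on the dash; a single price simply yields one number
  parts <- strsplit(cleaned, "-", fixed = TRUE)
  # Average whatever numbers are present and round to whole dollars
  vapply(parts, function(p) round(mean(as.numeric(p))), numeric(1))
}

# Made-up inputs for illustration only
example_prices <- price_midpoint(c("$30,000 - $40,000", "$52,000"))
print(example_prices)
```

A car listed at a single price passes through unchanged, so the same helper covers both cases.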
Part 1
Summary Statistics
Transposed Summary Statistics

| variable | cost_to_drive | owner_stars | num_reviews | mpg | car_price |
|---|---|---|---|---|---|
| avg | 198.32558 | 3.9688889 | 37.40909 | 22.44186 | 77322.71 |
| median | 197.00000 | 4.0000000 | 37.50000 | 22.00000 | 61975.00 |
| sd | 60.06137 | 0.5107135 | 22.05313 | 4.62036 | 48551.63 |
| min | 116.00000 | 2.9000000 | 2.00000 | 13.00000 | 26200.00 |
| max | 368.00000 | 5.0000000 | 98.00000 | 30.00000 | 224800.00 |
Which brand has the most cars on the list?
Because the list consists of the top 3 SUVs in each sub-category (such as Small 3 Row and Midsize Luxury), the brand with the most cars on the list is likely a brand that consistently produces quality vehicles.
Analysis
Mercedes is the clear front-runner with 7 cars on the list, compared to the next highest, Audi, with 4. I found it more useful to compare the number of cars from each brand that made the list than to compare mean or median ratings. Because every car on the list is considered among the best in its sub-category, the mean and median Edmunds total ratings barely differ from brand to brand.
Is the overall rating from Edmunds experts aligned with the owner ratings?
While the experts at Edmunds likely have a lot of technical knowledge about what makes a “good” car, I am not a car enthusiast and likely don’t prioritize all the same features that experts do. I feel that the opinions of everyday people who drive the cars regularly would more accurately predict how I might rate a car.
Analysis
At an aggregate level, the median owner rating (doubled to put it on the experts’ 10-point scale) and the median expert rating are virtually the same (8 vs 8.1), but the owner ratings vary far more. This makes sense, as everyday consumers are likely to differ more in their standards and preferences than experts do. Additionally, each car’s owner rating averages many individual reviews, whereas the expert rating is a single assessment. More reviews leave more room for disagreement among individual owners, but, as the law of large numbers suggests, an average built from more reviews should sit closer to owners’ true overall opinion.
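As a rough sketch of how “similar center, different spread” can be checked numerically rather than read off the box plot, the snippet below rescales owner stars to a 10-point scale and compares the median, interquartile range, and standard deviation. The numbers are made up for illustration and are not taken from the data set.

```r
# Made-up ratings for illustration only (not values from the data set)
owner_stars  <- c(3.1, 4.0, 4.6, 3.8, 4.9)  # owner averages, out of 5
total_rating <- c(7.9, 8.1, 8.3, 8.0, 8.2)  # expert ratings, out of 10

# Put owner ratings on the experts' 10-point scale
owner_10pt <- owner_stars * 2

# Compare center (median) and spread (IQR, sd) of the two rating types
spread <- data.frame(
  rating_type = c("owner", "expert"),
  median = c(median(owner_10pt), median(total_rating)),
  iqr    = c(IQR(owner_10pt), IQR(total_rating)),
  sd     = c(sd(owner_10pt), sd(total_rating))
)
print(spread)
```

With the real data, `na.rm = TRUE` would be needed in each summary call, since some cars are missing owner ratings.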
Which car has the best value and how much does it cost?
Cars with the Highest Value Rate

| car_name | value_rate | car_price |
|---|---|---|
| 2025 Genesis GV70 | 8.5 | 52000 |
| 2024 Hyundai Palisade | 8.5 | 45250 |
| 2025 Kia Telluride | 8.5 | 44788 |
| 2025 Kia Sorento | 8.5 | 39690 |

Median Car Price and Value Rate

| median_car_price | median_value_rate |
|---|---|
| 61975 | 7.5 |
Analysis:
The four cars tied for the highest value all have a value rating one full point above the median (8.5 vs 7.5) and prices well below the median of $61,975. It’s also worth noting that two of the four are Kias, suggesting that Kia might be a more budget-friendly alternative to Mercedes, which has the most total cars on the list.
Is there a correlation between MPG and price?
Analysis:
Yes, there is a negative correlation between price and MPG. This is likely due to the fact that performance vehicles (which tend to be more expensive) often prioritize power over fuel efficiency.
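The scatter plot shows the direction of the relationship, and a correlation coefficient can put a number on it. The sketch below uses made-up values (not the data set) and computes both Pearson and Spearman correlations. I include Spearman on the assumption that a few very expensive cars could distort a purely linear measure.

```r
# Made-up values for illustration only (not from the data set)
mpg       <- c(30, 27, 24, 22, 19, 16, 14)
car_price <- c(28000, 35000, 41000, 60000, 75000, 120000, 150000)

# Pearson measures linear association; Spearman is rank-based and less
# sensitive to a handful of extreme prices
pearson  <- cor(mpg, car_price, method = "pearson")
spearman <- cor(mpg, car_price, method = "spearman")

# cor.test() adds a p-value for the null hypothesis of zero correlation
pearson_test <- cor.test(mpg, car_price, method = "pearson")

print(c(pearson = pearson, spearman = spearman))
print(pearson_test$p.value)
```

On the real data, `cor(..., use = "complete.obs")` would be needed because some cars have no MPG value.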
Is there a correlation between tech rating and price?
Analysis:
Yes, it appears that the better the tech is in a car, the higher the price. However, it is worth noting that cars with a tech rating of 9 have a wide range of prices, meaning that it is possible to get a car with high-quality tech without breaking the bank.
Part 2
Two very comparable SUV brands in my price range are Lexus and Acura, so I will be comparing owner reviews sourced from Edmunds.com of two of their most popular SUV models: the Acura RDX and the Lexus NX.
Note: I will refer to a positivity value throughout the analysis. This is a calculated field based on the overall positivity or negativity of the words used to write a review. A negative positivity value indicates a negative review.
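To illustrate how a positivity value works, here is a minimal sketch that counts positive and negative words in a review and subtracts one from the other. The source code for this report joins reviews against the NRC lexicon. The five-word lexicon, the function name `positivity_score`, and the example reviews below are stand-ins I made up so the example runs on its own.

```r
# Tiny hand-made lexicon for illustration; the real analysis uses the NRC lexicon
lexicon <- data.frame(
  word = c("love", "comfortable", "reliable", "noisy", "problem"),
  sentiment = c("positive", "positive", "positive", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: positive word count minus negative word count
positivity_score <- function(review, lexicon) {
  # Lowercase, then split on anything that is not a letter or apostrophe
  words <- strsplit(tolower(review), "[^a-z']+")[[1]]
  # Look up each word; words missing from the lexicon become NA
  matched <- lexicon$sentiment[match(words, lexicon$word)]
  sum(matched == "positive", na.rm = TRUE) -
    sum(matched == "negative", na.rm = TRUE)
}

happy_score   <- positivity_score("I love this car: comfortable and reliable, but noisy.", lexicon)
unhappy_score <- positivity_score("Noisy, and one problem after another.", lexicon)
print(c(happy = happy_score, unhappy = unhappy_score))
```

Because the score is a raw count difference, longer reviews can reach more extreme values simply by containing more scoreable words.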
Exploration: What are the top words used to describe each car? Do they differ between the two cars?
Analysis
Note: the word “5” for the Acura was used 391 times (not 3) but was cut off
The top 10 words used to describe the two cars were extremely similar, with “5”, “stars”, and “car” being the top 3 words for both. The remaining common words highlight the general qualities that reviewers value in a car, such as technology, comfort, and reliability.
Is there a correlation between the month and the number of reviews published?
My hypothesis is that there would be more reviews around December and January, as these are generally the most popular times to purchase a car and it seems likely that people would review the car while it is relatively new to them.
Analysis
New model years are typically released in December, which, coupled with Christmas and promotions (such as the Lexus December to Remember Sales Event), makes December one of the most popular months to purchase a car. It would be reasonable to infer that people are more likely to review cars that they have recently bought, causing a spike in car reviews around December, January, and February. While this pattern generally holds, Lexus has its most significant spike in reviews in December, while Acura’s largest spike happens in May. This suggests either that variables other than how recently the car was bought affect when a consumer chooses to post a review, or that Acura has particularly effective sales events in “off” months that Lexus does not have.
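The chart counts reviews for each month of each year. Since the hypothesis is about the time of year, it can also help to pool reviews by calendar month regardless of year. The sketch below shows one way to do that in base R, using made-up review dates rather than the actual data.

```r
# Made-up review dates for illustration only
review_dates <- as.Date(c("2023-12-03", "2023-12-18", "2024-01-09",
                          "2024-05-21", "2024-12-02"))

# Pool by calendar month (Jan..Dec), ignoring the year
month_num <- as.integer(format(review_dates, "%m"))

# A factor with all 12 levels keeps months with zero reviews in the table
reviews_by_month <- table(factor(month.abb[month_num], levels = month.abb))
print(reviews_by_month)
```

Running this separately for each car model would make the December and May spikes directly comparable across years.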
Is one car reviewed more positively than the other?
I examined the distribution of the overall positivity scores of each car to understand where the majority of reviews fell and how outliers impacted the scores.
Both cars have very similar median positivity values, with Acura at 5 and Lexus at 5.5. Outliers can likely be attributed to excessively long reviews, which contain more scoreable words. The similarity in positivity is unsurprising, as both cars have very similar overall ratings on the Edmunds website: 4.1/5 stars for the Acura and 4.2/5 stars for the Lexus.
Is the positivity value an accurate reflection of the reviewer’s feelings about the car?
To answer this question, I compared the star value that the reviewer assigned the car with the calculated positivity value to understand if a higher star value correlates to a higher positivity score.
Analysis
There appears to be a general trend that the higher the star rating the reviewer gave, the more positive the language they used in their review. This suggests that the results of the text analysis are generally reliable. However, the boxes for adjacent star ratings overlap considerably, so a formal test such as ANOVA is needed to establish whether the differences between star ratings are statistically significant.
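As a sketch of what that follow-up test could look like, the snippet below runs a one-way ANOVA of positivity on star rating, plus a Kruskal-Wallis test as a rank-based alternative in case the positivity scores are not close to normally distributed. The data frame is made up for illustration, and the column names `stars` and `positivity` simply mirror the ones used in this analysis.

```r
# Made-up data for illustration only: star rating and positivity per review
reviews <- data.frame(
  stars = factor(c(1, 1, 1, 3, 3, 3, 5, 5, 5)),
  positivity = c(-3, -1, 0, 1, 2, 4, 5, 7, 9)
)

# One-way ANOVA: does mean positivity differ across star ratings?
fit <- aov(positivity ~ stars, data = reviews)
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]

# Kruskal-Wallis: rank-based alternative that does not assume normality
kw_p <- kruskal.test(positivity ~ stars, data = reviews)$p.value

print(c(anova_p = anova_p, kruskal_p = kw_p))
```

A small p-value would indicate that at least one star rating differs in positivity. A post-hoc comparison such as `TukeyHSD(fit)` would then show which ratings differ from each other.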
Source Code
---
title: "SUV Analysis"
author: "Emma Black"
editor: visual
toc: true # Generates an automatic table of contents.
format: # Options related to formatting.
  html: # Options related to HTML output.
    code-tools: TRUE # Allow the code tools option showing in the output.
    embed-resources: TRUE # Embeds all components into a single HTML file.
execute: # Options related to the execution of code chunks.
  warning: FALSE # FALSE: Code chunk warnings are hidden by default.
  message: FALSE # FALSE: Code chunk messages are hidden by default.
  echo: FALSE # FALSE: Code is hidden in the output by default.
---

## Introduction

I am analyzing the most popular SUVs on the market (according to car rating website [Edmunds.com](https://www.edmunds.com/)) with the intention of finding the best one to purchase after graduation. I will be examining variables such as price, MPG, value, technology, owner ratings, and expert ratings to gain a comprehensive understanding of each vehicle.

## Data Dictionary

[Click here](https://myxavier-my.sharepoint.com/:x:/g/personal/blacke6_xavier_edu/Eb16DCSE_dhEgA12H1R6BkYB24IpNnfeLL1M284T286NPQ?download=1) to access the specific data set I obtained for this analysis.

| Variable Name | Description |
|---|---|
| car_name | year, make, and model of a car |
| car_price | an average of the given range of MSRP values, reflecting different trims |
| cost_to_drive | the estimated monthly fuel cost, assuming the car is driven primarily in Ohio for 15,000 miles per year, with 55% of those miles in the city and 45% on the highway |
| owner_stars | the average star rating out of 5 given in reviews by real owners |
| num_owner_reviews | the number of owner reviews posted for the car |
| total_rating | the rating out of 10 that the experts at Edmunds assigned to the overall vehicle |
| mpg | miles per gallon |
| tech_rate | the rating out of 10 that the experts at Edmunds assigned to the vehicle's technology features |
| interior_rate | the rating out of 10 that the experts at Edmunds assigned to the vehicle's interior quality |
| value_rate | the rating out of 10 that the experts at Edmunds assigned based on the quality and number of features for the given vehicle price |

```{r}
#| label: load libraries
#| include: FALSE
#| message: false
library(tidyverse) # The tidyverse collection of packages
library(httr) # Useful for web authentication
library(rvest) # Useful tools for working with HTML and XML
library(lubridate) # Working with dates
library(magrittr) # Extracting items from list objects using piping grammar
library(chromote) # Allows for live view of web pages
library(ggplot2)
library(tidyr)
library(jsonlite) # Converting json data into data frames
library(tidytext) # Tidy text mining
library(textdata) # Lexicons of sentiment data
library(widyr) # Easily calculating pairwise counts
library(stringr)
library(knitr)
```

```{r}
#| label: load the data
#| include: FALSE
#| message: false
all_suvs <- read_csv("https://myxavier-my.sharepoint.com/:x:/g/personal/blacke6_xavier_edu/EWMMLYx6aLhDoxjkHGoLrDsBCy6tduSMoodm9Qaos2MF8A?download=1")

# Clean the data
all_suvs <- all_suvs %>%
  # With only the top cars pulled from each sub-category, the rank was not
  # helpful (the only values were 1, 2, and 3)
  select(-rank_in_sub_cat) %>%
  mutate(
    # Strip "$" and "/mo" so the monthly fuel cost is numeric
    cost_to_drive = cost_to_drive %>%
      str_replace_all("\\$", "") %>%
      str_replace_all("/mo", "") %>%
      str_trim() %>%
      as.numeric(),
    owner_stars = owner_stars %>%
      str_remove_all("out of 5 stars") %>%
      as.numeric(),
    num_owner_reviews = num_owner_reviews %>%
      str_remove_all("Owner Reviews") %>%
      as.numeric(),
    # Keep only digits and decimal points; empty strings become NA
    mpg = mpg %>%
      str_replace_all("[^0-9.]", "") %>%
      na_if("") %>%
      as.numeric(),
    # Average the endpoints of a price range; single prices pass through
    car_price = car_price %>%
      str_replace_all("\\$", "") %>%
      str_replace_all(",", "") %>%
      str_trim() %>%
      str_replace_all(" - ", "-") %>%
      map_chr(~ ifelse(str_detect(., "-"), mean(as.numeric(str_split(., "-")[[1]])), .)) %>%
      as.numeric() %>%
      round()
  )
```

## Part 1

### Summary Statistics

```{r}
#| label: summary statistics
# Create a summary statistics data frame with variables as rows
summary_stats <- all_suvs %>%
  summarise(
    avg_cost_to_drive = mean(cost_to_drive, na.rm = TRUE),
    median_cost_to_drive = median(cost_to_drive, na.rm = TRUE),
    sd_cost_to_drive = sd(cost_to_drive, na.rm = TRUE),
    min_cost_to_drive = min(cost_to_drive, na.rm = TRUE),
    max_cost_to_drive = max(cost_to_drive, na.rm = TRUE),
    avg_owner_stars = mean(owner_stars, na.rm = TRUE),
    median_owner_stars = median(owner_stars, na.rm = TRUE),
    sd_owner_stars = sd(owner_stars, na.rm = TRUE),
    min_owner_stars = min(owner_stars, na.rm = TRUE),
    max_owner_stars = max(owner_stars, na.rm = TRUE),
    avg_num_reviews = mean(num_owner_reviews, na.rm = TRUE),
    median_num_reviews = median(num_owner_reviews, na.rm = TRUE),
    sd_num_reviews = sd(num_owner_reviews, na.rm = TRUE),
    min_num_reviews = min(num_owner_reviews, na.rm = TRUE),
    max_num_reviews = max(num_owner_reviews, na.rm = TRUE),
    avg_mpg = mean(mpg, na.rm = TRUE),
    median_mpg = median(mpg, na.rm = TRUE),
    sd_mpg = sd(mpg, na.rm = TRUE),
    min_mpg = min(mpg, na.rm = TRUE),
    max_mpg = max(mpg, na.rm = TRUE),
    avg_car_price = mean(car_price, na.rm = TRUE),
    median_car_price = median(car_price, na.rm = TRUE),
    sd_car_price = sd(car_price, na.rm = TRUE),
    min_car_price = min(car_price, na.rm = TRUE),
    max_car_price = max(car_price, na.rm = TRUE)
  )

summary_stats_transposed <- summary_stats %>%
  pivot_longer(
    cols = everything(),
    names_to = c("variable", "statistic"),
    names_pattern = "^(.*?)_(.*)$"
  ) %>%
  pivot_wider(names_from = statistic, values_from = value)

# Display as a neat table using kable
kable(summary_stats_transposed, format = "html", caption = "Transposed Summary Statistics")
```

### Which brand has the most cars on the list?

Because the list consists of the top 3 SUVs in each sub-category (such as Small 3 Row and Midsize Luxury), the brand with the most cars on the list is likely a brand that consistently produces quality vehicles.

```{r}
#| label: most cars on list
all_suvs <- all_suvs %>%
  mutate(brand = case_when(
    str_detect(car_name, "Cadillac") ~ "Cadillac",
    str_detect(car_name, "BMW") ~ "BMW",
    str_detect(car_name, "Mercedes") ~ "Mercedes",
    str_detect(car_name, "Audi") ~ "Audi",
    str_detect(car_name, "Porsche") ~ "Porsche",
    str_detect(car_name, "Tesla") ~ "Tesla",
    str_detect(car_name, "Rover") ~ "Land Rover",
    str_detect(car_name, "Bentley") ~ "Bentley",
    str_detect(car_name, "Lincoln") ~ "Lincoln",
    str_detect(car_name, "Acura") ~ "Acura",
    str_detect(car_name, "Lexus") ~ "Lexus",
    str_detect(car_name, "Genesis") ~ "Genesis",
    str_detect(car_name, "Ford") ~ "Ford",
    str_detect(car_name, "Chevy") ~ "Chevy",
    str_detect(car_name, "GMC") ~ "GMC",
    str_detect(car_name, "Toyota") ~ "Toyota",
    str_detect(car_name, "Hyundai") ~ "Hyundai",
    str_detect(car_name, "Kia") ~ "Kia",
    str_detect(car_name, "Jeep") ~ "Jeep",
    str_detect(car_name, "Honda") ~ "Honda",
    str_detect(car_name, "Mazda") ~ "Mazda",
    str_detect(car_name, "Buick") ~ "Buick"
  ))

all_suvs %>%
  count(brand, name = "count") %>% # Count occurrences of each brand
  ggplot(aes(x = reorder(brand, -count), y = count)) + # Order brands by count
  geom_bar(stat = "identity", fill = "steelblue") + # Create bar chart
  labs(
    title = "Number of Cars by Brand",
    x = "Brand",
    y = "Number of Cars"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels
```

#### Analysis

Mercedes is the clear front-runner with 7 cars on the list, compared to the next highest, Audi, with 4. I found it more useful to compare the number of cars from each brand that made the list than to compare mean or median ratings. Because every car on the list is considered among the best in its sub-category, the mean and median Edmunds total ratings barely differ from brand to brand.

### Is the overall rating from Edmunds experts aligned with the owner ratings?

While the experts at Edmunds likely have a lot of technical knowledge about what makes a "good" car, I am not a car enthusiast and likely don't prioritize all the same features that experts do. I feel that the opinions of everyday people who drive the cars regularly would more accurately predict how I might rate a car.

```{r}
#| label: owner vs edmunds rating
# Double the 5-star owner rating to match the experts' 10-point scale
all_suvs <- all_suvs %>%
  mutate(owner_stars_2x = owner_stars * 2)

# Create the box plot
all_suvs %>%
  mutate(metric = "Owner Stars 2x") %>% # Add a category for owner_stars_2x
  select(owner_stars_2x, total_rating) %>%
  pivot_longer(cols = everything(), names_to = "metric", values_to = "value") %>%
  ggplot(aes(x = metric, y = value, fill = metric)) +
  geom_boxplot() +
  stat_summary(fun = median, geom = "text", aes(label = round(..y.., 1)), color = "black", vjust = -0.5, size = 3.5) +
  labs(
    title = "Owner vs Expert Rating",
    x = "Review Type",
    y = "Value"
  ) +
  theme_minimal() +
  theme(legend.position = "none")
```

#### Analysis

At an aggregate level, the median owner rating (doubled to put it on the experts' 10-point scale) and the median expert rating are virtually the same (8 vs 8.1), but the owner ratings vary far more. This makes sense, as everyday consumers are likely to differ more in their standards and preferences than experts do. Additionally, each car's owner rating averages many individual reviews, whereas the expert rating is a single assessment. More reviews leave more room for disagreement among individual owners, but, as the law of large numbers suggests, an average built from more reviews should sit closer to owners' true overall opinion.

### Which car has the best value and how much does it cost?

```{r}
#| label: best value
# Find the highest value rate
highest_value_rate <- max(all_suvs$value_rate, na.rm = TRUE)

# Filter the data for cars with the highest value rate
highest_value_cars <- all_suvs %>%
  filter(value_rate == highest_value_rate) %>%
  select(car_name, value_rate, car_price)

# Display the highest value cars as a neat table using kable
kable(highest_value_cars, format = "html", caption = "Cars with the Highest Value Rate")
```

```{r}
# Calculate median car price and value rate
median_values <- data.frame(
  median_car_price = median(all_suvs$car_price, na.rm = TRUE),
  median_value_rate = median(all_suvs$value_rate, na.rm = TRUE)
)

# Display the median values as a neat table using kable
kable(median_values, format = "html", caption = "Median Car Price and Value Rate")
```

#### Analysis:

The four cars tied for the highest value all have a value rating one full point above the median (8.5 vs 7.5) and prices well below the median of $61,975. It's also worth noting that two of the four are Kias, suggesting that Kia might be a more budget-friendly alternative to Mercedes, which has the most total cars on the list.

### Is there a correlation between MPG and price?

```{r}
#| label: mpg vs price
# Create a scatter plot comparing mpg and car price
all_suvs %>%
  ggplot(aes(x = mpg, y = car_price)) +
  geom_point(color = "steelblue") +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(
    title = "Scatter Plot of MPG vs Car Price",
    x = "Miles Per Gallon (MPG)",
    y = "Car Price"
  ) +
  theme_minimal()
```

#### Analysis:

Yes, there is a negative correlation between price and MPG. This is likely due to the fact that performance vehicles (which tend to be more expensive) often prioritize power over fuel efficiency.

### Is there a correlation between tech rating and price?

```{r}
# Create a box plot with tech_rate on the x-axis and car_price on the y-axis, grouped by tech_rate
all_suvs %>%
  ggplot(aes(x = factor(tech_rate), y = car_price, fill = factor(tech_rate))) +
  geom_boxplot() +
  labs(
    title = "Box Plot of Car Price by Tech Rate",
    x = "Tech Rate",
    y = "Car Price"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels for readability
```

#### Analysis:

Yes, it appears that the better the tech is in a car, the higher the price. However, it is worth noting that cars with a tech rating of 9 have a wide range of prices, meaning that it is possible to get a car with high-quality tech without breaking the bank.

## Part 2

Two very comparable SUV brands in my price range are Lexus and Acura, so I will be comparing owner reviews sourced from Edmunds.com of two of their most popular SUV models: the Acura RDX and the Lexus NX.

Note: I will refer to a positivity value throughout the analysis. This is a calculated field based on the overall positivity or negativity of the words used to write a review. A negative positivity value indicates a negative review.

[Click here](https://myxavier-my.sharepoint.com/:x:/g/personal/blacke6_xavier_edu/Ef-R7Rp2nRVCvd1Ae8cXPgwB7zD6WlmtXA5w1QnWXFLdXw?download=1) to access the exact data set I used.

```{r}
#| label: load csv
lex_acura_reviews <- read.csv("https://myxavier-my.sharepoint.com/:x:/g/personal/blacke6_xavier_edu/Ef-R7Rp2nRVCvd1Ae8cXPgwB7zD6WlmtXA5w1QnWXFLdXw?download=1")
```

```{r}
#| label: prepare the data
# Tokenize each review by word and drop common stop words
tidy_reviews <- lex_acura_reviews %>%
  unnest_tokens(word, review) %>%
  anti_join(stop_words)

# Join the review data to the NRC lexicon
nrc <- get_sentiments("nrc")

# Count words per sentiment for each reviewer; positivity = positive - negative
suv_sentiments <- tidy_reviews %>%
  inner_join(nrc, by = "word", relationship = "many-to-many") %>%
  group_by(sentiment, user_name) %>%
  summarize(n = n()) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(positivity = positive - negative)

# Group sentiment analysis at the review level
reviews_w_sentiments <- lex_acura_reviews %>%
  inner_join(suv_sentiments, by = "user_name") %>%
  mutate(date = ymd(date))
```

```{r}
# Filter rows using str_detect
total_rating_df <- all_suvs %>%
  filter(str_detect(car_name, "Lexus NX|Acura RDX"))

# Print as a neat table
kable(total_rating_df, format = "html", caption = "Total Ratings for Lexus NX and Acura RDX")
```

### Exploration: What are the top words used to describe each car? Do they differ between the two cars?

```{r}
top_words <- tidy_reviews %>%
  count(car_model, word, sort = TRUE) %>% # Count the frequency of each word per car model
  group_by(car_model) %>%
  top_n(10, n) %>% # Get the top 10 most frequent words for each car model
  ungroup()

ggplot(top_words, aes(x = reorder(word, n), y = n, fill = car_model)) +
  geom_bar(stat = "identity") +
  coord_flip() + # Flip the coordinates to make it horizontal
  facet_wrap(~ car_model, scales = "free_y") + # Separate the plots by car model (Lexus and Acura)
  labs(
    title = "Top 10 Most Frequently Used Words in Lexus and Acura Reviews",
    x = "Word",
    y = "Frequency",
    fill = "Car Model"
  ) +
  geom_text(aes(label = n), hjust = -0.2, color = "black", size = 3) + # Add labels to the bars
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1), # Rotate x-axis labels for readability
    legend.position = "top"
  )
```

#### Analysis

Note: the word "5" for the Acura was used 391 times (not 3) but was cut off

The top 10 words used to describe the two cars were extremely similar, with "5", "stars", and "car" being the top 3 words for both. The remaining common words highlight the general qualities that reviewers value in a car, such as technology, comfort, and reliability.

### Is there a correlation between the month and the number of reviews published?

My hypothesis is that there would be more reviews around December and January, as these are generally the most popular times to purchase a car and it seems likely that people would review the car while it is relatively new to them.

```{r}
#| label: question 1
# Add a column for the month
reviews_w_sentiments <- reviews_w_sentiments %>%
  mutate(month = floor_date(date, unit = "month")) # Round each date down to the first of its month

# Group by month and car_model, then count the number of reviews
monthly_reviews_count <- reviews_w_sentiments %>%
  group_by(month, car_model) %>%
  summarize(review_count = n(), .groups = "drop")

# Create the bar graph
ggplot(monthly_reviews_count, aes(x = month, y = review_count, fill = car_model)) +
  geom_bar(stat = "identity", position = "dodge") + # Use dodge to separate bars by car_model
  labs(
    title = "Number of Reviews by Month and Car Model",
    x = "Month",
    y = "Number of Reviews",
    fill = "Car Model"
  ) +
  scale_x_date(date_labels = "%b %Y", date_breaks = "1 month") + # Format x-axis for months
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels for readability
```

#### Analysis

New model years are typically released in December, which, coupled with Christmas and promotions (such as the Lexus December to Remember Sales Event), makes December one of the most popular months to purchase a car. It would be reasonable to infer that people are more likely to review cars that they have recently bought, causing a spike in car reviews around December, January, and February. While this pattern generally holds, Lexus has its most significant spike in reviews in December, while Acura's largest spike happens in May. This suggests either that variables other than how recently the car was bought affect when a consumer chooses to post a review, or that Acura has particularly effective sales events in "off" months that Lexus does not have.

### Is one car reviewed more positively than the other?

I examined the distribution of the overall positivity scores of each car to understand where the majority of reviews fell and how outliers impacted the scores.

```{r}
ggplot(reviews_w_sentiments, aes(x = positivity, fill = car_model)) +
  geom_histogram(binwidth = 1, position = "dodge", alpha = 0.7, color = "black") +
  facet_wrap(~ car_model) + # Facet by car_model to create separate plots
  geom_vline(
    data = reviews_w_sentiments %>%
      group_by(car_model) %>%
      summarize(median_positivity = median(positivity, na.rm = TRUE)),
    aes(xintercept = median_positivity, color = car_model),
    linetype = "dashed", size = 1
  ) + # Add dashed lines for median
  labs(
    title = "Distribution of Positivity Scores for Lexus and Acura",
    x = "Positivity Score",
    y = "Count",
    fill = "Car Model",
    color = "Car Model"
  ) +
  theme_minimal() +
  theme(
    legend.position = "top", # Position the legend on top
    axis.text.x = element_text(angle = 45, hjust = 1) # Rotate x-axis labels for better readability
  )
```

```{r}
median_positivity_by_model <- reviews_w_sentiments %>%
  group_by(car_model) %>%
  summarize(median_positivity = median(positivity, na.rm = TRUE))

# Print the result
print(median_positivity_by_model)
```

#### Analysis

Both cars have very similar median positivity values, with Acura at 5 and Lexus at 5.5. Outliers can likely be attributed to excessively long reviews, which contain more scoreable words. The similarity in positivity is unsurprising, as both cars have very similar overall ratings on the Edmunds website: 4.1/5 stars for the Acura and 4.2/5 stars for the Lexus.

### Is the positivity value an accurate reflection of the reviewer's feelings about the car?

To answer this question, I compared the star value that the reviewer assigned the car with the calculated positivity value to understand if a higher star value correlates to a higher positivity score.

```{r}
ggplot(reviews_w_sentiments, aes(x = as.factor(stars), y = positivity, fill = car_model)) +
  geom_boxplot(alpha = 0.7, outlier.shape = 16, outlier.colour = "red") + # Box plot with outliers highlighted in red
  labs(
    title = "Positivity Score Distribution by Star Rating",
    x = "Star Rating",
    y = "Positivity Score",
    fill = "Car Model"
  ) +
  theme_minimal() +
  theme(
    legend.position = "top", # Position the legend on top
    axis.text.x = element_text(angle = 45, hjust = 1) # Rotate x-axis labels for readability
  )
```

#### Analysis

There appears to be a general trend that the higher the star rating the reviewer gave, the more positive the language they used in their review. This suggests that the results of the text analysis are generally reliable. However, the boxes for adjacent star ratings overlap considerably, so a formal test such as ANOVA is needed to establish whether the differences between star ratings are statistically significant.