Introduction

Using R and Python, I extract usernames, ratings, and review text from Trustpilot reviews of CARFAX. The analysis surfaces customer insights, tracks rating trends, and identifies common feedback themes.

Key insights:

  • How ratings have changed over time

  • Common words and recurring themes in user feedback

  • Sentiment trends: improving, declining, or stable

  • Top complaints and what users love most

These data-driven insights can help refine product strategy and maintain a competitive edge.


Insights about CARFAX Italy

Declining user satisfaction

Customer ratings have dropped from 1.9 (2022) to 1.3 (2023-2024), reflecting worsening experiences with report accuracy, pricing, and customer support. However, some users still find value in the reports, particularly when they confirm expected vehicle conditions.

Mixed experiences with report accuracy

Many customers complain about inaccurate reports, citing false mileage data, missing service history, and incorrect accident records. These errors have led to financial loss and distrust in the service. However, some users find CARFAX reports helpful, especially when the provided information matches reality and prevents them from making bad purchases.

Perceived lack of value for money

A large number of reviews express dissatisfaction with the cost of CARFAX reports versus the information provided. Users believe they can find similar or better information for free elsewhere.

Many describe the purchase as a waste of money due to the lack of additional useful insights.

Mileage discrepancies leading to customer distrust

Users report major discrepancies in recorded mileage, which can negatively impact vehicle sales.
Some reviews mention inflated mileage numbers, while others complain about missing data.

These errors damage trust in CARFAX reports and create problems for vehicle buyers and sellers.


Conclusion

CARFAX Italy’s biggest challenge is data accuracy—some users find reports extremely useful, while others face major discrepancies that undermine trust in the service. The pricing is widely questioned, as users expect high reliability in exchange for the cost. The mixed feedback suggests CARFAX can be valuable, but only when the provided information is accurate. Addressing report inconsistencies could help rebuild confidence in the service.

Limitations

Bias in review submission

People who have negative experiences are more likely to leave reviews than satisfied customers, which may skew the overall sentiment.

Lack of verification

There is no control over who leaves reviews on Trustpilot, meaning some feedback may come from competitors, one-time users, or misinformed customers.

Contextual differences

Some negative experiences may be due to misunderstandings of how CARFAX reports work, rather than actual inaccuracies.


GPT Bot

As part of this analysis, I also built a GPT bot that has access to the dataset used here, so you can explore the data further on your own:

https://chatgpt.com/g/g-67bf512ac3908191acdd5f3a08257b86-carfax-user-reviews-analyzer

IMPORTANT: You will need to specify that you would like to analyze the Italian reviews from Trustpilot.


Scraping data

The first step is to scrape Trustpilot reviews and structure the data for text analysis.

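# Packages used below: rvest (HTML parsing), stringr (string helpers), dplyr (pipe, bind_rows)
library(rvest)
library(stringr)
library(dplyr)

# Function to scrape a single Trustpilot review page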
scrape_trustpilot <- function(page_num) {
  # Construct the URL dynamically for the given page number
  url <- paste0("https://www.trustpilot.com/review/www.carfax.com?languages=it&page=", page_num)
  webpage <- read_html(url)  # Read the webpage once

  # Extract reviewer names
  reviewer_names <- webpage %>%
    html_nodes("span[data-consumer-name-typography='true']") %>%
    html_text()

  # Extract number of reviews per user
  reviewer_counts <- webpage %>%
    html_nodes(".styles_consumerExtraDetails__NFM0b span[data-consumer-reviews-count-typography='true']") %>%
    html_text() %>%
    str_extract("\\d+") %>% 
    as.numeric()

  # Extract reviewer countries
  reviewer_countries <- webpage %>%
    html_nodes(".styles_consumerExtraDetails__NFM0b span:last-child") %>%
    html_text()

  # Extract ratings
  review_ratings <- webpage %>%
    html_nodes("div[data-service-review-rating]") %>%
    html_attr("data-service-review-rating") %>%
    as.numeric()

  # Extract review dates
  review_dates <- webpage %>%
    html_nodes("time[data-service-review-date-time-ago]") %>%
    html_attr("datetime") %>%
    as.character()

  # Extract review titles
  review_titles <- webpage %>%
    html_nodes("h2[data-service-review-title-typography='true']") %>%
    html_text()

  # Extract review content
  review_contents <- webpage %>%
    html_nodes("p[data-service-review-text-typography='true']") %>%
    html_text()

  # Extract experience dates
  experience_dates <- webpage %>%
    html_nodes("p[data-service-review-date-of-experience-typography='true']") %>%
    html_text() %>%
    str_remove("Date of experience: ") %>%
    trimws()

  # Ensure all columns have the same length (Handle missing values)
  max_length <- max(length(reviewer_names), length(reviewer_counts), length(reviewer_countries),
                    length(review_ratings), length(review_dates), length(review_titles),
                    length(review_contents), length(experience_dates))

  df <- data.frame(
    Reviewer = c(reviewer_names, rep(NA, max_length - length(reviewer_names))),
    ReviewCount = c(reviewer_counts, rep(NA, max_length - length(reviewer_counts))),
    Country = c(reviewer_countries, rep(NA, max_length - length(reviewer_countries))),
    Rating = c(review_ratings, rep(NA, max_length - length(review_ratings))),
    Review_Date = c(review_dates, rep(NA, max_length - length(review_dates))),
    Title = c(review_titles, rep(NA, max_length - length(review_titles))),
    Content = c(review_contents, rep(NA, max_length - length(review_contents))),
    Experience_Date = c(experience_dates, rep(NA, max_length - length(experience_dates))),
    stringsAsFactors = FALSE
  )

  return(df)
}

# Set number of pages to scrape
num_pages <- 6

# Scrape multiple pages and combine results
reviews_all <- bind_rows(lapply(1:num_pages, scrape_trustpilot))
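
One caveat with padding every column to the longest length: the NA values are appended at the end, so if a review in the middle of a page lacks a field (for example, a review without body text), the later values of that column shift up by one row. The Python sketch below illustrates the effect on made-up values and contrasts it with collecting the fields one review at a time. The names and titles are hypothetical, not scraped data.

```python
from itertools import zip_longest

# Hypothetical page with three reviews; the second one has no title.
names = ["Anna", "Bruno", "Carla"]
titles = ["Title from Anna", "Title from Carla"]  # Bruno's title is missing

# Column-wise padding: the shorter column is filled with None at the end.
padded_rows = list(zip_longest(names, titles, fillvalue=None))

# Review-wise collection: each review keeps its own fields together.
reviews = [
    {"name": "Anna", "title": "Title from Anna"},
    {"name": "Bruno", "title": None},
    {"name": "Carla", "title": "Title from Carla"},
]
aligned_rows = [(r["name"], r["title"]) for r in reviews]

print(padded_rows)   # Bruno is paired with Carla's title
print(aligned_rows)  # Bruno keeps an explicit missing value
```

In rvest, the review-wise variant would mean selecting each review container first and then calling html_element() (singular) inside it, which yields a missing value when a field is absent. The container selector is not shown here because it would have to be checked against the current page markup.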

Data overview

Now we can take a sneak peek at the data we collected:

Sneak Peek of Customer Reviews
Reviewer | ReviewCount | Country | Rating | Review_Date | Title | Content | Experience_Date
Dale Olmsted | 1 | US | 1 | 2025-02-12T21:36:08.000Z | Carfax report (From a certified… | Carfax report (From a certified dealership) was in… | February 05, 2025
Kal O | 4 | CA | 1 | 2025-02-06T04:26:02.000Z | Beware of Inaccurate Reports - Don’t Trust Them | I always believed CARFAX reports were reliable and… | January 10, 2025
Kristine Turk | 2 | US | 1 | 2025-01-28T06:47:30.000Z | I was told my vehicle was 100% clear of… | I was told my vehicle was 100% clear of accidents … | January 02, 2025
John Thomas | 1 | US | 1 | 2025-01-29T22:57:09.000Z | Carfax is a complete scam! | Carfax is a complete scam!They have a report on my… | January 29, 2025
Fred | 1 | US | 1 | 2025-01-23T02:13:11.000Z | No phone number to contact anyone | No phone number to contact anyone. As of now I hav… | January 22, 2025
Michael McCauley | 1 | US | 1 | 2025-01-09T18:21:29.000Z | I have brand new tires | I have brand new tires, carfax says I’m overdue on… | January 09, 2025

We successfully scraped 119 reviews, each containing the following details:

  • Reviewer’s Name – The name of the person who left the review.

  • Total Reviews by Reviewer – The total number of reviews this user has submitted on Trustpilot.

  • Country – The reviewer’s location.

  • Rating – The star rating given in the review.

  • Review Date – When the review was posted.

  • Review Title – The headline or summary of the review.

  • Review Content – The full text of the review.

  • Experience Date – When the reviewer had the experience they wrote about.
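
The two date fields arrive in different formats: Review Date is an ISO 8601 UTC timestamp, while Experience Date is written out as month, day, and year. A minimal Python sketch of parsing both, using values from the preview above, could look like this. The function names are illustrative, and English month names are assumed.

```python
from datetime import datetime, timezone

def parse_review_date(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as '2025-02-12T21:36:08.000Z'."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)

def parse_experience_date(value: str):
    """Parse a date such as 'February 05, 2025' (English month names assumed)."""
    return datetime.strptime(value, "%B %d, %Y").date()

review_date = parse_review_date("2025-02-12T21:36:08.000Z")
experience_date = parse_experience_date("February 05, 2025")

# Days between the experience and the review being posted
delay_days = (review_date.date() - experience_date).days
print(review_date.year, experience_date.isoformat(), delay_days)
```

Once both fields are proper dates, grouping by year or measuring the delay between experience and review becomes straightforward.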


We will first explore the scraped data to understand what we can analyze.

Customer Ratings by Country
Country | Count_of_reviews | Avg_rating | Min_date | Max_date
IT | 96 | 1.364583 | 2021-12-13T06:54:46.000Z | 2025-01-05T01:49:06.000Z
US | 16 | 1.062500 | 2021-06-18T06:02:35.000Z | 2025-02-12T21:36:08.000Z
CA | 2 | 1.000000 | 2024-11-09T16:57:50.000Z | 2025-02-06T04:26:02.000Z
BG | 1 | 5.000000 | 2025-01-17T11:23:37.000Z | 2025-01-17T11:23:37.000Z
DE | 1 | 1.000000 | 2023-08-15T13:13:07.000Z | 2023-08-15T13:13:07.000Z
GB | 1 | 5.000000 | 2023-12-12T09:47:06.000Z | 2023-12-12T09:47:06.000Z
PK | 1 | 1.000000 | 2025-02-14T22:28:22.000Z | 2025-02-14T22:28:22.000Z
VA | 1 | 1.000000 | 2023-01-18T12:17:05.000Z | 2023-01-18T12:17:05.000Z

The overview shows that the majority of reviews (96 of 119) come from users in Italy. In the next steps, I will only take into account reviews from Italy.
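
A per-country summary of this kind (review count, average rating, first and last review date) followed by a filter to Italian reviews can be sketched in Python as follows. The rows are a small hypothetical sample, not the scraped dataset, and the column names simply mirror the table.

```python
from collections import defaultdict

# Hypothetical rows: (country, rating, ISO review date)
rows = [
    ("IT", 1, "2023-05-02T10:00:00.000Z"),
    ("IT", 5, "2024-01-15T08:30:00.000Z"),
    ("US", 1, "2025-02-12T21:36:08.000Z"),
    ("IT", 1, "2022-11-20T12:00:00.000Z"),
]

# Group ratings and dates by country
groups = defaultdict(list)
for country, rating, date in rows:
    groups[country].append((rating, date))

summary = []
for country, items in groups.items():
    ratings = [r for r, _ in items]
    dates = [d for _, d in items]  # same-format ISO strings sort chronologically
    summary.append({
        "Country": country,
        "Count_of_reviews": len(items),
        "Avg_rating": sum(ratings) / len(ratings),
        "Min_date": min(dates),
        "Max_date": max(dates),
    })

# Largest markets first, then keep only the Italian reviews for later steps
summary.sort(key=lambda s: s["Count_of_reviews"], reverse=True)
italy_only = [row for row in rows if row[0] == "IT"]
print(summary[0])
```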

We can also see that the average rating in the Italian market is very low, around 1.4 out of 5. The Italian reviews span the period from late 2021 to early 2025.

Based on this, we will try to explain why the rating is so low. The first step is to check the average rating year over year.

Customer Ratings by Year
Year | Count_of_reviews | Avg_rating
2021 | 1 | 1.000000
2022 | 10 | 1.900000
2023 | 31 | 1.258064
2024 | 53 | 1.339623
2025 | 1 | 1.000000

Most of the reviews fall in the 2022 to 2024 period. The average rating was somewhat higher in 2022, at 1.9, and then decreased in 2023 and 2024 to around 1.3. Given these averages and how they developed over the last few years, we should expect the reviews to describe mostly bad experiences.
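
As a quick consistency check, the yearly figures can be recombined into review-weighted averages. The short Python sketch below uses only the numbers from the yearly table. It recovers the overall Italian average of about 1.36 and gives a combined 2023-2024 average of about 1.31.

```python
# (year, number of reviews, average rating) as reported in the yearly table
yearly = [
    (2021, 1, 1.000000),
    (2022, 10, 1.900000),
    (2023, 31, 1.258064),
    (2024, 53, 1.339623),
    (2025, 1, 1.000000),
]

def weighted_average(rows):
    """Average rating weighted by the number of reviews in each year."""
    total_reviews = sum(n for _, n, _ in rows)
    total_stars = sum(n * avg for _, n, avg in rows)
    return total_stars / total_reviews

overall = weighted_average(yearly)
recent = weighted_average([r for r in yearly if r[0] in (2023, 2024)])
print(round(overall, 2), round(recent, 2))  # 1.36 1.31
```

Weighting by review count matters here: a plain mean of the five yearly averages would give the single reviews from 2021 and 2025 the same influence as the 53 reviews from 2024.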


Most Frequent Words

Many words indicate dissatisfaction

“soldi” (money), “buttati” (wasted), “inutile” (useless), and “rimborso” (refund) suggest that many users feel they wasted money on reports. “nulla” (nothing) and “solo” (only) hint that reports lacked expected details.
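
Word lists like this one come from counting tokens across all reviews after removing stop words. The Python sketch below shows the idea on three made-up snippets. The stop-word list is a tiny illustrative assumption, and the source does not state which tokenizer or stop-word list was actually used.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller Italian list.
STOP_WORDS = {"il", "la", "i", "e", "di", "che", "non", "un", "una",
              "per", "in", "è", "ho", "da"}

def top_words(texts, n=5):
    """Count lowercase word tokens across texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-zàèéìòù]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)

# Short made-up snippets in the style of the reviews
sample = [
    "Soldi buttati, report inutile",
    "Ho chiesto il rimborso: soldi buttati",
    "Nessuna informazione utile, solo soldi persi",
]
print(top_words(sample, n=2))  # [('soldi', 3), ('buttati', 2)]
```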

Technical & report-related issues

“informazioni” (information), “dati” (data), “targa” (license plate), and “incidente” (accident) suggest concerns about report completeness or accuracy.

Possible pricing concerns

“soldi” (money), “euro” (currency), and “rimborso” (refund) imply frustration with cost vs. value.

Website & service issues


Sentiment Analysis

The Sentiment Analysis graph displays polarity trends over time, with recent reviews on the left (Index 0-20) and older reviews on the right (Index 60-80). The blue line represents the average sentiment, where higher values indicate positive sentiment and lower values indicate negativity. The gray shaded area shows the confidence interval, reflecting variability in sentiment at different points.

The sentiment in the reviews followed a distinct pattern over time:

  • Recent Decline (Index 0-20): Sentiment dropped sharply, indicating increased user frustration in the latest reviews. This could be due to recent changes in service, pricing concerns, or missing information in reports.

  • Stable Positive Phase (Index 20-60): Before the decline, sentiment remained relatively stable and slightly positive, suggesting that for a period, users were satisfied with the service.

  • Early Sentiment Increase (Index 60-80): Older reviews started with neutral to slightly positive sentiment, gradually improving over time. This could indicate that past issues were resolved, leading to a more positive outlook before the recent decline.
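
The source does not say which sentiment library or smoother produced the graph. As a stand-in, the Python sketch below smooths a series of per-review polarity scores with a simple centered moving average, which is enough to see a trend across the review index. The scores are hypothetical and the moving average is a swapped-in technique, not necessarily the one behind the blue line.

```python
def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical polarity scores in [-1, 1]; index 0 is the most recent review
polarity = [-0.6, -0.4, -0.5, 0.1, 0.2, 0.1, 0.0, 0.1]
trend = moving_average(polarity, window=3)

for index, value in enumerate(trend):
    print(index, round(value, 2))
```

Because index 0 is the most recent review, a low value at the left end of the smoothed series corresponds to a recent decline.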


Negative reviews

Incorrect information (35 reviews)

Example

“Servizio inutile. Fornisce dati ed informazioni facilmente verificabili, consultando altri portali gratuiti senza spendere un centesimo in ricerche dalla dubbia utilità. Ho voluto provare questo servizio, per verificare di persona i risultati e la delusione è stata totale. È da sconsigliare nel modo più assoluto.”

English Translation:

“Useless service. It provides data and information that can be easily verified by checking other free portals without spending a cent on research of questionable usefulness. I wanted to try this service to check the results myself, and the disappointment was total. I strongly advise against it.”

Waste of Money (25 reviews)

Example

“Utile per conoscere il peso dell’auto… Non risultano anomalie nel chilometraggio, vorrei capire in che modo sarebbero risultate, c’è solo il chilometraggio del primo acquisto e quello della prima revisione dopo 4 anni. Nessuna notizia in più da quelle che è possibile reperire gratuitamente, soldi buttati.”

English Translation:

“Useful for knowing the car’s weight… No mileage anomalies were found, but I’d like to understand how they would even appear. There’s only the mileage from the first purchase and the first inspection after four years. No additional information beyond what you can find for free, money wasted.”

Useless reports (7 reviews)

Example

“Sono stato danneggiato da questo sito, non sono più riuscito a vendere la mia autovettura perché hanno prodotto un report falso indicando molti più km dei km effettivi. Io ho la documentazione di tutti i tagliandi effettuati, la Carfax non so quale documentazione ha prodotto.”

English Translation:

“I have been harmed by this site. I can no longer sell my car because they produced a false report indicating many more kilometers than the actual ones. I have documentation of all the maintenance records, but I have no idea what documentation CARFAX used to make such claims.”

Mileage discrepancies (6 reviews)

Example

“Ho collaudato tempo fa la mia Toyota, nel riportare i km anziché 191000, come da contachilometri, ha riportato 218000. Ora dovrò rifare il collaudo e mi troverò ancora molti km in più del contachilometri. Cosa devo fare?”

English Translation:

“I had my Toyota inspected some time ago. Instead of reporting 191,000 km as shown on the odometer, they reported 218,000 km. Now, when I go for my next inspection, I will still have extra kilometers recorded. What should I do?”
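
The source does not describe how reviews were assigned to these categories. One simple way to approximate such grouping is keyword matching, sketched below in Python. The theme names and keyword lists are hypothetical, and the two texts are excerpts from the examples quoted above.

```python
from collections import Counter

# Hypothetical keyword lists per theme (lowercase substrings)
THEME_KEYWORDS = {
    "waste_of_money": ["soldi buttati", "gratuitamente", "rimborso"],
    "mileage": ["km", "chilometraggio", "contachilometri"],
}

def tag_themes(text):
    """Return the sorted list of themes whose keywords appear in the text."""
    lowered = text.lower()
    return sorted(
        theme
        for theme, keywords in THEME_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    )

excerpts = [
    "Nessuna notizia in più da quelle che è possibile reperire gratuitamente, soldi buttati.",
    "Ora dovrò rifare il collaudo e mi troverò ancora molti km in più del contachilometri.",
]

theme_counts = Counter(theme for text in excerpts for theme in tag_themes(text))
print(theme_counts)
```

A review can match several themes at once, so counts produced this way need not add up to the number of reviews, and substring matching will produce some false positives that call for manual review.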


Positive reviews

Helps avoid bad purchases

Many users appreciate how CARFAX reports help them avoid buying problematic vehicles.
The reports provide key insights into a car’s past, helping users make informed decisions.

Example

“Servizio quasi perfetto, bravi, mi ha evitato un grosso problema con un’auto che sembrava perfetta ma aveva problemi nascosti.”

English Translation:

“Almost perfect service, well done, it saved me from a big problem with a car that seemed perfect but had hidden issues.”


Detailed vehicle history reports

Users value the comprehensive information provided in CARFAX reports. The reports include ownership history, service records, and accident reports, giving a full picture of a car’s past.

Example

“..nel mio caso è stato molto utile se avessi preso l’auto senza il report avrei avuto brutte sorprese.”

English Translation:

“..in my case, it was very useful. If I had bought the car without the report, I would have had unpleasant surprises.”


Professional and helpful customer service

Some users praise CARFAX’s customer support team for their professionalism and helpfulness.
A few reviews specifically mention positive interactions with CARFAX representatives.

Example

“Vorrei complimentarmi con la dott.ssa Elena Martino per la professionalità e disponibilità con cui mi ha aiutato.”

English Translation:

“I would like to compliment Dr. Elena Martino for the professionalism and availability with which she helped me.”


Transparency and trustworthiness

CARFAX reports are seen as a trustworthy source of vehicle information.
Users feel more confident in their car purchases when they have access to a CARFAX report.

Example

“Grazie a Carfax ho potuto verificare che l’auto che volevo comprare era effettivamente in buone condizioni.”

English Translation:

“Thanks to CARFAX, I was able to verify that the car I wanted to buy was actually in good condition.”

Easy-to-use platform

Many users find CARFAX’s website and report system easy to use.
The interface is clear, and the process of retrieving a vehicle history report is simple.

Example

“Il sito è intuitivo e il report è stato facile da ottenere. Tutto chiaro e semplice.”

English Translation:

“The website is intuitive, and the report was easy to obtain. Everything is clear and simple.”

These insights highlight the key aspects that users appreciate about CARFAX, including helping them make informed purchases, providing detailed reports, and offering a user-friendly platform.