Introduction

Using R and Python, I extract usernames, ratings, and review text from Trustpilot reviews of CARFAX. The analysis surfaces key customer insights, tracks rating trends, and identifies common feedback themes.

Key insights:

  • How ratings have changed over time

  • Common words and recurring themes in user feedback

  • Top complaints from users

By grounding these insights in review data, this report helps refine product strategy and maintain a competitive edge.


Insights about CARFAX in Sweden, the Netherlands, and Spain

Based on Trustpilot reviews across Sweden, the Netherlands, and Spain, several key themes emerge:

Data accuracy and completeness issues

Customers frequently report missing or incorrect information in CarFax reports. Many users cross-check CarFax data with official sources (e.g., Transportstyrelsen in Sweden, RDW in the Netherlands, and DGT in Spain) and find inconsistencies. Accident history and maintenance records are often absent or outdated.

Pricing concerns

A common complaint across all three markets is that CarFax reports are too expensive relative to the amount and accuracy of information provided. Users feel that free alternatives or government databases offer comparable data, making CarFax reports a poor value for money.

Transparency issues

Customers feel that CarFax does not clearly state what information is included before purchase. Many users express frustration after buying a report only to find it lacks key details they expected.

Trust and reputation concerns

Negative reviews frequently mention words like “scam”, “fraud”, or “misleading”, indicating a general lack of trust. Users expect a more comprehensive and reliable report, especially when paying for it.


Limitations

Bias in review submission

Trustpilot reviews tend to be skewed towards negative experiences, as dissatisfied customers are more likely to leave feedback than satisfied ones.

Lack of verification

There is no control over who leaves reviews on Trustpilot. Some reviews could come from competitors, one-time users, or misinformed customers.

Contextual differences

Some negative reviews may result from misunderstandings of what CarFax reports include, rather than actual inaccuracies.

For example, a user expecting a full service history on an imported car may not realize that such data depends on availability from prior owners or national databases.

Limited sample size

The number of reviews collected from Trustpilot is relatively small, making it difficult to generalize findings across all CarFax customers in Sweden, the Netherlands, and Spain.

Changes in service over time

Reviews reflect experiences at a specific point in time, and CarFax may have updated or improved its service since the reviews were posted.


GPT Bot

As part of this analysis, I also developed a GPT Bot that has access to the dataset used here. You can use it to explore the data further on your own:

https://chatgpt.com/g/g-67bf512ac3908191acdd5f3a08257b86-carfax-user-reviews-analyzer

IMPORTANT: You will need to specify the country market that you would like to analyze.


Scraping data

The first step is to scrape Trustpilot reviews and structure the data for text analysis.

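# Packages used below: rvest (HTML parsing), stringr (string helpers), dplyr (%>% and bind_rows)
library(rvest)
library(stringr)
library(dplyr)

# Function to scrape a single Trustpilot review page
scrape_trustpilot <- function(page_num) {
  # Construct the URL dynamically for the given page number
  url <- paste0("https://www.trustpilot.com/review/www.carfax.com?languages=sv&page=", page_num)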
  webpage <- read_html(url)  # Read the webpage once

  # Extract reviewer names
  reviewer_names <- webpage %>%
    html_nodes("span[data-consumer-name-typography='true']") %>%
    html_text()

  # Extract number of reviews per user
  reviewer_counts <- webpage %>%
    html_nodes(".styles_consumerExtraDetails__NFM0b span[data-consumer-reviews-count-typography='true']") %>%
    html_text() %>%
    str_extract("\\d+") %>% 
    as.numeric()

  # Extract reviewer countries
  reviewer_countries <- webpage %>%
    html_nodes(".styles_consumerExtraDetails__NFM0b span:last-child") %>%
    html_text()

  # Extract ratings
  review_ratings <- webpage %>%
    html_nodes("div[data-service-review-rating]") %>%
    html_attr("data-service-review-rating") %>%
    as.numeric()

  # Extract review dates
  review_dates <- webpage %>%
    html_nodes("time[data-service-review-date-time-ago]") %>%
    html_attr("datetime") %>%
    as.character()

  # Extract review titles
  review_titles <- webpage %>%
    html_nodes("h2[data-service-review-title-typography='true']") %>%
    html_text()

  # Extract review content
  review_contents <- webpage %>%
    html_nodes("p[data-service-review-text-typography='true']") %>%
    html_text()

  # Extract experience dates
  experience_dates <- webpage %>%
    html_nodes("p[data-service-review-date-of-experience-typography='true']") %>%
    html_text() %>%
    str_remove("Date of experience: ") %>%
    trimws()

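  # Pad every column with NA up to the longest one so data.frame() does not fail.
  # Note: padding is appended at the end, so rows can misalign if a field is missing mid-page.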
  max_length <- max(length(reviewer_names), length(reviewer_counts), length(reviewer_countries),
                    length(review_ratings), length(review_dates), length(review_titles),
                    length(review_contents), length(experience_dates))

  df <- data.frame(
    Reviewer = c(reviewer_names, rep(NA, max_length - length(reviewer_names))),
    ReviewCount = c(reviewer_counts, rep(NA, max_length - length(reviewer_counts))),
    Country = c(reviewer_countries, rep(NA, max_length - length(reviewer_countries))),
    Rating = c(review_ratings, rep(NA, max_length - length(review_ratings))),
    Review_Date = c(review_dates, rep(NA, max_length - length(review_dates))),
    Title = c(review_titles, rep(NA, max_length - length(review_titles))),
    Content = c(review_contents, rep(NA, max_length - length(review_contents))),
    Experience_Date = c(experience_dates, rep(NA, max_length - length(experience_dates))),
    stringsAsFactors = FALSE
  )

  return(df)
}

# Set number of pages to scrape
num_pages <- 2

# Scrape multiple pages and combine results
reviews_sweden <- bind_rows(lapply(1:num_pages, scrape_trustpilot))
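
One caveat with padding columns to a common length: if a single review on the page lacks a field, every later value in that column shifts up by one row. A minimal sketch of an alternative is to select each review card first and then pull one value per card with `html_element()`, which keeps a missing placeholder where a field is absent. The HTML below and the `article.review` selector are made up for illustration and are not Trustpilot's real markup.

```r
library(rvest)

# Toy page with two review cards; the second has no title (hypothetical markup)
page <- minimal_html('
  <article class="review"><h2>Too expensive</h2><p>Paid for nothing.</p></article>
  <article class="review"><p>Report was empty.</p></article>
')

# Step 1: one node per review card
cards <- html_elements(page, "article.review")

# Step 2: one value per card; html_element() returns a missing node for
# cards without a match, and html_text() turns that into NA
aligned <- data.frame(
  Title = html_text(html_element(cards, "h2"), trim = TRUE),
  Content = html_text(html_element(cards, "p"), trim = TRUE),
  stringsAsFactors = FALSE
)

print(aligned)
```

Because each row is built from a single card, a missing title stays an `NA` in its own row instead of shifting the titles of later reviews.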

Data overview

Now we can take a sneak peek at the data we collected:

Sneak Peek of Customer Reviews
Reviewer | ReviewCount | Country | Rating | Review_Date | Title | Content | Experience_Date
A Baktawar | 3 | NL | 1 | 2024-09-11T05:30:49.000Z | Waarom ben ik zo lomp om niet eerst… | Waarom ben ik zo lomp om niet eerst hier dd review… | September 10, 2024
A.N. Verdoes | 15 | NL | 1 | 2024-05-26T13:08:37.000Z | Onmogelijke buitenlandse schades gemeld | Mijn auto kocht ik op 7-11-2019 via mijn VW dealer… | May 25, 2024
Vincent86c | 14 | NL | 1 | 2024-01-02T18:30:54.000Z | Oplichting: rapport vermeld niet wat het beloofd | Ik zou het rapport aanschaffen om inzage te krijge… | January 02, 2024
cees van Iterson | 36 | NL | 1 | 2022-01-19T10:53:05.000Z | Zeer slechte zaak | Recent een rapport opgevraagd voor een gekochte au… | January 19, 2022
Brainy Kid | 4 | NL | 1 | 2022-12-14T00:35:53.000Z | Oplichting, doen niet wat ze beloven! | Oplichting, 20 euro vragen voor info die je niet k… | December 12, 2022
heer Bosma | 14 | NL | 1 | 2023-11-09T09:06:01.000Z | WAARDELOOS GELDKLOPPERIJ NIET DOEN… | WAARDELOOS GELDKLOPPERIJ NIET DOEN waardelose alge… | November 08, 2023

We successfully scraped 76 reviews from Trustpilot, each containing the following details:

  • Reviewer’s Name – The name of the person who left the review.

  • Total Reviews by Reviewer – The total number of reviews this user has submitted on Trustpilot.

  • Country – The reviewer’s location.

  • Rating – The star rating given in the review.

  • Review Date – When the review was posted.

  • Review Title – The headline or summary of the review.

  • Review Content – The full text of the review.

  • Experience Date – When the reviewer had the experience they wrote about.
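
Both date fields arrive as text in different formats: `Review_Date` is an ISO 8601 timestamp, while `Experience_Date` is written out as "September 10, 2024". Below is a minimal base R sketch of converting them, assuming the formats in the sample rows hold for every review and that no values are missing. `parse_review_date()` and `parse_experience_date()` are hypothetical helpers, not part of the original script. The second one maps month names through `month.name` so the result does not depend on the system locale.

```r
# Parse an ISO 8601 timestamp such as "2024-09-11T05:30:49.000Z" as UTC.
# Only the first 19 characters are used, so fractional seconds are dropped.
parse_review_date <- function(x) {
  as.POSIXct(substr(x, 1, 19), format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")
}

# Parse a date written as "September 10, 2024" without relying on the locale
parse_experience_date <- function(x) {
  parts <- do.call(rbind, strsplit(gsub(",", "", x, fixed = TRUE), " ", fixed = TRUE))
  as.Date(sprintf("%s-%02d-%02d",
                  parts[, 3],                      # year
                  match(parts[, 1], month.name),   # month name -> month number
                  as.integer(parts[, 2])))         # day
}

review_date <- parse_review_date("2024-09-11T05:30:49.000Z")
experience_date <- parse_experience_date(c("September 10, 2024", "January 02, 2024"))

# Days between the experience and the review for the first sample row
lag_days <- as.numeric(as.Date(review_date) - experience_date[1])
print(lag_days)
```

With real date types in place, reviews can be sorted, grouped by year, or compared against the experience date.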

We will first explore the data structure in order to understand what we can analyze.

Customer Ratings by Country
Country | Count_of_reviews | Avg_rating | Min_date | Max_date
SE | 40 | 1.00 | 2020-05-24T16:46:18.000Z | 2025-02-26T14:46:18.000Z
NL | 17 | 1.00 | 2020-06-05T09:05:53.000Z | 2024-09-11T05:30:49.000Z
ES | 16 | 1.25 | 2023-05-30T17:20:46.000Z | 2025-01-14T13:48:16.000Z
BE | 2 | 1.00 | 2023-12-07T15:59:13.000Z | 2024-02-03T08:01:12.000Z
US | 1 | 1.00 | 2023-12-01T05:20:21.000Z | 2023-12-01T05:20:21.000Z
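
A per-country summary like the one above could be produced along these lines. This is a sketch with dplyr, not necessarily the code behind the table, and the `reviews` data frame is a small toy stand-in with illustrative values rather than the scraped dataset.

```r
library(dplyr)

# Toy stand-in for the scraped data (illustrative values only)
reviews <- data.frame(
  Country = c("SE", "SE", "NL", "ES", "ES"),
  Rating = c(1, 1, 1, 1, 2),
  Review_Date = c("2020-05-24T16:46:18.000Z", "2025-02-26T14:46:18.000Z",
                  "2024-09-11T05:30:49.000Z", "2023-05-30T17:20:46.000Z",
                  "2025-01-14T13:48:16.000Z"),
  stringsAsFactors = FALSE
)

ratings_by_country <- reviews %>%
  group_by(Country) %>%
  summarise(
    Count_of_reviews = n(),
    Avg_rating = round(mean(Rating, na.rm = TRUE), 2),
    # ISO 8601 strings in the same time zone sort chronologically,
    # so min() and max() work on the raw text
    Min_date = min(Review_Date),
    Max_date = max(Review_Date),
    .groups = "drop"
  ) %>%
  arrange(desc(Count_of_reviews))

print(ratings_by_country)
```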

Based on the data overview, the majority of the reviews come from users in:

  • Sweden - 40

  • Netherlands - 17

  • Spain - 16

Although the sample is too small to support firm conclusions, we can explore the review text to understand what frustrates users the most.


We can also see that the average rating is very low in all three markets: 1.00 out of 5 in Sweden and the Netherlands, and 1.25 in Spain.

To explain why the rating is so low, the first step is to check the average rating year over year.

Customer Ratings by Year and Country
Year | Country | Count_of_reviews | Avg_rating
2020 | NL | 1 | 1.00
2020 | SE | 2 | 1.00
2021 | NL | 4 | 1.00
2021 | SE | 4 | 1.00
2022 | NL | 2 | 1.00
2022 | SE | 10 | 1.00
2023 | ES | 6 | 1.00
2023 | NL | 5 | 1.00
2023 | SE | 8 | 1.00
2024 | ES | 9 | 1.44
2024 | NL | 5 | 1.00
2024 | SE | 14 | 1.00
2025 | ES | 1 | 1.00
2025 | SE | 2 | 1.00
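
The year-over-year breakdown only needs the year portion of `Review_Date`. Below is a sketch with dplyr on toy data, where the values are illustrative and not taken from the scraped reviews. Because the timestamp starts with the four-digit year, `substr()` is enough to extract it.

```r
library(dplyr)

# Toy stand-in for the scraped data (illustrative values only)
reviews <- data.frame(
  Country = c("SE", "SE", "ES", "ES", "NL"),
  Rating = c(1, 1, 1, 2, 1),
  Review_Date = c("2024-03-01T10:00:00.000Z", "2024-07-15T08:30:00.000Z",
                  "2024-02-10T12:00:00.000Z", "2024-11-05T09:45:00.000Z",
                  "2023-06-20T14:20:00.000Z"),
  stringsAsFactors = FALSE
)

ratings_by_year <- reviews %>%
  # The first four characters of the ISO timestamp are the year
  mutate(Year = as.integer(substr(Review_Date, 1, 4))) %>%
  group_by(Year, Country) %>%
  summarise(
    Count_of_reviews = n(),
    Avg_rating = mean(Rating, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(Year, Country)

print(ratings_by_year)
```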

Ratings sit at or near the minimum in every market and every year. The only deviation is Spain in 2024, at 1.44 across 9 reviews. We can therefore expect overwhelmingly negative reviews in all three markets.


Most frequent words

Sweden

  • The presence of “gratis” (free) and “betalade” (paid) suggests users are comparing free vs. paid services.

  • The appearance of “bluff” (scam/fraud) indicates trust issues with CARFAX reports.

  • “transportstyrelsen” suggests users are comparing CARFAX data with official sources, likely questioning accuracy.

Spain

  • The focus on “informe” (report), “información” (information), and “datos” (data) suggests users are concerned about report accuracy.

  • “kms” (kilometers) and “revisión” (inspection) indicate discussions around mileage discrepancies and service history.

  • The presence of “estafa” (scam) and “cobran” (charge) implies concerns about pricing and reliability.

Netherlands

  • “gratis” (free) and “geld” (money) suggest users are evaluating cost vs. value.

  • “rdw” (official authority) and “onderhoudshistorie” (maintenance history) show that users compare CARFAX data with official Dutch records.

  • The appearance of “trap” (fraud/deception) indicates trust concerns, similar to Sweden and Spain.
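
Word lists like these can be obtained by tokenizing the review text, dropping stop words, and counting what remains. Below is a minimal base R sketch. `top_words()` is a hypothetical helper, and the three reviews and the stop-word list are invented for illustration; a real analysis would use a full stop-word list for each language.

```r
# Count the most frequent words in a set of texts
top_words <- function(texts, stop_words = character(0), n = 10) {
  # Lowercase, then split on anything that is not a letter
  words <- unlist(strsplit(tolower(texts), "[^[:alpha:]]+"))
  # Drop very short tokens (and empty strings) and stop words
  words <- words[nchar(words) > 2 & !(words %in% stop_words)]
  counts <- sort(table(words), decreasing = TRUE)
  head(data.frame(word = names(counts), n = as.integer(counts),
                  stringsAsFactors = FALSE), n)
}

# Invented Swedish-style reviews, for illustration only
toy_reviews <- c(
  "Bluff! Betalade 200 kr, gratis hos Transportstyrelsen.",
  "Ren bluff, samma data finns gratis.",
  "Betalade men rapporten var tom."
)
toy_stop_words <- c("hos", "men", "var", "ren", "samma", "finns")

word_counts <- top_words(toy_reviews, stop_words = toy_stop_words)
print(word_counts)
```

Running the same function once per country subset gives one word list per market, which is how the three lists above can be compared side by side.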


Most common reviews

Sweden

1. Incorrect or missing information

Users report that CarFax reports contain missing or inaccurate vehicle details.

“Idag köpte jag rapporter om en bil och upptäckte att viktig information saknades.”

English translation:

Today, I bought reports about a car and discovered that important information was missing.

2. Expensive reports

Many customers feel that the reports are overpriced for the level of detail provided.

“Jag betalade för rapporten, men fick nästan ingen extra information jämfört med gratisalternativ.”

English translation:

I paid for the report but got almost no extra information compared to free alternatives.

3. Outdated data

Several users mention that the reports show old or outdated information, making them unreliable.

“Den senaste servicen i rapporten var flera år gammal och irrelevant.”

English translation:

The last service in the report was several years old and irrelevant.

4. Limited vehicle history

Customers complain that CarFax does not provide enough history, especially for imported cars.

“Rapporten hade ingen information om bilens tidigare ägare eller registreringsland.”

English translation:

The report had no information about the car’s previous owners or country of registration.


Netherlands

1. Lack of transparency

Users criticize CarFax for not clearly stating what information is included before purchasing a report.

“Waarom ben ik zo lomp om niet eerst het voorbeeld te bekijken? Ik kreeg amper informatie.”

English translation:

Why was I so foolish not to check the example first? I got almost no information.

2. Missing accident and repair history

Many customers report that their reports do not include key accident or damage history.

“Er stond niets in het rapport over schade, terwijl ik weet dat de auto een ongeluk heeft gehad.”

English translation:

There was nothing in the report about damage, even though I know the car was in an accident.

3. Overpriced service

Several users feel that the cost is too high for the quality and completeness of the report.

“Je betaalt een flink bedrag, maar krijgt amper nieuwe informatie over de auto.”

English translation:

You pay a hefty amount but get almost no new information about the car.

4. Reports not up-to-date

Users report that some reports are outdated and missing recent records.

“Het rapport toonde onderhoudsgegevens van jaren geleden, niets over recente reparaties.”

English translation:

The report showed maintenance records from years ago, nothing about recent repairs.


Spain

1. Missing information in reports

Users often complain that CarFax reports lack key details about the car’s history.

“En el informe faltan muchos datos. Mejor no gastar dinero en esto.”

English translation:

The report is missing a lot of data. Better not to waste money on this.

2. Inaccurate or old data

Many users note that the data in the reports is outdated or incorrect.

“Los datos de kilometraje estaban equivocados y no coincidían con la DGT.”

English translation:

The mileage data was wrong and did not match the DGT records.

3. High price for low value

Customers feel that CarFax charges too much for reports with little useful information.

“Muy caro para la poca información que ofrece el informe.”

English translation:

Very expensive for the little information the report provides.
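
One simple way to check how often themes like these occur is to tag each review with keyword matches. The sketch below is an illustration only and does not describe how the themes in this report were derived. `tag_themes()` and the keyword lists are hypothetical, and the sample texts are the Dutch quotes from the Netherlands section.

```r
# Flag, for each review, which themes have at least one keyword match
tag_themes <- function(texts, theme_keywords) {
  flags <- lapply(theme_keywords, function(keywords) {
    # Any keyword of the theme counts as a match (case-insensitive)
    grepl(paste(keywords, collapse = "|"), texts, ignore.case = TRUE)
  })
  as.data.frame(flags)
}

# Hypothetical keyword lists; a real analysis would need a curated list per language
theme_keywords <- list(
  price = c("betaalt", "bedrag", "duur"),
  missing_damage = c("schade", "ongeluk"),
  outdated = c("jaren geleden", "verouderd")
)

quotes <- c(
  "Er stond niets in het rapport over schade, terwijl ik weet dat de auto een ongeluk heeft gehad.",
  "Je betaalt een flink bedrag, maar krijgt amper nieuwe informatie over de auto.",
  "Het rapport toonde onderhoudsgegevens van jaren geleden, niets over recente reparaties."
)

theme_flags <- tag_themes(quotes, theme_keywords)
theme_totals <- colSums(theme_flags)  # number of reviews per theme
print(theme_totals)
```

Keyword matching is crude: it misses paraphrases and cannot tell a complaint from a passing mention, so the counts are best treated as a rough cross-check on themes identified by reading the reviews.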