Final Project BAIS

Area of Interest

For my final project in BAIS, I will be looking into Carvana, the online car-selling company that ships cars to a person's house after they have completed their purchase. Carvana is a new way of selling cars: there isn't a middleman, and they have all different types of cars on hand.

Research Question

What factors influence the pricing of used cars on Carvana's platform? And when people are buying cars, are they enjoying this automated online service as opposed to dealing with the annoying salesman we have all heard stories about from our parents and friends?

My hypothesis is that mileage and make are the biggest factors in the price of a car on their website. I believe this because on their website they don't really
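One way to test this hypothesis is to run a linear regression of listed price on the scraped variables. The sketch below is only an illustration under assumptions. `fit_price_model()` is a hypothetical helper. It expects the `Year`, `Car_Brand` and `Price_of_Car` columns built by the scraping code later in this report. Mileage is left out because that scraper does not collect it.

```r
# Hypothetical helper: regress listed price on model year and make.
# Mileage is not included because the scraper does not collect it.
fit_price_model <- function(cars) {
  # Drop rows where the price or year failed to parse
  cars <- cars[!is.na(cars$Price_of_Car) & !is.na(cars$Year), ]
  # Treat the make as a categorical predictor
  cars$Car_Brand <- factor(cars$Car_Brand)
  if (nlevels(cars$Car_Brand) > 1) {
    lm(Price_of_Car ~ Year + Car_Brand, data = cars)
  } else {
    # A single make cannot be used as a predictor, so fit on year alone
    lm(Price_of_Car ~ Year, data = cars)
  }
}

# Usage, assuming Carvana_cars_scraped from the scraping code:
# summary(fit_price_model(Carvana_cars_scraped))
# The Year coefficient estimates the price change per model year. Each
# Car_Brand coefficient is that make's price difference from the
# alphabetically first make, holding year fixed.
```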

Data Dictionary

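Search Cars - the tab that displays 21 cars for sale per page

Type of Car - company and model

Price - price of the car as listed

Make of car - brand of the car (for example Ford or Kia)

Year - model year of the car

Shipping

Stars - star rating given by the reviewer

Time Stamp - when the review was posted

Review Title - headline of the review

Review Paragraph - full text of the review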
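The scraping code below pulls the model year and make out of each listing's bold title by character position (characters 1-4 and 6-20). Here is a slightly more defensive sketch. It assumes titles look like "2023 Ford" (year, a space, then the make) and removes the leading year instead of cutting at fixed positions, so text longer than 15 characters is not truncated. `parse_listing_title()` is a hypothetical helper.

```r
# Hypothetical helper: split listing titles such as "2023 Ford" into a
# numeric model year and the text that follows it.
parse_listing_title <- function(titles) {
  titles <- trimws(titles)
  # A title "has a year" if it starts with four digits followed by a space or the end
  has_year <- grepl("^[0-9]{4}( |$)", titles)
  year <- rep(NA_real_, length(titles))
  year[has_year] <- as.numeric(substr(titles[has_year], 1, 4))
  # Make: drop the leading year and following whitespace; leave other titles as-is
  make <- ifelse(has_year, sub("^[0-9]{4}\\s*", "", titles, perl = TRUE), titles)
  data.frame(Year = year, Car_Brand = make, stringsAsFactors = FALSE)
}

parse_listing_title(c("2023 Ford", "2021 Mercedes-Benz"))
```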

Collecting Data

For collecting the cars, I filtered Carvana's search by make, sorted by newest inventory, and scraped the first page of listings for each brand. For collecting the reviews, I wanted to keep things consistent. For all of the questions listed above, I drew from the most recent section of the Carvana reviews, since that gives me the best opportunity to see reviews ranging from 1 to 5 stars and everything in between. To give this a data component, I analyzed the first 20 pages of Carvana reviews. These range from earlier on the day I collected them back to about 7 days before, which lets me draw some conclusions about how people feel about the car-buying process being fully online.

Carvana's review pages are odd in how many reviews they show: about 8 on the first page and about 30 on every page after that. However, when scraping the website, I could only get about 8 reviews from each page. I thought this was because the first page only has 8 reviews, but whether or not I included the first page, the scraper always returned 8. To keep things consistent, I scraped the first 20 pages for a total of 160 unique reviews.
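Every page returned exactly 8 rows, so it is worth confirming that the 160 rows really are distinct reviews and not the same page collected repeatedly. Here is a minimal sketch. It assumes the `Review_Title`, `Review_Paragraph` and `Time_stamp` columns from the review scraper below. `count_unique_reviews()` is a hypothetical helper.

```r
# Hypothetical helper: count how many scraped rows are distinct reviews,
# judged by title, body text and timestamp together.
count_unique_reviews <- function(reviews) {
  key_cols <- c("Review_Title", "Review_Paragraph", "Time_stamp")
  nrow(unique(reviews[, key_cols, drop = FALSE]))
}

# Usage, assuming Carvana_reviews from the scraping code:
# count_unique_reviews(Carvana_reviews)   # compare with nrow(Carvana_reviews)
```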

Scraped Version of Carvana

If you want to see how I got the Carvana data into R, the code is below.

library(tidyverse)
library(rvest)

library(lubridate)
library(httr)



set_config(user_agent("*"))

scrape_carvana <- function(url) {
  # Read the page passed in as `url` so each brand's URL is actually scraped
  page <- read_html(url)

  ## Retrieving Type of Car
  # Bold title text of each listing tile; year and make are cut out of it below
  Type_of_car <-
    page %>% 
    html_elements("div.flex.flex-col.justify-between") %>% 
    html_elements("p.font-bold") %>% 
    html_text2()

  # Characters 6-20 of the title (after the four-digit year and a space)
  Car_Brand <- substr(Type_of_car, 6, 20)

  # First four characters of the title are the model year
  Year <- as.numeric(substr(Type_of_car, 1, 4))

  ## Price of Car
  # Strip "$" and "," from the listed price before converting it to a number
  Price_of_Car <-
    page %>% 
    html_elements("div.-mb-\\[2px\\].flex.font-bold.gap-8.items-center.text-2xl.text-blue-6") %>% 
    html_text() %>% 
    str_remove_all("[$,]") %>% 
    as.numeric()

  carvana_df <-
    data.frame(Type_of_car, Year, Car_Brand, Price_of_Car)

  return(carvana_df)
}

test <- 
  scrape_carvana("https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiRm9yZCJ9XX0sInNvcnRCeSI6Ik5ld2VzdEludmVudG9yeSJ9")

pages <-
  c("https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiRm9yZCJ9XX0sInNvcnRCeSI6Ik5ld2VzdEludmVudG9yeSJ9", # Ford
    "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiS2lhIn1dfSwic29ydEJ5IjoiTmV3ZXN0SW52ZW50b3J5In0", # Kia
    "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiVGVzbGEifV19LCJzb3J0QnkiOiJOZXdlc3RJbnZlbnRvcnkifQ", # Tesla
    "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiVG95b3RhIn1dfSwic29ydEJ5IjoiTmV3ZXN0SW52ZW50b3J5In0", # Toyota
    "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiTGV4dXMifV19LCJzb3J0QnkiOiJOZXdlc3RJbnZlbnRvcnkifQ", # Lexus
    # NOTE: labeled Chevrolet, but this URL is identical to the Lexus one above,
    # so Lexus listings are collected twice
    "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiTGV4dXMifV19LCJzb3J0QnkiOiJOZXdlc3RJbnZlbnRvcnkifQ" # Chevrolet
    )

scrape_carvana_pages <- function(urls) {
  carvana_FFKTLC <- data.frame()
  
  for (i in seq_along(urls)) {
    print(paste("Collecting page", i, "of",length(urls), ":)", sep = ""))
    Sys.sleep(runif(1,5,15))
    
    carvana_FFKTLC <-
      scrape_carvana(urls[i]) %>% 
      mutate(page_id = urls[i]) %>% 
      bind_rows(carvana_FFKTLC)
    print(paste(urls[i], "collected", sep = " "))
    print(paste(nrow(carvana_FFKTLC), "total reviews collected so far!", sep = " "))
  }
  return(carvana_FFKTLC)
}

Carvana_cars_scraped <- 
  scrape_carvana_pages(pages) 
[1] "Collecting page1of6:)"
[1] "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiRm9yZCJ9XX0sInNvcnRCeSI6Ik5ld2VzdEludmVudG9yeSJ9 collected"
[1] "20 total reviews collected so far!"
[1] "Collecting page2of6:)"
[1] "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiS2lhIn1dfSwic29ydEJ5IjoiTmV3ZXN0SW52ZW50b3J5In0 collected"
[1] "40 total reviews collected so far!"
[1] "Collecting page3of6:)"
[1] "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiVGVzbGEifV19LCJzb3J0QnkiOiJOZXdlc3RJbnZlbnRvcnkifQ collected"
[1] "60 total reviews collected so far!"
[1] "Collecting page4of6:)"
[1] "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiVG95b3RhIn1dfSwic29ydEJ5IjoiTmV3ZXN0SW52ZW50b3J5In0 collected"
[1] "80 total reviews collected so far!"
[1] "Collecting page5of6:)"
[1] "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiTGV4dXMifV19LCJzb3J0QnkiOiJOZXdlc3RJbnZlbnRvcnkifQ collected"
[1] "100 total reviews collected so far!"
[1] "Collecting page6of6:)"
[1] "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiTGV4dXMifV19LCJzb3J0QnkiOiJOZXdlc3RJbnZlbnRvcnkifQ collected"
[1] "120 total reviews collected so far!"
view(Carvana_cars_scraped)





Scraped Version of Reviews

library(tidyverse)
library(tidytext)
library(gutenbergr)
library(ggwordcloud)
library(textdata)

library(rvest)
library(httr)
library(chromote)



set_config(user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"))

carvana_review_scrape <- function(url) {
  # Open the review page passed in as `url` in a live (JavaScript-rendered) session
  page <- read_html_live(url)

  # Elements of a review
  Stars <-
    page %>% 
    html_elements("div.bv-content-header-meta") %>% 
    html_elements("span.bv-off-screen") %>% 
    html_text2()

  Time_stamp <-
    page %>% 
    html_elements("div.bv-content-header-meta") %>%
    html_elements("span.bv-content-datetime-stamp") %>% 
    html_text2() 

  Review_Paragraph <-
    page %>% 
    html_elements("div.bv-content-summary-body-text") %>% 
    html_text2()

  Review_Title <-
    page %>% 
    html_elements("div.bv-content-container") %>% 
    html_elements("h3.bv-content-title") %>% 
    html_text2() 

  carvana_review_df <-
    data.frame(Stars, Time_stamp, Review_Title, Review_Paragraph)

  return(carvana_review_df)
}


# Review pages 1 through 20; only the page number after "pg:" changes
pages <-
  paste0("https://www.carvana.com/reviews?bvstate=pg:", 1:20, "/ct:r")


scrape_carvana_reviews_pages <- function(urls) {
  carvana_review_pages <- data.frame()
  
  for (i in seq_along(urls)) {
    print(paste("Collecting page", i, "of",length(urls), ":)", sep = ""))
    Sys.sleep(runif(1,5,15))
    
    carvana_review_pages <-
      carvana_review_scrape(urls[i]) %>% 
      mutate(page_id = urls[i]) %>% 
      bind_rows(carvana_review_pages)
    print(paste(urls[i], "collected", sep = " "))
    print(paste(nrow(carvana_review_pages), "total reviews collected so far!", sep = " "))
  }
  return(carvana_review_pages)
}

test_reviews <-
  scrape_carvana_reviews_pages("https://www.carvana.com/reviews?bvstate=pg:3/ct:r")
[1] "Collecting page1of1:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:3/ct:r collected"
[1] "8 total reviews collected so far!"
Carvana_reviews <- 
  scrape_carvana_reviews_pages(pages) 
[1] "Collecting page1of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:1/ct:r collected"
[1] "8 total reviews collected so far!"
[1] "Collecting page2of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:2/ct:r collected"
[1] "16 total reviews collected so far!"
[1] "Collecting page3of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:3/ct:r collected"
[1] "24 total reviews collected so far!"
[1] "Collecting page4of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:4/ct:r collected"
[1] "32 total reviews collected so far!"
[1] "Collecting page5of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:5/ct:r collected"
[1] "40 total reviews collected so far!"
[1] "Collecting page6of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:6/ct:r collected"
[1] "48 total reviews collected so far!"
[1] "Collecting page7of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:7/ct:r collected"
[1] "56 total reviews collected so far!"
[1] "Collecting page8of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:8/ct:r collected"
[1] "64 total reviews collected so far!"
[1] "Collecting page9of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:9/ct:r collected"
[1] "72 total reviews collected so far!"
[1] "Collecting page10of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:10/ct:r collected"
[1] "80 total reviews collected so far!"
[1] "Collecting page11of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:11/ct:r collected"
[1] "88 total reviews collected so far!"
[1] "Collecting page12of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:12/ct:r collected"
[1] "96 total reviews collected so far!"
[1] "Collecting page13of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:13/ct:r collected"
[1] "104 total reviews collected so far!"
[1] "Collecting page14of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:14/ct:r collected"
[1] "112 total reviews collected so far!"
[1] "Collecting page15of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:15/ct:r collected"
[1] "120 total reviews collected so far!"
[1] "Collecting page16of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:16/ct:r collected"
[1] "128 total reviews collected so far!"
[1] "Collecting page17of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:17/ct:r collected"
[1] "136 total reviews collected so far!"
[1] "Collecting page18of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:18/ct:r collected"
[1] "144 total reviews collected so far!"
[1] "Collecting page19of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:19/ct:r collected"
[1] "152 total reviews collected so far!"
[1] "Collecting page20of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:20/ct:r collected"
[1] "160 total reviews collected so far!"
view(Carvana_reviews)

Some Visuals Reflecting The Data

On Carvana, how does price trend across model years?

From 2008 to 2024, you can see that cars made before 2015 sell for less money, which is what you would expect. The interesting part is that the total value of cars jumps from 2021 to 2022. Because each bar is the sum of every listed price for that year, part of that jump may simply reflect more 2022 cars being listed. If you dove deeper into Carvana's website, this raises additional questions: is this true for more than just these 6 car brands, and what makes the 2022 cars so much more valuable? Also, no 2024 cars appear, likely because no one was selling 2024 cars on Carvana at the time I scraped their website.
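To compare what individual cars are worth rather than yearly totals, you can compute a per-year listing count and median price. Here is a minimal sketch. It assumes the `Year` and `Price_of_Car` columns of `Carvana_cars_scraped`. `price_by_year()` is a hypothetical helper.

```r
# Hypothetical helper: listing count and median price for each model year.
price_by_year <- function(cars) {
  # Ignore rows where the year or price failed to parse
  cars <- cars[!is.na(cars$Year) & !is.na(cars$Price_of_Car), ]
  counts  <- aggregate(Price_of_Car ~ Year, data = cars, FUN = length)
  medians <- aggregate(Price_of_Car ~ Year, data = cars, FUN = median)
  names(counts)[2]  <- "n_listings"
  names(medians)[2] <- "median_price"
  merge(counts, medians, by = "Year")
}

# Usage, assuming Carvana_cars_scraped from the scraping code:
# price_by_year(Carvana_cars_scraped)
```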

Carvana_cars_scraped %>% 
  ggplot(aes(x = Year , y =Price_of_Car)) +
  geom_bar(stat = "summary", fun = sum) +
  geom_smooth() +
  scale_y_continuous(labels = scales::dollar) +
  labs(title = "Total Price of All Cars by Model Year",
       y = "Total Selling Price of All Cars")
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'


How does price vary by car brand?

In this visualization, you can see that car prices generally fall between $15,000 and $60,000. These car companies price their cars competitively with one another while staying within this affordable range. Interestingly, a single car can span a wide range of prices depending on its package.
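You can back up the box plot with numbers. Here is a minimal sketch. It assumes the `Car_Brand` and `Price_of_Car` columns of `Carvana_cars_scraped`. `price_summary_by_brand()` is a hypothetical helper that reports each brand's listing count, minimum, median and maximum price, and the width of its price range.

```r
# Hypothetical helper: per-brand price summary backing the box plot.
price_summary_by_brand <- function(cars) {
  cars <- cars[!is.na(cars$Price_of_Car), ]
  # One numeric vector of prices per brand
  groups <- split(cars$Price_of_Car, cars$Car_Brand)
  # Count, minimum, median and maximum price for each brand
  rows <- lapply(groups, function(p) {
    c(n = length(p), min = min(p), median = median(p), max = max(p))
  })
  out <- data.frame(Car_Brand = names(groups), do.call(rbind, rows),
                    row.names = NULL)
  # Width of each brand's price range
  out$spread <- out$max - out$min
  out
}

# Usage, assuming Carvana_cars_scraped from the scraping code:
# price_summary_by_brand(Carvana_cars_scraped)
```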

Carvana_cars_scraped %>% 
  ggplot(aes(x = Car_Brand , y =Price_of_Car)) +
  geom_boxplot() +
  labs(title = "Distribution of Price by Car Brand")


For Ford cars made in 2023, what is the price range you could sell one of the cars for?

In this visualization, you can see that the only 2023 Ford models being sold on Carvana are Broncos, and they are going for $35,000 to $38,000. So if you wanted to sell a 2023 Ford Bronco, you would most likely want to price it within that range.
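You can also read the same range off directly instead of from the box plot. Here is a minimal sketch. It assumes `Car_Brand` holds exactly "Ford" for Ford listings, as the filter in the plotting code expects. `price_range_for()` is a hypothetical helper.

```r
# Hypothetical helper: lowest and highest listed price for one make and model year.
price_range_for <- function(cars, year, brand) {
  prices <- cars$Price_of_Car[cars$Year == year & cars$Car_Brand == brand]
  # Missing years or prices produce NA entries; drop them
  prices <- prices[!is.na(prices)]
  if (length(prices) == 0) return(c(low = NA_real_, high = NA_real_))
  c(low = min(prices), high = max(prices))
}

# Usage, assuming Carvana_cars_scraped from the scraping code:
# price_range_for(Carvana_cars_scraped, 2023, "Ford")
```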

Carvana_cars_scraped %>% 
  select(Car_Brand,Price_of_Car,Year) %>%
  filter(Year == 2023) %>% 
  filter(Car_Brand == "Ford") %>% 
  ggplot(aes(x = Car_Brand , y =Price_of_Car)) +
  geom_boxplot() +
  labs(title = "Price of 2023 Ford Cars on Carvana")


Who has the most cars being sold on Carvana in 2023, and which car company has the widest price range for its 2023 cars?

Across the 2023 cars listed by all of the companies on Carvana, the brand that repeatedly sells for the most money is Lexus. This box plot also shows that Lexus has the most 2023 vehicles for sale.
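You can tabulate both parts of this question directly: which brand lists the most 2023 cars, and which has the widest 2023 price range. Here is a minimal sketch. It assumes the `Year`, `Car_Brand` and `Price_of_Car` columns of `Carvana_cars_scraped`. `brand_counts_for_year()` is a hypothetical helper that sorts brands by listing count, then by price range.

```r
# Hypothetical helper: for one model year, count listings per brand and
# measure each brand's price range, most listings first.
brand_counts_for_year <- function(cars, year) {
  keep <- !is.na(cars$Year) & cars$Year == year & !is.na(cars$Price_of_Car)
  cars <- cars[keep, ]
  groups <- split(cars$Price_of_Car, cars$Car_Brand)
  out <- data.frame(
    Car_Brand   = names(groups),
    n_listings  = vapply(groups, length, integer(1)),
    price_range = vapply(groups, function(p) as.numeric(max(p) - min(p)), numeric(1)),
    row.names   = NULL
  )
  out[order(-out$n_listings, -out$price_range), ]
}

# Usage, assuming Carvana_cars_scraped from the scraping code:
# brand_counts_for_year(Carvana_cars_scraped, 2023)
```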

library(tidyverse)
library(rvest)
library(lubridate)
library(httr)



set_config(user_agent("*"))

Carvana_url <-
  read_html("https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiRm9yZCJ9XX0sInNvcnRCeSI6Ik5ld2VzdEludmVudG9yeSJ9")

scrape_carvana <- function(url) {
  


## Retriving Type of Car

Type_of_car <-
  Carvana_url %>% 
  html_elements("div.flex.flex-col.justify-between") %>% 
  html_elements("p.font-bold") %>% 
  html_text2()

Car_Brand <-
  Carvana_url %>% 
  html_elements("div.flex.flex-col.justify-between") %>% 
  html_elements("p.font-bold") %>% 
  html_text2() %>% 
  substr(6,20) %>% 
  as.character()


Year <-
  Carvana_url %>% 
  html_elements("div.flex.flex-col.justify-between") %>% 
  html_elements("p.font-bold") %>% 
  html_text2() %>% 
  substr(1,4) %>% 
  as.numeric()

## Price of Car
Price_of_Car <-
  Carvana_url %>% 
  html_elements("div.-mb-\\[2px\\].flex.font-bold.gap-8.items-center.text-2xl.text-blue-6") %>% 
  html_text() %>% 
  as.character() %>% 
  str_replace_all("\\$","") %>% 
  str_replace_all("\\,","") %>% 
  as.numeric()






carvana_df <-
  data.frame(Type_of_car,Year, Car_Brand, Price_of_Car)

return(carvana_df) 
}

test <- 
  scrape_carvana("https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiRm9yZCJ9XX0sInNvcnRCeSI6Ik5ld2VzdEludmVudG9yeSJ9")

pages <-
  c("https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiRm9yZCJ9XX0sInNvcnRCeSI6Ik5ld2VzdEludmVudG9yeSJ9", #Ford,
    "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiS2lhIn1dfSwic29ydEJ5IjoiTmV3ZXN0SW52ZW50b3J5In0", #Kia,
    "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiVGVzbGEifV19LCJzb3J0QnkiOiJOZXdlc3RJbnZlbnRvcnkifQ", #Tesla,
    "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiVG95b3RhIn1dfSwic29ydEJ5IjoiTmV3ZXN0SW52ZW50b3J5In0", #Toyota,
    "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiTGV4dXMifV19LCJzb3J0QnkiOiJOZXdlc3RJbnZlbnRvcnkifQ", #Lexus,
    "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiTGV4dXMifV19LCJzb3J0QnkiOiJOZXdlc3RJbnZlbnRvcnkifQ" #Cheverolt
    )

scrape_carvana_pages <- function(urls) {
  carvana_FFKTLC <- data.frame()
  
  for (i in seq_along(urls)) {
    print(paste("Collecting page", i, "of",length(urls), ":)", sep = ""))
    Sys.sleep(runif(1,5,15))
    
 carvana_FFKTLC <-
    scrape_carvana(urls[i]) %>% 
      mutate(page_id = urls[i]) %>% 
      bind_rows(carvana_FFKTLC)
    print(paste(urls[i], "collected", sep = " "))
    print(paste(nrow(carvana_FFKTLC), "total reviews collected so far!", sep = " "))
  }
  return(carvana_FFKTLC)
}

Carvana_cars_scraped <- 
  scrape_carvana_pages(pages) 
[1] "Collecting page1of6:)"
[1] "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiRm9yZCJ9XX0sInNvcnRCeSI6Ik5ld2VzdEludmVudG9yeSJ9 collected"
[1] "20 total reviews collected so far!"
[1] "Collecting page2of6:)"
[1] "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiS2lhIn1dfSwic29ydEJ5IjoiTmV3ZXN0SW52ZW50b3J5In0 collected"
[1] "40 total reviews collected so far!"
[1] "Collecting page3of6:)"
[1] "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiVGVzbGEifV19LCJzb3J0QnkiOiJOZXdlc3RJbnZlbnRvcnkifQ collected"
[1] "60 total reviews collected so far!"
[1] "Collecting page4of6:)"
[1] "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiVG95b3RhIn1dfSwic29ydEJ5IjoiTmV3ZXN0SW52ZW50b3J5In0 collected"
[1] "80 total reviews collected so far!"
[1] "Collecting page5of6:)"
[1] "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiTGV4dXMifV19LCJzb3J0QnkiOiJOZXdlc3RJbnZlbnRvcnkifQ collected"
[1] "100 total reviews collected so far!"
[1] "Collecting page6of6:)"
[1] "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiTGV4dXMifV19LCJzb3J0QnkiOiJOZXdlc3RJbnZlbnRvcnkifQ collected"
[1] "120 total reviews collected so far!"
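Decoding the `cvnaid` values in the `pages` vector suggests that each one is base64 text, with the trailing `=` padding removed, wrapping a small JSON filter; the Ford entry, for example, decodes to `{"filters":{"makes":[{"name":"Ford"}]},"sortBy":"NewestInventory"}`. Assuming Carvana keeps that format, the hypothetical helpers below (`base64_encode_nopad()`, `carvana_filter_id()`, and `carvana_filter_url()`, written in base R for this sketch) rebuild those URLs from a make name instead of copying them by hand. That would also make it easy to fill in the Chevrolet entry, since the sixth URL in the list repeats the Lexus filter; the exact make spelling Carvana expects, such as "Chevrolet", is itself an assumption.

```r
# Hypothetical helpers, based on decoding the cvnaid values in the URLs above.
# ASSUMPTION: cvnaid is base64 of this JSON filter with the "=" padding removed.

base64_alphabet <- c(LETTERS, letters, 0:9, "+", "/")

# Minimal base64 encoder (no "=" padding) using only base R
base64_encode_nopad <- function(text) {
  bytes <- as.integer(charToRaw(text))
  # Expand each byte into 8 bits, most significant bit first
  bits <- unlist(lapply(bytes, function(b) as.integer(intToBits(b))[8:1]))
  # Pad with zero bits so the bit count is a multiple of 6
  bits <- c(bits, rep(0L, (6 - length(bits) %% 6) %% 6))
  # Each column of the matrix holds one 6-bit group
  groups <- matrix(bits, nrow = 6)
  index <- colSums(groups * c(32, 16, 8, 4, 2, 1))
  paste(base64_alphabet[index + 1], collapse = "")
}

# Filter id for one make, sorted by newest inventory
carvana_filter_id <- function(make) {
  json <- sprintf('{"filters":{"makes":[{"name":"%s"}]},"sortBy":"NewestInventory"}', make)
  base64_encode_nopad(json)
}

# Full search URL for one make
carvana_filter_url <- function(make) {
  paste0("https://www.carvana.com/cars/filters?cvnaid=", carvana_filter_id(make))
}

carvana_filter_url("Ford")
```

For example, `carvana_filter_url("Ford")` reproduces the first URL in `pages`. The helper does not escape quotes inside `make`, which is fine for ordinary make names but not for arbitrary text.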
Carvana_cars_scraped %>% 
  select(Car_Brand, Price_of_Car, Year) %>%
  filter(Year == 2023, Car_Brand == "Ford") %>% 
  ggplot(aes(x = Car_Brand, y = Price_of_Car)) +
  geom_boxplot() +
  labs(title = "Listed Prices of 2023 Fords on Carvana",
       x = "Car brand", y = "Listed price")

What is the average price of a 2023 model-year car on Carvana?

Among the 21 cars from the 2023 model year, the average listed price was about $31,913, so someone selling a 2023 car could expect to get roughly that amount. The visualization above shows that 2023 cars were listed for anywhere from about $30,000 to $60,000, which is the range a buyer or seller should expect.
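A boxplot shows the median and quartiles rather than the mean, so an average price has to be computed separately. Below is a minimal base R sketch of that calculation. The scraper that builds `Carvana_cars_scraped` is not shown here, so the assumption that `Price_of_Car` may arrive as text such as "$31,913" is only a guess; `parse_price()` and `summarise_prices()` are hypothetical helpers, and the listings are made up.

```r
# Hypothetical helper. ASSUMPTION: Price_of_Car may be scraped as text like "$31,913".
parse_price <- function(x) {
  if (is.numeric(x)) return(x)          # already numeric: nothing to clean
  as.numeric(gsub("[^0-9.]", "", x))    # drop "$", commas, and other symbols
}

# Count, mean, median, and range of a vector of prices (NAs dropped)
summarise_prices <- function(prices) {
  prices <- prices[!is.na(prices)]
  c(n = length(prices), mean = mean(prices), median = median(prices),
    min = min(prices), max = max(prices))
}

# Made-up listings for illustration; these are NOT the scraped Carvana data
example_cars <- data.frame(
  Year = c(2023, 2023, 2023, 2022),
  Price_of_Car = c("$29,990", "$31,500", "$57,010", "$24,000")
)
prices_2023 <- parse_price(example_cars$Price_of_Car[example_cars$Year == 2023])
summarise_prices(prices_2023)
```

With these made-up numbers the mean (39,500) sits well above the median (31,500) because one listing is much more expensive, which is why the middle line of a boxplot and the average can tell different stories.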

Do people who have purchased cars from Carvana speak more positively or more negatively about the car-buying process as a whole?

Across the 160 reviews I collected, which span the week before I scraped them, reviewers speak very positively about Carvana and the processes it has in place.

library(tidyverse)
library(tidytext)
library(rvest)
library(httr)
library(chromote)

set_config(user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"))

carvana_review_scrape <- function(url) {
  # Open the page passed in as `url`; read_html_live() drives a headless
  # Chrome session (via chromote), so JavaScript-rendered reviews are available
  review_page <- read_html_live(url)
  Sys.sleep(2)  # give asynchronously loaded reviews a moment to appear

  # Elements of a review
  Stars <-
    review_page %>% 
    html_elements("div.bv-content-header-meta") %>% 
    html_elements("span.bv-off-screen") %>% 
    html_text2() %>% 
    parse_number()  # keep only the numeric star rating

  Time_stamp <-
    review_page %>% 
    html_elements("div.bv-content-header-meta") %>%
    html_elements("span.bv-content-datetime-stamp") %>% 
    html_text2() 

  Review_Paragraph <-
    review_page %>% 
    html_elements("div.bv-content-summary-body-text") %>% 
    html_text2()

  Review_Title <-
    review_page %>% 
    html_elements("div.bv-content-container") %>% 
    html_elements("h3.bv-content-title") %>% 
    html_text2() 

  data.frame(Stars, Time_stamp, Review_Title, Review_Paragraph)
}

pages <-
  paste0("https://www.carvana.com/reviews?bvstate=pg:", 1:20, "/ct:r")  # review pages 1 to 20


scrape_carvana_reviews_pages <- function(urls) {
  carvana_review_pages <- data.frame()
  
  for (i in seq_along(urls)) {
    print(paste("Collecting page", i, "of",length(urls), ":)", sep = ""))
    Sys.sleep(runif(1,5,15))
    
    carvana_review_pages <-
      carvana_review_scrape(urls[i]) %>% 
      mutate(page_id = urls[i]) %>% 
      bind_rows(carvana_review_pages)
    print(paste(urls[i], "collected", sep = " "))
    print(paste(nrow(carvana_review_pages), "total reviews collected so far!", sep = " "))
  }
  return(carvana_review_pages)
}


test_reviews <-
  scrape_carvana_reviews_pages("https://www.carvana.com/reviews?bvstate=pg:3/ct:r")
[1] "Collecting page1of1:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:3/ct:r collected"
[1] "8 total reviews collected so far!"
Carvana_reviews <- 
  scrape_carvana_reviews_pages(pages) 
[1] "Collecting page1of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:1/ct:r collected"
[1] "8 total reviews collected so far!"
[1] "Collecting page2of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:2/ct:r collected"
[1] "16 total reviews collected so far!"
[1] "Collecting page3of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:3/ct:r collected"
[1] "24 total reviews collected so far!"
[1] "Collecting page4of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:4/ct:r collected"
[1] "32 total reviews collected so far!"
[1] "Collecting page5of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:5/ct:r collected"
[1] "40 total reviews collected so far!"
[1] "Collecting page6of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:6/ct:r collected"
[1] "48 total reviews collected so far!"
[1] "Collecting page7of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:7/ct:r collected"
[1] "56 total reviews collected so far!"
[1] "Collecting page8of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:8/ct:r collected"
[1] "64 total reviews collected so far!"
[1] "Collecting page9of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:9/ct:r collected"
[1] "72 total reviews collected so far!"
[1] "Collecting page10of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:10/ct:r collected"
[1] "80 total reviews collected so far!"
[1] "Collecting page11of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:11/ct:r collected"
[1] "88 total reviews collected so far!"
[1] "Collecting page12of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:12/ct:r collected"
[1] "96 total reviews collected so far!"
[1] "Collecting page13of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:13/ct:r collected"
[1] "104 total reviews collected so far!"
[1] "Collecting page14of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:14/ct:r collected"
[1] "112 total reviews collected so far!"
[1] "Collecting page15of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:15/ct:r collected"
[1] "120 total reviews collected so far!"
[1] "Collecting page16of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:16/ct:r collected"
[1] "128 total reviews collected so far!"
[1] "Collecting page17of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:17/ct:r collected"
[1] "136 total reviews collected so far!"
[1] "Collecting page18of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:18/ct:r collected"
[1] "144 total reviews collected so far!"
[1] "Collecting page19of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:19/ct:r collected"
[1] "152 total reviews collected so far!"
[1] "Collecting page20of20:)"
[1] "https://www.carvana.com/reviews?bvstate=pg:20/ct:r collected"
[1] "160 total reviews collected so far!"
bing <- 
  get_sentiments("bing")

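Carvana_counts <-
  Carvana_reviews %>% 
  unnest_tokens(word, Review_Paragraph) %>%  # one row per word in each review
  inner_join(bing, by = "word") %>%          # keep only words in the Bing lexicon
  count(word, sentiment, name = "n")         # how often each sentiment word appears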

Carvana_counts %>% 
  filter(n >= 5) %>% 
  mutate(n = ifelse(sentiment == "negative", -n, n)) %>% 
  mutate(word = reorder(word, n)) %>% 
  ggplot(aes(word, n)) +
  geom_col() +
  coord_flip() +
  geom_text(aes(label = signif(n, digits = 3)), nudge_y = 8) +
  labs(title = "Positive and Negative Words for Carvana",
       subtitle = "Only words appearing at least 5 times are shown")
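The chart above depends on matching each word to its own lexicon entry, which is an inner join on `word`. Below is a minimal base R sketch of the same counting idea, using `merge()` as the inner join and a tiny made-up lexicon standing in for Bing; `tokenize()` and `score_reviews()` are hypothetical helpers, and the example reviews are invented for illustration.

```r
# Illustrative stand-in lexicon (NOT the real Bing lexicon, which tidytext's
# get_sentiments("bing") provides with thousands of words)
lexicon <- data.frame(
  word = c("easy", "fair", "quick", "smooth", "wrong", "slow", "problem"),
  sentiment = c(rep("positive", 4), rep("negative", 3))
)

# Lower-case a text and split it into words
tokenize <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words[words != ""]
}

# Count positive and negative lexicon words in each review
score_reviews <- function(reviews, lexicon) {
  tokens_per_review <- lapply(reviews, tokenize)
  tokens <- data.frame(
    review_id = rep(seq_along(reviews), lengths(tokens_per_review)),
    word = unlist(tokens_per_review)
  )
  # Inner join: each token is paired with its lexicon entry; unmatched words drop out
  matched <- merge(tokens, lexicon, by = "word")
  counts <- table(
    factor(matched$review_id, levels = seq_along(reviews)),
    factor(matched$sentiment, levels = c("negative", "positive"))
  )
  data.frame(
    review_id = seq_along(reviews),
    positive = as.integer(counts[, "positive"]),
    negative = as.integer(counts[, "negative"]),
    net = as.integer(counts[, "positive"] - counts[, "negative"])
  )
}

# Made-up example reviews, for illustration only
reviews <- c("Easy and quick, the price was fair.",
             "Paperwork was wrong and delivery was slow.",
             "It was fine.")
score_reviews(reviews, lexicon)
```

A review's net score is its positive count minus its negative count; words missing from the lexicon, such as "fine" here, simply do not count.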

In 3-star and 5-star reviews, which words do people use most often, and which words separate those who enjoyed the process from those who did not?

Among these reviews, some of the most common words are "easy," "fair," "quick," and "timely." Even the negative reviews echo these same words; what sets them apart is that some contractual details were incorrect, which reviewers described with words like "not correct" and "wrong."

bing <- 
  get_sentiments("bing")

Carvana_Stars <-
  Carvana_reviews %>% 
  unnest_tokens(word, Review_Paragraph) %>%   # one row per word in each review
  inner_join(bing, by = "word") %>%           # keep only words in the Bing lexicon
  count(Stars, word, sentiment, name = "n")   # word counts within each star rating

Carvana_Stars %>% 
  filter(Stars %in% c(3, 5)) %>% 
  filter(n >= 5) %>% 
  mutate(n = ifelse(sentiment == "negative", -n, n)) %>% 
  mutate(word = reorder(word, n)) %>% 
  ggplot(aes(word, n)) +
  geom_col() +
  coord_flip() +
  facet_wrap(~ Stars, scales = "free_y") +   # separate panels for 3- and 5-star reviews
  geom_text(aes(label = signif(n, digits = 3)), nudge_y = 8) +
  labs(title = "Positive and Negative Words for 3 and 5 star Carvana Reviews",
       subtitle = "Only words appearing at least 5 times are shown")
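One way to make "words that separate" 5-star from 3-star reviews concrete is to compare how large a share of each group's words a given term takes up. The sketch below does this in base R with made-up example reviews; `tokenize()`, `word_share()`, and `compare_word_shares()` are hypothetical helpers, and a simple difference in shares is only one of several possible measures (it is not what the chart above computes).

```r
# Lower-case texts and split them into words
tokenize <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words[words != ""]
}

# Share of all words in a group of reviews taken by each distinct word
word_share <- function(texts) {
  counts <- table(tokenize(texts))
  setNames(as.numeric(counts) / sum(counts), names(counts))
}

# Compare word shares between two groups; positive diff = more typical of group A
compare_word_shares <- function(texts_a, texts_b) {
  share_a <- word_share(texts_a)
  share_b <- word_share(texts_b)
  words <- union(names(share_a), names(share_b))
  a <- as.numeric(share_a[words]); a[is.na(a)] <- 0  # word absent from group A
  b <- as.numeric(share_b[words]); b[is.na(b)] <- 0  # word absent from group B
  result <- data.frame(word = words, share_a = a, share_b = b, diff = a - b)
  result <- result[order(result$diff, decreasing = TRUE), ]
  rownames(result) <- NULL
  result
}

# Made-up example reviews, for illustration only
five_star <- c("Easy and quick process", "Quick delivery, easy paperwork")
three_star <- c("Delivery was late and paperwork was wrong")
compare_word_shares(five_star, three_star)
```

Words at the top are relatively more common in the first group and words at the bottom in the second. On real reviews, very common words such as "and" or "was" would dominate, so it would likely help to drop them first, for example with the `stop_words` table that ships with tidytext.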

On what day of the week are Carvana buyers happiest with the car-buying process, and on what day are they most dissatisfied?

The visualization gives no indication that people are happier or angrier on any particular day they write reviews. Sentiment looks similar throughout: either Carvana's customers are generally very happy with the process, or the people who choose to leave reviews are mostly those who were happy buying or selling their car.

bing <- 
  get_sentiments("bing")

Carvana_Time <-
  Carvana_reviews %>% 
  unnest_tokens(word, Review_Paragraph) %>%  # one row per word in each review
  inner_join(bing, by = "word") %>%          # keep only words in the Bing lexicon
  count(Time_stamp, sentiment, name = "n")   # sentiment words per time stamp

Carvana_Time %>% 
  mutate(n = ifelse(sentiment == "negative", -n, n)) %>%  # negative words plotted below zero
  ggplot(aes(Time_stamp, n, fill = sentiment)) +
  geom_col() +
  coord_flip() +
  labs(title = "People's Emotions About Carvana by When They Wrote the Review",
       x = "Time stamp shown on the review",
       y = "Number of sentiment words")
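The chart above groups reviews by the raw `Time_stamp` text, while the question asks about days of the week. If the scraped stamps are relative phrases such as "3 days ago" (an assumption; the exact wording Carvana displays is not shown in this project), they can be converted to calendar dates once the scrape date is known, and `weekdays()` then gives the day of the week. `relative_to_date()` is a hypothetical helper, and the stamps and scrape date below are made up.

```r
# Hypothetical helper. ASSUMPTION: Time_stamp text looks like "2 hours ago",
# "a day ago", or "3 days ago"; anything else becomes NA.
relative_to_date <- function(stamp, scrape_date) {
  stamp <- tolower(trimws(stamp))
  days_back <- rep(NA_real_, length(stamp))
  # Minutes or hours ago: treated as the scrape date itself
  days_back[grepl("^(an?|[0-9]+) (minute|hour)s? ago$", stamp)] <- 0
  days_back[grepl("^an? day ago$", stamp)] <- 1
  n_days <- grepl("^[0-9]+ days ago$", stamp)
  days_back[n_days] <- as.numeric(sub(" days ago$", "", stamp[n_days]))
  as.Date(scrape_date) - days_back
}

# Made-up time stamps and scrape date, for illustration only
stamps <- c("2 hours ago", "a day ago", "3 days ago", "last month")
review_dates <- relative_to_date(stamps, scrape_date = "2023-12-08")
review_dates            # dates, with NA for the unrecognized stamp
weekdays(review_dates)  # day-of-week names (language depends on the locale)
```

Stamps in hours or minutes are treated as the scrape date, which can be off by a day for reviews posted shortly before midnight. Tabulating `weekdays(review_dates)` alongside the sentiment counts would then let the comparison be made by weekday.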

Conclusion

I enjoyed working with both Carvana's reviews and its car listings. Together they taught me about the car-selling process and gave me a single, centralized place to see what people think of buying a car purely online, with little to no human interaction. This area interested me because of the wealth of data Carvana hosts publicly in its listings and its reviews. That data offers insight into what consumers are thinking and into what goes into setting a car's price, beyond the general assumption that newer cars sell for more.