Analysis of Carvana

Introduction

Buying, selling, and trading cars has become crucial for our society: as cities grow bigger and bigger, it gets harder to get from place to place without a vehicle or nearby public transportation. This piqued my interest in Carvana, a leading online car sales company. In scraping my data, I will be looking at what it would take to buy a car on Carvana and get it shipped to Cincinnati.

Data Points

For my data I decided to look into six different car brands on Carvana: Ford, Kia, Chevrolet, Tesla, Toyota, and Lexus.

Here is a link to be able to see this: https://myxavier-my.sharepoint.com/:x:/r/personal/hatcherr2_xavier_edu/Documents/carvana_cars.csv?d=w242f917193f5433e8d657caaf4b56b03&csf=1&web=1&e=Ri65cG

I will be looking into essentially everything that is listed within these tiles. One thing to note is that I don't have a Carvana account, so for the sake of this assignment I left everything sorted by recommended. After refreshing the page, the cars shown change, but each piece of information stays in the same spot within the tile.

Search Cars - The tab that displays 21 cars for sale per page

Type of Car - Company and model

Package of Car - Any special features of the car or packages someone purchased with the car

Mileage - How many miles are on the car

Price - Price of the car as listed

Make of car

Shipping
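
The fields above arrive as raw text, so they usually need a little parsing before they can be analyzed. The following is a minimal sketch in base R, assuming a title looks like "2023 Ford Bronco Sport" and that prices and mileage carry symbols, commas, or units. The example strings are made up for illustration and are not taken from the scraped data.

```r
# Hypothetical raw strings, shaped like the text a listing tile might contain.
titles  <- c("2023 Ford Bronco Sport", "2021 Kia Soul", "2022 Lexus RX")
prices  <- c("$35,990", "$18,590", "$47,990")
mileage <- c("12,345 miles", "40,210 miles", "8,001 miles")

# Split "YEAR MAKE MODEL ..." using the first two spaces.
# Assumption: the make is a single word; a two-word make such as
# "Land Rover" would need a lookup table instead.
year  <- as.numeric(sub("^([0-9]{4}) .*$", "\\1", titles))
make  <- sub("^[0-9]{4} ([^ ]+).*$", "\\1", titles)
model <- sub("^[0-9]{4} [^ ]+ ?", "", titles)

# Remove everything that is not a digit before converting to numbers.
price_num   <- as.numeric(gsub("[^0-9]", "", prices))
mileage_num <- as.numeric(gsub("[^0-9]", "", mileage))

parsed <- data.frame(year, make, model, price_num, mileage_num)
print(parsed)
```

Splitting on spaces rather than on fixed character positions keeps the make separate from the model even when the names have different lengths.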

Issues

One issue is that when you are looking at cars by year, refreshing the page updates the cars shown, which makes it very difficult to keep the results the same for others who want to run this code as well. In addition, a financing calculator tile appears at random positions throughout the page.
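
One possible way to keep a stray tile from misaligning the columns is to select each tile first and then pull every field from inside it. The sketch below uses a toy page built with `minimal_html()`; the class names (`tile`, `title`, `price`, `promo`) are hypothetical stand-ins and are not Carvana's real selectors.

```r
library(rvest)

# Toy page: two car tiles plus one non-car tile that stands in for the
# financing calculator. All class names here are hypothetical.
page <- minimal_html('
  <div class="tile"><p class="title">2023 Ford Bronco</p><span class="price">$36,990</span></div>
  <div class="tile"><p class="promo">Financing calculator</p></div>
  <div class="tile"><p class="title">2022 Kia Soul</p><span class="price">$18,590</span></div>
')

tiles <- html_elements(page, "div.tile")

# html_element() (singular) returns exactly one result per tile and gives
# NA when the field is missing, so both columns have length(tiles) entries.
cars <- data.frame(
  title = html_text2(html_element(tiles, "p.title")),
  price = html_text2(html_element(tiles, "span.price"))
)

# Drop the tiles that are not car listings.
cars <- cars[!is.na(cars$title), ]
print(cars)
```

Because every column is built from the same set of tiles, a tile without a title or price shows up as a row of `NA` values that can be filtered out, rather than shifting the rows that follow it.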

Scraped Version of Carvana

Analysis

Which brand has the most 2023 cars for sale on Carvana, and which brand has the widest price range for its 2023 cars?

Across the cars listed by all of the companies, the 2023 cars on Carvana with the highest prices are, in multiple instances, Lexus models. The box plot also shows that Lexus has the most 2023 trucks for sale.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(rvest)

Attaching package: 'rvest'

The following object is masked from 'package:readr':

    guess_encoding
library(lubridate)
library(httr)



set_config(user_agent("*"))

scrape_carvana <- function(url) {
  
Carvana_url <-
  read_html(url)

## Retrieving Type of Car

Type_of_car <-
  Carvana_url %>% 
  html_elements("div.flex.flex-col.justify-between") %>% 
  html_elements("p.font-bold") %>% 
  html_text2()

Car_Brand <-
  Carvana_url %>% 
  html_elements("div.flex.flex-col.justify-between") %>% 
  html_elements("p.font-bold") %>% 
  html_text2() %>% 
  substr(6,20) %>% 
  as.character()


Year <-
  Carvana_url %>% 
  html_elements("div.flex.flex-col.justify-between") %>% 
  html_elements("p.font-bold") %>% 
  html_text2() %>% 
  substr(1,4) %>% 
  as.numeric()
## Retrieving Package of Car
Package_of_car <-
  Carvana_url %>% 
  html_elements("div.flex.gap-4.items-center.t-body-s.text-blue-6") %>% 
  html_element("p.truncate") %>% 
  html_text2()

## Mileage
Mileage <-
  Carvana_url %>% 
  html_elements("div.flex.gap-4.items-center.t-body-s.text-blue-6") %>% 
  html_elements("span.shrink-0") %>% 
  html_text2()
## Price of Car
Price_of_Car <-
  Carvana_url %>% 
  html_elements("div.-mb-\\[2px\\].flex.font-bold.gap-8.items-center.text-2xl.text-blue-6") %>% 
  html_text() %>% 
  as.character() %>% 
  str_replace_all("\\$","") %>% 
  str_replace_all("\\,","") %>% 
  as.numeric()



## Shipping Cost

Shipping_Car <-
  Carvana_url %>% 
  html_elements("div.flex.items-center.t-body-xs") %>% 
  html_text2()


carvana_df <-
  data.frame(Type_of_car,Year, Car_Brand, Package_of_car, Price_of_Car, Shipping_Car)

return(carvana_df) 
}

test <- 
  scrape_carvana("https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiRm9yZCJ9XX0sInNvcnRCeSI6Ik5ld2VzdEludmVudG9yeSJ9")

pages <-
  c("https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiRm9yZCJ9XX0sInNvcnRCeSI6Ik5ld2VzdEludmVudG9yeSJ9", #Ford,
    "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiS2lhIn1dfSwic29ydEJ5IjoiTmV3ZXN0SW52ZW50b3J5In0", #Kia,
    "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiVGVzbGEifV19LCJzb3J0QnkiOiJOZXdlc3RJbnZlbnRvcnkifQ", #Tesla,
    "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiVG95b3RhIn1dfSwic29ydEJ5IjoiTmV3ZXN0SW52ZW50b3J5In0", #Toyota,
    "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiTGV4dXMifV19LCJzb3J0QnkiOiJOZXdlc3RJbnZlbnRvcnkifQ", #Lexus,
    "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiTGV4dXMifV19LCJzb3J0QnkiOiJOZXdlc3RJbnZlbnRvcnkifQ" #Chevrolet (note: this URL is identical to the Lexus URL above)
    )

scrape_carvana_pages <- function(urls) {
  carvana_FFKTLC <- data.frame()
  
  for (i in seq_along(urls)) {
    print(paste("Collecting page", i, "of",length(urls), ":)", sep = ""))
    Sys.sleep(runif(1,5,15))
    
 carvana_FFKTLC <-
    scrape_carvana(urls[i]) %>% 
      mutate(page_id = urls[i]) %>% 
      bind_rows(carvana_FFKTLC)
    print(paste(urls[i], "collected", sep = " "))
    print(paste(nrow(carvana_FFKTLC), "total reviews collected so far!", sep = " "))
  }
  return(carvana_FFKTLC)
}

test_carvana <- 
  scrape_carvana_pages(pages) 
[1] "Collecting page1of6:)"
[1] "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiRm9yZCJ9XX0sInNvcnRCeSI6Ik5ld2VzdEludmVudG9yeSJ9 collected"
[1] "20 total reviews collected so far!"
[1] "Collecting page2of6:)"
[1] "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiS2lhIn1dfSwic29ydEJ5IjoiTmV3ZXN0SW52ZW50b3J5In0 collected"
[1] "40 total reviews collected so far!"
[1] "Collecting page3of6:)"
[1] "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiVGVzbGEifV19LCJzb3J0QnkiOiJOZXdlc3RJbnZlbnRvcnkifQ collected"
[1] "60 total reviews collected so far!"
[1] "Collecting page4of6:)"
[1] "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiVG95b3RhIn1dfSwic29ydEJ5IjoiTmV3ZXN0SW52ZW50b3J5In0 collected"
[1] "80 total reviews collected so far!"
[1] "Collecting page5of6:)"
[1] "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiTGV4dXMifV19LCJzb3J0QnkiOiJOZXdlc3RJbnZlbnRvcnkifQ collected"
[1] "100 total reviews collected so far!"
[1] "Collecting page6of6:)"
[1] "https://www.carvana.com/cars/filters?cvnaid=eyJmaWx0ZXJzIjp7Im1ha2VzIjpbeyJuYW1lIjoiTGV4dXMifV19LCJzb3J0QnkiOiJOZXdlc3RJbnZlbnRvcnkifQ collected"
[1] "120 total reviews collected so far!"
write_csv(test_carvana,"Carvana_Cars.csv")  

test_carvana %>% 
  select(Car_Brand, Price_of_Car, Year) %>%
  filter(Year == 2023) %>% 
  ggplot(aes(x = Car_Brand, y = Price_of_Car)) +
  geom_boxplot() +
  labs(title = "Comparison of Listed Prices Between Car Brands for 2023 Cars")
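
The box plot answers the question visually; the same comparison can also be put into numbers. This is a minimal sketch with dplyr, assuming a data frame with the same column names as `test_carvana`. The rows in `toy` are made up for illustration and are not scraped values.

```r
library(dplyr)

# Made-up rows using the same column names as the scraped data frame.
toy <- data.frame(
  Car_Brand    = c("Ford", "Ford", "Lexus", "Lexus", "Kia", "Lexus"),
  Year         = c(2023, 2023, 2023, 2023, 2023, 2021),
  Price_of_Car = c(35000, 38000, 42000, 61000, 24000, 30000)
)

# Count the 2023 listings per brand and measure how far apart the
# cheapest and most expensive listings are.
brand_summary <- toy %>%
  filter(Year == 2023) %>%
  group_by(Car_Brand) %>%
  summarise(
    n_cars       = n(),
    min_price    = min(Price_of_Car),
    max_price    = max(Price_of_Car),
    price_spread = max_price - min_price
  ) %>%
  arrange(desc(price_spread))

print(brand_summary)
```

Swapping `toy` for `test_carvana` would give one row per brand, sorted so the brand with the widest 2023 price range comes first.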


test_carvana %>% 
  select(Car_Brand, Price_of_Car, Year) %>%
  filter(Year == 2023) %>% 
  filter(Car_Brand == "Ford") %>% 
  ggplot(aes(x = Car_Brand, y = Price_of_Car)) +
  geom_boxplot() +
  labs(title = "Listed Prices of 2023 Ford Cars")

For Ford cars made in 2023, what is the price range in which you could sell one?

In the visual above you can see that the only 2023 Ford models being sold on Carvana are Broncos, and they are going for a range of $35,000 to $38,000, so if you wanted to sell a Ford Bronco you would most likely want to price it within that range.

What is the average price a 2023 car would go for?

For the 21 cars being sold from the year 2023, taking the average of their prices suggests that you could expect to make around $31,913 selling your 2023 car, with individual cars selling for anywhere from $30,000 to $60,000.
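
Since a few expensive listings can pull an average upward, it can help to report the median and the range next to the mean. A small base R sketch of that arithmetic follows; the four prices are made up for illustration and are not the 21 scraped values.

```r
# Made-up listing prices; one expensive car sits far above the rest.
toy_prices <- c(30000, 31500, 35000, 60000)

avg_price    <- mean(toy_prices)    # sum of the prices divided by the count
median_price <- median(toy_prices)  # middle value, less affected by outliers
price_range  <- range(toy_prices)   # lowest and highest price

cat("Mean:  ", avg_price, "\n")
cat("Median:", median_price, "\n")
cat("Range: ", price_range[1], "to", price_range[2], "\n")
```

With these toy numbers the mean is 39125 while the median is 33250, which shows how a single high-priced car shifts the average.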

Conclusion

In this data from Carvana, you can find a multitude of things. What I decided to focus on was a variety of different car companies, but I think that with some more work you could add other car companies. Eventually I will configure the scraper to include the different descriptions of each car so that you can better understand the package of each car.