Assignment 7

Author

Drew Vonder Meulen

Topic

I intend to scrape listings from CarGurus to directly compare different trim levels and model years of the Audi S7. I will look at the most common price, how mileage and year affect the price, how the change from the V8 to the V6 affects pricing even though the V6 cars are newer, and more.

library(tidyverse) 
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(httr)  
library(rvest)    

Attaching package: 'rvest'

The following object is masked from 'package:readr':

    guess_encoding
library(lubridate)  
library(magrittr)   

Attaching package: 'magrittr'

The following object is masked from 'package:purrr':

    set_names

The following object is masked from 'package:tidyr':

    extract
library(jsonlite)    

Attaching package: 'jsonlite'

The following object is masked from 'package:purrr':

    flatten
library(scales)     

Attaching package: 'scales'

The following object is masked from 'package:purrr':

    discard

The following object is masked from 'package:readr':

    col_factor
library(knitr) 

Loading the Data

all_listings <- read.csv("audi_s7_listings.csv")

Visualizations

Let's first take a look at how many listings we have for each model year.

all_listings %>% 
  ggplot(aes(x = year)) +
  geom_histogram(bins = 25, fill = "blue2") +
  scale_x_continuous(breaks = seq(2013, 2025, by = 1)) +
  labs(title = "Distribution of Vehicle Year") +
  theme_minimal(base_size = 10)

We can see listings from across the years the Audi S7 has been produced. The majority come from the 2013, 2014, and 2020 model years. These are the years in which the car received a design refresh and a new engine. I believe this suggests that the first couple of years after a redesign are the least reliable, since in later years Audi had time to improve the components that tend to break. Listing counts alone cannot confirm this, though, because they may also reflect how many cars were sold in those years.
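To back up the histogram with exact numbers, the listings can also be counted per model year. This is a minimal sketch in base R; `toy_listings` is made-up example data standing in for `all_listings`, so the counts below are illustrative only.

```r
# Made-up example data (hypothetical); the real analysis would use all_listings$year
toy_listings <- data.frame(year = c(2013, 2013, 2014, 2016, 2020, 2020, 2020))

# table() counts the listings in each model year;
# sort(decreasing = TRUE) puts the most common year first
year_counts <- sort(table(toy_listings$year), decreasing = TRUE)
print(year_counts)
```

The first entry of `year_counts` is the model year with the most listings.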

Now let's take a look at how the price is affected by year.

all_listings %>%
  ggplot(aes(x = year, y = price)) +
  geom_point(alpha = 0.4) +
  scale_x_continuous(breaks = seq(2013, 2025, by = 1)) +
  labs(title = "Price vs Vehicle Year",
       x = "Year",
       y = "Price") 
Warning: Removed 4 rows containing missing values or values outside the scale range
(`geom_point()`).

This price-to-year relationship is to be expected. It's a slightly exponential curve: depreciation is fastest while the car is new and then slows down.
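One way to check the exponential shape is to fit a straight line to the log of price, since a constant yearly percentage drop becomes linear on the log scale. The sketch below uses made-up prices generated from an assumed 12% yearly decline, not the scraped data, and `age` is a hypothetical column (for example, the newest model year minus `year`).

```r
# Hypothetical data: prices generated from an assumed 12% drop per year of age
age   <- 0:10
price <- 90000 * 0.88^age

# Linear fit on the log scale: log(price) = intercept + slope * age
fit <- lm(log(price) ~ age)

# Turn the slope back into a yearly depreciation rate
yearly_rate <- 1 - exp(coef(fit)[["age"]])
print(round(yearly_rate, 3))
```

On real listings the points will not sit exactly on the fitted line, so the estimated rate is only a rough summary of the curve.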

Let's now see how mileage changes the price of an Audi S7, regardless of year.

all_listings %>%
  ggplot(aes(x = mileage, y = price)) +
  geom_point(alpha = 0.4) +
  scale_x_continuous(breaks = seq(0, 150000, by = 20000)) +
  labs(title = "Price vs Vehicle Mileage",
       x = "Mileage",
       y = "Price") 
Warning: Removed 4 rows containing missing values or values outside the scale range
(`geom_point()`).

We can see that the relationship between mileage and price is fairly similar to the relationship between year and price.

Let's now see the distribution of the deal rating each listing received: “Fair Deal”, “Good Deal”, or “Great Deal”.

all_listings %>% 
  ggplot(aes(x = deal)) +
  geom_bar(fill = "green4") +
  labs(title = "Count of Listings by Deal",
       x = "Deal",
       y = "Count") +
  theme_minimal(base_size = 10) 

Most listings fall into the “Good Deal” category, while “Great Deal” listings make up a much smaller share. This shows that strong discounts are less common in the market. Sellers tend to price vehicles close to market value, which leads to more fairly priced listings than standout bargains. For you as a buyer, this means “Great Deal” listings are scarce and often get picked up quickly. When one appears, it signals a price that is well below similar listings. Acting quickly in those cases can lead to better value.
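The share of each deal rating can be put into numbers rather than read off the bar chart. This is a small base R sketch; `toy_deals` is made-up example data standing in for `all_listings$deal`, so the proportions are illustrative only.

```r
# Made-up deal ratings (hypothetical); the real analysis would use all_listings$deal
toy_deals <- c("Good Deal", "Good Deal", "Fair Deal",
               "Good Deal", "Great Deal", "Fair Deal")

# table() counts each rating; prop.table() converts the counts to shares of the total
deal_share <- prop.table(table(toy_deals))
print(round(deal_share, 2))
```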

Lastly, let's allow users to enter the year of the Audi S7 they want and instantly see the average price, the lowest price, the highest price, and how many listings there are.

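avg_price <- function(year_input) {
  
  # Keep only the listings for the requested model year
  result <- all_listings %>%
    filter(year == year_input)
  
  # Stop early when there is nothing to summarise
  if (nrow(result) == 0) {
    print(paste("No listings found for a", year_input, "Audi S7"))
    return(invisible(NULL))
  }
  
  # Summary statistics; na.rm = TRUE skips listings with a missing price
  avg  <- mean(result$price, na.rm = TRUE)
  low  <- min(result$price, na.rm = TRUE)
  high <- max(result$price, na.rm = TRUE)
  n    <- nrow(result)  # counts every listing for that year
  
  # dollar() from the scales package formats the numbers as currency
  print(paste("Vehicle:        ", year_input, "Audi S7"))
  print(paste("Listings Found: ", n))
  print(paste("Average Price:  ", dollar(avg)))
  print(paste("Lowest Listing: ", dollar(low)))
  print(paste("Highest Listing:", dollar(high)))
  
}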

avg_price(2014)
[1] "Vehicle:         2014 Audi S7"
[1] "Listings Found:  20"
[1] "Average Price:   $19,507.85"
[1] "Lowest Listing:  $15,584"
[1] "Highest Listing: $36,000"
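If someone wants to compare several years at once, the same statistics can be returned as a table instead of printed text. The sketch below is one possible variant, not a replacement for `avg_price()`; the helper `summarise_prices()` and the data in `toy_listings` are hypothetical and written in base R so the block runs on its own.

```r
# Made-up listings (hypothetical); the real analysis would pass all_listings
toy_listings <- data.frame(
  year  = c(2014, 2014, 2014, 2020, 2020),
  price = c(16000, 20000, NA, 60000, 70000)
)

# Hypothetical helper: one row per model year with count, average, low, and high
summarise_prices <- function(listings) {
  # Drop listings without a price so they do not affect the statistics
  priced <- listings[!is.na(listings$price), ]
  # Split the prices into one vector per model year
  groups <- split(priced$price, priced$year)
  data.frame(
    year     = as.integer(names(groups)),
    listings = as.integer(lengths(groups)),
    average  = unname(vapply(groups, mean, numeric(1))),
    low      = unname(vapply(groups, min, numeric(1))),
    high     = unname(vapply(groups, max, numeric(1)))
  )
}

price_summary <- summarise_prices(toy_listings)
print(price_summary)
```

Unlike `avg_price()`, this variant counts only listings that have a price, so its listing count can be lower than the number of rows for that year.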