library(plotly)
library(dplyr)



Problem 1

player_data <- read.csv("volleyball player_data.csv")

player_data %>%
  plot_ly(x = ~spike, y=~block, color = ~position_number, colors = c("red", "purple", "blue", "black", "pink", "yellow"),
          hoverinfo = "text", text = ~paste("Name ", name, "<br>",
                                            "Height (cm) ", height, "<br>", 
                                            "Weight (kg) ", weight, "<br>", 
                                            "Spike Height ", spike, "<br>", 
                                            "Block Height ", block)) %>%
  add_markers() %>%
  layout(title = "Female Volleyball Player Stats: Spike Height vs Block Height",
         xaxis = list(title = "Spike Height (cm)", range = c(150, 350)),
         yaxis = list(title = "Block Height (cm)", range = c(150, 350)))

I created a scatterplot using a women’s volleyball player statistics dataset found on Kaggle.com. In this plot, a player’s spike height (the maximum height at which they can spike a ball) is on the x-axis, and their block height (the maximum height at which they can block) is on the y-axis. Both values are in centimeters, since centimeters are a universal unit. In volleyball, a player’s vertical is crucial, but hitting and blocking are two very different actions: a spike has a powerful 2-3 step approach, while a block does not. So I wanted to know whether there was a correlation between the players’ two maximum heights. The graph shows that block height and spike height are correlated, although a few players deviate slightly from the trend, with a higher spike height and a lower block height. I did not run into many issues while creating this graph. The one slight issue was that the default axis limits were more spread out than the data, so I added a “range” to both the x- and y-axes covering where all of the data sat, which made the graph easier to read.
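To put a number on the relationship I read off the scatterplot, a Pearson correlation could be computed with base R's cor(). This is only a minimal sketch: the toy_players values below are made up for illustration and are not taken from the Kaggle dataset, so the printed value says nothing about the real players.

```r
# Toy values for illustration only; NOT taken from the Kaggle dataset.
toy_players <- data.frame(
  spike = c(280, 295, 300, 310, 305),
  block = c(270, 280, 290, 300, 285)
)

# Pearson correlation between spike and block height.
# use = "complete.obs" drops any rows where either value is missing.
r_toy <- cor(toy_players$spike, toy_players$block, use = "complete.obs")
print(r_toy)

# With the real data loaded above, the equivalent call would be:
# cor(player_data$spike, player_data$block, use = "complete.obs")
```

A value close to 1 would support the pattern seen in the plot, while the few players with a high spike height and a lower block height would pull it slightly down.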


Problem 2

data2<- read.csv("car_price_prediction_.csv")
#install.packages("plotly")



data2 %>% filter(Year >=2010) %>%
  plot_ly(x = ~Mileage, y = ~Price, color = ~Condition, frame = ~Year,
          hoverinfo = "text", text = ~paste("Brand ", Brand, "<br>", 
                                            "Fuel Type ", Fuel.Type, "<br>",
                                            "Price $", Price, "<br>",
                                            "Mileage ", Mileage)) %>%
  layout(title = "Car Price vs Mileage by Year and Condition (2010-2023)",
         xaxis = list(title = "Mileage (miles)"),
         yaxis = list(title = "Price (USD)")) %>%
  # Year label drawn at a fixed position; showlegend is a trace argument, not part of textfont
  add_text(x = 280000, y = 20000, text = ~Year, frame = ~Year,
           textfont = list(size = 20, color = toRGB("black")), showlegend = FALSE) %>%
  add_markers(frame = ~Year, color = ~Condition, showlegend = TRUE)

For this graph I used a car price prediction dataset from Kaggle.com. The dataset includes multiple variables that could determine a used car’s price, such as model, year made, mileage, and fuel type. I chose mileage for the x-axis because I thought it would have the most influence on the price of a used car: higher mileage should lower the price. Overall, I was quite surprised to see that there is little to no correlation between the price and mileage of the cars. Also, looking at the model year, I had assumed that used car prices would be higher for more recent models than for older ones, but the prices tend to stay around the same, with small spikes. When creating this graph, I decided to filter out car models older than 2010, because a car’s life span is typically around 13 years, and in reality most buyers would not want a car older than this. I had difficulty creating the legend. Originally I was going to use the Brand as the “color”, but the condition is a more important factor for a person evaluating a car to purchase.
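The two observations above (little correlation between price and mileage, and prices staying roughly level across model years) could also be checked numerically. The sketch below uses base R and a made-up toy_cars data frame that only borrows the Year, Mileage, and Price column names from the plot code; the values are invented for illustration and are not from the Kaggle dataset.

```r
# Toy rows for illustration only; NOT taken from the Kaggle dataset.
toy_cars <- data.frame(
  Year    = c(2009, 2010, 2010, 2015, 2015, 2023),
  Mileage = c(150000, 120000, 90000, 60000, 80000, 5000),
  Price   = c(30000, 25000, 41000, 38000, 22000, 35000)
)

# Same filter as in the plot: keep model years 2010 and newer.
recent <- subset(toy_cars, Year >= 2010)

# Correlation between mileage and price; values near 0 mean little linear relationship.
r_price_mileage <- cor(recent$Mileage, recent$Price, use = "complete.obs")
print(r_price_mileage)

# Median price per model year, to see whether newer models actually cost more.
median_by_year <- aggregate(Price ~ Year, data = recent, FUN = median)
print(median_by_year)
```

On the real data, replacing toy_cars with data2 would give the same two summaries for the cars shown in the animation.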