You may use any data of your choosing in the following problems, but I suggest choosing a data set that you find interesting or that would produce an interesting graph (so, don’t use something like the old iris data set). You will get more out of the project in the end, and it will look better to anyone you show it to in the future. If the data set comes from an R package, then reference the package. If the data set is from elsewhere, then upload a copy to Blackboard (.csv format).
Create a plotly graph of your choosing that represents at least two variables, one of which must be a categorical variable.
This plot can be a scatter plot, overlaid density plots (the graphing variable is continuous, with separate densities grouped by the categorical variable), etc. Choropleth maps could also be on the list; you have to admit they look kinda cool.
The graph must include:
customized hover text that is informative to the graphing elements in the plot
separate color to represent groups
labeled axes and appropriate title
Include at least a 1-paragraph discussion about the graph. Discuss what is being plotted and what information is being displayed in the graph. Discuss any information that the reader may gain from hovering the cursor over graphing elements. Discuss any issues/challenges you had (if any) while making the plot, and how you dealt with or overcame them.
Answer: The information being plotted is from a data set of food deliveries which I found on Kaggle. The data set contains various statistics related to the deliveries, such as the delivery time, restaurant zone, delivery mode, customer zone, and more. The plot below shows the density of delivery times for each delivery mode. The data set does not explain the difference between "Bicycle" and "Bike", so I assumed that "Bike" in this case represents a motorcycle. Hovering over a curve shows the delivery mode, the delivery time in minutes, and the estimated density at that point. Regarding information that may be gained from the plot, the first point is that bike delivery times appear to be approximately normally distributed. In contrast, the delivery time distribution for cars has a right (positive) skew. Furthermore, bike deliveries appear to have less variability in delivery times than the other delivery modes. Lastly, I did not encounter any issues with creating the plot below.
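One way to back up a visual reading of density curves is to compute the spread and skewness for each delivery mode. The sketch below is not part of the original analysis. It runs on a made-up toy data frame (`toy_df`) that only borrows the column names `delivery_mode` and `delivery_time_min`, and `skewness()` is a hypothetical helper based on sample moments, so the numbers it prints say nothing about the real deliveries.

```r
# Toy stand-in for the delivery data; the values are made up for illustration.
set.seed(1)
toy_df <- data.frame(
  delivery_mode = rep(c("Bike", "Car"), each = 500),
  delivery_time_min = c(rnorm(500, mean = 30, sd = 4),   # roughly symmetric
                        rexp(500, rate = 1 / 15) + 20)   # right-skewed
)

# Hypothetical helper: moment-based sample skewness.
# Values near 0 suggest symmetry; positive values suggest a right skew.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# One summary row per delivery mode: mean, standard deviation, skewness.
mode_summary <- do.call(rbind, lapply(
  split(toy_df, toy_df$delivery_mode),
  function(g) {
    data.frame(
      delivery_mode = g$delivery_mode[1],
      mean_min = mean(g$delivery_time_min, na.rm = TRUE),
      sd_min = sd(g$delivery_time_min, na.rm = TRUE),
      skew = skewness(g$delivery_time_min)
    )
  }
))
print(mode_summary)
```

Applied to the real data, a larger `sd_min` would support a claim of higher variability, and a clearly positive `skew` would support a claim of right skew.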
library(plotly)
## Warning: package 'plotly' was built under R version 4.5.2
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.5.2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
fdre_df <- read.csv("C:/Temp/Food_Delivery_Route_Efficiency_Dataset.csv")
bike_df <- fdre_df[fdre_df$delivery_mode == "Bike", ]
bicycle_df <- fdre_df[fdre_df$delivery_mode == "Bicycle", ]
car_df <- fdre_df[fdre_df$delivery_mode == "Car", ]
scooter_df <- fdre_df[fdre_df$delivery_mode == "Scooter", ]
d_bike <- density(bike_df$delivery_time_min, na.rm = TRUE)
d_bicycle <- density(bicycle_df$delivery_time_min, na.rm = TRUE)
d_car <- density(car_df$delivery_time_min, na.rm = TRUE)
d_scooter <- density(scooter_df$delivery_time_min, na.rm = TRUE)
# One filled density curve per delivery mode; hover values are rounded for readability
plot_ly() %>%
  add_lines(x = d_bike$x, y = d_bike$y, name = "Bike", fill = 'tozeroy', hoverinfo = "text",
            text = paste("Mode: Bike <br>",
                         "Minutes: ", round(d_bike$x, 1), "<br>",
                         "Density: ", signif(d_bike$y, 3))) %>%
  add_lines(x = d_bicycle$x, y = d_bicycle$y, name = "Bicycle", fill = 'tozeroy', hoverinfo = "text",
            text = paste("Mode: Bicycle <br>",
                         "Minutes: ", round(d_bicycle$x, 1), "<br>",
                         "Density: ", signif(d_bicycle$y, 3))) %>%
  add_lines(x = d_car$x, y = d_car$y, name = "Car", fill = 'tozeroy', hoverinfo = "text",
            text = paste("Mode: Car <br>",
                         "Minutes: ", round(d_car$x, 1), "<br>",
                         "Density: ", signif(d_car$y, 3))) %>%
  add_lines(x = d_scooter$x, y = d_scooter$y, name = "Scooter", fill = 'tozeroy', hoverinfo = "text",
            text = paste("Mode: Scooter <br>",
                         "Minutes: ", round(d_scooter$x, 1), "<br>",
                         "Density: ", signif(d_scooter$y, 3))) %>%
  layout(xaxis = list(title = 'Delivery Time (Minutes)'),
         yaxis = list(title = 'Density'),
         title = "Delivery Time (Minutes) Density")
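The four near-identical density traces could also be built in a loop, which avoids repeating the hover-text code for each delivery mode. The following is only a sketch under stated assumptions: `toy_df` is a made-up stand-in for the delivery data (only the column names `delivery_mode` and `delivery_time_min` are borrowed), and the plotting part runs only if the plotly package is installed.

```r
# Toy stand-in for the delivery data; values are made up for illustration only.
set.seed(42)
toy_df <- data.frame(
  delivery_mode = rep(c("Bike", "Bicycle", "Car", "Scooter"), each = 200),
  delivery_time_min = c(rnorm(200, 30, 4), rnorm(200, 40, 8),
                        rexp(200, 1 / 15) + 20, rnorm(200, 35, 6))
)

# One kernel density estimate per delivery mode, stored in a named list.
densities <- lapply(
  split(toy_df$delivery_time_min, toy_df$delivery_mode),
  density, na.rm = TRUE
)

# Build the plot only if plotly is available, adding one filled line per mode.
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly()
  for (m in names(densities)) {
    d <- densities[[m]]
    p <- plotly::add_lines(
      p, x = d$x, y = d$y, name = m, fill = "tozeroy",
      hoverinfo = "text",
      text = paste0("Mode: ", m,
                    "<br>Minutes: ", round(d$x, 1),
                    "<br>Density: ", signif(d$y, 3))
    )
  }
  p <- plotly::layout(
    p,
    title = "Delivery Time (Minutes) Density",
    xaxis = list(title = "Delivery Time (Minutes)"),
    yaxis = list(title = "Density")
  )
  # Print p (or end a chunk with p) to render the figure.
}
```

A design note: because the traces are generated from `names(densities)`, a new delivery mode in the data would get its own curve without any further code changes.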
Create an animated plotly graph with a data set of your choosing. This can be, but does not have to be, a scatter plot. Also, the animation does not have to take place over time. As mentioned in the notes, the frame can be set to a categorical variable. However, the categories the frames cycle through should be organized (if need be) such that the progression through them shows some pattern or trend.
This graph should include:
Aside from the graphing variable, a separate categorical variable. For example, in our animated scatter plot we color grouped the points by continent.
Appropriate axis labels and a title
Augment the frame label to make it more visible. This can include changing the font size and color to make it stand out more, and/or moving the frame label to a new location in the plotting region. Note, if you do this, make sure it is still clearly visible and does not obstruct the view of your plot.
Include at least a 1-paragraph discussion about the plot. Discuss what you are plotting and what trends can be seen throughout the animation. Discuss any issues, if any, you ran into in making the plot and how you overcame them.
Answer: The plot below uses the same data set of food deliveries as Problem #1 above. The plot shows the delivery time, distance, and restaurant zone for the deliveries on each date. The size of each marker is based on the length of the delivery route, with markers becoming larger as the route length increases. Regarding trends that can be seen throughout the animation, the first is that 01-02-2025 appears to be the busiest day for the west restaurant zone. Additionally, on 01-03-2025 delivery times and delivery distances were higher than on the other days. The only issue that I had with the chart below was getting the dates to display correctly. To fix this, I converted them to a factor.
library(plotly)
fdre_df <- read.csv("C:/Temp/Food_Delivery_Route_Efficiency_Dataset.csv")
# Store the order date as a factor so that each date becomes one animation frame
fdre_df$date <- as.factor(as.Date(fdre_df$order_time))
fdre_df %>%
  plot_ly(x = ~distance_km,
          y = ~delivery_time_min,
          hoverinfo = 'text',
          text = ~paste("Zone: ", restaurant_zone, "<br>",
                        "Distance: ", distance_km, "<br>",
                        "Time: ", delivery_time_min)) %>%
  add_markers(frame = ~date,
              size = ~route_length_km,
              color = ~restaurant_zone,
              marker = list(sizemode = "diameter")) %>%
  layout(xaxis = list(title = "Distance (km)"),
         yaxis = list(title = "Delivery Time (Minutes)"),
         title = "Delivery Time vs. Distance Per Zone & Day") %>%
  animation_slider(currentvalue = list(font = list(color = "black"), prefix = "Date: "))
## Warning: `line.width` does not currently support multiple values.
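Because the animation frames come from a categorical variable, the order of the factor levels matters when the frames should progress chronologically. The sketch below is an illustration rather than the code used for the plot: the timestamps are made up, and the assumption that `order_time` holds ISO-style date-times is not confirmed by the data set. It parses the dates first and then fixes the level order explicitly, so MM-DD-YYYY labels do not fall back to alphabetical sorting.

```r
# Toy timestamps standing in for order_time; the values are made up.
order_time <- c("2025-01-03 18:05:00", "2024-12-31 12:40:00",
                "2025-01-02 09:15:00", "2025-01-02 20:30:00")

# Parse to Date first so that sorting is chronological rather than alphabetical.
order_date <- as.Date(order_time)

# Label frames as MM-DD-YYYY, but set the level order from the underlying dates.
date_levels <- format(sort(unique(order_date)), "%m-%d-%Y")
frame_date <- factor(format(order_date, "%m-%d-%Y"), levels = date_levels)

levels(frame_date)
```

Sorting the MM-DD-YYYY strings directly would place "12-31-2024" after the January 2025 dates. With the levels set from the parsed dates, the frames should follow calendar order, although it is worth confirming this on the rendered slider.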
What to turn in:
Knit your final assignment to an HTML document and publish it to an RPubs page.
Submit (1) the .Rmd file and (2) the link to this page in Blackboard (the link can be in a Word document or some other form).