Assignment 2

Author

Shanna Dubay

Introduction

The ICE Movies data set is a collection of movies, complete with rankings and the revenue each film earned per weekend and in total to date. It covers films from several different years and, for each film, includes both weekend and total revenue amounts.

After reading through the variables and seeing what data was there, I noticed that some distributors appeared more often than others. I also noticed several duplicated ranks, titles, and start dates. I knew the data set would be complicated to work with because values were so often repeated across rows.
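A quick way to check observations like these is to count how often each value appears. The sketch below is only an illustration: it uses a few made-up rows rather than the real file, and it assumes the column names `distributor.name` and `weekend.start` that appear in the plotting code.

```r
library(dplyr)

# Made-up rows for illustration only; the real rows come from ice_movies.
toy_movies <- tibble(
  distributor.name = c("Samfilm", "Sena", "Samfilm", "Myndform", "Samfilm"),
  weekend.start = as.Date(c("2019-01-04", "2019-01-04", "2019-01-11",
                            "2019-01-11", "2019-01-11"))
)

# How many rows each distributor has, most frequent first.
distributor_counts <- toy_movies |>
  count(distributor.name, sort = TRUE)

# Start dates that appear on more than one row.
repeated_dates <- toy_movies |>
  count(weekend.start) |>
  filter(n > 1)

print(distributor_counts)
print(repeated_dates)
```

Swapping `toy_movies` for `ice_movies` should give the same kind of summary for the full data set, as long as the column names match.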

Code
library(tidyverse)
ice_movies <- read_delim(
  "https://query.data.world/s/pmbfldxflx7ttdyfs23cx3abehcl5c",
  delim = ";",
  escape_double = FALSE,
  trim_ws = TRUE,
  locale = locale(encoding = "ISO-8859-1")
)
ice_movies_summary <- ice_movies |>
  group_by(distributor.name) |>
  summarise(total_revenue = sum(total.box.o.to.date, na.rm = TRUE)) |>
  ungroup()
# Order the bars from highest to lowest total revenue
ice_movies_summary <- ice_movies_summary |>
  mutate(distributor.name = fct_reorder(distributor.name, total_revenue, .desc = TRUE))
ggplot(ice_movies_summary, aes(x = distributor.name, 
                               y = total_revenue, 
                               fill = distributor.name)) + 
  geom_col() + 
  labs(title = "Distributor Revenues Based on ICE Movies", 
       x = "Distributors", 
       y = "Revenue to Date") + 
  theme_minimal() + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

From the Distributor Revenues Based on ICE Movies visual, I could see right away that Samfilm had the highest revenue by far. Sena and Myndform were the other two leading distributors, and together these three made up most of the box office revenue in this data set. The other distributors contributed only minuscule amounts. I chose a bar graph because it is a good way to contrast how revenue is distributed among the distributors. I feel it captures how much each distributor contributed to the overall box office amounts. It does not show which distributors had the highest-ranking films, but I would expect the higher-ranking films to make the most money.

Code
library(tidyverse)
ice_movies <- read_delim(
  "https://query.data.world/s/pmbfldxflx7ttdyfs23cx3abehcl5c",
  delim = ";",
  escape_double = FALSE,
  trim_ws = TRUE,
  locale = locale(encoding = "ISO-8859-1"))
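ggplot(ice_movies, aes(x = weekend.start, y = gross.box.o.weekend)) +
  # linewidth replaces the size argument for lines (size is deprecated since ggplot2 3.4.0)
  geom_line(color = "darkgreen", linewidth = 1.2) +
  geom_smooth(method = "loess", color = "gold", se = FALSE, linewidth = 1) +
  labs(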
    title = "Total Weekend Box Office Over Time",
    x = "Date",
    y = "Total Revenue ($)",
    caption = "Smoothed Trend in Gold (Like a Pot of Gold!)"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    text = element_text(color = "darkgreen"),
    axis.title = element_text(color = "gold", face = "bold"),
    plot.title = element_text(color = "darkgreen", face = "bold", size = 16),
    plot.caption = element_text(color = "gold", face = "italic")
  )

For the temporal visualization, I chose a line graph with a smoothed trend line. I feel this shows not only the cyclical pattern in weekend revenue but also that the overall trend for box office movies is slowing down. I think this is due to the growing number of people who stream movies rather than going to movie theaters in person. The graph also clearly shows the period just after 2020 when people were not allowed to go to the movies because of the pandemic. I used only the weekend box office amounts as a way to focus on the time series.
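Because the data has one row per film per weekend, a line drawn through the raw rows connects individual film amounts rather than a single weekend total. A minimal sketch of one alternative is to sum the weekend amounts per start date before plotting. The rows below are made up for illustration, and the name `weekend_total` is my own; only the column names `weekend.start` and `gross.box.o.weekend` come from the code above.

```r
library(dplyr)
library(ggplot2)

# Made-up rows for illustration only; the real rows come from ice_movies.
toy_movies <- tibble(
  weekend.start = as.Date(c("2019-01-04", "2019-01-04",
                            "2019-01-11", "2019-01-11")),
  gross.box.o.weekend = c(100, 250, 80, NA)
)

# One row per weekend: add up every film's weekend gross for that date.
weekend_totals <- toy_movies |>
  group_by(weekend.start) |>
  summarise(weekend_total = sum(gross.box.o.weekend, na.rm = TRUE),
            .groups = "drop")

# Plot the aggregated totals instead of the individual film rows.
ggplot(weekend_totals, aes(x = weekend.start, y = weekend_total)) +
  geom_line(color = "darkgreen", linewidth = 1.2)
```

This also assumes `weekend.start` is stored as a date. If it was read in as text, it would need to be converted first.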

I think this graph would be easier to read if the revenue axis showed plain numbers instead of exponential notation. To do that, I would need to rescale the monetary amounts into a unit such as millions so they are more digestible.
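One way to get more readable axis labels without changing the data itself is to format the labels through the scales package. This is only a sketch on made-up values: `toy_revenue` and `millions` are names I invented for the example, and the choice of one decimal place is an assumption about what looks best.

```r
library(ggplot2)
library(scales)

# Made-up values for illustration only.
toy_revenue <- data.frame(
  weekend.start = as.Date(c("2019-01-04", "2019-01-11", "2019-01-18")),
  gross.box.o.weekend = c(2500000, 4000000, 3200000)
)

# Label formatter: divide by one million and append "M".
millions <- label_number(scale = 1e-6, suffix = "M", accuracy = 0.1)

ggplot(toy_revenue, aes(x = weekend.start, y = gross.box.o.weekend)) +
  geom_line(color = "darkgreen", linewidth = 1.2) +
  scale_y_continuous(labels = millions) +
  labs(x = "Date", y = "Weekend Revenue (millions)")
```

Only the labels change here. The plotted values stay in their original units, so the shape of the line is the same.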

Conclusion

Both of these visuals helped tell the story of the data by showing how revenue was shared across distributors and how it changed over time. The first visual showed the distributors with one clear leading company. The second visual helped the viewer see how the films did monetarily over time, with a trend line added to show the decline.

I feel that my analysis shows that there are very few distributors in this data set. I also feel it brought to light the effect of the pandemic and the increasing demand for streaming services over going to a movie theater. I would love to see the data set extended with more years after 2021. I expect the trend line would continue to decline.