library(tidyverse)
ice_movies <- read_delim(
  "https://query.data.world/s/pmbfldxflx7ttdyfs23cx3abehcl5c",
  delim = ";",
  escape_double = FALSE,
  trim_ws = TRUE,
  locale = locale(encoding = "ISO-8859-1")
)
This dataset contains information about films, their distributors, their rankings, admissions, and box office numbers.
For this assignment, I focused on the distributors of these movies, comparing them by the number of their appearances in the top ranks and by their admissions.
A. Visualization
ice_movies |>
  count(distributor.name) |>
  ggplot(aes(x = fct_reorder(distributor.name, n, .desc = TRUE), y = n)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = paste0(round(n))),
            position = position_stack(vjust = .5),
            color = "black") +
  labs(x = "Distributor Name", y = "Number of Appearances", title = "Movie Distributor Appearances in the last 4 Years")
B. Analysis and Reflection
Key Distributional Patterns:
The first distributional pattern I notice is how dominant Samfilm is in number of appearances compared with the other distributors: the second-place distributor, Myndform, has only 2/3 of the appearances that Samfilm has. Noticeably, there are also two stragglers at the end, one with only 5 appearances and the other with only 1.
Visualization Choices:
My main visualization choice was to keep the chart simple. I followed the choices made in the presentation on Brightspace, using the steelblue color and placing the number of appearances in the center of each bar with vjust = .5. I also decided to show the actual number of appearances instead of percentages because of the outlying bars with 1 and 5 appearances, which I think is a lot more intuitive.
Critical Evaluation
I would say this is the better of the 2 graphs I have in this assignment, mainly because there’s not much else I would change. The only thing I would think about adjusting is the positioning of the numbers on the bars, as the counts for the last 2 distributors can be slightly difficult to read because of how small their bars are.
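One possible adjustment for the small bars, sketched here under assumptions rather than tested on the real data: drop position_stack() and lift each label just above its bar with vjust = -0.5, so the counts on very short bars are not squeezed inside them. The toy_counts values and the "Distributor D" and "Distributor E" names below are made up purely for illustration; only the distributor.name and n column names follow the chart above.

```r
library(tidyverse)

# Toy counts, invented for illustration only (not the real data).
toy_counts <- tibble(
  distributor.name = c("Samfilm", "Myndform", "Sena",
                       "Distributor D", "Distributor E"),
  n = c(300, 200, 150, 5, 1)
)

p_counts <- toy_counts |>
  ggplot(aes(x = fct_reorder(distributor.name, n, .desc = TRUE), y = n)) +
  geom_col(fill = "steelblue") +
  # vjust = -0.5 lifts each label just above the top of its bar,
  # so counts on very short bars stay readable.
  geom_text(aes(label = n), vjust = -0.5, color = "black") +
  # Extra headroom at the top so the tallest bar's label is not clipped.
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(x = "Distributor Name", y = "Number of Appearances")

p_counts
```

The trade-off is that the labels no longer sit inside the bars, so the chart looks a little less like the Brightspace example.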
A. Visualization
distributor.names <- c("Sena", "Myndform", "Samfilm")
ice_movies |>
  filter(distributor.name %in% distributor.names) |>
  ggplot(aes(x = blog.date, y = adm.weekend)) +
  geom_line(color = "steelblue", linewidth = .5) +
  facet_wrap(vars(distributor.name))
B. Analysis and Reflection
Key Temporal Patterns
I think a really interesting trend that all three of these graphs have in common is how weekend admissions taper off around the beginning of 2020, which is most likely because of COVID-19. Another pattern is that Samfilm has the highest admission weekends, most likely due to the number of films it has in the top-ranked weekends.
Visualization Choices
I made quite a few visualization changes as I went. First, I started with everything on one graph, but it was way too cluttered with the amount of information shown, so I switched to the multi-graph (faceted) option instead. Second, I dropped the bottom 3 distributors to simplify what we are looking at, since I didn’t think we needed to see all 6 to get the point across. Lastly, I stuck with the steelblue color because I wanted a consistent theme across this assignment.
Critical Evaluation
I think this shows exactly what I wanted to show, although it could be tweaked slightly. First, I spent a lot of time trying to figure out how to add a title to each graph, but every attempt ended up combining the graphs into one again, so unfortunately I had to stick with the defaults for the x-axis label, y-axis label, and title. Second, I wanted to sort the graphs so that Samfilm comes first, Myndform second, and Sena third, but I couldn’t find a way to do that either. Overall, though, I think I did a good job creating a graph that gets my point across with just a few visual flaws.
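A possible way to address both issues, offered as a minimal sketch and not necessarily the only approach: facet_wrap() lays out panels in factor-level order, so converting distributor.name to a factor with the levels in the desired order should put Samfilm first, and labs() adds one overall title plus axis labels without merging the panels. The toy_weekends data is invented for illustration, it assumes blog.date is a Date column, and the label text is only a suggestion.

```r
library(tidyverse)

# Toy data, invented for illustration only (not the real admissions).
toy_weekends <- tibble(
  distributor.name = rep(c("Sena", "Myndform", "Samfilm"), each = 3),
  blog.date = rep(as.Date(c("2019-01-06", "2019-01-13", "2019-01-20")),
                  times = 3),
  adm.weekend = c(120, 90, 60, 200, 150, 110, 400, 310, 250)
)

p_facets <- toy_weekends |>
  # facet_wrap() orders panels by factor level,
  # so set the levels in the order the panels should appear.
  mutate(distributor.name = factor(
    distributor.name,
    levels = c("Samfilm", "Myndform", "Sena")
  )) |>
  ggplot(aes(x = blog.date, y = adm.weekend)) +
  geom_line(color = "steelblue", linewidth = .5) +
  facet_wrap(vars(distributor.name)) +
  # labs() sets one overall title and the axis labels; panels stay separate.
  labs(
    title = "Weekend Admissions for the Top 3 Distributors",
    x = "Date",
    y = "Weekend Admissions"
  )

p_facets
```

With the real data, the same mutate() step would go right after the filter() call, before ggplot().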
Overall, I think I did a great job sticking to a story: first showing how all of the movie distributors compare to each other, then staying with the same information and narrowing it down to how successful the top 3 distributors were over time. Another question my analysis raises is how much gross box office each of these distributors collected over time.