Assignment 2

Author

Abdilraouf Mohamed

Introduction

This study examines movie attendance in Iceland using a dataset that records how films performed at the box office over the years. The data include each film's title, its weekend box office takings and cumulative box office total, its weekend admissions, and the start date of each weekend. Because a film appears once for every weekend it is showing, I grouped the records by film title and kept each film's highest cumulative box office total, which let me rank the films and display the top 20 earners. I also examined how weekend admissions changed over time to see whether there were clear seasonal patterns or longer-term trends. The aim of this report is to give a clear picture of which movies earn the most and how audience attendance shifts over time, and in doing so to shed light on how the Icelandic movie market changes.
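Because the cumulative column (`total.box.o.to.date` in the code below) grows every weekend a film stays in cinemas, a film's largest value is its final total, so taking the maximum per title removes the repeated rows. The following is a minimal base-R sketch of that deduplication and ranking on a small hypothetical data frame; the titles, numbers, and column names are invented for illustration and are not drawn from the Icelandic data.

```r
# Hypothetical weekend-level records: each film appears once per weekend it
# was showing, and its running total only grows over time.
weekends <- data.frame(
  film_title    = c("Film A", "Film A", "Film B", "Film B", "Film B", "Film C"),
  total_to_date = c(1200, 2500, 800, 1900, 2300, 3100)
)

# Keep one row per film: its largest (i.e. final) cumulative total.
per_film <- aggregate(total_to_date ~ film_title, data = weekends, FUN = max)

# Rank films from highest to lowest total and keep the top 2.
top_films <- head(per_film[order(per_film$total_to_date, decreasing = TRUE), ], n = 2)
print(top_films)
```

With the real data the same idea applies with n = 20, which is what the `group_by()` / `summarize(max(...))` / `slice_head()` pipeline below does in tidyverse style.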

Distribution Visualization and Analysis

Code

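library(tidyverse)

# Read the semicolon-delimited file; the ISO-8859-1 locale keeps Icelandic
# characters in film titles intact
ice_movies <- read_delim(
  "https://query.data.world/s/pmbfldxflx7ttdyfs23cx3abehcl5c",
  delim = ";",
  escape_double = FALSE,
  trim_ws = TRUE,
  locale = locale(encoding = "ISO-8859-1")
)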

Code

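# One row per film: the cumulative "to date" total peaks on a film's final
# weekend, so its maximum is the film's overall box office
boxoffice_data <- ice_movies |>
  group_by(film_title = film.title) |>
  summarize(total_box = max(`total.box.o.to.date`, na.rm = TRUE)) |>
  ungroup()

# Keep the 20 highest-earning films
boxoffice_top20 <- boxoffice_data |>
  arrange(desc(total_box)) |>
  slice_head(n = 20)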

# Horizontal bar chart: bars sorted by total, film titles on the vertical axis
ggplot(boxoffice_top20, aes(x = reorder(film_title, total_box), y = total_box)) +
  geom_col(fill = "purple") +
  coord_flip() +
  # Write out large numbers with thousands separators instead of "e" notation
  scale_y_continuous(labels = scales::comma) +
  labs(
    title = "Top 20 Films by Total Box Office Earnings",
    x = "Film Title",
    y = "Total Box Office (ISK)"
  ) +
  theme_minimal()

Analysis Reflection

The horizontal bar chart shows how unevenly total box office earnings are spread across the top 20 films. A handful of films earn far more than the rest, even within this group of top performers. This points to a concentration of revenue in which a few hit films dominate the market, a pattern often seen in the entertainment business. In other words, earnings drop sharply from the highest-grossing films to those at the bottom of the top 20, so a small number of films account for a large share of the group's total revenue.

I chose a horizontal bar chart because it is well suited to comparing distinct categories such as film titles, especially when those categories have long labels. Jonathan Schwabish says in Chapter 4 of Better Data Visualizations, “with long axis labels, consider rotating the chart to make the labels horizontal and easier to read”. Flipping the coordinates keeps the film titles readable and focuses the viewer's attention on the differences in earnings. Formatting the axis labels with commas also follows Schwabish's advice on reducing the mental effort needed to read large numbers. Overall, the chart conveys its main message clearly: among the top 20, a small number of films generate a disproportionately large share of box office revenue, and the simplicity of the horizontal bar chart lets viewers grasp the differences among the films quickly.

The downside of focusing only on the top 20 films is a limited view of the whole dataset. Although the chart highlights the income gaps among the top performers, it does not show the full range of film performance. Schwabish stresses the need for context in data visualizations and suggests that additional charts or descriptions can provide it. A supplementary chart showing the full distribution, for instance, would show how exceptional the top 20 really are. Nevertheless, the chosen layout offers clarity, ease of understanding, and efficient interpretation of box office revenue patterns, and it fits well with the ideas presented in Better Data Visualizations.
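The claim that a few films take a disproportionate share of revenue can also be checked numerically rather than only by eye. Below is a minimal base-R sketch of a "top-k share" measure: the fraction of all box office revenue earned by the k highest-grossing films. The function name `top_k_share` and the example totals are hypothetical; with the real data one could pass `boxoffice_data$total_box` from the code above, which covers every film rather than only the top 20.

```r
# Share of all box office revenue earned by the k highest-grossing films.
top_k_share <- function(totals, k) {
  totals <- sort(totals[!is.na(totals)], decreasing = TRUE)
  k <- min(k, length(totals))          # guard against k larger than the data
  sum(totals[seq_len(k)]) / sum(totals)
}

# Hypothetical per-film totals (illustration only); they sum to 1000.
totals <- c(500, 300, 100, 50, 25, 15, 10)

print(top_k_share(totals, 2))  # 0.8: the top 2 films earn 80% of the total
```

Reporting a single share such as "the top 20 earn X% of all revenue" would give the context the reflection asks for without needing a second chart.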

Temporal Analysis

Code

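# Convert each weekend start to a date, then collapse it to the first day of its month
temporal <- ice_movies |>
  mutate(
    weekend_date = as.Date(weekend.start, format = "%Y-%m-%d"),
    month = as.Date(format(weekend_date, "%Y-%m-01"))
  ) |>
  # Sum weekend admissions within each calendar month
  group_by(month) |>
  summarize(total_admissions = sum(adm.weekend, na.rm = TRUE)) |>
  ungroup()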

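# Monthly totals as an orange line with points, plus a black LOESS trend line
ggplot(temporal, aes(x = month, y = total_admissions)) +
  geom_line(color = "orange", linewidth = 1) +
  geom_point(color = "orange") +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE, color = "black") +
  # Label the x-axis only every four months to avoid crowding
  scale_x_date(date_breaks = "4 months", date_labels = "%b %Y") +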
  labs(
    title = "Monthly Admissions Over Time",
    x = "Month",
    y = "Total Admissions"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Analysis Reflection

The graph shows total weekend admissions for each month over time. Attendance fluctuates, with some months drawing far more moviegoers than others. Each orange dot is one month's total, the orange line connects consecutive months, and the smooth black LOESS line shows the general trend, which helps the viewer judge whether attendance was rising, falling, or holding steady. To avoid overcrowding, the x-axis labels only every fourth month (February, June, and October) rather than every date, although every month is still plotted. The dots and lines contrast clearly with the white background, and following the trend line is easier than inspecting every single point, so the chart makes the ups and downs in attendance easy to recognize. The sparse axis labels keep the chart tidy, but they make it harder to tell exactly which month a given point belongs to. Adding labels for significant spikes or major film releases would help explain why attendance was high or low at certain times. Although the graph does a good job of displaying general trends in cinema attendance, further data would be needed to explain the causes of those variations.
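One way to decide which spikes deserve a label is to flag months whose total sits well above a typical month. The sketch below aggregates hypothetical weekend records to monthly totals in base R (mirroring the `group_by(month)` step above) and flags months above 1.5 times the median monthly total. The dates, admissions, column names, and the 1.5 threshold are all assumptions chosen for illustration, not values from the dataset.

```r
# Hypothetical weekend-level records (dates and admissions are invented).
weekends <- data.frame(
  weekend_start = as.Date(c("2020-01-03", "2020-01-17", "2020-02-07",
                            "2020-03-06", "2020-03-20", "2020-04-03")),
  adm_weekend   = c(4000, 3000, 5000, 9000, 8000, 4500)
)

# Collapse each date to the first day of its month, then sum per month.
weekends$month <- as.Date(format(weekends$weekend_start, "%Y-%m-01"))
monthly <- aggregate(adm_weekend ~ month, data = weekends, FUN = sum)

# Flag months more than 1.5x the median monthly total (assumed threshold).
threshold <- 1.5 * median(monthly$adm_weekend)
monthly$spike <- monthly$adm_weekend > threshold
print(monthly)
```

Using the median rather than the mean keeps one very large month from raising the threshold so much that it hides itself; the flagged months could then be labelled on the chart or matched against major releases.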

Conclusion

In conclusion, this analysis offers an interesting look at how films perform at the Icelandic box office and how audiences behave. The distribution analysis shows a clear gap between the highest-grossing films and the others in the top 20. Weekend admissions, meanwhile, show that the number of people going to the movies does not stay constant. It changes over time, probably because of seasonal effects, holidays, or the release of major films. These insights show that even though a few films are extremely popular and earn a great deal, attendance can vary considerably from month to month. This report highlights the differences in how much each film earns and points to what might be driving changes in audience attendance. Finally, the results give good reason to investigate further the factors that affect how many people go to the movies in Iceland and how well films perform there.

AI Use

Generative AI Use Link:

https://chatgpt.com/share/67c23121-d350-800f-8224-dd4d553bb09b

I used ChatGPT to assist me in this assignment. In my first chart, the y-axis showed the large box office figures for the top 20 films in scientific notation (with an “e”), which made them hard to read. With the help of AI, I used scale_y_continuous(labels = scales::comma) to write the numbers out in full so that the reader can see how much each film in the top 20 earned. In my second chart, my initial code aggregated weekend admissions by year, and the resulting line chart did not reveal much about trends. With the help of AI, I changed the code to aggregate admissions by month and to label the x-axis every four months, so that the trends are easier to see and the data are clearer to read and understand.