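# Load the tidyverse (readr for import, dplyr for wrangling, ggplot2 for plotting)
library(tidyverse)

# Read the semicolon-delimited file, which is encoded as ISO-8859-1 (Latin-1)
ice_movies <- read_delim(
  "https://query.data.world/s/pmbfldxflx7ttdyfs23cx3abehcl5c",
  delim = ";",
  escape_double = FALSE,
  trim_ws = TRUE,
  locale = locale(encoding = "ISO-8859-1")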
)

These two graphs give an overview of how films performed financially and in ticket sales. The first plots the 20 films with the highest total box office earnings to show which were the biggest hits. The second shows weekend admissions over time, with the x-axis labeled at February, June, and October, which lets movie theater owners see whether some seasons are busier than others. Looking at both gives a better understanding of what makes a movie successful. It also shows the times of year when audiences are most likely to attend, which is helpful for movie theaters and studios.

boxoffice_data <- ice_movies |>
  # Each film's largest to-date figure is treated as its overall box office total
  group_by(film_title = film.title) |>
  summarise(total_box = max(`total.box.o.to.date`, na.rm = TRUE)) |>
  ungroup()

# Keep the 20 highest-grossing films
boxoffice_top20 <- boxoffice_data |>
  arrange(desc(total_box)) |>
  slice_head(n = 20)

# Horizontal bars, sorted so the top earner appears at the top
ggplot(boxoffice_top20, aes(x = reorder(film_title, total_box), y = total_box)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  scale_y_continuous(labels = function(x) formatC(x, format = "f", big.mark = ",", digits = 0)) + # Add commas
  labs(
    title = "Top 20 Films by Total Box Office Earnings",
    x = "Film Title",
    y = "Total Box Office (dollars)"
  ) +
  theme_minimal()

The bar graph shows that, among the top 20, a small number of highly successful movies account for a large share of box office receipts. The highest-grossing films stand out clearly, and revenue falls off quickly for the films ranked below them. This pattern suggests that the business is dominated by blockbusters, while lower-grossing films earn far less. It is consistent with the idea that marketing, brand familiarity, and franchise popularity are important factors in box office success, since audiences tend to flock to a few major releases.
To make earnings easier to compare between films, I chose a horizontal bar chart. Because the axes are flipped, long movie titles stay readable without overlapping. Commas in the numbers make the large amounts easier to read. The “steelblue” fill keeps the visualization neat and keeps the focus on the data. Because the bars are sorted by earnings, it’s clear at a glance which movies did best.
This graphic clearly and simply displays the ranking of the highest-grossing films. The comma-formatted axis values make the dollar scale of each film’s earnings easier to understand, and the horizontal layout improves readability. One drawback, though, is that it doesn’t show an earnings timeline, so we can’t see how long each movie took to earn its full box office. The storytelling could be improved with an additional element, such as color shading or annotations for record-breaking revenues. Overall, the graphic works well for ranking, but it would benefit from more background on how earnings grew over time.
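One way the suggested shading and annotations might look is sketched below. It is only an illustration: the data frame `toy_top`, its film names and totals, and the `is_record` column are all made up here and are not taken from the dataset.

```r
library(ggplot2)

# Hypothetical toy data; film names and totals are invented for illustration
toy_top <- data.frame(
  film_title = c("Film A", "Film B", "Film C", "Film D"),
  total_box  = c(90e6, 55e6, 30e6, 12e6)
)

# Flag the single highest-grossing film so it can be shaded differently
toy_top$is_record <- toy_top$total_box == max(toy_top$total_box)

p <- ggplot(toy_top, aes(x = reorder(film_title, total_box), y = total_box, fill = is_record)) +
  geom_col() +
  # Write each total inside its bar as a comma-formatted note
  geom_text(
    aes(label = format(total_box, big.mark = ",", scientific = FALSE)),
    hjust = 1.1, color = "white"
  ) +
  # Highlight colour for the record holder, steelblue for the rest
  scale_fill_manual(values = c(`TRUE` = "firebrick", `FALSE` = "steelblue"), guide = "none") +
  coord_flip() +
  theme_minimal()

print(p)
```

The same idea could be applied to `boxoffice_top20`, although whether a single highlighted film is the right story to tell would depend on the data.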

# Total admissions per weekend, summed across all films
temporal_alt <- ice_movies |>
  mutate(weekend_date_alt = as.Date(weekend.start, format = "%Y-%m-%d")) |>
  group_by(weekend_date_alt) |>
  summarize(total_admissions_alt = sum(adm.weekend, na.rm = TRUE)) |>
  ungroup()

# Monthly sequence across the data's date range, used only to pick axis breaks
date_range <- seq(min(temporal_alt$weekend_date_alt, na.rm = TRUE),
                  max(temporal_alt$weekend_date_alt, na.rm = TRUE), by = "month")

# Label the x-axis only at February, June, and October
selected_dates <- date_range[format(date_range, "%m") %in% c("02", "06", "10")]

# One point per weekend plus a loess trend line
ggplot(temporal_alt, aes(x = weekend_date_alt, y = total_admissions_alt)) +
  geom_point(color = "darkblue", size = 1) +
  geom_smooth(method = "loess", color = "red", se = FALSE, formula = y ~ x) +
  scale_x_date(breaks = selected_dates, date_labels = "%b %Y") +
  labs(
    title = "Weekend Admissions Over Time",
    x = "Month",
    y = "Total Admissions"
  ) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

This graph shows weekend movie attendance over time. Attendance varies, with more people on some weekends than others. The smooth red line shows the overall trend, which makes it easier to tell whether attendance was increasing, decreasing, or staying the same. To keep the graph from becoming overly crowded, I labeled the x-axis only at February, June, and October instead of at every date; all weekends are still plotted.
Changes in movie attendance over time are tracked using a scatter plot with a smooth trend line. Each dot represents one weekend, while the red line shows the larger trend. To keep the chart from looking cluttered, the x-axis shows labels for only three months of each year: February, June, and October. The chart is easy to read because the red trend line and dark blue dots contrast strongly with the white background.
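The break selection can be tried on its own. The sketch below uses a made-up date range (the two endpoint dates are assumptions, not values from the data). It also snaps the start of the sequence to the first of the month, because a monthly sequence that begins late in a month can roll over into the following month and skip a label.

```r
# Hypothetical first and last weekend dates; the real ones come from the data
first_weekend <- as.Date("2020-01-31")
last_weekend  <- as.Date("2021-03-05")

# Snap the start to the first day of its month so each step lands in a new month
month_start <- as.Date(format(first_weekend, "%Y-%m-01"))
date_range  <- seq(month_start, last_weekend, by = "month")

# Keep only February, June, and October as axis breaks
selected_dates <- date_range[format(date_range, "%m") %in% c("02", "06", "10")]
print(selected_dates)
```

The resulting vector is what would be passed to the `breaks` argument of `scale_x_date()`.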
This graph makes it easy to identify trends over time by clearly displaying the ups and downs in movie attendance. The red trend line is simpler to follow than every single dot. Although labeling only the months I chose helps keep the axis organized, it makes dates in the other months harder to pinpoint. It might be easier to understand why attendance was high or low at certain times if labels were included for notable surges or major film releases. Overall, the graph does a good job of showing patterns in movie attendance, but it needs more information to explain the reasons behind those trends.
These graphs made it easy to see which films brought in the most money and when theaters were most crowded. The box office chart displays the highest earners, while the attendance graph lets us find patterns in movie-going behavior. Taken together, they paint a clear picture of how the market works. Theaters can use this type of information when deciding when to show blockbusters and when to offer promotions to attract more people. It can also help film studios decide when to release new films in order to maximize ticket sales.
I used AI to help me with my assignment. In my ggplot code, I made a mistake and typed element_txt() instead of element_text(), which led to an error because element_txt() isn’t a function in ggplot2. With ChatGPT’s help, I found out that element_text() is the right function for changing text elements, such as the size of the axis labels. After I fixed the issue, my code ran correctly.
On top of that, I found out that, unlike some other applications, R doesn’t automatically correct mistakes. If I type a function that doesn’t exist, it stops and throws an error instead of trying to guess what I meant. With ChatGPT’s help, I was able to understand why it happened and how to prevent it in the future. This improved my understanding of R’s exact syntax and function names, and it’ll probably be helpful for upcoming coding projects.
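This behavior can be reproduced in a few lines. The sketch below calls the misspelled name on purpose and wraps it in `tryCatch()` so the error message is captured rather than stopping the script. The variable name `msg` is only for illustration.

```r
# element_txt() does not exist, so R raises an error instead of guessing element_text()
msg <- tryCatch(
  element_txt(size = 12),
  error = function(e) conditionMessage(e)
)
print(msg)

# exists() is a quick way to check whether a name is defined before calling it
print(exists("element_txt"))
```

Reading the captured message is usually enough to spot a typo like this one, since it names the function that R could not find.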