Assignment 2

Author

Azam Jiwa

Introduction

This report explores movie attendance patterns in Iceland using visualizations to examine both the distribution of the data and its trends over time. Using weekend admissions as the measure of attendance, we aim to uncover insights into audience behaviors and seasonal movie-going trends.

Distribution Analysis

Visualization

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

ice_movies <- read_delim(
  "https://query.data.world/s/pmbfldxflx7ttdyfs23cx3abehcl5c",
  delim = ";",
  escape_double = FALSE,
  trim_ws = TRUE,
  locale = locale(encoding = "ISO-8859-1")
)

Rows: 3799 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
chr  (2): film.title, distributor.name
dbl  (7): rank.this.week, rank.last.week, weeks.in.release, gross.box.o.week...
date (3): blog.date, weekend.start, weekend.end

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

b_admissions_dist <- ice_movies |> 
  ggplot(aes(x = adm.weekend)) +
  geom_histogram(binwidth = 500, fill = "#69b3a2", color = "white") +
  labs(
    title = "Distribution of Weekend Admissions",
    x = "Admissions",
    y = "Frequency"
  ) +
  theme_minimal()

b_admissions_dist

Analysis & Reflection

Key Distribution Patterns: The histogram shows a right-skewed distribution. Most records fall in the lowest admission bins, while a small number reach disproportionately high weekend admissions. This suggests that a few blockbusters account for a large share of overall attendance.
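One way to back up the right-skew reading with numbers is to compare the mean with the median and to measure how much of the total the top rows contribute. The sketch below is only an illustration: it uses simulated log-normal values as a stand-in for `ice_movies$adm.weekend`, and the 5% cutoff is an arbitrary choice rather than anything taken from the dataset.

```r
# Simulated stand-in for ice_movies$adm.weekend (assumption: log-normal shape)
set.seed(42)
adm <- round(rlnorm(1000, meanlog = 6, sdlog = 1.2))

# In a right-skewed distribution the mean sits above the median
mean_adm <- mean(adm)
median_adm <- median(adm)

# Share of all admissions contributed by the top 5% of rows
cutoff <- quantile(adm, 0.95)
top_share <- sum(adm[adm >= cutoff]) / sum(adm)

cat("mean:", round(mean_adm), " median:", median_adm, "\n")
cat("top 5% share of admissions:", round(top_share, 2), "\n")
```

Swapping `adm` for the real column (with `NA` values removed) would give the corresponding figures for the Icelandic data.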

Visualization Choices: A histogram was chosen to show the frequency distribution of weekend admissions, with a bin width of 500 admissions as a compromise between detail and readability. A single fill color with white bar outlines keeps adjacent bins visually distinct, in line with Chapter 6’s emphasis on visual distinction.

Critical Evaluation: The histogram effectively shows the data’s spread, but because most records are concentrated in the first few bins, differences among less popular films are hard to see. A violin plot could add value by showing density more precisely.
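A minimal sketch of the violin plot suggested above, using simulated data in place of `ice_movies`. The data frame `sim_movies` is hypothetical. The log-scaled y-axis and the `+ 1` offset (which avoids zeros on that scale) are additions for illustration, chosen because skewed counts are easier to read on a log scale.

```r
library(ggplot2)

# Simulated stand-in for the real data (assumption); +1 avoids log(0)
set.seed(42)
sim_movies <- data.frame(
  adm.weekend = round(rlnorm(1000, meanlog = 6, sdlog = 1.2)) + 1
)

# A single violin: a constant x value places all rows in one group
violin_sketch <- ggplot(sim_movies, aes(x = "", y = adm.weekend)) +
  geom_violin(fill = "#69b3a2", color = "white") +
  scale_y_log10() +
  labs(
    title = "Density of Weekend Admissions (simulated)",
    x = NULL,
    y = "Admissions (log scale)"
  ) +
  theme_minimal()

violin_sketch
```

Replacing `sim_movies` with `ice_movies` should produce the equivalent plot for the real data, provided rows with missing or zero admissions are handled first.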

Temporal Analysis

Visualization

b_admissions_time <- ice_movies |> 
  group_by(weekend.start) |> 
  summarize(total_admissions = sum(adm.weekend, na.rm = TRUE)) |> 
  ggplot(aes(x = weekend.start, y = total_admissions)) +
  geom_line(color = "#2c3e50") +
  geom_point(size = 1, color = "#e74c3c") +
  labs(
    title = "Total Weekend Admissions Over Time",
    x = "Weekend Start Date",
    y = "Total Admissions"
  ) +
  theme_minimal()

b_admissions_time

Analysis & Reflection

Key Temporal Patterns: The temporal visualization highlights distinct peaks, which may correspond to holiday seasons and major film releases, although the chart alone does not confirm the cause. Recurring dips and spikes in attendance suggest seasonality in movie-going habits.
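A simple way to probe the seasonality claim is to average the weekly totals by calendar month and see whether some months stand out. The sketch below assumes a data frame shaped like the summarized data (`weekend.start`, `total_admissions`) but fills it with simulated values. The start date and the sine-shaped seasonal pattern are invented for illustration.

```r
# Simulated weekly totals standing in for the summarized data (assumption)
set.seed(42)
weekly <- data.frame(
  weekend.start = seq(as.Date("2020-01-03"), by = "week", length.out = 104)
)
weekly$total_admissions <- round(
  5000 + 2000 * sin(2 * pi * seq_len(104) / 52) + rnorm(104, sd = 800)
)

# Mean weekly total for each calendar month ("01" to "12")
weekly$month <- format(weekly$weekend.start, "%m")
monthly_mean <- tapply(weekly$total_admissions, weekly$month, mean)

round(monthly_mean)
```

Large, consistent gaps between months in the real data would support the seasonal reading. Similar values across months would weaken it.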

Visualization Choices: A line chart was chosen to emphasize trends over time, with a point marker at each weekend so that individual observations, including outliers, remain visible. This choice aligns with Chapter 5’s guidance on visualizing continuous data.

Critical Evaluation: The line chart effectively conveys overall trends, but adding a moving average could smooth short-term volatility and clarify long-term patterns.
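The moving average mentioned above could be computed along these lines. This is a sketch on simulated weekly totals, not the report's data. The 5-week centered window is an arbitrary assumption, and `moving_avg` is a hypothetical column name.

```r
# Simulated weekly totals standing in for the summarized data (assumption)
set.seed(42)
weekly <- data.frame(
  weekend.start = seq(as.Date("2020-01-03"), by = "week", length.out = 104)
)
weekly$total_admissions <- round(
  5000 + 2000 * sin(2 * pi * seq_len(104) / 52) + rnorm(104, sd = 800)
)

# Centered 5-week moving average. stats::filter is called explicitly because
# dplyr::filter masks it once the tidyverse is attached.
window <- 5
weekly$moving_avg <- as.numeric(
  stats::filter(weekly$total_admissions, rep(1 / window, window), sides = 2)
)

head(weekly, 8)
```

The first and last two weeks have no moving average because the centered window does not fit there. The smoothed column could then be drawn as a second `geom_line()` layer over the raw series.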

Conclusion

Together, the visualizations illustrate the dual nature of movie attendance: a few blockbuster films drive distribution patterns, while temporal trends reflect seasonal and event-driven spikes. This analysis raises questions about the impact of release strategies and marketing on attendance variability.

Generative AI Use

I used ChatGPT to draft the report structure, brainstorm visualization choices, and refine analysis language. Here is the transcript.

“help me make a graph with r. i want the binwidth to be 500, the title to be distribution of weekend admissions, and the x and y axis to be called admissions and frequenciy

ChatGPT said:

Here’s how you can create your graph in R with the specifications you mentioned. I’ll use ggplot2 for a polished and flexible visualization:

# Load the ggplot2 package
library(ggplot2)

# Example data: replace with your actual dataset
# For example, if you have a dataframe called 'admissions_data' with a column 'admissions'
# admissions_data <- read.csv("your_data.csv")

# Example simulated data
set.seed(123)
admissions_data <- data.frame(admissions = rnorm(1000, mean = 5000, sd = 1500))

# Create the histogram
ggplot(admissions_data, aes(x = admissions)) +
  geom_histogram(binwidth = 500, fill = "blue", color = "black") +
  labs(
    title = "Distribution of Weekend Admissions",
    x = "Admissions",
    y = "Frequency"
  ) +
  theme_minimal()

In this code:

geom_histogram(binwidth = 500) sets the bin width to 500.
labs() sets the title and axis labels.
theme_minimal() gives a clean look, but you can change the theme if you’d like.

Would you like me to adjust colors, add gridlines, or customize anything else? Let me know!”