The US horror film trailers data set

The US horror film trailers data set contains audio, colour, motion, and shot length data for the trailers of the fifty highest-grossing horror films at the US box office from 2011 to 2015.

All trailers were pre-processed to trim MPAA tag screens and YouTube channel promotional materials, and cropped to remove letterbox blanking.

The sample is described in the file US_Horror_Trailers_Sample_Summary.csv, which contains the title of the film promoted, the URL for the trailer on YouTube (all URLs were correct as of 27 January 2021), the dimensions of the video file after pre-processing, the number of frames after pre-processing, the native frame rate, and the running time of a trailer after pre-processing in seconds.

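For each trailer the following information is available:

*_audio.csv: the time contour of the normalized aggregated power envelope of the soundtrack
*_rgb.csv: the average colour of each frame as an RGB triplet in sRGB colour space
*_motion.csv: the magnitude and horizontal and vertical direction of motion in each frame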

The wildcard * indicates the title of a film as it is listed in US_Horror_Trailers_Sample_Summary.csv.

The data set is available on Zenodo under a Creative Commons Attribution 4.0 International license at http://doi.org/10.5281/zenodo.4479068.

Here I demonstrate some visualizations of the different elements of film style for the trailer for Insidious: Chapter 2 to illustrate some of the ways in which this data can be presented.


Visualising audio data

The audio data for each trailer in the data set comprises the time contour of the normalized aggregated power envelope of its soundtrack, derived by summing the columns of the time-frequency matrix produced by the short-time Fourier transform (STFT) and normalizing the resulting vector to unit area. The csv file for each trailer has two columns: the time of each STFT window (time) and the value of the normalized power envelope for that window (contour).
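To make this definition concrete, here is a minimal base R sketch of how such a contour can be computed from a raw mono signal. It is not the code used to build the data set: the Hann window, the window length of 1024 samples, the hop of 512 samples, treating "unit area" as the contour values summing to one, and the helper name power_envelope() are all illustrative assumptions.

```r
# Hypothetical helper: normalized aggregated power envelope of a mono signal.
# Assumes a Hann-windowed STFT and that "unit area" means the values sum to 1.
power_envelope <- function(signal, fs, win_len = 1024, hop = 512) {
  n_bins <- floor(win_len / 2) + 1                     # one-sided spectrum
  hann <- 0.5 - 0.5 * cos(2 * pi * (0:(win_len - 1)) / (win_len - 1))
  starts <- seq(1, length(signal) - win_len + 1, by = hop)

  # one value per STFT window: the power summed over all frequency bins,
  # i.e. the column sum of the time-frequency matrix
  power <- vapply(starts, function(s) {
    frame <- signal[s:(s + win_len - 1)] * hann
    sum(Mod(fft(frame))[1:n_bins]^2)
  }, numeric(1))

  data.frame(time = (starts - 1) / fs,                 # start of each window (s)
             contour = power / sum(power))             # normalize to sum to one
}

# usage on an invented signal: one second of silence, then a 440 Hz tone
fs <- 8000
t <- seq(0, 2, by = 1 / fs)
x <- ifelse(t < 1, 0, sin(2 * pi * 440 * t))
env <- power_envelope(x, fs)
head(env)
```

Whether the time column marks the start or the centre of each window in the real files is not stated here; the sketch uses the start.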

The audio data for the trailer for Insidious: Chapter 2 is stored in the file Insidious_Chapter_2_audio.csv.

# load the audio data
audio_dat <- read.csv("Insidious_Chapter_2_audio.csv", header = TRUE)

head(audio_dat)
##         time     contour
## 1 0.00000000 0.00000e+00
## 2 0.02134003 0.00000e+00
## 3 0.04268007 0.00000e+00
## 4 0.06402010 0.00000e+00
## 5 0.08536013 0.00000e+00
## 6 0.10670017 3.00366e-08

Plotting this data using

# create the plot
options(scipen = 999)

library(ggplot2)

audio_plot <- ggplot(data = audio_dat, aes(x = time, y = contour)) +
  geom_line(linewidth = 0.7, colour = "#2C115F") +
  scale_x_continuous(breaks = c(0, 25, 50, 75, 100, 125), 
                     labels = c(0, 25, 50, 75, 100, 125)) +
  scale_y_continuous(limits = c(0, 0.0015), breaks = c(0, 0.0005, 0.0010, 0.0015),
                     labels = c(0, 0.0005, 0.0010, 0.0015)) +
  labs(x = "Time (s)",
       y = "Normalised aggregated power") +
  theme_classic()

audio_plot

returns the following plot, which allows us to identify the overall structure of the soundtrack, noting, for example, that the first 50 seconds of the trailer are relatively quiet, and to pick out key features of interest, such as the nonlinear increase in power at 115 seconds.

For a detailed discussion of this method see my introduction to computational analysis of film audio in R.


Visualising colour data

The colour data for a trailer comprises the average colour of each frame in a trailer as an RGB triplet in sRGB colour space.
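As a rough illustration of what this average is, the sketch below computes the mean colour of a single frame stored as a height × width × 3 array of sRGB values in [0, 1], and returns it as an 8-bit RGB triplet and a hex code. The array layout and the helper name frame_average_colour() are assumptions for illustration; the data set may have been produced differently.

```r
# Hypothetical helper: the average colour of one frame.
# Assumes `frame` is a height x width x 3 array of sRGB values in [0, 1].
frame_average_colour <- function(frame) {
  channel_means <- apply(frame, 3, mean)      # mean of each colour channel
  rgb_255 <- round(channel_means * 255)       # rescale to 0-255
  list(rgb = setNames(rgb_255, c("R", "G", "B")),
       hex = grDevices::rgb(channel_means[1], channel_means[2], channel_means[3]))
}

# usage: an invented 2 x 2 frame, left column pure red, right column pure blue
frame <- array(0, dim = c(2, 2, 3))
frame[, 1, 1] <- 1   # red channel of the left column
frame[, 2, 3] <- 1   # blue channel of the right column
frame_average_colour(frame)   # R = 128, G = 0, B = 128; hex "#800080"
```

The example shows that averaging a red half and a blue half of the frame gives purple, so a single triplet summarizes the overall colour of a frame rather than its composition.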

The RGB colour data for the trailer for Insidious: Chapter 2 is stored in the file Insidious_Chapter_2_rgb.csv.

This data can be visualized using the chromaR package by Tommaso Buonocore, which we will need to install from GitHub:

install.packages("devtools")
library(devtools)
install_github("detsutut/chroma", subdir = "chromaR")
library(chromaR)

This package gives us a range of options to visualize this data. Details on how to use the chromaR package can be accessed at https://github.com/detsutut/chroma.

First, we need to load the RGB data and merge the frames together:

# load the colour data
col_dat <- getFrames("Insidious_Chapter_2_rgb.csv")

# group the frames with the correct frame rate
col_grouped <- lapply(col_dat, function(x){groupframes(x, seconds = 0.25, fps = 23.976)})

Here I am using a merging window of 0.25 seconds. This appears to be the lower limit for accurate representation of the time axis with chromaR::groupframes, as smaller windows result in timing errors. Different window sizes will change the temporal resolution of the visualizations.

Note that when grouping frames of raw RGB data, the grouping and timings depend on the frame rate of a trailer. Therefore, use of the incorrect frame rate will lead to incorrect timings. The frame rate information for each trailer is available in the US_Horror_Trailers_Sample_Summary.csv file.
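The dependence on frame rate follows from how frame indices are converted into time. The sketch below is not chromaR's implementation: it simply assigns frame i (counting from zero) the time i / fps and averages the colour values of all frames that fall in the same window. The column names R, G, and B and the helper name group_rgb() are assumptions.

```r
# Hypothetical helper illustrating time-based grouping of per-frame colours.
# Assumes `rgb_df` has one row per frame and numeric columns R, G and B.
group_rgb <- function(rgb_df, seconds = 0.25, fps = 23.976) {
  frame_time <- (seq_len(nrow(rgb_df)) - 1) / fps   # time of each frame (s)
  window <- floor(frame_time / seconds)             # index of the merging window
  grouped <- aggregate(rgb_df[, c("R", "G", "B")],
                       by = list(window = window), FUN = mean)
  grouped$time <- grouped$window * seconds          # start time of each window
  grouped
}

# usage with invented data: 48 frames recorded at 24 fps (two seconds)
frames <- data.frame(R = 1:48, G = 0, B = 0)
g24 <- group_rgb(frames, seconds = 0.25, fps = 24)  # 8 windows, 0 to 1.75 s
g12 <- group_rgb(frames, seconds = 0.25, fps = 12)  # 16 windows, 0 to 3.75 s
```

With the correct rate the 48 frames fill eight quarter-second windows; passing 12 fps instead spreads the same frames over sixteen windows and doubles every timestamp, which is the kind of timing error described above.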

chromaR stores data frames in a list and so to access the data for a particular film we will need to use its index: col_grouped[[1]].

Now, let's produce a barcode of the colour in the trailer using chromaR::plotFrameline():

barcode <- plotFrameline(col_grouped[[1]], verbose = 2, vivid = TRUE, summary = FALSE, 
                       title = "\nBarcode", subtitle = "")

barcode <- barcode + 
  scale_x_continuous(breaks = c(0, 25, 50, 75, 100, 125),
                     labels = c(0, 25, 50, 75, 100, 125)) +
  labs(subtitle = NULL, y = NULL) + 
  theme(axis.ticks.x = element_blank(), 
        axis.text.x = element_blank(), 
        plot.title = element_text(size = 12, face = "bold", colour = "black"))

barcode

which gives us a visualization of the colour structure of the trailer that is easy to understand.

Using chromaR::temperature() we can plot the distance of each column in the barcode from red with the following code:

temp_plot <- temperature(as.data.frame(col_grouped[[1]]))

temp_plot <- temp_plot + 
  scale_x_continuous(breaks = c(0, 25, 50, 75, 100, 125),
                     labels = c(0, 25, 50, 75, 100, 125)) +
  xlab("Seconds") + 
  theme(plot.title = element_text(size = 12, face = "bold"), 
        legend.position = "bottom",
        legend.box = "vertical", legend.margin=margin())

temp_plot

which shows us there is a shift towards colder, bluer hues over time in this trailer.

Combining the two plots using ggpubr lets us see how the colours of the barcode relate to the evolution of colour temperature in this trailer.

library(ggpubr)

col_fig <- ggarrange(barcode, temp_plot, nrow = 2, align = "v", heights = c(0.7, 1.1))

col_fig


Visualising motion data

Motion data was captured using FlowAnalyzer, which applies optical flow analysis to compare pixel intensities between consecutive video frames and returns the magnitude and direction of motion for each pixel from one frame to the next. These per-pixel values are summed to give scalar measures of the magnitude and of the horizontal and vertical direction of motion in each video frame.
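The reduction from per-pixel flow vectors to per-frame scalars can be sketched as follows, given the horizontal (u) and vertical (v) components of the flow at every pixel. The output names mirror the columns of the motion files, but FlowAnalyzer's exact definitions (including any normalization, which may explain why the values in the files are small, and its sign conventions) are not given here, so this is an assumption-laden illustration; summarise_flow() is a hypothetical name.

```r
# Hypothetical sketch: summarise one frame's optical flow field.
# Assumes `u` and `v` are matrices of per-pixel horizontal and vertical motion.
summarise_flow <- function(u, v) {
  c(frame_mag   = sum(sqrt(u^2 + v^2)),  # total magnitude of motion
    frame_x     = sum(u),                # net horizontal direction (signed)
    frame_x_mag = sum(abs(u)),           # total amount of horizontal motion
    frame_y     = sum(v),                # net vertical direction (signed)
    frame_y_mag = sum(abs(v)))           # total amount of vertical motion
}

# usage: an invented 2 x 2 field, two pixels moving right and two moving left,
# all moving by the same amount vertically
u <- matrix(c(3, 3, -3, -3), nrow = 2)
v <- matrix(c(4, 4, 4, 4), nrow = 2)
summarise_flow(u, v)
```

Here the opposing horizontal motions cancel in frame_x (0) but not in frame_x_mag (12), which is why a signed direction and an unsigned magnitude are both useful.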

The motion data for the trailer for Insidious: Chapter 2 is stored in the file Insidious_Chapter_2_motion.csv.

# load the motion data
motion_dat <- read.csv("Insidious_Chapter_2_motion.csv", header = TRUE)

head(motion_dat)
##         time  frame_mag      frame_x frame_x_mag    frame_y frame_y_mag
## 1 0.00000000  0.0000000  0.000000000   0.0000000 0.00000000   0.0000000
## 2 0.04170833  0.7674828  0.007014627   0.5222523 0.03149724   0.3758687
## 3 0.08341667  2.7022001 -0.240150705   1.9363017 0.18578520   1.2520472
## 4 0.12512500  4.6384536 -0.423660193   3.3303169 0.32181608   2.1725229
## 5 0.16683333  7.8361336 -0.594791738   5.6454085 0.46172592   3.8514768
## 6 0.20854167 10.7081880 -0.443226590   7.6092472 0.56908954   5.5973469

To create plots of the motion magnitude of each frame (frame_mag), the horizontal direction (frame_x), and the vertical direction (frame_y) over time we use:

mag_plot <- ggplot(motion_dat) +
  geom_line(aes(x = time, y = frame_mag), linewidth = 0.7, colour = "#B73779") +
  scale_x_continuous(breaks = c(0, 25, 50, 75, 100, 125), labels = c(0, 25, 50, 75, 100, 125)) +
  scale_y_continuous(breaks = c(0, 25, 50, 75, 100, 125), labels = c(0, "", 50, "", 100, "")) +
  labs(title = "Motion magnitude") +
  theme_classic()+
  theme(plot.title = element_text(size = 12, face = "bold"),
        axis.title.x = element_blank(),
        axis.title.y = element_blank())

x_dir <- ggplot(data = motion_dat) +
  geom_line(aes(x = time, y = frame_x), linewidth = 0.7, colour = "#B73779") +
  scale_x_continuous(breaks = c(0, 25, 50, 75, 100, 125), labels = c(0, 25, 50, 75, 100, 125)) +
  scale_y_continuous(limits = c(-10, 10), breaks = c(-10, -5, 0, 5, 10), labels = c(-10, "", 0, "", 10)) +
  labs(title = "Horizontal direction") +
  theme_classic() +
  theme(plot.title = element_text(size = 12, face = "bold"),
        axis.title.x = element_blank(),
        axis.title.y = element_blank())

y_dir <- ggplot(data = motion_dat) +
  geom_line(aes(x = time, y = frame_y), linewidth = 0.7, colour = "#B73779") +
  scale_x_continuous(breaks = c(0, 25, 50, 75, 100, 125), labels = c(0, 25, 50, 75, 100, 125)) +
  scale_y_continuous(limits = c(-10, 10), breaks = c(-10, -5, 0, 5, 10), labels = c(-10, "", 0, "", 10)) +
  labs(title = "Vertical direction", 
       x = "Time (s)")+
  theme_classic()+
  theme(plot.title = element_text(size = 12, face = "bold"),
        axis.title.x = element_text(size = 10),
        axis.title.y = element_blank())

motion_fig <- ggarrange(mag_plot, x_dir, y_dir, nrow = 3, align = "v", heights = c(1.5, 1, 1))

motion_fig

This lets us see where motion occurs in the trailer, when horizontal motion contributes most to the overall magnitude of motion in a frame, and where vertical motion is greatest.


Visualising shot length data

The shot length data for all fifty trailers is stored in a single csv file, in which each column contains the shot lengths (in seconds) for one trailer and is named after that film.

# load the csv file containing shot length data
SL_data <- read.csv("US_Horror_Trailers_SL_Data.csv", header = TRUE)

# Access the data for Insidious: Chapter 2 and remove NAs
InC2 <- na.omit(SL_data$Insidious_Chapter_2)

head(InC2)
## [1] 1.50 0.88 1.17 1.79 1.50 1.54
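Because trailers contain different numbers of shots, the shorter columns in this file are presumably padded with NAs, which is why na.omit() is needed above. To prepare every trailer at once, a sketch like the following could be used; the small data frame toy_SL is invented and stands in for the real file.

```r
# Sketch: strip the NA padding from every column of a wide shot length table.
# `toy_SL` is invented example data standing in for US_Horror_Trailers_SL_Data.csv.
toy_SL <- data.frame(Film_A = c(1.5, 0.9, 2.1),
                     Film_B = c(3.2, NA, NA))

# one NA-free numeric vector of shot lengths per trailer
SL_list <- lapply(toy_SL, function(x) as.numeric(na.omit(x)))

# number of shots and sum of shot lengths (s) per trailer
shots   <- vapply(SL_list, length, integer(1))
runtime <- vapply(SL_list, sum, numeric(1))
shots     # Film_A 3, Film_B 1
```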

To visualize the shot structure of this trailer at different scales we can use the loessggplot() function available at https://github.com/DrNickRedfern/loessggplot. This function fits a set of LOESS smoothers to the shot length data of a motion picture, iterating over a range of spans specified by the user and plotting the result, in order to identify the temporal structure of a film’s editing without committing the analyst to a particular level of smoothing before applying the function.
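The principle of smoothing at many spans can be sketched with base R's stats::loess(). This is not the loessggplot() code: it uses the shot number as the x variable, which is an assumption, and only borrows the low, high, and step argument names from the call below; loess_spans() is a hypothetical name.

```r
# Sketch of multi-span LOESS smoothing of shot lengths using stats::loess().
# Illustrates the idea only; it is not the loessggplot() implementation.
loess_spans <- function(SL, low = 0.1, high = 0.9, step = 0.1) {
  shot <- seq_along(SL)                        # shot number as the x variable
  spans <- seq(low, high, by = step)
  fits <- vapply(spans, function(s) {
    fit <- loess(SL ~ shot, span = s, degree = 2)
    predict(fit)                               # fitted value at each shot
  }, numeric(length(SL)))
  colnames(fits) <- paste0("span_", spans)
  fits                                         # one column of fitted values per span
}

# usage with invented shot lengths
set.seed(1)
sim_SL <- rexp(100, rate = 1 / 3)              # 100 shots, mean length 3 s
fits <- loess_spans(sim_SL, low = 0.2, high = 0.8, step = 0.2)
dim(fits)                                      # 100 rows (shots), 4 columns (spans)
```

Small spans follow local changes in cutting rate closely, while large spans reveal the overall trend, so inspecting the whole family of fits avoids choosing a single level of smoothing in advance.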

# source the function from GitHub
devtools::source_url("https://github.com/DrNickRedfern/loessggplot/blob/master/loessggplot.R?raw=TRUE")
## SHA-1 hash of file is e8221295bdf5737be91c99e193ef3840bda9e2bc
SL_plot <- loessggplot(InC2, low = 0.1, high = 0.9, step = 0.01, title = " ", ticks = 0.1)

SL_plot

For a discussion of how the loessggplot function improves on existing methods for visualizing motion picture shot length data see the blog post loessggplot: an improved tool for visualising cutting patterns in film.

Alternatively, we can plot the times at which cuts occur as a point process:

# express the timings of cuts as cumulative time from the beginning of the trailer,
# with a dummy y variable (a column of ones) for the plot
InC2_cum_df <- data.frame(cuts = cumsum(InC2), Ones = 1)

# create the plot
cut_plot <- ggplot(data = InC2_cum_df, aes(x = cuts, y = Ones)) +
  geom_vline(xintercept = InC2_cum_df$cuts, col = '#440154') +
  scale_x_continuous(name="Time (s)", 
                     breaks = c(0, 25, 50, 75, 100, 125), 
                     labels = c(0, 25, 50, 75, 100, 125)) + 
  scale_y_continuous(name = NULL) +
  theme_light()

cut_plot


Bringing it all together

Although we can analyze each element of a film’s style individually, the effect it has upon an audience is produced by the interaction of the different elements. By plotting these together we can begin to see relationships between the editing, the soundtrack, and the motion in the frame.

style <- ggarrange(cut_plot, audio_plot, mag_plot, barcode, temp_plot, nrow = 5, align = "v",
                   heights = c(1, 1, 1, 1, 1.5))

style


For example, we can see that the cut at 130.92 seconds coincides with a sharp increase in the power of the soundtrack, a large increase in motion, and a sudden flash of a lighter, warmer colour evident in the barcode. This is an example of film style being used to create a startle response in the viewer by combining all of its different elements. In this case, style is deployed to draw the viewer’s attention to a title card showing the release date of the movie so that the trailer can fulfil its marketing function.
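Observations like this can also be checked numerically by extracting each series in a short window around the cut. A minimal sketch, assuming data frames with a time column like audio_dat and motion_dat above; the helper name window_around() and the window width are illustrative choices.

```r
# Hypothetical helper: rows of a time-indexed data frame within `width` seconds of `t`.
window_around <- function(df, t, width = 1) {
  df[abs(df$time - t) <= width, , drop = FALSE]
}

# usage with invented data
toy <- data.frame(time = seq(0, 5, by = 0.5), value = 1:11)
window_around(toy, t = 2, width = 0.5)   # rows at 1.5, 2.0 and 2.5 s

# with the trailer data loaded earlier, for example:
# audio_cut  <- window_around(audio_dat, 130.92)
# motion_cut <- window_around(motion_dat, 130.92)
# max(audio_cut$contour); max(motion_cut$frame_mag)
```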

This can be seen in the trailer below, which is cued up to play this moment.


Summary

I have covered some of the ways in which the different elements of style can be visualized using data from the US horror film trailers data set. The examples here by no means exhaust the different ways in which this data can be visualized and analyzed.