Introduction

I consider myself a big fan of the world of cinema and TV shows, so this dataset sparked my curiosity. I'm a big fan of Tarantino movies, so 'The Tarantino Love' is a document in which I explore, in a simple and fast way, some features of Tarantino's movies.

This short dataset contains the following features:

Movie: the title of the Tarantino movie (read in as a factor with 7 levels).
Type: factor with two possible values, 'word' or 'death'.
Word: if type is 'word', the swear word that is said.
Minutes_in: the minute of the movie at which the death or swear word occurs.

First steps

Load libraries

library(dplyr)
library(ggplot2)
library(gridExtra)

Load the Data

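# stringsAsFactors = TRUE reproduces the factor columns shown below (R >= 4.0 defaults to FALSE)
tarantino <- read.csv('tarantino.csv', stringsAsFactors = TRUE)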

Structure of Data

str(tarantino) #Structure

## 'data.frame':    1894 obs. of  4 variables:
##  $ movie     : Factor w/ 7 levels "Django Unchained",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ type      : Factor w/ 2 levels "death","word": 2 2 2 2 2 2 2 2 2 2 ...
##  $ word      : Factor w/ 61 levels "","ass","asses",..: 19 21 25 30 11 24 51 24 21 24 ...
##  $ minutes_in: num  0.4 0.43 0.55 0.61 0.61 0.66 0.9 1.43 1.56 1.66 ...

head(tarantino, 5) #View content

##            movie type     word minutes_in
## 1 Reservoir Dogs word     dick       0.40
## 2 Reservoir Dogs word    dicks       0.43
## 3 Reservoir Dogs word   fucked       0.55
## 4 Reservoir Dogs word  fucking       0.61
## 5 Reservoir Dogs word bullshit       0.61

unique(tarantino$movie) #Films in Dataset

## [1] Reservoir Dogs      Pulp Fiction        Kill Bill: Vol. 1  
## [4] Kill Bill: Vol. 2   Inglorious Basterds Django Unchained   
## [7] Jackie Brown       
## 7 Levels: Django Unchained Inglorious Basterds ... Reservoir Dogs

unique(tarantino$type)

## [1] word  death
## Levels: death word

unique(tarantino$word) #Swear words contained

##  [1] dick          dicks         fucked        fucking       bullshit     
##  [6] fuck          shit          motherfucker  pussy         fucks        
## [11] hell          jap           bastard       goddamn       motherfuckers
## [16] asshole       ass           assholes      n-word        asses        
## [21] bitch         fuckup        fucker        shitty        asshead      
## [26] damn                        damned        bitches       wetback      
## [31] faggot        cocksucker    gook          fuckers       gooks        
## [36] motherfucking dickless      chickenshit   slope         fuckhead     
## [41] merde         shithead      cunt          cunts         fuckface     
## [46] cockblockery  japs          jew (verb)    bastards      horeshit     
## [51] shitless      shitting      negro         squaw         slut         
## [56] goddamned     jackass       horseshit     shittiest     shitload     
## [61] dumbass      
## 61 Levels:  ass asses asshead asshole assholes bastard bastards ... wetback

Manipulation

Dataset with ‘deaths’

death.by.movies <- as.data.frame(tarantino %>% 
  group_by(movie) %>% 
  filter(type == 'death') %>% 
  count(type))

Dataset with ‘swear words’

words.by.movies <- as.data.frame(tarantino %>% 
  group_by(movie) %>% 
  filter(type == 'word') %>% 
  count(type))
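
As a side note, both tables can also be derived from a single count over movie and type. The following is only a minimal sketch on a made-up toy data frame (the rows and the names `toy`, `counts`, `toy.deaths` and `toy.words` are hypothetical), not the code used for this report:

```r
library(dplyr)

# Toy stand-in for the real data: hypothetical rows, same column names
toy <- data.frame(
  movie      = c("Film A", "Film A", "Film A", "Film B", "Film B"),
  type       = c("word", "word", "death", "word", "death"),
  word       = c("damn", "hell", "", "damn", ""),
  minutes_in = c(0.4, 1.2, 5.0, 2.3, 7.1),
  stringsAsFactors = TRUE
)

# One row per movie/type combination, with the number of events in column n
counts <- toy %>% count(movie, type)

# The two tables are then plain filters of that single result
toy.deaths <- counts %>% filter(type == "death")
toy.words  <- counts %>% filter(type == "word")
```

Counting once and filtering afterwards keeps the two tables consistent with each other, since they come from the same grouped result.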

Visualizations

Here we have the count of deaths in each movie. The first part of Kill Bill has the most deaths, while Jackie Brown has the fewest.

The difference between the two parts of Kill Bill is interesting: Kill Bill Vol. 2 has 11 deaths over the whole movie. Maybe this is because the budget for Kill Bill Vol. 1 was 55M, while for Kill Bill Vol. 2 it was 30M (almost half).

Pulp Fiction and Reservoir Dogs are the films with the most swear words, while Kill Bill Vol. 1 and Inglorious Basterds have the fewest (no time for swear words when you're killing/dying).

The following two graphs show the deaths and swear words over the course of each film:

In Reservoir Dogs, most of the swear words are concentrated in one part of the film, the beginning (the bar scene). Pulp Fiction and Jackie Brown concentrate most of their swear words at the beginning and the end. Django Unchained, by contrast, spreads its swear words across the whole film.
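
The plotting code for these graphs is not shown in this document. A minimal sketch of one way to draw events over the running time of each film, assuming a histogram of minutes_in split by movie (the toy rows, the 5-minute bin width and the name `timeline` are all assumptions for illustration):

```r
library(ggplot2)

# Hypothetical events, using the same column names as the real data
toy <- data.frame(
  movie      = rep(c("Film A", "Film B"), each = 4),
  type       = rep(c("word", "word", "word", "death"), times = 2),
  minutes_in = c(0.4, 1.2, 3.5, 60.0, 10.0, 45.5, 80.2, 95.0)
)

# Stacked histogram of events in 5-minute bins, one panel per movie
timeline <- ggplot(toy, aes(x = minutes_in, fill = type)) +
  geom_histogram(binwidth = 5, boundary = 0) +
  facet_wrap(~ movie, ncol = 1) +
  labs(x = "Minutes into the film",
       y = "Events per 5-minute bin",
       fill = "Event type")
```

Printing `timeline` draws the figure; clusters of tall bars mark the stretches of the film where events pile up.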

Across all the films, the three most frequently said swear words are 'fucking', 'shit' and 'fuck':

'Fucking' is said most in Reservoir Dogs and Pulp Fiction, 'shit' in Pulp Fiction and Jackie Brown, and 'fuck' in Pulp Fiction and Reservoir Dogs.
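
A ranking like this can be obtained by counting words and sorting. Here is a small sketch on invented rows (the data and the names `top.words` and `top.by.movie` are hypothetical, and this is not necessarily how the figures in this report were produced):

```r
library(dplyr)

# Hypothetical rows, using the same column names as the real data
toy <- data.frame(
  movie = c("Film A", "Film A", "Film A", "Film B", "Film B", "Film B"),
  type  = c("word", "word", "death", "word", "word", "word"),
  word  = c("damn", "hell", "", "damn", "damn", "hell"),
  stringsAsFactors = FALSE
)

# Most frequent swear words over all films (at most three rows)
top.words <- toy %>%
  filter(type == "word") %>%
  count(word, sort = TRUE) %>%
  head(3)

# How often each of those top words is said in each movie
top.by.movie <- toy %>%
  filter(word %in% top.words$word) %>%
  count(movie, word)
```

Filtering on type first keeps the empty word entries of the death rows out of the ranking.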

Best Visualization

In proportion

To standardize the information according to the duration of each film, here we have the number of deaths and swear words per minute for each movie.

First, another column is created with the duration of each movie in minutes (source: IMDb), and then the number of deaths/swear words is divided by that duration.

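# Durations in minutes, listed in the alphabetical order of the movie factor levels
# (Django Unchained first, Reservoir Dogs last), which is the row order of both tables

# Deaths by movie
death.by.movies$duration <- c(165, 153, 154, 111, 137, 154, 99)
death.by.movies$proportion <- death.by.movies$n / death.by.movies$duration

# Swear words by movie
words.by.movies$duration <- c(165, 153, 154, 111, 137, 154, 99)
words.by.movies$proportion <- words.by.movies$n / words.by.movies$duration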
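
Assigning the durations as a bare vector depends on the rows being in exactly that order. A hedged alternative is to keep the durations in a small lookup table and join by title. In the sketch below the pairing of titles and minutes assumes the vector above follows the alphabetical factor order, and the counts in `toy.counts` are invented for illustration:

```r
library(dplyr)

# Lookup table: titles spelled as in the dataset, durations from the vector above
# (pairing assumes alphabetical order of the movie factor levels)
durations <- data.frame(
  movie = c("Django Unchained", "Inglorious Basterds", "Jackie Brown",
            "Kill Bill: Vol. 1", "Kill Bill: Vol. 2", "Pulp Fiction",
            "Reservoir Dogs"),
  duration = c(165, 153, 154, 111, 137, 154, 99),
  stringsAsFactors = FALSE
)

# Hypothetical counts, NOT the real values from the dataset
toy.counts <- data.frame(
  movie = c("Reservoir Dogs", "Django Unchained"),
  n     = c(198, 33),
  stringsAsFactors = FALSE
)

# Attach each movie's duration by name, then compute events per minute
toy.rates <- toy.counts %>%
  left_join(durations, by = "movie") %>%
  mutate(per.minute = n / duration)
```

With a join, a reordered or filtered count table still gets the right duration for each movie, and a title missing from the lookup shows up as NA instead of silently shifting the values.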

Kill Bill Vol. 1 averages more than 0.5 deaths per minute, while Django Unchained and Inglorious Basterds average about 0.25 deaths per minute each.

The ranking of the movies is the same for the simple count of deaths and for the deaths per minute:

grid.arrange(a, e, ncol = 1)

On the other hand, about 4 swear words are said per minute in Reservoir Dogs, 3 in Pulp Fiction, and more than 2 in Jackie Brown.

There are some differences in ranking between the simple count of swear words by movie and the swear words per minute. The ranking holds until we reach Kill Bill: relative to its duration, Kill Bill Vol. 1 has slightly more swear words than the second part. The biggest difference is between Pulp Fiction and Reservoir Dogs. In a simple count, Pulp Fiction has more swear words than Reservoir Dogs. But when we compute the swear words of those movies relative to their duration, Reservoir Dogs has more swear words per minute than Pulp Fiction.

grid.arrange(b, f, ncol = 1)
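
The plot objects passed to grid.arrange() are not defined in the code shown in this document. As a rough sketch of the idea, assuming simple bar charts, a count plot and a per-minute plot can be built and stacked as follows (the toy values and the names `count.plot`, `rate.plot` and `stacked` are hypothetical):

```r
library(ggplot2)
library(gridExtra)

# Hypothetical counts and durations, for illustration only
toy <- data.frame(
  movie    = c("Film A", "Film B", "Film C"),
  n        = c(60, 10, 45),
  duration = c(110, 100, 150)
)
toy$per.minute <- toy$n / toy$duration

# Simple count by movie
count.plot <- ggplot(toy, aes(x = movie, y = n)) +
  geom_col() +
  labs(x = NULL, y = "Deaths")

# Same events, standardized by duration
rate.plot <- ggplot(toy, aes(x = movie, y = per.minute)) +
  geom_col() +
  labs(x = NULL, y = "Deaths per minute")

# arrangeGrob() builds the stacked layout without drawing it;
# grid.arrange() with the same arguments draws it immediately
stacked <- arrangeGrob(count.plot, rate.plot, ncol = 1)
```

Stacking the two charts with a shared movie axis makes it easy to see where the ranking changes once duration is taken into account.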

Conclusions

This last comparison shows that we cannot just take the information we have at face value; sometimes the information must be put in relative terms.

A Data/Business Analyst is responsible not only for reporting what is there, but also for digging through the information. It is like being an archaeologist: he wouldn't do a good job if he only looked at the ground under his feet; he must kneel and work to find something more than what is in plain sight.

Obviously, my job isn't based only on visualization skills; it also requires data manipulation, data preparation, predictive analysis, KPI reporting, Machine Learning and statistics skills, among others.

For other kinds of work, please visit my other publications.

End

This project has no practical use; its only purpose is to improve my visualization skills.

The Tarantino dataset was taken from Kaggle, a free repository.

The Tarantino Love ♥

Jessica González

22/9/2017