About

Qualitative Descriptive Analytics aims to gather an in-depth understanding of the underlying reasons and motivations for an event or observation. It is typically represented with visuals or charts.

Quantitative Descriptive Analytics focuses on investigating a phenomenon via statistical, mathematical, and computational techniques. It aims to quantify an event with metrics and numbers.

In this lab, we will explore the movie data set (‘rottentomatoes.csv’) and understand it better through simple statistics.

Setup

Download the ‘bsad_lab03’ zip folder and extract it. Next, set the extracted folder as the working directory: open RStudio, go to ‘Session’, select ‘Set Working Directory’, and click ‘To Source File Location’. Then follow the directions below to complete the lab.
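
The menu steps above can also be done from the console. This is a minimal sketch, assuming the zip was extracted to a folder named bsad_lab03 under the current directory; the path is hypothetical and should be adjusted to wherever the folder actually lives.

```r
# Hypothetical location of the extracted lab folder; adjust as needed.
lab_dir <- "bsad_lab03"

# Only change directory if the folder actually exists.
if (dir.exists(lab_dir)) {
  setwd(lab_dir)
}

# Confirm that the data file used in Task 1 is reachable from here.
data_found <- file.exists(file.path("data", "rottentomatoes.csv"))
data_found
```

If `data_found` is FALSE, the working directory is probably not the lab folder yet.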


Task 1

First, read in the data from the ‘rottentomatoes.csv’ file and view the first rows to make sure it was read in correctly.

mydata = read.csv(file="data/rottentomatoes.csv")
head(mydata)
##                                                      title
## 1                                                 Avatar 
## 2               Pirates of the Caribbean: At World's End 
## 3                                                Spectre 
## 4                                  The Dark Knight Rises 
## 5 Star Wars: Episode VII - The Force Awakens             
## 6                                            John Carter 
##                            genres          director          actor1
## 1 Action|Adventure|Fantasy|Sci-Fi     James Cameron     CCH Pounder
## 2        Action|Adventure|Fantasy    Gore Verbinski     Johnny Depp
## 3       Action|Adventure|Thriller        Sam Mendes Christoph Waltz
## 4                 Action|Thriller Christopher Nolan       Tom Hardy
## 5                     Documentary       Doug Walker     Doug Walker
## 6         Action|Adventure|Sci-Fi    Andrew Stanton    Daryl Sabara
##             actor2               actor3 length    budget director_fb_likes
## 1 Joel David Moore            Wes Studi    178 237000000                 0
## 2    Orlando Bloom       Jack Davenport    169 300000000               563
## 3     Rory Kinnear     Stephanie Sigman    148 245000000                 0
## 4   Christian Bale Joseph Gordon-Levitt    164 250000000             22000
## 5       Rob Walker                          NA        NA               131
## 6  Samantha Morton         Polly Walker    132 263700000               475
##   actor1_fb_likes actor2_fb_likes actor3_fb_likes total_cast_likes
## 1            1000             936             855             4834
## 2           40000            5000            1000            48350
## 3           11000             393             161            11700
## 4           27000           23000           23000           106759
## 5             131              12              NA              143
## 6             640             632             530             1873
##   fb_likes critic_reviews users_reviews users_votes score aspect_ratio
## 1    33000            723          3054      886204   7.9         1.78
## 2        0            302          1238      471220   7.1         2.35
## 3    85000            602           994      275868   6.8         2.35
## 4   164000            813          2701     1144337   8.5         2.35
## 5        0             NA            NA           8   7.1           NA
## 6    24000            462           738      212204   6.6         2.35
##       gross year
## 1 760505847 2009
## 2 309404152 2007
## 3 200074175 2015
## 4 448130642 2012
## 5        NA   NA
## 6  73058679 2012

After reviewing the data, it can be seen that the title column contains a stray character (‘Â’) that needs to be cleaned out. The column is extracted so that the character can be removed.

title = mydata$title
title = sub("Â","",title)
head(title)
## [1] "Avatar "                                                
## [2] "Pirates of the Caribbean: At World's End "              
## [3] "Spectre "                                               
## [4] "The Dark Knight Rises "                                 
## [5] "Star Wars: Episode VII - The Force Awakens             "
## [6] "John Carter "
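
The output above still carries trailing whitespace, and `sub` only removes the first match in each string. Below is a hedged sketch of a more thorough clean-up on a small made-up vector. It assumes the debris is ‘Â’ followed by a non-breaking space, which is a common result of UTF-8 text being read with a different encoding; this has not been checked against the lab file.

```r
# Toy titles (made up for illustration) with the assumed debris appended.
raw_titles <- c("Avatar\u00c2\u00a0", "Spectre\u00c2\u00a0", "John Carter ")

# gsub() removes every occurrence, unlike sub(), which stops at the first.
no_debris <- gsub("\u00c2", "", raw_titles, fixed = TRUE)
no_debris <- gsub("\u00a0", " ", no_debris, fixed = TRUE)

# trimws() drops leading and trailing spaces.
clean_titles <- trimws(no_debris)
clean_titles
```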

Below, I calculate the range, minimum, maximum, mean, standard deviation, and variance for the following columns: Users Votes, Total Cast Likes, Director Facebook Likes, Gross, and Budget.

Users Votes

UsersVotes = mydata$users_votes
head(UsersVotes)
## [1]  886204  471220  275868 1144337       8  212204
#Max
maxUsersVotes = max(UsersVotes)
maxUsersVotes
## [1] 1689764
#Min
minUsersVotes = min(UsersVotes)
minUsersVotes
## [1] 5
#Mean
averageUsersVotes = mean(UsersVotes)
averageUsersVotes
## [1] 83668.16
#Range
rangeUsersVotes = maxUsersVotes - minUsersVotes
rangeUsersVotes
## [1] 1689759
#Standard Deviation
sdUsersVotes = sd(UsersVotes)
sdUsersVotes
## [1] 138485.3
#Variance
varUsersVotes = var(UsersVotes)
varUsersVotes
## [1] 19178166353
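
Variance and standard deviation are two views of the same quantity: the variance is the square of the standard deviation. The short sketch below shows this on made-up numbers, then takes the square root of the Users Votes variance printed above.

```r
# Made-up sample for illustration.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# var() and sd() both use the sample (n - 1) denominator.
sample_var <- var(x)
sample_sd  <- sd(x)
all.equal(sample_sd^2, sample_var)

# Square root of the Users Votes variance reported above.
sd_from_var <- sqrt(19178166353)
sd_from_var
```

The result agrees with the reported standard deviation of 138485.3 up to rounding.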

Total Cast Likes

TotalCastLikes = mydata$total_cast_likes
head(TotalCastLikes)
## [1]   4834  48350  11700 106759    143   1873
#Max
maxTotalCastLikes = max(TotalCastLikes)
maxTotalCastLikes
## [1] 656730
#Min
minTotalCastLikes = min(TotalCastLikes)
minTotalCastLikes
## [1] 0
#Mean
averageTotalCastLikes = mean(TotalCastLikes)
averageTotalCastLikes
## [1] 9699.064
#Range
rangeTotalCastLikes = maxTotalCastLikes - minTotalCastLikes
rangeTotalCastLikes
## [1] 656730
#Standard Deviation 
sdTotalCastLikes = sd(TotalCastLikes)
sdTotalCastLikes
## [1] 18163.8
#Variance
varTotalCastLikes = var(TotalCastLikes)
varTotalCastLikes
## [1] 329923599

Director Facebook Likes

DirectorFacebookLikes = mydata$director_fb_likes
DirectorFacebookLikes = DirectorFacebookLikes[!is.na(DirectorFacebookLikes)]
head(DirectorFacebookLikes)
## [1]     0   563     0 22000   131   475
#Max
maxDirectorFacebookLikes = max(DirectorFacebookLikes)
maxDirectorFacebookLikes
## [1] 23000
#Min
minDirectorFacebookLikes = min(DirectorFacebookLikes)
minDirectorFacebookLikes
## [1] 0
#Mean
averageDirectorFacebookLikes = mean(DirectorFacebookLikes)
averageDirectorFacebookLikes
## [1] 686.5092
#Range
rangeDirectorFacebookLikes = maxDirectorFacebookLikes - minDirectorFacebookLikes
rangeDirectorFacebookLikes
## [1] 23000
#Standard Deviation
sdDirectorFacebookLikes = sd(DirectorFacebookLikes)
sdDirectorFacebookLikes
## [1] 2813.329
#Variance
varDirectorFacebookLikes = var(DirectorFacebookLikes)
varDirectorFacebookLikes
## [1] 7914818

Gross

gross = mydata$gross
gross = gross[!is.na(gross)]
head(gross)
## [1] 760505847 309404152 200074175 448130642  73058679 336530303
#Max
maxGross = max(gross)
maxGross
## [1] 760505847
#Min
minGross = min(gross)
minGross
## [1] 162
#Mean
averageGross = mean(gross)
averageGross
## [1] 48468408
#Range
rangeGross = maxGross - minGross
rangeGross
## [1] 760505685
#Standard Deviation
sdGross = sd(gross)
sdGross
## [1] 68452990
#Variance
varGross = var(gross)
varGross
## [1] 4.685812e+15

Budget

budget = mydata$budget
budget = budget[!is.na(budget)]
head(budget)
## [1] 237000000 300000000 245000000 250000000 263700000 258000000
#Max
maxbudget = max(budget)
maxbudget
## [1] 12215500000
#Min
minbudget = min(budget)
minbudget
## [1] 218
#Mean
averagebudget = mean(budget)
averagebudget
## [1] 39752620
#Range
rangebudget = maxbudget - minbudget
rangebudget
## [1] 12215499782
#Standard Deviation
sdbudget = sd(budget)
sdbudget
## [1] 206114898
#Variance
varbudget = var(budget)
varbudget
## [1] 4.248335e+16
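
The per-column calculations above repeat the same six steps each time. One way to reduce the repetition is a small helper function. The name `describe_column` is hypothetical and not part of the lab; this is a sketch on made-up values rather than the lab data.

```r
# Hypothetical helper (not part of the lab): six statistics for one column.
describe_column <- function(x) {
  x <- x[!is.na(x)]  # same NA filter used for gross and budget above
  c(
    min      = min(x),
    max      = max(x),
    range    = max(x) - min(x),
    mean     = mean(x),
    sd       = sd(x),
    variance = var(x)
  )
}

# Made-up values, including one missing entry.
stats <- describe_column(c(2, 4, 4, 4, 5, 5, 7, 9, NA))
stats
```

With the lab data loaded, `sapply(mydata[c("users_votes", "total_cast_likes", "director_fb_likes", "gross", "budget")], describe_column)` would return one column of results per variable.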

Task 2

The summary function computes several of these statistics (minimum, maximum, mean, and quartiles) for every column in a single call.

summary(mydata)
##                         title                       genres    
##  Ben-Hur                  :   3   Drama               : 236  
##  Halloween                :   3   Comedy              : 209  
##  Home                     :   3   Comedy|Drama        : 191  
##  King Kong                :   3   Comedy|Drama|Romance: 187  
##  Pan                      :   3   Comedy|Romance      : 158  
##  The Fast and the Furious :   3   Drama|Romance       : 152  
##  (Other)                   :5025   (Other)             :3910  
##              director                  actor1                 actor2    
##                  : 104   Robert De Niro   :  49   Morgan Freeman :  20  
##  Steven Spielberg:  26   Johnny Depp      :  41   Charlize Theron:  15  
##  Woody Allen     :  22   Nicolas Cage     :  33   Brad Pitt      :  14  
##  Clint Eastwood  :  20   J.K. Simmons     :  31                  :  13  
##  Martin Scorsese :  20   Bruce Willis     :  30   James Franco   :  11  
##  Ridley Scott    :  17   Denzel Washington:  30   Meryl Streep   :  11  
##  (Other)         :4834   (Other)          :4829   (Other)        :4959  
##             actor3         length          budget         
##                :  23   Min.   :  7.0   Min.   :2.180e+02  
##  Ben Mendelsohn:   8   1st Qu.: 93.0   1st Qu.:6.000e+06  
##  John Heard    :   8   Median :103.0   Median :2.000e+07  
##  Steve Coogan  :   8   Mean   :107.2   Mean   :3.975e+07  
##  Anne Hathaway :   7   3rd Qu.:118.0   3rd Qu.:4.500e+07  
##  Jon Gries     :   7   Max.   :511.0   Max.   :1.222e+10  
##  (Other)       :4982   NA's   :15      NA's   :492        
##  director_fb_likes actor1_fb_likes  actor2_fb_likes  actor3_fb_likes  
##  Min.   :    0.0   Min.   :     0   Min.   :     0   Min.   :    0.0  
##  1st Qu.:    7.0   1st Qu.:   614   1st Qu.:   281   1st Qu.:  133.0  
##  Median :   49.0   Median :   988   Median :   595   Median :  371.5  
##  Mean   :  686.5   Mean   :  6560   Mean   :  1652   Mean   :  645.0  
##  3rd Qu.:  194.5   3rd Qu.: 11000   3rd Qu.:   918   3rd Qu.:  636.0  
##  Max.   :23000.0   Max.   :640000   Max.   :137000   Max.   :23000.0  
##  NA's   :104       NA's   :7        NA's   :13       NA's   :23       
##  total_cast_likes    fb_likes      critic_reviews  users_reviews   
##  Min.   :     0   Min.   :     0   Min.   :  1.0   Min.   :   1.0  
##  1st Qu.:  1411   1st Qu.:     0   1st Qu.: 50.0   1st Qu.:  65.0  
##  Median :  3090   Median :   166   Median :110.0   Median : 156.0  
##  Mean   :  9699   Mean   :  7526   Mean   :140.2   Mean   : 272.8  
##  3rd Qu.: 13756   3rd Qu.:  3000   3rd Qu.:195.0   3rd Qu.: 326.0  
##  Max.   :656730   Max.   :349000   Max.   :813.0   Max.   :5060.0  
##                                    NA's   :50      NA's   :21      
##   users_votes          score        aspect_ratio       gross          
##  Min.   :      5   Min.   :1.600   Min.   : 1.18   Min.   :      162  
##  1st Qu.:   8594   1st Qu.:5.800   1st Qu.: 1.85   1st Qu.:  5340988  
##  Median :  34359   Median :6.600   Median : 2.35   Median : 25517500  
##  Mean   :  83668   Mean   :6.442   Mean   : 2.22   Mean   : 48468408  
##  3rd Qu.:  96309   3rd Qu.:7.200   3rd Qu.: 2.35   3rd Qu.: 62309438  
##  Max.   :1689764   Max.   :9.500   Max.   :16.00   Max.   :760505847  
##                                    NA's   :329     NA's   :884        
##       year     
##  Min.   :1916  
##  1st Qu.:1999  
##  Median :2005  
##  Mean   :2002  
##  3rd Qu.:2011  
##  Max.   :2016  
##  NA's   :108
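
`summary` reports the minimum, quartiles, median, mean, maximum, and NA count, but not the standard deviation or variance. Here is a minimal sketch of filling that gap for every numeric column at once, using a small made-up data frame whose column names mirror two of the lab's columns.

```r
# Made-up data frame for illustration.
movies <- data.frame(
  title  = c("A", "B", "C", "D"),
  budget = c(10, 20, NA, 40),
  gross  = c(5, 15, 25, 35)
)

# Keep numeric columns only, then apply sd() to each, skipping NAs.
numeric_cols <- movies[sapply(movies, is.numeric)]
column_sd <- sapply(numeric_cols, sd, na.rm = TRUE)
column_sd
```

Swapping `sd` for `var` in the last call gives the variances in the same way.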

Budget Plot

This plot shows the budget column in its original row order.

plot(budget)

It is difficult to make accurate assessments from this plot, so the graph must be customized. Specifically, the budgets are sorted from the least to the most expensive movie.

#Budget is reordered from least to most expensive
leastexpensiveBudget = sort(budget)
#xlab labels the x axis, ylab labels the y axis
plot(leastexpensiveBudget, type="b", xlab = "Movie", ylab = "Movie Budget", col.main = "Red")

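This graph shows that the vast majority of films have budgets that are small relative to the maximum: the median is 2.000e+07 and the mean is 3.975e+07, while a handful of extreme values, up to 1.222e+10, stretch the vertical axis.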
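
When a few extreme values dominate the axis, a logarithmic y axis can make the bulk of the data visible. This is a hedged sketch on made-up budgets; the values are illustrative and not taken from the data set.

```r
# Made-up budgets with one extreme outlier, for illustration only.
toy_budget <- c(2.0e7, 5.0e6, 1.2e10, 4.5e7, 6.0e6, 218)

# Sort ascending, as in the lab.
sorted_budget <- sort(toy_budget)

# log = "y" puts the vertical axis on a log scale (all values must be > 0).
plot(sorted_budget, type = "b", log = "y",
     xlab = "Movie (sorted by budget)", ylab = "Movie Budget (log scale)")

# Share of movies at or below the mean: a quick check on skew.
share_below_mean <- mean(toy_budget <= mean(toy_budget))
share_below_mean
```

A share well above one half indicates that a few large values are pulling the mean upward.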

4 Layout Graphs

Director Facebook Likes, Total Cast Likes, Gross, and Budget

layout(matrix(1:4,2,2))
#Director Facebook Likes
ascendingDirectorFacebookLikes = sort(DirectorFacebookLikes)
plot(ascendingDirectorFacebookLikes, type="b", xlab = "Movie", ylab = "Director Facebook Likes")
title ("Director Facebook Likes by Movie")

#Total Cast Likes
ascendingTotalCastLikes = sort(TotalCastLikes)
plot(ascendingTotalCastLikes, type="b", xlab = "Movie", ylab = "Total Cast Likes")
title ("Total Cast Likes by Movie")

#Gross
ascendingGross = sort(gross)
plot(ascendingGross, type="b", xlab = "Movie", ylab = "Gross")
title ("Gross by Movie")


#Budget
ascendingBudget = sort(budget)
plot(ascendingBudget, type="b", xlab = "Movie", ylab = "Budget")
title ("Budget")
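
`layout(matrix(1:4, 2, 2))` fills the grid column by column, so the second plot appears below the first rather than beside it. The sketch below shows the panel order and a row-wise alternative. The placeholder plots and the reset at the end are illustrative additions, not part of the lab.

```r
# Default fill is column-wise: panel numbers run down each column.
by_column <- matrix(1:4, 2, 2)
by_column

# byrow = TRUE makes panels run left to right, then top to bottom.
by_row <- matrix(1:4, 2, 2, byrow = TRUE)
by_row

layout(by_row)
for (i in 1:4) {
  plot(1:10, main = paste("Panel", i))  # placeholder plots
}

# Return to a single plotting panel for later figures.
layout(1)
```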

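In all four graphs, the sorted values stay low for most movies and rise steeply only for the last few, so each variable is strongly right-skewed. Because each variable is sorted independently, these graphs alone cannot confirm that movies with more popular directors or casts record higher gross earnings; that would require comparing the variables movie by movie.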
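
A relationship between two variables is easier to judge from paired values than from separately sorted curves. Below is a minimal sketch using a made-up data frame with the lab's column names. Note that the NA-filtered `gross` vector is shorter than `TotalCastLikes`, so the pairing has to be done on the original data frame columns.

```r
# Made-up rows for illustration; column names follow the lab data.
toy <- data.frame(
  total_cast_likes = c(100, 200, NA, 400, 500),
  gross            = c(10, 20, 30, NA, 50)
)

# use = "complete.obs" keeps only rows where both values are present.
r <- cor(toy$total_cast_likes, toy$gross, use = "complete.obs")
r

# A scatter plot shows the same pairing visually.
plot(toy$total_cast_likes, toy$gross,
     xlab = "Total Cast Likes", ylab = "Gross")
```

On the lab data, the same call with `mydata$total_cast_likes` and `mydata$gross` would give a correlation between -1 and 1; the toy value says nothing about the real one.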


Task 3

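In the following, I have calculated the z-score of a single reference value (10,234,001) against each of my previously selected variables, followed by the z-scores of every observation, which are then plotted as histograms. Because the same reference value is used for all four variables, its z-scores for the two likes columns are extremely large.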

#Budget 
budgetZscore = (10234001 - averagebudget) / sdbudget
budgetZscore
## [1] -0.1432144
#Gross
grossZscore = (10234001 - averageGross) / sdGross
grossZscore
## [1] -0.5585498
#Director Facebook Likes 
directorfacebooklikesZscore = (10234001 - averageDirectorFacebookLikes) / sdDirectorFacebookLikes
directorfacebooklikesZscore
## [1] 3637.44
#Total Cast Likes 
totalcastlikesZscore = (10234001 - averageTotalCastLikes) / sdTotalCastLikes
totalcastlikesZscore
## [1] 562.8945
# Budget
zscoreBudget = (budget - averagebudget) / sdbudget
# Total Cast Likes
zscoreTotalCastLikes = (TotalCastLikes - averageTotalCastLikes) / sdTotalCastLikes
# Director Facebook Likes
zscoreDirectorFacebookLikes = (DirectorFacebookLikes - averageDirectorFacebookLikes) / sdDirectorFacebookLikes
# Gross
zscoreGross = (gross - averageGross) / sdGross
#Histogram Graphs of Zscores 
layout(matrix(1:4, 2,2))
hist(zscoreBudget)
hist(zscoreGross)
hist(zscoreTotalCastLikes)
hist(zscoreDirectorFacebookLikes)
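
A z-score expresses how many standard deviations a value lies from the mean: z = (x - mean) / sd. The sketch below, on made-up numbers, checks the manual formula against base R's `scale()`. It then recomputes the budget z-score for the reference value from the rounded mean and standard deviation printed earlier.

```r
# Made-up sample for illustration.
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Manual z-scores, as computed in the lab.
z_manual <- (x - mean(x)) / sd(x)

# scale() centers and divides by sd(); as.numeric() drops the matrix shape.
z_scale <- as.numeric(scale(x))
all.equal(z_manual, z_scale)

# Budget z-score for the reference value, from the printed mean and sd.
budget_z <- (10234001 - 39752620) / 206114898
budget_z
```

Standardized values always have mean 0 and standard deviation 1, so a histogram of z-scores has the same shape as a histogram of the raw values, only on a different scale.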