Qualitative Descriptive Analytics aims to gather an in-depth understanding of the underlying reasons and motivations for an event or observation. It is typically represented with visuals or charts.
Quantitative Descriptive Analytics focuses on investigating a phenomenon via statistical, mathematical, and computational techniques. It aims to quantify an event with metrics and numbers.
In this lab, we will explore the movie data set in ‘rottentomatoes.csv’ and understand it better through simple statistics.
Download the ‘bsad_lab03’ zip file and extract it. Next, set the extracted folder as the working directory: open RStudio, go to ‘Session’, scroll down to ‘Set Working Directory’, and click ‘To Source File Location’. Then follow the directions below to complete the lab.
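As a quick check that the working directory is set correctly, something like the following can be run before reading the data. This is a minimal sketch: it assumes the csv file sits in a ‘data’ subfolder, as the read.csv call in this lab suggests, and the variable name data_path is only illustrative.

```r
# Assumed relative path: a 'data' folder inside the working directory
data_path <- file.path("data", "rottentomatoes.csv")

# Show where R is currently looking for files
print(getwd())

# Report whether the file can be found from here, without stopping the script
if (file.exists(data_path)) {
  message("Found ", data_path)
} else {
  message("Could not find ", data_path, "; check the working directory")
}
```

If the file is not found, repeating the ‘Set Working Directory’ step usually resolves it.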
Begin by reading in the data from the ‘rottentomatoes.csv’ file and viewing the first rows to make sure it was read in correctly.
mydata = read.csv(file="data/rottentomatoes.csv")
head(mydata)
## title
## 1 AvatarÂÂ
## 2 Pirates of the Caribbean: At World's EndÂÂ
## 3 SpectreÂÂ
## 4 The Dark Knight RisesÂÂ
## 5 Star Wars: Episode VII - The Force AwakensÂÂ
## 6 John CarterÂÂ
## genres director actor1
## 1 Action|Adventure|Fantasy|Sci-Fi James Cameron CCH Pounder
## 2 Action|Adventure|Fantasy Gore Verbinski Johnny Depp
## 3 Action|Adventure|Thriller Sam Mendes Christoph Waltz
## 4 Action|Thriller Christopher Nolan Tom Hardy
## 5 Documentary Doug Walker Doug Walker
## 6 Action|Adventure|Sci-Fi Andrew Stanton Daryl Sabara
## actor2 actor3 length budget director_fb_likes
## 1 Joel David Moore Wes Studi 178 237000000 0
## 2 Orlando Bloom Jack Davenport 169 300000000 563
## 3 Rory Kinnear Stephanie Sigman 148 245000000 0
## 4 Christian Bale Joseph Gordon-Levitt 164 250000000 22000
## 5 Rob Walker NA NA 131
## 6 Samantha Morton Polly Walker 132 263700000 475
## actor1_fb_likes actor2_fb_likes actor3_fb_likes total_cast_likes
## 1 1000 936 855 4834
## 2 40000 5000 1000 48350
## 3 11000 393 161 11700
## 4 27000 23000 23000 106759
## 5 131 12 NA 143
## 6 640 632 530 1873
## fb_likes critic_reviews users_reviews users_votes score aspect_ratio
## 1 33000 723 3054 886204 7.9 1.78
## 2 0 302 1238 471220 7.1 2.35
## 3 85000 602 994 275868 6.8 2.35
## 4 164000 813 2701 1144337 8.5 2.35
## 5 0 NA NA 8 7.1 NA
## 6 24000 462 738 212204 6.6 2.35
## gross year
## 1 760505847 2009
## 2 309404152 2007
## 3 200074175 2015
## 4 448130642 2012
## 5 NA NA
## 6 73058679 2012
The output shows stray ‘Â’ characters at the end of each entry in the title column, so the column needs to be cleaned. We extract the column and remove the character.
title = mydata$title
title = sub("Â","",title)
head(title)
## [1] "Avatar "
## [2] "Pirates of the Caribbean: At World's End "
## [3] "Spectre "
## [4] "The Dark Knight Rises "
## [5] "Star Wars: Episode VII - The Force Awakens "
## [6] "John Carter "
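Note that sub() replaces only the first match in each string, while gsub() replaces every match. The sketch below shows the difference on a small made-up vector (not the lab data) and also trims leftover whitespace with trimws(). The names titles_example and cleaned are illustrative only.

```r
# Hypothetical sample imitating the stray characters seen in the title column
titles_example <- c("Avatar\u00c2\u00c2", "Spectre\u00c2\u00c2")

# sub() removes only the first stray character in each string
first_only <- sub("\u00c2", "", titles_example, fixed = TRUE)

# gsub() removes every stray character; trimws() drops leading/trailing whitespace
cleaned <- trimws(gsub("\u00c2", "", titles_example, fixed = TRUE))

print(first_only)
print(cleaned)
```

To keep the cleaned titles with the rest of the data, the result would also need to be assigned back to the data frame, for example with mydata$title = title.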
Below, I calculate the range, minimum, maximum, mean, standard deviation, and variance for the following columns: Users Votes, Total Cast Likes, Director Facebook Likes, Gross, and Budget.
Users Votes
UsersVotes = mydata$users_votes
head(UsersVotes)
## [1] 886204 471220 275868 1144337 8 212204
#Max
maxUsersVotes = max(UsersVotes)
maxUsersVotes
## [1] 1689764
#Min
minUsersVotes = min(UsersVotes)
minUsersVotes
## [1] 5
#Mean
averageUsersVotes = mean(UsersVotes)
averageUsersVotes
## [1] 83668.16
#Range
rangeUsersVotes = maxUsersVotes - minUsersVotes
rangeUsersVotes
## [1] 1689759
#Standard Deviation
sdUsersVotes = sd(UsersVotes)
sdUsersVotes
## [1] 138485.3
#Variance
varUsersVotes = var(UsersVotes)
varUsersVotes
## [1] 19178166353
Total Cast Likes
TotalCastLikes = mydata$total_cast_likes
head(TotalCastLikes)
## [1] 4834 48350 11700 106759 143 1873
#Max
maxTotalCastLikes = max(TotalCastLikes)
maxTotalCastLikes
## [1] 656730
#Min
minTotalCastLikes = min(TotalCastLikes)
minTotalCastLikes
## [1] 0
#Mean
averageTotalCastLikes = mean(TotalCastLikes)
averageTotalCastLikes
## [1] 9699.064
#Range
rangeTotalCastLikes = maxTotalCastLikes - minTotalCastLikes
rangeTotalCastLikes
## [1] 656730
#Standard Deviation
sdTotalCastLikes = sd(TotalCastLikes)
sdTotalCastLikes
## [1] 18163.8
#Variance
varTotalCastLikes = var(TotalCastLikes)
varTotalCastLikes
## [1] 329923599
Director Facebook Likes
DirectorFacebookLikes = mydata$director_fb_likes
DirectorFacebookLikes = DirectorFacebookLikes[!is.na(DirectorFacebookLikes)]
head(DirectorFacebookLikes)
## [1] 0 563 0 22000 131 475
#Max
maxDirectorFacebookLikes = max(DirectorFacebookLikes)
maxDirectorFacebookLikes
## [1] 23000
#Min
minDirectorFacebookLikes = min(DirectorFacebookLikes)
minDirectorFacebookLikes
## [1] 0
#Mean
averageDirectorFacebookLikes = mean(DirectorFacebookLikes)
averageDirectorFacebookLikes
## [1] 686.5092
#Range
rangeDirectorFacebookLikes = maxDirectorFacebookLikes - minDirectorFacebookLikes
rangeDirectorFacebookLikes
## [1] 23000
#Standard Deviation
sdDirectorFacebookLikes = sd(DirectorFacebookLikes)
sdDirectorFacebookLikes
## [1] 2813.329
#Variance
varDirectorFacebookLikes = var(DirectorFacebookLikes)
varDirectorFacebookLikes
## [1] 7914818
Gross
gross = mydata$gross
gross = gross[!is.na(gross)]
head(gross)
## [1] 760505847 309404152 200074175 448130642 73058679 336530303
#Max
maxGross = max(gross)
maxGross
## [1] 760505847
#Min
minGross = min(gross)
minGross
## [1] 162
#Mean
averageGross = mean(gross)
averageGross
## [1] 48468408
#Range
rangeGross = maxGross - minGross
rangeGross
## [1] 760505685
#Standard Deviation
sdGross = sd(gross)
sdGross
## [1] 68452990
#Variance
varGross = var(gross)
varGross
## [1] 4.685812e+15
Budget
budget = mydata$budget
budget = budget[!is.na(budget)]
head(budget)
## [1] 237000000 300000000 245000000 250000000 263700000 258000000
#Max
maxbudget = max(budget)
maxbudget
## [1] 12215500000
#Min
minbudget = min(budget)
minbudget
## [1] 218
#Mean
averagebudget = mean(budget)
averagebudget
## [1] 39752620
#Range
rangebudget = maxbudget - minbudget
rangebudget
## [1] 12215499782
#Standard Deviation
sdbudget = sd(budget)
sdbudget
## [1] 206114898
#Variance
varbudget = var(budget)
varbudget
## [1] 4.248335e+16
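The same six statistics are computed for every column above, so the steps could be wrapped in one function. The following is a minimal sketch, not part of the original lab: the function name describe_column and the example vector are made up for illustration.

```r
# Hypothetical helper: returns the six statistics used in this lab for one numeric vector
describe_column <- function(x) {
  x <- x[!is.na(x)]  # drop missing values, as done for gross and budget above
  c(min = min(x),
    max = max(x),
    mean = mean(x),
    range = max(x) - min(x),
    sd = sd(x),
    variance = var(x))
}

# Made-up example vector with one missing value
example_values <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)
stats <- describe_column(example_values)
print(stats)

# With the lab data loaded, the helper could be applied to several columns at once:
# sapply(mydata[c("users_votes", "total_cast_likes", "gross", "budget")], describe_column)
```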
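The summary() function reports the minimum, quartiles, median, mean, maximum, and number of NA values for every numeric column in a single call. It does not report the range, standard deviation, or variance, so those still need to be computed separately.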
summary(mydata)
## title genres
## Ben-Hur : 3 Drama : 236
## Halloween : 3 Comedy : 209
## Home : 3 Comedy|Drama : 191
## King Kong : 3 Comedy|Drama|Romance: 187
## Pan : 3 Comedy|Romance : 158
## The Fast and the Furious : 3 Drama|Romance : 152
## (Other) :5025 (Other) :3910
## director actor1 actor2
## : 104 Robert De Niro : 49 Morgan Freeman : 20
## Steven Spielberg: 26 Johnny Depp : 41 Charlize Theron: 15
## Woody Allen : 22 Nicolas Cage : 33 Brad Pitt : 14
## Clint Eastwood : 20 J.K. Simmons : 31 : 13
## Martin Scorsese : 20 Bruce Willis : 30 James Franco : 11
## Ridley Scott : 17 Denzel Washington: 30 Meryl Streep : 11
## (Other) :4834 (Other) :4829 (Other) :4959
## actor3 length budget
## : 23 Min. : 7.0 Min. :2.180e+02
## Ben Mendelsohn: 8 1st Qu.: 93.0 1st Qu.:6.000e+06
## John Heard : 8 Median :103.0 Median :2.000e+07
## Steve Coogan : 8 Mean :107.2 Mean :3.975e+07
## Anne Hathaway : 7 3rd Qu.:118.0 3rd Qu.:4.500e+07
## Jon Gries : 7 Max. :511.0 Max. :1.222e+10
## (Other) :4982 NA's :15 NA's :492
## director_fb_likes actor1_fb_likes actor2_fb_likes actor3_fb_likes
## Min. : 0.0 Min. : 0 Min. : 0 Min. : 0.0
## 1st Qu.: 7.0 1st Qu.: 614 1st Qu.: 281 1st Qu.: 133.0
## Median : 49.0 Median : 988 Median : 595 Median : 371.5
## Mean : 686.5 Mean : 6560 Mean : 1652 Mean : 645.0
## 3rd Qu.: 194.5 3rd Qu.: 11000 3rd Qu.: 918 3rd Qu.: 636.0
## Max. :23000.0 Max. :640000 Max. :137000 Max. :23000.0
## NA's :104 NA's :7 NA's :13 NA's :23
## total_cast_likes fb_likes critic_reviews users_reviews
## Min. : 0 Min. : 0 Min. : 1.0 Min. : 1.0
## 1st Qu.: 1411 1st Qu.: 0 1st Qu.: 50.0 1st Qu.: 65.0
## Median : 3090 Median : 166 Median :110.0 Median : 156.0
## Mean : 9699 Mean : 7526 Mean :140.2 Mean : 272.8
## 3rd Qu.: 13756 3rd Qu.: 3000 3rd Qu.:195.0 3rd Qu.: 326.0
## Max. :656730 Max. :349000 Max. :813.0 Max. :5060.0
## NA's :50 NA's :21
## users_votes score aspect_ratio gross
## Min. : 5 Min. :1.600 Min. : 1.18 Min. : 162
## 1st Qu.: 8594 1st Qu.:5.800 1st Qu.: 1.85 1st Qu.: 5340988
## Median : 34359 Median :6.600 Median : 2.35 Median : 25517500
## Mean : 83668 Mean :6.442 Mean : 2.22 Mean : 48468408
## 3rd Qu.: 96309 3rd Qu.:7.200 3rd Qu.: 2.35 3rd Qu.: 62309438
## Max. :1689764 Max. :9.500 Max. :16.00 Max. :760505847
## NA's :329 NA's :884
## year
## Min. :1916
## 1st Qu.:1999
## Median :2005
## Mean :2002
## 3rd Qu.:2011
## Max. :2016
## NA's :108
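The standard deviation and variance can be obtained for all numeric columns in one step as well. The sketch below uses a small made-up data frame (movies_example) rather than the lab data, and na.rm = TRUE to skip missing values.

```r
# Made-up data frame standing in for the lab data
movies_example <- data.frame(
  title  = c("A", "B", "C", "D"),
  budget = c(10, 20, NA, 40),
  gross  = c(5, 25, 35, NA)
)

# Keep only the numeric columns
numeric_cols <- movies_example[sapply(movies_example, is.numeric)]

# Standard deviation and variance per column, ignoring NA values
spread <- sapply(numeric_cols, function(x) {
  c(sd = sd(x, na.rm = TRUE), variance = var(x, na.rm = TRUE))
})
print(spread)
```

The result is a matrix with one column per variable and one row per statistic.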
Budget Plot
This plot shows the values of the budget column, one point per movie in row order.
plot(budget)
It is difficult to make accurate assessments from this plot, so the graph must be customized. Specifically, the budgets are sorted from the least to the most expensive movie.
#Budget is reordered from least to most expensive
leastexpensiveBudget = sort(budget)
#xlab labels the x axis, ylab labels the y axis
plot(leastexpensiveBudget, type="b", xlab = "Movie", ylab = "Movie Budget", col.main = "Red")
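This graph shows that the vast majority of films have budgets far below the maximum of about 12.2 billion, with only a few extreme values at the far right of the curve. The median budget (20 million) is about half the mean (39.8 million), so most films are produced for less than the mean amount.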
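One way to check how the values sit relative to the mean is to compare the mean with the median and count the share of values below the mean. The following sketch uses a made-up vector (budget_example) with one extreme value; it is not the lab data.

```r
# Made-up budgets: nine modest values and one extreme outlier
budget_example <- c(1, 2, 2, 3, 3, 4, 5, 6, 8, 1000)

mean_budget   <- mean(budget_example)
median_budget <- median(budget_example)

# Proportion of values that fall below the mean
share_below_mean <- mean(budget_example < mean_budget)

print(c(mean = mean_budget, median = median_budget, share_below = share_below_mean))
```

When a few very large values pull the mean upward, the median stays close to the typical value and most observations fall below the mean.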
4 Layout Graphs
Director Facebook Likes, Total Cast Likes, Gross, and Budget
layout(matrix(1:4,2,2))
#Director Facebook Likes
ascendingDirectorFacebookLikes = sort(DirectorFacebookLikes)
plot(ascendingDirectorFacebookLikes, type="b", xlab = "Movie", ylab = "Director Facebook Likes")
title ("Director Facebook Likes by Movie")
#Total Cast Likes
ascendingTotalCastLikes = sort(TotalCastLikes)
plot(ascendingTotalCastLikes, type="b", xlab = "Movie", ylab = "Total Cast Likes")
title ("Total Cast Likes by Movie")
#Gross
ascendingGross = sort(gross)
plot(ascendingGross, type="b", xlab = "Movie", ylab = "Gross")
title ("Gross by Movie")
#Budget
ascendingBudget = sort(budget)
plot(ascendingBudget, type="b", xlab = "Movie", ylab = "Budget")
title ("Budget")
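Each variable is sorted on its own, so these graphs show four separate distributions and cannot by themselves show that movies with more popular directors or casts record higher gross earnings. What they do show is that all four variables follow a similar shape: most movies have low values, and a small number have very high ones.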
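Whether popularity and gross earnings actually move together can be checked by comparing the columns row by row, for example with a correlation matrix. This is a minimal sketch on a made-up data frame (movies_example); the column names mirror the lab data, but the values are invented. The option use = "pairwise.complete.obs" keeps, for each pair of columns, only the rows where both values are present.

```r
# Made-up values; column names mirror the lab data
movies_example <- data.frame(
  director_fb_likes = c(0, 500, NA, 22000, 400, 150),
  total_cast_likes  = c(4800, 48000, 11000, 100000, 1800, 900),
  gross             = c(7.6e8, 3.1e8, 2.0e8, NA, 7.3e7, 1.2e7)
)

# Pairwise correlations, skipping rows with a missing value in either column
cor_matrix <- cor(movies_example, use = "pairwise.complete.obs")
print(round(cor_matrix, 2))

# With the lab data loaded, a scatter plot gives the visual counterpart:
# plot(mydata$total_cast_likes, mydata$gross)
```

The original rows must be used here rather than the sorted or NA-filtered vectors, because sorting and filtering each column separately breaks the link between a movie's values.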
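In the following, I calculate the Z-score of a single reference value, 10234001, against the mean and standard deviation of each previously selected variable, and then the Z-scores of every observation in each variable. The reference value is on the scale of the budget and gross columns; it is far above the maximum of both likes columns, which is why their Z-scores are so large.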
#Budget
budgetZscore = (10234001 - averagebudget) / sdbudget
budgetZscore
## [1] -0.1432144
#Gross
grossZscore = (10234001 - averageGross) / sdGross
grossZscore
## [1] -0.5585498
#Director Facebook Likes
directorfacebooklikesZscore = (10234001 - averageDirectorFacebookLikes) / sdDirectorFacebookLikes
directorfacebooklikesZscore
## [1] 3637.44
#Total Cast Likes
totalcastlikesZscore = (10234001 - averageTotalCastLikes) / sdTotalCastLikes
totalcastlikesZscore
## [1] 562.8945
# Budget
zscoreBudget = (budget - averagebudget) / sdbudget
# Total Cast Likes
zscoreTotalCastLikes = (TotalCastLikes - averageTotalCastLikes) / sdTotalCastLikes
# Director Facebook Likes
zscoreDirectorFacebookLikes = (DirectorFacebookLikes - averageDirectorFacebookLikes) / sdDirectorFacebookLikes
# Gross
zscoreGross = (gross- averageGross) / sdGross
#Histogram Graphs of Zscores
layout(matrix(1:4, 2,2))
hist(zscoreBudget)
hist(zscoreGross)
hist(zscoreTotalCastLikes)
hist(zscoreDirectorFacebookLikes)
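The element-wise Z-scores above follow the formula z = (x - mean) / standard deviation. As a cross-check, the same values can be obtained with the built-in scale() function. The sketch below uses a made-up vector (values_example), not the lab data.

```r
# Made-up example vector
values_example <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Manual Z-scores, as computed in this lab
manual_z <- (values_example - mean(values_example)) / sd(values_example)

# scale() centers by the mean and divides by the standard deviation by default;
# as.numeric() drops the matrix shape and attributes it returns
builtin_z <- as.numeric(scale(values_example))

print(all.equal(manual_z, builtin_z))
print(c(mean = mean(manual_z), sd = sd(manual_z)))
```

After standardizing, the values have a mean of 0 and a standard deviation of 1 (up to rounding), which puts variables measured in different units on a common scale for the histograms.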