Business Analytics Lab Worksheet 03

Task 1

Begin by reading in the data from the ‘rottentomatoes.csv’ file and viewing the first rows to confirm that it was read in correctly.

mydata = read.csv(file="data/rottentomatoes.csv")
head(mydata)

##                                                      title
## 1                                                   Avatar
## 2                 Pirates of the Caribbean: At World's End
## 3                                                  Spectre
## 4                                    The Dark Knight Rises
## 5               Star Wars: Episode VII - The Force Awakens
## 6                                              John Carter
##                            genres          director          actor1
## 1 Action|Adventure|Fantasy|Sci-Fi     James Cameron     CCH Pounder
## 2        Action|Adventure|Fantasy    Gore Verbinski     Johnny Depp
## 3       Action|Adventure|Thriller        Sam Mendes Christoph Waltz
## 4                 Action|Thriller Christopher Nolan       Tom Hardy
## 5                     Documentary       Doug Walker     Doug Walker
## 6         Action|Adventure|Sci-Fi    Andrew Stanton    Daryl Sabara
##             actor2               actor3 length    budget director_fb_likes
## 1 Joel David Moore            Wes Studi    178 237000000                 0
## 2    Orlando Bloom       Jack Davenport    169 300000000               563
## 3     Rory Kinnear     Stephanie Sigman    148 245000000                 0
## 4   Christian Bale Joseph Gordon-Levitt    164 250000000             22000
## 5       Rob Walker                          NA        NA               131
## 6  Samantha Morton         Polly Walker    132 263700000               475
##   actor1_fb_likes actor2_fb_likes actor3_fb_likes total_cast_likes
## 1            1000             936             855             4834
## 2           40000            5000            1000            48350
## 3           11000             393             161            11700
## 4           27000           23000           23000           106759
## 5             131              12              NA              143
## 6             640             632             530             1873
##   fb_likes critic_reviews users_reviews users_votes score aspect_ratio
## 1    33000            723          3054      886204   7.9         1.78
## 2        0            302          1238      471220   7.1         2.35
## 3    85000            602           994      275868   6.8         2.35
## 4   164000            813          2701     1144337   8.5         2.35
## 5        0             NA            NA           8   7.1           NA
## 6    24000            462           738      212204   6.6         2.35
##       gross year
## 1 760505847 2009
## 2 309404152 2007
## 3 200074175 2015
## 4 448130642 2012
## 5        NA   NA
## 6  73058679 2012
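Movie titles in files like this one can carry a trailing non-breaking space, which may show up as a stray character such as "Â" when the file encoding is misread. The sketch below shows one way to strip it. The `titles` vector is a toy stand-in for the title column, not data read from the file, and the cause (a UTF-8 file read with a different encoding) is an assumption.

```r
# Toy stand-in for the title column; "\u00a0" is a non-breaking space.
titles <- c("Avatar\u00a0", "Spectre\u00a0", "John Carter\u00a0")

# Remove non-breaking spaces, then trim ordinary leading/trailing whitespace.
clean_titles <- trimws(gsub("\u00a0", "", titles, fixed = TRUE))
clean_titles
```

If the file really is UTF-8 encoded, passing `fileEncoding = "UTF-8"` to `read.csv` may also prevent the stray character from appearing in the first place.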

Now calculate the range, minimum, maximum, mean, standard deviation, and variance for each variable. The code below computes these statistics for ‘gross’ and then repeats the same steps for users_votes, total_cast_likes, director_fb_likes, and critic_reviews.

gross = mydata$gross 
gross = gross[!is.na(gross)]

max_gross = max(gross)
max_gross

## [1] 760505847

min_gross = min(gross)
min_gross

## [1] 162

max_gross-min_gross

## [1] 760505685

mean_gross = mean(gross)
mean_gross

## [1] 48468408

sd_gross = sd(gross)
sd_gross

## [1] 68452990

var_gross = var(gross)
var_gross

## [1] 4.685812e+15

uservotes = mydata$users_votes 
uservotes = uservotes[!is.na(uservotes)]

max_uservotes = max(uservotes)
max_uservotes

## [1] 1689764

min_uservotes = min(uservotes)
min_uservotes

## [1] 5

max_uservotes-min_uservotes

## [1] 1689759

mean_uservotes = mean(uservotes)
mean_uservotes

## [1] 83668.16

sd_uservotes = sd(uservotes)
sd_uservotes

## [1] 138485.3

var_uservotes = var(uservotes)
var_uservotes

## [1] 19178166353

totalcastlikes = mydata$total_cast_likes 
totalcastlikes = totalcastlikes[!is.na(totalcastlikes)]

max_totalcastlikes = max(totalcastlikes)
max_totalcastlikes

## [1] 656730

min_totalcastlikes = min(totalcastlikes)
min_totalcastlikes

## [1] 0

max_totalcastlikes-min_totalcastlikes

## [1] 656730

mean_totalcastlikes = mean(totalcastlikes)
mean_totalcastlikes

## [1] 9699.064

sd_totalcastlikes = sd(totalcastlikes)
sd_totalcastlikes

## [1] 18163.8

var_totalcastlikes = var(totalcastlikes)
var_totalcastlikes

## [1] 329923599

directorFBlikes = mydata$director_fb_likes 
directorFBlikes = directorFBlikes[!is.na(directorFBlikes)]

max_directorFBlikes = max(directorFBlikes)
max_directorFBlikes

## [1] 23000

min_directorFBlikes = min(directorFBlikes)
min_directorFBlikes

## [1] 0

max_directorFBlikes-min_directorFBlikes

## [1] 23000

mean_directorFBlikes = mean(directorFBlikes)
mean_directorFBlikes

## [1] 686.5092

sd_directorFBlikes = sd(directorFBlikes)
sd_directorFBlikes

## [1] 2813.329

var_directorFBlikes = var(directorFBlikes)
var_directorFBlikes

## [1] 7914818

criticreviews = mydata$critic_reviews 
criticreviews = criticreviews[!is.na(criticreviews)]

max_criticreviews = max(criticreviews)
max_criticreviews

## [1] 813

min_criticreviews = min(criticreviews)
min_criticreviews

## [1] 1

max_criticreviews-min_criticreviews

## [1] 812

mean_criticreviews = mean(criticreviews)
mean_criticreviews

## [1] 140.1943

sd_criticreviews = sd(criticreviews)
sd_criticreviews

## [1] 121.6017

var_criticreviews = var(criticreviews)
var_criticreviews

## [1] 14786.97
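The same six statistics are computed for every variable above, so the steps can be wrapped in one function. This is a minimal sketch: `describe` is a hypothetical helper name, not part of the worksheet, and `example_values` is toy data chosen so the results are easy to check by hand.

```r
# Hypothetical helper (not part of the worksheet):
# six summary statistics for one numeric vector.
describe <- function(x) {
  x <- x[!is.na(x)]  # drop missing values, as in the code above
  c(min = min(x), max = max(x), range = max(x) - min(x),
    mean = mean(x), sd = sd(x), variance = var(x))
}

# Toy values for illustration only.
example_values <- c(2, 4, 4, 4, 5, 5, 7, 9, NA)
stats <- describe(example_values)
stats
```

With the movie data loaded, a call such as `describe(mydata$gross)` would return the same quantities computed step by step above.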

Task 2

An easy way to calculate many of these statistics for every variable at once is the summary function. Below is an example.

summary(mydata)

##                         title                       genres    
##  Ben-Hur                   :   3   Drama               : 236
##  Halloween                 :   3   Comedy              : 209
##  Home                      :   3   Comedy|Drama        : 191
##  King Kong                 :   3   Comedy|Drama|Romance: 187
##  Pan                       :   3   Comedy|Romance      : 158
##  The Fast and the Furious  :   3   Drama|Romance       : 152
##  (Other)                   :5025   (Other)             :3910  
##              director                  actor1                 actor2    
##                  : 104   Robert De Niro   :  49   Morgan Freeman :  20  
##  Steven Spielberg:  26   Johnny Depp      :  41   Charlize Theron:  15  
##  Woody Allen     :  22   Nicolas Cage     :  33   Brad Pitt      :  14  
##  Clint Eastwood  :  20   J.K. Simmons     :  31                  :  13  
##  Martin Scorsese :  20   Bruce Willis     :  30   James Franco   :  11  
##  Ridley Scott    :  17   Denzel Washington:  30   Meryl Streep   :  11  
##  (Other)         :4834   (Other)          :4829   (Other)        :4959  
##             actor3         length          budget         
##                :  23   Min.   :  7.0   Min.   :2.180e+02  
##  Ben Mendelsohn:   8   1st Qu.: 93.0   1st Qu.:6.000e+06  
##  John Heard    :   8   Median :103.0   Median :2.000e+07  
##  Steve Coogan  :   8   Mean   :107.2   Mean   :3.975e+07  
##  Anne Hathaway :   7   3rd Qu.:118.0   3rd Qu.:4.500e+07  
##  Jon Gries     :   7   Max.   :511.0   Max.   :1.222e+10  
##  (Other)       :4982   NA's   :15      NA's   :492        
##  director_fb_likes actor1_fb_likes  actor2_fb_likes  actor3_fb_likes  
##  Min.   :    0.0   Min.   :     0   Min.   :     0   Min.   :    0.0  
##  1st Qu.:    7.0   1st Qu.:   614   1st Qu.:   281   1st Qu.:  133.0  
##  Median :   49.0   Median :   988   Median :   595   Median :  371.5  
##  Mean   :  686.5   Mean   :  6560   Mean   :  1652   Mean   :  645.0  
##  3rd Qu.:  194.5   3rd Qu.: 11000   3rd Qu.:   918   3rd Qu.:  636.0  
##  Max.   :23000.0   Max.   :640000   Max.   :137000   Max.   :23000.0  
##  NA's   :104       NA's   :7        NA's   :13       NA's   :23       
##  total_cast_likes    fb_likes      critic_reviews  users_reviews   
##  Min.   :     0   Min.   :     0   Min.   :  1.0   Min.   :   1.0  
##  1st Qu.:  1411   1st Qu.:     0   1st Qu.: 50.0   1st Qu.:  65.0  
##  Median :  3090   Median :   166   Median :110.0   Median : 156.0  
##  Mean   :  9699   Mean   :  7526   Mean   :140.2   Mean   : 272.8  
##  3rd Qu.: 13756   3rd Qu.:  3000   3rd Qu.:195.0   3rd Qu.: 326.0  
##  Max.   :656730   Max.   :349000   Max.   :813.0   Max.   :5060.0  
##                                    NA's   :50      NA's   :21      
##   users_votes          score        aspect_ratio       gross          
##  Min.   :      5   Min.   :1.600   Min.   : 1.18   Min.   :      162  
##  1st Qu.:   8594   1st Qu.:5.800   1st Qu.: 1.85   1st Qu.:  5340988  
##  Median :  34359   Median :6.600   Median : 2.35   Median : 25517500  
##  Mean   :  83668   Mean   :6.442   Mean   : 2.22   Mean   : 48468408  
##  3rd Qu.:  96309   3rd Qu.:7.200   3rd Qu.: 2.35   3rd Qu.: 62309438  
##  Max.   :1689764   Max.   :9.500   Max.   :16.00   Max.   :760505847  
##                                    NA's   :329     NA's   :884        
##       year     
##  Min.   :1916  
##  1st Qu.:1999  
##  Median :2005  
##  Mean   :2002  
##  3rd Qu.:2011  
##  Max.   :2016  
##  NA's   :108

Some statistics, such as standard deviation and variance, are not captured here, but summary is a quick and easy way to find most of the basic statistics.
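The standard deviation and variance that summary leaves out can be filled in for all numeric columns at once. The sketch below uses a small toy data frame (`toy`, with illustrative values only) in place of `mydata`. It keeps the numeric columns and applies `sd` and `var` with `na.rm = TRUE` so missing values are ignored.

```r
# Toy data frame standing in for mydata; values are illustrative only.
toy <- data.frame(
  title  = c("A", "B", "C", "D"),
  score  = c(7.9, 7.1, 6.8, 8.5),
  length = c(178, 169, NA, 164)
)

# Keep only the numeric columns, then compute sd and variance per column.
numeric_cols <- toy[sapply(toy, is.numeric)]
spread <- sapply(numeric_cols, function(x) {
  c(sd = sd(x, na.rm = TRUE), variance = var(x, na.rm = TRUE))
})
spread
```

The result is a small matrix with one column per variable and one row per statistic.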

Now, we will produce a basic plot of the ‘gross’ variable. Here we use the plot function and pass it the variable we want to plot.

plot(gross)

When looking at this graph we cannot truly capture the data or see a clear pattern. A better way to visualize the data is to sort the gross values before plotting them.

# sort() returns the values themselves; order() would return row positions
plot(sort(gross, decreasing = TRUE))
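Sorting before plotting depends on the difference between `order` and `sort`, which is easy to mix up. The toy vector `x` below is for illustration only.

```r
x <- c(30, 10, 20)

positions <- order(x)       # positions that would sort x: 2 3 1
values    <- sort(x)        # the sorted values themselves: 10 20 30
same      <- x[order(x)]    # indexing by order() gives the same as sort()

sorted_desc <- sort(x, decreasing = TRUE)  # 30 20 10
sorted_desc
```

Plotting the output of `order` draws row positions rather than the data, so a sorted plot needs `sort(x)` or `x[order(x)]`.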

marketing = read.csv("data/marketing.csv")
sales = marketing$sales
radio = marketing$radio
tv = marketing$tv
paper = marketing$paper

plot(sales, type="b", xlab = "Case Number", ylab = "Sales in $1,000")

layout(matrix(1:4,2,2))

plot(sales, type="b", xlab = "Case Number", ylab = "Sales in $1,000") 
plot(radio, type="b", xlab = "Case Number", ylab = "Radio")

plot(paper, type="b", xlab = "Case Number", ylab = "Paper")
plot(tv, type="b", xlab = "Case Number", ylab = "TV")

There are further ways to customize plots, such as changing the colors of the lines, adding a heading, or even making them interactive.
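As a small illustration of the customizations mentioned here, the sketch below changes the line colour and adds a heading using standard arguments of the base plot function. The sales values are copied from the marketing data listed in this worksheet. The colour and title text are arbitrary choices, not part of the original exercise.

```r
# Sales values copied from the marketing data shown in this worksheet.
sales <- c(11125, 16121, 16440, 16876, 13965, 14999, 20167, 20450, 15789, 15991,
           15234, 17522, 17933, 18390, 18723, 19328, 19399, 19641, 12369, 13882)

plot(sales, type = "b",
     col = "steelblue",        # colour of lines and points (arbitrary choice)
     lwd = 2,                  # thicker line
     pch = 19,                 # filled circles
     main = "Sales by Case",   # heading; the title text is an assumption
     xlab = "Case Number", ylab = "Sales in $1,000")
```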

Now, let's plot the Facebook likes for actor 1, actor 2, actor 3, and the director. Make sure to run the code in the same chunk so the plots share the same layout.

likes01 = mydata$actor1_fb_likes
likes02 = mydata$actor2_fb_likes
likes03 = mydata$actor3_fb_likes
likes04 = mydata$director_fb_likes

layout(matrix(1:4,2,2))

plot(likes01, type="b", xlab = "Movie", ylab = "Actor 1 Facebook Likes")

plot(likes02, type="b", xlab = "Movie", ylab = "Actor 2 Facebook Likes")

plot(likes03, type="b", xlab = "Movie", ylab = "Actor 3 Facebook Likes")

plot(likes04, type="b", xlab = "Movie", ylab = "Director Facebook Likes")

The 20 cases in the marketing data are in no particular order and do not follow a chronological sequence. They are simply 20 independent case studies. Since each case is independent, we can reorder them. To reveal a potential trend, consider reordering the sales column from low to high and seeing how the other four variables behave. The code below applies the same idea to the movie data by reordering the rows by year.

newdata = mydata[order(mydata$year),]

year = mydata$year
newdata = mydata[order(year,decreasing = TRUE),]
new_gross = newdata$gross
new_likes1 = newdata$actor1_fb_likes
new_likes2 = newdata$actor2_fb_likes
new_likes3 = newdata$actor3_fb_likes
new_likes4 = newdata$director_fb_likes
plot(new_gross)
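The reordering idea described for the marketing cases can be sketched on the first five rows of the marketing data listed at the end of this worksheet. The data frame name `marketing_head` is made up for this example.

```r
# First five rows of the marketing data shown in this worksheet
# (only some columns are kept for brevity).
marketing_head <- data.frame(
  case_number = 1:5,
  sales = c(11125, 16121, 16440, 16876, 13965),
  radio = c(65, 73, 74, 75, 69),
  tv    = c(250, 260, 270, 270, 255)
)

# order() returns row positions; indexing with it reorders whole rows,
# so radio and tv stay matched to their sales value.
by_sales <- marketing_head[order(marketing_head$sales), ]
by_sales
```

In this small subset, radio rises along with sales once the rows are sorted, which is the kind of pattern the reordering is meant to reveal.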

Task 3

Given a sales value of 25000, calculate the corresponding z-value or z-score using the mean and standard deviation of the sales variable in the marketing data.

We know that the z-score = (x - mean)/sd. So, input this into the R code where x=25000, mean=16717.2, and stdev = 2617.0521, as computed in the code at the end of this task.

Based on the z-values, how would you rate a $25000 sales value: poor, average, good, or very good performance? Explain your logic. THE Z-SCORE IS 3.165. THE MEAN OF SALES IS 16717.2. A 25000 SALES VALUE IS A VERY GOOD PERFORMANCE. THE POSITIVE Z-SCORE PLACES IT MORE THAN THREE STANDARD DEVIATIONS ABOVE THE MEAN, HIGHER THAN ANY OF THE 20 OBSERVED CASES.

RottenTomatoes

marketing

##    case_number sales radio paper  tv pos
## 1            1 11125    65    89 250 1.3
## 2            2 16121    73    55 260 1.6
## 3            3 16440    74    58 270 1.7
## 4            4 16876    75    82 270 1.3
## 5            5 13965    69    75 255 1.5
## 6            6 14999    70    71 255 2.1
## 7            7 20167    87    59 280 1.2
## 8            8 20450    89    65 280 3.0
## 9            9 15789    72    62 260 1.6
## 10          10 15991    73    56 260 1.6
## 11          11 15234    70    66 255 1.5
## 12          12 17522    78    50 270 0.0
## 13          13 17933    79    47 275 0.2
## 14          14 18390    81    78 275 0.9
## 15          15 18723    81    41 275 1.0
## 16          16 19328    84    63 280 2.6
## 17          17 19399    84    77 280 1.2
## 18          18 19641    85    35 280 2.5
## 19          19 12369    65    37 250 2.5
## 20          20 13882    68    80 252 1.4

mean_sales = mean(sales) 
sd_sales = sd(sales)

mean_sales

## [1] 16717.2

sd_sales

## [1] 2617.052

zscore = (25000 - mean_sales)/sd_sales 
zscore

## [1] 3.164935
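To put the z-score in context, the sketch below recomputes it from the 20 sales values listed above and then asks what share of a normal distribution lies above it. Treating sales as roughly normal is an assumption made only for illustration. Twenty cases cannot confirm it.

```r
# Sales values copied from the marketing data shown in this worksheet.
sales <- c(11125, 16121, 16440, 16876, 13965, 14999, 20167, 20450, 15789, 15991,
           15234, 17522, 17933, 18390, 18723, 19328, 19399, 19641, 12369, 13882)

# z-score of a 25000 sales value relative to the observed cases.
z <- (25000 - mean(sales)) / sd(sales)

# Share of a normal distribution above this z-score.
# Assumes sales are roughly normally distributed (an assumption, not a finding).
upper_tail <- pnorm(z, lower.tail = FALSE)
upper_tail
```

Under that assumption the upper-tail share is well below 1%, which supports rating a 25000 sales value as a very good performance.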

CME Group Foundation Business Analytics Lab

Ally Ungashick

July 26, 2017