Task 1

First, we read in the data from the ‘rottentomatoes.csv’ file and view the first few rows to make sure it was read in correctly.

mydata = read.csv(file="data/rottentomatoes.csv")
head(mydata)

Using the ‘str’ function below, we can see that there are some NA values in our data. To get accurate results for the remainder of this lab, we will exclude all NAs when calculating the min, max, median, etc.

str(mydata)
'data.frame':   5043 obs. of  21 variables:
 $ ï..title         : Factor w/ 4917 levels "#Horror ",..: 398 2731 3279 3707 3332 1961 3289 3459 399 1631 ...
 $ genres           : Factor w/ 914 levels "Action","Action|Adventure",..: 107 101 128 288 754 126 120 308 126 447 ...
 $ director         : Factor w/ 2399 levels "","A. Raven Cruz",..: 929 801 2027 380 606 109 2030 1652 1228 554 ...
 $ actor1           : Factor w/ 2098 levels "","50 Cent","A.J. Buckley",..: 305 983 355 1968 528 443 787 223 338 35 ...
 $ actor2           : Factor w/ 3033 levels "","50 Cent","A. Michael Baldwin",..: 1408 2218 2489 534 2433 2549 1228 801 2440 653 ...
 $ actor3           : Factor w/ 3522 levels "","50 Cent","A.J. Buckley",..: 3442 1393 3134 1771 1 2714 1970 2163 3018 2941 ...
 $ length           : int  178 169 148 164 NA 132 156 100 141 153 ...
 $ budget           : num  2.37e+08 3.00e+08 2.45e+08 2.50e+08 NA ...
 $ director_fb_likes: int  0 563 0 22000 131 475 0 15 0 282 ...
 $ actor1_fb_likes  : int  1000 40000 11000 27000 131 640 24000 799 26000 25000 ...
 $ actor2_fb_likes  : int  936 5000 393 23000 12 632 11000 553 21000 11000 ...
 $ actor3_fb_likes  : int  855 1000 161 23000 NA 530 4000 284 19000 10000 ...
 $ total_cast_likes : int  4834 48350 11700 106759 143 1873 46055 2036 92000 58753 ...
 $ fb_likes         : int  33000 0 85000 164000 0 24000 0 29000 118000 10000 ...
 $ critic_reviews   : int  723 302 602 813 NA 462 392 324 635 375 ...
 $ users_reviews    : int  3054 1238 994 2701 NA 738 1902 387 1117 973 ...
 $ users_votes      : int  886204 471220 275868 1144337 8 212204 383056 294810 462669 321795 ...
 $ score            : num  7.9 7.1 6.8 8.5 7.1 6.6 6.2 7.8 7.5 7.5 ...
 $ aspect_ratio     : num  1.78 2.35 2.35 2.35 NA 2.35 2.35 1.85 2.35 2.35 ...
 $ gross            : int  760505847 309404152 200074175 448130642 NA 73058679 336530303 200807262 458991599 301956980 ...
 $ year             : int  2009 2007 2015 2012 NA 2012 2007 2010 2015 2009 ...
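
The first column name in the output above appears as ‘ï..title’ rather than ‘title’. This pattern usually means the CSV file begins with a UTF-8 byte order mark that was read as part of the header. The sketch below is not part of the original lab: it builds a small throwaway file (the file contents and the name ‘demo’ are made up) and shows one way to read such a file so the name comes through clean, assuming the byte order mark really is the cause.

```r
# Build a tiny CSV that starts with a UTF-8 byte order mark (EF BB BF),
# mimicking the kind of file that typically yields a name like "ï..title".
tmp <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("title,score\nAvatar,7.9\n")), tmp)

# Declaring the encoding lets R drop the mark before parsing the header.
demo <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(demo)
```

If this applies to the lab file, the same fileEncoding argument could be added to the read.csv call in Task 1.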

Below is a summary of our data, which includes the min, max, mean, and median of each numeric variable.

summary(mydata)
                         ï..title                     genres                 director   
 Ben-Hur                  :   3   Drama               : 236                   : 104  
 Halloween                :   3   Comedy              : 209   Steven Spielberg:  26  
 Home                     :   3   Comedy|Drama        : 191   Woody Allen     :  22  
 King Kong                :   3   Comedy|Drama|Romance: 187   Clint Eastwood  :  20  
 Pan                      :   3   Comedy|Romance      : 158   Martin Scorsese :  20  
 The Fast and the Furious :   3   Drama|Romance       : 152   Ridley Scott    :  17  
 (Other)                     :5025   (Other)             :3910   (Other)         :4834  
               actor1                 actor2                actor3         length     
 Robert De Niro   :  49   Morgan Freeman :  20                 :  23   Min.   :  7.0  
 Johnny Depp      :  41   Charlize Theron:  15   Ben Mendelsohn:   8   1st Qu.: 93.0  
 Nicolas Cage     :  33   Brad Pitt      :  14   John Heard    :   8   Median :103.0  
 J.K. Simmons     :  31                  :  13   Steve Coogan  :   8   Mean   :107.2  
 Bruce Willis     :  30   James Franco   :  11   Anne Hathaway :   7   3rd Qu.:118.0  
 Denzel Washington:  30   Meryl Streep   :  11   Jon Gries     :   7   Max.   :511.0  
 (Other)          :4829   (Other)        :4959   (Other)       :4982   NA's   :15     
     budget          director_fb_likes actor1_fb_likes  actor2_fb_likes  actor3_fb_likes  
 Min.   :2.180e+02   Min.   :    0.0   Min.   :     0   Min.   :     0   Min.   :    0.0  
 1st Qu.:6.000e+06   1st Qu.:    7.0   1st Qu.:   614   1st Qu.:   281   1st Qu.:  133.0  
 Median :2.000e+07   Median :   49.0   Median :   988   Median :   595   Median :  371.5  
 Mean   :3.975e+07   Mean   :  686.5   Mean   :  6560   Mean   :  1652   Mean   :  645.0  
 3rd Qu.:4.500e+07   3rd Qu.:  194.5   3rd Qu.: 11000   3rd Qu.:   918   3rd Qu.:  636.0  
 Max.   :1.222e+10   Max.   :23000.0   Max.   :640000   Max.   :137000   Max.   :23000.0  
 NA's   :492         NA's   :104       NA's   :7        NA's   :13       NA's   :23       
 total_cast_likes    fb_likes      critic_reviews  users_reviews     users_votes     
 Min.   :     0   Min.   :     0   Min.   :  1.0   Min.   :   1.0   Min.   :      5  
 1st Qu.:  1411   1st Qu.:     0   1st Qu.: 50.0   1st Qu.:  65.0   1st Qu.:   8594  
 Median :  3090   Median :   166   Median :110.0   Median : 156.0   Median :  34359  
 Mean   :  9699   Mean   :  7526   Mean   :140.2   Mean   : 272.8   Mean   :  83668  
 3rd Qu.: 13756   3rd Qu.:  3000   3rd Qu.:195.0   3rd Qu.: 326.0   3rd Qu.:  96309  
 Max.   :656730   Max.   :349000   Max.   :813.0   Max.   :5060.0   Max.   :1689764  
                                   NA's   :50      NA's   :21                        
     score        aspect_ratio       gross                year     
 Min.   :1.600   Min.   : 1.18   Min.   :      162   Min.   :1916  
 1st Qu.:5.800   1st Qu.: 1.85   1st Qu.:  5340988   1st Qu.:1999  
 Median :6.600   Median : 2.35   Median : 25517500   Median :2005  
 Mean   :6.442   Mean   : 2.22   Mean   : 48468408   Mean   :2002  
 3rd Qu.:7.200   3rd Qu.: 2.35   3rd Qu.: 62309438   3rd Qu.:2011  
 Max.   :9.500   Max.   :16.00   Max.   :760505847   Max.   :2016  
                 NA's   :329     NA's   :884         NA's   :108   

Below we extract ‘gross’ and make sure that we remove the NAs.

gross = mydata$gross
gross = gross[!is.na(gross)]
gross
   [1] 760505847 309404152 200074175 448130642  73058679 336530303 200807262 458991599 301956980
  [10] 330249062 200069408 168368427 423032628  89289910 291021565 141614023 623279547 241063875
  [19] 179020854 255108370 262030663 105219735 258355354  70083519 218051260 658672302 407197282
  [28]  65173160 652177271 304360277 373377893 408992272 334185206 234360014 268488329 402076689
  [37] 245428137 234903076 202853933 172051787 191450875 116593191 414984497 125320003 350034110
  [46] 202351611 233914986 228756232  65171860 144812796  90755643 101785482 352358779 317011114
  [55] 123070338 237282182 130468626 223806889 140080850 166112167 137850096  47375327 124051759
  [64] 291709845 154985087 533316061 292979556 198332128 318298180  73820094 113745408 102176165
  [73] 161087183 100289690 100189501  88246220 150167630 356454367 362645141 312057433 155111815
  [82] 241407328 208543795  38297305 259746958 238371987  93417865 222487711 189412677    665426
  [91] 102315545 217387997 150350192 333130696 187991439 292568851 303001229 144512310 127490802
 [100] 146405371 281666058  63143812  60655503  76846624 320706665  46978995  89732035 104383624
 [109] 198539855 318759914  34293771 292000866 289994397 227946274 256386216 206456431 206435493
 [118] 205343774 179982968 177243721 179883016 139259759 400736600 281492479 206360018 153629485
 [127] 133375846 181015141 114053579 119420252  83640426  79711678 195000874  61937495 124051759
 [136] 126597121 165230261 131564731 133382309  73103784  21379315  64459316  34964818 111505642
 [145] 133228348 216366733 160201106 118099659 201573391 190418803  82161969 143523463 209364921
 [154] 103400692 110332737 111110575  65007045 257704099 403706375 176997107  31141074  31704416
 [163] 107503316 129734803 132122995 122512052  68642452  32131830 176636816 126930660  93926386
 [172] 292298923  63992328 134518390  52792307 183635922  83024900 123207194  83348920 227137090
 [181] 215395021 180191634 424645577 292298923 177343675 234277056 138396624 149234747 118311368
 [190] 101160529  77564037 249358727  49551662  60522097 137748063 113733726 148337537 317557891
 [199]  33592415 305388685 337103873 217536138 131536019 214948780 209805005 186830669 163192114
 [208] 119412921  32694788 113165635 107285004 260031035 186739919 215397307 182618434 131920333
 [217] 124976634 115802596 108521835 100685880 126464904  64736114  93050117  57637485  58607007
 [226]  43929341  30212620  76418654  89021735 380262555 310675583 289907418 132550960 474544677
 [235] 187165546  40911830  47952020 190871240 274084951  67155742  81638674  56114221 250863268
 [244] 155181732 125332007 113330342 125531634 186336103 129995817 102608827  42776259  98780042
 [253] 106369117 142614158  50026353  66002193  85463309  71017784  48068396  61656849 134520804
 [262] 313837577  24004159  58183966 100446895 144795350  47396698 140015224 104374107 228430993
 [271]  35799026   6712451 101643008 187670866 132014112 261970615 167007184 180011740 204843350
 [280]  97030725 130127620 146282411  65452312 148383780 119219978 101228120 162804648 100117603
 [289]  89296573  85017401 173005002  75030163  77222184  34964818 107515297  67631157  66862068
 [298]  57366262 116866727 184031112  54700065  27098580  55673333  40198710  72660029  38120554
 [307]  49392095  39292022  28772222  17010646  24985612   4411102  35024475 130174897  10200000
 [316] 202007640  77679638      9213  58867694  59475623 108638745  86897182  63540020  95328937
 [325]  50802661 161317423 201148159  43982842 380838870 377019252 340478898  17176900 131144183
 [334]  23014504 181166115 176740650  71148699  67344392  22406362 261437578  11000000  88761720
 [343] 250147615 245823397  81557479 226138454 155370362 124870275 196573705  58229120 125305545
 [352] 132373442 120618403 110416702 102515793 100012500 209019489  84037039  85884815  83077470
 [361] 100018837  78747585  78616689  75817994 100853835  73209340  72515360  68558662  65653758
 [370]  64685359  61355436     26871  60874615 143618384  58220776  47474112  42877165  35168677
 [379]  56114221  37567440  61644321    190562 120147445 241688385 144512310 233630478 197992827
 [388] 176049130 172620724 183405771  20315324 148313048 127706877 126149655  66941559  78009155
 [397]  63224849 111544445 112703470 117144465  84303558 150832203  51396781  47592825  50016394
 [406]  57010853  62494975  46440491  44606335  40048332  64933670  31494270  31111260 123307945
 [415] 153288182  13401683 137340146  43575716  80170146  75754670  33048353  34543701 242589580
 [424] 102981571 180965237 407999255 254455986 162831698 155019340 145771527  82506325 140459099
 [433]  53215979 158115031 133103929 133668525 130313314 124590960 127968405 120136047 128200012
 [442] 112225777 109993847 104054514 103028109 101087161 101111837  95632614  94822707  92969824
 [451]  91188905  90443603  82226474  79363785  76081498  85707116  74329966 100169068  73215310
 [460]  80360866  69102910  65948711    821997 169692572  60507228  56684819  50628009  69772969
 [469]  45356386  55350897  39442871  37899638  37754208  27779888  38542418  34566746  32885565
 [478]  36073232  21471685  20950820  19673424  19480739  17593391  18318000  27356090  17473245
 [487]  15131330  19406406   1891821  23219748 170708996 422783777 103812241 119793567  92930005
 [496]  67286731  74158157 127083765   1339152  15071514  26000610 323505540  66462600 368049635
 [505] 306124059 229074524 193136719  35286428 157299717 134568845 134006721 195329763 120776832
 [514] 118823091  41814863  97360069 117698894 162001186  77032279  73023275  68473360  66636385
 [523] 160762022 103338338  55808744  47379090  43426961  47000485  45434443  42044321  73661010
 [532]  41523271  37600435  39251128  83503161  34636443  22751979  30013346  14567883     90820
 [541]   5409517  21009180  94999143 336029560  36381716  55585389  36976367 107225164  70224196
 [550]  51814190  47456450 148213377 112950721  75600000  62647540 183132370  27796042  32616869
 [559]  18947630 114195633 144156464 227965690 436471036 244052771 152149590 141204016 162495848
 [568] 136448821 120523073 119654900  72660029 117541000 116643346 100614858  42272747  80281096
 [577] 219613391  78120196  98895417  70117571  83552429  66257002  65012000  79883359  78031620
 [586]  54222000  52474616  55942830  40932372  38345403  37901509  48430355  30157016  28031250
 [595]  33105600  62321039  38509342  19076815  25093607  18990542  14294842  19819494  13596911
 [604]   8460990   7097125  37760080   5851188  25121291  18821279 118471320 300523113  71069884
 [613] 251501645  35324232  81257500    617840  29655590  45045037  28965197  27550735  39380442
 [622]  72980108  37516013  87704396  83892374   5932060 216119491  43568507 182805123 176387405
 [631]  33685268 182204440 171383253 172071312 119412921 139225854 148775460 115731542 100468793
 [640]  93771072 100448498 115603980  90454043  84049211  70450000  69688384  70236496  63695760
 [649]  59617068  55637680  85911262  53846915  54758461  52397389  38966057  42345531  36064910
 [658]  33328051  32598931  28045540  37023395  43532294  17218080  10014234  19059018   1987287
 [667]  24407944  13750556  31054924  43247140   2208939 213079163  19548064 356784000  25052000
 [676] 122012710     72413  58255287  77086030  65000000  32178777  15738632  54116191 118153533
 [685] 108012170 210592590 279167575 143151473 136801374 168213584 135381507 167735396 121468960
 [694] 106635996 102678089 125603360 101217900 104148781  75573300  93375151 106126012  93307796
 [703]  90646554 109176215  82670733  82569532  81687587  80574010  75764085  90356857  75530832
 [712]  75370763 100003492  90341670  74540762  80033643  73648142  71844424  75638743  66734992
 [721]  75280058  64505912  77862546  61112916  88200225  60573641  59035104  56702901  55994557
 [730]  54910560  53789313  51045801  50818750  50189179  50024083  50549107  56443482  62401264
 [739]  47748610  46975183  50807639  46611204 257756197  48472213  43060566  45996718  43337279
 [748]  37479778  36965395  40559930  36830057  36279230  42194060  43119879  35096190  35754555
 [757]  43290977  33927476  32122249  40076438  32940507  31670931  30695227  32522352  28424210
 [766]  26082914  29136626  26288320  26616590 623279547  30063805  22518325  13082288  18208078
 [775]  14218868     22451  31165421  11802056  25472967  22362500  17281832  19781879   7605668
 [784]   4535117   4426297  10166502 363024263  12065985 350123553  80021740  48291624  35231365
 [793]  53715611  31199215  29580087  44665963  60128566  49875589  60984028  36931089  51317350
 [802]  28328132  51774002  25528495 113006880  45860039 329691196 217326336 166225040 141600000
 [811] 134218018 128769345 177575142 105263257 104354205 107100855  98711404 100328194 101530738
 [820]  93815117  91400000 162586036  89706988  83000000  78745923  70098138  66365290  66207920
 [829]  63408614  58422650  56932305  68750000  68218041  25040293  55747724  55473600  49994804
 [838]  41609593  38553833  76137505  34350553  34238611  34098563  33828318  33472850  31051126
 [847]  35707327  20550712  18573791  51225796  16264475  25857987  12870569  11466088  16088610
 [856]  51178893   6768055  39440655   6167817  81645152  69951824   9483821  66676062  26838389
 [865]  75604320 108200000   5660084   7221458  70327868  58297830  57386369  45207112  62563543
 [874]  33574332  73343413  25031037  22843047   5755286 164435221  95720716 118683135 143704210
 [883] 110476776  80270227  36385763  37035845  34580635  42438300  23324666  23020488  90567722
 [892]  72601713  35092918 296623634 267652016  62453315 165500000 153620822 218628680 147637474
 [901] 135014968   2175312 126203320 126975169 125548685 105807520 191616238 105264608  97680195
 [910] 126088877  91030827 150315155 127997349  88504640  81517441  81022333  75621915  79948113
 [919]  88658172  75888270  84244877  75367693  73701902  75605492  67823573  91439400  67128202
 [928]  70496802  60470220  58336565  66002004  54997476  55682070  52752475  55092830  50815288
 [937]  52822418  50150619  48745150  50007168  48154732  48265581  46982632  44737059  56724080
 [946]  44484065  47553512  42610000  41482207  47105085  41256277  50740078  40203020  40905277
 [955]  38590500  39177541  39778599  37486138  38105077  35168395  32800000  33643461  32741596
 [964]  31874869  30306268  27667947  27067160  26616999  26536120  26199517  25450527  25407250
 [973]  23159305  24006726  20389967  19593740  19118247  26442251  17114882  18472363  14131298
 [982]  21557240  21283440  10556196  16671505  10400000   9528092  10137232   9795017  20488579
 [991]  19445217   8355815  28837115   6471394   6291602  10706786   8742261  43905746  21413502
[1000]   7994115
 [ reached getOption("max.print") -- omitted 3159 entries ]
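
As a small illustration of the two ways of handling NAs used in this lab (filtering them out first, or passing na.rm = TRUE), here is a sketch on a made-up vector. The values and the name ‘x_clean’ are placeholders, not taken from the data.

```r
# Toy vector standing in for a column such as mydata$gross (values are made up).
x <- c(760, NA, 200, 448, NA, 73)

sum(is.na(x))            # how many values are missing: 2
x_clean <- x[!is.na(x)]  # same filtering step as used for gross above
length(x_clean)          # 4

# na.rm = TRUE gives the same answer without making a filtered copy.
mean(x_clean) == mean(x, na.rm = TRUE)   # TRUE
max(x)                                   # NA, because the NAs were not removed

# head() shows only the first few values instead of printing everything.
head(x_clean, 3)
```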

Now we will calculate the range, min, max, mean, standard deviation, and variance for each variable. Below is an example of how to compute these for the variable gross. We then follow the same steps for users_votes, total_cast_likes, director_fb_likes, and critic_reviews.

Gross

gross = mydata$gross
#Max gross
max_gross = max(mydata$gross,na.rm = TRUE)
max_gross
[1] 760505847
#Min gross
min_gross = min(mydata$gross,na.rm = TRUE)
min_gross
[1] 162
#Range
max_gross-min_gross
[1] 760505685
#Mean
mean_gross = mean(mydata$gross,na.rm = TRUE)
mean_gross
[1] 48468408
#Standard Deviation
sd_gross = sd(mydata$gross,na.rm =TRUE)
sd_gross
[1] 68452990
#Variance (named var_gross so the var function is not masked)
var_gross = var(mydata$gross,na.rm = TRUE)
var_gross
[1] 4.685812e+15

Users Votes

users_votes = mydata$users_votes
#Max users_votes
max_users_votes = max(mydata$users_votes,na.rm = TRUE)
max_users_votes
[1] 1689764
#Min users_votes
min_users_votes = min(mydata$users_votes,na.rm = TRUE)
min_users_votes
[1] 5
#Range
max_users_votes-min_users_votes
[1] 1689759
#Mean
mean_users_votes = mean(mydata$users_votes,na.rm = TRUE)
mean_users_votes
[1] 83668.16
#Standard Deviation
sd_users_votes = sd(mydata$users_votes,na.rm =TRUE)
sd_users_votes
[1] 138485.3
#Variance
var_users_votes = var(mydata$users_votes,na.rm = TRUE)
var_users_votes
[1] 19178166353

Total_cast_likes

cast_likes = mydata$total_cast_likes
#Max cast_likes
max_cast_likes = max(mydata$total_cast_likes,na.rm = TRUE)
max_cast_likes
[1] 656730
#Min cast_likes
min_cast_likes = min(mydata$total_cast_likes,na.rm = TRUE)
min_cast_likes
[1] 0
#Range
max_cast_likes-min_cast_likes
[1] 656730
#Mean
mean_cast_likes = mean(mydata$total_cast_likes,na.rm = TRUE)
mean_cast_likes
[1] 9699.064
#Standard Deviation
sd_total_cast_likes = sd(mydata$total_cast_likes,na.rm =TRUE)
sd_total_cast_likes
[1] 18163.8
#Variance
var_total_cast_likes = var(mydata$total_cast_likes,na.rm = TRUE)
var_total_cast_likes
[1] 329923599

Director_fb_likes

Director_fb_likes = mydata$director_fb_likes
#Max Director_fb_likes
max_Director_fb_likes = max(mydata$director_fb_likes,na.rm = TRUE)
max_Director_fb_likes
[1] 23000
#Min Director_fb_likes
min_Director_fb_likes = min(mydata$director_fb_likes,na.rm = TRUE)
min_Director_fb_likes
[1] 0
#Range
max_Director_fb_likes-min_Director_fb_likes
[1] 23000
#Mean
mean_Director_fb_likes = mean(mydata$director_fb_likes,na.rm = TRUE)
mean_Director_fb_likes
[1] 686.5092
#Standard Deviation
sd_director_fb_likes = sd(mydata$director_fb_likes,na.rm =TRUE)
sd_director_fb_likes
[1] 2813.329
#Variance
var_director_fb_likes = var(mydata$director_fb_likes,na.rm = TRUE)
var_director_fb_likes
[1] 7914818

Critic_reviews

Critic_reviews = mydata$critic_reviews
#Max Critic_reviews
max_Critic_reviews = max(mydata$critic_reviews,na.rm = TRUE)
max_Critic_reviews
[1] 813
#Min Critic_reviews
min_Critic_reviews = min(mydata$critic_reviews,na.rm = TRUE)
min_Critic_reviews
[1] 1
#Range
max_Critic_reviews - min_Critic_reviews
[1] 812
#Mean
mean_Critic_reviews = mean(mydata$critic_reviews,na.rm = TRUE)
mean_Critic_reviews
[1] 140.1943
#Standard Deviation
sd_critic_reviews = sd(mydata$critic_reviews,na.rm =TRUE)
sd_critic_reviews
[1] 121.6017
#Variance
var_critic_reviews = var(mydata$critic_reviews,na.rm = TRUE)
var_critic_reviews
[1] 14786.97
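
The five blocks above repeat the same six computations for each variable. One possible way to avoid the repetition is a small helper applied to several columns at once. This is only a sketch: the function name ‘describe’ and the data frame ‘toy’ are hypothetical and use made-up values, not the lab data.

```r
# Hypothetical helper (not part of the original lab): the six Task 1
# statistics for one numeric vector, ignoring NAs throughout.
describe <- function(x) {
  c(min   = min(x, na.rm = TRUE),
    max   = max(x, na.rm = TRUE),
    range = max(x, na.rm = TRUE) - min(x, na.rm = TRUE),
    mean  = mean(x, na.rm = TRUE),
    sd    = sd(x, na.rm = TRUE),
    var   = var(x, na.rm = TRUE))
}

# Made-up stand-in for mydata; with the real data frame the call would be
# sapply(mydata[cols], describe).
toy <- data.frame(gross = c(100, 300, NA, 500),
                  critic_reviews = c(10, 20, 30, NA))
cols <- c("gross", "critic_reviews")
stats <- sapply(toy[cols], describe)
round(stats, 2)
```

The result is a matrix with one row per statistic and one column per variable, which makes it easy to compare variables side by side.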

Task 2

An easy way to get most of these statistics for every variable at once is the summary function. As seen earlier in the lab, the summary lets us verify that our values for the min, max, and mean are accurate with respect to the data we are working with, and it also reports the median. (Below is another look at the summary.)

summary(mydata)
                         ï..title                     genres                 director   
 Ben-Hur                  :   3   Drama               : 236                   : 104  
 Halloween                :   3   Comedy              : 209   Steven Spielberg:  26  
 Home                     :   3   Comedy|Drama        : 191   Woody Allen     :  22  
 King Kong                :   3   Comedy|Drama|Romance: 187   Clint Eastwood  :  20  
 Pan                      :   3   Comedy|Romance      : 158   Martin Scorsese :  20  
 The Fast and the Furious :   3   Drama|Romance       : 152   Ridley Scott    :  17  
 (Other)                     :5025   (Other)             :3910   (Other)         :4834  
               actor1                 actor2                actor3         length     
 Robert De Niro   :  49   Morgan Freeman :  20                 :  23   Min.   :  7.0  
 Johnny Depp      :  41   Charlize Theron:  15   Ben Mendelsohn:   8   1st Qu.: 93.0  
 Nicolas Cage     :  33   Brad Pitt      :  14   John Heard    :   8   Median :103.0  
 J.K. Simmons     :  31                  :  13   Steve Coogan  :   8   Mean   :107.2  
 Bruce Willis     :  30   James Franco   :  11   Anne Hathaway :   7   3rd Qu.:118.0  
 Denzel Washington:  30   Meryl Streep   :  11   Jon Gries     :   7   Max.   :511.0  
 (Other)          :4829   (Other)        :4959   (Other)       :4982   NA's   :15     
     budget          director_fb_likes actor1_fb_likes  actor2_fb_likes  actor3_fb_likes  
 Min.   :2.180e+02   Min.   :    0.0   Min.   :     0   Min.   :     0   Min.   :    0.0  
 1st Qu.:6.000e+06   1st Qu.:    7.0   1st Qu.:   614   1st Qu.:   281   1st Qu.:  133.0  
 Median :2.000e+07   Median :   49.0   Median :   988   Median :   595   Median :  371.5  
 Mean   :3.975e+07   Mean   :  686.5   Mean   :  6560   Mean   :  1652   Mean   :  645.0  
 3rd Qu.:4.500e+07   3rd Qu.:  194.5   3rd Qu.: 11000   3rd Qu.:   918   3rd Qu.:  636.0  
 Max.   :1.222e+10   Max.   :23000.0   Max.   :640000   Max.   :137000   Max.   :23000.0  
 NA's   :492         NA's   :104       NA's   :7        NA's   :13       NA's   :23       
 total_cast_likes    fb_likes      critic_reviews  users_reviews     users_votes     
 Min.   :     0   Min.   :     0   Min.   :  1.0   Min.   :   1.0   Min.   :      5  
 1st Qu.:  1411   1st Qu.:     0   1st Qu.: 50.0   1st Qu.:  65.0   1st Qu.:   8594  
 Median :  3090   Median :   166   Median :110.0   Median : 156.0   Median :  34359  
 Mean   :  9699   Mean   :  7526   Mean   :140.2   Mean   : 272.8   Mean   :  83668  
 3rd Qu.: 13756   3rd Qu.:  3000   3rd Qu.:195.0   3rd Qu.: 326.0   3rd Qu.:  96309  
 Max.   :656730   Max.   :349000   Max.   :813.0   Max.   :5060.0   Max.   :1689764  
                                   NA's   :50      NA's   :21                        
     score        aspect_ratio       gross                year     
 Min.   :1.600   Min.   : 1.18   Min.   :      162   Min.   :1916  
 1st Qu.:5.800   1st Qu.: 1.85   1st Qu.:  5340988   1st Qu.:1999  
 Median :6.600   Median : 2.35   Median : 25517500   Median :2005  
 Mean   :6.442   Mean   : 2.22   Mean   : 48468408   Mean   :2002  
 3rd Qu.:7.200   3rd Qu.: 2.35   3rd Qu.: 62309438   3rd Qu.:2011  
 Max.   :9.500   Max.   :16.00   Max.   :760505847   Max.   :2016  
                 NA's   :329     NA's   :884         NA's   :108   

However, the summary does not report the variance or the standard deviation. For those we still need separate functions (var and sd, as in Task 1), but for the basic statistics the summary will suffice.
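
For reference, the formulas behind those two statistics can be checked by hand. The sketch below uses made-up values and assumes the sample (n - 1) definition, which is what R's var and sd functions use. The Task 1 results for gross are consistent with it: squaring the standard deviation 68452990 gives roughly 4.6858e+15, the reported variance.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up values

n <- length(x)
var_by_hand <- sum((x - mean(x))^2) / (n - 1)   # sample variance
sd_by_hand  <- sqrt(var_by_hand)                # sd is its square root

all.equal(var_by_hand, var(x))   # TRUE
all.equal(sd_by_hand, sd(x))     # TRUE
```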


Now, we will produce a basic plot of the ‘gross’ variable. Here we use the plot function and pass it the variable that we intend to plot.

#Below we will call the variable 'gross'
plot(gross,col= "green")

When looking at the graph above, we can’t truly capture the data or see a clear pattern within it. A better way to visualize this plot is to re-order the data by increasing gross. For that we use the order function below, which returns the positions of the values from lowest to highest; indexing gross with those positions gives the sorted values.

Below you can see that once our data has been re-arranged it is easier to read and understand. The points now form a steadily rising curve, whereas in the plot above our data is mostly concentrated at the bottom with no visible ordering.

plot(gross[order(gross)])
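
It is worth being careful about what order returns. A minimal sketch on made-up values: order gives positions, not values, so passing its result straight to plot draws the row positions rather than the gross amounts.

```r
x <- c(30, 10, 20)   # made-up values

order(x)        # 2 3 1 : positions of the smallest, middle, largest value
x[order(x)]     # 10 20 30 : the values themselves, lowest to highest
sort(x)         # 10 20 30 : shortcut for the same result
identical(x[order(x)], sort(x))
```

One difference to keep in mind: sort drops NAs by default, while indexing with order keeps them at the end.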


Below we plot gross in units of $1,000 with points and connecting lines, which allows us to see the high and low points of the variable.

#xlab labels the x axis, ylab labels the y axis
#Dividing by 1000 makes the values match the "$1,000" axis label
plot(gross/1000, type="b", xlab = "Case Number", ylab = "Gross in $1,000",col="red") 

There are other ways to customize plots, such as changing the colors of the lines, adding a heading, or even making them interactive!

Now, we will plot the gross graph alongside users_votes, total_cast_likes, director_fb_likes, and critic_reviews. We start with the Facebook likes variables. Make sure to run the code in the same chunk so the plots are on the same layout.

likes01 = mydata$actor1_fb_likes
likes02 = mydata$actor2_fb_likes
likes03 = mydata$actor3_fb_likes
likes04 = mydata$director_fb_likes
likes05 = mydata$total_cast_likes
#Layout gives a 2 x 3 grid, so up to 6 graphs fit on one screen
layout(matrix(1:6,2,3))
#Actor1_fb_likes
plot(likes01, type="b", xlab = "Case Number", ylab = "Actor 1 Facebook likes",col="blue") 
#Actor2_fb_likes
plot(likes02, type="b", xlab = "Case Number", ylab = "Actor 2 Facebook likes",col="red") 
#Actor3_fb_likes
plot(likes03, type="b", xlab = "Case Number", ylab = "Actor 3 Facebook likes",col="purple") 
#Director_fb_likes
plot(likes04, type="b", xlab = "Case Number", ylab = "Director Facebook likes",col="orange") 
#Total_cast_likes
plot(likes05, type="b", xlab = "Case Number", ylab = "Total cast Facebook likes",col="green")

#Layout gives a 2 x 3 grid, so up to 6 graphs fit on one screen
layout(matrix(1:6,2,3))
#Example of how to plot the gross variable (divided by 1000 to match the label)
plot(gross/1000, type="b", xlab = "Case Number", ylab = "Gross in $1,000",col="blue") 
#Plot of users_votes
plot(mydata$users_votes, type="b", xlab = "Case Number", ylab = "Users votes",col="red")
#Plot of total_cast_likes
plot(mydata$total_cast_likes, type="b", xlab = "Case Number", ylab = "Total cast likes",col="purple")
#Plot of director_fb_likes
plot(mydata$director_fb_likes, type="b", xlab = "Case Number", ylab = "Director Facebook likes",col="pink")
#Plot of critic_reviews
plot(mydata$critic_reviews, type="b", xlab = "Case Number", ylab = "Critic reviews",col="green")
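
To make the layout call less opaque, the sketch below shows the matrix it receives and how the cells are filled. The placeholder plots and the name ‘grid’ are made up for illustration; pdf(NULL) is used only so the sketch runs without opening a plot window.

```r
# layout() takes a matrix whose entries say which plot goes in which cell.
grid <- matrix(1:6, 2, 3)
grid            # filled column by column: plot 1 top-left, plot 2 below it

pdf(NULL)       # throwaway device so the sketch runs without opening a window
layout(grid)
for (i in 1:5) plot(1:10, main = paste("Plot", i))   # five placeholder plots
layout(1)       # restore a single-plot layout for later figures
invisible(dev.off())
```

With five plots in a six-cell grid, the last cell (bottom right) simply stays empty.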

The cases are in no particular order and are not arranged in a chronological sequence. Each row is simply an independent film. Since each case is independent, we can reorder them. To reveal a potential trend, we reorder the rows by release year and see how gross and the Facebook likes variables behave.

year= mydata$year
newdata = mydata[order(year),]
new_gross = newdata$gross
new_likes1 = newdata$actor1_fb_likes
new_likes2 = newdata$actor2_fb_likes
new_likes3 = newdata$actor3_fb_likes
new_likes4 = newdata$director_fb_likes
plot(new_gross,col="blue")
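
The row-reordering step can be checked on a small made-up data frame. The column names mirror the lab's data, but the values and the names ‘toy’ and ‘sorted’ are placeholders. The point of the sketch is that indexing the whole data frame keeps each film's columns together, and that rows with a missing year end up last.

```r
# Made-up rows; column names mirror the lab's data frame.
toy <- data.frame(title = c("A", "B", "C", "D"),
                  year  = c(2012, NA, 1999, 2005),
                  gross = c(50, 70, 10, 30),
                  stringsAsFactors = FALSE)

sorted <- toy[order(toy$year), ]   # whole rows move together
sorted$title                       # "C" "D" "A" "B": the NA year sorts last
sorted$gross                       # 10 30 50 70
```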


Task 3

Given a gross value of 10,214,013, calculate the corresponding z-value or z-score using the mean and standard deviation calculations conducted in task 1.

We know that the z-score = (x - mean)/sd. So, we input this into the R code with x = 10,214,013, mean = 48,468,408, and sd = 68,452,990, which we found above.

mean_gross = mean(mydata$gross,na.rm = TRUE)
mean_gross
[1] 48468408
sd_gross = sd(mydata$gross, na.rm = TRUE)
sd_gross
[1] 68452990
zscore_gross = (10214013 - mean_gross)/sd_gross
zscore_gross
[1] -0.5588418
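
The same arithmetic can be reproduced from the rounded values printed above, and the formula extends to a whole vector. In the sketch below the vector ‘x’ is made up; scale is a base R function that centers by the mean and divides by the standard deviation.

```r
# Rounded values printed in Task 1 (the exact stored values have more digits).
mean_gross <- 48468408
sd_gross   <- 68452990

z <- (10214013 - mean_gross) / sd_gross
round(z, 4)     # -0.5588

# The same formula applied to a whole vector; scale() is a base R shortcut.
x <- c(10, 20, 30, 40)                    # made-up values
z_manual <- (x - mean(x)) / sd(x)
z_scale  <- as.vector(scale(x))
all.equal(z_manual, z_scale)              # TRUE
```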

Below is a histogram of just gross

hist(gross,border = "black",col = "blue")

Below is the histogram of the z-scores of all gross values

#Histogram of the z-score of every gross value
hist((gross - mean_gross)/sd_gross,col = "blue")

Based on the z-values, how would you rate a $10,214,013 gross value: poor, average, good, or very good performance? Explain your logic.

The z-score of -0.5588418 means that a gross of $10,214,013 lies about 0.56 standard deviations below the mean gross of 48,468,408. Because it falls below the mean, we rate it as a poor gross value, although it is still within one standard deviation of the mean.
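
Because the summary shows a mean gross (48,468,408) well above the median (25,517,500), the distribution is right-skewed, so a complementary check is the share of films that gross no more than the value in question. From the summary quartiles, 10,214,013 sits between the first quartile (5,340,988) and the median. The sketch below shows how such a share could be computed. It uses a made-up vector, and the names ‘toy_gross’, ‘value’, and ‘share_below’ are placeholders.

```r
# Made-up, right-skewed stand-in for the cleaned gross vector.
toy_gross <- c(1, 2, 3, 5, 8, 12, 20, 35, 90, 400)
value <- 10

share_below <- mean(toy_gross <= value)   # fraction of films at or below value
share_below                               # 0.5

# ecdf() builds the same empirical distribution as a reusable function.
ecdf(toy_gross)(value)                    # 0.5
```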

