First, read in the data from the ‘rottentomatoes.csv’ file and view the first few rows to make sure it was read in correctly.
mydata = read.csv(file="data/rottentomatoes.csv")
head(mydata)
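If the first column name comes back with stray characters in front of it (for example `ï..title` instead of `title`), a likely cause is a byte-order mark (BOM) at the start of the file. The sketch below is an assumption about the cause, not something confirmed for this file: it writes a tiny made-up CSV with a BOM to a temporary file and reads it back with `fileEncoding = "UTF-8-BOM"`, which tells R to drop the mark.

```r
# Build a small made-up CSV that starts with a UTF-8 byte-order mark
path = tempfile(fileext = ".csv")
bom = as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("title,score\nAvatar,7.9\n")), path)

# fileEncoding = "UTF-8-BOM" removes the mark while reading
df = read.csv(path, fileEncoding = "UTF-8-BOM")
names(df)
```

For the lab data the same argument could be added to the `read.csv` call above, if the BOM is indeed the cause.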
By using the ‘str’ function below we can see that there are some NAs in our data. To get accurate results for the remainder of this lab, we will remove all NAs when calculating the min, max, median, etc.
str(mydata)
'data.frame': 5043 obs. of 21 variables:
$ ï..title : Factor w/ 4917 levels "#Horror ",..: 398 2731 3279 3707 3332 1961 3289 3459 399 1631 ...
$ genres : Factor w/ 914 levels "Action","Action|Adventure",..: 107 101 128 288 754 126 120 308 126 447 ...
$ director : Factor w/ 2399 levels "","A. Raven Cruz",..: 929 801 2027 380 606 109 2030 1652 1228 554 ...
$ actor1 : Factor w/ 2098 levels "","50 Cent","A.J. Buckley",..: 305 983 355 1968 528 443 787 223 338 35 ...
$ actor2 : Factor w/ 3033 levels "","50 Cent","A. Michael Baldwin",..: 1408 2218 2489 534 2433 2549 1228 801 2440 653 ...
$ actor3 : Factor w/ 3522 levels "","50 Cent","A.J. Buckley",..: 3442 1393 3134 1771 1 2714 1970 2163 3018 2941 ...
$ length : int 178 169 148 164 NA 132 156 100 141 153 ...
$ budget : num 2.37e+08 3.00e+08 2.45e+08 2.50e+08 NA ...
$ director_fb_likes: int 0 563 0 22000 131 475 0 15 0 282 ...
$ actor1_fb_likes : int 1000 40000 11000 27000 131 640 24000 799 26000 25000 ...
$ actor2_fb_likes : int 936 5000 393 23000 12 632 11000 553 21000 11000 ...
$ actor3_fb_likes : int 855 1000 161 23000 NA 530 4000 284 19000 10000 ...
$ total_cast_likes : int 4834 48350 11700 106759 143 1873 46055 2036 92000 58753 ...
$ fb_likes : int 33000 0 85000 164000 0 24000 0 29000 118000 10000 ...
$ critic_reviews : int 723 302 602 813 NA 462 392 324 635 375 ...
$ users_reviews : int 3054 1238 994 2701 NA 738 1902 387 1117 973 ...
$ users_votes : int 886204 471220 275868 1144337 8 212204 383056 294810 462669 321795 ...
$ score : num 7.9 7.1 6.8 8.5 7.1 6.6 6.2 7.8 7.5 7.5 ...
$ aspect_ratio : num 1.78 2.35 2.35 2.35 NA 2.35 2.35 1.85 2.35 2.35 ...
$ gross : int 760505847 309404152 200074175 448130642 NA 73058679 336530303 200807262 458991599 301956980 ...
$ year : int 2009 2007 2015 2012 NA 2012 2007 2010 2015 2009 ...
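One way to see how many NAs each column contains is to count them column by column. The sketch below uses a small made-up data frame (`toy` is hypothetical, not the lab data) so that it runs on its own; with the lab data, `mydata` would take its place.

```r
# Toy data frame standing in for mydata (values are made up for illustration)
toy = data.frame(
  gross  = c(100, NA, 300, 400),
  length = c(90, 120, NA, NA)
)

# is.na() marks each missing cell as TRUE; colSums() counts the TRUEs per column
na_counts = colSums(is.na(toy))
na_counts

# na.rm = TRUE drops the missing values before the statistic is computed
mean_gross_toy = mean(toy$gross, na.rm = TRUE)
mean_gross_toy
```

Without `na.rm = TRUE`, `mean(toy$gross)` would return `NA`, which is why the calculations in this lab pass that argument.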
Below is a summary of our data, which includes the min, max, mean, and median of each numeric variable.
summary(mydata)
ï..title genres director
Ben-Hur : 3 Drama : 236 : 104
Halloween : 3 Comedy : 209 Steven Spielberg: 26
Home : 3 Comedy|Drama : 191 Woody Allen : 22
King Kong : 3 Comedy|Drama|Romance: 187 Clint Eastwood : 20
Pan : 3 Comedy|Romance : 158 Martin Scorsese : 20
The Fast and the Furious : 3 Drama|Romance : 152 Ridley Scott : 17
(Other) :5025 (Other) :3910 (Other) :4834
actor1 actor2 actor3 length
Robert De Niro : 49 Morgan Freeman : 20 : 23 Min. : 7.0
Johnny Depp : 41 Charlize Theron: 15 Ben Mendelsohn: 8 1st Qu.: 93.0
Nicolas Cage : 33 Brad Pitt : 14 John Heard : 8 Median :103.0
J.K. Simmons : 31 : 13 Steve Coogan : 8 Mean :107.2
Bruce Willis : 30 James Franco : 11 Anne Hathaway : 7 3rd Qu.:118.0
Denzel Washington: 30 Meryl Streep : 11 Jon Gries : 7 Max. :511.0
(Other) :4829 (Other) :4959 (Other) :4982 NA's :15
budget director_fb_likes actor1_fb_likes actor2_fb_likes actor3_fb_likes
Min. :2.180e+02 Min. : 0.0 Min. : 0 Min. : 0 Min. : 0.0
1st Qu.:6.000e+06 1st Qu.: 7.0 1st Qu.: 614 1st Qu.: 281 1st Qu.: 133.0
Median :2.000e+07 Median : 49.0 Median : 988 Median : 595 Median : 371.5
Mean :3.975e+07 Mean : 686.5 Mean : 6560 Mean : 1652 Mean : 645.0
3rd Qu.:4.500e+07 3rd Qu.: 194.5 3rd Qu.: 11000 3rd Qu.: 918 3rd Qu.: 636.0
Max. :1.222e+10 Max. :23000.0 Max. :640000 Max. :137000 Max. :23000.0
NA's :492 NA's :104 NA's :7 NA's :13 NA's :23
total_cast_likes fb_likes critic_reviews users_reviews users_votes
Min. : 0 Min. : 0 Min. : 1.0 Min. : 1.0 Min. : 5
1st Qu.: 1411 1st Qu.: 0 1st Qu.: 50.0 1st Qu.: 65.0 1st Qu.: 8594
Median : 3090 Median : 166 Median :110.0 Median : 156.0 Median : 34359
Mean : 9699 Mean : 7526 Mean :140.2 Mean : 272.8 Mean : 83668
3rd Qu.: 13756 3rd Qu.: 3000 3rd Qu.:195.0 3rd Qu.: 326.0 3rd Qu.: 96309
Max. :656730 Max. :349000 Max. :813.0 Max. :5060.0 Max. :1689764
NA's :50 NA's :21
score aspect_ratio gross year
Min. :1.600 Min. : 1.18 Min. : 162 Min. :1916
1st Qu.:5.800 1st Qu.: 1.85 1st Qu.: 5340988 1st Qu.:1999
Median :6.600 Median : 2.35 Median : 25517500 Median :2005
Mean :6.442 Mean : 2.22 Mean : 48468408 Mean :2002
3rd Qu.:7.200 3rd Qu.: 2.35 3rd Qu.: 62309438 3rd Qu.:2011
Max. :9.500 Max. :16.00 Max. :760505847 Max. :2016
NA's :329 NA's :884 NA's :108
Below we extract ‘gross’ and remove the NAs.
gross = mydata$gross
gross = gross[!is.na(gross)]
gross
[1] 760505847 309404152 200074175 448130642 73058679 336530303 200807262 458991599 301956980
[10] 330249062 200069408 168368427 423032628 89289910 291021565 141614023 623279547 241063875
[19] 179020854 255108370 262030663 105219735 258355354 70083519 218051260 658672302 407197282
[28] 65173160 652177271 304360277 373377893 408992272 334185206 234360014 268488329 402076689
[37] 245428137 234903076 202853933 172051787 191450875 116593191 414984497 125320003 350034110
[46] 202351611 233914986 228756232 65171860 144812796 90755643 101785482 352358779 317011114
[55] 123070338 237282182 130468626 223806889 140080850 166112167 137850096 47375327 124051759
[64] 291709845 154985087 533316061 292979556 198332128 318298180 73820094 113745408 102176165
[73] 161087183 100289690 100189501 88246220 150167630 356454367 362645141 312057433 155111815
[82] 241407328 208543795 38297305 259746958 238371987 93417865 222487711 189412677 665426
[91] 102315545 217387997 150350192 333130696 187991439 292568851 303001229 144512310 127490802
[100] 146405371 281666058 63143812 60655503 76846624 320706665 46978995 89732035 104383624
[109] 198539855 318759914 34293771 292000866 289994397 227946274 256386216 206456431 206435493
[118] 205343774 179982968 177243721 179883016 139259759 400736600 281492479 206360018 153629485
[127] 133375846 181015141 114053579 119420252 83640426 79711678 195000874 61937495 124051759
[136] 126597121 165230261 131564731 133382309 73103784 21379315 64459316 34964818 111505642
[145] 133228348 216366733 160201106 118099659 201573391 190418803 82161969 143523463 209364921
[154] 103400692 110332737 111110575 65007045 257704099 403706375 176997107 31141074 31704416
[163] 107503316 129734803 132122995 122512052 68642452 32131830 176636816 126930660 93926386
[172] 292298923 63992328 134518390 52792307 183635922 83024900 123207194 83348920 227137090
[181] 215395021 180191634 424645577 292298923 177343675 234277056 138396624 149234747 118311368
[190] 101160529 77564037 249358727 49551662 60522097 137748063 113733726 148337537 317557891
[199] 33592415 305388685 337103873 217536138 131536019 214948780 209805005 186830669 163192114
[208] 119412921 32694788 113165635 107285004 260031035 186739919 215397307 182618434 131920333
[217] 124976634 115802596 108521835 100685880 126464904 64736114 93050117 57637485 58607007
[226] 43929341 30212620 76418654 89021735 380262555 310675583 289907418 132550960 474544677
[235] 187165546 40911830 47952020 190871240 274084951 67155742 81638674 56114221 250863268
[244] 155181732 125332007 113330342 125531634 186336103 129995817 102608827 42776259 98780042
[253] 106369117 142614158 50026353 66002193 85463309 71017784 48068396 61656849 134520804
[262] 313837577 24004159 58183966 100446895 144795350 47396698 140015224 104374107 228430993
[271] 35799026 6712451 101643008 187670866 132014112 261970615 167007184 180011740 204843350
[280] 97030725 130127620 146282411 65452312 148383780 119219978 101228120 162804648 100117603
[289] 89296573 85017401 173005002 75030163 77222184 34964818 107515297 67631157 66862068
[298] 57366262 116866727 184031112 54700065 27098580 55673333 40198710 72660029 38120554
[307] 49392095 39292022 28772222 17010646 24985612 4411102 35024475 130174897 10200000
[316] 202007640 77679638 9213 58867694 59475623 108638745 86897182 63540020 95328937
[325] 50802661 161317423 201148159 43982842 380838870 377019252 340478898 17176900 131144183
[334] 23014504 181166115 176740650 71148699 67344392 22406362 261437578 11000000 88761720
[343] 250147615 245823397 81557479 226138454 155370362 124870275 196573705 58229120 125305545
[352] 132373442 120618403 110416702 102515793 100012500 209019489 84037039 85884815 83077470
[361] 100018837 78747585 78616689 75817994 100853835 73209340 72515360 68558662 65653758
[370] 64685359 61355436 26871 60874615 143618384 58220776 47474112 42877165 35168677
[379] 56114221 37567440 61644321 190562 120147445 241688385 144512310 233630478 197992827
[388] 176049130 172620724 183405771 20315324 148313048 127706877 126149655 66941559 78009155
[397] 63224849 111544445 112703470 117144465 84303558 150832203 51396781 47592825 50016394
[406] 57010853 62494975 46440491 44606335 40048332 64933670 31494270 31111260 123307945
[415] 153288182 13401683 137340146 43575716 80170146 75754670 33048353 34543701 242589580
[424] 102981571 180965237 407999255 254455986 162831698 155019340 145771527 82506325 140459099
[433] 53215979 158115031 133103929 133668525 130313314 124590960 127968405 120136047 128200012
[442] 112225777 109993847 104054514 103028109 101087161 101111837 95632614 94822707 92969824
[451] 91188905 90443603 82226474 79363785 76081498 85707116 74329966 100169068 73215310
[460] 80360866 69102910 65948711 821997 169692572 60507228 56684819 50628009 69772969
[469] 45356386 55350897 39442871 37899638 37754208 27779888 38542418 34566746 32885565
[478] 36073232 21471685 20950820 19673424 19480739 17593391 18318000 27356090 17473245
[487] 15131330 19406406 1891821 23219748 170708996 422783777 103812241 119793567 92930005
[496] 67286731 74158157 127083765 1339152 15071514 26000610 323505540 66462600 368049635
[505] 306124059 229074524 193136719 35286428 157299717 134568845 134006721 195329763 120776832
[514] 118823091 41814863 97360069 117698894 162001186 77032279 73023275 68473360 66636385
[523] 160762022 103338338 55808744 47379090 43426961 47000485 45434443 42044321 73661010
[532] 41523271 37600435 39251128 83503161 34636443 22751979 30013346 14567883 90820
[541] 5409517 21009180 94999143 336029560 36381716 55585389 36976367 107225164 70224196
[550] 51814190 47456450 148213377 112950721 75600000 62647540 183132370 27796042 32616869
[559] 18947630 114195633 144156464 227965690 436471036 244052771 152149590 141204016 162495848
[568] 136448821 120523073 119654900 72660029 117541000 116643346 100614858 42272747 80281096
[577] 219613391 78120196 98895417 70117571 83552429 66257002 65012000 79883359 78031620
[586] 54222000 52474616 55942830 40932372 38345403 37901509 48430355 30157016 28031250
[595] 33105600 62321039 38509342 19076815 25093607 18990542 14294842 19819494 13596911
[604] 8460990 7097125 37760080 5851188 25121291 18821279 118471320 300523113 71069884
[613] 251501645 35324232 81257500 617840 29655590 45045037 28965197 27550735 39380442
[622] 72980108 37516013 87704396 83892374 5932060 216119491 43568507 182805123 176387405
[631] 33685268 182204440 171383253 172071312 119412921 139225854 148775460 115731542 100468793
[640] 93771072 100448498 115603980 90454043 84049211 70450000 69688384 70236496 63695760
[649] 59617068 55637680 85911262 53846915 54758461 52397389 38966057 42345531 36064910
[658] 33328051 32598931 28045540 37023395 43532294 17218080 10014234 19059018 1987287
[667] 24407944 13750556 31054924 43247140 2208939 213079163 19548064 356784000 25052000
[676] 122012710 72413 58255287 77086030 65000000 32178777 15738632 54116191 118153533
[685] 108012170 210592590 279167575 143151473 136801374 168213584 135381507 167735396 121468960
[694] 106635996 102678089 125603360 101217900 104148781 75573300 93375151 106126012 93307796
[703] 90646554 109176215 82670733 82569532 81687587 80574010 75764085 90356857 75530832
[712] 75370763 100003492 90341670 74540762 80033643 73648142 71844424 75638743 66734992
[721] 75280058 64505912 77862546 61112916 88200225 60573641 59035104 56702901 55994557
[730] 54910560 53789313 51045801 50818750 50189179 50024083 50549107 56443482 62401264
[739] 47748610 46975183 50807639 46611204 257756197 48472213 43060566 45996718 43337279
[748] 37479778 36965395 40559930 36830057 36279230 42194060 43119879 35096190 35754555
[757] 43290977 33927476 32122249 40076438 32940507 31670931 30695227 32522352 28424210
[766] 26082914 29136626 26288320 26616590 623279547 30063805 22518325 13082288 18208078
[775] 14218868 22451 31165421 11802056 25472967 22362500 17281832 19781879 7605668
[784] 4535117 4426297 10166502 363024263 12065985 350123553 80021740 48291624 35231365
[793] 53715611 31199215 29580087 44665963 60128566 49875589 60984028 36931089 51317350
[802] 28328132 51774002 25528495 113006880 45860039 329691196 217326336 166225040 141600000
[811] 134218018 128769345 177575142 105263257 104354205 107100855 98711404 100328194 101530738
[820] 93815117 91400000 162586036 89706988 83000000 78745923 70098138 66365290 66207920
[829] 63408614 58422650 56932305 68750000 68218041 25040293 55747724 55473600 49994804
[838] 41609593 38553833 76137505 34350553 34238611 34098563 33828318 33472850 31051126
[847] 35707327 20550712 18573791 51225796 16264475 25857987 12870569 11466088 16088610
[856] 51178893 6768055 39440655 6167817 81645152 69951824 9483821 66676062 26838389
[865] 75604320 108200000 5660084 7221458 70327868 58297830 57386369 45207112 62563543
[874] 33574332 73343413 25031037 22843047 5755286 164435221 95720716 118683135 143704210
[883] 110476776 80270227 36385763 37035845 34580635 42438300 23324666 23020488 90567722
[892] 72601713 35092918 296623634 267652016 62453315 165500000 153620822 218628680 147637474
[901] 135014968 2175312 126203320 126975169 125548685 105807520 191616238 105264608 97680195
[910] 126088877 91030827 150315155 127997349 88504640 81517441 81022333 75621915 79948113
[919] 88658172 75888270 84244877 75367693 73701902 75605492 67823573 91439400 67128202
[928] 70496802 60470220 58336565 66002004 54997476 55682070 52752475 55092830 50815288
[937] 52822418 50150619 48745150 50007168 48154732 48265581 46982632 44737059 56724080
[946] 44484065 47553512 42610000 41482207 47105085 41256277 50740078 40203020 40905277
[955] 38590500 39177541 39778599 37486138 38105077 35168395 32800000 33643461 32741596
[964] 31874869 30306268 27667947 27067160 26616999 26536120 26199517 25450527 25407250
[973] 23159305 24006726 20389967 19593740 19118247 26442251 17114882 18472363 14131298
[982] 21557240 21283440 10556196 16671505 10400000 9528092 10137232 9795017 20488579
[991] 19445217 8355815 28837115 6471394 6291602 10706786 8742261 43905746 21413502
[1000] 7994115
[ reached getOption("max.print") -- omitted 3159 entries ]
Now we will calculate the range, min, max, mean, standard deviation, and variance for each variable. Below is an example of how to compute these for the variable gross. We will then follow that example for users_votes, total_cast_likes, director_fb_likes, and critic_reviews.
Gross
gross = mydata$gross
#Max gross
max_gross = max(mydata$gross,na.rm = TRUE)
max_gross
[1] 760505847
#Min gross
min_gross = min(mydata$gross,na.rm = TRUE)
min_gross
[1] 162
#Range
max_gross-min_gross
[1] 760505685
#Mean
mean_gross = mean(mydata$gross,na.rm = TRUE)
mean_gross
[1] 48468408
#Standard Deviation
sd_gross = sd(mydata$gross,na.rm =TRUE)
sd_gross
[1] 68452990
#Variance
var_gross = var(mydata$gross,na.rm = TRUE)
var_gross
[1] 4.685812e+15
Users Votes
users_votes = mydata$users_votes
#Max users votes
max_users_votes = max(mydata$users_votes,na.rm = TRUE)
max_users_votes
[1] 1689764
#Min users votes
min_users_votes = min(mydata$users_votes,na.rm = TRUE)
min_users_votes
[1] 5
#Range
max_users_votes-min_users_votes
[1] 1689759
#Mean
mean_users_votes = mean(mydata$users_votes,na.rm = TRUE)
mean_users_votes
[1] 83668.16
#Standard Deviation
sd_users_votes = sd(mydata$users_votes,na.rm =TRUE)
sd_users_votes
[1] 138485.3
#Variance
var_users_votes = var(mydata$users_votes,na.rm = TRUE)
var_users_votes
[1] 19178166353
Total_cast_likes
cast_likes = mydata$total_cast_likes
#Max cast_likes
max_cast_likes = max(mydata$total_cast_likes,na.rm = TRUE)
max_cast_likes
[1] 656730
#Min cast_likes
min_cast_likes = min(mydata$total_cast_likes,na.rm = TRUE)
min_cast_likes
[1] 0
#Range
max_cast_likes-min_cast_likes
[1] 656730
#Mean
mean_cast_likes = mean(mydata$total_cast_likes,na.rm = TRUE)
mean_cast_likes
[1] 9699.064
#Standard Deviation
sd_total_cast_likes = sd(mydata$total_cast_likes,na.rm =TRUE)
sd_total_cast_likes
[1] 18163.8
#Variance
var_cast_likes = var(mydata$total_cast_likes,na.rm = TRUE)
var_cast_likes
[1] 329923599
Director_fb_likes
Director_fb_likes = mydata$director_fb_likes
#Max Director_fb_likes
max_Director_fb_likes = max(mydata$director_fb_likes,na.rm = TRUE)
max_Director_fb_likes
[1] 23000
#Min Director_fb_likes
min_Director_fb_likes = min(mydata$director_fb_likes,na.rm = TRUE)
min_Director_fb_likes
[1] 0
#Range
max_Director_fb_likes-min_Director_fb_likes
[1] 23000
#Mean
mean_Director_fb_likes = mean(mydata$director_fb_likes,na.rm = TRUE)
mean_Director_fb_likes
[1] 686.5092
#Standard Deviation
sd_director_fb_likes = sd(mydata$director_fb_likes,na.rm =TRUE)
sd_director_fb_likes
[1] 2813.329
#Variance
var_Director_fb_likes = var(mydata$director_fb_likes,na.rm = TRUE)
var_Director_fb_likes
[1] 7914818
Critic_reviews
Critic_reviews = mydata$critic_reviews
#Max Critic_reviews
max_Critic_reviews = max(mydata$critic_reviews,na.rm = TRUE)
max_Critic_reviews
[1] 813
#Min Critic_reviews
min_Critic_reviews = min(mydata$critic_reviews,na.rm = TRUE)
min_Critic_reviews
[1] 1
#Range
max_Critic_reviews - min_Critic_reviews
[1] 812
#Mean
mean_Critic_reviews = mean(mydata$critic_reviews,na.rm = TRUE)
mean_Critic_reviews
[1] 140.1943
#Standard Deviation
sd_critic_reviews = sd(mydata$critic_reviews,na.rm =TRUE)
sd_critic_reviews
[1] 121.6017
#Variance
var_Critic_reviews = var(mydata$critic_reviews,na.rm = TRUE)
var_Critic_reviews
[1] 14786.97
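The same six statistics were computed by hand for each of the five variables above. A minimal sketch of how that repetition could be wrapped in one function is shown here; `describe_var` is a hypothetical helper name and `toy` is made-up data, neither is part of the lab.

```r
# Hypothetical helper: the six statistics used in this lab for one numeric vector
describe_var = function(x) {
  x = x[!is.na(x)]  # drop missing values once, up front
  c(min = min(x), max = max(x), range = max(x) - min(x),
    mean = mean(x), sd = sd(x), var = var(x))
}

# Toy data standing in for mydata (made-up values)
toy = data.frame(
  gross       = c(10, 20, NA, 40),
  users_votes = c(5, 15, 25, NA)
)

# sapply() applies the helper to each column and binds the results into a matrix
stats = sapply(toy, describe_var)
stats
```

With the lab data, the call might look like `sapply(mydata[c("gross", "users_votes", "total_cast_likes", "director_fb_likes", "critic_reviews")], describe_var)`.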
An easy way to calculate several of these statistics for all of the variables at once is the summary function. As seen earlier in the lab, the summary lets us verify that our calculations for the min, max, and mean are accurate with respect to the data we are working with. (Below is another look at the summary.)
summary(mydata)
ï..title genres director
Ben-Hur : 3 Drama : 236 : 104
Halloween : 3 Comedy : 209 Steven Spielberg: 26
Home : 3 Comedy|Drama : 191 Woody Allen : 22
King Kong : 3 Comedy|Drama|Romance: 187 Clint Eastwood : 20
Pan : 3 Comedy|Romance : 158 Martin Scorsese : 20
The Fast and the Furious : 3 Drama|Romance : 152 Ridley Scott : 17
(Other) :5025 (Other) :3910 (Other) :4834
actor1 actor2 actor3 length
Robert De Niro : 49 Morgan Freeman : 20 : 23 Min. : 7.0
Johnny Depp : 41 Charlize Theron: 15 Ben Mendelsohn: 8 1st Qu.: 93.0
Nicolas Cage : 33 Brad Pitt : 14 John Heard : 8 Median :103.0
J.K. Simmons : 31 : 13 Steve Coogan : 8 Mean :107.2
Bruce Willis : 30 James Franco : 11 Anne Hathaway : 7 3rd Qu.:118.0
Denzel Washington: 30 Meryl Streep : 11 Jon Gries : 7 Max. :511.0
(Other) :4829 (Other) :4959 (Other) :4982 NA's :15
budget director_fb_likes actor1_fb_likes actor2_fb_likes actor3_fb_likes
Min. :2.180e+02 Min. : 0.0 Min. : 0 Min. : 0 Min. : 0.0
1st Qu.:6.000e+06 1st Qu.: 7.0 1st Qu.: 614 1st Qu.: 281 1st Qu.: 133.0
Median :2.000e+07 Median : 49.0 Median : 988 Median : 595 Median : 371.5
Mean :3.975e+07 Mean : 686.5 Mean : 6560 Mean : 1652 Mean : 645.0
3rd Qu.:4.500e+07 3rd Qu.: 194.5 3rd Qu.: 11000 3rd Qu.: 918 3rd Qu.: 636.0
Max. :1.222e+10 Max. :23000.0 Max. :640000 Max. :137000 Max. :23000.0
NA's :492 NA's :104 NA's :7 NA's :13 NA's :23
total_cast_likes fb_likes critic_reviews users_reviews users_votes
Min. : 0 Min. : 0 Min. : 1.0 Min. : 1.0 Min. : 5
1st Qu.: 1411 1st Qu.: 0 1st Qu.: 50.0 1st Qu.: 65.0 1st Qu.: 8594
Median : 3090 Median : 166 Median :110.0 Median : 156.0 Median : 34359
Mean : 9699 Mean : 7526 Mean :140.2 Mean : 272.8 Mean : 83668
3rd Qu.: 13756 3rd Qu.: 3000 3rd Qu.:195.0 3rd Qu.: 326.0 3rd Qu.: 96309
Max. :656730 Max. :349000 Max. :813.0 Max. :5060.0 Max. :1689764
NA's :50 NA's :21
score aspect_ratio gross year
Min. :1.600 Min. : 1.18 Min. : 162 Min. :1916
1st Qu.:5.800 1st Qu.: 1.85 1st Qu.: 5340988 1st Qu.:1999
Median :6.600 Median : 2.35 Median : 25517500 Median :2005
Mean :6.442 Mean : 2.22 Mean : 48468408 Mean :2002
3rd Qu.:7.200 3rd Qu.: 2.35 3rd Qu.: 62309438 3rd Qu.:2011
Max. :9.500 Max. :16.00 Max. :760505847 Max. :2016
NA's :329 NA's :884 NA's :108
However, the summary does not report the variance or the standard deviation, so those still require the var and sd functions used above. For the basics, the summary will suffice.
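The variance and the standard deviation are closely linked: the standard deviation is the square root of the variance. The sketch below checks this on a small made-up vector (the values in `x` are illustrative only).

```r
x = c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up values

v = var(x)   # sample variance (divides by n - 1)
s = sd(x)    # sample standard deviation

# TRUE when the two agree up to floating-point tolerance
all.equal(s, sqrt(v))
```

The gross results earlier in the lab agree with this: squaring the standard deviation 68452990 gives roughly 4.6858e+15, the variance that was reported.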
Now we will produce a basic plot of the ‘gross’ variable. Here we use the plot function and pass it the variable that we intend to plot.
#Below we will plot the variable 'gross'
plot(gross,col= "green")
When looking at the graph above we can’t truly capture the data or see a clear pattern within it. A better way to visualize this plot is to re-order the data by increasing gross. For that we will use the sort function below, which returns the values themselves in order (order would return only their positions).
Below you can see that once our data has been re-arranged it is easier to read and understand. The sorted values form a single curve rising from the lowest to the highest gross, whereas in the plot above the points are concentrated at the bottom with no visible pattern.
plot(sort(gross))
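The difference between `order` and `sort` is easy to mix up, so here is a small sketch on a made-up vector: `order` returns positions, while `sort` returns the values.

```r
x = c(30, 10, 20)   # made-up values

order(x)      # positions that would sort x: 2 3 1
sort(x)       # the sorted values themselves: 10 20 30
x[order(x)]   # indexing by the positions gives the same result as sort(x)
```

Plotting the output of `order` therefore shows row positions rather than values, while plotting `sort(x)` or `x[order(x)]` shows the values from low to high.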
Below we plot gross again with the points joined by lines (type = "b"), which lets us see the high points and low points of our variable gross.
#xlab labels the x axis, ylab labels the y axis
plot(gross, type="b", xlab = "Case Number", ylab = "Gross ($)",col="red")
There are other ways to customize plots, such as changing the colors of the lines, adding a heading, or even making them interactive!
Now we will plot the Facebook likes variables, followed by the gross graph alongside users_votes, total_cast_likes, director_fb_likes, and critic_reviews. Make sure to run each set of plots in the same chunk so they share one layout.
likes01 = mydata$actor1_fb_likes
likes02 = mydata$actor2_fb_likes
likes03 = mydata$actor3_fb_likes
likes04 = mydata$director_fb_likes
likes05 = mydata$total_cast_likes
#Layout lets us see up to 6 graphs on one screen (2 rows by 3 columns)
layout(matrix(1:6,2,3))
#Actor1_fb_likes
plot(likes01, type="b", xlab = "Case Number", ylab = "Actor 1 Facebook likes",col="blue")
#Actor2_fb_likes
plot(likes02, type="b", xlab = "Case Number", ylab = "Actor 2 Facebook likes",col="red")
#Actor3_fb_likes
plot(likes03, type="b", xlab = "Case Number", ylab = "Actor 3 Facebook likes",col="purple")
#Director_fb_likes
plot(likes04, type="b", xlab = "Case Number", ylab = "Director Facebook likes",col="orange")
#Total_cast_likes
plot(likes05, type="b", xlab = "Case Number", ylab = "Total cast Facebook likes",col="green")
#Layout lets us see up to 6 graphs on one screen (2 rows by 3 columns)
layout(matrix(1:6,2,3))
#Example of how to plot the gross variable
plot(gross, type="b", xlab = "Case Number", ylab = "Gross ($)",col="blue")
#Plot of users_votes
plot(mydata$users_votes, type="b", xlab = "Case Number", ylab = "Users votes",col="red")
#Plot of total_cast_likes
plot(mydata$total_cast_likes, type="b", xlab = "Case Number", ylab = "Total cast likes",col="purple")
#Plot of director_fb_likes
plot(mydata$director_fb_likes, type="b", xlab = "Case Number", ylab = "Director Facebook likes",col="pink")
#Plot of critic_reviews
plot(mydata$critic_reviews, type="b", xlab = "Case Number", ylab = "Critic reviews",col="green")
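When several columns are plotted the same way, a loop can replace the repeated plot calls. The sketch below is one possible way to do this, using a made-up data frame (`toy`) and a null graphics device so that it runs without opening a plot window; neither is part of the lab.

```r
# Toy data standing in for mydata (made-up values)
toy = data.frame(a = c(1, 3, 2), b = c(5, 4, 6), c = c(9, 7, 8))

pdf(NULL)                   # null device: draw without writing a file or opening a window
layout(matrix(1:6, 2, 3))   # 2 rows by 3 columns; panels are filled down each column

panels_drawn = 0
for (name in names(toy)) {
  # toy[[name]] pulls out one column; the column name doubles as the y-axis label
  plot(toy[[name]], type = "b", xlab = "Case Number", ylab = name)
  panels_drawn = panels_drawn + 1
}

invisible(dev.off())        # close the device when finished
```

Because `matrix(1:6, 2, 3)` is filled column by column, the first two plots land in the left column, one above the other.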
The 5,043 case numbers (rows) are in no particular order and are not related to a chronological time sequence. Each row is an independent movie, so we can reorder them. To reveal a potential trend, we reorder the rows by year below and see how gross and the likes variables behave.
year= mydata$year
newdata = mydata[order(year),]
new_gross = newdata$gross
new_likes1 = newdata$actor1_fb_likes
new_likes2 = newdata$actor2_fb_likes
new_likes3 = newdata$actor3_fb_likes
new_likes4 = newdata$director_fb_likes
plot(new_gross,col="blue")
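The reordering step relies on `order` returning row positions, which are then used to index the whole data frame so that every column moves together. A small sketch on made-up data (`toy` is hypothetical) shows the idea, including what happens to rows where the sort key is NA.

```r
# Toy data standing in for mydata (made-up values)
toy = data.frame(
  year  = c(2010, NA, 1999, 2005),
  gross = c(40, 10, 30, 20)
)

# order() sorts ascending and places NA last by default
sorted = toy[order(toy$year), ]
sorted

# gross follows the new row order: 30 20 40 10
sorted$gross
```

To sort from high to low instead, `order(toy$year, decreasing = TRUE)` can be used.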
Given a gross value of 10,214,013, calculate the corresponding z-value or z-score using the mean and standard deviation calculations conducted in task 1.
We know that the z-score = (x - mean)/sd. So, input this into the R code where x = 10,214,013, mean = 48468408, and stdev = 68452990, which we found above.
mean_gross = mean(mydata$gross,na.rm = TRUE)
mean_gross
[1] 48468408
sd_gross = sd(mydata$gross, na.rm = TRUE)
sd_gross
[1] 68452990
zscore_gross = (10214013 - mean_gross)/sd_gross
zscore_gross
[1] -0.5588418
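The z-score formula can also be written as a small reusable function. The sketch below uses a hypothetical helper named `z_score` and plugs in the rounded mean and standard deviation printed above, so the result agrees with the lab's value only up to rounding.

```r
# Hypothetical helper: how many standard deviations x lies from the center
z_score = function(x, center, spread) {
  (x - center) / spread
}

# Rounded mean and standard deviation of gross, as printed above
z = z_score(10214013, 48468408, 68452990)
round(z, 4)   # -0.5588

# The arithmetic is vectorized, so a whole vector can be converted at once
z_many = z_score(c(1, 2, 3), center = 2, spread = 1)
z_many        # -1 0 1
```

A negative z-score means the value is below the mean, and the magnitude says by how many standard deviations.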
Below is a histogram of just gross
hist(gross,border = "black",col = "blue")
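Below is the histogram of the z-scores of gross. A single z-score cannot form a histogram, so we first convert every gross value to a z-score.
#Histogram of Z-Scores
zscores_gross = (gross - mean_gross)/sd_gross
hist(zscores_gross,col = "blue")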
Based on the z-values, how would you rate a $10,214,013 gross value: poor, average, good, or very good performance? Explain your logic.
Based on the z-score of -0.5588418, this value lies about 0.56 standard deviations below the mean gross of 48468408, which therefore means that it is a poor gross value.