Question 1: Read in the gambling dataset, check the first couple of rows, and describe the data types. Identify incorrect data types, if any. ( 5 Points )

mydata = read.csv(file = "data/gambling.csv")
head(mydata)  # show only the first few rows
str(mydata)   # show the type R assigned to each column

Sex is a categorical data type. Income and gamble are ratio data types. Status and verbal are interval data types.

The sex column is also stored incorrectly: it holds numeric 0/1 codes, so R treats it as a number rather than as a category.

Status, income, verbal, and gamble are all hard to interpret, since the dataset gives little information about what exactly the numbers measure.
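
One way to fix the type of the sex column is to convert it to a factor. The sketch below uses a small made-up data frame (`toy` is a hypothetical stand-in, not the gambling data) and keeps the labels as "0" and "1", because nothing here says which code corresponds to which sex.

```r
# Toy data frame standing in for the gambling data (values are made up).
toy <- data.frame(
  sex    = c(0, 1, 0, 1, 0),
  income = c(2.0, 3.5, 6.0, 1.5, 10.0)
)

# As read from a CSV, sex is numeric, so summary() would report a mean for it.
class(toy$sex)

# Convert the 0/1 codes to a factor. The labels stay "0" and "1" because the
# meaning of each code is not documented.
toy$sex <- factor(toy$sex, levels = c(0, 1))

class(toy$sex)
table(toy$sex)  # counts per category instead of a mean
```

After the conversion, `summary()` reports counts per level for this column instead of quartiles.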

Question 2: Describe the data using full sentences and using descriptive statistics. ( 5 Points )

summary(mydata)
      sex             status          income           verbal          gamble     
 Min.   :0.0000   Min.   :18.00   Min.   : 0.600   Min.   : 1.00   Min.   :  0.0  
 1st Qu.:0.0000   1st Qu.:28.00   1st Qu.: 2.000   1st Qu.: 6.00   1st Qu.:  1.1  
 Median :0.0000   Median :43.00   Median : 3.250   Median : 7.00   Median :  6.0  
 Mean   :0.4043   Mean   :45.23   Mean   : 4.642   Mean   : 6.66   Mean   : 19.3  
 3rd Qu.:1.0000   3rd Qu.:61.50   3rd Qu.: 6.210   3rd Qu.: 8.00   3rd Qu.: 19.4  
 Max.   :1.0000   Max.   :75.00   Max.   :15.000   Max.   :10.00   Max.   :156.0  

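The first column is a problem: sex is a categorical variable, but it is stored as numeric 0/1 codes, so its summary statistics are hard to read directly (the mean of 0.4043 only says that about 40% of the rows are coded 1). Status ranges from 18 to 75 with a median of 43, and verbal ranges from 1 to 10 with a median of 7. Income has a median of 3.25 and a mean of 4.642, and gamble has a median of 6.0, a mean of 19.3, and a maximum of 156.0, so both are right-skewed. Beyond that, the numbers are hard to interpret because the dataset includes no descriptions of what each variable measures.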

Question 3: Estimate the upper and lower threshold for the verbal score ( 5 Points )

HINT: A common way to estimate the upper and lower threshold is to take the mean (+ or -) 3 * standard deviation.

#Use the formula above to calculate the upper and lower threshold
verbal = mydata$verbal
mean(verbal)
[1] 6.659574
spreadverbal = sd(verbal)
spreadverbal
[1] 1.856558
upperVerbal = mean(verbal) + (3) * spreadverbal
upperVerbal
[1] 12.22925
lowerverbal = mean(verbal) - 3 * spreadverbal
lowerverbal
[1] 1.0899
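
In the output above, the upper threshold (12.23) is above the maximum verbal score of 10 and the lower threshold (1.09) is just above the minimum of 1, so only a score of 1 would fall outside the band. The same calculation can be wrapped in a small function. This is a sketch: `thresholds` is a hypothetical helper and `scores` is made-up data, not the verbal column.

```r
# Hypothetical helper: mean +/- k standard deviations.
thresholds <- function(x, k = 3) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  c(lower = m - k * s, upper = m + k * s)
}

# Made-up scores on a 1-10 scale, for illustration only.
scores <- c(1, 5, 6, 6, 7, 7, 7, 8, 8, 9, 10)
lim <- thresholds(scores)
lim

# Values outside the thresholds would be flagged as potential outliers.
flagged <- scores[scores < lim["lower"] | scores > lim["upper"]]
flagged
```

Calling `thresholds(mydata$verbal)` should give the same two numbers as the step-by-step code above.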

Question 4: Calculate the z-score for income where x=13. Based on the income value x=13 pounds per week, how would you rate the income: low income, average income, high income. Why? ( 5 Points )

Hint: zscore = (x - mean)/sd

income = mydata$income  # extract the column first; income was never defined
spreadIncome = sd(income)
zscore = (13 - mean(income))/spreadIncome
zscore
[1] 2.353481

A z-score of about 2.35 means that an income of 13 pounds per week lies roughly 2.35 standard deviations above the mean income of 4.64, so it rates as a high income.

Question 5: Create a histogram for the zscore of income. What do you notice about the shape? ( 5 Points )

Hint: To plot a histogram, use the function hist(variable).

zscoreIncome = (income - mean(income))/spreadIncome
hist(zscoreIncome)
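
Standardizing is a linear rescaling, so the z-scores keep the shape of the raw income distribution; only the axis changes. The summary in Question 2 (mean 4.642 above the median 3.25, maximum 15) suggests that shape is right-skewed. The sketch below checks the rescaling on made-up values (`income_demo` is hypothetical, not the real column) and compares the manual formula with base R's `scale()`.

```r
# Made-up right-skewed incomes, for illustration only.
income_demo <- c(0.6, 1.5, 2, 2, 3, 3.5, 4, 6, 10, 15)

# Manual z-score, as in the hint: (x - mean) / sd.
z_manual <- (income_demo - mean(income_demo)) / sd(income_demo)

# scale() centers on the mean and divides by the standard deviation.
z_scale <- as.numeric(scale(income_demo))

all.equal(z_manual, z_scale)
mean(z_manual)  # approximately 0
sd(z_manual)    # approximately 1

# Side-by-side histograms; bin edges may differ, but the skew is the same.
par(mfrow = c(1, 2))
hist(income_demo, main = "Raw income")
hist(z_manual, main = "Z-score of income")
```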

Question 6: Analyze the correlation plot below. Give relevant information about the negatively correlated, uncorrelated, and positively correlated variables. ( 5 Points )

All of the variables are positively correlated: every circle in the plot is blue, and the key on the right shows that blue indicates a positive correlation. Darker blue marks correlations closer to 1, while the lightest circles mark correlations of roughly 0 to 0.2. The largest positive correlation is between fb_likes and users_votes, while director and cast_likes have the weakest positive correlations with score and gross. No pair in the plot is negatively correlated.

Extra Credit: Analyze the correlation table below. Give relevant information about the negatively correlated, uncorrelated, and positively correlated variables. ( 5 Points )

# Create a correlation table "cor(movies)"
movies = read.csv("data/movies.csv")
cor(movies)
               length budget  director actor1 actor2 actor3 cast_likes  fb_likes critic_reviews
length              1     NA        NA     NA     NA     NA         NA        NA             NA
budget             NA      1        NA     NA     NA     NA         NA        NA             NA
director           NA     NA 1.0000000     NA     NA     NA  0.1858875 0.2894939             NA
actor1             NA     NA        NA      1     NA     NA         NA        NA             NA
actor2             NA     NA        NA     NA      1     NA         NA        NA             NA
actor3             NA     NA        NA     NA     NA      1         NA        NA             NA
cast_likes         NA     NA 0.1858875     NA     NA     NA  1.0000000 0.3387454             NA
fb_likes           NA     NA 0.2894939     NA     NA     NA  0.3387454 1.0000000             NA
critic_reviews     NA     NA        NA     NA     NA     NA         NA        NA              1
users_reviews      NA     NA        NA     NA     NA     NA         NA        NA             NA
users_votes        NA     NA 0.3492878     NA     NA     NA  0.4140989 0.8001157             NA
score              NA     NA 0.1765288     NA     NA     NA  0.1484501 0.4604384             NA
gross              NA     NA 0.1717334     NA     NA     NA  0.3829801 0.5644529             NA
               users_reviews users_votes     score     gross
length                    NA          NA        NA        NA
budget                    NA          NA        NA        NA
director                  NA   0.3492878 0.1765288 0.1717334
actor1                    NA          NA        NA        NA
actor2                    NA          NA        NA        NA
actor3                    NA          NA        NA        NA
cast_likes                NA   0.4140989 0.1484501 0.3829801
fb_likes                  NA   0.8001157 0.4604384 0.5644529
critic_reviews            NA          NA        NA        NA
users_reviews              1          NA        NA        NA
users_votes               NA   1.0000000 0.4742893 0.6892893
score                     NA   0.4742893 1.0000000 0.2669350
gross                     NA   0.6892893 0.2669350 1.0000000

The columns length, budget, actor1, actor2, actor3, critic_reviews, and users_reviews show NA against every other variable. NA does not mean "no correlation"; it means cor() could not compute a value, most likely because these columns contain missing values. Among the columns that could be computed (director, cast_likes, fb_likes, users_votes, score, and gross), there are no negative correlations. All of them are positive, ranging from about 0.15 (cast_likes and score) to about 0.80 (fb_likes and users_votes).
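
By default, `cor()` returns NA for any pair that involves a missing value. Its `use` argument can be set to drop incomplete rows instead. The sketch below shows the difference on a small made-up data frame (`demo` is hypothetical, not the movies data); whether this is the right treatment for the real data depends on why the values are missing.

```r
# Made-up data with one missing value, for illustration only.
demo <- data.frame(
  budget = c(10, 20, NA, 40, 50),
  gross  = c(12, 25, 33, 38, 60),
  score  = c(6.1, 7.0, 6.5, 7.2, 8.0)
)

# Default behaviour: every pair involving budget comes back as NA.
default_cor <- cor(demo)
default_cor

# For each pair of columns, use only the rows where both values are present.
pairwise_cor <- cor(demo, use = "pairwise.complete.obs")
pairwise_cor
```

Counting missing values per column with `colSums(is.na(movies))` would show which columns are responsible for the NA entries.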
