Question 1: Read in the gambling dataset, check the first couple of rows, and describe the data types. Identify incorrect data types, if any. ( 5 Points )

gdata = read.csv("data/gambling.csv")
head(gdata)   # first six rows
str(gdata)    # data type of each column

Sex is a categorical variable, but it is stored as numeric 0/1 codes, so its data type is incorrect; it should be converted to a factor. Status could be any type of data depending on how it was measured, so I am going to assume that the numeric values presented are correct. Income is ratio data because it has a true zero, and the column appears to be stored correctly as numeric. The meaning of verbal is not described, so I will assume its numeric type is appropriate. Finally, gamble appears to be a monetary amount and is stored appropriately as numeric.
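A minimal sketch of the fix for sex, using a small hypothetical data frame in place of gambling.csv. The values are illustrative, and the level labels are left as the raw codes "0" and "1" because the data does not say which sex each code stands for.

```r
# Hypothetical stand-in for the gambling data: sex is stored as 0/1 numbers
gdata_demo <- data.frame(
  sex    = c(1, 0, 0, 1, 0),
  income = c(2.0, 2.5, 2.0, 7.0, 2.0)
)

str(gdata_demo)   # sex shows up as "num", i.e. treated as a quantity

# Convert the 0/1 codes to a categorical variable (factor)
gdata_demo$sex <- factor(gdata_demo$sex)

str(gdata_demo)   # sex is now a factor with 2 levels, "0" and "1"
summary(gdata_demo)
```

After the conversion, summary() reports how many rows fall in each level of sex rather than a mean and quartiles, which is the appropriate summary for a categorical variable.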

Question 2: Describe the data using full sentences and using descriptive statistics. ( 5 Points )

meanIncome = mean(gdata$income)
meanIncome
[1] 4.641915
maxIncome = max(gdata$income)
maxIncome
[1] 15
minIncome = min(gdata$income)
minIncome
[1] 0.6
summary(gdata)
      sex             status          income           verbal          gamble     
 Min.   :0.0000   Min.   :18.00   Min.   : 0.600   Min.   : 1.00   Min.   :  0.0  
 1st Qu.:0.0000   1st Qu.:28.00   1st Qu.: 2.000   1st Qu.: 6.00   1st Qu.:  1.1  
 Median :0.0000   Median :43.00   Median : 3.250   Median : 7.00   Median :  6.0  
 Mean   :0.4043   Mean   :45.23   Mean   : 4.642   Mean   : 6.66   Mean   : 19.3  
 3rd Qu.:1.0000   3rd Qu.:61.50   3rd Qu.: 6.210   3rd Qu.: 8.00   3rd Qu.: 19.4  
 Max.   :1.0000   Max.   :75.00   Max.   :15.000   Max.   :10.00   Max.   :156.0  

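I started by computing basic statistics for the income column, then used the summary function to get the same statistics for every column. Income (pounds per week) ranges from 0.6 to 15, and its mean (4.64) is well above its median (3.25), so a few high earners pull the average up. Gambling expenditure is even more skewed: the median is 6.0, but the mean is 19.3 and the maximum is 156. Status ranges from 18 to 75 (median 43), and verbal scores range from 1 to 10 (mean 6.66). Because sex is coded 0/1, its mean of 0.40 means that about 40% of respondents are coded 1.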
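summary() does not report the standard deviation. One way to get it alongside the other statistics is a small helper applied to each numeric column. This is a sketch on hypothetical values; `describe` is an illustrative helper, not part of base R.

```r
# Hypothetical helper: common descriptive statistics for one numeric vector
describe <- function(x) {
  c(mean   = mean(x),
    median = median(x),
    sd     = sd(x),
    min    = min(x),
    max    = max(x))
}

# Hypothetical weekly incomes, right-skewed like the real column
income_demo <- c(1, 2, 2, 3, 12)

income_stats <- describe(income_demo)
round(income_stats, 3)

# Mean above median hints at a long right tail (TRUE here: 4 vs 2)
income_stats["mean"] > income_stats["median"]
```

On the real data, `sapply(gdata, describe)` would return one column of statistics per variable, as long as every column is still numeric.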

Question 3: Estimate the upper and lower threshold for the verbal score ( 5 Points )

HINT: A common way to estimate the upper and lower threshold is to take the mean (+ or -) 3 * standard deviation.

meanVerbal = mean(gdata$verbal)
sdVerbal = sd(gdata$verbal)
upperVerbal = meanVerbal + (3) * sdVerbal
upperVerbal
[1] 12.22925
lowerVerbal = meanVerbal - (3) * sdVerbal
lowerVerbal
[1] 1.0899

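First, I found the mean and standard deviation of the verbal scores. Then I applied the mean plus or minus 3 times the standard deviation, which gives an upper threshold of about 12.23 and a lower threshold of about 1.09. No verbal score exceeds the upper threshold (the maximum is 10), but the minimum score of 1 falls just below the lower threshold, so it could be flagged as a possible outlier.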
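The same thresholds can be used to flag individual scores. Below is a minimal sketch on hypothetical verbal scores (the real values come from `gdata$verbal`), with one deliberately low score so the flag has something to catch.

```r
# Hypothetical verbal scores: one low score, the rest between 6 and 8
verbal_demo <- c(1, rep(6, 6), rep(7, 7), rep(8, 6))

# Mean +/- 3 standard deviations, as in the hint
mean_v <- mean(verbal_demo)
sd_v   <- sd(verbal_demo)
upper  <- mean_v + 3 * sd_v
lower  <- mean_v - 3 * sd_v

# Keep only the scores that fall outside the thresholds
outside <- verbal_demo[verbal_demo < lower | verbal_demo > upper]
outside   # only the score of 1 is flagged
```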

Question 4: Calculate the z-score for income where x=13. Based on the income value x=13 pounds per week, how would you rate the income: low income, average income, high income. Why? ( 5 Points )

Hint: zscore = (x - mean)/sd

income = gdata$income
meanIncome = mean(income)
sdIncome = sd(income)
zscoreIncome = (13 - meanIncome)/sdIncome
zscoreIncome
[1] 2.353481

First, I stored the income column in a variable. Then I found its mean and standard deviation and plugged them into the z-score formula, which gives a z-score of about 2.35 for x = 13. This means 13 pounds per week is about 2.35 standard deviations above the mean income of 4.64, and close to the maximum of 15, so I would rate it as high income.
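The z-score step and the rating can be wrapped in two small functions. This is a sketch: the income values are hypothetical, and the cutoffs of -1 and +1 in `rate_income` are an illustrative assumption, not a standard rule.

```r
# z-score: how many standard deviations x lies from the mean
zscore <- function(x, mu, sigma) (x - mu) / sigma

# Hypothetical weekly incomes (pounds); the real column is gdata$income
income_demo <- c(1, 2, 2, 3, 12)
z13 <- zscore(13, mean(income_demo), sd(income_demo))
round(z13, 2)

# Illustrative rating rule; the +/-1 cutoffs are an assumption
rate_income <- function(z) {
  if (z > 1) "high income" else if (z < -1) "low income" else "average income"
}

rate_income(2.353481)   # z-score reported above for x = 13: "high income"
```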

Question 5: Create a histogram for the zscore of income. What do you notice about the shape? ( 5 Points )

Hint: To plot a histogram, use the function hist(variable).

zscoresIncome = (income - meanIncome)/sdIncome
hist(zscoresIncome)

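The histogram of income z-scores is right-skewed (positively skewed): most z-scores are negative or close to zero, only a few exceed +1, and those form a long tail to the right. Standardizing only shifts and rescales the values, so this is the same shape as the raw income distribution, which is consistent with the mean income (4.64) being above the median (3.25).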
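Because a z-score is a linear transformation, it cannot change the shape of a distribution. The sketch below checks this on hypothetical right-skewed incomes: base R's `scale()` gives the same z-scores as the manual formula, and a simple moment-based skewness (one of several common definitions) is identical before and after standardizing.

```r
# Hypothetical right-skewed incomes
income_demo <- c(0.6, 1.5, 2, 2, 3, 3.5, 4, 6, 9, 15)

# Manual z-scores, as in the question, and the built-in scale() equivalent
z_manual <- (income_demo - mean(income_demo)) / sd(income_demo)
z_scale  <- as.vector(scale(income_demo))

# Moment-based skewness: positive means a long right tail
skew <- function(x) mean(((x - mean(x)) / sd(x))^3)

all.equal(z_manual, z_scale)                    # TRUE: same values
c(raw = skew(income_demo), z = skew(z_manual))  # same positive skewness
```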

Question 6: Analyze the correlation plot below. Give relevant information about the negatively correlated, uncorrelated, and positively correlated variables. ( 5 Points )

All of the variables in the plot appear to be positively correlated, since every circle is blue; none appear to be negatively correlated. Some of the positive correlations are much stronger than others. The large, dark blue circles where each variable meets itself represent a correlation of exactly 1, because a variable is always perfectly correlated with itself. There does appear to be a strong correlation between user votes and Facebook likes. The correlations are noticeably weaker between cast likes and director, score and director, gross and director, score and cast likes, and score and gross; these are still positive, but weak.
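Reading a correlation plot relies on judging color and circle size by eye. The same classification can be done on the numbers. This sketch uses a small hypothetical correlation matrix (the variable names and values are illustrative), and the 0.2 and 0.5 strength cutoffs are an assumption chosen for the example.

```r
# Hypothetical 3-variable correlation matrix (values chosen for illustration)
vars <- c("votes", "likes", "score")
r <- matrix(c(1.00,  0.80,  0.15,
              0.80,  1.00, -0.40,
              0.15, -0.40,  1.00),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# Keep each pair once by reading only the upper triangle (diagonal excluded)
idx <- which(upper.tri(r), arr.ind = TRUE)
pairs <- data.frame(var1 = rownames(r)[idx[, "row"]],
                    var2 = colnames(r)[idx[, "col"]],
                    r    = r[idx])

# Direction from the sign; strength from assumed cutoffs on |r|
pairs$direction <- ifelse(pairs$r > 0, "positive",
                          ifelse(pairs$r < 0, "negative", "none"))
pairs$strength  <- cut(abs(pairs$r), breaks = c(0, 0.2, 0.5, 1),
                       labels = c("little or none", "weak to moderate", "strong"),
                       include.lowest = TRUE)
pairs
```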

Extra Credit: Analyze the correlation table below. Give relevant information about the negatively correlated, uncorrelated, and positively correlated variables. ( 5 Points )

# Create a correlation table "cor(movies)"
movies = read.csv("data/movies.csv")
cor(movies)
               length budget  director actor1 actor2 actor3 cast_likes  fb_likes critic_reviews users_reviews users_votes     score     gross
length              1     NA        NA     NA     NA     NA         NA        NA             NA            NA          NA        NA        NA
budget             NA      1        NA     NA     NA     NA         NA        NA             NA            NA          NA        NA        NA
director           NA     NA 1.0000000     NA     NA     NA  0.1858875 0.2894939             NA            NA   0.3492878 0.1765288 0.1717334
actor1             NA     NA        NA      1     NA     NA         NA        NA             NA            NA          NA        NA        NA
actor2             NA     NA        NA     NA      1     NA         NA        NA             NA            NA          NA        NA        NA
actor3             NA     NA        NA     NA     NA      1         NA        NA             NA            NA          NA        NA        NA
cast_likes         NA     NA 0.1858875     NA     NA     NA  1.0000000 0.3387454             NA            NA   0.4140989 0.1484501 0.3829801
fb_likes           NA     NA 0.2894939     NA     NA     NA  0.3387454 1.0000000             NA            NA   0.8001157 0.4604384 0.5644529
critic_reviews     NA     NA        NA     NA     NA     NA         NA        NA              1            NA          NA        NA        NA
users_reviews      NA     NA        NA     NA     NA     NA         NA        NA             NA             1          NA        NA        NA
users_votes        NA     NA 0.3492878     NA     NA     NA  0.4140989 0.8001157             NA            NA   1.0000000 0.4742893 0.6892893
score              NA     NA 0.1765288     NA     NA     NA  0.1484501 0.4604384             NA            NA   0.4742893 1.0000000 0.2669350
gross              NA     NA 0.1717334     NA     NA     NA  0.3829801 0.5644529             NA            NA   0.6892893 0.2669350 1.0000000

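An NA in this table does not mean "no correlation"; it means R could not compute the correlation, most likely because length, budget, actor1, actor2, actor3, critic_reviews, and users_reviews contain missing values (by default, cor() uses every row, so a single NA makes the whole pair NA). Their only non-missing entries are the 1s on the diagonal, so nothing can be concluded about them from this table. Among the remaining variables, none of the correlations are negative. Director is only weakly correlated with the others (0.17 to 0.35, strongest with users_votes). Cast_likes has a moderate positive correlation of 0.414 with users_votes and 0.383 with gross. Fb_likes and users_votes are very strongly positively correlated (0.800), and gross and users_votes also have a fairly strong positive correlation (0.689). Score is moderately correlated with users_votes (0.474) and fb_likes (0.460), but only weakly with gross (0.267), director (0.177), and cast_likes (0.148).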
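A quick way to test whether missing values explain the NA cells is to count them and then ask cor() to skip them. The sketch below uses a small hypothetical data frame with one missing budget; on the real data, the analogous call would be `cor(movies, use = "pairwise.complete.obs")`, assuming the NAs do come from missing values.

```r
# Hypothetical movie-like data with one missing budget value
movies_demo <- data.frame(
  budget = c(10, 25, NA, 40, 60),
  gross  = c(12, 30, 35, 50, 70),
  score  = c(6.1, 7.0, 5.5, 7.8, 6.9)
)

# Count missing values per column to see which ones cause NAs
colSums(is.na(movies_demo))

cor(movies_demo)["budget", "gross"]   # NA: one missing value spoils the pair

# Drop missing values pair by pair before correlating
r_pw <- cor(movies_demo, use = "pairwise.complete.obs")
round(r_pw, 3)
```

`use = "complete.obs"` is the stricter alternative: it drops every row that has a missing value in any column, so all entries are computed from the same rows.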
