mydata = read.csv(file="data/gambling.csv")
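The variables in the data set are all numeric. Status, income, verbal, and gamble are measured on interval or continuous scales, while sex appears to be a 0/1 indicator rather than a continuous measurement.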
sex = mydata$sex
status = mydata$status
income = mydata$income
verbal = mydata$verbal
gamble = mydata$gamble
meansex = mean(sex)
sdsex = sd(sex)
meanstatus = mean(status)
sdstatus = sd(status)
meanincome = mean(income)
sdincome = sd(income)
meanverbal = mean(verbal)
sdverbal = sd(verbal)
meangamble = mean(gamble)
sdgamble = sd(gamble)
meansex
## [1] 0.4042553
sdsex
## [1] 0.4960529
meanstatus
## [1] 45.23404
sdstatus
## [1] 17.26294
meanincome
## [1] 4.641915
sdincome
## [1] 3.551371
meanverbal
## [1] 6.659574
sdverbal
## [1] 1.856558
meangamble
## [1] 19.30106
sdgamble
## [1] 31.51587
The data includes the variables sex, status, income, verbal, and gamble. All five are stored as numbers, so the data is quantitative. I found the mean and standard deviation of each variable.
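The per-variable calls above can also be collapsed into a single step per statistic. This is a minimal sketch, assuming a small made-up data frame; `toy` and its values are hypothetical and are not taken from gambling.csv.

```r
# Hypothetical stand-in for mydata; these values are invented for illustration
toy = data.frame(
  income = c(2, 4, 6, 8),
  verbal = c(5, 6, 7, 8)
)

# sapply() applies a function to every column and returns a named vector
means = sapply(toy, mean)
sds = sapply(toy, sd)

means
sds
```

With the real data, the same two `sapply()` calls on `mydata` would likely give all five means and standard deviations at once, provided every column is numeric.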
HINT: A common way to estimate the upper and lower threshold is to take the mean (+ or -) 3 * standard deviation.
# Multiply the standard deviation by 3 first, then add to or subtract from the mean
upper = meanverbal + 3 * sdverbal
lower = meanverbal - 3 * sdverbal
round(upper, 2)
## [1] 12.23
round(lower, 2)
## [1] 1.09
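The rule in the hint is the mean plus or minus three standard deviations, so the factor of 3 applies to the standard deviation only. The sketch below applies it to invented scores and then uses the thresholds to flag values. The vector `scores` is hypothetical and not from gambling.csv.

```r
# Hypothetical scores, invented for illustration (not from gambling.csv)
scores = c(rep(5, 10), rep(7, 10), 40)

m = mean(scores)
s = sd(scores)

# Multiplication binds tighter than + and -, so this is mean +/- (3 * sd)
upper = m + 3 * s
lower = m - 3 * s

# Values outside [lower, upper] are flagged as potential outliers
outliers = scores[scores < lower | scores > upper]
outliers
```

Here only the value 40 falls outside the thresholds. Wrapping the sum in parentheses, as in `(m + 3) * s`, would compute a different quantity.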
Hint: zscore = (x - mean)/sd
zscore = (13 - meanincome)/sdincome
zscore
## [1] 2.353481
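An income of 13 has a z-score of about 2.35, which means it lies about 2.35 standard deviations above the mean income of 4.6419. That is well above average, but it is still within 3 standard deviations of the mean, so it falls inside the usual threshold range.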
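The z-score formula works the same way for one value or for a whole vector. A minimal sketch, assuming invented incomes; the helper name `zscore_of` is hypothetical and not part of base R.

```r
# Hypothetical helper; the name zscore_of is an assumption, not a base R function
zscore_of = function(x, values) {
  (x - mean(values)) / sd(values)
}

# Invented incomes for illustration (not from gambling.csv)
incomes = c(2, 4, 6, 8)

# One value measured against the sample
z_single = zscore_of(8, incomes)

# Every value at once; the result has mean 0 and standard deviation 1
z_all = zscore_of(incomes, incomes)

z_single
z_all
```

A positive z-score means the value is above the mean, and a negative one means it is below. Base R's `scale()` performs the same standardization on a vector and returns the result as a one-column matrix.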
Hint: To plot a histogram, use the function hist(variable).
# zscore is a single value, so plot the z-scores of every income instead
hist((income - meanincome)/sdincome)
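`hist()` bins a vector of values, so it is most informative when given a whole column rather than one number. The sketch below standardizes invented incomes and inspects the bins before drawing. The vector `incomes` and the plot labels are hypothetical.

```r
# Invented incomes for illustration (not from gambling.csv)
incomes = c(1, 2, 2, 3, 3, 3, 4, 4, 5, 12)

# Standardize every value so the x-axis is in standard-deviation units
z = (incomes - mean(incomes)) / sd(incomes)

# plot = FALSE returns the bin breaks and counts without drawing
h = hist(z, plot = FALSE)
h$counts

# Draw the histogram with descriptive labels
hist(z, main = "Income z-scores (toy data)", xlab = "z-score")
```

The counts always sum to the number of values passed in, which is a quick check that no observation was dropped from the plot.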