I am interested in understanding how gambling differs between males and females.

require(faraway)
## Loading required package: faraway
data("teengamb")
summary(teengamb)
##       sex             status          income           verbal     
##  Min.   :0.0000   Min.   :18.00   Min.   : 0.600   Min.   : 1.00  
##  1st Qu.:0.0000   1st Qu.:28.00   1st Qu.: 2.000   1st Qu.: 6.00  
##  Median :0.0000   Median :43.00   Median : 3.250   Median : 7.00  
##  Mean   :0.4043   Mean   :45.23   Mean   : 4.642   Mean   : 6.66  
##  3rd Qu.:1.0000   3rd Qu.:61.50   3rd Qu.: 6.210   3rd Qu.: 8.00  
##  Max.   :1.0000   Max.   :75.00   Max.   :15.000   Max.   :10.00  
##      gamble     
##  Min.   :  0.0  
##  1st Qu.:  1.1  
##  Median :  6.0  
##  Mean   : 19.3  
##  3rd Qu.: 19.4  
##  Max.   :156.0

The variables in the dataset are sex, status, income, verbal, and gamble. The summary treats sex as a numeric variable: its minimum is 0 and its maximum is 1, which tells me it is a binary code (in this dataset, 0 = male and 1 = female). Quartiles of a 0/1 code are not a useful summary, so I will convert sex to a factor so that the summary reports a count for each category instead.

teengamb$sex <- factor(teengamb$sex)
levels(teengamb$sex) <- c("male", "female")
summary(teengamb)
##      sex         status          income           verbal          gamble     
##  male  :28   Min.   :18.00   Min.   : 0.600   Min.   : 1.00   Min.   :  0.0  
##  female:19   1st Qu.:28.00   1st Qu.: 2.000   1st Qu.: 6.00   1st Qu.:  1.1  
##              Median :43.00   Median : 3.250   Median : 7.00   Median :  6.0  
##              Mean   :45.23   Mean   : 4.642   Mean   : 6.66   Mean   : 19.3  
##              3rd Qu.:61.50   3rd Qu.: 6.210   3rd Qu.: 8.00   3rd Qu.: 19.4  
##              Max.   :75.00   Max.   :15.000   Max.   :10.00   Max.   :156.0
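
The two-step recoding above relies on the factor levels coming out in the order 0, 1 before they are renamed. As a minimal sketch, the same conversion can be written as a single call that states the mapping explicitly and then checks it against the raw codes. The names `env`, `tg`, and `sex_fct` are illustrative, and the 0 = male, 1 = female coding is taken from the faraway documentation for this dataset.

```r
# Load a fresh copy into its own environment so the working teengamb is untouched.
env <- new.env()
data("teengamb", package = "faraway", envir = env)
tg <- env$teengamb  # illustrative name for the untouched copy

# State the code-to-label mapping explicitly (assumed: 0 = male, 1 = female).
sex_fct <- factor(tg$sex, levels = c(0, 1), labels = c("male", "female"))

# Cross-tabulate raw codes against labels to confirm nothing was swapped.
table(raw = tg$sex, label = sex_fct)
```

Every 0 should fall under "male" and every 1 under "female" in the cross-tabulation, with the same counts that `summary()` reports.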

As mentioned earlier, sex was not summarized usefully because it was stored as a numeric 0/1 code, so I converted it from a numeric variable to a categorical variable (a factor) and found that there were 28 males and 19 females in the study. Next, I will look at the variables that interest me most, specifically gambling expenditure and income.

par(mfrow=c(1,2))
hist(teengamb$gamble,xlab="Gambling Expenditure",main="")
hist(teengamb$income,xlab="Income",main="")

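The plots shown above reveal that income and gambling expenditure are both positively skewed: most of the individuals in the sample have relatively low incomes and low gambling expenditure, while a small number have much larger values that stretch the right tail. This agrees with the summary above, where the mean exceeds the median for both variables (4.642 vs. 3.250 for income, 19.3 vs. 6.0 for gamble).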
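
A quick numeric check of the skew seen in the histograms is to compare the mean with the median of each variable, since a long right tail pulls the mean above the median. The sketch below is only a rough heuristic rather than a formal skewness measure, and the names `env`, `tg`, and `skew_gap` are illustrative.

```r
# Load a fresh copy of the data into its own environment.
env <- new.env()
data("teengamb", package = "faraway", envir = env)
tg <- env$teengamb  # illustrative name

# Mean minus median for each variable; a clearly positive gap
# is consistent with a right-skewed distribution.
skew_gap <- sapply(tg[c("income", "gamble")],
                   function(x) mean(x) - median(x))
skew_gap
```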

boxplot(gamble ~ sex, data=teengamb)

The boxplot above is interesting. The horizontal lines in the middle of the two boxes show that the median of gamble is higher for males than for females, and the heights of the boxes (the interquartile ranges) show that the variability among males is much greater than among females. To answer my original question of whether gambling differs between males and females: in this sample it does, since the boxplot shows that males have higher gambling expenditure and more variability than their female counterparts.
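
The reading of the boxplot can be backed up with the numbers it is drawn from. The sketch below computes the median, interquartile range, and group size of gamble for each sex. It is a descriptive summary only, not a formal test, and the names `env`, `tg`, and `by_sex` are illustrative; the 0 = male, 1 = female coding is assumed from the faraway documentation.

```r
# Load a fresh copy of the data and recode sex (assumed: 0 = male, 1 = female).
env <- new.env()
data("teengamb", package = "faraway", envir = env)
tg <- env$teengamb  # illustrative name
tg$sex <- factor(tg$sex, levels = c(0, 1), labels = c("male", "female"))

# One column per sex; rows are median, IQR (box height), and group size.
by_sex <- sapply(split(tg$gamble, tg$sex),
                 function(x) c(median = median(x), IQR = IQR(x), n = length(x)))
by_sex
```

The `median` row corresponds to the line inside each box and the `IQR` row to the height of each box, so the two rows should point the same way as the plot.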