I am attempting to determine whether gender and weight category are independent. If they are dependent, so that the distribution of weight categories differs by gender, one gender might be more at risk for weight-related medical conditions. Knowing this would help raise awareness of those health risks and would inform outside parties about trends in weight categories. For example, marketers selling a weight-loss product might want to know whether weight category differs by gender. I will analyze these data with a chi-squared test for independence.
The mosaic plot suggests that men are the majority in only one category, “overweight”, while women are the majority in all of the others. There is a difference, but it is unclear whether it is statistically significant.

| Gender | normal | obese | overweight | underweight |
|---|---|---|---|---|
| F | 700 | 994 | 744 | 51 |
| M | 605 | 846 | 953 | 22 |

The frequency table of observed counts shows that underweight, the category with the largest proportional deviation from a fifty-fifty split (51 women to 22 men, about 70% women), actually has the smallest absolute difference: only 29 people. In every other category the gap is between 95 (normal) and 209 (overweight). This does not, however, answer my question conclusively.
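The two measures being contrasted here (absolute gap versus share of the category that is female) can be computed directly from the observed counts. This is a minimal sketch in plain Python; the variable names are illustrative and it is not the code used to produce the output in this report.

```python
# Observed counts from the frequency table (illustrative variable names).
categories = ["normal", "obese", "overweight", "underweight"]
female = [700, 994, 744, 51]
male = [605, 846, 953, 22]

gaps = {}          # absolute difference |F - M| per category
female_share = {}  # fraction of each category that is female
for cat, f, m in zip(categories, female, male):
    gaps[cat] = abs(f - m)
    female_share[cat] = f / (f + m)
    print(f"{cat:<12} gap = {gaps[cat]:>3}   female share = {female_share[cat]:.1%}")
```

Underweight has the smallest gap in counts but the female share furthest from 50%, which is why the raw table alone cannot settle the question.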
To determine whether the distribution across weight categories differs significantly by gender, and therefore whether the two variables are independent, I will perform a chi-squared test that compares the observed counts to the counts expected under independence.
\(H_0\): Weight category is independent of gender.

\(H_A\): Weight category is dependent on gender.
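For reference, the standard Pearson chi-squared test for independence compares each observed count \(o_{ij}\) with the count \(e_{ij}\) expected under \(H_0\). In the formulation below, \(R_i\) and \(C_j\) denote the row (gender) and column (weight category) totals and \(N\) the grand total; these symbols are introduced here for illustration only.

\[
e_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{4} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}, \qquad
\text{df} = (2 - 1)(4 - 1) = 3.
\]

Under \(H_0\) the statistic approximately follows a chi-squared distribution with 3 degrees of freedom, so large values of \(\chi^2\) are evidence against independence.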
Based on the observed counts, the test statistic is \(\chi^2 = 55.2823\) with \((2 - 1)(4 - 1) = 3\) degrees of freedom, and the p-value is \(5.977 \times 10^{-12}\).
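As a cross-check, the same test can be run in Python. This sketch assumes NumPy and SciPy are installed and uses `scipy.stats.chi2_contingency`; it is not necessarily how the original numbers were produced.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: rows are F, M; columns are normal, obese, overweight, underweight.
observed = np.array([
    [700, 994, 744, 51],
    [605, 846, 953, 22],
])

# SciPy applies Yates' continuity correction only when dof == 1,
# so for this 2x4 table (dof = 3) the plain Pearson statistic is returned.
stat, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-squared = {stat:.4f}, df = {dof}, p = {p_value:.3e}")
```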
The expected counts \(e_{ij}\) under \(H_0\) are:

| Gender | normal | obese | overweight | underweight |
|---|---|---|---|---|
| F | 660.8637 | 931.7925 | 859.376 | 36.96785 |
| M | 644.1363 | 908.2075 | 837.624 | 36.03215 |

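Each expected count is the product of its row and column totals divided by the grand total. A minimal NumPy sketch (NumPy assumed available; names are illustrative) reproduces the table with one outer product and also checks the common rule of thumb that every expected count be at least 5. Here the smallest expected count is about 36, so that condition holds.

```python
import numpy as np

observed = np.array([
    [700, 994, 744, 51],   # F
    [605, 846, 953, 22],   # M
])

row_totals = observed.sum(axis=1)   # totals per gender
col_totals = observed.sum(axis=0)   # totals per weight category
grand_total = observed.sum()        # N

# e_ij = R_i * C_j / N for every cell at once.
expected = np.outer(row_totals, col_totals) / grand_total
print(np.round(expected, 4))

# Rule of thumb for the chi-squared approximation: all expected counts >= 5.
print("all expected counts >= 5:", bool(expected.min() >= 5))
```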
The observed counts \(o_{ij}\) are:

| Gender | normal | obese | overweight | underweight |
|---|---|---|---|---|
| F | 700 | 994 | 744 | 51 |
| M | 605 | 846 | 953 | 22 |

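Comparing the two tables cell by cell shows which categories drive the statistic. The sketch below (NumPy assumed; names are illustrative) computes each cell's contribution \((o_{ij} - e_{ij})^2 / e_{ij}\); with these counts the two overweight cells together account for more than half of the total of about 55.28.

```python
import numpy as np

categories = ["normal", "obese", "overweight", "underweight"]
observed = np.array([[700, 994, 744, 51],
                     [605, 846, 953, 22]])
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()

# Contribution of each cell to the chi-squared statistic.
contributions = (observed - expected) ** 2 / expected
for gender, row in zip(["F", "M"], contributions):
    cells = ", ".join(f"{c}={v:.2f}" for c, v in zip(categories, row))
    print(f"{gender}: {cells}")
print(f"total = {contributions.sum():.4f}")
```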
If weight category were independent of gender, I would see data at least this extreme less than 0.0000000006% of the time. Because the p-value is much smaller than any commonly used significance level \(\alpha\), I reject the null hypothesis. Even at \(\alpha = 0.01\), I can conclude that weight category is dependent on gender.
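The same decision can be phrased with critical values: reject \(H_0\) when the statistic exceeds the \((1 - \alpha)\) quantile of the chi-squared distribution with 3 degrees of freedom. This sketch assumes SciPy's `scipy.stats.chi2` and reuses the statistic reported above; the chosen \(\alpha\) values are illustrative. Even the critical value for \(\alpha = 0.001\), about 16.27, is far below 55.28.

```python
from scipy.stats import chi2

stat = 55.2822891   # test statistic reported above
dof = 3             # (2 - 1) * (4 - 1)

for alpha in (0.10, 0.05, 0.01, 0.001):
    critical = chi2.ppf(1 - alpha, dof)   # reject H_0 if stat > critical
    print(f"alpha = {alpha:<5}  critical value = {critical:.3f}  reject H_0: {stat > critical}")

# Upper-tail probability (the p-value), also shown as a percentage.
p_value = chi2.sf(stat, dof)
print(f"p = {p_value:.3e} ({p_value * 100:.1e} %)")
```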