I. Introduction

I am attempting to determine whether gender and weight category are independent. If they are dependent, one gender might be more at risk for weight-related medical conditions. Knowing this would help raise awareness of those health risks and inform outside parties about trends in weight categories. For example, marketers selling a weight-loss product might want to know whether weight category varies with gender. I will analyze the data with a chi-squared test for independence.

II. Data

The mosaic plot suggests that men represent a majority in one category, “overweight”, while women are the majority in all of the rest. There’s a difference, but it’s unclear whether it is statistically significant.

##    
##     normal obese overweight underweight
##   F    700   994        744          51
##   M    605   846        953          22

A look at a frequency table of observed values shows that the category with the largest deviation from a fifty-fifty split, underweight (51 women to 22 men), actually has the smallest absolute difference: 29 people. The gaps in the other categories are 95 (normal), 148 (obese), and 209 (overweight). This does not, however, answer my question conclusively.

III. Analysis

To determine whether the distribution of weight categories differs significantly by gender, and therefore whether the two variables are independent, I will perform a chi-squared test that compares the observed values to the values expected under independence.

\(H_0\): Weight category is independent of gender. \(H_A\): Weight category is dependent on gender.
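
For reference, a sketch of the standard Pearson statistic behind this test, written with the \(o_{ij}\) (observed) and \(e_{ij}\) (expected) notation used in this report. The index ranges and the symbol \(n\) for the grand total are labels introduced here for illustration:

\[
\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{4} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}, \qquad e_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n}
\]

With 2 genders and 4 weight categories, the statistic is compared to a chi-squared distribution with \((2-1)(4-1) = 3\) degrees of freedom.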

Based on the observed values

##    
##     normal obese overweight underweight
##   F    700   994        744          51
##   M    605   846        953          22

our test statistic is 55.2822891, with p-value \(5.9770877 \times 10^{-12}\).

The \(e_{ij}\) values are:

##    
##       normal    obese overweight underweight
##   F 660.8637 931.7925    859.376    36.96785
##   M 644.1363 908.2075    837.624    36.03215
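
As a check on a single cell, the expected count for women in the normal category can be worked out by hand from the observed table, assuming the usual formula of row total times column total divided by the grand total. The totals (2489 women, 1305 people in the normal category, 4915 people overall) are sums of the observed counts:

\[
e_{F,\text{normal}} = \frac{2489 \times 1305}{4915} = \frac{3248145}{4915} \approx 660.86
\]

This agrees with the 660.8637 reported in the table.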

The \(o_{ij}\) values are:

##    
##     normal obese overweight underweight
##   F    700   994        744          51
##   M    605   846        953          22

IV. Interpretation

If weight category were independent of gender, I would see data at least this far from the expected values less than 0.0000000006% of the time. As the p-value is much smaller than any commonly used value of alpha, I reject the null hypothesis. I can conclude, at the 1% significance level, that weight category is dependent on gender.
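
The percentage quoted above follows from converting the reported p-value to a percentage:

\[
p \times 100\% = 5.9770877 \times 10^{-12} \times 100\% \approx 5.98 \times 10^{-10}\% < 6 \times 10^{-10}\% = 0.0000000006\%
\]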

V. Conclusion

At the 1% significance level, I conclude that weight category is dependent on gender.

Code

I. Introduction

patients <- read.csv("C:/Users/Lisa/Downloads/patients.csv")

I am attempting to determine whether gender and weight category are independent. If they are dependent, one gender might be more at risk for weight-related medical conditions. Knowing this would help raise awareness of those health risks and inform outside parties about trends in weight categories. For example, marketers selling a weight-loss product might want to know whether weight category varies with gender. I will analyze the data with a chi-squared test for independence.

II. Data

mosaicplot(obese ~ gender, data = patients, main = "Weight Category by Gender", color=c("purple","red"))

The mosaic plot suggests that men represent a majority in one category, “overweight”, while women are the majority in all of the rest. There’s a difference, but it’s unclear whether it is statistically significant.

two.cat = table(patients$gender,patients$obese)
two.cat
##    
##     normal obese overweight underweight
##   F    700   994        744          51
##   M    605   846        953          22

A look at a frequency table of observed values shows that the category with the largest deviation from a fifty-fifty split, underweight (51 women to 22 men), actually has the smallest absolute difference: 29 people. The gaps in the other categories are 95 (normal), 148 (obese), and 209 (overweight). These raw differences do not, however, take into account whether the same number of males and females were sampled.
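
One way to account for unequal group sizes is to compare row proportions instead of raw counts. The following is a minimal sketch, not part of the original analysis: it assumes the counts are typed in from the frequency table above rather than read from patients.csv, and the names `observed`, `row_totals`, and `row_props` are introduced here for illustration.

```r
# Observed counts copied from the frequency table above
# (assumption: entered by hand instead of reading patients.csv).
observed <- matrix(
  c(700, 994, 744, 51,
    605, 846, 953, 22),
  nrow = 2, byrow = TRUE,
  dimnames = list(gender = c("F", "M"),
                  category = c("normal", "obese", "overweight", "underweight"))
)

# Number of women and men sampled: F = 2489, M = 2426.
row_totals <- rowSums(observed)

# Share of each gender falling in each weight category (each row sums to 1).
row_props <- prop.table(observed, margin = 1)

print(row_totals)
print(round(row_props, 3))
```

The two groups are close in size, and the row proportions put roughly 30% of women and 39% of men in the overweight category.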

III. Analysis

To determine whether the distribution of weight categories differs significantly by gender, and therefore whether the two variables are independent, I will perform a chi-squared test that compares the observed values to the values expected under independence.

\(H_0\): Weight category is independent of gender. \(H_A\): Weight category is dependent on gender.

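# Cross-tabulate gender by weight category, then run Pearson's chi-squared test.
# correct = FALSE turns off the continuity correction, which only applies to 2x2 tables.
gender.obese.table <- table(patients$gender, patients$obese)
another.test <- chisq.test(gender.obese.table, correct = FALSE)
gender.obese.table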
##    
##     normal obese overweight underweight
##   F    700   994        744          51
##   M    605   846        953          22

Our test statistic is 55.2822891, with p-value \(5.9770877 \times 10^{-12}\).
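
The statistic and p-value can be reproduced without the raw file. This is a sketch under the assumption that the printed frequency table is the full input to the test; the counts are typed in by hand, and the names `observed`, `test`, `statistic`, `df`, and `p_value` are introduced here for illustration.

```r
# Observed counts copied from the frequency table above
# (assumption: entered by hand instead of reading patients.csv).
observed <- matrix(
  c(700, 994, 744, 51,
    605, 846, 953, 22),
  nrow = 2, byrow = TRUE,
  dimnames = list(gender = c("F", "M"),
                  category = c("normal", "obese", "overweight", "underweight"))
)

# Pearson's chi-squared test of independence on the 2 x 4 table.
test <- chisq.test(observed, correct = FALSE)

statistic <- unname(test$statistic)  # chi-squared test statistic
df <- unname(test$parameter)         # degrees of freedom: (2 - 1) * (4 - 1)

# Upper-tail probability of the chi-squared distribution at the statistic.
p_value <- pchisq(statistic, df = df, lower.tail = FALSE)

print(c(statistic = statistic, df = df, p_value = p_value))
```

Computing the p-value with `pchisq` separately makes the link between the statistic, the degrees of freedom, and the reported p-value explicit; it should agree with `test$p.value`.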

The \(e_{ij}\) values are:

another.test$expected
##    
##       normal    obese overweight underweight
##   F 660.8637 931.7925    859.376    36.96785
##   M 644.1363 908.2075    837.624    36.03215
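
The expected values can also be rebuilt directly from the margins of the observed table, which shows where `another.test$expected` comes from. This is a sketch with hand-entered counts; `observed` and `expected` are names introduced here, and the check that every expected count is at least 5 is a common rule of thumb, not something stated in the original analysis.

```r
# Observed counts copied from the frequency table above
# (assumption: entered by hand instead of reading patients.csv).
observed <- matrix(
  c(700, 994, 744, 51,
    605, 846, 953, 22),
  nrow = 2, byrow = TRUE,
  dimnames = list(gender = c("F", "M"),
                  category = c("normal", "obese", "overweight", "underweight"))
)

# Expected count under independence: row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

print(round(expected, 4))

# Rule-of-thumb check that the chi-squared approximation is reasonable.
print(all(expected >= 5))
```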

The \(o_{ij}\) values are:

gender.obese.table
##    
##     normal obese overweight underweight
##   F    700   994        744          51
##   M    605   846        953          22
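
Comparing \(o_{ij}\) with \(e_{ij}\) cell by cell shows which categories drive the test statistic. The sketch below is an addition for illustration, not part of the original analysis; it uses hand-entered counts, and the names `observed`, `expected`, and `contributions` are introduced here.

```r
# Observed counts copied from the frequency table above
# (assumption: entered by hand instead of reading patients.csv).
observed <- matrix(
  c(700, 994, 744, 51,
    605, 846, 953, 22),
  nrow = 2, byrow = TRUE,
  dimnames = list(gender = c("F", "M"),
                  category = c("normal", "obese", "overweight", "underweight"))
)

# Expected counts under independence.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Each cell's contribution to the statistic: (o - e)^2 / e.
contributions <- (observed - expected)^2 / expected

print(round(contributions, 2))

# Contribution of each weight category, summed over both genders.
print(round(colSums(contributions), 2))
```

The contributions sum to the test statistic, and the overweight column accounts for roughly 31 of the 55.28, so most of the departure from independence comes from that category.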

IV. Interpretation

If weight category were independent of gender, I would see data at least this far from the expected values less than 0.0000000006% of the time. As the p-value is much smaller than any commonly used value of alpha, I reject the null hypothesis. I can conclude, at the 1% significance level, that weight category is dependent on gender.

V. Conclusion

At the 1% significance level, I conclude that weight category is dependent on gender.