Data 606 Final Project

Vyanna Hill

4/20/2022

Abstract

The data set is a collection of randomly selected survey responses gathered through the Behavioral Risk Factor Surveillance System (BRFSS). The statistical analysis tests whether there is a relationship between survey respondents' education level and their routine reproductive screenings. The null hypothesis is that there is no difference across respondents' education levels in the average population that received screenings. The alternative hypothesis is that screening participation differs significantly between the education groups. An ANOVA test was used because four education levels are defined.
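The two hypotheses can be written compactly as follows. This is only a notational sketch: the symbol \mu_i, standing for the mean screening measure in education group i, is an assumption introduced here and does not appear in the original analysis.

```latex
\begin{aligned}
H_0 &: \mu_1 = \mu_2 = \mu_3 = \mu_4 \\
H_A &: \mu_i \neq \mu_j \ \text{for at least one pair } i \neq j
\end{aligned}
```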

In preparation for the ANOVA test, the data for both screenings passed checks for normality and homogeneity of variance. The responses were separated by screening so that the group means could be examined for each. The p-value for both tests was below the 0.05 threshold (p < 2e-16), which means the relationship between education and screenings was statistically significant.
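To illustrate what a one-way ANOVA computes, here is a minimal sketch in Python using only the standard library. It is not the code used for this project: the function name `one_way_anova`, the group labels, and every number in `samples` are hypothetical and exist only to show the calculation of the F statistic from between-group and within-group variation.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F statistic, df_between, df_within) for a list of samples."""
    k = len(groups)                           # number of groups
    n_total = sum(len(g) for g in groups)     # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical screening rates (percent) per location, grouped by education.
# These labels and values are made up for illustration only.
samples = {
    "less_than_high_school": [58.2, 61.5, 59.9, 60.4],
    "high_school_graduate": [63.1, 65.0, 62.7, 64.2],
    "some_college": [66.8, 68.3, 67.1, 69.0],
    "college_graduate": [72.4, 74.1, 73.0, 75.2],
}

f_stat, df_b, df_w = one_way_anova(list(samples.values()))
print(f"F = {f_stat:.2f} on ({df_b}, {df_w}) degrees of freedom")
```

The F statistic is then compared against an F distribution with (k - 1, N - k) degrees of freedom to obtain the p-value; a statistics package would normally handle that step along with the normality and variance checks.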

The boxplot of means across education levels showed that the mean for college-educated respondents with active screenings is higher than the mean for high school graduates. This supports the idea that more highly educated women have more active screenings than the other education groups. Some limitations could affect the test. The data were collected at the location level rather than the individual level, which means an individual may have had only one of the two screenings performed rather than both. In addition, because locations were randomly selected, some of the clinics chosen might be near or on a college campus.