1. Introduction

The Behavioral Risk Factor Surveillance System (BRFSS) is an ongoing surveillance system designed to measure behavioral risk factors for the non-institutionalized adult population (18 years of age and older) residing in the US.

The BRFSS objective is to collect uniform, state-specific data on preventive health practices and risk behaviors that are linked to chronic diseases, injuries, and preventable infectious diseases that affect the adult population.

It seems fair to assume that preventive health practices and risk behaviours are correlated with level of education and gender.

Indeed, one might believe that the higher a person’s education level is, the better their preventive health practices are. Likewise, popular belief tends to consider women more careful about their health and less prone to risky behaviours than men.

We will explore the validity of this hypothesis from three different angles: routine medical check-ups, seat-belt use, and alcohol drinking habits.

2. The Data

I investigated these three angles using the BRFSS dataset, which contains 491,775 observations of 330 variables collected through a telephone-based survey. The BRFSS collects data from adults aged 18 years or older. Households are randomly selected from blocks of potential phone numbers in an area. Once responses are compiled, the CDC analyzes the demographic characteristics of the sample and assigns a weight to each response so that estimates reflect the known population. This weighting allows us to generalise the results of the BRFSS survey to the US population as a whole.
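To make the role of the weights concrete, here is a minimal sketch of how a weight column changes an estimated proportion. The data are invented for illustration, and the column names (`checkup`, `weight`) are hypothetical stand-ins rather than actual BRFSS variable names.

```r
# Toy data, invented for illustration only (not BRFSS values).
# `weight` is a hypothetical stand-in for a survey weight: the number of
# people in the population that each respondent represents.
toy <- data.frame(
  sex     = c("Female", "Female", "Male", "Male", "Male"),
  checkup = c(1, 1, 1, 0, 0),            # 1 = check-up within the past year
  weight  = c(150, 250, 100, 200, 300)
)

# Unweighted share: every respondent counts once.
unweighted <- mean(toy$checkup)

# Weighted share: each respondent counts in proportion to the weight.
weighted <- weighted.mean(toy$checkup, w = toy$weight)

# Weighted share within each gender.
by_sex <- sapply(split(toy, toy$sex),
                 function(d) weighted.mean(d$checkup, w = d$weight))

unweighted  # 0.6
weighted    # 0.5
```

In this toy sample the respondents without a recent check-up carry larger weights, so the weighted share is lower than the raw share.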

Because this is an observational survey with no random assignment, no causal conclusions can be drawn from it: many extraneous variables might interfere. The data can only reveal potential correlations or associations.

One key potential bias to take into consideration is the honesty of respondents, since they are asked quite personal questions about their health and risk behaviours.

3. Questions

3.1 Question 1

First, we might be interested in exploring the relationship between preventive health practices, the level of education and gender.

One obvious way to prevent medical issues is to have regular routine check-ups (a routine check-up is a general physical exam, not an exam for a specific injury, illness, or condition). When one sees a doctor regularly, the doctor can follow the changes in the patient’s body and is therefore able to detect health conditions or diseases early. By accessing the right health services, screenings, and treatment, one takes an important step towards living a longer and healthier life.

Is it fair to assume that a person’s level of education and gender influence the length of time since that person last had a routine check-up?

3.2 Question 2

Then, we might be interested in exploring the relationship between the habit of wearing a seat-belt (most probably the easiest way to minimize the risk of dying or being injured in a car accident), the level of education and gender.

According to the US National Highway Traffic Safety Administration, car accidents happen every 60 seconds. That equates to about 5.25 million accidents across the nation on a yearly basis. Statistics by the Association for Safe International Road Travel show that 37,000 people die annually due to vehicular accidents with an additional 2.35 million either injured or disabled.

Is it fair to assume that a person’s level of education and gender influence their habit of wearing a seat-belt while driving or riding in a car?

3.3 Question 3

Finally, we might be interested in exploring the relationship between alcohol drinking habits, level of education and gender.

According to the CDC, excessive alcohol use is responsible for more than 95,000 deaths in the United States each year, or 261 deaths per day. These deaths shorten the lives of those who die by an average of almost 29 years, for a total of 2.8 million years of potential life lost. It is a leading cause of preventable death in the United States, and cost the nation $249 billion in 2010.

Is it fair to assume that a person’s level of education and gender influence their alcohol drinking behaviour?

4. Exploratory Data Analysis

To proceed with the EDA, the BRFSS dataset was loaded into the R environment. The packages required for data tidying, transformation and visualization (dplyr, tidyverse, ggplot2 and basictabler) were installed.

Two recurrent variables will be used for all three questions: X_educag (level of education) and sex (gender).

For questions 2 and 3, I will hide the R code chunks (the coding logic being mostly the same as for question 1) to improve readability and focus on the analysis.

4.1 Question 1

Is it fair to assume that a person’s level of education and gender influence the length of time since that person last had a routine check-up?

#Select only the relevant variables: Level of education, Gender, and "Time since last Checkup"#
CheckUp <- brfss2013 %>% select(X_educag, sex, checkup1) %>%
  na.omit()
#Join Level of Education and Gender columns for better readability#
CheckUp_Gen <- CheckUp %>%
  unite(X_educag, sex, col = "Education.Level_Gender", sep = " / ") 
#Modify type of response to frequency of responses for "length of time since last checkup"#
CheckUp_Gen <- CheckUp_Gen %>% 
  count(Education.Level_Gender, checkup1)
#Rename columns in a more descriptive way#
CheckUp_Gen <- CheckUp_Gen %>%
  rename(Last_Checkup = checkup1) %>%
  rename(Frequ. = n)
#Convert Frequency to Relative Frequency (Percentage) 1#
as.percent <- function(x, percent_col = "percent", ...) {
    class(x) <- c("percent", class(x))
    attributes(x)[["percent_col"]] <- percent_col
    x
}
print.percent <- function(x, ...) {
    percent_col <- attributes(x)[["percent_col"]]
    x[[percent_col]] <- paste0(round(100 * x[[percent_col]], 0), "%")   
    class(x) <- class(x)[!class(x)%in% "percent"]
    print(x)
}
#Convert Frequency to Relative Frequency (Percentage) 2#
CheckUp_Gen <- CheckUp_Gen %>%
  group_by (Education.Level_Gender) %>%
  mutate(percent= formattable::percent(Frequ. / sum(Frequ.))) %>%
  as.percent()
#Arrange dataset first by Level of Education / Gender then by length of time since last checkup#
CheckUp_Gen <- CheckUp_Gen %>% arrange(match(Education.Level_Gender, c("Attended HighSchool / Female", "Attended HighSchool / Male", "Graduated HighSchool / Female", "Graduated HighSchool / Male", "Attended College / Female", "Attended College / Male", "Graduated from College / Female", "Graduated from College / Male")), (match(Last_Checkup, c("Within 1 Year", "Within 2 Years", "Within 5 Years", "Over 5 Years", "Never"))))
#Pivot the dataset for better readability#
CheckUp_Gen <- CheckUp_Gen %>% 
  select(-Frequ.) %>%
  pivot_wider(names_from = Education.Level_Gender, values_from = percent)
#Create a table with a cleaner format using basictabler package#
CheckUpTable_Gen <- qtbl(CheckUp_Gen, firstColumnAsRowHeaders=TRUE)
CheckUpTable_Gen$theme <-"largeplain"

On top of the Gender and Education Level variables, I chose for this question the categorical variable “checkup1” (“About how long has it been since you last visited a doctor for a routine checkup?”), as it is directly linked to the question.

I removed the N/A observations as well as the “Don’t know/Not Sure” and “Refused” observations, which led to a tidier data set of 483,309 observations of 3 variables.

Joining the “Gender” and “Level of Education” variables, converting the responses to relative frequencies, then pivoting the data set to improve readability reduced the set to 5 observations of 8 variables.

The table below is the result of the cleaned and tidied data set. It shows the relative frequency of the length of time since a person last had a routine medical check-up, by Level of Education and Gender:

CheckUpTable_Gen$renderTable(styleNamePrefix="t2")

Before addressing the question, I believe it is relevant to narrow the data set, first to the Gender variable alone, then to the Education Level variable alone.

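#Create a Gender focused dataset from the cleaned CheckUp data (starting point assumed, the intermediate step is not shown)#
CheckUp_Gender <- CheckUp %>%
  count(Gender = sex, Last_Checkup = checkup1, name = "Frequ.") %>%
  group_by(Gender, Last_Checkup) %>%
  summarise(Total = sum(Frequ.))
CheckUp_Gender <- CheckUp_Gender %>%
  mutate(Percent= formattable::percent(Total / sum(Total))) %>%
  as.percent(percent_col = "Percent") %>%
  select(-Total) %>%
  arrange(match(Gender, c("Female", "Male")), (match(Last_Checkup, c("Within 1 Year", "Within 2 Years", "Within 5 Years", "Over 5 Years", "Never"))))
#Pivot the Gender focused dataset for the table (mirrors the Education Level step)#
CheckUp_Gender_Pivot <- CheckUp_Gender %>%
  pivot_wider(names_from = Last_Checkup, values_from = Percent)
#Create a Gender focused table with a cleaner format using basictabler package#
CheckUpTable_Gender <- qtbl(CheckUp_Gender_Pivot, firstColumnAsRowHeaders=TRUE)
CheckUpTable_Gender$theme <-"largeplain"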

Gender-specific table showing the relative frequency of the length of time since a person last had a routine medical check-up, by Gender:

CheckUpTable_Gender$renderTable(styleNamePrefix="t2")
CheckUp_Bar_Gender <- ggplot(CheckUp_Gender, aes(fill=Gender, x=Last_Checkup, y=Percent)) + 
  geom_bar(stat="identity", width = 0.7, position = position_dodge(0.8)) +  
  scale_fill_brewer(palette="Set2") +
  geom_text(aes(label=Percent), position=position_dodge(width = .9), vjust=-0.3, size=3) +
  aes(x = fct_inorder(Last_Checkup)) +
  labs(title = 'Length of Time Since Last Checkup / Gender') +
  labs(x=NULL, y=NULL) +
  theme_minimal() +
  theme(legend.position = "bottom") +
  theme(legend.key.size = unit(0.5, 'cm')) +
  theme(legend.title=element_blank())

Related Gender Specific Bar Plot:

CheckUp_Bar_Gender

#Create an Education Level focused dataset from the cleaned CheckUp data (starting point assumed, the intermediate step is not shown)#
CheckUp_Edu <- CheckUp %>%
  count(Education_Level = X_educag, Last_Checkup = checkup1, name = "Frequ.") %>%
  group_by(Education_Level, Last_Checkup) %>%
  summarise(Total = sum(Frequ.))
CheckUp_Edu <- CheckUp_Edu %>%
  mutate(Percent= formattable::percent(Total/sum(Total))) %>%
  as.percent(percent_col = "Percent") %>%
  select(-Total) %>%
  arrange(match(Education_Level, c("Attended HighSchool", "Graduated HighSchool", "Attended College", "Graduated from College")), (match(Last_Checkup, c("Within 1 Year", "Within 2 Years", "Within 5 Years", "Over 5 Years", "Never"))))
CheckUp_Edu_Pivot <- CheckUp_Edu %>%
  pivot_wider(names_from = Last_Checkup, values_from = Percent)
#Create an Education Level focused table with a cleaner format using basictabler package#
CheckUpTable_Edu <- qtbl(CheckUp_Edu_Pivot, firstColumnAsRowHeaders=TRUE)
CheckUpTable_Edu$theme <-"largeplain"

Education Level-specific table showing the relative frequency of the length of time since a person last had a routine medical check-up, by Level of Education:

CheckUpTable_Edu$renderTable(styleNamePrefix="t2")
CheckUp_Bar_Edu <- ggplot(CheckUp_Edu, aes(fill=Education_Level, x=Last_Checkup, y=Percent)) + 
  geom_bar(stat="identity", width = 0.75, position = position_dodge(0.85)) +  
  scale_fill_brewer(palette="Set2") +
  geom_text(aes(label=Percent), position=position_dodge(width = 1.1), vjust=-0.3, size=2.5) +
  aes(x = fct_inorder(Last_Checkup)) +
  aes(fill = fct_inorder(Education_Level)) +
  labs(title = 'Length of Time Since Last Checkup / Level of Education') +
  labs(x=NULL, y=NULL) +
  theme_minimal() +
  theme(legend.position = "bottom")+
  theme(legend.key.size = unit(0.3, 'cm')) +
  theme(legend.title=element_blank())

Related Education Level Specific Bar Plot:

CheckUp_Bar_Edu

Discussion:

The first table, which gives the overall picture with all three variables combined, suggests an association between Gender/Level of Education and the length of time since a person last had a routine medical check-up. The single-variable views support this, most clearly for Gender and, to a lesser degree, for Education Level:

  • Highly educated women tend to be more proactive at getting routine medical check-ups than less educated men (Women that graduated from College having their last check-up within a year = 77.2%; Men that attended High School having their last check-up within a year = 65.42%);

  • The Gender specific Table/Plot points to a marked gender-based association (Women having their last check-up within a year = 76.73%; Men having their last check-up within a year = 68.52%);

  • The Education Level specific Table/Plot follows the same trend, although less strongly than for Gender (College graduates having their last check-up within a year = 74.39%; Persons that attended High School having their last check-up within a year = 72.19%).
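The comparison between the gender effect and the education effect can be made explicit with a little arithmetic on the figures quoted in the bullets above. This is only a sketch: the vector names are introduced here for illustration, and the gaps are simple differences in percentage points, not a statistical test.

```r
# Shares (in percent) with a check-up within the past year, as quoted above.
gender    <- c(Female = 76.73, Male = 68.52)
education <- c("Graduated from College" = 74.39, "Attended HighSchool" = 72.19)

# Gaps in percentage points.
gender_gap    <- unname(gender["Female"] - gender["Male"])
education_gap <- unname(education["Graduated from College"] -
                        education["Attended HighSchool"])

round(gender_gap, 2)     # 8.21
round(education_gap, 2)  # 2.2
```

The gender gap (about 8.2 points) is roughly four times the education gap (about 2.2 points), which is why the gender association reads as the stronger of the two.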

4.2 Question 2

Is it fair to assume that a person’s level of education and gender influence their habit of wearing a seat-belt while driving or riding in a car?

On top of the Gender and Education Level variables, I chose for this question the categorical variable “seatbelt” (“How often do you use seat belts when you drive or ride in a car?”), as it is directly linked to the question.

I removed the N/A observations as well as the “Never drive or ride in a car” and “Refused” observations. Joining the “Gender” and “Level of Education” variables, converting the responses to relative frequencies, then pivoting the data set to improve readability reduced the set to 5 observations of 8 variables.
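Since the code for this question is hidden, here is a minimal sketch of what the steps just described might look like. It runs on a small invented data frame rather than on brfss2013; the column names follow Question 1 (`X_educag`, `sex`) plus `seatbelt`, while the object name `SeatBelt_Gen` and all values are assumptions made for illustration.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the survey data; every value here is invented.
toy <- tibble(
  X_educag = c("Graduated from College", "Graduated from College",
               "Graduated from College", "Attended HighSchool",
               "Attended HighSchool", "Attended HighSchool", NA),
  sex      = c("Female", "Female", "Female", "Male", "Male", "Male", "Male"),
  seatbelt = c("Always", "Always", "Nearly always", "Always", "Never", NA, "Always")
)

SeatBelt_Gen <- toy %>%
  na.omit() %>%                                               # drop incomplete responses
  unite("Education.Level_Gender", X_educag, sex, sep = " / ") %>%
  count(Education.Level_Gender, seatbelt, name = "Frequ.") %>%
  group_by(Education.Level_Gender) %>%
  mutate(percent = round(100 * Frequ. / sum(Frequ.), 2)) %>%  # share within each group
  ungroup() %>%
  select(-Frequ.) %>%
  pivot_wider(names_from = Education.Level_Gender,
              values_from = percent, values_fill = 0)

SeatBelt_Gen
```

The result has one row per seat-belt answer and one column per Education Level / Gender group, which is the same layout as the check-up table in Question 1.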

The table below is the result of the cleaned and tidied data set, showing us the relative frequency of seat-belt usage depending on Level of Education and Gender:

Again, before addressing the question, let’s narrow the data set on both Gender specific then Education Level specific variables.

Gender specific Table showing us the relative frequency of seat-belt usage depending on Gender only:

Related Gender Specific Bar Plot:

Education Level specific Table showing us the relative frequency of seat-belt usage depending on Level of Education only:

Related Education Level Specific Bar Plot:

Discussion:

The first table, which gives the overall picture with all three variables combined, suggests an association between Gender/Level of Education and the use of seat-belts when driving or riding in a car. The Gender-specific and Education Level-specific views support this:

  • Highly educated women tend to have adopted the habit of wearing seat-belts more than less educated men (Women that graduated from College always wearing seat-belts = 92.28%; Men that attended High School always wearing a seat-belt = 78.51%);

  • The Gender specific Table/Plot points to a gender-based association (Women always wearing seat-belt = 89.88%; Men always wearing seat-belt = 81.37%. Women never wearing seat-belts = 1.04%; Men never wearing seat-belt = 2.53%);

  • The Education Level specific Table/Plot follows the same trend (College graduates always wearing seat-belt = 89.90%; Persons that attended High School always wearing seat-belt = 72.19%. College graduates never wearing seat-belt = 0.81%; Persons that attended High School never wearing seat-belt = 3.22%).

4.3 Question 3

Is it fair to assume that a person’s level of education and gender influence their alcohol drinking behaviour?

Finally, on top of the Gender and Education Level variables, I chose for this question the variable “avedrnk2” from among the many related to alcohol consumption (“During the past 30 days, on the days when you drank, about how many drinks did you drink on the average?”), as it seems to me the most relevant for assessing general drinking habits.

I removed the N/A observations as well as the “Don’t know/not sure” and “Refused” observations. Joining the “Gender” and “Level of Education” variables, modifying the type of responses to “Relative Frequency of Responses”, then pivoting the data set in order to improve readability reduced the set to 8 observations of 2 variables.

The table below is the result of the cleaned and tidied data set. It shows the average number of alcoholic drinks per drinking day over the past 30 days, by Level of Education and Gender:
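As the code for this question is hidden, the following is a minimal sketch of how a per-group average could be obtained, assuming `avedrnk2` is stored as a number of drinks. It uses a small invented data frame instead of brfss2013; the object name `Drinks_Gen`, the column `Avg_Drinks`, and all values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the survey data; every value here is invented.
toy <- tibble(
  X_educag = c("Graduated from College", "Graduated from College",
               "Graduated from College", "Attended HighSchool",
               "Attended HighSchool", "Attended HighSchool"),
  sex      = c("Female", "Female", "Female", "Male", "Male", "Male"),
  avedrnk2 = c(1, 2, NA, 4, 3, 5)     # average drinks on drinking days
)

Drinks_Gen <- toy %>%
  na.omit() %>%                                               # drop incomplete responses
  unite("Education.Level_Gender", X_educag, sex, sep = " / ") %>%
  group_by(Education.Level_Gender) %>%
  summarise(Avg_Drinks = round(mean(avedrnk2), 2))            # unweighted group mean

Drinks_Gen
```

The output has one row per Education Level / Gender group and two columns, matching the shape of 2 variables described above. With the four education levels and two genders of the real data, this layout would give 8 rows.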

Before addressing the question, let’s again narrow the data set on both Gender specific and Education Level specific variables.

Gender-specific plot showing the average number of alcoholic drinks per drinking day over the past 30 days, by Gender only:

Education Level-specific plot showing the average number of alcoholic drinks per drinking day over the past 30 days, by Level of Education only:

Discussion:

The first table, which gives the overall picture with all three variables combined, suggests an association between Gender/Level of Education and drinking habits. The Gender-specific and Education Level-specific views support this:

  • Highly educated women drink less than less educated men (Women that graduated from College = avg 1.64 drinks; Men that attended High School = avg 3.91 drinks);

  • The Gender specific Table/Plot points to a gender-based association (Women = 1.81 drinks; Men = 2.65 drinks);

  • The Education Level specific Table/Plot follows the same trend (College graduates = 1.88 drinks; Persons that attended High School = 3.32 drinks).

5. Conclusion

The Exploratory Data Analysis we went through for these three questions tends to show a correlation between Gender, Level of Education and some specific preventive health practices and risk behaviours.

The explored data indeed suggest that women and/or highly educated persons tend to be more proactive about regular routine medical check-ups, more likely to wear seat-belts while driving or riding in a car, and to have healthier drinking habits than men and/or persons with a lower level of education.

As the CDC analyzed the demographic characteristics of the sample and assigned weights to each of the responses to ensure estimates are reflective of the known population, we can generalise the results to the US population as a whole, keeping in mind that the data are from 2013.

While an association between Gender, Level of Education and the three specific variables we explored has been observed, further studies and more specific surveys would be required to assess any kind of causality.