Setup

Load packages

library(ggplot2)
library(dplyr)
library(grid)
library(pwr)

Load data

load("brfss2013.Rdata")

str(brfss2013)


Part 1: Data

Generalizability: The data include respondents from a variety of locations and were collected using both landlines and cell phones. Because respondents were randomly sampled, the results are likely to generalize to the rest of the population.

Causality: The data cannot be used to infer cause and effect. The study is observational, so the data are correlational in nature: people were not randomly assigned to smoke cigarettes, exercise, etc.

names(brfss2013)


Part 2: Research questions

Research question 1: I would like to look at the relationship between whether women have any health coverage and whether they have had a mammogram. I thought it would be interesting to see if health coverage is related to women getting this life-saving test.

Research question 2: I am interested in whether marital status is related to unhealthy behaviors. I am going to look at differences in binge drinking between married and divorced people.

Research question 3: I have insomnia, so I am interested in the relationship between sleep and health. I will look at the relationship between how much sleep someone gets and whether they have ever had cancer.


Part 3: Exploratory data analysis

Research question 1: I would like to look at the relationship between whether women have any health coverage and whether they have had a mammogram. I thought it would be interesting to see if health coverage is related to women getting this life-saving test.

#I will need the following variable names: hlthpln1, sex, hadmam
#First, I would like to see how many respondents have had a mammogram
brfss2013 %>% 
  group_by(hadmam) %>% 
  summarise(count = n())

brfss2013 %>%
  filter(sex == "Female") %>%
  filter(hlthpln1=="Yes") %>%
  select (hadmam) %>%
  filter(!is.na(hadmam)) %>%
  group_by(hadmam) %>% 
  summarise (count=n()) %>%
  ungroup() %>%
  mutate(rel.freq = paste0(round(100*count/sum(count), 0), "%")) 

brfss2013 %>%
  filter(sex == "Female") %>%
  filter(hlthpln1=="No") %>%
  select (hadmam) %>%
  filter(!is.na(hadmam)) %>%
  group_by(hadmam) %>% 
  summarise (count=n()) %>%
  ungroup() %>%
  mutate(rel.freq = paste0(round(100*count/sum(count), 0), "%"))
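The two pipelines above differ only in the value of hlthpln1, so they could be folded into one by counting both variables and computing the percentages within each coverage group. The following is a minimal sketch on a small made-up data frame; `toy`, its values, and the name `mam_by_coverage` are illustrative assumptions, not BRFSS data, and only the column names mirror those used above.

```r
library(dplyr)

# Hypothetical toy data standing in for brfss2013; all values are made up.
toy <- data.frame(
  sex      = c("Female", "Female", "Female", "Female", "Female", "Male"),
  hlthpln1 = c("Yes", "Yes", "Yes", "No", "No", "Yes"),
  hadmam   = c("Yes", "Yes", "No", "No", NA, NA)
)

mam_by_coverage <- toy %>%
  # Keep women with a recorded answer for both variables
  filter(sex == "Female", !is.na(hlthpln1), !is.na(hadmam)) %>%
  # One row per coverage/mammogram combination
  count(hlthpln1, hadmam, name = "count") %>%
  # Percentages are computed within each coverage group
  group_by(hlthpln1) %>%
  mutate(rel.freq = paste0(round(100 * count / sum(count), 0), "%")) %>%
  ungroup()

print(mam_by_coverage)
```

Grouping by hlthpln1 before the mutate step is what makes each percentage relative to its own coverage group, which matches what the two separate pipelines compute.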

brfss2013 %>%
  filter(sex == "Female") %>%
  filter(!is.na(hadmam)) %>%
   filter(!is.na(hlthpln1)) %>%
ggplot(aes(x=hlthpln1,fill=hadmam)) +
geom_bar()

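#Both variables are categorical, so a line/point plot is not informative;
#a filled bar chart shows the proportion with a mammogram in each coverage group
brfss2013 %>%
  filter(sex == "Female") %>%
  filter(!is.na(hadmam)) %>%
  filter(!is.na(hlthpln1)) %>%
  ggplot(aes(x = hlthpln1, fill = hadmam)) +
  geom_bar(position = "fill")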


#It appears that women are more likely to have had a mammogram if they have health insurance.

#install.packages("pwr")  #run once if the pwr package is not installed
table(brfss2013$drnk3ge5)

Research question 2: I am interested in whether marital status is related to unhealthy behaviors. I am going to look at differences in binge drinking by whether someone is married or divorced.

brfss2013 %>%
  filter(marital == "Married") %>%
  select (drnk3ge5) %>%
  filter(!is.na(drnk3ge5)) %>%
  group_by(drnk3ge5) %>% 
  summarise (count=n()) %>%
  ungroup() %>%
  mutate(rel.freq = paste0(round(100*count/sum(count), 0), "%"))

brfss2013 %>%
  filter(marital == "Divorced") %>%
  select (drnk3ge5) %>%
  filter(!is.na(drnk3ge5)) %>%
  group_by(drnk3ge5) %>% 
  summarise (count=n()) %>%
  ungroup() %>%
  mutate(rel.freq = paste0(round(100*count/sum(count), 0), "%"))
  
#The percentage of people who do not binge drink at all is higher for divorced than for married people.
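The share of respondents who do not binge drink at all can also be computed for both groups in a single step. This is a minimal sketch, assuming drnk3ge5 records the number of binge-drinking occasions with 0 meaning none; the `toy` data frame, its values, and the `respondents` and `pct_none` column names are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for brfss2013; all values are made up.
toy <- data.frame(
  marital  = c("Married", "Married", "Married", "Married",
               "Divorced", "Divorced", "Divorced", "Never married"),
  drnk3ge5 = c(0, 0, 2, NA, 0, 1, 3, 0)
)

none_by_marital <- toy %>%
  # Keep the two groups of interest and drop missing answers
  filter(marital %in% c("Married", "Divorced"), !is.na(drnk3ge5)) %>%
  group_by(marital) %>%
  summarise(
    respondents = n(),
    # mean() of a logical vector gives the proportion of TRUE values
    pct_none = round(100 * mean(drnk3ge5 == 0), 0)
  )

print(none_by_marital)
```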

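#The counts are precomputed, so geom_col() is used instead of geom_bar();
#position = "fill" rescales each bar to proportions within the marital group
brfss2013 %>%
  select(drnk3ge5, marital) %>%
  filter(marital == "Married" | marital == "Divorced") %>%
  filter(!is.na(drnk3ge5)) %>%
  group_by(marital, drnk3ge5) %>%
  summarise(count = n()) %>%
  ungroup() %>%
  ggplot(aes(x = marital, y = count, fill = factor(drnk3ge5))) +
  geom_col(position = "fill")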
  
#Divorced people are slightly less likely to binge drink than married people

median(brfss2013$sleptim1, na.rm = TRUE)
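median() returns NA whenever its input contains a missing value unless na.rm = TRUE is set, which matters for a survey variable such as sleptim1 if any respondents skipped the question. A short illustration on made-up numbers (`sleep_hours` is a hypothetical vector, not BRFSS data):

```r
# Made-up sleep durations in hours; NA marks a missing response.
sleep_hours <- c(6, 7, 8, NA, 5)

median(sleep_hours)                # NA, because one value is missing
median(sleep_hours, na.rm = TRUE)  # 6.5, the median of 5, 6, 7, 8
```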

Research question 3: I have insomnia, so I am interested in the relationship between sleep and health. I will look at the relationship between how much sleep someone gets and whether they have ever had cancer.

table(brfss2013$veteran3)


brfss2013 %>%
  filter(veteran3 == "Yes") %>%
  select (chcocncr) %>%
  filter(!is.na(chcocncr)) %>%
  group_by(chcocncr) %>% 
  summarise (count=n()) %>%
  ungroup() %>%
  mutate(rel.freq = paste0(round(100*count/sum(count), 0), "%"))

brfss2013 %>%
  filter(veteran3 == "No") %>%
  select (chcocncr) %>%
  filter(!is.na(chcocncr)) %>%
  group_by(chcocncr) %>% 
  summarise (count=n()) %>%
  ungroup() %>%
  mutate(rel.freq = paste0(round(100*count/sum(count), 0), "%"))
  
#The percentage of veterans who report having had cancer is higher than the percentage of non-veterans who do.

#The counts are precomputed, so geom_col() is used instead of geom_bar();
#position = "fill" shows proportions within each veteran group
brfss2013 %>%
  select(veteran3, chcocncr) %>%
  filter(!is.na(chcocncr)) %>%
  filter(!is.na(veteran3)) %>%
  group_by(veteran3, chcocncr) %>%
  summarise(count = n()) %>%
  ungroup() %>%
  ggplot(aes(x = veteran3, y = count, fill = chcocncr)) +
  geom_col(position = "fill")
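When counts have already been computed with summarise(), geom_bar() with its default stat counts the rows of the summary table rather than using the count column, whereas geom_col() maps a precomputed value to bar height. A minimal sketch of that difference in practice, using made-up counts (`toy_counts` and its numbers are hypothetical, not BRFSS results):

```r
library(ggplot2)

# Hypothetical pre-summarised counts; the numbers are made up.
toy_counts <- data.frame(
  veteran3 = c("Yes", "Yes", "No", "No"),
  chcocncr = c("Yes", "No", "Yes", "No"),
  count    = c(20, 80, 10, 90)
)

# geom_col() uses the count column for bar height;
# position = "fill" rescales each bar to proportions within its group.
p <- ggplot(toy_counts, aes(x = veteran3, y = count, fill = chcocncr)) +
  geom_col(position = "fill") +
  labs(y = "Proportion of respondents")

print(p)
```

Using proportions rather than raw counts makes groups of very different sizes directly comparable.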