The Behavioral Risk Factor Surveillance System (BRFSS) is a collaborative project between all states in the United States (US), participating US territories, and the Centers for Disease Control and Prevention (CDC). It is designed to measure behavioral risk factors in the adult population in the US. The objective of the survey is to collect data on health practices and behaviors linked to injuries and diseases. Data have been collected through landline telephone surveys since 1984, and cellular telephone surveys were added in 2011.
Research question 1: My first research question explores the relationship between depressive disorders and two habits: smoking and alcohol consumption. My initial assumption is that individuals with depressive disorders are more likely to have these habits.
Research question 2: For my second question, I will look into the relationship between veteran status, depression, and sleep problems. The general consensus about the appropriate amount of sleep for adults (18-64) is at least 6 hours. I will use this value to determine whether veterans are getting an appropriate amount of sleep.
Research question 3: For the final question, I will look into the relationship between cancer and smoking.
Research question 1:
Determine drinkers and non-drinkers
brfss2013 <- brfss2013 %>%
  mutate(alcohol_drinker = ifelse(alcday5 > 100, "Drinker", "Non drinker"))
brfss2013 %>% group_by(alcohol_drinker) %>% summarise(count = n())
## # A tibble: 3 x 2
## alcohol_drinker count
## <chr> <int>
## 1 Drinker 235412
## 2 Non drinker 236719
## 3 <NA> 19644
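The `<NA>` row in the table comes from how `ifelse()` treats missing values. The sketch below illustrates this on a few hypothetical `alcday5` values (the toy data frame is an assumption for illustration, not survey data): a missing input produces a missing label rather than falling into either group.

```r
library(dplyr)

# Hypothetical toy values standing in for alcday5 (not taken from the survey)
toy <- data.frame(alcday5 = c(0, 101, 205, NA))

# Same classification rule as in the analysis
toy <- toy %>%
  mutate(alcohol_drinker = ifelse(alcday5 > 100, "Drinker", "Non drinker"))

# The NA input stays NA, so group_by() later reports it as its own <NA> group
toy$alcohol_drinker
```

This is why the later steps filter with `!is.na(alcohol_drinker)` before counting.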
Find number of individuals with depression who have drinking and smoking habits
dep_smoke <- brfss2013 %>% filter(!is.na(addepev2), !is.na(smoke100)) %>% select(addepev2, smoke100)
dep_drink <- brfss2013 %>% filter(!is.na(alcohol_drinker), !is.na(addepev2)) %>% select(addepev2, alcohol_drinker)
Summaries
depsmoke_sum <- dep_smoke %>% group_by(addepev2, smoke100) %>% summarise(count = n())
depdrink_sum <- dep_drink %>% group_by(addepev2, alcohol_drinker) %>% summarise(count = n())
depsmoke_sum
## # A tibble: 4 x 3
## # Groups: addepev2 [2]
## addepev2 smoke100 count
## <fct> <fct> <int>
## 1 Yes Yes 52850
## 2 Yes No 40848
## 3 No Yes 161297
## 4 No No 219742
depdrink_sum
## # A tibble: 4 x 3
## # Groups: addepev2 [2]
## addepev2 alcohol_drinker count
## <fct> <chr> <int>
## 1 Yes Drinker 41343
## 2 Yes Non drinker 51630
## 3 No Drinker 193242
## 4 No Non drinker 183847
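Because the two depression groups differ a lot in size, raw counts are hard to compare directly. As a minimal sketch, the counts printed above can be turned into within-group shares. The data frames `depsmoke_counts` and `depdrink_counts` and the column `share` are names introduced here for illustration; the counts themselves are copied from the two tables.

```r
library(dplyr)

# Counts copied from the two summary tables above
depsmoke_counts <- data.frame(
  addepev2 = c("Yes", "Yes", "No", "No"),
  smoke100 = c("Yes", "No", "Yes", "No"),
  count    = c(52850, 40848, 161297, 219742)
)
depdrink_counts <- data.frame(
  addepev2        = c("Yes", "Yes", "No", "No"),
  alcohol_drinker = c("Drinker", "Non drinker", "Drinker", "Non drinker"),
  count           = c(41343, 51630, 193242, 183847)
)

# Share of each habit within each depression group (shares sum to 1 per group)
smoke_share <- depsmoke_counts %>%
  group_by(addepev2) %>%
  mutate(share = count / sum(count)) %>%
  ungroup()

drink_share <- depdrink_counts %>%
  group_by(addepev2) %>%
  mutate(share = count / sum(count)) %>%
  ungroup()

smoke_share
drink_share
```

With these counts, roughly 56% of respondents with a depressive disorder have smoked, compared with roughly 42% of those without; for drinking the shares are roughly 44% and 51%.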
Visualization
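A minimal sketch of how a grouped bar chart of the smoking counts could be drawn with ggplot2. The choice of `geom_col()` with dodged bars and the axis labels are assumptions for illustration, not necessarily the plot used in this analysis; the counts are copied from the summary table.

```r
library(ggplot2)

# Counts copied from the depsmoke_sum table
depsmoke_counts <- data.frame(
  addepev2 = c("Yes", "Yes", "No", "No"),
  smoke100 = c("Yes", "No", "Yes", "No"),
  count    = c(52850, 40848, 161297, 219742)
)

# One pair of side-by-side bars per depression group
p <- ggplot(depsmoke_counts, aes(x = addepev2, y = count, fill = smoke100)) +
  geom_col(position = "dodge") +
  labs(
    x    = "Ever told had a depressive disorder",
    y    = "Number of respondents",
    fill = "Smoked at least 100 cigarettes"
  )
p
```

The same pattern would apply to the drinking counts by swapping in `alcohol_drinker` as the fill variable.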
According to the visualizations above, respondents with a depressive disorder include slightly more non-drinkers (51,630) than drinkers (41,343). This does not align with my assumption that more individuals with depressive disorders would have a drinking habit. For smoking, respondents with a depressive disorder who have smoked (52,850) outnumber those who have not (40,848), but these counts alone are not enough to establish a relationship between the two variables.
Notably, there is a large difference between the number of smokers (161,297) and non-smokers (219,742) among respondents without depressive disorders.
Research question 2:
Select veteran, depression, and sleep time variables
vet_dep_slp <- brfss2013 %>%
  filter(!is.na(sleptim1), veteran3 == "Yes", !is.na(addepev2), sleptim1 < 12) %>%
  select(veteran3, addepev2, sleptim1)
head(vet_dep_slp)
## veteran3 addepev2 sleptim1
## 1 Yes Yes 3
## 2 Yes Yes 5
## 3 Yes No 8
## 4 Yes No 5
## 5 Yes No 8
## 6 Yes No 6
Summary
vet_summary <- vet_dep_slp %>%
  group_by(addepev2) %>%
  summarise(count = n(), mean = mean(sleptim1), median = median(sleptim1))
vet_summary
## # A tibble: 2 x 4
## addepev2 count mean median
## <fct> <int> <dbl> <dbl>
## 1 Yes 9542 6.61 7
## 2 No 50019 7.11 7
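The mean and median summarise typical sleep time, but they do not show how many veterans fall below the 6-hour threshold. A minimal sketch of that calculation follows, using only the six rows printed by `head(vet_dep_slp)` above as toy input. The names `toy_vets`, `sleep_threshold`, and `below_threshold` are introduced here for illustration, and six rows are far too few to draw conclusions from.

```r
library(dplyr)

# First six rows of vet_dep_slp as printed above (toy input only)
toy_vets <- data.frame(
  addepev2 = c("Yes", "Yes", "No", "No", "No", "No"),
  sleptim1 = c(3, 5, 8, 5, 8, 6)
)

# Threshold taken from the research question: at least 6 hours of sleep
sleep_threshold <- 6

# The mean of a logical vector gives the share of rows where the condition holds
short_sleep <- toy_vets %>%
  group_by(addepev2) %>%
  summarise(count = n(), below_threshold = mean(sleptim1 < sleep_threshold))

short_sleep
```

Applied to the full `vet_dep_slp` data frame, the same `summarise()` call would give the share of veterans in each group who sleep fewer than 6 hours.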
Visualization
This bar plot compares veterans with and without depression.
The mean sleep time of veterans with depression is 6.61 hours, which is above the 6-hour threshold stated earlier. It is lower than the mean sleep time of veterans without depression (7.11 hours), although the median is 7 hours in both groups.
Research question 3:
Select only smokers to see how many of them report having had cancer.
smoke_cancer <- brfss2013 %>%
  filter(!is.na(chcocncr), smoke100 == "Yes") %>%
  select(chcocncr, smoke100)
head(smoke_cancer)
## chcocncr smoke100
## 1 No Yes
## 2 No Yes
## 3 Yes Yes
## 4 No Yes
## 5 No Yes
## 6 No Yes
Summary
## # A tibble: 2 x 2
## chcocncr count
## <fct> <int>
## 1 Yes 24586
## 2 No 190007
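A summary table like the one above can be produced with the same `group_by()` and `summarise()` pattern used for the earlier questions. The sketch below applies it to the six rows printed by `head(smoke_cancer)` and then computes the share of smokers reporting cancer from the full counts in the table. The names `toy_smoke_cancer`, `toy_summary`, `cancer_counts`, and `cancer_share` are introduced here for illustration.

```r
library(dplyr)

# First six rows of smoke_cancer as printed above (toy input only)
toy_smoke_cancer <- data.frame(
  chcocncr = c("No", "No", "Yes", "No", "No", "No"),
  smoke100 = rep("Yes", 6)
)

# Count respondents per cancer status
toy_summary <- toy_smoke_cancer %>%
  group_by(chcocncr) %>%
  summarise(count = n())
toy_summary

# Share of smokers reporting cancer, using the full counts from the table above
cancer_counts <- c(Yes = 24586, No = 190007)
cancer_share <- unname(cancer_counts["Yes"] / sum(cancer_counts))
cancer_share
```

With the full counts, about 11.5% of the smokers in the table report having had cancer.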
Visualization
The plot above shows that the number of smokers with cancer is far smaller than the number of smokers without cancer, which is a bit surprising. This graph may be misleading, since the data do not distinguish between types of cancer. If data specifically on lung cancer were available, the difference might not be as large.