Background on this data set is given in the BRFSS 2013 overview (see References).
The accompanying codebook describes what each column in the data represents (see References).
Generalizability: Describe how the observations in the sample are collected, and the implications of this data collection method on the scope of inference (generalizability / causality).
In conducting the BRFSS landline telephone survey, interviewers collect data from a randomly selected adult in a household. In conducting the cellular telephone version of the BRFSS questionnaire, interviewers collect data from an adult who participates by using a cellular telephone, resides in a private residence or college housing, and received 90 percent or more of their calls on cellular telephones. (Centers for Disease Control and Prevention, 2014, Background section)
Cellular telephone sampling frames are commercially available and the system can call random samples of cellular telephone numbers, but doing so requires specific protocols. The basis of the 2013 BRFSS sampling frame is the Telecordia database of telephone exchanges (e.g., 617-492-0000 to 617-492-9999) and 1,000 banks (e.g., 617-492-0000 to 617-492-0999). (Centers for Disease Control and Prevention, 2014, Sample Description section)
To meet the BRFSS standard for the participating states’ sample designs, one must be able to justify sample records as a probability sample of all households with telephones in the state. All participating areas met this criterion in 2013. Fifty-one projects used a disproportionate stratified sample (DSS) design for their landline samples. Guam and Puerto Rico used a simple random-sample design. (Centers for Disease Control and Prevention, 2014, Sample Description section)
In the type of DSS design that states most commonly used in the BRFSS landline telephone sampling, BRFSS divides telephone numbers into two groups, or strata, which are sampled separately. The high-density and medium-density strata contain telephone numbers that are expected to belong mostly to households. (Centers for Disease Control and Prevention, 2014, Sample Description section)
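As a rough illustration of the disproportionate stratified design described above, the R sketch below samples two strata at different rates and attaches an inverse-probability design weight to each sampled record. The frame size, the stratum sizes, the sampling rates, and all object names (`frame`, `rates`, `sampled`) are hypothetical and are not taken from the BRFSS documentation.

```r
set.seed(2013)

# Hypothetical sampling frame: 4,000 high-density and 6,000 medium-density numbers.
frame <- data.frame(
  number_id = 1:10000,
  stratum = rep(c("high", "medium"), times = c(4000, 6000)),
  stringsAsFactors = FALSE
)

# Hypothetical sampling rates: the high-density stratum is sampled more heavily.
rates <- c(high = 0.10, medium = 0.02)

# Sample each stratum separately at its own rate.
sampled <- do.call(rbind, lapply(split(frame, frame$stratum), function(s) {
  n_draw <- round(nrow(s) * rates[[s$stratum[1]]])
  s[sample(nrow(s), n_draw), ]
}))

# Design weight = 1 / probability of selection within the stratum.
sampled$weight <- unname(1 / rates[sampled$stratum])

# Within each stratum, the weights sum back to the stratum size in the frame.
tapply(sampled$weight, sampled$stratum, sum)
```

Because the strata are sampled at unequal rates, an unweighted summary of `sampled` would over-represent the high-density stratum. The weights are what restore the frame's proportions.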
Because respondents are randomly selected from the population of adults living in households with telephones, the results can be generalized to that population.
Potential sources of bias include non-response bias and under-coverage. The survey does not reach people who prefer to communicate face to face or by written letter, those whose landline and mobile phone service has been disconnected due to unpaid bills, or those who are unable to communicate because of serious health issues.
Causality: Describe how the observations in the sample are collected, and the implications of this data collection method on causality.
Subjects are adults who were randomly selected through landline and/or mobile phone. The BRFSS project cannot establish causality because it is a retrospective observational study: adults are randomly sampled and their responses are recorded, but no one is randomly assigned to a treatment or exposure.
References:
Centers for Disease Control and Prevention. (2014, August 15). Behavioral Risk Factor Surveillance System: Overview: BRFSS 2013 [PDF]. https://www.cdc.gov/brfss/annual_data/2013/pdf/overview_2013.pdf
Centers for Disease Control and Prevention. (2014, October 24). Behavioral Risk Factor Surveillance System: 2013 codebook report: Land-line and cell-phone data [PDF]. https://www.cdc.gov/brfss/annual_data/2013/pdf/CODEBOOK13_LLCP.pdf
Research question 1:
Is there an association between smokday2: Frequency Of Days Now Smoking and chccopd1: (Ever Told) You Have (COPD) Chronic Obstructive Pulmonary Disease, Emphysema Or Chronic Bronchitis?
Research question 2:
What is the relationship between X_rfdrhv4: Heavy Alcohol Consumption Calculated Variable and smokday2: Frequency Of Days Now Smoking?
Research question 3:
Is there a statistically significant association between pregnant: Pregnancy Status and X_rfdrwm4: Adult Women Heavy Alcohol Consumption Calculated Variable?
Research question 1:
##
##                 Yes     No
##   Every day    9999  44747
##   Some days    3186  18163
##   Not at all  16746 120536
##
## Pearson's Chi-squared test
##
## data: lung
## X-squared = 1210, df = 2, p-value < 2.2e-16
## [1] 1.7664e-263
alpha <- 0.05
ifelse(result1$p.value < alpha, "p-value is significant", "p-value is not significant")
## [1] "p-value is significant"
plot(brfss2013$smokday2,
     main = "Daily Cigarettes Status",
     xlab = "Figure 1 shows the responses of 491,775 participants regarding their daily smoking habits.")
plot(brfss2013$chccopd1,
     main = "Having COPD, Emphysema or Chronic Bronchitis Status",
     xlab = "Figure 2 portrays the responses of 491,775 participants when asked if they have ever
been told that they have COPD, emphysema or chronic bronchitis.")
At the 5% significance level, I reject the null hypothesis of independence and conclude that there is a statistically significant association between COPD, emphysema, or chronic bronchitis status and smoking frequency.
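The test for research question 1 can be checked from the printed counts alone, without reloading the full data set. The sketch below rebuilds the contingency table by hand and reruns Pearson's chi-squared test. The object names `lung_counts` and `check1` are hypothetical, and the row and column assignment (rows: smokday2, columns: chccopd1) is read off the printed table.

```r
# Counts copied from the contingency table printed for research question 1.
lung_counts <- matrix(
  c( 9999,  44747,
     3186,  18163,
    16746, 120536),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    smokday2 = c("Every day", "Some days", "Not at all"),
    chccopd1 = c("Yes", "No")
  )
)

# Pearson's chi-squared test of independence; df = (3 - 1) * (2 - 1) = 2.
check1 <- chisq.test(lung_counts)

check1$statistic  # test statistic
check1$parameter  # degrees of freedom
check1$p.value    # p-value, compared against alpha = 0.05 in the report
```

If the counts were transcribed correctly, the statistic should agree with the reported X-squared of about 1210 on 2 degrees of freedom.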
Research question 2:
##
##       Every day Some days Not at all
##   No      47167     18834     125916
##   Yes      6212      1932       8972
##
## Pearson's Chi-squared test
##
## data: liv_lung
## X-squared = 1302.9, df = 2, p-value < 2.2e-16
## [1] 1.192845e-283
alpha <- 0.05
ifelse(result2$p.value < alpha, "p-value is significant", "p-value is not significant")
## [1] "p-value is significant"
plot(brfss2013$X_rfdrhv4,
     main = "Heavy Alcohol Consumption Status",
     xlab = "Figure 3 demonstrates the responses of 491,775 participants about their heavy alcohol
consumption habits.")
plot(brfss2013$smokday2,
     main = "Daily Cigarettes Status",
     xlab = "Figure 4 shows the responses of 491,775 participants regarding their daily smoking habits.")
At the 5% significance level, I reject the null hypothesis of independence and conclude that there is a statistically significant association between heavy alcohol consumption status and smoking frequency.
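A significant chi-squared result says only that the two variables are associated, not in which direction. One way to see the direction is to convert the printed counts for research question 2 into column proportions. This is a minimal sketch: `liv_lung_counts` and `heavy_share` are hypothetical names, and it assumes that "Yes" in X_rfdrhv4 marks respondents classified as heavy drinkers.

```r
# Counts copied from the contingency table printed for research question 2
# (rows: X_rfdrhv4, columns: smokday2).
liv_lung_counts <- matrix(
  c(47167, 18834, 125916,
     6212,  1932,   8972),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    X_rfdrhv4 = c("No", "Yes"),
    smokday2  = c("Every day", "Some days", "Not at all")
  )
)

# margin = 2 normalises each column, so every smoking category sums to 1.
heavy_share <- prop.table(liv_lung_counts, margin = 2)

# Share of heavy drinkers ("Yes" row) within each smoking category.
round(heavy_share, 3)
```

With these counts, the "Yes" share falls from roughly 12% among every-day smokers to roughly 9% among some-days smokers and roughly 7% among those who do not smoke at all. That falling share is the pattern behind the significant test result.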
Research question 3:
##
##          No   Yes
##   Yes  2928    40
##   No  66828  4066
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: preg_liv
## X-squared = 103.63, df = 1, p-value < 2.2e-16
## [1] 2.443343e-24
alpha <- 0.05
ifelse(result3$p.value < alpha, "p-value is significant", "p-value is not significant")
## [1] "p-value is significant"
plot(brfss2013$pregnant,
     main = "Pregnancy Status",
     xlab = "Figure 5 demonstrates subjects' responses to the question about pregnancy status.")
plot(brfss2013$X_rfdrwm4,
     main = "Heavy Drinker Status",
     xlab = "Figure 6 presents participants' heavy alcohol consumption habits.")
At the 5% significance level, I reject the null hypothesis of independence and conclude that pregnancy status and heavy drinker status are statistically associated. This result does not imply a causal relationship.
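For a 2 x 2 table, `chisq.test()` applies Yates' continuity correction by default, which is why the output for research question 3 is labelled differently from the other two. The sketch below computes the corrected statistic by hand from the printed counts and compares it with the built-in result. The names `preg_liv_counts`, `expected`, `yates_stat`, and `builtin_stat` are hypothetical.

```r
# Counts copied from the contingency table printed for research question 3
# (rows: pregnant, columns: X_rfdrwm4).
preg_liv_counts <- matrix(
  c( 2928,   40,
    66828, 4066),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    pregnant  = c("Yes", "No"),
    X_rfdrwm4 = c("No", "Yes")
  )
)

# Expected counts under independence: row total * column total / grand total.
expected <- outer(rowSums(preg_liv_counts), colSums(preg_liv_counts)) /
  sum(preg_liv_counts)

# Yates' correction subtracts 0.5 from each |observed - expected| before squaring.
yates_stat <- sum((abs(preg_liv_counts - expected) - 0.5)^2 / expected)

# Built-in test with the (default) continuity correction made explicit.
builtin_stat <- unname(chisq.test(preg_liv_counts, correct = TRUE)$statistic)

c(manual = yates_stat, chisq.test = builtin_stat)
```

Both values should agree with each other and with the reported X-squared of about 103.63. In a sample this large the correction changes the statistic only slightly.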