Load the ggplot2 package, which creates elegant data visualizations, and the dplyr
package, which provides a flexible grammar for data manipulation.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.2
library(dplyr)
The data file was downloaded and saved in the same directory as the R Markdown file, so load() can restore the R objects directly. Make sure your data and R Markdown files are in the same directory.
load("brfss2013.RData")
First, dim() retrieves the dimensions of an object: for a data frame it returns the number of rows (observations) and the number of columns (variables), in that order. Although the BRFSS codebook describes each variable, a summary of the dataset itself gives a better feel for the data. str() compactly displays the internal structure of the dataset; it is a diagnostic function and an alternative to summary(). Be cautious with summary() when the dataset has this many variables, because the output becomes very long. ls.str(), a variation of ls() that applies str() to each object, lists every variable name together with its type and its levels or first few values.
dim(brfss2013)
## [1] 491775 330
str(brfss2013, list.len = 5)
## 'data.frame': 491775 obs. of 330 variables:
## $ X_state : Factor w/ 55 levels "0","Alabama",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ fmonth : Factor w/ 12 levels "January","February",..: 1 1 1 1 2 3 3 3 4 4 ...
## $ idate : int 1092013 1192013 1192013 1112013 2062013 3272013 3222013 3042013 4242013 4242013 ...
## $ imonth : Factor w/ 12 levels "January","February",..: 1 1 1 1 2 3 3 3 4 4 ...
## $ iday : Factor w/ 31 levels "1","2","3","4",..: 9 19 19 11 6 27 22 4 24 24 ...
## [list output truncated]
As the output shows, the list.len argument limits the number of variables that str() displays, which keeps the output manageable for datasets with many variables. The brfss2013 data frame loaded from brfss2013.RData has 491,775 observations of 330 variables. Running ls.str(brfss2013) returns the complete list of variables with the type of each one and its levels or first few values. The dataset contains categorical, ordinal and continuous variables.
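The three inspection functions can be tried on a small data frame before running them on the full dataset. The sketch below uses a made-up data frame named toy (hypothetical; its columns are not BRFSS variables) purely to show what each call reports.

```r
# Hypothetical toy data frame standing in for brfss2013
toy <- data.frame(
  state  = factor(c("Alabama", "Alaska", "Alabama")),
  month  = factor(c("January", "February", "January")),
  weight = c(70.5, 82.1, 64.3),
  smoker = factor(c("Yes", "No", NA))
)

dim(toy)                # number of rows, then number of columns
str(toy, list.len = 2)  # structure of the first two variables only
ls.str(toy)             # one entry per variable, sorted by name
```

On a 330-column data frame, list.len is what keeps str() readable, while ls.str() is better suited to looking up a single variable in the full listing.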
According to the BRFSS webpage, the survey has been conducted since 1981, and from the start the data have been collected through telephone interviews. Since 2008, cellular phone interviews have been included, which improves access to hard-to-reach respondents and strengthens the representativeness of the survey. For both the landline and the cellular phone surveys, sampling relies on random digit dialing (RDD), which selects telephone numbers at random. For landline contacts, an adult is randomly selected within the household and interviewed. Random sampling through RDD, together with the random selection of an adult within a household, supports generalizing the results to the adult population.
The monthly survey is designed to collect data on health-related behaviors. Respondents answer the survey questions without being randomly assigned to a control group or a treatment group, so this is an observational study. No causal conclusion can be drawn, because respondents are not randomly assigned to experimental groups.
Research question 1: About 8.2% of adults aged 18 or older report taking aspirin to reduce the chance of a heart attack. People tend to believe that taking aspirin has a positive effect on heart-related disease and heart attack. Are people diagnosed with angina, coronary heart disease or a heart attack more likely than others to take aspirin to reduce the chance of a heart attack?
Research question 2: People tend to think that heavy smokers have a higher chance of a heart attack. Are people who smoke every day and have smoked at least 100 cigarettes in their life more likely than others to have been diagnosed with a heart attack?
Research question 3: Are heavy alcohol drinkers more likely than others to binge drink on an occasion? Are male heavy drinkers more likely to binge drink than female heavy drinkers?
Research question 1:
The research question started from a vague assumption that people with angina or coronary heart disease, or people who have had a heart attack, would take aspirin to reduce the chance of a heart attack.
To make the research question more precise and concrete,
it is a good start to examine the data structure of the variables in question.
str() and summary() show the dimensions and characteristics of each variable.
brfss2013 %>%
select(cvdinfr4, cvdcrhd4, rduchart) %>%
str()
## 'data.frame': 491775 obs. of 3 variables:
## $ cvdinfr4: Factor w/ 2 levels "Yes","No": 2 2 2 2 2 2 2 2 2 2 ...
## $ cvdcrhd4: Factor w/ 2 levels "Yes","No": NA 2 2 2 2 2 2 1 2 2 ...
## $ rduchart: Factor w/ 2 levels "Yes","No": NA NA NA NA NA NA NA NA NA NA ...
brfss2013 %>%
select(cvdinfr4, cvdcrhd4, rduchart) %>%
summary()
## cvdinfr4 cvdcrhd4 rduchart
## Yes : 29284 Yes : 29064 Yes : 40145
## No :459904 No :458288 No : 5595
## NA's: 2587 NA's: 4423 NA's:446035
The variables in question are all categorical, with “Yes” or “No” answers. NA values are rare, except for rduchart, the question on whether respondents take aspirin to reduce the chance of a heart attack, where about 91% of the values are NA.
The next step is to find the overall incidence of heart attack among adults aged 18 or older, along with the incidence of angina or coronary heart disease and the share of respondents taking aspirin.
# National heart attack rate
transform(as.data.frame(table(brfss2013$cvdinfr4)), percent_col = Freq/nrow(brfss2013)*100)
## Var1 Freq percent_col
## 1 Yes 29284 5.954756
## 2 No 459904 93.519191
ggplot(brfss2013, aes(x=cvdinfr4))+geom_bar()
# Rate of being diagnosed with Angina or Coronary Heart Disease
transform(as.data.frame(table(brfss2013$cvdcrhd4)), percent_col = Freq/nrow(brfss2013)*100)
## Var1 Freq percent_col
## 1 Yes 29064 5.91002
## 2 No 458288 93.19059
ggplot(brfss2013, aes(x=cvdcrhd4))+geom_bar()
# Rate of taking Aspirin to reduce the Heart Attack
transform(as.data.frame(table(brfss2013$rduchart)), percent_col = Freq/nrow(brfss2013)*100)
## Var1 Freq percent_col
## 1 Yes 40145 8.163286
## 2 No 5595 1.137715
ggplot(brfss2013, aes(x=rduchart))+geom_bar()
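According to the BRFSS 2013 survey, approximately 5.9% of U.S. adults aged 18 or older have ever been diagnosed with a heart attack, 5.9% have been diagnosed with angina or coronary heart disease, and 8.2% take aspirin to reduce the chance of a heart attack. These percentages use all 491,775 respondents as the denominator, including those with NA answers.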
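The choice of denominator matters a great deal for rduchart, because most respondents have no answer recorded for it. As a rough illustration, the sketch below recomputes the aspirin share twice from the counts in the summary() output above: once over all respondents and once over only those who answered. The variable names are chosen here for illustration.

```r
# Counts copied from the summary() output for rduchart
n_yes <- 40145
n_no  <- 5595
n_na  <- 446035

n_total <- n_yes + n_no + n_na                 # all respondents
pct_all      <- n_yes / n_total * 100          # NA rows kept in the denominator
pct_answered <- n_yes / (n_yes + n_no) * 100   # only respondents who answered

round(c(all = pct_all, answered = pct_answered), 1)
```

The first figure reproduces the 8.2% reported in the text; the second is the share among the respondents who were actually asked and answered the question.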
rq1_tbl1 <- table(brfss2013$rduchart, brfss2013$cvdcrhd4, dnn = c("Aspirin Intake", "Heart Disease"))
prop.table(rq1_tbl1)
## Heart Disease
## Aspirin Intake Yes No
## Yes 0.14208841 0.73496842
## No 0.00547007 0.11747309
rq1_tbl2 <- table(brfss2013$rduchart, brfss2013$cvdinfr4, dnn = c("Aspirin intakes", "Heart Attack"))
prop.table(rq1_tbl2)
## Heart Attack
## Aspirin intakes Yes No
## Yes 0.13808527 0.73936323
## No 0.00623554 0.11631596
rq1_tbl3 <- table(brfss2013$cvdcrhd4, brfss2013$cvdinfr4, dnn = c("Heart Disease", "Heart Attack"))
prop.table(rq1_tbl3)
## Heart Attack
## Heart Disease Yes No
## Yes 0.02875058 0.03026095
## No 0.02865373 0.91233475
rq1_tbl4 <- table(brfss2013$cvdinfr4, brfss2013$rduchart, brfss2013$cvdcrhd4, dnn = c("Heart Attack", "Aspirin Intake", "Heart Disease"))
prop.table(rq1_tbl4)
## , , Heart Disease = Yes
##
## Aspirin Intake
## Heart Attack Yes No
## Yes 0.075020702 0.002148564
## No 0.066538350 0.003222846
##
## , , Heart Disease = No
##
## Aspirin Intake
## Heart Attack Yes No
## Yes 0.058861709 0.003782368
## No 0.676506793 0.113918668
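According to these tables, among respondents who answered both questions, 14.2% both have a heart disease diagnosis and take aspirin to reduce the chance of a heart attack, and 13.8% both have had a heart attack and take aspirin. These are joint proportions, so they are not directly comparable with the 8.2% of all respondents who take aspirin. Conditioning on the diagnosis gives a fairer comparison: among those who answered the aspirin question, about 96.3% of respondents with a heart disease diagnosis take aspirin for this purpose, against 86.2% of those without the diagnosis, and about 95.7% of those who have had a heart attack, against 86.4% of those who have not. In sum, people ever diagnosed with angina, coronary heart disease or a heart attack are more likely to take aspirin as a preventive practice against a future heart attack.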
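One way to compare aspirin use between diagnosis groups is to condition on the diagnosis instead of reading the joint cell proportions. The sketch below rebuilds the aspirin by heart disease table from the proportions printed above and uses the margin argument of prop.table() so that each column sums to 1. The object names m and cond are chosen here for illustration.

```r
# Joint proportions copied from the prop.table(rq1_tbl1) output
m <- matrix(
  c(0.14208841, 0.00547007,   # Heart Disease = Yes: aspirin Yes, No
    0.73496842, 0.11747309),  # Heart Disease = No:  aspirin Yes, No
  nrow = 2,
  dimnames = list("Aspirin Intake" = c("Yes", "No"),
                  "Heart Disease"  = c("Yes", "No"))
)

# margin = 2 divides each cell by its column total,
# giving the aspirin share within each diagnosis group
cond <- prop.table(m, margin = 2)
round(cond * 100, 1)
```

With the original table object, the same result would come from prop.table(rq1_tbl1, margin = 2), since rq1_tbl1 has aspirin intake in rows and heart disease in columns.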
summarise(group_by(brfss2013, cvdinfr4), count = n(), perc_col = count/nrow(brfss2013)*100)
## # A tibble: 3 × 3
## cvdinfr4 count perc_col
## <fctr> <int> <dbl>
## 1 Yes 29284 5.9547557
## 2 No 459904 93.5191907
## 3 NA 2587 0.5260536
summarise(group_by(brfss2013, cvdcrhd4), count = n(), perc_col = count/nrow(brfss2013)*100)
## # A tibble: 3 × 3
## cvdcrhd4 count perc_col
## <fctr> <int> <dbl>
## 1 Yes 29064 5.910020
## 2 No 458288 93.190585
## 3 NA 4423 0.899395
summarise(group_by(brfss2013, rduchart), count = n(), perc_col = count/nrow(brfss2013)*100)
## # A tibble: 3 × 3
## rduchart count perc_col
## <fctr> <int> <dbl>
## 1 Yes 40145 8.163286
## 2 No 5595 1.137715
## 3 NA 446035 90.698999
summarise(group_by(brfss2013, cvdinfr4, cvdcrhd4, rduchart), count=n(), perc_col = count/nrow(brfss2013)*100)
## Source: local data frame [27 x 5]
## Groups: cvdinfr4, cvdcrhd4 [?]
##
## cvdinfr4 cvdcrhd4 rduchart count perc_col
## <fctr> <fctr> <fctr> <int> <dbl>
## 1 Yes Yes Yes 3352 0.68161253
## 2 Yes Yes No 96 0.01952112
## 3 Yes Yes NA 10505 2.13613949
## 4 Yes No Yes 2630 0.53479742
## 5 Yes No No 169 0.03436531
## 6 Yes No NA 11107 2.25855320
## 7 Yes NA Yes 285 0.05795333
## 8 Yes NA No 18 0.00366021
## 9 Yes NA NA 1122 0.22815312
## 10 No Yes Yes 2973 0.60454476
## # ... with 17 more rows
The variable cvdinfr4 records whether the respondent has ever been diagnosed with a heart attack, and cvdcrhd4 whether the respondent has ever been diagnosed with angina or coronary heart disease. The variable rduchart records whether the respondent takes aspirin to reduce the chance of a heart attack.
brfss2013_rq1 <- brfss2013 %>%
  mutate(hd_asp = ifelse(cvdcrhd4 == "Yes" & rduchart == "Yes", "Yes", "No")) %>%
  filter(!is.na(hd_asp), !is.na(cvdinfr4))
brfss2013_rq1 %>%
  group_by(hd_asp, cvdinfr4) %>%
  summarise(count = n(), perc_col = count / nrow(brfss2013_rq1) * 100)
## Source: local data frame [4 x 4]
## Groups: hd_asp [?]
##
## hd_asp cvdinfr4 count perc_col
## <chr> <fctr> <int> <dbl>
## 1 No Yes 14020 3.0261105
## 2 No No 442956 95.6086864
## 3 Yes Yes 3352 0.7235037
## 4 Yes No 2973 0.6416995
The newly added variable hd_asp records whether the respondent has been diagnosed with heart disease and takes aspirin to reduce the chance of a heart attack.
Research question 2: First, we need to define “heavy smoker”. We define heavy smokers as those who currently smoke every day and have smoked at least 100 cigarettes in their entire life. The variables used for this definition are smokday2 and smoke100, respectively. The variable cvdinfr4 records whether respondents have ever been diagnosed with a heart attack.
First, have a look at the structure and the summary of the variables at hand.
brfss2013 %>%
select(cvdinfr4, smokday2, smoke100) %>%
str()
## 'data.frame': 491775 obs. of 3 variables:
## $ cvdinfr4: Factor w/ 2 levels "Yes","No": 2 2 2 2 2 2 2 2 2 2 ...
## $ smokday2: Factor w/ 3 levels "Every day","Some days",..: 3 NA 2 NA 3 NA 3 1 NA NA ...
## $ smoke100: Factor w/ 2 levels "Yes","No": 1 2 1 2 1 2 1 1 2 2 ...
brfss2013 %>%
select(cvdinfr4, smokday2, smoke100) %>%
summary()
## cvdinfr4 smokday2 smoke100
## Yes : 29284 Every day : 55163 Yes :215201
## No :459904 Some days : 21494 No :261654
## NA's: 2587 Not at all:138135 NA's: 14920
## NA's :276983
The variables are all categorical: cvdinfr4 and smoke100 have “Yes” or “No” answers, and smokday2 has the levels “Every day”, “Some days” and “Not at all”. smokday2 has a large number of NA values (276,983, about 56% of respondents). Despite their number, the NA values carry no information for this question, so they are excluded from the analysis where possible.
The next step is to find the overall incidence of heart attack among adults aged 18 or older, as well as the incidence among the group of interest: those who smoke every day and have smoked at least 100 cigarettes in their entire life.
# Incidence of Heart Attack
summarise(group_by(brfss2013, cvdinfr4), count=n(), per_col = count/nrow(brfss2013)*100)
## # A tibble: 3 × 3
## cvdinfr4 count per_col
## <fctr> <int> <dbl>
## 1 Yes 29284 5.9547557
## 2 No 459904 93.5191907
## 3 NA 2587 0.5260536
ggplot(brfss2013, aes(x=cvdinfr4))+geom_bar()
# Drop NA answers explicitly (the factor has no "Don't know" level)
summarise(group_by(brfss2013 %>% filter(!is.na(cvdinfr4)), cvdinfr4), count=n(), per_col = count/nrow(brfss2013)*100)
## # A tibble: 2 × 3
## cvdinfr4 count per_col
## <fctr> <int> <dbl>
## 1 Yes 29284 5.954756
## 2 No 459904 93.519191
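A note on how the NA rows disappear from the filtered table above: filter() keeps only rows where the condition is TRUE, and any comparison involving NA evaluates to NA, so those rows are dropped whatever value is compared against. The sketch below shows this on a hypothetical four-row data frame and contrasts it with the explicit !is.na() form, which states the intent directly.

```r
library(dplyr)

# Hypothetical toy data: one missing answer, no "Don't know" level
toy <- data.frame(answer = factor(c("Yes", "No", NA, "Yes")))

# The comparison is NA for the missing row
toy$answer != "Don't know"

# Both filters drop the NA row; only the second says so explicitly
dropped_implicitly <- toy %>% filter(answer != "Don't know")
dropped_explicitly <- toy %>% filter(!is.na(answer))

nrow(dropped_implicitly)
nrow(dropped_explicitly)
```

Both calls leave three rows here. The explicit form is easier to read and does not depend on a level name that is absent from the factor.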
# Incidence of those smoking everyday
summarise(group_by(brfss2013, smokday2), count=n(), per_col = count/nrow(brfss2013)*100)
## # A tibble: 4 × 3
## smokday2 count per_col
## <fctr> <int> <dbl>
## 1 Every day 55163 11.217122
## 2 Some days 21494 4.370698
## 3 Not at all 138135 28.089065
## 4 NA 276983 56.323115
ggplot(brfss2013, aes(x=smokday2))+geom_bar()
summarise(group_by(brfss2013 %>% filter(!is.na(smokday2)), smokday2), count=n(), per_col = count/nrow(brfss2013)*100)
## # A tibble: 3 × 3
## smokday2 count per_col
## <fctr> <int> <dbl>
## 1 Every day 55163 11.217122
## 2 Some days 21494 4.370698
## 3 Not at all 138135 28.089065
# Incidence of those who have smoked 100+ cigarettes in their entire life
summarise(group_by(brfss2013, smoke100), count=n(), per_col = count/nrow(brfss2013)*100)
## # A tibble: 3 × 3
## smoke100 count per_col
## <fctr> <int> <dbl>
## 1 Yes 215201 43.760053
## 2 No 261654 53.206039
## 3 NA 14920 3.033908
ggplot(brfss2013, aes(x=smoke100))+geom_bar()
summarise(group_by(brfss2013 %>% filter(!is.na(smoke100)), smoke100), count=n(), per_col = count/nrow(brfss2013)*100)
## # A tibble: 2 × 3
## smoke100 count per_col
## <fctr> <int> <dbl>
## 1 Yes 215201 43.76005
## 2 No 261654 53.20604
As covered in research question 1, approximately 5.9% of adults aged 18 or older have ever been diagnosed with a heart attack. About 11.2% of adults currently smoke every day, and 43.8% have smoked at least 100 cigarettes in their entire life.
It seems that those who currently smoke every day have very likely smoked at least 100 cigarettes in their entire life, but we need to check whether this is true. We also need to create a new variable, heavy_smoke, that combines both variables in order to explore the research question.
brfss2013 <- brfss2013 %>% mutate(smk_every=ifelse(smokday2 == "Every day", "Yes", "No"))
summarise(group_by(brfss2013 %>% mutate(heavy_smoke=ifelse(smk_every == smoke100, "Heavy", "Not")), heavy_smoke), count=n(), per_col=count/nrow(brfss2013)*100)
## # A tibble: 3 × 3
## heavy_smoke count per_col
## <chr> <int> <dbl>
## 1 Heavy 55161 11.21671
## 2 Not 159629 32.45976
## 3 <NA> 276985 56.32352
summarise(group_by(brfss2013 %>% mutate(heavy_smoke=ifelse(smokday2 == "Every day" & smoke100 == "Yes", "Heavy", "Not")), heavy_smoke), count=n(), per_col=count/nrow(brfss2013)*100)
## # A tibble: 3 × 3
## heavy_smoke count per_col
## <chr> <int> <dbl>
## 1 Heavy 55161 11.216715
## 2 Not 421283 85.665802
## 3 <NA> 15331 3.117483
brfss2013 <- brfss2013 %>% mutate(heavy_smoke=ifelse(smokday2 == "Every day" & smoke100 == "Yes", "Heavy", "Not"))
ggplot(brfss2013, aes(x=heavy_smoke))+geom_bar()
According to the analysis, about 11.2% of respondents currently smoke every day and have smoked at least 100 cigarettes in their entire life. This matches the 11.2% who smoke every day, which agrees with the assumption made before.
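The two definitions of heavy_smoke give the same “Heavy” count but very different NA counts (276,985 vs. 15,331), and the difference equals the 261,654 respondents who answered “No” to smoke100. This follows from how R handles NA in logical expressions: NA & FALSE is FALSE, whereas NA == "No" stays NA. The sketch below reproduces both definitions on five hypothetical respondents; the vectors are made up for illustration.

```r
# Hypothetical respondents; the last one never occurs if smokday2
# is only asked of people who answered "Yes" to smoke100
smokday2 <- c("Every day", NA,   NA, "Some days", "Not at all")
smoke100 <- c("Yes",       "No", NA, "Yes",       "No")

# Definition 1: equality of two Yes/No flags
smk_every <- ifelse(smokday2 == "Every day", "Yes", "No")
def1 <- ifelse(smk_every == smoke100, "Heavy", "Not")

# Definition 2: logical AND of the two conditions
def2 <- ifelse(smokday2 == "Every day" & smoke100 == "Yes", "Heavy", "Not")

data.frame(smokday2, smoke100, def1, def2)
```

The second respondent is NA under the first definition and “Not” under the second. The fifth shows a weakness of the equality test: two “No” flags also compare equal and would be labelled “Heavy”, so the AND form is the safer definition.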
Now, let’s investigate whether heavy smokers are more likely than others to have been diagnosed with a heart attack.
summarise(group_by(brfss2013 %>% mutate(heavy_smoke=ifelse(smk_every == smoke100, "Heavy", "Not")), cvdinfr4, heavy_smoke), count=n(), per_col=count/nrow(brfss2013)*100)
## Source: local data frame [9 x 4]
## Groups: cvdinfr4 [?]
##
## cvdinfr4 heavy_smoke count per_col
## <fctr> <chr> <int> <dbl>
## 1 Yes Heavy 4016 0.81663362
## 2 Yes Not 14292 2.90620711
## 3 Yes <NA> 10976 2.23191500
## 4 No Heavy 50820 10.33399420
## 5 No Not 144426 29.36830868
## 6 No <NA> 264658 53.81688780
## 7 NA Heavy 325 0.06608713
## 8 NA Not 911 0.18524732
## 9 NA <NA> 1351 0.27471913
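The 0.8% figure is the share of all respondents who are both heavy smokers and have had a heart attack, so it cannot be compared directly with the overall incidence of 5.9%. Among heavy smokers with a known answer, 4,016 of 54,836 (about 7.3%) have been diagnosed with a heart attack, which is higher than the overall incidence. Because the data are observational, this is an association and not evidence of a causal effect.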
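To read the table as a rate within each smoking group, the counts can be divided by their group total instead of by the number of all respondents. The sketch below does this with dplyr, using the four counts from the table above that have no missing value; the data frame counts and the column pct_within_group are names chosen here for illustration.

```r
library(dplyr)

# Counts copied from the table above (rows with a missing value left out)
counts <- data.frame(
  cvdinfr4    = c("Yes", "Yes", "No", "No"),
  heavy_smoke = c("Heavy", "Not", "Heavy", "Not"),
  count       = c(4016, 14292, 50820, 144426)
)

# Share of heart attack diagnoses within each smoking group
rates <- counts %>%
  group_by(heavy_smoke) %>%
  mutate(pct_within_group = count / sum(count) * 100) %>%
  ungroup() %>%
  filter(cvdinfr4 == "Yes")

rates
```

Note that the “Not” group in this table contains only respondents with a recorded smokday2 answer, that is, people who smoke on some days or not at all; it is not the same as all non-smokers.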
Research question 3: First, heavy drinkers are defined as men having more than two drinks per day and women having more than one drink per day. Three calculated variables identify them: X_rfdrhv4 for males and females together, X_rfdrmn4 for males only and X_rfdrwm4 for females only. Binge drinkers are defined as males having five or more drinks on one occasion and females having four or more. This is calculated and stored in the variable X_rfbing5.
First, have a look at the structure and the summary of the variables at hand, along with a bar chart for each.
brfss2013 %>%
select(X_rfdrhv4, X_rfdrmn4, X_rfdrwm4, X_rfbing5) %>%
str()
## 'data.frame': 491775 obs. of 4 variables:
## $ X_rfdrhv4: Factor w/ 2 levels "No","Yes": 1 1 2 1 1 1 1 1 1 1 ...
## $ X_rfdrmn4: Factor w/ 2 levels "No","Yes": NA NA NA NA 1 NA NA NA 1 NA ...
## $ X_rfdrwm4: Factor w/ 2 levels "No","Yes": 1 1 2 1 NA 1 1 1 NA 1 ...
## $ X_rfbing5: Factor w/ 2 levels "No","Yes": 1 1 2 1 1 1 1 1 1 1 ...
brfss2013 %>%
select(X_rfdrhv4, X_rfdrmn4, X_rfdrwm4, X_rfbing5) %>%
summary()
## X_rfdrhv4 X_rfdrmn4 X_rfdrwm4 X_rfbing5
## No :442359 No :178553 No :263803 No :409195
## Yes : 25533 Yes : 11868 Yes : 13665 Yes : 58849
## NA's: 23883 NA's:301354 NA's:214307 NA's: 23731
ggplot(brfss2013, aes(x=X_rfdrhv4))+geom_bar()
ggplot(brfss2013, aes(x=X_rfdrmn4))+geom_bar()
ggplot(brfss2013, aes(x=X_rfdrwm4))+geom_bar()
ggplot(brfss2013, aes(x=X_rfbing5))+geom_bar()
All the variables are categorical, with “Yes” and “No” levels plus NA values. Now let’s find the overall incidence of heavy drinking and binge drinking.
# Incidence of heavy alcohol drinking
summarise(group_by(brfss2013 %>% filter(!is.na(X_rfdrhv4)), X_rfdrhv4), count=n(), per_col=count/nrow(brfss2013)*100)
## # A tibble: 2 × 3
## X_rfdrhv4 count per_col
## <fctr> <int> <dbl>
## 1 No 442359 89.951502
## 2 Yes 25533 5.192009
# Incidence of male heavy alcohol drinking
summarise(group_by(brfss2013 %>% filter(!is.na(X_rfdrmn4)), X_rfdrmn4), count=n(), per_col=count/nrow(brfss2013)*100)
## # A tibble: 2 × 3
## X_rfdrmn4 count per_col
## <fctr> <int> <dbl>
## 1 No 178553 36.307864
## 2 Yes 11868 2.413299
# Incidence of female heavy alcohol drinking
summarise(group_by(brfss2013 %>% filter(!is.na(X_rfdrwm4)), X_rfdrwm4), count=n(), per_col=count/nrow(brfss2013)*100)
## # A tibble: 2 × 3
## X_rfdrwm4 count per_col
## <fctr> <int> <dbl>
## 1 No 263803 53.64303
## 2 Yes 13665 2.77871
# Incidence of binge drinking
summarise(group_by(brfss2013 %>% filter(!is.na(X_rfbing5)), X_rfbing5), count=n(), per_col=count/nrow(brfss2013)*100)
## # A tibble: 2 × 3
## X_rfbing5 count per_col
## <fctr> <int> <dbl>
## 1 No 409195 83.20777
## 2 Yes 58849 11.96665
About 5.2% of respondents drink alcohol heavily. As shares of all respondents, male heavy drinkers are slightly fewer than female heavy drinkers, 2.4% vs. 2.8%. Within each sex, however, the rate is about 6.2% for men (11,868 of 190,421) and 4.9% for women (13,665 of 277,468), because the sample contains more women than men. About 12% of adults aged 18 or older binge drink.
To explore whether heavy drinkers are more likely to be binge drinkers, the next step is to build a contingency table combining “heavy drinking” and “binge drinking”. It is also useful to run the analysis separately by sex to see how heavy drinking relates to binge drinking for men and for women.
summarise(group_by(brfss2013 %>% filter(!is.na(X_rfbing5)), X_rfbing5, X_rfdrhv4), count=n(), per_col=count/nrow(brfss2013)*100)
## Source: local data frame [6 x 4]
## Groups: X_rfbing5 [?]
##
## X_rfbing5 X_rfdrhv4 count per_col
## <fctr> <fctr> <int> <dbl>
## 1 No No 400625 81.4651009
## 2 No Yes 6930 1.4091810
## 3 No NA 1640 0.3334858
## 4 Yes No 39809 8.0949621
## 5 Yes Yes 17826 3.6248284
## 6 Yes NA 1214 0.2468609
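Of all respondents, 3.6% are both heavy drinkers and binge drinkers, and 8.1% are binge drinkers who are not heavy drinkers. Conditioning on heavy drinking, 17,826 of 24,756 heavy drinkers (about 72%) binge drink, compared with 39,809 of 440,434 (about 9%) of those who are not heavy drinkers and about 12% overall. Heavy drinkers are therefore far more likely to binge drink on an occasion.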
summarise(group_by(brfss2013 %>% filter(!is.na(X_rfbing5)), X_rfbing5, X_rfdrmn4), count=n(), per_col=count/nrow(brfss2013)*100)
## Source: local data frame [6 x 4]
## Groups: X_rfbing5 [?]
##
## X_rfbing5 X_rfdrmn4 count per_col
## <fctr> <fctr> <int> <dbl>
## 1 No No 152827 31.0766102
## 2 No Yes 1774 0.3607341
## 3 No NA 254594 51.7704235
## 4 Yes No 24698 5.0222154
## 5 Yes Yes 9678 1.9679732
## 6 Yes NA 24473 4.9764628
summarise(group_by(brfss2013 %>% filter(!is.na(X_rfbing5)), X_rfbing5, X_rfdrwm4), count=n(), per_col=count/nrow(brfss2013)*100)
## Source: local data frame [6 x 4]
## Groups: X_rfbing5 [?]
##
## X_rfbing5 X_rfdrwm4 count per_col
## <fctr> <fctr> <int> <dbl>
## 1 No No 247797 50.388287
## 2 No Yes 5156 1.048447
## 3 No NA 156242 31.771034
## 4 Yes No 15111 3.072747
## 5 Yes Yes 8148 1.656855
## 6 Yes NA 35590 7.237049
brfss2013 <- brfss2013 %>%
mutate(bing_malehv=ifelse(X_rfbing5=="Yes" & X_rfdrmn4=="Yes", "Male Heavy & Binge", "Not"), bing_femalehv=ifelse(X_rfbing5=="Yes" & X_rfdrwm4=="Yes", "Female Heavy & Binge", "Not"))
ggplot(brfss2013, aes(x=bing_malehv))+geom_bar()
ggplot(brfss2013, aes(x=bing_femalehv))+geom_bar()
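As assumed in the research question, more male heavy drinkers than female heavy drinkers binge drink on an occasion: 2.0% vs. 1.7% of all respondents. Within each group, 9,678 of 11,452 male heavy drinkers (about 84.5%) binge drink, compared with 8,148 of 13,304 female heavy drinkers (about 61.2%).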
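The comparison between male and female heavy drinkers can also be expressed as a binge drinking rate within each group of heavy drinkers. The sketch below is a minimal base R version built from the counts in the three binge-by-heavy-drinking tables above, with NA rows omitted; the data frame heavy and its column names are chosen here for illustration.

```r
# Counts of heavy drinkers copied from the tables above (NA rows omitted)
heavy <- data.frame(
  group     = c("All", "Male", "Female"),
  binge_yes = c(17826, 9678, 8148),   # heavy drinkers who binge drink
  binge_no  = c(6930, 1774, 5156)     # heavy drinkers who do not
)

# Share of binge drinkers within each group of heavy drinkers
heavy$pct_binge <- round(
  heavy$binge_yes / (heavy$binge_yes + heavy$binge_no) * 100, 1
)
heavy
```

Dividing by the group total removes the effect of group size, which matters here because the sample contains more women than men.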