The aim of this Data Analysis Project is to investigate three questions using the BRFSS 2013 dataset.
The Behavioral Risk Factor Surveillance System (BRFSS) is an ongoing surveillance system designed to measure, uniformly and at state level, ‘preventive health practices’ and ‘behavioral risk factors’ in the non-institutionalized adult population (18 years and older) residing in the US. The BRFSS dataset is the result of an observational study, so NO causal inference can be drawn from this survey.
Thus, the most that can be inferred from it is an association between two or more variables, which can then be used to form a hypothesis to be verified through separate randomized experiments built on four principles: controlling, randomization, replication and blocking.
BRFSS conducts both land line telephone- and cellular telephone-based surveys. In conducting the BRFSS land line telephone survey, interviewers collect data from a randomly selected adult in a household. In conducting the cellular telephone version of the BRFSS questionnaire, interviewers collect data from an adult who participates by using a cellular telephone and resides in a private residence or college housing. The BRFSS questionnaire is comprised of an annual standard core, a biannual rotating core, optional modules, and state-added questions. In order to maintain consistency across states, the BRFSS has set standard protocols for data collection. These standardised protocols allow for state-to-state data comparison applicable uniformly on the entire population in the United States of America.
Several steps help reduce bias in the sample: standardised protocols, Disproportionate Stratified Sampling (DSS), sampling of land line numbers by sub-state geographic location, and a two-step weighting process consisting of ‘design’ weighting and ‘iterative proportional fitting’ weighting.
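The second weighting step, iterative proportional fitting (often called raking), can be illustrated with a minimal sketch. The sample table, the target margins and all object names below are made-up assumptions for illustration only; they are not the margins or the exact procedure used by BRFSS.

```r
# Toy sample counts by sex (rows) and age group (columns); all numbers are invented.
sample_tab <- matrix(c(30, 20,
                       10, 40),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(sex = c("Male", "Female"),
                                     age = c("18-44", "45+")))

# Hypothetical population margins the weighted sample should reproduce.
target_sex <- c(Male = 49, Female = 51)
target_age <- c("18-44" = 55, "45+" = 45)

raked <- sample_tab
for (i in 1:50) {
  # Scale each row so that the row totals match the sex margin.
  raked <- raked * (target_sex / rowSums(raked))
  # Scale each column so that the column totals match the age margin.
  raked <- sweep(raked, 2, target_age / colSums(raked), "*")
}

# Weight applied to every respondent falling in a given cell.
cell_weights <- raked / sample_tab
round(cell_weights, 3)
```

The loop alternates between the two margins until both are matched at once, which is why the fitting is called iterative.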
The COVID-19 pandemic has hit the world hard. It has taken a toll on the economy and on human life alike. But the biggest sufferers are those who are elderly or who suffer from co-morbid diseases. The most prevalent among these co-morbid diseases are diabetes and hypertension. As a keen observer and practitioner of public policy, I have attempted in this research paper, using BRFSS 2013 data and three separate sets of mandated questions, to draw an association between these two co-morbid conditions (diabetes and hypertension) and the following three variables:
income2, the income level. It has been recorded as a categorical variable in the data set.
educa, the education level, with 6 factor levels. It has been recorded as a categorical variable in the data set.
employ1, the employment status. The question posed in the BRFSS 2013 survey was: Are you currently…?
Further, within each set of questions, I have attempted to explore whether the results differ for males and females.
To identify those suffering from diabetes and hypertension (high BP), my ‘Data Analysis Research Project’ uses two variables, one from Main Section 5 on Hypertension Awareness and the other from Main Section 7 on Chronic Health Conditions. These are bpmeds, recording those ‘Currently taking Blood Pressure Medication’, and diabete3, recording those ‘(Ever Told) You Have Diabetes’.
Research Question 1: Does the proportion of respondents who have both diabetes and hypertension vary with income group? Does it increase or decrease with rising income? Within each of these groups, how does it vary between males and females? Knowing full well that the answers will not establish any causal relationship, can we identify any association between the explanatory variable (income2) and the response variables (diabete3, bpmeds and sex)?
Research Question 2: Does the proportion of respondents who have both diabetes and hypertension vary with level of education? The education levels are: College 4 years or more (College graduate); College 1 year to 3 years (Some college or technical school); Grade 12 or GED (High school graduate); Grades 9 through 11 (Some high school); Grades 1 through 8 (Elementary); Never attended school or only kindergarten. Does it increase or decrease with higher levels of education? Within each of these levels, how does it vary between males and females? Knowing full well that the answers will not establish any causal relationship, can we identify any association between the explanatory variable (educa) and the response variables (diabete3, bpmeds and sex)?
Research Question 3: Does the proportion of respondents who have both diabetes and hypertension vary with status of employment? Is it higher or lower for particular employment statuses? Is being out of work for a long period, being retired, or being unable to work associated with a higher proportion of hypertension and diabetic patients? Within each of these groups, how does it vary between males and females? Knowing full well that the answers will not establish any causal relationship, can we identify any association between the explanatory variable (employ1) and the response variables (diabete3, bpmeds and sex)?
This Data Analysis Project will attempt to answer the three questions above. The variables required to answer them are:
bpmeds, diabete3, income2, sex, employ1, educa
Since I am estimating the section of the population that suffers from diabetes and hypertension together, I have added a variable that counts only those respondents who answered “Yes” to both ‘Currently taking Blood Pressure Medication’ and ‘(Ever Told) You Have Diabetes’.
Thus, for diabetes, it does not capture pre-diabetes or borderline diabetes, or those females who were told “You Have Diabetes” only during pregnancy. Those answering “Yes” number 62,345 out of a total of 491,773 surveyed.
For blood pressure, my research paper captures only those who answered “Yes” when asked whether they are currently taking medicine for high blood pressure. They number 166,155 out of a total of 491,773 surveyed.
I have handled the NA data either by filtering them out or through the added indicator variable described above, of which only the value 1 is counted.
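How the added indicator treats missing answers can be seen on a few toy responses. The two vectors below are invented for illustration and are not BRFSS records; the expression itself is the one used throughout this analysis.

```r
# Toy answers; NA marks a missing response (made-up data).
bpmeds   <- c("Yes", "Yes", "No",  NA,    "Yes", NA)
diabete3 <- c("Yes", "No",  "Yes", "Yes", NA,    "No")

# Same construction as in the analysis: 1 only when both answers are "Yes".
both <- ifelse(bpmeds == "Yes" & diabete3 == "Yes", 1, 0)
both

# NA & TRUE is NA, so some entries stay NA; NA & FALSE is FALSE, so they become 0.
# Counting with %in% 1 ignores the NA entries instead of propagating them.
n_both <- sum(both %in% 1)
n_both

# The denominator n() in the analysis corresponds to the full length here,
# so respondents with missing answers still count towards the total.
n_both / length(both)
```

This is why the proportions reported below are proportions of everyone surveyed in the group, not only of those who answered both questions.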
Before moving to the specifics, a few general proportions should be estimated.
GQ.a <- brfss2013 %>%
select(bpmeds) %>%
mutate(High_BP = ifelse(bpmeds == "Yes", 1, 0)) %>%
summarise(Total = n(), High_BP = sum(High_BP %in% 1), High_BP_Prop = (High_BP / Total), .groups = 'drop')
GQ.a$High_BP_Prop # proportion of all those surveyed who are currently taking medicine for high blood pressure
## [1] 0.3379
Thus, 33.79% of those surveyed are currently on blood pressure medicine.
GQ.b <- brfss2013 %>%
select(diabete3) %>%
mutate(Diabetic = ifelse(diabete3 == "Yes", 1, 0)) %>%
summarise(Total = n(), Diabetic = sum(Diabetic %in% 1),
Diabetic_Prop = (Diabetic / Total), .groups = 'drop')
GQ.b$Diabetic_Prop # proportion of all those surveyed who have been told "You Have Diabetes"
## [1] 0.1268
Thus, 12.68% of those surveyed have been told “You Have Diabetes.”
GQ.c <- brfss2013 %>%
select(diabete3, bpmeds) %>%
mutate(HighBPDiab = ifelse(bpmeds == "Yes" &
diabete3 == "Yes", 1, 0)) %>%
summarise(Total = n(), HighBPDiab = sum(HighBPDiab %in% 1),
HighBPDiab_Prop = (HighBPDiab / Total), .groups = 'drop')
GQ.c$HighBPDiab_Prop # proportion of all those surveyed who answered "Yes" to both questions; everyone else is excluded
## [1] 0.08895
Thus, 8.90% of those surveyed have been told “You Have Diabetes” and are also currently taking medicine for high blood pressure.
GQ.d <- brfss2013 %>%
filter(!is.na(sex)) %>%
select(sex, diabete3, bpmeds) %>%
mutate(HighBPDiab = ifelse(bpmeds == "Yes" &
diabete3 == "Yes", 1, 0)) %>%
group_by(sex) %>%
summarise(Total = n(), HighBPDiab = sum(HighBPDiab %in% 1),
HighBPDiab_Prop = (HighBPDiab / Total), .groups = 'drop')
GQ.d
## # A tibble: 2 x 4
## sex Total HighBPDiab HighBPDiab_Prop
## <fct> <int> <int> <dbl>
## 1 Male 201313 18244 0.0906
## 2 Female 290455 25501 0.0878
Overall, 8.90% of those surveyed have diabetes and are also currently taking medicine for high blood pressure. Among males the figure is marginally higher, at 9.06%, and among females marginally lower, at 8.78%, even though in absolute numbers it is higher for females.
1) 33.79% of those surveyed are currently on blood pressure medicine.
2) 12.68% of those surveyed have been told “You Have Diabetes.”
3) 8.90% of those surveyed have been told “You Have Diabetes” and are also currently taking medicine for high blood pressure.
4) Among males this figure is marginally higher, at 9.06%, and among females marginally lower, at 8.78%, even though in absolute numbers it is higher for females.
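The contrast between rates and absolute numbers in the last point can be checked directly from the counts in the GQ.d table. The snippet below is a small sketch that re-enters those counts by hand; the object name sex_tab is an assumption introduced here.

```r
# Counts copied from the GQ.d output above.
sex_tab <- data.frame(
  sex   = c("Male", "Female"),
  total = c(201313, 290455),
  cases = c(18244, 25501)
)

# Rate within each sex: cases divided by respondents of that sex.
sex_tab$prop <- sex_tab$cases / sex_tab$total
sex_tab

# Females contribute more cases, but from a larger group of respondents,
# so their rate is lower. Difference in percentage points (male minus female):
diff_pp <- 100 * (sex_tab$prop[1] - sex_tab$prop[2])
round(diff_pp, 2)
```

These are unweighted sample proportions, so the difference describes the respondents rather than the US population.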
Research Question 1: Does the proportion of respondents who have both diabetes and hypertension vary with income group? Does it increase or decrease with rising income? Within each of these groups, how does it vary between males and females? Knowing full well that the answers will not establish any causal relationship, can we identify any association between the explanatory variable (income2) and the response variables (diabete3, bpmeds and sex)?
RQ1.1 <- brfss2013 %>%
filter(!is.na(sex), !is.na(income2)) %>%
select(sex, bpmeds, diabete3, income2) %>%
mutate(HighBPDiab = ifelse(bpmeds == "Yes" &
diabete3 == "Yes", 1, 0)) %>%
group_by(income2) %>%
summarise(Total = n(), HighBPDiab = sum(HighBPDiab %in% 1),
HighBPDiab_Prop = (HighBPDiab / Total), .groups = 'drop')
colnames(RQ1.1) <- c("Income_Groups", "Total", "HighBPDiab", "HighBPDiab_Prop")
RQ1.1
## # A tibble: 8 x 4
## Income_Groups Total HighBPDiab HighBPDiab_Prop
## <fct> <int> <int> <dbl>
## 1 Less than $10,000 25441 3627 0.143
## 2 Less than $15,000 26793 4146 0.155
## 3 Less than $20,000 34873 4605 0.132
## 4 Less than $25,000 41732 4895 0.117
## 5 Less than $35,000 48867 5105 0.104
## 6 Less than $50,000 61509 5236 0.0851
## 7 Less than $75,000 65231 4314 0.0661
## 8 $75,000 or more 115902 4955 0.0428
Except for the group earning below $10,000, as the income level goes up, the proportion of respondents with both high BP and diabetes comes down. This can also be seen clearly in the graph plotted below.
But before plotting a graph, let us calculate the overall proportion of those who suffer from both high BP and diabetes in our data set RQ1.1.
## [1] 0.08774
As can be seen, the proportion of persons in our RQ1.1 data set suffering from both hypertension and diabetes is 0.08774. We will draw it as a vertical dashed blue line on the graph to show the relative positions of the various groups.
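The chunk that produced this figure is not shown in the output, so the following is only a sketch of one way to obtain it: dividing the summed cases by the summed totals of the RQ1.1 table. The counts are re-entered by hand from that table, and the object name rq1 is an assumption.

```r
# Totals and cases per income group, copied from the RQ1.1 output above.
rq1 <- data.frame(
  income = c("Less than $10,000", "Less than $15,000", "Less than $20,000",
             "Less than $25,000", "Less than $35,000", "Less than $50,000",
             "Less than $75,000", "$75,000 or more"),
  total  = c(25441, 26793, 34873, 41732, 48867, 61509, 65231, 115902),
  cases  = c(3627, 4146, 4605, 4895, 5105, 5236, 4314, 4955)
)

# Overall proportion among respondents with a recorded income:
# all cases divided by all respondents, not the mean of the group proportions.
overall <- sum(rq1$cases) / sum(rq1$total)
round(overall, 5)
```

With the RQ1.1 tibble in memory, the same value would come from `sum(RQ1.1$HighBPDiab) / sum(RQ1.1$Total)`. It is slightly lower than the 0.08895 found for all respondents because rows with a missing income are filtered out.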
RQ1.1 <- brfss2013 %>%
filter(!is.na(sex), !is.na(income2)) %>%
select(sex, bpmeds, diabete3, income2) %>%
mutate(HighBPDiab = ifelse(bpmeds == "Yes" &
diabete3 == "Yes", 1, 0)) %>%
group_by(income2) %>%
summarise(Total = n(), HighBPDiab = sum(HighBPDiab %in% 1),
HighBPDiab_Prop = (HighBPDiab / Total), .groups = 'drop')
RQ1.1 %>% ggplot(aes(x = HighBPDiab_Prop,
y = income2)) +
geom_bar(stat = "identity",fill = "#FF6666") +
geom_vline(xintercept = .08774, size = 2, linetype = "dashed",
col = "blue") + scale_x_continuous(limits = c(0, .20)) +
scale_y_discrete(name = "Income",
labels = c("Less than $10,000" = "Less Than $10,000",
"Less than $15,000" = "$10,000 - $15,000",
"Less than $20,000" = "$15,000 - $20,000",
"Less than $25,000" = "$20,000 - $25,000",
"Less than $35,000" = "$25,000 - $35,000",
"Less than $50,000" = "$35,000 - $50,000",
"Less than $75,000" = "$50,000 - $75,000",
"$75,000 or more" = "More Than $75,000")) +
geom_text(aes(label = HighBPDiab),size = 4,
position = position_fill(vjust = .01)) + labs(title = "Graph (1.1) BP & Diabetic in
Income Groups",
y = "Income Groups", x = "Proportion of BP and Diabetic Patients") +
theme_economist()
As can be seen in Graph (1.1), the dashed blue line marks the overall proportion in the RQ1.1 data set, .08774. Once the income level reaches $35,000, the proportion of respondents with both BP and diabetes falls below this line and keeps falling. It is as low as 4.28% for the income group above $75,000, compared with 15.47% for the income group between $10,000 and $15,000. The numbers inside the graph denote the number of patients suffering from both high BP and diabetes within each group.
Let us now compare it among Males and Females within these income groups.
RQ1.2 <- brfss2013 %>%
filter(!is.na(sex), !is.na(income2)) %>%
select(sex, bpmeds, diabete3, income2) %>%
mutate(HighBPDiab = ifelse(bpmeds == "Yes" &
diabete3 == "Yes", 1, 0)) %>%
group_by(income2, sex) %>%
summarise(Total = n(), HighBPDiab = sum(HighBPDiab %in% 1),
HighBPDiab_Prop = (HighBPDiab / Total), .groups = 'drop')
RQ1.2
## # A tibble: 16 x 5
## income2 sex Total HighBPDiab HighBPDiab_Prop
## <fct> <fct> <int> <int> <dbl>
## 1 Less than $10,000 Male 8296 994 0.120
## 2 Less than $10,000 Female 17145 2633 0.154
## 3 Less than $15,000 Male 9207 1364 0.148
## 4 Less than $15,000 Female 17586 2782 0.158
## 5 Less than $20,000 Male 12562 1653 0.132
## 6 Less than $20,000 Female 22311 2952 0.132
## 7 Less than $25,000 Male 15734 1947 0.124
## 8 Less than $25,000 Female 25998 2948 0.113
## 9 Less than $35,000 Male 19628 2149 0.109
## 10 Less than $35,000 Female 29239 2956 0.101
## 11 Less than $50,000 Male 26817 2565 0.0956
## 12 Less than $50,000 Female 34692 2671 0.0770
## 13 Less than $75,000 Male 29405 2325 0.0791
## 14 Less than $75,000 Female 35826 1989 0.0555
## 15 $75,000 or more Male 56537 3111 0.0550
## 16 $75,000 or more Female 59365 1844 0.0311
RQ1.2 %>% ggplot(aes(fill = sex, y = income2,
x = HighBPDiab_Prop)) +
geom_bar(position = "fill", stat = "identity") +
geom_vline(xintercept = .5, size = 2, linetype = "dashed",
col = "blue") + geom_text(aes(label = HighBPDiab),size = 4,
position = position_fill(vjust = .01)) +
scale_y_discrete(name = "Income",
labels = c("Less than $10,000" = "< $10,000",
"Less than $15,000" = "$10,000 - $15,000",
"Less than $20,000" = "$15,000 - $20,000",
"Less than $25,000" = "$20,000 - $25,000",
"Less than $35,000" = "$25,000 - $35,000",
"Less than $50,000" = "$35,000 - $50,000",
"Less than $75,000" = "$50,000 - $75,000",
"$75,000 or more" = ">= $75,000")) +
labs(title = "Graph (1.2) BP & Diabetic in
Income Groups", y = "Income Groups", x = "Proportion of BP and Diabetic Patients") +
theme_economist()
As can be seen in Graph (1.2), the female share of the combined male and female proportions rises broadly as we move down the income groups. The numbers inside each bar denote the absolute numbers of males and females suffering from both diabetes and hypertension.
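Because the bars in Graph (1.2) use position = "fill", each bar shows the male and female proportions rescaled to sum to 1 within an income group. The sketch below reproduces that rescaling from the counts in the RQ1.2 table; the object and column names are assumptions introduced for illustration.

```r
# Counts copied from the RQ1.2 output above, lowest to highest income group.
inc <- data.frame(
  income       = c("Less than $10,000", "Less than $15,000", "Less than $20,000",
                   "Less than $25,000", "Less than $35,000", "Less than $50,000",
                   "Less than $75,000", "$75,000 or more"),
  male_total   = c(8296, 9207, 12562, 15734, 19628, 26817, 29405, 56537),
  male_cases   = c(994, 1364, 1653, 1947, 2149, 2565, 2325, 3111),
  female_total = c(17145, 17586, 22311, 25998, 29239, 34692, 35826, 59365),
  female_cases = c(2633, 2782, 2952, 2948, 2956, 2671, 1989, 1844)
)

# Rate within each sex and income group.
inc$male_prop   <- inc$male_cases / inc$male_total
inc$female_prop <- inc$female_cases / inc$female_total

# What a "fill" bar displays: the female rate as a share of both rates combined.
inc$female_share <- inc$female_prop / (inc$male_prop + inc$female_prop)
inc[, c("income", "male_prop", "female_prop", "female_share")]
```

A share above 0.5 means the female rate exceeds the male rate in that group, which is what the dashed line at 0.5 in the graph helps to read off.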
Research Question 2:
Does the proportion of respondents who have both diabetes and hypertension vary with level of education? The education levels are: College 4 years or more (College graduate); College 1 year to 3 years (Some college or technical school); Grade 12 or GED (High school graduate); Grades 9 through 11 (Some high school); Grades 1 through 8 (Elementary); Never attended school or only kindergarten. Does it increase or decrease with higher levels of education? Within each of these levels, how does it vary between males and females? Knowing full well that the answers will not establish any causal relationship, can we identify any association between the explanatory variable (educa) and the response variables (diabete3, bpmeds and sex)?
RQ2.1 <- brfss2013 %>%
filter(!is.na(sex), !is.na(educa)) %>%
select(sex, bpmeds,diabete3, educa) %>%
mutate(HighBPDiab =
ifelse(bpmeds == "Yes" &
diabete3 == "Yes", 1, 0)) %>%
group_by(educa) %>%
summarise(Total = n(), HighBPDiab = sum(HighBPDiab %in% 1),
HighBPDiab_Prop = (HighBPDiab / Total),
.groups = "drop") %>%
arrange(desc(HighBPDiab_Prop))
RQ2.1
## # A tibble: 6 x 4
## educa Total HighBPDiab HighBPDiab_Prop
## <fct> <int> <int> <dbl>
## 1 Grades 1 through 8 (Elementary) 13395 2357 0.176
## 2 Never attended school or only kindergarten 674 109 0.162
## 3 Grades 9 though 11 (Some high school) 28141 4127 0.147
## 4 Grade 12 or GED (High school graduate) 142971 15437 0.108
## 5 College 1 year to 3 years (Some college or ~ 134196 11795 0.0879
## 6 College 4 years or more (College graduate) 170120 9730 0.0572
As can be seen from the table, there is a strong association between increasing levels of education and a decreasing proportion of high BP and diabetes patients. For graduates with 4 or more years of college, the proportion is just .0572, whereas it is as high as .176 for those educated up to grades 1 through 8.
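Whether the proportion falls at every step up the education ladder can be checked by ordering the levels from lowest to highest and looking at the change between neighbouring levels. The sketch below re-enters the counts from the RQ2.1 table; the shortened level labels and object names are assumptions introduced here.

```r
# Counts copied from the RQ2.1 output above, ordered from lowest to highest education.
edu <- data.frame(
  level = c("Never attended / kindergarten", "Grades 1-8", "Grades 9-11",
            "Grade 12 or GED", "College 1-3 years", "College 4+ years"),
  total = c(674, 13395, 28141, 142971, 134196, 170120),
  cases = c(109, 2357, 4127, 15437, 11795, 9730)
)
edu$prop <- edu$cases / edu$total

# Change in proportion when moving up one education level.
step <- diff(edu$prop)
data.frame(from = edu$level[-nrow(edu)], to = edu$level[-1], change = round(step, 4))
```

Only the first step, from the very small group that never attended school to the elementary group, goes up; every later step goes down.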
But before plotting a graph, let us calculate the overall proportion of those who suffer from both high BP and diabetes in our data set RQ2.1.
## [1] 0.08898
As can be seen, the proportion of persons in our RQ2.1 data set suffering from both hypertension and diabetes is 0.08898. We will draw it as a vertical dashed blue line on the graph to show the relative positions of the various groups.
Let us try to plot it on graph (2.1).
RQ2.1 <- brfss2013 %>%
filter(!is.na(sex), !is.na(educa)) %>%
select(sex, bpmeds,diabete3, educa) %>%
mutate(HighBPDiab =
ifelse(bpmeds == "Yes" &
diabete3 == "Yes", 1, 0)) %>%
group_by(educa) %>%
summarise(Total = n(), HighBPDiab = sum(HighBPDiab %in% 1),
HighBPDiab_Prop = (HighBPDiab / Total),
.groups = "drop") %>%
arrange(desc(HighBPDiab_Prop))
RQ2.1 %>% ggplot(aes(x = HighBPDiab_Prop,
y = educa)) +
geom_bar(stat = "identity",fill = "#FF6666") +
geom_vline(xintercept = .08898, size = 2, linetype = "dashed",
col = "blue") + scale_x_continuous(limits = c(0, .20)) +
scale_y_discrete(labels = c("College 4 years or more (College graduate)" = "4yr or more College Graduate",
"College 1 year to 3 years (Some college or technical school)" = "1-3yr College or Technical Schl",
"Grade 12 or GED (High school graduate)" = "High School Graduate",
"Grades 9 though 11 (Some high school)" = "High School",
"Grades 1 through 8 (Elementary)" = "Elementary",
"Never attended school or only kindergarten" = "No School or Kindergarten")) +
geom_text(aes(label = HighBPDiab),size = 4,
position = position_fill(vjust = .01)) +
labs(title =
"Graph(2.1) BP&Diabetic
For Education Levels", y = "Education", x = "Proportion of BP and Diabetic Patients") +
theme_economist()
As can be seen in Graph (2.1), the dashed blue line marks the overall proportion in the RQ2.1 data set, .08898. As the education level decreases, the proportion of respondents with both BP and diabetes increases sharply. For graduates with 4 or more years of college, the proportion is just .0572, whereas it is as high as .176 for those educated up to grades 1 through 8. The numbers inside the graph denote the number of patients suffering from both high BP and diabetes within each group.
Let us now compare it among Males and Females within these education levels.
RQ2.2 <- brfss2013 %>%
filter(!is.na(sex), !is.na(educa)) %>%
select(sex, bpmeds,diabete3, educa) %>%
mutate(HighBPDiab =
ifelse(bpmeds == "Yes" &
diabete3 == "Yes", 1, 0)) %>%
group_by(educa, sex) %>%
summarise(Total = n(), HighBPDiab = sum(HighBPDiab %in% 1),
HighBPDiab_Prop = (HighBPDiab / Total),
.groups = "drop") %>%
arrange(desc(HighBPDiab_Prop))
RQ2.2 %>% ggplot(aes(fill = sex, y = educa,
x = HighBPDiab_Prop)) +
geom_bar(position = "fill", stat = "identity") +
geom_vline(xintercept = .5, size = 2, linetype = "dashed",
col = "blue") + geom_text(aes(label = HighBPDiab),size = 4,
position = position_fill(vjust = .01)) +
scale_y_discrete(labels = c("College 4 years or more (College graduate)" = "4yr+ College Graduate",
"College 1 year to 3 years (Some college or technical school)" = "1-3yr College/TSchl",
"Grade 12 or GED (High school graduate)" = "HS Graduate",
"Grades 9 though 11 (Some high school)" = "High School",
"Grades 1 through 8 (Elementary)" = "Elementary",
"Never attended school or only kindergarten" = "KG/No School")) +
labs(title =
"Graph(2.2) BP&Diabetic
For Education Levels", y = "Education", x = "Proportion of BP and Diabetic Patients") +
theme_economist()
As can be seen in Graph (2.2), as we slide down the education levels among the respondents, the proportion of female patients steadily goes up (except for the KG level, which has a very small total). It is as high as 19.4% at the elementary level and just 4.82% among those with the highest level of education. The numbers inside each bar denote the absolute numbers of males and females suffering from both diabetes and hypertension.
Research Question 3:
Does the proportion of respondents who have both diabetes and hypertension vary with status of employment? Is it higher or lower for particular employment statuses? Is being out of work for a long period, being retired, or being unable to work associated with a higher proportion of hypertension and diabetic patients? Within each of these groups, how does it vary between males and females? Knowing full well that the answers will not establish any causal relationship, can we identify any association between the explanatory variable (employ1) and the response variables (diabete3, bpmeds and sex)?
RQ3.1 <- brfss2013 %>%
filter(!is.na(sex), !is.na(employ1)) %>%
select(sex, bpmeds,diabete3, employ1) %>%
mutate(HighBPDiab =
ifelse(bpmeds == "Yes" &
diabete3 == "Yes", 1, 0)) %>%
group_by(employ1) %>%
summarise(Total = n(), HighBPDiab = sum(HighBPDiab %in% 1),
HighBPDiab_Prop = (HighBPDiab / Total),
.groups = "drop") %>%
arrange(desc(HighBPDiab_Prop))
RQ3.1
## # A tibble: 8 x 4
## employ1 Total HighBPDiab HighBPDiab_Prop
## <fct> <int> <int> <dbl>
## 1 Unable to work 37453 8256 0.220
## 2 Retired 138259 20936 0.151
## 3 Out of work for 1 year or more 14073 1142 0.0811
## 4 A homemaker 31646 2377 0.0751
## 5 Out of work for less than 1 year 12241 587 0.0480
## 6 Self-employed 39832 1739 0.0437
## 7 Employed for wages 202200 8348 0.0413
## 8 A student 12682 103 0.00812
As the table above clearly shows, the ‘unable to work’ group, with 22.04%, and ‘retired’ persons, with 15.14%, are the two groups that pull up the overall proportion of BP and diabetic patients.
But before plotting a graph, let us calculate the overall proportion of those who suffer from both high BP and diabetes in our data set RQ3.1.
## [1] 0.08904
As can be seen, the proportion of persons in our RQ3.1 data set suffering from both hypertension and diabetes is 0.08904. We will draw it as a vertical dashed blue line on the graph to show the relative positions of the various groups.
The ‘unable to work’ group, with 22.04%, and ‘retired’ persons, with 15.14%, are the two groups that lie well above the overall 8.90%. Let us examine this on the graph below.
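How far each employment group sits from the overall figure can be expressed as a ratio of the group proportion to the overall proportion. The sketch below re-enters the counts from the RQ3.1 table; the object name emp and the ratio column are assumptions introduced for illustration.

```r
# Counts copied from the RQ3.1 output above.
emp <- data.frame(
  status = c("Unable to work", "Retired", "Out of work for 1 year or more",
             "A homemaker", "Out of work for less than 1 year",
             "Self-employed", "Employed for wages", "A student"),
  total  = c(37453, 138259, 14073, 31646, 12241, 39832, 202200, 12682),
  cases  = c(8256, 20936, 1142, 2377, 587, 1739, 8348, 103)
)
emp$prop <- emp$cases / emp$total

# Overall proportion across all employment groups.
overall <- sum(emp$cases) / sum(emp$total)

# Ratio above 1 means the group lies to the right of the dashed line.
emp$ratio <- emp$prop / overall
emp[emp$ratio > 1, c("status", "prop", "ratio")]
```

Only two rows pass the filter, the ‘unable to work’ and ‘retired’ groups.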
RQ3.1 <- brfss2013 %>%
filter(!is.na(sex), !is.na(employ1)) %>%
select(sex, bpmeds,diabete3, employ1) %>%
mutate(HighBPDiab =
ifelse(bpmeds == "Yes" &
diabete3 == "Yes", 1, 0)) %>%
group_by(employ1) %>%
summarise(Total = n(), HighBPDiab = sum(HighBPDiab %in% 1),
HighBPDiab_Prop = (HighBPDiab / Total),
.groups = "drop") %>%
arrange(desc(HighBPDiab_Prop))
RQ3.1 %>% ggplot(aes(x = HighBPDiab_Prop,
y = employ1)) +
geom_bar(stat = "identity",fill = "#FF6666") +
geom_vline(xintercept = .08904, size = 2, linetype = "dashed",
col = "blue") + scale_x_continuous(limits = c(0, .23)) +
scale_y_discrete(labels = c("Unable to work" = "Unable to Work",
"Retired" = "Retired",
"Out of work for 1 year or more" = "Out of Work 1yr+",
"A homemaker" = "Homemaker",
"Out of work for less than 1 year" = "Out of Work less 1yr",
"Self-employed" = "Self-employed",
"Employed for wages" = "Employed for Wages",
"A student" = "student")) +
geom_text(aes(label = HighBPDiab),size = 4,
position = position_fill(vjust = .01)) +
labs(title ="Graph(3.1) BP&Diabetic
For Employment Status", y = "Employment Status", x = "Proportion of BP and Diabetic Patients") +
theme_economist()
Graph (3.1) clearly depicts how the ‘unable to work’ and ‘retired’ groups lie far above the overall proportion of diabetic and high BP patients. The dashed blue line depicts the overall proportion among all respondents, which is .08904. The numbers inside the graph denote the number of patients suffering from both high BP and diabetes within each group.
Let us now compare it among Males and Females within these Employment Status levels.
RQ3.2 <- brfss2013 %>%
filter(!is.na(sex), !is.na(employ1)) %>%
select(sex, bpmeds,diabete3, employ1) %>%
mutate(HighBPDiab =
ifelse(bpmeds == "Yes" &
diabete3 == "Yes", 1, 0)) %>%
group_by(employ1, sex) %>%
summarise(Total = n(), HighBPDiab = sum(HighBPDiab %in% 1),
HighBPDiab_Prop = (HighBPDiab / Total),
.groups = "drop")
RQ3.2
## # A tibble: 16 x 5
## employ1 sex Total HighBPDiab HighBPDiab_Prop
## <fct> <fct> <int> <int> <dbl>
## 1 Employed for wages Male 91055 3941 0.0433
## 2 Employed for wages Female 111145 4407 0.0397
## 3 Self-employed Male 23081 1193 0.0517
## 4 Self-employed Female 16751 546 0.0326
## 5 Out of work for 1 year or more Male 5830 506 0.0868
## 6 Out of work for 1 year or more Female 8243 636 0.0772
## 7 Out of work for less than 1 year Male 5709 264 0.0462
## 8 Out of work for less than 1 year Female 6532 323 0.0494
## 9 A homemaker Male 610 31 0.0508
## 10 A homemaker Female 31036 2346 0.0756
## 11 A student Male 5382 41 0.00762
## 12 A student Female 7300 62 0.00849
## 13 Retired Male 54893 9242 0.168
## 14 Retired Female 83366 11694 0.140
## 15 Unable to work Male 13367 2945 0.220
## 16 Unable to work Female 24086 5311 0.221
RQ3.2 <- brfss2013 %>%
filter(!is.na(sex), !is.na(employ1)) %>%
select(sex, bpmeds,diabete3, employ1) %>%
mutate(HighBPDiab =
ifelse(bpmeds == "Yes" &
diabete3 == "Yes", 1, 0)) %>%
group_by(employ1, sex) %>%
summarise(Total = n(), HighBPDiab = sum(HighBPDiab %in% 1),
HighBPDiab_Prop = (HighBPDiab / Total),
.groups = "drop") %>%
arrange(desc(HighBPDiab_Prop))
RQ3.2 %>% ggplot(aes(fill = sex, y = employ1,
x = HighBPDiab_Prop)) +
geom_bar(position = "fill", stat = "identity") +
geom_vline(xintercept = .5, size = 2, linetype = "dashed",
col = "blue") + geom_text(aes(label = HighBPDiab),size = 4,
position = position_fill(vjust = .01)) +
scale_y_discrete(labels = c("Unable to work" = "Unable to Work",
"Retired" = "Retired",
"Out of work for 1 year or more" = "Out of Work 1yr+",
"A homemaker" = "Homemaker",
"Out of work for less than 1 year" = "Out of Work less 1yr",
"Self-employed" = "Self-employed",
"Employed for wages" = "Employed for Wages",
"A student" = "student")) +
labs(title =
"Graph(3.2) BP&Diabetic
For Employment Status", y = "Employment Status", x = "Proportion of BP and Diabetic Patients") +
theme_economist()
Among homemakers, the proportion of females suffering from both high BP and diabetes is 0.0756 and among males it is 0.0508. The numbers inside each bar denote the absolute numbers of males and females suffering from both diabetes and hypertension.
The proportion of persons in our RQ3.1 data set suffering from both hypertension and diabetes is 0.08904. The ‘unable to work’ group, with 22.04%, and ‘retired’ persons, with 15.14%, are the two groups that lie well above this average.
Among homemakers, the proportion of females suffering from both high BP and diabetes is 0.0756 and among males it is 0.0508. The female rate in this group is thus noticeably higher than the male rate, although the male figure rests on only 610 respondents.
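To put the homemaker result in context, it helps to separate two things: how many of the patients in this group are female, and how many of the respondents in this group are female. The sketch below uses the homemaker rows of the RQ3.2 table; the object names are assumptions introduced here.

```r
# Homemaker rows copied from the RQ3.2 output above.
hm <- data.frame(
  sex   = c("Male", "Female"),
  total = c(610, 31036),
  cases = c(31, 2346)
)

# Rate within each sex.
hm$prop <- hm$cases / hm$total

# Female share of homemaker patients and of homemaker respondents.
female_case_share <- hm$cases[2] / sum(hm$cases)
female_resp_share <- hm$total[2] / sum(hm$total)

round(c(male_rate = hm$prop[1], female_rate = hm$prop[2],
        female_case_share = female_case_share,
        female_resp_share = female_resp_share), 4)
```

Almost all homemaker patients are female mainly because almost all homemaker respondents are female; the comparison of rates (0.0756 against 0.0508) is what carries the difference between the sexes.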