setwd("D:/stat")
getwd()
## [1] "D:/stat"
library(ggplot2)
library(dplyr)
load("brfss2013.Rdata")
Come up with at least three research questions that you want to answer using these data. You should phrase your research questions in a way that matches up with the scope of inference your dataset allows for. Make sure that at least two of these questions involve at least three variables. You are welcomed to create new variables based on existing ones. With each question include a brief discussion (1-2 sentences) as to why this question is of interest to you and/or your audience.
Research question 1: Do most men who have asthma also have diabetes?
(Some studies report that patients diagnosed with diabetes are at increased risk for asthma. Additionally, people with diabetes are more likely to have insulin resistance and metabolic syndrome, two conditions that can increase the risk of asthma.)
Research question 2: What percentage of women in Florida are divorced? (According to the Centers for Disease Control and Prevention, Florida's divorce rate in 2019 was 3.5 per 1,000 people. With this question, I want to see how the survey data compare with that figure.)
Research question 3: What percentages of men and women fall into each BMI category (Underweight, Normal Weight, Overweight, and Obese)? (Studies report that women tend to have more body fat than men.)
Perform exploratory data analysis (EDA) that addresses each of the three research questions you outlined above. Your EDA should contain numerical summaries and visualizations. Each R output and plot should be accompanied by a brief interpretation.
Research question 1: Do most men who have asthma also have diabetes?
str(select(brfss2013, sex, asthma3, diabete3))
## 'data.frame': 491775 obs. of 3 variables:
## $ sex : Factor w/ 2 levels "Male","Female": 2 2 2 2 1 2 2 2 1 2 ...
## $ asthma3 : Factor w/ 2 levels "Yes","No": 1 2 2 2 1 2 2 2 2 2 ...
## $ diabete3: Factor w/ 4 levels "Yes","Yes, but female told only during pregnancy",..: 3 3 3 3 3 3 3 3 3 3 ...
# Respondents (both sexes) who reported asthma or diabetes
asthmatic <- brfss2013 %>%
  filter(asthma3 == "Yes")
diabetic <- brfss2013 %>%
  filter(diabete3 == "Yes")
asthmatic2 <- nrow(asthmatic)
diabetic2 <- nrow(diabetic)
diabetic2
## [1] 62363
asthmatic2
## [1] 67204
# Men with asthma, broken down by diabetes status;
# rows with a missing diabetes status are dropped
brfss2013 %>%
  filter(sex == "Male", asthma3 == "Yes", !is.na(diabete3)) %>%
  group_by(sex, asthma3, diabete3) %>%
  summarise(count = n()) %>%
  mutate(Percentage = round((count / sum(count)) * 100, 2))
## `summarise()` has grouped output by 'sex', 'asthma3'. You can override using
## the `.groups` argument.
## # A tibble: 3 × 5
## # Groups: sex, asthma3 [1]
## sex asthma3 diabete3 count Percentage
## <fct> <fct> <fct> <int> <dbl>
## 1 Male Yes Yes 3395 15.4
## 2 Male Yes No 18203 82.6
## 3 Male Yes No, pre-diabetes or borderline diabetes 446 2.02
x = c(15.40, 84.60)
labels = c("Asthma w/ Diabetes 15.4%", "Asthma w/o Diabetes 84.6%")
pie(x, labels, col = c('yellow', 'black'), main = "Asthma and Diabetes Diagnostic Percentage of Male")
INTERPRETATION: The results show that most men with asthma do not have diabetes. Only 15.4% of the men who reported asthma also reported diabetes, while 84.6% did not (82.6% answered "No" and 2.02% reported pre-diabetes or borderline diabetes). Further research suggests that the effect of asthma on diabetes is not significant, except in patients with severe asthma.
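The two pie-chart values can also be derived from the counts in the summary table rather than typed by hand, which avoids transcription slips. The following is a minimal sketch, assuming the three counts reported above for men with asthma; the names `counts` and `shares` are illustrative and do not appear in the original analysis.

```r
# Counts for men with asthma, copied from the summary table above
counts <- c(
  "Yes" = 3395,
  "No" = 18203,
  "No, pre-diabetes or borderline diabetes" = 446
)

# Collapse both "No" categories into a single "without diabetes" group
shares <- c(
  with_diabetes    = counts[["Yes"]],
  without_diabetes = sum(counts) - counts[["Yes"]]
)
shares <- round(shares / sum(shares) * 100, 2)

# Build the labels from the computed shares so that text and chart always agree
labels <- paste0(c("Asthma w/ Diabetes ", "Asthma w/o Diabetes "), shares, "%")
pie(shares, labels, col = c("yellow", "black"),
    main = "Asthma and Diabetes Diagnostic Percentage of Male")
```

Because the labels are pasted together from `shares`, changing a filter upstream would update both the slices and their captions.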
Research question 2: What are the marital status percentages of women in Florida?
str(select(brfss2013, marital, sex, X_state))
## 'data.frame': 491775 obs. of 3 variables:
## $ marital: Factor w/ 6 levels "Married","Divorced",..: 2 1 1 1 1 2 1 3 1 1 ...
## $ sex : Factor w/ 2 levels "Male","Female": 2 2 2 2 1 2 2 2 1 2 ...
## $ X_state: Factor w/ 55 levels "0","Alabama",..: 2 2 2 2 2 2 2 2 2 2 ...
# Size of each subset on its own: known marital status, Florida, female
Marital <- brfss2013 %>%
  filter(!is.na(marital))
State <- brfss2013 %>%
  filter(X_state == "Florida")
Sex <- brfss2013 %>%
  filter(sex == "Female")
Marital1 <- nrow(Marital)
State1 <- nrow(State)
Sex1 <- nrow(Sex)
Marital1
## [1] 488355
State1
## [1] 33668
Sex1
## [1] 290455
# marital must be the last grouping variable: summarise() drops only the last
# group, so sum(count) then runs over all marital statuses of Florida women
brfss2013 %>%
  filter(sex == "Female", !is.na(marital), X_state == "Florida") %>%
  group_by(sex, X_state, marital) %>%
  summarise(count = n()) %>%
  mutate(percentage = round((count / sum(count)) * 100, 2))
## `summarise()` has grouped output by 'sex', 'X_state'. You can override using
## the `.groups` argument.
## # A tibble: 6 × 5
## # Groups: sex, X_state [1]
## sex X_state marital count percentage
## <fct> <fct> <fct> <int> <dbl>
## 1 Female Florida Married 9270 45.5
## 2 Female Florida Divorced 3340 16.4
## 3 Female Florida Widowed 4776 23.4
## 4 Female Florida Separated 611 3
## 5 Female Florida Never married 1882 9.23
## 6 Female Florida A member of an unmarried couple 500 2.45
x = c(45.49, 16.39, 23.44, 3.00, 9.23, 2.45)
labels = c("Married 45.49%", "Divorced 16.39%", "Widowed 23.44%", "Separated 3%", "Never married 9.23%", "Unmarried couple 2.45%")
pie(x, labels, col = c('orange', 'khaki', 'tan', 'gray', 'white', 'brown'), main = "Marital Status Percentage of Women in Florida")
INTERPRETATION: Of the 20,379 women in Florida who reported a marital
status, 3,340 (16.39%) are divorced.
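One detail matters when computing shares like these with dplyr: `summarise()` peels off only the last grouping variable, so the variable whose shares are wanted has to come last in `group_by()`. If it comes first, every remaining group holds a single row and each share comes out as 100. The sketch below illustrates this under stated assumptions: `women_fl` is a hypothetical stand-in rebuilt from the six counts reported above, not the BRFSS file itself, and `wrong` and `right` are illustrative names.

```r
library(dplyr)

status <- c("Married", "Divorced", "Widowed", "Separated",
            "Never married", "A member of an unmarried couple")
counts <- c(9270, 3340, 4776, 611, 1882, 500)

# One row per respondent, rebuilt from the reported counts (illustration only)
women_fl <- data.frame(
  sex     = "Female",
  X_state = "Florida",
  marital = factor(rep(status, times = counts), levels = status)
)

# marital first: after summarise() the data are still grouped by marital and
# sex, so each group contains one row and every share equals 100
wrong <- women_fl %>%
  group_by(marital, sex, X_state) %>%
  summarise(count = n(), .groups = "drop_last") %>%
  mutate(percentage = round(count / sum(count) * 100, 2))

# marital last: sum(count) now runs over all six statuses
right <- women_fl %>%
  group_by(sex, X_state, marital) %>%
  summarise(count = n(), .groups = "drop_last") %>%
  mutate(percentage = round(count / sum(count) * 100, 2))
```

Setting `.groups = "drop_last"` spells out dplyr's default behaviour and suppresses the informational message; calling `ungroup()` before `mutate()` would be another way to get shares of the overall total.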
Research question 3: What percentages of men and women fall into each BMI category (Underweight, Normal Weight, Overweight, and Obese)?
str(select(brfss2013, sex, X_bmi5cat))
## 'data.frame': 491775 obs. of 2 variables:
## $ sex : Factor w/ 2 levels "Male","Female": 2 2 2 2 1 2 2 2 1 2 ...
## $ X_bmi5cat: Factor w/ 4 levels "Underweight",..: 4 1 3 2 4 4 2 NA 4 3 ...
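# Share of men and women within each BMI category
# (percentages sum to 100 inside each category, not inside each sex)
brfss2013 %>%
  filter(!is.na(X_bmi5cat), !is.na(sex)) %>%
  group_by(X_bmi5cat, sex) %>%
  summarise(count = n()) %>%
  mutate(Percentage = round((count / sum(count)) * 100, 2))
## `summarise()` has grouped output by 'X_bmi5cat'. You can override using the
## `.groups` argument.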
## # A tibble: 8 × 4
## # Groups: X_bmi5cat [4]
## X_bmi5cat sex count Percentage
## <fct> <fct> <int> <dbl>
## 1 Underweight Male 1907 23.1
## 2 Underweight Female 6359 76.9
## 3 Normal weight Male 53045 34.2
## 4 Normal weight Female 101852 65.8
## 5 Overweight Male 84759 50.7
## 6 Overweight Female 82325 49.3
## 7 Obese Male 57494 42.6
## 8 Obese Female 77305 57.4
x = c(23.07, 76.93)
labels = c("Male 23.07%", "Female 76.93%")
pie(x, labels, col = c('tan', 'maroon'), main = "Underweight")
x = c(34.25, 65.75)
labels = c("Male 34.25%", "Female 65.75%")
pie(x, labels, col = c('white', 'khaki'), main = "Normal Weight")
x = c(50.73, 49.27)
labels = c("Male 50.73%", "Female 49.27%")
pie(x, labels, col = c('violet', 'pink'), main = "Overweight")
x = c(42.65, 57.35)
labels = c("Male 42.65%", "Female 57.35%")
pie(x, labels, col = c('gray', 'black'), main = "Obese")
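INTERPRETATION: Women make up the larger share of the Underweight
(76.93%), Normal Weight (65.75%), and Obese (57.35%) groups, while men
hold a slight majority of the Overweight group (50.73%). These shares are
computed within each BMI category, and the sample contains more women
than men (290,455 of the 491,775 respondents are female), so they partly
reflect the makeup of the sample. On their own, these data therefore do
not establish that women tend to have more body fat than men.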
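Because the sample has more women than men, it can help to look at the distribution of BMI categories within each sex as well as the sex split within each category. The following is a minimal sketch using base R's `prop.table()` on the eight counts reported above; the matrix `counts` and the names `within_category` and `within_sex` are illustrative, not part of the original analysis.

```r
# Counts copied from the summary table above (rows: BMI category, columns: sex)
counts <- matrix(
  c(1907, 53045, 84759, 57494,     # Male
    6359, 101852, 82325, 77305),   # Female
  nrow = 4,
  dimnames = list(
    bmi = c("Underweight", "Normal weight", "Overweight", "Obese"),
    sex = c("Male", "Female")
  )
)

# Sex split within each BMI category (each row sums to 100),
# the quantity shown in the four pie charts
within_category <- round(prop.table(counts, margin = 1) * 100, 2)

# BMI distribution within each sex (each column sums to 100),
# which does not depend on how many men and women were sampled
within_sex <- round(prop.table(counts, margin = 2) * 100, 2)

within_category
within_sex
```

With these counts, `within_sex` places about 43% of men and about 31% of women in the Overweight group, and about 29% of each sex in the Obese group, a different picture from the within-category shares.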