Setup

setwd("D:/stat")
getwd()
## [1] "D:/stat"

Load packages

library(ggplot2)
library(dplyr)

Load data

load("brfss2013.Rdata")

Part 1: Research questions

Come up with at least three research questions that you want to answer using these data. You should phrase your research questions in a way that matches up with the scope of inference your dataset allows for. Make sure that at least two of these questions involve at least three variables. You are welcome to create new variables based on existing ones. With each question include a brief discussion (1-2 sentences) as to why this question is of interest to you and/or your audience.

Research question 1: Do most men who have asthma also have diabetes?

(Some studies report that patients diagnosed with diabetes are at increased risk of asthma. Additionally, people with diabetes have increased insulin resistance and metabolic syndrome, two conditions that can increase the risk of asthma.)

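Research question 2: What percentage of women in Florida are divorced?

(According to the Centers for Disease Control and Prevention, 3.5 per 1,000 people in Florida got divorced in 2019. In this question, I want to see what share of female respondents in Florida report being divorced in the given data. Note that the CDC figure is a yearly divorce rate, whereas this dataset records current marital status, so the two numbers are not directly comparable.)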

Research question 3: What percentage of respondents in each BMI category (Underweight, Normal weight, Overweight, and Obese) are male and female?

(Studies suggest that women tend to have more body fat than men.)

Part 3: Exploratory data analysis

Perform exploratory data analysis (EDA) that addresses each of the three research questions you outlined above. Your EDA should contain numerical summaries and visualizations. Each R output and plot should be accompanied by a brief interpretation.

Research question 1: Do most men who have asthma also have diabetes?

str(select(brfss2013,sex,asthma3,diabete3))
## 'data.frame':    491775 obs. of  3 variables:
##  $ sex     : Factor w/ 2 levels "Male","Female": 2 2 2 2 1 2 2 2 1 2 ...
##  $ asthma3 : Factor w/ 2 levels "Yes","No": 1 2 2 2 1 2 2 2 2 2 ...
##  $ diabete3: Factor w/ 4 levels "Yes","Yes, but female told only during pregnancy",..: 3 3 3 3 3 3 3 3 3 3 ...
asthmatic<- brfss2013 %>%
  filter(asthma3 == "Yes")
diabetic<- brfss2013 %>%
  filter(diabete3 == "Yes")
asthmatic2<-nrow(asthmatic)
diabetic2<-nrow(diabetic)
diabetic2
## [1] 62363
asthmatic2
## [1] 67204
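# Keep men with asthma whose diabetes status is known.
# is.na() tests for missing values directly; comparing against the string "NA" only works by accident.
brfss2013 %>% 
  filter(sex == "Male", asthma3 == "Yes", !is.na(diabete3)) %>%
  group_by(sex, asthma3, diabete3) %>%
  summarise(count = n()) %>%
  mutate(Percentage = round((count / sum(count)) * 100, 2))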
## `summarise()` has grouped output by 'sex', 'asthma3'. You can override using
## the `.groups` argument.
## # A tibble: 3 × 5
## # Groups:   sex, asthma3 [1]
##   sex   asthma3 diabete3                                count Percentage
##   <fct> <fct>   <fct>                                   <int>      <dbl>
## 1 Male  Yes     Yes                                      3395      15.4 
## 2 Male  Yes     No                                      18203      82.6 
## 3 Male  Yes     No, pre-diabetes or borderline diabetes   446       2.02
x = c(15.40,84.60)
labels = c("Asthma w/ Diabetes 15.4%","Asthma w/o Diabetes 84.6%") 

pie(x,labels, col= c('yellow', 'black'), main = "Asthma and Diabetes Diagnostic Percentage of Male") 

INTERPRETATION: The data above show that most men with asthma do not have diabetes: only 15.4% of male respondents with asthma reported a diabetes diagnosis, 82.6% reported none, and 2.02% reported pre-diabetes or borderline diabetes. Further research suggests that the effect of asthma on diabetes is not significant, except in patients with severe asthma.
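
The pie chart values above were typed in by hand. As a minimal sketch, the same slices can be derived from the counts in the summary table, which avoids transcription slips. The names `counts`, `slices`, and `pct` are illustrative and not part of the original analysis; the counts are copied from the output above.

```r
# Counts copied from the summary table above (men with asthma, by diabetes status).
counts <- c("Yes" = 3395, "No" = 18203, "Pre-diabetes or borderline" = 446)

# Collapse into the two slices used in the pie chart:
# a diabetes diagnosis versus everything else.
slices <- c(
  "Asthma w/ Diabetes"  = counts[["Yes"]],
  "Asthma w/o Diabetes" = sum(counts) - counts[["Yes"]]
)

# Percentages rounded to one decimal place, with labels built from them.
pct <- round(100 * slices / sum(slices), 1)
labels <- paste0(names(pct), " ", pct, "%")

pie(pct, labels = labels, col = c("yellow", "black"),
    main = "Asthma and Diabetes Diagnostic Percentage of Male")
```

Because the labels are built from `pct`, the chart and its captions stay consistent if the underlying counts change.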

Research question 2: What percentage of women in Florida are divorced?

str(select(brfss2013, marital, sex, X_state))
## 'data.frame':    491775 obs. of  3 variables:
##  $ marital: Factor w/ 6 levels "Married","Divorced",..: 2 1 1 1 1 2 1 3 1 1 ...
##  $ sex    : Factor w/ 2 levels "Male","Female": 2 2 2 2 1 2 2 2 1 2 ...
##  $ X_state: Factor w/ 55 levels "0","Alabama",..: 2 2 2 2 2 2 2 2 2 2 ...
Marital<- brfss2013 %>%
  filter(!is.na(marital))
State<- brfss2013 %>%
  filter(X_state == "Florida")
Sex<- brfss2013 %>%
  filter(sex == "Female")
Marital1<-nrow(Marital)
State1<-nrow(State)
Sex1<-nrow(Sex)
Marital1
## [1] 488355
State1
## [1] 33668
Sex1
## [1] 290455
brfss2013 %>% 
  filter(sex == "Female", marital !="NA", X_state == "Florida") %>%
  group_by(marital, sex, X_state) %>%
  summarise(count=n())%>%
  mutate(percentage=round((count/sum(count))*100))
## `summarise()` has grouped output by 'marital', 'sex'. You can override using
## the `.groups` argument.
## # A tibble: 6 × 5
## # Groups:   marital, sex [6]
##   marital                         sex    X_state count percentage
##   <fct>                           <fct>  <fct>   <int>      <dbl>
## 1 Married                         Female Florida  9270        100
## 2 Divorced                        Female Florida  3340        100
## 3 Widowed                         Female Florida  4776        100
## 4 Separated                       Female Florida   611        100
## 5 Never married                   Female Florida  1882        100
## 6 A member of an unmarried couple Female Florida   500        100
x = c(45.49,16.39,23.44,3.00,9.23,2.45)
labels = c("Married 45.49%","Divorced 16.39%","Widowed 23.44%","Separated 3%","Never married 9.23%","Unmarried couple 2.45%") 
pie(x,labels, col= c('orange', 'khaki','tan','gray','white','brown'), main = "Marital Status Percentage of Women in Florida") 

INTERPRETATION: Of the 20,379 women in Florida who reported a marital status, 3,340 (16.39%) are divorced.
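
The percentage column in the tibble above reads 100 in every row. The likely reason is that summarise() drops only the last grouping variable (X_state), so the result is still grouped by marital and sex. Each group then holds a single row, and count/sum(count) is always 1. The sketch below recomputes the shares from the printed counts with no grouping in effect. The data frame `florida_women` is a hypothetical stand-in built from the table, not the original pipeline.

```r
# Counts copied from the table above (women in Florida, by marital status).
florida_women <- data.frame(
  marital = c("Married", "Divorced", "Widowed", "Separated",
              "Never married", "A member of an unmarried couple"),
  count   = c(9270, 3340, 4776, 611, 1882, 500)
)

# No grouping is involved here, so sum() runs over all six rows.
florida_women$percentage <-
  round(100 * florida_women$count / sum(florida_women$count), 2)

florida_women

# On the full data, one option (an assumption, not run here) would be:
# brfss2013 %>%
#   filter(sex == "Female", X_state == "Florida", !is.na(marital)) %>%
#   count(marital, name = "count") %>%
#   mutate(percentage = round(100 * count / sum(count), 2))
```

Calling ungroup() before mutate() in the original pipeline would be another way to get the same effect.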

Research question 3: What percentage of respondents in each BMI category (Underweight, Normal weight, Overweight, and Obese) are male and female?

str(select(brfss2013,sex,X_bmi5cat))
## 'data.frame':    491775 obs. of  2 variables:
##  $ sex      : Factor w/ 2 levels "Male","Female": 2 2 2 2 1 2 2 2 1 2 ...
##  $ X_bmi5cat: Factor w/ 4 levels "Underweight",..: 4 1 3 2 4 4 2 NA 4 3 ...
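# Drop respondents with a missing BMI category or sex, then compute
# the share of each sex within every BMI category.
brfss2013 %>%
  filter(!is.na(X_bmi5cat), !is.na(sex)) %>%
  group_by(X_bmi5cat, sex) %>%
  summarise(count = n()) %>%
  mutate(Percentage = round((count / sum(count)) * 100, 2))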
## `summarise()` has grouped output by 'X_bmi5cat'. You can override using the
## `.groups` argument.
## # A tibble: 8 × 4
## # Groups:   X_bmi5cat [4]
##   X_bmi5cat     sex     count Percentage
##   <fct>         <fct>   <int>      <dbl>
## 1 Underweight   Male     1907       23.1
## 2 Underweight   Female   6359       76.9
## 3 Normal weight Male    53045       34.2
## 4 Normal weight Female 101852       65.8
## 5 Overweight    Male    84759       50.7
## 6 Overweight    Female  82325       49.3
## 7 Obese         Male    57494       42.6
## 8 Obese         Female  77305       57.4
x = c(23.07,76.93)
labels = c("Male 23.07%","Female 76.93%") 

pie(x,labels, col= c('tan', 'maroon'), main = "Underweight")

x = c(34.25,65.75)
labels = c("Male 34.25%","Female 65.75%") 

pie(x,labels, col= c('white', 'khaki'), main = "Normal Weight")

x = c(50.73,49.27)
labels = c("Male 50.73%","Female 49.27%") 

pie(x,labels, col= c('violet', 'pink'), main = "Overweight")

x = c(42.65,57.35)
labels = c("Male 42.65%","Female 57.35%") 

pie(x,labels, col= c('gray', 'black'), main = "Obese")

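INTERPRETATION: The data above show that women make up the larger share of the Underweight (76.9%), Normal weight (65.8%), and Obese (57.4%) categories, while men make up a slightly larger share of the Overweight category (50.7%). However, the sample contains more women than men (290,455 of 491,775 respondents are female), so these within-category shares do not by themselves show that women tend to have more body fat than men. That would require comparing the distribution of BMI categories within each sex.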
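
The table above reports the share of each sex within a BMI category. A complementary view is the share of each BMI category within a sex, which is not affected by the sample containing more women than men. The sketch below computes both from the printed counts using base R's prop.table(). The object names are illustrative, and the counts are copied from the output above.

```r
# Counts copied from the table above: rows are BMI categories, columns are sex.
# matrix() fills column by column, so the first four values are the Male column.
bmi_counts <- matrix(
  c(1907, 53045, 84759, 57494,     # Male
    6359, 101852, 82325, 77305),   # Female
  ncol = 2,
  dimnames = list(
    X_bmi5cat = c("Underweight", "Normal weight", "Overweight", "Obese"),
    sex       = c("Male", "Female")
  )
)

# margin = 1: share of each sex within a BMI category (what the table above reports).
within_category <- round(100 * prop.table(bmi_counts, margin = 1), 2)

# margin = 2: share of each BMI category within a sex.
within_sex <- round(100 * prop.table(bmi_counts, margin = 2), 2)

within_category
within_sex
```

Comparing the two tables shows how much a conclusion depends on which total each percentage is taken from.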