Exploring the BRFSS data

Part 3: Exploratory data analysis

Perform exploratory data analysis (EDA) that addresses each of the three research questions you outlined above. Your EDA should contain numerical summaries and visualizations. Each R output and plot should be accompanied by a brief interpretation.

Research question 1:

str(select(brfss2013,asthma3,smoke100))

## 'data.frame':    491775 obs. of  2 variables:
##  $ asthma3 : Factor w/ 2 levels "Yes","No": 1 2 2 2 1 2 2 2 2 2 ...
##  $ smoke100: Factor w/ 2 levels "Yes","No": 1 2 1 2 1 2 1 1 2 2 ...
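`str()` shows the factor levels but not how many answers are missing. That count matters for the percentage denominators used below. `table()` with `useNA = "ifany"` reports the missing count alongside the levels. Here is a sketch on hypothetical toy data that stands in for the real `smoke100` column:

```r
# Hypothetical toy stand-in for the smoke100 column
smoke100 <- factor(c("Yes", "No", NA, "No", "Yes", NA), levels = c("Yes", "No"))

# Counts per level, plus an extra <NA> cell for missing answers
tab <- table(smoke100, useNA = "ifany")
tab
```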

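# All respondents who answered Yes to asthma3 (ever told they had asthma);
# this count is the denominator for the percentages below
asthmatic <- brfss2013 %>%
  filter(asthma3 == "Yes")

asthmatic2 <- nrow(asthmatic)

# Asthmatic respondents by smoke100 answer (missing answers dropped)
brfss2013 %>%
  filter(asthma3 == "Yes", !is.na(smoke100)) %>%
  group_by(asthma3, smoke100) %>%
  summarise(count = n(), percentage = n() * 100 / asthmatic2)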

## `summarise()` has grouped output by 'asthma3'. You can override using the
## `.groups` argument.

## # A tibble: 2 × 4
## # Groups:   asthma3 [1]
##   asthma3 smoke100 count percentage
##   <fct>   <fct>    <int>      <dbl>
## 1 Yes     Yes      32263       48.0
## 2 Yes     No       33148       49.3
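Missing answers in these factor columns are real `NA` values, as the `NA` entries in the `str()` output for income2 and arthdis2 show. They are not the text "NA". `filter()` keeps only rows whose condition is `TRUE`, and a comparison involving `NA` yields `NA`. Missing rows are therefore dropped whether the filter compares against the string "NA" or uses `!is.na()`, but `!is.na()` states the intent directly. A small sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data: one asthmatic respondent has no smoke100 answer
toy <- data.frame(
  asthma3  = factor(c("Yes", "Yes", "Yes", "No"), levels = c("Yes", "No")),
  smoke100 = factor(c("Yes", "No", NA, "Yes"), levels = c("Yes", "No"))
)

# Comparing with the string "NA" gives NA (not TRUE) for the missing row,
# so filter() drops it; !is.na() expresses the same thing explicitly
by_string <- toy %>% filter(asthma3 == "Yes", smoke100 != "NA")
by_is_na  <- toy %>% filter(asthma3 == "Yes", !is.na(smoke100))

nrow(by_string)  # 2
nrow(by_is_na)   # 2
```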

Graph

# Asthmatic respondents who answered Yes / No to smoke100
# (an equality test already excludes missing answers)
Asmoke <- brfss2013 %>%
  filter(asthma3 == "Yes", smoke100 == "Yes")

AS <- nrow(Asmoke)

Nsmoke <- brfss2013 %>%
  filter(asthma3 == "Yes", smoke100 == "No")

AN <- nrow(Nsmoke)

x <- c(AS, AN)
labels <- c("Smokers", "Nonsmokers")

# Pass labels so the slices are named instead of numbered 1 and 2
pie(x, labels = labels, main = "Asthmatic People", col = rainbow(length(x)))
legend("topright", labels, cex = 0.8, fill = rainbow(length(x)))
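The interpretation quotes percentages, so it can help to print them on the slices. The sketch below uses the two counts from the summary table above (32,263 Yes and 33,148 No). The slice shares are computed out of the 65,411 respondents who answered smoke100. For that reason they differ slightly from the table's percentages, which divide by all asthmatic respondents.

```r
# Counts taken from the summary table above
x <- c(32263, 33148)
labels <- c("Smokers", "Nonsmokers")

# Share of each slice among asthmatic respondents who answered smoke100
pct <- round(100 * x / sum(x), 1)
slice_labels <- paste0(labels, " (", pct, "%)")

pie(x, labels = slice_labels, main = "Asthmatic People",
    col = rainbow(length(x)))
```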

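Interpretation: Among respondents who have been told they have asthma, 49.3% have not smoked at least 100 cigarettes in their lifetime (smoke100 = No) and 48.0% have (smoke100 = Yes). The remaining 2.7% have no recorded smoke100 answer. Because the two groups are almost the same size, we cannot say that most asthmatic people avoid smoking.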
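The table's percentages divide by every asthmatic respondent, including those with no smoke100 answer. This is why they add up to less than 100. If the share among respondents who did answer is wanted instead, the denominator can be taken from the filtered rows themselves. Here is a sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data: four asthmatic respondents, one with no smoke100 answer
toy <- data.frame(
  asthma3  = factor(c("Yes", "Yes", "Yes", "Yes", "No"), levels = c("Yes", "No")),
  smoke100 = factor(c("Yes", "No", "No", NA, "Yes"), levels = c("Yes", "No"))
)

result <- toy %>%
  filter(asthma3 == "Yes", !is.na(smoke100)) %>%
  count(smoke100, name = "count") %>%
  # sum(count) is the number of asthmatic respondents who answered smoke100
  mutate(percentage = 100 * count / sum(count))

result
```

This gives about 33.3% Yes and 66.7% No. With the table's formula, the same toy data would give 25% and 50%, because the denominator would be all 4 asthmatic rows.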

Research question 2:

str(select(brfss2013,genhlth,checkup1,income2))

## 'data.frame':    491775 obs. of  3 variables:
##  $ genhlth : Factor w/ 5 levels "Excellent","Very good",..: 4 3 3 2 3 2 4 3 1 3 ...
##  $ checkup1: Factor w/ 5 levels "Within past year",..: 1 1 1 2 4 1 1 1 1 1 ...
##  $ income2 : Factor w/ 8 levels "Less than $10,000",..: 7 8 8 7 6 8 NA 6 8 4 ...

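# Respondents with income below $10,000 who have never had a routine checkup;
# this count is the denominator for the percentages below
lincome <- brfss2013 %>%
  filter(income2 == "Less than $10,000", checkup1 == "Never")

lincome2 <- nrow(lincome)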

lincome2

## [1] 405

# General health within that group (missing genhlth answers dropped)
brfss2013 %>%
  filter(income2 == "Less than $10,000", checkup1 == "Never", !is.na(genhlth)) %>%
  group_by(income2, checkup1, genhlth) %>%
  summarise(count = n(), percentage = n() * 100 / lincome2)

## `summarise()` has grouped output by 'income2', 'checkup1'. You can override
## using the `.groups` argument.

## # A tibble: 5 × 5
## # Groups:   income2, checkup1 [1]
##   income2           checkup1 genhlth   count percentage
##   <fct>             <fct>    <fct>     <int>      <dbl>
## 1 Less than $10,000 Never    Excellent    53       13.1
## 2 Less than $10,000 Never    Very good    68       16.8
## 3 Less than $10,000 Never    Good        127       31.4
## 4 Less than $10,000 Never    Fair         98       24.2
## 5 Less than $10,000 Never    Poor         54       13.3
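The five counts sum to 400, while lincome2 is 405. Five respondents in this group have no genhlth answer, which is why the percentages above add up to about 98.8 rather than 100. A quick arithmetic check using the counts from the table:

```r
# Counts from the table above
genhlth_counts <- c(Excellent = 53, `Very good` = 68, Good = 127, Fair = 98, Poor = 54)

sum(genhlth_counts)        # 400 respondents answered genhlth
405 - sum(genhlth_counts)  # 5 respondents in the group have no genhlth answer

# Percentages out of all 405 (as in the table) and out of the 400 who answered
round(100 * genhlth_counts / 405, 1)
round(100 * genhlth_counts / sum(genhlth_counts), 1)
```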

Graph

# Count respondents in each general-health category among those with income
# below $10,000 who have never had a routine checkup (missing genhlth dropped);
# count() returns one row per genhlth level, in level order
genhlth_counts <- brfss2013 %>%
  filter(income2 == "Less than $10,000", checkup1 == "Never", !is.na(genhlth)) %>%
  count(genhlth)

x <- genhlth_counts$n
labels <- as.character(genhlth_counts$genhlth)

# Pass labels so the slices are named instead of numbered 1 to 5
pie(x, labels = labels,
    main = "General Health of People with Income Less than $10,000\nthat Never Go to Checkups",
    col = rainbow(length(x)))
legend("topright", labels, cex = 0.8, fill = rainbow(length(x)))
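General health is an ordered scale from Excellent to Poor. A bar chart keeps that order visible, which a pie chart does not. The sketch below shows this alternative with base R's `barplot()`, using the counts from the summary table:

```r
# Counts from the summary table, in the order of the genhlth levels
x <- c(53, 68, 127, 98, 54)
labels <- c("Excellent", "Very good", "Good", "Fair", "Poor")

# barplot() returns the bar midpoints, which are used to place count labels
mids <- barplot(x, names.arg = labels,
                main = "General Health (income < $10,000, never had a checkup)",
                ylab = "Number of respondents", ylim = c(0, max(x) * 1.15))
text(mids, x, labels = x, pos = 3)
```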

Interpretation: In these data, having an income below $10,000 and never going for a routine checkup does not generally go together with poor general health. Of the 405 respondents in this group, only 54 (13.3%) rate their general health as Poor. The most common rating is Good (127 respondents, 31.4%).

Research question 3:

str(select(brfss2013, sex, arthdis2, diffwalk))

## 'data.frame':    491775 obs. of  3 variables:
##  $ sex     : Factor w/ 2 levels "Male","Female": 2 2 2 2 1 2 2 2 1 2 ...
##  $ arthdis2: Factor w/ 2 levels "Yes","No": 1 NA 1 NA NA NA 1 2 2 NA ...
##  $ diffwalk: Factor w/ 2 levels "Yes","No": 1 2 1 2 2 2 2 1 2 2 ...

# Men who answered Yes to arthdis2; this count is the denominator below
marthrit <- brfss2013 %>%
  filter(sex == "Male", arthdis2 == "Yes")

marthrit2 <- nrow(marthrit)

marthrit2

## [1] 16030

# Men with arthdis2 == "Yes", by difficulty walking (missing diffwalk dropped)
brfss2013 %>%
  filter(sex == "Male", arthdis2 == "Yes", !is.na(diffwalk)) %>%
  group_by(sex, arthdis2, diffwalk) %>%
  summarise(count = n(), percentage = n() * 100 / marthrit2)

## `summarise()` has grouped output by 'sex', 'arthdis2'. You can override using
## the `.groups` argument.

## # A tibble: 2 × 5
## # Groups:   sex, arthdis2 [1]
##   sex   arthdis2 diffwalk count percentage
##   <fct> <fct>    <fct>    <int>      <dbl>
## 1 Male  Yes      Yes       8725       54.4
## 2 Male  Yes      No        7200       44.9

Graph

# Men with arthdis2 == "Yes", split by whether they have difficulty walking
# (equality tests already exclude missing answers)
Marthdis <- brfss2013 %>%
  filter(sex == "Male", arthdis2 == "Yes", diffwalk == "Yes")

X <- nrow(Marthdis)

Marthdis2 <- brfss2013 %>%
  filter(sex == "Male", arthdis2 == "Yes", diffwalk == "No")

Y <- nrow(Marthdis2)

x <- c(X, Y)
labels <- c("Having Difficulty in Walking", "Do not Have Difficulty in Walking")

# Pass labels so the slices are named instead of numbered 1 and 2
pie(x, labels = labels, main = "Men with Arthritis", col = rainbow(length(x)))
legend("topright", labels, cex = 0.8, fill = rainbow(length(x)))

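Interpretation: Among men who answered Yes to arthdis2, slightly more than half (8,725 of 16,030, or 54.4%) have difficulty walking, compared with 44.9% who do not. Difficulty walking is therefore somewhat more common than not in this group. Note that arthdis2 records whether arthritis or joint symptoms now affect the respondent's work. This group is therefore men whose arthritis affects their work, not all men with arthritis.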
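The table's percentages divide by all 16,030 men with arthdis2 == "Yes". This includes the 105 men who have no diffwalk answer (16,030 - 8,725 - 7,200). Among the 15,925 who did answer, the split is 54.8% versus 45.2%. A quick check with the table's counts:

```r
# Counts from the table above
answered <- c(difficulty = 8725, no_difficulty = 7200)

16030 - sum(answered)                     # 105 men with no diffwalk answer
round(100 * answered / sum(answered), 1)  # shares among the 15,925 who answered
```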


Marsha Ella L. Maceren

2/21/2022

Setup

Load packages
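The code in Part 3 uses `select()`, `filter()`, `group_by()`, `summarise()` and the `%>%` pipe from dplyr, plus base R's `pie()`. The original package list is not shown here. The sketch below assumes dplyr is the only add-on package required:

```r
# dplyr provides select(), filter(), group_by(), summarise(), count() and %>%
library(dplyr)

# Fail early with a clear message if the package did not attach
stopifnot("package:dplyr" %in% search())
```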

Load data

Refer to the provided data in our Google Classroom.

Part 1: Research questions
