Setup

getwd()
## [1] "C:/Users/user/Desktop/STAT50"

Load packages

library(ggplot2)
library(dplyr)

Load data

load("brfss2013.RData")

Refer to the data provided in our Google Classroom.

Part 1: Research questions

Come up with at least three research questions that you want to answer using these data. You should phrase your research questions in a way that matches up with the scope of inference your dataset allows for. Make sure that at least two of these questions involve at least three variables. You are welcome to create new variables based on existing ones. With each question include a brief discussion (1-2 sentences) as to why this question is of interest to you and/or your audience.

Research question 1: Do people with arthritis engage in more physical activity or exercise than people without a diagnosis?

Research question 2: Are self-employed people who are paid by the hour more satisfied than self-employed people who are paid by salary?

Research question 3: Do more men in North Carolina have health coverage than women?

Part 3: Exploratory data analysis

Perform exploratory data analysis (EDA) that addresses each of the three research questions you outlined above. Your EDA should contain numerical summaries and visualizations. Each R output and plot should be accompanied by a brief interpretation.

Research question 1: Do people with arthritis engage in more physical activity or exercise than people without a diagnosis?

str(select(brfss2013,X_drdxar1,X_totinda))
## 'data.frame':    491775 obs. of  2 variables:
##  $ X_drdxar1: Factor w/ 2 levels "Diagnosed with arthritis",..: 1 2 1 2 2 2 1 1 1 2 ...
##  $ X_totinda: Factor w/ 2 levels "Had physical activity or exercise",..: 2 1 2 1 2 1 1 1 1 1 ...

These two variables are the ones we use for this question. Each is a factor with two levels: X_drdxar1 records whether the respondent was diagnosed with arthritis, and X_totinda records whether the respondent had physical activity or exercise.

First, let us check how many respondents in our data were and were not diagnosed with arthritis.

total_obs <- nrow(brfss2013)

brfss2013 %>%
  group_by(X_drdxar1) %>%
  summarise(count=n(), percentage=n()*100/total_obs)
## # A tibble: 3 × 3
##   X_drdxar1                     count percentage
##   <fct>                         <int>      <dbl>
## 1 Diagnosed with arthritis     165151     33.6  
## 2 Not diagnosed with arthritis 323648     65.8  
## 3 <NA>                           2976      0.605

In our data, 33.6 % of the 491775 respondents are diagnosed with arthritis, 65.8 % are not diagnosed, and 0.61 % have a missing (NA) response.
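The percentages above use all 491775 respondents as the denominator, so the missing responses take a share as well. A minimal sketch, assuming only the counts printed in the table (the names `arthritis_counts`, `all_pct` and `valid_pct` are illustrative and not part of the original analysis), shows how the shares change when missing values are left out of the denominator:

```r
# Counts copied from the summary table above; "missing" is the NA row
arthritis_counts <- c(diagnosed = 165151, not_diagnosed = 323648, missing = 2976)

# Share of all respondents (matches the `percentage` column)
all_pct <- 100 * arthritis_counts / sum(arthritis_counts)

# Share among respondents who actually answered the question
answered <- arthritis_counts[c("diagnosed", "not_diagnosed")]
valid_pct <- 100 * answered / sum(answered)

round(all_pct, 2)
round(valid_pct, 2)
```

Because fewer than 1 % of responses are missing here, the two sets of percentages differ only slightly; the difference matters more for variables with many NAs.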

ggplot(brfss2013, aes(y=X_drdxar1)) + geom_bar() + ggtitle('Diagnosis Status')

About 33.6 % of respondents had been diagnosed with arthritis and about 65.8 % had not.

levels(brfss2013$X_totinda)[1] <- c('Had exercise')
levels(brfss2013$X_totinda)[2] <- c('Had not exercise (last 30 days)')
brfss2013 %>%
  group_by(X_totinda) %>%
  summarise(count=n(), percentage=n()*100/total_obs)
## # A tibble: 3 × 3
##   X_totinda                        count percentage
##   <fct>                            <int>      <dbl>
## 1 Had exercise                    332461      67.6 
## 2 Had not exercise (last 30 days) 125280      25.5 
## 3 <NA>                             34034       6.92

About 67.6 % of our respondents had exercise. Around 25.48 % had not exercised in the last 30 days, and 6.92 % have a missing (NA) response.

brfss2013 %>%
  group_by(X_drdxar1, X_totinda) %>%
  summarise(count=n(), percentage=n()*100/total_obs)
## `summarise()` has grouped output by 'X_drdxar1'. You can override using the
## `.groups` argument.
## # A tibble: 9 × 4
## # Groups:   X_drdxar1 [3]
##   X_drdxar1                    X_totinda                        count percentage
##   <fct>                        <fct>                            <int>      <dbl>
## 1 Diagnosed with arthritis     Had exercise                    101397    20.6   
## 2 Diagnosed with arthritis     Had not exercise (last 30 days)  54218    11.0   
## 3 Diagnosed with arthritis     <NA>                              9536     1.94  
## 4 Not diagnosed with arthritis Had exercise                    229293    46.6   
## 5 Not diagnosed with arthritis Had not exercise (last 30 days)  70159    14.3   
## 6 Not diagnosed with arthritis <NA>                             24196     4.92  
## 7 <NA>                         Had exercise                      1771     0.360 
## 8 <NA>                         Had not exercise (last 30 days)    903     0.184 
## 9 <NA>                         <NA>                               302     0.0614

The table shows how many respondents diagnosed with arthritis did and did not exercise, and the same for respondents who were not diagnosed. Rows with NA count respondents whose diagnosis status, exercise status, or both are missing.

rq1_table <-table(brfss2013$X_drdxar1,brfss2013$X_totinda, useNA = "no")
rq1_table
##                               
##                                Had exercise Had not exercise (last 30 days)
##   Diagnosed with arthritis           101397                           54218
##   Not diagnosed with arthritis       229293                           70159

This table cross-tabulates the same counts with missing values excluded. It is hard to judge from raw counts what proportion of each group exercised, so we calculate row proportions, where each row sums to 1.

prop.table(rq1_table,1)
##                               
##                                Had exercise Had not exercise (last 30 days)
##   Diagnosed with arthritis        0.6515889                       0.3484111
##   Not diagnosed with arthritis    0.7657087                       0.2342913

# Keep only respondents with both variables recorded
a <- brfss2013 %>%
  filter(!is.na(X_drdxar1), !is.na(X_totinda))
ggplot(a, aes(fill=X_drdxar1, x = X_totinda )) + geom_bar(position = 'dodge')

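According to the row proportions above, about 76.6 % of respondents without an arthritis diagnosis had physical activity or exercise in the last 30 days, compared with about 65.2 % of those diagnosed. The raw counts are larger for the non-diagnosed group in both exercise categories simply because that group is about twice as large, so the proportions are the fairer comparison. In this sample, people diagnosed with arthritis were less likely to exercise than people without a diagnosis.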
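Since the two groups differ in size, a chart of shares can be easier to read than a chart of counts. The following is a rough sketch, assuming only the counts printed in rq1_table; the names `rq1_counts` and `rq1_props` are illustrative, and base R `barplot()` is used here instead of ggplot2 so the example needs no extra packages:

```r
# Counts copied from rq1_table above (missing values already excluded)
rq1_counts <- matrix(
  c(101397, 54218,
    229293, 70159),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    arthritis = c("Diagnosed with arthritis", "Not diagnosed with arthritis"),
    exercise  = c("Had exercise", "Had not exercise (last 30 days)")
  )
)

# Row proportions: share of each arthritis group in each exercise category
rq1_props <- prop.table(rq1_counts, margin = 1)
round(rq1_props, 3)

# Stacked bars scaled to 1, one bar per arthritis group
barplot(t(rq1_props), legend.text = TRUE,
        ylab = "Proportion of group",
        main = "Exercise status by arthritis diagnosis")
```

Each bar has the same height, so the exercise share of the two groups can be compared directly regardless of group size.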

Research question 2: Are self-employed people who are paid by the hour more satisfied than self-employed people who are paid by salary?

str(select(brfss2013,employ1,scntpaid,carercvd))
## 'data.frame':    491775 obs. of  3 variables:
##  $ employ1 : Factor w/ 8 levels "Employed for wages",..: 7 1 1 7 7 1 1 7 7 5 ...
##  $ scntpaid: Factor w/ 4 levels "Paid by salary",..: NA 2 2 NA NA 1 2 NA NA NA ...
##  $ carercvd: Factor w/ 3 levels "Very satisfied",..: 1 1 1 2 1 1 1 1 1 1 ...

brfss2013 %>%
  filter(!is.na(employ1))%>%
  group_by(employ1) %>%
  summarise(count=n(), percentage=n()*100/total_obs)
## # A tibble: 8 × 3
##   employ1                           count percentage
##   <fct>                             <int>      <dbl>
## 1 Employed for wages               202200      41.1 
## 2 Self-employed                     39832       8.10
## 3 Out of work for 1 year or more    14074       2.86
## 4 Out of work for less than 1 year  12242       2.49
## 5 A homemaker                       31647       6.44
## 6 A student                         12682       2.58
## 7 Retired                          138259      28.1 
## 8 Unable to work                    37453       7.62

According to the data, 39,832 respondents are self-employed, which is 8.10 % of all respondents. Our analysis focuses only on these self-employed respondents.

brfss2013 %>%
  filter(!is.na(scntpaid))%>%
  group_by(scntpaid) %>%
  summarise(count=n(), percentage=n()*100/total_obs)
## # A tibble: 4 × 3
##   scntpaid               count percentage
##   <fct>                  <int>      <dbl>
## 1 Paid by salary         13240      2.69 
## 2 Paid by the hour       14300      2.91 
## 3 Paid by the job / task  2752      0.560
## 4 Paid some other way     1912      0.389

13,240 of our respondents are paid by salary, which is 2.69 % of all respondents, and 14,300 are paid by the hour, which is 2.91 %.

rq2_table <- table(brfss2013$employ1,brfss2013$scntpaid, useNA = "no")

rq2_table
##                                   
##                                    Paid by salary Paid by the hour
##   Employed for wages                        12370            13445
##   Self-employed                               862              837
##   Out of work for 1 year or more                0                0
##   Out of work for less than 1 year              0                0
##   A homemaker                                   0                0
##   A student                                     0                0
##   Retired                                       0                0
##   Unable to work                                0                0
##                                   
##                                    Paid by the job / task Paid some other way
##   Employed for wages                                  725                 644
##   Self-employed                                      2025                1259
##   Out of work for 1 year or more                        0                   0
##   Out of work for less than 1 year                      0                   0
##   A homemaker                                           0                   0
##   A student                                             0                   0
##   Retired                                               0                   0
##   Unable to work                                        0                   0

The table shows that 862 self-employed respondents are paid by salary and 837 are paid by the hour. The remaining question is how satisfied the self-employed respondents in each of these two pay groups are.

rq2_table <- table(brfss2013$employ1,brfss2013$scntpaid,brfss2013$carercvd, useNA = "no")

rq2_table
## , ,  = Very satisfied
## 
##                                   
##                                    Paid by salary Paid by the hour
##   Employed for wages                         8230             7657
##   Self-employed                               520              505
##   Out of work for 1 year or more                0                0
##   Out of work for less than 1 year              0                0
##   A homemaker                                   0                0
##   A student                                     0                0
##   Retired                                       0                0
##   Unable to work                                0                0
##                                   
##                                    Paid by the job / task Paid some other way
##   Employed for wages                                  416                 357
##   Self-employed                                      1150                 739
##   Out of work for 1 year or more                        0                   0
##   Out of work for less than 1 year                      0                   0
##   A homemaker                                           0                   0
##   A student                                             0                   0
##   Retired                                               0                   0
##   Unable to work                                        0                   0
## 
## , ,  = Somewhat satisfied
## 
##                                   
##                                    Paid by salary Paid by the hour
##   Employed for wages                         3060             4026
##   Self-employed                               225              223
##   Out of work for 1 year or more                0                0
##   Out of work for less than 1 year              0                0
##   A homemaker                                   0                0
##   A student                                     0                0
##   Retired                                       0                0
##   Unable to work                                0                0
##                                   
##                                    Paid by the job / task Paid some other way
##   Employed for wages                                  211                 189
##   Self-employed                                       554                 340
##   Out of work for 1 year or more                        0                   0
##   Out of work for less than 1 year                      0                   0
##   A homemaker                                           0                   0
##   A student                                             0                   0
##   Retired                                               0                   0
##   Unable to work                                        0                   0
## 
## , ,  = Not at all satisfied
## 
##                                   
##                                    Paid by salary Paid by the hour
##   Employed for wages                          223              484
##   Self-employed                                20               40
##   Out of work for 1 year or more                0                0
##   Out of work for less than 1 year              0                0
##   A homemaker                                   0                0
##   A student                                     0                0
##   Retired                                       0                0
##   Unable to work                                0                0
##                                   
##                                    Paid by the job / task Paid some other way
##   Employed for wages                                   32                  26
##   Self-employed                                        77                  44
##   Out of work for 1 year or more                        0                   0
##   Out of work for less than 1 year                      0                   0
##   A homemaker                                           0                   0
##   A student                                             0                   0
##   Retired                                               0                   0
##   Unable to work                                        0                   0

The table is complete but cumbersome to read, especially since we are interested only in self-employed respondents paid by salary or by the hour. Let’s make a graph from these counts.

# Self-employed counts: salary/hour by satisfaction (VS, SS, NS), other pay methods, and NAs
abc <- c(520,505,225,223,20,40,2904,35395)
labels <- c("Self-employed Paid by salary - VS","Self-employed Paid by the hour - VS", "Self-employed Paid by salary - SS", "Self-employed Paid by the hour - SS","Self-employed Paid by salary - NS","Self-employed Paid by the hour - NS", "Other method of getting paid", "Number of NAs")
piepercent <- round(100*abc/sum(abc),1)
pie(abc, labels = piepercent, main = "Total Self-employment status", col = rainbow(length(abc)))
# Reuse the labels vector so the legend and the data stay in the same order
legend("topright", labels, cex = 0.6, fill = rainbow(length(abc)))

The chart above covers all 39,832 self-employed respondents, most of whom (35,395) have a missing value for pay type or satisfaction. We need to exclude the NAs to see the remaining categories clearly.

# Same counts as before, with the NA category dropped
abc <- c(520,505,225,223,20,40,2904)
labels <- c("Self-employed Paid by salary - VS","Self-employed Paid by the hour - VS", "Self-employed Paid by salary - SS", "Self-employed Paid by the hour - SS","Self-employed Paid by salary - NS","Self-employed Paid by the hour - NS", "Other method of getting paid")
piepercent <- round(100*abc/sum(abc),1)
pie(abc, labels = piepercent, main = "Total Self-employment status excluding NAs", col = rainbow(length(abc)))
legend("topright", labels, cex = 0.591, fill = rainbow(length(abc)))

With the NAs excluded, the chart is easier to read and speaks to our question. Among self-employed respondents with recorded answers, 11.7 % are paid by salary and very satisfied, compared with 11.4 % who are paid by the hour and very satisfied. 5.1 % are paid by salary and somewhat satisfied, compared with 5.0 % who are paid by the hour. 0.5 % are paid by salary and not at all satisfied, compared with 0.9 % who are paid by the hour, which is slightly higher.
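The pie-chart percentages are shares of all self-employed respondents with a recorded answer, so they do not directly compare satisfaction between the two pay types. As a rough sketch, assuming only the self-employed counts printed in the three-way table (the names `satisfaction` and `within_pay` are illustrative), the satisfaction distribution within each pay type could be computed like this:

```r
# Self-employed counts copied from the three-way table above
satisfaction <- matrix(
  c(520, 225, 20,
    505, 223, 40),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    pay = c("Paid by salary", "Paid by the hour"),
    satisfaction = c("Very satisfied", "Somewhat satisfied", "Not at all satisfied")
  )
)

# Row percentages: how satisfaction is distributed within each pay type
within_pay <- 100 * prop.table(satisfaction, margin = 1)
round(within_pay, 1)
```

On these counts, about 68 % of salaried and about 66 % of hourly self-employed respondents are very satisfied, while the share who are not at all satisfied is roughly twice as high in the hourly group. The groups are small (765 and 768 respondents), so the differences should be read with caution.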

Research question 3: Do more men in North Carolina have health coverage than women?

str(select(brfss2013,sex, X_hcvu651, X_state))
## 'data.frame':    491775 obs. of  3 variables:
##  $ sex      : Factor w/ 2 levels "Male","Female": 2 2 2 2 1 2 2 2 1 2 ...
##  $ X_hcvu651: Factor w/ 2 levels "Have health care coverage",..: 1 1 1 1 NA 1 1 1 1 NA ...
##  $ X_state  : Factor w/ 55 levels "0","Alabama",..: 2 2 2 2 2 2 2 2 2 2 ...

All of the above are categorical variables: sex has 2 levels, X_hcvu651 has 2 levels, and X_state has 55 levels. We use them to ask whether more men than women in North Carolina have health care coverage.

To begin, let’s check our selected variables individually.

brfss2013%>%
  group_by(sex)%>%
  summarise(count=n(), percentage=n()*100/total_obs)
## # A tibble: 3 × 3
##   sex     count percentage
##   <fct>   <int>      <dbl>
## 1 Male   201313   40.9    
## 2 Female 290455   59.1    
## 3 <NA>        7    0.00142
ggplot(brfss2013, aes(x=sex)) + geom_bar() + ggtitle('Men vs Women') + theme_linedraw() 

40.94 % of our respondents are male and 59.06 % are female. There are also 7 missing (NA) values, which we exclude later because they cannot be used in the comparison.

brfss2013%>%
  group_by(X_hcvu651)%>%
  summarise(count=n(), percentage = n()*100/total_obs)
## # A tibble: 3 × 3
##   X_hcvu651                         count percentage
##   <fct>                             <int>      <dbl>
## 1 Have health care coverage        272546       55.4
## 2 Do not have health care coverage  52631       10.7
## 3 <NA>                             166598       33.9
ggplot(brfss2013, aes(x=X_hcvu651)) + geom_bar() + ggtitle('Respondents with and without health care coverage') + theme_linedraw()

About 55.42 % of our respondents have health care coverage, while 10.70 % do not. The remaining 33.88 % are missing (NA) values, which we exclude below because they cannot be used in the comparison.

brfss2013%>%
  filter(!is.na(X_hcvu651))%>%
  group_by(X_hcvu651)%>%
  summarise(count=n(), percentage = n()*100/total_obs)
## # A tibble: 2 × 3
##   X_hcvu651                         count percentage
##   <fct>                             <int>      <dbl>
## 1 Have health care coverage        272546       55.4
## 2 Do not have health care coverage  52631       10.7

rq4_table <- table(brfss2013$sex, brfss2013$X_hcvu651, useNA = "no")
rq4_table
##         
##          Have health care coverage Do not have health care coverage
##   Male                      115138                            24581
##   Female                    157408                            28050

prop.table(rq4_table)
##         
##          Have health care coverage Do not have health care coverage
##   Male                  0.35407793                       0.07559268
##   Female                0.48406868                       0.08626071

mosaicplot(prop.table(rq4_table,1),main='Male and Female', xlab='Male and Female', ylab='Health Care Coverage Status')

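According to the data, more females than males have health care coverage (157408 versus 115138), which partly reflects the larger number of female respondents. The coverage rate is also slightly higher for females: about 84.9 % of females with a recorded answer have coverage, compared with about 82.4 % of males.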
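Counts and rates answer slightly different questions, so it can help to print both. A minimal sketch, assuming only the counts printed in rq4_table (the names `coverage` and `coverage_rate` are illustrative), computes the coverage rate within each sex:

```r
# Counts copied from rq4_table above (missing values already excluded)
coverage <- matrix(
  c(115138, 24581,
    157408, 28050),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    sex = c("Male", "Female"),
    status = c("Have health care coverage", "Do not have health care coverage")
  )
)

# Row proportions: share of each sex that has coverage
coverage_rate <- prop.table(coverage, margin = 1)[, "Have health care coverage"]
round(100 * coverage_rate, 1)
```

This differs from `prop.table(rq4_table)` without a margin, which divides every cell by the grand total and therefore mixes the coverage rate with the share of each sex in the sample.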

brfss2013 %>%
  group_by(X_state)%>%
  summarise(count=n(), percentage = n()*100/total_obs)
## # A tibble: 55 × 3
##    X_state              count percentage
##    <fct>                <int>      <dbl>
##  1 0                        1   0.000203
##  2 Alabama               6505   1.32    
##  3 Alaska                4578   0.931   
##  4 Arizona               4253   0.865   
##  5 Arkansas              5268   1.07    
##  6 California           11518   2.34    
##  7 Colorado             13649   2.78    
##  8 Connecticut           7710   1.57    
##  9 Delaware              5206   1.06    
## 10 District of Columbia  4931   1.00    
## # … with 45 more rows

Next, we look up North Carolina, since our research question concerns that state.

brfss2013 %>%
  group_by(X_state)%>%
  filter(X_state == "North Carolina")%>%
  summarise(count=n(), percentage = n()*100/total_obs)
## # A tibble: 1 × 3
##   X_state        count percentage
##   <fct>          <int>      <dbl>
## 1 North Carolina  8860       1.80

8860 of our respondents, or 1.80 % of the total, are from North Carolina. Let us see how many males and females in North Carolina have health coverage by grouping on all three variables.

brfss2013 %>%
  group_by(sex,X_hcvu651,X_state)%>%
  filter(X_state == "North Carolina")%>%
  filter(!is.na(X_hcvu651), !is.na(sex))%>%
  summarise(count=n(), percentage = n()*100/total_obs)
## `summarise()` has grouped output by 'sex', 'X_hcvu651'. You can override using
## the `.groups` argument.
## # A tibble: 4 × 5
## # Groups:   sex, X_hcvu651 [4]
##   sex    X_hcvu651                        X_state        count percentage
##   <fct>  <fct>                            <fct>          <int>      <dbl>
## 1 Male   Have health care coverage        North Carolina  1855      0.377
## 2 Male   Do not have health care coverage North Carolina   550      0.112
## 3 Female Have health care coverage        North Carolina  2705      0.550
## 4 Female Do not have health care coverage North Carolina   708      0.144

As the output shows, 1855 males in North Carolina have health coverage, about 0.38 % of all respondents, while 2705 females in North Carolina have health coverage, about 0.55 %. To confirm these counts, we cross-tabulate sex, X_hcvu651 and X_state with table().
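A three-way table prints one slice per state, which is long when only one state is of interest. The sketch below, assuming a small stand-in array built from the Alabama and North Carolina counts reported in this section (the names `counts`, `nc` and `nc_rate` are illustrative), shows how a single state can be pulled out by name and turned into coverage rates:

```r
# Stand-in for a sex x coverage x state table, filled column by column:
# (Male, Female) with coverage first, then (Male, Female) without coverage
counts <- array(
  c(1187, 2169, 255, 420,   # Alabama
    1855, 2705, 550, 708),  # North Carolina
  dim = c(2, 2, 2),
  dimnames = list(
    sex = c("Male", "Female"),
    status = c("Have health care coverage", "Do not have health care coverage"),
    state = c("Alabama", "North Carolina")
  )
)

# Select the North Carolina slice by name
nc <- counts[, , "North Carolina"]
nc

# Share of each sex in North Carolina that has coverage
nc_rate <- prop.table(nc, margin = 1)[, "Have health care coverage"]
round(100 * nc_rate, 1)
```

On these counts, more females than males in North Carolina have coverage, and the coverage rate is also slightly higher for females (about 79 % versus about 77 %). The same indexing should work on a table object such as rq5_table, since tables built with `table()` are arrays with named dimensions.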

rq5_table <- table(brfss2013$sex, brfss2013$X_hcvu651, brfss2013$X_state, useNA = "no")
rq5_table
## , ,  = 0
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                           0                                0
##   Female                         0                                0
## 
## , ,  = Alabama
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1187                              255
##   Female                      2169                              420
## 
## , ,  = Alaska
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1340                              384
##   Female                      1627                              292
## 
## , ,  = Arizona
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                         883                              246
##   Female                      1205                              281
## 
## , ,  = Arkansas
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                         990                              280
##   Female                      1425                              432
## 
## , ,  = California
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        3100                              769
##   Female                      3741                              766
## 
## , ,  = Colorado
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        3430                              766
##   Female                      4381                              800
## 
## , ,  = Connecticut
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        2019                              295
##   Female                      2650                              276
## 
## , ,  = Delaware
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1171                              214
##   Female                      1766                              200
## 
## , ,  = District of Columbia
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1216                              101
##   Female                      1741                               92
## 
## , ,  = Florida
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        5636                             1957
##   Female                      8853                             2493
## 
## , ,  = Georgia
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1854                              581
##   Female                      2887                              763
## 
## , ,  = Hawaii
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        2296                              340
##   Female                      2592                              291
## 
## , ,  = Idaho
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1230                              338
##   Female                      1611                              428
## 
## , ,  = Illinois
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1400                              268
##   Female                      1827                              249
## 
## , ,  = Indiana
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        2435                              522
##   Female                      3127                              620
## 
## , ,  = Iowa
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1967                              242
##   Female                      2686                              253
## 
## , ,  = Kansas
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        5604                             1170
##   Female                      7304                             1381
## 
## , ,  = Kentucky
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        2226                              524
##   Female                      3797                              808
## 
## , ,  = Louisiana
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                         958                              240
##   Female                      1706                              405
## 
## , ,  = Maine
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1913                              379
##   Female                      2674                              355
## 
## , ,  = Maryland
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        2983                              389
##   Female                      4690                              469
## 
## , ,  = Massachusetts
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        4027                              302
##   Female                      5698                              265
## 
## , ,  = Michigan
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        3186                              626
##   Female                      4083                              601
## 
## , ,  = Minnesota
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        3971                              499
##   Female                      5125                              461
## 
## , ,  = Mississippi
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1349                              417
##   Female                      2226                              653
## 
## , ,  = Missouri
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1489                              314
##   Female                      2149                              435
## 
## , ,  = Montana
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        2376                              598
##   Female                      2864                              599
## 
## , ,  = Nebraska
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        4043                              677
##   Female                      5387                              832
## 
## , ,  = Nevada
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1144                              360
##   Female                      1528                              399
## 
## , ,  = New Hampshire
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1402                              250
##   Female                      1971                              290
## 
## , ,  = New Jersey
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        3580                              813
##   Female                      4738                              825
## 
## , ,  = New Mexico
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        2139                              659
##   Female                      2759                              777
## 
## , ,  = New York
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        2244                              482
##   Female                      3110                              406
## 
## , ,  = North Carolina
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1855                              550
##   Female                      2705                              708
## 
## , ,  = North Dakota
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        2227                              328
##   Female                      2463                              253
## 
## , ,  = Ohio
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        2905                              531
##   Female                      4031                              610
## 
## , ,  = Oklahoma
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1763                              425
##   Female                      2478                              573
## 
## , ,  = Oregon
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1394                              372
##   Female                      1714                              367
## 
## , ,  = Pennsylvania
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        2902                              494
##   Female                      3602                              475
## 
## , ,  = Rhode Island
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1458                              300
##   Female                      2256                              353
## 
## , ,  = South Carolina
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        2245                              590
##   Female                      3150                              760
## 
## , ,  = South Dakota
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1851                              285
##   Female                      2266                              271
## 
## , ,  = Tennessee
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1127                              301
##   Female                      1941                              419
## 
## , ,  = Texas
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        2264                              863
##   Female                      2956                             1222
## 
## , ,  = Utah
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        3614                              709
##   Female                      4329                              716
## 
## , ,  = Vermont
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1601                              244
##   Female                      2228                              159
## 
## , ,  = Virginia
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        2111                              418
##   Female                      2886                              492
## 
## , ,  = Washington
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        2653                              571
##   Female                      3425                              624
## 
## , ,  = West Virginia
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1402                              365
##   Female                      1834                              467
## 
## , ,  = Wisconsin
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1770                              312
##   Female                      2172                              242
## 
## , ,  = Wyoming
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1342                              315
##   Female                      1798                              390
## 
## , ,  = Guam
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                         524                              194
##   Female                       737                              212
## 
## , ,  = Puerto Rico
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                        1342                              157
##   Female                      2340                              120
## 
## , ,  = 80
## 
##         
##          Have health care coverage Do not have health care coverage
##   Male                           0                                0
##   Female                         0                                0

The North Carolina counts in the table above match the counts we obtained earlier, which confirms our earlier figures.
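To make this check easier to repeat, a single state's panel can be pulled out of a three-way table by name instead of scrolling through the full printout. The sketch below is only an illustration: the object name `cov_tab` is hypothetical, and the counts are typed in by hand from the North Carolina and North Dakota panels above rather than computed from `brfss2013`.

```r
# Hypothetical three-way table (sex x coverage x state).
# Counts are copied by hand from two panels of the printed table above.
# array() fills column-major: Male/Female with coverage, then Male/Female without.
cov_tab <- array(
  c(1855, 2705, 550, 708,   # North Carolina
    2227, 2463, 328, 253),  # North Dakota
  dim = c(2, 2, 2),
  dimnames = list(
    sex = c("Male", "Female"),
    coverage = c("Have health care coverage", "Do not have health care coverage"),
    state = c("North Carolina", "North Dakota")
  )
)

# Extract one state's 2 x 2 panel by its name in the third dimension
nc_panel <- cov_tab[, , "North Carolina"]
nc_panel
```

The same indexing should work on the object returned by `table()` with three factors, provided the state is the third dimension.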

# North Carolina counts taken from the table above (HCC = health care coverage)
wer <- data.frame(
  category = c("Men w/ HCC", "Men w/o HCC", "Women w/ HCC", "Women w/o HCC"),
  count = c(1855, 550, 2705, 708)
)
# Cumulative fractions give each segment's start and end around the ring
wer$fraction <- wer$count / sum(wer$count)
wer$ymax <- cumsum(wer$fraction)
wer$ymin <- c(0, head(wer$ymax, n = -1))
wer$labelposition <- (wer$ymax + wer$ymin) / 2
wer$label <- paste0(wer$category, "\n", wer$count)
# Donut chart: rectangles drawn in polar coordinates
ggplot(wer, aes(ymax = ymax, ymin = ymin, xmax = 4, xmin = 3, fill = category)) +
  geom_rect() +
  geom_label(x = 3.5, aes(y = labelposition, label = label), size = 3) +
  scale_fill_brewer(palette = 4) +
  coord_polar(theta = "y") +
  xlim(c(1, 4)) +
  theme_void() +
  theme(legend.position = "none") +
  ggtitle("Health Care Coverage by Sex in North Carolina")

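Looking at the summary statistics and the visualization, we can see that more women (2,705) than men (1,855) in North Carolina have health care coverage. The sample also contains more women (3,413) than men (2,405), so the coverage rates are closer than the raw counts suggest: about 79% of women versus about 77% of men.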
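Because the two groups differ in size, comparing proportions is a fairer answer to the research question than comparing raw counts. The following is a minimal sketch that uses only the four North Carolina counts from the table above; the object names `nc` and `nc_rates` are hypothetical and not part of the original analysis.

```r
# North Carolina counts copied by hand from the table above
nc <- matrix(
  c(1855, 550,
    2705, 708),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    sex = c("Male", "Female"),
    coverage = c("Have health care coverage", "Do not have health care coverage")
  )
)

# Row-wise proportions: within each sex, the share with and without coverage
nc_rates <- prop.table(nc, margin = 1)
round(nc_rates, 3)
```

Each row of `nc_rates` sums to 1, so the first column can be read directly as the coverage rate for men and for women.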