getwd()
## [1] "C:/Users/user/Desktop/STAT50"
library(ggplot2)
library(dplyr)
load("brfss2013.RData")
Come up with at least three research questions that you want to answer using these data. You should phrase your research questions in a way that matches up with the scope of inference your dataset allows for. Make sure that at least two of these questions involve at least three variables. You are welcome to create new variables based on existing ones. With each question include a brief discussion (1-2 sentences) as to why this question is of interest to you and/or your audience.
Research question 1: Do people with arthritis engage in more physical activity or exercise than people without a diagnosis?
Research question 2: Are self-employed people who are paid by the hour more satisfied than self-employed people who are paid by salary?
Research question 3: Do more men than women in North Carolina have health care coverage?
Perform exploratory data analysis (EDA) that addresses each of the three research questions you outlined above. Your EDA should contain numerical summaries and visualizations. Each R output and plot should be accompanied by a brief interpretation.
Research question 1: Do people with arthritis engage in more physical activity or exercise than people without a diagnosis?
str(select(brfss2013, X_drdxar1, X_totinda))
## 'data.frame': 491775 obs. of 2 variables:
## $ X_drdxar1: Factor w/ 2 levels "Diagnosed with arthritis",..: 1 2 1 2 2 2 1 1 1 2 ...
## $ X_totinda: Factor w/ 2 levels "Had physical activity or exercise",..: 2 1 2 1 2 1 1 1 1 1 ...
These two variables are the ones we use for this question. Both are two-level factors: X_drdxar1 records whether a respondent has been diagnosed with arthritis, and X_totinda records whether the respondent had physical activity or exercise.
First let us check how many respondents in our data have been diagnosed with arthritis.
total_obs <- nrow(brfss2013)
brfss2013 %>%
  group_by(X_drdxar1) %>%
  summarise(count = n(), percentage = n() * 100 / total_obs)
## # A tibble: 3 × 3
## X_drdxar1 count percentage
## <fct> <int> <dbl>
## 1 Diagnosed with arthritis 165151 33.6
## 2 Not diagnosed with arthritis 323648 65.8
## 3 <NA> 2976 0.605
Of the 491,775 respondents in our data, 33.6 % are diagnosed with arthritis, 65.8 % are not diagnosed, and 0.61 % have a missing (NA) value.
ggplot(brfss2013, aes(y=X_drdxar1)) + geom_bar() + ggtitle('Diagnosis Status') + theme_update()
About 33.6 % of all respondents had been diagnosed with arthritis and about 65.8 % had not.
levels(brfss2013$X_totinda)[1] <- c('Had exercise')
levels(brfss2013$X_totinda)[2] <- c('Had not exercise (last 30 days)')
brfss2013 %>%
  group_by(X_totinda) %>%
  summarise(count = n(), percentage = n() * 100 / total_obs)
## # A tibble: 3 × 3
## X_totinda count percentage
## <fct> <int> <dbl>
## 1 Had exercise 332461 67.6
## 2 Had not exercise (last 30 days) 125280 25.5
## 3 <NA> 34034 6.92
About 67.6 % of our respondents had exercised. Around 25.5 % had not exercised in the last 30 days, and 6.92 % have a missing (NA) value.
brfss2013 %>%
  group_by(X_drdxar1, X_totinda) %>%
  summarise(count = n(), percentage = n() * 100 / total_obs)
## `summarise()` has grouped output by 'X_drdxar1'. You can override using the
## `.groups` argument.
## # A tibble: 9 × 4
## # Groups: X_drdxar1 [3]
## X_drdxar1 X_totinda count percentage
## <fct> <fct> <int> <dbl>
## 1 Diagnosed with arthritis Had exercise 101397 20.6
## 2 Diagnosed with arthritis Had not exercise (last 30 days) 54218 11.0
## 3 Diagnosed with arthritis <NA> 9536 1.94
## 4 Not diagnosed with arthritis Had exercise 229293 46.6
## 5 Not diagnosed with arthritis Had not exercise (last 30 days) 70159 14.3
## 6 Not diagnosed with arthritis <NA> 24196 4.92
## 7 <NA> Had exercise 1771 0.360
## 8 <NA> Had not exercise (last 30 days) 903 0.184
## 9 <NA> <NA> 302 0.0614
The table shows how many respondents diagnosed with arthritis had and had not exercised, and the same for respondents without a diagnosis. Rows with NA count the respondents whose diagnosis status, exercise status, or both are missing.
rq1_table <- table(brfss2013$X_drdxar1, brfss2013$X_totinda, useNA = "no")
rq1_table
##
## Had exercise Had not exercise (last 30 days)
## Diagnosed with arthritis 101397 54218
## Not diagnosed with arthritis 229293 70159
This is the same data as a contingency table, with missing values excluded. From raw counts it is hard to see quickly what proportion of each group (diagnosed with arthritis, not diagnosed) has been exercising. We'll calculate those proportions within each row, so that every row sums to 1.
prop.table(rq1_table, 1)
##
## Had exercise Had not exercise (last 30 days)
## Diagnosed with arthritis 0.6515889 0.3484111
## Not diagnosed with arthritis 0.7657087 0.2342913
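The row proportions above can be reproduced from the printed counts alone. The following is a minimal sketch, assuming only the four counts shown in rq1_table; the object names rq1_counts and rq1_props are introduced here for illustration and are not part of the original analysis.

```r
# Counts copied from the rq1_table output (NA rows already excluded).
rq1_counts <- matrix(
  c(101397, 54218,
    229293, 70159),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("Diagnosed with arthritis", "Not diagnosed with arthritis"),
    c("Had exercise", "Had not exercise (last 30 days)")
  )
)

# margin = 1 divides each cell by its row total, so each row sums to 1.
rq1_props <- prop.table(rq1_counts, margin = 1)
print(round(rq1_props, 3))

# Sanity check: every row of a row-proportion table must sum to 1.
print(rowSums(rq1_props))
```

Dividing by the row total is what makes the two groups comparable even though one group is roughly twice the size of the other.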
# Keep only rows where both variables are observed.
a <- brfss2013 %>%
  filter(!is.na(X_drdxar1), !is.na(X_totinda)) %>%
  group_by(X_drdxar1, X_totinda)
ggplot(a, aes(fill = X_drdxar1, x = X_totinda)) + geom_bar(position = 'dodge')
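According to the data above, respondents without an arthritis diagnosis were more likely to engage in physical activity or exercise: 76.6 % of them had exercised, compared with 65.2 % of respondents diagnosed with arthritis. So people with arthritis did not exercise more. In raw counts the group without a diagnosis also contains more respondents who had not exercised in the last 30 days (70,159 versus 54,218), but only because that group is about twice as large. Because this is observational survey data, it shows an association and not a causal effect.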
Research question 2: Are self-employed people who are paid by the hour more satisfied than self-employed people who are paid by salary?
str(select(brfss2013, employ1, scntpaid, carercvd))
## 'data.frame': 491775 obs. of 3 variables:
## $ employ1 : Factor w/ 8 levels "Employed for wages",..: 7 1 1 7 7 1 1 7 7 5 ...
## $ scntpaid: Factor w/ 4 levels "Paid by salary",..: NA 2 2 NA NA 1 2 NA NA NA ...
## $ carercvd: Factor w/ 3 levels "Very satisfied",..: 1 1 1 2 1 1 1 1 1 1 ...
brfss2013 %>%
  filter(!is.na(employ1)) %>%
  group_by(employ1) %>%
  summarise(count = n(), percentage = n() * 100 / total_obs)
## # A tibble: 8 × 3
## employ1 count percentage
## <fct> <int> <dbl>
## 1 Employed for wages 202200 41.1
## 2 Self-employed 39832 8.10
## 3 Out of work for 1 year or more 14074 2.86
## 4 Out of work for less than 1 year 12242 2.49
## 5 A homemaker 31647 6.44
## 6 A student 12682 2.58
## 7 Retired 138259 28.1
## 8 Unable to work 37453 7.62
According to the data, 39,832 respondents are self-employed, which is 8.1 % of all respondents. For this question we focus only on the self-employed respondents.
brfss2013 %>%
  filter(!is.na(scntpaid)) %>%
  group_by(scntpaid) %>%
  summarise(count = n(), percentage = n() * 100 / total_obs)
## # A tibble: 4 × 3
## scntpaid count percentage
## <fct> <int> <dbl>
## 1 Paid by salary 13240 2.69
## 2 Paid by the hour 14300 2.91
## 3 Paid by the job / task 2752 0.560
## 4 Paid some other way 1912 0.389
13,240 of our respondents (2.69 % of the total) are paid by salary, and 14,300 (2.91 %) are paid by the hour.
rq2_table <- table(brfss2013$employ1, brfss2013$scntpaid, useNA = "no")
rq2_table
##
## Paid by salary Paid by the hour
## Employed for wages 12370 13445
## Self-employed 862 837
## Out of work for 1 year or more 0 0
## Out of work for less than 1 year 0 0
## A homemaker 0 0
## A student 0 0
## Retired 0 0
## Unable to work 0 0
##
## Paid by the job / task Paid some other way
## Employed for wages 725 644
## Self-employed 2025 1259
## Out of work for 1 year or more 0 0
## Out of work for less than 1 year 0 0
## A homemaker 0 0
## A student 0 0
## Retired 0 0
## Unable to work 0 0
The table shows that 862 self-employed respondents are paid by salary and 837 are paid by the hour. The remaining question is how satisfied the respondents in each of these two groups are.
rq2_table <- table(brfss2013$employ1, brfss2013$scntpaid, brfss2013$carercvd, useNA = "no")
rq2_table
## , , = Very satisfied
##
##
## Paid by salary Paid by the hour
## Employed for wages 8230 7657
## Self-employed 520 505
## Out of work for 1 year or more 0 0
## Out of work for less than 1 year 0 0
## A homemaker 0 0
## A student 0 0
## Retired 0 0
## Unable to work 0 0
##
## Paid by the job / task Paid some other way
## Employed for wages 416 357
## Self-employed 1150 739
## Out of work for 1 year or more 0 0
## Out of work for less than 1 year 0 0
## A homemaker 0 0
## A student 0 0
## Retired 0 0
## Unable to work 0 0
##
## , , = Somewhat satisfied
##
##
## Paid by salary Paid by the hour
## Employed for wages 3060 4026
## Self-employed 225 223
## Out of work for 1 year or more 0 0
## Out of work for less than 1 year 0 0
## A homemaker 0 0
## A student 0 0
## Retired 0 0
## Unable to work 0 0
##
## Paid by the job / task Paid some other way
## Employed for wages 211 189
## Self-employed 554 340
## Out of work for 1 year or more 0 0
## Out of work for less than 1 year 0 0
## A homemaker 0 0
## A student 0 0
## Retired 0 0
## Unable to work 0 0
##
## , , = Not at all satisfied
##
##
## Paid by salary Paid by the hour
## Employed for wages 223 484
## Self-employed 20 40
## Out of work for 1 year or more 0 0
## Out of work for less than 1 year 0 0
## A homemaker 0 0
## A student 0 0
## Retired 0 0
## Unable to work 0 0
##
## Paid by the job / task Paid some other way
## Employed for wages 32 26
## Self-employed 77 44
## Out of work for 1 year or more 0 0
## Out of work for less than 1 year 0 0
## A homemaker 0 0
## A student 0 0
## Retired 0 0
## Unable to work 0 0
The counts are all there, but the table is cumbersome to read, especially since we are only interested in self-employed respondents paid by salary or by the hour. Let's make a graph from the counts we gathered.
# Self-employed counts copied from rq2_table.
# VS = very satisfied, SS = somewhat satisfied, NS = not at all satisfied.
abc <- c(520, 505, 225, 223, 20, 40, 2904, 35395)
labels <- c("Self-employed Paid by salary - VS", "Self-employed Paid by the hour - VS", "Self-employed Paid by salary - SS", "Self-employed Paid by the hour - SS", "Self-employed Paid by salary - NS", "Self-employed Paid by the hour - NS", "Other method of getting paid", "Number of NAs")
piepercent <- round(100 * abc / sum(abc), 1)
pie(abc, labels = piepercent, main = "Total Self-employment status", col = rainbow(length(abc)))
legend("topright", labels, cex = 0.6, fill = rainbow(length(abc)))
The chart above covers all self-employed respondents, and the NA slice dominates it. We need to exclude the NAs to see what is going on in the rest of the data.
# Same counts as before, without the NA slice.
abc <- c(520, 505, 225, 223, 20, 40, 2904)
labels <- c("Self-employed Paid by salary - VS", "Self-employed Paid by the hour - VS", "Self-employed Paid by salary - SS", "Self-employed Paid by the hour - SS", "Self-employed Paid by salary - NS", "Self-employed Paid by the hour - NS", "Other method of getting paid")
piepercent <- round(100 * abc / sum(abc), 1)
pie(abc, labels = piepercent, main = "Total Self-employment status excluding NAs", col = rainbow(length(abc)))
legend("topright", labels, cex = 0.591, fill = rainbow(length(abc)))
This chart is easier to read and speaks to our question. The percentages are shares of all self-employed respondents with a recorded pay type and satisfaction rating. 11.7 % are paid by salary and very satisfied, slightly more than the 11.4 % who are paid by the hour and very satisfied. 5.1 % are paid by salary and somewhat satisfied, against 5.0 % who are paid by the hour. Only 0.5 % are paid by salary and not at all satisfied, compared with 0.9 % who are paid by the hour. The two groups are almost the same size (765 salaried and 768 hourly), so in this sample the hourly group is not more satisfied; the salaried group is slightly more satisfied.
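The comparison can also be made within each pay type, which removes the "other method" slice from the denominator. This is a minimal sketch that uses only the six self-employed counts printed in rq2_table; the names rq2_counts and rq2_props are hypothetical and not part of the original analysis.

```r
# Self-employed counts from rq2_table: rows are pay types, columns are
# satisfaction levels.
rq2_counts <- matrix(
  c(520, 225, 20,
    505, 223, 40),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("Paid by salary", "Paid by the hour"),
    c("Very satisfied", "Somewhat satisfied", "Not at all satisfied")
  )
)

# Group sizes: how many salaried and hourly respondents gave a rating.
print(rowSums(rq2_counts))

# Share of each satisfaction level within each pay type (rows sum to 1).
rq2_props <- prop.table(rq2_counts, margin = 1)
print(round(100 * rq2_props, 1))
```

Read this way, roughly 68 % of salaried and 66 % of hourly self-employed respondents are very satisfied, which points in the same direction as the pie chart.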
Research question 3: Do more men than women in North Carolina have health care coverage?
str(select(brfss2013, sex, X_hcvu651, X_state))
## 'data.frame': 491775 obs. of 3 variables:
## $ sex : Factor w/ 2 levels "Male","Female": 2 2 2 2 1 2 2 2 1 2 ...
## $ X_hcvu651: Factor w/ 2 levels "Have health care coverage",..: 1 1 1 1 NA 1 1 1 1 NA ...
## $ X_state : Factor w/ 55 levels "0","Alabama",..: 2 2 2 2 2 2 2 2 2 2 ...
All of the above are categorical variables. sex has 2 levels, X_hcvu651 (health care coverage status) has 2 levels, and X_state has 55 levels. We use them to check whether more men than women in North Carolina have health care coverage.
To begin, let’s check out our selected variables individually.
brfss2013 %>%
  group_by(sex) %>%
  summarise(count = n(), percentage = n() * 100 / total_obs)
## # A tibble: 3 × 3
## sex count percentage
## <fct> <int> <dbl>
## 1 Male 201313 40.9
## 2 Female 290455 59.1
## 3 <NA> 7 0.00142
ggplot(brfss2013, aes(x = sex)) + geom_bar() + ggtitle('Men vs Women') + theme_linedraw()
40.94 % of our respondents are male, while 59.06 % are female. There are also 7 missing (NA) values, which we will filter out later because they cannot be used in the comparison.
brfss2013 %>%
  group_by(X_hcvu651) %>%
  summarise(count = n(), percentage = n() * 100 / total_obs)
## # A tibble: 3 × 3
## X_hcvu651 count percentage
## <fct> <int> <dbl>
## 1 Have health care coverage 272546 55.4
## 2 Do not have health care coverage 52631 10.7
## 3 <NA> 166598 33.9
ggplot(brfss2013, aes(x=X_hcvu651)) + geom_bar() + ggtitle('respondents that have health care coverage and do not') + theme_linedraw()
About 55.4 % of our respondents have health care coverage, while 10.7 % do not. The remaining 33.9 % are missing (NA) values, which we filter out next because they cannot be used in the comparison.
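How missing values behave in a comparison is worth seeing once before filtering them. The sketch below uses a small made-up factor (the vector coverage is hypothetical, not taken from brfss2013) to show that comparing against the string "NA" returns NA for missing entries, while is.na() tests for missingness directly. dplyr's filter() drops rows whose condition is NA, which is why a comparison with "NA" also removes them, but is.na() states the intent explicitly.

```r
# Hypothetical stand-in for a coverage column with missing values.
coverage <- factor(c("Have health care coverage", NA,
                     "Do not have health care coverage", NA))

# Comparing with the string "NA" yields NA (not FALSE) where the value
# is missing.
cmp <- coverage != "NA"
print(cmp)

# is.na() identifies missing values directly.
print(is.na(coverage))

# Keep only the observed values.
observed <- coverage[!is.na(coverage)]
print(length(observed))
```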
brfss2013 %>%
  filter(!is.na(X_hcvu651)) %>%
  group_by(X_hcvu651) %>%
  summarise(count = n(), percentage = n() * 100 / total_obs)
## # A tibble: 2 × 3
## X_hcvu651 count percentage
## <fct> <int> <dbl>
## 1 Have health care coverage 272546 55.4
## 2 Do not have health care coverage 52631 10.7
rq4_table <- table(brfss2013$sex, brfss2013$X_hcvu651, useNA = "no")
rq4_table
##
## Have health care coverage Do not have health care coverage
## Male 115138 24581
## Female 157408 28050
prop.table(rq4_table)
##
## Have health care coverage Do not have health care coverage
## Male 0.35407793 0.07559268
## Female 0.48406868 0.08626071
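mosaicplot(prop.table(rq4_table, 1), main = 'Male and Female', xlab = 'Male and Female', ylab = 'Health Care Coverage Status')
According to the data, more female than male respondents have health care coverage, both in count (157,408 versus 115,138) and as a share of each sex (about 84.9 % of women versus 82.4 % of men).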
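The printed prop.table(rq4_table) output gives joint proportions (each cell divided by the grand total), whereas the mosaic plot uses row proportions. The following sketch, which assumes only the four counts shown in rq4_table and introduces the illustrative names rq4_counts, joint and by_sex, contrasts the two.

```r
# Counts copied from the rq4_table output.
rq4_counts <- matrix(
  c(115138, 24581,
    157408, 28050),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("Male", "Female"),
    c("Have health care coverage", "Do not have health care coverage")
  )
)

# Joint proportions: all four cells together sum to 1.
joint <- prop.table(rq4_counts)
print(round(joint, 4))

# Row proportions: coverage rate within each sex, each row sums to 1.
by_sex <- prop.table(rq4_counts, margin = 1)
print(round(by_sex, 4))
```

The joint table mostly reflects that there are more women in the sample; the row proportions are the ones that answer whether one sex is more likely to be covered.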
brfss2013 %>%
  group_by(X_state) %>%
  summarise(count = n(), percentage = n() * 100 / total_obs)
## # A tibble: 55 × 3
## X_state count percentage
## <fct> <int> <dbl>
## 1 0 1 0.000203
## 2 Alabama 6505 1.32
## 3 Alaska 4578 0.931
## 4 Arizona 4253 0.865
## 5 Arkansas 5268 1.07
## 6 California 11518 2.34
## 7 Colorado 13649 2.78
## 8 Connecticut 7710 1.57
## 9 Delaware 5206 1.06
## 10 District of Columbia 4931 1.00
## # … with 45 more rows
Next we look up North Carolina, since our question concerns respondents in North Carolina.
brfss2013 %>%
  group_by(X_state) %>%
  filter(X_state == "North Carolina") %>%
  summarise(count = n(), percentage = n() * 100 / total_obs)
## # A tibble: 1 × 3
## X_state count percentage
## <fct> <int> <dbl>
## 1 North Carolina 8860 1.80
8,860 of our respondents, or 1.80 % of the total, are from North Carolina. Let us see how many males and females in North Carolina have health care coverage by grouping on all three variables.
brfss2013 %>%
  group_by(sex, X_hcvu651, X_state) %>%
  filter(X_state == "North Carolina") %>%
  filter(!is.na(X_hcvu651), !is.na(sex)) %>%
  summarise(count = n(), percentage = n() * 100 / total_obs)
## `summarise()` has grouped output by 'sex', 'X_hcvu651'. You can override using
## the `.groups` argument.
## # A tibble: 4 × 5
## # Groups: sex, X_hcvu651 [4]
## sex X_hcvu651 X_state count percentage
## <fct> <fct> <fct> <int> <dbl>
## 1 Male Have health care coverage North Carolina 1855 0.377
## 2 Male Do not have health care coverage North Carolina 550 0.112
## 3 Female Have health care coverage North Carolina 2705 0.550
## 4 Female Do not have health care coverage North Carolina 708 0.144
As the output shows, 1,855 male respondents in North Carolina have health care coverage, about 0.38 % of all respondents, while 2,705 female respondents in North Carolina do, about 0.55 %. For transparency, let us confirm these counts using table() on sex, X_hcvu651 and X_state.
rq5_table <- table(brfss2013$sex, brfss2013$X_hcvu651, brfss2013$X_state, useNA = "no")
rq5_table
## , , = 0
##
##
## Have health care coverage Do not have health care coverage
## Male 0 0
## Female 0 0
##
## , , = Alabama
##
##
## Have health care coverage Do not have health care coverage
## Male 1187 255
## Female 2169 420
##
## , , = Alaska
##
##
## Have health care coverage Do not have health care coverage
## Male 1340 384
## Female 1627 292
##
## , , = Arizona
##
##
## Have health care coverage Do not have health care coverage
## Male 883 246
## Female 1205 281
##
## , , = Arkansas
##
##
## Have health care coverage Do not have health care coverage
## Male 990 280
## Female 1425 432
##
## , , = California
##
##
## Have health care coverage Do not have health care coverage
## Male 3100 769
## Female 3741 766
##
## , , = Colorado
##
##
## Have health care coverage Do not have health care coverage
## Male 3430 766
## Female 4381 800
##
## , , = Connecticut
##
##
## Have health care coverage Do not have health care coverage
## Male 2019 295
## Female 2650 276
##
## , , = Delaware
##
##
## Have health care coverage Do not have health care coverage
## Male 1171 214
## Female 1766 200
##
## , , = District of Columbia
##
##
## Have health care coverage Do not have health care coverage
## Male 1216 101
## Female 1741 92
##
## , , = Florida
##
##
## Have health care coverage Do not have health care coverage
## Male 5636 1957
## Female 8853 2493
##
## , , = Georgia
##
##
## Have health care coverage Do not have health care coverage
## Male 1854 581
## Female 2887 763
##
## , , = Hawaii
##
##
## Have health care coverage Do not have health care coverage
## Male 2296 340
## Female 2592 291
##
## , , = Idaho
##
##
## Have health care coverage Do not have health care coverage
## Male 1230 338
## Female 1611 428
##
## , , = Illinois
##
##
## Have health care coverage Do not have health care coverage
## Male 1400 268
## Female 1827 249
##
## , , = Indiana
##
##
## Have health care coverage Do not have health care coverage
## Male 2435 522
## Female 3127 620
##
## , , = Iowa
##
##
## Have health care coverage Do not have health care coverage
## Male 1967 242
## Female 2686 253
##
## , , = Kansas
##
##
## Have health care coverage Do not have health care coverage
## Male 5604 1170
## Female 7304 1381
##
## , , = Kentucky
##
##
## Have health care coverage Do not have health care coverage
## Male 2226 524
## Female 3797 808
##
## , , = Louisiana
##
##
## Have health care coverage Do not have health care coverage
## Male 958 240
## Female 1706 405
##
## , , = Maine
##
##
## Have health care coverage Do not have health care coverage
## Male 1913 379
## Female 2674 355
##
## , , = Maryland
##
##
## Have health care coverage Do not have health care coverage
## Male 2983 389
## Female 4690 469
##
## , , = Massachusetts
##
##
## Have health care coverage Do not have health care coverage
## Male 4027 302
## Female 5698 265
##
## , , = Michigan
##
##
## Have health care coverage Do not have health care coverage
## Male 3186 626
## Female 4083 601
##
## , , = Minnesota
##
##
## Have health care coverage Do not have health care coverage
## Male 3971 499
## Female 5125 461
##
## , , = Mississippi
##
##
## Have health care coverage Do not have health care coverage
## Male 1349 417
## Female 2226 653
##
## , , = Missouri
##
##
## Have health care coverage Do not have health care coverage
## Male 1489 314
## Female 2149 435
##
## , , = Montana
##
##
## Have health care coverage Do not have health care coverage
## Male 2376 598
## Female 2864 599
##
## , , = Nebraska
##
##
## Have health care coverage Do not have health care coverage
## Male 4043 677
## Female 5387 832
##
## , , = Nevada
##
##
## Have health care coverage Do not have health care coverage
## Male 1144 360
## Female 1528 399
##
## , , = New Hampshire
##
##
## Have health care coverage Do not have health care coverage
## Male 1402 250
## Female 1971 290
##
## , , = New Jersey
##
##
## Have health care coverage Do not have health care coverage
## Male 3580 813
## Female 4738 825
##
## , , = New Mexico
##
##
## Have health care coverage Do not have health care coverage
## Male 2139 659
## Female 2759 777
##
## , , = New York
##
##
## Have health care coverage Do not have health care coverage
## Male 2244 482
## Female 3110 406
##
## , , = North Carolina
##
##
## Have health care coverage Do not have health care coverage
## Male 1855 550
## Female 2705 708
##
## , , = North Dakota
##
##
## Have health care coverage Do not have health care coverage
## Male 2227 328
## Female 2463 253
##
## , , = Ohio
##
##
## Have health care coverage Do not have health care coverage
## Male 2905 531
## Female 4031 610
##
## , , = Oklahoma
##
##
## Have health care coverage Do not have health care coverage
## Male 1763 425
## Female 2478 573
##
## , , = Oregon
##
##
## Have health care coverage Do not have health care coverage
## Male 1394 372
## Female 1714 367
##
## , , = Pennsylvania
##
##
## Have health care coverage Do not have health care coverage
## Male 2902 494
## Female 3602 475
##
## , , = Rhode Island
##
##
## Have health care coverage Do not have health care coverage
## Male 1458 300
## Female 2256 353
##
## , , = South Carolina
##
##
## Have health care coverage Do not have health care coverage
## Male 2245 590
## Female 3150 760
##
## , , = South Dakota
##
##
## Have health care coverage Do not have health care coverage
## Male 1851 285
## Female 2266 271
##
## , , = Tennessee
##
##
## Have health care coverage Do not have health care coverage
## Male 1127 301
## Female 1941 419
##
## , , = Texas
##
##
## Have health care coverage Do not have health care coverage
## Male 2264 863
## Female 2956 1222
##
## , , = Utah
##
##
## Have health care coverage Do not have health care coverage
## Male 3614 709
## Female 4329 716
##
## , , = Vermont
##
##
## Have health care coverage Do not have health care coverage
## Male 1601 244
## Female 2228 159
##
## , , = Virginia
##
##
## Have health care coverage Do not have health care coverage
## Male 2111 418
## Female 2886 492
##
## , , = Washington
##
##
## Have health care coverage Do not have health care coverage
## Male 2653 571
## Female 3425 624
##
## , , = West Virginia
##
##
## Have health care coverage Do not have health care coverage
## Male 1402 365
## Female 1834 467
##
## , , = Wisconsin
##
##
## Have health care coverage Do not have health care coverage
## Male 1770 312
## Female 2172 242
##
## , , = Wyoming
##
##
## Have health care coverage Do not have health care coverage
## Male 1342 315
## Female 1798 390
##
## , , = Guam
##
##
## Have health care coverage Do not have health care coverage
## Male 524 194
## Female 737 212
##
## , , = Puerto Rico
##
##
## Have health care coverage Do not have health care coverage
## Male 1342 157
## Female 2340 120
##
## , , = 80
##
##
## Have health care coverage Do not have health care coverage
## Male 0 0
## Female 0 0
The North Carolina slice of this table matches the counts from the grouped summary above (1,855 and 550 for males, 2,705 and 708 for females), so the two approaches agree.
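Instead of printing every state, a single slice of a three-way table can be pulled out by name and compared with a direct tabulation. This is a minimal sketch on a tiny made-up data frame (toy and all of its values are hypothetical stand-ins for brfss2013), shown only to illustrate the indexing.

```r
# Hypothetical toy data, for illustration only.
toy <- data.frame(
  sex = factor(c("Male", "Female", "Female", "Male", "Female", "Male"),
               levels = c("Male", "Female")),
  coverage = factor(c("Yes", "Yes", "No", "No", "Yes", "Yes"),
                    levels = c("Yes", "No")),
  state = factor(c("North Carolina", "North Carolina", "North Carolina",
                   "Ohio", "Ohio", "North Carolina"))
)

# Three-way table, as in rq5_table: sex x coverage x state.
three_way <- table(toy$sex, toy$coverage, toy$state, useNA = "no")

# Pull out just the North Carolina slice by indexing the third dimension.
nc_slice <- three_way[, , "North Carolina"]
print(nc_slice)

# Tabulate the North Carolina rows directly and check both routes agree.
nc_rows <- toy[toy$state == "North Carolina", ]
nc_direct <- table(nc_rows$sex, nc_rows$coverage)
print(all(nc_slice == nc_direct))
```

On the real data the same indexing would presumably be rq5_table[, , "North Carolina"], which avoids scanning the full printout.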
# North Carolina counts; HCC = health care coverage.
wer <- data.frame(category = c("Men w/ HCC", "Men w/o HCC", "Women w/ HCC", "Women w/o HCC"), count = c(1855, 550, 2705, 708))
wer$fraction <- wer$count / sum(wer$count)
wer$ymax <- cumsum(wer$fraction)
wer$ymin <- c(0, head(wer$ymax, n = -1))
wer$labelposition <- (wer$ymax + wer$ymin) / 2
wer$label <- paste0(wer$category, "\n", wer$count)
ggplot(wer, aes(ymax = ymax, ymin = ymin, xmax = 4, xmin = 3, fill = category)) + geom_rect() + geom_label(x = 3.5, aes(y = labelposition, label = label), size = 3) + scale_fill_brewer(palette = 4) + coord_polar(theta = "y") + xlim(c(1, 4)) + theme_void() + theme(legend.position = "none") + ggtitle("Men w/ Health Care Coverage & Women w/ Health Care Coverage")
Looking at the summary statistics and visualization, we can see that more women than men in North Carolina have health care coverage (2,705 versus 1,855). The coverage rate is also slightly higher for women: about 79.3 % of women versus 77.1 % of men with a recorded answer. So, in this sample, the answer to research question 3 is no.
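Because the sample contains more women than men, raw counts alone can mislead, so it helps to compute the coverage rate per sex. The sketch below uses only the four North Carolina counts reported above; the data frame nc and the names totals, covered and rate are introduced here for illustration.

```r
# North Carolina counts from the grouped summary.
nc <- data.frame(
  sex     = c("Male", "Male", "Female", "Female"),
  covered = c(TRUE, FALSE, TRUE, FALSE),
  count   = c(1855, 550, 2705, 708)
)

# Respondents per sex with a recorded coverage answer.
totals <- tapply(nc$count, nc$sex, sum)

# Respondents per sex who have coverage.
covered <- tapply(nc$count[nc$covered], nc$sex[nc$covered], sum)

# Coverage rate within each sex, aligned by name.
rate <- covered / totals[names(covered)]
print(round(100 * rate, 1))
```

Comparing rates rather than counts separates "there are more women in the sample" from "women are more likely to be covered".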