library(tidyverse)  # loads readr (read_csv), dplyr (mutate, recode), and ggplot2
abc_poll <- read_csv("../challenge_datasets/abc_poll_2021.csv")
## Rows: 527 Columns: 31
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (28): xspanish, complete_status, ppeduc5, ppeducat, ppgender, ppethm, pp...
## dbl (3): id, ppage, weights_pid
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(abc_poll)
## # A tibble: 6 × 31
## id xspanish complete_status ppage ppeduc5 ppeducat ppgender ppethm
## <dbl> <chr> <chr> <dbl> <chr> <chr> <chr> <chr>
## 1 7230001 English qualified 68 "High school … High sc… Female White…
## 2 7230002 English qualified 85 "Bachelor\x92… Bachelo… Male White…
## 3 7230003 English qualified 69 "High school … High sc… Male White…
## 4 7230004 English qualified 74 "Bachelor\x92… Bachelo… Female White…
## 5 7230005 English qualified 77 "High school … High sc… Male White…
## 6 7230006 English qualified 70 "Bachelor\x92… Bachelo… Male White…
## # ℹ 23 more variables: pphhsize <chr>, ppinc7 <chr>, ppmarit5 <chr>,
## # ppmsacat <chr>, ppreg4 <chr>, pprent <chr>, ppstaten <chr>, PPWORKA <chr>,
## # ppemploy <chr>, Q1_a <chr>, Q1_b <chr>, Q1_c <chr>, Q1_d <chr>, Q1_e <chr>,
## # Q1_f <chr>, Q2 <chr>, Q3 <chr>, Q4 <chr>, Q5 <chr>, QPID <chr>,
## # ABCAGE <chr>, Contact <chr>, weights_pid <dbl>
# The data already look mostly tidy.
# Recode the 'ppmsacat' variable into more meaningful categories; this was the only variable that seemed to need a touch-up.
abc_poll <- abc_poll %>%
  mutate(ppmsacat = recode(ppmsacat, 'Metro area' = 'Urban', 'Non-metro area' = 'Rural'))
head(abc_poll)
## # A tibble: 6 × 31
## id xspanish complete_status ppage ppeduc5 ppeducat ppgender ppethm
## <dbl> <chr> <chr> <dbl> <chr> <chr> <chr> <chr>
## 1 7230001 English qualified 68 "High school … High sc… Female White…
## 2 7230002 English qualified 85 "Bachelor\x92… Bachelo… Male White…
## 3 7230003 English qualified 69 "High school … High sc… Male White…
## 4 7230004 English qualified 74 "Bachelor\x92… Bachelo… Female White…
## 5 7230005 English qualified 77 "High school … High sc… Male White…
## 6 7230006 English qualified 70 "Bachelor\x92… Bachelo… Male White…
## # ℹ 23 more variables: pphhsize <chr>, ppinc7 <chr>, ppmarit5 <chr>,
## # ppmsacat <chr>, ppreg4 <chr>, pprent <chr>, ppstaten <chr>, PPWORKA <chr>,
## # ppemploy <chr>, Q1_a <chr>, Q1_b <chr>, Q1_c <chr>, Q1_d <chr>, Q1_e <chr>,
## # Q1_f <chr>, Q2 <chr>, Q3 <chr>, Q4 <chr>, Q5 <chr>, QPID <chr>,
## # ABCAGE <chr>, Contact <chr>, weights_pid <dbl>
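Because `ppmsacat` sits among the truncated columns in the printed tibble, the `head()` output does not show whether the recode took effect. A minimal sketch of one way to check it, using a hypothetical three-row tibble (`toy`) in place of the real poll data:

```r
library(dplyr)

# Hypothetical toy data standing in for the `ppmsacat` column of abc_poll
toy <- tibble(ppmsacat = c("Metro area", "Non-metro area", "Metro area"))

# Same recode as above: old value on the left, new label on the right
toy_recoded <- toy %>%
  mutate(ppmsacat = recode(ppmsacat,
                           "Metro area" = "Urban",
                           "Non-metro area" = "Rural"))

# Tally the recoded categories to confirm no original label was left behind
recode_check <- count(toy_recoded, ppmsacat)
print(recode_check)
```

Running `count(abc_poll, ppmsacat)` on the real data would give the same kind of summary.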
# Plot 1: Bar Chart - Count of Each Age Group
ggplot(abc_poll, aes(x = ABCAGE, fill = ABCAGE)) +
  geom_bar() +
  labs(title = "Count of each Age Group", x = "Age Group", y = "Count")
# Plot 2: Bar Chart - Marital Status Distribution by Age Group
ggplot(abc_poll, aes(x = ABCAGE, fill = ppmarit5)) +
  geom_bar(position = "dodge") +
  labs(title = "Marital Status Distribution by Age Group", x = "Age Group", y = "Count") +
  facet_wrap(~ppgender)
In the first plot, ABCAGE is a categorical age-group variable, so a bar chart suits it: each bar shows the count (frequency) of respondents in one age group, giving an overview of how the sample is distributed across age groups.
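To make the "count" reading concrete, here is a small sketch showing that the bar heights `geom_bar()` draws are the per-category row counts. The age-group labels are hypothetical, since the real ABCAGE values are not visible in the output above.

```r
library(dplyr)
library(ggplot2)

# Hypothetical age-group labels; not taken from the poll data
toy <- tibble(ABCAGE = c("18-29", "30-49", "30-49", "65+", "65+", "65+"))

# The heights geom_bar() draws are the number of rows in each category
age_counts <- count(toy, ABCAGE)
print(age_counts)

# Plotting the pre-computed counts with geom_col() gives the same bars
# that geom_bar() would produce from the raw rows
p_counts <- ggplot(age_counts, aes(x = ABCAGE, y = n, fill = ABCAGE)) +
  geom_col() +
  labs(x = "Age Group", y = "Count")
```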
In the second plot, ABCAGE and ppmarit5 are both categorical, indicating age group and marital status, so a dodged bar chart works well: it places the marital-status counts side by side within each age group. Adding facet_wrap(~ppgender) draws a separate panel for each gender, which makes comparisons across genders easier for readers.
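The faceted chart can be read as a three-way count table: one bar per combination of gender (panel), age group (x position), and marital status (fill). The sketch below uses hypothetical rows and labels to show that table. It also uses `position_dodge(preserve = "single")`, an optional ggplot2 setting not used in the original plot, which keeps bar widths equal when some combinations are missing.

```r
library(dplyr)
library(ggplot2)

# Hypothetical rows; the category labels are illustrative, not from the poll
toy <- tibble(
  ABCAGE   = c("18-29", "18-29", "30-49", "30-49", "30-49", "30-49"),
  ppmarit5 = c("Never married", "Now married", "Now married",
               "Now married", "Now married", "Divorced"),
  ppgender = c("Female", "Male", "Female", "Male", "Male", "Male")
)

# One row per bar: panel (gender), x position (age group), fill (marital status)
bar_counts <- count(toy, ppgender, ABCAGE, ppmarit5)
print(bar_counts)

# Same structure as Plot 2, with equal-width bars for sparse combinations
p_facets <- ggplot(toy, aes(x = ABCAGE, fill = ppmarit5)) +
  geom_bar(position = position_dodge(preserve = "single")) +
  facet_wrap(~ppgender)
```

With the default `position = "dodge"`, a bar with no neighbours in its age group expands to fill the whole slot, which can make sparse groups look larger than they are.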