Challenge7

Read in Data


# Load the tidyverse, which provides read_csv(), the dplyr verbs, and ggplot2
library(tidyverse)

abc_poll <- read_csv("../challenge_datasets/abc_poll_2021.csv")

## Rows: 527 Columns: 31
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (28): xspanish, complete_status, ppeduc5, ppeducat, ppgender, ppethm, pp...
## dbl  (3): id, ppage, weights_pid
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

head(abc_poll)

## # A tibble: 6 × 31
##        id xspanish complete_status ppage ppeduc5        ppeducat ppgender ppethm
##     <dbl> <chr>    <chr>           <dbl> <chr>          <chr>    <chr>    <chr> 
## 1 7230001 English  qualified          68 "High school … High sc… Female   White…
## 2 7230002 English  qualified          85 "Bachelor\x92… Bachelo… Male     White…
## 3 7230003 English  qualified          69 "High school … High sc… Male     White…
## 4 7230004 English  qualified          74 "Bachelor\x92… Bachelo… Female   White…
## 5 7230005 English  qualified          77 "High school … High sc… Male     White…
## 6 7230006 English  qualified          70 "Bachelor\x92… Bachelo… Male     White…
## # ℹ 23 more variables: pphhsize <chr>, ppinc7 <chr>, ppmarit5 <chr>,
## #   ppmsacat <chr>, ppreg4 <chr>, pprent <chr>, ppstaten <chr>, PPWORKA <chr>,
## #   ppemploy <chr>, Q1_a <chr>, Q1_b <chr>, Q1_c <chr>, Q1_d <chr>, Q1_e <chr>,
## #   Q1_f <chr>, Q2 <chr>, Q3 <chr>, Q4 <chr>, Q5 <chr>, QPID <chr>,
## #   ABCAGE <chr>, Contact <chr>, weights_pid <dbl>
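The `\x92` visible in the `ppeduc5` values above looks like an encoding artifact: in Windows-1252, byte 0x92 is a right single quotation mark. The sketch below assumes that is the file's encoding (not confirmed by the source) and uses a made-up string, since the full original values are truncated in the output.

```r
# Hypothetical value mimicking the truncated "Bachelor\x92…" entries above;
# the full original string is not shown in the output, so this one is made up.
raw_value <- "Bachelor\x92s degree"

# Assumption: the bytes are Windows-1252, where 0x92 is a right single quote.
fixed_value <- iconv(raw_value, from = "windows-1252", to = "UTF-8")

print(fixed_value)
```

If the assumption holds, passing `locale = locale(encoding = "windows-1252")` to `read_csv()` should apply the same conversion at read time.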

Tidy Data

# The data is already mostly tidy.

# Recode 'ppmsacat' into more meaningful categories; this was the only variable that seemed to need a touch-up.
abc_poll <- abc_poll %>%
  mutate(ppmsacat = recode(ppmsacat, 'Metro area' = 'Urban', 'Non-metro area' = 'Rural'))

head(abc_poll)

## # A tibble: 6 × 31
##        id xspanish complete_status ppage ppeduc5        ppeducat ppgender ppethm
##     <dbl> <chr>    <chr>           <dbl> <chr>          <chr>    <chr>    <chr> 
## 1 7230001 English  qualified          68 "High school … High sc… Female   White…
## 2 7230002 English  qualified          85 "Bachelor\x92… Bachelo… Male     White…
## 3 7230003 English  qualified          69 "High school … High sc… Male     White…
## 4 7230004 English  qualified          74 "Bachelor\x92… Bachelo… Female   White…
## 5 7230005 English  qualified          77 "High school … High sc… Male     White…
## 6 7230006 English  qualified          70 "Bachelor\x92… Bachelo… Male     White…
## # ℹ 23 more variables: pphhsize <chr>, ppinc7 <chr>, ppmarit5 <chr>,
## #   ppmsacat <chr>, ppreg4 <chr>, pprent <chr>, ppstaten <chr>, PPWORKA <chr>,
## #   ppemploy <chr>, Q1_a <chr>, Q1_b <chr>, Q1_c <chr>, Q1_d <chr>, Q1_e <chr>,
## #   Q1_f <chr>, Q2 <chr>, Q3 <chr>, Q4 <chr>, Q5 <chr>, QPID <chr>,
## #   ABCAGE <chr>, Contact <chr>, weights_pid <dbl>
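Because `ppmsacat` is hidden in the truncated `head()` output, the effect of the recode is not visible above. A minimal sketch of one way to check it, using a small hypothetical data frame in place of `abc_poll` (only the column name and the two labels come from the code above):

```r
library(dplyr)

# Hypothetical stand-in for abc_poll; only the column name and the two
# category labels are taken from the recode() call above.
toy_poll <- data.frame(
  ppmsacat = c("Metro area", "Non-metro area", "Metro area")
)

toy_recoded <- toy_poll %>%
  mutate(ppmsacat = recode(ppmsacat,
                           "Metro area" = "Urban",
                           "Non-metro area" = "Rural"))

# Tabulate the recoded column; any label that did not match would
# appear here unchanged.
area_counts <- count(toy_recoded, ppmsacat)
print(area_counts)
```

Running `count(abc_poll, ppmsacat)` on the real data would show in the same way whether every value was mapped to Urban or Rural.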

Graph the Data

# Plot 1: Bar Chart - Count of Each Age Group
ggplot(abc_poll, aes(x = ABCAGE, fill = ABCAGE)) +
  geom_bar() +
  labs(title = "Count of each Age Group", x = "Age Group", y = "Count")

# Plot 2: Bar Chart - Marital Status Distribution by Age Group
ggplot(abc_poll, aes(x = ABCAGE, fill = ppmarit5)) +
  geom_bar(position = "dodge") +
  labs(title = "Marital Status Distribution by Age Group", x = "Age Group", y = "Count") +
  facet_wrap(~ppgender)

ABCAGE is a categorical variable that groups respondents by age, so a bar chart suits it: each bar shows the count of respondents in one age group, giving an overview of how the sample is distributed across the groups.
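To make the counting step explicit, here is a minimal sketch on hypothetical data (the age-group labels are made up; the real ABCAGE categories are not listed in the output). It relies on `geom_bar()` defaulting to `stat = "count"`, so tallying with `count()` and drawing with `geom_col()` gives the same bars.

```r
library(dplyr)
library(ggplot2)

# Hypothetical age-group labels; the real ABCAGE categories are not
# listed in the output above.
toy_poll <- data.frame(
  ABCAGE = c("18-29", "30-49", "30-49", "65+", "65+", "65+")
)

# geom_bar() defaults to stat = "count", so it tallies rows per category.
# count() makes that tally explicit.
age_counts <- count(toy_poll, ABCAGE)
print(age_counts)

# geom_col() draws bar heights straight from the precomputed n column,
# giving the same bars geom_bar() would draw from the raw rows.
p <- ggplot(age_counts, aes(x = ABCAGE, y = n)) +
  geom_col() +
  labs(x = "Age Group", y = "Count")
```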

ABCAGE and ppmarit5 are both categorical, representing age group and marital status, so a dodged bar chart works well for showing how marital status is distributed within each age group. Faceting with facet_wrap(~ppgender) draws a separate panel for each gender, which makes comparisons across genders easier for readers.
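The numbers behind the faceted chart can be sketched as a grouped tally. The example below uses a hypothetical five-row data frame (column names come from the plot code, values are made up) and counts each gender, age group, and marital status combination.

```r
library(dplyr)

# Hypothetical respondents; column names come from the plot code above,
# the values are made up for illustration.
toy_poll <- data.frame(
  ppgender = c("Female", "Female", "Male", "Male", "Male"),
  ABCAGE   = c("18-29", "18-29", "18-29", "65+", "65+"),
  ppmarit5 = c("Never married", "Now married", "Never married",
               "Now married", "Now married")
)

# One row per gender x age group x marital status combination that occurs;
# each row corresponds to one bar in the faceted, dodged chart.
bar_heights <- count(toy_poll, ppgender, ABCAGE, ppmarit5)
print(bar_heights)
```

Combinations that never occur get no row here, which is also why some bars are absent from a dodged chart.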

Alex Kim

2024-01-13
