Challenge 7

Read in Data

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ozmaps)
marriage = read.csv("../challenge_datasets/australian_marriage_tidy.csv")
marriage

                         territory resp   count percent
1                  New South Wales  yes 2374362    57.8
2                  New South Wales   no 1736838    42.2
3                         Victoria  yes 2145629    64.9
4                         Victoria   no 1161098    35.1
5                       Queensland  yes 1487060    60.7
6                       Queensland   no  961015    39.3
7                  South Australia  yes  592528    62.5
8                  South Australia   no  356247    37.5
9                Western Australia  yes  801575    63.7
10               Western Australia   no  455924    36.3
11                        Tasmania  yes  191948    63.6
12                        Tasmania   no  109655    36.4
13           Northern Territory(b)  yes   48686    60.6
14           Northern Territory(b)   no   31690    39.4
15 Australian Capital Territory(c)  yes  175459    74.0
16 Australian Capital Territory(c)   no   61520    26.0

Briefly describe the data

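The data show, for each Australian state and territory, how many respondents answered yes or no, along with each response's share of that territory's total. The figures match the 2017 Australian Marriage Law Postal Survey, which asked whether the law should be changed to allow same-sex couples to marry, so a yes response indicates support for that change rather than the respondent's own marital status.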

Tidy Data

Each territory appears twice, once per response, which is redundant: a single row per territory with one column for the yes count and one for the no count carries the same information. We can group by territory, summarise the counts, and then recompute the percentage of yes responses from the yes count and the total.

marriage2 <- marriage %>%
  group_by(territory) %>%
  summarise(
    yes = sum(count[resp == "yes"]),
    no = sum(count[resp == "no"]),
    total = sum(count),
    yes_percent = yes / total * 100
  ) %>%
  # group_by() sorts territories alphabetically; restore the original file order
  arrange(match(territory, unique(marriage$territory)))
marriage2

# A tibble: 8 × 5
  territory                           yes      no   total yes_percent
  <chr>                             <int>   <int>   <int>       <dbl>
1 New South Wales                 2374362 1736838 4111200        57.8
2 Victoria                        2145629 1161098 3306727        64.9
3 Queensland                      1487060  961015 2448075        60.7
4 South Australia                  592528  356247  948775        62.5
5 Western Australia                801575  455924 1257499        63.7
6 Tasmania                         191948  109655  301603        63.6
7 Northern Territory(b)             48686   31690   80376        60.6
8 Australian Capital Territory(c)  175459   61520  236979        74.0
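
The same reshaping can also be written as a pivot. The sketch below is an alternative under stated assumptions, not the approach used above: marriage_subset is a hypothetical hand-typed copy of the first four rows of the data, and the column names marriage_wide, total, and yes_percent are chosen to mirror the summary table.

```r
library(dplyr)
library(tidyr)

# Hypothetical subset: the first four rows of the data shown earlier, typed in by hand.
marriage_subset <- data.frame(
  territory = c("New South Wales", "New South Wales", "Victoria", "Victoria"),
  resp      = c("yes", "no", "yes", "no"),
  count     = c(2374362, 1736838, 2145629, 1161098),
  percent   = c(57.8, 42.2, 64.9, 35.1)
)

marriage_wide <- marriage_subset %>%
  # Drop the stored percentage; it is recomputed from the counts below.
  select(-percent) %>%
  # One row per territory, one column per response.
  pivot_wider(names_from = resp, values_from = count) %>%
  mutate(
    total = yes + no,
    yes_percent = yes / total * 100
  )
marriage_wide
```

Because pivot_wider keeps territories in order of first appearance, this variant would not need the arrange step to restore the file order.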

Graphs

I will create a choropleth to visualize the column values by location, following a tutorial for a similar task: https://www.youtube.com/watch?v=xXXqTvv5g3M. I chose a choropleth because it is well suited to displaying numeric data associated with geographic regions, and it is something I will likely use in my final project.

First, we need to append a dummy row for “Other Territories”: the map data lists it as a ninth region, while our data has only eight, and the column we add to the map below must have one value per map row.

dummy_row <- data.frame(
  territory = "Other Territories",
  yes = NaN,
  no = NaN,
  total = NaN,
  yes_percent = NaN
)

marriage3 <- bind_rows(marriage2, dummy_row)
marriage3

# A tibble: 9 × 5
  territory                           yes      no   total yes_percent
  <chr>                             <dbl>   <dbl>   <dbl>       <dbl>
1 New South Wales                 2374362 1736838 4111200        57.8
2 Victoria                        2145629 1161098 3306727        64.9
3 Queensland                      1487060  961015 2448075        60.7
4 South Australia                  592528  356247  948775        62.5
5 Western Australia                801575  455924 1257499        63.7
6 Tasmania                         191948  109655  301603        63.6
7 Northern Territory(b)             48686   31690   80376        60.6
8 Australian Capital Territory(c)  175459   61520  236979        74.0
9 Other Territories                   NaN     NaN     NaN       NaN

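Now we have what we need to create the choropleth: we add a Percent Yes column to sf_oz and then use ggplot to draw the map, using that column for the fill color. The values are assigned by position, so this relies on the rows of marriage3 being in the same order as the rows of sf_oz.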

sf_oz <- ozmap("states")

sf_oz$`Percent Yes` <- marriage3$yes_percent
pl <- ggplot(data = sf_oz, aes(fill = `Percent Yes`)) + geom_sf() +
  scale_fill_gradient(low ="green", high = "red") +
  labs(title = "Percentage of Yes Responses by Territory",
       caption = "Map of the Australian territories colored by the percentage of respondents who answered yes") +
  theme_void()
pl
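
Assigning the column by position works only while both tables happen to share the same row order. A more defensive variant is to match on the territory name. The sketch below illustrates the idea on stand-in data so that it runs without the map package: map_names is a hypothetical substitute for the NAME column of sf_oz (the exact spellings are an assumption), and survey holds a few rows of the summary table in a deliberately different order.

```r
library(dplyr)

# Hypothetical stand-in for the NAME column of sf_oz; spellings are assumed.
map_names <- data.frame(
  NAME = c("New South Wales", "Northern Territory",
           "Australian Capital Territory", "Other Territories")
)

# A few survey rows, deliberately not in the same order as the map rows.
survey <- data.frame(
  territory   = c("Australian Capital Territory(c)", "New South Wales",
                  "Northern Territory(b)"),
  yes_percent = c(74.0, 57.8, 60.6)
)

survey_clean <- survey %>%
  # Strip trailing footnote markers such as "(b)" or "(c)" so the names can match.
  mutate(NAME = sub("\\([a-z]\\)$", "", territory)) %>%
  select(NAME, yes_percent)

# Keep every map row in map order; rows with no survey data get NA.
joined <- left_join(map_names, survey_clean, by = "NAME")
joined
```

With a join like this, the “Other Territories” row is filled with NA automatically, so no dummy row is needed. Applying the same left_join to the real sf_oz object should be possible when the sf package is loaded, but that step is not tested here.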

We could also use facet_wrap to draw each territory in its own panel:

pl <- pl + theme(panel.border = element_rect(color = "blue",
                                    fill = NA,
                                    linewidth = 1)) +
  facet_wrap( ~ NAME)

pl

Lastly, we could visualize the total number of respondents rather than the percentage who responded yes:

sf_oz <- ozmap("states")

sf_oz$`Total Count` <- marriage3$total
pl <- ggplot(data = sf_oz, aes(fill = `Total Count`)) + geom_sf() +
  scale_fill_gradient(low ="green", high = "red",
                      labels = scales::number_format(scale = 1e-6, suffix = "M")) +
  labs(title = "Poll Responses by Territory",
       caption = "Map of the Australian territories colored by the total number of responses, in millions") +
  theme_void()
pl
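
To see what the labels argument does to the legend, the labelling function can be called on its own. This is a small sketch that assumes a reasonably recent version of the scales package, where label_number() is the current name for the helper that number_format() provides; the name millions is chosen here for illustration.

```r
library(scales)

# Build a labelling function that rescales raw counts to millions
# and appends an "M" suffix.
millions <- label_number(scale = 1e-6, suffix = "M")

# Totals for New South Wales and the Northern Territory from the table above.
legend_labels <- millions(c(4111200, 80376))
legend_labels
```

The result is a character vector, which is what ggplot2 prints next to the legend breaks.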