territory resp count percent
1 New South Wales yes 2374362 57.8
2 New South Wales no 1736838 42.2
3 Victoria yes 2145629 64.9
4 Victoria no 1161098 35.1
5 Queensland yes 1487060 60.7
6 Queensland no 961015 39.3
7 South Australia yes 592528 62.5
8 South Australia no 356247 37.5
9 Western Australia yes 801575 63.7
10 Western Australia no 455924 36.3
11 Tasmania yes 191948 63.6
12 Tasmania no 109655 36.4
13 Northern Territory(b) yes 48686 60.6
14 Northern Territory(b) no 31690 39.4
15 Australian Capital Territory(c) yes 175459 74.0
16 Australian Capital Territory(c) no 61520 26.0
Briefly describe the data
The data show, for each Australian state and territory, the count and percentage of respondents who answered yes or no to a survey question about marriage.
Tidy Data
Listing each territory twice is redundant: one column can hold the count of yes responses and another the count of no responses. We can group by territory, summarise the counts, and then recompute the percentage of yes responses from the yes count and the total.
marriage2 <- marriage %>%
  group_by(territory) %>%
  summarise(
    yes = sum(count[resp == "yes"]),
    no = sum(count[resp == "no"]),
    total = sum(count),
    yes_percent = yes / total * 100
  ) %>%
  arrange(match(territory, unique(marriage$territory)))
marriage2
# A tibble: 8 × 5
territory yes no total yes_percent
<chr> <int> <int> <int> <dbl>
1 New South Wales 2374362 1736838 4111200 57.8
2 Victoria 2145629 1161098 3306727 64.9
3 Queensland 1487060 961015 2448075 60.7
4 South Australia 592528 356247 948775 62.5
5 Western Australia 801575 455924 1257499 63.7
6 Tasmania 191948 109655 301603 63.6
7 Northern Territory(b) 48686 31690 80376 60.6
8 Australian Capital Territory(c) 175459 61520 236979 74.0
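The same reshaping can also be expressed as a pivot from long to wide. The following is a minimal sketch, not the method used above: it assumes the tidyr package is available, and `marriage_demo` and `marriage_wide` are illustrative names holding only the first two territories from the table.

```r
library(dplyr)
library(tidyr)

# Two territories copied from the original table, in long format.
marriage_demo <- tibble(
  territory = rep(c("New South Wales", "Victoria"), each = 2),
  resp = rep(c("yes", "no"), times = 2),
  count = c(2374362L, 1736838L, 2145629L, 1161098L),
  percent = c(57.8, 42.2, 64.9, 35.1)
)

# Drop the old percentages, spread resp into yes/no columns,
# then recompute the total and the yes percentage.
marriage_wide <- marriage_demo %>%
  select(-percent) %>%
  pivot_wider(names_from = resp, values_from = count) %>%
  mutate(total = yes + no, yes_percent = yes / total * 100)

marriage_wide
```

The old `percent` column has to be dropped first because it differs between the yes and no rows and would otherwise keep them from collapsing into one row per territory.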
Graphs
I will create a choropleth to visualize the column values by location, following a tutorial for a similar task: https://www.youtube.com/watch?v=xXXqTvv5g3M. I chose a choropleth because it is well suited to displaying numeric data associated with regions, and I will likely use one in my final project.
First, we need to append a dummy row for “Other Territories” since the map shape file lists this as the 9th territory while our data only has 8 territories:
# A tibble: 9 × 5
territory yes no total yes_percent
<chr> <dbl> <dbl> <dbl> <dbl>
1 New South Wales 2374362 1736838 4111200 57.8
2 Victoria 2145629 1161098 3306727 64.9
3 Queensland 1487060 961015 2448075 60.7
4 South Australia 592528 356247 948775 62.5
5 Western Australia 801575 455924 1257499 63.7
6 Tasmania 191948 109655 301603 63.6
7 Northern Territory(b) 48686 31690 80376 60.6
8 Australian Capital Territory(c) 175459 61520 236979 74.0
9 Other Territories NaN NaN NaN NaN
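The code that appends this row is not shown. One way to do it is sketched below, assuming dplyr's `bind_rows`; `marriage2_demo`, `other_row`, and `marriage3_demo` are illustrative names, and the demo table holds only two territories.

```r
library(dplyr)

# Two-row stand-in for marriage2 (values copied from the table above).
marriage2_demo <- tibble(
  territory = c("New South Wales", "Victoria"),
  yes = c(2374362L, 2145629L),
  no = c(1736838L, 1161098L)
) %>%
  mutate(total = yes + no, yes_percent = yes / total * 100)

# Placeholder row with no survey data. NaN is a double, so the integer
# count columns are promoted to double when the rows are combined.
other_row <- tibble(
  territory = "Other Territories",
  yes = NaN, no = NaN, total = NaN, yes_percent = NaN
)

marriage3_demo <- bind_rows(marriage2_demo, other_row)
marriage3_demo
```

The promotion to double would explain why the count columns print as `<dbl>` rather than `<int>` once the placeholder row is present.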
Now we have what we need to create the choropleth: we add a Percent Yes column to sf_oz and then use ggplot to draw the map, filling each region according to that column.
sf_oz <- ozmap("states")
# Rows are paired by position, so marriage3 must be in the same order as sf_oz.
sf_oz$`Percent Yes` <- marriage3$yes_percent
pl <- ggplot(data = sf_oz, aes(fill = `Percent Yes`)) +
  geom_sf() +
  scale_fill_gradient(low = "green", high = "red") +
  labs(
    title = "Australian Marriages by Territory",
    caption = "Map of the Australian territories colored by the percentage of respondents who answered yes"
  ) +
  theme_void()
pl
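Assigning `marriage3$yes_percent` straight into `sf_oz` pairs the rows purely by position, which only works if both tables list the territories in the same order. A more defensive alternative is to align by name. The sketch below uses base R only. `map_names` is a hypothetical stand-in for the name column of the map data (assumed here to be `sf_oz$NAME`), and the footnote markers such as `(b)` are stripped before matching.

```r
# Territory labels as they appear in the survey table, footnote markers included.
survey <- data.frame(
  territory = c("New South Wales", "Northern Territory(b)",
                "Australian Capital Territory(c)"),
  yes_percent = c(57.8, 60.6, 74.0)
)

# Hypothetical map names in a different order, standing in for sf_oz$NAME.
map_names <- c("Australian Capital Territory", "New South Wales",
               "Northern Territory", "Other Territories")

# Strip a trailing footnote marker such as "(b)" before matching.
clean_name <- trimws(sub("\\([a-z]\\)$", "", survey$territory))

# match() returns NA for map regions that have no survey row.
percent_by_map <- survey$yes_percent[match(map_names, clean_name)]
percent_by_map
```

With name-based matching, regions that have no survey data come out as `NA` automatically, so a placeholder row is not strictly required.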
We could also use facet_wrap to draw each state or territory in its own panel:
Warning: The `size` argument of `element_rect()` is deprecated as of ggplot2 3.4.0.
ℹ Please use the `linewidth` argument instead.
pl
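The code for the faceted version is not shown above. A minimal sketch of one way to build it follows, assuming the ggplot2, sf, and ozmaps packages are installed and that the map data stores region names in a column called `NAME`. `yes_percent` and `pl_facet` are illustrative names, and the percentages are copied from the summary table.

```r
library(ggplot2)
library(sf)
library(ozmaps)

sf_oz <- ozmap("states")

# Yes percentages keyed by region name (values from the summary table).
yes_percent <- c(
  "New South Wales" = 57.8, "Victoria" = 64.9, "Queensland" = 60.7,
  "South Australia" = 62.5, "Western Australia" = 63.7, "Tasmania" = 63.6,
  "Northern Territory" = 60.6, "Australian Capital Territory" = 74.0
)

# Look up each map region by name; regions without data become NA.
sf_oz$`Percent Yes` <- unname(yes_percent[sf_oz$NAME])

# One panel per region, all sharing the same fill scale.
pl_facet <- ggplot(data = sf_oz, aes(fill = `Percent Yes`)) +
  geom_sf() +
  facet_wrap(~ NAME) +
  scale_fill_gradient(low = "green", high = "red") +
  theme_void()
pl_facet
```

Because the panels share one coordinate system, each panel shows its region at its true position within the full map extent rather than zoomed in.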
Lastly, we could visualize the total number of respondents rather than the percentage who responded yes:
sf_oz <- ozmap("states")
sf_oz$`Total Count` <- marriage3$total
pl <- ggplot(data = sf_oz, aes(fill = `Total Count`)) +
  geom_sf() +
  scale_fill_gradient(
    low = "green", high = "red",
    labels = scales::number_format(scale = 1e-6, suffix = "M")
  ) +
  labs(
    title = "Poll Responses by Territory",
    caption = "Map of the Australian territories colored by the total respondents to a poll in millions"
  ) +
  theme_void()
pl