library(tidyverse)
aus_marriage <- read_csv("../challenge_datasets/australian_marriage_tidy.csv")
## Rows: 16 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): territory, resp
## dbl (2): count, percent
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(aus_marriage)
## # A tibble: 6 × 4
## territory resp count percent
## <chr> <chr> <dbl> <dbl>
## 1 New South Wales yes 2374362 57.8
## 2 New South Wales no 1736838 42.2
## 3 Victoria yes 2145629 64.9
## 4 Victoria no 1161098 35.1
## 5 Queensland yes 1487060 60.7
## 6 Queensland no 961015 39.3
The data seems to show, for each Australian state and territory, the number (`count`) and percentage (`percent`) of respondents who supported (`yes`) or opposed (`no`) marriage equality. The survey took place in 2017.
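To read the yes and no counts side by side, the long table can be widened to one row per territory. This is a minimal sketch, assuming dplyr and tidyr are installed. `aus_sample` is a hypothetical name for a hand-typed copy of the six rows printed by `head()` above, used so the example runs without the CSV.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the CSV: the six rows shown by head(aus_marriage)
aus_sample <- tibble::tibble(
  territory = rep(c("New South Wales", "Victoria", "Queensland"), each = 2),
  resp      = rep(c("yes", "no"), times = 3),
  count     = c(2374362, 1736838, 2145629, 1161098, 1487060, 961015),
  percent   = c(57.8, 42.2, 64.9, 35.1, 60.7, 39.3)
)

# One row per territory, with the yes and no counts in their own columns
aus_wide <- aus_sample %>%
  select(territory, resp, count) %>%
  pivot_wider(names_from = resp, values_from = count)

aus_wide
```

The wide form is handy for reading, but the long form loaded from the CSV is the one ggplot2 wants, since `resp` maps directly to `fill` and `color`.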
str(aus_marriage)
## spc_tbl_ [16 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ territory: chr [1:16] "New South Wales" "New South Wales" "Victoria" "Victoria" ...
## $ resp : chr [1:16] "yes" "no" "yes" "no" ...
## $ count : num [1:16] 2374362 1736838 2145629 1161098 1487060 ...
## $ percent : num [1:16] 57.8 42.2 64.9 35.1 60.7 39.3 62.5 37.5 63.7 36.3 ...
## - attr(*, "spec")=
## .. cols(
## .. territory = col_character(),
## .. resp = col_character(),
## .. count = col_double(),
## .. percent = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
summary(aus_marriage)
## territory resp count percent
## Length:16 Length:16 Min. : 31690 Min. :26.00
## Class :character Class :character 1st Qu.: 159008 1st Qu.:37.23
## Mode :character Mode :character Median : 524226 Median :50.00
## Mean : 793202 Mean :50.00
## 3rd Qu.:1242588 3rd Qu.:62.77
## Max. :2374362 Max. :74.00
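The data is already tidy: each row is one territory/response combination, with `count` and `percent` stored in their own columns.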
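Two quick checks back this up: each territory/response pair should appear exactly once, and if `percent` is each answer's share of its territory's respondents, recomputing it from `count` should reproduce the column. This is a sketch on the six rows printed by `head()` only (hand-typed into the hypothetical `aus_sample`), assuming dplyr is installed; the other ten rows are not checked here.

```r
library(dplyr)

# Hypothetical stand-in for the CSV: the six rows shown by head(aus_marriage)
aus_sample <- tibble::tibble(
  territory = rep(c("New South Wales", "Victoria", "Queensland"), each = 2),
  resp      = rep(c("yes", "no"), times = 3),
  count     = c(2374362, 1736838, 2145629, 1161098, 1487060, 961015),
  percent   = c(57.8, 42.2, 64.9, 35.1, 60.7, 39.3)
)

# Check 1: exactly one row per territory/response combination
pair_counts <- count(aus_sample, territory, resp)
all(pair_counts$n == 1)

# Check 2: percent equals each response's share of its territory total, rounded to 1 decimal
aus_checked <- aus_sample %>%
  group_by(territory) %>%
  mutate(share = round(100 * count / sum(count), 1)) %>%
  ungroup()
all.equal(aus_checked$share, aus_checked$percent)
```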
# Univariate plot 1
ggplot(data=aus_marriage, aes(x=territory, y=count, fill=resp)) +
geom_col() +
# horizontal bars keep the long territory names readable
coord_flip() +
labs(title = "Total Count of Respondents on Same-Sex Marriage Views, by Australian State/Territory",
caption = "Bar lengths show total respondents from each territory, colored by view on same-sex marriage.",
x = NULL, y = "Total Respondents") +
# comma separators make the large counts easier to read
scale_y_continuous(labels = scales::comma) +
theme_minimal() +
theme(plot.title = element_text(face="bold", size=14, hjust=0.5),
plot.subtitle = element_text(face="italic", size=10, hjust=0.5),
plot.caption = element_text(size=8, hjust=0),
# after coord_flip(), axis.*.x elements style the bottom axis, which shows "Total Respondents"
axis.title.x = element_text(face="bold", size=12),
axis.text.x = element_text(size=8, angle=0, hjust=1, color="grey50"))
# Univariate plot 2
# Keep only the "yes" rows: within each territory yes and no add up to about 100%, so the
# mean percent that reorder() uses would be about 50 everywhere and the ranking would be lost
aus_yes <- dplyr::filter(aus_marriage, resp == "yes")
# Point size and shape tuned so the markers display at a sensible size when rendered
ggplot(data=aus_yes, aes(x=reorder(territory, percent), y=percent)) +
# draw the stems first so the points sit on top; reuse the reordered x so both layers share one order
geom_segment(aes(xend=reorder(territory, percent), y=0, yend=percent)) +
geom_point(size=5, shape=21, fill="white") +
labs(title = "Percentage Support for Same-Sex Marriage, by Australian State/Territory",
caption = "Points show the percentage of respondents in each territory supporting same-sex marriage.",
x = NULL, y = "Percentage") +
expand_limits(y=0) +
theme_minimal() +
theme(plot.title = element_text(face="bold", size=14, hjust=0.5),
plot.subtitle = element_text(face="italic", size=10, hjust=0.5),
plot.caption = element_text(size=8, hjust=0),
axis.title.y = element_text(face="bold", size=12),
axis.text.x = element_text(size=8, angle=0, hjust=1, color="grey50"))
# Bivariate Plot
ggplot(aus_marriage, aes(percent, count, color = resp)) +
geom_point(size=3) +
facet_wrap(~territory) +
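# firebrick and forest green seem even more opposite than plain red vs green :D
# named values pin each colour to its response instead of relying on level order
scale_color_manual(values = c(no = "firebrick4", yes = "forestgreen"))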
The first plot is a stacked bar chart that shows, for each territory, how the total number of respondents splits between the two views.
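The length of each stacked bar is simply the territory's total number of respondents. A minimal sketch of that total, assuming dplyr is installed and using the hypothetical `aus_sample` (a hand-typed copy of the six rows printed by `head()`):

```r
library(dplyr)

# Hypothetical stand-in for the CSV: the six rows shown by head(aus_marriage)
aus_sample <- tibble::tibble(
  territory = rep(c("New South Wales", "Victoria", "Queensland"), each = 2),
  resp      = rep(c("yes", "no"), times = 3),
  count     = c(2374362, 1736838, 2145629, 1161098, 1487060, 961015),
  percent   = c(57.8, 42.2, 64.9, 35.1, 60.7, 39.3)
)

# Total respondents per territory: the full length of each stacked bar
bar_totals <- aus_sample %>%
  group_by(territory) %>%
  summarise(total = sum(count), .groups = "drop") %>%
  arrange(desc(total))

bar_totals
```

Because the bars stack raw counts, populous territories dominate the chart; the yes/no split within a small territory is easier to judge from the percentages.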
The second plot puts those numbers in context by comparing percentages across territories, which shows which side the majority favors in each.
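The ordering and the majority side can also be read off directly from the `yes` rows. This sketch assumes dplyr is installed and uses the hypothetical `aus_sample` (the six rows printed by `head()`); `majority` is an illustrative column name, not part of the dataset.

```r
library(dplyr)

# Hypothetical stand-in for the CSV: the six rows shown by head(aus_marriage)
aus_sample <- tibble::tibble(
  territory = rep(c("New South Wales", "Victoria", "Queensland"), each = 2),
  resp      = rep(c("yes", "no"), times = 3),
  count     = c(2374362, 1736838, 2145629, 1161098, 1487060, 961015),
  percent   = c(57.8, 42.2, 64.9, 35.1, 60.7, 39.3)
)

# Rank territories by their share of "yes" answers and label the majority side
yes_ranked <- aus_sample %>%
  filter(resp == "yes") %>%
  arrange(desc(percent)) %>%
  mutate(majority = if_else(percent > 50, "yes", "no"))

yes_ranked
```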
For the univariate plots, I coordinated the colors and adjusted the text sizes, titles, and other theme settings.
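Both univariate plots repeat nearly the same `theme_minimal()` + `theme(...)` block. One way to keep that styling consistent, sketched here as a suggestion rather than what the original code does, is to store it once in a theme object and add it to each plot. `challenge_theme` is a hypothetical name, and the sketch assumes ggplot2 is installed.

```r
library(ggplot2)

# Hypothetical shared theme: the styling the univariate plots repeat, defined once
challenge_theme <- theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 14, hjust = 0.5),
        plot.subtitle = element_text(face = "italic", size = 10, hjust = 0.5),
        plot.caption = element_text(size = 8, hjust = 0),
        axis.title.y = element_text(face = "bold", size = 12),
        axis.text.x = element_text(size = 8, angle = 0, hjust = 1, color = "grey50"))

# Usage: add it to any plot in place of the repeated theme calls
p <- ggplot(data.frame(x = c("a", "b"), y = c(1, 2)), aes(x, y)) +
  geom_col() +
  challenge_theme
```

Changing a font size then means editing one place instead of every plot.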
I chose a scatter plot for the bivariate plot because it makes it easier to evaluate the relationship between each territory and its answers. I also thought this visualization helps bring out further yes-versus-no trends within the territories.
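Within each facet the two points are linked: the yes and no rows share the same territory total, so `100 * count / percent` should give (up to the rounding of `percent`) the same implied total for both, and the pair sits on a line through the origin. A minimal check on the six rows printed by `head()`, assuming dplyr is installed; `aus_sample`, `implied_total`, and `spread` are hypothetical names.

```r
library(dplyr)

# Hypothetical stand-in for the CSV: the six rows shown by head(aus_marriage)
aus_sample <- tibble::tibble(
  territory = rep(c("New South Wales", "Victoria", "Queensland"), each = 2),
  resp      = rep(c("yes", "no"), times = 3),
  count     = c(2374362, 1736838, 2145629, 1161098, 1487060, 961015),
  percent   = c(57.8, 42.2, 64.9, 35.1, 60.7, 39.3)
)

# Territory total implied by each row, and the relative gap between yes and no
implied_totals <- aus_sample %>%
  mutate(implied_total = 100 * count / percent) %>%
  group_by(territory) %>%
  summarise(spread = max(implied_total) / min(implied_total) - 1, .groups = "drop")

implied_totals
```

A spread well under 1% in every territory means the facets mostly restate each territory's size, so the clearest differences between territories come from the `percent` axis.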