── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 120 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): month
dbl (5): year, large_half_dozen, large_dozen, extra_large_half_dozen, extra_...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
egg %>% print(n = 10, width = Inf)
# A tibble: 120 × 6
month year large_half_dozen large_dozen extra_large_half_dozen
<chr> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132
2 February 2004 128. 226. 134.
3 March 2004 131 225 137
4 April 2004 131 225 137
5 May 2004 131 225 137
6 June 2004 134. 231. 137
7 July 2004 134. 234. 137
8 August 2004 134. 234. 137
9 September 2004 130. 234. 136.
10 October 2004 128. 234. 136.
extra_large_dozen
<dbl>
1 230
2 230
3 230
4 234.
5 236
6 241
7 241
8 241
9 241
10 241
# ℹ 110 more rows
2.2 Import the “australian_marriage*.xls” data set
Importing an Excel file with several sheets into R is not straightforward, so the approach used below follows a resource found online.
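As a minimal sketch of one way to read every sheet of a workbook, the helper below uses the readxl package. The function name read_all_sheets and the file path in the usage comment are hypothetical and are not taken from the original analysis, which may have imported the sheets differently.

```r
library(readxl)

# Hypothetical helper: read every sheet of an Excel workbook into a named list.
read_all_sheets <- function(path) {
  sheet_names <- excel_sheets(path)                 # names of all sheets in the file
  sheets <- lapply(sheet_names, function(s) read_excel(path, sheet = s))
  names(sheets) <- sheet_names                      # keep the sheet names as list names
  sheets
}

# Usage with a placeholder path (replace with the real marriage file):
# marriage <- read_all_sheets("path/to/marriage_file.xls")
# marriage[["Table 1"]]
```

Each list element is a tibble, so the individual sheets can be cleaned separately afterwards. Options such as skip or range in read_excel may be needed if the sheets carry title rows above the data.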
Cleaning the egg data set mainly concerns the “month” column: it is converted to a factor ordered from January to December, and the rows are arranged by month. This shifts the focus of the data set from the year column to the month column, which suits research purposes such as analyzing how the same month changes across different years. The data set also includes “year,” “large_half_dozen,” “large_dozen,” “extra_large_half_dozen,” and “extra_large_dozen.”
# A tibble: 120 × 6
month year large_half_dozen large_dozen extra_large_half_dozen
<fct> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132
2 January 2005 128. 234. 136.
3 January 2006 128. 234. 136.
4 January 2007 128. 234. 136.
5 January 2008 132 237 139
6 January 2009 174. 278. 186.
7 January 2010 174. 272. 186.
8 January 2011 174. 268. 186.
9 January 2012 174. 268. 186.
10 January 2013 178 268. 188.
extra_large_dozen
<dbl>
1 230
2 241
3 241
4 242.
5 245
6 286.
7 286.
8 286.
9 286.
10 290
# ℹ 110 more rows
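The month ordering shown in the output above can be sketched as follows. This is an illustration under assumptions: the toy tibble egg_toy is invented for the example, and the original code may have built the factor levels differently.

```r
library(dplyr)

# Toy stand-in for the egg data (only the columns needed for the ordering).
egg_toy <- tibble(
  month = c("March", "January", "January"),
  year  = c(2004, 2005, 2004)
)

egg_sorted <- egg_toy %>%
  # month.name is base R's vector "January", ..., "December";
  # using it as levels gives calendar order instead of alphabetical order.
  mutate(month = factor(month, levels = month.name)) %>%
  arrange(month, year)

egg_sorted
```

Without the factor conversion, arrange() would sort the months alphabetically (April, August, ...), which is why the column type changes from chr to fct.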
3.2 Clean the “marriage” data set
Cleaning Table 1 in the marriage data set mainly consists of deleting the columns that contain no information. Cleaning Table 2 mainly consists of deleting the column that contains no information and the rows that contain only the names of the divisions in Australia or the division totals. The division names are then moved into a new categorical column called “division.” Because it is hard to assign the places to their divisions and build the new column by hand, an external resource was consulted for organizing the data and making the new data set. Both tables contain information about “Response clear,” which means that the answer in the survey was clearly recorded as “yes” or “no.” They also contain information about “Eligible Participants,” which means the people eligible for enrollment on the Commonwealth Electoral Roll and elections. “Response clear” is further divided into “yes,” “no,” and “total,” and “Eligible Participants” into “Response clear,” “Response not clear,” “Non-responding,” and “total”; all of these are presented as numbers and percentages.
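The cleaning steps described above might look roughly like the sketch below. Everything in it is an assumption made for illustration: the toy tibble raw, its placeholder names and numbers, and the rule that a division header row is one with no values. The real sheet layout may differ.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for "Table 2": a division name appears as a header row without
# values, followed by its regions and a "(Total)" row. All values are invented.
raw <- tibble(
  regions = c("Division A", "Region 1", "Region 2", "Division A (Total)",
              "Division B", "Region 3", "Division B (Total)"),
  yes     = c(NA, 10, 20, 30, NA, 5, 5),
  empty   = NA                      # a column that carries no information
)

clean <- raw %>%
  select(where(~ !all(is.na(.x)))) %>%                          # drop all-NA columns
  mutate(division = if_else(is.na(yes), regions, NA_character_)) %>%
  fill(division) %>%                                            # carry each header down
  filter(!is.na(yes), !grepl("(Total)", regions, fixed = TRUE)) %>%
  select(regions, division, yes)

clean
```

The fill() step is what turns the header rows into a proper “division” column before those rows are removed.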
3.2.1 Clean the “marriage” data set sheet 1 “Table 1” (or marriage 1)
Pivot the “egg” data set using the “pivot_longer” function: the column titles from “large_half_dozen” to “extra_large_dozen” move into a column called “key,” and the values in those columns move into a single column called “Values.” This makes the table longer and narrower (480 rows by 4 columns instead of 120 by 6) and makes the values easier to analyze by month, year, and carton category.
# A tibble: 480 × 4
month year key Values
<fct> <dbl> <chr> <dbl>
1 January 2004 large_half_dozen 126
2 January 2004 large_dozen 230
3 January 2004 extra_large_half_dozen 132
4 January 2004 extra_large_dozen 230
5 January 2005 large_half_dozen 128.
6 January 2005 large_dozen 234.
7 January 2005 extra_large_half_dozen 136.
8 January 2005 extra_large_dozen 241
9 January 2006 large_half_dozen 128.
10 January 2006 large_dozen 234.
# ℹ 470 more rows
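A minimal sketch of this pivot is shown below. It uses only the January 2004 row from the output above; the object names egg_wide and egg_long are chosen for the example and may not match the original code.

```r
library(dplyr)
library(tidyr)

# One row of the cleaned egg data (January 2004), taken from the printed output.
egg_wide <- tibble(
  month = factor("January", levels = month.name),
  year = 2004,
  large_half_dozen = 126,
  large_dozen = 230,
  extra_large_half_dozen = 132,
  extra_large_dozen = 230
)

egg_long <- egg_wide %>%
  pivot_longer(
    cols = large_half_dozen:extra_large_dozen,  # the four price columns
    names_to = "key",                           # old column titles go here
    values_to = "Values"                        # old cell values go here
  )

egg_long
```

Each original row becomes four rows, one per price column, which is why 120 rows become 480.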
4.2 Pivot the “marriage” data set
Pivot the two tables in the marriage data set in three steps: 1) split each table into its “Response clear” and “Eligible Participants” columns and use the “pivot_longer” function to move the column titles into a column called “key,” so that their values become a single “values” column, which is easier to analyze by group; 2) merge the two long tables with the “bind_rows” function, which stacks them into a new table and is not affected by “Eligible Participants” having more observations than “Response clear”; and 3) split the “key” column with the “separate” function into three columns: “Element,” which holds the two categories “Response clear” and “Eligible Participants”; “status&total,” which holds the yes, no, and total labels from the original tables; and “Counting ways,” which holds the two ways the tables present the data, number and percentage. This makes the table easier to understand and to analyze further.
# A tibble: 54 × 3
`State and Territory` key values
<chr> <chr> <dbl>
1 New South Wales Response clear_yes_number 2374362
2 New South Wales Response clear_yes_percentage 57.8
3 New South Wales Response clear_no_number 1736838
4 New South Wales Response clear_no_percentage 42.2
5 New South Wales Response clear_total_number 4111200
6 New South Wales Response clear_total_percentage 100
7 Victoria Response clear_yes_number 2145629
8 Victoria Response clear_yes_percentage 64.9
9 Victoria Response clear_no_number 1161098
10 Victoria Response clear_no_percentage 35.1
# ℹ 44 more rows
# A tibble: 72 × 3
`State and Territory` key
<chr> <chr>
1 New South Wales Eligible Participants_Response clear_number
2 New South Wales Eligible Participants_Response clear_percentage
3 New South Wales Eligible Participants_Response not clear(a)_number
4 New South Wales Eligible Participants_Response not clear(a)_percentage
5 New South Wales Eligible Participants_Non-responding_number
6 New South Wales Eligible Participants_Non-responding_percentage
7 New South Wales Eligible Participants_total_number
8 New South Wales Eligible Participants_total_percentage
9 Victoria Eligible Participants_Response clear_number
10 Victoria Eligible Participants_Response clear_percentage
values
<dbl>
1 4111200
2 79.2
3 11036
4 0.2
5 1065445
6 20.5
7 5187681
8 100
9 3306727
10 81.4
# ℹ 62 more rows
# A tibble: 126 × 5
`State and Territory` Element `status&total` `Counting ways` values
<chr> <chr> <chr> <chr> <dbl>
1 New South Wales Response clear yes number 2374362
2 New South Wales Response clear yes percentage 57.8
3 New South Wales Response clear no number 1736838
4 New South Wales Response clear no percentage 42.2
5 New South Wales Response clear total number 4111200
6 New South Wales Response clear total percentage 100
7 Victoria Response clear yes number 2145629
8 Victoria Response clear yes percentage 64.9
9 Victoria Response clear no number 1161098
10 Victoria Response clear no percentage 35.1
# ℹ 116 more rows
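The three steps described above can be sketched on a single state as follows. The New South Wales values come from the printed output; the object names (wide, response, eligible, long) and the reduced set of columns are assumptions made to keep the example short.

```r
library(dplyr)
library(tidyr)

# One row of the cleaned Table 1, with a subset of its columns.
wide <- tibble(
  `State and Territory` = "New South Wales",
  `Response clear_yes_number` = 2374362,
  `Response clear_yes_percentage` = 57.8,
  `Response clear_no_number` = 1736838,
  `Response clear_no_percentage` = 42.2,
  `Eligible Participants_Response clear_number` = 4111200,
  `Eligible Participants_Response clear_percentage` = 79.2
)

# Step 1: pivot each group of columns into key/values form.
response <- wide %>%
  select(`State and Territory`, starts_with("Response clear")) %>%
  pivot_longer(-`State and Territory`, names_to = "key", values_to = "values")

eligible <- wide %>%
  select(`State and Territory`, starts_with("Eligible Participants")) %>%
  pivot_longer(-`State and Territory`, names_to = "key", values_to = "values")

# Step 2: stack the two long tables; they may have different row counts.
# Step 3: split the key at each underscore into three columns.
long <- bind_rows(response, eligible) %>%
  separate(key, into = c("Element", "status&total", "Counting ways"), sep = "_")

long
```

Splitting on the underscore works here because every key has exactly two underscores; a key with more or fewer would need the extra or fill arguments of separate().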
4.2.2 Pivot the “marriage” data set sheet 1 “Table 2” (or marriage 2)
# A tibble: 900 × 4
regions division key values
<chr> <chr> <chr> <dbl>
1 Banks New South Wales Response clear_yes_number 37736
2 Banks New South Wales Response clear_yes_percentage 44.9
3 Banks New South Wales Response clear_no_number 46343
4 Banks New South Wales Response clear_no_percentage 55.1
5 Banks New South Wales Response clear_total_number 84079
6 Banks New South Wales Response clear_total_percentage 100
7 Barton New South Wales Response clear_yes_number 37153
8 Barton New South Wales Response clear_yes_percentage 43.6
9 Barton New South Wales Response clear_no_number 47984
10 Barton New South Wales Response clear_no_percentage 56.4
# ℹ 890 more rows
# A tibble: 1,200 × 4
regions division
<chr> <chr>
1 Banks New South Wales
2 Banks New South Wales
3 Banks New South Wales
4 Banks New South Wales
5 Banks New South Wales
6 Banks New South Wales
7 Banks New South Wales
8 Banks New South Wales
9 Barton New South Wales
10 Barton New South Wales
key values
<chr> <dbl>
1 Eligible Participants_Response clear_number 84079
2 Eligible Participants_Response clear_percentage 79.9
3 Eligible Participants_Response not clear(b)_number 247
4 Eligible Participants_Response not clear(b)_percentage 0.2
5 Eligible Participants_Non-responding_number 20928
6 Eligible Participants_Non-responding_percentage 19.9
7 Eligible Participants_total_number 105254
8 Eligible Participants_total_percentage 100
9 Eligible Participants_Response clear_number 85137
10 Eligible Participants_Response clear_percentage 77.8
# ℹ 1,190 more rows
# A tibble: 2,100 × 5
regions_divisions Element `status&total` `Counting ways` values
<chr> <chr> <chr> <chr> <dbl>
1 New South Wales,Banks Response clear yes number 37736
2 New South Wales,Banks Response clear yes percentage 44.9
3 New South Wales,Banks Response clear no number 46343
4 New South Wales,Banks Response clear no percentage 55.1
5 New South Wales,Banks Response clear total number 84079
6 New South Wales,Banks Response clear total percentage 100
7 New South Wales,Barton Response clear yes number 37153
8 New South Wales,Barton Response clear yes percentage 43.6
9 New South Wales,Barton Response clear no number 47984
10 New South Wales,Barton Response clear no percentage 56.4
# ℹ 2,090 more rows
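The combined “regions_divisions” column in the last output could be produced with the “unite” function from tidyr. This is an assumption inferred from the printed result, since the call itself is not shown; the toy tibble below reuses the Banks values from the output above.

```r
library(dplyr)
library(tidyr)

# Two rows of the long Table 2 data, taken from the printed output.
m2 <- tibble(
  regions  = c("Banks", "Banks"),
  division = c("New South Wales", "New South Wales"),
  key      = c("Response clear_yes_number", "Response clear_yes_percentage"),
  values   = c(37736, 44.9)
)

m2_united <- m2 %>%
  # Paste division and region into one column, division first, comma-separated.
  unite("regions_divisions", division, regions, sep = ",") %>%
  separate(key, into = c("Element", "status&total", "Counting ways"), sep = "_")

m2_united
```

By default unite() removes the two input columns, which matches the five-column layout in the output.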