Workshop 7: Pivoting then grouping

Giulia Rathmes

2022-11-22

1 Intro

Welcome!

For this workshop, we will be cleaning a dataset. It is a hands-on exercise in pivoting and grouping.

The assignment should be submitted individually, but you are encouraged to brainstorm with partners.

The final due date for the assignment is Tuesday, November 29th at 23:59 (UTC+2).

2 Get the assignment repo

To get started, you should download and look through the assignment folder.

  1. First download the repo to your local computer here.

    You should ideally work on your local computer, but if you would rather work on RStudio Cloud, you can upload the zip file to RStudio Cloud through the Files pane. Consult one of the instructors for guidance on this.

  2. Unzip/Extract the downloaded folder.

    If you are on macOS, you can simply double-click on a file to unzip it.

    If you are on Windows and are not sure how to “unzip” a file, see this image. Right-click on the file and then select “Extract All”.

  3. Once done, click on the RStudio Project file in the unzipped folder to open the project in RStudio.

  4. In RStudio, navigate to the Files tab and open the “rmd” folder. The instructions for your exercise are outlined there (these are the same instructions you see here).

  5. Open the “data” folder and look at its contents. You will work with the “diet_diversity_vietnam_wide_EASY.csv”, “diet_diversity_vietnam_wide_INTERMEDIATE.csv” and “diet_diversity_vietnam_wide_HARD.csv” files. (All three files come from the same source and were remodelled for this exercise; open the “00_info_about_the_dataset” file to learn more about the dataset.)

3 Load packages

Now that you understand the structure of the repo, you can load in and clean your dataset.

In the code section below, load in the needed packages.

## Loading required package: pacman

4 Easy Pivoting

For this exercise, import “diet_diversity_vietnam_wide_EASY.csv”. The imported data frame should have 61 rows and 3 columns. Remember to use the here() function so that your Rmd uses project-relative paths.

Using the pivoting lesson you prepared for today, pivot this dataset into long format. Print the pivoted data frame as a reactable table.

## Rows: 61 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): household_id, enerc.kcal.w.1, enerc.kcal.w.2
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
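
The easy pivot can be sketched with tidyr's pivot_longer(). This is a minimal sketch, not the reference solution: a three-row tibble with the same column names stands in for the CSV (values rounded from the tibble printed in section 6), and the key column name “measurement” is hypothetical, since the meaning of the 1/2 suffix is not stated here (check “00_info_about_the_dataset”).

```r
# Minimal sketch: wide -> long with tidyr::pivot_longer()
library(tibble)
library(tidyr)

# Stand-in for the CSV. In the workshop you would read the file instead, e.g.
# readr::read_csv(here::here("data", "diet_diversity_vietnam_wide_EASY.csv"))
# Values are rounded from the tibble printed in section 6.
diet_easy <- tibble(
  household_id   = c(348, 354, 53),
  enerc.kcal.w.1 = c(2268, 2775, 3104),
  enerc.kcal.w.2 = c(1386, 1240, 2075)
)

# Stack every column except the ID into a single value column.
# `measurement` is a hypothetical name for the 1/2 suffix.
diet_easy_long <- diet_easy %>%
  pivot_longer(
    cols = -household_id,
    names_to = "measurement",
    names_prefix = "enerc\\.kcal\\.w\\.",  # regex: strip the shared prefix
    values_to = "enerc_kcal"
  )

diet_easy_long
# To print it as an interactive table: reactable::reactable(diet_easy_long)
```

Each household now appears once per suffix, so 3 wide rows become 6 long rows.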

5 Intermediate Pivoting

For this exercise, import “diet_diversity_vietnam_wide_INTERMEDIATE.csv”. The imported data frame should have 61 rows and 5 columns.

Using the more advanced pivoting techniques from the code demo, pivot the data into long format. Print the pivoted data frame as a reactable table.

Hint: Remember to use the neat separator (“_”) in your column names.

## Rows: 61 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (5): household_id, enerc.kcal.w_1, enerc.kcal.w_2, fat.w_1, fat.w_2
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
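
One way to use the separator is pivot_longer() with names_sep and the special “.value” entry in names_to. This is a hedged sketch: the three-row tibble stands in for the CSV (values rounded from the section 6 printout), and “measurement” is again a hypothetical name for the 1/2 suffix.

```r
# Minimal sketch: pivot two measured variables at once using names_sep
library(tibble)
library(tidyr)

# Stand-in for "diet_diversity_vietnam_wide_INTERMEDIATE.csv"
diet_mid <- tibble(
  household_id   = c(348, 354, 53),
  enerc.kcal.w_1 = c(2268, 2775, 3104),
  enerc.kcal.w_2 = c(1386, 1240, 2075),
  fat.w_1        = c(78.4, 115, 127),
  fat.w_2        = c(67.7, 45.3, 66.0)
)

# Each name is split on "_": the part before it (".value") becomes an
# output column name, the part after it goes into `measurement`.
diet_mid_long <- diet_mid %>%
  pivot_longer(
    cols = -household_id,
    names_to = c(".value", "measurement"),
    names_sep = "_"
  )

diet_mid_long
```

Because of “.value”, energy and fat stay in separate columns (enerc.kcal.w and fat.w) instead of being stacked into one value column.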

6 Bonus: Hard Pivoting (You can do it!)

For this exercise, import “diet_diversity_vietnam_wide_HARD.csv”. The imported data frame should have 61 rows and 9 columns. This is the original data.

There is no neat separator: underscores appear both inside the variable names and before the 1/2 suffix. Think about how you could create one, then pivot the data frame into long format. Print the pivoted data frame as a reactable table.

Hint: Think about the rename() function of {dplyr}.

## Rows: 61 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (9): household_id, enerc_kcal_w_1, enerc_kcal_w_2, dry_w_1, dry_w_2, wat...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 61 × 9
##    household_id enerc_…¹ enerc…² dry_w…³ dry_w…⁴ water…⁵ water…⁶ fat_w…⁷ fat_w…⁸
##           <dbl>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1          348    2268.   1386.    548.    281.   4219.   1997.    78.4    67.7
##  2          354    2775.   1240.    600.    284.   2376.   3145.   115.     45.3
##  3           53    3104.   2075.    646.    451.   2808.   2305.   127.     66.0
##  4           18    2802.   2146.    620.    807.   3457.   1903.    87.4    47.8
##  5          211    1298.   1191.    269.    288.   2584.   2269.    47.8    32.0
##  6          130    1634.   1425.    397.    301.   3062.   1792.    56.3    63.7
##  7          139    1914.   1222.    407.    278.   2480.   2200.    76.7    34.6
##  8          159    1623.   1410.    425.    321.   2253.   2082.    16.1    46.2
##  9          212    2817.   2058.    618.    411.   3071.   1898.    87.0   104. 
## 10          147    1044.   1100.    266.    246.   1564.   1769.    10.4    31.5
## # … with 51 more rows, and abbreviated variable names ¹​enerc_kcal_w__1,
## #   ²​enerc_kcal_w__2, ³​dry_w__1, ⁴​dry_w__2, ⁵​water_w__1, ⁶​water_w__2,
## #   ⁷​fat_w__1, ⁸​fat_w__2
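
Following the rename() hint, a sketch might first give the suffix a unique separator and then pivot on it. It assumes the double underscore seen in the renamed columns printed above (enerc_kcal_w__1 and so on); the tibble uses only the four measurement columns whose names are fully shown, with values rounded from that printout, and “measurement” is a hypothetical column name.

```r
# Minimal sketch: create a separator with rename(), then pivot on it
library(tibble)
library(dplyr)
library(tidyr)

# Stand-in for part of "diet_diversity_vietnam_wide_HARD.csv"
diet_hard <- tibble(
  household_id   = c(348, 354, 53),
  enerc_kcal_w_1 = c(2268, 2775, 3104),
  enerc_kcal_w_2 = c(1386, 1240, 2075),
  dry_w_1        = c(548, 600, 646),
  dry_w_2        = c(281, 284, 451)
)

# Step 1: mark the suffix with "__" (rename() syntax is new_name = old_name)
diet_hard_renamed <- diet_hard %>%
  rename(
    enerc_kcal_w__1 = enerc_kcal_w_1,
    enerc_kcal_w__2 = enerc_kcal_w_2,
    dry_w__1        = dry_w_1,
    dry_w__2        = dry_w_2
  )

# Step 2: "__" occurs only before the suffix, so it is safe to split on
diet_hard_long <- diet_hard_renamed %>%
  pivot_longer(
    cols = -household_id,
    names_to = c(".value", "measurement"),
    names_sep = "__"
  )

diet_hard_long
```

As a design note, pivot_longer() can also skip the renaming step by splitting on the last underscore with its names_pattern argument (for example names_pattern = "(.*)_([12])$"), but the rename() route keeps the separator visible in the data.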

7 Submission: Upload HTML

Once you have finished the tasks above, knit this Rmd into an HTML file and upload it to the assignment page.