1 Intro
Welcome!
In this workshop, you will clean a dataset. It is a hands-on introduction to pivoting and grouping.
The assignment should be submitted individually, but you are encouraged to brainstorm with partners.
The assignment is due on Tuesday, November 29th at 23:59 (UTC+2).
2 Get the assignment repo
To get started, you should download and look through the assignment folder.
First, download the repo to your local computer here.
You should ideally work on your local computer, but if you would rather work on RStudio Cloud, you can upload the zip file to RStudio Cloud through the Files pane. Consult one of the instructors for guidance on this.
Unzip/Extract the downloaded folder.
If you are on macOS, you can simply double-click on a file to unzip it.
If you are on Windows and are not sure how to “unzip” a file, see this image: right-click on the file and select “Extract All”.
Once done, click on the RStudio Project file in the unzipped folder to open the project in RStudio.
In RStudio, navigate to the Files tab and open the “rmd” folder. The instructions for your exercise are outlined there (these are the same instructions you see here).
Open the “data” folder and look at its contents. You will work with the “diet_diversity_vietnam_wide_EASY.csv”, “diet_diversity_vietnam_wide_INTERMEDIATE.csv” and “diet_diversity_vietnam_wide_HARD.csv” files. (All three files come from the same source and were remodelled for this exercise; you can open the “00_info_about_the_dataset” file to learn more about the dataset.)
3 Load packages
Now that you understand the structure of the repo, you can load in and clean your dataset.
In the code chunk below, load the required packages.
## Loading required package: pacman
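A minimal sketch of what the loading chunk could look like. It assumes {pacman} (suggested by the message above) and assumes the workshop needs {tidyverse} (for readr, dplyr and tidyr), {here} and {reactable}; adjust the package list to your own setup.

```r
# Install {pacman} on first use; require() returns FALSE when it is missing
if (!require(pacman)) install.packages("pacman")

# Load the packages assumed for this workshop, installing any that are missing
pacman::p_load(tidyverse, here, reactable)
```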
4 Easy Pivoting
For this pivoting, please import the “diet_diversity_vietnam_wide_EASY.csv” file. The data frame you import should have 61 rows and 3 columns. Remember to use the here() function so that your Rmd can use project-relative paths.
Using the pivoting lesson you prepared for today, pivot this dataset into long format. Print the pivoted data frame as a reactable table.
## Rows: 61 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): household_id, enerc.kcal.w.1, enerc.kcal.w.2
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
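One possible approach, sketched with tidyr's pivot_longer(). To keep the example self-contained it builds a two-row toy stand-in instead of reading the file; the values are rounded from the first two households printed in the Hard Pivoting section and are for illustration only. The output column names `occasion` and `enerc_kcal` are hypothetical choices, not names required by the assignment.

```r
library(tibble)
library(tidyr)

# In the workshop you would import the real file with a project-relative path:
# easy <- readr::read_csv(here::here("data", "diet_diversity_vietnam_wide_EASY.csv"))

# Toy stand-in with the same column names as the EASY file
easy <- tibble(
  household_id   = c(348, 354),
  enerc.kcal.w.1 = c(2268, 2775),
  enerc.kcal.w.2 = c(1386, 1240)
)

# Stack the two energy columns into one; the trailing number becomes a
# (hypothetically named) "occasion" column once the common prefix is stripped
easy_long <- easy %>%
  pivot_longer(
    cols = starts_with("enerc"),
    names_to = "occasion",
    names_prefix = "enerc\\.kcal\\.w\\.",  # regex, so the dots are escaped
    values_to = "enerc_kcal"
  )
```

Each household now appears once per occasion. To display the result as the task asks, pass it to `reactable::reactable(easy_long)`.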
5 Intermediate Pivoting
For this pivoting, please import the “diet_diversity_vietnam_wide_INTERMEDIATE.csv”. The data frame you import should have 61 rows and 5 columns.
Using the more advanced pivoting techniques from the code demo, pivot the data into long format. Print the pivoted data frame as a reactable table.
Hint: Remember to use the neat separator in your column names.
## Rows: 61 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (5): household_id, enerc.kcal.w_1, enerc.kcal.w_2, fat.w_1, fat.w_2
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
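A sketch of one way to use that separator, assuming the `.value` sentinel of pivot_longer() is the technique in question (the code demo may have used another). Splitting each name on `_` sends the part before it (`enerc.kcal.w`, `fat.w`) to column names and the part after it to a hypothetical `occasion` column. The toy values are again rounded from the Hard Pivoting print-out and are illustrative only.

```r
library(tibble)
library(tidyr)

# Real import would be:
# intermediate <- readr::read_csv(here::here("data", "diet_diversity_vietnam_wide_INTERMEDIATE.csv"))

# Toy stand-in with the same column names as the INTERMEDIATE file
intermediate <- tibble(
  household_id   = c(348, 354),
  enerc.kcal.w_1 = c(2268, 2775),
  enerc.kcal.w_2 = c(1386, 1240),
  fat.w_1        = c(78.4, 115),
  fat.w_2        = c(67.7, 45.3)
)

# ".value" keeps the variable name as a column header, while the suffix
# after "_" becomes the rows of the "occasion" column
intermediate_long <- intermediate %>%
  pivot_longer(
    cols = -household_id,
    names_to = c(".value", "occasion"),
    names_sep = "_"
  )
```

The result has one row per household and occasion, with energy and fat side by side. If you prefer a fully long table, `names_to = c("nutrient", "occasion")` without `.value` puts every measurement in a single `value` column instead.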
6 Bonus: Hard Pivoting (You can do it!)
For this pivoting, please import the “diet_diversity_vietnam_wide_HARD.csv”. The data frame you import should have 61 rows and 9 columns. This is the original data.
There is no neat separator: think about how you could create one so that you can then pivot the data frame into long format. Print the pivoted data frame as a reactable table.
Hint: Think about the rename() function of {dplyr}.
## Rows: 61 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (9): household_id, enerc_kcal_w_1, enerc_kcal_w_2, dry_w_1, dry_w_2, wat...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 61 × 9
## household_id enerc_…¹ enerc…² dry_w…³ dry_w…⁴ water…⁵ water…⁶ fat_w…⁷ fat_w…⁸
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 348 2268. 1386. 548. 281. 4219. 1997. 78.4 67.7
## 2 354 2775. 1240. 600. 284. 2376. 3145. 115. 45.3
## 3 53 3104. 2075. 646. 451. 2808. 2305. 127. 66.0
## 4 18 2802. 2146. 620. 807. 3457. 1903. 87.4 47.8
## 5 211 1298. 1191. 269. 288. 2584. 2269. 47.8 32.0
## 6 130 1634. 1425. 397. 301. 3062. 1792. 56.3 63.7
## 7 139 1914. 1222. 407. 278. 2480. 2200. 76.7 34.6
## 8 159 1623. 1410. 425. 321. 2253. 2082. 16.1 46.2
## 9 212 2817. 2058. 618. 411. 3071. 1898. 87.0 104.
## 10 147 1044. 1100. 266. 246. 1564. 1769. 10.4 31.5
## # … with 51 more rows, and abbreviated variable names ¹enerc_kcal_w__1,
## # ²enerc_kcal_w__2, ³dry_w__1, ⁴dry_w__2, ⁵water_w__1, ⁶water_w__2,
## # ⁷fat_w__1, ⁸fat_w__2
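The print-out above shows names such as `enerc_kcal_w__1`, which suggests a double underscore was introduced as the new separator. The sketch below follows that idea but swaps in dplyr's rename_with() with a regular expression instead of listing every pair inside rename(); both produce the same names. It assumes the water columns are named `water_w_1` and `water_w_2` (the column spec above is truncated), uses a hypothetical `occasion` column, and builds a toy stand-in from the first two printed rows, rounded.

```r
library(tibble)
library(tidyr)
library(dplyr)

# Real import would be:
# hard <- readr::read_csv(here::here("data", "diet_diversity_vietnam_wide_HARD.csv"))

# Toy stand-in: first two households from the print-out above, rounded
hard <- tibble(
  household_id   = c(348, 354),
  enerc_kcal_w_1 = c(2268, 2775),
  enerc_kcal_w_2 = c(1386, 1240),
  dry_w_1        = c(548, 600),
  dry_w_2        = c(281, 284),
  water_w_1      = c(4219, 2376),
  water_w_2      = c(1997, 3145),
  fat_w_1        = c(78.4, 115),
  fat_w_2        = c(67.7, 45.3)
)

hard_long <- hard %>%
  # Replace only the final "_<number>" with "__<number>" so that "__"
  # appears exactly once in each name and can act as the separator
  rename_with(~ sub("_([0-9]+)$", "__\\1", .x), .cols = -household_id) %>%
  # Split on "__": the left part stays a column name, the right part
  # becomes the hypothetical "occasion" column
  pivot_longer(
    cols = -household_id,
    names_to = c(".value", "occasion"),
    names_sep = "__"
  )
```

A plain `_` would not work here because it also occurs inside the variable names themselves (`enerc_kcal_w`), which is why a separator that appears only once per name is needed.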
7 Submission: Upload HTML
Once you have finished the tasks above, knit this Rmd to HTML and upload the HTML file on the assignment page.