May 20, 2020

Hi Sahar and Riley, here is an example of the readme file I am trying to make for each R project. It is good to get into the habit of documenting what a project contains. It is important that you leave yourself breadcrumbs so that when you come back to your project you can quickly work out which scripts run on which data, as well as which scripts are just “working” and which are the final product.

This readme also serves as a how-to for the map_cleaning process. Have a look at how it works and we can chat about it tomorrow.

Jenny

Get started by…

Open the jenny_hons_code.Rproj file, then find the file map_cleaning_in_data_folder.Rmd (hint: it’s in the raw_data folder) and run all.

To work out what the .Rmd is doing, you probably want to also open the cleanwrite_emg_functions.R file to dig into what the functions do.

Important things to note

1. cleanwrite_emg_functions.R

Code for two functions that combine Sahar and Julia’s three functions into one: clean_emg() gets rid of extra columns, renames variables, adds a bin column and makes the data long; clean_write_emg() does all of the above and WRITES a csv to the clean_data folder.

2. map_cleaning_in_folder.Rmd

This file lives in the raw_data folder. It calls functions from cleanwrite_emg_functions.R and uses map() to read in, clean and write to csv (in the clean_data folder) all of the xlsx files here. For some reason I can’t make it work if it doesn’t live in the same folder as the data. I need to work this out.
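
A possible lead on the path problem above (a guess, since it depends on how the .Rmd lists the files): list.files() returns bare file names by default, and those only resolve when the working directory is the data folder itself. By default an .Rmd runs with its own folder as the working directory, which would explain why it only works from inside raw_data. The sketch below uses a throwaway temp folder with empty placeholder files (made-up names, not the real data) to show the difference full.names = TRUE makes.

```r
# Hypothetical stand-in for the raw_data folder; the real project layout may differ.
raw_dir <- file.path(tempdir(), "raw_data")
dir.create(raw_dir, showWarnings = FALSE)

# Empty placeholder files, only used to demonstrate path handling.
file.create(file.path(raw_dir, c("p1.xlsx", "p2.xlsx")))

# Bare names ("p1.xlsx"): only found if the working directory is raw_dir.
bare_names <- list.files(raw_dir, pattern = "\\.xlsx$")

# Full paths: found from any working directory.
full_paths <- list.files(raw_dir, pattern = "\\.xlsx$", full.names = TRUE)

file.exists(bare_names)  # TRUE only when run from inside raw_dir
file.exists(full_paths)
```

In the real project, here::here() could build the path to raw_data from the project root instead of tempdir(), and the full paths could then be passed to map().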

What is in this project?

Folders

data

raw_data
  • the data in the raw_data folder are 4 .xlsx examples from Sahar’s data
  • within the raw_data folder there is a map cleaning in folder.Rmd file; this file calls functions from cleanwrite_emg_functions.R and uses map() to read in, clean and write to csv (in the clean_data folder) all of the xlsx files here.
clean_data
  • contains the exported .csv output from the clean_write_emg() function

working it out scripts

  1. map_cleaning_in_main_folder.Rmd Same code as the file in the raw_data folder. Trying to work out why it won’t work if it is not in the data folder. Something to do with here::here and file paths, I think.

  2. read in filename.R Code that reads in ALL the .xlsx files in a folder to a SINGLE df using map_df

  3. read range excel.R Code that reads in an Excel file but only a range of cells (e.g. A2:AE13)

  4. sahar_functions_copy.R Script version of the functions that Sahar and Julia wrote:
     - cleaning_data
     - bin_cols
     - long_data
     Note: the cleaning code reads in the whole .xlsx file and uses row numbers to get rid of extra rows.

  5. single step emg cleaning Jenny’s code that turns Sahar and Julia’s 3 step functions into a single pipe. Relies on using the range argument in read_excel to only read in the rows needed, rather than reading in everything and then deleting.

  6. single step emg cleaning_data from list.R Jenny’s code that adapts the single pipe cleaning for input from the map list, rather than a single read_excel. Removes the mutate filename step, because that is already in the file from the map function.
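
The ideas in the read in filename.R and read range excel.R scripts can be combined in one short sketch: read every .xlsx in a folder into a single df, keeping only a range of cells from each. This is not the code from those scripts. It uses the datasets.xlsx example file that ships with readxl, copied twice into a temp folder under made-up names, and a made-up range (A1:C4), so swap in the real folder and range (e.g. A2:AE13).

```r
library(readxl)
library(purrr)  # map_df() also needs dplyr to be installed

# Stand-in folder: two copies of readxl's bundled example workbook.
# (Hypothetical file names; the real raw_data files will differ.)
demo_dir <- file.path(tempdir(), "demo_xlsx")
dir.create(demo_dir, showWarnings = FALSE)
for (nm in c("file_a.xlsx", "file_b.xlsx")) {
  file.copy(readxl_example("datasets.xlsx"), file.path(demo_dir, nm),
            overwrite = TRUE)
}

# Full paths, named by file name so each row can be traced back to its file.
files <- list.files(demo_dir, pattern = "\\.xlsx$", full.names = TRUE)
files <- set_names(files, basename(files))

# Read the same cell range from every file and stack the results.
# .id adds a "filename" column taken from the names of `files`.
all_data <- map_df(files, read_excel, range = "A1:C4", .id = "filename")

all_data
```

Because the range is applied while reading, there is nothing to delete afterwards. Newer versions of purrr suggest map() followed by list_rbind() in place of map_df(), but map_df() still works.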

FOLDER readxl_workflow
  • notes from the readxl documentation re iterating across sheets/workbooks; trying to work out why map_cleaning doesn’t work if it is not in the raw_data folder