Creating All_experiments

This file will take Jeff’s latest CDBN experiment data (V1.3) and do the following:

add my CDBN_ID’s and Seq_ID’s, as well as some other variety information that I have, to the Germplasm sheet
add a CDBN_ID and Seq_ID Key to Jeff’s Phenotypes sheet so that it’s easier to combine the phenotypic data with the genotype data in the future.
add the Climate_bin information to the Locations sheet
save a new xlsx file (V1.3.1) for Jeff to look over.

wb <- loadWorkbook("All_experiments_V1.3.1.xlsx")
lst = readWorksheet(wb, sheet = getSheets(wb))
metadata1 <- loadWorkbook("../../CDBN Variety Info/CDBN_Metadata_sequencing_2017-07-25.xlsx")
metadata = readWorksheet(metadata1, sheet = getSheets(metadata1))

1. Add CDBN_ID’s to `lst$Germplasm`

I added the echo = FALSE parameter to most of the code for this step, so the html file does not show several hundred regular expression substitutions that generated the CDBN_ID’s from the Genotypes. If you want to see these, refer to the .Rmd document.

Here are three punctuation fixes, as examples:

Germplasm$CDBN_ID <- gsub("-","_", Germplasm$CDBN_ID)
Germplasm$CDBN_ID <- gsub(" ","_", Germplasm$CDBN_ID)
Germplasm$CDBN_ID <- gsub("Long's_Peak","Longs_Peak", Germplasm$CDBN_ID)

1a. Test that the second round of punctuation fixes actually ties the CDBN_ID keys to the Germplasm table.

The table below is empty, which means that there are no remaining CDBN_ID’s in my metadata that are unaccounted for.

metadata %>% 
  anti_join(Germplasm, by = "CDBN_ID") %>%
  select(CDBN_ID)

## [1] CDBN_ID
## <0 rows> (or 0-length row.names)

2. Join Germplasm and Metadata

Germplasm_ahm <- Germplasm %>% 
  left_join(metadata, by = "CDBN_ID") %>%
  select(Genotype:GBS_barcode_length.2, MDP_ID, Flower_color:Stem_thickness_below_cotyledons_width.1, Hue:Width_10..cm., Gene_pool:Market.class, Seed.From.1:Seed.From.2, BEANCAP_ID:Synonym.10)

3. Make a Germplasm table to add to the Phenotypes worksheet

Germplasm_small <- Germplasm_ahm %>%
  select(Genotype, CDBN_ID, Seq_ID, Gene_pool, Race, Market.class)

3a. Join small Genotypes table to Phenotypes worksheet

Phenotypes <- lst$Phenotypes

Phenotypes_ahm <- Phenotypes %>%
  left_join(Germplasm_small, by = "Genotype") %>%
  select(Genotype, CDBN_ID, Seq_ID, Location_code, Year, Gene_pool, Race, Market.class, everything())

4. Add Climate_bin to Locations worksheet

loc_transl_wb <- loadWorkbook("Location_code_translations_w_JW_check.xlsx")
loc_transl = readWorksheet(loc_transl_wb, sheet = getSheets(loc_transl_wb))

Locations <- lst$Locations

loc_transl <- loc_transl %>%
  mutate(Location_code_key = coalesce(Location_code, Location_code_v2)) 

Locations_ahm <- Locations %>%
  left_join(loc_transl, by = c("Location_code" = "Location_code_key", "State")) %>%
  select(Location_code, Climate_bin, State, Latitude, Longitude, everything()) %>%
  mutate_if(is.numeric, funs(na_if(., -99)))

5. Assign Germplasm_ahm to `lst$Germplasm`, Phenotypes_ahm to `lst$Phenotypes`, Locations_ahm to `lst$Locations`, and save a new All_Experiments_V1.3.1.xlsx file.

NB: I commented out this code because it only needed to run once to generate the new Excel file.

# createSheet(wb, name = "Germplasm_ahm")
# createSheet(wb, name = "Phenotypes_ahm")
# createSheet(wb, name = "Locations_ahm")
# writeWorksheet(wb, Germplasm_ahm, sheet = "Germplasm_ahm")
# writeWorksheet(wb, Phenotypes_ahm, sheet = "Phenotypes_ahm")
# writeWorksheet(wb, Locations_ahm, sheet = "Locations_ahm")

# saveWorkbook(wb)

Creating All_experiments_V1.3.1

Alice MacQueen

2017-10-17

1. Add CDBN_ID’s to `lst$Germplasm`

1a. Test that the second round of punctuation fixes actually ties the CDBN_ID keys to the Germplasm table.

2. Join Germplasm and Metadata

3. Make a Germplasm table to add to the Phenotypes worksheet

3a. Join small Genotypes table to Phenotypes worksheet

4. Add Climate_bin to Locations worksheet

5. Assign Germplasm_ahm to `lst$Germplasm`, Phenotypes_ahm to `lst$Phenotypes`, Locations_ahm to `lst$Locations`, and save a new All_Experiments_V1.3.1.xlsx file.

Creating All_experiments_V1.3.1

Alice MacQueen

2017-10-17

1. Add CDBN_ID’s to lst$Germplasm

1a. Test that the second round of punctuation fixes actually ties the CDBN_ID keys to the Germplasm table.

2. Join Germplasm and Metadata

3. Make a Germplasm table to add to the Phenotypes worksheet

3a. Join small Genotypes table to Phenotypes worksheet

4. Add Climate_bin to Locations worksheet

5. Assign Germplasm_ahm to lst$Germplasm, Phenotypes_ahm to lst$Phenotypes, Locations_ahm to lst$Locations, and save a new All_Experiments_V1.3.1.xlsx file.

1. Add CDBN_ID’s to `lst$Germplasm`

5. Assign Germplasm_ahm to `lst$Germplasm`, Phenotypes_ahm to `lst$Phenotypes`, Locations_ahm to `lst$Locations`, and save a new All_Experiments_V1.3.1.xlsx file.