Input and Merge

Tutorial for yeast transcriptomics dataset

# I made my life easier by putting these 3 files in a subdirectory called labels so I
# didn't have to type the filenames. You can do this assignment with a for loop or by
# writing 3 separate code chunks for the 3 files.

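#Load the packages used below: stringr for str_split_fixed(), purrr for reduce(),
#and dplyr for left_join() and the %>% pipe
library(stringr)
library(purrr)
library(dplyr)

#Read in file names from the folder; helpful for 3 files and essential for 300
#pattern is a regular expression, not a glob, so "\\.csv$" matches names ending in .csv
all_label_files <- list.files("labels", pattern = "\\.csv$",
                              full.names = TRUE)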
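
#A minimal sketch of the file-name match, using made-up files in a temporary
#directory (toy_dir and its contents are hypothetical, not part of the dataset).
#It shows that the regular expression "\\.csv$" keeps only names ending in .csv.
toy_dir <- file.path(tempdir(), "toy_labels")
dir.create(toy_dir, showWarnings = FALSE)
invisible(file.create(file.path(toy_dir, c("BP.csv", "CC.csv", "notes.txt"))))
toy_files <- list.files(toy_dir, pattern = "\\.csv$", full.names = TRUE)
basename(toy_files)   #notes.txt is not listed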

#Read each line as a single string (one column, V1), because rows can hold
#multiple entries and therefore differing numbers of comma-separated fields
datalist <- lapply(all_label_files,
                   function(x){
                     read.table(file = x, sep = "\n")
                   })

#Use str_split_fixed() to break every row into 3 columns at the first 2 commas;
#any later commas stay inside the third column
datalist <- lapply(datalist,
                   function(x){
                     as.data.frame(str_split_fixed(x$V1, ",", 3))
                   })
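
#A small illustration of the fixed split on made-up rows (the gene names and
#labels in toy_rows are hypothetical). With n = 3, only the first 2 commas split
#the string, so extra labels remain together in the third column.
library(stringr)
toy_rows <- c("YAL001C,1,labelA,labelB", "YAL002W,0,labelC")
toy_split <- as.data.frame(str_split_fixed(toy_rows, ",", 3))
toy_split$V3   #"labelA,labelB" "labelC"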

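#Remove rows that aren't gene names (first column contains "_" or "gene")
#Both conditions go in one subset because a function returns only its last expression
datalist <- lapply(datalist,
                   function(x){
                     x[!grepl("_", x$V1) & !grepl("gene", x$V1), ]
                   })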
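
#A quick sketch of the row filter on made-up data (toy_labels is hypothetical).
#Rows whose first column contains "_" or "gene" are dropped in a single subset.
toy_labels <- data.frame(V1 = c("gene", "YAL001C", "train_set", "YAL002W"),
                         V2 = c("validation", "1", "", "0"))
toy_keep <- !grepl("_", toy_labels$V1) & !grepl("gene", toy_labels$V1)
toy_kept <- toy_labels[toy_keep, ]
toy_kept$V1   #"YAL001C" "YAL002W"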

#Remove extra columns
#All 3 files have the validation column (column 2), so I dropped it from the
#second and third data frames to avoid duplicating it in the merge
datalist[[2]] <- datalist[[2]][, -2]
datalist[[3]] <- datalist[[3]][, -2]

#merge() joins two data frames at a time, but reduce() from the purrr package
#applies left_join() (from dplyr) across all the data frames in a list
newDF <- datalist %>% reduce(left_join, by = "V1")
names(newDF) <- c("gene", "validation", "BP", "CC", "MF")
write.csv(newDF, "mergedLabels.csv", row.names = FALSE)
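
#A minimal sketch of the list merge on made-up data frames (toy_list and its
#values are hypothetical). reduce() joins the first two data frames, then joins
#that result to the third; genes missing from a later data frame get NA.
library(dplyr)
library(purrr)
toy_list <- list(
  data.frame(V1 = c("YAL001C", "YAL002W"), V2 = c("1", "0"), V3 = c("a", "b")),
  data.frame(V1 = c("YAL001C", "YAL002W"), V3 = c("c", "d")),
  data.frame(V1 = "YAL001C", V3 = "e")
)
toy_merged <- reduce(toy_list, function(x, y) left_join(x, y, by = "V1"))
names(toy_merged)   #"V1" "V2" "V3.x" "V3.y" "V3", which is why the columns are renamed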

Jess Kaufman

2024-09-05
