Angel Angelov | aangeloo@gmail.com | 2017

This script can be used to analyse quantitative data from GC-MS runs on a Shimadzu instrument in R, including figure generation and statistics. The starting point is an “ASCIIData.txt” file generated by the Shimadzu software (example file: https://www.dropbox.com/s/hqtn9re49urrzlw/ASCIIData.txt?dl=0). The script takes the peak areas computed by the Shimadzu software (GCMS Postrun) from the ASCIIData file and calculates the amounts of the compounds relative to an internal standard, in this case the C30 alkane peak area. It assumes that several GC-MS runs were done for each sample.
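
As a minimal sketch of the internal-standard calculation described above: the amount of a compound is its peak area divided by the C30 peak area, multiplied by the known C30 concentration. This assumes the same detector response for every compound (no response-factor correction is applied anywhere in this script). The helper name `quantify_by_istd` and all numbers are made up for illustration.

```r
# Hypothetical helper: scale peak areas by the internal-standard area.
# Assumes an identical response factor for every compound.
quantify_by_istd <- function(area, istd_area, istd_conc) {
  area / istd_area * istd_conc
}

# Made-up peak areas for two olefins and the C30 standard in one run
areas <- c(C25 = 50000, C27 = 120000, C30 = 200000)
istd_conc <- 10  # concentration of C30 in the sample (µg/ml)

amounts <- quantify_by_istd(areas[c("C25", "C27")], areas[["C30"]], istd_conc)
print(amounts)  # C25 = 2.5, C27 = 6 (µg/ml)
```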

Load required libraries:

library(ggplot2)
library(dplyr)
library(reshape2)

Read in the text file. This file should be preprocessed (for example in Excel) so that the samples (observations) are in rows and the values (peak area of each olefin) are in columns, tab separated. The command below opens a dialog window so you can select your text file:

df <- read.csv(file.choose(), header = TRUE, sep = "\t", dec = ",")
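
To illustrate the layout this command expects, here is a small sketch that reads an inline, made-up table instead of a file chosen interactively. The column names and values are invented; the point is that `dec = ","` makes R parse numbers written with a decimal comma.

```r
# Made-up example of the expected layout: one row per run, one column per olefin,
# tab separated, numbers written with a decimal comma
txt <- paste(
  "names\tC25\tC27\tC30",
  "S1_rep1\t50000,5\t120000,0\t200000,0",
  "S1_rep2\t52000,0\t118000,5\t201000,0",
  sep = "\n"
)

example <- read.csv(text = txt, header = TRUE, sep = "\t", dec = ",")
str(example)  # the area columns come out as numeric, not as text
```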

A more programmatic approach, which reads the ASCIIData.txt file generated by the Shimadzu software directly, is shown below. Be warned, it is a horrible format to clean and feed into R, so this is a quick and dirty solution. It gets the number of runs in the file by counting the lines that match "Header" (using grep). The 8 lines skipped before the first peak table, the 20-line offset between consecutive runs and the nrows = value have to be adjusted for each ASCIIData file, depending on the particular GC-MS run. The code is meant to be run interactively, and the ASCIIData file has to be in your working directory:

  # n <- as.integer(system("grep -c Header ASCIIData.txt", intern = TRUE)) # count samples (runs) in file via the shell
  n <- length(grep("Header", readLines("ASCIIData.txt"))) # same count in pure R, without the system call above
  
  ole <- readline(prompt = "Enter number of olefins detected for these runs (including C30): ")
  # example input: 11
  ole <- as.integer(ole)
  
  df <- data.frame() # make an empty data.frame
  skip <- seq(8, length.out = n, by = 20) # lines to skip before each run's peak table: 8, 28, 48, ...
  for (i in skip) {
    # read the `ole` peak rows of one run and append them to df
    df <- rbind(df, read.table("ASCIIData.txt", header = FALSE, skip = i, sep = "\t", nrows = ole))
  }
  df <- df %>% select(olefin = V2, area = V10) # keep columns 2 (olefin name) and 10 (peak area)
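
The skip logic can be tried out without an instrument file. The sketch below writes a synthetic file in which every run occupies a fixed block of 20 lines, with 8 lines before each peak table. This mimics only the line spacing assumed by the code above, not the real Shimadzu layout, and all names and values are invented. It also stacks the tables with `lapply()` and a single `rbind` instead of growing a data frame inside a loop.

```r
# Synthetic layout (NOT the real Shimadzu format): per run, 8 leading lines,
# then `ole` tab-separated peak rows, then filler lines up to a block of 20
n_runs <- 3; ole <- 4; block <- 20; head_lines <- 8
make_block <- function(run) {
  peaks <- paste(seq_len(ole), paste0("C", 24 + seq_len(ole)),
                 run * 1000 + seq_len(ole), sep = "\t")
  c("[Header]", rep("filler", head_lines - 1), peaks,
    rep("filler", block - head_lines - ole))
}
path <- tempfile(fileext = ".txt")
writeLines(unlist(lapply(seq_len(n_runs), make_block)), path)

n <- length(grep("Header", readLines(path)))         # number of runs found: 3
skip <- seq(head_lines, length.out = n, by = block)  # 8, 28, 48

# Read the peak table of every run, then stack all tables in one step
tables <- lapply(skip, function(s) {
  read.table(path, header = FALSE, skip = s, sep = "\t", nrows = ole)
})
peaks <- do.call(rbind, tables)
names(peaks) <- c("peak", "olefin", "area")
print(dim(peaks))  # 12 rows (3 runs x 4 peaks) and 3 columns
```

If the peak tables in a real file are not equally spaced, fixed offsets like these will silently read the wrong lines, so it is worth checking the first and last rows of the result against the file.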
  

Get the names of the samples in the ASCIIData file (this calls the shell tools grep and cut, so it needs a Unix-like system):

# field 11 (split on "-") holds the sample name here; the index depends on your file path
sample_names <- system("grep \"Data File Name\" ASCIIData.txt | cut -f 11 -d \"-\"", intern = TRUE)
df$names <- rep(sample_names, each = ole) # repeat each sample name `ole` times, one per peak row
#df$names <- substr(df$names,1,3) # in this case, take only sample name, removing replicate info
#df$names <- as.factor(df$names)
df <- dcast(df, names ~ olefin, value.var = "area") # wide format: one row per run, one column per olefin
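
The `system()` call above will not run on a plain Windows installation, where `grep` and `cut` are usually missing. A rough base-R equivalent is sketched below on made-up lines. The path layout is hypothetical, and the field index (11, counted between `-` characters) has to match your own file names, just as in the shell version.

```r
# Made-up lines; real "Data File Name" lines will contain your own paths
header_lines <- c(
  "[Header]",
  "Data File Name\tC:\\a-b-c-d-e-f-g-h-i-j-WT1_rep1.qgd",
  "some other line",
  "Data File Name\tC:\\a-b-c-d-e-f-g-h-i-j-WT1_rep2.qgd"
)

# Keep the matching lines (like grep), then take the 11th "-"-separated
# field of each (like cut -f 11 -d "-")
hits <- grep("Data File Name", header_lines, value = TRUE, fixed = TRUE)
sample_names <- vapply(strsplit(hits, "-", fixed = TRUE),
                       function(x) x[11], character(1))
print(sample_names)

# Repeat each name once per peak row, in the order of the stacked peak tables
ole <- 3
name_column <- rep(sample_names, each = ole)
```

With a real file, `header_lines` would be `readLines("ASCIIData.txt")`.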

Now the ASCIIData.txt file is finally read and in the form I need it. Calculations and plotting:

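C30conc <- readline(prompt="Enter the concentration of internal standard (µg/ml): ")
# example input: 10
C30conc <- as.numeric(C30conc) # as.numeric, so that non-integer concentrations such as 2.5 are kept
df.norm <- df[,2:ncol(df)]/df$C30*C30conc # normalize peak areas to the C30 peak and scale by its concentration
df.norm[,ncol(df.norm)] <- NULL # delete the standard itself (assumes C30 is the last column)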
df.norm$total <- rowSums(df.norm) # get total olefins per sample
df.norm$names <- df$names
df.norm$samples <- substr(df.norm$names, 6,8) # in this case the sample names are located in positions 6 to 8 in the names string
# first melt data, then get statistics and plot
df.norm.melt <- df.norm %>% select(-names) %>% melt(id.vars = "samples") # get rid of the "names" column here
df.norm.melt %>% group_by(samples, variable) %>% summarise(mean=mean(value),SD= sd(value),N= n()) %>% write.csv("out-quant-olefins.csv") # mean for each olefin and for total, grouped by sample
ggplot(df.norm.melt, aes(x= variable, y= value)) + geom_boxplot(aes(color=samples)) # and so on...
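
To check what ends up in `out-quant-olefins.csv`, the sketch below computes the same three statistics (mean, SD and N per sample and olefin) with base R on a small made-up data frame in the melted layout. The values and sample labels are invented.

```r
# Made-up melted data: one row per run and olefin (same columns as df.norm.melt)
toy <- data.frame(
  samples  = rep(c("WT1", "KO1"), each = 4),
  variable = rep(rep(c("C25", "C27"), each = 2), times = 2),
  value    = c(2.0, 3.0, 6.0, 8.0, 1.0, 1.0, 4.0, 6.0)
)

# mean, SD and number of replicates for every sample/olefin combination
stats <- aggregate(value ~ samples + variable, data = toy,
                   FUN = function(v) c(mean = mean(v), SD = sd(v), N = length(v)))
stats <- do.call(data.frame, stats)  # flatten the matrix column returned by aggregate()
print(stats)
```

An `N` that differs from the number of replicate runs you expect usually means that the `substr()` positions used to build `samples` do not match your file names.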
