In this R document you wil find some extra activities for those students who finished Lab 3 and are lookikng to practice a litte more of the apply family of functoins.

using sapply to explore lists

Load the needed packages and datasets

library(synbreed)

## Warning: package 'synbreed' was built under R version 3.0.3

## Loading required package: doBy

## Warning: package 'doBy' was built under R version 3.0.3

## Loading required package: survival
## Loading required package: splines
## Loading required package: BLR
## Loading required package: SuppDists
## Loading required package: regress

## Warning: package 'regress' was built under R version 3.0.3

## Loading required package: abind

library(synbreedData)
data(cattle)
data(mice)
data(maize)

Look into the datasets cattle, mice, maize and respond
- their dimension
- their class
- the names of their components
The following code provides the dimension of each component of mice using the lapply function

lapply(mice,dim)

## $covar
## [1] 2527    9
## 
## $pheno
## [1] 2527    2    1
## 
## $geno
## [1]  1940 12545
## 
## $map
## [1] 12545     2
## 
## $pedigree
## NULL
## 
## $info
## NULL

Propose code to do the following: - dimension of all components of the three datasets
- class of all components of the three datasels
The following code computes the range of positions of markers in one chromosome of the mice dataset

range(mice$map$pos[mice$map$chr==1])

## [1]   0.000 118.127

Propose the code that computes the range for all chromosomes and it gives the following output
Hint: use the function by

## mice$map$chr: 1
## [1]   0.000 118.127
## -------------------------------------------------------- 
## mice$map$chr: 10
## [1]  0.00000 83.38172
## -------------------------------------------------------- 
## mice$map$chr: 11
## [1]  0.00000 97.08675
## -------------------------------------------------------- 
## mice$map$chr: 12
## [1]  0.00000 69.33921
## -------------------------------------------------------- 
## mice$map$chr: 13
## [1]  0.00000 69.57477
## -------------------------------------------------------- 
## mice$map$chr: 14
## [1]  0.00000 59.76976
## -------------------------------------------------------- 
## mice$map$chr: 15
## [1]  0.00000 65.13584
## -------------------------------------------------------- 
## mice$map$chr: 16
## [1]  0.00000 63.29275
## -------------------------------------------------------- 
## mice$map$chr: 17
## [1]  0.00000 63.09626
## -------------------------------------------------------- 
## mice$map$chr: 18
## [1]  0.00000 64.26144
## -------------------------------------------------------- 
## mice$map$chr: 19
## [1]  0.00000 54.06668
## -------------------------------------------------------- 
## mice$map$chr: 2
## [1]   0.0000 108.3889
## -------------------------------------------------------- 
## mice$map$chr: 3
## [1]  0.00000 89.71785
## -------------------------------------------------------- 
## mice$map$chr: 4
## [1]   0.0000 101.6805
## -------------------------------------------------------- 
## mice$map$chr: 5
## [1]   0.0000 106.9162
## -------------------------------------------------------- 
## mice$map$chr: 6
## [1]  0.00000 89.98493
## -------------------------------------------------------- 
## mice$map$chr: 7
## [1]  0.00000 89.67044
## -------------------------------------------------------- 
## mice$map$chr: 8
## [1]  0.00000 80.52541
## -------------------------------------------------------- 
## mice$map$chr: 9
## [1]  0.00000 86.03299
## -------------------------------------------------------- 
## mice$map$chr: X
## [1]  0.05278331 61.15009580

The following function computes the meam and variance of phenotypes, but they fail to produce a numeric results due to presence of NA

apply(mice$pheno[,,1],2,mean)

##       weight growth.slope 
##           NA           NA

apply(mice$pheno[,,1],2,var)

##       weight growth.slope 
##           NA           NA

Modify the code so that it will give the mean and variance:

##       weight growth.slope 
##  20.30334528   0.08659105

##       weight growth.slope 
## 11.616753745  0.003446954

if you still have time left, compute the absolute position for each map (of the three datasets) using the function you wrote. Here is my example

absolute_map<-function(map){
  if (!(is.data.frame(map))) stop ("map should be a data.frame")
  if (!("chr"%in%colnames(map))) stop ("no chromosome column")
  if (!("pos"%in%colnames(map))) stop ("no position column")
  if (!("SNP"%in%colnames(map))) stop ("no SNP ID column")
  
  if (!is.numeric(map$pos)) stop ("3rd column of map should be numeric")

  lmap<-nrow(map) #map length
  abpos<-rep(0,lmap) #absolute position
  chrom_start=0 #for first chromosome, the absolute chromosome start position is 0
  abpos[1]<-map$pos[1] #first absolute position = position of first SNP
  
  for (i in 2:lmap){ #loop
    previous_chromosome<-map$chr[i-1] 
    current_chromosome<-map$chr[i]
    if (current_chromosome==previous_chromosome){ #compare current and previous chromosome names
      if (map$pos[i]<map$pos[i-1]) stop ("positions within chromosome should be in increasing order")
      abpos[i]<-chrom_start+map$pos[i] #if still within same chromosome just add position
    } else {                          #if in a different chromosome...
      chrom_start<-abpos[i-1]+0.1     #make last position the new chromosome start
      #we also add a small quantity in case the current position is 0.0
      abpos[i]<-chrom_start+map$pos[i] #proceed
    }
    
  } #close loop

  names(abpos)<-map$SNP#add names for keeping track of all SNP
  return(abpos) 
}

Now a look into the map of maize

head(maize$map)

##    chr   pos
## M1   1  1.02
## M2   1  4.13
## M3   1  4.73
## M4   1 10.37
## M5   1 11.70
## M6   1 13.97

We are missing the marker ID column (called SNP), so I need to add it:

absolutepos<-absolute_map(data.frame(SNP=rownames(maize$map),maize$map))
plot(absolutepos,type="l",col="blue")

Extra work Lab 3

using sapply to explore lists