In this R document you wil find some extra activities for those students who finished Lab 3 and are lookikng to practice a litte more of the apply family of functoins.
Load the needed packages and datasets
library(synbreed)
## Warning: package 'synbreed' was built under R version 3.0.3
## Loading required package: doBy
## Warning: package 'doBy' was built under R version 3.0.3
## Loading required package: survival
## Loading required package: splines
## Loading required package: BLR
## Loading required package: SuppDists
## Loading required package: regress
## Warning: package 'regress' was built under R version 3.0.3
## Loading required package: abind
library(synbreedData)
data(cattle)
data(mice)
data(maize)
Look into the datasets cattle
, mice
, maize
and respond
- their dimension
- their class
- the names of their components
The following code provides the dimension of each component of mice
using the lapply
function
lapply(mice,dim)
## $covar
## [1] 2527 9
##
## $pheno
## [1] 2527 2 1
##
## $geno
## [1] 1940 12545
##
## $map
## [1] 12545 2
##
## $pedigree
## NULL
##
## $info
## NULL
Propose code to do the following: - dimension of all components of the three datasets
- class of all components of the three datasels
The following code computes the range of positions of markers in one chromosome of the mice dataset
range(mice$map$pos[mice$map$chr==1])
## [1] 0.000 118.127
Propose the code that computes the range for all chromosomes and it gives the following output
Hint: use the function by
## mice$map$chr: 1
## [1] 0.000 118.127
## --------------------------------------------------------
## mice$map$chr: 10
## [1] 0.00000 83.38172
## --------------------------------------------------------
## mice$map$chr: 11
## [1] 0.00000 97.08675
## --------------------------------------------------------
## mice$map$chr: 12
## [1] 0.00000 69.33921
## --------------------------------------------------------
## mice$map$chr: 13
## [1] 0.00000 69.57477
## --------------------------------------------------------
## mice$map$chr: 14
## [1] 0.00000 59.76976
## --------------------------------------------------------
## mice$map$chr: 15
## [1] 0.00000 65.13584
## --------------------------------------------------------
## mice$map$chr: 16
## [1] 0.00000 63.29275
## --------------------------------------------------------
## mice$map$chr: 17
## [1] 0.00000 63.09626
## --------------------------------------------------------
## mice$map$chr: 18
## [1] 0.00000 64.26144
## --------------------------------------------------------
## mice$map$chr: 19
## [1] 0.00000 54.06668
## --------------------------------------------------------
## mice$map$chr: 2
## [1] 0.0000 108.3889
## --------------------------------------------------------
## mice$map$chr: 3
## [1] 0.00000 89.71785
## --------------------------------------------------------
## mice$map$chr: 4
## [1] 0.0000 101.6805
## --------------------------------------------------------
## mice$map$chr: 5
## [1] 0.0000 106.9162
## --------------------------------------------------------
## mice$map$chr: 6
## [1] 0.00000 89.98493
## --------------------------------------------------------
## mice$map$chr: 7
## [1] 0.00000 89.67044
## --------------------------------------------------------
## mice$map$chr: 8
## [1] 0.00000 80.52541
## --------------------------------------------------------
## mice$map$chr: 9
## [1] 0.00000 86.03299
## --------------------------------------------------------
## mice$map$chr: X
## [1] 0.05278331 61.15009580
The following function computes the meam and variance of phenotypes, but they fail to produce a numeric results due to presence of NA
apply(mice$pheno[,,1],2,mean)
## weight growth.slope
## NA NA
apply(mice$pheno[,,1],2,var)
## weight growth.slope
## NA NA
Modify the code so that it will give the mean and variance:
## weight growth.slope
## 20.30334528 0.08659105
## weight growth.slope
## 11.616753745 0.003446954
if you still have time left, compute the absolute position for each map (of the three datasets) using the function you wrote. Here is my example
absolute_map<-function(map){
if (!(is.data.frame(map))) stop ("map should be a data.frame")
if (!("chr"%in%colnames(map))) stop ("no chromosome column")
if (!("pos"%in%colnames(map))) stop ("no position column")
if (!("SNP"%in%colnames(map))) stop ("no SNP ID column")
if (!is.numeric(map$pos)) stop ("3rd column of map should be numeric")
lmap<-nrow(map) #map length
abpos<-rep(0,lmap) #absolute position
chrom_start=0 #for first chromosome, the absolute chromosome start position is 0
abpos[1]<-map$pos[1] #first absolute position = position of first SNP
for (i in 2:lmap){ #loop
previous_chromosome<-map$chr[i-1]
current_chromosome<-map$chr[i]
if (current_chromosome==previous_chromosome){ #compare current and previous chromosome names
if (map$pos[i]<map$pos[i-1]) stop ("positions within chromosome should be in increasing order")
abpos[i]<-chrom_start+map$pos[i] #if still within same chromosome just add position
} else { #if in a different chromosome...
chrom_start<-abpos[i-1]+0.1 #make last position the new chromosome start
#we also add a small quantity in case the current position is 0.0
abpos[i]<-chrom_start+map$pos[i] #proceed
}
} #close loop
names(abpos)<-map$SNP#add names for keeping track of all SNP
return(abpos)
}
Now a look into the map of maize
head(maize$map)
## chr pos
## M1 1 1.02
## M2 1 4.13
## M3 1 4.73
## M4 1 10.37
## M5 1 11.70
## M6 1 13.97
We are missing the marker ID column (called SNP
), so I need to add it:
absolutepos<-absolute_map(data.frame(SNP=rownames(maize$map),maize$map))
plot(absolutepos,type="l",col="blue")