Worksheet 18: reading Excel files and some ant data. Reading Excel files can be done in two ways in R.

Method 1: Save your Excel spreadsheet as a .csv file and use read.csv() function. Method 2: use the read.xls2() function from the xlsx library.

Task 1:c Download the file CompleteGeneticData.xlsx from iLearn. Save the excel file CompleteGeneticData.xlsx as a .csv file Create a new .RMD file. Save it with a reasonable name in the same folder as CompleteGeneticData.xlsx

Read it into R, using the read.csv() function and save the resulting object with a reasonable name.

Task 2: Install the xlsx package and read the CompleteGeneticData.xlsx into R, save the resulting object with a reasonable name.

setwd("~/Desktop/Fall 2015/Bioinformatics")
#install.packages("xlsx")
library(xlsx)

## Loading required package: rJava
## Loading required package: xlsxjars

csvcomgen <- read.csv("CompleteGeneticData1.csv")

Task 3: Use summary() and head() to look at the objects you created. Also click on the object in the upper right panel of Rstudio to look at the data.frame in the upper left panel.

head(csvcomgen)

##   year colony Individual free_slave Habitat Transect transect.x transect.y
## 1 2012     A2          1          0      MI     MI.1      13520        570
## 2 2012     A2          2          0      MI     MI.1      13520        570
## 3 2012     A2          3          0      MI     MI.1      13520        570
## 4 2012     A2          4          0      MI     MI.1      13520        570
## 5 2012     A2          5          0      MI     MI.1      13520        570
## 6 2012    B10          1          0      MI     MI.1       5952        316
##   GT218_1 GT218_2 T223_1 T223_2 L18_1 L18_2 GT1_1 GT1_2 L5_1 L5_2 Myrt3_1
## 1     114     116    102    116   135   149   208   208  124  126     191
## 2     114     114    102    116   135   149   208   208  124  126     193
## 3     114     116    102    116   135   149   208   208  124  126     197
## 4     112     112    102    116   135   151   208   208  124  126     197
## 5     112     112    102    116   135   149   208   208  124  126     197
## 6     112     112    116    116   145   147   198   214  126  126     193
##   Myrt3_2  UniqueInd UniqueColony Plot
## 1     197  MI.1.A2.1        MI.A2 MI.A
## 2     201  MI.1.A2.2        MI.A2 MI.A
## 3     197  MI.1.A2.3        MI.A2 MI.A
## 4     197  MI.1.A2.4        MI.A2 MI.A
## 5     197  MI.1.A2.5        MI.A2 MI.A
## 6     197 MI.1.B10.1       MI.B10 MI.B

summary(csvcomgen)

##       year          colony       Individual      free_slave     Habitat  
##  Min.   :2009   P1     :  45   Min.   : 1.00   Min.   :0.0000   MI:1202  
##  1st Qu.:2010   O1     :  40   1st Qu.: 2.00   1st Qu.:0.0000   NY:1075  
##  Median :2010   M1     :  38   Median : 3.00   Median :0.0000   WV: 789  
##  Mean   :2011   P4     :  33   Mean   : 4.46   Mean   :0.2228            
##  3rd Qu.:2012   G1     :  25   3rd Qu.: 5.00   3rd Qu.:0.0000            
##  Max.   :2012   H1     :  25   Max.   :25.00   Max.   :1.0000            
##                 (Other):2860   NA's   :1                                 
##  Transect     transect.x      transect.y       GT218_1       GT218_2     
##  MI.1:389   Min.   :    1   Min.   :    0   Min.   : 96   Min.   : 96.0  
##  MI.2:813   1st Qu.:  780   1st Qu.: 1036   1st Qu.:112   1st Qu.:112.0  
##  NY.2:777   Median : 4914   Median : 3351   Median :112   Median :112.0  
##  NY.3:298   Mean   : 5532   Mean   : 4626   Mean   :111   Mean   :112.7  
##  WV.1:285   3rd Qu.: 8924   3rd Qu.: 7558   3rd Qu.:112   3rd Qu.:112.0  
##  WV.3: 96   Max.   :15880   Max.   :14797   Max.   :152   Max.   :132.0  
##  WV.4:408                                   NA's   :273   NA's   :275    
##      T223_1          T223_2          L18_1           L18_2      
##  Min.   : 72.0   Min.   : 76.0   Min.   :109.0   Min.   :113.0  
##  1st Qu.:116.0   1st Qu.:116.0   1st Qu.:137.0   1st Qu.:145.0  
##  Median :116.0   Median :116.0   Median :143.0   Median :147.0  
##  Mean   :115.7   Mean   :116.6   Mean   :141.7   Mean   :146.8  
##  3rd Qu.:116.0   3rd Qu.:116.0   3rd Qu.:145.0   3rd Qu.:149.0  
##  Max.   :258.0   Max.   :148.0   Max.   :157.0   Max.   :159.0  
##  NA's   :140     NA's   :137     NA's   :206     NA's   :207    
##      GT1_1           GT1_2            L5_1            L5_2      
##  Min.   :160.0   Min.   :112.0   Min.   : 98.0   Min.   : 98.0  
##  1st Qu.:198.0   1st Qu.:208.0   1st Qu.:124.0   1st Qu.:134.0  
##  Median :202.0   Median :214.0   Median :126.0   Median :142.0  
##  Mean   :204.3   Mean   :215.1   Mean   :129.4   Mean   :140.3  
##  3rd Qu.:210.0   3rd Qu.:220.0   3rd Qu.:134.0   3rd Qu.:146.0  
##  Max.   :260.0   Max.   :294.0   Max.   :160.0   Max.   :190.0  
##  NA's   :127     NA's   :127     NA's   :83      NA's   :83     
##     Myrt3_1       Myrt3_2           UniqueInd     UniqueColony 
##  Min.   :143   Min.   :173.0   MI.1.F10.1:   2   WV.P1  :  25  
##  1st Qu.:187   1st Qu.:191.0   MI.1.F11.5:   2   NY.G1  :  20  
##  Median :189   Median :193.0   MI.2.I27.3:   2   NY.G3  :  20  
##  Mean   :189   Mean   :193.3   MI.2.I37.1:   2   NY.H1  :  20  
##  3rd Qu.:191   3rd Qu.:195.0   MI.2.J13.3:   2   NY.J1  :  20  
##  Max.   :209   Max.   :235.0   MI.1.A2.1 :   1   NY.K1  :  20  
##  NA's   :139   NA's   :139     (Other)   :3055   (Other):2941  
##       Plot     
##  MI.I   : 245  
##  NY.K   : 219  
##  MI.L   : 180  
##  NY.L   : 180  
##  MI.H   : 129  
##  MI.D   : 113  
##  (Other):2000

Task 4 The columns with strange names (e.g., GT218) hold genotypes from microsattelite loci. Look up microsattelite using google, and write down what it is.

Microsattelites are tadem repeats of 2-5 bp

Task 5 Make a subset of the data for only the ants from plot “WV.R” (WV is West Viriginia).

Make a scatter plot to show the locations of the ant nests in plot WV.R using the transect.x and transect.y columns.

csvcomgen[csvcomgen$Plot=='WV.R', ]->WVR
plot(WVR$transect.x, WVR$transect.y, main="Scatterplot Transect X vs.Y", xlab="Transect.x", ylab="Transect.y", col= "blue")

Task 6 Make a histogram of the microsattelite length for each of the microsatellite loci in the entire dataset.

Task 7 How many of the sampled ants are slaves? [3066]

for (i in 9:20){hist(WVR[,i],col=i, main="Histogram of WVR microsatellite #lengths", xlab="microsatellite length" )}

length(csvcomgen$free_slave)

## [1] 3066

Task 8 (bonus question) How many of the ant nests contain slaves? (for the last question, you can select the two columns that have the colony name and whether the ants are free and use the unique() function on those two columns) E.g., unique(DATA[,c(5,7)])

#install.packages("plyr")
library(plyr)
count(csvcomgen$free_slave)

##   x freq
## 1 0 2383
## 2 1  683

Task 9: Knit your RMD file and publish the html file online, then upload the link to iLearn.