Worksheet 18: reading Excel files and some ant data. Reading Excel files can be done in two ways in R.
Method 1: Save your Excel spreadsheet as a .csv file and use read.csv() function. Method 2: use the read.xls2() function from the xlsx library.
Task 1:c Download the file CompleteGeneticData.xlsx from iLearn. Save the excel file CompleteGeneticData.xlsx as a .csv file Create a new .RMD file. Save it with a reasonable name in the same folder as CompleteGeneticData.xlsx
Read it into R, using the read.csv() function and save the resulting object with a reasonable name.
Task 2: Install the xlsx package and read the CompleteGeneticData.xlsx into R, save the resulting object with a reasonable name.
setwd("~/Desktop/Fall 2015/Bioinformatics")
#install.packages("xlsx")
library(xlsx)
## Loading required package: rJava
## Loading required package: xlsxjars
csvcomgen <- read.csv("CompleteGeneticData1.csv")
Task 3: Use summary() and head() to look at the objects you created. Also click on the object in the upper right panel of Rstudio to look at the data.frame in the upper left panel.
head(csvcomgen)
## year colony Individual free_slave Habitat Transect transect.x transect.y
## 1 2012 A2 1 0 MI MI.1 13520 570
## 2 2012 A2 2 0 MI MI.1 13520 570
## 3 2012 A2 3 0 MI MI.1 13520 570
## 4 2012 A2 4 0 MI MI.1 13520 570
## 5 2012 A2 5 0 MI MI.1 13520 570
## 6 2012 B10 1 0 MI MI.1 5952 316
## GT218_1 GT218_2 T223_1 T223_2 L18_1 L18_2 GT1_1 GT1_2 L5_1 L5_2 Myrt3_1
## 1 114 116 102 116 135 149 208 208 124 126 191
## 2 114 114 102 116 135 149 208 208 124 126 193
## 3 114 116 102 116 135 149 208 208 124 126 197
## 4 112 112 102 116 135 151 208 208 124 126 197
## 5 112 112 102 116 135 149 208 208 124 126 197
## 6 112 112 116 116 145 147 198 214 126 126 193
## Myrt3_2 UniqueInd UniqueColony Plot
## 1 197 MI.1.A2.1 MI.A2 MI.A
## 2 201 MI.1.A2.2 MI.A2 MI.A
## 3 197 MI.1.A2.3 MI.A2 MI.A
## 4 197 MI.1.A2.4 MI.A2 MI.A
## 5 197 MI.1.A2.5 MI.A2 MI.A
## 6 197 MI.1.B10.1 MI.B10 MI.B
summary(csvcomgen)
## year colony Individual free_slave Habitat
## Min. :2009 P1 : 45 Min. : 1.00 Min. :0.0000 MI:1202
## 1st Qu.:2010 O1 : 40 1st Qu.: 2.00 1st Qu.:0.0000 NY:1075
## Median :2010 M1 : 38 Median : 3.00 Median :0.0000 WV: 789
## Mean :2011 P4 : 33 Mean : 4.46 Mean :0.2228
## 3rd Qu.:2012 G1 : 25 3rd Qu.: 5.00 3rd Qu.:0.0000
## Max. :2012 H1 : 25 Max. :25.00 Max. :1.0000
## (Other):2860 NA's :1
## Transect transect.x transect.y GT218_1 GT218_2
## MI.1:389 Min. : 1 Min. : 0 Min. : 96 Min. : 96.0
## MI.2:813 1st Qu.: 780 1st Qu.: 1036 1st Qu.:112 1st Qu.:112.0
## NY.2:777 Median : 4914 Median : 3351 Median :112 Median :112.0
## NY.3:298 Mean : 5532 Mean : 4626 Mean :111 Mean :112.7
## WV.1:285 3rd Qu.: 8924 3rd Qu.: 7558 3rd Qu.:112 3rd Qu.:112.0
## WV.3: 96 Max. :15880 Max. :14797 Max. :152 Max. :132.0
## WV.4:408 NA's :273 NA's :275
## T223_1 T223_2 L18_1 L18_2
## Min. : 72.0 Min. : 76.0 Min. :109.0 Min. :113.0
## 1st Qu.:116.0 1st Qu.:116.0 1st Qu.:137.0 1st Qu.:145.0
## Median :116.0 Median :116.0 Median :143.0 Median :147.0
## Mean :115.7 Mean :116.6 Mean :141.7 Mean :146.8
## 3rd Qu.:116.0 3rd Qu.:116.0 3rd Qu.:145.0 3rd Qu.:149.0
## Max. :258.0 Max. :148.0 Max. :157.0 Max. :159.0
## NA's :140 NA's :137 NA's :206 NA's :207
## GT1_1 GT1_2 L5_1 L5_2
## Min. :160.0 Min. :112.0 Min. : 98.0 Min. : 98.0
## 1st Qu.:198.0 1st Qu.:208.0 1st Qu.:124.0 1st Qu.:134.0
## Median :202.0 Median :214.0 Median :126.0 Median :142.0
## Mean :204.3 Mean :215.1 Mean :129.4 Mean :140.3
## 3rd Qu.:210.0 3rd Qu.:220.0 3rd Qu.:134.0 3rd Qu.:146.0
## Max. :260.0 Max. :294.0 Max. :160.0 Max. :190.0
## NA's :127 NA's :127 NA's :83 NA's :83
## Myrt3_1 Myrt3_2 UniqueInd UniqueColony
## Min. :143 Min. :173.0 MI.1.F10.1: 2 WV.P1 : 25
## 1st Qu.:187 1st Qu.:191.0 MI.1.F11.5: 2 NY.G1 : 20
## Median :189 Median :193.0 MI.2.I27.3: 2 NY.G3 : 20
## Mean :189 Mean :193.3 MI.2.I37.1: 2 NY.H1 : 20
## 3rd Qu.:191 3rd Qu.:195.0 MI.2.J13.3: 2 NY.J1 : 20
## Max. :209 Max. :235.0 MI.1.A2.1 : 1 NY.K1 : 20
## NA's :139 NA's :139 (Other) :3055 (Other):2941
## Plot
## MI.I : 245
## NY.K : 219
## MI.L : 180
## NY.L : 180
## MI.H : 129
## MI.D : 113
## (Other):2000
Task 4 The columns with strange names (e.g., GT218) hold genotypes from microsattelite loci. Look up microsattelite using google, and write down what it is.
Task 5 Make a subset of the data for only the ants from plot “WV.R” (WV is West Viriginia).
Make a scatter plot to show the locations of the ant nests in plot WV.R using the transect.x and transect.y columns.
csvcomgen[csvcomgen$Plot=='WV.R', ]->WVR
plot(WVR$transect.x, WVR$transect.y, main="Scatterplot Transect X vs.Y", xlab="Transect.x", ylab="Transect.y", col= "blue")
Task 6 Make a histogram of the microsattelite length for each of the microsatellite loci in the entire dataset.
Task 7 How many of the sampled ants are slaves? [3066]
for (i in 9:20){hist(WVR[,i],col=i, main="Histogram of WVR microsatellite #lengths", xlab="microsatellite length" )}
length(csvcomgen$free_slave)
## [1] 3066
Task 8 (bonus question) How many of the ant nests contain slaves? (for the last question, you can select the two columns that have the colony name and whether the ants are free and use the unique() function on those two columns) E.g., unique(DATA[,c(5,7)])
#install.packages("plyr")
library(plyr)
count(csvcomgen$free_slave)
## x freq
## 1 0 2383
## 2 1 683
Task 9: Knit your RMD file and publish the html file online, then upload the link to iLearn.