ant_data_exploration

Worksheet 21: reading Excel files and some ant data.

Task 1

I used

CSVcompleteGeneticData <- read.csv("CompleteGeneticData.csv")

to read the .csv file containing the complete genetic data.

Task 2

I used install.packages("xlsx") to install the xlsx package on the computer, and then library(xlsx) to load it into R.
The CompleteGeneticData.xlsx file was read and stored in a variable using read.xlsx2("CompleteGeneticData.xlsx", sheetIndex = 1)->XLSXcompleteGeneticData

Task 3

Yes, R seems to interpret the data differently depending on how the file is read.
The file read with read.xlsx2() seems to have every column stored as a factor. The file read with read.csv(), on the other hand, seems to give each column the right data type. For example, if a column holds a set of numbers, it is read as a numeric vector.
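One way to check this, sketched here on two toy data frames rather than on the real files (the column names id and length are made up for illustration), is to ask for the class of every column with sapply(). The last step shows how a factor holding digits could be turned back into numbers, assuming that is what the column really contains.

```r
# Toy data standing in for the two imports; column names are hypothetical.
csv_like  <- data.frame(id = 1:3, length = c(101, 105, 99))
xlsx_like <- data.frame(id = factor(c("1", "2", "3")),
                        length = factor(c("101", "105", "99")))

# Class of every column in each data frame
sapply(csv_like, class)
sapply(xlsx_like, class)

# Convert a factor of digits back to numbers. Go through character first,
# because as.numeric() applied directly to a factor returns the level codes.
xlsx_like$length <- as.numeric(as.character(xlsx_like$length))
```

The same sapply() call could be run on CSVcompleteGeneticData and XLSXcompleteGeneticData to compare the two imports column by column.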

Task 4

Microsatellites are di-, tri-, or tetranucleotide tandem repeats in DNA sequences. The number of repeats varies among the individuals of a population and between the alleles of a single individual. For example, the sequence below contains a stretch of 20 CA dinucleotide repeats (40 bp).

CGTTCAATAAGCAAAAATCCATAGTTTTAGGAATGTGGGCT GCTTGGTGTGATGTAGAAGGCGCCAATGCATCTCGACGTAT GCGTATACGGGTTACCCCCTTTGCAATCAGTGCACACACAC ACACACACACACACACACACACACACACACAGTGCCAAGCA AAAATAACGCCAAGCAGAACGAAGACGTTCTCGAGAACACC AGAAGTTCGTGCTGTCGGGGCATGCGGCGAGTAAAGGGGAT
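A repeat stretch like this can also be located in R with a regular expression. The sketch below is only an illustration on a short made-up sequence (dna is not the sequence above); it finds every run of consecutive CA units and reports the length of the longest one.

```r
# Short made-up sequence containing one run of CA repeats
dna <- "GGTCACACACACATTG"

# All runs of one or more consecutive "CA" units
runs <- regmatches(dna, gregexpr("(CA)+", dna))[[1]]

# Longest run and its number of dinucleotide repeats
longest   <- runs[which.max(nchar(runs))]
n_repeats <- nchar(longest) / 2
n_repeats
```

To try it on a longer sequence pasted from a document, any spaces would first have to be removed, for example with gsub(" ", "", dna).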

Task 5

I used the following command to subset the data to only the ants in plot “WV.R”, as given by the column “Plot”:

subset(CSVcompleteGeneticData,CSVcompleteGeneticData$Plot=="WV.R")->WV.Rdatasubset

In order to make the plot we used:

plot(WV.Rdatasubset$transect.x, WV.Rdatasubset$transect.y, xlab = "transect.x", ylab = "transect.y", main = "transect.x vs transect.y")

Task 6

To plot a histogram that combines several sets of data, we used the function multhist() from the package “plotrix”, so we first needed to install and load the package.

install.packages("plotrix",repos="http://cran.rstudio.com/")

# The repos argument is given explicitly so that the chunk runs without problems when the Rmd file is knitted.
library(plotrix)

## Warning: package 'plotrix' was built under R version 3.2.5

After the package was installed, we were able to use multhist() to make our histogram.

multhist(CSVcompleteGeneticData[,9:20], xlab="microsatellite lengths", ylab="frequency", main="Histogram of microsatellite length")

Task 7

Although it is not specified, I assumed that in the column CSVcompleteGeneticData$free_slave every value equal to 0 means that the ant is free and every value equal to 1 means that it is a slave ant. Therefore, to count how many ants are slaves, I used the following command:

length(which(CSVcompleteGeneticData$free_slave==1))

## [1] 683

Task 8

As described in the explanation of this task, I used unique() to determine which colonies had slaves and which did not. The following chunk shows what I did:

unique(CSVcompleteGeneticData[,c(2,4)])->colony_free_slaves
length(which(colony_free_slaves$free_slave==1))

## [1] 37
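The difference between counting ants (Task 7) and counting colonies (Task 8) is easier to see on a small example. The data frame below is invented for illustration; the column name colony is an assumption, since the real data are selected by column position, and the 0/1 coding of free_slave is the same assumption made in Task 7.

```r
# Invented data: six ants in three colonies (column name "colony" is hypothetical)
ants <- data.frame(colony     = c("A", "A", "B", "B", "C", "C"),
                   free_slave = c(0, 1, 0, 0, 1, 1))

# Task 7 style: number of ants coded as slaves
n_slave_ants <- sum(ants$free_slave == 1)

# Task 8 style: keep one row per distinct colony/status pair,
# then count the colonies that have at least one slave ant
colony_status    <- unique(ants[, c("colony", "free_slave")])
n_slave_colonies <- sum(colony_status$free_slave == 1)
```

Here sum() on a logical vector counts the TRUE values, which gives the same result as length(which(...)) as long as the column has no missing values.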

Task 9


Manuel Rosero

11/8/2016
