08/04/2015

How to input data in R

  • To read data from a file, R needs to know which folder the file is in; the usual way is to set the working directory to that folder

  • The working directory (wd) can be set with the function setwd()

  • The way you do this depends on the operating system (Windows, Mac, Ubuntu)

Setting the working directory

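  • To find the folder location, look at the properties of any file in that folder and copy its location (path)

  • Use forward slashes (/) between folder names. Windows shows paths with backslashes (\), which R treats as escape characters inside strings, so replace them with / (or double them as \\)

  • Do not include a file name in the path: setwd() takes a folder, not a file

  • The path to the folder must be quoted ("")

  • It is always better to copy/paste (fewer typos)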
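If you copied the path of a file in the folder, a minimal sketch of dropping the file name with base R's dirname() could look like this. The path is a made-up example, not a real location on your computer.

```r
# Hypothetical path copied from the properties of a file in your data folder
file_path <- "C:/Users/yourname/Documents/data/my_data.csv"

# dirname() drops the file name and keeps only the folder part
folder <- dirname(file_path)
print(folder)

# basename() returns the part that was dropped (the file name)
print(basename(file_path))

# The folder (not the file) is what you would pass to setwd(folder)
```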

Setting the working directory in WINDOWS

In Windows it should look something like this:

setwd("C:/location")
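Paths copied from Windows Explorer use backslashes. One way to fix them is to swap them for forward slashes with gsub(). This sketch assumes R 4.0.0 or later, which added raw strings r"(...)" so a path can be pasted with its backslashes as-is; the folder names are placeholders.

```r
# Raw string (R >= 4.0.0): the backslashes are kept exactly as pasted
win_path <- r"(C:\Users\yourname\Documents\data)"

# Replace every backslash with a forward slash (fixed = TRUE: plain text, no regex)
fixed_path <- gsub("\\", "/", win_path, fixed = TRUE)
print(fixed_path)

# setwd(fixed_path) would then work on a Windows machine that has this folder
```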

You can also choose the folder interactively (ONLY WINDOWS! choose.dir() is not available in R on other systems)

setwd(choose.dir())

That should open a window where you can choose the folder

Setting the working directory in MAC

It should look something like this:

setwd("/Users/yourname/..")

Do not include whatever you see before "Users" (like "Macintosh…")

Setting the working directory in UBUNTU (Linux)

setwd("~/Dropbox/2015 La Selva REU")
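On Mac and Ubuntu, "~" is a shortcut for your home folder. After calling setwd() it is worth checking where R is actually looking. A minimal sketch with base R's path.expand(), getwd() and list.files(); it switches to a temporary folder (tempdir()) only so that it runs on any computer, then switches back.

```r
# "~" expands to your home folder on Mac and Linux
print(path.expand("~"))

# setwd() invisibly returns the previous working directory,
# so we can store it and restore it at the end
old_wd <- setwd(tempdir())

current_wd <- getwd()
print(current_wd)     # confirm where R now looks for files
print(list.files())   # the files R can see in this folder

setwd(old_wd)         # go back to where we started
```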

Reading data from .csv files

To read the csv file:

ap<-read.csv("acoustic parameters song-cognition aug-15.csv")

Make sure the file name is quoted

The name must match exactly, including spaces and capitalization (better to copy/paste)

Include the file extension (.csv or whatever it is)
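A quick way to catch a typo or a missing extension before reading is base R's file.exists(). The sketch below first writes a tiny made-up file called "example data.csv" into a temporary folder, so it is self-contained; with your own data you would only need the setwd(), file.exists() and read.csv() steps.

```r
# Write a tiny made-up CSV into a temporary folder (stands in for your data)
demo_dir <- tempdir()
write.csv(data.frame(selec = c("1-1", "1-2"), duration = c(0.128, 0.133)),
          file.path(demo_dir, "example data.csv"), row.names = FALSE)

old_wd <- setwd(demo_dir)

# TRUE only if the name (with extension) matches a file in the working directory
print(file.exists("example data.csv"))
print(file.exists("example data"))   # extension missing

ex <- read.csv("example data.csv")
setwd(old_wd)

str(ex)
```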

Reading data from excel files

We need to install the package "readxl" (this is needed only once per computer)

install.packages(pkgs = "readxl")

Then we load the package in every new R session (otherwise it is not available in your R environment)

library(readxl)
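Once readxl is loaded, read_excel() reads one sheet of a workbook into a data frame (a tibble). A minimal sketch, assuming readxl is installed; it uses the example workbook bundled with the package (readxl_example("datasets.xlsx")) rather than a real data file, and the file name in the last comment is hypothetical.

```r
library(readxl)

# Path to an example workbook that ships with readxl
xl_path <- readxl_example("datasets.xlsx")

# List the sheets in the workbook, then read one of them by name
print(excel_sheets(xl_path))
iris_xl <- read_excel(xl_path, sheet = "iris")

# read_excel() returns a tibble, which is also a data.frame
str(iris_xl)

# For your own file (after setwd()) the call would look like
# read_excel("your file name.xlsx")   # hypothetical file name
```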

Look at the data structure

We should make sure that R has read each column in the right format (for example, numbers as numeric)

To check the data we can use the function str()

str(ap)

That should return something like this:

## Classes 'tbl_df', 'tbl' and 'data.frame':    409 obs. of  24 variables:
##  $ sound.files: chr  "203.SUR.2014.4.10.8.10.WAV" "203.SUR.2014.4.10.8.10.WAV" "203.SUR.2014.4.10.8.10.WAV" "203.SUR.2014.4.10.8.10.WAV" ...
##  $ selec      : chr  "1-1" "1-2" "1-3" "1-4" ...
##  $ duration   : num  NA 0.128 0.133 0.124 0.131 ...
##  $ meanfreq   : num  6.36 6.5 6.57 6.56 6.57 ...
##  $ sd         : num  1.24 1.13 1.09 1.16 1.12 ...
##  $ median     : num  6.59 6.68 6.68 6.77 6.74 ...
##  $ Q25        : num  5.95 6.09 6.14 6.21 6.12 ...
##  $ Q75        : num  7.21 7.22 7.29 7.34 7.31 ...
##  $ IQR        : num  1.26 1.13 1.15 1.13 1.19 ...
##  $ skew       : num  2.43 2.47 2.16 2.63 1.97 ...
##  $ kurt       : num  9.26 10.24 8.03 11.58 6.5 ...
##  $ sp.ent     : num  0.898 0.886 0.889 0.893 0.891 ...
##  $ sfm        : num  0.445 0.388 0.378 0.421 0.394 ...
##  $ mode       : num  6.82 6.37 6.1 7.1 6.11 ...
##  $ centroid   : num  6.36 6.5 6.57 6.56 6.57 ...
##  $ peakf      : num  6.21 6.21 6.35 6.39 6.07 ...
##  $ meanfun    : num  5.37 5.91 5.92 6.11 5.77 ...
##  $ minfun     : num  2.286 2.182 1.116 2.286 0.262 ...
##  $ maxfun     : num  8 8 8 8 8 8 8 8 8 8 ...
##  $ meandom    : num  6.68 6.77 6.99 6.92 6.9 ...
##  $ mindom     : num  5.28 5.6 5.92 5.92 5.92 5.76 5.76 5.44 5.92 5.92 ...
##  $ maxdom     : num  7.84 7.84 7.68 8 8 7.68 7.84 8 7.68 7.84 ...
##  $ dfrange    : num  2.56 2.24 1.76 2.08 2.08 1.92 2.08 2.56 1.76 1.92 ...
##  $ modindx    : num  0.226 0.25 0.321 0.285 0.267 ...

  • The output tells you the type of object (data frame), the number of rows (obs.) and columns (variables), and the names of the columns

  • In that output, look at the type of vector for each column (chr, num, Factor...)

  • Continuous variables should be read as numeric vectors (num)
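With many columns it can help to ask R directly which ones are not numeric. A minimal sketch with base R's sapply(), using a few lines of made-up CSV text (read with the text argument of read.csv()) in place of the real file:

```r
# Made-up CSV text in the style of the acoustic parameters table
csv_text <- "selec,duration,meanfreq\n1-1,NA,6.36\n1-2,0.128,6.5\n1-3,0.133,6.57\n"
ap_demo <- read.csv(text = csv_text)

# Class of every column
col_types <- sapply(ap_demo, class)
print(col_types)

# Names of the columns that were NOT read as numeric
non_numeric <- names(ap_demo)[!sapply(ap_demo, is.numeric)]
print(non_numeric)
```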

You can also look at the entire data frame with the function View()

View(ap)
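View() opens an interactive viewer. For a quick look in the console instead, base R's head(), dim() and summary() are handy. A sketch on a tiny made-up data frame standing in for ap:

```r
# Tiny made-up data frame standing in for `ap`
ap_demo <- data.frame(
  selec    = c("1-1", "1-2", "1-3"),
  duration = c(NA, 0.128, 0.133),
  meanfreq = c(6.36, 6.50, 6.57)
)

print(head(ap_demo, n = 2))       # first rows only
print(dim(ap_demo))               # number of rows and columns
print(summary(ap_demo$duration))  # min, quartiles, mean, max and number of NAs
```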

What to do if numbers are read as factors

  • For most plots and tests you need to have continuous variables identified as numeric in R

  • If not, you will get errors or unexpected results (for example, values treated as categories instead of numbers)

For the sake of the example, I am going to convert the variable "duration" into a factor

dur<-as.factor(ap$duration)

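To convert that variable back to a numeric one, first convert it to character and then to numeric (using as.numeric() directly on a factor returns the internal level codes, not the original values):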

dur<-as.numeric(as.character(dur))
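The difference between the wrong and the right conversion is easy to see on a few values. A small sketch with made-up durations; the last line shows the equivalent form recommended on R's ?factor help page.

```r
# A few made-up duration values, one of them missing
duration <- c(0.128, 0.133, 0.124, NA)
dur <- as.factor(duration)
print(levels(dur))   # levels are the sorted values stored as text

# Wrong: as.numeric() on a factor returns the level positions
codes <- as.numeric(dur)
print(codes)         # 2 3 1 NA

# Right: go through character first to recover the original numbers
values <- as.numeric(as.character(dur))
print(values)        # 0.128 0.133 0.124 NA

# Equivalent (and slightly more efficient) form from ?factor
values2 <- as.numeric(levels(dur))[dur]
```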

Numeric variables can be read as factors (or as character, in R 4.0.0 and later) if any cell in the column contains text

For instance, if you type NODATA in empty cells

It is better to leave them empty: R reads empty cells in a numeric column as NA (missing)
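If a file already uses a code like NODATA, read.csv() can be told to treat it as missing through its na.strings argument. A minimal sketch on made-up CSV text:

```r
# Made-up CSV text: one cell says "NODATA", another is left empty
csv_text <- "duration,meanfreq\n0.128,6.36\nNODATA,6.50\n,6.57\n"

# "NODATA" makes the whole column non-numeric
bad <- read.csv(text = csv_text)
print(class(bad$duration))    # "character" in R >= 4.0.0, "factor" in older R

# Declare which strings mean "missing"; the empty cell also becomes NA
good <- read.csv(text = csv_text, na.strings = c("NA", "NODATA"))
print(class(good$duration))   # "numeric"
print(good$duration)          # 0.128 NA NA
```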

That is it!