Therefore, the basics of the language must be understood. This document summarizes, in a more accessible form, a beginner’s guide published by Emmanuel Paradis, and it also draws on this RStudio resource and the Torfs guidebook. The WS2e website also has helpful R code, but its explanations are not quite as in-depth as these other resources.
The basic types of data you can enter in R are numeric, character, and logical values.
The type of data determines what you can do with an object in RStudio. An object is any value to which you assign a name.
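As a minimal sketch of the three data types named above, the following assigns one value of each type to an object and asks R for its type with class(). The object names (height, species, is_adult) are made up for illustration.

```r
## One object for each basic data type (names are hypothetical)
height <- 1.75               # numeric
species <- "garden spider"   # character: note the quotation marks
is_adult <- TRUE             # logical: TRUE or FALSE, without quotation marks

## class() reports the type of data stored in an object
class(height)
## [1] "numeric"
class(species)
## [1] "character"
class(is_adult)
## [1] "logical"
```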
Data can be organized as matrices and data frames. A matrix is a collection of values that all share the same data type, while a data frame can hold a different data type in each column. For example, you can create a matrix of numeric values against numeric values, whereas you can make a data frame with character values against numeric values. This module mostly uses data frames and vectors to analyze data in RStudio.
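The difference can be sketched with a small example. The values and names here (m, mixed, df, site, count) are invented for illustration, and stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version.

```r
## A matrix holds a single data type: here, all numeric
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
is.numeric(m)        # TRUE

## Mixing types in a matrix converts every value to character
mixed <- matrix(c(1, "a", 2, "b"), nrow = 2)
is.character(mixed)  # TRUE: even the 1 and 2 are now stored as text

## A data frame keeps a separate type for each column
df <- data.frame(site = c("A", "B"), count = c(4, 7),
                 stringsAsFactors = FALSE)
sapply(df, class)    # site is "character", count is "numeric"
```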
The type of file we use in R to analyze datasets is known as “comma-separated values”, with the extension .csv. CSV files may be unfamiliar to beginners in data analysis and computation; however, you will become quite familiar with them over the course. Opening these files in Microsoft Excel or your browser is highly depressing and not all that interesting, while opening them in RStudio greatly expands what you can do with them.
There are two simple ways to load data in RStudio. In the “Environment” tab, there is a small opened file button, with which you can directly load a dataset into RStudio from a file saved on your computer. Alternatively, you can use a command to load a file into RStudio from a hyperlink, such as the files posted on the WS2e website. The following code shows exactly what you need to do.
## First, right-click on the dataset you want to load and choose "Copy link address"
## Make sure to give your data a name that is easy to use and access. Use the following command to load from a hyperlink.
cannibalspiders <- read.csv(url("http://whitlockschluter.zoology.ubc.ca/wp-content/data/chapter07/chap07q08GardenSpiderCannibalism.csv"))
The quotation marks around the link are essential, because R needs the address as a character value. Wrapping the address in url() opens the hyperlink explicitly as a connection; read.csv() can also accept the quoted address directly.
An object can be given almost any name you please, although names should start with a letter and cannot contain spaces. It is often convenient to use a very short name like “n” or “x”, but if you want a more descriptive name, it can be as long as you want.
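The same read.csv() function also reads a file saved on your computer. Here is a minimal sketch, assuming a small made-up dataset that is first written to a temporary file so the example runs on its own. The object name mydata and the columns id and mass are hypothetical.

```r
## Write a tiny made-up dataset to a temporary .csv file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, mass = c(0.5, 0.7, 0.6)),
          tmp, row.names = FALSE)

## Load the file by giving read.csv() its location in quotation marks
## (tmp already holds the location as a character value)
mydata <- read.csv(tmp)

## Check what was loaded
head(mydata)    # shows the first rows
str(mydata)     # shows each column and its data type
nrow(mydata)
## [1] 3
```

Checking a freshly loaded dataset with head() and str() is a quick way to confirm that the columns were read with the types you expect.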
For example, let’s consider the following objects.
object <- 2
n <- 2
## These are two separate objects with the same value but different names. That is fine.
Objects can also store the result of R code and equations. Let’s consider these objects.
n <- 2*3
n
## [1] 6
c <- 1 + dbinom(3, size = 5, prob = 0.8)
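To see what dbinom() contributes to an expression like this, the sketch below computes the same binomial probability by hand (3 successes in 5 trials, with success probability 0.8) and compares it to the built-in function. The object names by_hand and built_in are made up for illustration.

```r
## Binomial probability by hand: (5 choose 3) * 0.8^3 * 0.2^2
by_hand <- choose(5, 3) * 0.8^3 * (1 - 0.8)^2

## The same probability from the built-in function
built_in <- dbinom(3, size = 5, prob = 0.8)

## Both objects store a single numeric value
built_in
## [1] 0.2048
all.equal(by_hand, built_in)
## [1] TRUE
```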
A vector joins multiple values into a single object. This is done with the c() function, as follows.
## simple vector (vector a)
a <- c(1,2,3)
## integer sequence
b <- c(1:10)
## more elaborate sequence, in steps of 0.05 (I'm annoying)
c <- c(seq(1, 10, by=0.05))
## Let's see what we got!
show(a)
## [1] 1 2 3
show(b)
## [1] 1 2 3 4 5 6 7 8 9 10
show(c)
## [1] 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50
## [12] 1.55 1.60 1.65 1.70 1.75 1.80 1.85 1.90 1.95 2.00 2.05
## [23] 2.10 2.15 2.20 2.25 2.30 2.35 2.40 2.45 2.50 2.55 2.60
## [34] 2.65 2.70 2.75 2.80 2.85 2.90 2.95 3.00 3.05 3.10 3.15
## [45] 3.20 3.25 3.30 3.35 3.40 3.45 3.50 3.55 3.60 3.65 3.70
## [56] 3.75 3.80 3.85 3.90 3.95 4.00 4.05 4.10 4.15 4.20 4.25
## [67] 4.30 4.35 4.40 4.45 4.50 4.55 4.60 4.65 4.70 4.75 4.80
## [78] 4.85 4.90 4.95 5.00 5.05 5.10 5.15 5.20 5.25 5.30 5.35
## [89] 5.40 5.45 5.50 5.55 5.60 5.65 5.70 5.75 5.80 5.85 5.90
## [100] 5.95 6.00 6.05 6.10 6.15 6.20 6.25 6.30 6.35 6.40 6.45
## [111] 6.50 6.55 6.60 6.65 6.70 6.75 6.80 6.85 6.90 6.95 7.00
## [122] 7.05 7.10 7.15 7.20 7.25 7.30 7.35 7.40 7.45 7.50 7.55
## [133] 7.60 7.65 7.70 7.75 7.80 7.85 7.90 7.95 8.00 8.05 8.10
## [144] 8.15 8.20 8.25 8.30 8.35 8.40 8.45 8.50 8.55 8.60 8.65
## [155] 8.70 8.75 8.80 8.85 8.90 8.95 9.00 9.05 9.10 9.15 9.20
## [166] 9.25 9.30 9.35 9.40 9.45 9.50 9.55 9.60 9.65 9.70 9.75
## [177] 9.80 9.85 9.90 9.95 10.00
Vectors are an important tool for grouping values so that they can be analyzed together. These objects are listed in the “Environment” tab in RStudio under the “Values” header.
## You can take summary statistics of a numeric vector
mean(c)
## [1] 5.5
sd(c)
## [1] 2.619717
median(c)
## [1] 5.5
## You can plot vectors against each other (here, d on the y-axis against e on the x-axis)
d <- c(1:5)
e <- c(2, 3, 4, 3, 2)
plot(d~e)
## You can transform a vector
log(b)
## [1] 0.0000000 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101
## [8] 2.0794415 2.1972246 2.3025851
## Note: the function log() computes the natural logarithm. log10() computes the common (base-10) logarithm.
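Beyond summaries and transformations, you will often want to pull individual values out of a vector. Here is a minimal sketch using square brackets, with a vector b defined again so the example runs on its own.

```r
## An integer sequence from 1 to 10
b <- c(1:10)

## Square brackets select elements by position
b[3]          # the third element, 3
b[c(1, 10)]   # the first and last elements, 1 and 10

## A condition inside the brackets selects matching elements
b[b > 7]      # 8, 9 and 10

## Arithmetic is applied to every element at once
b * 2         # 2, 4, 6, ..., 20

## length() counts the elements
length(b)
## [1] 10
```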
The RStudio website describes a data frame as “a list where all elements are the same length”. When two vectors correspond to each other element by element (think of the x, y pairs of points on a graph), you can display them together as a data frame.
## Let's create the data frame.
framename <- data.frame(label1=c(6,10,12), label2=c("category1", "category2", "category3"))
## The first vector, label1, holds numeric values, while the second vector, label2, holds categorical (character) values. Using the data.frame() function, we can store this mix of data types in one object in RStudio.
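Once a data frame exists, its columns can be used like ordinary vectors. The sketch below assumes a made-up data frame called trial, with hypothetical columns dose and group. stringsAsFactors = FALSE is set explicitly so the character column behaves the same in older and newer versions of R.

```r
## A small made-up data frame: one numeric and one character column
trial <- data.frame(dose = c(6, 10, 12),
                    group = c("low", "mid", "high"),
                    stringsAsFactors = FALSE)

## The $ sign pulls out one column as a vector
trial$dose
mean(trial$dose)
## [1] 9.333333

## Square brackets select rows (before the comma) and columns (after it)
high_dose <- trial[trial$dose > 8, ]   # rows where dose is above 8
high_dose$group                        # "mid" and "high"

## Count rows and columns
nrow(trial)
## [1] 3
ncol(trial)
## [1] 2
```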
Categorical values can be stored as factors in R. Categorical values are entered as character values, which require quotation marks around their input. Here is an example of creating a factor in R to store categorical values.
## We must first create the vector of categorical values.
treatments <- c("Sham", "Positive Control", "Negative Control", "Experimental", "Experimental", "Sham", "Experimental", "Negative Control")
## Now we can look at the factors, which will describe the levels!
factor(treatments)
## [1] Sham             Positive Control Negative Control Experimental
## [5] Experimental     Sham             Experimental     Negative Control
## Levels: Experimental Negative Control Positive Control Sham
Levels are the different categories in a factor. By default, R lists them in alphabetical order.
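A few functions make levels easier to work with. This is a minimal sketch that rebuilds the treatments vector so it runs on its own. The object names f and f_ordered, and the chosen level order, are assumptions for illustration only.

```r
## The same categorical values as above
treatments <- c("Sham", "Positive Control", "Negative Control", "Experimental",
                "Experimental", "Sham", "Experimental", "Negative Control")
f <- factor(treatments)

## levels() lists the categories; nlevels() counts them
levels(f)
nlevels(f)
## [1] 4

## table() counts how many values fall in each level
## (Experimental 3, Negative Control 2, Positive Control 1, Sham 2)
table(f)

## The levels argument sets your own order instead of the alphabetical default
f_ordered <- factor(treatments,
                    levels = c("Sham", "Negative Control",
                               "Positive Control", "Experimental"))
levels(f_ordered)
```

Setting the order yourself can be useful when the categories have a natural sequence, since plots and tables follow the level order.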