Gavin Douglas
Aug. 7th, 2018
Assigning a character:
tmp <- "hello world!"
print(tmp)
[1] "hello world!"
Working with numbers:
a <- 20
b <- 5
c <- a**2 + b**2
c
[1] 425
Why do I use <- instead of =?
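One commonly cited difference between the two operators, shown as a minimal sketch: inside a function call, = matches an argument by name, while <- creates a variable and then passes its value. The name y is a placeholder chosen for this example, not something from the workshop.

```r
# = inside a call names the argument x of median(); no variable is created
median(x = 1:10)

# <- inside a call creates y in the workspace, then passes its value on
median(y <- 1:10)
y
```

Using <- for every assignment keeps "assign a variable" and "name an argument" visually distinct.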
print is a function, which you can read about by typing:
?print
Most functions have documentation that you can open with the ? syntax.
Data types:
(1) numeric
(2) integer
(3) logical
(4) character
(5) factor
Data structures:
(1) vector
(2) list
(3) matrix
(4) dataframe
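The types and structures listed above can be inspected directly. The sketch below uses class() for the types and the is.*() test functions for the structures; example_vec and the literal values are made up for illustration.

```r
# Data types: class() reports the type of each value
class(3.14)          # "numeric"
class(3L)            # "integer"
class(10 > 2)        # "logical"
class("hello")       # "character"
class(factor("WT"))  # "factor"

# Data structures: the is.*() functions test for each one
example_vec <- c(5, 42, 44, 6)   # hypothetical example vector
is.vector(example_vec)                      # TRUE
is.list(list(1, "a"))                       # TRUE
is.matrix(matrix(1:4, nrow = 2))            # TRUE
is.data.frame(data.frame(v = example_vec))  # TRUE
```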
One of the main functions we'll be using is c.
test_vec <- c(5, 42, 44, 6)
test_vec
[1] 5 42 44 6
Getting a particular index from a vector:
test_vec[3]
[1] 44
Note that the first index is indicated by 1, not 0:
test_vec[0]
numeric(0)
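Square brackets accept more than a single position. The following sketch, which assumes a vector named idx_vec with the same values as test_vec, shows a few other indexing forms.

```r
idx_vec <- c(5, 42, 44, 6)   # hypothetical name; same values as test_vec

length(idx_vec)     # 4
idx_vec[c(1, 3)]    # 5 44      (several positions at once)
idx_vec[2:4]        # 42 44 6   (a range of positions)
idx_vec[-1]         # 42 44 6   (a negative index drops that position)
idx_vec[10]         # NA        (a position past the end)
```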
What's the difference between the below two objects (hint: try class)?
x <- 1
y <- 1L
logical_vec <- c(TRUE, F, T, 10 > 2, 10 == 1)
logical_vec
[1] TRUE FALSE TRUE TRUE FALSE
Note that logical values are written without " characters; with quotes they are character strings:
class(c("TRUE", "FALSE"))
[1] "character"
Beware working with variables with Boolean names:
T <- FALSE
print(T)
[1] FALSE
These two lines have very different meanings:
T == 24
T = 24
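To make the risk concrete, here is a small sketch of what happens when T is reassigned and then removed again. The names shadowed, restored and attempt are placeholders for this example.

```r
T <- FALSE       # allowed: T is an ordinary variable name
shadowed <- T    # FALSE
rm(T)            # delete our variable; T refers to base R's TRUE again
restored <- T    # TRUE

# TRUE and FALSE are reserved words, so assigning to them fails
attempt <- tryCatch(
  eval(parse(text = "TRUE <- FALSE")),
  error = function(e) "assignment refused"
)
attempt          # "assignment refused"
```

Writing TRUE and FALSE in full avoids the problem, since those names cannot be overwritten.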
Comparison operators return logical vectors:
"CAT" == "DOG"
[1] FALSE
x <- 2
x < 3
[1] TRUE
x > 10
[1] FALSE
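Comparisons also work element by element on longer vectors, and the resulting logical vector can be used to select values. A short sketch with made-up values (counts and is_large are hypothetical names):

```r
counts <- c(5, 42, 44, 6)      # hypothetical example values
is_large <- counts > 10        # FALSE TRUE TRUE FALSE

counts[is_large]                     # 42 44 (keep elements where TRUE)
sum(is_large)                        # 2, because TRUE counts as 1
counts[counts > 10 & counts < 44]    # 42 (conditions combined with &)
```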
Defining a character vector:
tmp <- c("hey,", "this", "is", "multiple", "strings")
class(tmp)
[1] "character"
Converting to a character vector:
tmp2 <- c(10, 20, 40.0, 19)
tmp2 <- as.character(tmp2)
tmp2
[1] "10" "20" "40" "19"
class(tmp2)
[1] "character"
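The conversion can be reversed with as.numeric. This sketch assumes a character vector called num_strings (a hypothetical name) and also shows what happens to strings that are not numbers.

```r
num_strings <- c("10", "20", "40", "19")   # hypothetical name

as.numeric(num_strings)        # 10 20 40 19
as.numeric(num_strings) + 1    # 11 21 41 20

# Strings that are not numbers become NA (R also emits a warning,
# which is silenced here)
suppressWarnings(as.numeric(c("10", "ten")))   # 10 NA
```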
In R versions before 4.0.0, character columns in tables are read in as factors by default (set stringsAsFactors=FALSE to avoid this).
Explicitly defining factors is better:
tmp <- c("hey,", "this", "is", "multiple", "strings")
tmp
[1] "hey," "this" "is" "multiple" "strings"
tmp <- factor(tmp)
tmp
[1] hey, this is multiple strings
Levels: hey, is multiple strings this
tmp <- factor(c("treated_WT", "control_WT", "treated_KO", "control_WT", "control_KO", "control_KO", "treated_WT", "treated_KO"))
print(tmp)
[1] treated_WT control_WT treated_KO control_WT control_KO control_KO
[7] treated_WT treated_KO
Levels: control_KO control_WT treated_KO treated_WT
You can explicitly set the factor levels you want:
tmp2 <- factor(tmp, levels = c("control_WT", "treated_WT", "control_KO", "treated_KO"))
print(tmp2)
[1] treated_WT control_WT treated_KO control_WT control_KO control_KO
[7] treated_WT treated_KO
Levels: control_WT treated_WT control_KO treated_KO
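Behind the scenes, a factor stores each value as an integer code that points into its levels. A brief sketch, using a made-up genotype factor, of how to inspect the levels, the codes and the counts per level:

```r
# genotype is a hypothetical example factor with explicit level order
genotype <- factor(c("KO", "WT", "KO", "WT", "WT"), levels = c("WT", "KO"))

levels(genotype)       # "WT" "KO"
as.integer(genotype)   # 2 1 2 1 1 (position of each value in the levels)
table(genotype)        # counts per level, in level order: WT 3, KO 2
```

The level order set here is also the order that functions such as table and boxplot use for display.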
What class is the object z below?
z <- c(4154163, "Hi there gang!")
What class is the object funInSun below?
print(funInSun)
[[1]]
[1] 1841 51
[[2]]
[1] "heya"
You can combine objects of different types in lists.
tmp <- list("prime"=c(2, 3, 5, 7, 11), "animals"=c("cow", "chicken"))
tmp
$prime
[1] 2 3 5 7 11
$animals
[1] "cow" "chicken"
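Elements of a list can be pulled out by name or by position. The sketch below uses a hypothetical list called farm to show the common access forms.

```r
# farm is a hypothetical example list
farm <- list(numbers = c(1, 2, 5), animals = c("cow", "chicken"))

names(farm)              # "numbers" "animals"
farm$animals             # "cow" "chicken"
farm[["numbers"]]        # 1 2 5
farm[[2]][1]             # "cow" (first value of the second element)
class(farm["animals"])   # "list": single brackets keep the list wrapper
```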
A key advantage of lists is that you can apply a function to every element with lapply, e.g.:
tmp2 <- list("set1"=c(1, 2, 5, 7, 11), "set2"=c(3, 8, 10), "set3"=c(4, 8))
lapply(tmp2, sum)
$set1
[1] 26
$set2
[1] 21
$set3
[1] 12
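lapply always returns a list. When each result is a single value, sapply is a related base R function that simplifies the result to a named vector. A sketch, assuming a list named num_sets with the same contents as tmp2:

```r
# num_sets is a hypothetical name; same contents as tmp2 above
num_sets <- list(set1 = c(1, 2, 5, 7, 11), set2 = c(3, 8, 10), set3 = c(4, 8))

set_sums <- sapply(num_sets, sum)
set_sums             # named vector: set1 26, set2 21, set3 12
set_sums["set2"]     # 21

# lapply also accepts a function written on the spot
doubled <- lapply(num_sets, function(v) v * 2)
doubled$set3         # 8 16
```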
Matrices can only hold data of a single type. They take up less memory than dataframes.
test_matrix <- matrix(c(10, 32, 13, 54), nrow=2, ncol=2)
test_matrix
[,1] [,2]
[1,] 10 13
[2,] 32 54
test_matrix2 <- matrix(c("dog", "cat", 13, 54), nrow=2, ncol=2)
test_matrix2
[,1] [,2]
[1,] "dog" "13"
[2,] "cat" "54"
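As the output above shows, matrix fills values column by column unless told otherwise, and mixing strings with numbers turns everything into character. A short sketch with hypothetical names m and m_byrow:

```r
m <- matrix(c(10, 32, 13, 54), nrow = 2, ncol = 2)   # filled by column
dim(m)       # 2 2
m[2, 1]      # 32    (row 2, column 1)
m[, 2]       # 13 54 (all of column 2)

# byrow = TRUE fills row by row instead
m_byrow <- matrix(c(10, 32, 13, 54), nrow = 2, ncol = 2, byrow = TRUE)
m_byrow[1, 2]   # 32

# One string is enough to make the whole matrix character
typeof(matrix(c("dog", 13), nrow = 1))   # "character"
```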
In dataframes, columns can be of different types. Dataframes take up more memory than matrices, but are easier to interact with.
test_df <- as.data.frame(matrix(c(10, 32, 13, 54), nrow=2, ncol=2))
test_df
V1 V2
1 10 13
2 32 54
test_df2 <- data.frame(pet=c("dog", "cat"), livestock=c("cow", "sheep"), stringsAsFactors = FALSE)
test_df2
pet livestock
1 dog cow
2 cat sheep
Select a single column:
test_df2$pet
[1] "dog" "cat"
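There are several equivalent ways to select a column, and a misspelled column name with $ returns NULL rather than raising an error. A minimal sketch using a hypothetical dataframe pets_df with the same contents as test_df2:

```r
# pets_df is a hypothetical name; same contents as test_df2
pets_df <- data.frame(pet = c("dog", "cat"), livestock = c("cow", "sheep"),
                      stringsAsFactors = FALSE)

pets_df$pet              # "dog" "cat"
pets_df[["pet"]]         # "dog" "cat"
pets_df[, "pet"]         # "dog" "cat"
class(pets_df["pet"])    # "data.frame": single brackets keep a one-column table

is.null(pets_df$pets)    # TRUE: no column is called "pets"
```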
Matrices and dataframes can both have row and column names.
test_df2 <- data.frame(pet=c("dog", "cat"), livestock=c("cow", "sheep"), stringsAsFactors = FALSE)
test_df2
pet livestock
1 dog cow
2 cat sheep
rownames(test_df2) <- c("Bill", "Sandy")
colnames(test_df2) <- c("Pet", "Livestock")
test_df2
Pet Livestock
Bill dog cow
Sandy cat sheep
To get the row corresponding to Bill (note the column is left blank):
test_df2["Bill", ]
Pet Livestock
Bill dog cow
To get Sandy's livestock:
test_df2["Sandy", "Livestock"]
[1] "sheep"
However, we can't remove rows or columns by name (at least not without an extra step).
test_df2[-"Bill",]
Error in -"Bill" : invalid argument to unary operator
Return element at first row and second column:
test_df2[1, 2]
[1] "cow"
Remove the first row:
test_df2[-1,]
Pet Livestock
Sandy cat sheep
Remove the first row based on its name (using the which function, which returns the index of the matching row):
test_df2[-which(rownames(test_df2) == "Bill"),]
Pet Livestock
Sandy cat sheep
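One caveat with the -which() approach: if no row name matches, which returns an empty index and every row is dropped. A logical test with %in% is a common alternative that does not have this problem. The sketch below assumes a dataframe called owners_df with the same contents as test_df2, and "Alice" is a made-up name that matches no row.

```r
# owners_df is a hypothetical name; same contents as test_df2
owners_df <- data.frame(Pet = c("dog", "cat"), Livestock = c("cow", "sheep"),
                        row.names = c("Bill", "Sandy"),
                        stringsAsFactors = FALSE)

# Keep every row whose name is NOT in the set to remove
kept <- owners_df[!(rownames(owners_df) %in% "Bill"), ]
rownames(kept)      # "Sandy"

# A name that matches nothing leaves the table unchanged
no_match <- owners_df[!(rownames(owners_df) %in% "Alice"), ]
nrow(no_match)      # 2

# With -which(), a name that matches nothing removes all rows
emptied <- owners_df[-which(rownames(owners_df) == "Alice"), ]
nrow(emptied)       # 0
```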
Reading:
in_table <- read.table("myfile.txt", header=TRUE, row.names=1, sep="\t", stringsAsFactors=FALSE)
There are many other options, but these are good to be aware of. Note that with row.names=1 the first column is used as the row names.
Writing:
write.table(in_table_modified, file="newfile.txt", quote=FALSE, sep="\t", col.names=NA, row.names=TRUE)
Make sure to set quote=FALSE so that no quotes are in your output file.
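The two commands can be tried together as a round trip. This sketch writes a small made-up table to a temporary file and reads it back with the same options as above; out_df, back_df and tmp_path are hypothetical names, and tempfile is used so that no file in your working directory is touched.

```r
# out_df is a hypothetical example table with row names
out_df <- data.frame(Pet = c("dog", "cat"), Livestock = c("cow", "sheep"),
                     row.names = c("Bill", "Sandy"),
                     stringsAsFactors = FALSE)

tmp_path <- tempfile(fileext = ".txt")

# col.names=NA adds a blank header cell above the row names column
write.table(out_df, file = tmp_path, quote = FALSE, sep = "\t",
            col.names = NA, row.names = TRUE)

# row.names=1 turns the first column back into row names
back_df <- read.table(tmp_path, header = TRUE, row.names = 1, sep = "\t",
                      stringsAsFactors = FALSE)
back_df

unlink(tmp_path)   # delete the temporary file
```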
Both of the following data structures can be used in place of dataframes:
data.table (data.table package)
tibble (tibble package, part of the tidyverse)
mean
min/max
plot
boxplot
summary
str
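A quick sketch of the non-plotting functions in this list, run on a small made-up vector (obs is a hypothetical name). plot and boxplot are left out here because they open a graphics window.

```r
obs <- c(5, 42, 44, 6)   # hypothetical example values

mean(obs)      # 24.25
min(obs)       # 5
max(obs)       # 44
summary(obs)   # minimum, quartiles, median, mean and maximum in one call
str(obs)       # compact description: type, length and first values
```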
rm(list=ls())
Restart R (if necessary) to make sure no packages are loaded.
Make a new R script file to save your R commands; relying on “History” will end in sadness.
Run commands from this R script file in RStudio by highlighting lines and hitting CTRL-RETURN.
Write comments starting with the # character before blocks of code
Write your comments and commands in a way that someone else could understand
Start a new line once you reach 80 characters
Next workshop: write custom functions to avoid writing repetitive code
Write the test_df2 table to a file named “write_test.txt” with the write.table function.
mtcars with the command data(mtcars)
summary and str.
head(mtcars, 10)
mtcars table.
pch option)
mtcars object, but only keep the rows called: “Valiant”, “Merc 230”, and “Lotus Europa”.
grep.
mean miles per gallon for all Merc cars.
data(airquality)
rowSums function to identify rows that have any NA values?
All week 2 Coursera videos up to and including “Scoping Rules - R Scoping Rules”.