This is my first time using R Markdown.
A few tips for R Markdown:
- Ctrl + Shift + Enter: run the current code chunk.
- Ctrl + Alt + I: insert a new code chunk.
- Ctrl + Shift + K: knit the document and preview the HTML file.
- Chunk options go in the chunk header, {r, options}. Setting echo = FALSE hides the R code in the output while still showing its results.
R can be used as a calculator. The basic arithmetic operators are:
# addition
29 + 92
## [1] 121
# subtraction
92 - 29
## [1] 63
# multiplication
29 * 92
## [1] 2668
# division
2018 / 4
## [1] 504.5
# exponentiation
4^2
## [1] 16
log(4, base = 2) # logarithm base 2 of 4
## [1] 2
# modulo: returns the remainder of the division
10 %% 3 # returns 1
## [1] 1
log10(100) # logarithm base 10 of 100
## [1] 2
exp(2) # exponential of 2
## [1] 7.389056
\[ |-2| = 2 \]
abs(-2) # absolute value of -2
## [1] 2
\[ \sqrt{16} = \sqrt{4^2} = 4 \]
sqrt(16) # square root of 16 = 4^2 = 4
## [1] 4
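Several of the operators and functions above are related to each other. The sketch below checks a few of those relationships; `%/%` (integer division) is an extra operator not used above, included only to show how it pairs with `%%`. `all.equal()` is used for the floating-point comparisons so that tiny rounding differences do not matter.

```r
# Integer division (%/%) and modulo (%%) fit together:
# x equals y * (x %/% y) + (x %% y)
x <- 10
y <- 3
quotient  <- x %/% y   # 3
remainder <- x %% y    # 1
x == y * quotient + remainder        # TRUE

# log() with a base agrees with the dedicated helpers
all.equal(log(4, base = 2), log2(4))       # TRUE
all.equal(log(100, base = 10), log10(100)) # TRUE

# exp() and log() undo each other (up to floating-point error)
all.equal(log(exp(2)), 2)                  # TRUE

# sqrt() is the same as raising to the power 0.5
all.equal(sqrt(16), 16^0.5)                # TRUE
```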
A variable can be used to store a value, for example a character, numeric, or logical value; these are three of the basic data types of R. Let's assign a single value to each variable with the assignment operator <-. Note that R is case-sensitive, which means that char and Char are different variables:
# Character object
char <- "A"
# Numeric object
num <- 2
# Logical object (TRUE/FALSE, i.e. yes/no)
logi <- FALSE
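To see case sensitivity in action, here is a small sketch; `Char` is a hypothetical second variable introduced only for this illustration.

```r
# R is case-sensitive: char and Char are two separate variables
char <- "A"
Char <- "B"            # hypothetical second variable
char                   # still "A"; assigning Char did not touch char
Char                   # "B"

# Character values are compared case-sensitively too
"A" == "a"             # FALSE
tolower("A") == "a"    # TRUE
```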
We can use the function class() to check what type a variable is:
class(char)
## [1] "character"
class(num)
## [1] "numeric"
class(logi)
## [1] "logical"
We can also use the functions is.character(), is.numeric(), and is.logical() to check whether a variable is of a given type. What if we want to change the type of a variable? Then we can use the as.* functions, such as as.character() and as.numeric(). Note that converting a character string that is not a number (e.g. as.numeric("A")) returns NA with a warning, because R does not know how to turn such text into a number. In that case, we can convert the variable to a factor instead, which is one of my favorite features of R.
num.to.char <- as.character(num)
print(num.to.char)
## [1] "2"
class(num.to.char)
## [1] "character"
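The is.* checks and the as.* conversions described above can be sketched together as follows. `suppressWarnings()` is used only to hide the "NAs introduced by coercion" warning that `as.numeric("A")` emits.

```r
num <- 2

# is.* functions return TRUE or FALSE
is.numeric(num)        # TRUE
is.character(num)      # FALSE

# as.* functions convert between types
as.numeric("2")        # 2: text that looks like a number converts cleanly
as.logical("TRUE")     # TRUE

# Text that is not a number becomes NA (with a warning, hidden here)
suppressWarnings(as.numeric("A"))   # NA

# Converting to a factor works for any text
as.factor("A")
```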
Often a variable needs to hold more than one value. A vector stores several values of the same type in one variable, and it is created with the function c() (for concatenate).
alphabets <- c("A", "C", "H", "I", "N", "R", "T")
num1 <- c(1, 3, 8, 9, 14, 18, 20)
alpha_number <- c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)
num2 <- c(20, 10, 5, 5, 10, 15, 5)
Arithmetic on vectors of the same length works element by element, and many functions summarize or transform a whole vector at once.
num1 * num2
## [1] 20 30 40 45 140 270 100
sum(num1 * num2)
## [1] 645
range(num1) # min & max
## [1] 1 20
sqrt(num2)
## [1] 4.472136 3.162278 2.236068 2.236068 3.162278 3.872983 2.236068
sort(num2)
## [1] 5 5 5 10 10 15 20
prod(num2)
## [1] 3750000
length(num1 * num2)
## [1] 7
mean(num1 * num2) # == sum(num1 * num2) / length(num1 * num2)
## [1] 92.14286
var(num2) # == sum((num2 - mean(num2))^2 * 1 /(length(num2) - 1))
## [1] 33.33333
sd(num2) # == sqrt(sum((num2 - mean(num2))^2 / (length(num2) - 1))) # == sqrt(var(num2))
## [1] 5.773503
\[ Var(x) = s^2 = \frac{1}{n-1} \displaystyle \sum_{i = 1}^{n} (x_i - \bar{x})^2 \]
\[ SD(x) = s = \sqrt{Var(x)} = \sqrt{\frac{1}{n-1} \displaystyle \sum_{i = 1}^{n} (x_i - \bar{x})^2} \]
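The sample variance and standard deviation can be checked numerically. This sketch computes both for `num2` step by step, squaring each deviation from the mean and dividing by n - 1, and then compares the results with `var()` and `sd()`.

```r
num2 <- c(20, 10, 5, 5, 10, 15, 5)
n <- length(num2)

x_bar <- mean(num2)                      # sample mean, 10
s2 <- sum((num2 - x_bar)^2) / (n - 1)    # sample variance, 200 / 6
s  <- sqrt(s2)                           # sample standard deviation

all.equal(s2, var(num2))   # TRUE
all.equal(s, sd(num2))     # TRUE
```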
In R, missing values (missing information) are represented by NA. Blank strings ("") are not missing: R treats them as ordinary data. NULL is not a missing value either; it represents the absence of an object, so it simply disappears when placed inside c(). We may also see a second kind of missing value, NaN ("Not a Number"), which appears when a mathematical operation has no defined result, such as 0/0.
miv <- c("", "", NA, NA, NULL, "B", "B", "E")
table(miv, useNA = "always") # counts each value, including NA; the first column (blank header) counts the empty strings "", and NULL is not counted because c() drops it
## miv
##         B    E <NA>
##    2    2    1    2
sum(is.na(miv)) # count the missing values in the vector
## [1] 2
0/0 # NaN
## [1] NaN
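A short sketch of how NA, NaN, and NULL behave in everyday code. `na.rm = TRUE` is a standard argument of `mean()` (and of `sum()`, `sd()`, and similar functions) that drops missing values before computing.

```r
x <- c(1, NA, 3)
mean(x)                  # NA: one missing value makes the result missing
mean(x, na.rm = TRUE)    # 2: drop NA before computing
is.na(x)                 # FALSE TRUE FALSE

# NaN counts as missing for is.na(), but NA is not NaN
is.na(NaN)               # TRUE
is.nan(NA)               # FALSE

# NULL is dropped entirely when building a vector
length(c(1, NULL, 2))    # 2
```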
A matrix is like an Excel sheet with multiple rows and columns. It combines vectors of the same type, which can be numeric, character, or logical. The rows of a matrix are generally observations and the columns are variables. To create a matrix, use the function cbind(), rbind(), or matrix().
col1 <- c(1:5)
col2 <- c(2:6)
col3 <- c(3:7)
mat <- cbind(col1, col2, col3)
mat
## col1 col2 col3
## [1,] 1 2 3
## [2,] 2 3 4
## [3,] 3 4 5
## [4,] 4 5 6
## [5,] 5 6 7
rownames(mat) <- paste0("row", c(1:5)) # set the row names
mat
## col1 col2 col3
## row1 1 2 3
## row2 2 3 4
## row3 3 4 5
## row4 4 5 6
## row5 5 6 7
t(mat) # transpose the data
## row1 row2 row3 row4 row5
## col1 1 2 3 4 5
## col2 2 3 4 5 6
## col3 3 4 5 6 7
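rbind() is mentioned above but not shown, so here is a minimal sketch. It also illustrates the "same type" rule: combining numbers with text (`letters` is R's built-in vector of lowercase letters) turns every element of the matrix into character.

```r
col1 <- 1:5
col2 <- 2:6

# rbind() stacks vectors as rows instead of columns
mat_r <- rbind(col1, col2)
dim(mat_r)               # 2 5

# rbind() of the same vectors equals the transpose of cbind()
identical(mat_r, t(cbind(col1, col2)))   # TRUE

# A matrix holds a single type: mixing numbers and text
# converts every element to character
mixed <- cbind(col1, letters[1:5])
typeof(mixed)            # "character"
```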
mat2 <- matrix(
data = cbind(col1, col2, col3),
nrow = 5, ncol = 3, # fills by column by default; use byrow = TRUE to fill by rows
dimnames = list(paste0("row", c(1:5)),
paste0("C", c(1:3)))
)
mat2
## C1 C2 C3
## row1 1 2 3
## row2 2 3 4
## row3 3 4 5
## row4 4 5 6
## row5 5 6 7
class(mat2) # Is this really a matrix? (In R >= 4.0.0 this prints "matrix" "array")
## [1] "matrix"
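Because the fill direction of matrix() is easy to mix up, here is a tiny sketch comparing the default column-wise fill with `byrow = TRUE`.

```r
# matrix() fills column by column by default (byrow = FALSE)
by_col <- matrix(1:6, nrow = 2)
by_col
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# byrow = TRUE fills row by row instead
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
by_row
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
```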
To select a few rows or columns, use data_name[row_numbers, col_numbers]; leaving either index empty selects all rows or all columns.
mat2[1, ] # select row1 only
## C1 C2 C3
## 1 2 3
mat2[1:3, ] # select row number 1 to 3
## C1 C2 C3
## row1 1 2 3
## row2 2 3 4
## row3 3 4 5
mat2[c(1, 5), ] # select row number 1 and 5
## C1 C2 C3
## row1 1 2 3
## row5 5 6 7
mat2[3, 1] # select by index
## [1] 3
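Rows and columns can also be selected by their names. The sketch below rebuilds the same `mat2` so it runs on its own, and also shows `drop = FALSE`, a standard argument of `[` that keeps a single selected row as a one-row matrix instead of simplifying it to a plain vector.

```r
mat2 <- matrix(c(1:5, 2:6, 3:7), nrow = 5,
               dimnames = list(paste0("row", 1:5), paste0("C", 1:3)))

# Select by name instead of by position
mat2["row3", "C1"]           # 3, the same element as mat2[3, 1]
mat2[c("row1", "row5"), ]    # same as mat2[c(1, 5), ]

# A single row is simplified to a vector by default;
# drop = FALSE keeps the matrix shape
is.matrix(mat2[1, ])                 # FALSE
is.matrix(mat2[1, , drop = FALSE])   # TRUE
```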
To exclude a few rows or columns, use negative indexing:
mat2[, -3] # exclude column 3 (C3)
## C1 C2
## row1 1 2
## row2 2 3
## row3 3 4
## row4 4 5
## row5 5 6
mat2[-1, -3] # exclude row1 and column 3
## C1 C2
## row2 2 3
## row3 3 4
## row4 4 5
## row5 5 6
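Negative indexing only works with numbers: something like `mat2[, -"C3"]` fails with an "invalid argument to unary operator" error. A minimal sketch of two ways to drop a column by name, rebuilding the same `mat2` so the block runs on its own:

```r
mat2 <- matrix(c(1:5, 2:6, 3:7), nrow = 5,
               dimnames = list(paste0("row", 1:5), paste0("C", 1:3)))

# Keep every column whose name is not in the exclusion list
keep <- !(colnames(mat2) %in% c("C3"))
mat2[, keep]        # same result as mat2[, -3]

# Or look up the position with which(), then index negatively
mat2[, -which(colnames(mat2) == "C3")]
```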
mat2 * 3 # multiply each element of the matrix by 3
## C1 C2 C3
## row1 3 6 9
## row2 6 9 12
## row3 9 12 15
## row4 12 15 18
## row5 15 18 21
sqrt(mat2)
## C1 C2 C3
## row1 1.000000 1.414214 1.732051
## row2 1.414214 1.732051 2.000000
## row3 1.732051 2.000000 2.236068
## row4 2.000000 2.236068 2.449490
## row5 2.236068 2.449490 2.645751
colSums(mat2) # Total of each col
## C1 C2 C3
## 15 20 25
rowSums(mat2) # Total of each row
## row1 row2 row3 row4 row5
## 6 9 12 15 18
colMeans(mat2) # mean of each column
## C1 C2 C3
## 3 4 5
# Alternatively, use the function apply()
# apply(X, MARGIN, FUN)
# X : matrix
# MARGIN : 1 applies FUN to rows, 2 applies it to columns
# FUN : the function to apply on rows/cols
apply(mat2, 1, mean) # = rowMeans(mat2)
## row1 row2 row3 row4 row5
## 2 3 4 5 6
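A short sketch of apply() with `MARGIN = 2` and with a custom anonymous function; the row range (max minus min) is just an illustrative choice of function. `mat2` is rebuilt so the block runs on its own.

```r
mat2 <- matrix(c(1:5, 2:6, 3:7), nrow = 5,
               dimnames = list(paste0("row", 1:5), paste0("C", 1:3)))

# MARGIN = 2 applies the function to each column
apply(mat2, 2, mean)          # same values as colMeans(mat2)

# Any function works, including an anonymous one;
# here: the range (max - min) of each row
apply(mat2, 1, function(r) max(r) - min(r))
```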
Factor variables represent categories or groups in the data. I think factors are one of the handiest features of R because they make it easy to count the observations in each group or category. Note that R orders factor levels alphabetically by default, so if you want a different order, specify the levels argument.
fac <- as.factor(alphabets) # convert character to factor
levels(fac)
## [1] "A" "C" "H" "I" "N" "R" "T"
levels(fac) <- c("C", "A", "T", "H", "R", "I", "N") # caution: this renames the levels, so the values change too
fac
## [1] C A T H R I N
## Levels: C A T H R I N
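Note that assigning to levels() as above renames the levels, so the values themselves change (A became C, C became A, H became T, and so on). To reorder the levels while keeping the values, pass the desired order to the levels argument of factor():
fac <- factor(alphabets, # convert to factor with a custom level order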
levels = c("C", "A", "T", "H", "R", "I", "N"))
fac
## [1] A C H I N R T
## Levels: C A T H R I N
summary(fac) # count the observations in each level
## C A T H R I N
## 1 1 1 1 1 1 1
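Counting is where factors shine, but `alphabets` has only one observation per level. This sketch uses a hypothetical `size` variable with repeated categories to show the default alphabetical order, reordering with `factor(levels = )`, counting with `table()`, and the renaming effect of assigning to `levels()`.

```r
# Hypothetical grouping variable with repeated categories
size <- c("small", "large", "medium", "small", "small", "large")

# Default: levels sorted alphabetically
levels(factor(size))            # "large" "medium" "small"

# factor(..., levels = ) reorders the levels and keeps every value intact
size_fac <- factor(size, levels = c("small", "medium", "large"))
table(size_fac)                 # small 3, medium 1, large 2
all(as.character(size_fac) == size)   # TRUE

# Assigning to levels() renames instead: "small" -> "S", and so on
renamed <- size_fac
levels(renamed) <- c("S", "M", "L")
as.character(renamed)           # "S" "L" "M" "S" "S" "L"
```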
A data frame is like a matrix, but different columns can hold different types (numeric, character, or logical). Rows are observations and columns are variables.
# Create a data frame
df <- data.frame(alphabets, alpha_number, num1, num2, miv)
df
class(df)
## [1] "data.frame"
dim(df) # can check the number of rows and cols
## [1] 7 5
Just like a matrix, a data frame can be subset by rows and columns. To select certain columns (or rows), refer to them by name or by number.
# Access a single column by name with the dollar sign
df$alpha_number
## [1] TRUE FALSE TRUE TRUE TRUE FALSE FALSE
# or by its column number
df[ ,2] # == df[ ,"alpha_number"]
## [1] TRUE FALSE TRUE TRUE TRUE FALSE FALSE
# subset columns
df[ ,c(1, 3)]
df[df$alpha_number == TRUE, ] # select the rows that meet the condition
df[c(1, 3, 5, 7), c("alphabets", "miv")] # select rows 1, 3, 5, 7 and the columns alphabets and miv
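A self-contained sketch of the subsetting ideas above, rebuilding `df` from the earlier vectors (leaving out `miv` for brevity). It assumes R >= 4.0.0, where `data.frame()` keeps text as character rather than converting it to factor. `subset()` is a base R function that combines a row condition with a column selection.

```r
df <- data.frame(
  alphabets    = c("A", "C", "H", "I", "N", "R", "T"),
  alpha_number = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE),
  num1         = c(1, 3, 8, 9, 14, 18, 20),
  num2         = c(20, 10, 5, 5, 10, 15, 5)
)

# Each column keeps its own type
sapply(df, class)

# A logical column can index rows directly; == TRUE is not needed
df[df$alpha_number, "alphabets"]      # "A" "H" "I" "N"

# subset() combines a row condition with a column selection
subset(df, num1 > 10, select = c(alphabets, num1))
```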
A matrix can hold only one type of value, and a data frame can mix logical, character, and numeric columns as long as every column has the same number of rows. A list, however, can hold components of different lengths and of any kind of R object, which is its main feature. (The related function alist() builds a list of unevaluated arguments and is mainly used to describe a function's formal arguments.)
list_test <- list(
a1 = "catharina",
a2 = c("catharina", "jisoo", "park"),
a3 = c(1:10),
a4 = c(TRUE, TRUE, FALSE, FALSE, TRUE)
)
list_test
## $a1
## [1] "catharina"
##
## $a2
## [1] "catharina" "jisoo" "park"
##
## $a3
## [1] 1 2 3 4 5 6 7 8 9 10
##
## $a4
## [1] TRUE TRUE FALSE FALSE TRUE
length(list_test) # 4
## [1] 4
list_test contains 4 components, which can be referred to individually as list_test[[1]], list_test[[2]], and so on.
To extract an element from a list, use its name or its index.
# select a component by name using the $ sign
list_test$a1
## [1] "catharina"
list_test[["a1"]]
## [1] "catharina"
# select by index
list_test[[1]]
## [1] "catharina"
# select a specific element of a component
list_test[[3]][5]
## [1] 5
list_test[[4]][2]
## [1] TRUE
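One subtlety when extracting from a list: single brackets `[ ]` return a smaller list, while double brackets `[[ ]]` (and `$`) return the component itself. A minimal sketch, rebuilding `list_test` so the block runs on its own:

```r
list_test <- list(
  a1 = "catharina",
  a2 = c("catharina", "jisoo", "park"),
  a3 = 1:10,
  a4 = c(TRUE, TRUE, FALSE, FALSE, TRUE)
)

# Single brackets return a list
class(list_test[1])     # "list"

# Double brackets return the component itself
class(list_test[[1]])   # "character"

# Single brackets can keep several components at once
names(list_test[c("a1", "a3")])   # "a1" "a3"
```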
Two or more lists can be concatenated with c(). For example, list_abc <- c(list_a, list_b, list_c) returns a single list whose components are those of the argument lists, joined together in sequence.
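A small sketch of concatenating lists; `list_a`, `list_b`, and `list_c` are hypothetical lists made up for this illustration. It also contrasts `c()`, which joins the components into one flat list, with `list()`, which nests the lists inside a new one.

```r
# Hypothetical lists for illustration
list_a <- list(x = 1)
list_b <- list(y = "two")
list_c <- list(z = c(TRUE, FALSE))

# c() joins the components into one flat list
list_abc <- c(list_a, list_b, list_c)
class(list_abc)          # "list"
length(list_abc)         # 3
names(list_abc)          # "x" "y" "z"
class(list_abc[[1]])     # "numeric": the first component itself

# list() nests the lists instead of joining them
list_nested <- list(list_a, list_b, list_c)
class(list_nested[[1]])  # "list": each element is still a whole list
```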