This is my first time using R Markdown.

A few tips for R Markdown

Basic arithmetic operations

R can be used as a calculator. The basic arithmetic operators are:

# addition
29 + 92
## [1] 121
# subtraction
92 - 29
## [1] 63
# multiplication
29 * 92
## [1] 2668
# division
2018 / 4
## [1] 504.5
# exponentiation
4^2
## [1] 16
log(4, base = 2) # logarithm, the inverse of exponentiation: 2^2 = 4
## [1] 2
# modulo: returns the remainder of the division
10 %% 3 # returns 1, because 10 = 3 * 3 + 1
## [1] 1
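The modulo operator pairs naturally with integer division, `%/%`, another base R operator. The sketch below, using arbitrary example numbers, shows that the quotient and remainder together rebuild the original number, and what happens with a negative number:

```r
a <- 10
b <- 3
q <- a %/% b     # integer division (quotient rounded down): 3
m <- a %% b      # modulo (remainder): 1
a == b * q + m   # TRUE: quotient and remainder rebuild the original number
(-10) %/% 3      # -4: %/% rounds toward minus infinity
(-10) %% 3       # 2: the remainder takes the sign of the divisor
```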

Basic mathematical functions

1. Logarithms and Exponentials
log10(100) # base-10 logarithm of 100
## [1] 2
exp(2) # exponential of 2
## [1] 7.389056
2. Other mathematical functions

\[ |-2| = 2 \]

abs(-2) # absolute value of -2
## [1] 2

\[ \sqrt{16} = \sqrt{4^2} = 4 \]

sqrt(16) # square root of 16 = sqrt(4^2) = 4
## [1] 4
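A few identities tie these functions together. The following sketch checks them with arbitrary example values; `log2()` is an extra base R shortcut not used above:

```r
log(exp(2))          # 2: log() (natural log by default) undoes exp()
log(100, base = 10)  # 2: same result as log10(100)
log2(8)              # 3: shortcut for log(8, base = 2)
16^0.5               # 4: sqrt(x) is the same as x^0.5
abs(c(-2, 0, 2))     # 2 0 2: abs() works element by element
```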

Assign values to variables

A variable stores a value. The value can be character, numeric, or logical; these are three of R's basic data types. Let’s assign a single value to each variable with the assignment operator <-. Note that R is case-sensitive, so char and Char are different variables. The assignments look like this:

# Character object
char <- "A"   
# Numeric object
num <- 2      
# Logical object (TRUE/FALSE, i.e. yes/no)
logi <- FALSE 
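Because R is case-sensitive, a name that differs only in capitalization refers to a separate variable. A minimal sketch (the variable `Char` is a hypothetical extra used only for illustration):

```r
char <- "A"
Char <- "B"            # a different variable: R distinguishes upper and lower case
char                   # "A"
Char                   # "B"
identical(char, Char)  # FALSE
```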

We can use the function class() to check what type a variable is:

class(char)
## [1] "character"
class(num)
## [1] "numeric"
class(logi)
## [1] "logical"

We can also use the functions is.character(), is.numeric(), and is.logical() to check whether a variable has a given data type. To change the type of a variable, use the as.* functions, such as as.character() and as.numeric(). Note that converting a character string that does not look like a number (for example "A") to numeric returns NA with a warning, because R does not know how to turn such text into a number. In that case, we can convert the variable to a factor instead, which is one of my favorite features of R.

num.to.char <- as.character(num)
print(num.to.char)
## [1] "2"
class(num.to.char)
## [1] "character"
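The sketch below tries the is.* and as.* functions on example values, including a conversion that fails. The names `x`, `y`, and `grades` are illustrative only. `as.numeric("A")` normally prints the warning "NAs introduced by coercion"; `suppressWarnings()` is used here just to keep the output quiet:

```r
x <- "2"
is.character(x)                         # TRUE
is.numeric(x)                           # FALSE
as.numeric(x)                           # 2: the text looks like a number, so conversion works
y <- suppressWarnings(as.numeric("A"))  # "A" is not a number, so the result is NA
is.na(y)                                # TRUE
grades <- as.factor(c("A", "B", "A"))   # store the letters as categories instead
levels(grades)                          # "A" "B"
as.integer(grades)                      # 1 2 1: the integer code behind each category
```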

Create a vector

Often a variable needs to hold more than one value, and a vector makes this possible. A vector is created with the function c() (for concatenate), and all of its elements must have the same type.

alphabets <- c("A", "C", "H", "I", "N", "R", "T")
num1 <- c(1, 3, 8, 9, 14, 18, 20)
alpha_number <- c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)
num2 <- c(20, 10, 5, 5, 10, 15, 5)
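Since a vector holds a single type, `c()` coerces mixed inputs to a common type. A quick sketch with arbitrary values:

```r
mixed <- c(1, "A", TRUE)
mixed          # "1" "A" "TRUE": every element was coerced to character
class(mixed)   # "character"
flags <- c(1, TRUE, FALSE)
flags          # 1 1 0: with no text involved, TRUE becomes 1 and FALSE becomes 0
class(flags)   # "numeric"
```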

Arithmetic on vectors of the same length is carried out element by element, and many functions take a whole vector as input:

num1 * num2
## [1]  20  30  40  45 140 270 100
sum(num1 * num2)
## [1] 645
range(num1) # min & max
## [1]  1 20
sqrt(num2) 
## [1] 4.472136 3.162278 2.236068 2.236068 3.162278 3.872983 2.236068
sort(num2)
## [1]  5  5  5 10 10 15 20
prod(num2)
## [1] 3750000
length(num1 * num2)
## [1] 7
mean(num1 * num2) # == sum(num1 * num2) / length(num1 * num2)
## [1] 92.14286
var(num2) # == sum((num2 - mean(num2))^2) / (length(num2) - 1)
## [1] 33.33333
sd(num2) # == sqrt(sum((num2 - mean(num2))^2 / (length(num2) - 1))) # == sqrt(var(num2))
## [1] 5.773503

\[ Var(x) = s^2 = \frac{1}{n-1} \displaystyle \sum_{i = 1}^{n} (x_i - \bar{x})^2 \] \[ SD(x) = s = \sqrt {Var(x)} = \sqrt {\frac{1}{n-1} \displaystyle \sum_{i = 1}^{n} (x_i - \bar{x})^2} \]

where \(\bar{x}\) is the mean of x and n is its length.
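The formulas can be checked step by step on num2, reproducing the var() and sd() results shown above (the intermediate names `n`, `x_bar`, `sq_dev`, `s2`, and `s` are just for illustration):

```r
num2 <- c(20, 10, 5, 5, 10, 15, 5)
n <- length(num2)            # 7
x_bar <- mean(num2)          # 10
sq_dev <- (num2 - x_bar)^2   # 100 0 25 25 0 25 25
s2 <- sum(sq_dev) / (n - 1)  # 200 / 6 = 33.33333
s <- sqrt(s2)                # 5.773503
all.equal(s2, var(num2))     # TRUE
all.equal(s, sd(num2))       # TRUE
```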

What are missing values in R?

In R, missing values (missing information) are represented by NA. Blanks in the data, such as empty strings (""), are not missing: R treats them as ordinary values. NULL is not treated as a missing value either; it represents the absence of an object, so c() simply drops it. A second kind of missing value is NaN (“Not a Number”), which appears when a mathematical operation is undefined, for example 0/0 = NaN.

miv <- c("", "", NA, NA, NULL, "B", "B", "E")
table(miv, useNA = "always") # count each value, including NA. NULL was already dropped by c(), so it is not counted
## miv
##         B    E <NA> 
##    2    2    1    2
sum(is.na(miv)) # count the missing values in the vector
## [1] 2
0/0 # NaN
## [1] NaN
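A short sketch of how NA, NaN, and NULL behave in practice. The vectors `v` and `w` are made-up examples, and `na.rm` is the standard argument of mean() (and of sum(), min(), max(), etc.) for dropping missing values:

```r
v <- c(1, NA, 3, 0/0)       # the last element is NaN
is.na(v)                    # FALSE TRUE FALSE TRUE: is.na() flags both NA and NaN
is.nan(v)                   # FALSE FALSE FALSE TRUE: is.nan() flags only NaN
w <- c(1, NA, 3)
mean(w)                     # NA: a missing value propagates through the calculation
mean(w, na.rm = TRUE)       # 2: na.rm = TRUE drops NA before computing
length(c("a", NULL, "b"))   # 2: NULL disappears inside c()
```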

Matrices

A matrix is like an Excel sheet with multiple rows and columns. It combines vectors of the same type, which can be numeric, character, or logical. The rows of a matrix are generally observations and the columns are variables. To create a matrix, use the function cbind(), rbind(), or matrix().

col1 <- c(1:5)
col2 <- c(2:6)
col3 <- c(3:7)

mat <- cbind(col1, col2, col3)
mat
##      col1 col2 col3
## [1,]    1    2    3
## [2,]    2    3    4
## [3,]    3    4    5
## [4,]    4    5    6
## [5,]    5    6    7
rownames(mat) <- paste0("row", c(1:5)) # set the row names
mat
##      col1 col2 col3
## row1    1    2    3
## row2    2    3    4
## row3    3    4    5
## row4    4    5    6
## row5    5    6    7
t(mat) # transpose the data
##      row1 row2 row3 row4 row5
## col1    1    2    3    4    5
## col2    2    3    4    5    6
## col3    3    4    5    6    7
mat2 <- matrix(
  data = cbind(col1, col2, col3),
  nrow = 5, ncol = 3, # filled column by column by default; use byrow = TRUE to fill by rows
  dimnames = list(paste0("row", c(1:5)),
                  paste0("C", c(1:3)))
)
mat2
##      C1 C2 C3
## row1  1  2  3
## row2  2  3  4
## row3  3  4  5
## row4  4  5  6
## row5  5  6  7
class(mat2) # Is this really a matrix? (In R >= 4.0.0 this prints "matrix" "array")
## [1] "matrix"
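The `byrow` argument and `rbind()` can be compared on a small example (`by_col`, `by_row`, and `stacked` are illustrative names):

```r
by_col <- matrix(1:6, nrow = 2)                # default byrow = FALSE: filled column by column
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)  # byrow = TRUE: filled row by row
by_col[1, ]                 # 1 3 5
by_row[1, ]                 # 1 2 3
stacked <- rbind(1:5, 2:6)  # rbind() stacks vectors as rows
dim(stacked)                # 2 5
```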

1. Subsetting a matrix

To select a few rows or columns, use data_name[row_numbers, col_numbers]. Leaving an index empty selects all rows (or all columns).

mat2[1, ] # select row1 only
## C1 C2 C3 
##  1  2  3
mat2[1:3, ] # select row number 1 to 3
##      C1 C2 C3
## row1  1  2  3
## row2  2  3  4
## row3  3  4  5
mat2[c(1, 5), ] # select row number 1 and 5
##      C1 C2 C3
## row1  1  2  3
## row5  5  6  7
mat2[3, 1] # select the element in row 3, column 1
## [1] 3

To exclude rows or columns, use negative indexing:

mat2[, -3] # exclude column 3 (C3)
##      C1 C2
## row1  1  2
## row2  2  3
## row3  3  4
## row4  4  5
## row5  5  6
mat2[-1, -3] # exclude row1 and column 3
##      C1 C2
## row2  2  3
## row3  3  4
## row4  4  5
## row5  5  6
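Besides positions, rows and columns can be selected by name or with a logical condition. One detail worth knowing: selecting a single row drops the result to a plain vector unless `drop = FALSE` is given. This sketch rebuilds mat2 so it runs on its own:

```r
mat2 <- matrix(cbind(1:5, 2:6, 3:7), nrow = 5, ncol = 3,
               dimnames = list(paste0("row", 1:5), paste0("C", 1:3)))
mat2["row2", "C3"]        # 4: select by row and column names
mat2[1, ]                 # a single row becomes a named vector
mat2[1, , drop = FALSE]   # drop = FALSE keeps a 1 x 3 matrix
mat2[mat2[, "C1"] > 3, ]  # logical indexing: rows where C1 is greater than 3 (row4, row5)
```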

2. Calculations with matrices

mat2 * 3 # multiply each element of the matrix by 3
##      C1 C2 C3
## row1  3  6  9
## row2  6  9 12
## row3  9 12 15
## row4 12 15 18
## row5 15 18 21
sqrt(mat2)
##            C1       C2       C3
## row1 1.000000 1.414214 1.732051
## row2 1.414214 1.732051 2.000000
## row3 1.732051 2.000000 2.236068
## row4 2.000000 2.236068 2.449490
## row5 2.236068 2.449490 2.645751
colSums(mat2) # Total of each col
## C1 C2 C3 
## 15 20 25
rowSums(mat2) # Total of each row
## row1 row2 row3 row4 row5 
##    6    9   12   15   18
colMeans(mat2) # mean of each column
## C1 C2 C3 
##  3  4  5
# Alternatively, use the function apply()
# apply(X, MARGIN, FUN)
# X : a matrix
# MARGIN : 1 (apply over rows) or 2 (apply over columns)
# FUN : the function to apply to each row or column
apply(mat2, 1, mean) #  = rowMeans(mat2)
## row1 row2 row3 row4 row5 
##    2    3    4    5    6
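apply() also accepts a custom function. The sketch below, again rebuilding mat2, checks apply() against colMeans() and rowSums() and computes the spread (max minus min) of each column with an anonymous function:

```r
mat2 <- matrix(cbind(1:5, 2:6, 3:7), nrow = 5, ncol = 3,
               dimnames = list(paste0("row", 1:5), paste0("C", 1:3)))
all.equal(apply(mat2, 2, mean), colMeans(mat2))  # TRUE: MARGIN = 2 works column by column
all.equal(apply(mat2, 1, sum), rowSums(mat2))    # TRUE: MARGIN = 1 works row by row
apply(mat2, 2, function(x) max(x) - min(x))      # 4 4 4, named C1, C2, C3
```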

Factors

Factor variables represent categories or groups in the data. I think factors are one of the handiest features of R because they make it easy to count the observations in each group or category. Note that, by default, R orders factor levels alphabetically. To use a different order, specify the levels argument of factor().

fac <- as.factor(alphabets) # convert character to factor
levels(fac)
## [1] "A" "C" "H" "I" "N" "R" "T"
levels(fac) <- c("C", "A", "T", "H", "R", "I", "N") # relabels the levels (A becomes C, C becomes A, ...), so the values change too
fac
## [1] C A T H R I N
## Levels: C A T H R I N

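Assigning to levels() renames the levels rather than reordering them, which is why the values above changed. To keep the values and only change the order of the levels, pass the desired order to the levels argument of factor():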

fac <- factor(alphabets, # convert to factor and set the order of the levels
              levels = c("C", "A", "T", "H", "R", "I", "N"))
fac
## [1] A C H I N R T
## Levels: C A T H R I N
summary(fac) # count the observations in each level
## C A T H R I N 
## 1 1 1 1 1 1 1
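Relabeling levels with `levels<-` and reordering them with `factor(..., levels = )` are easy to confuse, so here they are side by side on a small made-up vector `x`. The last line shows the integer codes a factor stores under the hood:

```r
x <- c("b", "a", "c", "a")
f <- factor(x)                                     # levels sorted alphabetically: a b c
reordered <- factor(x, levels = c("c", "b", "a"))  # same values, new level order
relabeled <- f
levels(relabeled) <- c("x", "y", "z")              # renames a -> x, b -> y, c -> z
as.character(reordered)  # "b" "a" "c" "a": values unchanged
as.character(relabeled)  # "y" "x" "z" "x": values renamed
table(reordered)         # counts per level in the chosen order: c 1, b 1, a 2
as.integer(f)            # 2 1 3 1: the underlying integer codes
```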

Data frames

A data frame is like a matrix, but its columns can have different types: numeric, character, or logical. Rows are observations and columns are variables.

# Create a data frame
df <- data.frame(alphabets, alpha_number, num1, num2, miv)
df
class(df)
## [1] "data.frame"
dim(df) # can check the number of rows and cols
## [1] 7 5
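To see that each column of a data frame keeps its own type, str() and sapply() are handy. This sketch uses a smaller, hypothetical df. `stringsAsFactors = FALSE` is set explicitly because R versions before 4.0.0 converted character columns to factors by default:

```r
df <- data.frame(
  alphabets = c("A", "C", "H"),
  num1 = c(1, 3, 8),
  alpha_number = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)
str(df)            # one line per column, each with its own type
sapply(df, class)  # "character" "numeric" "logical"
nrow(df)           # 3 (dim(df) returns rows and columns together)
ncol(df)           # 3
```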

Subsetting a data frame

Just as with a matrix, we can select specific columns or rows of a data frame, referring to them either by name or by number.

# Access a single column by name with the dollar sign ($)
df$alpha_number
## [1]  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE
# or by column number
df[ ,2] # == df[ ,"alpha_number"]
## [1]  TRUE FALSE  TRUE  TRUE  TRUE FALSE FALSE
# subset columns
df[ ,c(1, 3)]
df[df$alpha_number == TRUE, ] # select the rows that meet the condition 
df[c(1, 3, 5, 7), c("alphabets", "miv")] # select multiple rows and cols that meet the condition
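A few more subsetting idioms: `[[ ]]`, logical columns used directly as row indices, combined conditions with `&`, and base R's subset(). The small df below is a hypothetical example:

```r
df <- data.frame(
  alphabets = c("A", "C", "H", "I"),
  alpha_number = c(TRUE, FALSE, TRUE, TRUE),
  num1 = c(1, 3, 8, 9),
  stringsAsFactors = FALSE
)
df[["num1"]]                                       # same as df$num1: 1 3 8 9
df[df$alpha_number, ]                              # a logical column can index rows directly
df[df$num1 > 2 & df$alpha_number, "alphabets"]     # both conditions must hold: "H" "I"
subset(df, num1 > 2, select = c(alphabets, num1))  # base R subset(): rows C, H, I
```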

Lists

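A matrix can hold only one type of data (all numeric, all character, or all logical). A data frame can mix logical, character, and numeric columns, but every column must have the same number of rows. A list has neither restriction: its components can have different lengths and can be any kind of R object, including vectors, data frames, functions, and other lists. This flexibility is the main feature of a list. The related function alist() builds a list whose arguments are left unevaluated; it is mainly used with formals() to set the arguments of a function.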

list_test <- list(
  a1 = "catharina",
  a2 = c("catharina", "jisoo", "park"),
  a3 = c(1:10),
  a4 = c(TRUE, TRUE, FALSE, FALSE, TRUE)
)
list_test
## $a1
## [1] "catharina"
## 
## $a2
## [1] "catharina" "jisoo"     "park"     
## 
## $a3
##  [1]  1  2  3  4  5  6  7  8  9 10
## 
## $a4
## [1]  TRUE  TRUE FALSE FALSE  TRUE
length(list_test) # 4
## [1] 4
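As an illustration of lists holding different kinds of objects, the sketch below stores a vector, a function, a data frame, and a nested list in one list (`mixed_list` and `make_add` are hypothetical names). The last part shows alist() used with formals() to define a function's arguments:

```r
mixed_list <- list(
  numbers = 1:3,                 # an integer vector
  square = function(x) x^2,      # a function stored as a component
  frame = data.frame(id = 1:2),  # a data frame
  nested = list(inner = "yes")   # a list inside a list
)
sapply(mixed_list, class)  # "integer" "function" "data.frame" "list"
mixed_list$square(4)       # 16: call the stored function
mixed_list$nested$inner    # "yes"

# alist() keeps its arguments unevaluated and allows an argument with no value,
# which is what formals<- needs to define function arguments
make_add <- function() NULL
formals(make_add) <- alist(x = , y = 2)  # x has no default, y defaults to 2
body(make_add) <- quote(x + y)
make_add(1)                # 3
```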

list_test contains 4 components, which can be referred to individually as list_test[[1]], list_test[[2]], and so on.

To extract a component from a list, use its name or its index.

# select a component by name using the $ sign
list_test$a1
## [1] "catharina"
list_test[["a1"]]
## [1] "catharina"
# select by index
list_test[[1]]
## [1] "catharina"
# select a specific element of a component
list_test[[3]][5]
## [1] 5
list_test[[4]][2]
## [1] TRUE
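Single and double brackets do different things on a list: `[ ]` returns a sub-list, while `[[ ]]` returns the component itself. A sketch on a shortened copy of list_test, which also adds a hypothetical component `a5` by assignment:

```r
list_test <- list(a1 = "catharina", a3 = 1:10)
class(list_test[1])    # "list": single brackets return a smaller list
class(list_test[[1]])  # "character": double brackets return the component itself
list_test$a3[5]        # 5: $ reaches the component, [ ] an element inside it
list_test$a5 <- TRUE   # assigning to a new name adds a component
names(list_test)       # "a1" "a3" "a5"
```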

Two or more lists can be concatenated with c(). For example, list_abc <- c(list_a, list_b, list_c) returns a single list whose components are those of the argument lists joined together in sequence.
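A minimal sketch of that concatenation, with made-up contents for list_a, list_b, and list_c:

```r
list_a <- list(x = 1)
list_b <- list(y = "b")
list_c <- list(z = TRUE, w = 1:3)
list_abc <- c(list_a, list_b, list_c)
class(list_abc)   # "list"
length(list_abc)  # 4: the components of all three lists, in order
names(list_abc)   # "x" "y" "z" "w"
```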

References