You can create a vector in the following ways:
c(1,2,3)
## [1] 1 2 3
c(1:5)
## [1] 1 2 3 4 5
seq(from=0, to=1, by=0.1)
## [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
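As a small complement to seq() with by, the sketch below uses the length.out argument to fix the number of points instead of the step size, and seq_len() to count from 1. The variable names are chosen only for illustration.

```r
# Five evenly spaced points between 0 and 1 (step size is worked out by R).
pts <- seq(from = 0, to = 1, length.out = 5)
pts
## [1] 0.00 0.25 0.50 0.75 1.00

# seq_len(n) gives the integers 1..n.
idx <- seq_len(5)
idx
## [1] 1 2 3 4 5
```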
Operations on vectors are performed element-wise:
c(1,2,3) + c(1,2,3)
## [1] 2 4 6
c(1,2,3)+2
## [1] 3 4 5
c(1,2,3)*c(1,2,3)
## [1] 1 4 9
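Adding a single number to a vector, as above, is a special case of recycling: when the two operands differ in length, R repeats the shorter one to match the longer one. A minimal sketch with illustrative values:

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20)

# y is recycled, so this is the same as c(1, 2, 3, 4) + c(10, 20, 10, 20).
z <- x + y
z
## [1] 11 22 13 24
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.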
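The functions length(), sort(), rev(), rank(), head(x,2) and tail(x,2) return, respectively, the number of elements, the sorted vector, the reversed vector, the rank of each element, and the first or last two elements of x.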
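A quick sketch of these functions on one small vector (the values are made up for illustration):

```r
x <- c(30, 10, 20, 10)

length(x)    # 4
sort(x)      # 10 10 20 30
rev(x)       # 10 20 10 30
rank(x)      # 4.0 1.5 3.0 1.5 (tied values share the average rank)
head(x, 2)   # 30 10
tail(x, 2)   # 20 10
```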
To access/alter a specific element in a vector:
a <- c(1,2,3)
a[2] <- 10
a
## [1] 1 10 3
b <- c(1,2,3)
b[b<3] # logical mask: keeps the elements where the condition is TRUE
## [1] 1 2
Other useful extraction functions include which.max(x), which.min(x) and which(condition), which return the index positions of the relevant elements. These indices can then be used to subset the vector.
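The following sketch shows the three functions on an illustrative vector, and then uses the result of which() for indexing:

```r
x <- c(4, 9, 2, 7)

which.max(x)   # position of the largest element
## [1] 2
which.min(x)   # position of the smallest element
## [1] 3

idx <- which(x > 3)   # positions where the condition holds
idx
## [1] 1 2 4
x[idx]                # same result as x[x > 3]
## [1] 4 9 7
```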
A useful way to create repetitive vectors is using the rep function:
rep(1,6)
## [1] 1 1 1 1 1 1
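rep() also accepts a vector as its first argument. The sketch below contrasts its times and each arguments:

```r
# times repeats the whole vector.
r1 <- rep(c(1, 2), times = 3)
r1
## [1] 1 2 1 2 1 2

# each repeats every element before moving on to the next.
r2 <- rep(c(1, 2), each = 3)
r2
## [1] 1 1 1 2 2 2
```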
A list can hold elements of any data type, and elements can optionally be named:
(b <- list(TRUE, my.matrix=matrix(1:4,nrow=2),c(1+2i,3), "A character string"))
## [[1]]
## [1] TRUE
##
## $my.matrix
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
##
## [[3]]
## [1] 1+2i 3+0i
##
## [[4]]
## [1] "A character string"
To access an element of a list:
b[[1]]
## [1] TRUE
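Named elements can also be reached by name, and single brackets behave differently from double brackets. A short sketch that rebuilds the same list so it runs on its own:

```r
b <- list(TRUE, my.matrix = matrix(1:4, nrow = 2), c(1+2i, 3), "A character string")

# Double brackets or $ return the element itself.
m1 <- b[["my.matrix"]]
m2 <- b$my.matrix
is.matrix(m1)
## [1] TRUE

# Single brackets return a sub-list (here a list of length 1).
sub <- b[2]
is.list(sub)
## [1] TRUE

# Indexing can be chained: row 1, column 2 of the stored matrix.
b[["my.matrix"]][1, 2]
## [1] 3
```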
A matrix can be created as follows (byrow = TRUE fills it row by row; the default fills column by column):
(X <- matrix(1:12, nrow = 4, ncol = 3, byrow = TRUE))
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [4,] 10 11 12
Use rbind() and cbind() to add rows or columns to a matrix:
(Y <- cbind(X, c(5,5,5,5)))
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 5
## [2,] 4 5 6 5
## [3,] 7 8 9 5
## [4,] 10 11 12 5
Element-wise operations work as they do for vectors. Matrix multiplication uses the %*% operator, as in A %*% B.
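To make the difference concrete, the sketch below applies both operators to two small matrices. A and B are illustrative names, not objects defined earlier in these notes.

```r
A <- matrix(1:4, nrow = 2)   # rows: (1, 3) and (2, 4)
B <- matrix(5:8, nrow = 2)   # rows: (5, 7) and (6, 8)

# Element-wise product: each entry of A times the matching entry of B.
ew <- A * B      # rows: (5, 21) and (12, 32)

# Matrix product: rows of A combined with columns of B.
mm <- A %*% B    # rows: (23, 31) and (34, 46)
```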
The function apply() applies a function to every row (MARGIN=1) or every column (MARGIN=2) of a matrix:
apply(X, MARGIN=1, FUN=sum)
## [1] 6 15 24 33
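For comparison, here is the column-wise version on the same matrix, along with a different function applied per row. The built-in colSums() is shown as an equivalent for this particular case.

```r
X <- matrix(1:12, nrow = 4, ncol = 3, byrow = TRUE)

col_totals <- apply(X, MARGIN = 2, FUN = sum)   # one value per column
col_totals
## [1] 22 26 30

row_max <- apply(X, MARGIN = 1, FUN = max)      # one value per row
row_max
## [1]  3  6  9 12

# colSums() gives the same totals as apply(X, 2, sum).
colSums(X)
## [1] 22 26 30
```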
An array generalises a matrix to more than two dimensions:
(X <- array(1:6, dim=c(1,2,3)))
## , , 1
##
## [,1] [,2]
## [1,] 1 2
##
## , , 2
##
## [,1] [,2]
## [1,] 3 4
##
## , , 3
##
## [,1] [,2]
## [1,] 5 6
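Array elements are indexed with one position per dimension, in the same order as dim. A brief sketch on the same array:

```r
X <- array(1:6, dim = c(1, 2, 3))

dim(X)
## [1] 1 2 3

# Row 1, column 2, slice 3.
X[1, 2, 3]
## [1] 6

# Leaving indices blank keeps those dimensions; the result here is a 2 x 3 matrix.
slice <- X[1, , ]
dim(slice)
## [1] 2 3
```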
To create a dataframe:
(Z <- data.frame(GENDER=c("F","M","M","F"),
ID=c(123,234,345,456),
NAME=c("Mary","James","James","Olivia"),
Height=c(170,180,185,160)))
## GENDER ID NAME Height
## 1 F 123 Mary 170
## 2 M 234 James 180
## 3 M 345 James 185
## 4 F 456 Olivia 160
To index a dataframe we use the convention [rows, columns], leaving an index blank to select all rows or all columns. Vectors can be used as indices to pick out specific rows:
Z[1:2,]
## GENDER ID NAME Height
## 1 F 123 Mary 170
## 2 M 234 James 180
Z[c(1,3),]
## GENDER ID NAME Height
## 1 F 123 Mary 170
## 3 M 345 James 185
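The column position works the same way, and columns can be selected by name. The sketch below rebuilds Z so it runs on its own:

```r
Z <- data.frame(GENDER = c("F", "M", "M", "F"),
                ID = c(123, 234, 345, 456),
                NAME = c("Mary", "James", "James", "Olivia"),
                Height = c(170, 180, 185, 160))

# Rows 1 to 2, two columns chosen by name: the result is still a dataframe.
part <- Z[1:2, c("ID", "Height")]

# All rows of a single column: the result is a plain vector.
heights <- Z[, "Height"]
heights
## [1] 170 180 185 160
```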
To add a column to a dataframe, you can either cbind() a vector or dataframe, or assign it directly with the $ operator:
Z$Test <- c(1,2,3,4)
print(Z)
## GENDER ID NAME Height Test
## 1 F 123 Mary 170 1
## 2 M 234 James 180 2
## 3 M 345 James 185 3
## 4 F 456 Olivia 160 4
Observations can be subset using boolean masks built with the logical operators & (and), | (or) and ! (not), the comparison operators == and != (equal, not equal), and %in% (membership in a vector):
Z[Z$ID == 123,]
## GENDER ID NAME Height Test
## 1 F 123 Mary 170 1
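A sketch of each operator in turn, on a standalone copy of Z without the Test column. The thresholds and values are chosen only for illustration.

```r
Z <- data.frame(GENDER = c("F", "M", "M", "F"),
                ID = c(123, 234, 345, 456),
                NAME = c("Mary", "James", "James", "Olivia"),
                Height = c(170, 180, 185, 160))

and_rows <- Z[Z$GENDER == "M" & Z$Height > 180, ]   # both conditions: row 3
or_rows  <- Z[Z$GENDER == "F" | Z$Height > 180, ]   # either condition: rows 1, 3, 4
not_rows <- Z[!(Z$NAME == "James"), ]               # negation: rows 1, 4
in_rows  <- Z[Z$ID %in% c(123, 456), ]              # membership: rows 1, 4
```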
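To get the general structure of a dataframe (the output below comes from a version of R before 4.0, where data.frame() converted strings to factors by default; from R 4.0 onwards GENDER and NAME appear as chr unless stringsAsFactors = TRUE is passed):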
str(Z)
## 'data.frame': 4 obs. of 5 variables:
## $ GENDER: Factor w/ 2 levels "F","M": 1 2 2 1
## $ ID : num 123 234 345 456
## $ NAME : Factor w/ 3 levels "James","Mary",..: 2 1 1 3
## $ Height: num 170 180 185 160
## $ Test : num 1 2 3 4
To get summary statistics for each column of the dataframe:
print(summary(Z))
## GENDER ID NAME Height Test
## F:2 Min. :123.0 James :2 Min. :160.0 Min. :1.00
## M:2 1st Qu.:206.2 Mary :1 1st Qu.:167.5 1st Qu.:1.75
## Median :289.5 Olivia:1 Median :175.0 Median :2.50
## Mean :289.5 Mean :173.8 Mean :2.50
## 3rd Qu.:372.8 3rd Qu.:181.2 3rd Qu.:3.25
## Max. :456.0 Max. :185.0 Max. :4.00
A specific variable (column) can be accessed with the $ operator:
Z$GENDER
## [1] F M M F
## Levels: F M
A new dataframe can be built from extracted columns:
(new <- data.frame(Z$GENDER, Z$ID))
## Z.GENDER Z.ID
## 1 F 123
## 2 M 234
## 3 M 345
## 4 F 456
A column or another dataframe can be appended to a dataframe using cbind():
Extra <- c(1,2,3,4)
(cbind(Z,Extra))
## GENDER ID NAME Height Test Extra
## 1 F 123 Mary 170 1 1
## 2 M 234 James 180 2 2
## 3 M 345 James 185 3 3
## 4 F 456 Olivia 160 4 4
Dataframes can be merged by a shared variable, usually one that uniquely identifies each observation. By default merge() returns only the rows whose key appears in both dataframes. Adding the argument all.x=TRUE keeps every row of the first dataframe, filling the columns that have no match with NA.
Y <- data.frame(GENDER=c("M","F","F","M"),
ID=c(345,456,123,234),
NAME=c("James","Olivia","Mary","James"),
Weight=c(80,50,70,60))
merge(Z,Y, by=c("ID"))
## ID GENDER.x NAME.x Height Test GENDER.y NAME.y Weight
## 1 123 F Mary 170 1 F Mary 70
## 2 234 M James 180 2 M James 60
## 3 345 M James 185 3 M James 80
## 4 456 F Olivia 160 4 F Olivia 50
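In the example above every ID appears in both dataframes, so the all.x argument would make no difference. The sketch below uses two small hypothetical dataframes (left and right, not part of the example above) whose keys only partly overlap:

```r
left  <- data.frame(ID = c(1, 2, 3), Height = c(170, 180, 185))
right <- data.frame(ID = c(2, 3, 4), Weight = c(60, 80, 50))

# Default: only IDs present in both (2 and 3).
inner <- merge(left, right, by = "ID")

# all.x = TRUE: every row of left is kept; ID 1 gets Weight = NA.
left_all <- merge(left, right, by = "ID", all.x = TRUE)

# all = TRUE: every row of both is kept (IDs 1 to 4), with NA where unmatched.
full <- merge(left, right, by = "ID", all = TRUE)
```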
Factors store categorical data: each value is one of a fixed set of levels. The levels(x) function lists those levels:
x <- factor(c("blue","green","blue","red","blue","green","green"))
levels(x) # Shows the unique levels
## [1] "blue" "green" "red"
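A few other functions are commonly used with factors. The sketch below counts the observations per level and shows the integer codes R stores internally:

```r
x <- factor(c("blue", "green", "blue", "red", "blue", "green", "green"))

nlevels(x)             # number of distinct levels
## [1] 3

counts <- table(x)     # observations per level: blue 3, green 3, red 1

codes <- as.integer(x) # each value's position within levels(x)
codes
## [1] 1 2 1 3 1 2 2
```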
To read text files that are presented as tables, you would use:
my.data <- read.table(file="C:/MyFolder/somedata.txt", header = TRUE, sep =',', dec='.', row.names=1)
To read a csv file:
training_data <- read.csv(file = "C:/Users/Zac/Downloads/Diabetes_training.csv")
To write a dataframe to a text file use write.table(); to write it to a csv file use write.csv():
write.csv(training_data, file = "testfile.csv")
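The following sketch writes a small made-up dataframe with both functions and reads it back, using tempfile() so it does not depend on any particular folder. Note that write.csv() also writes the row names as an extra first column unless row.names = FALSE is given.

```r
df <- data.frame(ID = c(1, 2), NAME = c("Mary", "James"))

# csv round trip.
csv_path <- tempfile(fileext = ".csv")
write.csv(df, file = csv_path, row.names = FALSE)
from_csv <- read.csv(file = csv_path)

# Tab-separated text round trip.
txt_path <- tempfile(fileext = ".txt")
write.table(df, file = txt_path, sep = "\t", row.names = FALSE, quote = FALSE)
from_txt <- read.table(file = txt_path, header = TRUE, sep = "\t")
```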