Data Structures

Vectors

You can create a vector in the following ways:

c(1,2,3)

## [1] 1 2 3

c(1:5)

## [1] 1 2 3 4 5

seq(from=0, to=1, by=0.1)

##  [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

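Operations on vectors are performed element-wise. When one operand is shorter, such as a single number, its values are recycled to match the longer one: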

c(1,2,3) + c(1,2,3)

## [1] 2 4 6

c(1,2,3)+2

## [1] 3 4 5

c(1,2,3)*c(1,2,3)

## [1] 1 4 9

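Common helper functions for vectors include length() (number of elements), sort() (ascending order), rev() (reverse order), rank() (rank of each element), and head(x,2) and tail(x,2) (first and last two elements).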
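
As a minimal sketch of these helpers, the example below applies each one to a small vector. The vector x and its values are assumptions chosen for illustration only.

```r
x <- c(30, 10, 20, 50, 40)

length(x)    # number of elements: 5
sort(x)      # ascending order: 10 20 30 40 50
rev(x)       # reversed order: 40 50 20 10 30
rank(x)      # rank of each element: 3 1 2 5 4
head(x, 2)   # first two elements: 30 10
tail(x, 2)   # last two elements: 50 40
```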
To access/alter a specific element in a vector:

a <- c(1,2,3)
a[2] <- 10
a

## [1]  1 10  3

b <- c(1,2,3)
b[b<3] # logical mask: keeps the elements less than 3

## [1] 1 2

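Other useful extraction functions include which.max(x) and which.min(x), which return the position of the largest and smallest element, and which(), which takes a logical condition such as x > 3 and returns the positions where it is TRUE. The resulting positions can be used to index the vector.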
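
The following sketch shows the three functions on a small vector and then uses the result of which() as an index. The vector x and the threshold 3 are illustrative assumptions.

```r
x <- c(4, 9, 2, 7)

which.max(x)          # position of the largest value: 2
which.min(x)          # position of the smallest value: 3
idx <- which(x > 3)   # positions where the condition holds: 1 2 4
x[idx]                # elements at those positions: 4 9 7
```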
The rep() function creates a vector of repeated values:

rep(1,6)

## [1] 1 1 1 1 1 1

Lists

A list can hold elements of different data types, and elements can optionally be named:

(b <- list(TRUE, my.matrix=matrix(1:4,nrow=2),c(1+2i,3), "A character string"))

## [[1]]
## [1] TRUE
## 
## $my.matrix
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
## 
## [[3]]
## [1] 1+2i 3+0i
## 
## [[4]]
## [1] "A character string"

To access an element of a list by position, use double square brackets:

b[[1]]

## [1] TRUE
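
Named elements can also be reached by name, and single brackets behave differently from double brackets. The sketch below rebuilds the same list b so that it runs on its own.

```r
b <- list(TRUE, my.matrix = matrix(1:4, nrow = 2), c(1+2i, 3), "A character string")

b$my.matrix        # named element, accessed with $
b[["my.matrix"]]   # the same element, accessed by name with double brackets
b[[2]][1, 2]       # row 1, column 2 of the matrix: 3
class(b[2])        # single brackets return a sub-list: "list"
is.matrix(b[[2]])  # double brackets return the element itself: TRUE
```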

Matrix

A matrix can be created as follows. By default the values fill the matrix column by column; byrow = TRUE fills it row by row:

(X <- matrix(1:12, nrow = 4, ncol = 3, byrow = TRUE))

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
## [3,]    7    8    9
## [4,]   10   11   12

Use rbind() and cbind() to add rows or columns to a matrix:

(Y <- cbind(X, c(5,5,5,5)))

##      [,1] [,2] [,3] [,4]
## [1,]    1    2    3    5
## [2,]    4    5    6    5
## [3,]    7    8    9    5
## [4,]   10   11   12    5

Arithmetic operators such as +, - and * act element-wise on matrices, just as they do on vectors. Matrix multiplication uses the separate operator A%*%B.
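
To make the difference between the two products concrete, here is a small sketch. The matrices A and B are assumptions introduced for illustration.

```r
A <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)
B <- diag(2)                 # 2 x 2 identity matrix

A * A      # element-wise product: every entry is squared
A %*% B    # matrix product with the identity gives back A
A %*% A    # matrix product, which differs from A * A
```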
The function apply() applies a function to each row (MARGIN=1) or each column (MARGIN=2) of a matrix:

apply(X, MARGIN=1, FUN=sum)

## [1]  6 15 24 33
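
For comparison, a short sketch of the column-wise case and of a different summary function, using the same 4 x 3 matrix:

```r
X <- matrix(1:12, nrow = 4, ncol = 3, byrow = TRUE)

apply(X, MARGIN = 2, FUN = sum)   # column sums: 22 26 30
apply(X, MARGIN = 1, FUN = max)   # largest value in each row: 3 6 9 12
```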

Array

An array generalises a matrix to more than two dimensions; the dim argument gives the size of each dimension:

(X <- array(1:6, dim=c(1,2,3)))

## , , 1
## 
##      [,1] [,2]
## [1,]    1    2
## 
## , , 2
## 
##      [,1] [,2]
## [1,]    3    4
## 
## , , 3
## 
##      [,1] [,2]
## [1,]    5    6
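
Array elements are indexed with one position per dimension. The sketch below recreates the same array so that it is self-contained.

```r
X <- array(1:6, dim = c(1, 2, 3))

dim(X)        # size of each dimension: 1 2 3
X[1, 2, 3]    # row 1, column 2, slice 3: 6
X[, , 2]      # second slice; the length-1 row dimension is dropped, giving 3 4
```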

DataFrame

To create a dataframe:

(Z <- data.frame(GENDER=c("F","M","M","F"),
ID=c(123,234,345,456),
NAME=c("Mary","James","James","Olivia"),
Height=c(170,180,185,160)))

##   GENDER  ID   NAME Height
## 1      F 123   Mary    170
## 2      M 234  James    180
## 3      M 345  James    185
## 4      F 456 Olivia    160

To index a dataframe, use the convention [rows, columns]; leave a position blank to select all rows or all columns. A vector of indices selects specific rows:

Z[1:2,]

##   GENDER  ID  NAME Height
## 1      F 123  Mary    170
## 2      M 234 James    180

Z[c(1,3),]

##   GENDER  ID  NAME Height
## 1      F 123  Mary    170
## 3      M 345 James    185
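
The column position works the same way and also accepts column names. The sketch below rebuilds Z so that it runs on its own.

```r
Z <- data.frame(GENDER = c("F", "M", "M", "F"),
                ID     = c(123, 234, 345, 456),
                NAME   = c("Mary", "James", "James", "Olivia"),
                Height = c(170, 180, 185, 160))

Z[, c("NAME", "Height")]   # all rows, two columns selected by name
Z[2, "Height"]             # a single cell: 180
Z[, 2]                     # second column as a vector: 123 234 345 456
```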

To add a column to a dataframe, either cbind() a vector or dataframe onto it, or assign to a new column name directly with $:

Z$Test <- c(1,2,3,4)
print(Z)

##   GENDER  ID   NAME Height Test
## 1      F 123   Mary    170    1
## 2      M 234  James    180    2
## 3      M 345  James    185    3
## 4      F 456 Olivia    160    4

Observations can be subset using logical masks built from comparisons and the operators & (and), | (or), ! (not) and %in% (value is contained in a vector):

Z[Z$ID == 123,]

##   GENDER  ID NAME Height Test
## 1      F 123 Mary    170    1
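
As a sketch of each operator in turn, the example below rebuilds the first four columns of Z and filters it in four ways. The particular conditions are assumptions chosen for illustration.

```r
Z <- data.frame(GENDER = c("F", "M", "M", "F"),
                ID     = c(123, 234, 345, 456),
                NAME   = c("Mary", "James", "James", "Olivia"),
                Height = c(170, 180, 185, 160))

Z[Z$GENDER == "M" & Z$Height > 180, ]   # both conditions hold: row 3
Z[Z$GENDER == "F" | Z$Height > 180, ]   # either condition holds: rows 1, 3, 4
Z[!(Z$NAME == "James"), ]               # negation: rows 1 and 4
Z[Z$ID %in% c(123, 456), ]              # ID is one of the listed values: rows 1 and 4
```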

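To get the general structure of a dataframe (the output below comes from a version of R before 4.0.0, where data.frame() converted character columns to factors by default; from 4.0.0 onwards they are shown as chr unless stringsAsFactors=TRUE is set):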

str(Z)

## 'data.frame':    4 obs. of  5 variables:
##  $ GENDER: Factor w/ 2 levels "F","M": 1 2 2 1
##  $ ID    : num  123 234 345 456
##  $ NAME  : Factor w/ 3 levels "James","Mary",..: 2 1 1 3
##  $ Height: num  170 180 185 160
##  $ Test  : num  1 2 3 4

To get summary statistics for each column of the dataframe:

print(summary(Z))

##  GENDER       ID            NAME       Height           Test     
##  F:2    Min.   :123.0   James :2   Min.   :160.0   Min.   :1.00  
##  M:2    1st Qu.:206.2   Mary  :1   1st Qu.:167.5   1st Qu.:1.75  
##         Median :289.5   Olivia:1   Median :175.0   Median :2.50  
##         Mean   :289.5              Mean   :173.8   Mean   :2.50  
##         3rd Qu.:372.8              3rd Qu.:181.2   3rd Qu.:3.25  
##         Max.   :456.0              Max.   :185.0   Max.   :4.00

We can access a specific variable using the command:

Z$GENDER

## [1] F M M F
## Levels: F M

A new dataframe can be built from extracted columns; note that the resulting column names take the form Z.GENDER and Z.ID:

(new <- data.frame(Z$GENDER, Z$ID))

##   Z.GENDER Z.ID
## 1        F  123
## 2        M  234
## 3        M  345
## 4        F  456

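cbind() can also attach a vector or another dataframe as extra columns. It returns a new dataframe, so Z itself is unchanged unless the result is assigned back: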

Extra <- c(1,2,3,4)
(cbind(Z,Extra))

##   GENDER  ID   NAME Height Test Extra
## 1      F 123   Mary    170    1     1
## 2      M 234  James    180    2     2
## 3      M 345  James    185    3     3
## 4      F 456 Olivia    160    4     4

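Dataframes can be merged on a shared variable, usually one that uniquely identifies each observation. By default, merge() returns only the rows whose key appears in both dataframes. Adding the argument all.x=TRUE keeps every row of the first dataframe and fills the columns from the second with NA where there is no match.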

Y <- data.frame(GENDER=c("M","F","F","M"),
ID=c(345,456,123,234),
NAME=c("James","Olivia","Mary","James"),
Weight=c(80,50,70,60))
merge(Z,Y, by=c("ID"))

##    ID GENDER.x NAME.x Height Test GENDER.y NAME.y Weight
## 1 123        F   Mary    170    1        F   Mary     70
## 2 234        M  James    180    2        M  James     60
## 3 345        M  James    185    3        M  James     80
## 4 456        F Olivia    160    4        F Olivia     50
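
Because every ID above appears in both dataframes, the all.x argument would make no difference there. The sketch below uses two small hypothetical dataframes, left and right, whose IDs only partly overlap, to show how the options differ.

```r
# Hypothetical data: ID 1 is only in left, ID 4 is only in right
left  <- data.frame(ID = c(1, 2, 3), Height = c(170, 180, 185))
right <- data.frame(ID = c(2, 3, 4), Weight = c(60, 80, 50))

inner     <- merge(left, right, by = "ID")                 # IDs 2 and 3 only
left_join <- merge(left, right, by = "ID", all.x = TRUE)   # IDs 1, 2, 3; Weight is NA for ID 1
full      <- merge(left, right, by = "ID", all = TRUE)     # IDs 1 to 4, NA where a side is missing
```

The related argument all.y=TRUE keeps every row of the second dataframe instead.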

Factors

Factors store categorical data: each value belongs to one of a fixed set of categories, called levels. The levels(x) function lists them:

x <- factor(c("blue","green","blue","red","blue","green","green"))
levels(x) # shows the distinct levels of the factor

## [1] "blue"  "green" "red"
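
A few other base R functions are often used alongside levels(). The sketch below applies them to the same factor; the choice of functions is illustrative rather than exhaustive.

```r
x <- factor(c("blue", "green", "blue", "red", "blue", "green", "green"))

nlevels(x)       # number of distinct levels: 3
table(x)         # count per level: blue 3, green 3, red 1
as.integer(x)    # underlying integer codes: 1 2 1 3 1 2 2
```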

Reading Files

To read text files that are laid out as tables, use read.table(): my.data <- read.table(file="C:/MyFolder/somedata.txt", header = TRUE, sep=",", dec=".", row.names=1)

To read a csv file:

training_data <- read.csv(file = "C:/Users/Zac/Downloads/Diabetes_training.csv")

To write a dataframe to a file, use write.table() for a text file or write.csv() for a csv file:

write.csv(training_data, file = "testfile.csv")
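
As a sketch of a full write-then-read round trip, the example below writes a small hypothetical dataframe df to temporary files and reads it back. The data and the use of tempfile() are assumptions made so the example runs anywhere.

```r
# Hypothetical data written to temporary files
df <- data.frame(ID = c(123, 234), Height = c(170, 180))

csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")

# row.names = FALSE stops the row numbers being written as an extra column
write.csv(df, file = csv_path, row.names = FALSE)
write.table(df, file = txt_path, sep = "\t", row.names = FALSE)

from_csv <- read.csv(file = csv_path)
from_txt <- read.table(file = txt_path, header = TRUE, sep = "\t")
```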