Creating a data frame in R
* A data frame can be created using the data.frame() function in R.
x <- data.frame("SN" = 1:8,
                "Name" = c("Samuel", "Dory", "Ken", "Danny", "Sarah", "Dan", "Kenny", "Derrick"),
                "Age" = c(23, 21, 24, 22, 22, 41, 28, 31),
                "Blood Type" = c("A", "A", "B", "B", "B", "B", "O", "A"),
                stringsAsFactors = FALSE)
* Check whether a variable is a data frame using the class() function, then print the data frame.
class(x)
## [1] "data.frame"
x
## SN Name Age Blood.Type
## 1 1 Samuel 23 A
## 2 2 Dory 21 A
## 3 3 Ken 24 B
## 4 4 Danny 22 B
## 5 5 Sarah 22 B
## 6 6 Dan 41 B
## 7 7 Kenny 28 O
## 8 8 Derrick 31 A
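The printed header shows Blood.Type even though the column was created as "Blood Type". By default data.frame() passes column names through make.names() (check.names = TRUE), which replaces the space with a dot. The sketch below uses two small made-up data frames (d1 and d2 are illustration names, not part of the tutorial) to show the default and one way to keep the name as written:

```r
# d1 and d2 are hypothetical two-row data frames used only for illustration
d1 <- data.frame("Blood Type" = c("A", "B"), stringsAsFactors = FALSE)
names(d1)   # default check.names = TRUE turns the space into a dot

d2 <- data.frame("Blood Type" = c("A", "B"), check.names = FALSE,
                 stringsAsFactors = FALSE)
names(d2)   # the name is kept exactly as written

# A name containing a space has to be accessed with [[ ]] or backticks
d2[["Blood Type"]]
d2$`Blood Type`
```

Keeping the syntactic name (Blood.Type) is usually more convenient, because it works with $ without backticks.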
* Checking the structure of a data frame using the str() function.
str(x)
## 'data.frame': 8 obs. of 4 variables:
## $ SN : int 1 2 3 4 5 6 7 8
## $ Name : chr "Samuel" "Dory" "Ken" "Danny" ...
## $ Age : num 23 21 24 22 22 41 28 31
## $ Blood.Type: chr "A" "A" "B" "B" ...
* Checking the statistical summary of a data frame using the summary() function.
summary(x)
## SN Name Age Blood.Type
## Min. :1.00 Length:8 Min. :21.00 Length:8
## 1st Qu.:2.75 Class :character 1st Qu.:22.00 Class :character
## Median :4.50 Mode :character Median :23.50 Mode :character
## Mean :4.50 Mean :26.50
## 3rd Qu.:6.25 3rd Qu.:28.75
## Max. :8.00 Max. :41.00
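For character columns such as Name and Blood.Type, summary() reports only the length and class. As a rough sketch of how to get something more informative, the block below rebuilds x so it runs on its own, counts the values of a character column with table(), and summarises a single numeric column (blood_counts is an illustration name):

```r
# Rebuild the tutorial's data frame so this block is self-contained
x <- data.frame(SN = 1:8,
                Name = c("Samuel", "Dory", "Ken", "Danny", "Sarah", "Dan", "Kenny", "Derrick"),
                Age = c(23, 21, 24, 22, 22, 41, 28, 31),
                "Blood Type" = c("A", "A", "B", "B", "B", "B", "O", "A"),
                stringsAsFactors = FALSE)

# Count how often each blood type occurs
blood_counts <- table(x$Blood.Type)
blood_counts

# Summary statistics for one numeric column only
summary(x$Age)
mean(x$Age)
```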
* Checking the variables of a data frame using the names() function.
names(x)
## [1] "SN" "Name" "Age" "Blood.Type"
* Checking the number of columns of a data frame using the ncol() function.
ncol(x)
## [1] 4
* Checking the number of rows of a data frame using the nrow() function.
nrow(x)
## [1] 8
* Checking the length of a data frame using the length() function. A data frame is stored as a list of columns, so length() returns the number of columns, the same value as ncol().
length(x)
## [1] 4
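The reason length() and ncol() agree can be checked directly. This is a minimal sketch that rebuilds x so it runs on its own; dim() is included as a convenient way to get both counts at once:

```r
# Rebuild the tutorial's data frame so this block is self-contained
x <- data.frame(SN = 1:8,
                Name = c("Samuel", "Dory", "Ken", "Danny", "Sarah", "Dan", "Kenny", "Derrick"),
                Age = c(23, 21, 24, 22, 22, 41, 28, 31),
                "Blood Type" = c("A", "A", "B", "B", "B", "B", "O", "A"),
                stringsAsFactors = FALSE)

is.list(x)             # a data frame is a list whose elements are the columns
length(x) == ncol(x)   # so its length is the number of columns
dim(x)                 # number of rows first, then number of columns
```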
* Checking the names of the rows (observations) using the row.names() function.
row.names(x)
## [1] "1" "2" "3" "4" "5" "6" "7" "8"
Accessing a data frame in R
* Accessing a data frame in R is very similar to accessing a matrix or a list.
* Show the data frame
x
## SN Name Age Blood.Type
## 1 1 Samuel 23 A
## 2 2 Dory 21 A
## 3 3 Ken 24 B
## 4 4 Danny 22 B
## 5 5 Sarah 22 B
## 6 6 Dan 41 B
## 7 7 Kenny 28 O
## 8 8 Derrick 31 A
* Accessing rows of a data frame using df[rows, ]; leaving the column index empty keeps every column.
x[1,]
## SN Name Age Blood.Type
## 1 1 Samuel 23 A
x[1:3,]
## SN Name Age Blood.Type
## 1 1 Samuel 23 A
## 2 2 Dory 21 A
## 3 3 Ken 24 B
* A data frame can be indexed like a matrix, using df[rows, columns].
* Accessing the top few rows of a data frame using the head() function
head(x,n=3)
## SN Name Age Blood.Type
## 1 1 Samuel 23 A
## 2 2 Dory 21 A
## 3 3 Ken 24 B
* Accessing a specific variable for specific rows (observations)
x[1:2,2]
## [1] "Samuel" "Dory"
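Selecting a single column with matrix-style indexing returns a plain vector rather than a data frame, as the output above shows. The sketch below rebuilds x so it runs on its own and compares that default with drop = FALSE, then shows equivalent selections by column name (names_vec and names_df are illustration names):

```r
# Rebuild the tutorial's data frame so this block is self-contained
x <- data.frame(SN = 1:8,
                Name = c("Samuel", "Dory", "Ken", "Danny", "Sarah", "Dan", "Kenny", "Derrick"),
                Age = c(23, 21, 24, 22, 22, 41, 28, 31),
                "Blood Type" = c("A", "A", "B", "B", "B", "B", "O", "A"),
                stringsAsFactors = FALSE)

names_vec <- x[1:2, 2]                 # one column: result is a character vector
names_df  <- x[1:2, 2, drop = FALSE]   # drop = FALSE keeps a one-column data frame
class(names_vec)
class(names_df)

# The same values selected by column name instead of position
x[1:2, "Name"]
x$Name[1:2]
x[["Name"]][1:2]
```

Selecting by name tends to be more robust than selecting by position, since it still works if columns are added or reordered.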
* Accessing a data frame with conditions
x[x$Age>30,]
## SN Name Age Blood.Type
## 6 6 Dan 41 B
## 8 8 Derrick 31 A
* Subsetting a data frame under a specific condition using the subset() function
subset(x, subset=Age>30)
## SN Name Age Blood.Type
## 6 6 Dan 41 B
## 8 8 Derrick 31 A
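Both approaches extend to several conditions at once. As a sketch under the tutorial's original data (rebuilt here so the block runs on its own), the conditions are combined with & and, for subset(), the select argument limits the columns returned (older_b and older_b_small are illustration names):

```r
# Rebuild the tutorial's data frame so this block is self-contained
x <- data.frame(SN = 1:8,
                Name = c("Samuel", "Dory", "Ken", "Danny", "Sarah", "Dan", "Kenny", "Derrick"),
                Age = c(23, 21, 24, 22, 22, 41, 28, 31),
                "Blood Type" = c("A", "A", "B", "B", "B", "B", "O", "A"),
                stringsAsFactors = FALSE)

# Logical indexing: rows older than 22 with blood type B
older_b <- x[x$Age > 22 & x$Blood.Type == "B", ]
older_b

# The same filter with subset(), keeping only two columns
older_b_small <- subset(x, Age > 22 & Blood.Type == "B", select = c(Name, Age))
older_b_small
```

One difference to keep in mind: if the condition evaluates to NA for some row, subset() treats it as FALSE and drops the row, whereas logical indexing with [ ] returns a row of NA values.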
Modifying a data frame in R
* Modification of a data frame through reassignment
x
## SN Name Age Blood.Type
## 1 1 Samuel 23 A
## 2 2 Dory 21 A
## 3 3 Ken 24 B
## 4 4 Danny 22 B
## 5 5 Sarah 22 B
## 6 6 Dan 41 B
## 7 7 Kenny 28 O
## 8 8 Derrick 31 A
x[2,"Age"] <- 26
x
## SN Name Age Blood.Type
## 1 1 Samuel 23 A
## 2 2 Dory 26 A
## 3 3 Ken 24 B
## 4 4 Danny 22 B
## 5 5 Sarah 22 B
## 6 6 Dan 41 B
## 7 7 Kenny 28 O
## 8 8 Derrick 31 A
* Adding rows (observations) to a data frame using the rbind() function. rbind() returns a new data frame; x itself is unchanged unless the result is assigned back.
x
## SN Name Age Blood.Type
## 1 1 Samuel 23 A
## 2 2 Dory 26 A
## 3 3 Ken 24 B
## 4 4 Danny 22 B
## 5 5 Sarah 22 B
## 6 6 Dan 41 B
## 7 7 Kenny 28 O
## 8 8 Derrick 31 A
rbind(x,list(9,"Tom",15,"O"))
## SN Name Age Blood.Type
## 1 1 Samuel 23 A
## 2 2 Dory 26 A
## 3 3 Ken 24 B
## 4 4 Danny 22 B
## 5 5 Sarah 22 B
## 6 6 Dan 41 B
## 7 7 Kenny 28 O
## 8 8 Derrick 31 A
## 9 9 Tom 15 O
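The call above prints nine rows, but x still has eight when it is printed again in the next step. The following sketch rebuilds the original x so it runs on its own and stores the result under a new, illustrative name (x_more); passing the new row as a one-row data frame with matching column names is an optional variation that makes the mapping of values to columns explicit:

```r
# Rebuild the tutorial's data frame so this block is self-contained
x <- data.frame(SN = 1:8,
                Name = c("Samuel", "Dory", "Ken", "Danny", "Sarah", "Dan", "Kenny", "Derrick"),
                Age = c(23, 21, 24, 22, 22, 41, 28, 31),
                "Blood Type" = c("A", "A", "B", "B", "B", "B", "O", "A"),
                stringsAsFactors = FALSE)

# New row as a one-row data frame whose column names match those of x
new_row <- data.frame(SN = 9, Name = "Tom", Age = 15, Blood.Type = "O",
                      stringsAsFactors = FALSE)

x_more <- rbind(x, new_row)   # the result must be assigned to be kept
nrow(x)                       # x itself is unchanged
nrow(x_more)
```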
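* Adding columns to a data frame using the cbind() function. Like rbind(), cbind() returns a new data frame and leaves x unchanged unless the result is assigned back.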
x
## SN Name Age Blood.Type
## 1 1 Samuel 23 A
## 2 2 Dory 26 A
## 3 3 Ken 24 B
## 4 4 Danny 22 B
## 5 5 Sarah 22 B
## 6 6 Dan 41 B
## 7 7 Kenny 28 O
## 8 8 Derrick 31 A
cbind(x,"Gender"=c("M","F","M","M","F","M","M","M"))
## SN Name Age Blood.Type Gender
## 1 1 Samuel 23 A M
## 2 2 Dory 26 A F
## 3 3 Ken 24 B M
## 4 4 Danny 22 B M
## 5 5 Sarah 22 B F
## 6 6 Dan 41 B M
## 7 7 Kenny 28 O M
## 8 8 Derrick 31 A M
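As a sketch of how cbind() behaves, the block below rebuilds the original x so it runs on its own, keeps the result under an illustrative name (with_gender), and shows that a new column whose length does not fit the number of rows is rejected. The tryCatch() wrapper is only there so the block keeps running after the error:

```r
# Rebuild the tutorial's data frame so this block is self-contained
x <- data.frame(SN = 1:8,
                Name = c("Samuel", "Dory", "Ken", "Danny", "Sarah", "Dan", "Kenny", "Derrick"),
                Age = c(23, 21, 24, 22, 22, 41, 28, 31),
                "Blood Type" = c("A", "A", "B", "B", "B", "B", "O", "A"),
                stringsAsFactors = FALSE)

gender <- c("M", "F", "M", "M", "F", "M", "M", "M")

with_gender <- cbind(x, Gender = gender)   # assign the result to keep the column
ncol(x)                                    # x itself still has 4 columns
ncol(with_gender)

# Three values cannot be recycled evenly over eight rows, so this fails
bad <- tryCatch(cbind(x, Gender = c("M", "F", "M")),
                error = function(e) "error")
bad
```

Note that a shorter vector whose length divides the number of rows evenly (for example two values for eight rows) is recycled silently, so it is worth checking the length of a new column before binding it.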
* Adding columns to a data frame based on existing columns
x
## SN Name Age Blood.Type
## 1 1 Samuel 23 A
## 2 2 Dory 26 A
## 3 3 Ken 24 B
## 4 4 Danny 22 B
## 5 5 Sarah 22 B
## 6 6 Dan 41 B
## 7 7 Kenny 28 O
## 8 8 Derrick 31 A
x$AgeSN <- (x$SN * x$Age)
x
## SN Name Age Blood.Type AgeSN
## 1 1 Samuel 23 A 23
## 2 2 Dory 26 A 52
## 3 3 Ken 24 B 72
## 4 4 Danny 22 B 88
## 5 5 Sarah 22 B 110
## 6 6 Dan 41 B 246
## 7 7 Kenny 28 O 196
## 8 8 Derrick 31 A 248
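Base R also offers transform() and within() for building a column from existing ones without repeating the x$ prefix. This is a minimal sketch, not the tutorial's own method; it rebuilds the original x (Dory's age is still 21 here, so her product is 42 rather than 52) and uses illustrative result names x_t and x_w:

```r
# Rebuild the tutorial's data frame so this block is self-contained
x <- data.frame(SN = 1:8,
                Name = c("Samuel", "Dory", "Ken", "Danny", "Sarah", "Dan", "Kenny", "Derrick"),
                Age = c(23, 21, 24, 22, 22, 41, 28, 31),
                "Blood Type" = c("A", "A", "B", "B", "B", "B", "O", "A"),
                stringsAsFactors = FALSE)

# transform() evaluates the expression using the columns of x
x_t <- transform(x, AgeSN = SN * Age)

# within() does the same with ordinary assignment syntax
x_w <- within(x, AgeSN <- SN * Age)

head(x_t, n = 3)
names(x)   # x is unchanged because neither result was assigned back to it
```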
* Adding a vector as a variable to a data frame
x
## SN Name Age Blood.Type AgeSN
## 1 1 Samuel 23 A 23
## 2 2 Dory 26 A 52
## 3 3 Ken 24 B 72
## 4 4 Danny 22 B 88
## 5 5 Sarah 22 B 110
## 6 6 Dan 41 B 246
## 7 7 Kenny 28 O 196
## 8 8 Derrick 31 A 248
ID <- c(121,231,452,109,223,76,90,564)
x$ID <- ID
x
## SN Name Age Blood.Type AgeSN ID
## 1 1 Samuel 23 A 23 121
## 2 2 Dory 26 A 52 231
## 3 3 Ken 24 B 72 452
## 4 4 Danny 22 B 88 109
## 5 5 Sarah 22 B 110 223
## 6 6 Dan 41 B 246 76
## 7 7 Kenny 28 O 196 90
## 8 8 Derrick 31 A 248 564
* Adding columns to a data frame with rounded values
x
## SN Name Age Blood.Type AgeSN ID
## 1 1 Samuel 23 A 23 121
## 2 2 Dory 26 A 52 231
## 3 3 Ken 24 B 72 452
## 4 4 Danny 22 B 88 109
## 5 5 Sarah 22 B 110 223
## 6 6 Dan 41 B 246 76
## 7 7 Kenny 28 O 196 90
## 8 8 Derrick 31 A 248 564
nums <- c(1.21,2.23,3.12,4.222,4.4,7.888,1.1,1.0)
x$nums <- nums
x$nums <- round(x$nums,1)
x
## SN Name Age Blood.Type AgeSN ID nums
## 1 1 Samuel 23 A 23 121 1.2
## 2 2 Dory 26 A 52 231 2.2
## 3 3 Ken 24 B 72 452 3.1
## 4 4 Danny 22 B 88 109 4.2
## 5 5 Sarah 22 B 110 223 4.4
## 6 6 Dan 41 B 246 76 7.9
## 7 7 Kenny 28 O 196 90 1.1
## 8 8 Derrick 31 A 248 564 1.0
Deleting from a data frame in R
* Deleting a variable (column) from a data frame by assigning NULL to it
x
## SN Name Age Blood.Type AgeSN ID nums
## 1 1 Samuel 23 A 23 121 1.2
## 2 2 Dory 26 A 52 231 2.2
## 3 3 Ken 24 B 72 452 3.1
## 4 4 Danny 22 B 88 109 4.2
## 5 5 Sarah 22 B 110 223 4.4
## 6 6 Dan 41 B 246 76 7.9
## 7 7 Kenny 28 O 196 90 1.1
## 8 8 Derrick 31 A 248 564 1.0
x$ID <- NULL
x
## SN Name Age Blood.Type AgeSN nums
## 1 1 Samuel 23 A 23 1.2
## 2 2 Dory 26 A 52 2.2
## 3 3 Ken 24 B 72 3.1
## 4 4 Danny 22 B 88 4.2
## 5 5 Sarah 22 B 110 4.4
## 6 6 Dan 41 B 246 7.9
## 7 7 Kenny 28 O 196 1.1
## 8 8 Derrick 31 A 248 1.0
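To remove several columns in one step, one option is to select the columns to keep instead. The sketch below rebuilds a version of x with the two extra columns so it runs on its own; drop_cols, kept and kept_alt are illustration names:

```r
# Rebuild the tutorial's data frame, including two derived columns
x <- data.frame(SN = 1:8,
                Name = c("Samuel", "Dory", "Ken", "Danny", "Sarah", "Dan", "Kenny", "Derrick"),
                Age = c(23, 21, 24, 22, 22, 41, 28, 31),
                "Blood Type" = c("A", "A", "B", "B", "B", "B", "O", "A"),
                stringsAsFactors = FALSE)
x$AgeSN <- x$SN * x$Age
x$nums <- round(c(1.21, 2.23, 3.12, 4.222, 4.4, 7.888, 1.1, 1.0), 1)

drop_cols <- c("AgeSN", "nums")

# Keep every column whose name is not listed in drop_cols
kept <- x[setdiff(names(x), drop_cols)]

# Equivalent selection with a logical mask over the column names
kept_alt <- x[, !(names(x) %in% drop_cols)]

names(kept)
```

Unlike x$ID <- NULL, neither line changes x; assign the result back to x if the columns should be removed permanently.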
* Deleting a row from a data frame using a negative row index
x
## SN Name Age Blood.Type AgeSN nums
## 1 1 Samuel 23 A 23 1.2
## 2 2 Dory 26 A 52 2.2
## 3 3 Ken 24 B 72 3.1
## 4 4 Danny 22 B 88 4.2
## 5 5 Sarah 22 B 110 4.4
## 6 6 Dan 41 B 246 7.9
## 7 7 Kenny 28 O 196 1.1
## 8 8 Derrick 31 A 248 1.0
x <- x[-2,]
x
## SN Name Age Blood.Type AgeSN nums
## 1 1 Samuel 23 A 23 1.2
## 3 3 Ken 24 B 72 3.1
## 4 4 Danny 22 B 88 4.2
## 5 5 Sarah 22 B 110 4.4
## 6 6 Dan 41 B 246 7.9
## 7 7 Kenny 28 O 196 1.1
## 8 8 Derrick 31 A 248 1.0
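After the deletion the row labels still read 1, 3, 4, ..., 8, because row names are kept from the original data frame. As a sketch (rebuilding the original x so the block runs on its own, with y and no_ken as illustration names), the labels can be reset with rownames(y) <- NULL, and rows can also be removed by a condition rather than by position:

```r
# Rebuild the tutorial's data frame so this block is self-contained
x <- data.frame(SN = 1:8,
                Name = c("Samuel", "Dory", "Ken", "Danny", "Sarah", "Dan", "Kenny", "Derrick"),
                Age = c(23, 21, 24, 22, 22, 41, 28, 31),
                "Blood Type" = c("A", "A", "B", "B", "B", "B", "O", "A"),
                stringsAsFactors = FALSE)

y <- x[-2, ]                    # drop the second row
labels_before <- row.names(y)   # original labels, with "2" missing

rownames(y) <- NULL             # reset the labels to consecutive numbers
labels_after <- row.names(y)

# Remove rows by condition: keep everyone except Ken
no_ken <- y[y$Name != "Ken", ]

labels_before
labels_after
no_ken
```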