A data frame is a two-dimensional data structure in R that consists of rows and columns. It is a special case of a list in which every component has the same length. Unlike a matrix, a data frame can hold a different type in each column: character, integer, logical, and so on.
The data frame below is created as an example. It consists of logical, numeric and character columns.
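As a small illustration of the difference from a matrix, the sketch below builds both from the same two vectors. The names m and df and the sample values are made up for this example only.

```r
# A matrix holds a single type, so mixing text and numbers
# coerces every element to character.
m <- cbind(name = c("Ann", "Bob"), age = c(30, 25))
typeof(m)   # "character"

# A data frame keeps each column's own type.
df <- data.frame(name = c("Ann", "Bob"), age = c(30, 25),
                 stringsAsFactors = FALSE)
sapply(df, class)
```

In the matrix the ages become the strings "30" and "25", while the data frame keeps them as numbers.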
name <- c("Fatimah Nizam", "Basyir Nizam", "Adam Sinclair", "Harry Styles",
"Maisarah Zairi", "Fateh Malik","Henry Golding", "Kendall Jenner", "Michael Jackson", "Gigi Hadid")
age <- c(24, 15, 30, 29, 17, 16, 35, 25, 14, 15)
teen <- c(FALSE, TRUE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE,TRUE, TRUE)
people <- data.frame(name,age,teen, stringsAsFactors = FALSE)
people
## name age teen
## 1 Fatimah Nizam 24 FALSE
## 2 Basyir Nizam 15 TRUE
## 3 Adam Sinclair 30 FALSE
## 4 Harry Styles 29 FALSE
## 5 Maisarah Zairi 17 TRUE
## 6 Fateh Malik 16 TRUE
## 7 Henry Golding 35 FALSE
## 8 Kendall Jenner 25 FALSE
## 9 Michael Jackson 14 TRUE
## 10 Gigi Hadid 15 TRUE
A data frame named people is created. The logical argument stringsAsFactors controls whether character vectors are converted to factors (TRUE) or kept as plain strings (FALSE). Since R 4.0.0 the default is FALSE.
To check the class of the data frame, use the class() function.
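To make the effect of stringsAsFactors concrete, here is a minimal sketch that builds the same column both ways. The vector fruit and the two data frame names are hypothetical and used only for this illustration.

```r
fruit <- c("apple", "banana", "apple")

# Keep the strings as plain character values.
as_strings <- data.frame(fruit, stringsAsFactors = FALSE)
# Convert the strings to a factor with one level per distinct value.
as_factors <- data.frame(fruit, stringsAsFactors = TRUE)

class(as_strings$fruit)    # "character"
class(as_factors$fruit)    # "factor"
levels(as_factors$fruit)   # "apple" "banana"
```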
class(people)
## [1] "data.frame"
The typeof() function returns the internal storage type of an object, which is a lower-level description than the one class() gives.
typeof(people)
## [1] "list"
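The result shows that a data frame is stored internally as a list, with each column as one element of that list. The data.frame class adds the requirement that all columns have the same length.
To check the length of the data frame, which is its number of columns because each column is one list element: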
length(people)
## [1] 3
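The list nature of a data frame can be checked directly. The following is a minimal sketch on a throwaway data frame (df, with made-up columns x and y), not on people.

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

is.list(df)              # TRUE: a data frame is a list underneath
length(df) == ncol(df)   # TRUE: one list element per column
sapply(df, typeof)       # storage type of each column
```

Because the columns are list elements, functions such as sapply() and lapply() iterate over columns, not rows.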
To assign names to the rows of the data frame:
row.names(people) <- c("name1","name2","name3","name4","name5","name6","name7","name8","name9", "name10")
people
## name age teen
## name1 Fatimah Nizam 24 FALSE
## name2 Basyir Nizam 15 TRUE
## name3 Adam Sinclair 30 FALSE
## name4 Harry Styles 29 FALSE
## name5 Maisarah Zairi 17 TRUE
## name6 Fateh Malik 16 TRUE
## name7 Henry Golding 35 FALSE
## name8 Kendall Jenner 25 FALSE
## name9 Michael Jackson 14 TRUE
## name10 Gigi Hadid 15 TRUE
There are several ways to rename the columns. Solution 1 is to use the names() function.
names(people) <- c("Full Name", "Age", "Teenager")
people
## Full Name Age Teenager
## name1 Fatimah Nizam 24 FALSE
## name2 Basyir Nizam 15 TRUE
## name3 Adam Sinclair 30 FALSE
## name4 Harry Styles 29 FALSE
## name5 Maisarah Zairi 17 TRUE
## name6 Fateh Malik 16 TRUE
## name7 Henry Golding 35 FALSE
## name8 Kendall Jenner 25 FALSE
## name9 Michael Jackson 14 TRUE
## name10 Gigi Hadid 15 TRUE
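A variation on the names() approach is to rename a single column without retyping all of the names. This is a sketch on a small hypothetical data frame df; the pattern itself only relies on names() and logical indexing.

```r
df <- data.frame(name = c("Ann", "Bob"), age = c(30, 25))

# Replace only the name that currently equals "age".
names(df)[names(df) == "age"] <- "Age"
names(df)   # "name" "Age"
```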
Solution 2 is to assign new names in the data.frame() function.
people <- data.frame(Full_Name = name, Age = age, Teenager = teen)
people
## Full_Name Age Teenager
## 1 Fatimah Nizam 24 FALSE
## 2 Basyir Nizam 15 TRUE
## 3 Adam Sinclair 30 FALSE
## 4 Harry Styles 29 FALSE
## 5 Maisarah Zairi 17 TRUE
## 6 Fateh Malik 16 TRUE
## 7 Henry Golding 35 FALSE
## 8 Kendall Jenner 25 FALSE
## 9 Michael Jackson 14 TRUE
## 10 Gigi Hadid 15 TRUE
The nrow() and ncol() functions return the number of rows and columns in a data frame. paste0() concatenates its arguments into a single string with no separator, so the label and the count are printed together on one line.
row <- nrow(people)
print(paste0("Number of rows: ", row))
## [1] "Number of rows: 10"
col <- ncol(people)
print(paste0("Number of columns: ", col))
## [1] "Number of columns: 3"
To get the dimensions of a data frame (rows first, then columns), use the dim() function.
dim(people)
## [1] 10 3
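The relationship between dim(), nrow() and ncol(), and the difference between paste0() and paste(), can be sketched as follows. The data frame df here is a hypothetical example.

```r
df <- data.frame(a = 1:4, b = letters[1:4])

# dim() is simply the row count followed by the column count.
identical(dim(df), c(nrow(df), ncol(df)))   # TRUE

# paste0() joins with no separator; paste() inserts a space by default.
paste0("Number of rows: ", nrow(df))   # "Number of rows: 4"
paste("Number of rows:", nrow(df))     # "Number of rows: 4"
```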
To access the first 6 rows of a data frame, use the head() function. To access the last 6 rows, use the tail() function. Six is the default number of rows for both.
print("The first 6 rows")
## [1] "The first 6 rows"
head(people)
## Full_Name Age Teenager
## 1 Fatimah Nizam 24 FALSE
## 2 Basyir Nizam 15 TRUE
## 3 Adam Sinclair 30 FALSE
## 4 Harry Styles 29 FALSE
## 5 Maisarah Zairi 17 TRUE
## 6 Fateh Malik 16 TRUE
print("The last 6 rows")
## [1] "The last 6 rows"
tail(people)
## Full_Name Age Teenager
## 5 Maisarah Zairi 17 TRUE
## 6 Fateh Malik 16 TRUE
## 7 Henry Golding 35 FALSE
## 8 Kendall Jenner 25 FALSE
## 9 Michael Jackson 14 TRUE
## 10 Gigi Hadid 15 TRUE
To access all of the rows except the last one:
people[1:9,]
## Full_Name Age Teenager
## 1 Fatimah Nizam 24 FALSE
## 2 Basyir Nizam 15 TRUE
## 3 Adam Sinclair 30 FALSE
## 4 Harry Styles 29 FALSE
## 5 Maisarah Zairi 17 TRUE
## 6 Fateh Malik 16 TRUE
## 7 Henry Golding 35 FALSE
## 8 Kendall Jenner 25 FALSE
## 9 Michael Jackson 14 TRUE
To exclude the second row from the dataframe:
people[-2,]
## Full_Name Age Teenager
## 1 Fatimah Nizam 24 FALSE
## 3 Adam Sinclair 30 FALSE
## 4 Harry Styles 29 FALSE
## 5 Maisarah Zairi 17 TRUE
## 6 Fateh Malik 16 TRUE
## 7 Henry Golding 35 FALSE
## 8 Kendall Jenner 25 FALSE
## 9 Michael Jackson 14 TRUE
## 10 Gigi Hadid 15 TRUE
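The row selections above hard-code the row numbers. A hedged sketch of the same ideas without hard-coding, on a hypothetical five-row data frame df, might look like this:

```r
df <- data.frame(id = 1:5, score = c(10, 20, 30, 40, 50))

# head() and tail() take n, the number of rows to return.
first_two <- head(df, n = 2)
last_two  <- tail(df, n = 2)

# Drop the last row whatever the number of rows is.
all_but_last <- df[-nrow(df), ]

# Negative indices can also drop several rows at once.
without_2_and_4 <- df[-c(2, 4), ]
```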
There are two ways to select a single element from a data frame, by position or by column name:
people[3,2]
## [1] 30
people[3,"Age"]
## [1] 30
Several specific elements can be selected at once. For example, suppose we want to know the details for Fatimah Nizam and Gigi Hadid, who are in rows 1 and 10.
people[c(1,10),c("Age", "Teenager")]
## Age Teenager
## 1 24 FALSE
## 10 15 TRUE
The names Fatimah Nizam and Gigi Hadid are not displayed because the Full_Name column is not selected.
There are a few ways to access a column. The $ operator and the [[ ]] and [ ] brackets can all be used to access a column's elements.
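Rows can also be picked by a condition instead of by position, which avoids having to know the row numbers in advance. The sketch below uses a small hypothetical data frame df and a made-up vector wanted.

```r
df <- data.frame(Full_Name = c("Ann Lee", "Bob Ray", "Cy Dunn"),
                 Age = c(30, 15, 22),
                 stringsAsFactors = FALSE)

# Keep the rows whose name is in the wanted set, with all columns.
wanted <- c("Ann Lee", "Cy Dunn")
picked <- df[df$Full_Name %in% wanted, ]

# Names of everyone younger than 18.
minors <- df[df$Age < 18, "Full_Name"]
```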
people$Age
## [1] 24 15 30 29 17 16 35 25 14 15
people[["Age"]]
## [1] 24 15 30 29 17 16 35 25 14 15
people["Age"]
## Age
## 1 24
## 2 15
## 3 30
## 4 29
## 5 17
## 6 16
## 7 35
## 8 25
## 9 14
## 10 15
Notice that people$Age, people[["Age"]] and people["Age"] do not all give the same kind of result. The first two return a vector, while the last returns a data frame with a single column. Remember, a data frame is a list of vectors that all have the same length.
The same distinction applies when you use the index of the column instead of its name: people[[2]] returns a vector and people[2] returns a one-column data frame.
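The difference between the access forms is easiest to see by checking the class of each result. This is a minimal sketch on a hypothetical two-row data frame df.

```r
df <- data.frame(Age = c(24, 15), Teenager = c(FALSE, TRUE))

class(df$Age)        # "numeric": a plain vector
class(df[["Age"]])   # "numeric": a plain vector
class(df["Age"])     # "data.frame": a one-column data frame
class(df[2])         # "data.frame"
class(df[[2]])       # "logical"

# With row and column indexing, a single column is dropped to a vector
# unless drop = FALSE is given.
class(df[, "Age"])                 # "numeric"
class(df[, "Age", drop = FALSE])   # "data.frame"
```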
There are several ways to add a column to a data frame. Solution 1 is to assign the new column vector to the people data frame.
# Create the height column vector
height <- c(165, 177, 163, 162, 157, 170, 180, 167, 175, 171)
# Assign the vector to a new column of the data frame
people$height <- height
# or, equivalently
people[["height"]] <- height
# Result
people
## Full_Name Age Teenager height
## 1 Fatimah Nizam 24 FALSE 165
## 2 Basyir Nizam 15 TRUE 177
## 3 Adam Sinclair 30 FALSE 163
## 4 Harry Styles 29 FALSE 162
## 5 Maisarah Zairi 17 TRUE 157
## 6 Fateh Malik 16 TRUE 170
## 7 Henry Golding 35 FALSE 180
## 8 Kendall Jenner 25 FALSE 167
## 9 Michael Jackson 14 TRUE 175
## 10 Gigi Hadid 15 TRUE 171
Solution 2 is to use the cbind() function.
weight <- c(58, 63, 68, 55, 56, 70, 64, 65, 75, 55)
cbind(people, weight)
## Full_Name Age Teenager height weight
## 1 Fatimah Nizam 24 FALSE 165 58
## 2 Basyir Nizam 15 TRUE 177 63
## 3 Adam Sinclair 30 FALSE 163 68
## 4 Harry Styles 29 FALSE 162 55
## 5 Maisarah Zairi 17 TRUE 157 56
## 6 Fateh Malik 16 TRUE 170 70
## 7 Henry Golding 35 FALSE 180 64
## 8 Kendall Jenner 25 FALSE 167 65
## 9 Michael Jackson 14 TRUE 175 75
## 10 Gigi Hadid 15 TRUE 171 55
However, cbind() returns a new data frame and does not modify people in place, so the weight column is not yet part of the people data frame.
people
## Full_Name Age Teenager height
## 1 Fatimah Nizam 24 FALSE 165
## 2 Basyir Nizam 15 TRUE 177
## 3 Adam Sinclair 30 FALSE 163
## 4 Harry Styles 29 FALSE 162
## 5 Maisarah Zairi 17 TRUE 157
## 6 Fateh Malik 16 TRUE 170
## 7 Henry Golding 35 FALSE 180
## 8 Kendall Jenner 25 FALSE 167
## 9 Michael Jackson 14 TRUE 175
## 10 Gigi Hadid 15 TRUE 171
The solution is to assign the result of cbind() back to the people data frame.
weight <- c(58, 63, 68, 55, 56, 70, 64, 65, 75, 55)
people <- cbind(people, weight)
people
## Full_Name Age Teenager height weight
## 1 Fatimah Nizam 24 FALSE 165 58
## 2 Basyir Nizam 15 TRUE 177 63
## 3 Adam Sinclair 30 FALSE 163 68
## 4 Harry Styles 29 FALSE 162 55
## 5 Maisarah Zairi 17 TRUE 157 56
## 6 Fateh Malik 16 TRUE 170 70
## 7 Henry Golding 35 FALSE 180 64
## 8 Kendall Jenner 25 FALSE 167 65
## 9 Michael Jackson 14 TRUE 175 75
## 10 Gigi Hadid 15 TRUE 171 55
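The copy-not-modify behaviour of cbind() can be isolated in a few lines. The names df, score and widened below are hypothetical and exist only for this sketch.

```r
df <- data.frame(id = 1:3)
score <- c(90, 80, 70)

# cbind() builds and returns a new data frame; df itself is unchanged.
widened <- cbind(df, score)

names(df)        # "id"
names(widened)   # "id" "score"
```

Only an assignment such as df <- cbind(df, score) would change df.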
Now we would like to add a row to the data frame:
tom <- data.frame(Full_Name = "Tom Riddle", Age= 37,
Teenager = FALSE, height = 183, weight = 80)
people <- rbind(people, tom)
people
## Full_Name Age Teenager height weight
## 1 Fatimah Nizam 24 FALSE 165 58
## 2 Basyir Nizam 15 TRUE 177 63
## 3 Adam Sinclair 30 FALSE 163 68
## 4 Harry Styles 29 FALSE 162 55
## 5 Maisarah Zairi 17 TRUE 157 56
## 6 Fateh Malik 16 TRUE 170 70
## 7 Henry Golding 35 FALSE 180 64
## 8 Kendall Jenner 25 FALSE 167 65
## 9 Michael Jackson 14 TRUE 175 75
## 10 Gigi Hadid 15 TRUE 171 55
## 11 Tom Riddle 37 FALSE 183 80
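One detail worth knowing about rbind() on data frames is that it matches columns by name, so the new row must use the same column names as the existing data frame. A minimal sketch with hypothetical data frames df, good and bad:

```r
df <- data.frame(name = "Ann", age = 30, stringsAsFactors = FALSE)

# Same column names in a different order: columns are matched by name.
good <- data.frame(age = 25, name = "Bob", stringsAsFactors = FALSE)
combined <- rbind(df, good)

# A different column name ("years" instead of "age") makes rbind() fail.
bad <- data.frame(name = "Cy", years = 40, stringsAsFactors = FALSE)
result <- tryCatch(rbind(df, bad),
                   error = function(e) conditionMessage(e))
result   # the error message as a string
```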
We can sort the elements of a column. For example, we select the Age column.
x <- sort(people$Age)
x
## [1] 14 15 15 16 17 24 25 29 30 35 37
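We can also get the ordering of the elements. order() returns the positions of the elements in sorted order, not the sorted values themselves.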
ranks <- order(people$Age)
ranks
## [1] 9 2 10 6 5 1 8 4 3 7 11
people
## Full_Name Age Teenager height weight
## 1 Fatimah Nizam 24 FALSE 165 58
## 2 Basyir Nizam 15 TRUE 177 63
## 3 Adam Sinclair 30 FALSE 163 68
## 4 Harry Styles 29 FALSE 162 55
## 5 Maisarah Zairi 17 TRUE 157 56
## 6 Fateh Malik 16 TRUE 170 70
## 7 Henry Golding 35 FALSE 180 64
## 8 Kendall Jenner 25 FALSE 167 65
## 9 Michael Jackson 14 TRUE 175 75
## 10 Gigi Hadid 15 TRUE 171 55
## 11 Tom Riddle 37 FALSE 183 80
We can see that 14 is the lowest age and that it belongs to Michael Jackson, so his row index, 9, comes first in the result of order(). Tom Riddle is the oldest, at age 37, so his row index, 11, comes last.
To reorder the data frame by Age from the oldest to the youngest:
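Since sort(), order() and rank() are easy to confuse, here is a small sketch on a made-up vector age that shows how the three relate.

```r
age <- c(24, 15, 30, 14)

sort(age)         # 14 15 24 30: the values in increasing order
order(age)        # 4 2 1 3: positions of the values in sorted order
age[order(age)]   # 14 15 24 30: indexing by order() reproduces sort()
rank(age)         # 3 2 4 1: the rank of each value in its original position
```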
people[order(people$Age, decreasing = TRUE), ]
## Full_Name Age Teenager height weight
## 11 Tom Riddle 37 FALSE 183 80
## 7 Henry Golding 35 FALSE 180 64
## 3 Adam Sinclair 30 FALSE 163 68
## 4 Harry Styles 29 FALSE 162 55
## 8 Kendall Jenner 25 FALSE 167 65
## 1 Fatimah Nizam 24 FALSE 165 58
## 5 Maisarah Zairi 17 TRUE 157 56
## 6 Fateh Malik 16 TRUE 170 70
## 2 Basyir Nizam 15 TRUE 177 63
## 10 Gigi Hadid 15 TRUE 171 55
## 9 Michael Jackson 14 TRUE 175 75
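When two rows share the same Age, order() can take a second key to break the tie. The following sketch assumes a hypothetical three-row data frame df and sorts by Age descending, then by height ascending.

```r
df <- data.frame(name = c("A", "B", "C"),
                 Age = c(15, 30, 15),
                 height = c(177, 163, 171),
                 stringsAsFactors = FALSE)

# Negating a numeric key sorts it in decreasing order;
# the second key is used only where the first one ties.
sorted <- df[order(-df$Age, df$height), ]
sorted$name   # "B" "C" "A"
```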
The str() function displays the structure of the data frame compactly: its dimensions, and the type and first few values of each column.
str(people)
## 'data.frame': 11 obs. of 5 variables:
## $ Full_Name: chr "Fatimah Nizam" "Basyir Nizam" "Adam Sinclair" "Harry Styles" ...
## $ Age : num 24 15 30 29 17 16 35 25 14 15 ...
## $ Teenager : logi FALSE TRUE FALSE FALSE TRUE TRUE ...
## $ height : num 165 177 163 162 157 170 180 167 175 171 ...
## $ weight : num 58 63 68 55 56 70 64 65 75 55 ...
Lastly, to summarize the data frame:
summary(people)
##   Full_Name              Age          Teenager           height
##  Length:11          Min.   :14.00   Mode :logical   Min.   :157
##  Class :character   1st Qu.:15.50   FALSE:6         1st Qu.:164
##  Mode  :character   Median :24.00   TRUE :5         Median :170
##                     Mean   :23.36                   Mean   :170
##                     3rd Qu.:29.50                   3rd Qu.:176
##                     Max.   :37.00                   Max.   :183
##      weight
##  Min.   :55.00
##  1st Qu.:57.00
##  Median :64.00
##  Mean   :64.45
##  3rd Qu.:69.00
##  Max.   :80.00
Thank you and I hope that solves EVERYTHING for R Dataframe!