A data frame is a table, or two-dimensional array-like structure, in which each column contains the values of one variable and each row contains one value from each column.
The following are the characteristics of a Data Frame:
To create a data frame, use the data.frame() function.
For example, the following code creates a data frame describing five employees, with their names, ages, salaries, and professions.
employees_info <- data.frame(
  emp_names = c("Frank", "Micheal", "Peter", "Melissa", "Esther"),
  emp_age = c(43, 32, 40, 41, 45),
  salaries = c(1e+05, 2e+05, 50000, 5e+05, 70000),
  emp_profession = c("Data Scientist", "Senior It Admin", "associate accountant",
                     "Excutive director", "junior data scientist"),
  stringsAsFactors = FALSE
)
print(employees_info)
  emp_names emp_age salaries        emp_profession
1     Frank      43    1e+05        Data Scientist
2   Micheal      32    2e+05       Senior It Admin
3     Peter      40    5e+04  associate accountant
4   Melissa      41    5e+05     Excutive director
5    Esther      45    7e+04 junior data scientist
To find the class of an object, use the class() function. For example, class(employees_info) returns "data.frame".
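The class check described above can be sketched as follows. This is a minimal illustration that assumes a small stand-in data frame named emp with made-up values; is.data.frame() and inherits() are base R alternatives to comparing the result of class() directly.

```r
# Hypothetical stand-in for employees_info (illustrative values only)
emp <- data.frame(
  emp_names = c("Frank", "Peter"),
  emp_age = c(43, 40),
  stringsAsFactors = FALSE
)

class(emp)                   # "data.frame"
is.data.frame(emp)           # TRUE
inherits(emp, "data.frame")  # TRUE
```

is.data.frame() is often the more convenient choice inside an if () condition, because it always returns a single TRUE or FALSE.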
The structure of a data frame can be inspected with the str() function. For example, the structure of the data frame created in the previous example is shown like this:
str(employees_info)
'data.frame': 5 obs. of 4 variables:
$ emp_names : chr "Frank" "Micheal" "Peter" "Melissa" ...
$ emp_age : num 43 32 40 41 45
$ salaries : num 1e+05 2e+05 5e+04 5e+05 7e+04
$ emp_profession: chr "Data Scientist" "Senior It Admin" "associate accountant" "Excutive director" ...
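When only part of what str() reports is needed, the same information can be pulled out piece by piece. The sketch below assumes a three-column copy of the example data and uses the base R functions dim(), nrow(), ncol(), and sapply().

```r
# Three-column copy of the example data (emp_profession omitted for brevity)
employees_info <- data.frame(
  emp_names = c("Frank", "Micheal", "Peter", "Melissa", "Esther"),
  emp_age = c(43, 32, 40, 41, 45),
  salaries = c(1e+05, 2e+05, 50000, 5e+05, 70000),
  stringsAsFactors = FALSE
)

dim(employees_info)    # number of rows, then number of columns
nrow(employees_info)   # 5
ncol(employees_info)   # 3

# Class of every column, returned as a named character vector
col_classes <- sapply(employees_info, class)
col_classes
```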
The summary() function summarizes each variable in a data frame. As an example, apply summary() to the employees_info data frame created earlier. The emp_age and salaries variables are numeric, so summary() reports their minimum, quartiles, median, mean, and maximum, while emp_names and emp_profession are character variables, so only their length, class, and mode are reported.
summary(employees_info)
  emp_names            emp_age        salaries      emp_profession
 Length:5           Min.   :32.0   Min.   : 50000   Length:5
 Class :character   1st Qu.:40.0   1st Qu.: 70000   Class :character
 Mode  :character   Median :41.0   Median :100000   Mode  :character
                    Mean   :40.2   Mean   :184000
                    3rd Qu.:43.0   3rd Qu.:200000
                    Max.   :45.0   Max.   :500000
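The numeric part of the summary can also be reproduced for a single column. The following sketch assumes the salary values from the example stored in a plain vector named salaries, and reads individual statistics out of the summary by name.

```r
# Salary values from the example, as a plain numeric vector
salaries <- c(1e+05, 2e+05, 50000, 5e+05, 70000)

salary_summary <- summary(salaries)
salary_summary

# Individual statistics can be read by name
salary_summary[["Mean"]]    # 184000
salary_summary[["Median"]]  # 100000

# The same values computed directly
mean(salaries)
median(salaries)
```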
To find the row names of a data frame, apply the rownames() function to it. For example, rownames(employees_info) returns the name of each row of the employees_info data frame created earlier. Because no row names were supplied when the data frame was created, R returns the default row numbers as character strings.
rownames(employees_info)
[1] "1" "2" "3" "4" "5"
We can also use the rownames() function to set or change row names. For example, the following code assigns row names to employees_info:
rownames(employees_info) <- c("row1", "row2", "row3", "row4",
"row5")
employees_info
     emp_names emp_age salaries        emp_profession
row1     Frank      43    1e+05        Data Scientist
row2   Micheal      32    2e+05       Senior It Admin
row3     Peter      40    5e+04  associate accountant
row4   Melissa      41    5e+05     Excutive director
row5    Esther      45    7e+04 junior data scientist
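Once row names are set, they can be used for selection. The sketch below assumes a shortened, hypothetical data frame named emp; it selects by row name and then resets the names to the defaults by assigning NULL.

```r
# Shortened stand-in data frame (illustrative only)
emp <- data.frame(
  emp_names = c("Frank", "Micheal", "Peter"),
  emp_age = c(43, 32, 40),
  stringsAsFactors = FALSE
)
rownames(emp) <- c("row1", "row2", "row3")

emp["row2", ]           # the whole row named "row2", as a one-row data frame
emp["row2", "emp_age"]  # a single value: 32

# Assigning NULL restores the default row names "1", "2", "3"
rownames(emp) <- NULL
rownames(emp)
```

Row names must be unique, so assigning a vector with repeated values raises an error.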
Similarly, the colnames() function can be applied to any data frame to return the name of each variable. The following code returns the column names of the employees_info data frame. The dimnames() function returns both the row names and the column names as a list.
colnames(employees_info)
[1] "emp_names" "emp_age" "salaries" "emp_profession"
dimnames(employees_info)
[[1]]
[1] "row1" "row2" "row3" "row4" "row5"
[[2]]
[1] "emp_names" "emp_age" "salaries" "emp_profession"
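The relationship between these accessors can be checked directly. This is a small sketch on a hypothetical two-column data frame named emp; it also uses names(), which for a data frame returns the same values as colnames().

```r
# Hypothetical two-column data frame with explicit row names
emp <- data.frame(
  emp_names = c("Frank", "Micheal"),
  emp_age = c(43, 32),
  stringsAsFactors = FALSE
)
rownames(emp) <- c("row1", "row2")

dn <- dimnames(emp)

identical(dn[[1]], rownames(emp))      # TRUE: first element holds the row names
identical(dn[[2]], colnames(emp))      # TRUE: second element holds the column names
identical(names(emp), colnames(emp))   # TRUE for data frames
```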
Just as we assigned row names, we can rename the columns with the colnames() function, as in the following example:
colnames(employees_info) <- c("employee_names", "employee_age",
"employee_salaries", "employee_profession")
employees_info
     employee_names employee_age employee_salaries   employee_profession
row1          Frank           43             1e+05        Data Scientist
row2        Micheal           32             2e+05       Senior It Admin
row3          Peter           40             5e+04  associate accountant
row4        Melissa           41             5e+05     Excutive director
row5         Esther           45             7e+04 junior data scientist
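Renaming every column at once requires listing all the names in order. When only one column needs a new name, indexing into colnames() is a common alternative. The sketch below assumes a hypothetical data frame named emp and renames just its emp_age column.

```r
# Hypothetical data frame with the original column names
emp <- data.frame(
  emp_names = c("Frank", "Micheal"),
  emp_age = c(43, 32),
  stringsAsFactors = FALSE
)

# Rename only the column currently called "emp_age"
colnames(emp)[colnames(emp) == "emp_age"] <- "employee_age"

colnames(emp)  # "emp_names" "employee_age"
```

Matching on the old name rather than on a position keeps the code correct even if the column order changes later.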
Elements of a data frame are selected by indexing with [rows, columns]. For instance, the following selects the first two rows of the first two columns.
employees_info[1:2, 1:2]
     employee_names employee_age
row1          Frank           43
row2        Micheal           32
It is also possible to extract elements from a single column. In the following example, [[1]] extracts the first column of employees_info as a vector, and [1:3] then selects its first, second, and third elements.
employees_info[[1]][1:3]
[1] "Frank" "Micheal" "Peter"
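A few other common indexing forms are sketched below on a hypothetical two-column data frame named emp: the $ operator, selection with a logical condition, and the drop = FALSE argument, which keeps a single selected column as a data frame instead of simplifying it to a vector.

```r
# Hypothetical two-column version of the example data
emp <- data.frame(
  emp_names = c("Frank", "Micheal", "Peter", "Melissa", "Esther"),
  emp_age = c(43, 32, 40, 41, 45),
  stringsAsFactors = FALSE
)

# $ extracts a column by name as a vector
emp$emp_names[1:3]

# A logical condition in the row position keeps only matching rows
older <- emp[emp$emp_age > 40, ]
older$emp_names  # "Frank" "Melissa" "Esther"

# Selecting one column simplifies to a vector unless drop = FALSE is given
age_vector <- emp[, "emp_age"]
age_frame <- emp[, "emp_age", drop = FALSE]
```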
cbind() stands for column bind. The cbind() function combines vectors, matrices, and data frames by column. As a case in point, the following code uses cbind() to combine the data_info data frame with the vector v3 by column, producing a new data frame.
# Create a data frame
data_info <- data.frame(col1 = 1:3, col2 = c(2, 3, 4), col3 = c(120, 130, 150))
# Create a vector of decimal numbers
v3 <- c(12.23, 22.2, 30.3)
# Combine the data frame with the vector using cbind()
cbind(data_info, v3)
  col1 col2 col3    v3
1    1    2  120 12.23
2    2    3  130 22.20
3    3    4  150 30.30
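By default the new column takes the name of the variable passed to cbind(). The sketch below shows two variations under the same data: naming the column explicitly in the call (col4 is a name chosen here for illustration), and adding the column with $ assignment, which modifies the data frame instead of returning a new one.

```r
data_info <- data.frame(col1 = 1:3, col2 = c(2, 3, 4), col3 = c(120, 130, 150))
v3 <- c(12.23, 22.2, 30.3)

# Name the new column explicitly (col4 is an illustrative name)
combined <- cbind(data_info, col4 = v3)
colnames(combined)  # "col1" "col2" "col3" "col4"

# Alternative: add the column to a copy with $ assignment
data_copy <- data_info
data_copy$col4 <- v3
```

In both cases the vector needs one value per row of the data frame.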
Similarly, rbind() stands for row bind. The rbind() function combines vectors, matrices, and/or data frames by row. However, rbind() must be used with care, because it can change column classes if not handled properly. For instance, combining data_info with v3 would append the decimal values as a new row, which forces the integer column col1 to become numeric (R has no separate float class; decimal values are stored as numeric). To keep control over the type of each column, supply the new row as a data frame with matching column names rather than as a bare vector:
df <- data.frame(col1 = 4L, col2 = 5, col3 = 200)  # 4L keeps col1 an integer column
rbind(data_info, df)
  col1 col2 col3
1    1    2  120
2    2    3  130
3    3    4  150
4    4    5  200
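The effect on column classes can be checked with class(). The following sketch compares appending the vector v3 with appending a one-row data frame; the names with_vector, new_row, and with_frame are chosen here for illustration, and the new row uses the integer literal 4L so that col1 stays an integer column.

```r
data_info <- data.frame(col1 = 1:3, col2 = c(2, 3, 4), col3 = c(120, 130, 150))
v3 <- c(12.23, 22.2, 30.3)

class(data_info$col1)    # "integer", because 1:3 creates integers

# Appending the decimal vector as a row coerces col1 to numeric
with_vector <- rbind(data_info, v3)
class(with_vector$col1)  # "numeric"

# Appending a one-row data frame whose col1 is an integer keeps the class
new_row <- data.frame(col1 = 4L, col2 = 5, col3 = 200)
with_frame <- rbind(data_info, new_row)
class(with_frame$col1)   # "integer"
```

When data frames are combined with rbind(), their column names must match, although the columns may appear in a different order.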