A data frame is a table, or two-dimensional array-like structure, in which each column holds the values of one variable and each row holds one set of values, one from each column. A data frame is a special case of a list: a list of vectors that all have the same length.
The data.frame() function is used to create a data frame from vectors.
The c() function creates a vector.
The example below shows how a data frame is created:
x <- data.frame(
book_id = c(1001:1005),
book_author = c("Rick,M","Saltzman,D","Johnson,A","Jones,C","Garcia,K"),
book_publisher = c("Penguin","Penguin","Harper","Livre","Harper"),
stringsAsFactors = FALSE
)
# stringsAsFactors controls whether character columns in a data frame are converted to factors or kept as plain strings. The default was TRUE before R 4.0.0 and is FALSE from R 4.0.0 onward. A factor can be built from either numeric or character values.
print(x)
##   book_id book_author book_publisher
## 1    1001      Rick,M        Penguin
## 2    1002  Saltzman,D        Penguin
## 3    1003   Johnson,A         Harper
## 4    1004     Jones,C          Livre
## 5    1005    Garcia,K         Harper
In the example above, three vectors are created: book_id, book_author and book_publisher, each of length 5. The code therefore produces a data frame with 3 columns and 5 rows.
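As a quick check of the dimensions described above, the sketch below rebuilds the same data frame and inspects it with dim(), nrow(), ncol() and names(). The is.list() and length() calls illustrate the earlier point that a data frame is a special case of a list. This is only a minimal illustration and is not part of the original example.

```r
# Rebuild the example data frame so this block runs on its own
x <- data.frame(
  book_id = c(1001:1005),
  book_author = c("Rick,M","Saltzman,D","Johnson,A","Jones,C","Garcia,K"),
  book_publisher = c("Penguin","Penguin","Harper","Livre","Harper"),
  stringsAsFactors = FALSE
)

dim(x)      # number of rows, then number of columns
## [1] 5 3
nrow(x)     # rows only
ncol(x)     # columns only
names(x)    # column names

is.list(x)  # a data frame is stored as a list of columns
## [1] TRUE
length(x)   # so its length is the number of columns
## [1] 3
```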
Structure and Summary of a Data Frame
Use the str() function to see the structure of a data frame:
str(x)
## 'data.frame':  5 obs. of  3 variables:
##  $ book_id       : int  1001 1002 1003 1004 1005
##  $ book_author   : chr  "Rick,M" "Saltzman,D" "Johnson,A" "Jones,C" ...
##  $ book_publisher: chr  "Penguin" "Penguin" "Harper" "Livre" ...
The summary() function shows a statistical summary of each column in a data frame:
summary(x)
##     book_id     book_author        book_publisher
##  Min.   :1001   Length:5           Length:5
##  1st Qu.:1002   Class :character   Class :character
##  Median :1003   Mode  :character   Mode  :character
##  Mean   :1003
##  3rd Qu.:1004
##  Max.   :1005
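The summary above reports only Length, Class and Mode for the text columns because they were kept as plain strings. The sketch below shows, under that assumption, how the stringsAsFactors setting changes what summary() reports for a text column. The names books_chr and books_fct are made up for this illustration.

```r
publisher <- c("Penguin","Penguin","Harper","Livre","Harper")

# Same data, stored once as plain strings and once as a factor
# (books_chr and books_fct are illustrative names only)
books_chr <- data.frame(book_publisher = publisher, stringsAsFactors = FALSE)
books_fct <- data.frame(book_publisher = publisher, stringsAsFactors = TRUE)

class(books_chr$book_publisher)
## [1] "character"
class(books_fct$book_publisher)
## [1] "factor"

# A factor stores the distinct values as levels
levels(books_fct$book_publisher)
## [1] "Harper"  "Livre"   "Penguin"

# For a factor column, summary() counts how often each level occurs
summary(books_fct$book_publisher)
##  Harper   Livre Penguin
##       2       1       2
```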
Extract, Add and Remove data from a data frame
The following sections show how to extract data, add rows or columns, and remove rows or columns.
1. Accessing or Extracting Data
There are a few ways to extract data from a data frame. You can extract a single cell, an entire row or an entire column.
The examples below show the syntax for each case.
## extract the value in row 4, column 2 (book_author)
x[4,2]
## [1] "Jones,C"
## all information from row 3 is being extracted
x[3,]
##   book_id book_author book_publisher
## 3    1003   Johnson,A         Harper
## all information from column 3 is being extracted
x[,3]
## [1] "Penguin" "Penguin" "Harper" "Livre" "Harper"
Use the $ (dollar sign) operator to access a specific column by name:
data.frame(x$book_author)
##   x.book_author
## 1        Rick,M
## 2    Saltzman,D
## 3     Johnson,A
## 4       Jones,C
## 5      Garcia,K
data.frame(x$book_author[1:3])
##   x.book_author.1.3.
## 1             Rick,M
## 2         Saltzman,D
## 3          Johnson,A
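Besides numeric indices and $, base R offers a few other common ways to pull data out of a data frame. The sketch below rebuilds the example data frame and shows selection by column name, by logical condition, and with drop = FALSE, which keeps a single column as a data frame with its original name. Treat it as an illustration of the options rather than a complete list.

```r
# Rebuild the example data frame so this block runs on its own
x <- data.frame(
  book_id = c(1001:1005),
  book_author = c("Rick,M","Saltzman,D","Johnson,A","Jones,C","Garcia,K"),
  book_publisher = c("Penguin","Penguin","Harper","Livre","Harper"),
  stringsAsFactors = FALSE
)

# [[ ]] with a column name returns the same vector as x$book_author
x[["book_author"]]

# Columns can be selected by name instead of by position
x[, "book_publisher"]

# drop = FALSE keeps the result as a one-column data frame
x[, 3, drop = FALSE]

# A logical condition in the row position keeps only the matching rows
x[x$book_publisher == "Penguin", ]

# Rows and columns can be selected together
x[2:3, c("book_id", "book_author")]
```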
2. Add Row/Column
## Add a column
## To add a column, assign a vector of values to a new column name
x$book_name <- c("A","B","C","D","E")
book_info <- x
print(book_info)
##   book_id book_author book_publisher book_name
## 1    1001      Rick,M        Penguin         A
## 2    1002  Saltzman,D        Penguin         B
## 3    1003   Johnson,A         Harper         C
## 4    1004     Jones,C          Livre         D
## 5    1005    Garcia,K         Harper         E
## Add rows: first build a data frame with the same columns as x
book.newdata <- data.frame(
book_id = c(1006:1007),
book_author = c("Smith,L","Rivera,B"),
book_publisher = c("Penguin","Mifflin"),
book_name = c("K","L")
)
# Bind book.newdata to x
# rbind() combines vectors, matrices or data frames by rows.
# cbind() does the same by columns.
allbooks <- rbind(x,book.newdata)
print(allbooks)
##   book_id book_author book_publisher book_name
## 1    1001      Rick,M        Penguin         A
## 2    1002  Saltzman,D        Penguin         B
## 3    1003   Johnson,A         Harper         C
## 4    1004     Jones,C          Livre         D
## 5    1005    Garcia,K         Harper         E
## 6    1006     Smith,L        Penguin         K
## 7    1007    Rivera,B        Mifflin         L
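The comments above mention cbind() without showing it. The sketch below uses cbind() to add the same book_name column, and then shows what happens when rbind() is given a data frame whose column names differ. The names with_name, bad and msg are made up for this illustration, and tryCatch() is used only so the error does not stop the script.

```r
# Rebuild the example data frame so this block runs on its own
x <- data.frame(
  book_id = c(1001:1005),
  book_author = c("Rick,M","Saltzman,D","Johnson,A","Jones,C","Garcia,K"),
  book_publisher = c("Penguin","Penguin","Harper","Livre","Harper"),
  stringsAsFactors = FALSE
)

# cbind() adds a column; the new vector needs one value per existing row
with_name <- cbind(x, book_name = c("A","B","C","D","E"))
names(with_name)
## [1] "book_id"        "book_author"    "book_publisher" "book_name"

# rbind() on data frames expects the same column names in both inputs.
# Here the second column is called "author" instead of "book_author",
# so rbind() fails and the error message is captured in msg.
bad <- data.frame(
  book_id = 1006L,
  author = "Smith,L",
  book_publisher = "Penguin",
  stringsAsFactors = FALSE
)
msg <- tryCatch(
  {
    rbind(x, bad)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```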
3. Remove Row/Column
To remove rows, specify their indices with a minus sign:
## Remove rows using an index
removerow <- allbooks[-c(4:7),] # removes rows 4 to 7, i.e. 4 books
print(removerow)
##   book_id book_author book_publisher book_name
## 1    1001      Rick,M        Penguin         A
## 2    1002  Saltzman,D        Penguin         B
## 3    1003   Johnson,A         Harper         C
To remove a column, you can use its name or its column index:
## Remove a column using its name
dropc <- subset(removerow,select=-c(book_name))
print(dropc)
##   book_id book_author book_publisher
## 1    1001      Rick,M        Penguin
## 2    1002  Saltzman,D        Penguin
## 3    1003   Johnson,A         Harper
# Remove a column using its index. Here the column with index 3, book_publisher, is removed.
dropnew <- removerow[-c(3)]
## Note the small difference between removing rows and columns by index:
## to remove rows, use [-c(i:j),]
## to remove columns, use [-c(i:j)]
## the comma determines whether the index refers to rows or columns
print(dropnew)
##   book_id book_author book_name
## 1    1001      Rick,M         A
## 2    1002  Saltzman,D         B
## 3    1003   Johnson,A         C
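Rows and columns can also be removed without counting index positions. The sketch below shows three base R alternatives on the example data frame: assigning NULL to a column, filtering columns by name, and dropping rows with a logical condition. The names no_pub, by_name and not_penguin are made up for this illustration.

```r
# Rebuild the example data frame so this block runs on its own
x <- data.frame(
  book_id = c(1001:1005),
  book_author = c("Rick,M","Saltzman,D","Johnson,A","Jones,C","Garcia,K"),
  book_publisher = c("Penguin","Penguin","Harper","Livre","Harper"),
  stringsAsFactors = FALSE
)

# Assigning NULL to a column removes it (done on a copy, so x is unchanged)
no_pub <- x
no_pub$book_publisher <- NULL
names(no_pub)
## [1] "book_id"     "book_author"

# Keep every column whose name is not in the list of names to drop
by_name <- x[!(names(x) %in% c("book_publisher"))]
names(by_name)
## [1] "book_id"     "book_author"

# Remove rows by condition: keep only books not published by Penguin
not_penguin <- x[x$book_publisher != "Penguin", ]
nrow(not_penguin)
## [1] 3
```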