Data Frame

A data frame is a table, or two-dimensional array-like structure, in which each column contains the values of one variable and each row contains one set of values, one from each column. A data frame is a special case of a list: each column is one element of the list, and all columns have the same length.
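One quick way to confirm that a data frame is a list of equal-length column vectors is to inspect it with is.list() and length(). The small data frame `df` below is a hypothetical example used only for this check, not part of the book data used later.

```r
# A hypothetical two-column data frame, used only for illustration
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

is.list(df)        # TRUE: a data frame is stored as a list
length(df)         # 2: one list element per column
class(df[["id"]])  # "integer": each list element is a column vector
nrow(df)           # 3: every column has the same length
```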

The data.frame() function creates a data frame from one or more vectors.

The c() function combines values into a vector.

The following example shows how a data frame is created:

x <- data.frame(
  book_id = c(1001:1005),
  book_author = c("Rick,M","Saltzman,D","Johnson,A","Jones,C","Garcia,K"),
  book_publisher = c("Penguin","Penguin","Harper","Livre","Harper"),
  stringsAsFactors = FALSE
)

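# stringsAsFactors controls whether character vectors are converted to factors
# (categorical variables stored as integer codes with labels called levels) or
# kept as plain strings. A factor can be built from either numbers or strings.
# The default was TRUE before R 4.0.0 and is FALSE from R 4.0.0 onward, so
# setting it explicitly gives the same result on any version.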
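To see what stringsAsFactors changes, the sketch below builds the same column both ways and checks the resulting class. The vector `pubs` and the names `as_factor` and `as_string` are hypothetical, chosen only for illustration.

```r
# Hypothetical publisher names for illustration
pubs <- c("Penguin", "Penguin", "Harper")

as_factor <- data.frame(publisher = pubs, stringsAsFactors = TRUE)
as_string <- data.frame(publisher = pubs, stringsAsFactors = FALSE)

class(as_factor$publisher)        # "factor"
levels(as_factor$publisher)       # "Harper" "Penguin" (sorted unique values)
as.integer(as_factor$publisher)   # 2 2 1: the integer codes behind the factor
class(as_string$publisher)        # "character"
```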

print(x)
##   book_id book_author book_publisher
## 1    1001      Rick,M        Penguin
## 2    1002  Saltzman,D        Penguin
## 3    1003   Johnson,A         Harper
## 4    1004     Jones,C          Livre
## 5    1005    Garcia,K         Harper

In the above example, three vectors are created (book_id, book_author and book_publisher), each of length 5. The result is a data frame with 3 columns and 5 rows.
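The shape of a data frame can also be checked directly with dim(), nrow() and ncol(). The sketch below rebuilds x from the example above so that it runs on its own.

```r
# Same data frame as in the example above
x <- data.frame(
  book_id = 1001:1005,
  book_author = c("Rick,M", "Saltzman,D", "Johnson,A", "Jones,C", "Garcia,K"),
  book_publisher = c("Penguin", "Penguin", "Harper", "Livre", "Harper"),
  stringsAsFactors = FALSE
)

dim(x)    # 5 3: rows first, then columns
nrow(x)   # 5
ncol(x)   # 3
names(x)  # "book_id" "book_author" "book_publisher"
```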

Structure and Summary of a Data Frame

Use the str() function to display the structure of a data frame: its dimensions, plus the type and first few values of each column.

str(x)
## 'data.frame':    5 obs. of  3 variables:
##  $ book_id       : int  1001 1002 1003 1004 1005
##  $ book_author   : chr  "Rick,M" "Saltzman,D" "Johnson,A" "Jones,C" ...
##  $ book_publisher: chr  "Penguin" "Penguin" "Harper" "Livre" ...

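The summary() function shows a statistical summary of each column: minimum, quartiles, mean and maximum for numeric columns, and length, class and mode for character columns.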

summary(x)
##     book_id     book_author        book_publisher    
##  Min.   :1001   Length:5           Length:5          
##  1st Qu.:1002   Class :character   Class :character  
##  Median :1003   Mode  :character   Mode  :character  
##  Mean   :1003                                        
##  3rd Qu.:1004                                        
##  Max.   :1005
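summary() adapts to the type of each column. For a factor it counts how often each level occurs, which is often more informative than the Length/Class/Mode lines shown for character columns. The sketch below converts the publisher values to a factor purely for illustration; it does not modify x.

```r
# Publisher values from the example, converted to a factor for illustration
publisher <- factor(c("Penguin", "Penguin", "Harper", "Livre", "Harper"))

# summary() on a factor returns the count of each level
counts <- summary(publisher)
counts   # Harper 2, Livre 1, Penguin 2

# table() gives the same counts for a plain character vector
table(c("Penguin", "Penguin", "Harper", "Livre", "Harper"))
```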

Extract, Add and Remove Data from a Data Frame

The following sections show how to extract data, add rows or columns, and remove rows or columns.

1. Accessing or Extracting Data

There are several ways to extract data from a data frame: from a single cell, an entire row or an entire column.

The syntax and an example for each are shown below.

  1. table1[x, y], where x is the row and y is the column
# extracting row 4 in column 2 (book_author)
x[4,2]
## [1] "Jones,C"
  2. table1[x, ], where only row x is extracted
# all information from row 3 is being extracted

x[3,]
##   book_id book_author book_publisher
## 3    1003   Johnson,A         Harper
  3. table1[, y], where only column y is extracted
# all information from column 3 is being extracted

x[,3]
## [1] "Penguin" "Penguin" "Harper"  "Livre"   "Harper"
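  4. table1$col1, which extracts column col1 from table1.

Use the $ (dollar sign) operator to access a specific column by name. On its own, x$book_author returns a character vector; it is wrapped in data.frame() here only so that the result prints as a column.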

data.frame(x$book_author)
##   x.book_author
## 1        Rick,M
## 2    Saltzman,D
## 3     Johnson,A
## 4       Jones,C
## 5      Garcia,K
  5. table1$col1[x], which extracts element(s) x of column col1 (here, rows 1 to 3)
data.frame(x$book_author[1:3])
##   x.book_author.1.3.
## 1             Rick,M
## 2         Saltzman,D
## 3          Johnson,A
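Rows and columns can also be selected by name or with a logical condition, and drop = FALSE keeps a single-column result as a data frame instead of simplifying it to a vector. The sketch below rebuilds x so it runs on its own; the variable `harper` is a hypothetical name for the filtered result.

```r
x <- data.frame(
  book_id = 1001:1005,
  book_author = c("Rick,M", "Saltzman,D", "Johnson,A", "Jones,C", "Garcia,K"),
  book_publisher = c("Penguin", "Penguin", "Harper", "Livre", "Harper"),
  stringsAsFactors = FALSE
)

# Select a cell by row number and column name
x[4, "book_author"]                   # "Jones,C"

# [[ ]] returns a column as a vector, like $
x[["book_publisher"]]

# A single column selected with [, y] is simplified to a vector;
# drop = FALSE keeps it as a one-column data frame
is.data.frame(x[, 3])                 # FALSE
is.data.frame(x[, 3, drop = FALSE])   # TRUE

# Logical condition: all rows whose publisher is "Harper"
harper <- x[x$book_publisher == "Harper", ]
harper$book_id                        # 1003 1005
```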

2. Add Row/Column

# Add a column
# To add a column, assign a vector with one value per row to a new column name

x$book_name <- c("A","B","C","D","E")

book_info <- x

print(book_info)
##   book_id book_author book_publisher book_name
## 1    1001      Rick,M        Penguin         A
## 2    1002  Saltzman,D        Penguin         B
## 3    1003   Johnson,A         Harper         C
## 4    1004     Jones,C          Livre         D
## 5    1005    Garcia,K         Harper         E
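A column can also be added with [[ ]] or with cbind(), and a single value is recycled to every row. The sketch below starts from a fresh copy of the original three-column x; the column names `in_stock` and `year` and their values are hypothetical, used only for illustration.

```r
x <- data.frame(
  book_id = 1001:1005,
  book_author = c("Rick,M", "Saltzman,D", "Johnson,A", "Jones,C", "Garcia,K"),
  book_publisher = c("Penguin", "Penguin", "Harper", "Livre", "Harper"),
  stringsAsFactors = FALSE
)

# [[ ]] with a new name adds a column, just like $
x[["in_stock"]] <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# cbind() appends a named column; the single value 2020 is recycled to all rows
x <- cbind(x, year = 2020)

ncol(x)   # 5
x$year    # 2020 2020 2020 2020 2020
```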
# To add rows, create a data frame with the same column names and bind it to x

book.newdata <- data.frame(
  book_id = c(1006:1007),
  book_author = c("Smith,L","Rivera,B"),
  book_publisher = c("Penguin","Mifflin"),
  book_name = c("K","L")
)

# Bind book.newdata to x
# rbind() combines vectors, matrices or data frames by rows; for data frames the column names must match.
# cbind() combines them by columns.

allbooks <- rbind(x,book.newdata)
print(allbooks)
##   book_id book_author book_publisher book_name
## 1    1001      Rick,M        Penguin         A
## 2    1002  Saltzman,D        Penguin         B
## 3    1003   Johnson,A         Harper         C
## 4    1004     Jones,C          Livre         D
## 5    1005    Garcia,K         Harper         E
## 6    1006     Smith,L        Penguin         K
## 7    1007    Rivera,B        Mifflin         L
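For data frames, rbind() matches columns by name rather than by position, so the new rows may list their columns in a different order but must use the same names. The sketch below uses small hypothetical data frames `a`, `b`, `b2` and `bad` to show both cases.

```r
a   <- data.frame(id = 1:2, name = c("A", "B"), stringsAsFactors = FALSE)
b   <- data.frame(id = 3L, name = "C", stringsAsFactors = FALSE)
b2  <- data.frame(name = "C", id = 3L, stringsAsFactors = FALSE)  # columns in another order
bad <- data.frame(id = 4L, title = "D", stringsAsFactors = FALSE) # "title" does not match "name"

ok <- rbind(a, b)
nrow(ok)            # 3

# Matching is by name, so the different column order in b2 still works
rbind(a, b2)$name   # "A" "B" "C"

# A column name that does not match raises an error
failed <- inherits(tryCatch(rbind(a, bad), error = function(e) e), "error")
failed              # TRUE
```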

3. Remove Row/Column

To remove rows, use a negative row index:

# Remove rows using a negative index

removerow <- allbooks[-c(4:7),] # removes rows 4 to 7, i.e. 4 books
print(removerow)
##   book_id book_author book_publisher book_name
## 1    1001      Rick,M        Penguin         A
## 2    1002  Saltzman,D        Penguin         B
## 3    1003   Johnson,A         Harper         C

To remove a column, use either its name or its index.

# Remove a column by name

dropc <- subset(removerow, select = -c(book_name))

print(dropc)
##   book_id book_author book_publisher
## 1    1001      Rick,M        Penguin
## 2    1002  Saltzman,D        Penguin
## 3    1003   Johnson,A         Harper
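Another common idiom is to assign NULL to a column, which removes it from the existing data frame instead of returning a new one as subset() does. The sketch below uses a hypothetical `books` data frame with the same contents as removerow; `keep` is an illustrative helper variable.

```r
books <- data.frame(
  book_id = 1001:1003,
  book_author = c("Rick,M", "Saltzman,D", "Johnson,A"),
  book_publisher = c("Penguin", "Penguin", "Harper"),
  book_name = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Assigning NULL deletes the column from books itself
books$book_name <- NULL
names(books)   # "book_id" "book_author" "book_publisher"

# Several columns can be dropped by name with a logical index
keep <- !(names(books) %in% c("book_author", "book_publisher"))
books[keep]    # only book_id remains
```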
# Remove a column by index. Here column 3 (book_publisher) is removed

dropnew <- removerow[-c(3)]

# Removing rows and removing columns by index differ only in the comma:
# To remove rows:    df[-c(...), ]
# To remove columns: df[-c(...)]   (or df[, -c(...)])
# An index before the comma selects rows; without a comma, the data frame
# is indexed like a list of its columns.

print(dropnew)
##   book_id book_author book_name
## 1    1001      Rick,M         A
## 2    1002  Saltzman,D         B
## 3    1003   Johnson,A         C
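The role of the comma can be checked directly: df[-3] and df[, -3] both drop the third column, while df[-3, ] drops the third row. The data frame `df` below is a hypothetical stand-in for removerow without the book_name column.

```r
df <- data.frame(
  book_id = 1001:1003,
  book_author = c("Rick,M", "Saltzman,D", "Johnson,A"),
  book_publisher = c("Penguin", "Penguin", "Harper"),
  stringsAsFactors = FALSE
)

# No comma: the data frame is indexed like a list of columns
names(df[-3])        # "book_id" "book_author"

# Empty row index before the comma: the same columns are kept
names(df[, -3])      # "book_id" "book_author"

# Index before the comma: rows are selected, so row 3 is dropped
nrow(df[-3, ])       # 2
df[-3, ]$book_id     # 1001 1002
```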