This assignment demonstrates the data frame class in R. In summary, a data frame is characterised as follows:
A data frame can be created in several ways. Read functions import data from external files, and several functions are available for this:
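The following imports a CSV file from the working directory. Because header = FALSE is passed, the first line of the file is read as data and the columns are named V1 and V2.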
flavours <- read.csv('flavors.csv', header = FALSE)
head(flavours)
## V1 V2
## 1 celery vegetable
## 2 corn vegetable
## 3 cucumber vegetable
## 4 horseradish vegetable
## 5 vegetable vegetable
## 6 potato vegetable
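The effect of the header argument can be checked on a small throwaway file. The sketch below writes a two-column CSV to a temporary file and reads it back both ways. The file and its contents are made up for illustration and are not the flavors.csv used above.

```r
# Write a tiny, hypothetical CSV to a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,group", "celery,vegetable", "corn,vegetable"), tmp)

# header = FALSE: the first line is treated as data; columns are named V1, V2.
no_header <- read.csv(tmp, header = FALSE)

# header = TRUE (the read.csv default): the first line supplies the column names.
with_header <- read.csv(tmp, header = TRUE)

print(names(no_header))
print(nrow(no_header))
print(names(with_header))
print(nrow(with_header))

unlink(tmp)  # remove the temporary file
```

With header = FALSE the header line counts as a row, so the first data frame has one more row than the second.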
A data frame can also be built from vectors with the data.frame(..) function, where each vector becomes a column. The vectors should all have the same length. The variable names of the vectors become the column names.
eid <- 1:5
fn <- c('Jon', 'Rahul', 'Alanah', 'Mike', 'Jacob')
ln <- c('Smith', 'Kohli', 'Pearce', 'Peake', 'Fullerton')
employees <- data.frame(eid, fn, ln)
print(employees)
## eid fn ln
## 1 1 Jon Smith
## 2 2 Rahul Kohli
## 3 3 Alanah Pearce
## 4 4 Mike Peake
## 5 5 Jacob Fullerton
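Column names can also be supplied as argument names in data.frame(..), and a length mismatch is reported as an error. A minimal sketch, using made-up vectors rather than the employee data:

```r
ids <- 1:3
first <- c("Ann", "Bob", "Cy")

# Named arguments set the column names at creation time.
staff <- data.frame(id = ids, first.name = first)
print(names(staff))

# Vectors whose lengths cannot be recycled to a common number of rows
# are rejected; tryCatch() turns the error into a marker string here.
result <- tryCatch(
  data.frame(id = 1:3, first.name = c("Ann", "Bob")),
  error = function(e) "error"
)
print(result)
```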
The columns can also be renamed with the names(..) function.
names(employees) <- c('employee.id', 'first.name', 'last.name')
print(employees)
## employee.id first.name last.name
## 1 1 Jon Smith
## 2 2 Rahul Kohli
## 3 3 Alanah Pearce
## 4 4 Mike Peake
## 5 5 Jacob Fullerton
The rows can be named with the row.names(..) function.
row.names(employees) <- c('J.S', 'R.K', 'A.P', 'M.P', 'J.F')
print(employees)
## employee.id first.name last.name
## J.S 1 Jon Smith
## R.K 2 Rahul Kohli
## A.P 3 Alanah Pearce
## M.P 4 Mike Peake
## J.F 5 Jacob Fullerton
To confirm the variable type, the class(..) function can be used. It will return “data.frame” for data frames.
class(employees)
## [1] "data.frame"
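There are multiple ways to summarise a data frame. str(..) shows its structure (the type and first few values of each column), and summary(..) gives summary statistics for each column.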
str(employees)
## 'data.frame': 5 obs. of 3 variables:
## $ employee.id: int 1 2 3 4 5
## $ first.name : chr "Jon" "Rahul" "Alanah" "Mike" ...
## $ last.name : chr "Smith" "Kohli" "Pearce" "Peake" ...
summary(employees)
## employee.id first.name last.name
## Min. :1 Length:5 Length:5
## 1st Qu.:2 Class :character Class :character
## Median :3 Mode :character Mode :character
## Mean :3
## 3rd Qu.:4
## Max. :5
The head(..) and tail(..) functions return the first and last few rows (six by default).
head(flavours)
## V1 V2
## 1 celery vegetable
## 2 corn vegetable
## 3 cucumber vegetable
## 4 horseradish vegetable
## 5 vegetable vegetable
## 6 potato vegetable
tail(flavours)
## V1 V2
## 851 wet NULL
## 852 wild NULL
## 853 wine-lee NULL
## 854 winey NULL
## 855 yeasty NULL
## 856 ylang NULL
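The number of rows returned by head(..) and tail(..) can be changed with the n argument. A short sketch on a made-up ten-row data frame:

```r
# A small, hypothetical data frame with ten rows.
df <- data.frame(x = 1:10, y = letters[1:10])

first_three <- head(df, n = 3)  # rows 1 to 3
last_two <- tail(df, n = 2)     # rows 9 and 10

print(first_three)
print(last_two)
```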
The function ncol(..) returns the number of columns.
ncol(employees)
## [1] 3
The function nrow(..) returns the number of rows.
nrow(flavours)
## [1] 856
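Both counts can also be read in one call with dim(..), which is not used elsewhere in this assignment. A minimal sketch on a made-up data frame:

```r
df <- data.frame(x = 1:4, y = c("a", "b", "c", "d"))

# dim() returns c(number of rows, number of columns).
print(dim(df))
```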
In RStudio, the View(..) function shows the data frame in a separate tab.
#View(flavours)
A data frame column can be accessed by passing a single index number in single square brackets. This returns a data frame.
a <- flavours[1]
print(head(a))
## V1
## 1 celery
## 2 corn
## 3 cucumber
## 4 horseradish
## 5 vegetable
## 6 potato
If a single index number is passed in double square brackets, a vector is returned instead.
b <- flavours[[2]]
print(head(b))
## [1] "vegetable" "vegetable" "vegetable" "vegetable" "vegetable" "vegetable"
A column can also be accessed by name. The $ operator returns a vector, while a name in single square brackets returns a data frame.
c <- employees$first.name
print(c)
## [1] "Jon" "Rahul" "Alanah" "Mike" "Jacob"
d <- employees['first.name']
print(d)
## first.name
## J.S Jon
## R.K Rahul
## A.P Alanah
## M.P Mike
## J.F Jacob
Similarly, if double square brackets are used with a name, a vector is returned.
e <- employees[['first.name']]
print(e)
## [1] "Jon" "Rahul" "Alanah" "Mike" "Jacob"
Another way to access a column is row and column indexing, with the row index left empty.
f <- employees[,2]
print(f)
## [1] "Jon" "Rahul" "Alanah" "Mike" "Jacob"
This form returns a vector. Passing drop=FALSE returns a data frame instead.
g <- employees[,2,drop=FALSE]
print(g)
## first.name
## J.S Jon
## R.K Rahul
## A.P Alanah
## M.P Mike
## J.F Jacob
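The access forms above differ mainly in whether they return a data frame or a vector. The following sketch checks each form with is.data.frame(..) on a small made-up data frame:

```r
df <- data.frame(id = 1:3, name = c("Ann", "Bob", "Cy"))

# Each entry records whether the access form returns a data frame.
returns_df <- c(
  single_bracket = is.data.frame(df["name"]),
  double_bracket = is.data.frame(df[["name"]]),
  dollar         = is.data.frame(df$name),
  row_col        = is.data.frame(df[, "name"]),
  row_col_nodrop = is.data.frame(df[, "name", drop = FALSE])
)
print(returns_df)
```

Only the single-bracket form and the drop = FALSE form keep the data frame structure; the others return the column as a vector.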
To access a row, use row and column indexing with the column index left empty.
a <- flavours[1,]
print(a)
## V1 V2
## 1 celery vegetable
Multiple rows can be accessed by passing an index vector.
b <- flavours[1:6,]
print(b)
## V1 V2
## 1 celery vegetable
## 2 corn vegetable
## 3 cucumber vegetable
## 4 horseradish vegetable
## 5 vegetable vegetable
## 6 potato vegetable
c <- flavours[c(3,6),]
print(c)
## V1 V2
## 3 cucumber vegetable
## 6 potato vegetable
When a single row is selected, passing drop=TRUE returns a list (one element per column) instead of a data frame.
d <- flavours[1,,drop=TRUE]
print(d)
## $V1
## [1] "celery"
##
## $V2
## [1] "vegetable"
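The difference between the two row results can be checked directly. The sketch below, on a made-up data frame, also shows unlist(..) as one possible way to flatten the list into an atomic vector, at the cost of coercing mixed column types to a common type.

```r
df <- data.frame(id = 1:3, name = c("Ann", "Bob", "Cy"))

row_df <- df[1, ]                 # one-row data frame
row_list <- df[1, , drop = TRUE]  # plain list, one element per column

print(is.data.frame(row_df))
print(is.data.frame(row_list))
print(is.list(row_list))

# unlist() flattens the list into an atomic vector; the integer id and the
# character name are coerced to a common type (character).
row_vec <- unlist(row_list)
print(row_vec)
```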
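A new row can be added with the rbind(..) function. Supplying the row as a one-row data frame keeps each column's type, whereas a plain c(..) vector mixing numbers and strings would coerce every value to character.
employees <- rbind(employees, data.frame(employee.id = 6L, first.name = 'James', last.name = 'Willems'))
row.names(employees)[6] <- 'J.W'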
print(employees)
## employee.id first.name last.name
## J.S 1 Jon Smith
## R.K 2 Rahul Kohli
## A.P 3 Alanah Pearce
## M.P 4 Mike Peake
## J.F 5 Jacob Fullerton
## J.W 6 James Willems
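When appending rows, it is worth checking that the column types survive. The sketch below, on a made-up data frame, compares appending a row built with c(..) against appending a one-row data frame.

```r
df <- data.frame(id = 1:2, name = c("Ann", "Bob"))

# c() coerces 3 and "Cy" to a common type (character), so the id column
# becomes character after rbind().
from_vector <- rbind(df, c(3, "Cy"))
print(class(from_vector$id))

# A one-row data frame keeps each column's type.
from_df <- rbind(df, data.frame(id = 3L, name = "Cy"))
print(class(from_df$id))
```

The printed table looks the same in both cases, so class(..) or str(..) is the reliable way to spot the change.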
A new column can be added with the cbind(..) function.
employees <- cbind(employees, position = c('Manager','Assistant Manager','Producer','Editor','Editor','Editor'))
print(employees)
## employee.id first.name last.name position
## J.S 1 Jon Smith Manager
## R.K 2 Rahul Kohli Assistant Manager
## A.P 3 Alanah Pearce Producer
## M.P 4 Mike Peake Editor
## J.F 5 Jacob Fullerton Editor
## J.W 6 James Willems Editor
A new column can also be added by assigning to a new column name.
employees$salary <- c(8000,7000,5000,3000,3000,3000)
print(employees)
## employee.id first.name last.name position salary
## J.S 1 Jon Smith Manager 8000
## R.K 2 Rahul Kohli Assistant Manager 7000
## A.P 3 Alanah Pearce Producer 5000
## M.P 4 Mike Peake Editor 3000
## J.F 5 Jacob Fullerton Editor 3000
## J.W 6 James Willems Editor 3000
A single value can be changed by assigning to its row and column index.
employees[4,4] <- 'Designer'
print(employees)
## employee.id first.name last.name position salary
## J.S 1 Jon Smith Manager 8000
## R.K 2 Rahul Kohli Assistant Manager 7000
## A.P 3 Alanah Pearce Producer 5000
## M.P 4 Mike Peake Designer 3000
## J.F 5 Jacob Fullerton Editor 3000
## J.W 6 James Willems Editor 3000
A Boolean condition can be used to select and change certain values.
employees$salary <- ifelse(employees$employee.id == 4, 4500, employees$salary)
print(employees)
## employee.id first.name last.name position salary
## J.S 1 Jon Smith Manager 8000
## R.K 2 Rahul Kohli Assistant Manager 7000
## A.P 3 Alanah Pearce Producer 5000
## M.P 4 Mike Peake Designer 4500
## J.F 5 Jacob Fullerton Editor 3000
## J.W 6 James Willems Editor 3000
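An alternative to ifelse(..) is to assign through a logical index, which touches only the matching elements. A minimal sketch on a made-up data frame:

```r
df <- data.frame(id = 1:4, salary = c(8000, 7000, 5000, 3000))

# The logical vector df$id == 4 picks the matching rows;
# only those elements of salary are replaced.
df$salary[df$id == 4] <- 4500
print(df$salary)
```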
A row can be deleted by passing a negative index.
employees <- employees[-6,]
print(employees)
## employee.id first.name last.name position salary
## J.S 1 Jon Smith Manager 8000
## R.K 2 Rahul Kohli Assistant Manager 7000
## A.P 3 Alanah Pearce Producer 5000
## M.P 4 Mike Peake Designer 4500
## J.F 5 Jacob Fullerton Editor 3000
There are two ways to delete a column: pass a negative index, or assign NULL to the column. The code below uses the first to drop salary (column 5) and the second to drop position (column 4).
employees <- employees[,-5]
employees[4] <- NULL
print(employees)
## employee.id first.name last.name
## J.S 1 Jon Smith
## R.K 2 Rahul Kohli
## A.P 3 Alanah Pearce
## M.P 4 Mike Peake
## J.F 5 Jacob Fullerton
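Deleting by position depends on the current column order, so deleting by name may be safer when the layout can change. The following sketch, on a made-up data frame, shows two name-based variants; the drop_cols variable is introduced here only for illustration.

```r
df <- data.frame(id = 1:3, name = c("Ann", "Bob", "Cy"), salary = c(10, 20, 30))

# Assigning NULL to a named column removes it.
df$salary <- NULL
print(names(df))

# Keep every column whose name is not in the (hypothetical) drop list.
# drop = FALSE keeps the result a data frame even if one column remains.
drop_cols <- c("name")
df <- df[, !(names(df) %in% drop_cols), drop = FALSE]
print(names(df))
```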