WQD7004 Individual Assignment

Data Frame

Submission by: Mohd Anas Ahmad (s2001089)

This assignment demonstrates the data frame class in R. In summary, a data frame has the following characteristics:

Two-dimensional
All columns have the same length, so every row has the same number of values. This makes it a table.
Each column can store a different class of object.
Rows and columns can have names.
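
These characteristics can be checked directly in R. The following is a minimal sketch on a small made-up data frame; the name `df` and its values are assumptions for illustration only and are not part of the assignment data.

```r
# Illustrative data frame; names and values are assumptions for this sketch.
df <- data.frame(id = 1:3,
                 name = c("a", "b", "c"),
                 flag = c(TRUE, FALSE, TRUE),
                 row.names = c("r1", "r2", "r3"),
                 stringsAsFactors = FALSE)

dim(df)            # two dimensions: number of rows, then number of columns
sapply(df, class)  # each column keeps its own class
dimnames(df)       # row names and column names
is.list(df)        # internally, a data frame is a list of equal-length columns
```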

Creating a data frame

There are several ways to create a data frame. One is to import data from an external file with one of the read functions:

read.table(..)
read.csv(..)
read.delim(..)
read.fwf(..)

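The following imports a CSV file from the working directory. With header = FALSE, the first line is treated as data and the columns receive the default names V1 and V2.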

flavours <- read.csv('flavors.csv', header = FALSE)
head(flavours)

##            V1        V2
## 1      celery vegetable
## 2        corn vegetable
## 3    cucumber vegetable
## 4 horseradish vegetable
## 5   vegetable vegetable
## 6      potato vegetable
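
When a file has no header line, column names can be supplied at import time instead of accepting V1 and V2. The sketch below reads from an inline string through the `text` argument so that it runs without any file; the contents of `csv_text` and the names `flavour` and `category` are made up for illustration.

```r
# Inline CSV text stands in for a file; the contents are assumptions for this sketch.
csv_text <- "celery,vegetable\ncorn,vegetable\napple,fruit"

# header = FALSE: the first line is data, so column names are given explicitly.
flavours_demo <- read.csv(text = csv_text, header = FALSE,
                          col.names = c("flavour", "category"))
print(flavours_demo)
```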

A data frame can also be built from vectors with the data.frame(..) function, where each vector becomes a column. The vectors must have the same length (a shorter vector is recycled only if its length divides the longest one evenly). The variable names of the vectors become the column names.

eid <- 1:5
fn <- c('Jon','Rahul','Alanah','Mike','Jacob')
ln <- c('Smith','Kohli','Pearce','Peake','Fullerton')

employees <- data.frame(eid, fn, ln)
print(employees)

##   eid     fn        ln
## 1   1    Jon     Smith
## 2   2  Rahul     Kohli
## 3   3 Alanah    Pearce
## 4   4   Mike     Peake
## 5   5  Jacob Fullerton
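
Column names can also be set in the data.frame(..) call itself, and a vector of incompatible length is rejected. The sketch below illustrates both; `staff` and `result` are hypothetical names used only here.

```r
eid <- 1:5
fn <- c("Jon", "Rahul", "Alanah", "Mike", "Jacob")

# Name the columns in the call instead of relying on the variable names.
staff <- data.frame(employee.id = eid, first.name = fn)
names(staff)

# A length-2 vector cannot be recycled evenly to 5 rows, so this call fails.
result <- tryCatch(data.frame(eid, short = c("a", "b")),
                   error = function(e) "error")
result
```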

The columns can also be renamed with the names(..) function.

names(employees) <- c('employee.id', 'first.name', 'last.name')
print(employees)

##   employee.id first.name last.name
## 1           1        Jon     Smith
## 2           2      Rahul     Kohli
## 3           3     Alanah    Pearce
## 4           4       Mike     Peake
## 5           5      Jacob Fullerton

The rows can be named with the row.names(..) function.

row.names(employees) <- c('J.S', 'R.K', 'A.P', 'M.P', 'J.F')
print(employees)

##     employee.id first.name last.name
## J.S           1        Jon     Smith
## R.K           2      Rahul     Kohli
## A.P           3     Alanah    Pearce
## M.P           4       Mike     Peake
## J.F           5      Jacob Fullerton

To confirm the type of an object, use the class(..) function. It returns "data.frame" for data frames.

class(employees)

## [1] "data.frame"

Summarise the data frame

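There are multiple ways to summarise a data frame. The str(..) function shows the structure (dimensions, column types, and the first few values), while summary(..) gives summary statistics for each column.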

str(employees)

## 'data.frame':    5 obs. of  3 variables:
##  $ employee.id: int  1 2 3 4 5
##  $ first.name : chr  "Jon" "Rahul" "Alanah" "Mike" ...
##  $ last.name  : chr  "Smith" "Kohli" "Pearce" "Peake" ...

summary(employees)

##   employee.id  first.name         last.name        
##  Min.   :1    Length:5           Length:5          
##  1st Qu.:2    Class :character   Class :character  
##  Median :3    Mode  :character   Mode  :character  
##  Mean   :3                                         
##  3rd Qu.:4                                         
##  Max.   :5

The head(..) and tail(..) functions return the first and last few rows (six by default).

head(flavours)

##            V1        V2
## 1      celery vegetable
## 2        corn vegetable
## 3    cucumber vegetable
## 4 horseradish vegetable
## 5   vegetable vegetable
## 6      potato vegetable

tail(flavours)

##           V1   V2
## 851      wet NULL
## 852     wild NULL
## 853 wine-lee NULL
## 854    winey NULL
## 855   yeasty NULL
## 856    ylang NULL
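
Both functions accept an `n` argument to control how many rows are returned. A minimal sketch on a made-up ten-row data frame (`demo` is an illustrative name):

```r
demo <- data.frame(x = 1:10)

head(demo, n = 3)   # first three rows
tail(demo, n = 2)   # last two rows
head(demo, n = -8)  # negative n: all rows except the last eight
```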

The function ncol(..) returns the number of columns.

ncol(employees)

## [1] 3

The function nrow(..) returns the number of rows.

nrow(flavours)

## [1] 856
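
Both counts can also be obtained in one call with dim(..). A small sketch on an illustrative data frame (the name `demo` and its values are assumptions):

```r
demo <- data.frame(x = 1:4, y = c("a", "b", "c", "d"))

dim(demo)     # number of rows, then number of columns
length(demo)  # a data frame is a list of columns, so this is the column count
```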

In RStudio, the function View(..) opens the data frame in a separate tab.

#View(flavours)

Accessing data frame column

A data frame column can be accessed by passing a single index number in single square brackets. This returns a data frame.

a <- flavours[1]
print(head(a))

##            V1
## 1      celery
## 2        corn
## 3    cucumber
## 4 horseradish
## 5   vegetable
## 6      potato

If a single index number is passed in double square brackets, a vector is returned instead.

b <- flavours[[2]]
print(head(b))

## [1] "vegetable" "vegetable" "vegetable" "vegetable" "vegetable" "vegetable"

The column name can also be used to access the column. The $ operator returns a vector, while single square brackets with the name return a data frame.

c <- employees$first.name
print(c)

## [1] "Jon"    "Rahul"  "Alanah" "Mike"   "Jacob"

d <- employees['first.name']
print(d)

##     first.name
## J.S        Jon
## R.K      Rahul
## A.P     Alanah
## M.P       Mike
## J.F      Jacob

Similarly, if double square brackets are used with the name, a vector is returned.

e <- employees[['first.name']]
print(e)

## [1] "Jon"    "Rahul"  "Alanah" "Mike"   "Jacob"

Another way to access a column is row-and-column indexing, with the row argument left empty.

f <- employees[,2]
print(f)

## [1] "Jon"    "Rahul"  "Alanah" "Mike"   "Jacob"

The method above returns a vector. Passing drop=FALSE returns a data frame instead.

g <- employees[,2,drop=FALSE]
print(g)

##     first.name
## J.S        Jon
## R.K      Rahul
## A.P     Alanah
## M.P       Mike
## J.F      Jacob
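
The access forms in this section differ mainly in whether they return a data frame or the column vector itself. The sketch below checks each form on a small made-up data frame (`demo` is an illustrative name, not part of the assignment data).

```r
demo <- data.frame(id = 1:3, name = c("a", "b", "c"),
                   stringsAsFactors = FALSE)

is.data.frame(demo["name"])                  # TRUE: single brackets keep the data frame
is.data.frame(demo[["name"]])                # FALSE: double brackets extract the column
identical(demo[["name"]], demo$name)         # TRUE: $ extracts the same vector
identical(demo[, "name"], demo[["name"]])    # TRUE: [, j] drops to a vector by default
is.data.frame(demo[, "name", drop = FALSE])  # TRUE: drop = FALSE keeps the data frame
```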

Accessing data frame row

To access a row, use row-and-column indexing with the column argument left empty.

a <- flavours[1,]
print(a)

##       V1        V2
## 1 celery vegetable

Multiple rows can be accessed by passing an index vector.

b <- flavours[1:6,]
print(b)

##            V1        V2
## 1      celery vegetable
## 2        corn vegetable
## 3    cucumber vegetable
## 4 horseradish vegetable
## 5   vegetable vegetable
## 6      potato vegetable

c <- flavours[c(3,6),]
print(c)

##         V1        V2
## 3 cucumber vegetable
## 6   potato vegetable

When a single row is selected, passing drop=TRUE returns a list (one element per column) instead of a data frame.

d <- flavours[1,,drop=TRUE]
print(d)

## $V1
## [1] "celery"
## 
## $V2
## [1] "vegetable"
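
Rows can also be selected with a logical condition rather than by position. The sketch below shows this together with the list returned by drop=TRUE; the data frame `demo` and its values are made up for illustration.

```r
demo <- data.frame(item = c("celery", "corn", "apple"),
                   kind = c("vegetable", "vegetable", "fruit"),
                   stringsAsFactors = FALSE)

# A single row with drop = TRUE is a plain list, one element per column.
row1 <- demo[1, , drop = TRUE]
is.list(row1)
unlist(row1)  # flatten the list into an atomic vector

# Logical indexing: keep only the rows where the condition holds.
veg <- demo[demo$kind == "vegetable", ]
nrow(veg)
```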

Manipulate data frame

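A new row can be added with the rbind(..) function. The new row is passed as a list so that each column keeps its type; a c(..) vector would coerce every value, including the id, to character.

# list() keeps 6L as an integer; c(6, 'James', 'Willems') would turn it into "6"
employees <- rbind(employees, list(6L, 'James', 'Willems'))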
row.names(employees)[6] <- 'J.W'
print(employees)

##     employee.id first.name last.name
## J.S           1        Jon     Smith
## R.K           2      Rahul     Kohli
## A.P           3     Alanah    Pearce
## M.P           4       Mike     Peake
## J.F           5      Jacob Fullerton
## J.W           6      James   Willems
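
One thing to watch when adding rows is column type. A c(..) vector that mixes numbers and strings is a character vector, so appending it with rbind(..) turns numeric columns into character columns even though the printed table looks the same. The sketch below compares that with passing a list; `demo`, `coerced`, and `kept` are illustrative names.

```r
demo <- data.frame(id = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)

# c() builds a character vector, so the integer column is coerced to character.
coerced <- rbind(demo, c(3, "c"))
class(coerced$id)

# A list keeps each value's own type, so the column stays integer.
kept <- rbind(demo, list(3L, "c"))
class(kept$id)
```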

A new column can be added with the cbind(..) function.

employees <- cbind(employees, position = c('Manager','Assistant Manager','Producer','Editor','Editor','Editor'))
print(employees)

##     employee.id first.name last.name          position
## J.S           1        Jon     Smith           Manager
## R.K           2      Rahul     Kohli Assistant Manager
## A.P           3     Alanah    Pearce          Producer
## M.P           4       Mike     Peake            Editor
## J.F           5      Jacob Fullerton            Editor
## J.W           6      James   Willems            Editor

A new column can also be added by assigning to a new column name.

employees$salary <- c(8000,7000,5000,3000,3000,3000)
print(employees)

##     employee.id first.name last.name          position salary
## J.S           1        Jon     Smith           Manager   8000
## R.K           2      Rahul     Kohli Assistant Manager   7000
## A.P           3     Alanah    Pearce          Producer   5000
## M.P           4       Mike     Peake            Editor   3000
## J.F           5      Jacob Fullerton            Editor   3000
## J.W           6      James   Willems            Editor   3000

A single value can be changed by assigning to its row and column position.

employees[4,4] <- 'Designer'
print(employees)

##     employee.id first.name last.name          position salary
## J.S           1        Jon     Smith           Manager   8000
## R.K           2      Rahul     Kohli Assistant Manager   7000
## A.P           3     Alanah    Pearce          Producer   5000
## M.P           4       Mike     Peake          Designer   3000
## J.F           5      Jacob Fullerton            Editor   3000
## J.W           6      James   Willems            Editor   3000

A Boolean condition can be used to select and change certain values.

employees$salary <- ifelse(employees$employee.id == 4, 4500, employees$salary)
print(employees)

##     employee.id first.name last.name          position salary
## J.S           1        Jon     Smith           Manager   8000
## R.K           2      Rahul     Kohli Assistant Manager   7000
## A.P           3     Alanah    Pearce          Producer   5000
## M.P           4       Mike     Peake          Designer   4500
## J.F           5      Jacob Fullerton            Editor   3000
## J.W           6      James   Willems            Editor   3000
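
As an alternative to ifelse(..), the condition can be used directly as an index on the left-hand side of the assignment, so only the matching cells are touched. A minimal sketch on made-up data (`demo` is an illustrative name):

```r
demo <- data.frame(id = 1:3, salary = c(3000, 3000, 5000))

# Logical index on the column vector: only the matching element changes.
demo$salary[demo$id == 2] <- 4500

# Equivalent form with [row, column] indexing.
demo[demo$id == 3, "salary"] <- 5500

demo$salary
```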

A row can be deleted by passing a negative index.

employees <- employees[-6,]
print(employees)

##     employee.id first.name last.name          position salary
## J.S           1        Jon     Smith           Manager   8000
## R.K           2      Rahul     Kohli Assistant Manager   7000
## A.P           3     Alanah    Pearce          Producer   5000
## M.P           4       Mike     Peake          Designer   4500
## J.F           5      Jacob Fullerton            Editor   3000

There are two ways to delete a column: pass a negative index, or assign NULL to the column. The code below uses the first to drop column 5 (salary) and the second to drop column 4 (position).

employees <- employees[,-5]
employees[4] <- NULL
print(employees)

##     employee.id first.name last.name
## J.S           1        Jon     Smith
## R.K           2      Rahul     Kohli
## A.P           3     Alanah    Pearce
## M.P           4       Mike     Peake
## J.F           5      Jacob Fullerton
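
Columns can also be deleted by name, which avoids depending on column positions. The sketch below shows two such forms on a made-up data frame; `demo` and `by_name` are illustrative names.

```r
demo <- data.frame(id = 1:2, name = c("a", "b"), salary = c(1000, 2000))

# Keep every column whose name is not in the drop list.
by_name <- demo[, !(names(demo) %in% c("salary")), drop = FALSE]
names(by_name)

# Assigning NULL to a named column removes it in place.
demo$name <- NULL
names(demo)
```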

—END—