WQD7004 Assignment 1

Name : Chew Way Yan
ID : S2023355

Lect4: Data Frame

4.1 Definition
Based on the lecture slides, a data frame is a two-dimensional data structure in R that presents data in table form, with rows as observations and columns as variables.
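As a quick check of this definition, the sketch below builds a hypothetical two-column data frame `df` and confirms that it has rows and columns and that each column keeps its own type. It assumes R 4.0 or later.

```r
# Hypothetical example: one numeric column and one character column
df <- data.frame(x = c(1, 2, 3), y = c("a", "b", "c"))

is.data.frame(df)   # TRUE
dim(df)             # 3 rows, 2 columns
sapply(df, class)   # class of each column: "numeric" and "character"
```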

4.2 Create Data Frame
A data frame can be created explicitly with the data.frame(..) function by combining vectors, each built with the c(..) function, as shown below. Each vector becomes one column, so the vectors should have the same length.

To create a data frame:

data <- data.frame(
  Name   = c("Alan","Bob","Charlie","Derek","Esther"),
  Age    = c(23,25,24,20,19),
  Gender = c("M","M","F","M","F"),
  Job    = c("Manager","Police","Engineer","Student","Student")
)

print(data)
##      Name Age Gender      Job
## 1    Alan  23      M  Manager
## 2     Bob  25      M   Police
## 3 Charlie  24      F Engineer
## 4   Derek  20      M  Student
## 5  Esther  19      F  Student

To assign names to the rows:

row.names(data) <- c("Std1","Std2","Std3","Std4","Std5")

print(data)
##         Name Age Gender      Job
## Std1    Alan  23      M  Manager
## Std2     Bob  25      M   Police
## Std3 Charlie  24      F Engineer
## Std4   Derek  20      M  Student
## Std5  Esther  19      F  Student
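Once assigned, row names can be read back with rownames(..) and used in place of row numbers when indexing. A minimal sketch on a hypothetical two-row data frame `df`:

```r
df <- data.frame(Name = c("Alan", "Bob"), Age = c(23, 25))
row.names(df) <- c("Std1", "Std2")

rownames(df)        # "Std1" "Std2"
df["Std2", "Age"]   # 25, row selected by its name instead of its position
```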

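To study the structure of the data frame, the str(..) function can be used. It lists the number of observations and variables, together with the type and first values of each column.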

str(data)
## 'data.frame':    5 obs. of  4 variables:
##  $ Name  : chr  "Alan" "Bob" "Charlie" "Derek" ...
##  $ Age   : num  23 25 24 20 19
##  $ Gender: chr  "M" "M" "F" "M" ...
##  $ Job   : chr  "Manager" "Police" "Engineer" "Student" ...
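The `chr` type shown for Name, Gender and Job reflects the default `stringsAsFactors = FALSE` used since R 4.0.0; earlier versions converted character vectors to factors. The sketch below (hypothetical data frames `chr.df` and `fct.df` holding only a Gender column) sets the argument explicitly to compare both settings.

```r
# Character vectors kept as character (default since R 4.0.0)
chr.df <- data.frame(Gender = c("M", "M", "F"), stringsAsFactors = FALSE)
# Character vectors converted to factors (the default before R 4.0.0)
fct.df <- data.frame(Gender = c("M", "M", "F"), stringsAsFactors = TRUE)

class(chr.df$Gender)   # "character"
class(fct.df$Gender)   # "factor"
levels(fct.df$Gender)  # "F" "M" (levels are sorted)
```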

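Furthermore, the summary(..) function summarizes each column: the minimum, quartiles, median, mean and maximum for numeric columns, and the length, class and mode for character columns.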

summary(data)
##      Name                Age          Gender              Job           
##  Length:5           Min.   :19.0   Length:5           Length:5          
##  Class :character   1st Qu.:20.0   Class :character   Class :character  
##  Mode  :character   Median :23.0   Mode  :character   Mode  :character  
##                     Mean   :22.2                                        
##                     3rd Qu.:24.0                                        
##                     Max.   :25.0
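For a numeric column, the values reported by summary(..) can be reproduced with base functions. The sketch below copies the Age values from above into a standalone vector named `age` (a hypothetical name) and compares the two.

```r
age <- c(23, 25, 24, 20, 19)   # the Age column from above
s <- summary(age)

s[["Median"]]          # 23, same as median(age)
s[["Mean"]]            # 22.2, same as mean(age)
quantile(age, 0.25)    # 20, the "1st Qu." value
```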

The structure of a data frame can also be determined by using the following functions.

(A)

#To determine the names of the columns
names(data)
## [1] "Name"   "Age"    "Gender" "Job"
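names(..) can also be used on the left of an assignment to rename columns, and colnames(..) returns the same names. A minimal sketch on a hypothetical data frame `df`, where the new name "Years" is only illustrative:

```r
df <- data.frame(Name = c("Alan", "Bob"), Age = c(23, 25))

colnames(df)              # same result as names(df): "Name" "Age"
names(df)[2] <- "Years"   # rename the second column
names(df)                 # "Name" "Years"
```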

(B)

#Both functions determine the number of columns in the data frame
ncol(data)
## [1] 4
length(data)
## [1] 4
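length(..) agrees with ncol(..) because a data frame is stored as a list whose elements are its columns. A small sketch on a hypothetical data frame `df` shows this:

```r
df <- data.frame(Name = c("Alan", "Bob"), Age = c(23, 25),
                 Job = c("Manager", "Police"))

is.list(df)              # TRUE: a data frame is a list of columns
length(df) == ncol(df)   # TRUE: both count the columns (3)
length(df$Name)          # 2: each column is a vector with one entry per row
```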

(C)

#To determine the number of rows in the data frame
nrow(data)
## [1] 5
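Both counts can also be obtained at once with dim(..), which returns the number of rows followed by the number of columns. A sketch on a hypothetical data frame `df`:

```r
df <- data.frame(Name = c("Alan", "Bob", "Charlie"), Age = c(23, 25, 24))

dim(df)      # 3 2: number of rows, then number of columns
dim(df)[1]   # 3, same as nrow(df)
dim(df)[2]   # 2, same as ncol(df)
```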

4.3 Access of Data
To access the data in a data frame, square-bracket indexing [row, column] and the $ operator can be used, as shown below:

(A)
To access the data in a row:

#To access the first row of data frame (Std1)
data[1,]
##      Name Age Gender     Job
## Std1 Alan  23      M Manager
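Rows can also be selected with a logical condition instead of a position. The sketch below, on a hypothetical copy of the Name and Age columns, keeps only the rows where Age is greater than 22:

```r
df <- data.frame(
  Name = c("Alan", "Bob", "Charlie", "Derek", "Esther"),
  Age  = c(23, 25, 24, 20, 19),
  row.names = c("Std1", "Std2", "Std3", "Std4", "Std5")
)

# Logical vector: TRUE TRUE TRUE FALSE FALSE
older <- df$Age > 22
df[older, ]   # rows Std1, Std2 and Std3
```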

(B)
To access the data in a column:

#To access the third column of the data frame (Gender)
data[,3]
## [1] "M" "M" "F" "M" "F"
#To access a column by its name; data$Job returns a vector, which is wrapped in data.frame(..) for display
data.frame(data$Job)
##   data.Job
## 1  Manager
## 2   Police
## 3 Engineer
## 4  Student
## 5  Student
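Because $ returns a plain vector, the wrapper above is what produces a table. Single brackets with a column name keep the data frame structure directly, while double brackets behave like $. A small sketch on a hypothetical data frame `df`:

```r
df <- data.frame(Job = c("Manager", "Police"), Age = c(23, 25))

df$Job        # character vector
df[["Job"]]   # same character vector, column chosen by a string
df["Job"]     # one-column data frame; the header "Job" is kept
```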

(C)
To access the data at a specified row and column:

#The second row and fourth column of the data frame are accessed
data[2,4]
## [1] "Police"
#To access a range of rows (1 to 3) in a specified column (Gender)
data.frame(data$Gender[1:3])
##   data.Gender.1.3.
## 1                M
## 2                M
## 3                F
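Rows and columns can also be chosen by name, and the argument drop = FALSE keeps a single-column result as a data frame, so the row names are preserved without wrapping the result in data.frame(..). A sketch on a hypothetical copy of the Gender and Job columns:

```r
df <- data.frame(
  Gender = c("M", "M", "F", "M", "F"),
  Job    = c("Manager", "Police", "Engineer", "Student", "Student"),
  row.names = c("Std1", "Std2", "Std3", "Std4", "Std5")
)

df["Std2", "Job"]                 # "Police", selected by row and column names
df[1:3, "Gender"]                 # "M" "M" "F" as a vector
df[1:3, "Gender", drop = FALSE]   # same values as a data frame, row names kept
```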

4.4 Addition of Data
Data can be added to the data frame in the form of a column or a row.

(A) To add data as a column:

#Add a Postcode column to data, then store the result as newcol.data
data$Postcode <- c(52200,51200,46000,58100,50460)
newcol.data <- data

print(newcol.data)
##         Name Age Gender      Job Postcode
## Std1    Alan  23      M  Manager    52200
## Std2     Bob  25      M   Police    51200
## Std3 Charlie  24      F Engineer    46000
## Std4   Derek  20      M  Student    58100
## Std5  Esther  19      F  Student    50460
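A new column can also be computed from an existing one, or appended with cbind(..). The sketch below uses a hypothetical data frame `df`; the column name AgeNextYear is only illustrative.

```r
df <- data.frame(Name = c("Alan", "Bob"), Age = c(23, 25))

# A new column computed from an existing one
df$AgeNextYear <- df$Age + 1

# cbind() appends a column given as a named vector
df <- cbind(df, Postcode = c(52200, 51200))

names(df)   # "Name" "Age" "AgeNextYear" "Postcode"
```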

(B) To add data as a row, create a one-row data frame with the same column names and append it with rbind(..):

newrow.data <- data.frame(
  Name     = "Faye",
  Age      = 21,
  Gender   = "F",
  Job      = "Receptionist",
  Postcode = 68100
)

complete.data <- rbind(newcol.data,newrow.data)
print(complete.data)
##         Name Age Gender          Job Postcode
## Std1    Alan  23      M      Manager    52200
## Std2     Bob  25      M       Police    51200
## Std3 Charlie  24      F     Engineer    46000
## Std4   Derek  20      M      Student    58100
## Std5  Esther  19      F      Student    50460
## 1       Faye  21      F Receptionist    68100
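The new row is labelled 1 because newrow.data was created without row names. The sketch below, on hypothetical data frames `old` and `new`, passes row.names to data.frame(..) so the appended row follows the existing Std naming:

```r
old <- data.frame(Name = c("Alan", "Bob"), Age = c(23, 25),
                  row.names = c("Std1", "Std2"))
new <- data.frame(Name = "Faye", Age = 21, row.names = "Std3")

# rbind() matches columns by name, so both data frames need the same column names
combined <- rbind(old, new)
rownames(combined)   # "Std1" "Std2" "Std3"
```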

4.5 Removal of Data
Data can be removed from the data frame by column or by row.

(A) To remove data by column:

#Remove the column named "Gender"
complete.data$Gender <- NULL
print(complete.data)
##         Name Age          Job Postcode
## Std1    Alan  23      Manager    52200
## Std2     Bob  25       Police    51200
## Std3 Charlie  24     Engineer    46000
## Std4   Derek  20      Student    58100
## Std5  Esther  19      Student    50460
## 1       Faye  21 Receptionist    68100
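Several columns can be removed at once by keeping only the columns whose names are not in a drop list. A sketch on a hypothetical data frame `df`, where `drop.cols` is an illustrative name:

```r
df <- data.frame(Name = c("Alan", "Bob"), Age = c(23, 25),
                 Gender = c("M", "M"), Job = c("Manager", "Police"))

# Keep every column whose name is not in the drop list
drop.cols <- c("Gender", "Job")
df <- df[, !(names(df) %in% drop.cols), drop = FALSE]
names(df)   # "Name" "Age"
```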

(B) To remove data by row:

#Remove the second row (Std2) using a negative index
complete.data <- complete.data[-2,]
print(complete.data)
##         Name Age          Job Postcode
## Std1    Alan  23      Manager    52200
## Std3 Charlie  24     Engineer    46000
## Std4   Derek  20      Student    58100
## Std5  Esther  19      Student    50460
## 1       Faye  21 Receptionist    68100
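Because row positions shift after a deletion, removing a row by its name can be less error-prone than by its number. The sketch below, on a hypothetical three-row data frame `df`, drops the row named Std2 and then optionally resets the row names to consecutive numbers:

```r
df <- data.frame(Name = c("Alan", "Bob", "Charlie"),
                 row.names = c("Std1", "Std2", "Std3"))

# Keep every row whose name is not "Std2"
df <- df[rownames(df) != "Std2", , drop = FALSE]
rownames(df)   # "Std1" "Std3"

# Optionally replace the row names with consecutive numbers
rownames(df) <- NULL
rownames(df)   # "1" "2"
```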