Name : Chew Way Yan
ID : S2023355
4.1 Definition
Based on the slides, a data frame is defined as a two-dimensional data structure in R that stores data in table form, with rows and columns.
4.2 Create Data Frame
A data frame can be created explicitly with the data.frame(..) function by combining vectors built with the c(..) function, where each vector becomes one column, as shown below.
To create a data frame:
data <- data.frame(
Name = c("Alan","Bob","Charlie","Derek","Esther"),
Age = c(23,25,24,20,19),
Gender = c("M","M","F","M","F"),
Job = c("Manager","Police","Engineer","Student","Student")
)
print(data)
##      Name Age Gender      Job
## 1    Alan  23      M  Manager
## 2     Bob  25      M   Police
## 3 Charlie  24      F Engineer
## 4   Derek  20      M  Student
## 5  Esther  19      F  Student
To assign names to the rows:
row.names(data) <- c("Std1","Std2","Std3","Std4","Std5")
print(data)
##         Name Age Gender      Job
## Std1    Alan  23      M  Manager
## Std2     Bob  25      M   Police
## Std3 Charlie  24      F Engineer
## Std4   Derek  20      M  Student
## Std5  Esther  19      F  Student
To study the structure of the data, the str(..) function can be used.
str(data)
## 'data.frame': 5 obs. of 4 variables:
## $ Name : chr "Alan" "Bob" "Charlie" "Derek" ...
## $ Age : num 23 25 24 20 19
## $ Gender: chr "M" "M" "F" "M" ...
## $ Job : chr "Manager" "Police" "Engineer" "Student" ...
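As a complement to str(..), the class of every column can also be listed in a single call. The following is a minimal sketch that rebuilds the same example data frame; the variable name col.classes is only illustrative and is not part of the original notes, and stringsAsFactors = FALSE is added here so that text columns stay as character in R versions before 4.0 as well.

```r
# Rebuild the example data frame from section 4.2
data <- data.frame(
  Name = c("Alan", "Bob", "Charlie", "Derek", "Esther"),
  Age = c(23, 25, 24, 20, 19),
  Gender = c("M", "M", "F", "M", "F"),
  Job = c("Manager", "Police", "Engineer", "Student", "Student"),
  stringsAsFactors = FALSE  # keep text columns as character in older R versions too
)

# sapply(..) applies class(..) to each column and returns a named character vector
col.classes <- sapply(data, class)
print(col.classes)
```

Here Age is reported as numeric and the other three columns as character, which agrees with the chr and num labels in the str(..) output.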
Furthermore, the contents of each column can be summarized with the summary(..) function.
summary(data)
##      Name                Age          Gender              Job
##  Length:5           Min.   :19.0   Length:5           Length:5
##  Class :character   1st Qu.:20.0   Class :character   Class :character
##  Mode  :character   Median :23.0   Mode  :character   Mode  :character
##                     Mean   :22.2
##                     3rd Qu.:24.0
##                     Max.   :25.0
The structure of the data frame can also be examined with the following functions.
(A)
#To determine the names of the columns
names(data)
## [1] "Name" "Age" "Gender" "Job"
(B)
#Both functions return the number of columns in the data frame
ncol(data)
## [1] 4
length(data)
## [1] 4
(C)
#To determine the number of rows in the data frame
nrow(data)
## [1] 5
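The row and column counts can also be obtained together. The following is a minimal sketch using dim(..), a base R function that is not covered in the notes above; the variable name size is only illustrative.

```r
# Rebuild the example data frame from section 4.2
data <- data.frame(
  Name = c("Alan", "Bob", "Charlie", "Derek", "Esther"),
  Age = c(23, 25, 24, 20, 19),
  Gender = c("M", "M", "F", "M", "F"),
  Job = c("Manager", "Police", "Engineer", "Student", "Student")
)

# dim(..) returns the number of rows first, then the number of columns
size <- dim(data)
print(size)
## [1] 5 4
```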
4.3 Access of Data
To access the data in a data frame, several forms of indexing can be used, as shown below:
(A)
To access the data in a row:
#To access the first row of the data frame (Std1)
data[1,]
##      Name Age Gender     Job
## Std1 Alan  23      M Manager
(B)
To access the data in a column:
#To access the third column of the data frame (Gender)
data[,3]
## [1] "M" "M" "F" "M" "F"
#To access a column of the data frame by its column name
data.frame(data$Job)
##   data.Job
## 1  Manager
## 2   Police
## 3 Engineer
## 4  Student
## 5  Student
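Whether a column comes back as a plain vector or as a data frame depends on the indexing form. The following is a minimal sketch of that difference on the same example data; the variable names job.vector and job.frame are only illustrative, and stringsAsFactors = FALSE is added so the result is a character vector in older R versions as well.

```r
# Rebuild the example data frame from section 4.2
data <- data.frame(
  Name = c("Alan", "Bob", "Charlie", "Derek", "Esther"),
  Age = c(23, 25, 24, 20, 19),
  Gender = c("M", "M", "F", "M", "F"),
  Job = c("Manager", "Police", "Engineer", "Student", "Student"),
  stringsAsFactors = FALSE  # keep text columns as character in older R versions too
)

# Double brackets (like data$Job) return the column as a vector
job.vector <- data[["Job"]]

# Single brackets with a column name return a one-column data frame
job.frame <- data["Job"]

print(job.vector)
print(job.frame)
```

Unlike data.frame(data$Job), which labels the column data.Job, the single-bracket form keeps the original column name Job.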
(C)
To access the data at a specified row and column:
#Access the second row and fourth column of the data frame
data[2,4]
## [1] "Police"
#To access a specified range of rows (1:3) in a specified column (Gender)
data.frame(data$Gender[1:3])
##   data.Gender.1.3.
## 1                M
## 2                M
## 3                F
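The same range of rows can also be selected with bracket indexing alone, and rows can be chosen by a condition instead of by position. The following is a minimal sketch under those assumptions; the variable names gender.part and older, and the cut-off of 22 years, are only illustrative.

```r
# Rebuild the example data frame with its row names
data <- data.frame(
  Name = c("Alan", "Bob", "Charlie", "Derek", "Esther"),
  Age = c(23, 25, 24, 20, 19),
  Gender = c("M", "M", "F", "M", "F"),
  Job = c("Manager", "Police", "Engineer", "Student", "Student")
)
row.names(data) <- c("Std1", "Std2", "Std3", "Std4", "Std5")

# Rows 1 to 3 of the Gender column; drop = FALSE keeps the result as a data frame
gender.part <- data[1:3, "Gender", drop = FALSE]
print(gender.part)

# Rows where Age is above 22, keeping only the Name and Age columns
older <- data[data$Age > 22, c("Name", "Age")]
print(older)
```

With this form the result keeps the column name Gender and the row names Std1 to Std3, rather than the generated name data.Gender.1.3.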
4.4 Addition of Data
Data can be added to the data frame as a new column or as a new row.
(A)
To add data as a new column:
#Add the Postcode column to data, then keep a copy under a new name
data$Postcode <- c(52200,51200,46000,58100,50460)
newcol.data <- data
print(newcol.data)
##         Name Age Gender      Job Postcode
## Std1    Alan  23      M  Manager    52200
## Std2     Bob  25      M   Police    51200
## Std3 Charlie  24      F Engineer    46000
## Std4   Derek  20      M  Student    58100
## Std5  Esther  19      F  Student    50460
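Note that the assignment above changes data itself, so data and newcol.data end up with the same five columns. If the original data frame should stay unchanged, one possible alternative is cbind(..), sketched below; the variable name postcode is only illustrative.

```r
# Rebuild the example data frame with its row names
data <- data.frame(
  Name = c("Alan", "Bob", "Charlie", "Derek", "Esther"),
  Age = c(23, 25, 24, 20, 19),
  Gender = c("M", "M", "F", "M", "F"),
  Job = c("Manager", "Police", "Engineer", "Student", "Student")
)
row.names(data) <- c("Std1", "Std2", "Std3", "Std4", "Std5")

postcode <- c(52200, 51200, 46000, 58100, 50460)

# cbind(..) returns a new data frame and leaves data as it was
newcol.data <- cbind(data, Postcode = postcode)

print(ncol(data))         # still 4 columns
print(ncol(newcol.data))  # 5 columns, including Postcode
```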
(B)
To add data as a new row:
newrow.data <- data.frame(
Name = "Faye",
Age = 21,
Gender = "F",
Job = "Receptionist",
Postcode = 68100
)
complete.data <- rbind(newcol.data,newrow.data)
print(complete.data)
##         Name Age Gender          Job Postcode
## Std1    Alan  23      M      Manager    52200
## Std2     Bob  25      M       Police    51200
## Std3 Charlie  24      F     Engineer    46000
## Std4   Derek  20      M      Student    58100
## Std5  Esther  19      F      Student    50460
## 1       Faye  21      F Receptionist    68100
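In the output above, the new row is labelled 1 because newrow.data was created with default row names. The following is a minimal sketch of one way to keep the labels consistent by naming the new row before binding; the label Std6 is an assumption made here to continue the existing pattern.

```r
# Rebuild the five-row data frame including the Postcode column
newcol.data <- data.frame(
  Name = c("Alan", "Bob", "Charlie", "Derek", "Esther"),
  Age = c(23, 25, 24, 20, 19),
  Gender = c("M", "M", "F", "M", "F"),
  Job = c("Manager", "Police", "Engineer", "Student", "Student"),
  Postcode = c(52200, 51200, 46000, 58100, 50460)
)
row.names(newcol.data) <- c("Std1", "Std2", "Std3", "Std4", "Std5")

newrow.data <- data.frame(
  Name = "Faye",
  Age = 21,
  Gender = "F",
  Job = "Receptionist",
  Postcode = 68100
)
# Assumed label for the new row, following the Std1..Std5 pattern
row.names(newrow.data) <- "Std6"

# rbind(..) matches columns by name and keeps the row names of both inputs
complete.data <- rbind(newcol.data, newrow.data)
print(row.names(complete.data))
```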
4.5 Removal of Data
Data can be removed from the data frame by column or by row.
(A)
To remove data by column:
#Remove the column named "Gender"
complete.data$Gender <- NULL
print(complete.data)
##         Name Age          Job Postcode
## Std1    Alan  23      Manager    52200
## Std2     Bob  25       Police    51200
## Std3 Charlie  24     Engineer    46000
## Std4   Derek  20      Student    58100
## Std5  Esther  19      Student    50460
## 1       Faye  21 Receptionist    68100
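When more than one column has to be removed, the columns to keep can be selected by name instead of assigning NULL to each one. The following is a minimal sketch of that idea on the five-row example; the variable names drop.cols and reduced.data are only illustrative.

```r
# Rebuild the five-row data frame including the Postcode column
complete.data <- data.frame(
  Name = c("Alan", "Bob", "Charlie", "Derek", "Esther"),
  Age = c(23, 25, 24, 20, 19),
  Gender = c("M", "M", "F", "M", "F"),
  Job = c("Manager", "Police", "Engineer", "Student", "Student"),
  Postcode = c(52200, 51200, 46000, 58100, 50460)
)

# Names of the columns to remove
drop.cols <- c("Gender", "Postcode")

# Keep every column whose name is not in drop.cols;
# drop = FALSE keeps a data frame even if only one column is left
reduced.data <- complete.data[, !(names(complete.data) %in% drop.cols), drop = FALSE]

# The remaining columns are Name, Age and Job
print(names(reduced.data))
```

This form returns a new data frame, so complete.data itself is not modified.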
(B)
To remove data by row:
#Remove the second row (Std2)
complete.data <- complete.data[-2,]
print(complete.data)
##         Name Age          Job Postcode
## Std1    Alan  23      Manager    52200
## Std3 Charlie  24     Engineer    46000
## Std4   Derek  20      Student    58100
## Std5  Esther  19      Student    50460
## 1       Faye  21 Receptionist    68100
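A negative index such as -2 depends on the current position of the row, which changes once other rows are removed. The following is a minimal sketch of two alternatives, removing a row by its row name and removing rows by a condition; the variable names kept.by.name and no.students are only illustrative.

```r
# Rebuild the five-row example with its row names
complete.data <- data.frame(
  Name = c("Alan", "Bob", "Charlie", "Derek", "Esther"),
  Age = c(23, 25, 24, 20, 19),
  Job = c("Manager", "Police", "Engineer", "Student", "Student"),
  Postcode = c(52200, 51200, 46000, 58100, 50460)
)
row.names(complete.data) <- c("Std1", "Std2", "Std3", "Std4", "Std5")

# Keep every row except the one named Std2
kept.by.name <- complete.data[row.names(complete.data) != "Std2", ]
print(kept.by.name)

# Keep every row whose Job is not "Student"
no.students <- complete.data[complete.data$Job != "Student", ]
print(no.students)
```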