Data frame (aka a table) - a data structure that stores multiple variables, one per column.
data.frame() - generates a data table by combining multiple vectors.
data.frame( vector1 , vector2, vector3, . . . )
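Requirements: every vector should have the same length, since each vector becomes one column and each position becomes one row.
# generate a vector for each variable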
ID <- 1:5
Student <- c("Mavis" , "Lucy" , "Patrick" , "Greg" , "Dean")
Gender <- c("Female" , "Female" , "Male" , "Male" , "Male")
GPA <- c(3.84 , 3.90 , 4.00 , 3.70 , 4.00)
# check for the length of each vector
length(ID)
## [1] 5
length(Student)
## [1] 5
length(Gender)
## [1] 5
length(GPA)
## [1] 5
# generate data-frame
student.info <- data.frame(ID , Student , Gender , GPA)
student.info
##   ID Student Gender  GPA
## 1  1   Mavis Female 3.84
## 2  2    Lucy Female 3.90
## 3  3 Patrick   Male 4.00
## 4  4    Greg   Male 3.70
## 5  5    Dean   Male 4.00
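The length check above matters because data.frame() has to line the vectors up row by row. The sketch below is a minimal illustration with two hypothetical vectors, a and b (not part of the student data), whose lengths differ and are not multiples of each other; in that case data.frame() is expected to stop with an error, which tryCatch() captures here.

```r
# Hypothetical vectors with different lengths (3 values vs 2 values)
a <- 1:3
b <- c("x", "y")

# data.frame() cannot line these up row by row, so it raises an error;
# tryCatch() captures the error message instead of stopping the script
msg <- tryCatch({
  data.frame(a, b)
  "no error"
}, error = function(e) conditionMessage(e))
msg
```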
View() - with a capital V - opens the data frame in a new tab for viewing.
View(student.info)
str() - check the structure of the data frame
str(student.info)
## 'data.frame': 5 obs. of 4 variables:
## $ ID : int 1 2 3 4 5
## $ Student: chr "Mavis" "Lucy" "Patrick" "Greg" ...
## $ Gender : chr "Female" "Female" "Male" "Male" ...
## $ GPA : num 3.84 3.9 4 3.7 4
This data table has 5 rows and 4 columns.
There are 4 variables: ID (integer), Student (character), Gender
(character), GPA (numeric).
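Besides str(), the dimensions and column names can also be queried one at a time. The following sketch rebuilds the same student.info table so that it runs on its own, then uses the base R functions nrow(), ncol(), dim(), and names().

```r
# Rebuild the table from the notes so this block is self-contained
student.info <- data.frame(
  ID      = 1:5,
  Student = c("Mavis", "Lucy", "Patrick", "Greg", "Dean"),
  Gender  = c("Female", "Female", "Male", "Male", "Male"),
  GPA     = c(3.84, 3.90, 4.00, 3.70, 4.00)
)

nrow(student.info)   # number of rows (observations)
ncol(student.info)   # number of columns (variables)
dim(student.info)    # rows and columns together
names(student.info)  # column names
```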
class() - returns the higher-level type of an object, while typeof() returns its internal storage type.
6.9999999
## [1] 7
7.00000002
## [1] 7
typeof(7.00000002)
## [1] "double"
class(7.00000002)
## [1] "numeric"
7L
## [1] 7
typeof(7L)
## [1] "integer"
class(7L)
## [1] "integer"
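The 7 printed above is only a display rounding: by default R shows 7 significant digits, while the stored value is unchanged. The sketch below illustrates this, and also shows a case where typeof() and class() give different answers; the small data frame df is a hypothetical example, not from the notes.

```r
x <- 6.9999999

# Printing rounds to 7 significant digits, but the stored value is not 7
x == 7                 # FALSE
print(x, digits = 10)  # ask for more digits to see the stored value

# typeof() reports the internal storage type, class() the higher-level type
df <- data.frame(a = 1:2)  # hypothetical one-column data frame
typeof(df)             # "list"
class(df)              # "data.frame"
```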
data-frame$column-name - access a single column of a data frame.
# access column Student
student.info$Student
## [1] "Mavis" "Lucy" "Patrick" "Greg" "Dean"
# check the data type of the column
class(student.info$Student)
## [1] "character"
typeof(student.info$Student)
## [1] "character"
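The $ operator is one of several ways to pull a column out of a data frame. As a minimal sketch (rebuilding a two-column version of the table so the block runs on its own), the base R operators [[ ]] and [ , ] reach the same column, and [ , ] can also select rows.

```r
student.info <- data.frame(
  Student = c("Mavis", "Lucy", "Patrick", "Greg", "Dean"),
  GPA     = c(3.84, 3.90, 4.00, 3.70, 4.00)
)

by_dollar  <- student.info$Student       # column by name with $
by_bracket <- student.info[["Student"]]  # same column with [[ ]]
by_index   <- student.info[, "Student"]  # same column with [row, column]

identical(by_dollar, by_bracket)  # TRUE
identical(by_dollar, by_index)    # TRUE

# Keep only the rows where GPA equals 4.00
top <- student.info[student.info$GPA == 4, ]
top$Student
```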
Since Gender is categorical (qualitative/nominal) data, we should convert the Gender column to a factor.
student.info$Gender <- factor(student.info$Gender)
str(student.info$Gender)
## Factor w/ 2 levels "Female","Male": 1 1 2 2 2
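Once a column is a factor, R can treat its values as groups. A short sketch, using a standalone Gender vector with the same values as above: table() counts the observations in each group, and as.integer() shows the integer codes that str() displays.

```r
Gender <- factor(c("Female", "Female", "Male", "Male", "Male"))

levels(Gender)           # the groups: "Female" "Male"
counts <- table(Gender)  # number of observations in each group
counts
as.integer(Gender)       # underlying codes: 1 1 2 2 2
```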
data-frame-name$new-column <- c( ) - add a new column to the data frame.
student.info$Year <- c("Freshman" , "Sophomore" , "Freshman" , "Senior" , "Junior")
# convert column Year into an ordered factor
student.info$Year <- factor(student.info$Year,
                            levels = c("Freshman", "Sophomore", "Junior", "Senior"),
                            ordered = TRUE)
student.info$Year
## [1] Freshman Sophomore Freshman Senior Junior
## Levels: Freshman < Sophomore < Junior < Senior
# check for the structure of the column Year
str(student.info$Year)
## Ord.factor w/ 4 levels "Freshman"<"Sophomore"<..: 1 2 1 4 3
Note: By default R sorts the levels alphabetically (Freshman, Junior, Senior, Sophomore). The levels argument sets the ranking order that we want. The labels argument only renames the alphabetically sorted levels, so using it here would change the students' recorded years (for example, Sophomore would become Senior).
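The difference between the levels and labels arguments of factor() is easy to miss. The sketch below uses a standalone year vector (same values as the Year column) and builds the factor both ways; the object names year, ranking, by_levels, and by_labels are hypothetical and only serve this illustration.

```r
year <- c("Freshman", "Sophomore", "Freshman", "Senior", "Junior")
ranking <- c("Freshman", "Sophomore", "Junior", "Senior")

# levels = sets the order of the groups and keeps every value as recorded
by_levels <- factor(year, levels = ranking, ordered = TRUE)

# labels = renames the alphabetically sorted levels
# (Freshman, Junior, Senior, Sophomore), which relabels the data
by_labels <- factor(year, labels = ranking, ordered = TRUE)

as.character(by_levels)  # same values as year
as.character(by_labels)  # values differ from year, e.g. Sophomore became Senior
```

Comparing the two results against the original vector is a quick way to confirm that a conversion did not alter the data.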
levels() - lists the groups (levels) of a factor or ordered factor.
levels(student.info$Year)
## [1] "Freshman" "Sophomore" "Junior" "Senior"
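Because an ordered factor carries a ranking, its values can be compared with operators such as < and >=. A minimal sketch with a standalone ordered factor (built here with the levels argument so that the ranking is explicit):

```r
year <- factor(
  c("Freshman", "Sophomore", "Freshman", "Senior", "Junior"),
  levels  = c("Freshman", "Sophomore", "Junior", "Senior"),
  ordered = TRUE
)

levels(year)               # groups in ranking order
upper <- year >= "Junior"  # TRUE for Junior and Senior
upper
max(year)                  # highest-ranked value present
```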