Generate a data-frame.

Data-frame (aka a table) - a data structure that stores multiple variables, one per column.

data.frame() - generates a data table by combining multiple vectors.

data.frame( vector1 , vector2, vector3, . . . )  

Requirements

  1. Each vector should have the same length (aka the same number of elements).
  2. No cell may be left empty; record each missing value explicitly, either with a placeholder symbol/number or with NA, R's built-in marker for missing values.
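The two requirements above can be sketched as follows. The vectors `x`, `y`, and `y_fixed` are hypothetical and used only for this illustration; the exact wording of the error message may differ between R versions, so it is captured rather than quoted.

```r
# Hypothetical vectors used only for this illustration.
x <- 1:3
y <- c("a", "b")            # one element short

# data.frame() stops with an error when the lengths cannot be matched up.
result <- tryCatch(
  data.frame(x, y),
  error = function(e) conditionMessage(e)
)
result                      # the captured error message

# Filling the gap with NA makes the lengths agree.
y_fixed <- c("a", "b", NA)
df <- data.frame(x, y_fixed)
df
is.na(df$y_fixed)           # TRUE marks the missing cell
```

Note that R recycles a shorter vector when its length divides the longer one evenly, so checking the lengths yourself, as done below with length(), is the safer habit.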
# generate a vector for each variable
ID <- 1:5  
Student <- c("Mavis" , "Lucy" , "Patrick" , "Greg" , "Dean")
Gender <- c("Female" , "Female" , "Male" , "Male" , "Male")
GPA <- c(3.84 , 3.90 , 4.00 , 3.70 , 4.00)

# check for the length of each vector
length(ID) 
## [1] 5
length(Student)
## [1] 5
length(Gender)
## [1] 5
length(GPA)
## [1] 5
# generate data-frame
student.info <- data.frame(ID , Student , Gender , GPA)
student.info
##   ID Student Gender  GPA
## 1  1   Mavis Female 3.84
## 2  2    Lucy Female 3.90
## 3  3 Patrick   Male 4.00
## 4  4    Greg   Male 3.70
## 5  5    Dean   Male 4.00

View() - with a capital V - opens the data-frame in a spreadsheet-style viewer (a new tab in RStudio).

View(student.info)

str() - check the structure of the data-frame.

str(student.info)
## 'data.frame':    5 obs. of  4 variables:
##  $ ID     : int  1 2 3 4 5
##  $ Student: chr  "Mavis" "Lucy" "Patrick" "Greg" ...
##  $ Gender : chr  "Female" "Female" "Male" "Male" ...
##  $ GPA    : num  3.84 3.9 4 3.7 4

This data table has 5 rows and 4 columns.
There are 4 variables: ID (integer), Student (character), Gender (character), GPA (numeric).
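The row and column counts read off the str() output can also be queried directly. A minimal sketch, rebuilding the same student.info table so the block runs on its own:

```r
# Rebuild the table from the example so this block is self-contained.
student.info <- data.frame(
  ID      = 1:5,
  Student = c("Mavis", "Lucy", "Patrick", "Greg", "Dean"),
  Gender  = c("Female", "Female", "Male", "Male", "Male"),
  GPA     = c(3.84, 3.90, 4.00, 3.70, 4.00)
)

nrow(student.info)           # number of rows (observations)
## [1] 5
ncol(student.info)           # number of columns (variables)
## [1] 4
dim(student.info)            # rows first, then columns
## [1] 5 4
sapply(student.info, class)  # class of every column at once
```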

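class() - reports the higher-level type (class) of an object; typeof() reports its internal storage type.

R prints 7 significant digits by default, so the first two values below are displayed as 7 even though they are stored as decimal (double) numbers.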

6.9999999
## [1] 7
7.00000002
## [1] 7
typeof(7.00000002)
## [1] "double"
class(7.00000002)
## [1] "numeric"
7L
## [1] 7
typeof(7L)
## [1] "integer"
class(7L)
## [1] "integer"
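The difference between class() and typeof() is easier to see on objects where the two disagree. A small sketch with hypothetical objects `x`, `df`, and `f`:

```r
# Printing rounds the display only; the stored value is unchanged.
x <- 6.9999999
x == 7                  # FALSE: x is still slightly below 7
print(x, digits = 10)   # ask for more significant digits

# A data frame is stored internally as a list of columns.
df <- data.frame(x = 1:3)
class(df)
## [1] "data.frame"
typeof(df)
## [1] "list"

# A factor is stored internally as integer codes.
f <- factor(c("Male", "Female", "Male"))
class(f)
## [1] "factor"
typeof(f)
## [1] "integer"
```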

To access a specific column, use $.

  data-frame$column-name   
  
# access column Student
student.info$Student
## [1] "Mavis"   "Lucy"    "Patrick" "Greg"    "Dean"
# check the data type of the column
class(student.info$Student)
## [1] "character"
typeof(student.info$Student)
## [1] "character"
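Besides $, base R offers bracket forms that reach the same column. The sketch below uses a shortened, hypothetical copy of the table called `df`:

```r
# Shortened copy of the table, for illustration only.
df <- data.frame(
  ID      = 1:3,
  Student = c("Mavis", "Lucy", "Patrick")
)

by_dollar  <- df$Student        # the form used above
by_double  <- df[["Student"]]   # same vector, column name given as a string
by_indices <- df[, "Student"]   # same vector, using row/column indexing

identical(by_dollar, by_double)
## [1] TRUE
identical(by_dollar, by_indices)
## [1] TRUE

# Single brackets without a comma keep the result as a one-column data frame.
class(df["Student"])
## [1] "data.frame"
```

The [[ ]] form is handy when the column name is held in a variable, which $ cannot use.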

Since Gender is grouping (qualitative/nominal) data, we should convert the Gender column to a factor.

student.info$Gender <- factor(student.info$Gender)
str(student.info$Gender)
##  Factor w/ 2 levels "Female","Male": 1 1 2 2 2
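Once a column is a factor, R can treat its values as groups. A minimal sketch on a standalone `Gender` vector with the same values as the column above:

```r
# Same values as the Gender column, as a standalone factor.
Gender <- factor(c("Female", "Female", "Male", "Male", "Male"))

levels(Gender)        # the groups, sorted alphabetically by default
## [1] "Female" "Male"
as.integer(Gender)    # the integer codes shown by str()
## [1] 1 1 2 2 2
table(Gender)         # count of observations per group
```

table() should report 2 for Female and 3 for Male here.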

Add a new column into an existing data-frame.

data-frame-name$new-column <- c( )
student.info$Year <- c("Freshman" , "Sophomore" , "Freshman" , "Senior" , "Junior")
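The same-length requirement also applies to a column that is added later. A small sketch on a hypothetical one-column table `df`; the column names `School` and `Bad` and the value "Example University" are made up for the illustration:

```r
df <- data.frame(ID = 1:5)

# A vector with one value per row is accepted.
df$Year <- c("Freshman", "Sophomore", "Freshman", "Senior", "Junior")

# A single value is repeated for every row.
df$School <- "Example University"

# A vector of the wrong length is rejected with an error.
msg <- tryCatch(
  {
    df$Bad <- 1:3
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg       # the captured error message

ncol(df)  # ID, Year, School; the failed column was not added
## [1] 3
```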

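# convert column Year into an ordered factor
# levels = gives the ranking order of the values already stored in the column
student.info$Year <- factor(student.info$Year, 
                            levels = c("Freshman", "Sophomore", "Junior", "Senior") , 
                            ordered = TRUE)
student.info$Year
## [1] Freshman  Sophomore Freshman  Senior    Junior
## Levels: Freshman < Sophomore < Junior < Senior
# check for the structure of the column Year
str(student.info$Year)
##  Ord.factor w/ 4 levels "Freshman"<"Sophomore"<..: 1 2 1 4 3

Note: Use levels = (not labels =) to specify the ranking order. If the same vector is passed to labels = instead, R first sorts the existing values alphabetically (Freshman, Junior, Senior, Sophomore) and then renames them in the order given, so every Sophomore would be shown as Senior, every Senior as Junior, and every Junior as Sophomore.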
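The difference between levels = and labels = can be checked directly. A minimal sketch on a standalone vector `Year` with the same values as the column; `by_levels` and `by_labels` are hypothetical names:

```r
Year  <- c("Freshman", "Sophomore", "Freshman", "Senior", "Junior")
ranks <- c("Freshman", "Sophomore", "Junior", "Senior")

# levels = states the order of the values that are already there.
by_levels <- factor(Year, levels = ranks, ordered = TRUE)

# labels = renames the alphabetically sorted values, which changes the data.
by_labels <- factor(Year, labels = ranks, ordered = TRUE)

identical(as.character(by_levels), Year)
## [1] TRUE
identical(as.character(by_labels), Year)
## [1] FALSE
as.character(by_labels)   # Sophomore has become Senior, and so on
```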

levels() - lists the groups (levels) of a factor or ordered factor, in their stored order.

levels(student.info$Year)
## [1] "Freshman"  "Sophomore" "Junior"    "Senior"
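The practical benefit of an ordered factor is that its values can be compared by rank. A minimal sketch, assuming the ranking is supplied through levels = on a standalone vector `Year`:

```r
Year <- factor(
  c("Freshman", "Sophomore", "Freshman", "Senior", "Junior"),
  levels  = c("Freshman", "Sophomore", "Junior", "Senior"),
  ordered = TRUE
)

# Comparisons follow the level order, not the alphabet.
Year >= "Junior"
## [1] FALSE FALSE FALSE  TRUE  TRUE

max(Year)                # highest rank present in the data
as.integer(Year)         # position of each value within levels(Year)
## [1] 1 2 1 4 3
```

These comparisons are not available for an unordered factor such as Gender.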