Dataframes

Harold Nelson

2023-09-04

What is a Dataframe?

A data frame is a list of vectors of equal length. Each vector becomes a column, and each element within the vector is a row entry for that column.

Creating a Dataframe

You can create a data frame using the data.frame() function.

Examples

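# Name the columns directly in the call to data.frame().
df1 <- data.frame(name = c("Alice", "Bob"), age = c(25, 30))
df1
##    name age
## 1 Alice  25
## 2   Bob  30

# Build the vectors first; the columns take the names of the variables.
name = c("Alice", "Bob")
age = c(25, 30)
df2 = data.frame(name,age)
df2
##    name age
## 1 Alice  25
## 2   Bob  30

# With unnamed vectors, R generates column names from the expressions.
df3 = data.frame(c("Alice", "Bob"),c(25, 30))
df3
##   c..Alice....Bob.. c.25..30.
## 1             Alice        25
## 2               Bob        30

# Replace the generated names with meaningful ones.
colnames(df3) = c("name","age")
df3
##    name age
## 1 Alice  25
## 2   Bob  30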
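
The definition given at the start (a list of vectors of equal length) can be checked directly. The sketch below uses a made-up data frame, pets, which is not part of the exercises; it only illustrates the idea.

```r
# Hypothetical example data, invented for this illustration.
pets = data.frame(animal = c("cat", "dog", "fish"), legs = c(4, 4, 0))

is.list(pets)          # a data frame is a list
length(pets)           # number of list elements, i.e. columns
sapply(pets, length)   # every column has the same length
nrow(pets)             # that common length is the number of rows
```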

Exercise

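Create a data frame named students that contains the following columns: ID, Name, and Age. Fill it with the following data: the ID values are 101, 102, and 103; the names are John, Jane, and Jack; the ages are 21, 23, and 20.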

Solution

ID = c(101, 102, 103)
Name = c("John", "Jane", "Jack")
Age = c(21, 23, 20)

students = data.frame(ID,Name,Age)
students
##    ID Name Age
## 1 101 John  21
## 2 102 Jane  23
## 3 103 Jack  20

Exercise

Use the str() function to see the structure of your dataframe.

Solution

str(students)
## 'data.frame':    3 obs. of  3 variables:
##  $ ID  : num  101 102 103
##  $ Name: chr  "John" "Jane" "Jack"
##  $ Age : num  21 23 20

Exercise

Use the $ and [[ ]] operators to obtain the students’ names as a separate vector. Verify that these produce the same result.

Solution

names1 = students$Name
names2 = students[["Name"]]
names1
## [1] "John" "Jane" "Jack"
names2
## [1] "John" "Jane" "Jack"
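
Comparing printed output by eye works for three names but not for long vectors. One way to do the check in code is identical(); this is a minimal sketch that rebuilds students so it runs on its own.

```r
students = data.frame(ID = c(101, 102, 103),
                      Name = c("John", "Jane", "Jack"),
                      Age = c(21, 23, 20))

names1 = students$Name
names2 = students[["Name"]]

# identical() returns a single TRUE or FALSE for the whole comparison.
identical(names1, names2)
```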

Exercise

Change the first name in names1 to Joe. Verify that the names in the dataframe and names2 have not been changed.

Solution

names1[1] = "Joe"
names1
## [1] "Joe"  "Jane" "Jack"
names2
## [1] "John" "Jane" "Jack"
students$Name
## [1] "John" "Jane" "Jack"

Exercise

Create a new data frame students2 in the same format. The ID values are 201, 202, and 203. The names are Tom, Dick, and Harry. The ages are 21, 22, and 23. Then use the rbind() function to add students2 to students.

Solution

ID = c(201,202,203)
Name = c("Tom","Dick","Harry")
Age = c(21,22,23)
students2 = data.frame(ID,Name,Age)
students2
##    ID  Name Age
## 1 201   Tom  21
## 2 202  Dick  22
## 3 203 Harry  23
students = rbind(students,students2)
students
##    ID  Name Age
## 1 101  John  21
## 2 102  Jane  23
## 3 103  Jack  20
## 4 201   Tom  21
## 5 202  Dick  22
## 6 203 Harry  23
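
A detail that matters when stacking data frames: rbind() matches columns by name, not by position. The sketch below illustrates this with a hypothetical variant, students2_reordered, that holds the same data as students2 with the columns in a different order.

```r
students = data.frame(ID = c(101, 102, 103),
                      Name = c("John", "Jane", "Jack"),
                      Age = c(21, 23, 20))

# Same data as students2, but with the columns in a different order.
# The name students2_reordered is made up for this illustration.
students2_reordered = data.frame(Age = c(21, 22, 23),
                                 Name = c("Tom", "Dick", "Harry"),
                                 ID = c(201, 202, 203))

combined = rbind(students, students2_reordered)
names(combined)   # column order follows the first data frame
combined$ID       # the IDs still land in the ID column
nrow(combined)    # three rows from each data frame
```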

Exercise

Create the vector Major with values CS, IT, ME, CE, ME, CS. Add it to the data frame using the cbind() function, making students3. Add it using the $ operator, making students4. Verify that students3 and students4 are identical.

Solution

Major = c("CS","IT","ME","CE","ME","CS")
students3 = cbind(students,Major)
students3
##    ID  Name Age Major
## 1 101  John  21    CS
## 2 102  Jane  23    IT
## 3 103  Jack  20    ME
## 4 201   Tom  21    CE
## 5 202  Dick  22    ME
## 6 203 Harry  23    CS
students4 = students
students4$Major = Major
students4
##    ID  Name Age Major
## 1 101  John  21    CS
## 2 102  Jane  23    IT
## 3 103  Jack  20    ME
## 4 201   Tom  21    CE
## 5 202  Dick  22    ME
## 6 203 Harry  23    CS
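
The verification can also be done in code rather than by reading the two printouts. A minimal sketch, assuming the six-row students data frame built in the previous exercise (recreated here in one step so the example runs on its own):

```r
students = data.frame(ID = c(101, 102, 103, 201, 202, 203),
                      Name = c("John", "Jane", "Jack", "Tom", "Dick", "Harry"),
                      Age = c(21, 23, 20, 21, 22, 23))
Major = c("CS", "IT", "ME", "CE", "ME", "CS")

students3 = cbind(students, Major)
students4 = students
students4$Major = Major

all.equal(students3, students4)   # compares names, types, and values
identical(students3, students4)   # stricter test for an exact match
```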

Exercise

Create a new data frame Mechanical_Engineers as a subset of students3 where the value of Major is "ME". Use the [] operator.

Solution

Mechanical_Engineers = students3[students3$Major == "ME",]
Mechanical_Engineers
##    ID Name Age Major
## 3 103 Jack  20    ME
## 5 202 Dick  22    ME
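
It may help to see the subsetting in two steps. The comparison inside the brackets produces a logical vector with one value per row, and [] keeps the rows where that value is TRUE. The sketch below recreates the data so it runs on its own; the intermediate name is_me is made up for this illustration.

```r
students3 = data.frame(ID = c(101, 102, 103, 201, 202, 203),
                       Name = c("John", "Jane", "Jack", "Tom", "Dick", "Harry"),
                       Age = c(21, 23, 20, 21, 22, 23),
                       Major = c("CS", "IT", "ME", "CE", "ME", "CS"))

is_me = students3$Major == "ME"
is_me          # one TRUE or FALSE per row
which(is_me)   # positions of the rows that match

# Rows go before the comma; leaving the column slot empty keeps all columns.
Mechanical_Engineers = students3[is_me, ]
rownames(Mechanical_Engineers)   # the original row labels are kept
```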