Write a simple R Markdown document to explain some of the R code you have learned regarding data frames. You are free to write your own program and add new code, such as how to show all the rows except the last one, or how to get the last 6 rows of a data frame. For example, you can explain what a data frame is, then write code to show how R handles data frames, and so on. One topic is mandatory: you must explain the different ways R returns a vector or a data frame when you access values from a data frame. Your program doesn't have to be long. Play around with the different font sizes in Markdown to make your document readable. Explore. Publish your document on RPubs and submit the link only. Adding new code that has not been discussed will earn you high marks.
Note: It is best to create your own data frame, or to use a small one, so that the results are easy to see after each line, or after a few lines, of code is executed.
A data frame is a table, or two-dimensional array-like structure, in which each column contains the values of one variable and each row contains one set of values from each column. It is the most common way to store tabular datasets in R.
Following are the characteristics of a data frame.
The column names should be non-empty.
The row names should be unique.
The data stored in a data frame can be of numeric, character, logical or factor type, among others.
Each column should contain the same number of data items.
Unlike matrices, data frames can store different classes of objects in each column.
In other words, a data frame is stored in memory as a list. In fact, a data frame can be regarded as a special kind of list whose elements (the columns) are vectors of equal length.
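The list-like nature of a data frame can be checked directly. The sketch below uses a small hypothetical data frame called df (the name and values are made up for illustration only) and asks R how it sees the object.

```r
# Hypothetical two-column data frame, used only for illustration
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)

is.list(df)        # TRUE: a data frame is built on top of a list
typeof(df)         # "list"
length(df)         # 2: one list element per column
sapply(df, class)  # the class of each column (columns may differ)
lengths(df)        # every column has the same number of items (3)
```

This is why length() on a data frame counts columns rather than rows.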
Before we jump into data frames, we need some knowledge of vectors, matrices and lists, because a data frame combines features of all of them. An example of a vector follows.
vector <- c(1:5)
vector
## [1] 1 2 3 4 5
We can check the class of the vector.
class(vector)
## [1] "integer"
Let's have a look at a matrix of 3 rows and 3 columns.
example = matrix(c(1:9), nrow = 3, ncol = 3)
example
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
A list looks as follows:
fruit <- c("apple", "mango", "watermelon")
list(fruit)
## [[1]]
## [1] "apple" "mango" "watermelon"
Now let's have a look at a data frame:
index <- 1:8
student_name <- c("Masrur", "Hazel", "Rahim", "Lee", "Jecy", "Tanvir", "Zaed", "Stella")
gender <- c("Male", "Female", "Male", "Male", "Female", "Male", "Male", "Female")
marks_obtained <- c(79, 46, 56, 94, 23, 64, 38, 87)
passed <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
result <- data.frame(index, student_name, gender, marks_obtained, passed)
print(result)
##   index student_name gender marks_obtained passed
## 1     1       Masrur   Male             79   TRUE
## 2     2        Hazel Female             46  FALSE
## 3     3        Rahim   Male             56   TRUE
## 4     4          Lee   Male             94   TRUE
## 5     5         Jecy Female             23  FALSE
## 6     6       Tanvir   Male             64   TRUE
## 7     7         Zaed   Male             38  FALSE
## 8     8       Stella Female             87   TRUE
class(result)
## [1] "data.frame"
str() shows the structure of a data frame: the number of rows and columns, and the type and first few values of each column.
str(result)
## 'data.frame': 8 obs. of 5 variables:
## $ index : int 1 2 3 4 5 6 7 8
## $ student_name : chr "Masrur" "Hazel" "Rahim" "Lee" ...
## $ gender : chr "Male" "Female" "Male" "Male" ...
## $ marks_obtained: num 79 46 56 94 23 64 38 87
## $ passed : logi TRUE FALSE TRUE TRUE FALSE TRUE ...
names() shows the name of each column in a data frame.
names(result)
## [1] "index" "student_name" "gender" "marks_obtained"
## [5] "passed"
nrow() shows the number of rows in a data frame.
nrow(result)
## [1] 8
ncol() shows the number of columns in a data frame.
ncol(result)
## [1] 5
length() shows the length of a data frame, which is its number of columns (the same as ncol()).
length(result)
## [1] 5
dim() shows the number of rows and columns, in that order.
dim(result)
## [1] 8 5
The head() function lets us view the first 6 rows of a data frame by default.
head(result)
The tail() function lets us view the last 6 rows of a data frame by default.
tail(result)
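Both functions accept an n argument when a different number of rows is wanted. The following is a minimal sketch on a hypothetical data frame df (a cut-down stand-in for result, with made-up column names).

```r
# Hypothetical stand-in for the result data frame
df <- data.frame(index = 1:8,
                 marks = c(79, 46, 56, 94, 23, 64, 38, 87))

first_three <- head(df, n = 3)  # first 3 rows instead of the default 6
last_two    <- tail(df, n = 2)  # last 2 rows instead of the default 6

first_three
last_two
```

Both calls return data frames, so the result can be indexed again in the usual way.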
summary() shows summary information about each column of a data frame.
summary(result)
##      index      student_name          gender          marks_obtained
##  Min.   :1.00   Length:8           Length:8           Min.   :23.00
##  1st Qu.:2.75   Class :character   Class :character   1st Qu.:44.00
##  Median :4.50   Mode  :character   Mode  :character   Median :60.00
##  Mean   :4.50                                         Mean   :60.88
##  3rd Qu.:6.25                                         3rd Qu.:81.00
##  Max.   :8.00                                         Max.   :94.00
##    passed
##  Mode :logical
##  FALSE:3
##  TRUE :5
##
##
##
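We can access any part of a data frame with simple code. Depending on how we index it, R returns either a data frame or a vector: single brackets with a column name or position return a data frame, while double brackets and $ return the column as a vector.
Accessing a column by its name (single brackets return a one-column data frame):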
result['index']
You can also get the column as a vector by using double brackets:
result[['index']]
## [1] 1 2 3 4 5 6 7 8
Double brackets also work with the column position:
result[[2]]
## [1] "Masrur" "Hazel" "Rahim" "Lee" "Jecy" "Tanvir" "Zaed" "Stella"
Accessing an element in a column:
result[['student_name']][4]
## [1] "Lee"
result$student_name[4]
## [1] "Lee"
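Accessing a row or a column with [row, column] indexing (a single row comes back as a data frame, a single column as a vector):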
result[2,]
result[,5]
## [1] TRUE FALSE TRUE TRUE FALSE TRUE FALSE TRUE
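Because the choice between a vector and a data frame is the main point of this section, the cases are put side by side in the sketch below. It uses a small hypothetical data frame df (its name and values are made up for illustration and are not taken from result).

```r
# Hypothetical data frame, used only for illustration
df <- data.frame(
  name  = c("Lee", "Hazel", "Rahim"),
  marks = c(94, 46, 56),
  stringsAsFactors = FALSE
)

as_df   <- df["marks"]                  # single brackets: one-column data frame
as_vec1 <- df[["marks"]]                # double brackets: plain vector
as_vec2 <- df$marks                     # dollar sign: plain vector
dropped <- df[, "marks"]                # [row, col] with one column: simplified to a vector
kept    <- df[, "marks", drop = FALSE]  # drop = FALSE keeps it a data frame
one_row <- df[2, ]                      # a single row stays a data frame

class(as_df)    # "data.frame"
class(as_vec1)  # "numeric"
class(dropped)  # "numeric"
class(kept)     # "data.frame"
class(one_row)  # "data.frame"
```

A row stays a data frame because its columns may have different types, so it cannot always be simplified to a single vector.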
Return all the rows except the first one:
result[-1,]
Return all the rows except the last one:
result[-8,]
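Hard-coding the row number only works while the data frame has exactly 8 rows. As a sketch of a more general approach, the last row can be dropped without knowing the row count in advance; df below is a hypothetical stand-in for result.

```r
# Hypothetical stand-in for the result data frame
df <- data.frame(index = 1:8,
                 marks = c(79, 46, 56, 94, 23, 64, 38, 87))

without_last_a <- df[-nrow(df), ]   # negative index built from the row count
without_last_b <- head(df, n = -1)  # negative n: everything except the last row

nrow(without_last_a)  # 7
nrow(without_last_b)  # 7
```

Both versions assume the data frame has at least one row.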
The last 2 columns (a single index inside the brackets selects columns, not rows, and returns a data frame):
result[4:5]
Return rows by applying condition:
result[result$marks_obtained > 50,]
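Conditions can be combined with & (and) and | (or), and a column can be selected in the same step. The sketch below illustrates this on a hypothetical data frame df with made-up column names; subset() is a base R alternative that avoids repeating the data frame name.

```r
# Hypothetical data frame, used only for illustration
df <- data.frame(
  name   = c("Masrur", "Hazel", "Lee", "Jecy"),
  gender = c("Male", "Female", "Male", "Female"),
  marks  = c(79, 46, 94, 23),
  stringsAsFactors = FALSE
)

# Rows that satisfy both conditions (returns a data frame)
high_male <- df[df$marks > 50 & df$gender == "Male", ]

# Filter rows and pick one column at once (returns a vector)
names_only <- df[df$marks > 50, "name"]

# subset() does the same filtering and keeps only the chosen columns
sub <- subset(df, marks > 50, select = c(name, marks))

high_male
names_only
sub
```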
We can update a value in the data frame by assigning to a [row, column] position:
result[4, 'marks_obtained'] = 95
result
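Updates can also be driven by a condition, or can add a whole new column. The following is a minimal sketch on a hypothetical data frame df; the minimum mark of 40 and the pass mark of 50 are assumptions made for the example, not rules taken from the data above.

```r
# Hypothetical data frame, used only for illustration
df <- data.frame(
  name  = c("Hazel", "Lee", "Zaed"),
  marks = c(46, 94, 38),
  stringsAsFactors = FALSE
)

# Update one cell by looking up the row with a condition
df[df$name == "Lee", "marks"] <- 95

# Update every value that meets a condition (assumed minimum mark of 40)
df$marks[df$marks < 40] <- 40

# Add a new column computed from an existing one (assumed pass mark of 50)
df$passed <- df$marks >= 50

df
```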
Removing a column from the data frame by setting it to NULL:
result$passed = NULL
result
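Assigning NULL changes the data frame in place. When the original should be kept, a copy without the unwanted columns can be built by indexing instead. The sketch below shows a few ways on a hypothetical data frame df.

```r
# Hypothetical data frame, used only for illustration
df <- data.frame(
  index  = 1:3,
  name   = c("a", "b", "c"),
  passed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

no_passed  <- df[, names(df) != "passed"]  # keep every column not called passed
no_first   <- df[-1]                       # drop the first column by position
only_index <- df[, names(df) == "index", drop = FALSE]  # one column left: drop = FALSE keeps a data frame

names(no_passed)   # "index" "name"
names(no_first)    # "name" "passed"
class(only_index)  # "data.frame"
```

The original df is unchanged by all three statements.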
We can create a simple plot of the marks in our short data frame:
plot(result$marks_obtained, main = 'Exam result', xlab = 'Students', ylab = 'Marks', type = "o", col = "blue")