TASK

Write a simple R Markdown to explain some of the R codes you have learned regarding data frame. You are free to write your own program and add new codes such as how would you show all the rows except the last one or how would you get the last 6 rows of the data frame. Eg. you can explain what a data frame is, and then write the code to show how R handles data frame, and so on. But one mandatory topic is you must explain the different ways R returns back a vector or a data frame when you access values from a data frame. Your program doesn’t have to be long. Play around with the different font sizes in markdown to make your markdown readable. Explore. Publish your markdown on RPubs and submit the link only. Adding new codes that have not been discussed will earn you high marks.

Note : Best to create your own data frame or use a data frame which is small in size and so that it would be easy to see the results after the execution of each or after a few codes.

R Data Frame

A data frame is a table, a two-dimensional array-like structure in which each column holds the values of one variable and each row holds one observation, with one value from each column. It is the standard structure for holding tabular datasets in R.

Following are the characteristics of a data frame.

The column names should be non-empty.
The row names should be unique.
The data stored in a data frame can be of numeric, character, logical or factor type.
Each column should contain the same number of data items.
Unlike matrices, data frames can store different classes of objects in each column.

In other words, a data frame is stored in memory as a list. More precisely, it is a list of vectors of equal length, where each vector is one column.
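The list nature of a data frame can be checked directly. The following is a minimal sketch; the data frame `df` and its column names are hypothetical and used only for illustration.

```r
# Hypothetical three-row data frame, used only for this illustration
df <- data.frame(
  id    = 1:3,
  name  = c("a", "b", "c"),
  score = c(9.5, 7, 8),
  stringsAsFactors = FALSE
)

is.list(df)        # a data frame is built on top of a list
typeof(df)         # the underlying storage type
sapply(df, class)  # each column keeps its own class
lengths(df)        # every column has the same length
```

Each column is one element of the list, which is why columns may have different classes while sharing one length.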

1. Introduction to Data Frame

Before we jump into data frames, we need to know about vectors, matrices and lists, because a data frame combines features of all three. A vector example is as follows.


vector <- c(1:5)
vector
## [1] 1 2 3 4 5

We can check the class of the vector.

class(vector)
## [1] "integer"

Let's have a look at a matrix of 3 rows and 3 columns.

example = matrix(c(1:9), nrow = 3, ncol = 3)
example
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9

A list looks as follows:

fruit <- c("apple", "mango", "watermelon")
list(fruit)
## [[1]]
## [1] "apple"      "mango"      "watermelon"

Now let's have a look at a data frame:

index <- 1:8
student_name <- c("Masrur", "Hazel", "Rahim", "Lee", "Jecy", "Tanvir", "Zaed", "Stella")
gender <- c("Male", "Female", "Male", "Male", "Female", "Male", "Male", "Female")
marks_obtained <- c(79, 46, 56, 94, 23, 64, 38, 87)
passed <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
result <- data.frame(index, student_name, gender, marks_obtained, passed)
print(result)
##   index student_name gender marks_obtained passed
## 1     1       Masrur   Male             79   TRUE
## 2     2        Hazel Female             46  FALSE
## 3     3        Rahim   Male             56   TRUE
## 4     4          Lee   Male             94   TRUE
## 5     5         Jecy Female             23  FALSE
## 6     6       Tanvir   Male             64   TRUE
## 7     7         Zaed   Male             38  FALSE
## 8     8       Stella Female             87   TRUE
class(result)
## [1] "data.frame"

2. Exploring Data Frame

str() shows the structure of a data frame

str(result)
## 'data.frame':    8 obs. of  5 variables:
##  $ index         : int  1 2 3 4 5 6 7 8
##  $ student_name  : chr  "Masrur" "Hazel" "Rahim" "Lee" ...
##  $ gender        : chr  "Male" "Female" "Male" "Male" ...
##  $ marks_obtained: num  79 46 56 94 23 64 38 87
##  $ passed        : logi  TRUE FALSE TRUE TRUE FALSE TRUE ...

names() shows the name of each column in a data frame

names(result)
## [1] "index"          "student_name"   "gender"         "marks_obtained"
## [5] "passed"

nrow() shows the number of rows in a data frame

nrow(result)
## [1] 8

ncol() shows the number of columns in a data frame

ncol(result)
## [1] 5

length() shows the length of a data frame, which is the same as ncol() because a data frame is a list of columns

length(result)
## [1] 5

dim() shows the number of rows and columns, in that order

dim(result)
## [1] 8 5

The head() function allows us to view the first 6 rows of a data frame, by default.

head(result)

The tail() function allows us to view the last 6 rows of a data frame, by default.

tail(result)
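Both functions accept an `n` argument to change the default of 6 rows. A short sketch, assuming a hypothetical data frame `scores` that reuses the marks from above:

```r
# Hypothetical data frame for illustration only
scores <- data.frame(id = 1:8, marks = c(79, 46, 56, 94, 23, 64, 38, 87))

first_three <- head(scores, n = 3)  # first 3 rows instead of the default 6
last_two    <- tail(scores, n = 2)  # last 2 rows instead of the default 6

first_three
last_two
```

Note that tail() keeps the original row names, so the two rows printed by `last_two` are still labelled 7 and 8.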

summary() shows summary information about each column of a data frame

summary(result)
##      index      student_name          gender          marks_obtained 
##  Min.   :1.00   Length:8           Length:8           Min.   :23.00  
##  1st Qu.:2.75   Class :character   Class :character   1st Qu.:44.00  
##  Median :4.50   Mode  :character   Mode  :character   Median :60.00  
##  Mean   :4.50                                         Mean   :60.88  
##  3rd Qu.:6.25                                         3rd Qu.:81.00  
##  Max.   :8.00                                         Max.   :94.00  
##    passed       
##  Mode :logical  
##  FALSE:3        
##  TRUE :5        
##                 
##                 
## 

3. Accessing Elements from Data Frame

We can access any part of a data frame with simple code. Depending on the form of indexing, R returns either a vector or a data frame.

Accessing a column by its name with single brackets returns a one-column data frame:

result['index']

Double brackets with the column name return the column as a vector:

result[['index']]
## [1] 1 2 3 4 5 6 7 8

Double brackets also accept the column position:

result[[2]]
## [1] "Masrur" "Hazel"  "Rahim"  "Lee"    "Jecy"   "Tanvir" "Zaed"   "Stella"

Accessing an element in a column:

result[['student_name']][4]
## [1] "Lee"
result$student_name[4]
## [1] "Lee"

Accessing a row (returned as a data frame) or a column (returned as a vector):

result[2,]
result[,5]
## [1]  TRUE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE
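The forms of indexing shown in this section can be grouped by what they return. The sketch below uses a hypothetical data frame `scores` and checks each result with class(); the `drop = FALSE` argument is the standard way to keep a data frame when selecting a single column in matrix style.

```r
# Hypothetical data frame for illustration only
scores <- data.frame(
  name  = c("Ana", "Ben", "Cy"),
  marks = c(79, 46, 56),
  stringsAsFactors = FALSE
)

# Forms that return a data frame
class(scores["marks"])                  # single brackets, one index
class(scores[, "marks", drop = FALSE])  # matrix style with dropping disabled
class(scores[2, ])                      # a single row is still a data frame
class(scores[, c("name", "marks")])     # two or more columns

# Forms that return a vector
class(scores[["marks"]])                # double brackets
class(scores$marks)                     # dollar sign
class(scores[, "marks"])                # matrix style, one column is dropped to a vector
```

A row stays a data frame because its columns may have different types, whereas a single column always has one type and can be simplified to a vector.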

Return all the rows except the first one:

result[-1,]

Return all the rows except the last one:

result[-8,]
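Hard-coding the row number only works while the data frame has exactly 8 rows. A more general sketch, assuming a hypothetical data frame `scores`, lets R work out the last row:

```r
# Hypothetical data frame for illustration only
scores <- data.frame(id = 1:8, marks = c(79, 46, 56, 94, 23, 64, 38, 87))

# Negative index built from the row count, so it adapts to any size
all_but_last <- scores[-nrow(scores), ]

# head() with a negative n drops that many rows from the end
also_all_but_last <- head(scores, -1)

all_but_last
also_all_but_last
```

Both forms keep working if rows are added to or removed from `scores` later.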

The last 2 rows (note the comma: without it, R would select columns instead of rows):

result[7:8,]

Return rows by applying condition:

result[result$marks_obtained > 50,]
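Conditions can be combined, and subset() offers a shorter way to write the same filter. A small sketch, assuming a hypothetical data frame `scores`:

```r
# Hypothetical data frame for illustration only
scores <- data.frame(
  name   = c("Ana", "Ben", "Cy", "Dee"),
  gender = c("F", "M", "M", "F"),
  marks  = c(79, 46, 56, 94),
  stringsAsFactors = FALSE
)

# Two conditions joined with & (element-wise AND)
high_f <- scores[scores$marks > 50 & scores$gender == "F", ]

# The same filter with subset(), keeping only selected columns
high_f_short <- subset(scores, marks > 50 & gender == "F", select = c(name, marks))

# which() returns the positions of the rows that match a condition
above_50 <- which(scores$marks > 50)

high_f
high_f_short
above_50
```

subset() is convenient for interactive work because column names can be written without the `scores$` prefix.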

We can update a value in the data frame:

result[4, 'marks_obtained'] = 95
result

Removing a column from the data frame:

result$passed = NULL
result
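Columns and rows can also be added. The sketch below uses a hypothetical data frame `roster`; the pass mark of 50 is an assumption made for this example, not a rule taken from the data above.

```r
# Hypothetical data frame for illustration only
roster <- data.frame(
  name  = c("Ana", "Ben", "Cy"),
  marks = c(79, 46, 56),
  stringsAsFactors = FALSE
)

# Add a derived logical column (assumed pass mark of 50)
roster$passed <- roster$marks >= 50

# Append a row; the new row must have the same column names
new_row <- data.frame(name = "Dee", marks = 94, passed = TRUE,
                      stringsAsFactors = FALSE)
roster <- rbind(roster, new_row)
roster

# Remove several columns at once by assigning NULL
roster[c("marks", "passed")] <- NULL
names(roster)
```

Assigning NULL to several names at once is the multi-column counterpart of the single-column removal shown above.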

4. Including Visualization

We can create a simple plot of the marks in our small data frame.

plot(result$marks_obtained,  main = 'Exam result', xlab = 'Students', ylab = 'Marks', type = "o", col = "blue")