#R TUTORIALS
R is a free, open-source programming language and software environment for statistical computing and graphics, widely used for data analysis, visualization, and machine learning.
#Everything in ‘R’ is an Object
#Create an R object.
este = 48
este
## [1] 48
# A literal value is also an object; typing it prints it
48
## [1] 48
# Creating objects of character data type
# Character values are provided within quotes
my_char = 'a'
print(my_char)
## [1] "a"
# Creating objects of ‘numeric’ type
my_numeric = 10
print(my_numeric)
## [1] 10
# Creating objects of logical type
# This can take values of TRUE and FALSE only.
my_logic = TRUE
print(my_logic)
## [1] TRUE
# Division produces a numeric value; R has no separate 'float' type
my_float = 3/4
my_float
## [1] 0.75
str(my_float)
## num 0.75
str(my_char)
## chr "a"
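As a small supplement to the type checks above (not part of the original tutorial), the built-in functions class() and typeof() can be called on each basic object. class() reports the high-level type, while typeof() reports how R stores the value internally; the variable names simply repeat the ones used above.

```r
# Recreate the basic objects from this section
my_char    <- 'a'
my_numeric <- 10
my_logic   <- TRUE
my_float   <- 3/4

class(my_char)      # "character"
class(my_numeric)   # "numeric"
typeof(my_numeric)  # "double": numeric values are stored as doubles
class(my_logic)     # "logical"
class(my_float)     # "numeric": there is no separate float type
```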
#Built-in Data Structures
Data structures are objects composed of objects of the fundamental data types explained earlier. The five data structures covered in this tutorial are:
1. Atomic vector
2. List
3. Matrix
4. Data frame
5. Factor
Each of these data structures is intended for a specific purpose.
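To preview the five structures listed above, here is a minimal sketch that creates one small example of each and checks it with class(). The variable names and values are illustrative only; each structure is covered in detail in the sections that follow.

```r
# One small example of each data structure (illustrative names and values)
my_vector <- c(1, 2, 3)
my_list   <- list(1, 'a')
my_mat    <- matrix(1:4, nrow = 2)
my_df     <- data.frame(x = 1:2)
my_factor <- factor(c('low', 'high', 'low'))

class(my_vector)  # "numeric"
class(my_list)    # "list"
class(my_mat)     # "matrix" "array" (in R 4.0.0 and later)
class(my_df)      # "data.frame"
class(my_factor)  # "factor"
```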
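# Atomic vectors: the colon operator creates a sequence of consecutive integers
my_vec2 = 1:3
my_vec2
## [1] 1 2 3
# seq() creates a sequence from one value to another in steps of 'by'
my_vec3 <- seq(from = 2, to = 20, by = 2)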
my_vec3
## [1] 2 4 6 8 10 12 14 16 18 20
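Atomic vectors can also be built from individual values with the c() function. The sketch below (illustrative names) also shows what "atomic" means in practice: a vector holds a single type, so mixing types coerces every element to the most general one.

```r
# c() combines individual values into an atomic vector (illustrative names)
nums <- c(2, 4, 6)
length(nums)  # 3

# An atomic vector holds a single type, so mixing types coerces all
# elements to the most general type (here, character)
mixed <- c(1, 'a', TRUE)
mixed         # "1" "a" "TRUE"
class(mixed)  # "character"
```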
#Functions on Vector Objects
#There are several built-in functions that operate on vector objects.
length(my_vec3)
## [1] 10
class(my_vec3)
## [1] "numeric"
typeof(my_vec3)
## [1] "double"
str(my_vec3)
## num [1:10] 2 4 6 8 10 12 14 16 18 20
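A few more built-in functions that work on numeric vectors are sketched below, using the same my_vec3 as above. The last line shows that arithmetic in R is vectorized, so an operation applies to every element without an explicit loop.

```r
my_vec3 <- seq(from = 2, to = 20, by = 2)

sum(my_vec3)    # 110
mean(my_vec3)   # 11
max(my_vec3)    # 20
rev(my_vec3)    # 20 18 16 14 12 10 8 6 4 2

# Arithmetic is vectorized: it is applied to every element at once
my_vec3 / 2     # 1 2 3 4 5 6 7 8 9 10
```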
#Access and Modify Elements of a Vector
#The elements of a vector can be accessed or modified using indexes. An index is the position of an element within the vector, and in R, indexing starts from one.
# To access the first element, place its index (1) inside square brackets.
my_vec3[1]
## [1] 2
# To modify an element of a vector, assign a value to the indexed object
my_vec3[1] = 10
print(my_vec3)
## [1] 10 4 6 8 10 12 14 16 18 20
#In the above output, we can see that the element in position 1 has changed from 2 to 10 after it has been modified.
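Square brackets accept more than a single position. The sketch below, using the same my_vec3, selects several positions at once, excludes an element with a negative index, filters with a logical condition, and modifies two elements in one assignment.

```r
my_vec3 <- seq(from = 2, to = 20, by = 2)

my_vec3[c(1, 3)]        # 2 6: several positions at once
my_vec3[-1]             # every element except the first
my_vec3[my_vec3 > 15]   # 16 18 20: a logical condition as the index

# Several elements can be modified in one assignment
my_vec3[2:3] <- c(0, 0)
my_vec3                 # 2 0 0 8 10 12 14 16 18 20
```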
#Identifying and Handling Missing Data
R has built-in support for identifying and handling missing values in data. ‘NA’ is a special symbol that represents a missing value.
# Initialize a vector with missing values.
# special symbol NA is used to represent missing values.
my_vec4 = c(1, 2, NA, 4)
print(my_vec4)
## [1] 1 2 NA 4
# is.na() is a built-in function that takes an object as its argument
# and returns TRUE for each element that is missing (NA), FALSE otherwise
is.na(my_vec4)
## [1] FALSE FALSE TRUE FALSE
# anyNA() returns TRUE if there is at least one missing value in the object
anyNA(my_vec4)
## [1] TRUE
Line 1 in the above output displays the contents of my_vec4. Line 2 displays the result of the function call ‘is.na(my_vec4)’, which checks each element in turn. The first and second values (1 and 2) are not missing, so FALSE is displayed for both. The third value is missing, so TRUE is displayed in the third position, and the fourth value (4) is not missing, so the last entry is FALSE. This function is frequently used to locate missing values in an object such as a vector. Line 3 displays the result of the function call ‘anyNA(my_vec4)’. Since my_vec4 contains at least one missing value, the output is TRUE.
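Once missing values have been identified, they usually need to be handled. A minimal sketch using my_vec4: which() turns the logical result of is.na() into positions, the na.rm argument tells sum() to skip missing values, and negating is.na() keeps only the values that are present.

```r
my_vec4 <- c(1, 2, NA, 4)

which(is.na(my_vec4))       # 3: the position of the missing value
sum(my_vec4)                # NA: a missing value propagates through the sum
sum(my_vec4, na.rm = TRUE)  # 7: missing values are ignored
my_vec4[!is.na(my_vec4)]    # 1 2 4: keep only the non-missing values
```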
#Creating a List
# The simplest way to create a list is with the
# function ‘list()’
my_list1 = list(5)
print(my_list1)
## [[1]]
## [1] 5
# A list can also hold values of different types
# One of the elements can even be a vector.
my_list2 <- list(1, 'Name', c('a', 'b', 'c'))
print(my_list2)
## [[1]]
## [1] 1
##
## [[2]]
## [1] "Name"
##
## [[3]]
## [1] "a" "b" "c"
In the above output, the first two lines display the output of the function call ‘print(my_list1)’. This list has only one element: the first line, [[1]], shows the position of the element, and the second line shows its content. The remaining lines display the output of the function call ‘print(my_list2)’. This list has 3 elements, and the print statement displays the position of each element in double square brackets, followed by its value on the next line. For example, the number 1 is at position 1, the string ‘Name’ is at position 2, and so on.
#Slots of a List
A list can be thought of as a container with multiple slots. In the previous example, the first slot contains the number ‘1’, the second slot contains the string ‘Name’, and the third slot contains the character vector “a”, “b”, “c”. It is also possible to assign a name to each slot using the names() function. The names must be provided as a vector of strings.
# Assign names to slots of list using the function names
# The names of the slots are provided as a vector of strings
# using the c function.
names(my_list2) = c('first', 'second', 'third')
str(my_list2)
## List of 3
## $ first : num 1
## $ second: chr "Name"
## $ third : chr [1:3] "a" "b" "c"
The above output displays the structure of my_list2 after the slots have been named. Line 1 displays the number of slots, and lines 2-4 display the slot names ‘first’, ‘second’ and ‘third’ along with their contents. The dollar symbol $ before each name marks it as the name of a slot.
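Slot names can also be given when the list is created, by writing name = value pairs inside list(). The sketch below (my_list3 is an illustrative name) shows that this gives the same result as naming the slots afterwards with names().

```r
# Name the slots when the list is created (my_list3 is an illustrative name)
my_list3 <- list(first = 1, second = 'Name', third = c('a', 'b', 'c'))
names(my_list3)  # "first" "second" "third"

# Same result as creating the list first and naming it afterwards
my_list2 <- list(1, 'Name', c('a', 'b', 'c'))
names(my_list2) <- c('first', 'second', 'third')
identical(my_list2, my_list3)  # TRUE
```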
#Accessing and Modifying a List
Similar to vectors, elements in a list can be accessed using square brackets and indexes. Elements in a list can also be accessed using the slot names: the list name is followed by a $ sign and then the slot name. Once the content is accessed, it can be modified using the assignment operator.
# Elements in the list can be accessed using indexes within
# square brackets; single brackets return a sub-list (note the $first label)
my_list2[1]
## $first
## [1] 1
# They can also be accessed using the $ sign followed by the slot name.
my_list2$first
## [1] 1
# Once the content is accessed, it can be modified using the
# assignment operator
my_list2$first = 100
my_list2[1]
## $first
## [1] 100
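The difference between single and double square brackets is easy to miss in printed output, so the sketch below checks it with class(). It also shows indexing into a vector stored in a slot and that assigning to a new slot name (fourth, an illustrative name) adds an element to the list.

```r
my_list2 <- list(first = 1, second = 'Name', third = c('a', 'b', 'c'))

class(my_list2[1])      # "list": single brackets return a sub-list
class(my_list2[[1]])    # "numeric": double brackets return the element
my_list2[['third']][2]  # "b": index into the vector stored in a slot

# Assigning to a new slot name adds an element (fourth is illustrative)
my_list2$fourth <- TRUE
length(my_list2)        # 4
```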
#Matrix
A matrix is a two-dimensional, rectangular arrangement of elements of the same data type. Like an atomic vector, all of its elements share one type, but they are arranged in rows and columns.
#Creation of Matrix
A matrix object can be created using the function ‘matrix()’. It takes the following arguments:
1. Elements of the matrix as a single vector
2. Number of rows
3. Number of columns
By default, the elements are arranged by columns. If the elements need to be arranged by rows, then the argument ‘byrow’ needs to be set to TRUE. By default, the rows and columns are given the numbers 1, 2, etc. However, names can be specified for the rows and the columns for better identification, using the functions ‘rownames()’ and ‘colnames()’.
# Create a matrix using the function ‘matrix’
# This takes the elements as a vector, the number of rows,
# and the number of columns as arguments.
my_matrix1 = matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2)
print(my_matrix1)
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
# If the elements need to be arranged by rows, set the argument
# byrow to TRUE.
my_matrix2 = matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3, byrow = TRUE)
print(my_matrix2)
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
# Assign row and column names using the functions
# rownames() and colnames()
rownames(my_matrix2) = c('row1', 'row2')
colnames(my_matrix2) = c('col1', 'col2', 'col3')
str(my_matrix2)
## num [1:2, 1:3] 1 4 2 5 3 6
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:2] "row1" "row2"
## ..$ : chr [1:3] "col1" "col2" "col3"
In the above output, lines 1-4 display the matrix produced by the statement matrix(c(1,2,3,4,5,6), nrow = 3, ncol = 2). By default, the numbers 1, 2, 3, 4, 5, 6 are arranged column-wise: the numbers 1, 2, 3 form the first column and 4, 5, 6 form the next column.
Lines 5-7 display the matrix produced by the statement matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3, byrow = TRUE). Since the argument ‘byrow’ is set to TRUE, R arranges the numbers row-wise. Therefore, the numbers 1, 2, 3 show up as the first row, followed by the numbers 4, 5, 6 in the second row.
Lines 8-11 display the structure of the matrix after the row names and the column names have been assigned. The output shows that the matrix is a numeric array with 2 rows (1:2) and 3 columns (1:3). The elements are listed in column-wise order (1 4 2 5 3 6). The next lines show the names of the dimensions, i.e., the names of the rows and the columns.
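The effect of ‘byrow’ can be checked directly. In this sketch (m_col and m_row are illustrative names), dim() returns the number of rows and columns, and filling by rows turns out to give the transpose, computed with t(), of the same values filled by columns.

```r
# Illustrative names: the same values filled two different ways
m_col <- matrix(1:6, nrow = 3, ncol = 2)                # filled column-wise
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # filled row-wise

dim(m_row)                  # 2 3: number of rows, then columns
identical(m_row, t(m_col))  # TRUE: filling by rows gives the transpose
```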
my_matrix2
## col1 col2 col3
## row1 1 2 3
## row2 4 5 6
#Accessing Elements of a Matrix
The elements in the matrix are accessed using indexes. Two indexes are needed: the row index and the column index. They are specified within square brackets, separated by a comma.
# To access the element in the first row and second column
# specify 1,2 within the square brackets after the matrix object name
my_matrix2[1, 2]
## [1] 2
The output displays the value of the element in the first row and the second column.
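Leaving one index empty selects a whole row or column, and row and column names can be used in place of numbers. The sketch below rebuilds my_matrix2 with the dimnames argument of matrix(), which sets the names in one step (an alternative to rownames() and colnames()), and then modifies an element by assignment.

```r
my_matrix2 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE,
                     dimnames = list(c('row1', 'row2'),
                                     c('col1', 'col2', 'col3')))

my_matrix2[1, ]             # whole first row (column index left empty)
my_matrix2[, 'col3']        # whole column, selected by name: 3 6
my_matrix2['row2', 'col1']  # 4: a single element by row and column name

# Modify an element by assigning to it
my_matrix2[1, 2] <- 20
```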
#Data Frame
A data frame is a list of vectors (columns) that all have the same length, so it can be thought of as a rectangular list. A data frame is typically used to store data read from text or CSV files, retaining the underlying structure such as row names and column names. A data frame can also be created manually.
#Creation of Data Frame
The function to create a data frame is ‘data.frame()’. It takes as input several vectors of equal length, each of which becomes a column in the data frame. Names can be assigned to the rows and columns using the functions ‘rownames()’ and ‘colnames()’.
# Create a dataframe manually
ID = c('A', 'B', 'C')
Age = c(21, 22, 20)
Height = c(150, 160, 170)
sData = data.frame(ID, Age, Height)
# Assign names to the rows and columns of the data frame
rownames(sData) = c('Ajith', 'John', 'Bob')
colnames(sData) = c('ID', 'Age', 'Height')
The above code creates three rows of data for ‘Ajith,’ ‘John’, and ‘Bob’. The columns will be ‘ID,’ ‘Age’, and ‘Height’. The values will be assigned from the vectors ‘ID,’ ‘Age’, and ‘Height’ to each of the rows.
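The same data frame can also be built in a single call by naming the columns inside data.frame() and passing the row names through its row.names argument. The last two lines of this sketch confirm that a data frame is a list with one element per column.

```r
sData <- data.frame(ID = c('A', 'B', 'C'),
                    Age = c(21, 22, 20),
                    Height = c(150, 160, 170),
                    row.names = c('Ajith', 'John', 'Bob'))

colnames(sData)  # "ID" "Age" "Height"
rownames(sData)  # "Ajith" "John" "Bob"
is.list(sData)   # TRUE: a data frame is a list of columns
length(sData)    # 3: one list element per column
```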
#Built-in Functions on Data Frame
There are several built-in functions which help in manipulating and exploring data frames. The code snippets below illustrate a few of them.
# Structure of the data frame
str(sData)
## 'data.frame': 3 obs. of 3 variables:
## $ ID : chr "A" "B" "C"
## $ Age : num 21 22 20
## $ Height: num 150 160 170
The output indicates that the object is a data frame with three observations (rows) and three variables (columns). It also lists the 3 slots, ID, Age and Height, along with their types and values. Each slot is internally a vector of values.
#Head and Tail Functions
# head() prints the first six rows by default
# The number of rows to display can be passed as the second argument
head(sData, 2)
## ID Age Height
## Ajith A 21 150
## John B 22 160
The output shown above corresponds to the function call ‘head(sData, 2)’. It displays the first two rows of the data frame named ‘sData’.
# Similarly, tail() displays the last rows of the data frame (six by default)
tail(sData, 2)
## ID Age Height
## John B 22 160
## Bob C 20 170
The output shown above corresponds to the function call ‘tail(sData, 2)’. It displays the last two rows of the data frame named ‘sData’.
#Dimensionality Functions
# Get the dimension of the data frame
dim(sData)
## [1] 3 3
# Number of rows in the data frame
nrow(sData)
## [1] 3
# Number of columns in the data frame
ncol(sData)
## [1] 3
Line 1 of the above output indicates that the data frame has 3 rows (dimension 1) and 3 columns (dimension 2). Lines 2 and 3 display the output of the functions ‘nrow()’ and ‘ncol()’.
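A few more base R functions for exploring a data frame are sketched below: names() lists the columns, colMeans() averages the numeric columns passed to it, and summary() reports the minimum, quartiles, mean and maximum of a numeric column. The printed layout of summary() is not reproduced here.

```r
sData <- data.frame(ID = c('A', 'B', 'C'),
                    Age = c(21, 22, 20),
                    Height = c(150, 160, 170),
                    row.names = c('Ajith', 'John', 'Bob'))

names(sData)                         # "ID" "Age" "Height"
dim(sData)[1] == nrow(sData)         # TRUE: dim() returns c(rows, columns)
colMeans(sData[c('Age', 'Height')])  # Age 21, Height 160
summary(sData$Age)                   # min, quartiles, mean and max of Age
```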
#Accessing Elements of a Data Frame Column
The elements of a slot or column of a data frame can be accessed by using the ‘$’ operator or by using double square brackets with the column name provided within quotes. A single square bracket can also be used; however, in this case, the result is a data frame.
# Access a particular column in a dataframe
sData$Age
## [1] 21 22 20
sData[['Age']]
## [1] 21 22 20
sData['Age']
## Age
## Ajith 21
## John 22
## Bob 20
In the above output, lines 1 and 2 display the output of the statements sData$Age and sData[['Age']], respectively. Note that the two outputs are the same. Lines 3-6 display the output of the statement sData['Age'] (single square brackets). In this case, although the values are the same, the output returned is a data frame.
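The difference between the three forms can be confirmed with class(). The sketch below also modifies a column by assigning a new vector to it; here every age is increased by one as an arbitrary example.

```r
sData <- data.frame(ID = c('A', 'B', 'C'),
                    Age = c(21, 22, 20),
                    Height = c(150, 160, 170),
                    row.names = c('Ajith', 'John', 'Bob'))

class(sData$Age)       # "numeric": a plain vector
class(sData[['Age']])  # "numeric"
class(sData['Age'])    # "data.frame": single brackets keep the data frame

# A column can be modified by assigning to it (arbitrary example change)
sData$Age <- sData$Age + 1
sData$Age              # 22 23 21
```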
#Accessing Rows of a Data Frame
To access a row of the data frame, provide the row name within quotes, followed by a comma, all enclosed within square brackets.
# To access the elements of the row – ‘John’
# provide ‘John’ followed by a comma within square brackets
sData['John', ]
## ID Age Height
## John B 22 160
The above output displays all the information corresponding to the row named ‘John.’ Note that the return value is of type dataframe.
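The row position before the comma also accepts a vector of row names or a logical condition, and a column name can be given after the comma to pick out a single value. A short sketch with the same data:

```r
sData <- data.frame(ID = c('A', 'B', 'C'),
                    Age = c(21, 22, 20),
                    Height = c(150, 160, 170),
                    row.names = c('Ajith', 'John', 'Bob'))

sData[c('John', 'Bob'), ]  # several rows by name
sData[sData$Age > 20, ]    # rows where the condition is TRUE: Ajith and John
sData['John', 'Age']       # 22: row name and column name together
```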
#Accessing Multiple Columns of a Data Frame
To retrieve values from multiple columns, provide the column names as a character vector.
# To access more than one column, use the c function
# to create a vector of column names
sData[c('ID', 'Age')]
## ID Age
## Ajith A 21
## John B 22
## Bob C 20
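Rows and columns can also be selected together with the [row, column] form. As an alternative, base R's subset() function takes a condition on rows and a select argument for columns; older is an illustrative name for the result.

```r
sData <- data.frame(ID = c('A', 'B', 'C'),
                    Age = c(21, 22, 20),
                    Height = c(150, 160, 170),
                    row.names = c('Ajith', 'John', 'Bob'))

sData[, c('ID', 'Age')]                      # same columns via [row, column]
sData[c('Ajith', 'Bob'), c('ID', 'Height')]  # selected rows and columns

# subset(): rows by condition, columns by name (older is illustrative)
older <- subset(sData, Age > 20, select = c(ID, Age))
older
```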
#Factor
A factor is a vector that can contain only predefined values and is used to store categorical data.
A factor can be initialised using the function ‘factor().’ There are built-in functions available that operate on factor objects. For example, the built-in function ‘levels()’ can be used to identify the unique categories inside the vector. Factor objects can also be modified using indexes. A factor data structure will be important when we deal with categorical type data such as gender, education level, blood type, etc.
# Create a factor for storing a list of genders
gender = factor(c('Male', 'Male', 'Female', 'Female'))
print(gender)
## [1] Male Male Female Female
## Levels: Female Male
# In-built functions on factors
levels(gender)
## [1] "Female" "Male"
# Modify a gender
gender[1] = 'Female'
print(gender)
## [1] Female Male Female Female
## Levels: Female Male
Lines 1 and 2 display the factor object named ‘gender’. Line 3 displays the output of the function ‘levels()’, which returns the unique categories within the object. Lines 4 and 5 display the factor object after it is modified.
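A few more points about factors are sketched below. Internally, a factor stores integer codes that point into its levels, and the levels are sorted (alphabetically for these values) unless the levels argument sets them explicitly. Because a factor can only hold predefined values, assigning a value that is not a level produces NA with a warning. The names gender2 and g are illustrative.

```r
gender <- factor(c('Male', 'Male', 'Female', 'Female'))

table(gender)       # counts per level: Female 2, Male 2
nlevels(gender)     # 2
as.integer(gender)  # 2 2 1 1: internal codes into the sorted levels

# The levels argument sets the categories and their order explicitly
gender2 <- factor(c('Male', 'Female'), levels = c('Male', 'Female'))
levels(gender2)     # "Male" "Female"

# Assigning a value that is not a level produces NA (with a warning)
g <- gender
suppressWarnings(g[1] <- 'Other')
is.na(g[1])         # TRUE
```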