This is a Frantz Moudoute Notebook. This document provides an overview of a few ML techniques used in R between 2014 and 2022.

Introduction to vectors

One of the key concepts in R is the way it manages data. Data are stored in vectors, and each vector contains elements. Watch out: all elements of a vector must be of the same type. Common types include integer, double, character and logical (Boolean), among others.
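The single-type rule is easiest to see by breaking it on purpose. The short sketch below uses illustrative variable names that are not part of this notebook; it shows that R does not raise an error when types are mixed, but silently converts every element to the most general type present.

```r
# Mixing a number, a string and a logical: everything becomes character.
mixed <- c(1, "two", TRUE)
print(typeof(mixed))

# Mixing doubles and logicals: TRUE and FALSE become 1 and 0.
numbers_and_flags <- c(1.5, TRUE, FALSE)
print(typeof(numbers_and_flags))
print(numbers_and_flags)
```

Checking typeof() after building a vector is a cheap way to catch an unintended conversion.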

Let's create some vectors with a few elements, starting with a vector of strings.

my_favorite_cities <- c("Hong Kong", "Tokyo", "Paris", "Douala")
print(my_favorite_cities)

Then a vector of doubles:

their_temperatures <- c(25.3, 21.0, 15.0, 35.0)
print(their_temperatures)

And last, a vector of Booleans:

mega_cities <- c(FALSE, TRUE, FALSE, FALSE)  # You can see that TRUE/FALSE are all in capital letters 
print(mega_cities)

As one would expect, values stored in a vector preserve their order, so we can retrieve each value by its position. Note that positions start at 1 in R. Let's retrieve the second city and the second temperature, then play a bit with the data and print the second to the fourth cities as well.

print(my_favorite_cities[2])

print(their_temperatures[2])

print(my_favorite_cities[2:4])

We can manipulate vectors as we please. Say, for example, that we want to remove the first element of the cities: a negative index excludes the element at that position. Notice that we are not using the ‘<-’ sign, so the vector itself is not updated.

print(my_favorite_cities[-1])
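To make the difference between displaying and updating concrete, here is a minimal sketch on a separate copy of the data (the name cities is an assumption used only for this illustration). The vector only changes once the result is assigned back with ‘<-’.

```r
cities <- c("Hong Kong", "Tokyo", "Paris", "Douala")

# Negative indexing returns a shortened copy; cities itself is untouched.
print(cities[-1])
print(length(cities))

# Assigning the result back makes the removal permanent.
cities <- cities[-1]
print(length(cities))
```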

Introduction to Factors

Factors are a data structure in R that is ideal for representing nominal data, that is, data that can be labelled or classified into mutually exclusive categories within a variable. One could use a vector of strings to store nominal data, but we will see in a few steps some of the advantages of R’s factor data structure. Factors are a special type of vector.

Basic factors

cities_weather <- factor(c("Tempered", "Cold", "Cold","Warm"))
print(cities_weather)

Notice that the output of the print command above mentions levels. Levels are the unique categories contained in this factor.

Advanced factors - setting levels

We can prepare factors to accept additional values. One way to do so is to set the levels manually.

cities_weather <- factor(c("Tempered", "Cold", "Cold","Warm"), levels = c("Tempered", "Cold","Warm", "Extremely Warm"))
print(cities_weather)
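The sketch below illustrates why declaring levels in advance matters. It works on a separate factor (the name weather is an assumption for this example): a value that matches a declared level can be assigned later, while an undeclared value is turned into NA with a warning.

```r
weather <- factor(c("Tempered", "Cold", "Cold", "Warm"),
                  levels = c("Tempered", "Cold", "Warm", "Extremely Warm"))

# The declared levels, and how often each one occurs (unused levels count 0).
print(levels(weather))
print(table(weather))

# "Extremely Warm" was declared as a level, so this assignment works.
weather[2] <- "Extremely Warm"

# "Freezing" is not a level: R stores NA and emits a warning,
# which is silenced here to keep the output short.
suppressWarnings(weather[3] <- "Freezing")
print(weather)
```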

Advanced factors - setting levels and ordering them

On top of setting the levels manually, we can also order them, as shown below.

cities_weather <- factor(c("Tempered", "Cold", "Cold","Warm"), levels = c("Cold","Tempered", "Warm", "Extremely Warm"), ordered = TRUE)
print(cities_weather)

Comparing factor values

One of the key advantages of ordered factors is that we can compare their elements according to the order of the levels. A vector of strings can only be compared alphabetically, which rarely matches the meaning of the categories. In the following example, we check which of the cities_weather values are warmer than "Tempered".

print(cities_weather > "Tempered")

As seen above, ordered factors enable meaningful comparisons between categorical, string-like elements.
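To contrast the two behaviours, the sketch below compares the same values stored as plain strings and as an ordered factor. The names temps_chr and temps_ord are assumptions introduced for this illustration only.

```r
temps_chr <- c("Tempered", "Cold", "Cold", "Warm")
temps_ord <- factor(temps_chr,
                    levels = c("Cold", "Tempered", "Warm", "Extremely Warm"),
                    ordered = TRUE)

# Strings are compared alphabetically: "Tempered" and "Warm" sort after
# "Extremely Warm", which says nothing about actual temperature.
print(temps_chr > "Extremely Warm")

# The ordered factor follows the level order, so nothing beats the top level.
print(temps_ord > "Extremely Warm")

# Values at or above "Tempered" in the level order.
print(temps_ord >= "Tempered")
```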

Introduction to lists

Lists in R are pretty similar to vectors, except that they accept elements of different data types.

cities_memories <- list("Sightseeing", TRUE, 2323, FALSE)
print(cities_memories)

The other interesting point about lists is that we can assign a name to each element. Let’s try:

cities_list <- list(activity = "Sightseeing", parking_free = TRUE, postcode = 2323, free_wifi = FALSE)
print(cities_list)

With names assigned to the items in the list, we can now call the items as follows:

cities_list$activity

Lists also share some behaviour with vectors: we can call elements by their position. In the example below, we call the second element of the list:

cities_list[2]
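One detail worth knowing when indexing lists by position: single brackets return a smaller list, while double brackets return the element itself. The sketch below rebuilds a similar list under a different name (trip, an assumption for this example) so that it can be run on its own.

```r
trip <- list(activity = "Sightseeing", parking_free = TRUE,
             postcode = 2323, free_wifi = FALSE)

# Single brackets keep the list wrapper: the result is a list of length 1.
sub_list <- trip[2]
print(class(sub_list))

# Double brackets extract the element itself, here a logical value.
value <- trip[[2]]
print(class(value))

# Double brackets also accept a name, like the $ operator.
print(trip[["postcode"]])
```

Use double brackets or $ whenever the value is needed for a calculation, and single brackets when a sub-list is wanted.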

Introduction to dataframes

Dataframes are at the very center of the R programming framework. Visually, they can be compared to a spreadsheet or, for the most advanced of you, to a database table. A dataframe is nothing more than a combination of vectors of the same length. Let's create a dataframe by combining some of the vectors that we created earlier.

destination_df <- data.frame(my_favorite_cities, their_temperatures, mega_cities, cities_weather)
print(destination_df)

Now that we have introduced the dataframe, we might want to verify that each of the columns contains the type of value that we expected. The str function is quite handy for that.

str(destination_df)  # str() prints its own output, so wrapping it in print() would only add a stray NULL

Our data have been integrated into the new dataframe as we expected.
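As a complement to str(), the sketch below checks column types programmatically on a small stand-alone dataframe (the name small_df and its column names are assumptions for this example). It also sets stringsAsFactors explicitly, because the default changed in R 4.0.0: older versions converted string columns to factors automatically.

```r
small_df <- data.frame(city = c("Hong Kong", "Tokyo"),
                       temperature = c(25.3, 21.0),
                       stringsAsFactors = FALSE)

# The class of each column, returned as a named character vector.
print(sapply(small_df, class))

# Number of rows and columns.
print(dim(small_df))
```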

Extracting from dataframe

Extracting one column from a dataframe

Extracting a column from a dataframe in R is very easy: write the name of the dataframe, then the dollar sign $, followed by the name of the column (or attribute, to be more precise). Let's give it a try.

print(destination_df$my_favorite_cities)

Extracting two columns from a dataframe

To extract multiple columns, we simply pass the dataframe a vector containing the names of the columns.

destination_df[c("my_favorite_cities", "cities_weather")]

Extracting the 1st row in the second column

destination_df[1,2]

Extracting the 1st and the 3rd row of the second and third columns

destination_df[c(1,3),c(2,3)]

Extracting all the rows of the second column

destination_df[,2]

Extracting the first and second rows of two specific columns

destination_df[c(1,2), c("mega_cities","cities_weather")]

We can keep playing with these combinations and even add negative selectors such as destination_df[-2,-2]. Try it and see what comes out.
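For readers who want to check their result, here is a minimal sketch of negative selectors on a stand-alone dataframe (the name demo_df and its column names are assumptions for this example). It also shows the drop argument, which controls whether a single selected column collapses to a plain vector.

```r
demo_df <- data.frame(city = c("Hong Kong", "Tokyo", "Paris", "Douala"),
                      temperature = c(25.3, 21.0, 15.0, 35.0),
                      mega = c(FALSE, TRUE, FALSE, FALSE))

# Drop the second row and the second column: 3 rows and 2 columns remain.
print(demo_df[-2, -2])

# Selecting a single column returns a plain vector by default...
print(demo_df[, 2])

# ...unless drop = FALSE is passed, which keeps the dataframe structure.
print(demo_df[, 2, drop = FALSE])
```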

Introduction to matrices

In R, matrices are objects used to store data in a tabular shape. All their elements are of a single data type, as one mostly uses them for calculations. Remember how c() creates vectors, list() creates lists and data.frame() creates dataframes? For matrices, we can simply use matrix(). The trick here is that we need to specify the number of rows or columns. Let's do a few examples.

mat_a <- matrix(c(4,5,8,7,5,4,11,21,3), nrow=3)
print(mat_a)

As one can see, the matrix is filled, column by column, from a vector whose length should be a multiple of the number of rows or columns specified. Here, 9 values with nrow=3 give a 3 by 3 matrix. Let's build the same matrix with ncol=2 this time, and see what happens.

mat_b <- matrix(c(4,5,8,7,5,4,11,21,3), ncol=2)
print(mat_b)

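We get a warning indicating that the data length is inconsistent: 9 values cannot fill 2 columns evenly, so R builds a 5 by 2 matrix and recycles the first value to fill the last cell.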
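For comparison, the sketch below reuses the same nine values with a column count that divides them evenly, so no warning is raised. It also shows the byrow argument of matrix(), which fills the matrix row by row instead of the default column by column. The names values, mat_c and mat_d are assumptions for this example.

```r
values <- c(4, 5, 8, 7, 5, 4, 11, 21, 3)

# 9 values and 3 columns: R infers 3 rows and fills column by column.
mat_c <- matrix(values, ncol = 3)
print(dim(mat_c))
print(mat_c)

# Same values filled row by row instead.
mat_d <- matrix(values, nrow = 3, byrow = TRUE)
print(mat_d)

# Filling by row gives the transpose of filling by column.
print(all(mat_d == t(mat_c)))
```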

