This is a Frantz Moudoute Notebook. The following document provides an
overview of a few ML techniques used in R between 2014 and 2022.
Introduction to vectors
One of the key concepts in R is the way it manages data. Data are
stored in vectors, and each vector contains elements. Watch out: all
elements of a vector must be of the same type. The most common types are
integer, double, character and logical (Boolean), among others.
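A quick way to check the type of a vector is typeof(). The sketch below uses small illustrative vectors (not part of the original notebook). It also shows what happens when types are mixed: c() does not raise an error. Instead, it silently coerces every element to the most flexible common type, which here is character.

```r
# typeof() reports how the elements of a vector are stored
int_vec <- c(1L, 2L, 3L)     # the L suffix creates integers
dbl_vec <- c(1.5, 2, 3)      # plain numbers are doubles
chr_vec <- c("a", "b")       # strings are character
lgl_vec <- c(TRUE, FALSE)    # logical (Boolean) values

print(typeof(int_vec))  # "integer"
print(typeof(dbl_vec))  # "double"
print(typeof(chr_vec))  # "character"
print(typeof(lgl_vec))  # "logical"

# Mixing types does not fail: c() coerces everything to a common type
mixed <- c(1, "two", TRUE)
print(typeof(mixed))    # "character"
print(mixed)            # "1" "two" "TRUE"
```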
Let's create some vectors, starting with a vector of strings (a
character vector).
my_favorite_cities <- c("Hong Kong", "Tokyo", "Paris", "Douala")
print(my_favorite_cities)
Then a vector of doubles:
their_temperatures <- c(25.3, 21.0, 15.0, 35.0)
print(their_temperatures)
And finally, a vector of logicals (Booleans):
mega_cities <- c(FALSE, TRUE, FALSE, FALSE) # Logical values are written TRUE and FALSE, in capital letters
print(mega_cities)
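Operations on vectors are vectorized. An arithmetic expression is applied to every element at once, with no explicit loop. As an illustration, the temperatures above can be converted to Fahrenheit in one line. The conversion is our own example and is not part of the original notebook. Summary functions such as length() and mean() take the whole vector as input.

```r
their_temperatures <- c(25.3, 21.0, 15.0, 35.0)

# The formula is applied element by element (illustrative Fahrenheit conversion)
temperatures_f <- their_temperatures * 9 / 5 + 32
print(temperatures_f)              # 77.54 69.80 59.00 95.00

# Summary functions work on the whole vector
print(length(their_temperatures))  # 4
print(mean(their_temperatures))    # 24.075
```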
In R, values stored in a vector keep their order, so each value can be
retrieved by its position (its index). Note that indexing starts at 1,
not 0. Let's retrieve the second city and the second temperature, then
the second to fourth cities.
print(my_favorite_cities[2])
print(their_temperatures[2])
print(my_favorite_cities[2:4])
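A vector can also be indexed with a logical vector of the same length, instead of numeric positions. In that case, only the elements matching TRUE are kept. The sketch below redefines the vectors from above so that it runs on its own. The 20-degree threshold is an arbitrary choice for the example.

```r
my_favorite_cities <- c("Hong Kong", "Tokyo", "Paris", "Douala")
their_temperatures <- c(25.3, 21.0, 15.0, 35.0)
mega_cities <- c(FALSE, TRUE, FALSE, FALSE)

# Keep only the cities flagged as mega cities
print(my_favorite_cities[mega_cities])   # "Tokyo"

# A comparison returns a logical vector, which can be used as an index
is_warm <- their_temperatures > 20
print(is_warm)                           # TRUE TRUE FALSE TRUE
print(my_favorite_cities[is_warm])       # "Hong Kong" "Tokyo" "Douala"

# which() converts the logical vector into positions
print(which(is_warm))                    # 1 2 4
```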
Vectors can be manipulated freely. For example, a negative index removes
an element from the result: below we leave out the first city. Notice
that we are not using the ‘<-’ operator, so my_favorite_cities itself is
not updated; only the printed output excludes the first element.
print(my_favorite_cities[-1])
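The result of my_favorite_cities[-1] is not assigned, so the original vector still holds four cities. To keep the shortened version, assign it with ‘<-’. Here it goes to a new name, cities_without_first, chosen for illustration. You can drop several elements at once by passing a vector of negative positions.

```r
my_favorite_cities <- c("Hong Kong", "Tokyo", "Paris", "Douala")

print(my_favorite_cities[-1])         # "Tokyo" "Paris" "Douala"
print(length(my_favorite_cities))     # still 4: the original is unchanged

# Store the result to keep it (illustrative variable name)
cities_without_first <- my_favorite_cities[-1]
print(length(cities_without_first))   # 3

# Drop the first and third elements in one go
print(my_favorite_cities[-c(1, 3)])   # "Tokyo" "Douala"
```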
Introduction to Factors
Factors are a data structure in R designed for categorical data, such as
nominal data (data that can be labelled or classified into mutually
exclusive categories within a variable). One could use a vector of
strings to store such data, but in the next steps we will see some
advantages of R's factors. Factors are a special type of vector: each
value is stored as an integer code that points to a label, called a
level.
Basic factors
cities_weather <- factor(c("Tempered", "Cold", "Cold", "Warm"))
print(cities_weather)
Notice that the printed output also mentions levels. Levels are the
unique categories contained in the factor; by default, R sorts them
alphabetically.
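You can inspect the levels directly with levels(). Under the hood, a factor stores each value as an integer code that points into its levels, and as.integer() reveals those codes. table() counts how many values fall in each level. Here is a short sketch on the same factor:

```r
cities_weather <- factor(c("Tempered", "Cold", "Cold", "Warm"))

# Levels are sorted alphabetically by default
print(levels(cities_weather))       # "Cold" "Tempered" "Warm"
print(nlevels(cities_weather))      # 3

# Each value is stored as the position of its level
print(as.integer(cities_weather))   # 2 1 1 3

# Count the occurrences of each level
print(table(cities_weather))
```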
Advanced factors - setting levels
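Factors can be prepared for values that do not appear in the data yet.
One way to do this is to set the levels manually with the levels
argument; any value that is not among the listed levels would become NA.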
cities_weather <- factor(c("Tempered", "Cold", "Cold", "Warm"), levels = c("Tempered", "Cold", "Warm", "Extremely Warm"))
print(cities_weather)
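Because "Extremely Warm" is declared as a level in advance, it can be assigned later. Assigning a value that is not a level does not extend the factor. R warns about an invalid factor level and stores NA instead. In the sketch below, "Freezing" is just an illustrative value that was never declared.

```r
cities_weather <- factor(c("Tempered", "Cold", "Cold", "Warm"),
                         levels = c("Tempered", "Cold", "Warm", "Extremely Warm"))

# "Extremely Warm" is a declared level, so this assignment works
cities_weather[4] <- "Extremely Warm"
print(cities_weather)

# Unused levels are kept and reported with a count of 0 ("Warm" here)
print(table(cities_weather))

# "Freezing" is not a level: R emits a warning and stores NA instead
cities_weather[1] <- "Freezing"
print(is.na(cities_weather[1]))   # TRUE
```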
Advanced factors - setting levels and ordering them
On top of setting the levels manually, we can also order them with
ordered = TRUE. The order of the levels vector then defines the ranking,
from lowest to highest:
cities_weather <- factor(c("Tempered", "Cold", "Cold", "Warm"), levels = c("Cold", "Tempered", "Warm", "Extremely Warm"), ordered = TRUE)
print(cities_weather)
Comparing factor values
One key advantage of ordered factors is that their values can be
compared according to the level order. A plain vector of strings would
only be compared alphabetically, and an unordered factor does not support
comparisons such as > at all (R returns NA with a warning). In the
following example, we check which cities_weather values are warmer than
"Tempered".
warmer_than_tempered <- cities_weather > "Tempered" # TRUE only for values ranked above "Tempered"
print(warmer_than_tempered)
As seen above, ordered factors allow meaningful comparisons between
categorical values that have a natural order (ordinal data).
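To see why the ordering matters, we can run the same test on three representations of the data. A character vector is compared alphabetically, so "Extremely Warm" sorts before "Tempered". An unordered factor refuses the comparison and returns NA with a warning. Only the ordered factor follows the intended scale. The vector weather below is a hypothetical example chosen to expose the difference.

```r
weather <- c("Tempered", "Cold", "Extremely Warm")   # hypothetical example
scale_levels <- c("Cold", "Tempered", "Warm", "Extremely Warm")

# 1. Character vector: alphabetical comparison ("E" comes before "T")
print(weather > "Tempered")                               # FALSE FALSE FALSE

# 2. Unordered factor: '>' is not meaningful, R warns and returns NA
print(factor(weather, levels = scale_levels) > "Tempered")  # NA NA NA

# 3. Ordered factor: the comparison follows the level order
weather_ord <- factor(weather, levels = scale_levels, ordered = TRUE)
print(weather_ord > "Tempered")                           # FALSE FALSE TRUE

# Ordered factors also support max() and sort()
print(max(weather_ord))                                   # Extremely Warm
print(sort(weather_ord))
```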
Introduction to lists
Lists in R are similar to vectors, except that their elements can be of
any type, and each element can have a different type.
cities_memories <- list("Sightseeing", TRUE, 2323, FALSE)
print(cities_memories)
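To confirm that each element keeps its own type, sapply() can apply class() to every element of the list. A list element can even be a whole vector. The nested list below is a hypothetical example for illustration.

```r
cities_memories <- list("Sightseeing", TRUE, 2323, FALSE)

# Each element keeps its own type
print(sapply(cities_memories, class))  # "character" "logical" "numeric" "logical"

# A list element can itself be a whole vector (hypothetical example)
nested <- list("Sightseeing", c(2014, 2018, 2022))
print(length(nested))   # 2: the vector counts as a single element
str(nested)
```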
Another useful feature of lists is that each element can be given a
name. Let's try:
cities_list <- list(activity = "Sightseeing", parking_free = TRUE, postcode = 2323, free_wifi = FALSE)
print(cities_list)
Once the elements are named, we can retrieve them by name with the $
operator:
cities_list$activity
Like vectors, lists can also be indexed by position. In the example
below we retrieve the second element. Note that single brackets return a
list containing that element; to get the element itself, use double
brackets, as in cities_list[[2]].
cities_list[2]
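The difference between single and double brackets matters in practice. Single brackets always return a (smaller) list, while double brackets extract the element itself, just like $. A short check with class():

```r
cities_list <- list(activity = "Sightseeing", parking_free = TRUE,
                    postcode = 2323, free_wifi = FALSE)

print(class(cities_list[2]))       # "list": a sub-list with one element
print(class(cities_list[[2]]))     # "logical": the element itself

# [[ ]] also accepts a name, equivalent to cities_list$postcode
print(cities_list[["postcode"]])   # 2323

# names() lists all element names
print(names(cities_list))
```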
Introduction to dataframes
Dataframes are at the very center of data analysis in R. Visually, they
can be compared to a spreadsheet, or, for the more advanced readers, to a
database table. A dataframe is essentially a combination of vectors of
the same length: each vector becomes a column, and different columns may
hold different types. Let's create a dataframe by combining the vectors
and the factor that we created earlier.
destination_df <- data.frame(my_favorite_cities, their_temperatures, mega_cities, cities_weather)
print(destination_df)
Now that we have introduced the dataframe, we might want to verify that
each column contains the type of values we expected. The str() function
is handy for that. It prints its summary directly, so there is no need
to wrap it in print() (which would additionally print NULL).
str(destination_df)
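Our data have been integrated into the new dataframe as expected: the
cities are stored as character, the temperatures as numeric, the
mega-city flags as logical and the weather as an ordered factor. Note
that in R versions before 4.0.0, data.frame() converted character
columns to factors by default (stringsAsFactors = TRUE).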
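Once the dataframe exists, you can access columns with $ and filter rows with a logical condition inside [rows, columns]. The sketch below rebuilds the same dataframe so that it runs on its own. The 20-degree threshold and the Fahrenheit column are illustrative additions, not part of the original notebook.

```r
my_favorite_cities <- c("Hong Kong", "Tokyo", "Paris", "Douala")
their_temperatures <- c(25.3, 21.0, 15.0, 35.0)
mega_cities <- c(FALSE, TRUE, FALSE, FALSE)
cities_weather <- factor(c("Tempered", "Cold", "Cold", "Warm"),
                         levels = c("Cold", "Tempered", "Warm", "Extremely Warm"),
                         ordered = TRUE)
destination_df <- data.frame(my_favorite_cities, their_temperatures,
                             mega_cities, cities_weather)

# Dimensions: 4 rows (cities) and 4 columns (variables)
print(dim(destination_df))

# Rows warmer than 20 degrees; the empty slot after the comma keeps all columns
warm_df <- destination_df[destination_df$their_temperatures > 20, ]
print(warm_df)

# Add a derived column (illustrative Fahrenheit conversion)
destination_df$temperature_f <- destination_df$their_temperatures * 9 / 5 + 32
print(ncol(destination_df))   # 5
```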
Introduction to matrices
In R, matrices are two-dimensional objects used to store data in a
tabular shape. Unlike dataframes, all elements of a matrix must be of
the same type, which makes them well suited to numerical computation
such as linear algebra. Remember how c() creates vectors, list() creates
lists and data.frame() creates dataframes? For matrices, we simply use
matrix(). The catch is that we need to specify the number of rows (nrow)
or columns (ncol); R then fills the matrix column by column. Let's do a
few examples.
mat_a <- matrix(c(4,5,8,7,5,4,11,21,3), nrow=3)
print(mat_a)
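By default, matrix() fills the values column by column. The byrow = TRUE argument fills them row by row instead. Individual cells are read with [row, column]. Here is a sketch on the same nine values:

```r
values <- c(4, 5, 8, 7, 5, 4, 11, 21, 3)

mat_a <- matrix(values, nrow = 3)                     # filled column by column
mat_by_row <- matrix(values, nrow = 3, byrow = TRUE)  # filled row by row

print(mat_a[1, ])        # first row: 4 7 11
print(mat_by_row[1, ])   # first row: 4 5 8

# A single cell: row 2, column 3
print(mat_a[2, 3])       # 21

# For this square matrix, filling by row gives the transpose of filling by column
print(all(mat_by_row == t(mat_a)))  # TRUE
print(dim(mat_a))        # 3 3
```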
As one can see, the nine values are split into columns of three, so
nrow = 3 yields a 3 x 3 matrix. For the data to fit exactly, the length
of the vector should be a multiple of the number of rows (or columns)
specified. Let's build the same matrix with ncol = 2 this time, and see
what happens.
mat_b <- matrix(c(4,5,8,7,5,4,11,21,3), ncol=2)
print(mat_b)
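We get a warning because the data length is inconsistent with the
requested shape: 9 values cannot be split evenly into 2 columns. R still
creates a 5 x 2 matrix, recycling values from the start of the vector to
fill the missing cell.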
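Here R uses 5 rows (9 values over 2 columns, rounded up) and recycles the first value, 4, into the tenth cell. One way to avoid this silent recycling is to check the length with the modulo operator %% and pad the data with NA when needed. This is a hypothetical safeguard, not a standard R idiom.

```r
values <- c(4, 5, 8, 7, 5, 4, 11, 21, 3)
n_cols <- 2

# Without padding, the tenth cell is filled by recycling the first value
mat_b <- suppressWarnings(matrix(values, ncol = n_cols))
print(dim(mat_b))    # 5 2
print(mat_b[5, 2])   # 4 (recycled)

# Hypothetical safeguard: pad with NA so the length is a multiple of n_cols
remainder <- length(values) %% n_cols
if (remainder != 0) {
  values <- c(values, rep(NA, n_cols - remainder))
}
mat_padded <- matrix(values, ncol = n_cols)  # no warning now
print(mat_padded)
```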
