---
title: "R, from Zero to Hero: Introduction"
output: html_notebook
---

This is a [Frantz Moudoute](install.packages('RWeka')) notebook. This document provides an overview of a few ML techniques used in R between 2014 and 2022.

# Introduction to vectors

A key concept in R is the way it manages data. Data are stored as vectors, and each vector contains elements. Watch out: all elements of a vector must be of the same type. The type can be integer, double, character, or logical (Boolean), among others.

Let's create some vectors with a few elements below, starting with a vector of strings.

```{r}
my_favorite_cities <- c("Hong Kong", "Tokyo", "Paris", "Douala")
print(my_favorite_cities)
```

Then a vector of doubles:

```{r}
their_temperatures <- c(25.3, 21.0, 15.0, 35.0)
print(their_temperatures)
```

And lastly a vector of Booleans:

```{r}
mega_cities <- c(FALSE, TRUE, FALSE, FALSE)  # Note that TRUE and FALSE are written in capital letters
print(mega_cities)
```
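
To illustrate the same-type rule, here is a minimal sketch using hypothetical vectors (`whole_numbers` and `mixed_values` are not part of the notebook's data). It checks the type of a vector with `typeof()` and shows that mixing types makes R convert every element to one common type.

```{r}
# Hypothetical example vectors, used only for this illustration
whole_numbers <- c(1L, 2L, 3L)      # the L suffix creates integers
mixed_values  <- c(1, "two", TRUE)  # mixing types forces a conversion

print(typeof(whole_numbers))   # "integer"
print(typeof(c(25.3, 21.0)))   # "double"
print(typeof(mixed_values))    # "character": every element became a string
print(mixed_values)
```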

In R, as one would expect, values stored in a vector preserve their order, so we can access each value by its position. Let's retrieve the second city and the second temperature. Let's also play a bit with the data and print the second to the fourth cities.

```{r}
print(my_favorite_cities[2])

print(their_temperatures[2])

print(my_favorite_cities[2:4])
```
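
A few more details on positions, sketched with a hypothetical vector (`sample_letters` is an assumption made for this example): positions start at 1, several positions can be requested at once, and asking for a position beyond the end returns a missing value rather than an error.

```{r}
# Hypothetical vector, used only for this illustration
sample_letters <- c("a", "b", "c", "d")

first_letter   <- sample_letters[1]        # positions start at 1, not 0
picked_letters <- sample_letters[c(1, 4)]  # select several positions at once
beyond_the_end <- sample_letters[6]        # no error, just a missing value (NA)

print(first_letter)
print(picked_letters)
print(is.na(beyond_the_end))
```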

We can manipulate vectors as we please. Say, for example, that we want to drop the first element of the cities. Notice that we are not using the `<-` operator, so the vector itself is not updated.

```{r}
print(my_favorite_cities[-1])
```
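
The difference between printing a shortened vector and actually keeping it can be sketched as follows. The vector `sample_cities` and the name `remaining_cities` are hypothetical, chosen so that the notebook's own data stay untouched.

```{r}
# Hypothetical vector, so the notebook's own vectors are not modified
sample_cities <- c("Lagos", "Lima", "Oslo")

print(sample_cities[-1])   # shows the vector without its first element
print(sample_cities)       # the original still holds all three elements

# Assigning the result is what actually keeps the shorter vector
remaining_cities <- sample_cities[-1]
print(length(remaining_cities))
```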

# Introduction to Factors
Factors are a data structure in R that is ideal for representing nominal data, that is, data that can be labelled or classified into mutually exclusive categories within a variable. One could use a vector of strings to store nominal data, but we will see in a few steps some of the advantages of R's factor data structure. Factors are a special type of vector.

## Basic factors
```{r}
cities_weather <- factor(c("Tempered", "Cold", "Cold", "Warm"))
print(cities_weather)
```
Notice that the printed output above mentions levels. Levels are the unique categories contained in this factor.

## Advanced factors - setting levels
We can prepare a factor for values that have not appeared yet. One way to do this is to set the levels manually.
```{r}
cities_weather <- factor(c("Tempered", "Cold", "Cold", "Warm"),
                         levels = c("Tempered", "Cold", "Warm", "Extremely Warm"))
print(cities_weather)
```
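
As a small sketch of what manually set levels change, the example below uses a hypothetical factor (`weather_demo`, built like the one above) to list the declared levels, count the values per level, and show what happens to a value that is not among the declared levels.

```{r}
# Hypothetical factor, built the same way as the example above
weather_demo <- factor(c("Tempered", "Cold", "Cold", "Warm"),
                       levels = c("Tempered", "Cold", "Warm", "Extremely Warm"))

print(levels(weather_demo))   # all four declared categories
print(table(weather_demo))    # "Extremely Warm" is listed with a count of 0

# A value outside the declared levels is stored as NA
unknown_demo <- factor(c("Cold", "Freezing"), levels = levels(weather_demo))
print(is.na(unknown_demo))
```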

## Advanced factors - setting levels and ordering them
On top of setting the levels manually, we can also order them, as shown below.
```{r}
cities_weather <- factor(c("Tempered", "Cold", "Cold", "Warm"),
                         levels = c("Cold", "Tempered", "Warm", "Extremely Warm"),
                         ordered = TRUE)
print(cities_weather)
```
### Comparing factor value
One of the key advantages of ordered factors is that we can compare their elements by rank, something a vector of strings cannot do, since strings are only compared alphabetically.
In the following example, we check which of the `cities_weather` values are warmer than "Tempered".
```{r}
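# TRUE where the weather ranks above "Tempered" in the level order
print(cities_weather > "Tempered")
# Keep only the values that rank above "Tempered"
print(cities_weather[cities_weather > "Tempered"])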
```
As seen above, an ordered factor makes it possible to compare categorical, string-like elements.
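
To make the contrast concrete, here is a minimal sketch with hypothetical variables (`weather_levels`, `ordered_demo`, `unordered_demo`). It compares the same two labels as plain strings, as an ordered factor, and as an unordered factor.

```{r}
# Hypothetical level order, matching the one used in this section
weather_levels <- c("Cold", "Tempered", "Warm", "Extremely Warm")

# Plain strings are compared alphabetically: "E" sorts before "W"
string_result <- "Extremely Warm" > "Warm"

# An ordered factor compares by level position instead
ordered_demo <- factor(c("Extremely Warm", "Warm"),
                       levels = weather_levels, ordered = TRUE)
ordered_result <- ordered_demo[1] > ordered_demo[2]

# An unordered factor cannot be ranked: R returns NA with a warning
unordered_demo <- factor(c("Extremely Warm", "Warm"), levels = weather_levels)
unordered_result <- suppressWarnings(unordered_demo[1] > unordered_demo[2])

print(c(string = string_result, ordered = ordered_result, unordered = unordered_result))
```

Only the ordered factor gives the answer one would expect from the meaning of the labels.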

# Introduction to lists
Lists in R are pretty similar to vectors, except that they accept all sorts of data types in the same list.
```{r}
cities_memories <- list("Sightseeing", TRUE, 2323, FALSE)
print(cities_memories)
```
The other interesting point about lists is that we can assign a name to each of the elements. Let's try:

```{r}
cities_list <- list(activity = "Sightseeing", parking_free = TRUE, postcode = 2323, free_wifi = FALSE)
print(cities_list)
```
With names assigned to the items in the list, we can now access the items as follows:
```{r}
cities_list$activity
```

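Lists also share some behavior with vectors: we can access elements by their position. In the example below we select the second element of the list. Note that single brackets return a list of length one, whereas double brackets (`cities_list[[2]]`) return the stored value itself.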
```{r}
cities_list[2]

```
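
The difference between single and double brackets on a list can be sketched as follows, using a hypothetical list (`sample_list`) so the example runs on its own.

```{r}
# Hypothetical list, used only for this illustration
sample_list <- list(activity = "Hiking", parking_free = TRUE)

as_sublist <- sample_list[2]     # single brackets: a list of length one
as_value   <- sample_list[[2]]   # double brackets: the stored value itself

print(class(as_sublist))   # "list"
print(class(as_value))     # "logical"
print(identical(as_value, sample_list$parking_free))
```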

# Introduction to dataframes
Dataframes are at the very center of the R programming framework. They can be compared to a spreadsheet from a visual angle, or to a database table for the most advanced of you. A dataframe is nothing more than a combination of vectors of the same length. Let's create a dataframe by combining some of the vectors that we created earlier.

```{r}
destination_df <- data.frame(my_favorite_cities, their_temperatures, mega_cities, cities_weather)
print(destination_df)

```
Now that we have introduced the dataframe, we might want to verify that each of the columns contains the type of value that we expected. The *str* function is quite handy for that.
```{r}
# str() prints the structure itself; wrapping it in print() would add a stray NULL
str(destination_df)
```
Our data have been integrated into the new dataframe as we expected.
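
As a complement to `str`, the sketch below builds a hypothetical dataframe (`sample_df`, with column names chosen for this example) and collects the class of each column into a named vector. The argument `stringsAsFactors = FALSE` is set explicitly because its default differs between R versions before and after 4.0.

```{r}
# Hypothetical dataframe with explicit column names
sample_df <- data.frame(
  city        = c("Lagos", "Oslo"),
  temperature = c(27.5, 6.0),
  is_capital  = c(FALSE, TRUE),
  stringsAsFactors = FALSE   # keep strings as character on every R version
)

# First class of each column, returned as a named character vector
column_classes <- vapply(sample_df, function(column) class(column)[1], character(1))
print(column_classes)
print(dim(sample_df))   # number of rows, then number of columns
```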

## Extracting from dataframe
### Extracting one column from a dataframe
Extracting a column from a dataframe in R is very easy: write the name of the dataframe, then the dollar sign *$*, followed by the name of the column (or attribute, to be more precise). Let's give it a try.
```{r}
print(destination_df$my_favorite_cities)

```

### Extracting two columns from a dataframe
To extract multiple columns, we simply pass a vector containing the names of the columns to the dataframe.
```{r}
destination_df[c("my_favorite_cities", "cities_weather")]
```

### Extracting the 1st row in the second column

```{r}
destination_df[1,2]

```

### Extracting the 1st and the 3rd row of the second and third columns
```{r}
destination_df[c(1,3),c(2,3)]

```

### Extracting all the rows of the second column
```{r}
destination_df[,2]

```
### Extracting the first and second rows of two specific columns 
```{r}
destination_df[c(1,2), c("mega_cities","cities_weather")]

```
We can keep playing with these combinations and even add negative selectors such as *destination_df[-2,-2]*. Try it and see what comes out.
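
Here is a minimal sketch of negative selectors on a hypothetical dataframe (`scores_df` is an assumption made for this example). It also shows that selecting a single column returns a plain vector unless `drop = FALSE` is passed.

```{r}
# Hypothetical dataframe, used only for this illustration
scores_df <- data.frame(id = 1:3, score = c(10, 20, 30), passed = c(FALSE, TRUE, TRUE))

trimmed <- scores_df[-2, -2]   # drop the second row and the second column
print(trimmed)

# A single column comes back as a plain vector by default;
# drop = FALSE keeps the result as a dataframe
print(class(scores_df[, 2]))
print(class(scores_df[, 2, drop = FALSE]))
```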

# Introduction to matrices
In R, a matrix is an object used to store data in a tabular shape. Matrices usually hold a single data type, as one mostly uses them for calculations. Remember how c() creates vectors, list() creates lists and data.frame() creates dataframes? For matrices we can simply use matrix(). The trick here is that we need to specify the number of rows or columns. Let's do a few examples.

```{r}
mat_a <- matrix(c(4,5,8,7,5,4,11,21,3), nrow=3)
print(mat_a)
```
As one can see, the matrix is filled column by column from a vector, whose length should be a multiple of the number of rows or columns requested. Here nine values with nrow=3 give a 3 x 3 matrix. Let's build the same matrix with ncol this time, and see what happens.


```{r}
mat_b <- matrix(c(4,5,8,7,5,4,11,21,3), ncol=2)
print(mat_b)
```
We got a warning because nine values cannot fill two columns evenly: R builds a matrix with five rows and two columns and reuses the first value to fill the last cell.
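
One way to anticipate this warning is to check whether the number of values divides evenly by the requested number of columns. The sketch below uses a hypothetical helper, `fits_evenly()`, which is not a base R function, and then inspects the matrix R builds when the values do not fit.

```{r}
# Hypothetical helper, not part of base R
fits_evenly <- function(values, n_columns) length(values) %% n_columns == 0

matrix_values <- c(4, 5, 8, 7, 5, 4, 11, 21, 3)
print(fits_evenly(matrix_values, 3))   # nine values fill three columns exactly
print(fits_evenly(matrix_values, 2))   # nine values cannot fill two columns

# With ncol = 2, R builds five rows and recycles the first value into the last cell
recycled <- suppressWarnings(matrix(matrix_values, ncol = 2))
print(dim(recycled))
print(recycled[5, 2] == matrix_values[1])
```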


