Creation of Project Folders.

Reproduce your work with renv
Keeps R History (Global environment is stored)
Loading data is an easy task (No need for path! if you do it right)

An R project enables your work to be bundled in a portable, self-contained folder. Within the project, all the relevant scripts, data files, figures/outputs, and history are stored in sub-folders and importantly - the working directory is the project’s root folder. You cant appreciate this enough until you start working on big projects.

Reproducible Programming

Use R markdown or sweave. We use rmarkdown for our case.

Repoducability
Speed of task management
Smart workfow

Reproducible Environment

$renv::init()$
$renv::ativate()$
$renv::snapshot()$
$renv.restore()$
to check the version of package.
packageVersion(“ggplot2”)

Git

future lesson!

Expressions

Arithmetic Expressions

+
-
*
^ exponent sum sum / is used in division

%% is used in modulus division- Returns the remainder

%/% used in integer division- Returns integers from divisions getting rid of decimal/floats

1+2

## [1] 3

4%%3

## [1] 1

5%%3

## [1] 2

4%%2

## [1] 0

4%/%3

## [1] 1

4%%3

## [1] 1

5%%3

## [1] 2

Boolean Expressions

Comparison Operators

== Equals to

!= Not Equal to

>

<= less than or equal to They are expressions whose output is either true or false.

Logical Operators

| OR & and ! Logical operator not that negates values

| logical element wise OR operator- Returns TRUE if at least one condition is TRUE

& logical element wise AND operator - Only returns true if all are TRUE

4>6

## [1] FALSE

4<=6

## [1] TRUE

4<6 & 4>3

## [1] TRUE

4<6 & 4<3

## [1] FALSE

!TRUE

## [1] FALSE

!FALSE

## [1] TRUE

x<-c(T,F,T,F)
y<-c(T,T,F,F)
x|y

## [1]  TRUE  TRUE  TRUE FALSE

x&y

## [1]  TRUE FALSE FALSE FALSE

Variable Assignments

m<-c('mango',"Apple","Berries")
length(m)

## [1] 3

m=c('mango',"Apple","Berries","Mary's")
m

## [1] "mango"   "Apple"   "Berries" "Mary's"

c(1,56,56)->z

z[1]<-2
z

## [1]  2 56 56

#Other specific assignments for R environments!

## [1] 1

## [1]  2  2  7  9 10 33

Basic Data Types

The class() is used to check the data type of data

1. Numeric Data

It is data whose values are members of real number
Such as :
- Integers(whole numbers)
- Double/Floats/Decimals
- Fractions
- Complex/imaginary numbers
We can coerce data to Numeric using the as.numeric()
We can check if numeric using the is.numeric()

is.numeric(2)

## [1] TRUE

class(2)

## [1] "numeric"

2. Integer data

It is data whose values are whole numbers
By default R saves all numbers as Numeric.
To coerce a number as Integer, we add a prefix L to it
We can also use the as.integer()
When a decimal is coerced as integer it loses its floating numbers

is.integer(2)

## [1] FALSE

is.integer(2L)

## [1] TRUE

x<-as.integer(2)
class(2L)

## [1] "integer"

class(x)

## [1] "integer"

as.integer(2.8)

## [1] 2

3. Complex Data

Refers to data that is imaginary
Imaginary implies that the data is not a member of the real numbers
A number is defined as complex if it has a prefix i

1+0i

## [1] 1+0i

is.complex(1+0i)

## [1] TRUE

class(1+0i)

## [1] "complex"

Dealing with complex data enables us find the square root of negative numbers

sqrt(as.complex(-5))

## [1] 0+2.236068i

5. Character Data

Refers to data whose values are strings
Any data value inside double "" or single '' quotations marks is regarded as a string

x<-"hello world"
class(x)

## [1] "character"

is.character(x)

## [1] TRUE

as.character(3.14)

## [1] "3.14"

Data Structures

By concatenating simple data types, you create data structures and they include
In the event you have different data types in a vector, R tries to coerce them to the most suitable data type and it follows the below priority
- Character
- Complex
- Numeric
- Integer
- Logical If for example we have a vector of all data types.

1. Vectors

A vector is a one dimension data structure that stores data of the same type
Vectors are also called atomic sequences

v<-c(1,2,3,"f");v

## [1] "1" "2" "3" "f"

1:3

## [1] 1 2 3

rep("foo",3)

## [1] "foo" "foo" "foo"

We can check if a data structure is a vector using the is.atomic() or is.vector() functions

is.atomic(v)

## [1] TRUE

is.vector(v)

## [1] TRUE

In the event a vector has an attribute then is.vector() will return FALSE
The is.atomic() returns TRUE confirming that its a vector even though it has attributes

v<-1:3
is.vector(v)

## [1] TRUE

attr(v,"foo")<-"bar";v

## [1] 1 2 3
## attr(,"foo")
## [1] "bar"

is.vector(v)

## [1] FALSE

is.atomic(v)

## [1] TRUE

x<-c(1,2,3i,"foo",2L,2<3)
class(x)

## [1] "character"

x<-c(1,2,3i,2L,2<3)
class(x)

## [1] "complex"

x<-c(1,2,2L,2<3)
class(x)

## [1] "numeric"

x<-c(2L,2<3)
class(x)

## [1] "integer"

x<-2<3
class(x)

## [1] "logical"

2. Matrix

It is a 2 dimensional data structure that stores data of the same type
A matrix is a 2 dimensional vector
- The dim() is used to define the dimensions(rows and columns) of a vector to make it a matrix
- The attribute() function is used to check the dimensions of a matrix

v<-1:6
attributes(v) #returns null as v has no dimensions yet

## NULL

dim(v)<-c(2,3) #We've defined the dimensions of v making it amatrix with 2 rows and 3 columns
attributes(v) # checks dimensions

## $dim
## [1] 2 3

dim(v) # checks dimensions

## [1] 2 3

v # is now a matrix with 2 rows and 3 columns

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

By default values in a vector r added column-wise by R
The matrix() is used to create matrices and can be used to define if data is to be input row wise or column wise
The argument byrow=T indicates that data is input row-wise into a matrix

v<-1:6
v<-matrix(v,nrow=2,ncol=3,byrow=F);v

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

v<-1:6
v<-matrix(v,nrow=2,ncol=3,byrow=T);v

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

NB: In Matrices , * is used for scalar multiplication while %*% is used for matrix multiplication

(a<-matrix(1:4,nrow=2))

##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4

(b<-matrix(5:8,nrow=2))

##      [,1] [,2]
## [1,]    5    7
## [2,]    6    8

a*b

##      [,1] [,2]
## [1,]    5   21
## [2,]   12   32

a%*%b

##      [,1] [,2]
## [1,]   23   31
## [2,]   34   46

######====f - t() is used to transpose a matrix while the solve() is used to inverse a matrix

t(a)

##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4

solve(a)

##      [,1] [,2]
## [1,]   -2  1.5
## [2,]    1 -0.5

solve(a)%*%a

##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1

3. Lists

Lists and Arrays store data in layers
A list is a one dimension data structures that stores different data types in layers
An Array is a 2 dimensional data structure that stores different data types in layers
A Dataframe is a 2 dimensional data structure that stores differenet data types
The list() function is used to create lists

list(1:3,5:8) # A list of 2 vectors

## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1] 5 6 7 8

list(1:3,c(T,F))

## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1]  TRUE FALSE

We can also create recurrsive lists (A list inside another list )

list(list(),list(list(),list()))

## [[1]]
## list()
## 
## [[2]]
## [[2]][[1]]
## list()
## 
## [[2]][[2]]
## list()

(list_s<-list(c("jan","feb",'march','april'),
             matrix(c(3,9,5,1,-2,8),nrow = 2),
             list("flowers",'chocolate')))

## [[1]]
## [1] "jan"   "feb"   "march" "april"
## 
## [[2]]
##      [,1] [,2] [,3]
## [1,]    3    5   -2
## [2,]    9    1    8
## 
## [[3]]
## [[3]][[1]]
## [1] "flowers"
## 
## [[3]][[2]]
## [1] "chocolate"

The unlist() is used to reduce a list into a vector

unlist(list(1:4,5:7))

## [1] 1 2 3 4 5 6 7

Indexing

The ?"[[" displays documentation for indexing in R

?"[["

We use indexing to access elements of different data structures

Indexing Vectors

v<-1:4
v[2]

## [1] 2

v[2:3]

## [1] 2 3

# Special case for using vector of indices
v[c(1,1,4,3,2)]

## [1] 1 1 4 3 2

v[-1]

## [1] 2 3 4

v[-(1:2)]

## [1] 3 4

You can’t combine negative and positive indices
Another way to index is by using Boolean expressions or a Boolean vector
The Boolean vector should be of the same size as the vector

v[v%%2==0]

## [1] 2 4

By Indexing you can assign values to a vector or add new values

v[v%%2==0]<-13;v

## [1]  1 13  3 13

Indexing matrices

m<-matrix(1:6,nrow=2,byrow=T);m

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

m[1,] #access 1st row

## [1] 1 2 3

m[,1] # access 1st column

## [1] 1 4

When accessing a single row or column, it is reduced to a vector
Using the argument drop=F we maintain the single rows and columns as 2 dimensional

m<-matrix(1:6,nrow=2, byrow=T);m

##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6

m[1,,drop=T]

## [1] 1 2 3

m[,1,drop=F]

##      [,1]
## [1,]    1
## [2,]    4

Indexing Lists

NB: When sub-setting list using indexing[] we get another list
This case applies even if you’re sub-setting a single element
- Reason: When sub-setting a single element your’e not getting that elemnt but the list that contains it

L<-list(1:3,6:7)
is.list(L[1])

## [1] TRUE

L[[1]]

## [1] 1 2 3

To get that element instead, you have to use double square brackets

L[[1]]

## [1] 1 2 3

Named values

Elements in vectors or lists can have names
These are the attributes that don’t affect the values of the elements and can be used to refer to them

v<-c(a=1,b=2,c=3,d=4);v

## a b c d 
## 1 2 3 4

l<-list(a=1:5,b=c(T,F));l

## $a
## [1] 1 2 3 4 5
## 
## $b
## [1]  TRUE FALSE

The names()function can also be used in naming the elements

names(v)<-LETTERS[1:4];v

## A B C D 
## 1 2 3 4

The names of this elements can be used to Access them
This is referred to as string indexing

v["A"]

## A 
## 1

l['a']

## $a
## [1] 1 2 3 4 5

l[["a"]]

## [1] 1 2 3 4 5

When elements are named, one can use the $ to access the elements
Essentially it works like the [[]] But there’s no need of having the names in quotations
NB: $ can’t be applied to vectors

l$a

## [1] 1 2 3 4 5

When [[]] is used in vectors, it allows you to only extract one element and if the element is name, the name is removed

## A B C D 
## 1 2 3 4

v[[1]]

## [1] 1

4. Factors

Factors in R are a way to handle categorical data. Categorical data is data that can be divided into a limited number of categories or groups. These categories are known as levels.

Imagine you have a survey where people can rate their satisfaction as “low”, “medium”, or “high”. These ratings are categorical data because there are only a few possible values (levels) the rating can take.

Nominal Categorical Variables

Order dooes not matter $as.factor(x)$

color_vector <- c('blue', 'red', 'green', 'white', 'black', 'yellow')
color_vector_f<-as.factor(color_vector)
class(color_vector_f)

## [1] "factor"

ordinal

Order matters Ordinal categorical variables do have a natural ordering. We can specify the order, from the lowest to the highest with order = TRUE and highest levs=“desired order”.

day_vector <- c('evening', 'morning', 'afternoon', 'midday', 'midnight', 'evening')
# Convert `day_vector` to a factor with ordered level
factor_day <- factor(day_vector, order = TRUE, levels =c('morning', 'midday', 'afternoon', 'evening', 'midnight'))
class(factor_day)

## [1] "ordered" "factor"

str(factor_day)

##  Ord.factor w/ 5 levels "morning"<"midday"<..: 4 1 3 2 5 4

##descending to order
factor_day <- factor(day_vector, order = T, levels =rev(c('morning', 'midday', 'afternoon', 'evening', 'midnight')))

Dataframe

Tabular Structure: Organizes data into rows and columns. Variable Heterogeneity: Accommodates different data types in one dataset. Integration: Works seamlessly with many R packages and functions. Data Transformation: Facilitates easy manipulation and transformation of data. Compatibility: Imports from and exports to various data sources. Interactive Analysis: Supports interactive exploration and visualization. Reporting: Enables generation of formatted reports and summaries.

# Create a data frame
Data_Frame <- data.frame (
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# Print the data frame
Data_Frame

Acessing data frames

Data_Frame[1]

Data_Frame[["Training"]]

## [1] "Strength" "Stamina"  "Other"

#### We often use this####
Data_Frame$Training

## [1] "Strength" "Stamina"  "Other"

Manipulate

# add a column
Data_Frame$heartrate=Data_Frame[,2]/Data_Frame[,3]
Data_Frame

# append a column
Data_Frame2=data.frame(volume=sample(3000:5000,size = 3,replace = T))
new_dataframe=cbind(Data_Frame,Data_Frame2)
new_dataframe

#### append rows
new_dataframe2=rbind(Data_Frame,Data_Frame)
new_dataframe2

Data_Frame_New <- Data_Frame[-c(1,3), -c(1,2)]

# Print the new data frame
Data_Frame_New

$ncols()$

$nrows()$

$dim()$ $colnames()$

Question

Write matrix $A=5\times 5$ and $B=5\times 5$ matrix. Carry out both scalar multiplication and matrix multiplication, assign the two values $scalar\_a$ and second matrix $mat\_a$. Find out if the object in the third row and second column in$mat\_a$ is less than object in fourth row and second column $scalar\_a$.

If you have come all this way ** click here**

For those who got the answers** click here**

DekutR

Gathu Macharia

2024-04-30