Creation of Project Folders.

Reproduce your work with renv
Keeps R History (Global environment is stored)
Loading data is an easy task (no need for full paths, if you do it right)

An R project enables your work to be bundled in a portable, self-contained folder. Within the project, all the relevant scripts, data files, figures/outputs, and history are stored in sub-folders and, importantly, the working directory is the project’s root folder. You can’t appreciate this enough until you start working on big projects.
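The sketch below illustrates why the project root matters for loading data. It is only an illustration: the folder name my_project, the file scores.csv, and the data in it are hypothetical, and setwd() merely imitates what opening an .Rproj file does for you.

```r
# Hypothetical project folder, created in a temporary location for illustration
project_root <- file.path(tempdir(), "my_project")
dir.create(file.path(project_root, "data"), recursive = TRUE, showWarnings = FALSE)

# Opening an .Rproj file sets the working directory for you;
# setwd() only imitates that here
old_wd <- setwd(project_root)

# Paths are written relative to the project root, so they stay valid
# when the folder is moved or shared
write.csv(data.frame(id = 1:3, score = c(10, 20, 30)),
          file.path("data", "scores.csv"), row.names = FALSE)
scores <- read.csv(file.path("data", "scores.csv"))

setwd(old_wd)  # restore the previous working directory
scores
```

Because the path is relative, the same read.csv() line works on any machine that has a copy of the project folder.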

Reproducible Programming

Use R Markdown or Sweave. We use R Markdown in our case.

Reproducibility
Speed of task management
Smart workflow

Reproducible Environment

renv::init()
renv::activate()
renv::snapshot()
renv::restore()
To check the version of a package:
packageVersion("ggplot2")
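As a small sketch of the version check, the example below uses the stats package instead of ggplot2, only because stats ships with R and is therefore always installed. The threshold "3.0.0" is an arbitrary value chosen for illustration.

```r
# 'stats' ships with every R installation, so this call cannot fail
stats_version <- packageVersion("stats")
stats_version

# Versions can be compared directly against a version string
stats_version >= "3.0.0"
```

A check like this can be handy before renv::snapshot(), to confirm which version is about to be recorded.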

Git

future lesson!

Expressions

Arithmetic Expressions

+ is used in addition
- is used in subtraction
* is used in multiplication
^ is used for exponents

/ is used in division

%% is used in modulus division: it returns the remainder

%/% is used in integer division: it returns the quotient rounded down to a whole number, getting rid of decimals/floats

 1+2
## [1] 3
4%%3
## [1] 1
5%%3
## [1] 2
4%%2
## [1] 0
4%/%3
## [1] 1
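The two division operators fit together: the integer quotient and the remainder rebuild the original number. A short sketch with arbitrary example values (x, y and the other names are illustrative only):

```r
x <- 17
y <- 5

quotient  <- x %/% y   # 3
remainder <- x %% y    # 2

# The two results rebuild the original number
quotient * y + remainder == x   # TRUE

# With a negative number the quotient is rounded down, not toward zero,
# so the remainder takes the sign of the divisor
(-7) %/% 3   # -3
(-7) %% 3    # 2
```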

Boolean Expressions

Comparison Operators

== Equal to

!= Not equal to

> Greater than

< Less than

>= Greater than or equal to

<= Less than or equal to

They are expressions whose output is either TRUE or FALSE.

Logical Operators

| OR, & AND, and ! NOT (the logical operator that negates values)

| logical element-wise OR operator: returns TRUE if at least one condition is TRUE

& logical element-wise AND operator: only returns TRUE if all conditions are TRUE

4>6
## [1] FALSE
4<=6
## [1] TRUE
4<6 & 4>3
## [1] TRUE
4<6 & 4<3
## [1] FALSE
!TRUE
## [1] FALSE
!FALSE
## [1] TRUE
x<-c(T,F,T,F)
y<-c(T,T,F,F)
x|y
## [1]  TRUE  TRUE  TRUE FALSE
x&y
## [1]  TRUE FALSE FALSE FALSE
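Comparison and logical operators are usually combined on whole vectors. A minimal sketch, using a made-up vector of ages:

```r
ages <- c(12, 25, 40, 17)   # hypothetical data

is_adult <- ages >= 18
is_adult                    # FALSE  TRUE  TRUE FALSE

ages >= 18 & ages < 30      # FALSE  TRUE FALSE FALSE
ages < 13 | ages > 39       #  TRUE FALSE  TRUE FALSE
!is_adult                   #  TRUE FALSE FALSE  TRUE
```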

Variable Assignments

m<-c('mango',"Apple","Berries")
length(m)
## [1] 3
m=c('mango',"Apple","Berries","Mary's")
m
## [1] "mango"   "Apple"   "Berries" "Mary's"
c(1,56,56)->z

z[1]<-2
z 
## [1]  2 56 56
#Other specific assignments for R environments!
## [1] 1
## [1]  2  2  7  9 10 33

Basic Data Types

  • The class() function is used to check the data type of a value

1. Numeric Data

  • It is data whose values are members of the real numbers

  • Such as :

    • Integers(whole numbers)

    • Double/Floats/Decimals

    • Fractions

  • Complex/imaginary numbers are not numeric in R: is.numeric() returns FALSE for them (see Complex Data below)

  • We can coerce data to numeric using the as.numeric() function

  • We can check if data is numeric using the is.numeric() function

is.numeric(2)
## [1] TRUE
class(2)
## [1] "numeric"

2. Integer data

  • It is data whose values are whole numbers

  • By default R stores all numbers as numeric (double).

  • To make a number an integer, we add the suffix L to it

  • We can also use the as.integer() function

  • When a decimal is coerced to integer, its fractional part is dropped (truncated, not rounded)

is.integer(2)
## [1] FALSE
is.integer(2L)
## [1] TRUE
x<-as.integer(2)
class(2L)
## [1] "integer"
class(x)
## [1] "integer"
as.integer(2.8)
## [1] 2
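A brief sketch of the points above: typeof() shows how a number is stored, and as.integer() truncates toward zero rather than rounding.

```r
typeof(2)     # "double"
typeof(2L)    # "integer"

as.integer(2.8)    # 2
as.integer(-2.8)   # -2 (truncated toward zero)
round(2.8)         # 3  (round first if you want the nearest whole number)
```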

3. Complex Data

  • Refers to data that has an imaginary part

  • Imaginary implies that the data is not a member of the real numbers

  • A number is treated as complex if its imaginary part carries the suffix i

1+0i
## [1] 1+0i
is.complex(1+0i)
## [1] TRUE
class(1+0i)
## [1] "complex"
  • Dealing with complex data enables us to find the square root of negative numbers
sqrt(as.complex(-5))
## [1] 0+2.236068i

5. Character Data

  • Refers to data whose values are strings

  • Any data value inside double "" or single '' quotation marks is regarded as a string

x<-"hello world"
class(x)
## [1] "character"
is.character(x)
## [1] TRUE
as.character(3.14)
## [1] "3.14"

Data Structures

  • By concatenating simple data types, you create data structures; they include vectors, matrices, lists, factors and data frames
  • In the event you have different data types in a vector, R tries to coerce them to the most suitable data type and it follows the priority below (highest first)
    • Character

    • Complex

    • Numeric

    • Integer

    • Logical

  • If, for example, we have a vector containing all of these data types, every element is coerced to character.

1. Vectors

  • A vector is a one-dimensional data structure that stores data of the same type

  • Vectors are also called atomic sequences

v<-c(1,2,3,"f");v
## [1] "1" "2" "3" "f"
1:3
## [1] 1 2 3
rep("foo",3)
## [1] "foo" "foo" "foo"
  • We can check if a data structure is a vector using the is.atomic() or is.vector() functions
is.atomic(v)
## [1] TRUE
is.vector(v)
## [1] TRUE
  • In the event a vector has an attribute other than names, is.vector() will return FALSE

  • is.atomic() still returns TRUE, confirming that it’s an atomic vector even though it has attributes

v<-1:3
is.vector(v)
## [1] TRUE
attr(v,"foo")<-"bar";v
## [1] 1 2 3
## attr(,"foo")
## [1] "bar"
is.vector(v)
## [1] FALSE
is.atomic(v)
## [1] TRUE
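  • The coercion priority can be seen by dropping the highest-priority type from the vector one step at a time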

    x<-c(1,2,3i,"foo",2L,2<3)
    class(x)
    ## [1] "character"
    x<-c(1,2,3i,2L,2<3)
    class(x)
    ## [1] "complex"
    x<-c(1,2,2L,2<3)
    class(x)
    ## [1] "numeric"
    x<-c(2L,2<3)
    class(x)
    ## [1] "integer"
    x<-2<3
    class(x)
    ## [1] "logical"
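One practical consequence of logical values being coerced to numbers (TRUE becomes 1, FALSE becomes 0) is that TRUE values can be counted directly. A small sketch with a made-up vector:

```r
passed <- c(TRUE, FALSE, TRUE, TRUE)   # hypothetical results

sum(passed)          # 3: number of TRUE values
mean(passed)         # 0.75: proportion of TRUE values

# Explicit coercion follows the same rule
as.numeric(passed)   # 1 0 1 1
```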

2. Matrix

  • It is a 2-dimensional data structure that stores data of the same type

  • A matrix is a vector with 2 dimensions

    • The dim() function is used to define the dimensions (rows and columns) of a vector to make it a matrix

    • The attributes() function can be used to check the dimensions of a matrix

v<-1:6
attributes(v) #returns null as v has no dimensions yet
## NULL
dim(v)<-c(2,3) # We've defined the dimensions of v, making it a matrix with 2 rows and 3 columns
attributes(v) # checks dimensions
## $dim
## [1] 2 3
dim(v) # checks dimensions
## [1] 2 3
v # is now a matrix with 2 rows and 3 columns
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
  • By default, R fills a matrix with the values of a vector column-wise

  • The matrix() function is used to create matrices and lets you choose whether data is filled row-wise or column-wise

  • The argument byrow=T indicates that data is filled row-wise into the matrix

v<-1:6
v<-matrix(v,nrow=2,ncol=3,byrow=F);v
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
v<-1:6
v<-matrix(v,nrow=2,ncol=3,byrow=T);v
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
  • NB: With matrices, * multiplies element by element (called scalar multiplication in these notes) while %*% is used for matrix multiplication
(a<-matrix(1:4,nrow=2))
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
(b<-matrix(5:8,nrow=2))
##      [,1] [,2]
## [1,]    5    7
## [2,]    6    8
a*b
##      [,1] [,2]
## [1,]    5   21
## [2,]   12   32
a%*%b
##      [,1] [,2]
## [1,]   23   31
## [2,]   34   46
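The two operators also differ in which shapes they accept. The sketch below uses two small illustrative matrices (mat_p and mat_q are made-up names) whose shapes differ, so only %*% applies to the pair:

```r
mat_p <- matrix(1:6, nrow = 2)   # 2 x 3
mat_q <- matrix(1:6, nrow = 3)   # 3 x 2

# Matrix multiplication: columns of the left must equal rows of the right
prod_pq <- mat_p %*% mat_q
dim(prod_pq)   # 2 2
prod_pq        # rows: 22 49 and 28 64

# Multiplying by a single number scales every element
2 * mat_p

# mat_p * mat_q would stop with an error, because element-wise
# multiplication needs both matrices to have the same dimensions
```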

  • t() is used to transpose a matrix while solve() is used to invert a matrix

t(a)
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
solve(a)
##      [,1] [,2]
## [1,]   -2  1.5
## [2,]    1 -0.5
solve(a)%*%a
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
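solve() also accepts a second argument, in which case it solves a system of linear equations rather than returning the inverse. A minimal sketch, with a right-hand side chosen arbitrarily for illustration:

```r
coef_matrix <- matrix(1:4, nrow = 2)   # same values as matrix a above
rhs <- c(7, 10)                        # hypothetical right-hand side

# Solves  x + 3y = 7  and  2x + 4y = 10
solution <- solve(coef_matrix, rhs)
solution   # 1 2

# Check: multiplying back reproduces the right-hand side
coef_matrix %*% solution
```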

3. Lists

  • Lists and arrays store data in layers

  • A list is a one-dimensional data structure that can store different data types in layers

  • An array is a data structure with any number of dimensions that stores data of the same type in layers

  • A data frame is a 2-dimensional data structure that can store different data types, one per column

  • The list() function is used to create lists

list(1:3,5:8) # A list of 2 vectors 
## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1] 5 6 7 8
list(1:3,c(T,F))
## [[1]]
## [1] 1 2 3
## 
## [[2]]
## [1]  TRUE FALSE
  • We can also create recursive lists (a list inside another list)
list(list(),list(list(),list()))
## [[1]]
## list()
## 
## [[2]]
## [[2]][[1]]
## list()
## 
## [[2]][[2]]
## list()
(list_s<-list(c("jan","feb",'march','april'),
             matrix(c(3,9,5,1,-2,8),nrow = 2),
             list("flowers",'chocolate')))
## [[1]]
## [1] "jan"   "feb"   "march" "april"
## 
## [[2]]
##      [,1] [,2] [,3]
## [1,]    3    5   -2
## [2,]    9    1    8
## 
## [[3]]
## [[3]][[1]]
## [1] "flowers"
## 
## [[3]][[2]]
## [1] "chocolate"
  • The unlist() function is used to flatten a list into a vector
unlist(list(1:4,5:7))
## [1] 1 2 3 4 5 6 7
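Because the result of unlist() is a vector, the coercion priority described under Data Structures applies when the list holds mixed types. A short sketch with a made-up list:

```r
mixed <- list(1:2, "a", TRUE)   # hypothetical list with three element types

flat <- unlist(mixed)
flat            # "1"    "2"    "a"    "TRUE"
class(flat)     # "character"

length(mixed)   # 3 elements in the list
length(flat)    # 4 values after flattening
```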

Indexing

The ?"[[" displays documentation for indexing in R

?"[["
  • We use indexing to access elements of different data structures
  1. Indexing Vectors
v<-1:4
v[2]
## [1] 2
v[2:3]
## [1] 2 3
# Special case for using vector of indices
v[c(1,1,4,3,2)]
## [1] 1 1 4 3 2
v[-1]
## [1] 2 3 4
v[-(1:2)]
## [1] 3 4
  • You can’t combine negative and positive indices

  • Another way to index is by using Boolean expressions or a Boolean vector

  • The Boolean vector should be of the same size as the vector

v[v%%2==0]
## [1] 2 4
  • By Indexing you can assign values to a vector or add new values
v[v%%2==0]<-13;v
## [1]  1 13  3 13
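The sketch below shows both uses of indexing mentioned above, changing existing values and adding new ones. The vector scores_vec is made up for illustration.

```r
scores_vec <- c(5, 12, 8, 20)

scores_vec[scores_vec > 10]   # 12 20
which(scores_vec > 10)        # 2 4 (positions rather than values)

scores_vec[2] <- 0            # overwrite an existing element
scores_vec[6] <- 1            # assign past the end: the gap is filled with NA
scores_vec                    # 5 0 8 20 NA 1
```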
  2. Indexing matrices
m<-matrix(1:6,nrow=2,byrow=T);m
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
m[1,] #access 1st row
## [1] 1 2 3
m[,1] # access 1st column
## [1] 1 4
  • When accessing a single row or column, it is reduced to a vector

  • Using the argument drop=F we maintain the single rows and columns as 2 dimensional

m<-matrix(1:6,nrow=2, byrow=T);m
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
m[1,,drop=T]
## [1] 1 2 3
m[,1,drop=F]
##      [,1]
## [1,]    1
## [2,]    4
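A few more matrix indexing patterns, sketched on an illustrative matrix named grid that has the same values as m above:

```r
grid <- matrix(1:6, nrow = 2, byrow = TRUE)

grid[2, 3]        # 6: row 2, column 3
grid[, c(1, 3)]   # columns 1 and 3, still a matrix
grid[grid > 3]    # 4 5 6: a Boolean matrix selects values as a vector

dim(grid[, 1, drop = FALSE])   # 2 1
dim(grid[, 1])                 # NULL: the result is a plain vector
```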
  3. Indexing Lists
  • NB: When sub-setting a list using single-bracket indexing [] we get another list

  • This applies even if you’re sub-setting a single element

    • Reason: When sub-setting a single element you’re not getting that element but the list that contains it
L<-list(1:3,6:7)
is.list(L[1])
## [1] TRUE
L[1]
## [[1]]
## [1] 1 2 3
## 
  • To get that element instead, you have to use double square brackets
L[[1]]
## [1] 1 2 3
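The difference between the two bracket styles is easiest to see by checking the class of each result. A minimal sketch on a made-up list:

```r
shopping <- list(1:3, 6:7)   # hypothetical list

class(shopping[1])      # "list": a shorter list
class(shopping[[1]])    # "integer": the element itself

shopping[[1]][2]        # 2: second value of the first element
length(shopping[1:2])   # 2: single brackets can return several elements
```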

Named values

  • Elements in vectors or lists can have names

  • These are the attributes that don’t affect the values of the elements and can be used to refer to them

v<-c(a=1,b=2,c=3,d=4);v
## a b c d 
## 1 2 3 4
l<-list(a=1:5,b=c(T,F));l
## $a
## [1] 1 2 3 4 5
## 
## $b
## [1]  TRUE FALSE
  • The names() function can also be used to name the elements
names(v)<-LETTERS[1:4];v
## A B C D 
## 1 2 3 4
  • The names of these elements can be used to access them

  • This is referred to as string indexing

v["A"]
## A 
## 1
l['a']
## $a
## [1] 1 2 3 4 5
l[["a"]]
## [1] 1 2 3 4 5
  • When elements are named, one can use $ to access the elements

  • Essentially it works like [[]], but there’s no need to put the names in quotation marks

  • NB: $ can’t be applied to atomic vectors

l$a
## [1] 1 2 3 4 5
  • When [[]] is used on vectors, it allows you to extract only one element, and if the element is named, the name is removed
v
## A B C D 
## 1 2 3 4
v[[1]]
## [1] 1
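Named values make a vector work like a small lookup table. The sketch below uses made-up names and prices to combine string indexing, [[]] and $:

```r
prices <- c(apple = 3, mango = 5, berries = 8)   # hypothetical data

prices[c("mango", "apple")]   # several names at once, in the order requested
prices[["mango"]]             # 5, without the name
unname(prices)                # 3 5 8

basket <- list(fruit = "mango", qty = 4)
basket$qty * prices[[basket$fruit]]   # 20
```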

4. Factors

Factors in R are a way to handle categorical data. Categorical data is data that can be divided into a limited number of categories or groups. These categories are known as levels.

Imagine you have a survey where people can rate their satisfaction as “low”, “medium”, or “high”. These ratings are categorical data because there are only a few possible values (levels) the rating can take.

Nominal Categorical Variables

Order does not matter. Use as.factor(x).

color_vector <- c('blue', 'red', 'green', 'white', 'black', 'yellow')
color_vector_f<-as.factor(color_vector)
class(color_vector_f)
## [1] "factor"

Ordinal Categorical Variables

Order matters. Ordinal categorical variables have a natural ordering. We can specify the order, from the lowest to the highest, with ordered = TRUE and levels set to the desired order.

day_vector <- c('evening', 'morning', 'afternoon', 'midday', 'midnight', 'evening')
# Convert `day_vector` to a factor with ordered levels
factor_day <- factor(day_vector, ordered = TRUE, levels =c('morning', 'midday', 'afternoon', 'evening', 'midnight'))
class(factor_day)
## [1] "ordered" "factor"
str(factor_day)
##  Ord.factor w/ 5 levels "morning"<"midday"<..: 4 1 3 2 5 4
# Descending order: reverse the levels
factor_day <- factor(day_vector, ordered = TRUE, levels =rev(c('morning', 'midday', 'afternoon', 'evening', 'midnight')))
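Once a factor is ordered, its values can be compared and summarised by level. A short sketch, using a shorter made-up vector with the same levels as above:

```r
day_levels <- c('morning', 'midday', 'afternoon', 'evening', 'midnight')
days <- factor(c('evening', 'morning', 'afternoon'),
               ordered = TRUE, levels = day_levels)

levels(days)        # all five levels, even the unused ones
as.integer(days)    # 4 1 3: position of each value among the levels
days[1] > days[2]   # TRUE: 'evening' comes after 'morning'
table(days)         # counts per level, including zeros
max(days)           # evening
```

Comparisons such as > and max() are only meaningful for ordered factors; on a nominal factor R returns NA with a warning for > and an error for max().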

Dataframe

  • Tabular Structure: Organizes data into rows and columns.
  • Variable Heterogeneity: Accommodates different data types in one dataset.
  • Integration: Works seamlessly with many R packages and functions.
  • Data Transformation: Facilitates easy manipulation and transformation of data.
  • Compatibility: Imports from and exports to various data sources.
  • Interactive Analysis: Supports interactive exploration and visualization.
  • Reporting: Enables generation of formatted reports and summaries.

# Create a data frame
Data_Frame <- data.frame (
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

# Print the data frame
Data_Frame

Accessing data frames

Data_Frame[1]
Data_Frame[["Training"]]
## [1] "Strength" "Stamina"  "Other"
# We often use this
Data_Frame$Training
## [1] "Strength" "Stamina"  "Other"
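The three access styles return different kinds of objects, and rows can be selected with a Boolean expression, just as with vectors. The sketch below rebuilds the same data under the illustrative name training_df; stringsAsFactors = FALSE is added only so the result is the same on older R versions.

```r
training_df <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45),
  stringsAsFactors = FALSE   # keep text columns as character on any R version
)

class(training_df[1])           # "data.frame": single brackets keep the table
class(training_df[["Pulse"]])   # "numeric": double brackets return the column

# Rows can be selected with a Boolean expression on a column
training_df[training_df$Pulse > 110, ]
training_df[training_df$Pulse > 110, "Training"]   # "Stamina" "Other"
```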

Manipulate

# add a column
Data_Frame$heartrate=Data_Frame[,2]/Data_Frame[,3]
Data_Frame
# append a column from another data frame
Data_Frame2=data.frame(volume=sample(3000:5000,size = 3,replace = T))
new_dataframe=cbind(Data_Frame,Data_Frame2)
new_dataframe
# append rows
new_dataframe2=rbind(Data_Frame,Data_Frame)
new_dataframe2
# remove rows 1 and 3 and columns 1 and 2
Data_Frame_New <- Data_Frame[-c(1,3), -c(1,2)]

# Print the new data frame
Data_Frame_New

ncol() returns the number of columns

nrow() returns the number of rows

dim() returns both, rows first; colnames() returns the column names
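A quick sketch of these functions, applied to a copy of the training data under the illustrative name pulse_df:

```r
pulse_df <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

nrow(pulse_df)       # 3
ncol(pulse_df)       # 3
dim(pulse_df)        # 3 3 (rows, then columns)
colnames(pulse_df)   # "Training" "Pulse" "Duration"
```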

Question

Write two \(5\times 5\) matrices, \(A\) and \(B\). Carry out both scalar multiplication (with *) and matrix multiplication (with %*%), and assign the results to \(scalar\_a\) and \(mat\_a\) respectively. Find out if the element in the third row and second column of \(mat\_a\) is less than the element in the fourth row and second column of \(scalar\_a\).

If you have come all this way ** click here**

For those who got the answers** click here**