Reproduce your work with renv
Keeps R History (Global environment is stored)
Loading data is easy (no need for full paths if the project is set up properly)
An R project enables your work to be bundled in a portable, self-contained folder. Within the project, all the relevant scripts, data files, figures/outputs, and history are stored in sub-folders and, importantly, the working directory is the project’s root folder. You can't fully appreciate this until you start working on big projects.
Use R Markdown or Sweave. We use R Markdown in our case.
Reproducibility
Speed of task management
Smart workflow
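\(renv::init()\) sets up a project-specific library and creates the lockfile
\(renv::activate()\) activates renv for the current project
\(renv::snapshot()\) records the packages in use and their versions in the lockfile (renv.lock)
\(renv::restore()\) reinstalls the package versions recorded in the lockfile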
To check the version of a package:
packageVersion("ggplot2")
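A version check can also be used inside a script, for example to confirm that a minimum version is installed. The sketch below is illustrative only: the package `stats` (which ships with R) and the threshold "3.0.0" are assumptions chosen so the code runs on any installation.

```r
# Illustrative only: 'stats' ships with base R, so this runs anywhere
v <- packageVersion("stats")
class(v)              # a "package_version" object, not a plain string

# Versions are compared component by component, not as text
min_ok <- v >= "3.0.0"
min_ok
```

Because the comparison is done component by component, version "3.10.0" correctly ranks above "3.9.0", which a plain string comparison would get wrong.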
future lesson!
+ addition
- subtraction
* multiplication
^ exponent
/ is used in division
%% is used in modulus division: returns the remainder
%/% is used in integer division: returns the quotient rounded down to a whole number, getting rid of decimals/floats
1+2
## [1] 3
4%%3
## [1] 1
5%%3
## [1] 2
4%%2
## [1] 0
4%/%3
## [1] 1
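The two division operators fit together: the quotient from `%/%` and the remainder from `%%` can rebuild the original number. A brief sketch, with the values 17 and 5 chosen only for illustration:

```r
a <- 17
b <- 5

quotient  <- a %/% b   # integer division: 3
remainder <- a %% b    # modulus: 2

# Rebuilding a from its quotient and remainder
quotient * b + remainder == a

# A common use of %%: testing whether a number is even
a %% 2 == 0
```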
== equal to
!= not equal to
> greater than
< less than
>= greater than or equal to
<= less than or equal to
These are expressions whose output is either TRUE or FALSE.
! logical NOT operator: negates values
| logical element-wise OR operator: returns TRUE if at least one condition is TRUE
& logical element-wise AND operator: returns TRUE only if all conditions are TRUE
4>6
## [1] FALSE
4<=6
## [1] TRUE
4<6 & 4>3
## [1] TRUE
4<6 & 4<3
## [1] FALSE
!TRUE
## [1] FALSE
!FALSE
## [1] TRUE
x<-c(T,F,T,F)
y<-c(T,T,F,F)
x|y
## [1] TRUE TRUE TRUE FALSE
x&y
## [1] TRUE FALSE FALSE FALSE
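Comparison and logical operators are often combined to test several conditions on a whole vector at once. A minimal sketch, using an invented `ages` vector:

```r
ages <- c(12, 25, 40, 67)   # hypothetical data

# TRUE where the age is between 18 and 65 (both conditions must hold)
working_age <- ages >= 18 & ages <= 65
working_age

# TRUE where the age is outside that range (either condition may hold)
outside <- ages < 18 | ages > 65
outside

# ! flips every element, so this matches 'outside'
identical(!working_age, outside)
```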
m<-c('mango',"Apple","Berries")
length(m)
## [1] 3
m=c('mango',"Apple","Berries","Mary's")
m
## [1] "mango" "Apple" "Berries" "Mary's"
c(1,56,56)->z
z[1]<-2
z
## [1] 2 56 56
#Other specific assignments for R environments!
## [1] 1
## [1] 2 2 7 9 10 33
class() is used to check the data type of data
Numeric data is data whose values are members of the real numbers
Such as :
Integers (whole numbers)
Double/Floats/Decimals
Fractions
Note: complex/imaginary numbers are not numeric in R; they have their own type, complex
We can coerce data to numeric using as.numeric()
We can check whether data is numeric using is.numeric()
is.numeric(2)
## [1] TRUE
class(2)
## [1] "numeric"
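A typical use of as.numeric() is converting numbers that were read in as text. The sketch below uses made-up values; the variable names are illustrative.

```r
prices <- c("3.5", "10", "0.25")   # numbers stored as text (made-up values)
is.numeric(prices)                 # FALSE: these are character strings

prices_num <- as.numeric(prices)
is.numeric(prices_num)             # TRUE
sum(prices_num)                    # 13.75

# Text that is not a number becomes NA (R also emits a warning)
bad <- suppressWarnings(as.numeric("abc"))
is.na(bad)
```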
Integer data is data whose values are whole numbers
By default R saves all numbers as numeric.
To store a number as an integer, we add the suffix L to it
We can also use as.integer()
When a decimal is coerced to integer it loses its fractional part (the value is truncated, not rounded)
is.integer(2)
## [1] FALSE
is.integer(2L)
## [1] TRUE
x<-as.integer(2)
class(2L)
## [1] "integer"
class(x)
## [1] "integer"
as.integer(2.8)
## [1] 2
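A few further cases, sketched here as an illustration, show how integers behave under coercion and arithmetic:

```r
# as.integer() truncates toward zero; it does not round
as.integer(2.8)     # 2
as.integer(-2.8)    # -2
round(2.8)          # 3, for comparison

# Integer arithmetic stays integer only while every operand is integer
class(2L + 3L)      # "integer"
class(2L + 3)       # "numeric"
class(4L / 2L)      # "numeric": / always returns a double
```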
Complex refers to data that is imaginary
Imaginary implies that the data is not a member of the real numbers
A number is defined as complex if it has the suffix i
1+0i
## [1] 1+0i
is.complex(1+0i)
## [1] TRUE
class(1+0i)
## [1] "complex"
sqrt(as.complex(-5))
## [1] 0+2.236068i
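Base R also provides helpers for taking a complex number apart. A short sketch, with 3+4i chosen only because its modulus is a whole number:

```r
z <- 3 + 4i

Re(z)    # real part: 3
Im(z)    # imaginary part: 4
Mod(z)   # modulus (distance from zero): 5
Conj(z)  # complex conjugate: 3-4i

# sqrt() of a negative real number gives NaN with a warning,
# which is why the input is converted to complex first
is.nan(suppressWarnings(sqrt(-5)))
```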
Character refers to data whose values are strings
Any data value inside double "" or single '' quotation marks is regarded as a string
x<-"hello world"
class(x)
## [1] "character"
is.character(x)
## [1] TRUE
as.character(3.14)
## [1] "3.14"
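When a vector mixes data types, R coerces every element to the most general type present, in this order of precedence:
Character
Complex
Numeric
Integer
Logical
For example, a vector that contains all of these data types is coerced to character.
A vector is a one-dimensional data structure that stores data of the same type
Vectors are also called atomic sequences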
v<-c(1,2,3,"f");v
## [1] "1" "2" "3" "f"
1:3
## [1] 1 2 3
rep("foo",3)
## [1] "foo" "foo" "foo"
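Because a vector holds a single type, the coercion functions convert a whole vector in one call. The following sketch uses invented values to show this, along with seq() as a more flexible relative of the `:` operator:

```r
v <- c(1, 2, 3, "f")        # stored as character, since "f" is a string
nums <- suppressWarnings(as.numeric(v))
nums                        # 1 2 3 NA: "f" has no numeric meaning

# Logical values count as 1 (TRUE) and 0 (FALSE) in arithmetic
answers <- c(TRUE, FALSE, TRUE, TRUE)
sum(answers)                # number of TRUE values: 3
mean(answers)               # proportion of TRUE values: 0.75

# seq() builds sequences with a chosen step
seq(2, 10, by = 2)          # 2 4 6 8 10
```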
We can check whether an object is a vector using the is.atomic() or is.vector() functions
is.atomic(v)
## [1] TRUE
is.vector(v)
## [1] TRUE
If a vector has an attribute other than names, is.vector() will return FALSE
is.atomic() still returns TRUE, confirming that it's a vector even though it has attributes
v<-1:3
is.vector(v)
## [1] TRUE
attr(v,"foo")<-"bar";v
## [1] 1 2 3
## attr(,"foo")
## [1] "bar"
is.vector(v)
## [1] FALSE
is.atomic(v)
## [1] TRUE
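One detail worth noting is that names are the one attribute is.vector() tolerates. A brief sketch; the "units" attribute is made up for the illustration:

```r
v <- c(a = 1, b = 2, c = 3)   # a named vector
is.vector(v)                  # TRUE: names do not count against it

attr(v, "units") <- "cm"      # any other attribute ("units" is made up)
is.vector(v)                  # FALSE
is.atomic(v)                  # TRUE

# Removing the extra attribute restores the original behaviour
attr(v, "units") <- NULL
is.vector(v)                  # TRUE
```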
x<-c(1,2,3i,"foo",2L,2<3)
class(x)
## [1] "character"
x<-c(1,2,3i,2L,2<3)
class(x)
## [1] "complex"
x<-c(1,2,2L,2<3)
class(x)
## [1] "numeric"
x<-c(2L,2<3)
class(x)
## [1] "integer"
x<-2<3
class(x)
## [1] "logical"
A matrix is a 2-dimensional data structure that stores data of the same type
A matrix is a 2-dimensional vector
The dim() function is used to define the dimensions (rows and columns) of a vector to make it a matrix
The attributes() function is used to check the dimensions of a matrix
v<-1:6
attributes(v) #returns null as v has no dimensions yet
## NULL
dim(v)<-c(2,3) # we've defined the dimensions of v, making it a matrix with 2 rows and 3 columns
attributes(v) # checks dimensions
## $dim
## [1] 2 3
dim(v) # checks dimensions
## [1] 2 3
v # is now a matrix with 2 rows and 3 columns
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
By default, values in a vector are added column-wise by R
The matrix() function is used to create matrices and can define whether data is input row-wise or column-wise
The argument byrow=T indicates that data is input row-wise into a matrix
v<-1:6
v<-matrix(v,nrow=2,ncol=3,byrow=F);v
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
v<-1:6
v<-matrix(v,nrow=2,ncol=3,byrow=T);v
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
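The two fill orders are related: filling by row gives the same matrix as filling the transposed shape by column and then transposing it with t(). A small sketch of that relationship:

```r
by_col <- matrix(1:6, nrow = 2)                 # default: fill down columns
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # fill across rows

by_col[1, ]   # 1 3 5
by_row[1, ]   # 1 2 3

# Filling by row is the same as filling a 3 x 2 matrix by column
# and then transposing it
identical(by_row, t(matrix(1:6, nrow = 3)))
```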
* is used for element-wise multiplication (called scalar multiplication in this lesson) while %*% is used for matrix multiplication
(a<-matrix(1:4,nrow=2))
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
(b<-matrix(5:8,nrow=2))
## [,1] [,2]
## [1,] 5 7
## [2,] 6 8
a*b
## [,1] [,2]
## [1,] 5 21
## [2,] 12 32
a%*%b
## [,1] [,2]
## [1,] 23 31
## [2,] 34 46
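The two operators also differ in which shapes they accept: `*` needs both matrices to have the same dimensions, while `%*%` needs the column count of the first to match the row count of the second. A sketch with arbitrary 2 x 3 and 3 x 2 matrices:

```r
a <- matrix(1:6, nrow = 2)   # 2 x 3
b <- matrix(1:6, nrow = 3)   # 3 x 2

dim(a %*% b)                 # 2 2
dim(b %*% a)                 # 3 3

# a * b fails here because the shapes differ; try() keeps the script running
res <- try(a * b, silent = TRUE)
inherits(res, "try-error")   # TRUE
```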
t() is used to transpose a matrix while solve() is used to invert a matrix
t(a)
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
solve(a)
## [,1] [,2]
## [1,] -2 1.5
## [2,] 1 -0.5
solve(a)%*%a
## [,1] [,2]
## [1,] 1 0
## [2,] 0 1
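solve() also accepts a second argument, in which case it solves a linear system directly. The sketch below solves 2x + y = 5 and x + 3y = 10; the system is invented for the illustration.

```r
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)
b <- c(5, 10)

x <- solve(A, b)    # solves A %*% x = b without forming the inverse
x                   # 1 3

# Check the solution; all.equal() allows for floating-point rounding
all.equal(as.vector(A %*% x), b)
```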
Lists and arrays store data in layers
A list is a one-dimensional data structure that stores different data types in layers
An array is a multi-dimensional data structure that stores data of the same type in layers
A data frame is a 2-dimensional data structure that stores different data types, one type per column
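To make the idea of layers concrete, here is a small sketch using array(). The dimension sizes are arbitrary, and note that in R an array, like a matrix, holds a single data type.

```r
arr <- array(1:24, dim = c(2, 3, 4))   # 2 rows, 3 columns, 4 layers

dim(arr)        # 2 3 4
arr[, , 1]      # the first layer: a 2 x 3 matrix holding 1..6
arr[1, 2, 3]    # row 1, column 2, layer 3: 15

# Every element shares one data type
class(arr[1, 1, 1])   # "integer"
```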
The list() function is used to create lists
list(1:3,5:8) # A list of 2 vectors
## [[1]]
## [1] 1 2 3
##
## [[2]]
## [1] 5 6 7 8
list(1:3,c(T,F))
## [[1]]
## [1] 1 2 3
##
## [[2]]
## [1] TRUE FALSE
list(list(),list(list(),list()))
## [[1]]
## list()
##
## [[2]]
## [[2]][[1]]
## list()
##
## [[2]][[2]]
## list()
(list_s<-list(c("jan","feb",'march','april'),
matrix(c(3,9,5,1,-2,8),nrow = 2),
list("flowers",'chocolate')))
## [[1]]
## [1] "jan" "feb" "march" "april"
##
## [[2]]
## [,1] [,2] [,3]
## [1,] 3 5 -2
## [2,] 9 1 8
##
## [[3]]
## [[3]][[1]]
## [1] "flowers"
##
## [[3]][[2]]
## [1] "chocolate"
unlist() is used to reduce a list into a vector
unlist(list(1:4,5:7))
## [1] 1 2 3 4 5 6 7
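Since the result of unlist() is a vector, mixed types in the list are coerced to a common type, and any names are carried over. A short sketch with invented list contents:

```r
mixed <- list(1:2, "a", TRUE)
flat  <- unlist(mixed)
flat          # "1" "2" "a" "TRUE"
class(flat)   # "character"

# Names in the list are carried over (and numbered where needed)
named <- unlist(list(a = 1:2, b = 3))
names(named)  # "a1" "a2" "b"
```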
The ?"[[" displays documentation for indexing in R
?"[["
v<-1:4
v[2]
## [1] 2
v[2:3]
## [1] 2 3
# Special case for using vector of indices
v[c(1,1,4,3,2)]
## [1] 1 1 4 3 2
v[-1]
## [1] 2 3 4
v[-(1:2)]
## [1] 3 4
Negative indices exclude the elements at those positions
You can’t combine negative and positive indices
Another way to index is by using Boolean expressions or a Boolean vector
The Boolean vector should be the same length as the vector being indexed
v[v%%2==0]
## [1] 2 4
v[v%%2==0]<-13;v
## [1] 1 13 3 13
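Boolean indexing can be explored further with which() and sum(), which report where and how often a condition holds. A sketch with a made-up `scores` vector:

```r
scores <- c(55, 82, 47, 90, 68)   # made-up values

passed <- scores >= 60            # logical vector, same length as scores
scores[passed]                    # 82 90 68
which(passed)                     # positions of the TRUE values: 2 4 5
sum(passed)                       # how many passed: 3

# Conditions can be combined before indexing
scores[scores >= 60 & scores < 85]   # 82 68
```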
m<-matrix(1:6,nrow=2,byrow=T);m
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
m[1,] #access 1st row
## [1] 1 2 3
m[,1] # access 1st column
## [1] 1 4
When accessing a single row or column, the result is reduced to a vector
Using the argument drop=F, we keep single rows and columns 2-dimensional
m<-matrix(1:6,nrow=2, byrow=T);m
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
m[1,,drop=T]
## [1] 1 2 3
m[,1,drop=F]
## [,1]
## [1,] 1
## [2,] 4
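The effect of drop can be confirmed by inspecting the dimensions of each result. A brief sketch:

```r
m <- matrix(1:6, nrow = 2, byrow = TRUE)

dim(m[1, ])                  # NULL: the row was reduced to a vector
dim(m[1, , drop = FALSE])    # 1 3: still a matrix with one row
dim(m[, 1, drop = FALSE])    # 2 1: still a matrix with one column

# Functions such as nrow() rely on the dimensions being present
nrow(m[1, ])                 # NULL
nrow(m[1, , drop = FALSE])   # 1
```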
NB: When sub-setting a list using single-bracket indexing [] we get another list
This applies even if you’re sub-setting a single element
To extract the element itself, use double brackets [[]]
L<-list(1:3,6:7)
is.list(L[1])
## [1] TRUE
L[[1]]
## [1] 1 2 3
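The difference between the two bracket forms shows up clearly when checking the class and length of each result. A short sketch:

```r
L <- list(1:3, 6:7)

class(L[1])      # "list": a one-element list
class(L[[1]])    # "integer": the vector stored inside

length(L[1])     # 1
length(L[[1]])   # 3

L[[1]][2]        # second value of the first element: 2
```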
Elements in vectors or lists can have names
These are the attributes that don’t affect the values of the elements and can be used to refer to them
v<-c(a=1,b=2,c=3,d=4);v
## a b c d
## 1 2 3 4
l<-list(a=1:5,b=c(T,F));l
## $a
## [1] 1 2 3 4 5
##
## $b
## [1] TRUE FALSE
The names() function can also be used to name the elements
names(v)<-LETTERS[1:4];v
## A B C D
## 1 2 3 4
The names of these elements can be used to access them
This is referred to as string indexing
v["A"]
## A
## 1
l['a']
## $a
## [1] 1 2 3 4 5
l[["a"]]
## [1] 1 2 3 4 5
When elements are named, one can use $ to access the elements
Essentially it works like [[]], but there’s no need to put the names in quotation marks
NB: $ can’t be applied to atomic vectors
l$a
## [1] 1 2 3 4 5
[[]] can also be used on vectors; it allows you to extract only one element, and if the element is named, the name is removed
v
## A B C D
## 1 2 3 4
v[[1]]
## [1] 1
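The behaviour of `[]`, `[[]]` and `$` on named objects can be compared side by side. The sketch below rebuilds a named vector and list similar to the ones in this section:

```r
v <- c(A = 1, B = 2, C = 3, D = 4)
l <- list(a = 1:5, b = c(TRUE, FALSE))

names(v["A"])        # "A": single brackets keep the name
names(v[["A"]])      # NULL: double brackets drop it

identical(l$a, l[["a"]])    # TRUE: $ is shorthand for [[ ]] with a name

# $ on an atomic vector is an error
res <- try(v$A, silent = TRUE)
inherits(res, "try-error")  # TRUE
```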
Factors in R are a way to handle categorical data. Categorical data is data that can be divided into a limited number of categories or groups. These categories are known as levels.
Imagine you have a survey where people can rate their satisfaction as “low”, “medium”, or “high”. These ratings are categorical data because there are only a few possible values (levels) the rating can take.
Order does not matter: use \(as.factor(x)\)
color_vector <- c('blue', 'red', 'green', 'white', 'black', 'yellow')
color_vector_f<-as.factor(color_vector)
class(color_vector_f)
## [1] "factor"
Order matters: ordinal categorical variables do have a natural ordering. We can specify the order, from the lowest to the highest, with ordered = TRUE and levels = “desired order”. (R also accepts the abbreviation order = TRUE through partial argument matching.)
day_vector <- c('evening', 'morning', 'afternoon', 'midday', 'midnight', 'evening')
# Convert `day_vector` to a factor with ordered levels
factor_day <- factor(day_vector, ordered = TRUE, levels = c('morning', 'midday', 'afternoon', 'evening', 'midnight'))
class(factor_day)
## [1] "ordered" "factor"
str(factor_day)
## Ord.factor w/ 5 levels "morning"<"midday"<..: 4 1 3 2 5 4
# levels in descending order
factor_day <- factor(day_vector, ordered = TRUE, levels = rev(c('morning', 'midday', 'afternoon', 'evening', 'midnight')))
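Once a factor is ordered, its levels can be inspected, counted and compared. The sketch below uses the satisfaction ratings from the survey example; the data values themselves are made up.

```r
sizes <- c("medium", "low", "high", "low", "medium")   # made-up ratings
sizes_f <- factor(sizes, ordered = TRUE,
                  levels = c("low", "medium", "high"))

levels(sizes_f)        # "low" "medium" "high"
table(sizes_f)         # counts per level, in level order
as.integer(sizes_f)    # underlying codes: 2 1 3 1 2

# Comparisons are only meaningful for ordered factors
sizes_f[1] > sizes_f[2]    # TRUE: medium ranks above low
max(sizes_f)               # high
```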
Tabular Structure: Organizes data into rows and columns.
Variable Heterogeneity: Accommodates different data types in one dataset.
Integration: Works seamlessly with many R packages and functions.
Data Transformation: Facilitates easy manipulation and transformation of data.
Compatibility: Imports from and exports to various data sources.
Interactive Analysis: Supports interactive exploration and visualization.
Reporting: Enables generation of formatted reports and summaries.
# Create a data frame
Data_Frame <- data.frame (
Training = c("Strength", "Stamina", "Other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
# Print the data frame
Data_Frame
Data_Frame[1]
Data_Frame[["Training"]]
## [1] "Strength" "Stamina" "Other"
#### We often use this####
Data_Frame$Training
## [1] "Strength" "Stamina" "Other"
# add a column
Data_Frame$heartrate=Data_Frame[,2]/Data_Frame[,3]
Data_Frame
# append a column
Data_Frame2=data.frame(volume=sample(3000:5000,size = 3,replace = T))
new_dataframe=cbind(Data_Frame,Data_Frame2)
new_dataframe
#### append rows
new_dataframe2=rbind(Data_Frame,Data_Frame)
new_dataframe2
Data_Frame_New <- Data_Frame[-c(1,3), -c(1,2)]
# Print the new data frame
Data_Frame_New
\(ncol()\)
\(nrow()\)
\(dim()\)
\(colnames()\)
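A quick sketch of these inspection functions, applied to the same three-row training data frame used earlier in this section (in base R the function names are nrow() and ncol()):

```r
Data_Frame <- data.frame(
  Training = c("Strength", "Stamina", "Other"),
  Pulse    = c(100, 150, 120),
  Duration = c(60, 30, 45)
)

nrow(Data_Frame)        # number of rows: 3
ncol(Data_Frame)        # number of columns: 3
dim(Data_Frame)         # both at once: 3 3
colnames(Data_Frame)    # "Training" "Pulse" "Duration"

# Rows can be filtered with a logical condition on a column
Data_Frame[Data_Frame$Pulse > 110, ]
```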
Create two \(5\times 5\) matrices, \(A\) and \(B\). Carry out both scalar (element-wise) multiplication and matrix multiplication, and assign the results to \(scalar\_a\) and \(mat\_a\) respectively. Find out whether the element in the third row and second column of \(mat\_a\) is less than the element in the fourth row and second column of \(scalar\_a\).