In this material you will have the opportunity to learn and test yourself on R basics and data structures (vectors, matrices and data frames), as well as the fundamentals of R programming: control structures and writing functions.
To search the help pages for a topic, for example “statistics”, type:
# ??statistics
or
# help.search("statistics")
Otherwise, use the Help menu in R or RStudio.
10-8
## [1] 2
3+5
## [1] 8
4+6*2
## [1] 16
1-3*(4-8)/5
## [1] 3.4
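The last expression follows the usual precedence rules: parentheses first, then * and / from left to right, then + and -. A minimal sketch that evaluates it one step at a time (the intermediate names step1, step2, step3 and result are only illustrative):

```r
# Evaluate 1-3*(4-8)/5 step by step
step1 <- 4 - 8        # parentheses first: -4
step2 <- 3 * step1    # * and / from left to right: -12
step3 <- step2 / 5    # -2.4
result <- 1 - step3   # subtraction last: 1 - (-2.4) = 3.4
result

# Exponentiation binds more tightly than the unary minus:
-2^2                  # -4, i.e. -(2^2)
(-2)^2                # 4
```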
Assign values to an object with <- or =. Note: where a line starts with #, remove the # to run it and see the output.
a<-7
a
## [1] 7
b=9
b
## [1] 9
A<-a
A
## [1] 7
# g   # g was never assigned: running it gives "Error: object 'g' not found"
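There are some restrictions on object names:
• They may not contain special symbols such as +, - or #.
• A period (.) and an underscore (_) are allowed, and a name may begin with a period (.), as long as it is not followed by a number.
• A name cannot start with a number or an underscore.
• Names are case sensitive: a and A are different objects.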
Note: the commented-out lines below use invalid names; remove the # to see the error each one produces.
math.1<-6
# 2.mathem<-12
# math-2
mathem<-3
# ania mathem
mathem_3=5
The scan() function lets you type data into R from the console; press Enter on an empty line to finish. Note: remove the # to run it.
# mathematics<-scan()
A vector is created with the function c(), which combines values, and is assigned to a name with either <- or =. Try writing:
a=c(3,4,5,6,3)
a
## [1] 3 4 5 6 3
ba<-c(3,7,5,6,13)
t(a)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 3 4 5 6 3
t(ba)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 3 7 5 6 13
t(t(a))
## [,1]
## [1,] 3
## [2,] 4
## [3,] 5
## [4,] 6
## [5,] 3
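t() treats a plain vector as a column, so t(a) is a 1 x 5 row matrix and t(t(a)) a 5 x 1 column matrix. A quick way to check the shapes is dim(), which returns NULL for a plain vector:

```r
a <- c(3, 4, 5, 6, 3)
dim(a)        # NULL: a plain vector has no dimensions
dim(t(a))     # 1 5: one row, five columns
dim(t(t(a)))  # 5 1: five rows, one column
```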
a+3
## [1] 6 7 8 9 6
a-1
## [1] 2 3 4 5 2
a/5
## [1] 0.6 0.8 1.0 1.2 0.6
a*8
## [1] 24 32 40 48 24
Arithmetic with two vectors of the same length works element by element:
a+ba
## [1] 6 11 10 12 16
a-ba
## [1] 0 -3 0 0 -10
a/ba
## [1] 1.0000000 0.5714286 1.0000000 1.0000000 0.2307692
a*ba
## [1] 9 28 25 36 39
In R, we find the minimum or maximum element of a vector with min() or max(); given several vectors, they return the overall minimum or maximum:
min(a)
## [1] 3
max(ba)
## [1] 13
min(a,ba)
## [1] 3
max(a,ba)
## [1] 13
f<-c(min(a),min(ba),min(math.1))
f
## [1] 3 3 6
# How can you do the same for max()?
Parallel minima and maxima: pmin() and pmax() compare vectors position by position and return the smallest (or largest) value at each position:
w<-c(2,4,3,2,1)
u<-c(8,5,3,2,5)
s<-c(0,3,6,81,23)
pmin(w,u,s)
## [1] 0 3 3 2 1
pmax(w,u,s)
## [1] 8 5 6 81 23
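pmin() and pmax() also accept a single number, which is recycled against every position. A common use is capping (clipping) values at a threshold; a short sketch with the vector u:

```r
u <- c(8, 5, 3, 2, 5)
pmin(u, 4)            # cap at 4: 4 4 3 2 4
pmax(u, 3)            # raise to at least 3: 8 5 3 3 5
pmax(pmin(u, 6), 3)   # keep values within [3, 6]: 6 5 3 3 5
```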
# Try adding an element to one of the vectors and calling the functions again. What do you observe?
sort() and order()
The sort() function sorts the elements of a numeric vector from the smallest to the largest value:
sort(a)
## [1] 3 3 4 5 6
sort(ba)
## [1] 3 5 6 7 13
The order() function also sorts, but be careful: it returns the positions (indices) of the elements in sorted order, not the values themselves:
order(a)
## [1] 1 5 2 3 4
order(ba)
## [1] 1 3 4 2 5
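Because order() returns positions, indexing a vector with its own order gives back the sorted values. Both functions can also sort in decreasing order:

```r
a <- c(3, 4, 5, 6, 3)
a[order(a)]                      # 3 3 4 5 6, the same as sort(a)
identical(a[order(a)], sort(a))  # TRUE
sort(a, decreasing = TRUE)       # 6 5 4 3 3
order(a, decreasing = TRUE)      # positions of the values from largest to smallest
```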
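What happens if the vectors have different lengths? R recycles the shorter vector, repeating its elements from the beginning until it reaches the length of the longer one, and issues a warning when the longer length is not a multiple of the shorter one. R will almost always give you an output, so make sure you understand whether the result makes sense.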
d=c(1,2,3,4,2,3,1,0)
e<-c(3,4,5,6,3)
d+e
## Warning in d + e: longer object length is not a multiple of shorter object
## length
## [1] 4 6 8 10 5 6 5 5
d-e
## Warning in d - e: longer object length is not a multiple of shorter object
## length
## [1] -2 -2 -2 -2 -1 0 -3 -5
d/e
## Warning in d/e: longer object length is not a multiple of shorter object length
## [1] 0.3333333 0.5000000 0.6000000 0.6666667 0.6666667 1.0000000 0.2500000
## [8] 0.0000000
e*d
## Warning in e * d: longer object length is not a multiple of shorter object
## length
## [1] 3 8 15 24 6 9 4 0
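When the longer length is an exact multiple of the shorter one, R recycles silently, without any warning, which is exactly why such results deserve a check. A small sketch with two illustrative vectors:

```r
x4 <- c(1, 2, 3, 4)
y2 <- c(10, 20)
x4 + y2    # y2 is recycled as 10 20 10 20: 11 22 13 24, no warning
x4 * 2     # a single number is recycled against every element: 2 4 6 8
```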
length(a)
## [1] 5
sum(a)
## [1] 21
mean(a)
## [1] 4.2
mean(a,trim=1/7)
## [1] 4.2
var(a)
## [1] 1.7
sd(a)
## [1] 1.30384
range(a)
## [1] 3 6
quantile(a)
## 0% 25% 50% 75% 100%
## 3 3 4 5 6
quartiles<-c(quantile(a,0.25),quantile(a,0.75)) # first and third quartiles
IQR(a) # interquartile range = Q3 - Q1
## [1] 2
fivenum(a)
## [1] 3 3 4 5 6
library(moments) # run install.packages("moments") first if it is not installed
skewness(a)
## [1] 0.3631735
kurtosis(a)
## [1] 1.628028
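The moments package uses the population-moment definitions, skewness = m3 / m2^(3/2) and kurtosis = m4 / m2^2, where mk is the mean of the k-th powers of the deviations from the mean (the kurtosis is not reduced by 3, so a normal distribution gives about 3). A base-R sketch that should reproduce the two values above without loading the package:

```r
a <- c(3, 4, 5, 6, 3)
dev <- a - mean(a)      # deviations from the mean
m2 <- mean(dev^2)       # 2nd central moment (divides by n, not n - 1)
m3 <- mean(dev^3)       # 3rd central moment
m4 <- mean(dev^4)       # 4th central moment
skew <- m3 / m2^(3/2)
kurt <- m4 / m2^2
c(skewness = skew, kurtosis = kurt)
```

Note that m2 divides by n, so it differs from var(a), which divides by n - 1.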
EXERCISE: Write R commands that calculate the dispersion of a vector of discrete data (for example the range, variance, standard deviation and interquartile range).
x=c(5,7,4,1,2,8,9,6,3,2,9,NA,4,5,3,NA)
mean(x) # any NA in the data makes the result NA
## [1] NA
mean(x,na.rm=TRUE)
## [1] 4.857143
Can we change the values of vector elements? Yes, by assigning to the positions we want:
x[3]=0
x[c(1,5)]=0
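# how do you select the elements in positions 1 to 4?
# vector x with the element in position 5 removed
x2<-x[-5]
# remove the elements in positions 3, 7 and 8 (fill in the blanks)
# x[c(-,-,-)]
# a shorter vector: the elements of x in positions 3 to 8
x3<-x[3:8]
index<-c(3,6,7)
x[index]=NA # set the elements in positions 3, 6 and 7 to NA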
sum(a<5)
## [1] 3
sum(a[a<5])
## [1] 10
sum(a*(a<5))
## [1] 10
# the last two commands give the same result. Why? Let us go step by step.
a<5
## [1] TRUE TRUE FALSE FALSE TRUE
1*(a<5)
## [1] 1 1 0 0 1
a*(a<5)
## [1] 3 4 0 0 3
sum(a*(a<5) )
## [1] 10
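What does mean(a<2) compute? Since a<2 is a logical vector and TRUE counts as 1, it gives the proportion of elements of a that are smaller than 2 (here none, so 0), not their mean. The mean of the elements smaller than 2 would be mean(a[a<2]), which is NaN here because no element qualifies: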
mean(a<2)
## [1] 0
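The difference between a proportion and a conditional mean is easier to see with a threshold that some elements do satisfy, say 5:

```r
a <- c(3, 4, 5, 6, 3)
mean(a < 5)       # proportion of elements below 5: 3 out of 5 = 0.6
mean(a[a < 5])    # mean of the elements below 5: (3 + 4 + 3) / 3
mean(a[a < 2])    # no element qualifies: the mean of an empty vector is NaN
```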
What do the following commands display in the R console? Describe the result in words.
c=sum(a,a<5)
c
## [1] 24
d=sum(a,a<2)
d
## [1] 21
sum(b>3|b<6)
## [1] 1
sum(b>3 & b<6)
## [1] 0
The operator “==” tests whether each element of a vector is equal to a given value. R returns a logical vector with TRUE or FALSE at each position, and NA where the element is missing.
x==3
## [1] FALSE FALSE NA FALSE FALSE NA NA FALSE TRUE FALSE FALSE NA
## [13] FALSE FALSE TRUE NA
x == NA # comparing with NA always gives NA; use is.na(x) instead
## [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
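Since any comparison with NA is NA, missing values are detected with is.na() instead. A minimal sketch on a small illustrative vector z:

```r
z <- c(2, NA, 5, NA)
is.na(z)          # FALSE TRUE FALSE TRUE
which(is.na(z))   # positions of the missing values: 2 4
sum(is.na(z))     # number of missing values: 2
z[!is.na(z)]      # keep only the observed values: 2 5
```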
which(x==6)
## [1] 8
# which(x==max(x,na.rm=TRUE)) # position of the largest value; without na.rm=TRUE, max(x) is NA
The match() function: order matters. match(d,e) returns, for each element of d, the position where it first appears in e, or NA if it does not appear.
d=c(1,2,3,4,2,3,1,0)
e<-c(3,4,5,6,3)
match(d,e)
## [1] NA NA 1 2 NA 1 NA NA
match(e,d)
## [1] 3 4 NA NA 3
# note that it shows the first position where the value is observed
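When only a yes/no answer is needed (is each element of d present anywhere in e?), the %in% operator is a convenient shorthand built on match():

```r
d <- c(1, 2, 3, 4, 2, 3, 1, 0)
e <- c(3, 4, 5, 6, 3)
d %in% e               # FALSE FALSE TRUE TRUE FALSE TRUE FALSE FALSE
!is.na(match(d, e))    # the same logical vector obtained from match()
```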
The absolute value of a number is computed with abs():
yy=c(-1,2,-3,1,-4,5,3,55)
abs(yy)
## [1] 1 2 3 1 4 5 3 55
# combining functions: the position of the element of yy closest to 2
which(abs(yy-2)==min(abs(yy-2)))
## [1] 2
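The same position can be found with which.min(), which returns the index of the (first) smallest value; indexing yy with it gives the element itself:

```r
yy <- c(-1, 2, -3, 1, -4, 5, 3, 55)
pos <- which.min(abs(yy - 2))   # index of the element closest to 2
pos                              # 2
yy[pos]                          # the element itself: 2
```

One design difference: which.min() returns only the first position in case of ties, while which(abs(yy-2)==min(abs(yy-2))) returns all tied positions.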
all(yy>0) # are all elements in yy greater than 0?
## [1] FALSE
any(yy>0) # is any element in yy greater than 0?
## [1] TRUE
If we have two values g and f (with g < f or f < g) and we want a vector of the values from g to f moving by 1 unit, we use the colon operator, g:f. The function seq() builds more general sequences from a start value, an end value and a step, e.g.:
e1=19
e2=1
seq(e2,e1)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
seq(e1,e2,-2)
## [1] 19 17 15 13 11 9 7 5 3 1
seq(2,13,3)
## [1] 2 5 8 11
seq(13,2)#? correct it
## [1] 13 12 11 10 9 8 7 6 5 4 3 2
i=1:15
i
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
j=i:(i+1) # ":" uses only the first element of each side, so this is 1:2
## Warning in i:(i + 1): numerical expression has 15 elements: only the first used
## Warning in i:(i + 1): numerical expression has 15 elements: only the first used
j
## [1] 1 2
EXERCISE: Construct an arithmetic progression with common difference d=4 and 20 values, starting from 0.
#### The table() function
table(yy)
## yy
## -4 -3 -1 1 2 3 5 55
## 1 1 1 1 1 1 1 1
table(yy)/length(yy)
## yy
## -4 -3 -1 1 2 3 5 55
## 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125
# what if we want it in percent?
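One possible answer, as a sketch: prop.table() divides a table by its total, and multiplying by 100 turns the relative frequencies into percentages:

```r
yy <- c(-1, 2, -3, 1, -4, 5, 3, 55)
prop.table(table(yy))                  # same as table(yy)/length(yy) when there are no NA
round(100 * prop.table(table(yy)), 1)  # percentages: 12.5 for each value
```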
var<-c("yes","no")
v2<-c("y","n","n","n") # table it
table(v2)
## v2
## n y
## 3 1
stud<-c("ana","eralda","daniel","datascience")
length(stud)
## [1] 4
nchar(stud)
## [1] 3 6 6 11
sum(nchar(stud))
## [1] 26
paste(var,stud,sep="") # var (length 2) is recycled to the length of stud (4)
## [1] "yesana" "noeralda" "yesdaniel" "nodatascience"
Extracting parts of strings
phrase<-"the student of data science course are fantastic"
q<-character(nchar(phrase)) # empty character vector, one slot per character of phrase
for (i in 1:nchar(phrase)) q[i]<- substr(phrase,1,i) # q[i] holds the first i characters
q
## [1] "t"
## [2] "th"
## [3] "the"
## [4] "the "
## [5] "the s"
## [6] "the st"
## [7] "the stu"
## [8] "the stud"
## [9] "the stude"
## [10] "the studen"
## [11] "the student"
## [12] "the student "
## [13] "the student o"
## [14] "the student of"
## [15] "the student of "
## [16] "the student of d"
## [17] "the student of da"
## [18] "the student of dat"
## [19] "the student of data"
## [20] "the student of data "
## [21] "the student of data s"
## [22] "the student of data sc"
## [23] "the student of data sci"
## [24] "the student of data scie"
## [25] "the student of data scien"
## [26] "the student of data scienc"
## [27] "the student of data science"
## [28] "the student of data science "
## [29] "the student of data science c"
## [30] "the student of data science co"
## [31] "the student of data science cou"
## [32] "the student of data science cour"
## [33] "the student of data science cours"
## [34] "the student of data science course"
## [35] "the student of data science course "
## [36] "the student of data science course a"
## [37] "the student of data science course ar"
## [38] "the student of data science course are"
## [39] "the student of data science course are "
## [40] "the student of data science course are f"
## [41] "the student of data science course are fa"
## [42] "the student of data science course are fan"
## [43] "the student of data science course are fant"
## [44] "the student of data science course are fanta"
## [45] "the student of data science course are fantas"
## [46] "the student of data science course are fantast"
## [47] "the student of data science course are fantasti"
## [48] "the student of data science course are fantastic"
strsplit(phrase,split=character(0))
## [[1]]
## [1] "t" "h" "e" " " "s" "t" "u" "d" "e" "n" "t" " " "o" "f" " " "d" "a" "t" "a"
## [20] " " "s" "c" "i" "e" "n" "c" "e" " " "c" "o" "u" "r" "s" "e" " " "a" "r" "e"
## [39] " " "f" "a" "n" "t" "a" "s" "t" "i" "c"
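Splitting on a space instead of on every character returns the individual words; strsplit() always returns a list, so [[1]] extracts the character vector:

```r
phrase <- "the student of data science course are fantastic"
words <- strsplit(phrase, split = " ")[[1]]
words            # "the" "student" "of" "data" "science" "course" "are" "fantastic"
length(words)    # 8
nchar(words)     # number of characters in each word
```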
Qualitative (categorical) data classify observations into categories, called levels or modalities. In R, such a variable is stored as a factor, created with factor() or as.factor(). See the following example:
factor(v2)
## [1] y n n n
## Levels: n y
table(factor(v2))
##
## n y
## 3 1
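By default the levels are sorted alphabetically (n before y). The levels and labels arguments of factor() set their order and give them readable names; a sketch:

```r
v2 <- c("y", "n", "n", "n")
f2 <- factor(v2, levels = c("y", "n"), labels = c("yes", "no"))
f2               # yes no no no, with levels yes no
levels(f2)       # "yes" "no"
table(f2)        # yes: 1, no: 3
```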
is.numeric(x)
## [1] TRUE
v3<-"student"
class(v3)
## [1] "character"
is.character(a)
## [1] FALSE
v4<-"system"
paste(v3,v4)
## [1] "student system"
paste(v3,v4,sep="")
## [1] "studentsystem"
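paste0() is a shorthand for paste(..., sep=""), and the collapse argument joins the elements of a single vector into one string:

```r
v3 <- "student"
v4 <- "system"
paste0(v3, v4)                                          # "studentsystem"
paste(c("data", "science", "course"), collapse = "-")   # "data-science-course"
```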
substr(v3,2,5)
## [1] "tude"
The function seq() can generate the probabilities 0, 0.1, ..., 1 needed to compute the deciles with quantile():
decile<-seq(0,1,0.1)
Decile=quantile(yy,decile)
Decile
## 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%
## -4.0 -3.3 -2.2 -0.8 0.6 1.5 2.2 2.9 4.2 20.0 55.0
• The rep() function repeats a value or a vector several times, for example:
rep("A",10)
## [1] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
rep(x,each=3)
## [1] 0 0 0 7 7 7 NA NA NA 1 1 1 0 0 0 NA NA NA NA NA NA 6 6 6 3
## [26] 3 3 2 2 2 9 9 9 NA NA NA 4 4 4 5 5 5 3 3 3 NA NA NA
rep(2:5,c(4,1,4,2))# repeat each element with the given frequency
## [1] 2 2 2 2 3 4 4 4 4 5 5
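The arguments times and each repeat in different ways, and length.out stops once a given length is reached:

```r
rep(c(1, 2), times = 3)       # the whole vector repeated: 1 2 1 2 1 2
rep(c(1, 2), each = 3)        # each element repeated: 1 1 1 2 2 2
rep(c(1, 2), length.out = 5)  # repeat until 5 values: 1 2 1 2 1
```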
• The sort() function sorts the values of a numeric vector. Note that missing values (NA) are removed by default, so the result is shorter than x:
sort(x)
## [1] 0 0 1 2 3 3 4 5 6 7 9
(sort(x))^(-1)
## [1] Inf Inf 1.0000000 0.5000000 0.3333333 0.3333333 0.2500000
## [8] 0.2000000 0.1666667 0.1428571 0.1111111
# let's combine functions!
rev(sort(x))
## [1] 9 7 6 5 4 3 3 2 1 0 0
rev(sort(x))[1:3]
## [1] 9 7 6
sum(rev(sort(x))[1:3])
## [1] 22
Numbers are rounded with round(); the optional second argument sets the number of decimal places:
round(1.34)
## [1] 1
round(1.345,2) # 1.345 is stored in binary as slightly less than 1.345, so it rounds down
## [1] 1.34
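R's round() follows the IEC 60559 convention of rounding a half to the even digit, and base R has related functions that always round in one direction. A short comparison:

```r
round(0.5)          # 0: halves go to the even digit
round(2.5)          # 2
round(-1.7)         # -2
floor(-1.7)         # -2: largest integer not greater than x
ceiling(-1.7)       # -1: smallest integer not less than x
trunc(-1.7)         # -1: drop the decimal part
signif(123.456, 2)  # 120: keep 2 significant digits
```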
sample(1:50,6) # 6 numbers from 1 to 50 without replacement; the result changes at every run
## [1] 42 47 36 35 9 25
sample(1:50,6)
## [1] 22 14 50 2 45 27
sample(1:50,6,replace=TRUE)
## [1] 40 8 30 37 25 29
sample(c("H","T"),15,replace=TRUE)# coin toss
## [1] "T" "T" "H" "H" "T" "T" "H" "H" "T" "T" "T" "T" "T" "H" "T"
sample(c("H","T"),15,replace=TRUE,prob=c(0.7,0.3)) # biased coin: P(H)=0.7, P(T)=0.3
## [1] "H" "H" "H" "H" "T" "H" "T" "T" "T" "T" "T" "H" "H" "H" "T"
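Because sample() draws at random, the outputs above change every time the code is run. Fixing the seed of the random number generator with set.seed() makes the draws reproducible; the seed value 123 here is arbitrary:

```r
set.seed(123)
s1 <- sample(1:50, 6)
set.seed(123)
s2 <- sample(1:50, 6)
identical(s1, s2)   # TRUE: same seed, same draws
```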
EXERCISE: Simulate rolling a die 50 times.