In this material you will have the opportunity to learn and test yourself on R basics and data structures (vectors, matrices and data frames), as well as the fundamentals of R programming: control structures and writing functions.

How to use Help?

For example, to search the documentation for “statistics”, type:

# ??statistics 

or

# help.search("statistics")

Otherwise you can use the Help menu in R or RStudio.

Operators

Arithmetic and Logic

Arithmetic

+ addition
- subtraction
* multiplication
/ division

Logic:

< less than
> greater than
<= less than or equal
>= greater than or equal
== equal to
!= not equal to

Example

10-8
## [1] 2
3+5
## [1] 8
4+6*2
## [1] 16
1-3*(4-8)/5
## [1] 3.4

Assign values to an object. Note: remove the # in front of an object name to run the line and see its output.

a<-7
a
## [1] 7
b=9
b
## [1] 9
A<-a
A
## [1] 7
# g

NOTE!

There are some restrictions on object naming:
• A name may not contain “weird” symbols such as +, - or #.
• A period (.) and an underscore (_) are allowed, and a name may begin with a period (.).
• A name cannot start with a number.

Note: remove the # sign to run the commented lines; the invalid names produce an error.

 math.1<-6
# 2.mathem<-12
# math-2
 mathem<-3
 # ania mathem
 mathem_3=5
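
The naming rules above can be checked without triggering an error. A minimal sketch, assuming that a name is valid exactly when make.names() leaves it unchanged; the helper is_valid_name and the vector candidates are hypothetical names introduced only for this illustration.

```r
# Hypothetical helper: a name is syntactically valid if make.names() does not need to change it
is_valid_name <- function(name) {
  name == make.names(name)
}

candidates <- c("math.1", "2.mathem", "math-2", "mathem_3", ".hidden")
is_valid_name(candidates)
## [1]  TRUE FALSE FALSE  TRUE  TRUE
```

make.names() shows how R would repair an invalid name, for example make.names("math-2") gives "math.2".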

scan()

The scan() function makes it possible to enter data into R from the keyboard. Note: to activate the function remove the #.

# mathematics<-scan() 
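
When run interactively, scan() waits for values typed at the keyboard. To try it without typing, the sketch below passes the values through the text argument of scan(); the numbers themselves are made up for the illustration.

```r
# The text argument supplies the input directly instead of reading from the keyboard
mathematics <- scan(text = "3 5 8 13", quiet = TRUE)
mathematics
## [1]  3  5  8 13
```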

VECTORS

A vector in R is created with the function c() and assigned to a name with either 1) <- or 2) =. Try writing:

a=c(3,4,5,6,3)
a
## [1] 3 4 5 6 3
ba<-c(3,7,5,6,13)   
t(a)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    3    4    5    6    3
t(ba)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    3    7    5    6   13
t(t(a))
##      [,1]
## [1,]    3
## [2,]    4
## [3,]    5
## [4,]    6
## [5,]    3
a+3
## [1] 6 7 8 9 6
a-1
## [1] 2 3 4 5 2
a/5
## [1] 0.6 0.8 1.0 1.2 0.6
a*8
## [1] 24 32 40 48 24

Arithmetic calculations with vectors:

a+ba
## [1]  6 11 10 12 16
a-ba
## [1]   0  -3   0   0 -10
a/ba
## [1] 1.0000000 0.5714286 1.0000000 1.0000000 0.2307692
a*ba
## [1]  9 28 25 36 39

Minimum and Maximum

In R, we can find the minimum or maximum element of a vector using the commands min() or max():

min(a)
## [1] 3
max(ba)
## [1] 13
min(a,ba)
## [1] 3
max(a,ba)
## [1] 13
f<-c(min(a),min(ba),min(math.1))
f
## [1] 3 3 6
# How can you do it for max()

Parallel minima and maxima: pmin() and pmax()

w<-c(2,4,3,2,1)
u<-c(8,5,3,2,5)
s<-c(0,3,6,81,23)
pmin(w,u,s)
## [1] 0 3 3 2 1
pmax(w,u,s)
## [1]  8  5  6 81 23
# Try adding an element to one of the vectors and call the functions again. What do you observe?

Sort and Order

sort() and order(): the sort() function sorts the elements of a numeric vector from the smallest value to the largest value:

sort(a)
## [1] 3 3 4 5 6
sort(ba)
## [1]  3  5  6  7 13

The order() function does not return the sorted values. Careful: it returns the positions that the elements occupy in the original vector, listed in sorted order:

order(a)
## [1] 1 5 2 3 4
order(ba)
## [1] 1 3 4 2 5
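
The positions returned by order() become useful when they are used as an index. A short sketch, in which the vector people holds hypothetical names added only for this example:

```r
a <- c(3, 4, 5, 6, 3)
people <- c("ana", "eralda", "daniel", "bora", "elton")  # hypothetical labels, one per value in a

idx <- order(a)   # positions of the elements of a, from smallest to largest
a[idx]            # indexing with those positions gives the same result as sort(a)
## [1] 3 3 4 5 6
people[idx]       # the labels rearranged according to the values in a
```

This is the usual way to sort one vector by the values of another.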

Different length of vectors

What happens in R if the vectors have different lengths? R will almost always give you an output, because it recycles (repeats) the shorter vector until it matches the length of the longer one. Be careful to check whether that output makes sense.

d=c(1,2,3,4,2,3,1,0)
e<-c(3,4,5,6,3)
d+e
## Warning in d + e: longer object length is not a multiple of shorter object
## length
## [1]  4  6  8 10  5  6  5  5
d-e
## Warning in d - e: longer object length is not a multiple of shorter object
## length
## [1] -2 -2 -2 -2 -1  0 -3 -5
d/e
## Warning in d/e: longer object length is not a multiple of shorter object length
## [1] 0.3333333 0.5000000 0.6000000 0.6666667 0.6666667 1.0000000 0.2500000
## [8] 0.0000000
e*d
## Warning in e * d: longer object length is not a multiple of shorter object
## length
## [1]  3  8 15 24  6  9  4  0
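
The results above come from recycling: R repeats the shorter vector until it is as long as the longer one. The sketch below makes that step explicit with rep(); the name e_recycled is introduced only for this illustration.

```r
d <- c(1, 2, 3, 4, 2, 3, 1, 0)
e <- c(3, 4, 5, 6, 3)

# Repeat e until it has as many elements as d
e_recycled <- rep(e, length.out = length(d))
e_recycled
## [1] 3 4 5 6 3 3 4 5

# Same values as d + e, but without the warning
d + e_recycled
## [1]  4  6  8 10  5  6  5  5
```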

DESCRIPTIVE STATISTICS

length(a)
## [1] 5
sum(a)
## [1] 21
mean(a)
## [1] 4.2
mean(a,trim=1/7)
## [1] 4.2
var(a)
## [1] 1.7
sd(a)
## [1] 1.30384
range(a)
## [1] 3 6
quantile(a)
##   0%  25%  50%  75% 100% 
##    3    3    4    5    6
quartiles<-c(quantile(a,0.25),quantile(a,0.75)) # first and third quartile; the built-in IQR(a) returns their difference
fivenum(a)
## [1] 3 3 4 5 6
library(moments)
skewness(a)
## [1] 0.3631735
kurtosis(a)
## [1] 1.628028
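
skewness() and kurtosis() come from the moments package, which is not part of base R and has to be installed once with install.packages("moments"). As a sketch of what they compute, assuming the moment-based definitions (central moments divided by n), the same values can be obtained in base R:

```r
a <- c(3, 4, 5, 6, 3)

# Central moments of order 2, 3 and 4 (dividing by n, not n - 1)
dev <- a - mean(a)
m2 <- mean(dev^2)
m3 <- mean(dev^3)
m4 <- mean(dev^4)

skew <- m3 / m2^(3/2)   # approximately 0.3631735
kurt <- m4 / m2^2       # approximately 1.628028

# The interquartile range is available directly in base R
IQR(a)
## [1] 2
```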

EXERCISE: Write in R some commands that make it possible to calculate the dispersion of a vector with discrete data.

Missing data

x=c(5,7,4,1,2,8,9,6,3,2,9,NA,4,5,3,NA)
mean(x)
## [1] NA
mean(x,na.rm=TRUE)
## [1] 4.857143

Can we assign new values to the elements of a vector? Yes, of course!

x[3]=0
x[c(1,5)]=0
# elements in sequence from position 1 to 4 ?
# vector x by removing element in position 5
x2<-x[-5]
# remove elements in position 3,7,8
# x[c(-,-,-)]
# a shorter vector from x
x3<-x[3:8]
index<-c(3,6,7)
x[index]=NA
sum(a<5)
## [1] 3
sum(a[a<5])
## [1] 10
sum(a*(a<5))
## [1] 10
# the last two results are the same. Why? Let us go step by step.
a<5
## [1]  TRUE  TRUE FALSE FALSE  TRUE
1*(a<5)
## [1] 1 1 0 0 1
a*(a<5)  
## [1] 3 4 0 0 3
sum(a*(a<5) )
## [1] 10

EXERCISE

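Find the mean of all elements in vector a that are smaller than 2.

mean(a<2) # careful: this is the proportion of elements smaller than 2, not their mean
## [1] 0
# mean(a[a<2]) averages the elements themselves; it returns NaN here because no element of a is below 2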

EXERCISE

What do the following commands display in R editor? Describe in words the result.

c=sum(a,a<5) 
c
## [1] 24
d=sum(a,a<2)
d
## [1] 21
sum(b>3|b<6)
## [1] 1
sum(b>3 & b<6)
## [1] 0

Operator ==

The operator “==” tests each element of the vector for equality with the given value. R responds with a vector of TRUE or FALSE values, one for each position (and NA where the element is missing).

x==3
##  [1] FALSE FALSE    NA FALSE FALSE    NA    NA FALSE  TRUE FALSE FALSE    NA
## [13] FALSE FALSE  TRUE    NA
x == NA
##  [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
which(x==6)
## [1] 8
# which(x==max(x))
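
Because NA stands for an unknown value, a comparison with NA is itself unknown, which is why x == NA never returns TRUE. A short sketch of the usual alternatives, using an illustrative vector z rather than the x above:

```r
z <- c(0, 7, NA, 1, 6, NA, 9)   # illustrative vector with two missing values

z == NA        # always NA: the comparison itself is unknown
is.na(z)       # the proper test for missing values
## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE

# max(z) would return NA, so drop the missing values before comparing
which(z == max(z, na.rm = TRUE))
## [1] 7
```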

Match

Match function: the order of the arguments is important. Where do the values of the first vector appear in the second vector?

d=c(1,2,3,4,2,3,1,0)
e<-c(3,4,5,6,3)
match(d,e)
## [1] NA NA  1  2 NA  1 NA NA
match(e,d)
## [1]  3  4 NA NA  3
# note that it shows the first position where the value is observed
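
When only a yes/no answer is needed (is the value present at all?), the operator %in% is often more convenient than match(). A small sketch with the same two vectors:

```r
d <- c(1, 2, 3, 4, 2, 3, 1, 0)
e <- c(3, 4, 5, 6, 3)

d %in% e               # is each element of d present somewhere in e?
## [1] FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE
!is.na(match(d, e))    # the same answer built from match()
d[d %in% e]            # the elements of d that also occur in e
## [1] 3 4 3
```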

Absolute value

The function for finding the absolute value of a number in R is abs():

yy=c(-1,2,-3,1,-4,5,3,55)
abs(yy)
## [1]  1  2  3  1  4  5  3 55
# combining functions!
which(abs(yy-2)==min(abs(yy-2)))
## [1] 2
all(yy>0) # are all elements in yy greater than 0?
## [1] FALSE
any(yy>0) # is any element in yy greater than 0?
## [1] TRUE

SEQUENCE

If we have two given values (g < f or f < g) and we want to create a vector of values from g to f or from f to g, moving by 1 unit, then we use the symbol “:”, as in g:f. The function seq() gives a sequence of values according to some conditions (start, end, step), e.g.

e1=19
e2=1
seq(e2,e1)
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
seq(e1,e2,-2)
##  [1] 19 17 15 13 11  9  7  5  3  1
seq(2,13,3)
## [1]  2  5  8 11
seq(13,2)#? correct it
##  [1] 13 12 11 10  9  8  7  6  5  4  3  2
i=1:15
i
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
j=i:(i+1)
## Warning in i:(i + 1): numerical expression has 15 elements: only the first used

## Warning in i:(i + 1): numerical expression has 15 elements: only the first used
j
## [1] 1 2
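
Besides a start, an end and a step, seq() also accepts the number of values to produce through the argument length.out. A brief sketch with made-up numbers; s1 and s2 are names used only here:

```r
s1 <- seq(1, by = 0.5, length.out = 5)   # fixed step, fixed number of values
s1
## [1] 1.0 1.5 2.0 2.5 3.0
s2 <- seq(0, 1, length.out = 5)          # fixed end points, R works out the step
s2
## [1] 0.00 0.25 0.50 0.75 1.00
seq_len(4)                               # same as 1:4
## [1] 1 2 3 4
seq_along(c("a", "b", "c"))              # one index per element
## [1] 1 2 3
```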

EXERCISE

Construct an arithmetic progression of 20 values with common difference d=4, starting from 0.

TABLE function

table(yy)
## yy
## -4 -3 -1  1  2  3  5 55 
##  1  1  1  1  1  1  1  1
table(yy)/length(yy)
## yy
##    -4    -3    -1     1     2     3     5    55 
## 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125
# what if we want it in %?
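
One possible way to express the frequencies as percentages, sketched with prop.table(), which divides each count by the total; the names freq and pct are introduced only for this example.

```r
yy <- c(-1, 2, -3, 1, -4, 5, 3, 55)

freq <- table(yy)
prop.table(freq)                # relative frequencies, same as table(yy)/length(yy)
pct <- 100 * prop.table(freq)   # the same shares as percentages
round(pct, 1)                   # every value appears once, so each share is 12.5
```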

Categorical variable

var<-c("yes","no")
v2<-c("y","n","n","n") # table it
table(v2)
## v2
## n y 
## 3 1
stud<-c("ana","eralda","daniel","datascience")
length(stud)
## [1] 4
nchar(stud)
## [1]  3  6  6 11
sum(nchar(stud))
## [1] 26
paste(var,stud,sep="")
## [1] "yesana"        "noeralda"      "yesdaniel"     "nodatascience"

Extract

Extracting parts of strings

phrase<-"the student of data science course are fantastic"
q<-character(sum(nchar(phrase)))
for (i in 1:sum(nchar(phrase))) q[i]<- substr(phrase,1,i)
q
##  [1] "t"                                               
##  [2] "th"                                              
##  [3] "the"                                             
##  [4] "the "                                            
##  [5] "the s"                                           
##  [6] "the st"                                          
##  [7] "the stu"                                         
##  [8] "the stud"                                        
##  [9] "the stude"                                       
## [10] "the studen"                                      
## [11] "the student"                                     
## [12] "the student "                                    
## [13] "the student o"                                   
## [14] "the student of"                                  
## [15] "the student of "                                 
## [16] "the student of d"                                
## [17] "the student of da"                               
## [18] "the student of dat"                              
## [19] "the student of data"                             
## [20] "the student of data "                            
## [21] "the student of data s"                           
## [22] "the student of data sc"                          
## [23] "the student of data sci"                         
## [24] "the student of data scie"                        
## [25] "the student of data scien"                       
## [26] "the student of data scienc"                      
## [27] "the student of data science"                     
## [28] "the student of data science "                    
## [29] "the student of data science c"                   
## [30] "the student of data science co"                  
## [31] "the student of data science cou"                 
## [32] "the student of data science cour"                
## [33] "the student of data science cours"               
## [34] "the student of data science course"              
## [35] "the student of data science course "             
## [36] "the student of data science course a"            
## [37] "the student of data science course ar"           
## [38] "the student of data science course are"          
## [39] "the student of data science course are "         
## [40] "the student of data science course are f"        
## [41] "the student of data science course are fa"       
## [42] "the student of data science course are fan"      
## [43] "the student of data science course are fant"     
## [44] "the student of data science course are fanta"    
## [45] "the student of data science course are fantas"   
## [46] "the student of data science course are fantast"  
## [47] "the student of data science course are fantasti" 
## [48] "the student of data science course are fantastic"
strsplit(phrase,split=character(0))
## [[1]]
##  [1] "t" "h" "e" " " "s" "t" "u" "d" "e" "n" "t" " " "o" "f" " " "d" "a" "t" "a"
## [20] " " "s" "c" "i" "e" "n" "c" "e" " " "c" "o" "u" "r" "s" "e" " " "a" "r" "e"
## [39] " " "f" "a" "n" "t" "a" "s" "t" "i" "c"
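
The loop above can also be written without for, because substring() accepts a whole vector of end positions. A sketch under that assumption, followed by a split into words instead of single characters; prefixes and words are names used only in this example.

```r
phrase <- "the student of data science course are fantastic"

# Vectorized alternative to the loop: one prefix per end position
prefixes <- substring(phrase, 1, seq_len(nchar(phrase)))
length(prefixes)
## [1] 48
prefixes[3]
## [1] "the"

# Split on spaces to get words instead of characters
words <- strsplit(phrase, split = " ")[[1]]
nchar(words)
## [1] 3 7 2 4 7 6 3 9
```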

Factors

Qualitative data are often used to classify observations into different levels (modalities). Creating a factor in R is simple with the command factor() or as.factor(). See the following example:

factor(v2)
## [1] y n n n
## Levels: n y
table(factor(v2))
## 
## n y 
## 3 1
is.numeric(x) 
## [1] TRUE
v3<-"student"
class(v3) 
## [1] "character"
is.character(a) 
## [1] FALSE
v4<-"system"
paste(v3,v4)
## [1] "student system"
paste(v3,v4,sep="")
## [1] "studentsystem"
substr(v3,2,5)
## [1] "tude"
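
Returning to factors: by default the levels are sorted alphabetically, which is why n comes before y above. The sketch below shows one way to control the order and the displayed labels; the labels "yes" and "no" and the names f_default and f_custom are choices made only for this illustration.

```r
v2 <- c("y", "n", "n", "n")

# Default: levels sorted alphabetically
f_default <- factor(v2)
levels(f_default)
## [1] "n" "y"

# Explicit level order and display labels
f_custom <- factor(v2, levels = c("y", "n"), labels = c("yes", "no"))
levels(f_custom)
as.integer(f_custom)   # the integer codes stored behind the labels
## [1] 1 2 2 2
table(f_custom)        # counts now follow the chosen level order
```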

Sequence

We can use the function seq() to compute deciles:

decile<-seq(0,1,0.1)
Decile=quantile(yy,decile)
Decile
##   0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100% 
## -4.0 -3.3 -2.2 -0.8  0.6  1.5  2.2  2.9  4.2 20.0 55.0

Repeat

• The rep() function repeats a value or a vector of values several times, for example:

rep("A",10)
##  [1] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A"
rep(x,each=3)
##  [1]  0  0  0  7  7  7 NA NA NA  1  1  1  0  0  0 NA NA NA NA NA NA  6  6  6  3
## [26]  3  3  2  2  2  9  9  9 NA NA NA  4  4  4  5  5  5  3  3  3 NA NA NA
rep(2:5,c(4,1,4,2))# repeat each element with the given frequency
##  [1] 2 2 2 2 3 4 4 4 4 5 5
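
The three most common ways of calling rep() side by side, as a small sketch with a made-up vector:

```r
rep(c(1, 2, 3), times = 2)        # repeat the whole vector
## [1] 1 2 3 1 2 3
rep(c(1, 2, 3), each = 2)         # repeat each element before moving to the next
## [1] 1 1 2 2 3 3
rep(c(1, 2, 3), length.out = 5)   # keep repeating until 5 values are produced
## [1] 1 2 3 1 2
```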

Sort

• The sort() function sorts the values of a numeric vector and drops missing values (NA) by default, for example:

sort(x)
##  [1] 0 0 1 2 3 3 4 5 6 7 9
(sort(x))^(-1) 
##  [1]       Inf       Inf 1.0000000 0.5000000 0.3333333 0.3333333 0.2500000
##  [8] 0.2000000 0.1666667 0.1428571 0.1111111
# let's combine functions!
rev(sort(x))
##  [1] 9 7 6 5 4 3 3 2 1 0 0
rev(sort(x))[1:3]
## [1] 9 7 6
sum(rev(sort(x))[1:3])
## [1] 22
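
sort() has two arguments that are handy here: decreasing, which replaces the rev(sort()) combination, and na.last, which keeps the missing values instead of dropping them. A sketch with an illustrative vector v, not the x used above:

```r
v <- c(0, 7, NA, 1, 0, NA, 6)   # illustrative vector with missing values

sort(v)                      # NA values are dropped by default
## [1] 0 0 1 6 7
sort(v, decreasing = TRUE)   # same result as rev(sort(v))
## [1] 7 6 1 0 0
sort(v, na.last = TRUE)      # keep the NA values and place them at the end
## [1]  0  0  1  6  7 NA NA
```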

Round

The function for rounding numbers in R is round():

round(1.34)
## [1] 1
round(1.345,2)
## [1] 1.34
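
round() sends exact halves to the nearest even number, which can be surprising at first. The sketch below shows this together with the related base R functions floor(), ceiling() and signif():

```r
round(2.5)          # exact halves go to the even neighbour
## [1] 2
round(3.5)
## [1] 4
floor(1.99)         # largest integer not above the value
## [1] 1
ceiling(1.01)       # smallest integer not below the value
## [1] 2
signif(123456, 2)   # keep 2 significant digits
## [1] 120000
```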

Sampling in R

sample(1:50,6) 
## [1] 42 47 36 35  9 25
sample(1:50,6)
## [1] 22 14 50  2 45 27
sample(1:50,6,replace=TRUE)
## [1] 40  8 30 37 25 29
sample(c("H","T"),15,replace=TRUE)# coin toss
##  [1] "T" "T" "H" "H" "T" "T" "H" "H" "T" "T" "T" "T" "T" "H" "T"
sample(c("H","T"),15,replace=TRUE,prob=c(0.7,0.3))
##  [1] "H" "H" "H" "H" "T" "H" "T" "T" "T" "T" "T" "H" "H" "H" "T"
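
Every call to sample() gives a different result, so your output will not match the numbers shown above. To make a random draw reproducible, fix the seed of the random number generator first. In this sketch the seed 123 is an arbitrary choice and the names first_draw and second_draw are used only for the illustration.

```r
set.seed(123)                    # 123 is an arbitrary seed chosen for illustration
first_draw <- sample(1:50, 6)

set.seed(123)                    # resetting the seed replays the same random numbers
second_draw <- sample(1:50, 6)

identical(first_draw, second_draw)
## [1] TRUE
```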

EXERCISE 0

Simulate 50 rolls of a die.

EXERCISE 1

  1. For the following data, calculate: arithmetic mean, median, maximum value, minimum value, range (amplitude), standard deviation (mean square deviation), quantiles. 45, 43, 46, 48, 51, 46, 50, 47, 46, 45, 54, 76, 13, 89, 8

EXERCISE 2

  1. The following values are given: 653, 656, 659, 662, 664, 668, 671, 674, 564, 783, 943, 350, 567, 876. Write these numbers in R. Use the diff() function on the data. What do you get? Try: diff(diff()).

EXERCISE 3

  1. Suppose you work for 10 days and your working hours are as follows: 17 16 20 24 22 15 21 15 17 22
  1. Construct a vector with the data in R.
  2. Find the maximum number of working hours.
  3. Which day was the most tiring, that is, on which day did you work the longest?
  4. On average, how many working hours have you had these 10 days? On which day did you work the least?
  5. The value 24 was incorrect; it should have been 18. How would you fix this? Once you have corrected it, find the new average.
  6. How many times was your working time 20 hours or more? To answer this question try: sum(vector >= 20) (if you called the working hours vector “vector”). What do you get? What percentage of the days did you work less than 17 hours?