This is a cheatsheet to give basic documentation for the R functions encountered in introductory statistics.
The c() function is sometimes called “combine” or “collect” or “concatenate” or just plain “cram together”. It takes a list of items or smaller vectors and produces one larger vector. Here are some examples.
# Create x
x <- c(1,4,5,6,9)
# Display x
x
## [1] 1 4 5 6 9
# Create y
y <- c("a","b","c")
# Display y
y
## [1] "a" "b" "c"
# Create a and b, then combine into ab
# (naming the result ab avoids reusing the name of the c() function)
a <- c(1,2,3)
b <- c(4,5,6)
ab <- c(a,b)
# Display ab
ab
## [1] 1 2 3 4 5 6
Note that if you run a command in R that creates something without assigning it to a name, the result is displayed but is not stored for further use. Here’s an example.
c(12,23,45)
## [1] 12 23 45
But if the creation happens on the right-hand side of an assignment statement, the result is stored and is not automatically displayed.
x <- c(12,23,45)
To display something, just type its name.
x
## [1] 12 23 45
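One related convenience, offered as a small sketch rather than as part of the examples above: wrapping an assignment in parentheses makes R both store the value and display it. The name z is just an illustrative choice.

```r
# Wrapping the assignment in parentheses stores the vector and displays it
(z <- c(12,23,45))
## [1] 12 23 45
# z is still available for further use
z
## [1] 12 23 45
```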
To create a vector of integers starting with Integer1 and ending with Integer2, just place a : between the two. Here are some examples.
x <- 1:100
x
##  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
## [18]  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34
## [35]  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51
## [52]  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68
## [69]  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85
## [86]  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
x <- -10:10
x
##  [1] -10  -9  -8  -7  -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6
## [18]   7   8   9  10
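The : operator also counts downward when the first integer is larger than the second. A minimal sketch; the name down is used only for illustration.

```r
# Count down from 5 to 1
down <- 5:1
down
## [1] 5 4 3 2 1
```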
The rep() function has two arguments, the thing to be repeated and the number of repetitions. Here are some examples.
x <- rep(0,5)
x
## [1] 0 0 0 0 0
x <- rep(c(1,2,3),5)
x
## [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
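As a hedged aside, rep() also accepts named arguments. The sketch below assumes you sometimes want every element repeated in place rather than the whole vector cycled; the each argument does that, while times gives the cycling behavior shown above.

```r
# times repeats the whole vector (same as rep(c(1,2,3),2))
rep(c(1,2,3), times = 2)
## [1] 1 2 3 1 2 3
# each repeats every element before moving on to the next one
rep(c(1,2,3), each = 2)
## [1] 1 1 2 2 3 3
```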
The length() function tells you how many elements there are in the vector.
x <- -23:45
length(x)
## [1] 69
The sum() function adds up all the numbers in the vector.
x <- 1:100
sum(x)
## [1] 5050
The min() and max() functions give the smallest and largest numbers in the vector.
x <- 1:100
min(x)
## [1] 1
max(x)
## [1] 100
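If both extremes are wanted at once, base R's range() returns the minimum and the maximum together. A small sketch, with r as an illustrative name:

```r
x <- 1:100
# range() returns the same two numbers as c(min(x), max(x))
r <- range(x)
r
## [1]   1 100
```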
The mean() function gives the usual arithmetic mean, the sum of values divided by the length of the vector.
x <- 1:100
mean(x)
## [1] 50.5
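The description above can be checked by hand with the functions already introduced. This is only a sketch of the arithmetic, not a statement about how mean() is implemented internally.

```r
x <- 1:100
# The mean by hand: sum of the values divided by the number of values
sum(x) / length(x)
## [1] 50.5
# all.equal() confirms that this agrees with mean()
all.equal(sum(x) / length(x), mean(x))
## [1] TRUE
```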
The median() function gives the middle value: half of the numbers in the vector lie on either side of it.
x <- 1:100
median(x)
## [1] 50.5
The mean and the median are measures of central location. They are numbers which describe where the numbers in the vector are located.
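To illustrate how the two measures can differ, here is a small sketch with made-up numbers (the vector w is hypothetical, not data from any source). One very large value pulls the mean upward but leaves the median alone.

```r
# Made-up data with one unusually large value
w <- c(1,2,3,4,100)
mean(w)
## [1] 22
median(w)
## [1] 3
```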
The quantile() function has two arguments. The first is a vector of numbers. The second is a fraction between 0 and 1. The function returns a number that has that fraction of the numbers in the vector to its left. Expressed as a percentage, this is usually referred to as a percentile; for example, a fraction of .75 gives the 75th percentile.
x <- 1:100
quantile(x,.75)
##   75%
## 75.25
Note that there are many different algorithms for determining quantiles, which give slightly different results. You should ignore these differences. You can type help(quantile) into R to read the details.
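The differences mentioned above can be seen through the type argument of quantile(), which selects among the algorithms described in help(quantile). A hedged sketch: type 7 is the documented default in current versions of R, and types 1 and 6 are picked here only as illustrations. The names q1, q6, and q7 are illustrative choices.

```r
x <- 1:100
# names = FALSE drops the "75%" label so only the number is shown
q7 <- quantile(x, .75, type = 7, names = FALSE)
q7
## [1] 75.25
# Two of the alternative algorithms give nearby but different answers
q1 <- quantile(x, .75, type = 1, names = FALSE)
q6 <- quantile(x, .75, type = 6, names = FALSE)
c(q1, q7, q6)
## [1] 75.00 75.25 75.75
```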
The sd() function gives the standard deviation of the numbers in the vector. It is a measure of the variation in those numbers.
x <- 1:100
sd(x)
## [1] 29.01149
x <- 1:200
sd(x)
## [1] 57.87918
Note that the numbers in the second version of x are more spread out than those in the first version. Comparing the two values of the standard deviation confirms this.
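For readers who want to see what sd() computes, the sketch below reproduces it by hand. It relies on the fact that R's sd() uses the sample formula, dividing by one less than the number of values; the name dev is an illustrative choice.

```r
x <- 1:100
# Deviation of each value from the mean
dev <- x - mean(x)
# Sample standard deviation: divide the sum of squared deviations by n - 1
sqrt(sum(dev^2) / (length(x) - 1))
## [1] 29.01149
```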
The IQR() function gives the interquartile range, which is the difference between the 75th percentile and the 25th percentile of the numbers in the vector. It is an alternative measure of variation.
x <- 1:100
IQR(x)
## [1] 49.5
x <- 1:200
IQR(x)
## [1] 99.5
Note that the IQR values indicate that the numbers in the second version of x are more spread out than those in the first version.
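The definition of the interquartile range can be checked with quantile(). A minimal sketch, assuming the default quantile algorithm; q is an illustrative name.

```r
x <- 1:100
# 25th and 75th percentiles, without the percentage labels
q <- quantile(x, c(.25,.75), names = FALSE)
# 75th percentile minus 25th percentile
q[2] - q[1]
## [1] 49.5
```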
The summary() function produces some basic statistical measures: the minimum, first quartile, median, mean, third quartile, and maximum.
x <- 1:100
summary(x)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1.00   25.75   50.50   50.50   75.25  100.00
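Each entry of the summary can be reproduced with the functions introduced earlier. A rough sketch; by_hand is an illustrative name, and the quartiles assume the default quantile algorithm.

```r
x <- 1:100
# Build the six summary values one at a time
by_hand <- c(min(x),
             quantile(x, .25, names = FALSE),
             median(x),
             mean(x),
             quantile(x, .75, names = FALSE),
             max(x))
by_hand
## [1]   1.00  25.75  50.50  50.50  75.25 100.00
```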