There are a number of functions for generating random variates from the standard probability distributions.
rnorm: generates random Normal variates with a given mean and standard deviation
rpois: generates random Poisson variates with a given rate (the value of lambda)
rbinom: generates random Binomial variates with a given number of trials and success probability
str(rnorm)
## function (n, mean = 0, sd = 1)
rnorm(10) # to generate random Normal variates with mean zero and std one (default)
##  [1] -0.6932141 -0.8622156  1.3121772  1.2695060 -0.4875785 -0.9104652
##  [7] -1.0864099  2.0544296 -0.7017115  0.8121763
rnorm(10,20,2) # to generate random Normal variates with mean 20 and std 2 (explicitly)
##  [1] 21.61330 22.03902 20.76473 19.19569 19.71153 17.42064 23.41415 24.19128
##  [9] 19.34963 19.64094
str(rpois)
## function (n, lambda)
x<-rpois(10,2)
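As a rough sanity check on these parameters, the sample statistics of a large draw should land near the values passed in. The sketch below assumes a sample size of 1e5 and a seed of 42; both are arbitrary choices for illustration and do not come from the notes above.

```r
set.seed(42)                      # arbitrary seed, chosen only for reproducibility
n <- 1e5                          # assumed sample size; larger n gives tighter estimates

z <- rnorm(n, mean = 20, sd = 2)  # Normal variates with mean 20 and sd 2
mean(z)                           # should be close to 20
sd(z)                             # should be close to 2

k <- rpois(n, lambda = 2)         # Poisson variates with rate 2
mean(k)                           # should be close to 2 (the Poisson mean equals lambda)
var(k)                            # should also be close to 2 (so does the variance)
```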
str(rbinom)
## function (n, size, prob)
# we can generate a single random variable that represents the number of heads in 100 flips of an unfair coin that lands heads with probability 0.7
rbinom(1, size = 100, prob = 0.7) # prob of success .7
## [1] 68
# if we want to see all of the 0s and 1s, we can request 100 observations, each of size 1, with success probability of 0.7
rbinom(100,size=1,prob=.7)
##   [1] 0 1 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 0
##  [38] 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 0 0 0 1 1 1 0 0 1 1 0 0 0 1 1 0 1 0 0 1 1 1
##  [75] 0 0 0 1 1 0 0 0 1 1 0 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1
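The two rbinom() calls above describe the same experiment at different levels of detail: summing 100 individual 0/1 flips gives one draw of the number of heads. A minimal sketch of that relationship follows; the seed and the number of repetitions are assumptions made for this example.

```r
set.seed(123)                                      # arbitrary seed for reproducibility

# 100 individual flips (1 = heads), then count the heads
single_flips <- rbinom(100, size = 1, prob = 0.7)
heads <- sum(single_flips)
heads                                              # one value between 0 and 100

# Many repetitions of "number of heads in 100 flips"
counts <- rbinom(10000, size = 100, prob = 0.7)
mean(counts)                                       # should be close to size * prob = 70
```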
set.seed() lets us reproduce the random numbers that we generate. The seed can be any integer.
set.seed(1)
rnorm(10)
##  [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078 -0.8204684
##  [7]  0.4874291  0.7383247  0.5757814 -0.3053884
rnorm(10)
##  [1]  1.51178117  0.38984324 -0.62124058 -2.21469989  1.12493092 -0.04493361
##  [7] -0.01619026  0.94383621  0.82122120  0.59390132
set.seed(1)
rnorm(10)
##  [1] -0.6264538  0.1836433 -0.8356286  1.5952808  0.3295078 -0.8204684
##  [7]  0.4874291  0.7383247  0.5757814 -0.3053884
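The same behaviour can be checked programmatically instead of by comparing printed output. This short sketch uses identical() for the comparison; the variable names are illustrative only.

```r
set.seed(1)
first <- rnorm(5)

set.seed(1)               # resetting the seed restarts the random number stream
second <- rnorm(5)

third <- rnorm(5)         # no reset: the stream simply continues

identical(first, second)  # TRUE: same seed, same draws
identical(first, third)   # FALSE: different position in the stream
```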
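For each probability distribution there are four functions available, distinguished by their prefix:
r for random number generation (as in rnorm, rpois and rbinom above), p for the cumulative distribution function, q for the quantile function, and d for the density.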
str(ppois) # lower.tail logical; if TRUE (default), probabilities are P[X ≤ x] (for example P[X ≤ 2]), otherwise P[X > x]
## function (q, lambda, lower.tail = TRUE, log.p = FALSE)
# to know the probability that a Poisson random variable is less than or equal to 2 with rate 2
ppois(2,2)
## [1] 0.6766764
ppois(4,2)
## [1] 0.947347
ppois(6,2)
## [1] 0.9954662
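The p, d and q functions are tied together, and a small sketch can make the relationships concrete. It uses dpois() and qpois(), the density and quantile counterparts of ppois(); the choice of lambda = 2 simply mirrors the example above.

```r
lambda <- 2

# The CDF at 2 is the sum of the probability mass at 0, 1 and 2
sum(dpois(0:2, lambda))
all.equal(ppois(2, lambda), sum(dpois(0:2, lambda)))                   # TRUE

# Upper tail: P[X > 2] is the complement of P[X <= 2]
ppois(2, lambda, lower.tail = FALSE)
all.equal(ppois(2, lambda, lower.tail = FALSE), 1 - ppois(2, lambda))  # TRUE

# The quantile function inverts the CDF: the smallest x with P[X <= x] >= p
qpois(0.5, lambda)  # 2, since P[X <= 1] is about 0.41 and P[X <= 2] is about 0.68
```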
Simulate random numbers from the linear model y = b0 + b1*x + e. Here b0 = 0.5 and b1 = 2, x follows a standard Normal distribution (mean 0, standard deviation 1), and the random noise e (epsilon) follows a Normal distribution with mean 0 and standard deviation 2.
set.seed(20)
x<-rnorm(100,0,1)
e<-rnorm(100,0,2)
y<-.5+2*x+e
summary(y)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## -6.4084 -1.5402  0.6789  0.6893  2.9303  6.5052
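One way to check a simulation like this is to fit a linear model to the simulated data and see whether the estimates land near b0 and b1. The sketch below uses lm(), which these notes do not otherwise cover, and a larger sample size (n = 10000) than the example so that the estimates are tight. Both choices are assumptions of this illustration.

```r
set.seed(20)
n <- 10000              # assumed sample size, larger than the 100 used above
x <- rnorm(n, 0, 1)     # predictor: standard Normal
e <- rnorm(n, 0, 2)     # noise: Normal with mean 0 and sd 2
y <- 0.5 + 2 * x + e    # true intercept 0.5, true slope 2

fit <- lm(y ~ x)        # ordinary least squares fit
coef(fit)               # intercept should be near 0.5, slope near 2
```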
Random Sampling: sample() draws randomly from a set of objects that we specify.
str(sample) # prob: a vector of probability weights for obtaining the elements of the vector being sampled.
## function (x, size, replace = FALSE, prob = NULL)
sample(1:10,4)
## [1]  5 10  3  6
sample(1:10,4)
## [1] 2 3 5 6
# Suppose we want to simulate 100 flips of an unfair two-sided coin. This particular coin has a 0.3 probability of landing 'tails' and a 0.7 probability of landing 'heads'. Let the value 0 represent tails and the value 1 represent heads. Use sample() to draw a sample of size 100 from the vector c(0,1), with replacement.
flips<-sample(c(0,1),100,replace=TRUE,prob=c(.3,.7))
flips
##   [1] 1 1 0 1 1 1 1 1 1 1 0 0 1 0 1 1 1 1 1 1 0 0 1 1 1 0 0 0 0 1 1 1 0 1 0 1 1
##  [38] 0 1 0 0 1 1 1 1 1 1 1 1 1 0 0 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0
##  [75] 1 1 1 0 1 0 1 1 1 0 0 1 1 0 0 1 0 1 1 1 1 1 0 1 1 1
table(flips) # to find the number of heads and tails
## flips
##  0  1
## 28 72
sum(flips) # to find the number of heads
## [1] 72
sample(1:10) # permutation
##  [1]  9  8  6  4  7  2  5  1  3 10
sample(1:10,replace=TRUE) # sampling with replacement, so values can repeat
##  [1] 8 8 6 7 1 3 1 2 7 2
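The properties of sample() shown above can also be verified in code rather than by eye. In this sketch the seed and the large number of flips (100000) are arbitrary assumptions; the rest reuses the calls from the examples.

```r
set.seed(7)                  # arbitrary seed

perm <- sample(1:10)         # a permutation: every value appears exactly once
all(sort(perm) == 1:10)      # TRUE

draw <- sample(1:10, 4)      # without replacement: no repeated values
anyDuplicated(draw) == 0     # TRUE

# With many flips, the share of heads approaches the weight given in prob
many_flips <- sample(c(0, 1), 100000, replace = TRUE, prob = c(0.3, 0.7))
mean(many_flips)             # should be close to 0.7
```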
To find out how much memory an object occupies, use the object.size() function.
x<-matrix(1:6,2,3,byrow = TRUE)
object.size(x)
## 248 bytes
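For larger objects the byte count is hard to read, so it can help to format the result in bigger units. The sketch below relies on the units argument of format() for object sizes; the vector named big is a hypothetical example object, not part of the notes above.

```r
x <- matrix(1:6, 2, 3, byrow = TRUE)
big <- numeric(1e6)                 # hypothetical larger object: one million doubles

size_small <- object.size(x)
size_big <- object.size(big)

format(size_small, units = "auto")  # picks a suitable unit automatically
format(size_big, units = "Mb")      # size expressed in megabytes
size_big > size_small               # TRUE: the larger vector needs more memory
```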
names() will return a character vector of column (i.e. variable) names.
df<-data.frame(a=1:3,b=c(0,0,0))
df
##   a b
## 1 1 0
## 2 2 0
## 3 3 0
names(df)
## [1] "a" "b"
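names() can also be used on the left-hand side of an assignment to rename columns, and it works on named vectors as well as data frames. A brief sketch follows; the new column names and the vector v are made up for illustration.

```r
df <- data.frame(a = 1:3, b = c(0, 0, 0))

names(df) <- c("id", "value")  # hypothetical new column names
names(df)                      # "id" "value"

v <- c(first = 1, second = 2)  # names() also works on named vectors
names(v)                       # "first" "second"
```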