If you worked with random variables before, in classical probability theory courses that do not require any statistical software, you might have used special tables to calculate probabilities for standard normal variables or binomial ones. Actually, in real research people rarely do that because it is possible to compute anything in R, Python, STATA or using a scientific calculator.

Let’s start from generation of random variables in R. To be more precise, all random values generated by computers are not purely random, they are pseudorandom since algorithms used for generation are deterministic, so numbers obtained are predictable.

Now we will create a sample of 1000 observations taken from a standard normal distribution:

```
# rnorm - random normal
# mean is the expected value (mu)
# sd is the standard deviation (sigma)
X <- rnorm(1000, mean = 0, sd = 1)
head(X)
```

`## [1] -0.83572623 0.01572484 1.17525766 0.70531399 -0.46586740 0.53242011`

As `X`

is a random sample, its values will be different for all people running the code above. Moreover, they will be different if we repeat the operation:

```
X <- rnorm(1000, mean = 0, sd = 1)
head(X)
```

`## [1] -0.49180685 -0.72384646 1.52833831 0.15106414 -0.73146989 0.04735057`

Sometimes it is not important, but if our research includes computer simulations that have to be reproducible, we should make sure that all users will get the same results. To do so, we need the `set.seed()`

function that specifies the starting point of a random algorithm making the output “frozen”:

```
set.seed(1234)
X <- rnorm(1000, mean = 0, sd = 1)
head(X)
```

`## [1] -1.2070657 0.2774292 1.0844412 -2.3456977 0.4291247 0.5060559`

And repeat with the seed set:

```
set.seed(1234)
X <- rnorm(1000, mean = 0, sd = 1)
head(X) # the same!
```

`## [1] -1.2070657 0.2774292 1.0844412 -2.3456977 0.4291247 0.5060559`

Let’s plot a histogram and look at the shape of `X`

distribution:

`hist(X, col="tomato") # bell-shaped, normal`

The `rnorm()`

is one of the functions used to handle normal variables. If we ask R for help, we will see a detailed documentation for all functions:

`help(rnorm)`

The most useful are `pnorm()`

and `qnorm()`

. The former calculates the probabilities like \(P(X<a)\) for \(X \sim N(\mu, \sigma^2)\) and the latter calculates the quantiles of a certain level \(p\), so finds \(x\) such that \(P(X<x)=p\).

Consider the following example. We know that \(X \sim N(\mu = 5, \sigma^2 = 9)\) or \(X \sim N(\mu = 5, \sigma = 3)\) and we are interested in \(P(X < 2)\). Let’s compute it using R:

`pnorm(2, mean = 5, sd = 3) # ok`

`## [1] 0.1586553`

We can find the quantile of the level 0.65, i.e. the value \(x\) of \(X\) such that \(P(X<x) = 0.65\).

`qnorm(0.65, mean = 5, sd = 3)`

`## [1] 6.155961`

Now we will use this function to show that the limits of typical values of a variable calculated for box plots approximately coincide with those obtained by the three-sigma rule. Recall: box plots are informative for variables whose distributions are normal or close to normal (not seriously skewed).

Let us take a standard normal variable (default one in `norm`

-functions in R) and calculate its lower and upper quartiles:

```
q1 <- qnorm(0.25)
q3 <- qnorm(0.75)
cat(q1, q3)
```

`## -0.6744898 0.6744898`

And the interquartile range:

```
iqr <- qnorm(0.75) - qnorm(0.25)
iqr
```

`## [1] 1.34898`

Now compute two endpoints: [\(Q_1-1.5*IQR\); \(Q_3+1.5*IQR\)].

`q1 - 1.5 * iqr`

`## [1] -2.697959`

`q3 + 1.5 * iqr`

`## [1] 2.697959`

These values (-2.7 and 2.7 approximately) are close to -3 and 3 that we get applying the three-sigma rule. Recall: the three-sigma rule states, in particular, that 99% of values of a normal variable lie between [\(\mu-3\sigma\); \(\mu+3\sigma\)], so here we get [\(0-3\cdot 1\); \(0 + 3\cdot 1\)].

To see the same fact in a more interesting way, see the visualization in Wikipedia.