Finding Probabilities in a Random Vector

Here, I use pnorm(), R’s cumulative distribution function (CDF) for the normal distribution, to turn a sample’s mean and standard deviation into probabilities. I want to find the probability of seeing numbers equal to or greater than 20 in a vector of 1,000 random numbers!

Install and load packages

#install.packages("tidyverse")
#library("tidyverse")

Generate vector with a random set of numbers & Set seed for reproducibility

#?rnorm()
set.seed(123)
x <- round(rnorm(1000, mean = 23, sd = 9))

I called the help page for rnorm() to check its syntax and set the seed to 123 so the results can be reproduced. Then I used rnorm() to draw 1,000 random numbers from a normal distribution with a mean of 23 and a standard deviation of 9, and wrapped the call in round() so that every value is a whole number. I know, it’s a fairly large standard deviation! I thought it’d be fun to get a wide spread just for kicks!
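
To show what set.seed() buys me, here is a minimal sketch that draws the vector three times. The names first_draw, second_draw, and third_draw are my own and appear nowhere else in this post.

```r
# Draw the vector twice, resetting the seed before each draw.
set.seed(123)
first_draw <- round(rnorm(1000, mean = 23, sd = 9))

set.seed(123)
second_draw <- round(rnorm(1000, mean = 23, sd = 9))

# Draw again WITHOUT resetting the seed: the generator simply carries on
# from where it stopped, so this vector is expected to differ.
third_draw <- round(rnorm(1000, mean = 23, sd = 9))

identical(first_draw, second_draw)  # same seed, same numbers
identical(first_draw, third_draw)   # no reset, so a different sample
```

Resetting the seed right before the call is what makes the vector repeatable from one run to the next.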

Test object output

##    [1] 18 21 37 24 24 38 27 12 17 19 34 26 27 24 18 39 27  5 29 19 13 21 14 16
##   [25] 17  8 31 24 13 34 27 20 31 31 30 29 28 22 20 20 17 21 12 43 34 13 19 19
##   [49] 30 22 25 23 23 35 21 37  9 28 24 25 26 18 20 14 13 26 27 23 31 41 19  2
##   [73] 32 17 17 32 20 12 25 22 23 26 20 29 21 26 33 27 20 33 32 28 25 17 35 18
##   [97] 43 37 21 14 17 25 21 20 14 23 16  8 20 31 18 28  8 22 28 26 24 17 15 14
##  [121] 24 14 19 21 40 17 25 24 14 22 36 27 23 19  5 33 10 30 40 10 29 21  9  9
##  [145]  9 18 10 29 42 11 30 30 26 14 22 20 28 20 32 20 32 14 12 52 19 26 29 19
##  [169] 28 26 21 24 23 42 16 13 23 26 27 19 13 34 20 15 21 21 33 24 30 19 25 20
##  [193] 24 15 11 41 28 12 17 12 43 35 21 28 19 19 16 18 38 23 24 25 34 18 14 38
##  [217] 19 16 12 11 18 29 33 29 20 24 17 17 31 14 41 22 25 16 18 11 21 27 26 16
##  [241] 16 18 36 13 21 40 22 11 17 27 20 18 20 24 37 22 33 29 22  9 18 19 23 35
##  [265] 44 37 22  7 20 24 31 32 29 10 31 19 25 24 27 23  8 30 26 21 24 24 25 38
##  [289] 21 25 34 32 33 18 41 24 40 11 23 34 17 16 15 14 19 26  5 25 34 41 35 30
##  [313]  7 18 20 29 22 12 38 31 25 34 11 29 18 29 22 29 35 23 32 12 17 37 26  5
##  [337] 11 21 31 22 29 32 38 24 23  7 24 18 14 21 32  5 19 24 15 26 27 23  1 46
##  [361] 21 29 25 32 30 21 26 14 31 19 45  8 19 30 28 18 14 24 23  7 23 25 25 14
##  [385] 27 35 27 13 19 26 17  4 31 16 18 37 16 31 12 20 22 12 17 23 29  8 20 30
##  [409] 18 25 27 25 29 22 19 -1 22 27 28 18 39 26 24 34 17 19 45 23 38 10 21 26
##  [433] 26 14 23 13 29 33  3 34 12 27 29 21 17 24 27 31  5  8 36 32 27 29 31 -1
##  [457] 33 19 25 20 31 20 28 19 13 34 30 39 24 33 41 20 11 21 21 24 38 20 26 21
##  [481] 23 26 35 24 29 30 31 18 38 20 22 36 35 13 15 11 25 24 26 28 18 14 32 30
##  [505]  9 22 15  4 24 22 22 25 31 25 17 16 22 26 14 21 32 22 17 21 33 28 34 24
##  [529] 27 18 28 18 10 24 41 30 33 26 18 21 21 19 29 12 31 31 12 29 45 18 31 16
##  [553] 33 25 38 10 23 18 21 17 15 28 13 36 12 24 28 28 20 24 32 10 16 26 19 35
##  [577] 29 24  9 23 20 22 12 27 14 21 26 16 28 11 -2 27 31 20 28 13 22  6 34 40
##  [601] 33 23 23  9 30 21 17 10 20 15 19 12 38 23 33  0 19 17 12 37 10 26 31 25
##  [625] 15 31 25 13 11 42 17  6 28 26 11  6 22 33 29 19 15 25 24 29 19 31 16 20
##  [649] 30 32  6 24 28 10 27 16 32 28 30 24 31 35 41 23  3 23 25 22 28 32 18 20
##  [673] 27 18 24  5 13 11 15 17 26 32 16 14 14 19 21 27 20  4 22 34 34 16  9 45
##  [697] 22 22 27  8 16  9 17 24 11 28 26 15 25 30 33 21 22 22 36 33 31 20 26 27
##  [721] 14  7 29  9 23 25 28 25 16 32 39 31  6 36 22 28 29 22 22 32 29 32 44 29
##  [745] 25  3 47 19 44 26 37 22 28 25 21 22 32 21  5 21 28 29 29  8 26 32 34 21
##  [769] 20 36  8 19 27  8 26 40 23 20 27 20 30 31 23 12 11 18 30  5  6 17 27 15
##  [793] 31 26 21 30 26 19 25 29 26 17 31 33 25 24 22 42 25 22  0  9 22 25 25 30
##  [817] 21 34 15 12 12 30 22 30 21 18 20  6 12 10 32 18 30 37 21 26  7 16 24 26
##  [841] 16 47 19 24 29 23 17 32 38 23 28 22 32 13 44 18 10 20 24 38 31 24 10 15
##  [865] 22 46  6 33 18 38 13 24 13 31 36 41 30 40 34 22 27 14 21 34 28 27 14 18
##  [889] 16 27 14 28 13 39 27 17 25 20 26 25 14 16 26 38 33 17 30 23 26 25 46 12
##  [913] 24  7 28 33 36  6 27 37 19 21 23 26 29 35 22 25 45 27 25 11 23 21 23 18
##  [937] 17 17 21 35 32 20  8 15 19 24 17 41 21 12 22 14 24 25 26  8 25 26 10 32
##  [961] 30 44 24 23 24 24  6  8 22 18 23  4 10 13 32 13 16 24 20 24 44 13 20 18
##  [985] 37 16 22 30 13 38 29 13 27 21 26 22 33 11 18 21

I called the object to print this massive vector.

Find mean of x

mean(x)
## [1] 23.14

I wanted to double-check that the mean comes out close to the 23 I specified in the rnorm() arguments. As expected, the sample average is about 23 (23.14, to be exact). It won’t match exactly because this is a random sample.
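
To judge whether 23.14 is "close enough" to 23, a quick sketch of the standard error of the mean helps. It assumes the population values passed to rnorm() (mean 23, sd 9) and ignores the small effect of round(). The variable names are my own.

```r
n <- 1000
pop_mean <- 23
pop_sd <- 9

# Standard error: how far the mean of n draws typically lands from pop_mean
se <- pop_sd / sqrt(n)

# Gap between the observed mean (23.14) and the target, in standard errors
z_mean <- (23.14 - pop_mean) / se

se
z_mean
```

The gap works out to roughly half a standard error, which is ordinary sampling variation.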

Find mode, as in most frequent value in vector

table(x) |>  
  sort(decreasing = TRUE) |>  
  head(10)
## x
## 24 21 22 26 25 20 18 23 19 27 
## 54 50 49 47 45 43 40 39 38 37

I can see that the three most frequent numbers (24, 21, and 22) sit close to the mean, with the mode, 24, popping up a total of 54 times in the vector. Surprisingly, 23 is only the 8th most frequent number, a reminder that a random sample won’t match the theoretical curve value for value.

This symbol: “|>” is R’s native pipe operator, available since R 4.1.0. I like it because it needs no extra package, is potentially faster, has fewer characters than %>%, and is shaped like an arrow pointing to the next function, which is a lot easier for me to read.
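
As a sketch of what the pipe is doing, the piped call can be compared with its nested equivalent. It assumes R 4.1.0 or later. sample_mode() is a hypothetical helper I wrote for illustration, not a base R function.

```r
set.seed(123)
x <- round(rnorm(1000, mean = 23, sd = 9))

# Nested call: has to be read from the inside out.
nested <- head(sort(table(x), decreasing = TRUE), 10)

# Piped call: reads top to bottom, one step per line.
piped <- table(x) |>
  sort(decreasing = TRUE) |>
  head(10)

identical(nested, piped)

# Hypothetical helper: return the most frequent value(s) of a numeric vector.
# More than one value comes back when there is a tie.
sample_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[as.vector(counts) == max(counts)])
}

sample_mode(x)
```

Both forms build the same frequency table. The helper pulls the mode out directly instead of reading it off the printed table.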

Find sd of x

sd(x)
## [1] 8.934888

I used sd() to double-check the standard deviation, and it turns out it’s just under 9.

Find median of x

median(x)
## [1] 23

I ran median() to get an even clearer picture of the central tendency of this sample. The median is very close to the mean, which is what I’d expect from a symmetrical, bell-shaped distribution.

Visualize to check for Normality

hist(x)

I already have a fairly strong basis to assume that this sample is normally distributed because it came from rnorm(), which draws from a normal distribution. However, I wanted to get a visualization in front of me just to confirm the shape of the distribution.

Add a Density curve

# Establishing axis grid
axgrid <- seq(min(x), max(x), length.out = 50)  # evenly spaced, no rounding needed
# Recalling dnorm() arguments
#?dnorm
# Calculating density curve
dcurve <- dnorm(axgrid, mean = mean(x), sd = sd(x))
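
For anyone curious what dnorm() computes at each grid point, here is a small sketch that writes the normal density formula out by hand and compares it with dnorm(). manual_dnorm() is a hypothetical name of my own. The grid endpoints and the mean and sd values are illustrative figures taken from the outputs above.

```r
# Normal density written out by hand:
#   f(q) = 1 / (sd * sqrt(2 * pi)) * exp(-(q - mean)^2 / (2 * sd^2))
manual_dnorm <- function(q, mean, sd) {
  1 / (sd * sqrt(2 * pi)) * exp(-(q - mean)^2 / (2 * sd^2))
}

grid <- seq(-2, 52, length.out = 50)

by_hand  <- manual_dnorm(grid, mean = 23.14, sd = 8.934888)
built_in <- dnorm(grid, mean = 23.14, sd = 8.934888)

# TRUE when the two agree up to floating-point tolerance
all.equal(by_hand, built_in)
```

The curve peaks at the mean and its height depends on sd, which is why the ylim in the histogram has to leave room for it.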

Adding Breaks, Density Curve and Aesthetics to Histogram

hist(x, breaks = 50, freq = FALSE, col = "white",
     ylim = c(0, .05),
     main = "Normal Density Curve on Histogram")
# Base graphics calls run one after another; they are not chained with "+"
lines(axgrid, dcurve, col = "darkturquoise", lwd = 2)
abline(v = mean(x), lwd = 3, lty = 2, col = "black")
text(24, .048, "Mean of x", col = "black", adj = c(0, -.1))

The histogram looks clearly normal, so I won’t bother with a Shapiro-Wilk test or a Q-Q plot. I played around with the number of breaks to show nuances and outliers, added annotations, and adjusted the axis limits until I was pleased with the overall aesthetic. Overall, it looks like the rnorm() function has worked its magic with this sample!
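
For completeness, here is a minimal sketch of what those two checks would look like, using only base R (qqnorm(), qqline(), and shapiro.test() from the stats package). It is an optional extra and not part of my original workflow. The object name sw is my own.

```r
set.seed(123)
x <- round(rnorm(1000, mean = 23, sd = 9))

# Q-Q plot: points hugging the reference line suggest a normal shape.
# Because x is rounded, the points form small horizontal steps.
qqnorm(x, main = "Normal Q-Q Plot of x")
qqline(x, col = "darkturquoise", lwd = 2)

# Shapiro-Wilk test: the null hypothesis is that the data are normal.
# shapiro.test() accepts between 3 and 5000 non-missing values.
sw <- shapiro.test(x)
sw$p.value
```

A large p-value means the test found no evidence against normality. Keep in mind that rounding creates ties, which this test can be sensitive to.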

Recall pnorm() syntax

#?pnorm

Finally, I’ll find the probabilities of values less than or greater than 20 using the pnorm() function, plugging in the sample mean and standard deviation from above.

Find the probability of values Equal To or Less Than 20 occurring

pnorm(q=20, mean=23.14, sd=8.934888, lower.tail = TRUE)
## [1] 0.3626324

Under a normal distribution with this mean and standard deviation, there’s about a 36% chance of seeing a number equal to or less than 20!
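
Under the hood, this is the same as converting 20 to a z-score and looking it up on the standard normal curve. A small sketch, using the rounded mean and sd printed above; the variable names are my own:

```r
m <- 23.14
s <- 8.934888

# How many standard deviations 20 sits from the mean (negative = below)
z <- (20 - m) / s

p_direct   <- pnorm(q = 20, mean = m, sd = s)  # the call used above
p_standard <- pnorm(z)                         # standard normal, mean 0, sd 1

z
all.equal(p_direct, p_standard)
```

Here z comes out at roughly -0.35, so 20 is about a third of a standard deviation below the mean.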

Find the probability of values Greater Than 20 occurring

pnorm(q=20, mean=23.14, sd=8.934888, lower.tail = FALSE)
## [1] 0.6373676

That means there’s a whopping 64% chance of seeing a number greater than 20 and roughly a 36% chance of seeing a number equal to or less than 20!

It’s interesting to see that such a large portion of the values falls above 20. Considering that the average of 23 is so close to 20, I expected more of an even split, for instance 45% to 55%, or at most 40% to 60%. But with a standard deviation of about 9, 20 sits roughly a third of a standard deviation below the mean (z ≈ -0.35), and the normal curve puts about 64% of its area above that point.

To see 64% on one side is pretty cool and shows how much even a small gap between the cutoff and the mean shifts the split! Definitely intriguing!
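
pnorm() gives the probability under a smooth normal curve, while x holds rounded whole numbers, so it is worth comparing against the share of values actually observed in the vector. This is a sketch of one way to do that. It uses mean(x) and sd(x) instead of typed-in numbers, and the 20.5 cutoff is a standard continuity correction that I am adding here, not something from the steps above.

```r
set.seed(123)
x <- round(rnorm(1000, mean = 23, sd = 9))

# Observed shares in the sample itself
emp_le_20 <- mean(x <= 20)  # equal to or less than 20
emp_gt_20 <- mean(x > 20)   # strictly greater than 20
emp_ge_20 <- mean(x >= 20)  # includes 20, so never smaller than emp_gt_20

# Model-based probability from the fitted normal curve
model_le_20 <- pnorm(20, mean = mean(x), sd = sd(x))

# Continuity correction: a rounded 20 covers the interval from 19.5 to 20.5
model_le_20_cc <- pnorm(20.5, mean = mean(x), sd = sd(x))

c(empirical = emp_le_20, model = model_le_20, corrected = model_le_20_cc)
```

With whole numbers, "greater than 20" and "equal to or greater than 20" are different questions, which is why emp_gt_20 and emp_ge_20 are kept separate.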

Double-check percentages

Lastly, to double-check my probabilities, I’ll add them up to make sure they sum to 1.

0.6373676 + 0.3626324
## [1] 1

Great! Both probabilities add up to 1, or 100%, which confirms that the two tails are complementary and together cover every possible value.
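
The same check can be written without retyping the printed numbers. A short sketch, using the rounded mean and sd from earlier and variable names of my own:

```r
m <- 23.14
s <- 8.934888

lower <- pnorm(20, mean = m, sd = s, lower.tail = TRUE)   # P(X <= 20)
upper <- pnorm(20, mean = m, sd = s, lower.tail = FALSE)  # P(X > 20)

# all.equal() compares with a small tolerance, which is safer than ==
# for floating-point numbers
all.equal(lower + upper, 1)
all.equal(upper, 1 - lower)
```

lower.tail = FALSE is just a shortcut for 1 minus the lower tail, so the two results always sum to 1 whatever mean and sd are used.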

This has been my short example of working with Probabilities using pnorm().

Thanks for viewing!