Here, I use pnorm(), which evaluates the cumulative distribution function (CDF) of the normal distribution, to work out probabilities. I want to find the probability of seeing a number equal to or greater than 20 in a vector of 1,000 random numbers!
#install.packages("tidyverse")
#library("tidyverse")
#?rnorm()
set.seed(123)
x <- round(rnorm(1000, mean = 23, sd = 9))
I called the help page for rnorm() to check its syntax and set the seed to 123 so the results are reproducible. I then used rnorm() to draw a random vector of 1,000 numbers from a normal distribution with a mean of 23 and a standard deviation of 9, and rounded them to whole numbers with round(). I know, it’s a fairly large standard deviation! I thought it’d be fun to get a wide spread just for kicks!
## [1] 18 21 37 24 24 38 27 12 17 19 34 26 27 24 18 39 27 5 29 19 13 21 14 16
## [25] 17 8 31 24 13 34 27 20 31 31 30 29 28 22 20 20 17 21 12 43 34 13 19 19
## [49] 30 22 25 23 23 35 21 37 9 28 24 25 26 18 20 14 13 26 27 23 31 41 19 2
## [73] 32 17 17 32 20 12 25 22 23 26 20 29 21 26 33 27 20 33 32 28 25 17 35 18
## [97] 43 37 21 14 17 25 21 20 14 23 16 8 20 31 18 28 8 22 28 26 24 17 15 14
## [121] 24 14 19 21 40 17 25 24 14 22 36 27 23 19 5 33 10 30 40 10 29 21 9 9
## [145] 9 18 10 29 42 11 30 30 26 14 22 20 28 20 32 20 32 14 12 52 19 26 29 19
## [169] 28 26 21 24 23 42 16 13 23 26 27 19 13 34 20 15 21 21 33 24 30 19 25 20
## [193] 24 15 11 41 28 12 17 12 43 35 21 28 19 19 16 18 38 23 24 25 34 18 14 38
## [217] 19 16 12 11 18 29 33 29 20 24 17 17 31 14 41 22 25 16 18 11 21 27 26 16
## [241] 16 18 36 13 21 40 22 11 17 27 20 18 20 24 37 22 33 29 22 9 18 19 23 35
## [265] 44 37 22 7 20 24 31 32 29 10 31 19 25 24 27 23 8 30 26 21 24 24 25 38
## [289] 21 25 34 32 33 18 41 24 40 11 23 34 17 16 15 14 19 26 5 25 34 41 35 30
## [313] 7 18 20 29 22 12 38 31 25 34 11 29 18 29 22 29 35 23 32 12 17 37 26 5
## [337] 11 21 31 22 29 32 38 24 23 7 24 18 14 21 32 5 19 24 15 26 27 23 1 46
## [361] 21 29 25 32 30 21 26 14 31 19 45 8 19 30 28 18 14 24 23 7 23 25 25 14
## [385] 27 35 27 13 19 26 17 4 31 16 18 37 16 31 12 20 22 12 17 23 29 8 20 30
## [409] 18 25 27 25 29 22 19 -1 22 27 28 18 39 26 24 34 17 19 45 23 38 10 21 26
## [433] 26 14 23 13 29 33 3 34 12 27 29 21 17 24 27 31 5 8 36 32 27 29 31 -1
## [457] 33 19 25 20 31 20 28 19 13 34 30 39 24 33 41 20 11 21 21 24 38 20 26 21
## [481] 23 26 35 24 29 30 31 18 38 20 22 36 35 13 15 11 25 24 26 28 18 14 32 30
## [505] 9 22 15 4 24 22 22 25 31 25 17 16 22 26 14 21 32 22 17 21 33 28 34 24
## [529] 27 18 28 18 10 24 41 30 33 26 18 21 21 19 29 12 31 31 12 29 45 18 31 16
## [553] 33 25 38 10 23 18 21 17 15 28 13 36 12 24 28 28 20 24 32 10 16 26 19 35
## [577] 29 24 9 23 20 22 12 27 14 21 26 16 28 11 -2 27 31 20 28 13 22 6 34 40
## [601] 33 23 23 9 30 21 17 10 20 15 19 12 38 23 33 0 19 17 12 37 10 26 31 25
## [625] 15 31 25 13 11 42 17 6 28 26 11 6 22 33 29 19 15 25 24 29 19 31 16 20
## [649] 30 32 6 24 28 10 27 16 32 28 30 24 31 35 41 23 3 23 25 22 28 32 18 20
## [673] 27 18 24 5 13 11 15 17 26 32 16 14 14 19 21 27 20 4 22 34 34 16 9 45
## [697] 22 22 27 8 16 9 17 24 11 28 26 15 25 30 33 21 22 22 36 33 31 20 26 27
## [721] 14 7 29 9 23 25 28 25 16 32 39 31 6 36 22 28 29 22 22 32 29 32 44 29
## [745] 25 3 47 19 44 26 37 22 28 25 21 22 32 21 5 21 28 29 29 8 26 32 34 21
## [769] 20 36 8 19 27 8 26 40 23 20 27 20 30 31 23 12 11 18 30 5 6 17 27 15
## [793] 31 26 21 30 26 19 25 29 26 17 31 33 25 24 22 42 25 22 0 9 22 25 25 30
## [817] 21 34 15 12 12 30 22 30 21 18 20 6 12 10 32 18 30 37 21 26 7 16 24 26
## [841] 16 47 19 24 29 23 17 32 38 23 28 22 32 13 44 18 10 20 24 38 31 24 10 15
## [865] 22 46 6 33 18 38 13 24 13 31 36 41 30 40 34 22 27 14 21 34 28 27 14 18
## [889] 16 27 14 28 13 39 27 17 25 20 26 25 14 16 26 38 33 17 30 23 26 25 46 12
## [913] 24 7 28 33 36 6 27 37 19 21 23 26 29 35 22 25 45 27 25 11 23 21 23 18
## [937] 17 17 21 35 32 20 8 15 19 24 17 41 21 12 22 14 24 25 26 8 25 26 10 32
## [961] 30 44 24 23 24 24 6 8 22 18 23 4 10 13 32 13 16 24 20 24 44 13 20 18
## [985] 37 16 22 30 13 38 29 13 27 21 26 22 33 11 18 21
I printed x to see this massive vector.
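As a side note on the set.seed(123) call from earlier, here is a minimal sketch of why it matters: resetting the seed to the same value before calling rnorm() reproduces exactly the same vector. The names x1, x2, and x3 are my own, introduced only for this illustration.

```r
# Same seed, same draws: x1 and x2 are generated identically
set.seed(123)
x1 <- round(rnorm(1000, mean = 23, sd = 9))

set.seed(123)
x2 <- round(rnorm(1000, mean = 23, sd = 9))

# Without resetting the seed, the generator carries on from where it left off
x3 <- round(rnorm(1000, mean = 23, sd = 9))

identical(x1, x2)  # the two seeded vectors match element for element
identical(x1, x3)  # the follow-up draw will almost certainly differ
```

This is why anyone rerunning the code with the same seed should see the same numbers as in the printout above.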
mean(x)
## [1] 23.14
I wanted to double-check that the mean comes out close to the 23 I specified in the rnorm() arguments. As expected, the sample mean is 23.14, which rounds to 23.
table(x) |>
sort(decreasing = TRUE) |>
head(10)
## x
## 24 21 22 26 25 20 18 23 19 27
## 54 50 49 47 45 43 40 39 38 37
I can see that the three most frequent numbers (24, 21, and 22) cluster around the mean, with 24 as the mode, appearing 54 times in the vector. Surprisingly, 23 is only the 8th most frequent number and not in the top 3.
This symbol, “|>”, is R’s native pipe operator (available since R 4.1.0). I like it because it’s potentially faster than the magrittr pipe (%>%), has fewer characters, and is shaped like an arrow pointing to the next function, which is a lot easier for me to read.
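To make the pipe a little more concrete, here is a small sketch showing that the piped version above is just another way of writing nested function calls. The names piped and nested are mine, added only for this comparison, and the vector is regenerated so the snippet runs on its own.

```r
set.seed(123)
x <- round(rnorm(1000, mean = 23, sd = 9))

# Piped: read left to right, top to bottom
piped <- table(x) |>
  sort(decreasing = TRUE) |>
  head(10)

# Nested: the same calls, read from the inside out
nested <- head(sort(table(x), decreasing = TRUE), 10)

# Both forms produce the same frequency table
identical(piped, nested)
```

The pipe passes the result on its left as the first argument of the function on its right, so the choice between the two styles is purely about readability.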
sd(x)
## [1] 8.934888
I used sd() to double-check the standard deviation, and it turns out to be just under 9.
median(x)
## [1] 23
I ran median() to get an even clearer picture of the central tendency of this sample. The median is very close to the mean, which is what I’d expect from a nearly symmetrical, bell-shaped distribution.
hist(x)
I already have a fairly strong basis to assume that this sample is normally distributed because it came from rnorm(), which draws normally distributed random numbers. However, I wanted to get a visualization in front of me just to confirm the shape of the distribution.
# Establishing axis grid
axgrid <- seq(min(x), max(x), length = 50) |>
round()
# Recalling dnorm() arguments
#?dnorm
# Calculating density curve
dcurve <- dnorm(axgrid, mean = mean(x), sd = sd(x))
hist(x, breaks = 50, prob = TRUE, col = "white",
ylim = c(0, .05),
main = "Normal Density Curve on Histogram")
lines(axgrid, dcurve, col = "darkturquoise", lwd = 2)
abline(v = mean(x), lwd = 3, lty = 2, col = "black")
text(24,.048, "Mean of x", col = "black", adj = c(0, -.1))
This looks clearly normally distributed, so I won’t bother with a Shapiro-Wilk test or a Q-Q plot. I played around with the number of breaks to show nuances and outliers, added annotations, and adjusted the axes until I was pleased with the overall aesthetic. Overall, it looks like the rnorm() function has worked its magic with this sample!
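For anyone who does want a more formal check, here is a minimal sketch using two base R tools, shapiro.test() and qqnorm(). The names sw, qq, and qq_cor are my own. Because x is rounded to whole numbers, the many tied values can nudge the Shapiro-Wilk test, so I would treat its p-value as a rough guide rather than a verdict.

```r
set.seed(123)
x <- round(rnorm(1000, mean = 23, sd = 9))

# Shapiro-Wilk test: a small p-value would be evidence against normality
sw <- shapiro.test(x)
sw$p.value

# Q-Q comparison without drawing: theoretical vs. sample quantiles.
# Drop plot.it = FALSE (and then call qqline(x)) to see the usual Q-Q plot.
qq <- qqnorm(x, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)  # values near 1 mean the points hug a straight line
qq_cor
```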
#?pnorm
Finally, I will find the probabilities of numbers greater than or less than 20 using the pnorm() function, plugging in the sample mean and standard deviation calculated above.
pnorm(q=20, mean=23.14, sd=8.934888, lower.tail = TRUE)
## [1] 0.3626324
Under a normal distribution with this mean and standard deviation, there’s about a 36% chance of seeing a number equal to or less than 20!
pnorm(q=20, mean=23.14, sd=8.934888, lower.tail = FALSE)
## [1] 0.6373676
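That means there’s a whopping 64% chance (63.7%, to be exact) of seeing a number equal to or greater than 20, and roughly a 36% chance of seeing a number equal to or less than 20! Because the normal distribution is continuous, the probability of landing on exactly 20 is zero, so counting 20 on both sides doesn’t double-count anything. Keep in mind that these are probabilities under the fitted normal model, not counts taken directly from the vector.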
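Since pnorm() gives a probability under a normal model, a natural companion check is the share of values in the vector itself that are 20 or more. This is a sketch of that comparison; the names p_model and p_empirical are mine. The two numbers shouldn’t match exactly, partly because of sampling noise and partly because x was rounded to whole numbers.

```r
set.seed(123)
x <- round(rnorm(1000, mean = 23, sd = 9))

# Model-based: upper-tail probability from the fitted normal distribution
p_model <- pnorm(20, mean = mean(x), sd = sd(x), lower.tail = FALSE)

# Empirical: proportion of the 1,000 values that are actually >= 20
p_empirical <- mean(x >= 20)

c(model = p_model, empirical = p_empirical)
```

Taking mean() of a logical vector works because TRUE counts as 1 and FALSE as 0, so the result is the fraction of TRUE values.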
It’s interesting to see that such a large portion of the values falls above 20. Since the average of 23 is so close to 20, I expected a more even split of probabilities, for instance 55% to 45%, or even 60% to 40%.
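Seeing nearly 64% on one side is pretty cool! It isn’t really a product of randomness, though: 20 sits about a third of a standard deviation below the mean, and that alone is enough to push the split to roughly 64/36. Definitely intriguing!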
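The size of the split follows from how far 20 sits from the mean in standard-deviation units. As a rough sketch (the names m, s, z, and p_upper are my own), standardizing 20 and looking it up on the standard normal curve gives the same answer as the pnorm() call above:

```r
m <- 23.14      # sample mean from above
s <- 8.934888   # sample standard deviation from above

# z-score: how many standard deviations 20 lies from the mean
z <- (20 - m) / s
z  # about -0.35, i.e. roughly a third of a standard deviation below the mean

# Upper-tail probability on the standard normal (mean 0, sd 1)
p_upper <- pnorm(z, lower.tail = FALSE)

# Same quantity computed directly with the mean and sd supplied
all.equal(p_upper, pnorm(20, mean = m, sd = s, lower.tail = FALSE))
```

A perfectly even 50/50 split would only happen at the mean itself, where the z-score is 0.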
Lastly, to double-check my probabilities, I’ll add them up to make sure they sum to 1.
0.6373676 + 0.3626324
## [1] 1
Great! Both probabilities add up to 1, or 100%, which is exactly what should happen because the lower and upper tails are complements of each other.
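A slightly more robust version of this check, sketched below, keeps the two probabilities in variables instead of retyping the rounded output. The names m, s, p_lower, and p_upper are my own.

```r
m <- 23.14      # sample mean from above
s <- 8.934888   # sample standard deviation from above

p_lower <- pnorm(20, mean = m, sd = s, lower.tail = TRUE)
p_upper <- pnorm(20, mean = m, sd = s, lower.tail = FALSE)

# all.equal() compares with a small tolerance, which is safer than == for decimals
isTRUE(all.equal(p_lower + p_upper, 1))

# The upper tail is simply the complement of the lower tail
isTRUE(all.equal(p_upper, 1 - p_lower))
```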