norm R functions
Example R code is shown in the questions below, where relevant.
pnorm
No answer required.
We could verify this value using:
round(pnorm(1.5, mean = 0, sd = 1), 4)
## [1] 0.9332
No answer required.
pnorm(2, mean = 0, sd = 1)
## [1] 0.9772499
pnorm(-1, mean = 0, sd = 1)
## [1] 0.1586553
pnorm(1, mean = 2, sd = 1)
## [1] 0.1586553
Notice that the result is the same as for b, because we have shifted our curve two units to the right (so it is now centred at 2 rather than 0) but the spread has not changed.
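As an optional check, we could verify this equivalence by standardising: converting \(x = 1\) to a \(z\)-score using \(z = (x - \mu)/\sigma\) gives \(z = (1 - 2)/1 = -1\), so the probability is the same as in b.
z <- (1 - 2) / 1 # standardise x = 1 using mean 2 and standard deviation 1
pnorm(z, mean = 0, sd = 1)
## [1] 0.1586553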
pnorm(1, mean = 0, sd = sqrt(2))
## [1] 0.7602499
pnorm(2, mean = 0, sd = sqrt(3)) - pnorm(1, mean = 0, sd = sqrt(3))
## [1] 0.1577449
pnorm(1, mean = 0, sd = 2) - pnorm(-1, mean = 0, sd = 2)
## [1] 0.3829249
Note here that we could use the alternate approach below, which makes use of the symmetry property.
1 - 2 * pnorm(-1, mean = 0, sd = 2)
## [1] 0.3829249
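These two approaches agree because, by symmetry of the normal curve about its mean, the area below \(-1\) equals the area above \(1\):
pnorm(-1, mean = 0, sd = 2) # lower tail, P(X < -1)
## [1] 0.3085375
1 - pnorm(1, mean = 0, sd = 2) # upper tail, P(X > 1); equal by symmetry
## [1] 0.3085375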
1 - pnorm(2, mean = 0, sd = sqrt(5))
## [1] 0.1855467
Note here that we are using the complement rule.
1 - pnorm(-1, mean = 0, sd = 3)
## [1] 0.6305587
Note here that we are using the complement rule.
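As an optional alternative to the complement rule, pnorm has a lower.tail argument; setting lower.tail = FALSE returns the upper-tail probability directly, e.g.
pnorm(2, mean = 0, sd = sqrt(5), lower.tail = FALSE)
## [1] 0.1855467
pnorm(-1, mean = 0, sd = 3, lower.tail = FALSE)
## [1] 0.6305587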
pnorm(-2, mean = 0, sd = 1) + (1 - pnorm(2, mean = 0, sd = 1))
## [1] 0.04550026
Note here that we could use the alternate approach below, which makes use of the symmetry property.
2 * pnorm(-2, mean = 0, sd = 1)
## [1] 0.04550026
qnorm
No answer required.
round(qnorm(0.9772499, mean = 0, sd = 1), 2)
## [1] 2
qnorm(0.5, mean = 0, sd = 1)
## [1] 0
round(qnorm(0.7733726, mean = 0, sd = 1), 2)
## [1] 0.75
round(qnorm(0.2742531, mean = 1, sd = 1), 2)
## [1] 0.4
round(qnorm(0.7421539, mean = 3, sd = sqrt(2)), 2)
## [1] 3.92
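Since qnorm is the inverse of pnorm, we could check any of these answers by feeding the quantile back into pnorm and confirming that we recover the original probability, e.g.
pnorm(qnorm(0.7421539, mean = 3, sd = sqrt(2)), mean = 3, sd = sqrt(2))
## [1] 0.7421539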
rnorm (if you have time)
set.seed(2)
y <- rnorm(10000, mean = 0, sd = 1)
hist(y, xlab = "x value", main = "Example Plot", col = "skyblue", freq = FALSE)
curve(dnorm(x, mean = 0, sd = 1),
col = "orange", yaxt = "n", lwd = 3, add = TRUE)
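As an optional check, we could also compare the sample mean and standard deviation of the simulated values with the true parameters (0 and 1). With 10000 values they should be close, although the exact numbers depend on the seed used.
mean(y) # should be close to 0
sd(y)   # should be close to 1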
dnorm (if you have time)
xval <- 1
dnorm(xval, mean = 0, sd = 1)
## [1] 0.2419707
To compute the density at \(x=-1\), we can use the code
dnorm(-1, mean = 0, sd = 1)
## [1] 0.2419707
The density at \(x=-1.5\) would be the same as the density at \(x=1.5\), i.e., \(0.1295\).
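We could confirm this symmetry directly:
dnorm(-1.5, mean = 0, sd = 1)
## [1] 0.1295176
dnorm(1.5, mean = 0, sd = 1)
## [1] 0.1295176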
binom R functions
No answer required.
No answer required.
No answer required.
dbinom
The probability of guessing correctly exactly once out of ten guesses is
dbinom(1, 10, 0.25)
## [1] 0.1877117
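This agrees with the Binomial probability formula \(P(X = 1) = \binom{10}{1}(0.25)^1(0.75)^9\), which we could evaluate directly as a check:
choose(10, 1) * 0.25^1 * 0.75^9
## [1] 0.1877117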
We have:
The probability of making zero correct guesses out of ten guesses is
dbinom(0, 10, 0.25)
## [1] 0.05631351
The probability of guessing correctly exactly twice out of ten guesses is
dbinom(2, 10, 0.25)
## [1] 0.2815676
The probability of guessing correctly exactly three times out of ten guesses is
dbinom(3, 10, 0.25)
## [1] 0.2502823
The probability of guessing correctly exactly nine times out of ten guesses is
dbinom(9, 10, 0.25)
## [1] 2.861023e-05
The probability of guessing correctly exactly ten times out of ten guesses is
dbinom(10, 10, 0.25)
## [1] 9.536743e-07
We notice that (as expected) the probability associated with a high number of successes is lower.
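We could also compute all eleven probabilities in one call by passing the vector 0:10 to dbinom; the values drop off rapidly beyond about five correct guesses.
dbinom(0:10, size = 10, prob = 0.25) # probabilities of 0, 1, ..., 10 correct guesses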
pbinom
We have:
pbinom(6, 10, 0.25)
## [1] 0.9964943
pbinom(3, 10, 0.25)
## [1] 0.7758751
1 - pbinom(3, 10, 0.25)
## [1] 0.2241249
1 - pbinom(8, 10, 0.25)
## [1] 2.95639e-05
pbinom(8, 10, 0.25) - pbinom(6, 10, 0.25)
## [1] 0.003476143
Note that for e, since the Binomial distribution is a discrete distribution, we could also have used dbinom here, i.e. dbinom(7, 10, 0.25) + dbinom(8, 10, 0.25).
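As a check, this gives the same value:
dbinom(7, 10, 0.25) + dbinom(8, 10, 0.25)
## [1] 0.003476143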
pbinom(6, 10, 0.25)
## [1] 0.9964943
pbinom(5, 10, 0.25)
## [1] 0.9802723
You could use either
1 - pbinom(8, 10, 0.25)
## [1] 2.95639e-05
or
dbinom(9, 10, 0.25) + dbinom(10, 10, 0.25)
## [1] 2.95639e-05
Again, you could use either
1 - pbinom(7, 10, 0.25)
## [1] 0.000415802
or
dbinom(8, 10, 0.25) + dbinom(9, 10, 0.25) + dbinom(10, 10, 0.25)
## [1] 0.000415802
You could use either
pbinom(7, 10, 0.25) - pbinom(4, 10, 0.25)
## [1] 0.07771111
or
dbinom(5, 10, 0.25) + dbinom(6, 10, 0.25) + dbinom(7, 10, 0.25)
## [1] 0.07771111
It is highly unlikely (but not impossible) that they are telling the truth. Using our results from 2.5, the probability of making more than 6 correct guesses out of 10 is 0.0035057, which is extremely small. This probability is almost equal to the probability of making 7 or 8 correct guesses, since 9 or 10 correct guesses are even less likely. To achieve this twice is highly unlikely. In conclusion, the student is probably lying.
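For reference, this probability can be computed directly using the complement rule:
1 - pbinom(6, 10, 0.25)
## [1] 0.003505707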
rbinom (if you have time)
To generate data, we use the rbinom R function, with n = 20, size = 10 and prob = 0.25, i.e.:
y <- rbinom(n = 20, size = 10, prob = 0.25)
y # look at results
## [1] 4 4 1 2 5 4 0 2 4 0 2 6 2 1 1 3 3 2 2 2
The 20 numbers represent the numbers of correct guesses made by each student.
Note that for this question, these answers may not match yours, since we are dealing with randomly generated values. The process, however, remains the same.
Note that these summary results will not make sense if you have mixed up the n and size arguments.
summary(y)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 1.75 2.00 2.50 4.00 6.00
Here we can see that the average number of correct guesses was 2.5, and the maximum number of correct guesses was 6. These values aren’t too surprising.
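These values are consistent with the theoretical mean of a Binomial distribution with size = 10 and prob = 0.25, which is size × prob:
10 * 0.25 # theoretical mean number of correct guesses per student
## [1] 2.5
Here the simulated mean happens to equal 2.5 exactly; in general it will only be close to this value.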
Example R code for the analysis of the Portuguese student data (UCI Machine Learning Repository 2014) is shown in the questions below, where relevant.
No answer required.
No answer required.
data <- read.csv("student-mat.csv", sep = ";")
No answer required.
results <- data[, c("absences", "G1", "G2", "G3")]
results$G1[results$G1 == 0] <- NA # Set any 0s in G1 to NA
results$G2[results$G2 == 0] <- NA # Set any 0s in G2 to NA
results$G3[results$G3 == 0] <- NA # Set any 0s in G3 to NA
results <- na.omit(results) # Remove 0s (now NAs) from the data
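As an optional sanity check (not part of the question), we could confirm that no zero scores remain and see how many students are left after cleaning:
sum(results$G1 == 0) # should be 0 after the cleaning steps above
nrow(results)        # number of students remaining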
hist(x = results$G1, main = "First Period Results", xlab = "Score")
hist(x = results$G1, freq = FALSE,
main = "First Period Results", xlab = "Score", ylim = c(0, 0.14))
curve(dnorm(x, mean = mean(results$G1), sd = sd(results$G1)),
col="green", yaxt="n", lwd=2, add=TRUE)
par(mfrow = c(2,2), cex = 0.8, mex = 0.8)
hist(x = results$G1, freq = FALSE,
main = "First Period Results", xlab = "Score", ylim = c(0, 0.14))
curve(dnorm(x, mean = mean(results$G1), sd = sd(results$G1)),
col="green", yaxt="n", lwd=2, add=TRUE)
hist(x = results$G2, freq = FALSE,
main = "Second Period Results", xlab = "Score", ylim = c(0, 0.14))
curve(dnorm(x, mean = mean(results$G2), sd = sd(results$G2)),
col="green", yaxt="n", lwd=2, add=TRUE)
hist(x = results$G3, freq = FALSE,
main = "Final Grade", xlab = "Score", ylim = c(0, 0.14))
curve(dnorm(x, mean = mean(results$G3), sd = sd(results$G3)),
col="green", yaxt="n", lwd=2, add=TRUE)
hist(x = results$absences, freq = FALSE,
main = "Absences", xlab = "Number", ylim = c(0, 0.14))
curve(dnorm(x, mean = mean(results$absences), sd = sd(results$absences)),
col="green", yaxt="n", lwd=2, add=TRUE)
This is somewhat open to interpretation. One could say that the First Period Results and the Final Grade histograms look roughly normally distributed, as the bins appear to roughly follow the normal distribution bell curve. However, one could also observe that these two histograms are both slightly positively skewed (and hence do not look normally distributed). The histogram closest to being normally distributed is the first one (the Final Grade histogram has a heavier than expected left tail). The absences histogram is clearly not normally distributed, and the Second Period Results histogram is probably not normally distributed, although if you changed the number of bins, this observation could also change.
These notes have been prepared by Rupert Kuveke and Amanda Shaker. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematics and Statistics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND) License.