The ACT is a standardized college admissions test used in the United States. The four multi-part questions in this assessment all involve simulating some ACT test scores and answering probability questions about them.
For the three-year period 2016-2018, ACT standardized test scores were approximately normally distributed with a mean of 20.9 and a standard deviation of 5.7. (Real ACT scores are integers between 1 and 36, but we will ignore this detail and use continuous values instead.)
First we’ll simulate an ACT test score dataset and answer some questions about it.
Set the seed to 16, then use rnorm() to generate 10000 normally distributed test scores with a mean of 20.9 and a standard deviation of 5.7. Save these values as act_scores. You’ll be using this dataset throughout these four multi-part questions.
(IMPORTANT NOTE! If you use R 3.6 or later, you will need to use the command format set.seed(x, sample.kind = "Rounding") instead of set.seed(x). Your R version is printed at the top of the Console window when you start RStudio.)
set.seed(16, sample.kind = "Rounding")
## Warning in set.seed(16, sample.kind = "Rounding"): non-uniform 'Rounding'
## sampler used
act_scores <- rnorm(10000, 20.9, 5.7)
mean(act_scores)
## [1] 20.84012
sd(act_scores)
## [1] 5.675237
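# Number of simulated scores at or above 36
sum(act_scores >= 36)
## [1] 41
# Proportion of scores above 30
mean(act_scores > 30)
## [1] 0.0527
# Proportion of scores at or below 10
mean(act_scores <= 10)
## [1] 0.0282
# Normal density evaluated at each integer score from 1 to 36
x <- seq(1, 36)
f_x <- dnorm(x, mean = 20.9, sd = 5.7)
plot(x, f_x)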
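As a sanity check, the simulated proportions above can be compared with the exact tail probabilities of the normal model. The sketch below assumes the same parameters (mean 20.9, standard deviation 5.7) and recreates act_scores so that it runs on its own. The names sim_above_30, theo_above_30, sim_at_most_10 and theo_at_most_10 are illustrative and not part of the original exercise.

```r
# Recreate the simulated scores so this block is self-contained.
# suppressWarnings() hides the expected 'Rounding' sampler warning (R >= 3.6).
suppressWarnings(set.seed(16, sample.kind = "Rounding"))
act_scores <- rnorm(10000, 20.9, 5.7)

# Simulated proportions (Monte Carlo estimates)
sim_above_30 <- mean(act_scores > 30)
sim_at_most_10 <- mean(act_scores <= 10)

# Theoretical probabilities under the normal model
theo_above_30 <- 1 - pnorm(30, mean = 20.9, sd = 5.7)
theo_at_most_10 <- pnorm(10, mean = 20.9, sd = 5.7)

# Side-by-side comparison
c(simulated = sim_above_30, theoretical = theo_above_30)
c(simulated = sim_at_most_10, theoretical = theo_at_most_10)
```

With 10000 draws, each simulated proportion should differ from its theoretical value by only a few tenths of a percentage point.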
In this 3-part question, you will convert raw ACT scores to Z-scores and answer some questions about them.
Convert act_scores to Z-scores. Recall from Data Visualization (the second course in this series) that to standardize values (convert values into Z-scores, that is, values distributed with a mean of 0 and standard deviation of 1), you must subtract the mean and then divide by the standard deviation. Use the mean and standard deviation of act_scores, not the original values used to generate random test scores.
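# Standardize using the sample mean and sample standard deviation
zscores <- (act_scores - mean(act_scores)) / sd(act_scores)
# Proportion of scores more than 2 standard deviations above the mean
mean(zscores > 2)
## [1] 0.0233
# Raw ACT score corresponding to a Z-score of 2
2 * sd(act_scores) + mean(act_scores)
## [1] 32.1906
# 97.5th percentile of a normal distribution with the sample mean and sd
qnorm(.975, mean(act_scores), sd(act_scores))
## [1] 31.96338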
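One way to check the standardization is to confirm that the Z-scores have mean 0 and standard deviation 1, and to compare the manual formula with base R's scale(). This is a minimal sketch that recreates act_scores so it is self-contained. The name zscores_scaled is introduced only for illustration.

```r
# Recreate the simulated scores so this block is self-contained.
suppressWarnings(set.seed(16, sample.kind = "Rounding"))
act_scores <- rnorm(10000, 20.9, 5.7)

# Manual standardization: subtract the sample mean, divide by the sample sd
zscores <- (act_scores - mean(act_scores)) / sd(act_scores)

# By construction, the mean is 0 (up to floating-point error) and the sd is 1
mean(zscores)
sd(zscores)

# scale() centers and scales in one step; it returns a one-column matrix,
# so as.numeric() is used to get a plain vector back
zscores_scaled <- as.numeric(scale(act_scores))
all.equal(zscores, zscores_scaled)

# Theoretical proportion of a standard normal more than 2 sd above the mean
1 - pnorm(2)
```

The last line gives the exact standard normal tail probability (about 0.023), which can be compared with the simulated value from mean(zscores > 2).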
In this 4-part question, you will write a function to create a CDF for ACT scores. Write a function that takes a value and produces the probability of an ACT score less than or equal to that value (the CDF). Apply this function to the range 1 to 36.
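# Empirical CDF evaluated at each integer score from 1 to 36
cdf <- sapply(1:36, function(x) {
  mean(act_scores <= x)
})
# Smallest integer score with an empirical CDF of at least 0.95
min(which(cdf >= .95))
## [1] 31
# 95th percentile of the theoretical distribution
qnorm(.95, 20.9, 5.7)
## [1] 30.27567
# Probability of a score at or below 26 under a normal model with the sample mean and sd
pnorm(26, mean(act_scores), sd(act_scores))
## [1] 0.8183755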
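The hand-written CDF above can also be cross-checked against base R's ecdf(), which builds the empirical CDF as a function. The sketch below assumes the same simulated data and recreates it so the block runs on its own. The names act_ecdf and cdf_builtin are illustrative.

```r
# Recreate the simulated scores so this block is self-contained.
suppressWarnings(set.seed(16, sample.kind = "Rounding"))
act_scores <- rnorm(10000, 20.9, 5.7)

# Hand-written empirical CDF at each integer score from 1 to 36
cdf <- sapply(1:36, function(x) mean(act_scores <= x))

# ecdf() returns a function giving the proportion of observations <= x
act_ecdf <- ecdf(act_scores)
cdf_builtin <- act_ecdf(1:36)

# Both approaches should give the same values
all.equal(cdf, cdf_builtin)

# Smallest integer score whose empirical CDF is at least 0.95.
# which() returns positions, which coincide with scores because the grid is 1:36.
min(which(cdf_builtin >= 0.95))
```

Because the grid starts at 1 and steps by 1, the index returned by which() is the score itself. With any other grid, the index would need to be mapped back to the grid value.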
library(tidyverse)
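# Compare sample quantiles with theoretical normal quantiles (QQ plot);
# ggplot2 is already attached by library(tidyverse)
p <- seq(0.01, 0.99, 0.01)
sample_quantiles <- quantile(act_scores, p)
theoretical_quantiles <- qnorm(p, 20.9, 5.7)
qplot(theoretical_quantiles, sample_quantiles) + geom_abline()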
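qplot() has been marked as deprecated in recent ggplot2 releases, so the same QQ plot can also be drawn with base R graphics if that is a concern. The sketch below is one possible alternative, not the method used in the exercise. It recreates act_scores so it is self-contained, and the axis labels are illustrative.

```r
# Recreate the simulated scores so this block is self-contained.
suppressWarnings(set.seed(16, sample.kind = "Rounding"))
act_scores <- rnorm(10000, 20.9, 5.7)

# Quantiles at the 1st through 99th percentiles
p <- seq(0.01, 0.99, 0.01)
sample_quantiles <- quantile(act_scores, p)
theoretical_quantiles <- qnorm(p, 20.9, 5.7)

# Base R QQ plot: points near the identity line suggest the sample
# is well approximated by the theoretical normal distribution
plot(theoretical_quantiles, sample_quantiles,
     xlab = "Theoretical quantiles", ylab = "Sample quantiles")
abline(a = 0, b = 1)  # identity line (intercept 0, slope 1)
```

The abline(a = 0, b = 1) call plays the same role as geom_abline() with its default intercept of 0 and slope of 1.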