# 70 students as of Feb 25
windows11 <- c("aflores-hernandez","akwong25","bdelacruz-angeles",
"bhernandezarteaga","cbettencourt2","cchen271","cornelas3",
"craigelijaesoriano","davila-castaneda","dvargas38",
"ecastillo-quevedo","efernando","ekjotjohal","elliottwhitney",
"fromerobojorquez","genaxiong","ggonzalez-ramirez",
"ghendrickson","gurindersahota","jasminesamayoa",
"jcontrerastrinidad","jessiemorales","jlegaspina","joneal2",
"jwong290","kchen129","leogarciaortiz","lillieyang",
"lindaespinozamunoz","lorenackerman","mdesilva","rbujji",
"roderickma","skodur","sraman7","tolaniyan","trevoroh")
macOS <- c("adimasagurto","ahmiyasalter","alannahtanner","aleroux",
"alizeibarra","apatterson9","asingh368",
"eflores136","elmermartinez","emendozagonzalez",
"emoya8","isidrohernandez","jaisingh","jangel15","jardindo",
"jessecaclark","jmandujano4","jperez460","kamryntaylor",
"kchen132","kvu56","lalagos","malachifuqua","manroopkaur",
"mayraarias","msuccari","omarkhalil","rbeattie","seanjimenez",
"vchezhiyan","xcortes2")
language_table_students <- c("edmondcheng", "angeliachunyu")
teamsize <- 7
# Set seed for reproducibility
set.seed(20260227)
shuffled_win11 <- sample(windows11, replace = FALSE)
tables <- c(rep(1:3,each=teamsize),rep(4:5,each=(teamsize+1)))
teams_win11 <- split(shuffled_win11, tables)
shuffled_macOS <- sample(macOS, replace = FALSE)
tables <- c(rep(6,(teamsize-1)),rep(7,(teamsize-2)),rep(8:9,each=teamsize),rep(10,(teamsize-1)))
teams_macOS <- split(shuffled_macOS, tables)
teams_macOS$`7` <- c(teams_macOS$`7`,language_table_students)
teams <- lapply(c(teams_win11,teams_macOS),sort)
wed_lab_mac <- c("adimasagurto","alannahtanner","aleroux","alizeibarra","asingh368","edmondcheng","eflores136","elmermartinez","jardindo","jmandujano4")
wed_lab_win <- c("bhernandezarteaga","cbettencourt2","craigelijaesoriano","efernando","ekjotjohal","ggonzalez-ramirez","jasminesamayoa","joneal2","jwong290","mdesilva")
invisible(lapply(seq_along(teams), function(i) {
cat("Team at Table", i, ":", teams[[i]], "\n")
}))
## Team at Table 1 : ecastillo-quevedo gurindersahota jwong290 leogarciaortiz lillieyang roderickma tolaniyan
## Team at Table 2 : bhernandezarteaga cchen271 davila-castaneda fromerobojorquez jasminesamayoa joneal2 rbujji
## Team at Table 3 : craigelijaesoriano ghendrickson jessiemorales jlegaspina kchen129 lindaespinozamunoz skodur
## Team at Table 4 : cornelas3 dvargas38 genaxiong ggonzalez-ramirez jcontrerastrinidad lorenackerman mdesilva sraman7
## Team at Table 5 : aflores-hernandez akwong25 bdelacruz-angeles cbettencourt2 efernando ekjotjohal elliottwhitney trevoroh
## Team at Table 6 : aleroux elmermartinez jangel15 jperez460 msuccari rbeattie
## Team at Table 7 : angeliachunyu apatterson9 edmondcheng isidrohernandez kamryntaylor malachifuqua vchezhiyan
## Team at Table 8 : asingh368 emoya8 jmandujano4 lalagos manroopkaur seanjimenez xcortes2
## Team at Table 9 : ahmiyasalter alizeibarra emendozagonzalez jardindo jessecaclark kchen132 mayraarias
## Team at Table 10 : adimasagurto alannahtanner eflores136 jaisingh kvu56 omarkhalil
DSC 011 S26 Lecture 15 Demo
Deviations, Degrees of Freedom and Binomial Computations
Preliminaries: Assignment of Teams and Team Tables
The class will split into OS-specific, randomly assigned teams of about seven. The teams and team tables are determined by random sampling without replacement (that is, a random permutation), using the team-assignment code at the top of this notebook.
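The team-assignment code in this notebook shuffles each roster with sample() and then cuts the shuffled vector into teams with split(). The same two steps can be sketched on a small roster; the names, seed, and table sizes below are hypothetical placeholders, not the class data.

```r
# Hypothetical roster of 10 placeholder names (not the real class list)
roster <- paste0("student", 1:10)
set.seed(42)  # arbitrary seed; any fixed value makes the shuffle reproducible
# sample() without replacement returns a random permutation of the roster
shuffled <- sample(roster, replace = FALSE)
# One table label per student: tables 1 and 2 seat 3 students, table 3 seats 4
table_labels <- c(rep(1:2, each = 3), rep(3, 4))
stopifnot(length(table_labels) == length(shuffled))  # sizes must add up
# split() groups the shuffled names by table label; then sort within each team
toy_teams <- lapply(split(shuffled, table_labels), sort)
sapply(toy_teams, length)  # team sizes 3, 3, 4
```

split() does not require the label vector to match the data in length (it recycles the labels), so it is worth checking that the table sizes add up to the number of students, as they do in the class code: 3 × 7 + 2 × 8 = 37 Windows 11 students and 6 + 5 + 7 + 7 + 6 = 31 macOS students.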
Instructions for Completing and Submitting This Assignment
- Download and open today’s template notebook in RStudio
- Personalize the file by writing your name in the YAML header (replace “FirstName LastName”) — be sure to do this or you will lose points!
- Save with your name in RStudio and move to your course directory: in RStudio select File → Save as..., find your course directory, and move and rename the file to include your name (e.g., FirstName_LastName_Quantiles_Demo.qmd).
- Render to HTML.
- Follow instructions from the HTML rendered output by editing your personalized notebook.
- As you work the assignment, keep rendering and editing the file, asking for help from your team until you get all CORRECT for each problem. Two or more students may ask for help from the instructors.
- Render to HTML and submit to Catcourses. Turn in as much CORRECT work as you can by the end of class today. Submission by end of class qualifies you for credit.
- Resubmit your best work by midnight tonight for a better grade or fully accepted work; only your latest and best work gets graded.
Assignment
Computing Deviations
Demonstration 1: Computing Deviations for the Nile Data
Given a sample vector of values \(\vec{x} = \langle x_1, x_2, \dots , x_n \rangle\) with mean \(\bar{x} \equiv 1/n \sum_{i=1}^n x_i\), a sample vector of deviations \(\vec{y}\) may be computed as a simple transformation of the sample values called a shift or translation, by subtracting the mean \(\bar{x}\) from every value of the sample:
\[\vec{y} \equiv \vec{x} - \bar{x} = \langle (x_1 - \bar{x}), (x_2 - \bar{x}), \dots, (x_n - \bar{x}) \rangle \equiv \langle y_1, y_2, \dots, y_n \rangle \]
The mean of a sample of deviations is always zero:
\[ \bar{y} = \frac{1}{n} \sum_{i = 1}^n y_i = \frac{1}{n} \sum_{i = 1}^n (x_i - \bar{x}) = \frac{1}{n} (\sum_{i = 1}^n x_i) - \frac{1}{n} (\sum_{i = 1}^n \bar{x}) = \bar{x} - \frac{1}{n}(n\bar{x}) = \bar{x} - \bar{x} = 0.\]
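A tiny worked example of this derivation, using a made-up sample of three values chosen so that every step is exact in floating point. The comparison with base R's scale() is an extra illustration: with scale = FALSE it performs the same shift by the mean.

```r
# Made-up sample chosen so the arithmetic is exact in floating point
toy_x <- c(2, 4, 9)
toy_mean <- mean(toy_x)        # (2 + 4 + 9) / 3 = 5
toy_dev <- toy_x - toy_mean    # the shift: -3, -1, 4
sum(toy_dev)                   # 0
mean(toy_dev)                  # 0
# scale() with scale = FALSE does the same centering but returns a
# one-column matrix with attributes, so convert back to a plain vector
toy_centered <- as.vector(scale(toy_x, center = TRUE, scale = FALSE))
all.equal(toy_centered, toy_dev)  # TRUE
```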
Let’s demonstrate this in practice for the Nile data. Down in the R Console, do these steps:
- apply the mean() function to the Nile data and use the assignment operator to save the return value to mean_nile.
- subtract mean_nile from the Nile data and use the assignment operator to save the return value to nile_deviations.
- apply the hist() function to nile_deviations to plot the deviations.
- after getting this to work in the R Console, at the top of the code chunk below, add the code to plot the histogram of deviations so that the plot is added into the rendered notebook, and use assignment to set answer to the value of nile_deviations to check your work. Before you can set answer, you need to also copy the definitions of mean_nile and nile_deviations from your Console into the code chunk, so that they are defined successively anew when you render.
mean_nile <- mean(Nile)
nile_deviations <- Nile - mean_nile
hist(nile_deviations)
answer <- nile_deviations
print_and_check(answer, "8653f650cbf829a7e385d8a9bde83cb6d32128123536546be9a7ec98166c3b41")
## Time Series:
## Start = 1871
## End = 1970
## Frequency = 1
## [1] 200.649999999999977 240.649999999999977 43.649999999999977
## [4] 290.649999999999977 240.649999999999977 240.649999999999977
## [7] -106.350000000000023 310.649999999999977 450.649999999999977
## [10] 220.649999999999977 75.649999999999977 15.649999999999977
## [13] 190.649999999999977 74.649999999999977 100.649999999999977
## [16] 40.649999999999977 260.649999999999977 -120.350000000000023
## [19] 38.649999999999977 220.649999999999977 180.649999999999977
## [22] 290.649999999999977 230.649999999999977 330.649999999999977
## [25] 340.649999999999977 300.649999999999977 110.649999999999977
## [28] 180.649999999999977 -145.350000000000023 -79.350000000000023
## [31] -45.350000000000023 -225.350000000000023 20.649999999999977
## [34] -86.350000000000023 -218.350000000000023 -3.350000000000023
## [37] -227.350000000000023 100.649999999999977 130.649999999999977
## [40] 49.649999999999977 -88.350000000000023 -193.350000000000023
## [43] -463.350000000000023 -95.350000000000023 -217.350000000000023
## [46] 200.649999999999977 180.649999999999977 -87.350000000000023
## [49] -155.350000000000023 -98.350000000000023 -151.350000000000023
## [52] -74.350000000000023 -55.350000000000023 -57.350000000000023
## [55] -221.350000000000023 -74.350000000000023 -175.350000000000023
## [58] -123.350000000000023 120.649999999999977 -160.350000000000023
## [61] -138.350000000000023 -54.350000000000023 -74.350000000000023
## [64] 24.649999999999977 64.649999999999977 -22.350000000000023
## [67] -97.350000000000023 90.649999999999977 -148.350000000000023
## [70] -243.350000000000023 -270.350000000000023 -73.350000000000023
## [73] -107.350000000000023 -177.350000000000023 -118.350000000000023
## [76] 120.649999999999977 -59.350000000000023 -45.350000000000023
## [79] -71.350000000000023 -29.350000000000023 -175.350000000000023
## [82] -170.350000000000023 -81.350000000000023 130.649999999999977
## [85] -1.350000000000023 66.649999999999977 -122.350000000000023
## [88] 3.649999999999977 55.649999999999977 -104.350000000000023
## [91] 100.649999999999977 -13.350000000000023 -18.350000000000023
## [94] 250.649999999999977 -7.350000000000023 -173.350000000000023
## [97] -0.350000000000023 -201.350000000000023 -205.350000000000023
## [100] -179.350000000000023
## [VALUE] CORRECT
Now call the mean() function on the nile_deviations R object to see its mean, in the following code chunk:
mean(nile_deviations)
## [1] -0.0000000000000227803886865274
It is very likely that your answer is a very small number that is close to zero. For example, on my system, the value returned is -3.126388e-14. The exact number may differ between types of computers or operating system versions, and may depend on which processor your machine has. We proved above that the mean of deviations is always zero, so why is this number not zero? The reason is that when doing math on computers, real numbers are only represented approximately with a finite number of digits, which leads to the accumulation of rounding errors. This issue of machine precision is important to take into account when doing statistical computing, and the general solution to problems of machine precision is to round the final result of calculations to a recommended number of digits. In this class, you will do that in R using the round() function, which rounds to a given number of decimal places (zero by default).
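The same rounding behaviour can be reproduced without the Nile data. This small illustration uses standard decimal constants; all.equal() compares with a small tolerance, and round() removes the trailing error before an exact comparison.

```r
# 0.1, 0.2 and 0.3 have no exact binary (double-precision) representation
s <- 0.1 + 0.2
s == 0.3                    # FALSE: the stored values differ in the last bits
s - 0.3                     # tiny but nonzero, about 5.55e-17
# Two ways to get the mathematically expected answer:
isTRUE(all.equal(s, 0.3))   # TRUE: compare with a small tolerance
round(s, 10) == 0.3         # TRUE: round first, then compare
# Relative spacing of doubles near 1, a yardstick for "machine precision"
.Machine$double.eps         # 2.220446e-16
```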
Wrap the round() function around your call of the mean() function on the nile_deviations R object to see its mean rounded, in the following code chunk:
round(mean(nile_deviations))
## [1] 0
To make a nicer histogram of the deviations of the Nile dataset, we can label its mean. Copy your expression for the rounded mean of nile_deviations above to replace NULL in the first assignment statement for nile_deviations_mean in the code chunk below.
nile_deviations_mean <- round(mean(nile_deviations))
hist(nile_deviations,main="Histogram of Deviations of Nile")
abline(v=nile_deviations_mean,col="blue",lty=2)
text(100,25,paste0("Mean: ",nile_deviations_mean),col="blue")
Demonstration 2: Find the Missing Value: Introduction to Degrees of Freedom
Suppose you have obtained a sample of incomplete observations on a numeric variable \(\vec{x} = \langle r, s, t, u \rangle\) of size \(n = 4\), such as cells/dish in the cell-cycle experiment. If you know that the mean \(\bar{x}\) of this sample is \(\bar{x} = 21.25\) cells/dish and that the sum \(r + s + t\) of the first three observations is \(r + s + t = 68 \text{ cells/dish}\), what is the value of \(u\) in cells/dish? Please use the following code chunk to solve this and have R compute the answer.
n <- 4
x_bar <- 21.25
sum_rst <- 68
u <- (x_bar*n) - sum_rst
print_and_check(u,"bd60e0fee41eee64706b028add0f183242acac9faaf08d99916b92e66222e28b")
## [1] 17
## [VALUE] CORRECT
The deep lesson of this demonstration is that the sample mean contains information about every value in the sample, and can therefore be used to completely reconstruct one of them if lost.
We say that estimation of the sample mean uses up one degree of freedom of the data.
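The missing-value argument works for any sample size. A minimal sketch follows; recover_missing() is a hypothetical helper written only for this illustration, and the sample values are made up. The last lines show the same constraint in terms of deviations: because all n deviations must sum to zero, the last one is fixed once the other n − 1 are known.

```r
# Hypothetical helper (not a base R function): recover one missing value
# from the sample mean, the sample size, and the other n - 1 values
recover_missing <- function(x_bar, n, known) {
  n * x_bar - sum(known)
}
# Made-up check: the full sample is 3, 5, 10, 6, whose mean is 6
recover_missing(x_bar = 6, n = 4, known = c(3, 5, 10))   # 6
# Deviations sum to zero, so the last deviation is determined by the others
toy <- c(3, 5, 10, 6)
dev <- toy - mean(toy)                                    # -3, -1, 4, 0
-sum(dev[1:3]) == dev[4]                                  # TRUE
```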
Computing Binomial Tail Probabilities, Interval Probabilities and Quantiles, and Simulating Binomial Variates in R
Demonstration 3: Binomial Tail Probabilities
Use pbinom, the cumulative distribution function for the binomial, and fill in the values 2 for q, 10 for size, and 0.5 for prob to compute the one-sided (lower) tail probability of at most 2 successes in 10 I.I.D. trials of a fair Bernoulli random variable. Round to 5 digits after the decimal place.
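To see what pbinom() returns without giving away the exercise, here is the same kind of computation with different, made-up parameters. It also shows that the CDF is the running sum of dbinom() and how lower.tail = FALSE gives the upper tail.

```r
# Made-up parameters for illustration: X ~ Binomial(size = 8, prob = 0.5)
# Lower tail P(X <= 1)
p_lower <- pbinom(1, size = 8, prob = 0.5)
# The CDF is the running sum of the probability mass function dbinom()
p_sum <- sum(dbinom(0:1, size = 8, prob = 0.5))
# Exact value: (choose(8, 0) + choose(8, 1)) / 2^8 = 9/256
round(p_lower, 5)   # 0.03516
# Upper tail P(X > 1) via lower.tail = FALSE, equal to 1 - P(X <= 1)
p_upper <- pbinom(1, size = 8, prob = 0.5, lower.tail = FALSE)
```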
answer <- NULL
print_and_check(answer, "d9ccea63afea578b23fcffb51718ac96b838de9f66b29592bb3e690eaf09e056")
## NULL
## [VALUE] INCORRECT
Demonstration 4: Binomial Interval Probabilities
Replace NULL with an R expression that uses pbinom twice to show that for 100 I.I.D. Bernoulli trials with probability of success \(\pi = 0.5\), there is more than a 95% chance that the number of successes will be between 40 and 60 inclusive. Round to five digits after the decimal place.
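The two-pbinom pattern for an inclusive interval can be sketched with made-up parameters. The key detail is that the lower CDF value is taken one below the lower bound; subtracting the CDF at the lower bound itself would drop the probability of that bound.

```r
# Made-up parameters for illustration: X ~ Binomial(size = 20, prob = 0.5)
# P(8 <= X <= 12) = P(X <= 12) - P(X <= 7); note the lower bound uses 8 - 1
p_interval <- pbinom(12, size = 20, prob = 0.5) - pbinom(7, size = 20, prob = 0.5)
# Cross-check by adding up the individual probabilities for 8, 9, ..., 12
p_check <- sum(dbinom(8:12, size = 20, prob = 0.5))
round(p_interval, 5)
```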
answer <- NULL
print_and_check(answer, "4d7ae686b7ef3a545c13354034c6fa6df4af8d8bc8f8e8e052993373d5f37176")
## NULL
## [VALUE] INCORRECT
Demonstration 5: Binomial Quantiles
Replace NULL with an R expression that uses qbinom to calculate the 95th percentile of the number of successes expected in 100 I.I.D. Bernoulli trials with probability of success \(\pi = 0.75\).
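For a discrete distribution, qbinom() returns the smallest count x whose cumulative probability P(X ≤ x) is at least p. A short check of that definition with made-up parameters:

```r
# Made-up parameters for illustration: X ~ Binomial(size = 20, prob = 0.5)
q90 <- qbinom(0.90, size = 20, prob = 0.5)
# The CDF reaches 0.90 at q90 but not one step earlier
pbinom(q90, size = 20, prob = 0.5)       # at least 0.90
pbinom(q90 - 1, size = 20, prob = 0.5)   # below 0.90
```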
answer <- NULL
print_and_check(answer, "9f06801ef5c78e154b8a42393591326e36bcda798727e0f44bd189728f2cfbb4")
## NULL
## [VALUE] INCORRECT
Demonstration 6: Simulate Binomial Variates
Replace NULL with an R expression that uses rbinom to simulate 10 binomial variates, each counting the successes in 100 I.I.D. Bernoulli trials with probability of success \(\pi = 0.75\).
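The argument order of rbinom() is a common stumbling block: n is how many variates to draw, while size is the number of Bernoulli trials behind each one. A sketch with made-up parameters and an arbitrary seed (the exercise chunk sets its own seed afterwards, so this does not affect it):

```r
# Made-up parameters for illustration; the seed is arbitrary
set.seed(2024)
# rbinom(n, size, prob): n = number of variates to draw,
# size = number of Bernoulli trials behind each variate
draws <- rbinom(5, size = 20, prob = 0.5)
draws            # five counts, each between 0 and 20
length(draws)    # 5
```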
set.seed(1234)
answer <- NULL
print_and_check(answer, "465dea1b1d21ef19091439594c982833105c6ff07a3b50b6ac0be11339f71fd3")
## NULL
## [VALUE] INCORRECT