DSC 011 S26 Lecture 15 Demo

Deviations, Degrees of Freedom and Binomial Computations

Author

Leo Garcia Ortiz

Published

February 27, 2026

Preliminaries: Assignment of Teams and Team Tables

The class will split into OS-specific randomly assigned teams of about seven. The teams and team tables are determined by random sampling without replacement (also known as permutation) as follows:


# 70 students as of Feb 25
windows11 <- c("aflores-hernandez","akwong25","bdelacruz-angeles",
               "bhernandezarteaga","cbettencourt2","cchen271","cornelas3",
               "craigelijaesoriano","davila-castaneda","dvargas38",
               "ecastillo-quevedo","efernando","ekjotjohal","elliottwhitney",
               "fromerobojorquez","genaxiong","ggonzalez-ramirez",
               "ghendrickson","gurindersahota","jasminesamayoa",
               "jcontrerastrinidad","jessiemorales","jlegaspina","joneal2",
               "jwong290","kchen129","leogarciaortiz","lillieyang",
               "lindaespinozamunoz","lorenackerman","mdesilva","rbujji",
               "roderickma","skodur","sraman7","tolaniyan","trevoroh")

macOS <- c("adimasagurto","ahmiyasalter","alannahtanner","aleroux",
          "alizeibarra","apatterson9","asingh368",
          "eflores136","elmermartinez","emendozagonzalez",
          "emoya8","isidrohernandez","jaisingh","jangel15","jardindo",
          "jessecaclark","jmandujano4","jperez460","kamryntaylor",
          "kchen132","kvu56","lalagos","malachifuqua","manroopkaur",
          "mayraarias","msuccari","omarkhalil","rbeattie","seanjimenez",
          "vchezhiyan","xcortes2")


language_table_students <- c("edmondcheng", "angeliachunyu")
teamsize  <- 7

# Set seed for reproducibility
set.seed(20260227)
shuffled_win11  <- sample(windows11, replace = FALSE)
tables    <- c(rep(1:3,each=teamsize),rep(4:5,each=(teamsize+1)))
teams_win11     <- split(shuffled_win11, tables)


shuffled_macOS  <- sample(macOS, replace = FALSE)
tables    <- c(rep(6,(teamsize-1)),rep(7,(teamsize-2)),rep(8:9,each=teamsize),rep(10,(teamsize-1)))
teams_macOS   <- split(shuffled_macOS, tables)
teams_macOS$`7` <- c(teams_macOS$`7`,language_table_students)

teams <- lapply(c(teams_win11,teams_macOS),sort)

wed_lab_mac <- c("adimasagurto","alannahtanner","aleroux","alizeibarra","asingh368","edmondcheng","eflores136","elmermartinez","jardindo","jmandujano4")

wed_lab_win <- c("bhernandezarteaga","cbettencourt2","craigelijaesoriano","efernando","ekjotjohal","ggonzalez-ramirez","jasminesamayoa","joneal2","jwong290","mdesilva")

invisible(lapply(seq_along(teams), function(i) {
  cat("Team at Table", i, ":", teams[[i]], "\n")
}))
## Team at Table 1 : ecastillo-quevedo gurindersahota jwong290 leogarciaortiz lillieyang roderickma tolaniyan 
## Team at Table 2 : bhernandezarteaga cchen271 davila-castaneda fromerobojorquez jasminesamayoa joneal2 rbujji 
## Team at Table 3 : craigelijaesoriano ghendrickson jessiemorales jlegaspina kchen129 lindaespinozamunoz skodur 
## Team at Table 4 : cornelas3 dvargas38 genaxiong ggonzalez-ramirez jcontrerastrinidad lorenackerman mdesilva sraman7 
## Team at Table 5 : aflores-hernandez akwong25 bdelacruz-angeles cbettencourt2 efernando ekjotjohal elliottwhitney trevoroh 
## Team at Table 6 : aleroux elmermartinez jangel15 jperez460 msuccari rbeattie 
## Team at Table 7 : angeliachunyu apatterson9 edmondcheng isidrohernandez kamryntaylor malachifuqua vchezhiyan 
## Team at Table 8 : asingh368 emoya8 jmandujano4 lalagos manroopkaur seanjimenez xcortes2 
## Team at Table 9 : ahmiyasalter alizeibarra emendozagonzalez jardindo jessecaclark kchen132 mayraarias 
## Team at Table 10 : adimasagurto alannahtanner eflores136 jaisingh kvu56 omarkhalil
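
The shuffle-then-split idea above can be sketched on a toy roster. Sampling without replacement returns every element exactly once in random order, and split() then groups the shuffled names by table number. The roster names, seed, and team size below are hypothetical placeholders, not the class data.

```r
# Hypothetical roster of placeholder IDs (not real students)
toy_roster <- paste0("student", 1:10)

set.seed(1)  # arbitrary seed, used only so the shuffle is reproducible
toy_shuffled <- sample(toy_roster, replace = FALSE)

# A permutation keeps every element exactly once
setequal(toy_shuffled, toy_roster)
anyDuplicated(toy_shuffled) == 0

# Assign the first five shuffled names to table 1 and the rest to table 2
toy_tables <- rep(1:2, each = 5)
toy_teams  <- split(toy_shuffled, toy_tables)
lengths(toy_teams)
```

Because the table labels are laid out before the shuffle, changing the team sizes only requires changing the vector of labels, which is how the unequal table sizes above are handled.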

Instructions for Completing and Submitting This Assignment

Download and open today’s template notebook in RStudio
Personalize the file by writing your name in the YAML header (replace “FirstName LastName”) — be sure to do this or you will lose points!
Save with your name in RStudio and move to course directory: In RStudio select File → Save As..., find your course directory files and move and rename the file to include your name (e.g., FirstName_LastName_Quantiles_Demo.qmd)
Render to HTML
Follow instructions from the HTML rendered output by editing your personalized notebook.
As you work the assignment, keep rendering and editing the file, asking for help from your team until you get all CORRECT for each problem. Two or more students may ask for help from the instructors.
Render to HTML and submit to Catcourses. Turn in as much CORRECT work as you can by the end of class today. Submission by end of class qualifies you for credit.
Resubmit your best work by midnight tonight for a better grade or fully accepted work – only your latest and best work gets graded.

Assignment

Computing Deviations

Demonstration 1: Computing Deviations for the Nile Data

Given a sample vector of values $\vec{x} = \langle x_1, x_2, \dots , x_n \rangle$ with mean $\bar{x} \equiv \frac{1}{n} \sum_{i=1}^n x_i$, a sample vector of deviations $\vec{y}$ may be computed as a simple transformation of the sample values called a shift or translation, by subtracting the mean $\bar{x}$ from every value of the sample:

\[\vec{y} \equiv \vec{x} - \bar{x} = \langle (x_1 - \bar{x}), (x_2 - \bar{x}), \dots, (x_n - \bar{x}) \rangle \equiv \langle y_1, y_2, \dots, y_n \rangle \]

The mean of a sample of deviations is always zero:

\[ \bar{y} = \frac{1}{n} \sum_{i = 1}^n y_i = \frac{1}{n} \sum_{i = 1}^n (x_i - \bar{x}) = \frac{1}{n} (\sum_{i = 1}^n x_i) - \frac{1}{n} (\sum_{i = 1}^n \bar{x}) = \bar{x} - \frac{1}{n}(n\bar{x}) = \bar{x} - \bar{x} = 0.\]

Let’s demonstrate this in practice for the Nile data. Down in the R Console, do these steps:

apply the mean() function to the Nile data and use the assignment operator to save the return value to mean_nile.
subtract mean_nile from the Nile data and use the assignment operator to save the return value to nile_deviations.
apply the hist() function to nile_deviations to plot the deviations.
after getting this to work in the R Console, at the top of the code chunk below, add the code to plot the histogram of deviations so that the plot is added into the rendered notebook, and use assignment to set answer to the value of nile_deviations to check your work. Before you can set answer, you need to also copy the definitions of mean_nile and nile_deviations from your Console into the code chunk, so that they are defined successively anew when you render.

mean_nile <- mean(Nile)
nile_deviations <- Nile - mean_nile
hist(nile_deviations)

answer <- nile_deviations
print_and_check(answer, "8653f650cbf829a7e385d8a9bde83cb6d32128123536546be9a7ec98166c3b41")
## Time Series:
## Start = 1871 
## End = 1970 
## Frequency = 1 
##   [1]  200.649999999999977  240.649999999999977   43.649999999999977
##   [4]  290.649999999999977  240.649999999999977  240.649999999999977
##   [7] -106.350000000000023  310.649999999999977  450.649999999999977
##  [10]  220.649999999999977   75.649999999999977   15.649999999999977
##  [13]  190.649999999999977   74.649999999999977  100.649999999999977
##  [16]   40.649999999999977  260.649999999999977 -120.350000000000023
##  [19]   38.649999999999977  220.649999999999977  180.649999999999977
##  [22]  290.649999999999977  230.649999999999977  330.649999999999977
##  [25]  340.649999999999977  300.649999999999977  110.649999999999977
##  [28]  180.649999999999977 -145.350000000000023  -79.350000000000023
##  [31]  -45.350000000000023 -225.350000000000023   20.649999999999977
##  [34]  -86.350000000000023 -218.350000000000023   -3.350000000000023
##  [37] -227.350000000000023  100.649999999999977  130.649999999999977
##  [40]   49.649999999999977  -88.350000000000023 -193.350000000000023
##  [43] -463.350000000000023  -95.350000000000023 -217.350000000000023
##  [46]  200.649999999999977  180.649999999999977  -87.350000000000023
##  [49] -155.350000000000023  -98.350000000000023 -151.350000000000023
##  [52]  -74.350000000000023  -55.350000000000023  -57.350000000000023
##  [55] -221.350000000000023  -74.350000000000023 -175.350000000000023
##  [58] -123.350000000000023  120.649999999999977 -160.350000000000023
##  [61] -138.350000000000023  -54.350000000000023  -74.350000000000023
##  [64]   24.649999999999977   64.649999999999977  -22.350000000000023
##  [67]  -97.350000000000023   90.649999999999977 -148.350000000000023
##  [70] -243.350000000000023 -270.350000000000023  -73.350000000000023
##  [73] -107.350000000000023 -177.350000000000023 -118.350000000000023
##  [76]  120.649999999999977  -59.350000000000023  -45.350000000000023
##  [79]  -71.350000000000023  -29.350000000000023 -175.350000000000023
##  [82] -170.350000000000023  -81.350000000000023  130.649999999999977
##  [85]   -1.350000000000023   66.649999999999977 -122.350000000000023
##  [88]    3.649999999999977   55.649999999999977 -104.350000000000023
##  [91]  100.649999999999977  -13.350000000000023  -18.350000000000023
##  [94]  250.649999999999977   -7.350000000000023 -173.350000000000023
##  [97]   -0.350000000000023 -201.350000000000023 -205.350000000000023
## [100] -179.350000000000023
## [VALUE]  CORRECT

Now call the mean() function on the nile_deviations R object to see its mean, in the following code chunk:

mean(nile_deviations)
## [1] -0.0000000000000227803886865274

It is very likely that your answer is a very small number that is close to zero. For example, on my system, the value returned is -3.126388e-14. The exact number may differ across computers and operating system versions, and may depend on which processor your machine has. We proved above that the mean of deviations is always zero, so why is this number not zero? The reason is that computers represent real numbers only approximately, with a finite number of digits, so rounding errors accumulate during arithmetic. This issue of machine precision is important to take into account in statistical computing. The general solution is to round the final result of a calculation to a recommended number of digits. In this class, you will do that in R using the round() function.

Wrap the round() function around your call of the mean() function on the nile_deviations R object to see its mean rounded, in the following code chunk:

round(mean(nile_deviations))
## [1] 0
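
The rounding error described above is not specific to the Nile data. Here is a minimal sketch using a classic decimal example; the choice of 10 digits is arbitrary and only for illustration.

```r
a <- 0.1 + 0.2

a == 0.3                     # FALSE: neither side is stored exactly in binary
isTRUE(all.equal(a, 0.3))    # TRUE: all.equal() compares within a small tolerance
round(a - 0.3, digits = 10)  # 0: the tiny discrepancy disappears after rounding

# round() counts digits after the decimal point (default digits = 0)
round(123.456, digits = 1)   # 123.5
```

With no digits argument, round() rounds to the nearest whole number, which is why round(mean(nile_deviations)) prints 0.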

To make a nicer histogram of the deviations of the Nile dataset, we can label its mean. Copy your expression for the rounded mean of nile_deviations above as a replacement for NULL in the first assignment statement for nile_deviations_mean in the code chunk below.

nile_deviations_mean <- round(mean(nile_deviations))
hist(nile_deviations,main="Histogram of Deviations of Nile")
abline(v=nile_deviations_mean,col="blue",lty=2)
text(100,25,paste0("Mean: ",nile_deviations_mean),col="blue")

Demonstration 2: Find the Missing Value: Introduction to Degrees of Freedom

Suppose you have obtained a sample of incomplete observations on a numeric variable, $\vec{x} = \langle r, s, t, u \rangle$ of size $n = 4$, such as cells/dish in the cell-cycle experiment. If you know that the mean of this sample is $\bar{x} = 21.25$ cells/dish and that the sum of the first three observations is $r + s + t = 68 \text{ cells/dish}$, what is the value of $u$ in cells/dish? Please use the following code chunk to solve this and have R compute the answer.

n       <- 4
x_bar   <- 21.25
sum_rst <- 68
u <- (x_bar*n) - sum_rst
print_and_check(u,"bd60e0fee41eee64706b028add0f183242acac9faaf08d99916b92e66222e28b")
## [1] 17
## [VALUE]  CORRECT

The deep lesson of this demonstration is that the sample mean contains information about every value in the sample, and can therefore be used, together with the remaining values, to completely reconstruct any one value that is lost.

We say that estimation of the sample mean uses up one degree of freedom of the data.
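
The same idea can be sketched in terms of deviations: once the mean is fixed, the deviations must sum to zero, so the last one is determined by the others. The individual values of r, s, and t below are hypothetical, chosen only so that they sum to 68 as in the demonstration.

```r
# Hypothetical individual values chosen so that r + s + t = 68 and u = 17
x <- c(r = 20, s = 25, t = 23, u = 17)
d <- x - mean(x)     # deviations from the mean of 21.25

sum(d)               # 0: deviations always sum to zero
-sum(d[1:3])         # the fourth deviation, recovered from the other three
d[["u"]]             # matches the line above
```

Only n - 1 = 3 of the 4 deviations are free to vary, which is the sense in which the mean uses up one degree of freedom.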

Computing Binomial Tail Probabilities, Interval Probabilities, and Quantiles, and Simulating Binomial Variates in R

Demonstration 3: Binomial Tail Probabilities

Use pbinom, the cumulative distribution function for the binomial, and fill in values 2 for q, 10 for size, and 0.5 for prob to compute the lower-tail probability of at most 2 successes in 10 I.I.D. trials of a fair Bernoulli random variable. Round to 5 digits after the decimal place.

answer <- NULL
print_and_check(answer, "d9ccea63afea578b23fcffb51718ac96b838de9f66b29592bb3e690eaf09e056")
## NULL
## [VALUE]  INCORRECT
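
As a hedged illustration of how pbinom behaves, here is a sketch that deliberately uses different numbers from the exercise (3 successes in 8 trials). It checks the lower tail against a direct sum of dbinom, the binomial probability mass function.

```r
# P(X <= 3) for X ~ Binomial(size = 8, prob = 0.5)
lower_tail <- pbinom(3, size = 8, prob = 0.5)

# Same probability by summing the probability mass function over 0, 1, 2, 3
lower_tail_check <- sum(dbinom(0:3, size = 8, prob = 0.5))

# Complementary upper tail, P(X > 3)
upper_tail <- pbinom(3, size = 8, prob = 0.5, lower.tail = FALSE)

round(lower_tail, 5)
all.equal(lower_tail, lower_tail_check)
all.equal(lower_tail + upper_tail, 1)
```

Note that the lower tail includes the value 3 itself, while the upper tail from lower.tail = FALSE starts strictly above it.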

Demonstration 4: Binomial Interval Probabilities

Replace NULL with an R expression that uses pbinom twice to show that for 100 I.I.D. Bernoulli trials with probability of success $\pi = 0.5$, there is more than a 95% chance that the number of successes will be between 40 and 60 inclusive. Round to five digits after the decimal place.

answer <- NULL
print_and_check(answer, "4d7ae686b7ef3a545c13354034c6fa6df4af8d8bc8f8e8e052993373d5f37176")
## NULL
## [VALUE]  INCORRECT
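
The two-call pattern can be sketched with smaller, deliberately different numbers (20 trials, between 8 and 12 successes inclusive). The variable names are hypothetical. Because pbinom(q, ...) gives P(X <= q), the lower endpoint must be shifted down by one so that it stays inside the interval.

```r
n_trials  <- 20
p_success <- 0.5
lo <- 8
hi <- 12

# P(lo <= X <= hi) = P(X <= hi) - P(X <= lo - 1)
interval_prob <- pbinom(hi, n_trials, p_success) - pbinom(lo - 1, n_trials, p_success)

# Check against a direct sum of the probability mass function
interval_check <- sum(dbinom(lo:hi, n_trials, p_success))

round(interval_prob, 5)
all.equal(interval_prob, interval_check)
```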

Demonstration 5: Binomial Quantiles

Replace NULL with an R expression that uses qbinom to calculate the 95th percentile of the number of successes expected in 100 I.I.D. Bernoulli trials with probability of success $\pi = 0.75$.

answer <- NULL
print_and_check(answer, "9f06801ef5c78e154b8a42393591326e36bcda798727e0f44bd189728f2cfbb4")
## NULL
## [VALUE]  INCORRECT
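
To see what qbinom returns, the following sketch uses deliberately different numbers (the 90th percentile for 20 fair trials) and hypothetical variable names. It checks the defining property of a quantile of a discrete distribution: it is the smallest count whose cumulative probability reaches the requested level.

```r
n_trials  <- 20
p_success <- 0.5
level     <- 0.9

q90 <- qbinom(level, size = n_trials, prob = p_success)

# The cumulative probability at q90 reaches the level ...
pbinom(q90, n_trials, p_success) >= level
# ... but one count lower falls short of it
pbinom(q90 - 1, n_trials, p_success) < level
```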

Demonstration 6: Simulate Binomial Variates

Replace NULL with an R expression that uses rbinom to simulate 10 binomial variates, each the number of successes in 100 I.I.D. Bernoulli trials with probability of success $\pi = 0.75$.

set.seed(1234)
answer <- NULL
print_and_check(answer, "465dea1b1d21ef19091439594c982833105c6ff07a3b50b6ac0be11339f71fd3")
## NULL
## [VALUE]  INCORRECT
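
A minimal sketch of rbinom with deliberately different numbers and an arbitrary seed: the first argument, n, is the number of variates to draw, while size is the number of Bernoulli trials behind each variate. Resetting the seed reproduces the same draws, which is why the exercise calls set.seed() before the answer is computed.

```r
set.seed(42)  # arbitrary seed for illustration
draws <- rbinom(5, size = 20, prob = 0.3)

set.seed(42)  # same seed again
draws_again <- rbinom(5, size = 20, prob = 0.3)

identical(draws, draws_again)   # same seed, same draws
length(draws)                   # one value per requested variate
all(draws >= 0 & draws <= 20)   # each count lies between 0 and size
```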