DSC 011 S26 Lecture 14 Demo (KEY)

Quantiles

Author

Leo Garcia Ortiz

Published

February 25, 2026

Preliminaries: Assignment of Teams and Team Tables

The class will split into randomly assigned, OS-specific teams of about seven students. The teams and team tables are determined by random sampling without replacement (that is, by a random permutation) as follows:


# 70 students as of Feb 25
windows11 <- c("aflores-hernandez","akwong25","bdelacruz-angeles",
               "bhernandezarteaga","cbettencourt2","cchen271","cornelas3",
               "craigelijaesoriano","davila-castaneda","dvargas38",
               "ecastillo-quevedo","efernando","ekjotjohal","elliottwhitney",
               "fromerobojorquez","genaxiong","ggonzalez-ramirez",
               "ghendrickson","gurindersahota","jasminesamayoa",
               "jcontrerastrinidad","jessiemorales","jlegaspina","joneal2",
               "jwong290","kchen129","leogarciaortiz","lillieyang",
               "lindaespinozamunoz","lorenackerman","mdesilva","rbujji",
               "roderickma","skodur","sraman7","tolaniyan","trevoroh")

macOS <- c("adimasagurto","ahmiyasalter","alannahtanner","aleroux",
          "alizeibarra","apatterson9","asingh368",
          "eflores136","elmermartinez","emendozagonzalez",
          "emoya8","isidrohernandez","jaisingh","jangel15","jardindo",
          "jessecaclark","jmandujano4","jperez460","kamryntaylor",
          "kchen132","kvu56","lalagos","malachifuqua","manroopkaur",
          "mayraarias","msuccari","omarkhalil","rbeattie","seanjimenez",
          "vchezhiyan","xcortes2")


language_table_students <- c("edmondcheng", "angeliachunyu")
teamsize  <- 7

# Set seed for reproducibility
set.seed(20260225)
shuffled_win11  <- sample(windows11, replace = FALSE)
tables    <- c(rep(1:3, each = teamsize), rep(4:5, each = teamsize + 1))
teams_win11     <- split(shuffled_win11, tables)


shuffled_macOS  <- sample(macOS, replace = FALSE)
tables    <- c(rep(6, teamsize - 1), rep(7, teamsize - 2),
               rep(8:9, each = teamsize), rep(10, teamsize - 1))
teams_macOS   <- split(shuffled_macOS, tables)
teams_macOS$`7` <- c(teams_macOS$`7`,language_table_students)

teams <- lapply(c(teams_win11,teams_macOS),sort)

invisible(lapply(seq_along(teams), function(i) {
  cat("Team at Table", i, ":", teams[[i]], "\n")
}))
## Team at Table 1 : cbettencourt2 elliottwhitney jlegaspina lorenackerman mdesilva skodur tolaniyan 
## Team at Table 2 : cornelas3 craigelijaesoriano ggonzalez-ramirez ghendrickson jessiemorales lindaespinozamunoz trevoroh 
## Team at Table 3 : akwong25 dvargas38 ecastillo-quevedo jasminesamayoa joneal2 kchen129 sraman7 
## Team at Table 4 : aflores-hernandez bdelacruz-angeles bhernandezarteaga cchen271 efernando genaxiong gurindersahota jwong290 
## Team at Table 5 : davila-castaneda ekjotjohal fromerobojorquez jcontrerastrinidad leogarciaortiz lillieyang rbujji roderickma 
## Team at Table 6 : apatterson9 emendozagonzalez jangel15 kamryntaylor kvu56 rbeattie 
## Team at Table 7 : alizeibarra angeliachunyu edmondcheng eflores136 emoya8 msuccari omarkhalil 
## Team at Table 8 : alannahtanner aleroux asingh368 jardindo jperez460 kchen132 malachifuqua 
## Team at Table 9 : adimasagurto elmermartinez isidrohernandez jaisingh manroopkaur seanjimenez xcortes2 
## Team at Table 10 : ahmiyasalter jessecaclark jmandujano4 lalagos mayraarias vchezhiyan
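
As a minimal sketch of the same shuffle-and-split idea, the example below uses a hypothetical six-name roster (the names and the seed are made up for illustration and are not taken from the class list). sample() with replace = FALSE returns a random permutation, and split() groups the shuffled names by a vector of table numbers.

```r
# Hypothetical roster; names are illustrative only
roster <- c("ana", "ben", "cam", "dee", "eli", "fay")

set.seed(1)                                   # arbitrary seed, for reproducibility
shuffled <- sample(roster, replace = FALSE)   # random permutation of the roster

# Two tables of three seats each: 1 1 1 2 2 2
tables <- rep(1:2, each = 3)
teams  <- split(shuffled, tables)

# Every student appears exactly once, and each table has three members
stopifnot(setequal(shuffled, roster), all(lengths(teams) == 3))
print(teams)
```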

Instructions for Completing and Submitting This Assignment

  1. Download and open today’s template notebook in RStudio
  2. Personalize the file by writing your name in the YAML header (replace “FirstName LastName”) — be sure to do this or you will lose points!
  3. Save the file under your name in your course directory: in RStudio, select File > Save As..., navigate to your course directory, and rename the file to include your name (e.g., FirstName_LastName_Quantiles_Demo.qmd)
  4. Render to HTML
  5. Follow instructions from the HTML rendered output by editing your personalized notebook.
  6. As you work through the assignment, keep editing and rendering the file, asking your team for help, until every problem reports CORRECT. Two or more students may ask for help from the instructors.
  7. Render to HTML and submit to CatCourses. Turn in as much CORRECT work as you can by the end of class today. Submission by the end of class qualifies you for credit.
  8. Resubmit your best work by midnight tonight for a better grade or fully accepted work. Only your latest and best work is graded.

Assignment

Ranking Data

Demonstration 1: Add a Column of Ranks to the Puromycin Dataset

In the R Console, use the assignment operator (<-) and the rank() function to add a column of ranks of the concentration variable (conc) to a copy of the Puromycin data. Use the selection operator ($) to name the new column conc_rank (that is, assign to Puromycin$conc_rank). Then copy the line that creates the new column into the code chunk below, placing it before the line that assigns the modified Puromycin data frame to answer.

Puromycin$conc_rank <- rank(Puromycin$conc)
answer <- Puromycin
print_and_check(answer, "7d8eb9495ea95ab0a32febac478cdb283aa6887c2d5552d042ff8d749c66b341")
##    conc rate     state conc_rank
## 1  0.02   76   treated       2.5
## 2  0.02   47   treated       2.5
## 3  0.06   97   treated       6.5
## 4  0.06  107   treated       6.5
## 5  0.11  123   treated      10.5
## 6  0.11  139   treated      10.5
## 7  0.22  159   treated      14.5
## 8  0.22  152   treated      14.5
## 9  0.56  191   treated      18.5
## 10 0.56  201   treated      18.5
## 11 1.10  207   treated      22.0
## 12 1.10  200   treated      22.0
## 13 0.02   67 untreated       2.5
## 14 0.02   51 untreated       2.5
## 15 0.06   84 untreated       6.5
## 16 0.06   86 untreated       6.5
## 17 0.11   98 untreated      10.5
## 18 0.11  115 untreated      10.5
## 19 0.22  131 untreated      14.5
## 20 0.22  124 untreated      14.5
## 21 0.56  144 untreated      18.5
## 22 0.56  158 untreated      18.5
## 23 1.10  160 untreated      22.0
## [VALUE]  CORRECT
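
The fractional ranks above come from ties: by default, rank() gives tied values the average of the positions they occupy. The four observations at conc = 0.02 occupy positions 1 through 4, so each receives (1 + 2 + 3 + 4) / 4 = 2.5. A small sketch on a made-up vector shows the default next to two other ties.method options:

```r
x <- c(10, 20, 20, 30)            # made-up values with one tie

rank(x)                           # default "average": 1.0 2.5 2.5 4.0
rank(x, ties.method = "min")      # 1 2 2 4
rank(x, ties.method = "first")    # 1 2 3 4

# The same averaging explains the first Puromycin rank
mean(1:4)                         # 2.5
```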

Demonstration 2: Compute Ranks for Sunfish Pigmentation Data

This demonstration is already completed for you; you only need to look at the code and its output.

pigment <- rep(c("no","faint","mod","heavy","solid"),c(13,68,44,21,8))
pigment_factor <- ordered(pigment,levels=c("no","faint","mod","heavy","solid"))
sunfish <- data.frame(pigment=pigment_factor,rank=rank(pigment_factor))
answer <- sunfish
print_and_check(answer,"1f42e3a9227dc5af4f0ad8d49e9590e2f0e2305c5a33f362afbe28cd35ffd9bc")
##     pigment  rank
## 1        no   7.0
## 2        no   7.0
## 3        no   7.0
## 4        no   7.0
## 5        no   7.0
## 6        no   7.0
## 7        no   7.0
## 8        no   7.0
## 9        no   7.0
## 10       no   7.0
## 11       no   7.0
## 12       no   7.0
## 13       no   7.0
## 14    faint  47.5
## 15    faint  47.5
## 16    faint  47.5
## 17    faint  47.5
## 18    faint  47.5
## 19    faint  47.5
## 20    faint  47.5
## 21    faint  47.5
## 22    faint  47.5
## 23    faint  47.5
## 24    faint  47.5
## 25    faint  47.5
## 26    faint  47.5
## 27    faint  47.5
## 28    faint  47.5
## 29    faint  47.5
## 30    faint  47.5
## 31    faint  47.5
## 32    faint  47.5
## 33    faint  47.5
## 34    faint  47.5
## 35    faint  47.5
## 36    faint  47.5
## 37    faint  47.5
## 38    faint  47.5
## 39    faint  47.5
## 40    faint  47.5
## 41    faint  47.5
## 42    faint  47.5
## 43    faint  47.5
## 44    faint  47.5
## 45    faint  47.5
## 46    faint  47.5
## 47    faint  47.5
## 48    faint  47.5
## 49    faint  47.5
## 50    faint  47.5
## 51    faint  47.5
## 52    faint  47.5
## 53    faint  47.5
## 54    faint  47.5
## 55    faint  47.5
## 56    faint  47.5
## 57    faint  47.5
## 58    faint  47.5
## 59    faint  47.5
## 60    faint  47.5
## 61    faint  47.5
## 62    faint  47.5
## 63    faint  47.5
## 64    faint  47.5
## 65    faint  47.5
## 66    faint  47.5
## 67    faint  47.5
## 68    faint  47.5
## 69    faint  47.5
## 70    faint  47.5
## 71    faint  47.5
## 72    faint  47.5
## 73    faint  47.5
## 74    faint  47.5
## 75    faint  47.5
## 76    faint  47.5
## 77    faint  47.5
## 78    faint  47.5
## 79    faint  47.5
## 80    faint  47.5
## 81    faint  47.5
## 82      mod 103.5
## 83      mod 103.5
## 84      mod 103.5
## 85      mod 103.5
## 86      mod 103.5
## 87      mod 103.5
## 88      mod 103.5
## 89      mod 103.5
## 90      mod 103.5
## 91      mod 103.5
## 92      mod 103.5
## 93      mod 103.5
## 94      mod 103.5
## 95      mod 103.5
## 96      mod 103.5
## 97      mod 103.5
## 98      mod 103.5
## 99      mod 103.5
## 100     mod 103.5
## 101     mod 103.5
## 102     mod 103.5
## 103     mod 103.5
## 104     mod 103.5
## 105     mod 103.5
## 106     mod 103.5
## 107     mod 103.5
## 108     mod 103.5
## 109     mod 103.5
## 110     mod 103.5
## 111     mod 103.5
## 112     mod 103.5
## 113     mod 103.5
## 114     mod 103.5
## 115     mod 103.5
## 116     mod 103.5
## 117     mod 103.5
## 118     mod 103.5
## 119     mod 103.5
## 120     mod 103.5
## 121     mod 103.5
## 122     mod 103.5
## 123     mod 103.5
## 124     mod 103.5
## 125     mod 103.5
## 126   heavy 136.0
## 127   heavy 136.0
## 128   heavy 136.0
## 129   heavy 136.0
## 130   heavy 136.0
## 131   heavy 136.0
## 132   heavy 136.0
## 133   heavy 136.0
## 134   heavy 136.0
## 135   heavy 136.0
## 136   heavy 136.0
## 137   heavy 136.0
## 138   heavy 136.0
## 139   heavy 136.0
## 140   heavy 136.0
## 141   heavy 136.0
## 142   heavy 136.0
## 143   heavy 136.0
## 144   heavy 136.0
## 145   heavy 136.0
## 146   heavy 136.0
## 147   solid 150.5
## 148   solid 150.5
## 149   solid 150.5
## 150   solid 150.5
## 151   solid 150.5
## 152   solid 150.5
## 153   solid 150.5
## 154   solid 150.5
## [VALUE]  CORRECT
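
Because every fish in a category is tied, each rank above is the midpoint of the block of positions its category occupies. As a sketch, the same five values can be recovered from the category counts alone (the variable names below are illustrative):

```r
counts <- c(no = 13, faint = 68, mod = 44, heavy = 21, solid = 8)

last    <- cumsum(counts)        # last position in each block: 13 81 125 146 154
first   <- last - counts + 1     # first position in each block: 1 14 82 126 147
midrank <- (first + last) / 2    # 7.0 47.5 103.5 136.0 150.5

midrank
```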

Cutting Numerical Variables into Ordinal

Demonstration: Cut Rates of Puromycin Data into Ordinal Categories

This demonstration is already completed for you; you only need to look at the code and its output.

Puromycin$rate_bin <- cut(Puromycin$rate,3,labels=c("low","medium","high"))
answer <- Puromycin
print_and_check(answer,"ca30b4a708f5a154e86e038213b03de4e38710ba01c2186fdb567a0609b80a8d")
##    conc rate     state rate_bin
## 1  0.02   76   treated      low
## 2  0.02   47   treated      low
## 3  0.06   97   treated      low
## 4  0.06  107   treated   medium
## 5  0.11  123   treated   medium
## 6  0.11  139   treated   medium
## 7  0.22  159   treated     high
## 8  0.22  152   treated   medium
## 9  0.56  191   treated     high
## 10 0.56  201   treated     high
## 11 1.10  207   treated     high
## 12 1.10  200   treated     high
## 13 0.02   67 untreated      low
## 14 0.02   51 untreated      low
## 15 0.06   84 untreated      low
## 16 0.06   86 untreated      low
## 17 0.11   98 untreated      low
## 18 0.11  115 untreated   medium
## 19 0.22  131 untreated   medium
## 20 0.22  124 untreated   medium
## 21 0.56  144 untreated   medium
## 22 0.56  158 untreated     high
## 23 1.10  160 untreated     high
## [VALUE]  INCORRECT
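
When breaks is a single number, cut() divides the range of the data into that many equal-width intervals, widening the outer limits slightly so the extremes are included. The sketch below builds the three equal-width intervals by hand with seq() and compares the resulting bins with those from cut(rate, 3); the names edges, by_hand, and by_count are illustrative.

```r
rate <- Puromycin$rate

# Four equally spaced edges from the minimum to the maximum rate
edges <- seq(min(rate), max(rate), length.out = 4)

by_hand  <- cut(rate, breaks = edges, labels = c("low", "medium", "high"),
                include.lowest = TRUE)   # keep the minimum in the first bin
by_count <- cut(rate, 3, labels = c("low", "medium", "high"))

all(by_hand == by_count)   # compare the two binnings element by element
table(by_count)            # number of rates in each category
```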

Computing Percentiles in R

Demonstration 3: Computing percentiles, and specifically the 6th percentile, of the Nile dataset

  1. In the Console, call quantile() on the Nile dataset to see its default output (the minimum, 1st quartile, median, 3rd quartile, and maximum). Then copy and paste the working R expression from the Console into the code chunk below, replacing NULL as the argument to quote(), to test its correctness.
answer <- quote(quantile(Nile))
print_and_check_expr(
  answer,
  value_key    = "70729c2977f73064eb5ffc9500de62bfae84ee3f842be2c09cc875665d712540",  
  required_fns = "quantile"
)
##     0%    25%    50%    75%   100% 
##  456.0  798.5  893.5 1032.5 1370.0 
## [VALUE]  CORRECT
## [CODE]   Uses quantile(): YES -- CORRECT
  2. In the Console, evaluate the expression 1:100/100 to see the numeric vector of proportions used to compute all 100 percentiles of the Nile data. Then copy and paste the working R expression from the Console into the code chunk below, replacing NULL as the argument to quote(), to test its correctness.
answer <- quote(1:100/100)
print_and_check_expr(
  answer,
  value_key = "4847c34597c5f6608bb98bd504bce9b304664a0b5c0756aec4a465f5a6879e72"
)
##   [1] 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.10 0.11 0.12 0.13 0.14 0.15
##  [16] 0.16 0.17 0.18 0.19 0.20 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 0.30
##  [31] 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39 0.40 0.41 0.42 0.43 0.44 0.45
##  [46] 0.46 0.47 0.48 0.49 0.50 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 0.60
##  [61] 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69 0.70 0.71 0.72 0.73 0.74 0.75
##  [76] 0.76 0.77 0.78 0.79 0.80 0.81 0.82 0.83 0.84 0.85 0.86 0.87 0.88 0.89 0.90
##  [91] 0.91 0.92 0.93 0.94 0.95 0.96 0.97 0.98 0.99 1.00
## [VALUE]  CORRECT
  3. In the Console, pass Nile as the first argument and probs = 1:100/100 as the second argument of quantile() to compute all 100 percentiles of the Nile data. Then copy and paste the working R expression from the Console into the code chunk below, replacing NULL as the argument to quote(), to test its correctness.
print_and_check_expr(
  quote(quantile(Nile, 1:100/100)),
  value_key    = "b1da29a7b765dc40c139585d5d2cbe03aea3f55224276c21c00b875c3b90b6f2", 
  required_fns = "quantile"
)
##      1%      2%      3%      4%      5%      6%      7%      8%      9%     10% 
##  647.07  675.46  691.52  693.92  697.80  700.82  701.93  713.04  717.64  725.20 
##     11%     12%     13%     14%     15%     16%     17%     18%     19%     20% 
##  738.46  741.76  743.74  744.00  745.70  748.52  757.30  763.10  767.24  770.40 
##     21%     22%     23%     24%     25%     26%     27%     28%     29%     30% 
##  773.37  779.46  792.55  796.76  798.50  800.48  809.03  812.72  814.42  819.20 
##     31%     32%     33%     34%     35%     36%     37%     38%     39%     40% 
##  821.69  823.36  828.69  831.66  832.65  836.20  839.26  843.10  845.00  845.00 
##     41%     42%     43%     44%     45%     46%     47%     48%     49%     50% 
##  845.59  847.16  854.84  861.12  863.10  864.54  869.77  874.00  882.16  893.50 
##     51%     52%     53%     54%     55%     56%     57%     58%     59%     60% 
##  898.96  903.40  908.82  913.84  916.90  918.44  920.72  928.04  937.05  941.60 
##     61%     62%     63%     64%     65%     66%     67%     68%     69%     70% 
##  949.46  958.76  961.11  965.16  971.10  978.06  984.66  988.56  994.31  999.50 
##     71%     72%     73%     74%     75%     76%     77%     78%     79%     80% 
## 1012.90 1020.00 1020.00 1022.60 1032.50 1040.00 1042.30 1050.00 1060.50 1100.00 
##     81%     82%     83%     84%     85%     86%     87%     88%     89%     90% 
## 1100.00 1101.80 1111.70 1120.00 1123.00 1140.00 1141.30 1151.20 1160.00 1160.00 
##     91%     92%     93%     94%     95%     96%     97%     98%     99%    100% 
## 1160.90 1170.80 1182.10 1210.00 1210.50 1220.40 1230.60 1250.20 1261.10 1370.00 
## [VALUE]  CORRECT
## [CODE]   Uses quantile(): YES -- CORRECT
  4. In the Console, press the up-arrow key to recall the last expression, which computes all percentiles of the Nile data. In R, you can select specific elements of a vector with the indexing operator ([i]), where i is an integer, or a vector of integers, between 1 and the length of the vector. Append [6] to the expression that evaluates to all percentiles of Nile to extract only the 6th element, which is the 6th percentile of the Nile data. Once it works, copy and paste the working R expression from the Console into the code chunk below, replacing NULL as the argument to quote(), to test its correctness.
print_and_check_expr(
  quote(quantile(Nile, 1:100/100)[6]),
  value_key    = "7e37b39f08125b5e4030df1d5d6cf610599a393ede2be464121f0e01b205c9f5",  
  required_fns = "quantile"
)
##     6% 
## 700.82 
## [VALUE]  CORRECT
## [CODE]   Uses quantile(): YES -- CORRECT
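
Indexing by position works here because the 6th element of 1:100/100 is 0.06. As a sketch of two equivalent routes, a single percentile can also be requested directly through probs, or selected by its name from the full vector (the variable names are illustrative):

```r
all_pct <- quantile(Nile, probs = 1:100/100)   # all 100 percentiles

by_position <- all_pct[6]                      # 6th element
by_name     <- all_pct["6%"]                   # element named "6%"
direct      <- quantile(Nile, probs = 0.06)    # only the 6th percentile

# Compare the three results
all.equal(by_position, by_name)
all.equal(by_position, direct)
```

Requesting the percentile directly avoids computing the other 99 values and does not depend on remembering which position holds which percentile.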

Computing and Labelling Percentiles on a Histogram

Demonstration 4: Computing and labelling percentiles for the bins of the default histogram of the Nile dataset

  1. To label the default Nile histogram with the proportion of the data in its bins, first compute the proportion of the Nile sample contained in each successive bar (or absence of a bar) from the return value of hist(), and show that the proportions sum to one:
return_value <- hist(Nile, plot = FALSE)
# density is count / (n * bin width); multiplying by the bin width (100) gives each bin's proportion
return_value$density * 100
##  [1] 0.01 0.00 0.05 0.20 0.25 0.19 0.12 0.11 0.06 0.01
print(paste0("Sum of values: ", sum(return_value$density * 100)))
## [1] "Sum of values: 1"
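
The factor of 100 in the expression above is the bin width: hist() reports density as count / (n × bin width), and the default Nile histogram has bins 100 units wide. A slightly more general sketch, which does not assume a particular width, takes the widths from the breaks or works from the counts directly (variable names are illustrative):

```r
h <- hist(Nile, plot = FALSE)

widths <- diff(h$breaks)                  # width of each bin
prop_from_density <- h$density * widths   # density times width
prop_from_counts  <- h$counts / sum(h$counts)

all.equal(prop_from_density, prop_from_counts)
sum(prop_from_counts)                     # proportions sum to 1
```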

Next we use the cumsum() function to compute cumulative sums of the elements:

cumsum(return_value$density * 100)
##  [1] 0.01 0.01 0.06 0.26 0.51 0.70 0.82 0.93 0.99 1.00

Notice that the second cumulative sum stays at 0.01 (1%): the second bin adds no observations, because no annual flow fell between 500 and 600 units.

Now let's turn these cumulative proportions into percentages and append a percent sign to each of them:

paste0(cumsum(return_value$density * 100) * 100, "%")
##  [1] "1%"   "1%"   "6%"   "26%"  "51%"  "70%"  "82%"  "93%"  "99%"  "100%"
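
As an alternative sketch for building the same labels, sprintf() formats each percentage with a fixed number of decimal places, which guards against stray digits from floating-point arithmetic. The proportions below are typed in from the cumsum() output above rather than recomputed, and the variable names are illustrative:

```r
# Cumulative proportions copied from the cumsum() output
cum_prop <- c(0.01, 0.01, 0.06, 0.26, 0.51, 0.70, 0.82, 0.93, 0.99, 1.00)

# "%.0f" rounds to a whole number; "%%" prints a literal percent sign
labels <- sprintf("%.0f%%", 100 * cum_prop)
labels
```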

Pass the previous expression as the optional labels argument to hist() to see the cumulative sample proportions labelled above the bars.

hist(Nile,
     main   = "Annual Flow of the River Nile (1871-1970)",
     xlab   = "Annual Flow (10^8 m^3)",
     ylim   = c(0, 30),
     labels = paste0(cumsum(return_value$density * 100) * 100, "%"))

Plotting Empirical Cumulative Distribution Functions

Demonstration 5: Plot the ECDF of the control arm of the cell cycle data

Pass the vector-valued expression c(34, 22, 12), representing the sample from the control arm of the cell cycle data, as the argument to ecdf() in the following expression:

plot(ecdf(c(34, 22, 12)),
     main = "ECDF of Cells/Dish, Control Arm of Cell Cycle Experiment")
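
ecdf() returns a function: evaluating it at x gives the proportion of the sample less than or equal to x. With three observations, the plotted ECDF rises by 1/3 at each of 12, 22, and 34, as this short sketch shows (Fn is an illustrative name):

```r
Fn <- ecdf(c(34, 22, 12))   # control-arm sample from the demonstration

Fn(11)      # 0: no observation is <= 11
Fn(12)      # 1/3
Fn(22)      # 2/3
Fn(34)      # 1: all observations are <= 34
knots(Fn)   # jump locations: 12 22 34
```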

Demonstration 6: Plot the ECDF of the Nile data

Modify the expression below to compute the ECDF of the Nile data.

plot(ecdf(c(Nile)), main = "ECDF of Nile Data")
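
The ECDF ties back to the cumulative percentages used to label the histogram. Because hist() uses right-closed bins by default, the ECDF evaluated at each bin's upper break should equal the cumulative proportion of observations up to and including that bin. A sketch of the check, with illustrative variable names:

```r
h  <- hist(Nile, plot = FALSE)
Fn <- ecdf(c(Nile))                               # c() drops the time-series attributes

upper_breaks  <- h$breaks[-1]                     # right edge of each bin
cum_from_hist <- cumsum(h$counts) / length(Nile)  # cumulative proportion per bin
cum_from_ecdf <- Fn(upper_breaks)                 # ECDF at each right edge

all.equal(cum_from_hist, cum_from_ecdf)
```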