Name: Jared Ali netID: 230005904 Collaborated with: no one :(
Your homework must be submitted in Word or PDF format, created by calling “Knit Word” or “Knit PDF” from RStudio on your R Markdown document. Submission in other formats may receive a grade of 0. Your responses must be supported by both textual explanations and the code you generate to produce your result. Note that all R code used to produce your results must be shown in your knitted file.
For each line of the following code, either explain why they should be erroneous, or explain what tasks the non-erroneous ones perform.
vector1 <- c(5, 12, TRUE, 32)
This line is not erroneous. c() combines its arguments into a single vector, and because a vector can hold only one type, the logical TRUE is coerced to the number 1. vector1 is therefore the numeric vector 5, 12, 1, 32.
max(vector1)
This works. Since TRUE was coerced to 1, max() compares four numbers and returns 32.
sort(vector1)
This also works. sort() returns a new vector with the elements in increasing order: 1, 5, 12, 32.
sum(vector1)
This works as well. sum() adds the elements and returns a single number: 5 + 12 + 1 + 32 = 50.
For each block of the following code, either explain why they should be erroneous, or explain what tasks the non-erroneous ones perform.
vector2 <- c(5,"7",12)
vector2[2] + vector2[3]
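The first line runs: because "7" is a character string, c() coerces every element to character, so vector2 is "5", "7", "12". The second line is erroneous, since + cannot be applied to character strings (non-numeric argument to binary operator). Removing the quotation marks around 7, or converting the elements with as.numeric(), makes the addition work.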
dataframe3 <- data.frame(z1="5",z2=7,z3=12)
dataframe3[1,2] + dataframe3[1,3]
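This block is not erroneous. A data frame can hold a different type in each column, so z1 is non-numeric while z2 and z3 stay numeric. dataframe3[1,2] and dataframe3[1,3] are the numbers 7 and 12, and their sum is 19.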
list4 <- list(z1="6", z2=42, z3="49", z4=126)
list4[[2]]+list4[[4]]
list4[2]+list4[4]
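A list can hold elements of different types, so the first line runs. The second line works because [[ ]] extracts the elements themselves, the numbers 42 and 126, and returns 168. The third line is erroneous because [ ] returns sub-lists of length one, and + is not defined for lists.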
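The coercion and indexing rules behind these answers can be checked with a short sketch. The object names (num.vec, chr.vec, chr.sum, lst, lst.sum) are hypothetical and chosen only for illustration.

```r
# Mixing numbers and a logical: TRUE is coerced to 1, so the vector is numeric.
num.vec <- c(5, 12, TRUE, 32)
class(num.vec)   # "numeric"
sum(num.vec)     # 50

# Mixing numbers and a string: every element is coerced to character.
chr.vec <- c(5, "7", 12)
class(chr.vec)   # "character"
# Convert back to numeric before doing arithmetic.
chr.sum <- as.numeric(chr.vec[2]) + as.numeric(chr.vec[3])  # 19

# A list keeps each element's type. [[ ]] extracts the element itself,
# while [ ] returns a sub-list, which cannot be added.
lst <- list(z1 = "6", z2 = 42, z3 = "49", z4 = 126)
lst.sum <- lst[[2]] + lst[[4]]   # 168
class(lst[2])                    # "list"
```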
seq(). Using the help
command ?seq to learn about the function, produce an
expression that will give you the sequence of numbers from 1 to 10000 in
increments of 369. Produce another that will give you a sequence between
1 and 10000 that is exactly 50 numbers in length (i.e., the first number
is 1 and the last number is 10000; and the differences between a pair of
consecutive numbers are the same).
seq(1, 10000, by = 369)
## [1] 1 370 739 1108 1477 1846 2215 2584 2953 3322 3691 4060 4429 4798 5167
## [16] 5536 5905 6274 6643 7012 7381 7750 8119 8488 8857 9226 9595 9964
seq(1, 10000, length.out = 50)
This returns 50 equally spaced values that start at 1 and end at 10000, with a common difference of 9999/49 (about 204.06). Using by = 200 instead also yields 50 values, but the sequence stops at 9801, so it does not meet the requirement that the last number be 10000.
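The two ways of specifying a sequence, by step size and by length, can be compared in a short sketch. The names by.seq and len.seq are hypothetical.

```r
# by fixes the step; the sequence stops at the last value not exceeding `to`.
by.seq <- seq(1, 10000, by = 369)
length(by.seq)    # 28
tail(by.seq, 1)   # 9964

# length.out fixes the number of values; both endpoints are included.
len.seq <- seq(1, 10000, length.out = 50)
length(len.seq)   # 50
range(len.seq)    # 1 10000
```

Checking diff(len.seq) is a quick way to confirm that consecutive differences are all equal.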
rep() repeats a vector some number of
times. Explain the difference between rep(1:3, times=3) and rep(1:3,
each=3).
rep(1:3, times=3) gives 1 2 3 1 2 3 1 2 3 (the whole vector is repeated three times).
rep(1:3, each=3) gives 1 1 1 2 2 2 3 3 3 (each element is repeated three times before moving on to the next).
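A minimal sketch of the two arguments side by side; the names times.rep and each.rep are hypothetical.

```r
times.rep <- rep(1:3, times = 3)   # 1 2 3 1 2 3 1 2 3
each.rep  <- rep(1:3, each = 3)    # 1 1 1 2 2 2 3 3 3

# Both contain the same elements; only the order differs.
identical(sort(times.rep), each.rep)   # TRUE
```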
The binomial distribution \(\mathrm{Bin}(m,p)\) is defined by the number of successes in \(m\) independent trials, each having probability \(p\) of success. Think of flipping a coin \(m\) times, where the coin is weighted to have probability \(p\) of landing on heads.
The R function rbinom() generates random variables with
a binomial distribution. E.g.,
rbinom(n=20, size=10, prob=0.5)
produces 20 observations from \(\mathrm{Bin}(10,0.5)\).
The following generates 300 binomial draws of 15 trials each for each of the
success probabilities 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, and 0.8,
storing the results in vectors called bins.draws.0.2,
bins.draws.0.3, bins.draws.0.4,
bins.draws.0.5, bins.draws.0.6,
bins.draws.0.7 and bins.draws.0.8. The means
are stored in the vector bin.draws.means.
set.seed(01202023) #for randomization; do not change
bins.draws.0.2 <- rbinom(300, size = 15, prob = 0.2)
bins.draws.0.3 <- rbinom(300, size = 15, prob = 0.3)
bins.draws.0.4 <- rbinom(300, size = 15, prob = 0.4)
bins.draws.0.5 <- rbinom(300, size = 15, prob = 0.5)
bins.draws.0.6 <- rbinom(300, size = 15, prob = 0.6)
bins.draws.0.7 <- rbinom(300, size = 15, prob = 0.7)
bins.draws.0.8 <- rbinom(300, size = 15, prob = 0.8)
bin.draws.means <- c(
mean(bins.draws.0.2),
mean(bins.draws.0.3),
mean(bins.draws.0.4),
mean(bins.draws.0.5),
mean(bins.draws.0.6),
mean(bins.draws.0.7),
mean(bins.draws.0.8)
)
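Since a \(\mathrm{Bin}(m,p)\) variable has mean \(mp\), each sample mean should fall near \(15p\). The sketch below checks this with sapply() as a compact alternative to seven separate calls. The names probs, sim.means and theory.means and the seed are hypothetical, so these draws differ from the ones above.

```r
set.seed(1)  # hypothetical seed, used only to make this sketch reproducible
probs <- seq(0.2, 0.8, by = 0.1)

# One sample mean of 300 draws from Bin(15, p) for each p.
sim.means <- sapply(probs, function(p) mean(rbinom(300, size = 15, prob = p)))

# Theoretical means m * p.
theory.means <- 15 * probs

# Absolute gap between simulated and theoretical means.
abs(sim.means - theory.means)
```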
a. Create a matrix called bin.matrix, whose columns contain the 7 vectors we’ve
created, in order of the success probabilities of their underlying
binomial distributions (0.2 through 0.8). Hint: use
cbind().
bin.matrix <- cbind(bins.draws.0.2, bins.draws.0.3, bins.draws.0.4, bins.draws.0.5, bins.draws.0.6, bins.draws.0.7, bins.draws.0.8)
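The difference between cbind() and matrix() can be seen on a small case. The vectors a and b and the matrices m and m2 are hypothetical.

```r
a <- 1:3
b <- 4:6

# cbind() takes the vectors as separate arguments and makes each one a column.
m <- cbind(a, b)
dim(m)        # 3 2
colnames(m)   # "a" "b"

# matrix() takes a single vector, so the inputs must be combined with c() first.
m2 <- matrix(c(a, b), ncol = 2)
all(m == m2)  # TRUE
```

cbind() also keeps the vector names as column names, which matrix() does not.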
b. Print the first five rows of bin.matrix. Print the
element in the 66th row and 5th column. Compute the largest element in
the first column. Compute the largest element in all but the first
column.
bin.matrix[1:5, ] # first five rows
bin.matrix[66, 5] # element in row 66, column 5
max(bin.matrix[, 1]) # largest element in the first column
max(bin.matrix[, -1]) # largest element in all but the first column
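Matrix indexing of the kind part b asks for can be tried on a small example. The matrix m below is hypothetical.

```r
m <- matrix(1:12, nrow = 4)   # 4 x 3 matrix, filled column by column

m[1:2, ]       # first two rows, all columns
m[4, 3]        # element in row 4, column 3: 12
max(m[, 1])    # largest element in the first column: 4
max(m[, -1])   # largest element in all but the first column: 12
```

Note that max.col() answers a different question: for each row it returns the position of the column holding that row's maximum, not the maximum itself.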
c. Calculate the column means of bin.matrix by using just
a single function call.
colMeans(bin.matrix) # column means
d. Compare the column means of bin.matrix to bin.draws.means, in two ways. First, using ==,
and second, using identical(). What do the two ways report?
Are the results compatible? Explain.
colMeans(bin.matrix) == bin.draws.means
identical(colMeans(bin.matrix), bin.draws.means)
The two ways do not report the same thing. == compares element by element and returns one logical value per column, each of which should be TRUE because the same draws are being averaged. identical() returns a single FALSE, because colMeans() returns a vector that carries the column names while bin.draws.means has no names. The results are compatible: the values match, but the two objects are not exactly the same.
e. Take the transpose of bin.matrix and then take row means. Are these
the same as what you just computed? Should they be?
rowMeans(t(bin.matrix))
These are the same as the column means of bin.matrix, and they should be: transposing turns each column into a row, so the row means of the transpose are the column means of the original matrix.
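The behavior of == versus identical(), and of rowMeans() on a transpose, can be illustrated on a small matrix with named columns. The names m, col.means and plain.means are hypothetical.

```r
m <- cbind(a = c(1, 2, 3), b = c(4, 5, 6))

col.means <- colMeans(m)     # named vector: a = 2, b = 5
plain.means <- c(2, 5)       # same values, no names

col.means == plain.means                    # TRUE TRUE (values only)
identical(col.means, plain.means)           # FALSE (names attribute differs)
identical(unname(col.means), plain.means)   # TRUE once the names are dropped

# Row means of the transpose equal the column means of the original.
identical(rowMeans(t(m)), col.means)        # TRUE
```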
f. Convert bin.matrix into a list using
as.list() and save the result as bin.list.
Find the number of bytes used to store bin.matrix and
bin.list. How many megabytes (MB) is this, for each object?
Which object requires more memory, and why do you think this is the
case? Remind yourself: why are lists special compared to vectors, and is
this property important for the current purpose (storing the binomial
draws)? Hint: look at the help page for object.size to see
how to change the units to MB.
bin.list <- as.list(bin.matrix)
object.size(bin.matrix) # bytes
object.size(bin.list) # bytes
format(object.size(bin.list), units = "MB") # same size, reported in MB
bin.list takes up more memory than bin.matrix. The matrix stores its 2100 integers in one contiguous block at 4 bytes each, roughly 0.008 MB plus a small fixed overhead. The list instead stores a pointer to each of its 2100 elements, and every element is a separate length-one vector with its own header, which comes to about 64 bytes per draw on a typical 64-bit build, or roughly 0.13 MB. Lists are special because their elements may have different types and lengths. That flexibility is not needed here, since every draw is a single integer, so the matrix is the better container.
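The memory gap between an atomic vector and a list of the same values can be measured directly. This is a minimal sketch on hypothetical data (x.vec, x.list); exact byte counts depend on the R build, so none are stated.

```r
x.vec <- rep(7L, 1000)      # integer vector of length 1000
x.list <- as.list(x.vec)    # list of 1000 length-one integer vectors

vec.bytes <- as.numeric(object.size(x.vec))
list.bytes <- as.numeric(object.size(x.list))

list.bytes > vec.bytes      # TRUE: the list carries per-element overhead
list.bytes / vec.bytes      # how many times larger the list is

# object.size() results can be reported in other units with format().
format(object.size(x.list), units = "KB")
```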
R’s capacity for data storage and computation is very large compared
to what was available 10 years ago. The following code generates 5
million numbers from a \(\mathrm{Bin}(1 \times 10^6, 0.5)\) distribution and stores them in a vector called
big.bin.draws.
big.bin.draws <- rbinom(n = 5e6, size = 1e6, prob = 0.5)
a. Create a new vector called big.bin.draws.standardized,
which is given by taking big.bin.draws, subtracting off its
mean, and then dividing by its standard deviation. Calculate the mean
and standard deviation of big.bin.draws.standardized.
(These should be 0 and 1, respectively, or very close to it; if not,
you’ve made a mistake somewhere).
big.bin.draws.standardized <- (big.bin.draws - mean(big.bin.draws)) / sd(big.bin.draws)
mean(big.bin.draws) # mean of the raw draws
## [1] 499999.7
sd(big.bin.draws) # sd of the raw draws
## [1] 500.1553
mean(big.bin.draws.standardized) # should be 0, up to floating-point error
sd(big.bin.draws.standardized) # should be 1
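Operator precedence matters when standardizing: division binds more tightly than subtraction, so the parentheses must wrap the centering step. A small sketch on hypothetical data (x, wrong, right):

```r
x <- c(2, 4, 6, 8)

wrong <- x - mean(x) / sd(x)     # evaluates as x - (mean(x) / sd(x))
right <- (x - mean(x)) / sd(x)   # center first, then scale

mean(right)   # 0, up to floating-point error
sd(right)     # 1, up to floating-point error
mean(wrong)   # not 0: only a constant was subtracted from x
```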
b. Convert big.bin.draws into a list using
as.list() and save the result as
big.bin.draws.list. Check that you indeed have a list by
calling class() on the result. Check also that your list
has the right length, and that its 1159th element is equal to that of
big.bin.draws.
big.bin.draws.list <- as.list(big.bin.draws)
class(big.bin.draws.list)
## [1] "list"
length(big.bin.draws.list)
## [1] 5000000
big.bin.draws.list[[1159]]
## [1] 499202
big.bin.draws[1159]
## [1] 499202
c. The code below uses lapply() to standardize every element of
big.bin.draws.list. Note that lapply() applies
the function supplied in the second argument to every element of the
list supplied in the first argument, and then returns a list of the
function outputs. (We’ll learn much more about the apply()
family of functions later in the course.) Did this lapply()
command take longer to evaluate than the code you wrote in part a? (It
should have; otherwise your previous code could have been improved, so
go back and improve it.) Why do you think this is the case?
big.bin.draws.mean = mean(big.bin.draws)
big.bin.draws.sd = sd(big.bin.draws)
standardize = function(x) {
return((x - big.bin.draws.mean) / big.bin.draws.sd)
}
big.bin.draws.list.standardized.slow = lapply(big.bin.draws.list, standardize)
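The lapply() command did take longer to evaluate. lapply() calls the R function standardize() separately for each of the 5 million list elements, so every call pays the interpreter's function-call overhead and allocates a new length-one result. The vectorized expression in part a performs the same arithmetic in a single pass through compiled code over one contiguous vector.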
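The timing difference can be measured with system.time(). The sketch below uses a smaller hypothetical vector (x, of length 1e5) so that it runs quickly; the actual timings depend on the machine, so none are stated.

```r
x <- as.numeric(1:1e5)
x.list <- as.list(x)
x.mean <- mean(x)
x.sd <- sd(x)

# Vectorized: one arithmetic expression over the whole vector.
vec.time <- system.time(vec.out <- (x - x.mean) / x.sd)

# Element by element: one R function call per list element.
list.time <- system.time(
  list.out <- lapply(x.list, function(v) (v - x.mean) / x.sd)
)

# Both approaches give the same values.
all.equal(unlist(list.out), vec.out)   # TRUE

vec.time["elapsed"]
list.time["elapsed"]
```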
d. Find the number of bytes used to store big.bin.draws
and big.bin.draws.list. How many megabytes (MB) is this,
for each object? Which object requires more memory, and why do you think
this is the case? Discuss any additional observations compared to part f
of the previous question.
object.size(big.bin.draws)
## 20000048 bytes
object.size(big.bin.draws.list)
## 320000048 bytes
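big.bin.draws takes 20.000048 MB and big.bin.draws.list takes 320.000048 MB, so the list is about 16 times larger. The integer vector stores its 5 million values contiguously at 4 bytes each. The list stores a pointer to each element, and each element is a separate length-one integer vector with its own header, which works out to 64 bytes per draw. As with bin.matrix and bin.list in part f of the previous question, the list costs far more per element; at this scale the overhead amounts to about 300 MB.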
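The per-element cost follows from the two reported sizes by simple arithmetic. The sketch assumes the 48 bytes left over in each figure are a fixed header; the names n, vec.bytes, list.bytes and header are hypothetical.

```r
n <- 5e6                  # number of draws
vec.bytes <- 20000048     # object.size(big.bin.draws), as reported above
list.bytes <- 320000048   # object.size(big.bin.draws.list), as reported above
header <- 48              # assumed fixed overhead per object

(vec.bytes - header) / n    # 4 bytes per element in the vector
(list.bytes - header) / n   # 64 bytes per element in the list
list.bytes / vec.bytes      # about 16
vec.bytes / 1e6             # 20.000048 (decimal MB)
list.bytes / 1e6            # 320.000048 (decimal MB)
```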