Assignment 1_CAP3330

Question 1 (60 points):

1a) Write a statement in R that allows you to create a vector x with the numbers {10, 20, 30, 44, 55, 10, 30, 50, 32, 30, 46}. Then, write another statement that inserts the number 26 between the second-to-last and the last element in x, and assign the result of this insertion to x again.

x = c(10, 20, 30, 44, 55, 10, 30, 50, 32, 30, 46) # creating a vector
x # displaying result

##  [1] 10 20 30 44 55 10 30 50 32 30 46

x = c(x[1:(length(x)-1)], 26, x[length(x)]) # creating a new vector from the first through the second-to-last element of x, then the number 26, then the last element of x
x # displaying result

##  [1] 10 20 30 44 55 10 30 50 32 30 26 46
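The same insertion can also be sketched with base R's append(), whose `after` argument gives the position after which the new value is placed. The name x_orig below is a hypothetical copy of the original 11-element vector, used so the example runs on its own; this is an alternative sketch, not the required answer.

```r
# Hypothetical copy of the original 11-element vector (not a name from the assignment)
x_orig <- c(10, 20, 30, 44, 55, 10, 30, 50, 32, 30, 46)

# Insert 26 after position length - 1, i.e. between the second-to-last and last elements
x_new <- append(x_orig, 26, after = length(x_orig) - 1)
x_new
```

Both approaches give the same 12-element vector; append() just avoids writing the two index ranges by hand.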

1b) Compute the median of x WITHOUT using the median() function available in R. Show your work.

x_sorted = sort(x)  # sort the vector 

n = length(x_sorted)  # number of elements in x
if (n %% 2 == 1) {
  median_value = x_sorted[(n + 1) %/% 2] # Median for odd number of elements 
} else {
  median_value = (x_sorted[n %/% 2] + x_sorted[(n %/% 2) + 1]) / 2 # Median for even number of elements 
}

median_value # displaying result

## [1] 30

# the median value = 30

Note: In case you need a reminder of how to compute the median, visit this webpage: https://en.wikipedia.org/wiki/Median

Note: This exercise is similar to the one we did in class where we computed the mean of a sample without using the mean() function.
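As a sanity check on the manual approach, the same logic can be wrapped in a small function and compared against R's built-in median() for both an even-length and an odd-length vector. The names manual_median and x_check are hypothetical and used only for illustration. The built-in is called only to verify the result, not to compute it.

```r
# Hypothetical helper: median computed from the sorted vector, without median()
manual_median <- function(v) {
  s <- sort(v)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) %/% 2]                       # odd length: the middle element
  } else {
    (s[n %/% 2] + s[n %/% 2 + 1]) / 2      # even length: average of the two middle elements
  }
}

# Hypothetical copy of x after inserting 26 (12 elements)
x_check <- c(10, 20, 30, 44, 55, 10, 30, 50, 32, 30, 26, 46)

manual_median(x_check) == median(x_check)            # even-length case
manual_median(x_check[-1]) == median(x_check[-1])    # odd-length case (first element dropped)
```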

1c) Create a data frame with the following four columns: x (from the previous question), x1, x2, and y. Name the data frame DFAssg1.

x1 and x2 are the result of generating 12 random numbers from a normal distribution with a mean of 5 and standard deviation of 1. The numbers in x1 and x2 must be rounded to 1 decimal place. Use the following code to create x1 and x2:

set.seed (1)
x1= rnorm(n= 12, mean= 5, sd= 1)
x1= round(x1, 1)

set.seed (2)
x2= rnorm(n= 12, mean= 5, sd= 1)
x2= round(x2, 1)

y is a character vector with four A’s, followed by four B’s, and then followed by four C’s. You MUST create the vector y by using the rep() function.

y <- rep(c('A', 'B', 'C'), each = 4) # creating the y vector using the rep() function; each = 4 repeats every letter four times before moving to the next
DFAssg1 <- data.frame(x = x, x1 = x1, x2 = x2, y = y) # combining x, x1, x2, and y into a data frame
DFAssg1 # displaying result

1d) Use the data frame DFAssg1 as a reference. Write only ONE LINE of code that returns the lowest value of x for each of the letters A, B, and C (i.e., what’s the lowest value of x when y equals A, what’s the lowest value of x when y equals B, and what’s the lowest value of x when y equals C).

aggregate(DFAssg1$x, by = list(DFAssg1$y), FUN = min) # using the aggregate() function to find the lowest value of x for each letter (group). Based on the output, the lowest value of x is 10 for A, 10 for B, and 26 for C.
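An alternative one-liner for the same grouped minimum is tapply(), which returns a named vector rather than a data frame. The sketch below rebuilds only the x and y columns in a hypothetical data frame df_demo so that it runs on its own.

```r
# Hypothetical stand-in for DFAssg1 with just the two columns needed here
df_demo <- data.frame(
  x = c(10, 20, 30, 44, 55, 10, 30, 50, 32, 30, 26, 46),
  y = rep(c("A", "B", "C"), each = 4)
)

# Apply min() to x within each level of y
min_by_letter <- tapply(df_demo$x, df_demo$y, min)
min_by_letter
```

The result is indexed by letter, so min_by_letter["C"] retrieves the minimum for group C directly.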

1e) Use the data frame DFAssg1 as a reference. Write only ONE LINE of code that returns the average of x1 and x2.

mean(c(DFAssg1$x1, DFAssg1$x2)) # calculating the mean of the combined vector of x1 and x2 from the DFAssg1 data frame.

## [1] 5.2875

1f) Use the data frame DFAssg1 as a reference. Write only ONE LINE of code that returns the values of x2 for records (rows) where the letter is B or C (i.e., return the values of x2 excluding the rows where the letter is A)

DFAssg1$x2[DFAssg1$y %in% c('B', 'C')] # keeping the values of x2 for rows where y is B or C, which excludes the rows where y is A.

## [1] 4.9 5.1 5.7 4.8 7.0 4.9 5.4 6.0

1g) Add a new column to the data frame DFAssg1 with a logical vector indicating whether each value of x1 is greater than the average (i.e., the mean) of x1. Name this new column Is_high_value.

DFAssg1$Is_high_value <- DFAssg1$x1 > mean(DFAssg1$x1)  # calculating the mean of x1, then creating a new column named Is_high_value that is TRUE where x1 is greater than that mean and FALSE otherwise
DFAssg1 # displaying result

1h) Display the duplicate elements in x (i.e., the values that repeat), without repeating each value more than once. For example, if the value 32 repeats four times, do not return 32 repeated four times. Only return 32 once.

duplicates <- unique(x[duplicated(x)])  # identifying the repeated values in x using the duplicated() function, then keeping each of them only once using the unique() function
duplicates

## [1] 10 30

Hint: The functions duplicated() and unique() may be helpful.

Hint: This question does not require a single line of code, so feel free to use more than one statement if you need to.
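To make the two hinted functions concrete, the following sketch shows the intermediate results step by step. The names x_demo and is_repeat are hypothetical; x_demo is a copy of x after the insertion of 26.

```r
# Hypothetical copy of x (12 elements)
x_demo <- c(10, 20, 30, 44, 55, 10, 30, 50, 32, 30, 26, 46)

# duplicated() flags an element as TRUE when the same value appeared earlier in the vector
is_repeat <- duplicated(x_demo)
which(is_repeat)             # positions 6, 7 and 10

x_demo[is_repeat]            # 10 30 30: the value 30 shows up twice because it occurs three times
unique(x_demo[is_repeat])    # 10 30: each repeated value listed once
```

The unique() step matters only when a value occurs three or more times, as 30 does here.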

1i) Write only ONE LINE of code that returns the count of how many elements in x are greater than 30.

sum(x > 30)  # x > 30 gives a logical vector, and sum() counts its TRUE values. 5 elements in x are greater than 30.

## [1] 5

Question 2 (15 points): Simulating the roll of a pair of dice 1000 times.

2a) One can simulate 1000 rolls of a fair die using the R function sample(6, 1000, replace=TRUE). Using this function twice, store 1000 simulated rolls of the first die in the variable die1 and 1000 simulated rolls of the second die in the variable die2.

die1 <- sample(1:6, 1000, replace = TRUE) # creating the variables die1 and die2, using the sample() function to store 1000 simulated rolls in each
die2 <- sample(1:6, 1000, replace = TRUE)

2b) For each pair of rolls, compute the sum of rolls, and store the sums in the variable die.sum.

die.sum <- die1 + die2   # creating the variable die.sum as the element-wise sum of each pair of rolls from die1 and die2
die.sum # displaying result

##    [1]  2  9  2  6  6 10  7  9  9  8 10  6  4  8 10 12 12  7  4  9  8  4  5  3
##   [25]  6  3  6  5 11 12 10  6  6  6 10 11  4 11 10  9  7 10  5  6  3  8  8  9
##   [49]  7  9 12 11 11  7 10 11  4  5  7  8  7  5 12  2  8  4  3  6 10 11  7  6
##   [73]  8  4  3  6  5  7  2  8 12  2  7  8  9  8  4  9  6  8 10  7  8  5  9  8
##   [97]  4  9  6  8  6  4 10  4  9  4  6  8  4 10  9 10  5  7  4  7 11  7  5  9
##  [121]  5  7  5  6  9  7  9  8  6  6  8  5  5  5  8 10  9  3 10  8  7  9  8 12
##  [145]  9  4  6  7  7  7  7 12  9  7  4 10  9  4 11  5 11 10  7  9  5  8  9  8
##  [169] 11  2  9 12  5 12  4 12 10 10 11  9  4  4  9  7  4  8 11  4  7 10  3  6
##  [193]  5  6  6 11  5  6  6  8  7  6  7  4  7 10  9  6  6  7  7  7  6  6 10  7
##  [217]  8  7  7  7  4 10  6  7  6  6 11  8  7 10 11  5  7  3  7  6 12  6  4  6
##  [241] 11  8 10  9  7  8 12  7  5  7  8 11  4  2  8  5 10  5  8  4  9  6  6 12
##  [265]  7  3  7  5 10  7  8  7  5 10  7 12  8  7  9  4 12  5  6  2  6  8  7  7
##  [289]  7  5  5  5  2  5  2  5  6  6  7  7  5  9  7  8 10 10  9  5  5  4  7  5
##  [313] 11 11  8  5  7  6  7  7  6  4  5  9  7  2  4  8  6  7 10  6  2  7  7  5
##  [337] 12  6  6  6  8  8 10 11  8 10  8  4  6  7  5  6  7  7  8  3  9  5 10 10
##  [361]  5 10 12  5  7  2  9  5  3  4  9  3  4  7  9  2  3  2 12  9 11  4  4  6
##  [385]  6  6  4  9  4 10  2  8  7  7  5  4  6 10  4  9  5  2  4  2  6  8 11 11
##  [409]  9  7 10  6  4  7  4  8  7  8  7  5 12  9  4  8  7 11  6  9  6 12  4  5
##  [433]  8  7  9 11  7 11  6  9  6  9  6  9  3  4  7  9  7  7  4  5  3  7  7 11
##  [457]  5 11  6  8  6 12  6  5  2  6  9 12  8  7  7  8 12  5  6  5  4  6  7  6
##  [481]  3  5  8  8  7 10  5  3 11  4  6  2  9  8 12  5  7  7 10  4 11  2  9  4
##  [505] 10  8 12  7  4  6  7  6  3 10  4  4  9  4  9  6  6  5  7 12  8 11  6  5
##  [529]  9  7 11  3  5  6  7  6  3  4  5  7  6  9 11  4  3  4 12  3  9 10  8  9
##  [553]  8 10  4  9  8  9 10  8 12  9  7  6  9  3 10  7  8  5  4  6  6  7  4  5
##  [577] 11  9  6  7  7  6  7  2  8  6  6  8  8  7 11  4 11  8  4  4  8  7  7  7
##  [601]  6  9  5  8  5  5  8  8  4  2  3  7  8 10  7  6  8  6 10  7  5  6  5 10
##  [625]  7  9  4  7  5  8  7 11 10  6  9 10 10  9  7  8  6 11 10  6  8  8  8 11
##  [649] 10  5  7  6 10  6  7  5  9 12  9  9 10  6  7 10  7 11  9  6  8  4 10  3
##  [673]  6  8  6  7  6  8  5  4  4  8  7  6  3  5  6  7  9  8  8  9  4 11  9 11
##  [697]  4  9  8  6  4  4 11 11 11  6  6  2  9  6  8  9  8 10  8 10  6  9  5  9
##  [721]  6 10  7  7  6  8  4  5  8 11  3  6  8  9  3 10  7  7  6  5  9 10  5  6
##  [745]  6  8 10  6  8  9  5  6  7 10  5  6  5 11  6  3  7  5  9  8  9  9  7  8
##  [769]  8  7 10  7 11  8  9  5  4 11  7  8 12  8  5  3  8  9 10 11  4 12  2  7
##  [793]  5  5  2  6  5  4  5  6 11  6  7  9  7  5  8 12  2 11  8 12  7  4  7  9
##  [817]  5  9 12  8  5  3 10  8  8  5  6 11  9  6  8 11 11  7  5  8  9 10  3 10
##  [841]  7  7  8  8  7  7  4  9  8  5  7 11  3  7  5  6  8 10  5  3  4 11  8  6
##  [865]  5  8  6  9  2  7  4  8  5  8  5  3 11 10 11  7  6 10  3 10  4  8 10  7
##  [889]  4  8  3  7  7  7 11  8  7  6  7  9  3 10 10  8  8  3  6  7  2  7 11  7
##  [913]  9  8  5  8  6  4  9  5  6  3  6  3  8  9 12  6 10  9  2 10  3  5  8  5
##  [937]  8  6  7  8  8  9  6  8 11  5 11  4  6  4 12  8  7  8  8  8 10  7  7  3
##  [961]  2  3  7 11  7  9  2  4  8  2 10  8  5  4 10  5  7  9  3  9 10  5  9  7
##  [985]  4  7  3 10  7  7  6 12  3  8 10  3  8  7  5  7

2c) Use the function table() to tabulate the values of the sum of die rolls (i.e., the values in die.sum from part b). How many times does the sum of rolls equal 1? How many times does the sum of rolls equal 2?

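table_result <- table(die.sum)  # tabulating how often each sum occurs

# counting how many times the sum equals 1 and 2
count_sum_1 <- sum(die.sum == 1)   # the value 1 never appears in the table, because the sum of two dice is at least 2
count_sum_2 <- table_result["2"]   # indexing by name; table_result[2] is the second entry of the table, which is the count for a sum of 3

count_sum_1  # number of times the sum equals 1

## [1] 0

count_sum_2  # number of times the sum equals 2

##  2 
## 33

# the sum of rolls = 1 zero (0) times, since the smallest possible sum of two dice is 2
# the sum of rolls = 2, 33 times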
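One caveat when reading a table() result is that position and name are different things: the first entry holds the smallest value that actually occurred, not the value 1. A minimal sketch of one way to make impossible sums show up with a count of zero is to tabulate a factor with explicit levels. The seed 123 and the names rolls and counts are assumptions made here for reproducibility and illustration; they are not part of the assignment.

```r
set.seed(123)  # arbitrary seed, assumed only so the sketch is reproducible

# Sum of two simulated dice, 1000 times
rolls <- sample(6, 1000, replace = TRUE) + sample(6, 1000, replace = TRUE)

# Declaring levels 1 to 12 keeps a slot for every value, including those that never occur
counts <- table(factor(rolls, levels = 1:12))

counts["1"]   # count for a sum of 1: always 0 with two dice
counts["2"]   # count for a sum of 2, looked up by name rather than by position
```

With explicit levels, position k and name "k" refer to the same sum, so positional indexing no longer misleads.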

Question 3 (25 points):

Use R to generate a sample of 20 values. These 20 values are going to be 20 random numbers from 1 to 100. Save the generated sample in a vector called “sample_of_20”

3a) Create a new vector called “standardized_values” that contains the standardized values of sample_of_20. The formula for standardizing a value is:

$$\text{standardized value} = \frac{\text{value} - \text{mean}}{\text{sd}}$$

where mean is the mean of the sample and sd is the standard deviation of the sample.

sample_of_20 <- sample(1:100, 20, replace = TRUE) # Generating a sample of 20 random values from 1 to 100

# Calculate the mean and the standard deviation of the sample. 
sample_mean <- mean(sample_of_20)
sample_sd <- sd(sample_of_20)

#standardize the values 

standardized_values <- (sample_of_20 - sample_mean) / sample_sd


sample_of_20  # displaying the generated sample

##  [1]  33  48   1   6  14   9  43  81  17 100  96  28  39  67  43  93  34  79  78
## [20]  99

standardized_values# displaying result

##  [1] -0.5185519 -0.0715244 -1.4722106 -1.3232014 -1.0847867 -1.2337959
##  [7] -0.2205336  0.9119361 -0.9953812  1.4781709  1.3589636 -0.6675611
## [13] -0.3397409  0.4947104 -0.2205336  1.2695581 -0.4887501  0.8523324
## [19]  0.8225306  1.4483691
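As a cross-check, base R's scale() function performs the same centering and scaling with its default arguments, so it should agree with the manual formula. The sketch below reuses the 20 sample values printed in the output above; the names sample_demo, manual_z, and scaled_z are hypothetical.

```r
# Sample values copied from the sample_of_20 output shown above
sample_demo <- c(33, 48, 1, 6, 14, 9, 43, 81, 17, 100,
                 96, 28, 39, 67, 43, 93, 34, 79, 78, 99)

# Manual standardization: subtract the mean, divide by the standard deviation
manual_z <- (sample_demo - mean(sample_demo)) / sd(sample_demo)

# scale() returns a one-column matrix, so as.vector() flattens it for comparison
scaled_z <- as.vector(scale(sample_demo))

all.equal(manual_z, scaled_z)
```

all.equal() is used instead of == because the two computations may differ by floating-point rounding.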

3b) Compute the mean of standardized_values WITHOUT using the mean() function. Round to three decimal places.

# computing the mean of standardized_values without using the mean() function: summing all the values and dividing by the total number of elements, then using the round() function to round the result to 3 decimal places.
mean_standardized <- sum(standardized_values) / length(standardized_values)
mean_standardized <- round(mean_standardized, 3)
mean_standardized # displaying result

## [1] 0

3c) Compute the variance of standardized_values WITHOUT using the var() function. Round to three decimal places.

mean_standardized <- sum(standardized_values) / length(standardized_values) # recalculating the unrounded mean of standardized_values
squared_diff <- (standardized_values - mean_standardized)^2 # computing the squared difference between each value and the mean and storing it in the squared_diff vector
variance_standardized <- sum(squared_diff) / (length(standardized_values) - 1) # calculating the sample variance as the sum of squared differences divided by (n - 1), where n is the number of elements in standardized_values
variance_standardized <- round(variance_standardized, 3) # rounding to three decimal places
variance_standardized # displaying result

## [1] 1
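The results 0 and 1 do not depend on which 20 numbers were drawn. A short derivation shows why, assuming the sample standard deviation is nonzero and uses the n - 1 denominator (as sd() in R does). The symbols are introduced here for illustration: x_i are the sample values, x-bar and s are their mean and standard deviation, and z_i = (x_i - x-bar) / s are the standardized values.

```latex
\begin{align*}
\bar{z} &= \frac{1}{n}\sum_{i=1}^{n} \frac{x_i - \bar{x}}{s}
         = \frac{1}{ns}\left(\sum_{i=1}^{n} x_i - n\bar{x}\right) = 0, \\
s_z^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (z_i - \bar{z})^2
       = \frac{1}{s^2}\cdot\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2
       = \frac{s^2}{s^2} = 1.
\end{align*}
```

In floating-point arithmetic the computed mean is typically a tiny nonzero number rather than exactly 0, which is why rounding to three decimal places is a sensible way to report it.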

Assignment 1_CAP3330_Fall2023