Dukum_Dian (50 points)

Test Items starts from here - There are 5 sections - 50 points total

Read each question carefully and address each element. Do not output contents of vectors or data frames unless requested.

Section 1: (8 points) This problem deals with vector manipulations.

(1)(a) Create a vector that contains the following: * The integer sequence 1 to 5, inclusive, * The square root of 2, * The product of 17 and 14, and * Three (3) repetitions of the vector c(2.5, 5, 7.5).

Assign the vector to the name q1_vector and output. You will use q1_vector for the following four (4) questions.

q1_vector <- c(seq(1,5, by=1), sqrt(2), (17*4), rep(c(2.5, 5, 7.5), 3))
print(q1_vector )

##  [1]  1.000000  2.000000  3.000000  4.000000  5.000000  1.414214 68.000000
##  [8]  2.500000  5.000000  7.500000  2.500000  5.000000  7.500000  2.500000
## [15]  5.000000  7.500000

(1)(b) Remove all elements of q1_vector greater than 7. Assign this new vector, with only values equal to or less than 7, to the name q1_vector_7. What is the length of q1_vector_7?

q1_vector_7 <- q1_vector[q1_vector <= 7]
length(q1_vector_7)

## [1] 12

(1)(c) Sort q1_vector in ascending order and assign the sorted vector to the name q1_vector_sorted. What is the sum of the 5th through 10th elements of q1_vector_sorted, inclusive?

q1_vector_sorted <- sort(q1_vector, decreasing = FALSE)
sum(q1_vector_sorted[5:10])

## [1] 22

(1)(d) Square each element of q1_vector and assign the new, squared value vector to the name q1_vector_sqrd. How many elements of q1_vector_sqrd are greater than 25?

q1_vector_sqrd <- q1_vector^2
length(q1_vector_sqrd[q1_vector_sqrd > 25])

## [1] 4

(1)(e) Remove the first and last elements of q1_vector. Assign the two (2) removed elements to the name q1_vector_short. What is the product of the elements of q1_vector_short?

q1_vector_short <- c(head(q1_vector,1), tail(q1_vector, 1))
prod(q1_vector_short)

## [1] 7.5

Section 2: (10 points) The expression y = sin(x/2) - cos(x/2) is a trigonometric function.

(2)(a) Create a user-defined function - via function() - that implements the trigonometric function above, accepts numeric values, “x,” calculates and returns values “y.”

mytrig <- function(x){
  if(is.numeric(x)==TRUE){
    y <- sin(x/2) - cos(x/2)
  }
  else{
    y <- 'Please enter numeric value!'
  }
  
  return(y)
  
}

(2)(b) Create a vector, x, of 4001 equally-spaced values from -2 to 2, inclusive. Compute values for y using the vector x and your function from (2)(a). Do not output x or y. Find the value in the vector x that corresponds to the minimum value in the vector y. Restrict attention to only the values of x and y you have computed; i.e. do not interpolate. Round to 3 decimal places and output both the minimum y and corresponding x value.

Finding the two desired values can be accomplished in as few as two lines of code. Do not use packages or programs you may find on the internet or elsewhere. Do not output the other elements of the vectors x and y. Relevant coding methods are given in the Quick Start Guide for R.

x <- seq(-2,2, length.out=4001)
y <- mytrig(x)

sprintf("Minimum value of y is: %.3f, when x = %.3f", round(min(y),3) , round(x[which.min(y)] , 3))

## [1] "Minimum value of y is: -1.414, when x = -1.571"

(2)(c) Plot y versus x in color, with x on the horizontal axis. Show the location of the minimum value of y determined in 2(b). Show the values of x and y corresponding to the minimum value of y in the display. Add a title and other features such as text annotations. Text annotations may be added via text() for base R plots and geom_text() or geom_label() for ggplots.

y_coor <- round(max(y), 3)
x_coor <- round(x[which.max(y)], 3)
plot(x,y, type="l" , col="blue")
title(main = 'Y = sin(x/2) - cos(x/2)')
text(x_coor, y_coor, '(2.0, 0.301)', pos =2)
points(x_coor , y_coor , col="red", pch =19)

Section 3: (8 points) This problem requires finding the point of intersection of two functions. Using the function `y = cos(x / 2) * sin(x / 2)`, find where the curved line y = -(x/2)^3 intersects it within the range of values used in part (2) (i.e. 4001 equally-spaced values from -2 to 2). Plot both functions on the same display, and show the point of intersection. Present the coordinates of this point as text in the display.

my_y <- function(x){
  return(cos(x/2)*sin(x/2))
}
my_y2 <- function(x){
  return(-(x/2)^3)
}
y <- my_y(x)
y2 <- my_y2(x)

intersection_idx <- y == y2
intersection <- c(x[intersection_idx], y[intersection_idx], y2[intersection_idx])

plot(x,y, type = "l", col="blue")
lines(x,y2, col="red")
points(x[intersection_idx], y[intersection_idx], col="black", pch=19)
text(x[intersection_idx], y[intersection_idx], '(0,0)' , pos =2)
title(main = 'Intersection')
legend(x = -1, y = .05,
       legend = c("cos(x/2) * sin(x/2)", "-(x/2)^3 "),
       lty = c(1,1),
       col = c("blue", "red"),
       lwd = 1)

Section 4: (12 points) Use the “trees” dataset for the following items. This dataset has three variables (Girth, Height, Volume) on 31 felled black cherry trees.

(4)(a) Use data(trees) to load the dataset. Check and output the structure with str(). Use apply() to return the mean values for the three variables. Output these values. Using R and logicals, determine the number of trees with Volume greater than the mean Volume; effectively, how many rows have Volume greater than the mean Volume.

data(trees)
str(trees)

## 'data.frame':    31 obs. of  3 variables:
##  $ Girth : num  8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
##  $ Height: num  70 65 63 72 81 83 66 75 80 75 ...
##  $ Volume: num  10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...

apply(trees,2,mean)

##    Girth   Height   Volume 
## 13.24839 76.00000 30.17097

length(which(trees$Volume > mean(trees$Volume)))

## [1] 12

(4)(b) Girth is defined as the diameter of a tree taken at 4 feet 6 inches from the ground. Convert each diameter to a radius, r. Calculate the cross-sectional area of each tree using pi times the squared radius. What is the interquartile range (IQR) of areas?

radius <- trees$Girth/2
area <- pi*radius^2
IQR(sort(area))

## [1] 87.1949

(4)(c) Create a histogram of the areas calculated in (b). Title and label the axis.

classes = seq(10, 400, by = 35)
colors = c("green", "blue", "steelblue2", "violet", "pink", "purple", "gray")
hist(area, breaks = classes, main = "Area Distribution",
     xlab = "Area", col=colors)

(4)(d) Identify the tree with the largest area and output on one line its row number and three measurements.

df.area <- data.frame(radius, area)
trees[df.area$area == max(area), ]

##    Girth Height Volume
## 31  20.6     87     77

Section 5: (12 points) The Student’s t distribution is an example of a symmetric, bell-shaped distribution but with ‘heavier’ tails than a normal distribution. This problem involves comparing the two.

5(a) Use set.seed(9999) and rt() with n = 100, df = 10 to generate a random sample designated as y. Generate a second random sample designated as x with set.seed(123) and rnorm() using n = 100, mean = 0 and sd = 1.25.

Generate a new object using cbind(x, y). Do not output this object; instead, assign it to a new name. Pass this object to apply() and compute the inter-quartile range (IQR) for each column: x and y. Use the function IQR() for this purpose. Round the results to four decimal places and present (this exercise shows the similarity of the IQR values.).

For information about rt(), use help(rt) or ?rt(). Do not output x or y.

set.seed(9999)
y <- rt(n = 100, df = 10)

set.seed(123)
x <- rnorm(n=100, mean=0, sd=1.25)

dat <- cbind(x,y)
round(apply(dat, 2, IQR), 4)

##      x      y 
## 1.4821 1.5242

(5)(b) This item will illustrate the difference between a normal and heavy-tailed distribution. For base R plots, use par(mfrow = c(2, 2)) to generate a display with four diagrams; grid.arrange() for ggplots. On the first row, for the normal results, present a histogram and a horizontal boxplot for x in color. For the t results, present a histogram and a horizontal boxplot for y in color.

par(mfrow = c(2,2))
colors = c("steelblue", "orange", "red", "purple", "green", "yellow", "gray", "cyan", "magenta")

hist(x, breaks=7, main='Histogram of Normal Distribution', col=colors)
boxplot(x, col=3, main='Boxplot of Normal Distribution', xlab = 'x', horizontal = TRUE)

hist(y, breaks=7, main='Histogram of T Distribution', col=colors)
boxplot(x, col=3, main='Boxplot of T Distribution', xlab = 'x', horizontal = TRUE)

(5)(c) QQ plots are useful for detecting the presence of heavy-tailed distributions. Present side-by-side QQ plots, one for each sample, using qqnorm() and qqline(). Add color and titles. In base R plots, “cex” can be used to control the size of the plotted data points and text; “size” for ggplot2 figures. Lastly, determine if there are any extreme outliers in either sample.Remember extreme outliers are based on 3 multiplied by the IQR in the box plot. R uses a default value of 1.5 times the IQR to define outliers (not extreme) in both boxplot and boxplot stats.

par(mfrow=c(1,2))

qqnorm(x, col = 2, main = 'QQ Plot: Norm')
qqline(x, col=2)

qqnorm(y, col=3, main='QQ Plot: Exp')
qqline(y, col=3)

boxplot.stats(y, coef = 3.0)

## $stats
## [1] -3.348114108 -0.751765006 -0.005347442  0.806521129  3.295581470
## 
## $n
## [1] 100
## 
## $conf
## [1] -0.2515567  0.2408618
## 
## $out
## numeric(0)

boxplot.stats(x, coef = 3.0)

## $stats
## [1] -2.88646109 -0.62084664  0.07719539  0.86874760  2.73416624
## 
## $n
## [1] 100
## 
## $conf
## [1] -0.1581605  0.3125513
## 
## $out
## numeric(0)