Read each question carefully and address each element. Do not output contents of vectors or data frames unless requested.
v <- c(seq(0, 4), 13, rep(c(2, -5.1, -23), times=3), 7/42 + 3 + 35/42)
v
## [1] 0.0 1.0 2.0 3.0 4.0 13.0 2.0 -5.1 -23.0 2.0 -5.1 -23.0
## [13] 2.0 -5.1 -23.0 4.0
v.ascending <- sort(v)
v.ascending
## [1] -23.0 -23.0 -23.0 -5.1 -5.1 -5.1 0.0 1.0 2.0 2.0 2.0 2.0
## [13] 3.0 4.0 4.0 13.0
L <- length(v.ascending)
L
## [1] 16
v.add_seq <- seq(L, 1, -1) + v.ascending
v.add_seq
## [1] -7.0 -8.0 -9.0 7.9 6.9 5.9 10.0 10.0 10.0 9.0 8.0 7.0 7.0 7.0 6.0
## [16] 14.0
v2 <- v.add_seq[c(1, length(v.add_seq))]
v2
## [1] -7 14
v3 <- v.add_seq[2:(length(v.add_seq)-1)]
v3
## [1] -8.0 -9.0 7.9 6.9 5.9 10.0 10.0 10.0 9.0 8.0 7.0 7.0 7.0 6.0
v.1d <- c(v2[1], v3, tail(v2, n=1))
round(sum(v.1d), 2)
## [1] 84.7
f1 <- function(x){
return(sin(x/2) + cos(x/2))
}
Create a vector, x, of 4001 equally-spaced values from -2 to 2, inclusive. Compute values for y using the vector x and your function from 2a. Do not output x or y. Find the value in the vector x that corresponds to the maximum value in the vector y. Restrict attention to only the values of x and y you have computed; i.e. do not interpolate. Round to 3 decimal places and output both the maximum y and corresponding x value.
Finding the two desired values can be accomplished in as few as two lines of code. Do not use packages or programs you may find on the internet or elsewhere. Do not output the other elements of the vectors x and y. Relevant coding methods are given in the Quick Start Guide for R.
x <- seq(-2, 2, 4/4000)
y <- f1(x)
y_m <- max(y)
x_y_m <- x[which.max(y)]
sprintf("Max value of y is: %.3f, when x=%.3f", y_m, x_y_m)
## [1] "Max value of y is: 1.414, when x=1.571"
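As a sanity check on the grid search, the identity sin(x/2) + cos(x/2) = sqrt(2) * sin(x/2 + pi/4) puts the true maximum at x = pi/2 with y = sqrt(2). The sketch below compares the grid result with those values. The names x_grid, y_grid, i_max, grid_max and exact_max are illustrative and not part of the original solution.

```r
# Rebuild the function and grid so the block runs on its own.
f1 <- function(x) sin(x / 2) + cos(x / 2)
x_grid <- seq(-2, 2, length.out = 4001)
y_grid <- f1(x_grid)

# Grid search: index of the largest computed y, no interpolation.
i_max <- which.max(y_grid)
grid_max <- c(x = x_grid[i_max], y = y_grid[i_max])

# Analytic maximum from the identity in the lead-in.
exact_max <- c(x = pi / 2, y = sqrt(2))

round(grid_max, 3)
round(exact_max, 3)
abs(grid_max - exact_max)  # the x gap should stay below half the grid step (0.0005)
```

The grid answer can differ from pi/2 by at most half a step, which is why the rounded values agree to three decimals.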
Plot y versus x in color, with x on the horizontal axis. Show the location of the maximum value of y determined in 2(b). Show the values of x and y corresponding to the maximum value of y in the display. Add a title and other features such as text annotations. Text annotations may be added via text() for base R plots and geom_text() or geom_label() for ggplots.
plot(x, y, type="l", col="steelblue", main=expression(y == sin(x/2) + cos(x/2)),
xlab="x", ylab="y",
ylim=c(min(y)-.1, max(y)+.1), bty="n", lwd=1.2)
axis(side=2, at=c(-.3, seq(0, 1.5, .5)))
abline(h=y_m, col="red", lty=2, lwd=1)
abline(v=x_y_m, col="red", lty=2, lwd=1)
points(x_y_m, y_m, col="black", cex=1.2)
text(x_y_m, y_m, "x=1.571, y=1.414",
cex=1.1, pos=3,col="orange")
# install.packages("ggplot2")  # run once if ggplot2 is not installed
library(ggplot2)
df <- data.frame(x, y)
g <- ggplot(df, aes(x, y)) + geom_line(color='steelblue', size=.8) +
# annotate() draws each marker once instead of once per row of df
annotate("point", x=x_y_m, y=y_m, color="purple") +
annotate("text", x=x_y_m, y=y_m, label="x=1.571\ny=1.414", color="orange") +
ggtitle("y = sin(x/2) + cos(x/2)") +
theme(
panel.background = element_blank(),
plot.title = element_text(family = "Helvetica", face="bold", color="steelblue", size = (15), hjust=.5),
axis.title = element_text(family = "Helvetica", size = (15), color="steelblue4"),
axis.text = element_text(family = "Courier", color="cornflowerblue", size = (15))
)
g
This problem requires finding the point of intersection of two functions. Using the function y = cos(x/2)*sin(x/2), find where the curved line y = -(x/2)^3 intersects it within the range of values used in part (2) (i.e. 4001 equally-spaced values from -2 to 2). Plot both functions on the same display, and show the point of intersection. Present the coordinates of this point as text in the display.
# Define two functions asked
f3a <- function(x){
return(cos(x/2)*sin(x/2))
}
f3b <- function(x){
return(-(x/2)^3)
}
# calc results for both functions
y3a <- f3a(x)
y3b <- f3b(x)
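# Find the intersection: exact equality on doubles is fragile, so take the
# grid point where the two curves are closest
intersection_idx <- which.min(abs(y3a - y3b))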
c(x[intersection_idx], y3a[intersection_idx], y3b[intersection_idx])
## [1] 0 0 0
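The crossing happens to fall on a grid point here (x = 0), but in general no grid point makes the two functions exactly equal in floating point. A minimal sketch of two more tolerant approaches follows: pick the grid point with the smallest gap, then refine with base R's uniroot(). The names x_grid, gap, i_near and root are illustrative, and uniroot() goes beyond the "grid values only" restriction, so treat it as a cross-check rather than the required answer.

```r
# Rebuild the functions and grid so the block runs on its own.
f3a <- function(x) cos(x / 2) * sin(x / 2)
f3b <- function(x) -(x / 2)^3
x_grid <- seq(-2, 2, length.out = 4001)

# Approach 1: grid point where the curves are closest to each other.
gap <- f3a(x_grid) - f3b(x_grid)
i_near <- which.min(abs(gap))
c(x = x_grid[i_near], y = f3a(x_grid[i_near]))

# Approach 2: root of the difference; the gap changes sign between -2 and 2,
# which is what uniroot() needs.
root <- uniroot(function(x) f3a(x) - f3b(x), interval = c(-2, 2), tol = 1e-9)
root$root
```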
df3a <- data.frame(x, y3a)
df3b <- data.frame(x, y3b)
col.name <- c("x", "val")
colnames(df3a) <- col.name
colnames(df3b) <- col.name
g <- ggplot() +
geom_line(data=df3b, aes(x=x, y=val, col='-(x/2)^3'), size=.8) +
geom_line(data=df3a, aes(x=x, y=val, col='cos(x/2)*sin(x/2)'), size=.8) +
annotate("point", x=x[intersection_idx], y=y3a[intersection_idx], color="purple") +
annotate("text", x=x[intersection_idx], y=y3a[intersection_idx], label="x=0\ny=0", color="orange") +
ggtitle(paste(expression(-(x/2)^3), " vs ", expression(cos(x/2)*sin(x/2)), sep="")) +
theme(
panel.background = element_blank(),
plot.title = element_text(family = "Helvetica", face="bold", color="steelblue", size = (15), hjust=.5),
axis.title = element_text(family = "Helvetica", size = (15), color="steelblue4"),
axis.text = element_text(family = "Courier", color="cornflowerblue", size = (15))
)
g
Use data(trees) to load the dataset. Check and output the structure with str(). Use apply() to return the median values for the three variables. Output these values. Using R and logicals, output the row number and the three measurements - Girth, Height and Volume - of any trees with Girth equal to median Girth. It is possible to accomplish this last request with one line of code.
data(trees)
str(trees)
## 'data.frame': 31 obs. of 3 variables:
## $ Girth : num 8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
## $ Height: num 70 65 63 72 81 83 66 75 80 75 ...
## $ Volume: num 10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
apply(trees, 2, median)
## Girth Height Volume
## 12.9 76.0 24.2
trees[trees$Girth == median(trees$Girth),]
## Girth Height Volume
## 16 12.9 74 22.2
## 17 12.9 85 33.8
r <- trees$Girth/2
f.area <- function(r){
return(pi*r^2)
}
area <- f.area(r)
stem(r)
##
## The decimal point is at the |
##
## 4 | 234
## 5 | 34455667779
## 6 | 055799
## 7 | 013
## 8 | 0278
## 9 | 000
## 10 | 3
classes <- seq(3.5, 10.5, by = 1)
colors <- c("yellow", "orange", "steelblue2", "violet", "pink", "cyan", "green")
hist(r, breaks = classes, main = "Histogram Showing Tree Radius Distribution",
xlab = "Radius", col=colors)
df.area <- data.frame(r, area)
g <- ggplot(df.area, aes(x=r, y=area)) +
geom_point(color="purple") +
ggtitle("Cross-Sectional Area of Each Tree") +
xlab("Radius") +
ylab("Cross-Sectional Area") +
theme(
panel.background = element_blank(),
plot.title = element_text(family = "Helvetica", face="bold", color="steelblue", size = (15), hjust=.5),
axis.title = element_text(family = "Helvetica", size = (15), color="steelblue4"),
axis.text = element_text(family = "Courier", color="cornflowerblue", size = (15))
)
g
boxplot(area, col = "steelblue", range = 1.5, main = "Boxplot of Cross-Sectional Area",
xlab = "Cross-Sectional Area", notch = TRUE, horizontal=TRUE, frame=F)
boxplot.stats(area, coef=3)
## $stats
## [1] 54.10608 95.90104 130.69811 183.09595 333.29156
##
## $n
## [1] 31
##
## $conf
## [1] 105.9543 155.4420
##
## $out
## numeric(0)
trees[df.area$area == max(area), ]
## Girth Height Volume
## 31 20.6 87 77
Use set.seed(124) and rexp() with n = 100, rate = 5.5 to generate a random sample designated as y.
set.seed(124)
y <- rexp(n = 100, rate = 5.5)
Generate a second random sample designated as x with set.seed(127) and rnorm() using n = 100, mean = 0 and sd = 0.15.
set.seed(127)
x <- rnorm(n=100, mean=0, sd=.15)
Generate a new object using cbind(x, y). Do not output this object; instead, assign it to a new name.
obj.5a <- cbind(x, y)
Pass this object to apply() and compute the inter-quartile range (IQR) for each column: x and y. Use the function IQR() for this purpose. Round the results to four decimal places and present them. (This exercise shows the similarity of the IQR values.)
round(apply(obj.5a, 2, IQR), 4)
## x y
## 0.2041 0.2164
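To make the IQR step concrete, the sketch below recomputes it as the gap between the 75th and 25th percentiles from quantile(), and lists the theoretical IQRs of the two parent distributions, which indicate why the sample values come out similar. The names samples, iqr_by_hand and iqr_theory are illustrative and not part of the original solution.

```r
# Regenerate the two samples exactly as in the exercise.
set.seed(124)
y <- rexp(n = 100, rate = 5.5)
set.seed(127)
x <- rnorm(n = 100, mean = 0, sd = 0.15)
samples <- cbind(x, y)

# IQR by hand: third quartile minus first quartile, column by column.
iqr_by_hand <- apply(samples, 2, function(col) {
  q <- quantile(col, probs = c(0.25, 0.75), names = FALSE)
  q[2] - q[1]
})
round(iqr_by_hand, 4)

# Theoretical IQRs: 2 * qnorm(0.75) * sd for the normal,
# log(3) / rate for the exponential.
iqr_theory <- c(x = 2 * qnorm(0.75) * 0.15, y = log(3) / 5.5)
round(iqr_theory, 4)
```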
This item will illustrate the difference between a right-skewed distribution and a symmetric one. For base R plots, use par(mfrow = c(2, 2)) to generate a display with four diagrams; grid.arrange() for ggplots. On the first row, for the normal results, present a histogram and a horizontal boxplot for x in color. For the exponential results, present a histogram and a horizontal boxplot for y in color.
par(mfrow = c(2, 2))
colors <- c("steelblue", "orange", "steelblue1", "violet", "pink", "cyan", "green", "grey", "steelblue2")
hist(x, breaks=15, main = "Histogram of Normal Distribution", col=colors)
boxplot(x, col = "steelblue", range = 1.5, main = "Boxplot of Normal Distribution",
notch = TRUE, horizontal=TRUE, frame=F)
hist(y, breaks=15, main = "Histogram of Exponential Distribution", col=colors)
boxplot(y, col = "steelblue", range = 1.5, main = "Boxplot of Exponential Distribution",
notch = TRUE, horizontal=TRUE, frame=F)
QQ plots are useful for detecting the presence of heavy-tailed distributions. Present side-by-side QQ plots, one for each sample, using qqnorm() and qqline(). Add color and titles. In base R plots, “cex” can be used to control the size of the plotted data points and text.
par(mfrow = c(1, 2), pty="s", cex=.9)
qqnorm(x, pch=2, col="violet", frame=FALSE)
qqline(x, col = "steelblue", lwd = 2)
qqnorm(y, pch=5, col="violet", frame=FALSE, main = "Exponential Q-Q Plot")
qqline(y, col = "steelblue", lwd = 2)
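A normal QQ plot pairs the sorted sample with standard normal quantiles taken at evenly spread probabilities, so a right-skewed sample bends away from the reference line in the upper tail. The sketch below rebuilds those points by hand for the exponential sample, using ppoints() for the plotting positions. The names n, theoretical and observed are illustrative.

```r
# Regenerate the exponential sample from the exercise.
set.seed(124)
y <- rexp(n = 100, rate = 5.5)

n <- length(y)
theoretical <- qnorm(ppoints(n))  # standard normal quantiles at the plotting positions
observed <- sort(y)               # sample order statistics

# Same picture qqnorm() would draw, assembled manually.
plot(theoretical, observed, col = "violet", pch = 5,
     xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
     main = "Exponential Q-Q Plot (by hand)")
qqline(y, col = "steelblue", lwd = 2)
```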
Lastly, determine if there are any extreme outliers in either sample. Remember that extreme outliers are based on 3.0*IQR in the box plot. R uses a default value of 1.5*IQR to define outliers (not extreme) in both boxplot() and boxplot.stats().
boxplot.stats(x, coef=3)
## $stats
## [1] -0.2976325808 -0.1007240230 0.0003706968 0.1088532648 0.4310592908
##
## $n
## [1] 100
##
## $conf
## [1] -0.03274251 0.03348391
##
## $out
## numeric(0)
boxplot.stats(y, coef=3)
## $stats
## [1] 0.003880211 0.053278194 0.152793270 0.271774062 0.667719381
##
## $n
## [1] 100
##
## $conf
## [1] 0.1182709 0.1873156
##
## $out
## [1] 1.448679
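The extreme-outlier rule can also be spelled out directly. The sketch below builds the fences from the hinges returned by fivenum(), which is what boxplot.stats() uses rather than quantile(), and flags values beyond 3 times the hinge spread. The helper extreme_outliers() is a hypothetical name introduced only for illustration.

```r
# Hypothetical helper: values beyond coef * (upper hinge - lower hinge)
# from the hinges, mirroring the rule behind boxplot.stats().
extreme_outliers <- function(v, coef = 3) {
  hinges <- fivenum(v)             # min, lower hinge, median, upper hinge, max
  spread <- hinges[4] - hinges[2]  # hinge-based IQR
  lower <- hinges[2] - coef * spread
  upper <- hinges[4] + coef * spread
  v[v < lower | v > upper]
}

# Regenerate the two samples from the exercise and apply the rule.
set.seed(124)
y <- rexp(n = 100, rate = 5.5)
set.seed(127)
x <- rnorm(n = 100, mean = 0, sd = 0.15)

extreme_outliers(x)
extreme_outliers(y)
```

Calling the helper with coef = 1.5 gives the ordinary (non-extreme) outliers that boxplot() marks by default.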