Read each question carefully and address each element. Do not output contents of vectors or data frames unless requested.
(1)(a) Create a vector that contains the following, in this order, and output the final, resulting vector. Do not round any values, unless requested.
* A sequence of integers from 0 to 4, inclusive.
* The number 13.
* Three repetitions of the vector c(2, -5.1, -23).
* The arithmetic sum of 7/42, 3 and 35/42.
x <- c(0:4, 13, rep(c(2, -5.1, -23), 3), 7/42 + 3 + 35/42)
x
## [1] 0.0 1.0 2.0 3.0 4.0 13.0 2.0 -5.1 -23.0 2.0 -5.1
## [12] -23.0 2.0 -5.1 -23.0 4.0
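To make each requirement of (1)(a) visible, the same vector can be built from named pieces. This is a sketch; the names `ints`, `single`, `repeats`, `total` and `x_check` are illustrative only. Note that `rep(v, 3)` repeats the whole vector three times, whereas `rep(v, each = 3)` would repeat each element in turn, which is not what the question asks for.

```r
ints    <- 0:4                       # integers 0 through 4, inclusive
single  <- 13
repeats <- rep(c(2, -5.1, -23), 3)   # whole vector repeated three times
total   <- 7/42 + 3 + 35/42          # 1/6 + 3 + 5/6, i.e. 4 up to floating point
x_check <- c(ints, single, repeats, total)
length(x_check)
```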
(1)(b) Sort the vector created in (1)(a) in ascending order. Output this result. Determine the length of the resulting vector and assign it to "L". Output L. Generate a descending sequence starting with L and ending with 1. Add this descending sequence arithmetically to the sorted vector. This is vector addition, not vector combination. Output the contents. Do not round any values.
sort(x)
## [1] -23.0 -23.0 -23.0 -5.1 -5.1 -5.1 0.0 1.0 2.0 2.0 2.0
## [12] 2.0 3.0 4.0 4.0 13.0
L<-length(x)
L
## [1] 16
y <- seq(L, 1, by = -1)   # descending from L to 1, without hard-coding 16
y
## [1] 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
z <- sort(x) + y   # element-by-element (vector) addition
z
## [1] -7.0 -8.0 -9.0 7.9 6.9 5.9 10.0 10.0 10.0 9.0 8.0 7.0 7.0 7.0
## [15] 6.0 14.0
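A sketch of (1)(b) that avoids hard-coding 16: `L:1` produces the same descending values as `seq(L, 1, by = -1)`. The `+` operator adds element by element, while `c(xs, desc)` would instead concatenate the two vectors into one of length 32. If the lengths differed, R would recycle the shorter vector, warning only when the longer length is not a multiple of the shorter. The names `xs` and `desc` are hypothetical.

```r
x    <- c(0:4, 13, rep(c(2, -5.1, -23), 3), 7/42 + 3 + 35/42)
xs   <- sort(x)      # ascending order
L    <- length(xs)   # number of elements
desc <- L:1          # L, L-1, ..., 1
z    <- xs + desc    # element i of xs plus element i of desc
z
```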
(1)(c) Extract the first and last elements of the vector you have created in (1)(b) to form another vector with the extracted elements. Form a third vector from the elements not extracted. Output these vectors.
z[1]
## [1] -7
z[16]
## [1] 14
m <- z[c(1, length(z))]    # first and last elements
m
## [1] -7 14
n <- z[-c(1, length(z))]   # every element except the first and last
n
## [1] -8.0 -9.0 7.9 6.9 5.9 10.0 10.0 10.0 9.0 8.0 7.0 7.0 7.0 6.0
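Negative indices give the complement of an extraction directly, so both vectors in (1)(c) can be formed from one index vector instead of listing positions 2 to 15 by hand. A minimal sketch, with z re-entered from the printed output of (1)(b) so the block runs on its own; `ends` is an illustrative name.

```r
z <- c(-7, -8, -9, 7.9, 6.9, 5.9, 10, 10, 10, 9, 8, 7, 7, 7, 6, 14)
ends <- c(1, length(z))   # positions of the first and last elements
m <- z[ends]              # the extracted elements
n <- z[-ends]             # negative indices drop those positions
m
n
```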
(1)(d) Use the vectors from (c) to reconstruct the vector in (b). Output this vector. Sum the elements and round to two decimal places.
p<-c(m[1],n,m[2])
p
## [1] -7.0 -8.0 -9.0 7.9 6.9 5.9 10.0 10.0 10.0 9.0 8.0 7.0 7.0 7.0
## [15] 6.0 14.0
round(sum(p),digits=2)
## [1] 84.7
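`identical()` gives a quick check that the reconstruction in (1)(d) really matches the vector from (1)(b). A sketch, with z re-entered from the printed output of (1)(b) so the block is self-contained:

```r
z <- c(-7, -8, -9, 7.9, 6.9, 5.9, 10, 10, 10, 9, 8, 7, 7, 7, 6, 14)
m <- z[c(1, length(z))]    # first and last
n <- z[-c(1, length(z))]   # the rest
p <- c(m[1], n, m[2])      # put the ends back around the middle
identical(p, z)
round(sum(p), digits = 2)
```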
(2)(a) Create a user-defined function, via function(), that implements the trigonometric function above: it accepts numeric values “x” and calculates and returns values “y”.
solvey <- function(x) {
  y <- sin(x/2) + cos(x/2)
  return(y)
}
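Because sin(t) + cos(t) = sqrt(2) sin(t + pi/4), the function in (2)(a) peaks at sqrt(2) when x/2 = pi/4, that is at x = pi/2 ≈ 1.5708. This gives an analytic check on the grid search in (2)(b). The sketch below redefines the function in one line (same formula as above) and compares it with the identity at a few points; `xs` is an illustrative name.

```r
# One-line version of the (2)(a) function
solvey <- function(x) sin(x/2) + cos(x/2)

# Identity: sin(t) + cos(t) = sqrt(2) * sin(t + pi/4), with t = x/2
xs <- c(-2, -1, 0, pi/2, 2)
all.equal(solvey(xs), sqrt(2) * sin(xs/2 + pi/4))

solvey(pi/2)   # value at the analytic maximum
```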
(2)(b) Create a vector, x, of 4001 equally-spaced values from -2 to 2, inclusive. Compute values for y using the vector x and your function from (2)(a). Do not output x or y. Find the value in the vector x that corresponds to the maximum value in the vector y. Restrict attention to only the values of x and y you have computed; i.e., do not interpolate. Round to 3 decimal places and output both the maximum y and the corresponding x value.
Finding the two desired values can be accomplished in as few as two lines of code. Do not use packages or programs you may find on the internet or elsewhere. Do not output the other elements of the vectors x and y. Use coding methods shown in the Quick Start Guide for R.
x <- seq(-2, 2, length.out = 4001)
y <- solvey(x)
max_y <- max(y)
max_y
## [1] 1.414214
which.max(y)
## [1] 3572
x2 <- round(x[which.max(y)], digits = 3)   # x at the maximum y, no interpolation
x2
## [1] 1.571
round(max_y, digits = 3)
## [1] 1.414
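The question notes that (2)(b) can be done in as few as two lines. One way, assuming `solvey()` as defined in (2)(a) (redefined here so the block runs alone), is to index x with `which.max(y)` rather than typing the index 3572; the last two lines are the whole answer.

```r
solvey <- function(x) sin(x/2) + cos(x/2)   # function from (2)(a)

x <- seq(-2, 2, length.out = 4001)
y <- solvey(x)
round(c(x = x[which.max(y)], y = max(y)), digits = 3)
```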
(2)(c) Plot y versus x in color, with x on the horizontal axis. Show the location of the maximum value of y determined in (2)(b), and show the values of x and y at that point in the display. Add a title and other features such as text annotations. Text annotations may be added via text() for base R plots and geom_text() or geom_label() for ggplots.
plot(x, y, col = "blue", main = "y = sin(x/2) + cos(x/2)")
points(x2, max_y, pch = 19, col = "red")   # mark the maximum found in (2)(b)
text(1.5, 1.2, "(1.571, 1.414)")
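Rather than typing the label by hand, the annotation in (2)(c) can be built from the computed maximum with `sprintf()`, and `points()` marks the location on the curve. A sketch in base R graphics; the variable names and the choice of `pos = 1` (label just below the point) are illustrative.

```r
x <- seq(-2, 2, length.out = 4001)
y <- sin(x/2) + cos(x/2)           # the (2)(a) function
i <- which.max(y)                  # grid index of the maximum, no interpolation
lab <- sprintf("(%.3f, %.3f)", x[i], y[i])

plot(x, y, type = "l", col = "blue", main = "y = sin(x/2) + cos(x/2)")
points(x[i], y[i], pch = 19, col = "red")   # mark the maximum
abline(v = x[i], lty = 2, col = "grey")     # vertical guide at the maximizing x
text(x[i], y[i], labels = lab, pos = 1)     # label placed below the point
```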
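function1 <- function(x) {
  cos(x/2) * sin(x/2)
}
function2 <- function(x) {
  -(x/2)^3   # negative cube of x/2
}
x <- seq(-2, 2, length.out = 4001)
y <- function1(x)
z <- function2(x)
# draw both curves on the same axes so the intersection is visible
plot(x, y, type = "l", col = "blue", ylim = range(y, z), ylab = "function value")
lines(x, z, col = "orange")
f <- function(x) function1(x) - function2(x)   # zero where the curves meet
intrsect_x <- uniroot(f, c(-2, 2))
intrsect_x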
intrsect_x
## $root
## [1] 0
##
## $f.root
## [1] 0
##
## $iter
## [1] 1
##
## $init.it
## [1] NA
##
## $estim.prec
## [1] 2
points(0, 0, pch = 19)
text(0, -0.1, "the curves intersect at (0, 0)")
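Using sin(2u) = 2 sin(u) cos(u), the difference function passed to uniroot() simplifies to f(x) = sin(x)/2 + x^3/8. Its derivative, cos(x)/2 + 3x^2/8, is positive on [-2, 2], so f is increasing there and x = 0 is the only intersection in the interval. uniroot() also needs f to change sign across the interval, which the sketch below checks; the tighter `tol` is an illustrative choice.

```r
# Difference of the two curves: zero where they intersect
f <- function(x) cos(x/2) * sin(x/2) + (x/2)^3

c(f(-2), f(2))   # opposite signs, as uniroot() requires

# Same function in simplified form: sin(x)/2 + x^3/8
all.equal(f(1.3), sin(1.3)/2 + 1.3^3/8)

r <- uniroot(f, c(-2, 2), tol = 1e-10)
r$root
```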
(4)(a) Use data(trees) to load the dataset. Check and output the structure with str(). Use apply() to return the median values for the three variables. Output these values. Using R and logicals, output the row number and the three measurements - Girth, Height and Volume - of any trees with Girth equal to median Girth. It is possible to accomplish this last request with one line of code.
library(tidyverse)
## -- Attaching packages ----------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.0.0 v purrr 0.2.5
## v tibble 1.4.2 v dplyr 0.7.6
## v tidyr 0.8.1 v stringr 1.3.1
## v readr 1.1.1 v forcats 0.3.0
## -- Conflicts -------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
data(trees)
str(trees)
## 'data.frame': 31 obs. of 3 variables:
## $ Girth : num 8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
## $ Height: num 70 65 63 72 81 83 66 75 80 75 ...
## $ Volume: num 10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
trees
## Girth Height Volume
## 1 8.3 70 10.3
## 2 8.6 65 10.3
## 3 8.8 63 10.2
## 4 10.5 72 16.4
## 5 10.7 81 18.8
## 6 10.8 83 19.7
## 7 11.0 66 15.6
## 8 11.0 75 18.2
## 9 11.1 80 22.6
## 10 11.2 75 19.9
## 11 11.3 79 24.2
## 12 11.4 76 21.0
## 13 11.4 76 21.4
## 14 11.7 69 21.3
## 15 12.0 75 19.1
## 16 12.9 74 22.2
## 17 12.9 85 33.8
## 18 13.3 86 27.4
## 19 13.7 71 25.7
## 20 13.8 64 24.9
## 21 14.0 78 34.5
## 22 14.2 80 31.7
## 23 14.5 74 36.3
## 24 16.0 72 38.3
## 25 16.3 77 42.6
## 26 17.3 81 55.4
## 27 17.5 82 55.7
## 28 17.9 80 58.3
## 29 18.0 80 51.5
## 30 18.0 80 51.0
## 31 20.6 87 77.0
apply(trees,2,median)
## Girth Height Volume
## 12.9 76.0 24.2
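apply() first converts the data frame to a matrix, which is harmless here because all three columns are numeric. sapply(), which works on the data frame's columns directly, gives the same medians; a sketch with illustrative names:

```r
data(trees)
meds_apply  <- apply(trees, 2, median)   # coerced to a numeric matrix, then column-wise
meds_sapply <- sapply(trees, median)     # applied to each column of the data frame
meds_apply
all.equal(meds_apply, meds_sapply)
```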
trees[trees$Girth == median(trees$Girth), ]
## Girth Height Volume
## 16 12.9 74 22.2
## 17 12.9 85 33.8
(4)(b) Girth is defined as the diameter of a tree taken at 4 feet 6 inches from the ground. Convert each diameter to a radius, r. Calculate the cross-sectional area of each tree using pi times the squared radius. Present a stem-and-leaf plot of the radii, and a histogram of the radii in color. Plot Area (y-axis) versus Radius (x-axis) in color showing the individual data points. Label appropriately.
data(trees)
summary(trees)
## Girth Height Volume
## Min. : 8.30 Min. :63 Min. :10.20
## 1st Qu.:11.05 1st Qu.:72 1st Qu.:19.40
## Median :12.90 Median :76 Median :24.20
## Mean :13.25 Mean :76 Mean :30.17
## 3rd Qu.:15.25 3rd Qu.:80 3rd Qu.:37.30
## Max. :20.60 Max. :87 Max. :77.00
radius<-(trees$Girth/2)
radius
## [1] 4.15 4.30 4.40 5.25 5.35 5.40 5.50 5.50 5.55 5.60 5.65
## [12] 5.70 5.70 5.85 6.00 6.45 6.45 6.65 6.85 6.90 7.00 7.10
## [23] 7.25 8.00 8.15 8.65 8.75 8.95 9.00 9.00 10.30
solvea<-function(x){
a<-(pi*(x^2))
return(a)
}
a <- solvea(radius)   # cross-sectional area, pi * r^2
a
## [1] 54.10608 58.08805 60.82123 86.59015 89.92024 91.60884 95.03318
## [8] 95.03318 96.76891 98.52035 100.28749 102.07035 102.07035 107.51315
## [15] 113.09734 130.69811 130.69811 138.92908 147.41138 149.57123 153.93804
## [22] 158.36769 165.12996 201.06193 208.67244 235.06182 240.52819 251.64943
## [29] 254.46900 254.46900 333.29156
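The R documentation for `trees` notes that Girth is actually the tree diameter in inches, so the radius is Girth/2 in inches and the area is in square inches. Since the area formula is vectorized, the helper function is optional; a sketch of the equivalent one-liner (the names `radius` and `area` are illustrative):

```r
data(trees)
radius <- trees$Girth / 2   # Girth is a diameter in inches, so radius is in inches
area   <- pi * radius^2     # vectorized: one cross-sectional area (sq in) per tree
# Equivalent form written directly in terms of the diameter
all.equal(area, pi * trees$Girth^2 / 4)
head(round(area, 2))
```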
stem(radius)
##
## The decimal point is at the |
##
## 4 | 234
## 5 | 34455667779
## 6 | 055799
## 7 | 013
## 8 | 0278
## 9 | 000
## 10 | 3
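hist(radius, col = "orange", main = "Histogram of Tree Radii", xlab = "Radius (in)")
plot(radius, a, main = "Cross-sectional Area vs. Radius", xlab = "Radius (in)", ylab = "Area (sq in)", col = "green", pch = 19)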
(4)(c) Present a horizontal, notched, colored boxplot of the areas calculated in (b). Title and label the axis.
boxplot(a, main = "Horizontal Boxplot of Cross-sectional Area", horizontal = TRUE, notch = TRUE, col = "green", xlab = "Area (sq in)")
(4)(d) Demonstrate that the outlier revealed in the boxplot of Volume is not an extreme outlier. It is possible to do this with one line of code using boxplot.stats() or ‘manual’ calculation and logicals. Identify the tree with the largest area and output on one line its row number and three measurements.
boxplot(trees$Volume)
Vol<-trees$Volume
boxplot.stats(Vol)
## $stats
## [1] 10.2 19.4 24.2 37.3 58.3
##
## $n
## [1] 31
##
## $conf
## [1] 19.1204 29.2796
##
## $out
## [1] 77
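One common convention (Tukey's outer fences) treats a point as an extreme outlier when it lies more than 3 hinge spreads beyond a hinge, while the boxplot's default whiskers use 1.5. With the hinges from boxplot.stats() above (19.4 and 37.3), the upper extreme fence is 37.3 + 3 * 17.9 = 91.0, and the maximum Volume, 77, falls below it. A sketch of both the manual check and the one-line `coef = 3` version; `st`, `h` and `upper_extreme` are illustrative names.

```r
data(trees)
Vol <- trees$Volume
st  <- boxplot.stats(Vol)$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
h   <- st[4] - st[2]              # hinge spread (close to the IQR)
upper_extreme <- st[4] + 3 * h    # outer fence used for "extreme" outliers

c(max_volume = max(Vol), upper_extreme = upper_extreme)
max(Vol) > upper_extreme          # FALSE: 77 is an outlier, but not an extreme one
boxplot.stats(Vol, coef = 3)$out  # one-line version: nothing beyond the outer fence
```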
# Area = pi * (Girth/2)^2 grows with Girth, so the largest-area tree is the largest-Girth tree
trees[which.max(trees$Girth), ]
## Girth Height Volume
## 31 20.6 87 77
(5)(a) Use set.seed(124) and rexp() with n = 100, rate = 5.5 to generate a random sample designated as y. Generate a second random sample designated as x with set.seed(127) and rnorm() using n = 100, mean = 0 and sd = 0.15.
Generate a new object using cbind(x, y). Do not output this object; instead, assign it to a new name. Pass this object to apply() and compute the inter-quartile range (IQR) for each column, x and y, using the function IQR(). Round the results to four decimal places and present them (this exercise shows the similarity of the IQR values).
For information about rexp(), use help(rexp) or ?rexp. Do not output x or y.
set.seed(124)
y<-rexp(n=100,rate=5.5)
set.seed(127)
x<-rnorm(n=100,mean=0,sd=0.15)
obj1<-cbind(x,y)
interqrange<-apply(obj1,2,IQR)
round(interqrange,digits=4)
## x y
## 0.2041 0.2164
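The similar IQRs in (5)(a) are expected from theory: for Normal(0, 0.15) the population IQR is 2 * qnorm(0.75) * 0.15 ≈ 0.2023, and for Exp(5.5) it is log(3)/5.5 ≈ 0.1997, both close to the sample values 0.2041 and 0.2164. A sketch computing the population values with qnorm() and qexp():

```r
# Population IQRs for the two distributions sampled in (5)(a)
iqr_norm <- qnorm(0.75, mean = 0, sd = 0.15) - qnorm(0.25, mean = 0, sd = 0.15)
iqr_exp  <- qexp(0.75, rate = 5.5) - qexp(0.25, rate = 5.5)   # equals log(3) / 5.5
round(c(normal = iqr_norm, exponential = iqr_exp), digits = 4)
```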
(5)(b) This item illustrates the difference between a right-skewed distribution and a symmetric one. For base R plots, use par(mfrow = c(2, 2)) to generate a display with four diagrams; use grid.arrange() for ggplots. On the first row, present a histogram and a horizontal boxplot, in color, of x (the normal results). On the second row, present a histogram and a horizontal boxplot, in color, of y (the exponential results).
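par(mfrow = c(2, 2))   # 2 x 2 grid: row 1 = normal sample x, row 2 = exponential sample y
hist(x, col = "purple", main = "Histogram of x (normal)")
boxplot(x, horizontal = TRUE, col = "blue", main = "Boxplot of x (normal)")
hist(y, col = "purple", main = "Histogram of y (exponential)")
boxplot(y, horizontal = TRUE, col = "blue", main = "Boxplot of y (exponential)")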
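The right skew of the exponential sample in (5)(b) shows up as a mean above the median: for Exp(rate) the mean is 1/rate and the median is log(2)/rate, whereas a normal distribution's mean and median coincide. A sketch of the population values for the parameters used in (5)(a); the variable names are illustrative.

```r
rate <- 5.5
exp_mean    <- 1 / rate                          # mean of Exp(rate)
exp_median  <- qexp(0.5, rate = rate)            # median of Exp(rate), log(2) / rate
norm_median <- qnorm(0.5, mean = 0, sd = 0.15)   # equals the mean, 0, by symmetry
round(c(exp_mean = exp_mean, exp_median = exp_median, norm_median = norm_median), 4)
```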
(5)(c) QQ plots are useful for detecting the presence of heavy-tailed distributions. Present side-by-side QQ plots, one for each sample, using qqnorm() and qqline(). Add color and titles. In base R plots, “cex” can be used to control the size of the plotted data points and text. Lastly, determine if there are any extreme outliers in either sample.
par(mfrow = c(1, 2))   # side-by-side panels
qqnorm(x, main = "Normal Q-Q Plot of x (normal sample)", col = "red", cex = 0.8)
qqline(x, col = "darkgrey")
qqnorm(y, main = "Normal Q-Q Plot of y (exponential sample)", col = "blue", cex = 0.8)
qqline(y, col = "darkgrey")
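The QQ plots do not by themselves answer the last request of (5)(c). One way, assuming the 3-hinge-spread convention for extreme outliers, is `boxplot.stats(..., coef = 3)$out`, which returns the points beyond the outer fences. The helper name `extreme` is hypothetical. With these seeds, the sketch should return nothing for x and a single value, about 1.449, for y.

```r
set.seed(124)
y <- rexp(n = 100, rate = 5.5)
set.seed(127)
x <- rnorm(n = 100, mean = 0, sd = 0.15)

# Hypothetical helper: values more than 3 hinge spreads beyond the hinges
extreme <- function(v) boxplot.stats(v, coef = 3)$out

extreme(x)   # normal sample
extreme(y)   # exponential sample
```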