Exercise 1: Write an expression to compute the number of seconds in a 365-day year, and execute the expression.
365*24*60*60
## [1] 31536000
Exercise 2: Define a workspace object which contains the number of seconds in a 365-day year, and display the result.
(Second_per_year <- 365*24*60*60)
## [1] 31536000
Exercise 3: Find the function name for base-10 logarithms, and compute the base-10 logarithm of 10, 100, and 1000 (use the ?? function at the console to search).
log10(10); log10(100); log10(1000)
## [1] 1
## [1] 2
## [1] 3
Exercise 4: What are the arguments of the rbinom function, which generates random numbers following the binomial distribution? Do any have defaults, or must all be specified? What value is returned?
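One way to check this from the console, shown here as a minimal sketch, is to inspect the function's formal arguments with base R's `args` and `formals`. An argument printed without `= value` has no default. The object name `arg_names` is illustrative.

```r
# Print the argument list of rbinom; arguments shown without "= value"
# have no default and must be supplied by the caller.
args(rbinom)

# The names of the formal arguments, in order.
arg_names <- names(formals(rbinom))
print(arg_names)
```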
rbinom has three arguments (n, size, prob). None of them has a default, so all must be specified.
## n number of observations. If length(n) > 1, the length is taken to be the number required.
## size number of trials (zero or more).
## prob probability of success on each trial.
rbinom returns a vector of n random deviates, each the number of successes in size trials.
Exercise 5: Display the vector of the number of successes in 24 trials with probability of success 0.2 (20%), with the simulation carried out 128 times.
(vector_e5 <- rbinom(128, 24, 0.2))
## [1] 7 5 2 8 5 7 6 7 8 7 4 5 4 5 4 4 5 5 7 6 3 4 2 6 3
## [26] 3 1 5 3 5 1 4 5 3 2 4 3 5 4 4 5 7 4 1 3 7 8 5 4 6
## [51] 6 8 6 2 4 5 7 4 3 5 4 3 5 5 6 6 4 5 3 2 4 7 5 6 8
## [76] 3 1 5 3 5 4 6 2 5 5 1 5 4 5 6 5 4 3 10 3 4 6 6 3 4
## [101] 7 4 2 6 3 4 6 5 2 11 4 4 3 2 2 5 6 8 6 2 1 5 3 5 3
## [126] 5 5 0
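Because `rbinom` draws random numbers, the vector above differs on every run. As a sketch, fixing the random seed with `set.seed` makes a simulation repeatable; the seed value 42 and the object names below are arbitrary choices for illustration, not part of the exercise.

```r
# Fix the seed so the same "random" draws are produced each time.
set.seed(42)
draws_a <- rbinom(128, 24, 0.2)

# Resetting the same seed reproduces exactly the same vector.
set.seed(42)
draws_b <- rbinom(128, 24, 0.2)

identical(draws_a, draws_b)  # TRUE
length(draws_a)              # 128
```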
Exercise 6: Summarize the result of rbinom (previous exercise) with the table function. What is the range of results, i.e., the minimum and maximum values? Which is the most likely result? For these, write text which includes the computed results. This is necessary because the results change with each random sampling.
(table(vector_e5))
## vector_e5
## 0 1 2 3 4 5 6 7 8 10 11
## 1 6 11 19 25 31 17 10 6 1 1
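Since the exercise asks for text that includes the computed results, the range and the most frequent value can be pulled from the data instead of read off by eye. The sketch below rebuilds a vector with the same counts as the table above so that it runs on its own; in the original session one would use `vector_e5` directly. The names `values`, `freqs`, `x`, `counts`, and `most_common` are illustrative.

```r
# Rebuild a vector matching the table shown above (values and their counts).
values <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 11)
freqs  <- c(1, 6, 11, 19, 25, 31, 17, 10, 6, 1, 1)
x <- rep(values, times = freqs)

counts <- table(x)
# Most frequent value: the name of the largest table entry.
most_common <- as.numeric(names(counts)[which.max(counts)])

cat("Results range from", min(x), "to", max(x),
    "and the most frequent result is", most_common, "\n")
```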
The values in vector_e5 range from 0 to 11, and the most frequent result in this sample is 5 (31 of the 128 simulations). These numbers change with each random sampling.
Exercise 7: Create and display a vector representing latitudes in degrees from \(0^\circ\) (equator) to \(+90^\circ\) (north pole), in intervals of \(5^\circ\). Compute and display their cosines – recall, the trig functions in R expect arguments in radians. Find and display the maximum cosine.
(latitudes <- seq(0, 90, by=5))
## [1] 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90
# convert degrees to radians before taking the cosine; round for display
round(cos(latitudes * pi/180), 4)
## [1] 1.0000 0.9962 0.9848 0.9659 0.9397 0.9063 0.8660 0.8192 0.7660 0.7071
## [11] 0.6428 0.5736 0.5000 0.4226 0.3420 0.2588 0.1736 0.0872 0.0000
max(cos(latitudes * pi/180))
## [1] 1
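Because R's trigonometric functions work in radians, a latitude in degrees has to be scaled by \(\pi/180\) before `cos` is applied. As a sketch of two equivalent routes, the explicit conversion can be compared with base R's `cospi`, which computes \(\cos(\pi x)\) and so takes the latitude as a fraction of 180. The object names are illustrative.

```r
lat_deg <- seq(0, 90, by = 5)

# Explicit conversion: degrees * pi / 180 gives radians.
cos_rad <- cos(lat_deg * pi / 180)

# cospi(x) computes cos(pi * x), so pass the latitude as a fraction of 180.
cos_pi <- cospi(lat_deg / 180)

max(cos_pi)   # 1, at the equator
cos_pi[19]    # exactly 0 at the pole; cos(pi / 2) leaves a tiny rounding error
```

Both versions decrease from 1 at the equator towards 0 at the pole, which is a quick sanity check that the conversion was applied.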
Exercise 8: Check if the gstat package is installed on your system. If not, install it. Load it into the workspace. Display its help and find the variogram function. What is its description?
if (!requireNamespace('gstat', quietly=TRUE)) install.packages('gstat', dependencies=TRUE)
library('gstat')
search()
## [1] ".GlobalEnv" "package:gstat" "package:stats"
## [4] "package:graphics" "package:grDevices" "package:utils"
## [7] "package:datasets" "package:methods" "Autoloads"
## [10] "package:base"
help('variogram')
Description of variogram:
## Calculates the sample variogram from data, or in case of a linear model is given, for the residuals, with options for directional, robust, and pooled variogram, and for irregular distance intervals.
## In case spatio-temporal data is provided, the function variogramST is called with a different set of parameters.
Exercise 9: Display the classes of the built-in constant pi and of the built-in constant letters.
class(pi)
## [1] "numeric"
class(letters)
## [1] "character"
Exercise 10: What is the class of the object returned by the variogram function? (Hint: see the heading “Value” in the help text.)
Exercise 11: List the datasets in the gstat package.
data(package="gstat")
Exercise 12: Load, summarize, and show the structure of the oxford dataset.
data(oxford, package="gstat")
summary(oxford)
## PROFILE XCOORD YCOORD ELEV PROFCLASS
## Min. : 1.00 Min. :100 Min. : 100 Min. :540.0 Cr:19
## 1st Qu.: 32.25 1st Qu.:200 1st Qu.: 600 1st Qu.:558.0 Ct:36
## Median : 63.50 Median :350 Median :1100 Median :573.0 Ia:71
## Mean : 63.50 Mean :350 Mean :1100 Mean :573.6
## 3rd Qu.: 94.75 3rd Qu.:500 3rd Qu.:1600 3rd Qu.:584.5
## Max. :126.00 Max. :600 Max. :2100 Max. :632.0
## MAPCLASS VAL1 CHR1 LIME1 VAL2
## Cr:31 Min. :2.000 Min. :1.000 Min. :0.000 Min. :4.00
## Ct:36 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:4.00
## Ia:59 Median :4.000 Median :2.000 Median :4.000 Median :8.00
## Mean :3.508 Mean :2.468 Mean :2.643 Mean :6.23
## 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:8.00
## Max. :4.000 Max. :4.000 Max. :4.000 Max. :8.00
## CHR2 LIME2 DEPTHCM DEP2LIME PCLAY1
## Min. :2 Min. :0.000 Min. :10.00 Min. :20.00 Min. :10.00
## 1st Qu.:2 1st Qu.:4.000 1st Qu.:25.00 1st Qu.:20.00 1st Qu.:20.00
## Median :2 Median :5.000 Median :36.00 Median :20.00 Median :24.50
## Mean :3 Mean :3.889 Mean :46.25 Mean :30.32 Mean :24.44
## 3rd Qu.:4 3rd Qu.:5.000 3rd Qu.:64.75 3rd Qu.:40.00 3rd Qu.:28.00
## Max. :6 Max. :5.000 Max. :91.00 Max. :90.00 Max. :37.00
## PCLAY2 MG1 OM1 CEC1
## Min. :10.00 Min. : 19.00 Min. : 2.600 Min. : 7.00
## 1st Qu.:10.00 1st Qu.: 44.00 1st Qu.: 4.100 1st Qu.:12.00
## Median :10.00 Median : 72.00 Median : 5.350 Median :15.00
## Mean :14.76 Mean : 93.53 Mean : 5.995 Mean :18.88
## 3rd Qu.:20.00 3rd Qu.:123.25 3rd Qu.: 7.175 3rd Qu.:25.25
## Max. :40.00 Max. :308.00 Max. :13.100 Max. :43.00
## PH1 PHOS1 POT1
## Min. :4.200 Min. : 1.700 Min. : 83.0
## 1st Qu.:7.200 1st Qu.: 6.200 1st Qu.:127.0
## Median :7.500 Median : 8.500 Median :164.0
## Mean :7.152 Mean : 8.752 Mean :181.7
## 3rd Qu.:7.600 3rd Qu.:10.500 3rd Qu.:194.8
## Max. :7.700 Max. :25.000 Max. :847.0
str(oxford)
## 'data.frame': 126 obs. of 22 variables:
## $ PROFILE : num 1 2 3 4 5 6 7 8 9 10 ...
## $ XCOORD : num 100 100 100 100 100 100 100 100 100 100 ...
## $ YCOORD : num 2100 2000 1900 1800 1700 1600 1500 1400 1300 1200 ...
## $ ELEV : num 598 597 610 615 610 595 580 590 598 588 ...
## $ PROFCLASS: Factor w/ 3 levels "Cr","Ct","Ia": 2 2 2 3 3 2 3 2 3 3 ...
## $ MAPCLASS : Factor w/ 3 levels "Cr","Ct","Ia": 2 3 3 3 3 2 2 3 3 3 ...
## $ VAL1 : num 3 3 4 4 3 3 4 4 4 3 ...
## $ CHR1 : num 3 3 3 3 3 2 2 3 3 3 ...
## $ LIME1 : num 4 4 4 4 4 0 2 1 0 4 ...
## $ VAL2 : num 4 4 5 8 8 4 8 4 8 8 ...
## $ CHR2 : num 4 4 4 2 2 4 2 4 2 2 ...
## $ LIME2 : num 4 4 4 5 5 4 5 4 5 5 ...
## $ DEPTHCM : num 61 91 46 20 20 91 30 61 38 25 ...
## $ DEP2LIME : num 20 20 20 20 20 20 20 20 40 20 ...
## $ PCLAY1 : num 15 25 20 20 18 25 25 35 35 12 ...
## $ PCLAY2 : num 10 10 20 10 10 20 10 20 10 10 ...
## $ MG1 : num 63 58 55 60 88 168 99 59 233 87 ...
## $ OM1 : num 5.7 5.6 5.8 6.2 8.4 6.4 7.1 3.8 5 9.2 ...
## $ CEC1 : num 20 22 17 23 27 27 21 14 27 20 ...
## $ PH1 : num 7.7 7.7 7.5 7.6 7.6 7 7.5 7.6 6.6 7.5 ...
## $ PHOS1 : num 13 9.2 10.5 8.8 13 9.3 10 9 15 12.6 ...
## $ POT1 : num 196 157 115 172 238 164 312 184 123 282 ...
Exercise 13: Load the women sample dataset. How many observations (cases) and how many attributes (fields) for each case? What are the column (field) and row names? What is the height of the first-listed woman?
data("women")
str(women)
## 'data.frame': 15 obs. of 2 variables:
## $ height: num 58 59 60 61 62 63 64 65 66 67 ...
## $ weight: num 115 117 120 123 126 129 132 135 139 142 ...
colnames(women)
## [1] "height" "weight"
rownames(women)
## [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15"
women[1, "height"]
## [1] 58
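The counts asked for in this exercise can also be obtained directly rather than read from the `str` output. A brief sketch using base R's `dim`, `nrow`, and `ncol` on the built-in `women` data frame; `first_height` is an illustrative name.

```r
data("women")

dim(women)    # number of rows (observations) and columns (attributes)
nrow(women)   # observations
ncol(women)   # attributes

# Height of the first-listed woman, selected by column name and row index.
first_height <- women$height[1]
first_height
```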
Exercise 14: List the factors in the oxford dataset.
str(oxford)
## 'data.frame': 126 obs. of 22 variables:
## $ PROFILE : num 1 2 3 4 5 6 7 8 9 10 ...
## $ XCOORD : num 100 100 100 100 100 100 100 100 100 100 ...
## $ YCOORD : num 2100 2000 1900 1800 1700 1600 1500 1400 1300 1200 ...
## $ ELEV : num 598 597 610 615 610 595 580 590 598 588 ...
## $ PROFCLASS: Factor w/ 3 levels "Cr","Ct","Ia": 2 2 2 3 3 2 3 2 3 3 ...
## $ MAPCLASS : Factor w/ 3 levels "Cr","Ct","Ia": 2 3 3 3 3 2 2 3 3 3 ...
## $ VAL1 : num 3 3 4 4 3 3 4 4 4 3 ...
## $ CHR1 : num 3 3 3 3 3 2 2 3 3 3 ...
## $ LIME1 : num 4 4 4 4 4 0 2 1 0 4 ...
## $ VAL2 : num 4 4 5 8 8 4 8 4 8 8 ...
## $ CHR2 : num 4 4 4 2 2 4 2 4 2 2 ...
## $ LIME2 : num 4 4 4 5 5 4 5 4 5 5 ...
## $ DEPTHCM : num 61 91 46 20 20 91 30 61 38 25 ...
## $ DEP2LIME : num 20 20 20 20 20 20 20 20 40 20 ...
## $ PCLAY1 : num 15 25 20 20 18 25 25 35 35 12 ...
## $ PCLAY2 : num 10 10 20 10 10 20 10 20 10 10 ...
## $ MG1 : num 63 58 55 60 88 168 99 59 233 87 ...
## $ OM1 : num 5.7 5.6 5.8 6.2 8.4 6.4 7.1 3.8 5 9.2 ...
## $ CEC1 : num 20 22 17 23 27 27 21 14 27 20 ...
## $ PH1 : num 7.7 7.7 7.5 7.6 7.6 7 7.5 7.6 6.6 7.5 ...
## $ PHOS1 : num 13 9.2 10.5 8.8 13 9.3 10 9 15 12.6 ...
## $ POT1 : num 196 157 115 172 238 164 312 184 123 282 ...
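Reading the `str` output works for a small data frame. As a sketch, the factor columns can also be picked out programmatically with `is.factor`. The helper name `factor_columns` is hypothetical, and the built-in `iris` data frame is used here only so the example runs without gstat installed.

```r
# Hypothetical helper: names of the columns of a data frame that are factors.
factor_columns <- function(df) {
  names(df)[vapply(df, is.factor, logical(1))]
}

# Demonstration on a built-in dataset with one factor column.
factor_columns(iris)

# With gstat's data loaded, the same call would be: factor_columns(oxford)
```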
PROFCLASS and MAPCLASS are the factors in the oxford dataset.
Exercise 15: Identify the thin trees, defined as those with height/girth ratio more than 1 s.d. above the mean. You will have to define a new field in the dataframe with this ratio, and then use the mean and sd summary functions, along with a logical expression.
(trees$"Height/Girth" <- trees$Height/trees$Girth)
## [1] 8.433735 7.558140 7.159091 6.857143 7.570093 7.685185 6.000000 6.818182
## [9] 7.207207 6.696429 6.991150 6.666667 6.666667 5.897436 6.250000 5.736434
## [17] 6.589147 6.466165 5.182482 4.637681 5.571429 5.633803 5.103448 4.500000
## [25] 4.723926 4.682081 4.685714 4.469274 4.444444 4.444444 4.223301
(sd_e15 <- sd(trees$"Height/Girth"))
## [1] 1.186666
(mean_e15 <- mean(trees$"Height/Girth"))
## [1] 5.985513
(trees$Thin <- trees$"Height/Girth" > mean_e15+sd_e15)
## [1] TRUE TRUE FALSE FALSE TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
(trees)
## Girth Height Volume Height/Girth Thin
## 1 8.3 70 10.3 8.433735 TRUE
## 2 8.6 65 10.3 7.558140 TRUE
## 3 8.8 63 10.2 7.159091 FALSE
## 4 10.5 72 16.4 6.857143 FALSE
## 5 10.7 81 18.8 7.570093 TRUE
## 6 10.8 83 19.7 7.685185 TRUE
## 7 11.0 66 15.6 6.000000 FALSE
## 8 11.0 75 18.2 6.818182 FALSE
## 9 11.1 80 22.6 7.207207 TRUE
## 10 11.2 75 19.9 6.696429 FALSE
## 11 11.3 79 24.2 6.991150 FALSE
## 12 11.4 76 21.0 6.666667 FALSE
## 13 11.4 76 21.4 6.666667 FALSE
## 14 11.7 69 21.3 5.897436 FALSE
## 15 12.0 75 19.1 6.250000 FALSE
## 16 12.9 74 22.2 5.736434 FALSE
## 17 12.9 85 33.8 6.589147 FALSE
## 18 13.3 86 27.4 6.466165 FALSE
## 19 13.7 71 25.7 5.182482 FALSE
## 20 13.8 64 24.9 4.637681 FALSE
## 21 14.0 78 34.5 5.571429 FALSE
## 22 14.2 80 31.7 5.633803 FALSE
## 23 14.5 74 36.3 5.103448 FALSE
## 24 16.0 72 38.3 4.500000 FALSE
## 25 16.3 77 42.6 4.723926 FALSE
## 26 17.3 81 55.4 4.682081 FALSE
## 27 17.5 82 55.7 4.685714 FALSE
## 28 17.9 80 58.3 4.469274 FALSE
## 29 18.0 80 51.5 4.444444 FALSE
## 30 18.0 80 51.0 4.444444 FALSE
## 31 20.6 87 77.0 4.223301 FALSE
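To list only the thin trees instead of scanning the logical column, the logical expression can be used as a row index. A sketch on a fresh copy of the built-in `trees` data; the names `tr`, `ratio`, `threshold`, and `thin` are illustrative.

```r
tr <- datasets::trees               # fresh copy, without the added columns
tr$ratio <- tr$Height / tr$Girth    # height/girth ratio

# Threshold: one standard deviation above the mean ratio.
threshold <- mean(tr$ratio) + sd(tr$ratio)

# Keep only the rows whose ratio exceeds the threshold.
thin <- tr[tr$ratio > threshold, ]
thin
```

This returns the five rows flagged TRUE in the listing above (trees 1, 2, 5, 6, and 9).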
Exercise 16: Display a histogram of the diamond prices in the diamonds dataset.
library(ggplot2)
histogram_e16 <- ggplot(data=diamonds) +
  geom_histogram(mapping = aes(x=price), binwidth = 500,
                 colour="pink") +
  geom_rug(mapping = aes(x=price))
print(histogram_e16)
Exercise 17: Write a model to predict tree height from tree girth. How much of the height can be predicted from the girth?
model_e17 <- lm(Height ~ Girth, data=trees)
summary(model_e17)
##
## Call:
## lm(formula = Height ~ Girth, data = trees)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.5816 -2.7686 0.3163 2.4728 9.9456
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 62.0313 4.3833 14.152 1.49e-14 ***
## Girth 1.0544 0.3222 3.272 0.00276 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.538 on 29 degrees of freedom
## Multiple R-squared: 0.2697, Adjusted R-squared: 0.2445
## F-statistic: 10.71 on 1 and 29 DF, p-value: 0.002758
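The question of how much of the height can be predicted from the girth is answered by the coefficient of determination. As a sketch, it can be extracted from the model summary instead of read from the printout; for a single predictor it equals the squared correlation. The names `fit`, `r2`, and `r2_check` are illustrative.

```r
fit <- lm(Height ~ Girth, data = datasets::trees)

# Proportion of the variance in Height explained by Girth.
r2 <- summary(fit)$r.squared
round(r2, 4)   # 0.2697, matching "Multiple R-squared" above

# For simple linear regression, R-squared is the squared correlation.
r2_check <- cor(datasets::trees$Height, datasets::trees$Girth)^2
```

Girth alone therefore accounts for roughly 27% of the variation in height.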
Exercise 18: Write a model to predict tree volume as a linear function of tree height and tree girth, with no interaction.
model_e18 <- lm(Volume ~ Height + Girth, data=trees)
summary(model_e18)
##
## Call:
## lm(formula = Volume ~ Height + Girth, data = trees)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.4065 -2.6493 -0.2876 2.2003 8.4847
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -57.9877 8.6382 -6.713 2.75e-07 ***
## Height 0.3393 0.1302 2.607 0.0145 *
## Girth 4.7082 0.2643 17.816 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.882 on 28 degrees of freedom
## Multiple R-squared: 0.948, Adjusted R-squared: 0.9442
## F-statistic: 255 on 2 and 28 DF, p-value: < 2.2e-16
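Once fitted, the model can be applied to new measurements with `predict`. A sketch, assuming a hypothetical tree of girth 12 and height 75 (in the units of the `trees` dataset); the names `fit2`, `new_tree`, `pred`, and `by_hand` are illustrative.

```r
fit2 <- lm(Volume ~ Height + Girth, data = datasets::trees)

# A hypothetical new tree; column names must match the model's predictors.
new_tree <- data.frame(Height = 75, Girth = 12)

pred <- predict(fit2, newdata = new_tree)
pred

# The same value computed by hand from the fitted coefficients.
b <- coef(fit2)
by_hand <- b["(Intercept)"] + b["Height"] * 75 + b["Girth"] * 12
```

Because the model has no interaction term, the prediction is simply the intercept plus each coefficient times its measurement.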
apply family of functions
Exercise 19: Write a function to restrict the values of a vector to the range \(0 \ldots 1\). Any values \(< 0\) should be replaced with \(0\), and any values \(>1\) should be replaced with \(1\). Test the function on a vector with elements from \(-1.2\) to \(+1.2\) in increments of \(0.1\) – see the seq “sequence” function.
(vector_e19 <- seq(-1.2, 1.2, by = 0.1))
## [1] -1.200000e+00 -1.100000e+00 -1.000000e+00 -9.000000e-01 -8.000000e-01
## [6] -7.000000e-01 -6.000000e-01 -5.000000e-01 -4.000000e-01 -3.000000e-01
## [11] -2.000000e-01 -1.000000e-01 2.220446e-16 1.000000e-01 2.000000e-01
## [16] 3.000000e-01 4.000000e-01 5.000000e-01 6.000000e-01 7.000000e-01
## [21] 8.000000e-01 9.000000e-01 1.000000e+00 1.100000e+00 1.200000e+00
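# Clamp each element of v to the range [0, 1]
restrict_vector_e19 <- function(v){
  # seq_along(v) is safe for empty vectors, unlike 1:length(v)
  for (i in seq_along(v)){
    if (v[i] < 0){
      v[i] <- 0
    } else if (v[i] > 1){
      v[i] <- 1
    }
  }
  return(v)
}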
restrict_vector_e19(vector_e19)
## [1] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [6] 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
## [11] 0.000000e+00 0.000000e+00 2.220446e-16 1.000000e-01 2.000000e-01
## [16] 3.000000e-01 4.000000e-01 5.000000e-01 6.000000e-01 7.000000e-01
## [21] 8.000000e-01 9.000000e-01 1.000000e+00 1.000000e+00 1.000000e+00
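The loop above works element by element; the same clamping can be written without a loop using base R's vectorized `pmin` and `pmax`. This is an alternative sketch, and `clamp01` is a hypothetical name.

```r
# Hypothetical vectorized alternative: clamp every element to [0, 1].
# pmax(v, 0) raises negatives to 0, then pmin(..., 1) caps values above 1.
clamp01 <- function(v) pmin(pmax(v, 0), 1)

test_vec <- seq(-1.2, 1.2, by = 0.1)
clamped <- clamp01(test_vec)
range(clamped)   # 0 1
```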
Bonus Exercise: Use tidyverse functions and pipes on the trees dataset to select the trees (use the filter function) with a volume greater than the median volume (use the median function), compute the ratio of girth to height as a new variable (use the mutate function), and sort by this ratio (use the arrange function) from thin to thick trees.
library(dplyr)  # provides %>%, filter, mutate, and arrange
trees %>%
  filter(Volume > median(Volume)) %>%
  mutate(ratio_be = round(Girth/Height, 3)) %>%
  arrange(ratio_be)