Highlights:
1.i. Checking Available Objects in the Active Workspace
1.ii. Solving the Quadratic Equation for the Value of X
1.iii. Removing Unnecessary Objects from the Workspace
2.i. Five Basic Characteristics
Data Sets and Variables
Variable Types
Writing Loops in R
5.i. Write Codes for all Possible Outcomes One by One
5.ii. Give a Range
5.iii. Easy for loop
5.iii.a. Easy for loop b
5.iv. for loop 2
5.v. Professional for loop
5.vi. Extra for loop
5.vii. Sample Question/Answer *for loop
We use the term object to describe anything that is stored in R. Variables are examples, but objects can also be more complicated entities such as functions. Let’s create objects a, b, and c and assign some values to them.
a <- 1
b <- 2
c <- -4
# Assignment also works with the '=' sign, but '<-' is recommended because '=' is also used for function arguments and can cause confusion
d = 3
e = 2
f = 4
# Creating more objects
g <- 15
h <- 19
i <- 21
j <- 26
# Printing the saved objects on R console
a;b;c;d;e;f
## [1] 1
## [1] 2
## [1] -4
## [1] 3
## [1] 2
## [1] 4
R keeps the active objects in Random Access Memory (RAM). It makes no sense for objects to tie up memory when we don’t really need them. Or sometimes we simply want to check how many objects we have in the active workspace. In either case, we first have to list the objects, and we can do so using the ls() function in R.
ls()
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
We created these objects a moment ago, and the ls() command has listed them.
We can use the objects as we need. Let’s imagine I want to solve the quadratic equation x^2 + 2x - 4 = 0 using the quadratic formula, x = (-b ± √(b^2 - 4ac)) / (2a), with a = 1, b = 2, and c = -4.
(-b + sqrt(b^2 - 4*a*c))/(2*a) # First possible outcome
[1] 1.236068
(-b - sqrt(b^2 - 4*a*c))/(2*a) # Second possible outcome
[1] -3.236068
We have both possible values.
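As a quick check, a root of x^2 + 2x - 4 = 0 should make the polynomial evaluate to zero when it is substituted back. The sketch below repeats the calculation on its own; the names qa, qb, qc, discriminant, root1, and root2 are illustrative and are not part of the original code.

```r
# Coefficients of x^2 + 2x - 4 (illustrative names, so a, b and c above stay untouched)
qa <- 1
qb <- 2
qc <- -4

# The discriminant is positive here, so there are two real roots
discriminant <- qb^2 - 4 * qa * qc

root1 <- (-qb + sqrt(discriminant)) / (2 * qa)
root2 <- (-qb - sqrt(discriminant)) / (2 * qa)

# Substituting each root back should give zero, up to floating-point rounding
qa * root1^2 + qb * root1 + qc
qa * root2^2 + qb * root2 + qc
```

Because of floating-point rounding, the last two lines may print a tiny number such as 1e-16 rather than an exact 0.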
We can remove the objects that we don’t need by using the rm() function.
# Removing one object (a)
rm(a)
# Removing Two Objects at the same time
rm(b,c)
# Removing Three Objects at the same time
rm(d,e,f)
# Do we have to write the names of the objects every time? Yes, if we want to remove only some of the objects in the workspace. No, if we want to remove all the objects at once.
# Removes all the objects at the same time
rm(list=ls())
# Let's check for objects in the workspace.
ls()
character(0)
The ls() command now returns character(0), which means no objects are left. We removed six of the variables in groups using rm(), and the remaining ones using rm(list=ls()).
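Between removing objects one by one and removing everything, there is a middle option: ls() accepts a pattern argument, so rm() can be pointed at only the names that match. The sketch below is a minimal illustration under assumed names (tmp_total, tmp_count, and final_mean are hypothetical); it runs inside local() so that it does not touch the objects in your own workspace.

```r
# Everything inside local() lives in a temporary environment,
# so objects already in the workspace are not affected
local({
  tmp_total  <- 10                      # hypothetical temporary objects
  tmp_count  <- 4
  final_mean <- tmp_total / tmp_count   # the one result worth keeping

  # ls(pattern = ...) returns only the names matching the regular expression
  rm(list = ls(pattern = "^tmp_"))

  ls()                                  # only "final_mean" is left
})
```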
In general, data analysis can be described as a series of functions applied to the data. R has many predefined functions, and we have already used some of them, for example rm() and ls(). If we want to see how a function is written, we can type its name without parentheses in the R console, for example:
Note 1: Functions need parentheses to be evaluated, but there are some exceptions, e.g., arithmetic (+, -, ^, etc.) and relational (<, >, ==, etc.) operators.
ls
function (name, pos = -1L, envir = as.environment(pos), all.names = FALSE,
pattern, sorted = TRUE)
{
if (!missing(name)) {
pos <- tryCatch(name, error = function(e) e)
if (inherits(pos, "error")) {
name <- substitute(name)
if (!is.character(name))
name <- deparse(name)
warning(gettextf("%s converted to character string",
sQuote(name)), domain = NA)
pos <- name
}
}
all.names <- .Internal(ls(envir, all.names, sorted))
if (!missing(pattern)) {
if ((ll <- length(grep("[", pattern, fixed = TRUE))) &&
ll != length(grep("]", pattern, fixed = TRUE))) {
if (pattern == "[") {
pattern <- "\\["
warning("replaced regular expression pattern '[' by '\\\\['")
}
else if (length(grep("[^\\\\]\\[<-", pattern))) {
pattern <- sub("\\[<-", "\\\\\\[<-", pattern)
warning("replaced '[<-' by '\\\\[<-' in regular expression pattern")
}
}
grep(pattern, all.names, value = TRUE)
}
else all.names
}
<bytecode: 0x000000001546e010>
<environment: namespace:base>
It printed the full source code of the function ls. We can use this strategy to inspect most functions in R.
ls()
## character(0)
Note 2: To specify arguments inside a function call, we use the ‘=’ sign, not the ‘<-’ sign.
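A small sketch of what naming arguments with ‘=’ buys us: named arguments can be supplied in any order, while unnamed arguments are matched by position. The numbers 8 and 2 are arbitrary values chosen for illustration.

```r
# Named arguments can be given in any order
log(x = 8, base = 2)
log(base = 2, x = 8)

# Unnamed arguments are matched by position: x first, then base
log(8, 2)
```

All three calls compute the base-2 logarithm of 8.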
### Log & exp Functions: These functions require at least one argument. Remember they are inverse functions.
log(1)#gives the natural log of 1
[1] 0
log10(2)#gives the log base10 of 2
[1] 0.30103
# Or
log(x=2, base=10)
[1] 0.30103
# Or
log(2, 10)
[1] 0.30103
log2(5)#gives the log base2 of 5
[1] 2.321928
# or
log(5, base=2)
[1] 2.321928
# exp() undoes the natural log: 0, 0.6931472 and 1.609438 are log(1), log(2) and log(5)
exp(0)
[1] 1
exp(0.6931472)
[1] 2
exp(1.609438)
[1] 5
# Shows the arguments that log2() accepts
args(log2)
function (x)
NULL
# Shows the arguments that sd() (standard deviation) accepts
args(sd)
function (x, na.rm = FALSE)
NULL
log2() takes a single argument, x, which is a numeric vector (a single number is just a vector of length one), while sd() takes two arguments: x and na.rm. Because na.rm defaults to FALSE, sd() returns NA when x contains missing values unless we set na.rm = TRUE.
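To see the effect of the na.rm argument, here is a minimal sketch with a made-up vector (scores is hypothetical data, not from the original text).

```r
# Hypothetical data with one missing value
scores <- c(2, 4, NA, 8)

# With the default na.rm = FALSE, the missing value propagates to the result
sd(scores)

# With na.rm = TRUE, the NA is dropped before the standard deviation is computed
sd(scores, na.rm = TRUE)
```

The first call returns NA; the second returns the standard deviation of 2, 4, and 8.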
# First evaluates the 'exp' and then the log function
log(exp(0))
[1] 0
log10(exp(0.6931472))#log base 10
[1] 0.30103
log2(exp(1.609438))#log base 2
[1] 2.321928
# First evaluates the 'log' and then the 'exp' function
#exp(log(2))
#exp(log10(8))
#exp(log2(5))
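One caveat about composing these functions: exp() is the inverse of the natural log only. A logarithm in another base is undone by raising that base to the result. The following sketch, using the same numbers as above, illustrates the difference.

```r
# exp() undoes the natural log
exp(log(2))

# Logs in other bases are undone by powers of the same base
10^log10(8)
2^log2(5)

# Mixing bases does not recover the original value
exp(log10(8))
```

The first three calls give back 2, 8, and 5 (up to floating-point rounding), while the last one does not give back 8.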
# Asking for help in R about the 'mean' function
help("mean")
# Or using the '?' sign before the function
?exp
R ships with many built-in data sets for its users to practice on. We can list all available data sets with the data() command.
data()
R also has built-in values for some universal constants, such as:
pi
[1] 3.141593
Inf
[1] Inf
Variable names are not limited to single letters like a, b, or c. A name can be almost anything, as long as it starts with a letter (or a dot not followed by a digit), contains only letters, digits, dots, and underscores, and is not a reserved word such as TRUE or if.
We have different types of variables in R. We can have numbers, character strings, tables, or matrices. The function class() helps us check the type of an object. For example, if I define ‘a’ as 1 and look at its class, I will see it’s numeric. Here’s another example: let’s look at the class of the function ‘ls’. Not surprisingly, we see that it’s a function.
a <- 1
class(a)
[1] "numeric"
#OR
h <- vector("numeric", length = 15)
h #the default value is 0 for the numeric vector
[1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
class(h)
[1] "numeric"
b <- c("m")
class(b)
[1] "character"
# Variable as a string of letters (character vector)
d <- letters[1:5]
class(d)
[1] "character"
c <- 1:5
class(c)
[1] "integer"
class(ls)
[1] "function"
f <- c("TRUE", "FALSE")
class(f)
[1] "character"
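Note that the quotation marks are what make the vector above a character vector. As a brief sketch (flags_chr and flags_lgl are illustrative names), the same values without quotes are logical:

```r
flags_chr <- c("TRUE", "FALSE")   # quoted: text that happens to spell TRUE and FALSE
flags_lgl <- c(TRUE, FALSE)       # unquoted: real logical values

class(flags_chr)
class(flags_lgl)

# Logical values can be counted, because TRUE is treated as 1 and FALSE as 0
sum(flags_lgl)

# as.logical() converts the text version into real logical values
as.logical(flags_chr)
```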
g <- c(10+0i, 3+2i)
class(g)
[1] "complex"
e <- matrix(ncol=2, nrow=2, data = 1:4)
e
     [,1] [,2]
[1,]    1    3
[2,]    2    4
class(e)
[1] "matrix" "array"
Lists are a special type of vector that can contain elements of different classes. Lists are a very important data type in R, and we need to know them well.
i <- list(2, 3, "c","d", FALSE, 2+3i, 4+2i)
i
[[1]]
[1] 2
[[2]]
[1] 3
[[3]]
[1] "c"
[[4]]
[1] "d"
[[5]]
[1] FALSE
[[6]]
[1] 2+3i
[[7]]
[1] 4+2i
class(i)
[1] "list"
So far, the variables we have defined have been single values or simple vectors. This is not very useful for storing data sets. The most common way of storing data sets in R is with data frames. Conceptually, we can think of a data frame as a table: rows represent observations, and columns represent the different variables.
Data frames are particularly useful for data sets because we can combine different types into one single object.
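As a minimal sketch of that idea, the made-up data frame below (students and its columns are hypothetical) combines a character, an integer, and a logical column in one object.

```r
# A small, made-up data frame: one row per observation, one column per variable
students <- data.frame(
  name   = c("Ana", "Ben", "Cy"),
  age    = c(21L, 25L, 19L),
  passed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE   # keep text columns as character in older R versions too
)

class(students)           # the object as a whole is a data frame
sapply(students, class)   # each column keeps its own class
nrow(students)            # number of observations
ncol(students)            # number of variables
```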
Merriam-Webster dictionary describes a loop as “a piece of film or magnetic tape whose ends are spliced together so as to project or play back the same material continuously”. A loop in R means much the same thing: it is a construct that carries out the designated task over and over again until the given iterations are completed.
Now, to understand what a loop is, how it works, and why we need it, let’s write some simple R code.
I want some R code that gives me the numbers 10 to 30, each preceded by the phrase “This is number”. The first thing I can do is as follows.
This is how much code we need if we don’t use a loop. It is also often said that we should not copy and paste the same code more than twice.
print(paste("This is number", 10))
[1] "This is number 10"
print(paste("This is number", 11))
[1] "This is number 11"
#print(paste("This is number", 12))
#print(paste("This is number", 13))
#print(paste("This is number", 14))
#print(paste("This is number", 15))
#print(paste("This is number", 16))
#print(paste("This is number", 17))
#print(paste("This is number", 18))
#print(paste("This is number", 19))
#print(paste("This is number", 20))
#print(paste("This is number", 21))
#print(paste("This is number", 22))
#print(paste("This is number", 23))
#print(paste("This is number", 24))
#print(paste("This is number", 25))
#print(paste("This is number", 26))
#print(paste("This is number", 27))
print(paste("This is number", 28))
[1] "This is number 28"
print(paste("This is number", 29))
[1] "This is number 29"
print(paste("This is number", 30))
[1] "This is number 30"
We achieved our goal. However, look at the code: it gets repetitive and dull after a few lines. What if we increase the range to 10 through 1,000, or 5,000, or 100,000? Do we repeat the same line a hundred thousand times? It would be ridiculously hard.
print(paste("This is number", 10:30))
[1] "This is number 10" "This is number 11" "This is number 12"
[4] "This is number 13" "This is number 14" "This is number 15"
[7] "This is number 16" "This is number 17" "This is number 18"
[10] "This is number 19" "This is number 20" "This is number 21"
[13] "This is number 22" "This is number 23" "This is number 24"
[16] "This is number 25" "This is number 26" "This is number 27"
[19] "This is number 28" "This is number 29" "This is number 30"
Yay, we got our output. This is much easier for the problem we set at the outset, but it works only because paste() is vectorised. Tasks that need a decision or several steps for each value call for a loop.
Every loop has three basic components:
i. The output: this tells R what to produce. In my example below, the output is simply the printed text paste(“This is number”, number). When results are stored rather than printed, the output is usually created before the loop with vector(), which takes two arguments: 1. the type of vector, e.g., logical, integer, double, character, etc., and 2. the length of the vector.
ii. The sequence: this determines what the loop iterates over. The example below uses a manually written sequence, “c(10,11,12,13,14,15,16,17,18,19,20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)”. And,
iii. The body: this is the heart of the loop, because it does all the magic that we want to see. In the example below, “print(paste(“This is number”, number))” is the body. Or, if we want to go atomic, it’s the printing of the number.
for (number in c(10,11,12,13,14,15,16,17,18,19,20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)){
print(paste("This is number", number))
}
[1] "This is number 10"
[1] "This is number 11"
[1] "This is number 12"
[1] "This is number 13"
[1] "This is number 14"
[1] "This is number 15"
[1] "This is number 16"
[1] "This is number 17"
[1] "This is number 18"
[1] "This is number 19"
[1] "This is number 20"
[1] "This is number 21"
[1] "This is number 22"
[1] "This is number 23"
[1] "This is number 24"
[1] "This is number 25"
[1] "This is number 26"
[1] "This is number 27"
[1] "This is number 28"
[1] "This is number 29"
[1] "This is number 30"
Here, the ‘number’ at the beginning of the parentheses is the same ‘number’ used at the end of the syntax: on each pass, the loop assigns the next value from the vector to the variable number. I can replace the word ‘number’ with any word or expression I like. For example, I will assign the numbers to the word ‘frog’.
for (frog in c(10,11,12,13,14,15,16,17,18,19,20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30)){
print(paste("This is number", frog))
}
[1] "This is number 10"
[1] "This is number 11"
[1] "This is number 12"
[1] "This is number 13"
[1] "This is number 14"
[1] "This is number 15"
[1] "This is number 16"
[1] "This is number 17"
[1] "This is number 18"
[1] "This is number 19"
[1] "This is number 20"
[1] "This is number 21"
[1] "This is number 22"
[1] "This is number 23"
[1] "This is number 24"
[1] "This is number 25"
[1] "This is number 26"
[1] "This is number 27"
[1] "This is number 28"
[1] "This is number 29"
[1] "This is number 30"
Just like that.
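The name of the loop variable is just as arbitrary when the results are stored instead of printed. As a minimal sketch with all three loop components made explicit (values, squares, and k are illustrative names, not part of the original examples):

```r
values <- c(10, 20, 30)                        # hypothetical input

# 1. Output: allocate a vector of the right type and length before the loop
squares <- vector("double", length(values))

# 2. Sequence: seq_along(values) yields the positions 1, 2, 3
for (k in seq_along(values)) {
  # 3. Body: the work done on each pass
  squares[[k]] <- values[[k]]^2
}

squares
```

Allocating the output first means R does not have to grow the vector on every pass, which matters once the number of iterations gets large.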
for (number in 10:30){
print(paste("This is number", number))
}
[1] "This is number 10"
[1] "This is number 11"
[1] "This is number 12"
[1] "This is number 13"
[1] "This is number 14"
[1] "This is number 15"
[1] "This is number 16"
[1] "This is number 17"
[1] "This is number 18"
[1] "This is number 19"
[1] "This is number 20"
[1] "This is number 21"
[1] "This is number 22"
[1] "This is number 23"
[1] "This is number 24"
[1] "This is number 25"
[1] "This is number 26"
[1] "This is number 27"
[1] "This is number 28"
[1] "This is number 29"
[1] "This is number 30"
Instead of writing out all those numbers, I can just give the range, 10:30, and get the same results.
for(i in 10:30) {
print(paste("This is number", i))
}
[1] "This is number 10"
[1] "This is number 11"
[1] "This is number 12"
[1] "This is number 13"
[1] "This is number 14"
[1] "This is number 15"
[1] "This is number 16"
[1] "This is number 17"
[1] "This is number 18"
[1] "This is number 19"
[1] "This is number 20"
[1] "This is number 21"
[1] "This is number 22"
[1] "This is number 23"
[1] "This is number 24"
[1] "This is number 25"
[1] "This is number 26"
[1] "This is number 27"
[1] "This is number 28"
[1] "This is number 29"
[1] "This is number 30"
How about we make it fancier and probably much more adaptable? Most proficient R programmers write their code this way, as in the loop above: the variable ‘i’ takes each value in the range in turn as the loop runs.
What if we don’t want all the numbers within the range, but just the odd ones? Here’s how we do it.
for (i in 10:30) {
  # i %% 2 is 0 for even numbers, so !i %% 2 is TRUE and the even number is skipped
  if (!i %% 2) {
    next
  }
  print(paste("This is number", i))
}
[1] "This is number 11"
[1] "This is number 13"
[1] "This is number 15"
[1] "This is number 17"
[1] "This is number 19"
[1] "This is number 21"
[1] "This is number 23"
[1] "This is number 25"
[1] "This is number 27"
[1] "This is number 29"
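The same odd numbers can also be produced without next, either by building the sequence directly or by filtering the range. This is only an alternative sketch; odd_numbers, candidates, and n_odd are illustrative names.

```r
# Option 1: generate only the odd numbers and loop over them
odd_numbers <- seq(from = 11, to = 29, by = 2)
for (n_odd in odd_numbers) {
  print(paste("This is number", n_odd))
}

# Option 2: keep the values whose remainder after division by 2 is 1
candidates <- 10:30
candidates[candidates %% 2 == 1]
```

The next version is still the more flexible one when the rule for skipping a value is too complicated to express as a simple sequence.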
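# i %% 27 is non-zero (treated as TRUE) unless i is a multiple of 27,
# so every number except the multiples of 27 is skipped
for (i in 1:300) {
  if (i %% 27) {
    next
  }
  print(paste("This number is", i))
}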
[1] "This number is 27"
[1] "This number is 54"
[1] "This number is 81"
[1] "This number is 108"
[1] "This number is 135"
[1] "This number is 162"
[1] "This number is 189"
[1] "This number is 216"
[1] "This number is 243"
[1] "This number is 270"
[1] "This number is 297"
output <- vector("double", ncol(mtcars))
for(i in seq_along(mtcars)){
output[[i]] <- mean(mtcars[[i]])
}
output
[1] 20.090625 6.187500 230.721875 146.687500 3.596563 3.217250
[7] 17.848750 0.437500 0.406250 3.687500 2.812500
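For comparison, base R can express this particular loop in a single call. The sketch below recomputes the column means both ways and checks that they agree; col_means, loop_means, and j are illustrative names.

```r
# sapply() applies mean() to every column of mtcars and keeps the column names
col_means <- sapply(mtcars, mean)
col_means

# The explicit loop: allocate the output, then fill it one column at a time
loop_means <- vector("double", ncol(mtcars))
for (j in seq_along(mtcars)) {
  loop_means[[j]] <- mean(mtcars[[j]])
}

# Both approaches give the same numbers (the names are dropped before comparing)
all.equal(unname(col_means), loop_means)
```

Writing the loop out by hand remains useful for learning, since it shows exactly what the one-line version does behind the scenes.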
library(tidyverse)
library(nycflights13)
data(flights)
output1 <- vector("list", ncol(flights))
for(i in seq_along(flights)){
output1[[i]] <- class(flights[[i]])
}
output1
[[1]]
[1] "integer"
[[2]]
[1] "integer"
[[3]]
[1] "integer"
[[4]]
[1] "integer"
[[5]]
[1] "integer"
[[6]]
[1] "numeric"
[[7]]
[1] "integer"
[[8]]
[1] "integer"
[[9]]
[1] "numeric"
[[10]]
[1] "character"
[[11]]
[1] "integer"
[[12]]
[1] "character"
[[13]]
[1] "character"
[[14]]
[1] "character"
[[15]]
[1] "numeric"
[[16]]
[1] "numeric"
[[17]]
[1] "numeric"
[[18]]
[1] "numeric"
[[19]]
[1] "POSIXct" "POSIXt"
output2 <- vector ("list", length(iris))
for(i in seq_along(iris)){
output2[[i]] <- unique(iris[[i]])
}
output2
[[1]]
[1] 5.1 4.9 4.7 4.6 5.0 5.4 4.4 4.8 4.3 5.8 5.7 5.2 5.5 4.5 5.3 7.0 6.4 6.9 6.5
[20] 6.3 6.6 5.9 6.0 6.1 5.6 6.7 6.2 6.8 7.1 7.6 7.3 7.2 7.7 7.4 7.9
[[2]]
[1] 3.5 3.0 3.2 3.1 3.6 3.9 3.4 2.9 3.7 4.0 4.4 3.8 3.3 4.1 4.2 2.3 2.8 2.4 2.7
[20] 2.0 2.2 2.5 2.6
[[3]]
[1] 1.4 1.3 1.5 1.7 1.6 1.1 1.2 1.0 1.9 4.7 4.5 4.9 4.0 4.6 3.3 3.9 3.5 4.2 3.6
[20] 4.4 4.1 4.8 4.3 5.0 3.8 3.7 5.1 3.0 6.0 5.9 5.6 5.8 6.6 6.3 6.1 5.3 5.5 6.7
[39] 6.9 5.7 6.4 5.4 5.2
[[4]]
[1] 0.2 0.4 0.3 0.1 0.5 0.6 1.4 1.5 1.3 1.6 1.0 1.1 1.8 1.2 1.7 2.5 1.9 2.1 2.2
[20] 2.0 2.4 2.3
[[5]]
[1] setosa versicolor virginica
Levels: setosa versicolor virginica
means <- c(-10, 0, 10, 100)
n <- 50
output3 <- vector("list", length(means))
for (i in seq_along(output3)){
output3[[i]] <- rnorm(n, mean = means[i])
}
str(output3)
List of 4
$ : num [1:50] -10.25 -9.36 -9.88 -9.69 -10.68 ...
$ : num [1:50] 0.252 0.785 0.223 1.308 -0.77 ...
$ : num [1:50] 9.32 10.18 10.57 9.46 9.65 ...
$ : num [1:50] 98.8 99.3 101.1 101.3 100.6 ...
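Or we can complete a similar task without a loop. rnorm() recycles the means vector, so consecutive draws cycle through the four means. The code below reuses n = 50 from the previous example and sets the number of columns with N = 10, so the 200 draws fill a 20 x 10 matrix in which the rows cycle through the means -10, 0, 10, and 100: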
set.seed(1)
means <- c(-10, 0, 10, 100)
N <- 10
matrix(rnorm(n * length(means), mean = means), ncol = N)
[,1] [,2] [,3] [,4] [,5]
[1,] -10.6264538 -9.08102263 -10.1645236 -7.598382239 -10.5686687
[2,] 0.1836433 0.78213630 -0.2533617 -0.039240003 -0.1351786
[3,] 9.1643714 10.07456498 10.6969634 10.689739362 11.1780870
[4,] 101.5952808 98.01064830 100.5566632 100.028002159 98.4764332
[5,] -9.6704922 -9.38017425 -10.6887557 -10.743273209 -9.4060538
[6,] -0.8204684 -0.05612874 -0.7074952 0.188792300 0.3329504
[7,] 10.4874291 9.84420449 10.3645820 8.195041371 11.0630998
[8,] 100.7383247 98.52924762 100.7685329 101.465554862 99.6958161
[9,] -9.4242186 -10.47815006 -10.1123462 -9.846746662 -9.6299812
[10,] -0.3053884 0.41794156 0.8811077 2.172611670 0.2670988
[11,] 11.5117812 11.35867955 10.3981059 10.475509529 9.4574800
[12,] 100.3898432 99.89721227 99.3879736 99.290053569 101.2078678
[13,] -10.6212406 -9.61232839 -9.6588803 -9.389273647 -8.8395974
[14,] -2.2146999 -0.05380504 -1.1293631 -0.934097632 0.7002136
[15,] 11.1249309 8.62294044 11.4330237 8.746366600 11.5868335
[16,] 99.9550664 99.58500544 101.9803999 100.291446236 100.5584864
[17,] -10.0161903 -10.39428995 -10.3672215 -10.443291873 -11.2765922
[18,] 0.9438362 -0.05931340 -1.0441346 0.001105352 -0.5732654
[19,] 10.8212212 11.10002537 10.5697196 10.074341324 8.7753874
[20,] 100.5939013 100.76317575 99.8649454 99.410479054 99.5265994
[,6] [,7] [,8] [,9] [,10]
[1,] -10.62036668 -10.5059575 -11.9143594 -9.57489962 -11.2313234
[2,] 0.04211587 1.3430388 1.1765833 -0.23864710 0.9838956
[3,] 9.08907835 9.7854206 8.3350276 11.05848305 10.2199248
[4,] 100.15802877 99.8204435 99.5364696 100.88642265 98.5327500
[5,] -10.65458464 -10.1001907 -11.1159201 -10.61924305 -9.4789773
[6,] 1.76728727 0.7126663 -0.7508190 2.20610246 -0.1587546
[7,] 10.71670748 9.9264356 12.0871665 9.74497297 11.4645873
[8,] 100.91017423 99.9623658 100.0173956 98.57550535 99.2339180
[9,] -9.61581464 -10.6816605 -11.2863005 -10.14439960 -10.4302118
[10,] 1.68217608 -0.3242703 -1.6406055 0.20753834 -0.9261095
[11,] 9.36426355 10.0601604 10.4501871 12.30797840 9.8228960
[12,] 99.53835527 99.4111055 99.9814402 100.10580237 100.4020118
[13,] -8.56771776 -9.4685038 -10.3180684 -9.54300119 -10.7317482
[14,] -0.65069635 -1.5183941 -0.9293621 -0.07715294 0.8303732
[15,] 9.79261926 10.3065579 8.5125397 9.66599916 8.7919172
[16,] 99.60719207 98.4635502 98.9248077 99.96527397 98.9520156
[17,] -10.31999287 -10.3009761 -8.9999712 -9.21236039 -8.5588423
[18,] -0.27911330 -0.5282799 -0.6212667 2.07524501 -1.0158475
[19,] 10.49418833 9.3479052 8.6155732 11.02739244 10.4119747
[20,] 99.82266952 99.9431032 101.8692906 101.20790840 99.6189239
This example comes from https://jrnold.github.io/r4ds-exercise-solutions/iteration.html
humps <- c("five", "four", "three", "two", "one", "no")
for (i in humps) {
cat(str_c("Alice the camel has ", rep(i, 3), " humps.",
collapse = "\n"
), "\n")
if (i == "no") {
cat("Now Alice is a horse.\n")
} else {
cat("So go, Alice, go.\n")
}
cat("\n")
}
Alice the camel has five humps.
Alice the camel has five humps.
Alice the camel has five humps.
So go, Alice, go.
Alice the camel has four humps.
Alice the camel has four humps.
Alice the camel has four humps.
So go, Alice, go.
Alice the camel has three humps.
Alice the camel has three humps.
Alice the camel has three humps.
So go, Alice, go.
Alice the camel has two humps.
Alice the camel has two humps.
Alice the camel has two humps.
So go, Alice, go.
Alice the camel has one humps.
Alice the camel has one humps.
Alice the camel has one humps.
So go, Alice, go.
Alice the camel has no humps.
Alice the camel has no humps.
Alice the camel has no humps.
Now Alice is a horse.
monkeys <- c("Five", "Four", "Three", "Two", "One", "No")
for (i in monkeys) {
cat(str_c(rep(i,2), " little monkeys jumping on the bed.",
collapse = "\n"
), "\n")
if (i == "No") {
cat("None fell off and bumped his head.\n", "Mamma called the doctor and the doctor said:\n", "Put those monkeys back to bed.\n", sep = "")
} else {
cat("One fell off and bumped his head.\n", "Mamma called the doctor and the doctor said:\n", "No more monkeys jumping on the bed.\n", sep = "")
}
cat("\n")
}
Five little monkeys jumping on the bed.
Five little monkeys jumping on the bed.
One fell off and bumped his head.
Mamma called the doctor and the doctor said:
No more monkeys jumping on the bed.
Four little monkeys jumping on the bed.
Four little monkeys jumping on the bed.
One fell off and bumped his head.
Mamma called the doctor and the doctor said:
No more monkeys jumping on the bed.
Three little monkeys jumping on the bed.
Three little monkeys jumping on the bed.
One fell off and bumped his head.
Mamma called the doctor and the doctor said:
No more monkeys jumping on the bed.
Two little monkeys jumping on the bed.
Two little monkeys jumping on the bed.
One fell off and bumped his head.
Mamma called the doctor and the doctor said:
No more monkeys jumping on the bed.
One little monkeys jumping on the bed.
One little monkeys jumping on the bed.
One fell off and bumped his head.
Mamma called the doctor and the doctor said:
No more monkeys jumping on the bed.
No little monkeys jumping on the bed.
No little monkeys jumping on the bed.
None fell off and bumped his head.
Mamma called the doctor and the doctor said:
Put those monkeys back to bed.
Here I go. Did it.