Source file ⇒ lec16.Rmd
In the uniform distribution every value in the interval is equally likely. To create a vector of 5 random numbers from the uniform distribution on the interval 0 to 1, use the code:
runif(n=5,min=0,max=1)
## [1] 0.8240253 0.6254761 0.4460579 0.4345161 0.8846122
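runif() draws at random, so your numbers will differ from the ones above. The sketch below is not part of the original notes. It fixes the random seed with set.seed() so the draws can be reproduced, and it uses min and max to sample from a different interval. The seed 42 and the interval 10 to 20 are arbitrary choices.

```r
# Fix the random number generator so the same numbers come out every run
# (42 is an arbitrary seed chosen for this example)
set.seed(42)
u <- runif(n = 5, min = 0, max = 1)
u

# Resetting the same seed reproduces exactly the same draws
set.seed(42)
u_again <- runif(n = 5, min = 0, max = 1)
identical(u, u_again)   # TRUE

# Change min and max to sample uniformly from another interval
v <- runif(n = 1000, min = 10, max = 20)
range(v)   # both ends lie inside [10, 20]
mean(v)    # close to the midpoint, 15
```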
I have added chapter 3 on functions in Data Camp’s Intermediate R to your hw 6. Yeah!!
R has a special mechanism for allowing you to use the same name in different places in your code and have it refer to different objects.
In particular, you want to be able to create new variables in your functions without worrying about whether variables with the same name already exist in the workspace.
For example:
w <- 3
my_func=function(x,y,z){
w <- x^2
print(w)
}
my_func(2,3,4)
## [1] 4
w
## [1] 3
What is happening here is that the w inside the function my_func is a separate variable from the w outside my_func. Because they are separate, assigning to w inside the function doesn't change w outside of the function.
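The reverse also holds: a variable created inside a function exists only in that call and does not show up in the workspace afterwards. Here is a small sketch. The function make_local and the variable inside_only are hypothetical names chosen for illustration.

```r
# make_local is a hypothetical example function
make_local <- function() {
  inside_only <- 10   # created in this call's own frame only
  inside_only
}
make_local()            # returns 10
exists("inside_only")   # FALSE: the variable did not leak into the workspace
```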
To understand this better we need to discuss environments.
When you call a function, R creates a new workspace containing the variables defined by the function's arguments, plus any variables created inside the function as it runs. This collection of variables is called a frame.
We can list the variables in the frame using `ls()`:
my_func=function(x,y,z){
w <- x^2
print(ls())
}
my_func(2,3,4)
## [1] "w" "x" "y" "z"
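R builds a fresh frame on every call, so a variable created during one call is gone by the next. The sketch below shows this with a hypothetical function count_calls, which tries and fails to remember how often it has been called. The call exists(..., inherits = FALSE) checks only the current frame and ignores enclosing environments.

```r
# count_calls is a hypothetical example function
count_calls <- function() {
  # Look for n in this call's frame only, not in enclosing environments
  if (!exists("n", inherits = FALSE)) {
    n <- 0
  }
  n <- n + 1
  n
}
count_calls()   # 1
count_calls()   # 1 again: the frame from the first call was thrown away
```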
What is the output of the following function call?
x <- 1; y <- 2
lookatframe <- function(a, b, c) print(ls())
lookatframe(a = 1, b = 2, c = 3)
## [1] "a" "b" "c"
However, R has a way of accessing variables that are not in the frame created by the function.
x <- 2
lookatframe <- function(a, b, c){print(ls()); print(x)}
lookatframe(a = 1, b = 2, c = 3)
## [1] "a" "b" "c"
## [1] 2
What is happening is that R is looking for variables with that name in a sequence of environments. An environment is just a frame (collection of variables) plus a pointer to the next environment to look in.
In our example, R didn’t find the variable x in the environment defined by the function lookatframe, so it went on to the next one. In this case, this was our main workspace, which is called the Global Environment.
The “next environment to look in” is called the parent environment or enclosing environment.
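The sketch below contrasts a lookup restricted to the function's own frame with R's normal lookup, which follows the chain of parent environments. The base function exists() has an inherits argument that switches between the two. The function where_is_x is a hypothetical name.

```r
x <- 2
# where_is_x is a hypothetical example function
where_is_x <- function() {
  c(
    in_own_frame = exists("x", inherits = FALSE),  # only the function's frame
    via_parents  = exists("x")                     # frame, then parent environments
  )
}
where_is_x()   # in_own_frame is FALSE, via_parents is TRUE
```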
[Figure: environments]
The environments form a tree with the empty environment at the bottom. Here is a function that prints the tree of environment names.
tree <- function(env=globalenv()){
cat("+ ", environmentName(env), "\n")
if(environmentName(env) != environmentName(emptyenv())){
env <- parent.env(env)
Recall(env)
}
invisible(NULL) # return NULL without printing it
}
For example
tree()
## + R_GlobalEnv
## + package:DataComputing
## + package:curl
## + package:base64enc
## + package:manipulate
## + package:mosaic
## + package:mosaicData
## + package:car
## + package:lattice
## + package:knitr
## + package:stringr
## + package:tidyr
## + package:lubridate
## + package:dplyr
## + package:ggplot2
## + package:stats
## + package:graphics
## + package:grDevices
## + package:utils
## + package:datasets
## + package:methods
## + Autoloads
## + base
## + R_EmptyEnv
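The same walk can also be written as a loop that returns the names as a character vector instead of printing them. This is handy if you want to inspect or compare the result. Here is a sketch. env_names is a hypothetical helper, and it stops by testing identical() against emptyenv() rather than by comparing names.

```r
# env_names is a hypothetical helper, an iterative variant of tree()
env_names <- function(env = globalenv()) {
  out <- character(0)
  repeat {
    out <- c(out, environmentName(env))    # record this environment's name
    if (identical(env, emptyenv())) break  # the empty environment has no parent
    env <- parent.env(env)                 # step to the next environment
  }
  out
}
env_names()
```

Comparing with identical() avoids relying on names. Many environments, such as a function's evaluation environment, have the empty name "".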
If R reaches the Global Environment and still can’t find the variable, it keeps looking down the tree. The environments below the Global Environment are used for attached packages and user-attached data. Together they are called the search path.
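The base function search() returns the names on the search path. The list starts at the Global Environment and ends with the base package. Here is a short sketch. The exact contents depend on which packages you have loaded, and search() writes ".GlobalEnv" where environmentName() writes "R_GlobalEnv".

```r
sp <- search()
sp
sp[1]         # ".GlobalEnv"
tail(sp, 1)   # "package:base"
```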
The environment inside of a function is called the evaluation environment.
x <- 2
lookatframe <- function(a, b, c){
evaluation_env <- environment()
tree(evaluation_env)
}
lookatframe(a = 1, b = 2, c = 3)
## +
## + R_GlobalEnv
## + package:DataComputing
## + package:curl
## + package:base64enc
## + package:manipulate
## + package:mosaic
## + package:mosaicData
## + package:car
## + package:lattice
## + package:knitr
## + package:stringr
## + package:tidyr
## + package:lubridate
## + package:dplyr
## + package:ggplot2
## + package:stats
## + package:graphics
## + package:grDevices
## + package:utils
## + package:datasets
## + package:methods
## + Autoloads
## + base
## + R_EmptyEnv
Note that the evaluation environment evaluation_env is printed with an empty name (the first line is just `+`). Inside tree(), environmentName() returns a name only for special environments, such as the Global, base and empty environments, and for environments that carry a name, such as attached packages. The evaluation environment of a function call has no name.
Since the function lookatframe is defined in the Global Environment, the Global Environment is the enclosing environment of lookatframe, and therefore the parent of its evaluation environment.
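The sketch below checks two facts from this section. First, every call gets a brand-new evaluation environment. Second, the parent of that environment is the Global Environment, because that is where the function was defined. environment(fun) returns the enclosing environment of fun. The function get_env is a hypothetical name, and the sketch assumes the code is run at top level.

```r
# get_env is a hypothetical example function that returns its own
# evaluation environment
get_env <- function() environment()

e1 <- get_env()
e2 <- get_env()
identical(e1, e2)                             # FALSE: each call has its own environment
identical(parent.env(e1), globalenv())        # TRUE: parent is where get_env was defined
identical(environment(get_env), globalenv())  # TRUE: enclosing environment of get_env
```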
In all of our examples the enclosing environment will be the Global environment.
To learn more about environments, take CS 61.
Type the following code into RStudio and see what the output is. Does your result make sense? Discuss with a neighbor.
x <- 1
y <- 2
pow_two <- function(x){
y <- x^2
return(y)
}
pow_two(4)
## [1] 16
y
## [1] 2
x
## [1] 1
Here is another example from lab:
Where is f defined? How does the function g know what f is?
f <- function(n) {
if (n %% 2 == 0) {
n <- n / 2
} else {
n <- 3 * n + 1
}
return(n)
}
g <- function(n) {
count <- 0
while (n != 1) {
n <- f(n)
count <- count + 1
}
return(count)
}
g(6)
## [1] 8
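Here is one way to check the answer. f and g are both created in the Global Environment. When g runs, R does not find f in g's evaluation environment, so it looks in the parent, the Global Environment, where f lives. The sketch repeats the two definitions (f written more compactly, with the same behavior) so it runs on its own at top level.

```r
f <- function(n) if (n %% 2 == 0) n / 2 else 3 * n + 1
g <- function(n) {
  count <- 0
  while (n != 1) {
    n <- f(n)            # f is not in g's frame, so R looks in the parent
    count <- count + 1
  }
  count
}
environmentName(environment(f))  # "R_GlobalEnv": where f was defined
environmentName(environment(g))  # "R_GlobalEnv": parent of g's evaluation environments
g(6)                             # 8
```

The lookup happens when g runs, not when it is defined. So f only has to exist by the time g(6) is called.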
We can “attach” a new environment containing a data table to our tree using attach(). This inserts an entry into the environment tree at the position given by the pos argument of attach(). Since pos defaults to 2L, most of the time the new entry sits just underneath the Global Environment:
attach(mtcars)
## The following object is masked from package:ggplot2:
##
## mpg
tree()
## + R_GlobalEnv
## + mtcars
## + package:DataComputing
## + package:curl
## + package:base64enc
## + package:manipulate
## + package:mosaic
## + package:mosaicData
## + package:car
## + package:lattice
## + package:knitr
## + package:stringr
## + package:tidyr
## + package:lubridate
## + package:dplyr
## + package:ggplot2
## + package:stats
## + package:graphics
## + package:grDevices
## + package:utils
## + package:datasets
## + package:methods
## + Autoloads
## + base
## + R_EmptyEnv
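attach() puts a copy of the data on the search path, and the copy stays there until you call detach(). The sketch below uses base R's detach() and with(). with() evaluates an expression using the columns of a data frame without touching the search path. It assumes a fresh R session in which mtcars is not already attached.

```r
attach(mtcars)
"mtcars" %in% search()   # TRUE: the data sits on the search path
m1 <- mean(mpg)          # mpg is found in the attached copy
detach(mtcars)           # remove it from the search path again
"mtcars" %in% search()   # FALSE

# with() gives the same result without attaching anything
m2 <- with(mtcars, mean(mpg))
identical(m1, m2)        # TRUE
```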
When loading packages, library() works the same way and uses the same default, pos = 2:
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
tree()
## + R_GlobalEnv
## + package:MASS
## + mtcars
## + package:DataComputing
## + package:curl
## + package:base64enc
## + package:manipulate
## + package:mosaic
## + package:mosaicData
## + package:car
## + package:lattice
## + package:knitr
## + package:stringr
## + package:tidyr
## + package:lubridate
## + package:dplyr
## + package:ggplot2
## + package:stats
## + package:graphics
## + package:grDevices
## + package:utils
## + package:datasets
## + package:methods
## + Autoloads
## + base
## + R_EmptyEnv
When R looks for a named object, by default it looks for the name in the current environment. If a matching name is found, the corresponding value is returned. If the name isn’t found, R looks in the next environment down the tree.
Hence you can have, for example, several objects named pi in different environments:
pi <- 3
base::pi
## [1] 3.141593
pi
## [1] 3
rm(pi)
pi
## [1] 3.141593
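The function find() from the utils package lists every environment on the search path that contains an object with a given name. It shows the two copies of pi directly. Here is a sketch.

```r
pi <- 3
where_before <- find("pi")
where_before   # includes ".GlobalEnv" and "package:base"
rm(pi)
where_after <- find("pi")
where_after    # "package:base" only
```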
The term passing a variable is used when a function is called with a variable you defined previously.
For example:
myAge <- 14
month <- 1
calculateBirthYear <- function(yourAge){
2016-yourAge
}
calculateBirthYear(myAge)
## [1] 2002
The variable myAge is passed to the function calculateBirthYear. There are two ways the variable myAge could have been passed to the function, described by the terms “pass by value” and “pass by reference”. In short, pass by value means the actual value is passed on. Pass by reference means a number (called an address) that identifies where the value is stored is passed on.
To understand how passing variables to functions works it helps to know a little about how objects are stored in the memory of your computer.
To keep it simple, let's think of memory as many blocks next to each other. Each block has a number (its memory address). If you define a variable in your code, its value is stored somewhere in memory, and your operating system automatically decides where. The illustration below shows part of some memory. The gray numbers on top of each block show the address of the block, and the colored numbers at the bottom show the values stored there.
[Figure: memory]
The variables myAge and month are defined in your code, and they are stored in memory as shown in the illustration above. For example, the value of myAge is stored at address 106 and the value of month is stored at address 113.
Passing by value means that the value of the function argument is copied into another location in memory. When the variable is accessed or modified within the function, only the copy is affected, and the original value is left untouched. This is how values are passed to functions in R.
The following example shows a variable passed by value:
myAge <- 14
calculateBirthYear(myAge)
[Figure: memory]
As soon as R starts running the calculateBirthYear function, the value of myAge is copied to another place in your computer's memory. To make this clearer, the variable inside the function is named age in this example:
calculateBirthYear <- function(age){
birthYear <- 2016-age
birthYear
}
Nothing that happens to age affects the value of myAge, which is outside the scope of calculateBirthYear.
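The same holds for larger objects. In the sketch below, modifying an element of the argument inside the function changes only the function's own copy. Strictly speaking, R postpones the copy until the function first modifies the vector (copy-on-modify), but the effect is the same as passing by value. The function set_first is a hypothetical name.

```r
# set_first is a hypothetical example function
set_first <- function(x) {
  x[1] <- 100   # modifies the function's local copy only
  x
}
v <- c(1, 2, 3)
set_first(v)    # 100 2 3
v               # still 1 2 3
```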
So how do you update a variable outside of your function? When a variable is passed by value, you update the source variable by assigning the function's return value back to it.
A very simple example:
increaseAge <- function(age) age+1
myAge <- 14
myAge <- increaseAge(myAge)
myAge
## [1] 15
[Figure: memory]
The variable myAge now holds the value 15.
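Passing by reference means that the memory address of the variable (a pointer to the memory location) is passed to the function. This is unlike passing by value, where the value of a variable is passed on. R passes by value. Python passes references to objects, so a function can modify a mutable argument such as a list in place. In C, arguments are passed by value, but passing a pointer (the address of a variable) lets a function modify the caller's variable, which gives the effect of passing by reference.
Discuss with your neighbor how to modify the code below so that the variable a is updated outside the function (i.e. so that a prints 15, not 5).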
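Environments are the main exception in R. They are not copied when passed to a function, so a function can change one in place, which behaves much like passing by reference. The sketch below uses the base function new.env(). The names bump and box are hypothetical.

```r
# bump is a hypothetical example function that adds 1 to env$value
bump <- function(env) {
  env$value <- env$value + 1   # modifies the environment itself, not a copy
  invisible(NULL)
}
box <- new.env()   # box is a hypothetical name
box$value <- 14
bump(box)
box$value          # 15: the change is visible outside the function
```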
triple <- function(x) {
x <- 3*x
x
}
a <- 5
triple(a)
## [1] 15
a
## [1] 5
math_magic <- function(a,b=1){
if(b==0){
return(0)
}
a*b +a/b
}
math_magic(4,0)
## [1] 0
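math_magic has a default value for b. When b is 0, it returns before the division is reached. Without that check, R would compute 4/0 as Inf rather than raising an error. The sketch below repeats the definition and tries a few more calls, with the results worked out by hand.

```r
math_magic <- function(a, b = 1) {
  if (b == 0) {
    return(0)   # return before dividing by zero
  }
  a * b + a / b
}
math_magic(4)      # b defaults to 1: 4*1 + 4/1 = 8
math_magic(4, 2)   # 4*2 + 4/2 = 10
math_magic(4, 0)   # early return: 0
```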
increment <- function(x, inc=1) {
x <- x+inc
x
}
count <- 5
count <- increment(count,2)
count
## [1] 7