replace() comes in handy whenever you need to replace one value with another.
Create a vector with some numeric values. Let's say that ten of the values are drawn from 2 to 6, and a single value is 100. This is going to be our outlier, which we will replace with NA.
set.seed(111) # for reproducible results
x <- c(sample(2:6, 10, replace = TRUE),100)
# Note that replace = TRUE in the line above has nothing to do with the function we want to use.
# Here, replace is an argument of sample() (sampling with replacement), whereas replace() is a function!
x
## [1] 4 5 3 4 3 4 2 4 4 2 100
You can calculate the mean of this vector…
mean(x)
## [1] 12.27273
but the SD is pretty high (more than twice the mean!). This is obviously the outlier's fault.
sd(x)
## [1] 29.11045
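To see why a single value inflates the SD this much, here is a small sketch that computes the sample SD by hand: the square root of the summed squared deviations from the mean, divided by n - 1 (the same definition sd() uses). The values printed above are typed in directly, so the sketch does not depend on the random seed; the variable names are just for illustration.

```r
# The values printed above, typed in directly so the sketch does not depend on sample()
x <- c(4, 5, 3, 4, 3, 4, 2, 4, 4, 2, 100)

# Sample standard deviation: sqrt(sum of squared deviations / (n - 1))
n <- length(x)
squared_dev <- (x - mean(x))^2
sd_by_hand <- sqrt(sum(squared_dev) / (n - 1))

all.equal(sd_by_hand, sd(x))
## [1] TRUE

# Share of the total squared deviation contributed by the outlier (the last element)
squared_dev[n] / sum(squared_dev)
```

The outlier alone accounts for about 91% of the summed squared deviations, which is why it dominates the SD.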
Let's replace the outlier with NA. This may not be the most elegant way to deal with an outlier, but sometimes there is no choice.
We are going to use replace() for that. This function needs three arguments:
1) the vector in which we want to replace one or more elements
2) the position(s) of the element(s) we want to replace
3) the value(s) we want to put in their place
In our case:
1) vector = x
2) position = 11; you can use which() to find the position
3) value = NA
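In base R the three arguments of replace() are named x, list and values, so they can also be passed by name. The sketch below does that and compares the result with plain index assignment, which is essentially what replace() does under the hood (it assigns into a copy of the vector and returns it). The names y1 and y2 are hypothetical, chosen for illustration.

```r
x <- c(4, 5, 3, 4, 3, 4, 2, 4, 4, 2, 100)
position <- which(x == 100)  # 11

# The three arguments passed by name: the vector, the position(s), the replacement value(s)
y1 <- replace(x = x, list = position, values = NA)

# The same thing with index assignment on a copy of x
y2 <- x
y2[position] <- NA

identical(y1, y2)
## [1] TRUE
```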
position <- which(x==100)
replace(x, position, NA)
## [1] 4 5 3 4 3 4 2 4 4 2 NA
Works!!
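Two details worth knowing at this point, shown in a small sketch (y is a hypothetical name): replace() does not modify x in place, it returns a modified copy, so the result has to be assigned to keep it; and the position argument also accepts a logical condition directly, without wrapping it in which().

```r
x <- c(4, 5, 3, 4, 3, 4, 2, 4, 4, 2, 100)

# A logical condition works as the position argument too
y <- replace(x, x == 100, NA)

x[11]  # the original vector is unchanged
## [1] 100
y[11]  # the copy returned by replace() holds the NA
## [1] NA
```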
So, just for completeness, let's overwrite the vector with 100 replaced by NA, and calculate the mean and SD.
x <- replace(x, which(x==100), NA)
# note that for the position argument we pass which() with the condition, and which() returns the position! ;)
mean(x); sd(x)
## [1] NA
## [1] NA
Ooops! Doesn’t work?!
This is because your vector now contains an NA value. Such values require special handling, i.e. na.rm = TRUE in mean() and sd().
mean(x, na.rm = TRUE); sd(x, na.rm = TRUE)
## [1] 3.5
## [1] 0.9718253
So basically you just told mean() and sd() to ignore the NA value when calculating. You could also simply remove the outlier from the vector and get exactly the same result. But, as always, there is more than one way to skin a cat ;) and here we wanted to explore replace().
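A quick sketch comparing the two routes (the names x_na and x_dropped are hypothetical): replacing the outlier with NA and using na.rm = TRUE, versus dropping the outlier by subsetting. Both give the same mean and SD; the difference is that replacing keeps the vector's length, so positions stay aligned with any other data the vector belongs to, while dropping shortens it.

```r
x <- c(4, 5, 3, 4, 3, 4, 2, 4, 4, 2, 100)

# Route 1: replace the outlier with NA and let mean()/sd() skip it
x_na <- replace(x, x == 100, NA)

# Route 2: drop the outlier from the vector altogether
x_dropped <- x[x != 100]

all.equal(c(mean(x_na, na.rm = TRUE), sd(x_na, na.rm = TRUE)),
          c(mean(x_dropped), sd(x_dropped)))
## [1] TRUE

length(x_na); length(x_dropped)
## [1] 11
## [1] 10
```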
R is great!