The purpose of the future package is to provide a very simple and uniform way of evaluating R expressions asynchronously using various resources available to the user.
v <- {
cat("hello world")
3.14
}
## hello world
v
## [1] 3.14
It works by assigning the value of an expression to variable v and we then print the value of v. Moreover, when the expression for v is evaluated we also print a message.
library(future)
v %<-% {
cat("hello world\n")
3.14
}
v
## hello world
## [1] 3.14
library(future)
plan(multisession)
x <- v %<-% {
cat("Hello world!\n")
3.14
}
v
## Hello world!
## [1] 3.14
x
## MultisessionFuture:
## Label: '<none>'
## Expression:
## {
## cat("Hello world!\n")
## 3.14
## }
## Lazy evaluation: FALSE
## Asynchronous evaluation: TRUE
## Local evaluation: TRUE
## Environment: R_GlobalEnv
## Capture standard output: TRUE
## Capture condition classes: 'condition' (excluding 'nothing')
## Globals: <none>
## Packages: <none>
## L'Ecuyer-CMRG RNG seed: <none> (seed = FALSE)
## Resolved: TRUE
## Value: 56 bytes of class 'numeric'
## Early signaling: FALSE
## Owner process: e6d5773e-c1bb-1031-6630-265a9b0da16b
## Class: 'MultisessionFuture', 'ClusterFuture', 'MultiprocessFuture', 'Future', 'environment'
future 实现了以下这些类型的future:
可选策略 | 支持的操作系统 | 描述 |
---|---|---|
sequential | 全部 | 按当前进程顺序运行 |
multisession: | 全部 | 在后台R进程session中运行 |
multicore | 不支持Window | 并行asynchronously |
cluster | 全部 | 当前、本地和/或远程机器上的外部 R session |
multiprocess在未来 (>= 1.20.0) 被弃用,取而代之的是multisession和multicore。
同步future一个接一个地解决,最常见的是由创建它们的 R 进程解决。当解决同步未来时,它会阻塞主进程直到解决。
除非另有说明,否则顺序future是默认的。
plan(sequential)
pid <- Sys.getpid()# process ID
pid
## [1] 51786
x1 <- a %<-% {
pid <- Sys.getpid()
cat("Future 'a' ...\n")
3.3145
}
x2 <- b %<-% {
rm(pid)
cat("Future 'a' ...\n")
Sys.getpid()
}
x3 <- c %<-% {
cat("Future 'a' ...\n")
2 * a
}
## Future 'a' ...
b
## Future 'a' ...
## Warning in rm(pid): object 'pid' not found
## [1] 51786
c
## Future 'a' ...
## [1] 6.629
a
## [1] 3.3145
pid
## [1] 51786
由于任务是顺序评估,因此三个future中的每一个都会在创建的那一刻立即得到解决。另请注意pid,在调用环境中,分配了当前进程的进程 ID 的环境既没有被覆盖也没有被删除。这是因为future是在本地环境中评估的。由于使用了同步(单)处理,future b由主 R 进程(仍在本地环境中)解析,这就是b和的值pid相同的原因。
按照设计,这些future是非阻塞的,也就是说,在创建调用进程后,调用进程可用于其他任务,包括创建额外的future。只有当调用进程试图访问尚未解决的未来的值,或者当所有可用的 R 进程都忙于为其他未来服务时试图创建另一个异步未来时,它才会阻塞。
我们从多时段future开始,因为所有操作系统都支持它们。在与调用 R 进程相同的机器上运行的后台 R 会话中评估多会话未来。这是我们的多会话评估示例:
plan(multisession)
pid <- Sys.getpid()
pid
## [1] 51786
a %<-% {
pid <- Sys.getpid()
cat("Future 'a' ...\n")
3.14
}
b %<-% {
rm(pid)
cat("Future 'b' ...\n")
Sys.getpid()
}
c %<-% {
cat("Future 'c' ...\n")
2 * a
}
## Future 'a' ...
b
## Future 'b' ...
## Warning in rm(pid): object 'pid' not found
## [1] 52115
c
## Future 'c' ...
## [1] 6.28
a
## [1] 3.14
pid
## [1] 51786
可以看到进程的ID发生了变化。这种方法会在后台启动新的进程。如果所有后台会话都忙于为其他 futures 服务,则下一个多会话 future的创建将被阻止,直到后台会话再次可用为止。availableCores()查看可用的进程数量。
availableCores()
## system
## 8
在 R 支持asynchronously进程的操作系统(基本上是除 Windows 之外的所有操作系统)上,在后台生成 R 会话的替代方法是分叉现有的 R 进程。要使用多核 futures,在支持时指定:
plan(multicore)
## Warning in supportsMulticoreAndRStudio(...): [ONE-TIME WARNING] Forked
## processing ('multicore') is not supported when running R from RStudio
## because it is considered unstable. For more details, how to control forked
## processing or not, and how to silence this warning in future R sessions, see ?
## parallelly::supportsMulticore
像多会话期货一样,运行的最大并行进程数将由 决定availableCores。这种方式代码运行速率会更快。另一方面,进程分叉在某些 R 环境中也被认为是不稳定的。
集群future评估临时集群上的表达式(由 parallel 包实现)。例如,假设可以访问三个节点n1,n2和n3,然后可以将它们用于异步评估,如下所示:
plan(cluster, workers = c("n1", "n2", "n3"))
pid <- Sys.getpid()
pid
a %<-% {
pid <- Sys.getpid()
cat("Future 'a' ...\n")
3.14
}
b %<-% {
rm(pid)
cat("Future 'b' ...\n")
Sys.getpid()
}
c %<-% {
cat("Future 'c' ...\n")
2 * a
}
b
c
a
pid
创建的任何类型的集群parallel::makeCluster()都可以用于集群future。例如,上面的集群可以显式设置为:
cl <- parallel::makeCluster(c("n1", "n2", "n3"))
plan(cluster, workers = cl)
Futures will relay output produced by functions such as cat(), print() and str().
library(future)
fa <- future({
print("Use print function to print some message\n");
cat("Use cat function to print some message\n");
100L
})
fa
## SequentialFuture:
## Label: '<none>'
## Expression:
## {
## print("Use print function to print some message\n")
## cat("Use cat function to print some message\n")
## 100L
## }
## Lazy evaluation: FALSE
## Asynchronous evaluation: FALSE
## Local evaluation: TRUE
## Environment: R_GlobalEnv
## Capture standard output: TRUE
## Capture condition classes: 'condition' (excluding 'nothing')
## Globals: <none>
## Packages: <none>
## L'Ecuyer-CMRG RNG seed: <none> (seed = FALSE)
## Resolved: TRUE
## Value: 56 bytes of class 'integer'
## Early signaling: FALSE
## Owner process: e6d5773e-c1bb-1031-6630-265a9b0da16b
## Class: 'SequentialFuture', 'UniprocessFuture', 'Future', 'environment'
a <- value(fa)
## [1] "Use print function to print some message\n"
## Use cat function to print some message
a
## [1] 100
我们在调用value 的时候,就会输出信息,信息不会被赋值。如果想要获取这些信息,需要使用capture.output()。
library(future)
fa <- future({
print("Use print function to print some message\n");
cat("Use cat function to print some message\n");
100L
})
stdout <- capture.output(a <- value(fa))
a
## [1] 100
stdout
## [1] "[1] \"Use print function to print some message\\n\""
## [2] "Use cat function to print some message"
library("future")
library("listenv")
## IMPORTANT:
## 1. The below usage of lazy futures will only work when they are
## all evaluated in the same process.
## 2. We disable the capturing of standard output (stdout=NA) to avoid
## 'sink stack is full' errors
## 3. We disable the capturing of most conditions (condition="error") to
## avoid stacking up too many conditions
oplan <- plan(sequential)
## Defines the first 100 Fibonacci numbers
## (0, 1, 1, 2, 3, 5, 8, ...)
## but calculate only the ones need when
## a number is actually requested.
x <- listenv()
x[[1]] <- 0
x[[2]] <- 1
for (i in 3:150) {
x[[i]] %<-% { x[[i - 2]] + x[[i - 1]] } %lazy% TRUE %stdout% NA %conditions% "error"
}
## At this point nothing has been calculated,
## because lazy evaluation is in place.
## Get the 7:th Fibonnaci numbers (should be 8)
print(x[[7]])
## [1] 8
## At this point x[1:7] have been calculated,
## but nothing beyond.
## Let's get the 50:th number.
print(x[[50]])
## [1] 7778742049
## Reset plan
plan(oplan)
library(future.apply) # default plan is sequential
n <- 16
x <- rnorm(n)
lapply(1 : 5, function(id){
print(paste("id = ", id, sep = ""))
Sys.sleep(0.5)
sum(x[1 : id])
})
## [1] "id = 1"
## [1] "id = 2"
## [1] "id = 3"
## [1] "id = 4"
## [1] "id = 5"
## [[1]]
## [1] -1.233339
##
## [[2]]
## [1] -1.271531
##
## [[3]]
## [1] 0.2293369
##
## [[4]]
## [1] -0.9652846
##
## [[5]]
## [1] -0.6214609
future_lapply(1 : 5, function(id){
print(paste("id = ", id, sep = ""))
Sys.sleep(0.5)
sum(x[1 : id])
})
## [1] "id = 1"
## [1] "id = 2"
## [1] "id = 3"
## [1] "id = 4"
## [1] "id = 5"
## [[1]]
## [1] -1.233339
##
## [[2]]
## [1] -1.271531
##
## [[3]]
## [1] 0.2293369
##
## [[4]]
## [1] -0.9652846
##
## [[5]]
## [1] -0.6214609
tmp <- function(n){
res <- future_lapply(1 : 5, function(id){
Sys.sleep(0.5)
sum(x[1 : id])
})
return(res)
}
tmp(n)
## [[1]]
## [1] -1.233339
##
## [[2]]
## [1] -1.271531
##
## [[3]]
## [1] 0.2293369
##
## [[4]]
## [1] -0.9652846
##
## [[5]]
## [1] -0.6214609
plan(cluster) # a parallel plan
lapply(1 : 5, function(id){
print(paste("id = ", id, sep = ""))
Sys.sleep(0.5)
sum(x[1 : id])
})
## [1] "id = 1"
## [1] "id = 2"
## [1] "id = 3"
## [1] "id = 4"
## [1] "id = 5"
## [[1]]
## [1] -1.233339
##
## [[2]]
## [1] -1.271531
##
## [[3]]
## [1] 0.2293369
##
## [[4]]
## [1] -0.9652846
##
## [[5]]
## [1] -0.6214609
future_lapply(1 : 5, function(id){
print(paste("id = ", id, sep = ""))
Sys.sleep(0.5)
sum(x[1 : id])
})
## [1] "id = 1"
## [1] "id = 2"
## [1] "id = 3"
## [1] "id = 4"
## [1] "id = 5"
## [[1]]
## [1] -1.233339
##
## [[2]]
## [1] -1.271531
##
## [[3]]
## [1] 0.2293369
##
## [[4]]
## [1] -0.9652846
##
## [[5]]
## [1] -0.6214609
tmp(n)
## [[1]]
## [1] -1.233339
##
## [[2]]
## [1] -1.271531
##
## [[3]]
## [1] 0.2293369
##
## [[4]]
## [1] -0.9652846
##
## [[5]]
## [1] -0.6214609
# ------ RNG ------
future_sapply(1 : 5, rnorm) # `future` detects RNG
## Warning: UNRELIABLE VALUE: One of the 'future.apply' iterations
## ('future_sapply-1') unexpectedly generated random numbers without declaring so.
## There is a risk that those random numbers are not statistically sound and the
## overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This
## ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-
## CMRG method. To disable this check, use 'future.seed = NULL', or set option
## 'future.rng.onMisuse' to "ignore".
## Warning: UNRELIABLE VALUE: One of the 'future.apply' iterations
## ('future_sapply-2') unexpectedly generated random numbers without declaring so.
## There is a risk that those random numbers are not statistically sound and the
## overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This
## ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-
## CMRG method. To disable this check, use 'future.seed = NULL', or set option
## 'future.rng.onMisuse' to "ignore".
## Warning: UNRELIABLE VALUE: One of the 'future.apply' iterations
## ('future_sapply-3') unexpectedly generated random numbers without declaring so.
## There is a risk that those random numbers are not statistically sound and the
## overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This
## ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-
## CMRG method. To disable this check, use 'future.seed = NULL', or set option
## 'future.rng.onMisuse' to "ignore".
## Warning: UNRELIABLE VALUE: One of the 'future.apply' iterations
## ('future_sapply-4') unexpectedly generated random numbers without declaring so.
## There is a risk that those random numbers are not statistically sound and the
## overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This
## ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-
## CMRG method. To disable this check, use 'future.seed = NULL', or set option
## 'future.rng.onMisuse' to "ignore".
## Warning: UNRELIABLE VALUE: One of the 'future.apply' iterations
## ('future_sapply-5') unexpectedly generated random numbers without declaring so.
## There is a risk that those random numbers are not statistically sound and the
## overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This
## ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-
## CMRG method. To disable this check, use 'future.seed = NULL', or set option
## 'future.rng.onMisuse' to "ignore".
## [[1]]
## [1] 0.7937676
##
## [[2]]
## [1] 0.4458305 -0.4639370
##
## [[3]]
## [1] -0.2912784 -0.8178072 0.8174065
##
## [[4]]
## [1] -2.6451568 -2.0489655 -0.8032186 -0.5391596
##
## [[5]]
## [1] -0.3048726 -0.2305664 1.9000627 0.9261816 0.6889935
future_sapply(1 : 5, rnorm, future.seed = TRUE)
## [[1]]
## [1] 0.3332837
##
## [[2]]
## [1] -1.591530 -1.381579
##
## [[3]]
## [1] -0.4826246 -0.4757832 -0.8440203
##
## [[4]]
## [1] 0.1786749 -1.8292555 1.5784089 -0.7774268
##
## [[5]]
## [1] -0.008376357 0.139623080 0.650083650 -0.266630147 1.948146623
future_sapply(1 : 5, rnorm, future.seed = TRUE)
## [[1]]
## [1] -0.2426914
##
## [[2]]
## [1] 0.3249415 0.5029080
##
## [[3]]
## [1] 0.3451973 0.2275098 -0.3740153
##
## [[4]]
## [1] -0.2657622 -0.4918470 -0.1710568 -0.9133990
##
## [[5]]
## [1] -0.3145897 -0.6221692 -1.1144129 0.7851857 -0.4723381
future_sapply(1 : 5, rnorm, future.seed = 10)
## [[1]]
## [1] 2.453928
##
## [[2]]
## [1] -0.5940859 0.4417292
##
## [[3]]
## [1] -2.16701523 0.06915936 -0.82262772
##
## [[4]]
## [1] -1.432698 -1.222038 -1.110026 -1.028221
##
## [[5]]
## [1] -0.5759535 -1.6197445 2.4734210 -0.7285216 -0.1637224
future_sapply(1 : 5, rnorm, future.seed = 10)
## [[1]]
## [1] 2.453928
##
## [[2]]
## [1] -0.5940859 0.4417292
##
## [[3]]
## [1] -2.16701523 0.06915936 -0.82262772
##
## [[4]]
## [1] -1.432698 -1.222038 -1.110026 -1.028221
##
## [[5]]
## [1] -0.5759535 -1.6197445 2.4734210 -0.7285216 -0.1637224
library(parallel)
# ------ simple example 1 ------
cls <- makeCluster(8)
# we can split 1 to 100 directly
idx_split <- clusterSplit(cls, 1 : 1000)
res <- parLapply(cls,
idx_split,
function(x){
return(sum(x))
})
sum(unlist(res))
## [1] 500500
stopCluster(cls)
rm(cls)
library(parallel)
a <- rnorm(12)
slow_function <- function(invec){
res <- 0
for(i in 1 : length(invec)){
Sys.sleep(0.1)
res <- res + invec[i]
}
return(res)
}
cls <- makeCluster(4)
ind_seq <- clusterSplit(cls, a)
clusterExport(cls, varlist = "slow_function")
res_par <- parSapply(cls, ind_seq, slow_function)
res <- sum(res_par)
library(parallel)
f <- function(invec, b ){
return(sum(invec + b))
}
a <- rnorm(100)
bb <- 1
cls <- makeCluster(4)
id_seq <- clusterSplit(cls, a)
x <- parSapply(cls, id_seq, f, b = bb )
stopCluster(cls)
rm(cls)
R 最有用的特性之一是它的交互式解释器。这使得学习和试验 R 变得非常容易。它允许您像使用计算器一样使用 R 来执行算术运算、显示数据集、生成绘图和创建模型。
不久之后,新的 R 用户会发现需要重复执行一些操作。也许他们想重复运行模拟以找到结果的分布。也许他们需要执行一个传递给它的各种不同参数的函数。或者他们可能需要为许多不同的数据集创建一个模型。
重复执行可以手动完成,但是执行重复操作变得相当繁琐,即使使用命令行编辑也是如此。幸运的是,R 不仅仅是一个交互式计算器。它有自己的内置语言,旨在自动执行繁琐的任务,例如重复执行 R 计算。
R 带有各种循环结构来解决这个问题。for循环是更常见的循环结构之一,但是andrepeat语句while也非常有用。此外,还有“应用”函数系列,其中包括apply、lapply、sapply、eapply、mapply、rapply和其他。
foreach是它支持并行执行,也就是说,它可以在您计算机上的多个处理器/内核或集群的多个节点上执行那些重复的操作。如果每个操作都需要一分钟以上的时间,而您想执行数百次,则整个运行时间可能需要数小时。但是使用foreach,该操作可以在集群上的数百个处理器上并行执行,从而将执行时间缩短到几分钟。
我们来看一个最简单的例子,这就类似一个循环,返回结果是一个列表:
library(foreach)
x <- foreach(i=1:3) %do% sqrt(i)
x
## [[1]]
## [1] 1
##
## [[2]]
## [1] 1.414214
##
## [[3]]
## [1] 1.732051
这看起来有点奇怪,因为它看起来有点像一个for循环,但它是使用二元运算符实现的,称为%do%. 此外,与for循环不同的是,它返回一个值。
我们还可以指定要在 R 表达式中使用的其他变量,如以下示例所示:
x <- foreach(a=1:3, b=rep(10, 3)) %do% (a + b)
x
## [[1]]
## [1] 11
##
## [[2]]
## [1] 12
##
## [[3]]
## [1] 13
注意这里需要括号。我们也可以使用大括号:
x <- foreach(a=1:3, b=rep(10, 3)) %do% {
a + b
}
x
## [[1]]
## [1] 11
##
## [[2]]
## [1] 12
##
## [[3]]
## [1] 13
我们称为a和b为迭代变量,这两个迭代变量长度需要相同,如果不相同,则根据短的为准。
x <- foreach(a=1:1000, b=rep(10, 2)) %do% {
a + b
}
x
## [[1]]
## [1] 11
##
## [[2]]
## [1] 12
请注意,可以在大括号之间放置多个语句,并且可以使用赋值语句来保存计算的中间值。但是,如果您使用赋值作为在循环的不同执行之间进行通信的方式,那么您的代码将无法正确并行运行,
如果希望以数字向量返回结果
x <- foreach(i=1:3, .combine='c') %do% exp(i)
x
## [1] 2.718282 7.389056 20.085537
如果希望以将所有结果求和或者乘积
x <- foreach(i=1:3, .combine='sum') %do% exp(i)
#x <- foreach(i=1:3, .combine='+') %do% exp(i)
# x <- foreach(i=1:3, .combine=sum') %do% exp(i)
x
## [1] 30.19287
x <- foreach(i=1:3, .combine='*') %do% exp(i)
x
## [1] 403.4288
以矩阵方式返回结果
x <- foreach(i=1:4, .combine='cbind') %do% rnorm(4)
x
## result.1 result.2 result.3 result.4
## [1,] 0.50875670 -0.9799064 0.008747247 -0.34442546
## [2,] 0.06812255 1.3499767 -1.309682930 -0.58937244
## [3,] 0.25552643 -0.4431785 -1.102955333 1.34077851
## [4,] 2.11312265 -0.7140267 0.164489357 0.05318948
也可以使用用户自己编写的函数。请注意,此cfun函数有两个参数。该foreach函数知道函数c、cbind和rbind接受很多参数,并且将使用最多 100 个参数(默认情况下)调用它们以提高性能。
cfun <- function(...) NULL
x <- foreach(i=1:4, .combine='cfun', .multicombine=TRUE) %do% rnorm(4)
x
## NULL
如果希望使用不超过 10 个参数调用 combine 函数,可以使用以下.maxcombine选项指定:
cfun <- function(...) NULL
x <- foreach(i=1:4, .combine='cfun', .multicombine=TRUE, .maxcombine=10) %do% rnorm(4)
x
## NULL
.inorder选项用于指定参数组合的顺序是否重要。默认值为TRUE,但如果组合函数为”+“。可以指定.inorder为FALSE。实际上,此选项仅在并行执行 R 表达式时很重要,因为在顺序运行时结果总是按顺序计算。事实上,如果表达式执行的时间长度非常不同,则可以按任何顺序返回结果,这样效率更高。
system.time(
foreach(i=4:1, .combine='c') %dopar% {
Sys.sleep(3 * i)
i
}
)
## Warning: executing %dopar% sequentially: no parallel backend registered
## user system elapsed
## 0.005 0.001 30.017
system.time(
foreach(i=4:1, .combine='c', .inorder=FALSE) %dopar% {
Sys.sleep(3 * i)
i
}
)
## user system elapsed
## 0.004 0.001 30.016
foreach函数会自动从向量、列表、矩阵或数据框等创建迭代器。iterators包提供了一个调用的函数irnorm,每次调用它可以返回指定数量的随机数。
library(iterators)
x <- foreach(a=irnorm(5, count=5), .combine='cbind') %do% a
x
## result.1 result.2 result.3 result.4 result.5
## [1,] 0.9475001 1.20064192 -2.118519 -1.532407626 -0.4576409
## [2,] 0.3915409 -0.60181145 1.501910 -0.003637352 0.1589688
## [3,] -0.1035352 -0.15922928 1.652527 -1.603287153 -0.2778248
## [4,] -1.7104410 -0.04409682 -1.060242 -1.125723565 0.2792607
## [5,] 0.1679036 1.63316639 -1.494106 0.722020888 -1.8482118
这在处理大量数据时非常有用。迭代器允许根据操作的需要即时生成数据,而不是要求在开始时生成所有数据。
例如,假设我们想要对一千个随机向量求和:
# system.time(
# {
# set.seed(123)
# x <- numeric(4)
# i <- 0
# while (i < 100000) {
# x <- x + rnorm(4)
# i <- i + 1
# }
# x
# }
#
# )
# 以使用icount生成从 1 到 1000 的值的函数来完成:
system.time(
{
set.seed(123)
x <- foreach(icount(1000), .combine='+') %do% rnorm(4)
x
}
)
## user system elapsed
## 0.233 0.004 0.239
尽管foreach它本身可能是一个有用的结构,但该foreach包的真正意义在于进行并行计算。要使前面的任何示例并行运行,您所要做的就是替换%do%为%dopar%. 但是对于我们一直在做的那种快速运行的操作,并行执行它们没有多大意义。并行运行许多小任务通常比顺序运行它们需要更多的时间来执行,如果它已经运行得很快,就没有动力让它运行得更快。但是,如果我们并行执行的操作需要一分钟或更长时间,就会开始有一些动力。
x <- matrix(runif(500), 100)
y <- gl(2, 50) # 创建factor
library(randomForest)
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
如果我们想创建一个有 1000 棵树的随机森林模型,而我们的计算机有四个核心,我们可以通过执行randomForest四次函数将问题分成四个部分,ntree参数设置为 250。当然,我们必须组合生成的randomForest对象,但是randomForest包中有一个函数combine可以做到这一点。
首先是按照顺序
rf <- foreach(ntree=rep(250, 4), .combine=combine) %do%
randomForest(x, y, ntree=ntree)
rf
##
## Call:
## randomForest(x = x, y = y, ntree = ntree)
## Type of random forest: classification
## Number of trees: 1000
## No. of variables tried at each split: 2
并行,修改%do% 为%dopar%,
rf <- foreach(ntree=rep(250, 4), .combine=combine, .packages='randomForest') %dopar%
randomForest(x, y, ntree=ntree)
rf
##
## Call:
## randomForest(x = x, y = y, ntree = ntree)
## Type of random forest: classification
## Number of trees: 1000
## No. of variables tried at each split: 2
条件判断
x <- foreach(a=irnorm(1, count=10), .combine='c') %:% when(a >= 0) %do% sqrt(a)
x
## [1] 0.4055020 1.0835713 0.8704032 0.3653185 1.4166866 0.8115083
大部分并行计算都是为了做三件事:将问题拆分成多个部分,并行执行各个部分,然后将结果组合在一起。
使用foreach包,迭代器帮助您将问题拆分成多个部分,%dopar%函数并行执行这些部分,指定的.combine函数将结果组合在一起。