# clusterApply {parallel}

### Description

These functions provide several ways to parallelize computations using a cluster.

### Usage

```r
clusterCall(cl = NULL, fun, ...)
clusterApply(cl = NULL, x, fun, ...)
clusterApplyLB(cl = NULL, x, fun, ...)
clusterEvalQ(cl = NULL, expr)
clusterExport(cl = NULL, varlist, envir = .GlobalEnv)
clusterMap(cl = NULL, fun, ..., MoreArgs = NULL, RECYCLE = TRUE,
           SIMPLIFY = FALSE, USE.NAMES = TRUE,
           .scheduling = c("static", "dynamic"))
clusterSplit(cl = NULL, seq)

parLapply(cl = NULL, X, fun, ...)
parSapply(cl = NULL, X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
parApply(cl = NULL, X, MARGIN, FUN, ...)
parRapply(cl = NULL, x, FUN, ...)
parCapply(cl = NULL, x, FUN, ...)

parLapplyLB(cl = NULL, X, fun, ...)
parSapplyLB(cl = NULL, X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
```

### Arguments

- `cl`: a cluster object, created by this package or by package snow. If `NULL`, use the registered default cluster.
- `fun`, `FUN`: function or character string naming a function.
- `expr`: expression to evaluate.
- `seq`: vector to split.
- `varlist`: character vector of names of objects to export.
- `envir`: environment from which to export variables.
- `x`: a vector for `clusterApply` and `clusterApplyLB`, a matrix for `parRapply` and `parCapply`.
- `...`: additional arguments to pass to `fun` or `FUN`: beware of partial matching to earlier arguments.
- `MoreArgs`: a list of additional arguments for `fun`.
- `RECYCLE`: logical; if true, shorter arguments are recycled.
- `X`: a vector (atomic or list) for `parLapply` and `parSapply`, an array for `parApply`.
- `MARGIN`: vector specifying the dimensions to use.
- `simplify`, `USE.NAMES`: logical; see `sapply`.
- `SIMPLIFY`: logical; see `mapply`.
- `.scheduling`: should tasks be statically allocated to nodes, or should dynamic load balancing be used?
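The warning about partial matching in `...` is easy to trip over. The sketch below assumes the machine can start two local workers with `makeCluster(2)`; the worker function `mult` and its argument `f` are hypothetical names, chosen so that `f` is a prefix of `clusterApply`'s own `fun` argument.

```r
library(parallel)

cl <- makeCluster(2)

# Hypothetical worker function whose second argument is named 'f'.
mult <- function(xi, f) xi * f

# Pitfall: 'f = 10' partially matches clusterApply's formal 'fun', so 10 is
# taken as the function to call and 'mult' ends up in '...'; the workers fail
# and the error is re-signalled on the master.
pitfall <- tryCatch(clusterApply(cl, 1:2, mult, f = 10),
                    error = function(e) e)

# Fix: supply 'fun' by its full name. Exact matching claims it first, so 'f'
# no longer matches any formal before '...' and is passed on to 'mult'.
fixed <- clusterApply(cl, 1:2, fun = mult, f = 10)   # list(10, 20)

stopCluster(cl)
```

Choosing worker argument names that are not a prefix of any formal before `...` (here `cl`, `x` and `fun`) avoids the problem as well.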

### Details

`clusterCall` calls a function `fun` with identical arguments `...` on each node.
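A minimal sketch of `clusterCall`, assuming two local workers can be started with `makeCluster(2)`; the variable names are illustrative. Because every node runs the same call, the result is a list with one element per node.

```r
library(parallel)

cl <- makeCluster(2)

# Each worker reports its own process id: one list element per node.
pids <- clusterCall(cl, Sys.getpid)

# The same arguments (1 and 2) are passed to the function on every node.
sums <- clusterCall(cl, function(a, b) a + b, 1, 2)   # list(3, 3)

stopCluster(cl)
```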

`clusterEvalQ` evaluates a literal expression on each cluster node. It is a parallel version of `evalq`, and is a convenience function invoking `clusterCall`.
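Because the expression is quoted, it is never evaluated on the master; each worker evaluates it in its own global environment. A sketch assuming two local workers; `worker_offset` is a hypothetical variable that exists only on the workers.

```r
library(parallel)

cl <- makeCluster(2)

# Set up state on every worker; returning NULL keeps the printed result small.
clusterEvalQ(cl, {
  worker_offset <- 100   # created in each worker's global environment
  NULL
})

# A later call can rely on the worker-side variable.
res <- clusterEvalQ(cl, worker_offset + 1)   # list(101, 101)

stopCluster(cl)
```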

`clusterApply` calls `fun` on the first node with arguments `x[[1]]` and `...`, on the second node with `x[[2]]` and `...`, and so on, recycling nodes as needed.
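The static, round-robin assignment can be observed by recording which worker handled each element. A sketch assuming two local workers; the names `values` and `pids` are illustrative, and the extra argument `power` is passed through `...`.

```r
library(parallel)

cl <- makeCluster(2)

# Five jobs on two nodes: x[[1]] goes to node 1, x[[2]] to node 2,
# x[[3]] back to node 1, and so on.
res <- clusterApply(cl, 1:5, function(xi, power) {
  list(value = xi^power, pid = Sys.getpid())
}, power = 2)

stopCluster(cl)

values <- sapply(res, `[[`, "value")   # 1 4 9 16 25
pids   <- sapply(res, `[[`, "pid")     # elements 1, 3, 5 share a worker
```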

`clusterApplyLB` is a load-balancing version of `clusterApply`. If the length `p` of `x` is not greater than the number of nodes `n`, then one job is sent to each of `p` nodes. Otherwise the first `n` jobs are placed in order on the `n` nodes. When the first job completes, the next job is placed on the node that has become free; this continues until all jobs are complete. Using `clusterApplyLB` can result in better cluster utilization than using `clusterApply`, but the increased communication can reduce performance. Furthermore, the node that executes a particular job is non-deterministic.
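A sketch of the scheduling described above, assuming two local workers and using `Sys.sleep` to stand in for jobs of uneven cost. With `clusterApply`, node 1 would run jobs 1, 3 and 5 (about 0.7 s of sleeping) while node 2 runs only jobs 2 and 4; with load balancing, node 2 typically picks up jobs 3 to 5 while node 1 is still busy with job 1.

```r
library(parallel)

cl <- makeCluster(2)

# One long job followed by several short ones.
durations <- c(0.5, 0.1, 0.1, 0.1, 0.1)

res <- clusterApplyLB(cl, durations, function(d) {
  Sys.sleep(d)
  d
})

stopCluster(cl)

# Results come back in the order of 'x', whichever node ran each job.
unlist(res)
```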

`clusterMap` is a multi-argument version of `clusterApply`, analogous to `mapply` and `Map`. If `RECYCLE` is true, shorter arguments are recycled (and either none or all must be of length zero); otherwise, the result length is the length of the shortest argument. Nodes are recycled if the length of the result is greater than the number of nodes. (`mapply` always uses `RECYCLE = TRUE` and defaults to `SIMPLIFY = TRUE`. `Map` always uses `RECYCLE = TRUE`.)
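The effect of `RECYCLE`, `SIMPLIFY` and `MoreArgs` can be seen with two arguments of different lengths. A sketch assuming two local workers; the names `r1`, `r2` and the argument `scale` are illustrative.

```r
library(parallel)

cl <- makeCluster(2)

# RECYCLE = TRUE (default): c(10, 20) is recycled to length 4, and
# MoreArgs supplies the same 'scale' to every call. SIMPLIFY = FALSE
# (default) returns a list.
r1 <- clusterMap(cl, function(x, y, scale) (x + y) * scale,
                 1:4, c(10, 20), MoreArgs = list(scale = 2))
# r1 is list(22, 44, 26, 48)

# RECYCLE = FALSE: the result has the length of the shortest argument (2);
# SIMPLIFY = TRUE simplifies the result as mapply would.
r2 <- clusterMap(cl, function(x, y) x + y, 1:4, c(10, 20),
                 RECYCLE = FALSE, SIMPLIFY = TRUE)
# r2 is c(11, 22)

stopCluster(cl)
```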

`clusterExport` copies the values of the variables named in `varlist` from the master R process to variables of the same names in the global environment (the ‘workspace’) of each node. The environment on the master from which variables are exported is given by `envir` and defaults to the global environment.
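When the value to export is a local variable, `envir` must point at the frame that holds it. A sketch assuming two local workers; `scale_by_factor`, `export_factor` and `scale_factor` are hypothetical names. `scale_by_factor` is defined at top level, so on a worker it looks up `scale_factor` in that worker's global environment, which is exactly where `clusterExport` puts it.

```r
library(parallel)

cl <- makeCluster(2)

# Hypothetical worker-side helper: 'scale_factor' must exist on the worker.
scale_by_factor <- function(i) i * scale_factor

# Hypothetical wrapper: 'scale_factor' is local here, so export from this
# function's frame instead of the default envir = .GlobalEnv.
export_factor <- function(cl, scale_factor) {
  clusterExport(cl, "scale_factor", envir = environment())
}

export_factor(cl, 10)
scaled <- parSapply(cl, 1:3, scale_by_factor)   # c(10, 20, 30)

stopCluster(cl)
```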

`clusterSplit` splits `seq` into a consecutive piece for each node and returns the result as a list with length equal to the number of nodes. Currently the pieces are chosen to be close to equal in length; the split is computed on the master.
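A typical use is to give each node one chunk of the work and combine the partial results on the master. A sketch assuming three local workers; the exact chunk boundaries may differ between R versions, so only their properties are relied on here.

```r
library(parallel)

cl <- makeCluster(3)

# One consecutive chunk per node, computed on the master.
chunks <- clusterSplit(cl, 1:10)
print(lengths(chunks))   # here the lengths differ by at most one

# Each node sums its own chunk; the master adds up the partial sums.
partial <- clusterApply(cl, chunks, sum)
total <- Reduce(`+`, partial)   # 55

stopCluster(cl)
```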

`parLapply`, `parSapply`, and `parApply` are parallel versions of `lapply`, `sapply` and `apply`. `parLapplyLB` and `parSapplyLB` are load-balancing versions, intended for use when applying `FUN` to different elements of `X` takes quite variable amounts of time, and either the function is deterministic or reproducible results are not required.
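The parallel versions should return the same values as their serial counterparts. A sketch assuming two local workers and an R version that provides `parSapplyLB`; the variable names are illustrative.

```r
library(parallel)

cl <- makeCluster(2)

X <- 1:6
m <- matrix(1:6, nrow = 2)

par_l <- parLapply(cl, X, sqrt)      # list, like lapply(X, sqrt)
par_s <- parSapply(cl, X, sqrt)      # vector, like sapply(X, sqrt)
par_a <- parApply(cl, m, 2, sum)     # column sums, like apply(m, 2, sum)

# Load-balanced variant for tasks of uneven duration; result order
# still follows X.
par_lb <- parSapplyLB(cl, c(0.3, 0.1, 0.1, 0.1), function(d) {
  Sys.sleep(d)
  d
})

stopCluster(cl)

stopifnot(
  all.equal(par_l, lapply(X, sqrt)),
  all.equal(par_s, sapply(X, sqrt)),
  all.equal(par_a, apply(m, 2, sum))
)
```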

`parRapply` and `parCapply` are parallel row and column `apply` functions for a matrix `x`; they may be slightly more efficient than `parApply` but do less post-processing of the result.
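The reduced post-processing shows up when `FUN` returns more than one value: the pieces are simply concatenated into one vector. A sketch assuming two local workers and a small 2 x 3 matrix.

```r
library(parallel)

cl <- makeCluster(2)

m <- matrix(1:6, nrow = 2)   # rows: 1 3 5 and 2 4 6

row_sums <- parRapply(cl, m, sum)   # one value per row: 9 12
col_sums <- parCapply(cl, m, sum)   # one value per column: 3 7 11

# A non-scalar FUN result is concatenated column by column into a plain
# vector, whereas parApply(cl, m, 2, range) would return a 2 x 3 matrix.
col_ranges <- parCapply(cl, m, range)   # 1 2 3 4 5 6

stopCluster(cl)
```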

### Values

For `clusterCall`, `clusterEvalQ` and `clusterSplit`, a list with one element per node.

For `clusterApply` and `clusterApplyLB`, a list the same length as `x`.

`clusterMap` follows `mapply`.

`clusterExport` returns nothing.

`parLapply` returns a list the length of `X`.

`parSapply` and `parApply` follow `sapply` and `apply` respectively.

`parRapply` and `parCapply` always return a vector. If `FUN` always returns a scalar result, the vector has length equal to the number of rows or columns, respectively; otherwise it is the concatenation of the returned values.

An error is signalled on the master if any of the workers produces an error.
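A worker-side error can therefore be handled on the master with ordinary condition handling. A sketch assuming two local workers; the job that fails and its message are made up for illustration.

```r
library(parallel)

cl <- makeCluster(2)

# The job for i == 3 fails on its worker; the error is re-signalled on the
# master, where tryCatch turns it into a message string.
msg <- tryCatch(
  parLapply(cl, 1:4, function(i) if (i == 3) stop("bad input: ", i) else i),
  error = function(e) conditionMessage(e)
)

stopCluster(cl)

# The master-side message includes the worker's own error text.
msg
```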

### Note

These functions are almost identical to those in package snow.

Two exceptions: `parLapply` has argument `X`, not `x`, for consistency with `lapply`, and `parSapply` has been updated to match `sapply`.

### Examples

```r
library(parallel)

## Use option cl.cores to choose an appropriate cluster size.
cl <- makeCluster(getOption("cl.cores", 2))

clusterApply(cl, 1:2, get("+"), 3)
xx <- 1
clusterExport(cl, "xx")
clusterCall(cl, function(y) xx + y, 2)

## Use clusterMap like an mapply example
clusterMap(cl, function(x, y) seq_len(x) + y,
           c(a = 1, b = 2, c = 3), c(A = 10, B = 0, C = -10))

parSapply(cl, 1:20, get("+"), 3)

## A bootstrapping example, which can be done in many ways:
clusterEvalQ(cl, {
  ## set up each worker.  Could also use clusterExport()
  library(boot)
  cd4.rg <- function(data, mle) MASS::mvrnorm(nrow(data), mle$m, mle$v)
  cd4.mle <- list(m = colMeans(cd4), v = var(cd4))
  NULL
})
res <- clusterEvalQ(cl, boot(cd4, corr, R = 100,
                             sim = "parametric", ran.gen = cd4.rg, mle = cd4.mle))
library(boot)
cd4.boot <- do.call(c, res)
boot.ci(cd4.boot, type = c("norm", "basic", "perc"),
        conf = 0.9, h = atanh, hinv = tanh)
stopCluster(cl)

## or
library(boot)
run1 <- function(...) {
  library(boot)
  cd4.rg <- function(data, mle) MASS::mvrnorm(nrow(data), mle$m, mle$v)
  cd4.mle <- list(m = colMeans(cd4), v = var(cd4))
  boot(cd4, corr, R = 500, sim = "parametric",
       ran.gen = cd4.rg, mle = cd4.mle)
}
cl <- makeCluster(mc <- getOption("cl.cores", 2))
## to make this reproducible
clusterSetRNGStream(cl, 123)
cd4.boot <- do.call(c, parLapply(cl, seq_len(mc), run1))
boot.ci(cd4.boot, type = c("norm", "basic", "perc"),
        conf = 0.9, h = atanh, hinv = tanh)
stopCluster(cl)
```

Documentation reproduced from R 3.0.2. License: GPL-2.