
R foreach / doMC not using multiple cores on Ubuntu

I have built a function in R. It runs on an Ubuntu 12.04 LTS 64-bit server (4-core i7 with hyper-threading, so 8 logical cores, and 6 GB RAM), where I installed R from the standard packages:

sudo apt-get install r-base r-recommended r-base-dev
sudo apt-get install r-cran-multicore r-cran-iterators r-cran-foreach r-cran-domc 

NB: I also installed foreach and doMC from inside R, which did not help either. I did that the same way I installed the deldir package:

install.packages(c("deldir"), dependencies = TRUE)

My function runs fine, but it does not run in parallel (it maxes out just 1 of the 8 logical cores):

library(deldir)
library(foreach)
library(doMC)
registerDoMC(cores=8)
 
#getDoParWorkers()
#getDoParName()
#getDoParVersion()
 
# loop through the input files
inputdir <- "/home/geoadmin/data/objects"
# pattern is a regular expression, so escape the dot and anchor the extension
inputfiles <- dir(path = inputdir, pattern = "\\.txt$")
for (inputfile in inputfiles)
{
  # set file variables
  curinputfile <- file.path(inputdir, inputfile)
  print(curinputfile)
  curoutputfile <- file.path(inputdir, paste(substr(inputfile, start = 1, stop = 10), ".out", sep = ""))
  # read the point x/y coordinates into a data frame
  pts <- read.csv(curinputfile, header = TRUE, sep = ",", dec = ".", fill = TRUE)
  # bounding window: the data extent padded by its full width/height on every side
  xr <- range(pts$x); dx <- diff(xr)
  yr <- range(pts$y); dy <- diff(yr)
  # precision of 3 digits only, because of the RDW coordinate system;
  # the third argument adds a 2x2 grid of dummy points
  voro <- deldir(pts$x, pts$y, digits = 3, list(ndx = 2, ndy = 2),
                 rw = c(xr[1] - dx, xr[2] + dx, yr[1] - dy, yr[2] + dy))
  tiles <- tile.list(voro)
  # build one polygon string per tile, in parallel
  poly <- foreach(i = seq_along(tiles), .combine = cbind) %dopar%
  {
    # load tile info
    tile <- tiles[[i]]
    # start with WKT (well-known text) notation
    curpoly <- "POLYGON(("
    # add the list of coordinates by looping through the points in the tile
    for (j in seq_along(tile$x)) { curpoly <- sprintf("%s %.6f %.6f,", curpoly, tile$x[[j]], tile$y[[j]]) }
    # repeat the first point to close the polygon and end the WKT notation;
    # this string is the value foreach collects for tile i
    sprintf("%s %.6f %.6f))", curpoly, tile$x[[1]], tile$y[[1]])
  }
  write.csv(t(poly), file = curoutputfile, row.names = FALSE)
}

So the results are correct, but nothing runs in parallel.

doMC did register correctly:

> getDoParWorkers()
[1] 8
> getDoParName()
[1] "doMC"
> getDoParVersion()
[1] "1.2.5"
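
Registration alone does not prove that %dopar% really forks workers, so a rough sanity check could look like the sketch below. It assumes foreach and doMC are installed on a Unix-like system; the worker count of 2 and the short sleep are arbitrary values picked for illustration, not taken from my script.

```r
library(foreach)
library(doMC)

# Assumption: 2 workers are enough to show whether forking happens at all.
registerDoMC(cores = 2)

# Every iteration reports the PID of the process that executed it.
# The short sleep keeps each task alive long enough to show up in top.
pids <- foreach(i = 1:4, .combine = c) %dopar% {
  Sys.sleep(0.2)
  Sys.getpid()
}

# Several distinct PIDs, none equal to the PID of the master R session,
# would indicate that the loop body ran in forked worker processes.
print(unique(pids))
print(Sys.getpid() %in% pids)
```

If this shows more than one worker PID, forking itself works, and the single busy core would have to be explained by something else in the script.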

If I look at the usage (with top):

top - 01:03:19 up 9 min,  3 users,  load average: 1.02, 0.86, 0.45
Tasks: 131 total,   2 running, 127 sleeping,   0 stopped,   2 zombie
Cpu(s): 12.5%us,  0.0%sy,  0.0%ni, 87.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   6104932k total,  1240512k used,  4864420k free,    16656k buffers
Swap:  6283260k total,        0k used,  6283260k free,   141996k cached
 
PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
1553 zzzzzzzz  20   0  913m 850m 3716 R  100 14.3   8:22.03 R

So R is maxing out just one core: 12.5% user CPU overall, which is 1 of the 8 logical cores. Does anyone have any idea what could cause foreach/doMC to not use multiple cores?
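
In my script, deldir() and tile.list() run outside the foreach loop, so one more thing that could be measured is how the time splits between those calls and the string building that foreach covers. The sketch below is only an assumption-laden stand-in: it uses random points instead of my data (the point count n is hypothetical), default deldir() settings, and a simplified sequential version of the polygon strings.

```r
library(deldir)

set.seed(1)
n <- 2000      # hypothetical point count, not the size of my real files
x <- runif(n)  # synthetic coordinates standing in for the real data
y <- runif(n)

# Part that runs outside foreach in my script: tessellation plus tile extraction.
t_voronoi <- system.time({
  voro  <- deldir(x, y)
  tiles <- tile.list(voro)
})

# Part that foreach covers in my script: one closed polygon string per tile
# (simplified here, and run sequentially so only its raw cost is measured).
t_strings <- system.time({
  wkt <- vapply(tiles, function(tile) {
    coords <- sprintf("%.6f %.6f", c(tile$x, tile$x[1]), c(tile$y, tile$y[1]))
    paste("POLYGON((", paste(coords, collapse = ","), "))", sep = "")
  }, character(1))
})

print(t_voronoi["elapsed"])
print(t_strings["elapsed"])
```

Comparing the two elapsed times would show which of the two parts is worth parallelising in the first place.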

> sessionInfo()
R version 2.14.1 (2011-12-22)
Platform: x86_64-pc-linux-gnu (64-bit)
 
locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
 
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
 
other attached packages:
[1] doMC_1.2.5      multicore_0.1-7 iterators_1.0.6 foreach_1.4.0
[5] deldir_0.0-19
 
loaded via a namespace (and not attached):
[1] codetools_0.2-8