Ubuntu R ForEach / DoMC not using multiple cores
I have written a function in R, running on Ubuntu 12.04 LTS 64-bit on a 4-core i7 server with multithreading (8 logical cores) and 6 GB of RAM. I installed R from the standard packages:

    sudo apt-get install r-base r-recommended r-base-dev
    sudo apt-get install r-cran-multicore r-cran-iterators r-cran-foreach r-cran-domc

NB: I also installed foreach and doMC from inside R (which didn't help either), the same way I installed the deldir package:

    install.packages(c("deldir"), dependencies = TRUE)

My function runs fine, but it does not use multiple cores (it just maxes out 1 of the 8):

    library(deldir)
    library(foreach)
    library(doMC)
    registerDoMC(cores=8)
    #getDoParWorkers()
    #getDoParName()
    #getDoParVersion()

    # loop through files
    inputfiles <- dir(path="/home/geoadmin/data/objects/", pattern='.txt')
    for( inputfilenr in 1:length(inputfiles)) {
      # set file variables
      curinputfile = paste("/home/geoadmin/data/objects/",inputfiles[[inputfilenr]], sep = "", collapse = NULL)
      print (curinputfile)
      curoutputfile = paste("/home/geoadmin/data/objects/",substr(inputfiles[[inputfilenr]], start=1, stop=10), '.out', sep = "", collapse = NULL)

      # select the point x/y coordinates into a data frame...
      points <- read.csv(curinputfile, header = TRUE, sep = ",", dec=".", fill = TRUE)

      # set calculation variables, precision on 3 digits only because of the RDW coordinate system
      voro = deldir(points$x, points$y, digits=3, list(ndx=2,ndy=2),
                    rw=c(min(points$x)-abs(min(points$x)-max(points$x)),
                         max(points$x)+abs(min(points$x)-max(points$x)),
                         min(points$y)-abs(min(points$y)-max(points$y)),
                         max(points$y)+abs(min(points$y)-max(points$y))))
      tiles = tile.list(voro)
      poly = array()

      # start loop
      poly <- foreach (i=1:length(tiles), .combine=cbind) %dopar% {
        # load tile info
        tile = tiles[[i]]
        # start with EWKB notation
        curpoly = "POLYGON(("
        # add list of coordinates by looping through the points in tile
        for (j in 1:length(tiles[[i]]$x)) {
          curpoly = sprintf("%s %.6f %.6f,",curpoly,tile$x[[j]],tile$y[[j]])
        }
        # then again the first point to close the polygon and end the EWKB notation, adding that to the poly array
        sprintf("%s %.6f %.6f))",curpoly,tile$x[[1]],tile$y[[1]])
      }
      write.csv(t(poly), file = curoutputfile, row.names = FALSE)
    }

So the results are good, but no parallelism...
doMC did register correctly:

    > getDoParWorkers()
    [1] 8
    > getDoParName()
    [1] "doMC"
    > getDoParVersion()
    [1] "1.2.5"

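Registration only shows that a backend is set, not that tasks are actually forked. A minimal sketch of a further sanity check, assuming foreach and doMC are installed: each task reports the process ID it ran in, so worker PIDs that differ from the parent's PID would suggest that %dopar% really does fork. The `cores = 2` value and the variable name `pids` are illustrative choices, not taken from my script.

```r
library(foreach)
library(doMC)

# Illustrative worker count; any value >= 2 should do for this check.
registerDoMC(cores = 2)

# Each task returns the PID of the process that executed it.
pids <- foreach(i = 1:8, .combine = c) %dopar% {
  Sys.getpid()
}

# Compare the parent PID with the PIDs reported by the tasks.
cat("parent PID:  ", Sys.getpid(), "\n")
cat("worker PIDs: ", sort(unique(pids)), "\n")
```

If the worker PIDs all equal the parent PID, the loop body is running sequentially in the main R process.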
If I look at the usage (with top):

    top - 01:03:19 up 9 min,  3 users,  load average: 1.02, 0.86, 0.45
    Tasks: 131 total,   2 running, 127 sleeping,   0 stopped,   2 zombie
    Cpu(s): 12.5%us,  0.0%sy,  0.0%ni, 87.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
    Mem:   6104932k total,  1240512k used,  4864420k free,    16656k buffers
    Swap:  6283260k total,        0k used,  6283260k free,   141996k cached

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
     1553 zzzzzzzz  20   0  913m 850m 3716 R  100 14.3   8:22.03 R

So just maxing out one core. Does anyone have any idea what could cause foreach/doMC to not use multiple cores?
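One thing that might help narrow this down is timing the sequential part (the deldir() and tile.list() calls) separately from the foreach loop, since only the loop is eligible to run in parallel. The following is a rough sketch under stated assumptions, not my actual script: it uses random points instead of my input files, an illustrative `cores = 2`, and a simplified polygon string built with vectorised sprintf(); the names `pts`, `t_seq` and `t_par` are made up for the example.

```r
library(deldir)
library(foreach)
library(doMC)

registerDoMC(cores = 2)  # illustrative worker count

# Synthetic stand-in for one input file (assumption: random points).
set.seed(1)
n <- 500
pts <- data.frame(x = runif(n), y = runif(n))

# Phase 1: sequential triangulation/tessellation in the main process.
t_seq <- system.time({
  voro  <- deldir(pts$x, pts$y)
  tiles <- tile.list(voro)
})

# Phase 2: the only part that %dopar% can distribute over workers.
t_par <- system.time({
  poly <- foreach(i = seq_along(tiles), .combine = c) %dopar% {
    tile <- tiles[[i]]
    # Repeat the first vertex at the end to close the ring.
    coords <- sprintf("%.6f %.6f",
                      c(tile$x, tile$x[1]),
                      c(tile$y, tile$y[1]))
    paste("POLYGON((", paste(coords, collapse = ","), "))", sep = "")
  }
})

# Elapsed wall-clock seconds per phase.
print(c(sequential = t_seq[["elapsed"]], parallel = t_par[["elapsed"]]))
```

If the sequential phase accounts for most of the elapsed time, top would show a single busy R process regardless of how many workers are registered.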

    > sessionInfo()
    R version 2.14.1 (2011-12-22)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
     [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
     [7] LC_PAPER=C                 LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base

    other attached packages:
    [1] doMC_1.2.5      multicore_0.1-7 iterators_1.0.6 foreach_1.4.0
    [5] deldir_0.0-19

    loaded via a namespace (and not attached):
    [1] codetools_0.2-8
