
Speed up a sampling function [migrated]

I have the following dataset:

Over the monitoring period of 100 days, kkk$positive == 1 if an event occurred on that day, and kkk$positive == 0 otherwise.
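Since the dataset itself is not reproduced here, a stand-in with the structure described above can be generated as follows (the event probability of 0.1 is purely illustrative, not part of the real data):

```r
# Illustrative stand-in for the kkk data frame (the real data are not shown):
# 100 monitoring days, positive == 1 on days where an event occurred
set.seed(1)
kkk <- data.frame(days     = 1:100,
                  positive = rbinom(100, size = 1, prob = 0.1))
head(kkk)
```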

The following function draws k = s.freq (1:12) samples, each consisting of j = s.length (1:30) consecutive elements, from the sample space and checks whether an event was recorded in at least one of the samples. The sample space consists of all possible runs of j consecutive values of kkk$days, so its elements overlap. Before each of the k draws, the sample space is re-evaluated: elements that share any day with the element just sampled are removed before the next draw (and so on, for the total of k draws).

my.sampling <- function(days, status, s.length, s.freq, iter) {
    # Build the data frame
    ddd <- data.frame(days = days, positive = status)
    j <- s.length                    # length of each sample
    n.row <- dim(ddd)[1] - j + 1     # number of elements in the sample space
    # Build the sample space: row r holds the j consecutive days starting at day r
    ss <- matrix(nrow = n.row, ncol = j + 1)
    for (i in 1:j) ss[, i] <- i:(n.row - 1 + i)
    # Column j+1 records whether at least one event was observed in that element
    ss[, j + 1] <- apply(ss[, 1:j, drop = FALSE], 1,
                         function(x) as.numeric(any(ddd$positive[x] == 1)))
    suc.list <- rep(NA, iter)        # preallocated vector to keep track of successes
    # Now the sampling takes place
    for (i in 1:iter) {              # for each iteration
        sampling.start <- ss         # reset the initial sample space
        k <- s.freq                  # number of samples drawn from the sample space
        k.t <- 0                     # controller to break the while loop
        suc <- 0
        # Stops when k.t reaches the specified limit (k = s.freq) or when the
        # reduction of the sample space leaves no elements to sample from
        while (k.t < k && nrow(sampling.start) > 0) {
            s <- sampling.start[sample(nrow(sampling.start), 1), , drop = FALSE]
            suc <- suc + s[1, j + 1] # adds 1 if at least one event was in the sample
            # Sample space reduction: elements that intersect with the sample just
            # selected are removed before the next sampling occurs
            keep <- apply(sampling.start[, 1:j, drop = FALSE], 1,
                          function(x) !any(x %in% s[1, 1:j]))
            sampling.start <- sampling.start[keep, , drop = FALSE]
            k.t <- k.t + 1           # controller for the while loop
        }
        suc.list[i] <- as.numeric(suc > 0)  # 1 if any sampled element held an event
    }
    # hist(suc.list, breaks = sqrt(iter))
    suc.list
}

For example:

my.sampling(kkk$days, kkk$positive, s.length=3, s.freq=5, iter=300)

My problem is that the function requires about 100 ms per iteration, and I need to run it about 100,000 times with at least 1,000 iterations each time. I have used preallocation of the vectors and apply to speed things up, but it's still too slow.

Parallel processing is applicable to my problem, as I can split the data on which the function has to run across multiple cores. However, if my calculations are right, I would still need about 120 days of processing per CPU core.

Is it possible to reduce the run time of the function? I have tried compiler::cmpfun, but without much improvement. Otherwise I will have to resort to some kind of AWS EC2 solution.

Thanks a lot