parFor Worker unable to find file. Unrecognized function or variable 'parameters'
Hi there, I'm having trouble parallelizing the loop.
It is from here:
If I try to run the code below, parfor complains about "iteration". If I set iteration = epoch, the following error occurs:
The source code ("XX".m) for the parfor-loop that is trying to execute on the worker could not be found.
Caused by:
Unrecognized function or variable 'parameters'.
Error using remoteParallelFunction (line 84)
Worker unable to find file.
Unrecognized function or variable 'parameters'.
But parameters is defined, as you can see, inside the example linked above.
Any idea is welcome!
Chris
start = tic;
iteration = 0;
parpool(32);
parfor epoch = 1:numEpochs
    reset(mbq);
    while hasdata(mbq)
        iteration = iteration + 1;
        dlXT = next(mbq);
        dlX = dlXT(1,:);
        dlT = dlXT(2,:);

        % Evaluate the model gradients and loss using dlfeval and the
        % modelGradients function.
        [gradients,loss] = dlfeval(accfun,parameters,dlX,dlT,dlX0,dlT0,dlU0);

        % Update learning rate.
        learningRate = initialLearnRate / (1+decayRate*iteration);

        % Update the network parameters using the adamupdate function.
        [parameters,averageGrad,averageSqGrad] = adamupdate(parameters,gradients,averageGrad, ...
            averageSqGrad,iteration,learningRate);
    end

    % Plot training progress.
    loss = double(gather(extractdata(loss)));
    addpoints(lineLoss,iteration,loss);
    D = duration(0,0,toc(start),'Format','hh:mm:ss');
    title("Epoch: " + epoch + ", Elapsed: " + string(D) + ", Loss: " + loss)
    drawnow
end
2 Comments
Edric Ellis
on 13 Aug 2021
It doesn't look like you've defined parameters inside the parfor loop? The first reference I see is in the dlfeval call. To be a legal parfor loop, you'd need to define parameters just after the start of each iteration of parfor.
I suspect however that this loop isn't parallelisable in this way. Surely each iteration of the loop depends on the previous iteration (via the parameters variable).
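The structural requirement Edric describes can be sketched roughly as below. This is a hypothetical illustration only (initialParameters and trainOneEpoch are made-up names, not from the original code), showing the shape a *legal* parfor would need: each iteration starts from its own copy of parameters, so no iteration can depend on another's result.

```matlab
% Hypothetical sketch of a legal parfor structure (names are illustrative):
parfor epoch = 1:numEpochs
    % Each iteration must define its own parameters; initialParameters
    % would be broadcast from the client to every worker.
    parameters = initialParameters;
    iterLocal = 0;   % a per-iteration counter; the shared "iteration"
                     % accumulator from the original loop is not allowed
    % ... training steps that read/write only variables local to this
    % iteration, e.g. parameters = trainOneEpoch(parameters, ...);
end
```

Note that this changes the algorithm: every epoch would train from the same starting point, which is exactly why Edric doubts the loop is parallelizable as written.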
CSCh
on 13 Aug 2021
Answers (1)
Raymond Norris
on 13 Aug 2021
As Edric mentioned, this isn't (or may not be) parallelizable. But if it were, you'd need to make modifications to the for-loop. You've included the updating of the figure within the parfor, which won't work as you're expecting (the pool of workers doesn't have access to your client MATLAB displaying the plot). There are ways around this, but they won't address the fundamental issue of whether the loop can be parallelized.
I would suggest circling back to the Training Options section of the example, where it describes using GPUs to help train the model. I suspect this is where you'll find your best option to speed up the code.
12 Comments
CSCh
on 16 Aug 2021
Walter Roberson
on 16 Aug 2021
Your parfor version has
parfor epoch = 1:numEpochs
which looks like it was derived from a non-parallel version,
for epoch = 1:numEpochs
Now... suppose you were to replace that with
for epoch = randperm(numEpochs)
so that the epochs were still done one at a time, but in a random order. So instead of the parameters for epoch 2 being whatever was estimated for epoch 1, the parameters for epoch 2 might be what was estimated after doing epochs 17, 4, 38, 5 (in that order, as an example). The results that would be produced would certainly be different than proceeding in sequence. Would the results be acceptable though?
Now, suppose you were able to do two epochs simultaneously -- say epoch = 1 and epoch = 2. The parameters would be the same for both, so instead of the parameters for epoch 2 being informed by the results for epoch 1, both epoch 1 and epoch 2 would use the same initial parameters. Suppose that epoch 1 and epoch 2 take very nearly the same amount of time: epoch 1 might finish first, or epoch 2 might finish first. When one of them finished and you went on to epoch 3, would it be acceptable that the parameters would be set according to whichever of the two had finished most recently? Due to accidents of timing, epoch 3 might use the parameters generated from epoch 1, or possibly the parameters generated from epoch 2 (depending on which finished first), and epoch 4 might end up using the estimates from the other of the two. Or, due to accidents of timing, epoch 3 might not be ready to go until both epoch 1 and epoch 2 were finished, so epoch 3 and epoch 4 might both end up using whichever of epoch 1 and epoch 2 finished last. The results would be different from proceeding one at a time in sequence, and they would not be repeatable -- but would the results be acceptable?
If the results would be acceptable, then we could probably work something out for you.
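The "two epochs in flight" scheme Walter describes could be sketched with parfeval rather than parfor, since parfeval lets the client consume results in completion order. This is only a sketch under Walter's stated assumptions; trainOneEpoch is a hypothetical function (not from the original code) that would run one epoch's training and return the updated parameters.

```matlab
% Hypothetical sketch: keep two epochs in flight, always continuing from
% whichever result arrives first (trainOneEpoch is a made-up helper).
pool = gcp;
f(1) = parfeval(pool, @trainOneEpoch, 1, parameters); % epoch 1
f(2) = parfeval(pool, @trainOneEpoch, 1, parameters); % epoch 2, same start
for epoch = 3:numEpochs
    [idx, newParams] = fetchNext(f);   % blocks until one future completes
    parameters = newParams;            % later epochs use the freshest result
    f(idx) = parfeval(pool, @trainOneEpoch, 1, parameters);
end
```

As Walter notes, the outcome depends on which future finishes first, so runs are not repeatable.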
CSCh
on 16 Aug 2021
Walter Roberson
on 16 Aug 2021
In some cases, you can do a parallel sweep through all of the files, extracting statistical information or "features" from each one, and recording those. Then you would have a step that synthesized all of the statistical or feature information into a single best set of parameters and gradients, so that the adamupdate step did not have to be run for each epoch. That would be followed by a parallel sweep that could run in any order, because it did not need to update estimates.
Generally speaking, NNs can be trained in parallel if you can do this kind of sweep to figure out the best parameters to use for all iterations, or if you can do partial training, get out the parameters, and then do a "fix-up" based upon all of the partial results.
I would not expect a GAN to be trainable in parallel -- except for training different versions of the net (different weights) and choosing the one with the best result.
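The two-phase approach above can be sketched as follows. This is a hypothetical outline only: fileList, computeFeatures, and combineFeatures are made-up names standing in for whatever per-file feature extraction and synthesis step the problem actually needs.

```matlab
% Phase 1: order-independent parallel sweep over the files.
numFiles = numel(fileList);        % fileList: assumed cell array of paths
stats = cell(1, numFiles);         % sliced output, one cell per iteration
parfor k = 1:numFiles
    data = load(fileList{k});          % each file handled independently
    stats{k} = computeFeatures(data);  % hypothetical per-file feature step
end

% Phase 2: serial synthesis of all partial results into one parameter set,
% so no per-epoch adamupdate dependency remains.
parameters = combineFeatures(stats);   % hypothetical synthesis step
```

The key property making the parfor legal here is that iteration k touches only fileList{k} and stats{k}, so iterations can run in any order.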
CSCh
on 18 Aug 2021
Raymond Norris
on 18 Aug 2021
Do you have access to an NVIDIA GPU?
CSCh
on 18 Aug 2021
CSCh
on 24 Aug 2021
Raymond Norris
on 24 Aug 2021
If updating the NVIDIA driver causes an error, then it's not surprising that you don't get any acceleration. I suspect you need to resolve the NVIDIA driver issue first.
What error is getting thrown?
CSCh
on 24 Aug 2021
Walter Roberson
on 25 Aug 2021
Could you remind us if your computation is single precision or double precision?
512 cores looks to be one of:
- GeForce GTX 580 (late 2010) (GF110 -- Fermi based), which does double precision at 1/8 of the single precision rate, about 197 gigaflops
- GeForce GTX 750 (early 2014) (GM107 -- Maxwell based), which does double precision at 1/32 of the single precision rate, about 34 gigaflops
You may have noticed that the older card does double precision about 6 times faster than the newer card. When you are doing double-precision work you need to pay a lot of attention to the specifications for the individual model !!
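One way to see the single-vs-double gap in practice is to time the same operation in both precisions on the GPU. A minimal sketch, assuming Parallel Computing Toolbox and a supported NVIDIA GPU (the timings are illustrative, not claims about any specific card):

```matlab
% Compare single- vs double-precision matrix multiply on the GPU.
g = gpuDevice;                            % select/query the current GPU
fprintf('GPU: %s\n', g.Name);

A = rand(2000, 'single', 'gpuArray');     % single-precision data on device
B = rand(2000, 'single', 'gpuArray');
tic; C = A*B; wait(g); tSingle = toc;     % wait(g) forces synchronization

Ad = double(A); Bd = double(B);           % same data in double precision
tic; Cd = Ad*Bd; wait(g); tDouble = toc;

fprintf('single: %.4f s, double: %.4f s\n', tSingle, tDouble);
```

On consumer cards with a low double-precision rate (like the 1/32 Maxwell case above), the double-precision timing will be disproportionately slower than the single-precision one.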
CSCh
on 25 Aug 2021