Cluster profile validation fails because the plugin function 'independentSubmitFcn.m' errored

I get an error when validating a parallel cluster profile. The error message is:
Error Report: Job submission failed because the plugin function 'independentSubmitFcn.m' errored.
Caused by: Brace indexing is not supported for variables of this type.
I am using the matlab-slurm plugins provided in this GitHub repo. This is confusing because the same cluster profile validated successfully several days ago.
Thanks for any reply!
  2 Comments
Damian Pietrus on 27 Oct 2023
A few questions before we do some troubleshooting:
  • Is the client that's submitting jobs on the cluster itself, or are you on a remote machine?
  • Have you made any edits to the plugin files themselves?
  • Does the error continue after restarting MATLAB?
We can try to manually submit the job to get more information from the log file. Please make sure that the Slurm cluster is set as your default from the "Parallel" drop-down menu, then try the following steps:
c=parcluster;
% Independent job
j=batch(c,@pwd,1,{});
If the job successfully submits, we can then wait for the job to finish before getting the log file. If the job does not submit, please let me know if the error message is the same as in your post or if it changed.
% If the job submitted, wait for it to finish
j.wait
% Get the log file for the independent job
c.getDebugLog(j.Tasks(1));
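The steps above can also be run as one script. This is only a sketch, assuming the Slurm profile is set as the default cluster profile. The try/catch and the call to getReport are additions for illustration: they print the full error stack if the submission fails, which should show the line inside the plugin function where the brace indexing error is raised.

```matlab
% Sketch: submit a trivial independent job and collect diagnostics.
% Assumes the Slurm profile is the default cluster profile.
c = parcluster;
j = [];

try
    % Independent job that returns the worker's current folder
    j = batch(c, @pwd, 1, {});
catch err
    % Submission failed: show the full stack, including the plugin function
    disp(getReport(err, 'extended'));
end

if ~isempty(j)
    % Submission succeeded: wait for the job, then read the scheduler log
    wait(j);
    debugLog = getDebugLog(c, j.Tasks(1));
    disp(debugLog);
end
```

If the submission fails, the printed stack is the useful part to post back here. If it succeeds, post the contents of debugLog instead.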
Wei Jianwen on 9 Nov 2023
Hi Damian,
Here is some additional information:
  • The client is a remote machine that submits Slurm jobs to the cluster; I have to enter a username and password when parpool starts
  • I have not modified the plugin function 'independentSubmitFcn.m' mentioned in the error message
  • I can start a parallel pool normally after restarting MATLAB, but the same error may occur again after several days
  • I set the Slurm cluster as the default in the Cluster Profile Manager; is that right?
Since restarting MATLAB fixes the problem, I haven't tried submitting jobs manually yet. I will add another comment to this post if I do.
Thanks for your reply! :D


Answers (1)

Damian Pietrus on 10 Nov 2023
Hey Wei,
Thanks for sending that additional information. MATLAB uses SSH to connect to and run commands on a remote cluster. When MATLAB is left open for a long period of time, that connection may end up breaking down for one reason or another. Once it's broken, any additional interactions with the cluster will fail until a new connection is established. To work around the issue you can restart MATLAB or you can try the following to see if it helps:
clear all force
c=parcluster;
% Interact with the cluster here. You can use the Job Monitor, submit a
% new job, etc.
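As one way to check that the workaround took effect, the sketch below recreates the cluster object and runs a trivial job end to end. It assumes the Slurm profile is the default profile; the test job and the cleanup step are illustrative additions, not part of the plugin scripts.

```matlab
% Drop cached objects so that a new connection is created on next use
clear all force

% Recreate the cluster object from the default profile
c = parcluster;

% Trivial independent job to confirm that submission works again
j = batch(c, @pwd, 1, {});
wait(j);

% fetchOutputs returns a cell array; the first cell is the worker's folder
out = fetchOutputs(j);
disp(out{1});

% Remove the test job from the cluster's job storage
delete(j);
```

If batch still fails with the same brace indexing error after this, restarting MATLAB remains the fallback.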

Release

R2023a
