- Network connectivity issues
- Insufficient computer resources, or restrictions placed on those resources
- Licensing issues
- The job storage location is not set in a shared filesystem
- The job storage location needs to be cleared or changed
Why does MATLAB Parallel Server validation fail or stall at the SPMD/Pool job test stage (communicating batch jobs)?
MathWorks Support Team
on 26 Oct 2022
Edited: MathWorks Support Team
on 2 Oct 2024 at 12:42
Accepted Answer
MathWorks Support Team
on 2 Oct 2024 at 0:00
Edited: MathWorks Support Team
on 2 Oct 2024 at 12:42
This can be caused by a number of issues, including, but not limited to, the following:
Make sure that all workers can communicate with one another over the network and that the required ports are open. If you don't know which ports need to be open, see the link below.
How do I configure MATLAB Parallel Server using the MATLAB Job Scheduler to work within a firewall?
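As a rough way to check the connectivity described above, the following Python sketch tries to open a TCP connection to each host and port you give it. It is not a MathWorks tool and runs outside MATLAB. The function names `port_is_open` and `probe` are made up for this illustration, and the host names and port numbers in the usage comment are placeholders, so substitute the nodes and ports from your own cluster configuration.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False

def probe(hosts, ports, timeout=3.0):
    """Map every (host, port) pair to True (reachable) or False."""
    return {(h, p): port_is_open(h, p, timeout) for h in hosts for p in ports}

# Example usage (placeholder node names and port, not real defaults):
# for (host, port), ok in probe(["node01", "node02"], [12345]).items():
#     print(host, port, "open" if ok else "blocked or not listening")
```

A `False` result only means that nothing accepted the connection from the machine where the script ran. The service may not be running, or a firewall may be blocking the port, so run the check from each node toward the others.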
Check your hosts file to make sure that any manually added entries are correct. Incorrect entries can cause network connectivity issues.
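To spot obvious mistakes in manual hosts entries, a small Python sketch like the one below can scan the file's text. It is an illustration only: `check_hosts` is a hypothetical helper, and the rules it applies (invalid IP address, missing host name, one name mapped to two different addresses of the same IP version) are heuristics chosen for this example, not an official validation.

```python
import ipaddress

def check_hosts(text):
    """Return a list of (line_number, message) for suspicious hosts entries."""
    problems = []
    seen = {}  # (hostname, IP version) -> first address seen for that name
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            problems.append((lineno, f"invalid IP address: {fields[0]}"))
            continue
        if len(fields) < 2:
            problems.append((lineno, "no hostname after IP address"))
            continue
        for name in fields[1:]:
            key = (name.lower(), addr.version)
            if key in seen and seen[key] != addr:
                problems.append((lineno, f"{name} also mapped to {seen[key]}"))
            seen.setdefault(key, addr)
    return problems

# Example usage (typical path on Linux; it differs on Windows):
# with open("/etc/hosts") as f:
#     for lineno, message in check_hosts(f.read()):
#         print(f"line {lineno}: {message}")
```

An empty result does not prove the file is correct. An entry can be well formed and still point a node name at the wrong address.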
Make sure that MATLAB has access to at least the minimum required system resources when you validate your cluster. If you're unsure what the minimum system requirements are, see the link below.
System Requirements
It is also possible that there is an issue with the Network License Manager. Several kinds of Network License Manager errors can occur: for example, it may be misconfigured, it may not be running, or its ports may be blocked. Check the Network License Manager for any faults. If you find none, create a full validation report and look in it for a License Manager error or a log file.
The location where job data is stored must be on a filesystem shared across the workers. If you're submitting a job from a compute or head node, this is the JobStorageLocation. If you're submitting remotely to a cluster that uses Slurm, PBS, LSF, HTCondor, or Grid Engine and you don't have a shared filesystem with the cluster, then this is the RemoteJobStorageLocation.
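One simple way to test whether a directory really is shared is to write a marker file from one node and look for it from another. The Python sketch below assumes the storage directory is mounted at the same path on every node. The helper names are hypothetical, and the script is not part of MATLAB Parallel Server.

```python
import os
import uuid

def write_marker(storage_dir):
    """Create a uniquely named marker file in storage_dir and return its name."""
    name = f"shared_fs_check_{uuid.uuid4().hex}.txt"
    with open(os.path.join(storage_dir, name), "w") as f:
        f.write(name)
    return name

def marker_visible(storage_dir, name):
    """Return True if the marker exists here and holds the expected content."""
    try:
        with open(os.path.join(storage_dir, name)) as f:
            return f.read() == name
    except OSError:
        return False

def remove_marker(storage_dir, name):
    """Delete the marker file once the check is finished."""
    os.remove(os.path.join(storage_dir, name))
```

Run `write_marker` on the node where jobs are submitted, then call `marker_visible` with the returned name on each worker node. If any node returns `False`, the location is probably not shared or is mounted at a different path there. Call `remove_marker` afterwards so the test file does not stay in the storage location.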
For a variety of reasons, the job storage location may need to be cleared. Try clearing the JobStorageLocation set in your cluster profile or, if you don't want to clear it, choose a different location.