unable to pass "Parallel pool test" on remote Parallel server
I have set up MATLAB Parallel Server on our cluster. The MATLAB Job Scheduler is running on the headnode, and is able to talk to all of the workers on the compute nodes.
If I run MATLAB as a client on the headnode, I can pass all of the cluster profile validation tests. However, if I run the same tests on a different client machine (outside of the cluster), all of the tests pass except for the "Parallel pool test (parpool)". It fails after about 6 minutes with the following error:
Error Report: Failed to initialize the interactive session.
Error using parallel.internal.pool.InteractiveClient>iThrowIfBadParallelJobStatus (line 789)
The interactive communicating job errored with the following message: Client unable to connect to worker. Check whether a firewall is blocking communication between the worker machine and the MATLAB client machine.
I have the headnode set up to NAT the cluster nodes' traffic out of the cluster, so I am not sure why this isn't working. What is different between this test and the others, such that this one fails when the others pass? It seems to me that in the previous tests the client talks only to the MJS, but in this test the workers need to talk directly to the client (according to the error message). That should be working, since I can SSH from the worker machine to the client without issue. If the converse is true, and the client has to talk directly to the workers, I don't see how this would ever work in a cluster situation.
On another track, it may be that some ports are being blocked by filtering on our network switches. What ports do the workers need to be able to talk to the client?
Thank you for any help!
Jason Ross on 30 Jul 2019
The required ports are documented here. Note that they are configurable in the mdce_def or mjs_def files (.bat or .sh, depending on platform) in <matlabroot>/toolbox/distcomp/. There is some more detail in that file, as well.
It may be useful to set the hostname, IP, or ports explicitly on the client host. To do that, use the pctconfig command in a fresh session of MATLAB before you attempt to run any other parallel commands. The client tries to "get this right", but in some cases you need to be explicit about the exact IP of the host and/or hostname to use.
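To see whether a switch or firewall is filtering a given port between a worker node and the client, a quick TCP probe run from the worker can help. The following is only a rough sketch in Python, not anything MATLAB provides: the function name probe_port, the host name, and the port numbers in the usage comment are all placeholders, and the real values should come from the port documentation and your pctconfig settings. Note that a port only reports "open" while something is actually listening on it, so a "refused" result still shows the path is not filtered.

```python
import socket


def probe_port(host, port, timeout=3.0):
    """Classify one TCP connection attempt to host:port.

    Returns:
      "open"        - the connection was accepted (a listener is present)
      "refused"     - the host answered but nothing is listening on the port
      "unreachable" - timeout, no route, or name lookup failure; this is the
                      result that usually points at filtering on the path
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except OSError:
        # Covers timeouts, unreachable networks, and DNS errors.
        return "unreachable"


# Example usage from a worker node (host and ports are hypothetical):
#
#   for port in (27370, 27371, 27372):
#       print(port, probe_port("client.example.com", port))
```

If every port in the range comes back "unreachable" from the worker while SSH to the same client works, that would be consistent with port filtering rather than a routing problem.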