MDCE Admin Center is OK, but Parallel Computing does not recognize workers
Hi All,
I have a desktop running Windows XP SP3 (32-bit, x86) and a laptop running Windows 7 (64-bit). Both have MATLAB Distributed Computing Server installed.
I start the job manager on the XP SP3 x86 machine and a worker on each computer. Admin Center shows that everything is OK. But the Cluster Profile Manager confuses me: its validation check passes, yet it doesn't recognize all of the workers (there should be 2).
What is the problem? Thanks in advance.
Answers (2)
Jason Ross
on 19 Oct 2012
Do you have a 64-bit worker and a 32-bit worker? This is not a recommended configuration: the underlying technology that PCT/MDCS uses requires the workers to have matching word sizes and processor endianness, and otherwise you'll see behavior like what you are seeing.
From the system requirements page:
Homogeneous cluster configurations are recommended. Parallel processing constructs that work on the infrastructure enabled by matlabpool—parfor, spmd, distributed arrays, and message passing functions—cannot be used on a heterogeneous cluster configuration. The underlying MPI infrastructure requires that all cluster computers have matching word sizes and processor endianness. A limited set of functions in Parallel Computing Toolbox can work in heterogeneous cluster configurations.
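To make the "matching word sizes and processor endianness" requirement concrete, here is a small Python sketch (not part of MATLAB or MDCS) that reports the word size and byte order of the process it runs in, and checks whether a set of such reports would form a homogeneous cluster. The host names and the idea of collecting one report per machine are hypothetical. With the machines from the question, the 32-bit XP desktop and the 64-bit Windows 7 laptop fail the check because their word sizes differ.

```python
import struct
import sys


def local_platform():
    """Return (word size in bits, byte order) for the current Python process."""
    # struct.calcsize("P") is the size of a C pointer for this interpreter build,
    # so a 32-bit build on a 64-bit OS reports 32 (just as a 32-bit MATLAB would).
    word_bits = struct.calcsize("P") * 8
    return word_bits, sys.byteorder  # sys.byteorder is "little" or "big"


def is_homogeneous(hosts):
    """True if every host reports the same word size and byte order.

    `hosts` maps a (hypothetical) host name to a (word_bits, byteorder) tuple.
    """
    return len(set(hosts.values())) <= 1


# Hypothetical reports matching the two machines in the question.
cluster = {
    "xp-desktop": (32, "little"),
    "win7-laptop": (64, "little"),
}
print(local_platform())
print(is_homogeneous(cluster))  # False: the word sizes differ
```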
Sina
on 21 Oct 2012
11 Comments
Jason Ross
on 22 Oct 2012
This looks normal if you have three workers and three tasks. The three tasks get worked on first; then the job manager moves on to the next 20, taking them as the work on the first job completes. If something is not working on the first job, you'll need to check for errors in the job.
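A toy Python model of that behavior, assuming the tasks sit in a single first-in, first-out queue and each worker takes the next task as soon as it finishes its current one. The worker count, task names, and durations are made up for illustration, and the real job manager has its own scheduling logic that this sketch does not reproduce.

```python
import heapq
from collections import deque


def schedule(num_workers, tasks):
    """Simulate FIFO dispatch of queued tasks to a fixed set of workers.

    `tasks` is a list of (task_name, duration) in submission order.
    Returns a list of (start_time, worker_id, task_name) in dispatch order.
    """
    queue = deque(tasks)
    # Each heap entry is (time the worker becomes free, worker id).
    free_at = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(free_at)
    log = []
    while queue:
        t, worker = heapq.heappop(free_at)  # earliest-available worker
        name, duration = queue.popleft()    # next task waiting in the queue
        log.append((t, worker, name))
        heapq.heappush(free_at, (t + duration, worker))
    return log


# Hypothetical workload: job 1 has 3 long tasks, job 2 has 20 short ones.
tasks = [(f"job1-task{i}", 5.0) for i in range(1, 4)]
tasks += [(f"job2-task{i}", 1.0) for i in range(1, 21)]
for start, worker, name in schedule(3, tasks)[:5]:
    print(f"t={start:4.1f}  worker {worker}  {name}")
```

With three workers, all three job 1 tasks start at time 0, and job 2 tasks only start once those workers free up.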
Also, you can spawn other workers on your host with the "startworker" command, e.g.
startworker -name worker1
There's no need to have a VM to have another worker. It just adds overhead and hurts performance. There might be another reason you have a VM, but it's not strictly necessary for this example.
If possible, I'd recommend you get all the machines on the same OS, most likely Windows 7 64-bit. You'll be able to get the worker counts you expect and benefit from the larger word sizes.
Sina
on 22 Oct 2012
Jason Ross
on 22 Oct 2012
What happens if you use something like rand?
Sina
on 22 Oct 2012
Jason Ross
on 23 Oct 2012
You should be able to just kill/cancel the job, then.
I don't know why that no-op function would get into such a state; it's not doing much of anything.
If you run Admin Center and run the connectivity tests, does anything fail?
Sina
on 23 Oct 2012
Jason Ross
on 23 Oct 2012
There might also be something interesting in the debug log:
You can also cancel the job:
Then look at the ErrorMessage and OutputArguments properties. I bet they will just say "cancelled by the user", though; the debug log might shed more light on what's getting hung up.
Sina
on 23 Oct 2012
Jason Ross
on 23 Oct 2012
matlabpool and job/task are two different ways of working.
If you open a matlabpool, workers are consumed for as long as it is open, and the number consumed equals the size of the pool. When you close the matlabpool, the workers should be freed.
The job and task interface allows more control: you can specify the number of workers you want, and tasks can be queued. The workers are freed when they complete their work.
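A toy Python model of the difference in how long workers stay occupied. The class and function names are hypothetical and this is not the Parallel Computing Toolbox API: a matlabpool-style context manager holds its workers until it is closed, while a job-style call holds them only while its tasks run. For simplicity the sketch raises an error when too few workers are idle, whereas a real job would wait in the queue.

```python
from contextlib import contextmanager


class Cluster:
    """Toy count of idle workers (hypothetical; not a MATLAB/MDCS object)."""

    def __init__(self, num_workers):
        self.idle = num_workers

    def take(self, n):
        if n > self.idle:
            # A real job would be queued; this toy just refuses.
            raise RuntimeError(f"asked for {n} workers, only {self.idle} idle")
        self.idle -= n

    def give_back(self, n):
        self.idle += n


@contextmanager
def open_pool(cluster, size):
    """matlabpool-style: `size` workers stay reserved until the pool closes."""
    cluster.take(size)
    try:
        yield
    finally:
        cluster.give_back(size)  # closing the pool frees the workers


def run_job(cluster, task_fns, num_workers):
    """Job/task-style: workers are held only while the tasks run."""
    cluster.take(num_workers)
    try:
        return [fn() for fn in task_fns]
    finally:
        cluster.give_back(num_workers)  # freed as soon as the work completes


c = Cluster(3)
with open_pool(c, 2):
    print("idle while pool is open:", c.idle)
print("idle after pool closes:", c.idle)
results = run_job(c, [lambda: 1, lambda: 2], num_workers=2)
print("job results:", results, "idle afterwards:", c.idle)
```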
Jason Ross
on 5 Nov 2012
Edited: Jason Ross
on 5 Nov 2012
Yes, you can start more workers on the machines, and you will then be able to access more labs. You can do this using Admin Center or via the "startworker" command in matlabroot\toolbox\distcomp\bin.
Be careful about starting more, though: a good starting point for the worker count is one worker per physical (compute, not virtual/hyperthreaded) core, with 2 GB of RAM per worker. If you exceed those limits, your performance may actually decrease, because you could run out of RAM (and fall back to much slower swap, especially on a VM), processor capacity, network bandwidth, etc.
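The rule of thumb above amounts to taking the smaller of two limits, the physical core count and the RAM divided by 2 GB. A minimal Python sketch, where the function name and the example machines are hypothetical:

```python
def suggested_workers(physical_cores, ram_gb, ram_per_worker_gb=2.0):
    """Starting-point worker count: one per physical core, capped by memory.

    Pass the number of physical cores, not os.cpu_count(), which counts
    logical (hyperthreaded) CPUs. A result of 0 means there is not enough
    RAM for even one worker under this rule of thumb.
    """
    if physical_cores < 1 or ram_gb <= 0:
        raise ValueError("need at least one core and some RAM")
    by_memory = int(ram_gb // ram_per_worker_gb)
    return min(physical_cores, by_memory)


# Hypothetical machines: 4 physical cores with 6 GB, and 4 cores with 16 GB.
print(suggested_workers(4, 6))   # 3: memory is the limit
print(suggested_workers(4, 16))  # 4: cores are the limit
```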