MDCE Admin Center is OK, but parallel computing does not recognize workers

Hi All,
I have a desktop running Windows XP SP3 (32-bit) and a laptop running Windows 7 (64-bit). Both have MATLAB Distributed Computing Server installed.
I start the job manager on the XP machine and a worker on both computers. Using Admin Center I can check that everything is OK. But checking with the Cluster Profile Manager confuses me: its validation passes, but it doesn't recognize all the workers (i.e. both of them).
What is the problem? Thanks in advance.

Answers (2)

Do you have a 64-bit worker and a 32-bit worker? This is not a recommended configuration: the underlying technology that PCT/MDCS uses requires all workers to have matching word sizes and processor endianness, and you'll see exactly the behavior you are seeing.
From the system requirements page:
Homogeneous cluster configurations are recommended. Parallel processing constructs that work on the infrastructure enabled by matlabpool—parfor, spmd, distributed arrays, and message passing functions—cannot be used on a heterogeneous cluster configuration. The underlying MPI infrastructure requires that all cluster computers have matching word sizes and processor endianness. A limited set of functions in Parallel Computing Toolbox can work in heterogeneous cluster configurations.
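One way to confirm a word-size mismatch is to ask each worker which platform it runs on. A minimal sketch, assuming a matlabpool can be opened on whatever subset of workers does connect (profile name is a placeholder):

```matlab
% Ask every lab in the pool for its platform string. On Windows,
% computer returns 'PCWIN' for 32-bit MATLAB and 'PCWIN64' for
% 64-bit MATLAB -- mixed answers confirm a heterogeneous cluster.
matlabpool('open', 'Profile1');
spmd
    fprintf('lab %d of %d is running on %s\n', ...
        labindex, numlabs, computer);
end
matlabpool close
```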
Thanks for replying,
I found (after many tries) that workers see each other in the matlab pool only when they have the same CPU architecture (x86 or x64). This is true when the grid is as follows (entries marked * run a worker):

Physical PC    | Virtual PC on it
---------------|-----------------
x86 XP SP3 *   | x86 XP SP3 *
x86 XP SP3 *   | -
x64 Win7       | x86 XP SP3 *
So I have 4 workers. Now the question is: when I start the cluster with 3 workers and submit a job to it, there is a job there that confuses me. See below, please:
>> myCluster.Jobs

ans =

 Job: 2-by-1
 ============

    #    ID         Type     State   FinishTime   Username   #tasks
   -----------------------------------------------------------------
    1   134         pool   running                  hormoz        3
    2   135  independent    queued                  hormoz       20

>> myCluster.Jobs(1).Tasks

ans =

 MJSTask: 3-by-1
 ================

    #   ID    State   FinishTime        Function   Error
   -------------------------------------------------------
    1    1  running                @distcomp.nop
    2    2  running                @distcomp.nop
    3    3  running                @distcomp.nop
So my job never starts and the program hangs. Please help me.

11 Comments

This looks normal if you have three workers and three tasks: the three tasks get worked on, and the scheduler then moves on to the next 20, picking them up as work on the first job completes. If something in the first job is not working, you'll need to check that job for errors.
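A sketch of how you might inspect a task for errors (property names are from the MJS task interface; adjust the job index to the stuck job):

```matlab
% Look at the first task of the stuck job. An empty ErrorMessage
% means the task has not failed -- it is simply still running
% (or waiting on something).
t = myCluster.Jobs(1).Tasks(1);
t.State          % e.g. 'running', 'finished', 'failed'
t.ErrorMessage   % populated only if the task errored
t.Error          % the full MException object, if any
```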
Also, you can spawn other workers on your host with the "startworker" command, e.g.
startworker -name worker1
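In its fuller form, you can point a new worker at an existing job manager explicitly. A sketch, run from matlabroot\toolbox\distcomp\bin; the worker, job manager, and host names here are hypothetical:

```shell
# Register a second worker with a running MJS job manager.
startworker -name worker2 -jobmanager MyJobManager -jobmanagerhost xp-desktop
```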
There's no need to have a VM to have another worker. It just adds overhead and hurts performance. There might be another reason you have a VM, but it's not strictly necessary for this example.
If possible, I'd recommend you get all the machines on the same OS, most likely Windows 7 64-bit. You'll be able to get the worker counts you expect and benefit from the larger word sizes.
Dear Jason,
I have a problem with the @distcomp.nop function. As soon as I start my workers, some of them are busy. Jobs are submitted just fine and finish normally, but all new tasks run only on the idle workers.
Right now I have a busy worker with the following data:
>> myCluster.Jobs

ans =

 Job: 4-by-1
 ============

    #   ID         Type      State   FinishTime   Username   #tasks
   ------------------------------------------------------------------
    1    9  independent    pending                  hormoz        0
    2   10  independent   finished            -    hormoz       20
    3   11         pool    running                  hormoz        1
    4   12  independent   finished            -    hormoz       20
then:
>> myCluster.Jobs(3).Tasks

ans =

 Task ID 1 from Job 11 Information
 =================================

               State: running
            Function: @distcomp.nop
           StartTime: Mon Oct 22 19:12:03 GMT+03:30 2012
    Running Duration: 0 days 2h 6m 6s

 - Task Result Properties

     ErrorIdentifier:
        ErrorMessage:
and:

>> myCluster.Jobs(3).Tasks.Function

ans =

@distcomp.nop

For timing we have:

>> myCluster.Jobs(3).StartTime

ans =

Mon Oct 22 19:11:35 GMT+03:30 2012

and now the time is 21:23. What is this?
What happens if you use something like rand?
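For example, a minimal independent job built around rand might look like this, sketched against the same myCluster object:

```matlab
% Submit a trivial job: one task returning a 1x1 random number.
% If this hangs the same way @distcomp.nop does, the problem is
% the cluster setup, not the function being run.
j = createJob(myCluster);
createTask(j, @rand, 1, {1});
submit(j);
wait(j);                 % blocks until the job finishes
out = fetchOutputs(j)    % should be a 1x1 cell holding a scalar
delete(j);
```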
This is not mine, Jason. This is a by-default job; it never stops. After a worker starts, it automatically processes this annoying job. :(
You should be able to just kill/cancel the job, then.
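Cancelling and removing the stuck job is a one-liner pair; a sketch, assuming the hung pool job is still at index 3:

```matlab
% Cancel the hung pool job, then delete it from the scheduler
% so the workers do not pick it up again.
j = myCluster.Jobs(3);
cancel(j);
delete(j);
```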
I don't know why that no-op function would get in such a state ... it's not doing much of anything.
If you run Admin Center and run the connectivity tests, does anything fail?
The Admin Center tests pass as usual, and the reports show that the workers executing the @distcomp.nop task are busy.
BTW, nothing fails.
There might also be something interesting in the debug log. You can also cancel the job, then look at the ErrorMessage and OutputArguments. Although I bet they will just say "cancelled by the user", the debug log might shed more light on what's getting hung up.
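A sketch of pulling the worker-side log after cancelling; getDebugLog is available on MJS cluster objects and accepts either a job or a task:

```matlab
% Cancel the hung job, then retrieve the worker-side debug log
% for the whole job and for its first task.
j = myCluster.Jobs(3);
cancel(j);
getDebugLog(myCluster, j)            % whole-job log
getDebugLog(myCluster, j.Tasks(1))   % per-task log
j.Tasks(1).ErrorMessage              % likely "cancelled by user"
```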
There are some interesting points:
  • Without the matlabpool command, jobs and tasks can be submitted and finish normally, so you can fetch the outputs.
  • Invoking cancel or delete on a job or task causes the matlabpool connection to be lost, so I then have to call matlabpool open Profile1 again.
  • Without any matlabpool session, cancel and delete work without any problem and can remove any job/task.
What do you think?
matlabpool and job/task are two different ways of working.
If you open a matlabpool, workers will be consumed while it is open. The number will be equal to the size of the pool. You can close the matlabpool and the workers should be freed.
The job and task interface allows more control, you can specify the number of workers you want, and tasks can be queued. The workers will be freed when they complete their work.
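The two interfaces side by side, as a sketch (profile name is a placeholder):

```matlab
% Interactive pool: holds N workers for parfor/spmd until closed.
matlabpool('open', 'Profile1', 2);
r = zeros(1, 10);
parfor i = 1:10
    r(i) = i^2;
end
matlabpool close          % frees the pool workers

% Job/task interface: workers are taken per task and freed as
% each task completes; nothing is held open between jobs.
c = parcluster('Profile1');
j = createJob(c);
for i = 1:10
    createTask(j, @(x) x^2, 1, {i});
end
submit(j);
wait(j);
out = fetchOutputs(j);    % 10x1 cell of squared values
delete(j);
```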
Dear Jason,
Thanks for all your replies. Another question has come up. The local profile uses all cores of the local CPU: 2 labs.
But when I use matlabpool with profile1 (which knows 2 computers: my computer and a virtual machine with a 2-core processor), I should have 4 labs (two PCs, each with a dual-core processor). Unfortunately, matlabpool only sees 2 labs. Is it possible to use 4 labs, and how?
Yes, you can start more workers on the machines and you will be able to access more labs. You can do this using Admin Center or via the "startworker" command in matlabroot\toolbox\distcomp\bin.
Be careful with starting more, though -- a good starting point for worker count is one per (compute, not virtual/hyperthreaded) core and 2 GB of RAM per worker. If you start exceeding those, it's possible your performance will actually decrease as you could run out of RAM (and use much slower swap -- especially on a VM), processor capacity, network bandwidth, etc.
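A rough sizing check before adding workers, as a sketch; feature('numcores') is undocumented but long-standing, and memory is Windows-only:

```matlab
% Estimate a sane worker count for this machine: one worker per
% physical core, capped by available RAM at ~2 GB per worker.
ncores = feature('numcores')
m = memory;
maxWorkers = min(ncores, floor(m.MemAvailableAllArrays / 2^31))
```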


Asked on 19 Oct 2012
