PCT: cannot cancel a running job without a PCT crash
Hello,
I think I have an infinite loop running in a worker. Each time I try to cancel it, PCT crashes. When I restart the parallel pool with:
% Reuse the current pool if one is open; otherwise start a one-worker pool
if isempty(gcp('nocreate'))
    p = parpool(1);
else
    p = gcp('nocreate');
end
and then ask for the cluster's jobs with:
p.Cluster.Jobs
which gives:
ans =

  Job

  Properties:

                     ID: 1
                   Type: concurrent
               Username: blafa
                  State: running
         SubmitDateTime: 02-Apr-2019 14:49:24
          StartDateTime: 02-Apr-2019 14:49:33
       Running Duration: 0 days 0h 14m 47s
        NumWorkersRange: [1 1]

        AutoAttachFiles: true
    Auto Attached Files: List files
      AutoAddClientPath: true
          AttachedFiles: {}
        AdditionalPaths: 9 paths

  Associated Tasks:

         Number Pending: 0
         Number Running: 1
        Number Finished: 0
      Task ID of Errors: []
    Task ID of Warnings: []
When I try to cancel it, PCT crashes:
p.Cluster.Jobs.cancel
The client lost connection to worker 1. This might be due to network problems, or the interactive communicating job might have errored.

Accepted Answer

Edric Ellis on 3 Apr 2019
When you run a parallel pool, PCT uses a parallel.Job behind the scenes to launch and co-ordinate the workers. By directly cancelling that Job, you're asking the PCT Cluster object to forcibly terminate all the worker processes. This causes the parallel pool session to abort, because the workers have been shut down. This is precisely what you're seeing here.
Could I ask: what is the actual problem you're encountering?
  2 Comments
Mikaël LE GRAND on 4 Apr 2019
Hello Edric, thanks for your quick answer.
I understand your comments. But the job we are talking about reappears each time I launch a parallel pool, even after rebooting my computer! As I said above, when I try to cancel it, the parallel pool crashes. So when I restart the pool and refresh the Job Monitor, this damned job is still there, and so on.
I have found no way to get rid of it for good. I think it keeps running because, shame on me, there is an infinite loop in the worker code, so it never goes away.
Edric Ellis on 4 Apr 2019
The job reappears because the parallel pool uses it behind the scenes. This is entirely normal; you should leave it running, and when the parallel pool is deleted, the job will be deleted too.
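A minimal sketch of the clean way to get rid of that job, assuming Parallel Computing Toolbox is installed: delete the pool object itself rather than calling cancel on the Job behind it. delete(gcp('nocreate')) is the shutdown idiom shown in the parpool documentation; gcp('nocreate') never starts a new pool, so the line is safe whether or not a pool is open.

```matlab
% Close the current parallel pool (if any) instead of cancelling the
% Job that backs it. gcp('nocreate') does not create a pool, so this
% is safe to run even when no pool is open.
delete(gcp('nocreate'));

% Per the answer above, the pool's Job goes away with the pool.
% Confirm that no pool remains (displays 1 when none is open).
disp(isempty(gcp('nocreate')))
```

Call parpool again afterwards if you need a fresh pool; it will come with a new backing job, which again is expected.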



Release

R2018b
