Parallel Pool Corruption on Linux
I'm using MATLAB's parallel computing capabilities to speed up code execution. Today, when I tried to start the pool, I got the following error:
Error using parallel.Cluster/parpool
Parallel pool failed to start with the following error. For more detailed information, validate the profile 'Processes' in the Cluster Profile Manager.
Error in wecSimPCT (line 32)
parpool(p); % open the pool
Caused by:
Error using parallel.internal.pool.AbstractInteractiveClient>iThrowWithCause
Failed to start pool.
Error using save
Unable to write to file '/home/gusmano.2/.matlab/local_cluster_jobs/R2022b/Job2.in.mat' because it appears to be corrupt.
I validated the 'Processes' profile as the error message suggested and got the following report:
VALIDATION REPORT
Profile: Processes
Scheduler Type: Local
Stage: Cluster connection test (parcluster)
Status: Passed
Start Time: Tue Jan 14 09:54:12 EST 2025
Finish Time: Tue Jan 14 09:54:12 EST 2025
Running Duration: 0 min 0 sec
Description:
Details:
Error Report:
Command Line Output:
Debug Log:
Stage: Job test (createJob)
Status: Failed
Start Time: Tue Jan 14 09:54:12 EST 2025
Finish Time: Tue Jan 14 09:54:13 EST 2025
Running Duration: 0 min 0 sec
Description: Unable to write to file '/home/gusmano.2/.matlab/local_cluster_jobs/R2022b/Job3.in.mat' because it appears to be corrupt.
Details:
Error Report: Unable to write to file '/home/gusmano.2/.matlab/local_cluster_jobs/R2022b/Job3.in.mat' because it appears to be corrupt.
Command Line Output:
Debug Log: CLIENT LOG OUTPUT
Caught an error and throwing as caller. Original error:
Error using save
Unable to write to file '/home/gusmano.2/.matlab/local_cluster_jobs/R2022b/Job3.in.mat' because it appears to be corrupt.
Error in parallel.internal.files.MATLABFileSystem/save (line 50)
[varargout{1:nargout}] = save(filename, '-struct', 'data', varargin{:});
Error in parallel.internal.types.FileFormat>iSaveMat (line 98)
fs.save(filename, data, '-append');
Error in parallel.internal.types.FileFormat/save (line 33)
obj.SaveFunction(varargin{:});
Error in parallel.internal.cluster.FileStorage/setFields (line 785)
fileFormat.save(obj.FileSystem, filename, structToSave);
Error in parallel.internal.cluster.CJSSupport/setProperties (line 248)
obj.Storage.setFields( type, sId, mappedProps, values );
Error in parallel.internal.cluster.CJSSupport/setJobProperties (line 451)
obj.setProperties( 'job', jobsid, propName, val );
Error in parallel.internal.cluster.CJSJobMixin/hSetPropertyNoCheck (line 141)
obj.Support.setJobProperties( obj.SupportID, obj.Variant, propName, val );
Error in parallel.Job/hSetProperty (line 716)
hSetPropertyNoCheck( obj, propName, val );
Error in parallel.internal.customattr.GetSetImpl/setMultipleProperties (line 22)
obj.hSetProperty( propNames(useGetSet), newValues(useGetSet) );
Error in parallel.internal.customattr.GetSetImpl.setImpl (line 251)
GetSetImpl.setMultipleProperties( objOrObjs, useGetSet, vectorizeSet, p, v );
Error in parallel.internal.customattr.CustomGetSet/hSetAllowNonPublic (line 85)
[ varargout{1:nargout} ] = GetSetImpl.setImpl( allowNonPublic, vectorizeSet, objOrObjs, varargin{:} );
Error in parallel.Cluster/applyJobProperties (line 381)
job.hSetAllowNonPublic( names, values );
Error in parallel.Cluster/createJob (line 52)
obj.applyJobProperties( job, varargin{:} );
Error in parallel.internal.types.ValidationStages>iCreateAndSubmitJob (line 467)
job = cluster.createJob;
Error in parallel.internal.types.ValidationStages>@()iCreateAndSubmitJob(jobVariant,runInfo)
Error in parallel.internal.types.ValidationStages>iCallWithNoHotlinks (line 391)
[varargout{1:nargout}] = fcn();
Error in parallel.internal.types.ValidationStages>iRunJobStage (line 201)
[commandWindowOutput, job] = evalc(iWrapForEvalc(createAndSubmitJobFcn));
Error in parallel.internal.types.ValidationStages>iRunIndependentJobStage (line 173)
[eventData, runInfo] = iRunJobStage(stage, runInfo, parallel.internal.types.Variant.IndependentJob);
Error in parallel.internal.types.ValidationStages/run (line 74)
[eventData, runInfo] = obj.RunFunction(obj, runInfo);
Error in parallel.internal.validator.Validator/runValidationSuite (line 191)
[eventData, stageRunInfo] = currentStage.run(stageRunInfo);
Error in parallel.internal.validator.Validator/validate (line 103)
status = obj.runValidationSuite(profileName, suite);
Error in parallel.internal.ui.AbstractValidationManager/validate (line 36)
obj.Validator.validate(profileName, validationSuite);
Error in parallel.internal.ui.ValidationManager.validateProfile (line 36)
parallel.internal.ui.ValidationManager.getOrCreateInstance().validate(profileName, suite);
Deleting Job 3
Stage: SPMD job test (createCommunicatingJob)
Status: Failed
Start Time: Tue Jan 14 09:54:13 EST 2025
Finish Time: Tue Jan 14 09:54:13 EST 2025
Running Duration: 0 min 0 sec
Description: Unable to write to file '/home/gusmano.2/.matlab/local_cluster_jobs/R2022b/Job4.in.mat' because it appears to be corrupt.
Details:
Error Report: Unable to write to file '/home/gusmano.2/.matlab/local_cluster_jobs/R2022b/Job4.in.mat' because it appears to be corrupt.
Command Line Output:
Debug Log:
Stage: Pool job test (createCommunicatingJob)
Status: Skipped
Start Time:
Finish Time:
Running Duration:
Description: Validation skipped due to previous failure.
Details:
Error Report:
Command Line Output:
Debug Log:
Stage: Parallel pool test (parpool)
Status: Skipped
Start Time:
Finish Time:
Running Duration:
Description: Validation skipped due to previous failure.
Details:
Can anyone help me understand what's going on here? To clear out any corruption, I've tried starting the pool from a different directory, removing the default directory, and similar steps, but nothing has worked. I'm especially confused because the pool started properly before and doesn't now.
Answers (1)
colordepth on 14 Jan 2025
Moved: Walter Roberson on 14 Jan 2025
It seems that MATLAB cannot write to your "local_cluster_jobs" folder as expected, possibly because of its permissions. Have you tried deleting this folder, located under "/home/<your system username>/.matlab/", and then starting the pool again? This approach was discussed for a similar situation in this MATLAB Answers thread: https://www.mathworks.com/matlabcentral/answers/487249-issue-running-parallel-toolbox
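To test the permissions theory before deleting anything, here is a minimal sketch. It assumes the folder in the error message is the `JobStorageLocation` of the `parcluster('Processes')` object (the cluster connection stage passed, so `parcluster` itself works). The probe file name `write_probe.mat` is hypothetical. Note that this only tests writing a new file; the original failure came from appending to an existing `Job*.in.mat` file.

```matlab
% Sketch: check whether the Processes profile's job storage folder
% exists and is writable by the current user.
c = parcluster('Processes');        % same profile parpool is using
jobDir = c.JobStorageLocation;      % e.g. ~/.matlab/local_cluster_jobs/R2022b
fprintf('Job storage location: %s\n', jobDir);

if ~isfolder(jobDir)
    fprintf('Folder does not exist.\n');
else
    % Check the permission bits reported by the file system
    [ok, attrs] = fileattrib(jobDir);
    if ok && attrs.UserWrite == 1
        fprintf('Permission bits say the folder is writable.\n');
    else
        fprintf('Permission bits say the folder is NOT writable.\n');
    end

    % Attempt a real write, since permission bits do not catch every
    % failure (e.g. a full disk or a read-only mount).
    probe = fullfile(jobDir, 'write_probe.mat');  % hypothetical file name
    x = 1;
    try
        save(probe, 'x');
        delete(probe);              % clean up the probe file
        fprintf('Test save succeeded.\n');
    catch err
        fprintf('Test save failed: %s\n', err.message);
    end
end
```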
Additionally, you might find it helpful to follow the general troubleshooting steps outlined in this answer by the MathWorks Support Team, which also suggests deleting the "local_cluster_jobs" folder as one of the steps: https://www.mathworks.com/matlabcentral/answers/92124-why-am-i-unable-to-use-parpool-or-validate-with-the-local-or-processes-profile-of-parallel-compu
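The deletion step can be scripted roughly as follows. This is a sketch, assuming a Linux setup where the folder sits at `$HOME/.matlab/local_cluster_jobs`, as in the paths in the error messages above. It permanently removes all stored job data for local profiles, so only run it if you don't need those jobs.

```matlab
% Sketch of the "delete local_cluster_jobs" workaround (Linux paths assumed).

% Shut down any pool that is still running
pool = gcp('nocreate');
if ~isempty(pool)
    delete(pool);
end

% Build the path explicitly rather than from JobStorageLocation, so a
% customized storage location is never deleted by accident.
jobRoot = fullfile(getenv('HOME'), '.matlab', 'local_cluster_jobs');
fprintf('Removing %s\n', jobRoot);

if isfolder(jobRoot)
    [ok, msg] = rmdir(jobRoot, 's');   % 's' also removes all contents
    if ~ok
        error('Could not remove %s: %s', jobRoot, msg);
    end
end
```

After this, restarting MATLAB and calling `parpool` again should let MATLAB create a fresh job storage folder.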