Parallel server: dir() for files local to the server

Hello,
I wish to move computation from a local machine, with the .mat data files stored locally, to a parallel compute server with the .mat files stored on the server.
Processing the files sequentially on my local machine usually looks something like this:
Files = dir('C:\my_data'); % Retrieve all patients' .mat file names
for i = 1:length(Files)
    load(fullfile('C:\my_data', Files(i).name)) % Load each file in turn
    % Put functions to run on data
end
I now want to move this computation to a parallel server. I have a Parallel Server license, the server is validated, and I have also uploaded the files to the server.
However, I cannot figure out how to call dir() so that it queries the files on the server (they are about 1 TB in total, so too large to transfer to the remote server each time). I had thought it would look something like this:
Files = dir('~/home/user/Database/Physionet/training/'); % Rather than query locally, query the data on the server
However, the directory isn't found. Can anyone explain how to point to this data on the parallel compute server? If anyone has suggestions for better ways to do this, please let me know!
Kind regards,
Christopher

Accepted Answer

Raymond Norris on 25 Apr 2022
For starters, you don't want to hard-code files/paths in your code. Your code should be written as functions so that you can pass in the root folder locations you want to read from and write to. I'll show you an example, but first a couple of questions.
How do you submit your code to the cluster? Are you using parpool or batch? For example:
c = parcluster('cluster');
pool = c.parpool(16);
Files = dir('~/home/user/Database/Physionet/training/');
parfor i = 1:length(Files)
    % Your original line had a misplaced quote in the strcat call. Also, you will
    % want to make sure Files(i).name is always a MAT-file (think at least about . and ..)
    % Load into a variable: inside parfor, load cannot create variables implicitly
    S = load(fullfile('C:\my_data', Files(i).name));
    % ...
end
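In the pool version, everything outside the parfor body, including the dir call, runs on the client, so dir only sees the client's disk; that is why a server-side path is not found. A minimal sketch of one way around this, assuming a Linux cluster and a pool that is already open: ask a worker to run dir via parfeval, then let each parfor iteration load its file from the cluster's disk. The dataFolder path is a hypothetical placeholder for wherever the data sits as seen by the workers.

```matlab
% Sketch: run dir() on a pool worker so it lists the cluster's file system.
% Assumptions: the current pool is the one opened with c.parpool(16), the workers
% run Linux, and dataFolder is a hypothetical absolute path as seen by the workers.
pool = gcp;                                             % current pool (may start a default one if none is open)
dataFolder = '/home/user/Database/Physionet/training';  % hypothetical server-side path

% Build the pattern with '/' rather than fullfile, because fullfile on a Windows
% client would insert '\', which a Linux worker treats as a literal character
pattern = [dataFolder '/*.mat'];                        % *.mat also skips . and ..

% parfeval(pool, fcn, numOutputs, inputs...) evaluates dir on one worker
f = parfeval(pool, @dir, 1, pattern);
Files = fetchOutputs(f);   % wait for the worker and copy the struct array back to the client

parfor i = 1:numel(Files)
    % Files(i).folder holds the worker-side folder, and fullfile runs on the
    % worker here, so each worker loads its file straight from the cluster's disk
    S = load(fullfile(Files(i).folder, Files(i).name));
    % ... put functions to run on S here (placeholder)
end
```

Only the small struct array returned by dir travels back to the client; the MAT-files themselves are never copied off the server.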
Or
c = parcluster('cluster');
job = c.batch(@mycode,...,'Pool',16);
I'm guessing you want the former, but you'll probably need the latter. It also depends on what you're going to do with the data after the parfor finishes (or while it's running). I have a thought, but you might need to update to R2022a.
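Following the advice to write functions that take the root folder as an input, here is a minimal sketch of what a hypothetical mycode could look like when submitted with batch. Because the whole function runs on the cluster, both dir and load see the cluster's file system. The function name, its single input, and its single output (the number of files found) are assumptions for illustration.

```matlab
function nFiles = mycode(rootFolder)
%MYCODE Hypothetical sketch: process every MAT-file under rootFolder.
%   When submitted with batch, this runs on the cluster, so dir() and
%   load() query the cluster's file system rather than the client's.
Files = dir(fullfile(rootFolder, '*.mat'));   % the *.mat pattern skips . and .. and non-MAT files
nFiles = numel(Files);
parfor i = 1:nFiles
    % Load into a variable: inside parfor, load cannot create variables implicitly
    S = load(fullfile(Files(i).folder, Files(i).name));
    % ... put functions to run on S here (placeholder)
end
end
```

Saved as mycode.m, it could be submitted with `job = batch(c, @mycode, 1, {'/path/on/cluster'}, 'Pool', 16);`, followed by `wait(job)` and `fetchOutputs(job)`. The path is a placeholder. Note that `'Pool', 16` requests 16 pool workers in addition to the one worker that runs mycode itself.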
Christopher McCausland on 10 May 2022
Hi Raymond,
As a final round-up for anyone following: to make folders stored on the cluster visible to the workers, the easiest method is to go to Home > Parallel > Manage Clusters, select the relevant cluster, then Properties > Edit > Files and Folders, and manually specify the folders to add to the workers' search path, or use AdditionalPaths.
This will add the paths for workers and is quite a nice way of doing things!
Lastly, Raymond, thank you so much for all the help and for being so patient. I really appreciate all the time and effort you put in, and I've learnt a lot from you! The in-depth answers were brilliant!
Kind regards,
Christopher
Raymond Norris on 10 May 2022
Keep in mind that if you place the additional folder names in the profile, they will be used for every job you submit to the cluster. Adding them to the call to batch sets them explicitly for that job only. In the case of adding paths, there's no overhead to speak of. However, wait until you need to debug a job and can't understand why it fails, only to discover that you included another path (listed in the profile) that was shadowing one of your functions. Listing the additional paths in the call to batch doesn't solve this issue, but it hopefully at least puts it in your face that you are adding /home/cmcausland/work/... to your job.
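A minimal sketch of setting the path per job rather than in the profile, using batch's AdditionalPaths option. It assumes the 'cluster' profile from above and that mycode takes the data folder as its only input and returns one output; codeFolder and dataFolder are hypothetical placeholders for folders on the cluster.

```matlab
% Sketch: add a folder to the workers' path for this one batch job only,
% so it is visible right in the submission call instead of hidden in the profile.
% Assumptions: 'cluster' is the profile name, mycode(dataFolder) returns one
% output, and both folders below are hypothetical paths on the cluster.
c = parcluster('cluster');
codeFolder = '/home/user/work/code';                    % hypothetical folder holding mycode.m and helpers
dataFolder = '/home/user/Database/Physionet/training';  % hypothetical data folder

job = batch(c, @mycode, 1, {dataFolder}, ...
    'Pool', 16, ...
    'AdditionalPaths', {codeFolder});   % added to the workers' path for this job only

wait(job);
out = fetchOutputs(job);   % 1-by-1 cell array holding mycode's single output
delete(job);               % remove the job's data from the cluster once results are fetched
```

If a function is unexpectedly shadowed, the folders that were added are listed in the one place the job was created.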
