Train a deep learning model (ResNet-50 network) on a remote HPC cluster

I am trying to run code that uses a pretrained ResNet-50 network on a remote HPC cluster by submitting batch GPU jobs. I get the following error at this line:
net = resnet50
Error using resnet50
resnet50 requires the Deep Learning Toolbox Model for ResNet-50 Network support
package for the pretrained weights. To install this support package, use the
Add-On Explorer. To obtain the untrained layers, use
resnet50('Weights','none'), which does not require the support package.
It seems the Deep Learning Toolbox Model for ResNet-50 Network add-on is not installed on the cluster. How can I install it there?
Thanks

Accepted Answer

David Willingham on 14 Oct 2022
Just to confirm, you're sending batch jobs to an HPC cluster that has MATLAB Parallel Server installed?
If so, one option to try would be:
  1. Save resnet50 as a MAT file.
  2. Attach the MAT file when submitting the job.
  3. Have a command that loads the MAT file in the function you're submitting.
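The three steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the function name runWithBaseNet, the variable name basenet, and the file name basenet.mat are hypothetical, and the default cluster profile is assumed to point at the HPC cluster. The function that runs on the cluster loads the network from the attached MAT file instead of calling resnet50:

```matlab
% File: runWithBaseNet.m (hypothetical name), the function submitted as a batch job
function netClass = runWithBaseNet()
    % Step 3: load the network from the MAT file attached to the job.
    % Attached files should be available on the worker's path.
    data = load('basenet.mat');
    net = data.basenet;          % assumes the variable was saved as 'basenet'

    % ... use net here for training or inference ...

    % Return something small so the client can confirm the load worked.
    netClass = class(net);
end
```

On the local machine, where the support package is installed, the job could then be submitted along these lines:

```matlab
basenet = resnet50;                       % Step 1: needs the support package locally
save('basenet.mat', 'basenet');

c = parcluster;                           % assumed: default profile is the HPC cluster
job = batch(c, @runWithBaseNet, 1, {}, ...
    'AttachedFiles', {'basenet.mat'});    % Step 2: attach the MAT file to the job

wait(job);
out = fetchOutputs(job);
disp(out{1});
```

The cluster presumably still needs Deep Learning Toolbox itself to load and use the network object; only the pretrained-weights support package is avoided.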
  1 Comment
EK_47 on 14 Oct 2022
Brilliant! Thank you for your answer. It solved my problem.
Yes, the HPC cluster has MATLAB Parallel Server installed.
In your point 1, you said "save resnet50 as a MAT file". I was not sure what you meant by "save resnet50". What I did was call it in MATLAB on my local machine:
basenet = resnet50;
then saved it with
save('basenet.mat','basenet');
and then transferred this MAT file to the remote cluster and loaded it there.
Thanks
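For the cluster side of the workflow described in this comment, a minimal sketch of the load step might look like the following. The helper name loadBaseNet is hypothetical, and it assumes the MAT file was copied to a location the job can read and that the network was saved under the variable name basenet, as above:

```matlab
% File: loadBaseNet.m (hypothetical helper)
function net = loadBaseNet(matFile)
    % Fail early with a clear message if the transferred file is missing.
    if ~isfile(matFile)
        error('loadBaseNet:fileNotFound', 'MAT file not found: %s', matFile);
    end

    % Load only the expected variable from the MAT file.
    data = load(matFile, 'basenet');
    if ~isfield(data, 'basenet')
        error('loadBaseNet:missingVariable', ...
            'Variable ''basenet'' not found in %s', matFile);
    end

    net = data.basenet;
end
```

Inside the submitted function, net = loadBaseNet(matFile) would then stand in for net = resnet50, where matFile is the path to the transferred file on the cluster. Passing a full path is the safer choice here, since isfile does not search the MATLAB path.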

Release

R2022a