There's a program that I would like to run in parallel, as I have about a dozen cores available to me. However, I only have 128GB of RAM, which puts some constraints on how I want to parallelize the code.
A is a list of 50 matrices. Each matrix (like every other matrix involved) takes up about 1GB of memory, which is where the memory constraint comes in. Schematically, I want to execute the following code:
B = longCalculation(i)
Since longCalculation takes the longest to run, I would like to parallelize that, i.e., convert the first for loop into a parfor loop. However, each worker needs access to all of A, and I can't just make a copy of A for each worker due to the memory constraint. Parallelizing the second for loop instead, and giving each worker access to only a small part of A, won't speed up the code much. Any suggestions for modifying this code so that it can be run in parallel? Thanks!