Can we manipulate a file without opening it?

This question was flagged by Cris LaPierre
Hello,
I have a question, which I explain below. Consider the following loop:
for i=1:10^6
    A = Read a csv file;
    A = perform some operations on A;
    A = save the performed operations;
end
Apparently, the most time-consuming part is reading the file. If I use A=csvread(); then this is very time-consuming. If I use fopen and related functions it is computationally cheaper but still time-consuming.
Do you have an idea for reducing the computational time of what I intend to do?
I hope there is a way to do the above operations without actually opening any file (updating an existing file and saving the updates to the same file without opening it).
Any idea?
Thanks in advance!
Babak
  8 Comments
Mohammad Shojaei Arani on 25 Nov 2022
I am wondering whether there is a way to solve my problem?
Stephen23 on 25 Nov 2022
Don't read and write the file on every iteration. Just use an array and indexing.


Answers (2)

Matt J on 25 Nov 2022
Edited: Matt J on 25 Nov 2022
If you have one single file, the reading and saving of the file should probably happen outside the loop. Use a parfor loop to loop over sections of the data and keep them in MATLAB memory until you are ready to save all of the results.
A = Read a csv file;
parfor i=1:10^6
    A(i,:) = perform some operations on A(i,:);
end
A = save the performed operations;
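A minimal runnable sketch of this read-once, write-once pattern follows. The file names, the demo data, and the per-row operation (normalizing each row by its sum) are placeholders chosen for illustration and are not from the question; `readmatrix` and `writematrix` assume R2019a or newer.

```matlab
% Hypothetical file names; replace with your own.
inFile  = fullfile(tempdir, 'input.csv');
outFile = fullfile(tempdir, 'output.csv');
writematrix(rand(1000, 5), inFile);   % demo data so the sketch runs as is

A = readmatrix(inFile);               % read the file once, before the loop
B = zeros(size(A));                   % preallocate the result
parfor i = 1:size(A, 1)
    row = A(i, :);                    % sliced read: each worker gets its own rows
    B(i, :) = row / sum(row);         % placeholder operation on one row
end
writematrix(B, outFile);              % write the result once, after the loop
```

Without Parallel Computing Toolbox the parfor loop simply runs serially, so the file is still touched only twice.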
  3 Comments
Mohammad Shojaei Arani on 25 Nov 2022
Well, it is difficult to show my code as it is long, but I will explain (I believe you do not need to know more than this).
The problem is that I have a very complex minimization problem that none of the MATLAB solvers can solve (in the vicinity of any feasible solution there are infinitely many feasible and infeasible solutions). I am therefore using a very sophisticated algorithm called 'Grey-Wolf Optimizer' (GWO). GWO can solve my problem, but sometimes it gets trapped (stagnation). The workaround is to re-run it several times, which is time-consuming. I would therefore like to run several instances at once using several workers.
My optimization never stops (I set Iter_no = 10^9). Whenever there is a better solution, it appears in the command window. However, it might get trapped, and then I have to stop the code and re-run it in the hope that it does not get trapped again. Now I want to do the same job using multiple workers to save time.
Of course, each worker can save its results in a separate csv file, and at the end I can check which file has the better result. However, this is not what I want. What I want is exactly what I did for a single worker (no parfor): I would like to see each new update from all workers in the command window. I do not care about the order at all.
How can I do this? I would need a single csv file (let's call it Results.csv), use parfor, and send the work to, say, 10 workers. More details are shown below:
parfor n=1:10
    % Run the optimizer
    for iteration=1:10^9
        .................
        if (there is a better solution; call it BetterS (a vector) and assume that its objective value is BetterObj (a scalar))
            fileID=fopen('Results.csv');
            A=str2double(strsplit(fgetl(fileID),','));   % current best row stored in the file
            fclose(fileID);
            A=[A;[BetterS BetterObj]];
            A=sortrows(A,size(A,2),'descend');           % sort by the objective (last column)
            A=A(end,:);                                  % keep the row with the smallest objective
            disp('Estimated parameters : ');
            disp(num2str(A))
            writematrix(A,'Results.csv');
        end
    end
end
I hope there is a way to do this!
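One way to get live updates from all workers without a shared file is a `parallel.pool.DataQueue`: each worker sends its improvement to the client, and the client decides whether it is the best so far. The sketch below is only an illustration under assumptions: the random "optimizer step", the objective, the finite iteration count, and the helper `reportIfBetter` are hypothetical stand-ins, not GWO. Because the callback runs on the client, updates are handled one at a time, which avoids several workers reading and writing Results.csv simultaneously.

```matlab
% Sketch: workers report improvements to the client through a DataQueue.
q = parallel.pool.DataQueue;
afterEach(q, @reportIfBetter);         % runs on the client for every message
reportIfBetter([]);                    % reset the stored best before starting

nRuns = 4;                             % assumed number of parallel runs
nIter = 200;                           % finite here so the demo terminates
parfor n = 1:nRuns
    bestObj = inf;                     % best objective of this run only
    for iteration = 1:nIter
        s   = randn(1, 3);             % stand-in for one optimizer step
        obj = sum(s.^2);               % stand-in objective value
        if obj < bestObj
            bestObj = obj;
            send(q, [s, obj]);         % push [BetterS BetterObj] to the client
        end
    end
end

function reportIfBetter(msg)
    % Hypothetical helper: keeps the best objective seen across all runs
    % and prints a message only when a run improves on it.
    persistent globalBest
    if isempty(msg)                    % an empty message resets the state
        globalBest = [];
        return
    end
    if isempty(globalBest) || msg(end) < globalBest
        globalBest = msg(end);
        disp('Estimated parameters : ');
        disp(num2str(msg));
    end
end
```

If a record on disk is still wanted, the callback could also call writematrix, since only the client would then touch the file.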
Matt J on 25 Nov 2022
Edited: Matt J on 25 Nov 2022
I don't think you should be using files to store and retrieve optimization results. I would structure the loops like this,
I=1000;
J=300;
bestValue=inf;
bestSolution=[];
for i=1:I %Loop over batches
    s=repmat(struct('Value',nan,'Solution',nan),1,J); %Reset the batch results
    parfor j=1:J %Do a batch of optimizations in parallel
        [x,fval,exitflag]=Run the optimizer
        if exitflag<0 %optimization failed
            continue
        end
        s(j).Value=fval;
        s(j).Solution=x;
    end
    [minf,k]=min([s.Value]);
    if minf<bestValue
        bestValue=minf;
        bestSolution=s(k).Solution;
    end
end
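For concreteness, here is a small runnable instance of this batch structure. The objective (a 2-D Rastrigin function), the optimizer (`fminsearch` from random starting points), and the batch sizes are assumptions chosen only so the sketch runs; they stand in for the asker's problem and for GWO. Plain arrays are used instead of a struct array to keep the parfor slicing simple.

```matlab
% Stand-in objective and optimizer; both are assumptions for illustration.
objective = @(x) 20 + sum(x.^2 - 10*cos(2*pi*x));

nBatches  = 5;                         % plays the role of I
batchSize = 20;                        % plays the role of J
bestValue    = inf;
bestSolution = [];

for i = 1:nBatches
    vals = nan(1, batchSize);          % NaN marks a run that did not converge
    sols = cell(1, batchSize);
    parfor j = 1:batchSize
        x0 = 10*rand(1, 2) - 5;        % random start in [-5, 5]^2
        [x, fval, exitflag] = fminsearch(objective, x0);
        if exitflag > 0                % keep only converged runs
            vals(j) = fval;
            sols{j} = x;
        end
    end
    [minf, k] = min(vals);             % min skips NaN entries by default
    if minf < bestValue
        bestValue    = minf;
        bestSolution = sols{k};
        fprintf('batch %d: new best %.6g at [%s]\n', ...
                i, bestValue, num2str(bestSolution));
    end
end
```

The best result is printed after every batch, so a smaller batch size gives more frequent updates at the cost of more parfor start-up overhead.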



Walter Roberson on 25 Nov 2022
I suggest that you switch to using parfeval(). Approximately:
while you haven't gotten tired of it all
    while number of active workers is less than number of cores
        use parfeval() to create a new worker passing in a different initial condition
    end
    wait for a worker to finish, using a timeout
    if any worker has been active longer than you want, cancel() the worker, end
    if any workers have finished, fetch their results and update the notion of best, end
end
when you get tired of it all, cancel all remaining workers
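The outline above might be sketched as follows. This is a rough illustration under assumptions, not a definitive implementation: the objective, the use of `fminsearch` as the optimizer, the time limits, and all variable names are made up for the demo, and "getting tired" is modelled as a fixed time budget.

```matlab
pool       = gcp();                    % start or reuse a parallel pool
maxRunTime = 2;                        % s; cancel runs older than this (assumed)
budget     = 5;                        % s; total time for this demo (assumed)

% Stand-in objective and optimizer; assumptions for illustration only.
objective = @(x) 20 + sum(x.^2 - 10*cos(2*pi*x));
runOnce   = @(x0) fminsearch(objective, x0);   % returns [x, fval] on request

futures   = parallel.FevalFuture.empty(0, 1);
started   = zeros(0, 1);               % start time of each pending run
bestValue = inf;
bestSolution = [];
t0 = tic;

while toc(t0) < budget
    % Top up the queue so that every worker has a run.
    while numel(futures) < pool.NumWorkers
        x0 = 10*rand(1, 2) - 5;        % a different initial condition
        futures(end+1, 1) = parfeval(pool, runOnce, 2, x0);
        started(end+1, 1) = toc(t0);
    end

    % Wait up to 1 s for any run to finish.
    [idx, x, fval] = fetchNext(futures, 1);
    if ~isempty(idx)
        if fval < bestValue            % update the notion of best
            bestValue    = fval;
            bestSolution = x;
            fprintf('new best %.6g at [%s]\n', bestValue, num2str(x));
        end
        futures(idx) = [];
        started(idx) = [];
    end

    % Cancel runs that have been active longer than allowed.
    tooOld = (toc(t0) - started) > maxRunTime;
    if any(tooOld)
        cancel(futures(tooOld));
        futures(tooOld) = [];
        started(tooOld) = [];
    end
end

if ~isempty(futures)
    cancel(futures);                   % stop whatever is still running
end
```

With this structure a stagnated run costs at most `maxRunTime` seconds before it is replaced by a fresh start.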
  1 Comment
Mohammad Shojaei Arani on 26 Nov 2022
Hi Walter,
This is a nice approach, indeed! (I did not know about things like parfeval before.)
However, the problem is that in my optimization problem I do not set any termination criterion (it runs forever). The reason is that my problem is super complex, and it is difficult to know how much time the optimization solver needs; such things are problem-specific. Therefore, these are things that a 'human' should check rather than a 'machine' (I am not saying this is impossible to code, but I think that at this stage of knowledge about meta-heuristic search algorithms it would be difficult). Sometimes I get a solution rather fast (if stagnation does not occur), and sometimes it is the opposite. It is difficult to tell a code to stop when it stagnates, as the concepts of 'slow' and 'fast' are relative (sure, for a single optimization problem I can come up with an approximate measure of slowness or fastness, but my code should solve any generic problem).
So, for me, if there is no way to see the best result (of all workers) in the command window, this is not useful. All the solutions proposed so far assume that there is a 'termination criterion' for a worker, and this is the bottleneck that prevents me from observing the best outcome in the command window while all workers are still working (actually, they never finish).
At this point I admit that I cannot solve my problem using a single csv file. Therefore, I will use several csv files (one per worker).
Thanks a lot!
Babak
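A minimal sketch of the several-files approach mentioned above, assuming one CSV file per parallel run. The file naming scheme `Results_<n>.csv`, the output folder, the random "optimizer step", and the finite iteration count are all hypothetical; `'WriteMode','append'` assumes R2020a or newer.

```matlab
nRuns  = 4;                            % assumed number of parallel runs
outDir = tempdir;                      % hypothetical output folder

parfor n = 1:nRuns
    file = fullfile(outDir, sprintf('Results_%d.csv', n));
    if isfile(file)
        delete(file);                  % start each run with a fresh file
    end
    bestObj = inf;
    for iteration = 1:200              % finite here so the demo terminates
        s   = randn(1, 3);             % stand-in for one optimizer step
        obj = sum(s.^2);               % stand-in objective value
        if obj < bestObj
            bestObj = obj;
            % Append [BetterS BetterObj]; only this run writes to this file.
            writematrix([s, obj], file, 'WriteMode', 'append');
        end
    end
end

% On the client: the last row of each file is that run's best result.
best = zeros(nRuns, 4);
for n = 1:nRuns
    A = readmatrix(fullfile(outDir, sprintf('Results_%d.csv', n)));
    best(n, :) = A(end, :);
end
best = sortrows(best, size(best, 2));  % ascending by objective (last column)
disp('Estimated parameters : ');
disp(num2str(best(1, :)));
```

Since every file has exactly one writer, the runs never compete for the same file.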
