I need to create a loop that skips data sets that have a newer version, e.g. if there is an M1 that also has an M2, it only reads the M2
Michael Lauria
on 16 Jun 2017
Commented: Michael Lauria
on 19 Jun 2017
I have to graph massive amounts of csv files, but only the newest version of each. A lot of the files have M2, M3, and M4 versions and I only want the newest kind. Is there any way to get rid of all the M1-M3 files that have better versions?
2 Comments
Image Analyst
on 16 Jun 2017
Define massive. Are you talking about tens of thousands or millions of files?
Tell us how the version can be determined. Is the M-number encoded into the filename? Or inside the file somewhere? Or are they in different folders and so you need to check two different folders for a file of the same name in both and just use the one with the latest date?
Accepted Answer
Image Analyst
on 17 Jun 2017
Use dir() to get the filenames. Then get a new list of filenames where you chop off the last number (assuming they go up only to 9, not to 10 and beyond). Then use ismember to see if the filename occurs twice or more. If it does, get the files, using indexes that ismember tells you, and find out which one has the biggest number. Keep any that occur only once, or if twice, keep just the largest number. Keep these in an output list.
% fileInfo = dir('*.dat');
% fileNames = {fileInfo.name}
% if isempty(fileNames)
% uiwait(errordlg('No files found'));
% return;
% end
% Make up sample data for testing.
fileNames = {'file1_m1.dat', 'file1_m2.dat', 'file2_m1.dat', 'file3_m1.dat', 'file4_m1.dat'}
% Create array for filenames without the final character in the base file name.
noVersions = cell(1, length(fileNames));
for k = 1 : length(fileNames)
    % Get base file name without last character.
    [~, thisString, ext] = fileparts(fileNames{k});
    noVersions{k} = thisString(1:end-1);
end
celldisp(noVersions);
% See if any base name occurs more than once.
% Note: the overwrite below assumes fileNames lists versions in ascending order (as in the sample data).
uniqueStrings = cell(length(fileNames), 1);
numUnique = 0; % Count of distinct base names found.
for k = 1 : length(fileNames)
    thisString = noVersions{k};
    fprintf('Checking for multiple occurrences of %s...\n', thisString);
    % ib is the index of the first occurrence of thisString in noVersions.
    [~, ib] = ismember(thisString, noVersions);
    if ib ~= k
        % This string occurs earlier than element k.
        % Overwrite the first occurrence of it with this later version number.
        uniqueStrings{ib} = fileNames{k};
    else
        % This is the first time it appears. Add it to the list.
        uniqueStrings{k} = fileNames{k};
        numUnique = numUnique + 1;
    end
end
celldisp(uniqueStrings);
% Find out which cells are empty.
emptyCells = find(cellfun(@isempty, uniqueStrings))
% Remove those empty ones to get the final list.
uniqueStrings(emptyCells) = []
The above intuitive brute force method works, though if you wait, I'm sure Andrei will give you a cryptic one-liner (probably using cellfun()) that will do the same thing.
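For comparison, a more compact variant might look like the sketch below. It is not from the answer above, and it rests on an assumption the thread never confirms: that every file name follows the pattern <base>_m<version>.<ext>, with the version as a trailing integer. Under that assumption it also copes with versions of 10 and beyond, which the character-chopping approach does not. Names that do not match the assumed pattern are simply skipped.

```matlab
% Sample names; in practice these could come from dir(), e.g. {fileInfo.name}.
fileNames = {'file1_m1.dat', 'file1_m2.dat', 'file2_m1.dat', ...
             'file3_m1.dat', 'file4_m1.dat', 'file4_m10.dat'};

% ASSUMED naming pattern: <base>_m<version>.<ext>, matched case-insensitively.
% Each match yields three tokens: base name, version digits, extension.
tokens = regexpi(fileNames, '^(.*)_m(\d+)(\.[^.]*)$', 'tokens', 'once');

% Drop names that do not follow the assumed pattern.
matched = ~cellfun(@isempty, tokens);
names = fileNames(matched);
tokens = tokens(matched);

% Grouping key (base name plus extension) and numeric version for each file.
baseKeys = cellfun(@(t) [t{1} t{3}], tokens, 'UniformOutput', false);
versions = cellfun(@(t) str2double(t{2}), tokens);

% Sort by version, highest first, then keep the first occurrence of each key.
% With 'stable', unique returns the index of the first occurrence,
% which after the sort is the highest version in each group.
[~, order] = sort(versions, 'descend');
[~, firstIdx] = unique(baseKeys(order), 'stable');
newest = sort(names(order(firstIdx)));

disp(newest);
```

Because the versions are compared as numbers rather than characters, the order in which dir() lists the files does not matter here.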
2 Comments
Image Analyst
on 17 Jun 2017
By the way, 5000 files is not that big. I routinely analyze folders with this number of files in them. Just another walk in the park. No big deal.
More Answers (1)
John D'Errico
on 16 Jun 2017
Edited: John D'Errico
on 16 Jun 2017
Oh come on. It looks as if you just got a big job dumped on you, and you are freaking out. So your solution is to ask multiple vague questions on Answers that have no serious answer, except to start writing code.
You eat a programming elephant one byte at a time. Use loops. So what? Don't worry if they are not optimally efficient. As long as the thing gets done, who cares if it took a few more minutes to run? If you find there are programming bottlenecks, then and only then do you worry about optimization.
Programming elegance applies only to the second, or even third time you will need to do something, and even then don't bother too much unless it is critical to the success of your code.