I need to create a loop that skips data sets that have a newer version, i.e. if there is an M1 that also has an M2, it only reads the M2

I have to graph a massive number of csv files, but only the newest version of each. A lot of the files have M2, M3, and M4 versions and I only want the newest one. Is there any way to get rid of all the M1-M3 files that have newer versions?
  2 Comments
Image Analyst on 16 Jun 2017
Define massive. Are you talking about tens of thousands or millions of files?
Tell us how the version can be determined. Is the M-number encoded into the filename? Or inside the file somewhere? Or are they in different folders and so you need to check two different folders for a file of the same name in both and just use the one with the latest date?
Michael Lauria on 16 Jun 2017
I started coding as a summer job yesterday. There are around 5000 files, and their names all end with m1, m2, m3, or m4. I only need the latest version of each (i.e. if I have an m1, m2, and m3 of the same name, all I need is the m3). I was wondering if there is a way to tell the program to move only the latest version of each csv into a folder. I am very new to this, so sorry if I seem clueless. I really don't know much... yet.


Accepted Answer

Image Analyst on 17 Jun 2017
Use dir() to get the filenames. Then get a new list of filenames where you chop off the last number (assuming they go up only to 9, not to 10 and beyond). Then use ismember to see if the filename occurs twice or more. If it does, get the files, using indexes that ismember tells you, and find out which one has the biggest number. Keep any that occur only once, or if twice, keep just the largest number. Keep these in an output list.
% fileInfo = dir('*.dat');
% fileNames = {fileInfo.name};
% if isempty(fileNames)
%     uiwait(errordlg('No files found'));
%     return;
% end
% Make up sample data for testing.
fileNames = {'file1_m1.dat', 'file1_m2.dat', 'file2_m1.dat', 'file3_m1.dat', 'file4_m1.dat'}
% Create array for filenames without the final character in the base file name.
noVersions = cell(1, length(fileNames));
for k = 1 : length(fileNames)
    % Get base file name without last character (the single-digit version number).
    [~, thisString, ext] = fileparts(fileNames{k});
    noVersions{k} = thisString(1:end-1);
end
celldisp(noVersions);
% See if any string is in there more than once.
% Note: this assumes fileNames is sorted so that a later version comes after
% an earlier one; call sort(fileNames) first if you are not sure.
uniqueStrings = cell(length(fileNames), 1);
numUnique = 0; % Count of distinct base names found so far (informational only).
for k = 1 : length(fileNames)
    thisString = noVersions{k};
    fprintf('Checking for multiple occurrences of %s...\n', thisString);
    % ib is the index of the first occurrence of thisString in noVersions.
    [~, ib] = ismember(thisString, noVersions);
    if ib ~= k
        % This string occurs earlier than element k.
        % Overwrite the first occurrence of it with this later version number.
        uniqueStrings{ib} = fileNames{k};
    else
        % This is the first time it appears. Add it to the list.
        uniqueStrings{k} = fileNames{k};
        numUnique = numUnique + 1;
    end
end
celldisp(uniqueStrings);
% Find out which cells are empty.
emptyCells = find(cellfun(@isempty, uniqueStrings))
% Remove those empty ones to get the final list.
uniqueStrings(emptyCells) = []
The above intuitive brute force method works, though if you wait, I'm sure Andrei will give you a cryptic one-liner (probably using cellfun()) that will do the same thing.
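For anyone whose version numbers might go past 9, or whose file list is not sorted, here is a rough alternative sketch that pulls the version number out with regexp() and compares it numerically. It assumes the names look like <base>_m<digits>.<ext>, which is only a guess based on the sample names above; the sample list and variable names are made up for illustration.

```matlab
% Sample names (made up for illustration); in practice use {fileInfo.name} from dir().
fileNames = {'file1_m1.dat', 'file1_m2.dat', 'file2_m1.dat', ...
             'file3_m10.dat', 'file3_m9.dat', 'notes.txt'};

% Split each name into base, version number, and extension.
% Assumed pattern: <base>_m<digits>.<ext>, matched case-insensitively.
tokens = regexp(fileNames, '^(.*)_m(\d+)(\.[^.]+)$', 'tokens', 'once', 'ignorecase');
matched = ~cellfun(@isempty, tokens);   % names that do not fit the pattern are skipped
names = fileNames(matched);
tokens = tokens(matched);

% Group key is base name plus extension; version is converted to a number.
keys = cellfun(@(t) [t{1}, t{3}], tokens, 'UniformOutput', false);
versions = cellfun(@(t) str2double(t{2}), tokens);

% For each distinct key, keep the name with the largest version number.
[uniqueKeys, ~, groupIdx] = unique(keys);
latest = cell(1, numel(uniqueKeys));
for g = 1 : numel(uniqueKeys)
    members = find(groupIdx == g);
    [~, best] = max(versions(members));
    latest{g} = names{members(best)};
end
disp(latest);
```

Because the versions are compared as numbers, m10 ranks above m9 here, and the input order does not matter. Names that do not match the assumed pattern (like notes.txt) are simply left out, so adjust the pattern to your real naming scheme.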

More Answers (1)

John D'Errico on 16 Jun 2017
Edited: John D'Errico on 16 Jun 2017
Oh come on. It looks as if you just got a big job dumped on you, and you are freaking out. So your solution is to ask multiple vague questions on Answers that have no serious answer, except to start writing code.
You eat a programming elephant one byte at a time. Use loops. So what? Don't worry if they are not optimally efficient, as long as the thing gets done, who cares if it took a few more minutes to run? If you find there are programming bottlenecks, then and only then do you worry about optimization.
Programming elegance applies only to the second, or even third time you will need to do something, and even then don't bother too much unless it is critical to the success of your code.
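In that spirit, a plain loop is enough for the "put the latest versions into a folder" step. This is only a sketch: it sets up a throwaway folder with empty sample files so it can run on its own, the folder name 'latest' is hypothetical, and the list of latest names is assumed to have been worked out already.

```matlab
% Set up a throwaway source folder with empty sample files (illustration only).
srcDir = tempname;
mkdir(srcDir);
dstDir = fullfile(srcDir, 'latest');   % hypothetical destination folder name
sample = {'file1_m1.csv', 'file1_m2.csv', 'file2_m1.csv'};
for k = 1 : numel(sample)
    fclose(fopen(fullfile(srcDir, sample{k}), 'w'));
end

% Names already chosen as the latest versions (assumed to come from an earlier step).
latest = {'file1_m2.csv', 'file2_m1.csv'};

% Create the destination folder if it does not exist yet.
if ~exist(dstDir, 'dir')
    mkdir(dstDir);
end

% Copy one file at a time and report any that fail.
for k = 1 : numel(latest)
    src = fullfile(srcDir, latest{k});
    [ok, msg] = copyfile(src, dstDir);
    if ~ok
        warning('Could not copy %s: %s', src, msg);
    end
end

copied = dir(fullfile(dstDir, '*.csv'));
disp({copied.name});
```

This uses copyfile() rather than movefile() so the originals stay where they are while you check the result; swap in movefile() once you trust it.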
  1 Comment
Michael Lauria on 16 Jun 2017
I started coding as a summer job yesterday. There are around 5000 files, and their names all end with m1, m2, m3, or m4. I only need the latest version of each (i.e. if I have an m1, m2, and m3 of the same name, all I need is the m3). I was wondering if there is a way to tell the program to move only the latest version of each csv into a folder. I am very new to this, so sorry if I seem clueless. I really don't know much... yet.
