MATLAB Answers

How to find an exact string match in a list of folder names

Richard Rees on 15 Mar 2020
Commented: Image Analyst on 16 Mar 2020
Hello everyone,
I have a problem extracting data from a sequence of files, based on exact strings contained within the subfolder names. The problem is with Y_NODE and XY_NODE: contains cannot differentiate between 'Y_High' and 'XY_High', so the data from the 'XY_High' folders is also extracted into the Y_NODE variable. I have tried contains, matches, strcmp, strfind, etc., but I cannot get it to match correctly and assign the data to the correct cell array.
I cannot attach the raw data because it is too large, but the list of folder names is attached.
Could someone help please?
pattern = ["No_High1_add_on","X_High1_add_on","Y_High1_add_on","XY_High1_add_on"];
for k = 1:numberOfFolders
    % Get this folder and print it out.
    thisFolder = listOfFolderNames{k};
    if contains(thisFolder,pattern(1))
        J = 1;
    elseif contains(thisFolder,pattern(2))
        J = 2;
    elseif contains(thisFolder,pattern(3))
        J = 3;
    elseif contains(thisFolder,pattern(4))
        J = 4;
    else
        continue
    end
    filePattern = sprintf('%s/*node.csv', thisFolder);
    baseFileNames = dir(filePattern);
    numberOfImageFiles = length(baseFileNames);
    if numberOfImageFiles >= 1
        % Go through all those files.
        for f = 1 : numberOfImageFiles
            fullFileName = fullfile(thisFolder, baseFileNames(f).name);
            if J == 1
                NO_NODE{k} = importdata(fullFileName);
            elseif J == 2
                X_NODE{k} = importdata(fullFileName);
            elseif J == 3
                Y_NODE{k} = importdata(fullFileName);
            elseif J == 4
                XY_NODE{k} = importdata(fullFileName);
            end
        end
    else
        % Report only the folders that really are empty.
        fprintf(' Folder %s has no files in it.\n', thisFolder);
    end
end


Accepted Answer

Guillaume on 16 Mar 2020
Edited: Guillaume on 16 Mar 2020
It's simple to solve: rather than testing first for 'X', then 'Y', then 'XY', test first for 'XY', then for 'X' and 'Y' in either order. If the first test passes, the folder is guaranteed to be 'XY'.
Note that a chain of if...elseif... branches that all do the same thing is usually bad design. It is not easy to extend to many more patterns: if you had 30 different patterns, would you write 30 different tests? A loop makes the code much simpler:
pattern = ["No_High1_add_on", "XY_High1_add_on", "X_High1_add_on", "Y_High1_add_on"]; %XY pattern MUST precede X and Y pattern since it is a superset
for k = 1:numel(listOfFolderNames)
    % Get this folder and print it out.
    thisFolder = listOfFolderNames{k};
    matchedpattern = 0;
    for patternindex = 1:numel(pattern)
        if contains(thisFolder, pattern(patternindex))
            matchedpattern = patternindex;
            break
        end
    end
    if matchedpattern == 0, continue, end %no match found
    %...
Similarly, later on I would not use differently named variables to store the data. That design is very likely to force you to copy a bunch of code each time you want to process each variable, when again a loop would avoid the repetition. I would store the imported files in a cell array of cell arrays:
pattern = ["No_High1_add_on", "XY_High1_add_on", "X_High1_add_on", "Y_High1_add_on"]; %XY pattern MUST precede X and Y pattern since it is a superset
patterndata = cell(size(pattern)); %cell array to store the imported files for each pattern
for k = 1:numel(listOfFolderNames)
    %...
    for f = 1 : numberOfImageFiles
        fullFileName = fullfile(thisFolder, baseFileNames(f).name);
        patterndata{matchedpattern}{end+1} = importdata(fullFileName); %#ok<AGROW> Number of files in each category is unknown so have no choice but to grow the array
    end
end
Note that unlike your original code, the above does not leave empty cells in each cell array. (On a given k, your original code only filled one of the NO_NODE, X_NODE, etc. cell arrays, leaving the others with an empty cell at index k.)
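An alternative that does not depend on the order of the patterns is to compare each pattern against the individual folder names in the path rather than against the whole path. The following is only a sketch: the paths in listOfFolderNames are made up for illustration, and it assumes the pattern always sits at the start of one folder name in the path.

```matlab
% Hypothetical folder list, made up for illustration only.
listOfFolderNames = { ...
    'root/No_High1_add_on_run/001', ...
    'root/X_High1_add_on_run/001', ...
    'root/Y_High1_add_on_run/001', ...
    'root/XY_High1_add_on_run/001', ...
    'root/unrelated/001'};

% With this approach the pattern order no longer matters.
pattern = ["No_High1_add_on", "X_High1_add_on", "Y_High1_add_on", "XY_High1_add_on"];
matchedpattern = zeros(1, numel(listOfFolderNames));

for k = 1:numel(listOfFolderNames)
    % Split the path into its folder names on either separator.
    parts = regexp(listOfFolderNames{k}, '[\\/]', 'split');
    for patternindex = 1:numel(pattern)
        % A folder name must START with the pattern, so 'XY_High1...'
        % is not mistaken for 'Y_High1...'.
        if any(startsWith(parts, pattern(patternindex)))
            matchedpattern(k) = patternindex;
            break
        end
    end
end

disp(matchedpattern)   % 0 marks a folder that matched no pattern
```

The index in matchedpattern(k) can then be used exactly as in the loop above, for example to choose the cell of patterndata that receives the imported file.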

  1 Comment

Richard Rees on 16 Mar 2020
Hi, that is very nice, and thank you for the explanations as well; they will be taken on board.


More Answers (1)

Image Analyst on 15 Mar 2020
strcmp() should work. I'd like to see code where it doesn't. contains() won't work - it will operate as you said since 'Y_High' is contained inside 'XY_High'. But I really think strcmp() should.
At first I thought maybe it's because you're comparing strings to character arrays. Your pattern is a string array, not a cell array of character arrays like listOfFolderNames probably is. Strings and character arrays have been different types of variables in MATLAB since the string type was introduced in R2016b. But when I did a test, it showed this is not the case and they still match despite being of different variable types:
s1 = "abc" % A string
s2 = 'abc' % A character vector
e1 = isequal(s1, s2)
e2 = strcmp(s1, s2)
e3 = contains(s1, s2)
e1, e2, and e3 all show as true.
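The test above compares two identical strings, so it cannot show where the functions differ. A small sketch with a made-up folder name (not taken from the attached list) shows the difference between a substring test, a whole-string test, and a prefix test:

```matlab
folderName = 'XY_High1_add_on_Coupled';   % hypothetical folder name

c1 = contains(folderName, "Y_High1_add_on");     % substring test
c2 = strcmp(folderName, "Y_High1_add_on");       % whole-string test
c3 = startsWith(folderName, "Y_High1_add_on");   % prefix test, Y pattern
c4 = startsWith(folderName, "XY_High1_add_on");  % prefix test, XY pattern
```

Here c1 and c4 are true while c2 and c3 are false: contains finds 'Y_High1_add_on' inside the XY name, strcmp fails because the folder name carries extra text after the pattern, and startsWith only accepts the pattern at the beginning of the name.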

  2 Comments

Richard Rees on 16 Mar 2020
Thanks for the reply, but it still will not work. If I change every input into a string, it skips that section and inputs nothing. Just to recap my logic: let's say I have a single subfolder A, generated as a char, and I want to find whether it contains the pattern, precisely as it is written in Pattern (a string). The data is then assigned to the designated variable, in this case "XY_NODE".
Pattern = ["XY_High1_add_on"];
A = 'D:X folders.....\RS_boundary_test\XY_High1_add_on_Coupled Stress&3PWP (4)\004'
Attached is the full code. It brings in the data using the recurse_m file, which I think you created or definitely distributed. If I change any string inputs to characters and adapt the indexing accordingly, i.e. pattern(1) --> pattern(:,:,4), I still suffer from the same problem.
Image Analyst on 16 Mar 2020
Does this work:
locations = strfind(A, Pattern)
It tells you what index Pattern starts at in A.
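As a sketch of how that index could be used, the character just before each match can be checked to be a path separator, which rules out 'Y_High...' matching inside 'XY_High...'. The path below is made up, and isFolderStart is a name introduced only for this example.

```matlab
A = 'root/XY_High1_add_on_run/004';        % hypothetical path
Pattern = ["Y_High1_add_on", "XY_High1_add_on"];

isFolderStart = false(size(Pattern));
for p = 1:numel(Pattern)
    % Start indices of every occurrence of this pattern in A.
    locations = strfind(A, char(Pattern(p)));
    for idx = locations
        % Accept the match only at the very start of the path
        % or directly after a path separator.
        if idx == 1 || any(A(idx-1) == '/\')
            isFolderStart(p) = true;
        end
    end
end

disp(isFolderStart)
```

For this path the Y pattern is found one character after the XY pattern, but the character before it is 'X' rather than a separator, so only the XY entry of isFolderStart is true.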

