Using regular expressions with the 'dir' function

Galo on 21 Sep 2022
Edited: dpb on 21 Sep 2022
Hi, I have a folder (MainFolder) that contains subfolders (Subfolder1, Subfolder2, Subfolder3, ...). Each of those subfolders contains different png images. The images are named like 'somerandomnameOD1.png' or 'somerandomnameOS1.png', where the trailing number ranges from 1 to 6.
I am trying to use the dir function with a regular expression to get the paths of all images, across all subfolders of 'MainFolder', whose names end with 'OD' or 'OS' followed by 1, 2 or 6 (OD1, OD2, OD6, OS1, OS2, OS6):
a=dir('C:\folder\folder2\Desktop\MainFolder\**/*O(D|S)(1|2|6).png');
But the result is an empty struct.
I know I can simply use this:
a=dir('C:\folder\folder2\Desktop\MainFolder\**/*.png');
and then filter that structure for the names that I need.
But I would like to know whether it is possible to use regular expressions with the 'dir' function.
Thanks.

Accepted Answer

dpb on 21 Sep 2022
Edited: dpb on 21 Sep 2022
"I would like to know if it is possible to use reg expressions with a 'dir' function."
Nope. It's unsupported by the OS; "filename globbing" isn't the same thing as a regular expression.
Only the '*' and '?' wildcards are supported.
There are ways to do it by calling find or grep through the shell, but it's simpler to narrow the listing as far as the limited wildcards allow and then apply regexp (or another matching tool) to that result than to build the shell commands.
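A minimal sketch of that "wildcards first, regexp second" approach, assuming MATLAB R2016b or later (needed for the recursive '**' pattern and the folder field in dir's output). The path is the one from the question; the pattern 'O[DS][126]\.png$' is just one way to encode "OD or OS followed by 1, 2 or 6", and the variable names are illustrative.

```matlab
% Root folder from the question; adjust to your own location.
mainFolder = 'C:\folder\folder2\Desktop\MainFolder';

% Step 1: let dir do what globbing can do. '**' recurses into every
% subfolder and '*.png' keeps only PNG files.
files = dir(fullfile(mainFolder, '**', '*.png'));

% Step 2: apply the actual regular expression to the file names.
% 'O[DS][126]\.png$' means: "O", then D or S, then 1, 2 or 6, then ".png" at the end.
% With 'once', regexp returns [] for each name that does not match.
matchIdx = regexp({files.name}, 'O[DS][126]\.png$', 'once');
keep = ~cellfun(@isempty, matchIdx);
files = files(keep);

% Step 3: build the full path of each remaining file from its folder and name.
paths = arrayfun(@(f) fullfile(f.folder, f.name), files, 'UniformOutput', false);
```

On Windows the '*.png' wildcard is not case-sensitive, but regexp is; if some files may end in '.PNG', regexpi can be used in Step 2 instead.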
Walter Roberson on 21 Sep 2022
Right, the support for ?, * and ** is provided by dir() itself. dir() calls operating system functions to retrieve the contents of the directory, and those functions do not support wildcards or patterns; they just return a list of what is present in the directory, so dir() does the filtering itself.
Historically, there have been differences between operating systems in whether a filename of * means the same thing as *.*. Unix globbing (filename processing) says that *.* requires an actual period to be present in the name, which differs from the historic DOS implementation of 8+3 filenames: 8+3 filenames literally do not store the period, so 'ABC' and 'ABC.' were indistinguishable, because both were stored internally as ABC followed by 5 nulls for the 8 part and 3 nulls for the extension part. Also, historically, directories of that era had a DIR extension that was routinely hidden, so DEF.DIR was the formal name for folder DEF; eventually the extension for directories stopped being used.
All of which is to say that the processing of wildcards in names is more complicated than one might expect at first, and you need to know which release and which operating system to figure out the precise details. But it has been well over a decade since unix-style filename globbing characters were paid attention to.
