Most efficient method to search through file names?
I have a large number of files whose names all have the form
SSSTTTTMMYY
Where the 'encoding' of the file name breaks down into something like this:
SSS - Three letter code referencing a location (locations that I know and have a MATLAB table that relates these codes to a location name)
TTTT - That represents the 'type' of data that we have captured (also values we already have)
MMYY - Is simply the month and year that data was taken.
So for example, we may have something like:
LDNACPD0618
where LDN = London, ACPD = Average captured pollution data, 0618 = June, 2018.
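As a minimal sketch of this encoding, a single name can be decoded with named tokens in regexp. The pattern assumes upper-case letters for the site and type codes, and the variable names and the 20YY century are assumptions made for illustration only:

```matlab
% Decode one file name of the form SSSTTTTMMYY into its parts.
name = 'LDNACPD0618';

% Named tokens: 3 letters (site), 4 letters (type), 2 digits (month), 2 digits (year).
% Assumption: the codes consist of upper-case letters only.
tok = regexp(name, '^(?<site>[A-Z]{3})(?<type>[A-Z]{4})(?<mm>\d{2})(?<yy>\d{2})', 'names');

siteCode = tok.site;                   % 'LDN'
dataType = tok.type;                   % 'ACPD'
monthNum = str2double(tok.mm);         % 6
yearNum  = 2000 + str2double(tok.yy);  % 2018, assuming all years are 20YY

fprintf('%s | %s | %02d/%d\n', siteCode, dataType, monthNum, yearNum);
```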
So here is the actual question:
I want to build a function that can search through these file names based on:
- Site location, e.g. all data from site LDN
- A range of dates, e.g. all data between 0118 and 0318
- 'Type' of data, e.g. all data that is ACPD
- Or a combination of the above, e.g. all data from LDN between 0118 and 0318
What is the most efficient way to do this other than making three separate functions to check each section of the file name? Would something like a regular expression work?
Many thanks for your help and advice in advance!
1 Comment
"Would something like a regular expression work?"
Matching the SSS and TTTT parts would not be too difficult, but matching a range of dates really requires converting to date (e.g. date number or datetime) and then doing a logical comparison.
Start by splitting the names up (e.g. using regexp or indexing) and then:
- compare SSS using strcmp
- compare TTTT using strcmp
- convert MMYY to datetime and compare using logical comparisons.
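The steps above could be sketched roughly as follows. The file names are made up for illustration, the variable names are arbitrary, and the code assumes that every name matches the pattern and that all years are 20YY:

```matlab
% Hypothetical file names, stored as a column cell array.
names = {'LDNACPD0118'; 'LDNACPD0618'; 'PARACPD0218'; 'LDNXXXX0318'};

% Split each name into SSS, TTTT, MM and YY.
% Assumption: every name matches the pattern.
tok = regexp(names, '^(\w{3})(\w{4})(\d{2})(\d{2})', 'tokens', 'once');
tok = vertcat(tok{:});               % N-by-4 cell array of char vectors

site = tok(:, 1);
kind = tok(:, 2);
mm   = str2double(tok(:, 3));
yy   = str2double(tok(:, 4));

% Convert MMYY to datetime (first day of the month, years assumed to be 20YY).
when = datetime(2000 + yy, mm, 1);

% Compare SSS and TTTT with strcmp, the dates with logical comparisons.
isSite  = strcmp(site, 'LDN');
isKind  = strcmp(kind, 'ACPD');
inRange = when >= datetime(2018, 1, 1) & when <= datetime(2018, 3, 1);

% Combine the masks as needed, e.g. all data from LDN between 0118 and 0318:
selected = names(isSite & inRange);
```

Because each criterion gives its own logical mask, any combination of site, type and date range is just an & of the masks that are wanted.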
Accepted Answer
More Answers (1)
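Folder = 'D:\Your\Folder';
FileList = dir(fullfile(Folder, '*.*'));
FileList = FileList(~[FileList.isdir]);  % Skip folders, including '.' and '..'
NameList = {FileList.name};
% NameList = {'SSSTTTT0617', 'SSSTTTT0631', 'WWWQQQQ0724'}
% Split the names into their fixed-width parts:
Data.Location = cellfun(@(s) s(1:3), NameList, 'UniformOutput', 0);
Data.Type = cellfun(@(s) s(4:7), NameList, 'UniformOutput', 0);
Data.Date = cellfun(@(s) sscanf(s(8:11), '%d'), NameList, 'UniformOutput', 1);  % MMYY as number
% Data which have the Location = 'SSS':
Match = FindData(Data, 'Location', 'SSS')
% Data which have the Location = 'SSS' and the date 0631:
Match = FindData(Data, 'Location', 'SSS', 'Date', 631)
% Data which have the Type 'TTTT' and a date between 0617 and 0724:
Match = FindData(Data, 'Type', 'TTTT', 'DateRange', [617, 724])
% ... etc.
function Match = FindData(Data, varargin)
% Data: struct with the fields Location, Type (cell strings) and Date (MMYY as numbers).
% Further inputs: name-value pairs 'Location', 'Type', 'Date', 'DateRange'.
% Match: logical array with one element per file name.
Match = true(size(Data.Location));
% MMYY -> YYMM, such that dates of different years are compared correctly
% (assuming that all years belong to the same century):
Key = @(d) mod(d, 100) * 100 + floor(d / 100);
for k = 1:2:numel(varargin)
  switch lower(varargin{k})
    case 'location'
      Match = Match & strcmp(Data.Location, varargin{k+1});
    case 'type'
      Match = Match & strcmp(Data.Type, varargin{k+1});
    case 'date'
      Match = Match & (Data.Date == varargin{k+1});
    case 'daterange'
      Range = Key(varargin{k+1});
      Match = Match & (Key(Data.Date) >= Range(1) & ...
                       Key(Data.Date) <= Range(2));
    otherwise
      error('Unknown job: %s', varargin{k});
  end
end
% Maybe:
% Match = find(Match);
end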
3 Comments
S G
on 10 Jun 2019
Jan
on 10 Jun 2019
Data is a struct with three fields, which contain the arrays of the different parts of the file names. If you provide some real test data, a better matching answer is possible. I've guessed that the file names can be obtained by dir in a specific folder. This was my best guess based on this explanation:
I have a large number of files that all have file name formats that are of the form SSSTTTTMMYY
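To make the layout of Data more concrete, here is a small sketch with made-up file names instead of the output of dir. The names and the variables Mask and Selected are only assumptions for illustration:

```matlab
% Hypothetical file names in place of the output of dir:
NameList = {'LDNACPD0618', 'LDNACPD0718', 'PARACPD0618'};

% Each field of Data contains one element per file name:
Data.Location = cellfun(@(s) s(1:3), NameList, 'UniformOutput', false);
Data.Type     = cellfun(@(s) s(4:7), NameList, 'UniformOutput', false);
Data.Date     = cellfun(@(s) sscanf(s(8:11), '%d'), NameList);  % MMYY as number

% Data.Location is {'LDN', 'LDN', 'PAR'}
% Data.Type     is {'ACPD', 'ACPD', 'ACPD'}
% Data.Date     is [618, 718, 618]

% A logical mask over the fields selects the matching file names:
Mask = strcmp(Data.Location, 'LDN') & (Data.Date == 618);
Selected = NameList(Mask);
```

Since all fields have the same number of elements as NameList, a mask built from the fields can be used directly as an index into the list of names.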
S G
on 10 Jun 2019