Filtering multiple .csv files based on data from a table

5 views (last 30 days)
Hello all,
I am trying to filter multiple .csv files (based on data in a separate table. To be more specific I am trying to filter the data from dataTables by d_Data.Actual.Calculated_Flow variable, so that it filters the data that are below minFlow or above maxFlow, which are listed in dataList. dataList has several columns the first one lists Tablename that corresponds with names of the tables in dataTables, the 4th & 5th lists corresponding maxFlow & minFlow that I would like to apply for filtering. How can I achieve this?
here is the code:
dataListDir="zoznam_merani.xlsx";
dataDir='C:\Users\U430746\OneDrive - Danfoss\Desktop\EHA\EHA 1\Matlab\Matlab\Simulation EHA V1\Test Data\OneDrive_2024-08-16\Flow control\EHA_Flow_Control_standard mode\EHA_Flow_Control';
% Loads dataList
dataList=readtable(dataListDir);
% Loads direction to .csv files
dataFiles = dir(fullfile(dataDir, '*.csv'));
% Inicialize a cell array for input of tables
dataTables = cell(1, length(dataFiles));
% Loads tables to the dataTables cell array
for i = 1:length(dataFiles)
fileName = fullfile(dataDir, dataFiles(i).name);
dataTables{i} = readtable(fileName,"VariableNamingRule","preserve");
end
% Inicialize a cell array for filtered data
filteredDataTables = cell(1, length(dataFiles));
% Runs through all the tables in dataTables
for i = 1:length(dataTables)
% Finds a row that corresponds with the TableName
rowIndex = strcmp(dataList.TableName, dataFiles(i).name);
if any(rowIndex)
% Gets minFlow & maxFlow for the actual table
minFlow = dataList.minFlow(rowIndex);
maxFlow = dataList.maxFlow(rowIndex);
% Filters the tables based on minFlow & maxFlow
filteredDataTables{i} = dataTables{i}(dataTables{i}.("d_Data.Actual.Calculated_Flow") >= minFlow & dataTables{i}.("d_Data.Actual.Calculated_Flow") <= maxFlow, :);
else
% If the tableName is not available executes no changes
filteredDataTables{i} = dataTables{i};
end
end

Accepted Answer

Jeremy Hughes
Jeremy Hughes on 20 Aug 2024
It looks like you're loading all the data into memory, then looping to do the filtering. And to the best of my ability to tell, it looks like you're doing it right. (Without files, it's hard to say)
Are you having any specific problems?
You might have better luck doing the filtering on each table as you bring it in, so I reorganized this code a bit:
% Loads direction to .csv files
dataListDir="zoznam_merani.xlsx";
dataList=readtable(dataListDir,"TextType","string");
% Truncated this for readability, also One Drive can be problematic when
% accessed in a loop, but I see that more when writing. This is probably
% fine.
dataDir='C:\Users\...whereever...\';
dataFiles = dir(fullfile(dataDir, '*.csv'));
dataFiles = string({datafiles.name})';
% Loads tables to the dataTables cell array
dataTables = cell(1, height(dataFiles));
filteredDataTables = cell(1, height(dataFiles));
for i = 1:height(dataFiles) % Changed to length => height since length can give strange results
T = readtable(fullfile(dataDir, dataFiles(i)),"VariableNamingRule","preserve");
% I assume this condition is right assuming TableName lists the files
% as they are named in dataDir.
rowIndex = (dataList.TableName == dataFiles(i));
% **** Potentially, this is a problem if there are multiple rows with
% the same file name, but so check here for that case. ****
if any(rowIndex)
% Gets minFlow & maxFlow for the actual table
minFlow = dataList.minFlow(rowIndex);
maxFlow = dataList.maxFlow(rowIndex);
% Filters the tables based on minFlow & maxFlow
aboveMin = T.("d_Data.Actual.Calculated_Flow") >= minFlow;
belowMax = T.("d_Data.Actual.Calculated_Flow") <= maxFlow
filteredDataTables{i} = T(aboveMin & belowMax, :);
else
% If the tableName is not available executes no changes
filteredDataTables{i} = T;
end
dataTables{i} = T;
end
  2 Comments
Simon
Simon on 21 Aug 2024
Edited: Simon on 21 Aug 2024
Hello Jeremy.
Thank you for your answer!
I have already solved the issue. I had done a silly mistake you could not see. I switched the of minFlow & maxFlow in the dataList table.
Thanks again for your feedback.
One more thing. I am new to this forum, is there any way to mark the question as closed/answered to spare other members from unnecessary work?
Jeremy Hughes
Jeremy Hughes on 21 Aug 2024
If you accept an answer, it will highlight that answer. Others can still see it and answer, but they will see that you have accepted it.

Sign in to comment.

More Answers (1)

Saurabh
Saurabh on 20 Aug 2024
Hi Simon,
What I understood is that you want to filter multiple csv files based on the "d_Data.Actual.Calculated_Flow" column, by applying `minFlow` and `maxFlow` values from a separate `dataList` table that matches table names with CSV filename. I looked over your code and didn't see any issues.
Here are some things to check and consider:
  1. Variable Naming Rule: You are using the "VariableNamingRule","preserve" option in “readtable”. Ensure that the variable name "d_Data.Actual.Calculated_Flow" is exactly as it appears in your CSV files. MATLAB is case-sensitive, and any mismatch will cause an error.
  2. Table Name Matching: When matching “dataFiles (i)”.name with “dataList.TableName,” ensure that the names are consistent. If “dataList.TableName” does not include the file extension (e.g., .csv), you might need to adjust the comparison to remove the extension from “dataFiles(i).name”.
  3. Row Index Check: The condition if any(rowIndex) checks if there is any match. Ensure that “dataList.TableName” contains the expected names and that there are no leading/trailing spaces or mismatches.
  4. Data Types: Verify that “minFlow” and “maxFlow” are numeric values and that “dataTables{i}.("d_Data.Actual.Calculated_Flow")” is a numeric column. Any type mismatch can cause errors during the filtering operation.
Here's a refined version of script with some adjustments:
dataListDir = "zoznam_merani.xlsx";
dataDir = 'C:\Users\U430746\OneDrive - Danfoss\Desktop\EHA\EHA 1\Matlab\Matlab\Simulation EHA V1\Test Data\OneDrive_2024-08-16\Flow control\EHA_Flow_Control_standard mode\EHA_Flow_Control';
% Load dataList
dataList = readtable(dataListDir);
% Load .csv files
dataFiles = dir(fullfile(dataDir, '*.csv'));
% Initialize cell arrays for tables
dataTables = cell(1, length(dataFiles));
filteredDataTables = cell(1, length(dataFiles));
% Load tables into dataTables
for i = 1:length(dataFiles)
fileName = fullfile(dataDir, dataFiles(i).name);
dataTables{i} = readtable(fileName, "VariableNamingRule", "preserve");
end
% Filter dataTables
for i = 1:length(dataTables)
% Strip the .csv extension for comparison
tableName = erase(dataFiles(i).name, '.csv');
rowIndex = strcmp(dataList.TableName, tableName);
if any(rowIndex)
% Get minFlow and maxFlow for the current table
minFlow = dataList.minFlow(rowIndex);
maxFlow = dataList.maxFlow(rowIndex);
% Filter the table
flowData = dataTables{i}.("d_Data.Actual.Calculated_Flow");
filteredDataTables{i} = dataTables{i}(flowData >= minFlow & flowData <= maxFlow, :);
else
% No matching table name
filteredDataTables{i} = dataTables{i};
end
end
Some Debugging tips:
  1. Use “disp” or “fprintf” to print out variables at different stages.
  2. Use “class” and “whos” to verify that variables are of the expected types.
I hope this was helpful
  1 Comment
Simon
Simon on 20 Aug 2024
Hello Saurabh,
Thank you for your answer!
I have already solved the issue. I had done a silly mistake you could not see. I switched the of minFlow & maxFlow in the dataList table.
However, your suggestions are very useful as well.
Thank you again. :)

Sign in to comment.

Categories

Find more on Matrices and Arrays in Help Center and File Exchange

Products


Release

R2023b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!