I want to avoid dynamically naming my tables after a join.
4 views (last 30 days)
Show older comments
I certianly have a lot to learn, but I do know I am not supposed to dynamically create variable names- and I assume that applies to table names as well.
My question is twofold:
- How do I get out of the situation I am in- the real tables are large and very complicated with all kinds of issues and my process is actually working.
- I definitelty would also like to know what I can do better next time to approach this so I don't end up where I am now :)
My current situation has me creating new tables for export to another application based on an initial filter step. I am going to do many iterations of the filter, then use the filtered tables to join with other tables and on and on, and I'd like the final tables and exported files to be named somehow that I and other humans can easily recognize them (ie descriptive terms like 'Steve' and 'Jim'.)
My code is something like this:
T = table(["AAA"; "BBB"; "BBB"; "CCC"; "CCC"; "CCC"; "DDD"], ...
[datetime('2023-07-07 10:55:19'); datetime('2023-07-28 09:31:00'); datetime('2023-07-28 14:02:11');
datetime('2023-01-23 17:35:24'); datetime('2023-01-23 17:31:48'); datetime('2023-02-12 03:10:11');...
datetime('2023-03-01 02:00:11')], ...
[9999; 12; 34; 1111; 2222; 3333; 5556], ...
'VariableNames', {'var1', 'var2', 'var3'});
OtherTable = table(["AAA"; "BBB"; "BBB"; "CCC"; "CCC"; "CCC"], ...
[datetime('2023-07-07 10:55:19'); datetime('2023-07-28 09:31:00'); datetime('2023-07-28 14:02:11');
datetime('2023-01-23 17:35:24'); datetime('2023-01-23 17:31:48'); datetime('2023-02-12 03:10:11')], ...
[9999; 12; 34; 1111; 2222; 3333], ...
'VariableNames', {'var1', 'var2', 'var3'});
%% now I want to group and filter the data
onlyA=T(T.var1=='AAA',:)
BandC=T(T.var1=='BBB' | T.var1=='CCC',:)
CinFeb23=T(T.var1=='CCC' & T.var2=='2023-02-12 03:10:11',:)
and I have lots of these groupings already written out and they don't seem to lend themselves to automation.
Then I go on to a join and export (lots more data wrangling steps of course):
%% This section needs to loop (or something) over all of the above filtered/grouped tables
% (onlyA, BandC, CinFeb23) etc. - I need to findd and replace onlyA
% essentially.
onlyA_joined=innerjoin(onlyA,OtherTable,"LeftKeys",["var1"],"RightKeys",["var1"]);
onlyA_joined.Properties.VariableNames = {'onlyA_name','onlyA_date','onlyA_value',...
'onlyA_date_again','onlyA_value_again'};
onlyA_joined
writetable(onlyA_joined,'final_file.xlsx','sheet','onlyA');
% now I want to do it all over again, except have 'BandC', and then 'CinFeb23' replace all the
% places where 'onlyA' is used as a name.
Lots more steps involved, but I'd like to somehow not have to go in and manually change 'onlyA' every single time as I have a lot of iterations.
My guess is I want to loop and use a cell of strings with the names etc. but not sure how to handle the differewnt cases such as table name, variable name, sheet name etc.
Thansk for any insight!
6 Comments
dpb
on 18 Oct 2023
If your task is important to the organization, then it should be able to be escalated to the level needed...I guess the risk in that is them deciding it isn't that important, after all! (In a prior life I was known for being the one who raised issues nobody knew existed before :) )
Stephen23
on 19 Oct 2023
Edited: Stephen23
on 19 Oct 2023
Some comments to your two specific questions:
- Use arrays. MATLAB is designed around arrays, so rather than storing lots of separate variables you use arrays and indexing. When faced with any challenge, always as yourself "how can I do this with an array?". E.g. use cell arrays and structure arrays to store other arrays.
- Understand that meta-data is data, and data belongs in variables (not in variable names). Once you start forcing meta-data into variable names (e.g. "onlyA", "BandC") then you have already started down a dead-end road to a place you really don't want to be at: there be dragons!
Follow these very simple principles and you will find that your code becomes more generalizable and expandable: i.e. rather than painfully copy-and-pasting-and-modifying code (having copies of code is a sign something is wrong) you use e.g. loops... and then (much as Voss showed) processing more than just three filter criteria does not require more copy-and-pasting but simply expanding one single array to add some new criteria (which the rest of your code automatically processes).
Accepted Answer
Voss
on 18 Oct 2023
Edited: Voss
on 18 Oct 2023
Maybe something like this will help:
T = table(["AAA"; "BB"; "BBB"; "CCC"; "CCC"; "CCC"; "DDD"], ...
[datetime('2023-07-07 10:55:19'); datetime('2023-07-28 09:31:00'); datetime('2023-07-28 14:02:11');
datetime('2023-01-23 17:35:24'); datetime('2023-01-23 17:31:48'); datetime('2023-02-12 03:10:11');...
datetime('2023-03-01 02:00:11')], ...
[9999; 12; 34; 1111; 2222; 3333; 5556], ...
'VariableNames', {'var1', 'var2', 'var3'})
OtherTable = table(["AAA"; "BBB"; "BBB"; "CCC"; "CCC"; "CCC"], ...
[datetime('2023-07-07 10:55:19'); datetime('2023-07-28 09:31:00'); datetime('2023-07-28 14:02:11');
datetime('2023-01-23 17:35:24'); datetime('2023-01-23 17:31:48'); datetime('2023-02-12 03:10:11')], ...
[9999; 12; 34; 1111; 2222; 3333], ...
'VariableNames', {'var1', 'var2', 'var3'})
% define the filters you want to use:
filters = struct( ...
"onlyA", T.var1=='AAA', ...
"BandC", T.var1=='BBB' | T.var1=='CCC', ...
"CinFeb23", T.var1=='CCC' & T.var2=='2023-02-12 03:10:11')
% apply each filter to T, join the result with OtherTable, and
% write the joined table to a sheet in the output file:
filter_names = fieldnames(filters);
var_suffixes = {'_name','_date','_value','_date_again','_value_again'};
output_file = 'final_file.xlsx';
for ii = 1:numel(filter_names)
filtered_T = T(filters.(filter_names{ii}),:);
joined_T = innerjoin(filtered_T,OtherTable,"LeftKeys",["var1"],"RightKeys",["var1"]);
joined_T.Properties.VariableNames = strcat(filter_names{ii},var_suffixes);
writetable(joined_T,output_file,'sheet',filter_names{ii});
end
% check the output file:
sheet_names = sheetnames(output_file);
for ii = 1:numel(sheet_names)
disp(sheet_names(ii) + ":");
readtable(output_file,'Sheet',sheet_names(ii))
end
2 Comments
dpb
on 18 Oct 2023
"Maybe something like this will help:"
Precisely!!! the lesson I've been preaching--separate data from code.
More Answers (0)
See Also
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!