Alternatives for using EVAL to access data in multi-layered struct?
4 views (last 30 days)
Show older comments
So, I have read many forum topics regarding the use of EVAL and it being bad practice, though in my situation I feel it makes my code more compact and actually more readable. Nevertheless, I was wondering whether there are any alternatives using for example indexing, though I do not yet see how I would implement that in this case.
In short; I have a Matlab class which imports data from a given folder, possibly containing a multitude of files of different formats (e.g. CSV, XLSX, or other). The data is sorted in a structure according "data.(filetype).(filename).(tab)" with 'tab' applying to e.g. Excel Workbook files but being omitted for CSV files. Each files' data is then stored into a table, textdata and headers, which adds another layer. To access specific data I use a recursive function to return the structure tree as strings like {'data.xlsx.file1.tab1'; 'data.xslx.file1.tab2'; 'data.xlsx.file1.tab1'}.
I currently use EVAL to acquire specific data such as 'variable1' from all tables, in order to avoid constantly having to determine the fieldnames and using a 3 or 4 layered for-loop, which becomes additionally cumbersome when having to include exceptions like for example missing variables. At the moment I am also pondering about how to write data back to specific fields without have to use EVAL again or some form of 'string-split-at-the-dots' and then dynamically inputting the fieldnames. But then again, maybe my whole approach for using such a multi-layered struct is already poor to begin with, so any suggestions and/or alternatives are more than welcome.
0 Comments
Accepted Answer
Stephen23
on 19 Oct 2018
Edited: Stephen23
on 19 Oct 2018
One simple solution is to use getfield and setfield to access nested structures. Instead of returning the structure location as one character vector like this:
S = 'data.xlsx.file1.tab1'
you should return it in a cell array of char vectors, like this:
C = {'xlsx','file1','tab1'}
(this will require only a simple change to the recursive function). Then you can trivially access the data using getfield:
getfield(data,C{:})
and that is all! Here is a simple working demonstration:
>> data.xlsx.file1.tab = 1;
>> data.xlsx.file2.tab = 2;
>> data.csv.file2 = 3;
>> C = {'xlsx','file2','tab'};
>> getfield(data,C{:})
ans = 2
No ugly loops, no evil eval, no problems!
"But then again, maybe my whole approach for using such a multi-layered struct is already poor to begin with..."
Personally I am not a big fan of nested structures, and I notice that they tend to be overused by beginners wanting to reflect the minutae of how they see their data-organization. One of the main risks (which you are doing) is encoding meta-data like filenames and tab names into the code (as filednames). This is a bad way to write code: it make code complex and makes accessing that meta-data slow and buggy. Your approach is very fragile, e.g. because there are many filenames that are not valid fieldnames: what would your code do with the filename a-1.csv ? Or a.2.csv? The approach of mixing meta-data (like filenames and tab names) into data is simply flawed, and should be avoided. Meta-data is data, and it should be stored as data in it own right. Consider those example filenames: if we put them into a structure field named filename, then the code will never break depending on the name itself:
S.filename = 'a-1.2.3-4.csv'
You should consider that a table is a very powerful option and has many advantages for processing groups of data.
Personally I would probably use a single non-scalar structure, where the meta-data are simply encoded as data in fields:
data(1).type = 'xlsx'
data(1).name = 'file1'
data(1).tab = 'tab1'
data(1).data = ...
data(2).type = 'csv'
data(2).name = 'file2'
data(2).tab = [];
data(2).data = ...
This would make accessing and processing the data quite simple, and has some neat syntaxes that you will find very handy:
2 Comments
Philip Borghesani
on 19 Oct 2018
Edited: Philip Borghesani
on 19 Oct 2018
This does work fine and is simple code however in the long run using this along with setfield will produce quite a bit slower and possibly more difficult to restructure code. If the performance is acceptable then this is a perfectly fine solution. It can also be mixed with handle use in some spots for gradual improvement.
More Answers (1)
Philip Borghesani
on 18 Oct 2018
I think this is where you went wrong: "To access specific data I use a recursive function to return the structure tree as strings like {'data.xlsx.file1.tab1'; 'data.xslx.file1.tab2'; 'data.xlsx.file1.tab1'}."
Instead store your table objects as handle objects inside the structure data. Then have the recursive function return the handle(s) to the data object(s) in a cell array or object array. Access will then be fast and there will be no need for eval to read or modify the table objects.
See Also
Categories
Find more on Structures in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!