When is it better to use a multi-level-struct than a table?

I am processing data logged in ~4000 text files. I initially read the data into a multi-level structure because the hierarchical nature seemed to make more sense for how I collected the data: 5 different configurations, each tested at 20 different positions, with each position containing 40 angles, each angle being a separate experiment with environmental parameters (time, temp, speed) and 42 data channels, each having a mean or RMS value, a tare value, and a standard deviation (I can calculate these as I read the files and then store only scalars in the struct).
I abandoned the struct because reading the data back out was too burdensome. For instance, if I want to plot data channel 5 mean against data channel 1 mean for a certain angle (say 10deg) at all locations of one config, I thought I would use something like:
% pseudo code just for illustration, haven't tried, wouldn't work
x = [data.config(3).pos(:).ang(10).chan(5).mean];
y = [data.config(3).pos(:).ang(10).chan(1).mean];
plot(x,y)
But what I learned is that you cannot address more than one level of a struct at a time. Instead, you must run a series of nested loops, one for each criterion you want to query by, and move its contents into a temporary variable for the next loop to operate on.
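As a minimal sketch of what that extraction looks like in practice, the following builds a tiny made-up nested struct (1 config, 3 positions, 2 angles, 2 channels; the sizes and values are assumptions for illustration, not the real data set) and pulls one channel's mean across all positions, first with an explicit loop and then with arrayfun over the pos struct array:

```matlab
% Hypothetical miniature data set; values encode their own indices
data = struct();
for p = 1:3
    for a = 1:2
        for c = 1:2
            data.config(1).pos(p).ang(a).chan(c).mean = 100*p + 10*a + c;
        end
    end
end

% Explicit loop over positions, fixed angle 2 and channel 1
posArr = data.config(1).pos;          % temporary variable for one level
x = zeros(1, numel(posArr));
for p = 1:numel(posArr)
    x(p) = posArr(p).ang(2).chan(1).mean;
end

% Same idea with arrayfun, fixed angle 2 and channel 2
y = arrayfun(@(s) s.ang(2).chan(2).mean, posArr);
```

Only the level that varies (pos) needs iterating here; every additional level swept over would add another loop or another arrayfun call.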
With a table, on the other hand, I can store everything in one large flat table where each row is an angle (one row for each experiment) and just have a ton of columns. The downside to this in my mind is that the table now contains so much more repetitive data. For instance, the struct could parent all of the sub-structs back to one of the five configurations, but the table must have ~4000 extra cells so that each row knows what config it is a member of. The upside is that querying out data is much simpler, e.g.:
% also example code which I haven't tried, may not be correct
x = data.mean(config==3 & ang==10 & chan==5);
y = data.mean(config==3 & ang==10 & chan==1);
plot(x,y)
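For comparison, here is a small runnable sketch of that kind of query. It assumes a "long" layout with one row per config/position/angle/channel combination, which is what the query above implies (the wide, one-row-per-experiment layout described earlier would instead select a column by name). The sizes, the values, and the variable name meanVal are made up for illustration:

```matlab
% Hypothetical miniature table: 2 configs, 3 positions, 2 angles, 2 channels
[config, pos, ang, chan] = ndgrid(1:2, 1:3, 1:2, 1:2);
T = table(config(:), pos(:), ang(:), chan(:), ...
    'VariableNames', {'config', 'pos', 'ang', 'chan'});

% Made-up measurement that encodes its own indices
T.meanVal = 1000*T.config + 100*T.pos + 10*T.ang + T.chan;

% Logical indexing: note the T. prefix on every variable in the condition
x = T.meanVal(T.config == 2 & T.ang == 1 & T.chan == 2);
y = T.meanVal(T.config == 2 & T.ang == 1 & T.chan == 1);
```

Both x and y come back as column vectors with one element per position, ready to pass to plot.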
So I am guessing it is a matter of preference, but going through all of this is making me wonder: when and why do you choose a multi-level struct over a table, and are there other, even better options?
  6 Comments
Stephen23 on 23 Oct 2023
Edited: Stephen23 on 23 Oct 2023
"If you have S(J).A(K).B(L) and you are doing sweeps over J K L..."
What relevance does that have to the specific example given by the OP? Not much.
"there would be another arrayfun version that iterates over structure members that just isn't coming to mind at the moment but I am sure is possible"
It is possible if you pass scalar structures as the function inputs. But warning: tectonic plates move much faster.
(hint: that approach is the partner to version 1, just like version 3 is the partner to version 2)
"So getfield() is one of the options that does not require creating temporary variables (other than internally)"
And yet... it is not really an option. None of those "versions" actually deliver what the OP requires: the numeric vectors x and y (for plotting, as the OP clearly states).
Versions 1 & 2 are the nested loops the OP already knows about. Version 3 (very slowly) creates nested cell arrays inside nested cell arrays inside another cell array. Flattening multiply nested cell arrays (to get the numeric vectors x & y, which are what the OP needs) requires either multiple comma-separated lists (with associated temporary variables) or more nested loops or recursion... or some other even worse kind of horror. So you are right back to square one.
@cdlapoin: these examples should make it quite clear why you should be using tables.
cdlapoin on 23 Oct 2023
Stephen, I see your point and it's well taken. A couple thousand duplicate values is not really a problem if the performance is fine, and my datasets are not so large that a small performance hit would be much of a problem anyway.

Accepted Answer

Walter Roberson on 23 Oct 2023
Edited: Walter Roberson on 23 Oct 2023
We are discussing in https://www.mathworks.com/matlabcentral/answers/556024-what-frustrates-you-about-matlab-2#answer_1337061 why row-by-row access to a table can be much slower than some of the alternatives. A lot is going to depend on how you use the data after it has been put into the data structure.
If all of the data is numeric, using a numeric array will typically be fastest... but again it depends on the data access patterns. Sometimes cell arrays are faster, as recently explored in https://www.mathworks.com/matlabcentral/answers/2035921-access-time-of-data-in-cell-array-vs-matrix#answer_1336881
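One way to read the numeric-array suggestion for a fully crossed experiment is an N-D array with one dimension per factor. This is only a sketch under that assumption (it is not spelled out in this answer), with made-up sizes and values:

```matlab
% Hypothetical sizes: 2 configs, 3 positions, 2 angles, 2 channels
nConfig = 2; nPos = 3; nAng = 2; nChan = 2;
[c, p, a, ch] = ndgrid(1:nConfig, 1:nPos, 1:nAng, 1:nChan);

% M(config, pos, ang, chan) holds the channel mean; values encode indices
M = 1000*c + 100*p + 10*a + ch;

% All positions for config 2, angle 1: channel 2 against channel 1
x = reshape(M(2, :, 1, 2), [], 1);
y = reshape(M(2, :, 1, 1), [], 1);
```

This stores no repeated config or angle labels at all, but the meaning of each dimension lives only in the code, and it fits only when every combination of factors was actually measured.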
  2 Comments
cdlapoin on 23 Oct 2023
I see, so I could just hold all the same data in a flat matrix and keep track of my column names separately, and that might be faster in some cases, but in most cases the difference would be negligible. (That is my reading of the linked topic anyway.)
I'm not really hearing that there is ever a time where the nested data structures would be the better option.
Just using tables from now on may be what I go with then. I like the readability of calling variables by name, and I like having the workspace kept clean by storing those variables within the table.
Stephen23 on 24 Oct 2023
Edited: Stephen23 on 24 Oct 2023
Use tables.
Most likely you will spend far more time writing, debugging, and maintaining your code than your code will spend running. Therefore making sure that your data and code are clear and correct is of the utmost importance, and will save you time overall. Tables are a great way to achieve that clarity.
"I'm not really hearing that there is ever a time where the nested data structures would be the better option."
Something like this would be difficult without nested structures or a similar data type:
It implements a trie (https://en.wikipedia.org/wiki/Trie) using actual MATLAB (not low-level) code.
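The code referred to above is not reproduced here. As a separate, minimal illustration of why nested structs suit a trie, the sketch below stores each character as a field name, so every word becomes a path of nested structs. The word list and the isWord marker field are assumptions for illustration, and the sketch only handles words made of characters that are valid field names:

```matlab
% Build the trie: each word becomes a path of nested single-character fields
trie = struct();
words = {'cat', 'car', 'dog'};
for w = 1:numel(words)
    keys = num2cell(words{w});                  % e.g. {'c','a','t'}
    trie = setfield(trie, keys{:}, 'isWord', true);
end

% Look up some queries by walking down one character at a time
queries = {'car', 'ca', 'dog'};
found = false(size(queries));
for q = 1:numel(queries)
    node = trie;
    ok = true;
    for ch = queries{q}
        ok = isfield(node, ch);
        if ~ok, break; end
        node = node.(ch);
    end
    found(q) = ok && isfield(node, 'isWord');   % prefix alone is not a word
end
```

Here the depth and branching differ from word to word, which is exactly the kind of irregular hierarchy that a flat table does not represent naturally.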

Release

R2023b