How to preallocate memory for building this structure, indexing fieldnames?
I have several files that each contain a structure called "Result", and I would like to merge all of them into one structure. My difficulty is that the fieldnames directly below "Result." are built from a string identifying an experiment name. Because the experiment names and the number of experiments are unknown at this point, I have to address them by indexing.
So far this indexing works, it merges my data correctly, but preallocation of memory is missing:
START HERE A LOOP THROUGH MANY FILES, RETRIEVING THE NEXT ID
NewData = load(id); % the file referenced in id contains a structure called "Result"
casename = fieldnames(NewData.Result);
cases = size(casename,1);
% preallocation of memory could fit in here, in this line
for caseIndex = 1:cases
Result.(casename{caseIndex}).MyValue = ...
NewData.Result.(casename{caseIndex}).MyValue;
end
END HERE THE LOOP THROUGH MANY FILES
Now I tried to preallocate memory by the following failing attempt:
Result.(casename{1:cases}).MyValue = zeros(cases,1);
This one also failed:
Result.(casename{[1:cases]}).MyValue = zeros(cases,1);
Do you have any idea what the correct syntax should look like?
2 Comments
James Tursa
on 9 Mar 2015
How many files are you talking about? Are the case names in each file unique, or is there potential overlap of names amongst files? There may be a way to do some meaningful pre-allocation for your proposed struct organization, but are we talking about a Result struct with 100's or 1000's (or more) of field names?
Accepted Answer
Stephen23
on 9 Mar 2015
Edited: Stephen23
on 9 Mar 2015
According to the documentation, structures and cell arrays, unlike numeric and character arrays, do not require completely contiguous memory. It is sufficient to preallocate the cell array or structure itself; the arrays stored inside it do not need to be preallocated as well. They can simply be empty, because they are not stored in the same memory location as the structure or cell array itself. You can read more about them here:
It is apparently slower to try to preallocate the data arrays (inside the structure or cell array):
Quoting Jan Simon from the above link: For this reasons it is e.g. useless to "pre-allocate" the elements of a cell array, while pre-allocating the cell itself is strongly recommended. The same also applies to structures.
This topic is also addressed very well by Loren Shure in one of her blogs:
Where she says: Of course it depends on your specifics, but since each field is its own MATLAB array, there is not necessarily a need to initialize them all up front. The key however is to try to not grow either the structure itself or any of its contents incrementally.
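A minimal sketch of what this advice means for the merge in the question: each field is created by direct assignment and holds its own array, with no placeholder zeros. To keep the example self-contained, the file contents are simulated with an in-memory struct; the experiment names `ExpA` and `ExpB` and their values are made up for illustration.

```matlab
% Simulated contents of one loaded file (hypothetical names and values).
NewData.Result.ExpA.MyValue = [1; 2; 3];
NewData.Result.ExpB.MyValue = [4; 5];

Result = struct();                        % empty scalar struct, no fields yet
casename = fieldnames(NewData.Result);    % cell array of experiment names

for caseIndex = 1:numel(casename)
    name = casename{caseIndex};
    % Each field is its own MATLAB array, so assigning the final data
    % directly is enough; filling it with zeros first gains nothing.
    Result.(name).MyValue = NewData.Result.(name).MyValue;
end
```

The attempts in the question fail for a separate reason: `casename{1:cases}` expands to a comma-separated list of several names, while a dynamic field reference `Result.(...)` expects a single name, so the names have to be handled one at a time as in the loop above.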
5 Comments
Stephen23
on 9 Mar 2015
Edited: Stephen23
on 9 Mar 2015
These are two different issues: the number of fields and the number of experiments. What you are doing now mixes these two concepts together, with the resulting difficulties that you are facing.
Your statements, e.g. "that I do not know to which final size (to which quantity of fields) my structure might grow, gathering more and more data while looping through all my data files" do not actually tell us anything about how your data is organized: does each file correspond to one experiment, or multiple experiments? Do the measured values (fields) change between experiments?
You need to seriously consider using a non-scalar structure, depending on how your data is arranged, and in particular based on this question: Are the fields the same for each experiment?
For example, every experiment might have the following four values:
Results.Temperature = ...
Results.Parameters = ...
Results.Sensor1 = ...
Results.Sensor2 = ...
If they are the same, then a non-scalar structure would be the simplest, fastest and neatest option for storing your data.
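A rough sketch of such a non-scalar structure, assuming the number of experiments is known before the loop. The `Name` field, the experiment count, and all values below are placeholders added for illustration; only the four measurement fields come from the example above.

```matlab
nExperiments = 3;   % assumed known up front, e.g. from the number of files

% Passing a cell array as a field value makes struct() return a struct array
% with one element per cell; the non-cell values ([]) are copied to every
% element, so all fields start out empty.
Results = struct('Name', cell(1, nExperiments), ...
                 'Temperature', [], 'Parameters', [], ...
                 'Sensor1', [], 'Sensor2', []);

for k = 1:nExperiments
    Results(k).Name        = sprintf('Exp%d', k);   % placeholder experiment name
    Results(k).Temperature = 20 + k;                % placeholder data
    Results(k).Sensor1     = rand(5, 1);            % placeholder data
end

% One field can be gathered across all experiments without dynamic field names.
allTemperatures = [Results.Temperature];
```

Here the experiment name is stored as data instead of being encoded in a field name, so the number of fields stays fixed while the number of experiments is just the length of the struct array.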
More Answers (1)
Adam
on 9 Mar 2015
Why do you need to pre-allocate? Aren't you simply copying values from one struct to another without any dynamic resizing going on of any individual field of the new struct? I don't see that pre-allocating zeros and then over-writing them with the same size of your actual data will gain you anything.
10 Comments
Adam
on 9 Mar 2015
Stephen's answer is the more complete so the right one to accept, but if you gained something useful from my answer too then that is good :)
James Tursa
on 9 Mar 2015
Edited: James Tursa
on 9 Mar 2015
Some clarification about comments above:
"... Dynamically created fields don't require presizing when you create the struct (and they can't be since a field can contain anything)."
This assumes we are only talking about the field names here (not the field elements themselves). While they don't require pre-allocation, there is a benefit. The amount of benefit depends on the number of fields to be added. Adding field names dynamically (e.g. in a loop) causes MATLAB to re-allocate memory for the field names and add more value addresses iteratively as well ... it is the equivalent of assigning to a cell array index in a loop without pre-allocating the cell array first (cells and structs are stored very similarly internally). Since you are only copying field variable addresses each iteration the copying overhead isn't likely to be much, but it is extra overhead that could potentially be avoided (if one knows all the field names up front).
"... You could try to create the struct upfront with all its fields already containing pre-allocated arrays, but as mentioned this is un-necessary and slower rather than faster if you are simply going to copy data over the top of those pre-sized arrays anyway."
Yes and no. If one is talking only about creating a struct with the proper field names up front, then pre-allocation does make sense and will be faster ... although the overhead savings could be quite small and negligible depending on the number of fields in question (and in fact the extra code to do this may wipe out the small savings altogether). If one is talking about pre-allocating the field elements themselves with variables (e.g., zeros), then this doesn't typically make sense as the references discuss (they get overwritten downstream anyway so the pre-allocation can be a waste of time and resources).
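For the first case (creating only the field names up front), one possible sketch uses `cell2struct`, assuming all names have already been collected into a cell array. The names and values are hypothetical, and as noted above the saving may be negligible for a small number of fields.

```matlab
casename = {'ExpA'; 'ExpB'; 'ExpC'};   % hypothetical names, collected beforehand

% cell2struct maps each row of the cell array to one field (dimension 1),
% so Result gets all of its fields in a single call; each field is empty.
Result = cell2struct(cell(numel(casename), 1), casename, 1);

% Later assignments fill existing fields instead of adding new ones.
Result.ExpB = struct('MyValue', [4; 5]);
```

The fields are left empty on purpose: only the list of field names is fixed in advance, not the data stored in them.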
DISCLAIMER: I add these comments for clarification only. The fact is I am in agreement with others who have already posted that there are better ways to organize the data for easier and more efficient access (using dynamic field names in code is notoriously slow and limits how you can access and manipulate the data).