Populating a large struct
I am populating a struct with ~100,000 elements and am wondering what the fastest way to do so is. This portion of my code currently takes more than an hour to complete, and I feel it should be more efficient. The data can easily be segmented into pieces, as it is read in from text files 1:n. The input data has the following columns: ID (1:x), Date (double), Value (double). Less pertinent to the question, but as an FYI: not all IDs have the same number of dates, and some IDs are split up into multiple pieces (not sorted). See the pseudocode below for the current approach. An alternative approach I have considered (also below) is populating the struct one file at a time and then appending the temporary struct for each file to a "master struct." I think this will speed up both the logical indexing of text_file and, perhaps more importantly, the struct population, as the temporary structs will be smaller. Any advice or tips would be appreciated.
Current implementation:
% read in n text files
text_file=readxls...
% preallocate struct with x unique IDs found in text_file
TheStruct(x) = struct('values', []);
for ID = 1:100000
    % query the rows within text_file that have the given ID
    TheStruct(ID).values = text_file(text_file(:,1) == ID, 2:3);
end
Proposed implementation:
% declare an empty (0x0) struct array; struct('values',[]) would be 1x1
% and leave a blank first element after concatenation
TheStruct = struct('values', {});
% read in text files 1:n
for file = 1:5
    text_file_n=readxls...
    % preallocate struct with x unique IDs found in text_file n
    tmp_struct(20000) = struct('values', []);
    for ID = 1:20000
        % query the rows within text_file n that have the given ID
        tmp_struct(ID).values = text_file_n(text_file_n(:,1) == ID, 2:3);
    end
    % append this file's structs to the master struct
    TheStruct = [TheStruct, tmp_struct];
end
Answers (1)
Walter Roberson
on 29 Mar 2020
Do not name the variable struct, as that conflicts with the struct call.
Preallocate:
TheStruct(x) = struct('values', []); %struct call not struct variable
You only show the field values in your code. If you have other fields, include them in the struct call.
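As a minimal sketch of preallocating with more than one field: the second field name `dates` and the small value of `x` below are made up for illustration and do not come from the original code.

```matlab
% Hypothetical example: x and the 'dates' field are assumptions.
x = 4;                                  % number of unique IDs (illustrative)
clear TheStruct                         % avoid reusing an earlier variable

% Assigning to element x creates a 1-by-x struct array whose
% fields are all empty in every element.
TheStruct(x) = struct('values', [], 'dates', []);

disp(size(TheStruct))                   % size of the preallocated array
disp(fieldnames(TheStruct))             % fields present in every element
```

One thing to watch for: when a field value passed to struct is a cell array, struct creates one element per cell, so a cell default has to be wrapped in an extra pair of braces, for example struct('values', {{}}).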
Your proposed second version would get hammered in performance by the need to continually reallocate the structure array. MATLAB can only grow arrays in-place under uncommon circumstances and normally needs to allocate new memory and copy from the old. The preallocation is very important to avoid that.
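One way to keep the file-by-file reading while still allocating only once might look like the sketch below. The sizes, the random stand-in for each file's data, and the assumption that every file holds IDs 1 to idsPerFile are all made up for illustration; the real reading step is not shown.

```matlab
% Sketch only: nFiles, idsPerFile and the random data are assumptions.
nFiles     = 3;
idsPerFile = 4;
total      = nFiles * idsPerFile;

clear TheStruct
TheStruct(total) = struct('values', []);    % allocate the whole array once

for file = 1:nFiles
    % Stand-in for reading file n; columns are ID, Date, Value
    text_file_n = [randi(idsPerFile, 20, 1), rand(20, 2)];

    offset = (file - 1) * idsPerFile;       % slots used by earlier files
    for ID = 1:idsPerFile
        rows = text_file_n(:, 1) == ID;     % rows belonging to this ID
        TheStruct(offset + ID).values = text_file_n(rows, 2:3);
    end
end
```

Each file writes into its own slice of the preallocated array, so nothing is concatenated and the struct array itself is never resized inside the loop.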