Populating a large struct

Joel Sandler on 29 Mar 2020
Commented: Joel Sandler on 29 Mar 2020
I am populating a struct array with ~100,000 elements and am wondering what the fastest way to do so is. This portion of my code currently takes more than 1 hour to complete, and I feel it should be more efficient. The data is read in from text files 1:n, so it can easily be segmented into pieces. The input data has the following columns: ID (1:x), Date (double), Value (double). As an FYI, although it is not as pertinent to this question, not all IDs have the same number of dates, and some IDs are split up into multiple pieces (not sorted). See the pseudocode below for my current approach.

An alternative approach I have considered (also below) is populating the struct one file at a time and then appending the temporary struct for each file to a "master struct." I think this will speed up the logical indexing of text_file and, perhaps more importantly, the struct population, since the temporary structs will be smaller. Any advice or tips would be appreciated.
Current implementation:
% read in n text files
text_file=readxls...
% preallocate struct with x unique IDs found in text_file
TheStruct(x) = struct('values', []);
for ID = 1:100000
    % query the rows within text_file that have the given ID
    TheStruct(ID).values = text_file(text_file(:,1) == ID, 2:3);
end
Proposed implementation:
% declare blank struct
TheStruct = struct('values', []);
% read in text files 1:n
for file = 1:5
    text_file_n=readxls...
    % preallocate struct with x unique IDs found in text_file n
    tmp_struct(20000) = struct('values', []);
    for ID = 1:20000
        % query the rows within text_file n that have the given ID
        tmp_struct(ID).values = text_file_n(text_file_n(:,1) == ID, 2:3);
    end
    TheStruct = [TheStruct, tmp_struct];
end

Answers (1)

Walter Roberson on 29 Mar 2020
Do not name the variable struct, as that conflicts with the struct call.
Preallocate
TheStruct(x) = struct('values', []); %struct call not struct variable
Your code only shows the field values. If you have other fields, include them in the struct call.
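As a minimal sketch of what including other fields could look like: the field name id and the count x = 1000 below are made up for illustration and do not come from the question.

```matlab
% Sketch only: 'id' is a hypothetical extra field, x is an assumed count.
x = 1000;

% Assigning to element x creates a 1-by-x struct array in one step.
% Every element has both fields; elements 1..x-1 start with empty fields.
TheStruct(x) = struct('values', [], 'id', []);

disp(fieldnames(TheStruct))   % lists the fields present on every element
disp(size(TheStruct))         % size of the preallocated array
```

Note that only the last element receives the values passed to struct; the earlier elements are filled with empty fields, which is fine here because every field starts empty anyway.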
Your proposed second version would get hammered in performance by the need to continually reallocate the structure array. MATLAB can only grow arrays in-place under uncommon circumstances and normally needs to allocate new memory and copy from the old. The preallocation is very important to avoid that.
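One way to keep the file-by-file reading without concatenation is to preallocate the full array once and write each file's results into its own slice. The following is a rough sketch under stated assumptions, not a tested solution for the real data: nFiles, nPerFile, and the randomly generated matrix standing in for each file are all made up for illustration, and it assumes, as the proposed code does, that IDs within each file run from 1 to nPerFile.

```matlab
% Sketch: preallocate once, then fill per-file slices (no concatenation).
% nFiles, nPerFile and the synthetic data are assumptions for illustration.
nFiles   = 5;
nPerFile = 20;      % stands in for ~20,000 IDs per file

% Preallocate the whole struct array up front.
TheStruct(nFiles * nPerFile) = struct('values', []);

for file = 1:nFiles
    % Synthetic stand-in for one file's data: columns are ID, Date, Value.
    ids  = randi(nPerFile, 200, 1);
    data = [ids, rand(200, 2)];

    offset = (file - 1) * nPerFile;   % index just before this file's slice
    for ID = 1:nPerFile
        % Assign into the preallocated slot; the array never grows.
        TheStruct(offset + ID).values = data(data(:, 1) == ID, 2:3);
    end
end
```

Because the array reaches its final size before the loop starts, no iteration has to allocate a new struct array and copy the old one into it.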
  2 Comments
Joel Sandler on 29 Mar 2020
Apologies for the confusion over the variable name "TheStruct" ... there is no conflict between the struct call and the variable name in my code. Also, I am already preallocating in both versions of my pseudocode.
Would your response change if you understood x to be about 20,000 unique records per file and n to be just 5 files?
Joel Sandler on 29 Mar 2020
I've updated my question accordingly.

Release

R2019a