What is the best way to import thousands of data files into multiple matrices?

I have a folder with 10,000 .mat files, each of which contains a structure with 8 different variables (one value each).
Each file is named according to the row and column it corresponds to in a 330x300 matrix (e.g. 123_25 is the file with the values for the 123rd row and 25th column).
I have 8 matrices of size 330x300, one for each variable, all filled with NaNs. I would like to know which technique is best for inserting the values from my .mat files into these matrices.
So far I have tried 2 methods:
  • A double loop with j and k as the row and column indices respectively (for every j, k loops from 1 to 300). Every time k changes, the corresponding .mat file is loaded and its values are inserted into the corresponding matrices. The 8 matrices are loaded BEFORE the loops.
%% Example
% load the 8 matrices here
for j = 1:330
    for k = 1:300
        load(sprintf('%d_%d.mat', j, k));   % loads the file's contents into the workspace
        matrix_1(j,k) = j_k.variable1;      % j_k stands for the structure loaded from the file
        matrix_2(j,k) = j_k.variable2;
        matrix_3(j,k) = j_k.variable3;
        % etc. ...
    end
end
  • The same double loop, but instead of loading the 8 matrices I use the matfile() function and replace only the NaN at index (j,k) with the values from the corresponding .mat file. As before, the corresponding .mat file is loaded at every iteration.
%% Example
% I create a matfile() object for each matrix.
% list_names is a list with the names of the variables; in this example it would go from matrix_1 to matrix_8.
for i = 1:numel(list_names)
    save([temp_folder list_names{i} '.mat'], list_names{i}, '-v7.3');
    m.(list_names{i}) = matfile([temp_folder list_names{i} '.mat'], 'Writable', true);
end
for j = 1:330
    for k = 1:300
        load(sprintf('%d_%d.mat', j, k));
        m.matrix_1(j,k) = j_k.variable1;
        m.matrix_2(j,k) = j_k.variable2;
        % etc. ...
    end
end
Reading the matfile documentation, I found that the most efficient approach would be to load everything into memory at once and do all the replacements there. Please note that the 10,000 files together are never larger than 250 MB, and my PC has 16 GB of RAM.
I would like to try another method which would be:
Loading all the .mat files into memory, loading the 8 matrices, and then inserting all the values with a loop, without loading a file at every iteration. However, I face a difficulty: my .mat files have different names, but they are all constructed the same way. So when I load a file into the workspace and then load another one, the second replaces the first, hence I cannot have 2 files loaded at the same time. Is there a way to load these files altogether at once even though they are built the same way, or is there a way to create dynamic names for variables (I know it is a bad idea) so I can load more than 1 file at a time ?
Finally, which method would be the fastest ? Maybe there is another one I didn't think of ?
I hope I was clear in my explanations, if not I apologize and I will try to explain again as clearly as possible.
Have a good day and thank you !

Accepted Answer

Stephen23 on 18 May 2020
Edited: Stephen23 on 18 May 2020
I would not use either of the first two methods, because they are an easy way to get latent, almost undetectable errors in your data. You might assume that your data is perfect and write your code accordingly, but that is not a robust approach. Consider what would happen if one of the files is missing any of those variables: the variable value from the previous loop iteration would get used without any warning whatsoever. No doubt you will say "but my data are perfect and are not missing anything..." sure, sure.
Much more robust (and more efficient) is to ensure that every file imports the required data:
S = load(...);
and then simply access the fields of the structure S.
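A minimal sketch of that pattern, under some assumptions that are not in the original answer: the file name 123_25.mat and the names variable1 and variable2 are taken from the question, the values are assumed to be stored as top-level variables in the file, and the isfield check with its error message is only an illustration. To stay self-contained, the sketch first writes a small demo file into a temporary folder.

```matlab
% Demo file standing in for one of the real files (names are assumptions).
folder = tempname;
mkdir(folder);
variable1 = 3.5;
variable2 = 7;
save(fullfile(folder, '123_25.mat'), 'variable1', 'variable2');

% Load into a structure instead of directly into the workspace.
S = load(fullfile(folder, '123_25.mat'));

% Check that the expected variables are present before using them, so a
% missing variable raises an error instead of silently reusing an old value.
required = {'variable1', 'variable2'};
missing = required(~isfield(S, required));
if ~isempty(missing)
    error('Missing variables: %s', strjoin(missing, ', '));
end

% Insert the values at the row and column given by the file name.
matrix_1 = NaN(330, 300);
matrix_2 = NaN(330, 300);
matrix_1(123, 25) = S.variable1;
matrix_2(123, 25) = S.variable2;
```

If the values sit inside a nested structure in the file, one more level of field access would be needed after the load.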
This also answers your next question:
"is there a way to load these files altogether at once even though they are built the same way"
Of course, this is MATLAB, so just use a structure array! If every mat file contains exactly the same variable names (as they should) then your task is easy, you can just do this:
S(j,k) = load(...)
If the mat files contain different variable names then go and yell at the person who created them.
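As a rough sketch of the structure-array idea, the example below loads a few demo files into one structure array and then copies the values into NaN-filled matrices. Several things here are assumptions rather than part of the answer: the demo files, the names variable1 and variable2, the small matrix size, and the choice to index the structure array by file number n (with sub2ind mapping back to rows and columns) instead of by (j,k), which is made because the question mentions fewer files than matrix elements.

```matlab
% Build a few demo files (stand-ins for the real data; names are assumptions).
folder = tempname;
mkdir(folder);
demo = [1 1; 2 3; 4 2];                  % [row column] of each demo file
for n = 1:size(demo, 1)
    variable1 = 10 * demo(n, 1) + demo(n, 2);
    variable2 = -variable1;
    save(fullfile(folder, sprintf('%d_%d.mat', demo(n, 1), demo(n, 2))), ...
        'variable1', 'variable2');
end

nRows = 4;                               % 330 in the question
nCols = 3;                               % 300 in the question

% Load every file once into a single structure array.
files = dir(fullfile(folder, '*_*.mat'));
rows = zeros(numel(files), 1);
cols = zeros(numel(files), 1);
for n = 1:numel(files)
    rc = sscanf(files(n).name, '%d_%d.mat');   % [row; column] from the name
    rows(n) = rc(1);
    cols(n) = rc(2);
    S(n) = load(fullfile(folder, files(n).name));
end

% Copy the loaded values into the NaN-filled matrices in one step each.
idx = sub2ind([nRows nCols], rows, cols);
matrix_1 = NaN(nRows, nCols);
matrix_2 = NaN(nRows, nCols);
matrix_1(idx) = [S.variable1];
matrix_2(idx) = [S.variable2];
```

The assignment S(n) = load(...) raises an error if a file has different variable names from the earlier ones, which is the kind of early warning the answer argues for.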
"is there a way to create dynamic names for variables ...so I can load more than 1 file at a time ?"
Importing multiple files can be done simply and efficiently by indexing into one array; there is absolutely no need to use ugly, slow, inefficient, complex, overused-by-beginners dynamic variable names.
V.D-C on 18 May 2020
Thank you again !
I only want to import the variables in the RESULT substructure. The other ones are not important but have to be saved in case somebody wants to take a look at the other parameters.
I tested your suggestion and it works !! Hopefully this solution will stick with me for my whole programming life :)
So thank you very much for your answers !!

More Answers (1)

Steven Lord on 18 May 2020
Rather than creating 8 individual variables why not create a 3-dimensional array of size [330 300 8]?
Z = NaN(6, 5, 4);
for pages = 1:4
    for columns = 1:5
        for rows = 1:6
            Z(rows, columns, pages) = (rows*pages)+(columns^(pages-1));
        end
    end
end
Z(4, 2, 3) % 4*3 + 2^2 = 16
Although with the way your data is ordered, you'd want pages to be the innermost loop. That way you can load your data as soon as rows and columns are defined and iterate through the loaded data in the pages loop, filling in the appropriate elements in Z at each iteration.
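A possible sketch of that loop order applied to the question's files, with pages as the innermost loop. The demo file, the names variable1 to variable3, the small array size, and the isfile check for missing files are all assumptions made for illustration; the real data would use 8 names and a 330x300x8 array.

```matlab
% Demo file standing in for one real file (names are assumptions).
folder = tempname;
mkdir(folder);
variable1 = 1.5;
variable2 = 2.5;
variable3 = 3.5;
save(fullfile(folder, '2_3.mat'), 'variable1', 'variable2', 'variable3');

names = {'variable1', 'variable2', 'variable3'};   % 8 names in the question
nRows = 4;                                          % 330 in the question
nCols = 5;                                          % 300 in the question
Z = NaN(nRows, nCols, numel(names));

for rows = 1:nRows
    for columns = 1:nCols
        filename = fullfile(folder, sprintf('%d_%d.mat', rows, columns));
        if ~isfile(filename)
            continue   % no file for this element, so the NaNs stay in place
        end
        S = load(filename);                 % load once per (row, column)
        for pages = 1:numel(names)          % one page per variable
            Z(rows, columns, pages) = S.(names{pages});
        end
    end
end
```

Each page Z(:, :, p) then plays the role of one of the 8 separate matrices.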
  1 Comment
V.D-C on 18 May 2020
Hello, thank you for your answer !
I didn't think at all of doing it this way, I will try it !
