Improving speed of large interpolation (33 million to 33 billion elements)

I am importing a bin file with ~33 million elements and need to interpolate by a factor of 1000 between the existing elements. As you can imagine, this isn't speedy and quickly consumes all the RAM on my machine. The code looks something like this:
fid = fopen('ThirtyMillionElementFile.bin','r');
data = fread(fid,'double'); % read the whole file; precision depends on how the file was written
fclose(fid);
data = interp(data,1000); % If I assign the interpolated result to a new variable such as...
% interpolatedData = interp(data,1000); the rest of the code breaks (seems related to memory issues).
filteredData = filter(importedFilterValues1, importedFilterValues2, data);
Overall this works, but it takes about 30 seconds. If I feed it a larger file, such as 66 million elements, and interp by 1000, the computer locks up completely, with all 32 GB of RAM and 100% of the disk in use. Since I need to run larger files, how can I go about speeding this up and improving the performance? In addition, I am not able to modify the filter in any way; the whole idea is to use a legacy filter design.
I have found that tall arrays are not compatible with interp. And because of the 6 operations done after interp and before the gather, tall arrays slow the code down to about an hour of run time instead of 3 minutes total.
gpuArray doesn't help because I'm limited to 4 GB of GPU memory, which causes the code to error out.
I absolutely can and will add more RAM, but I am curious to know if there are other solutions that can be used in conjunction with increased hardware specs.
(Side Question: How is MATLAB able to store these massive arrays? There is no way I have enough RAM.)
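For scale, here is a rough back-of-the-envelope estimate of just the interpolated array (assuming double precision, 8 bytes per element, which is an assumption on my part):
% Rough memory estimate for the interpolated array, assuming doubles (8 bytes per element)
nIn   = 33e6;               % elements read from the bin file
r     = 1000;               % interpolation factor
bytes = nIn * r * 8;        % interp output has roughly nIn*r elements
fprintf('Interpolated array alone: ~%.0f GB\n', bytes/1e9);   % ~264 GB
That is far beyond the 32 GB of physical RAM, so the operating system ends up paging to disk, which would line up with the 100% disk usage I'm seeing.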
  5 Comments
David Almodovar on 8 Jun 2020
First off, you were correct about my typos in the zeros line; I've corrected the original comment. This is what it actually read:
allValues = zeros(1,length(data1)*2);
So I did some timing tests. Using your line:
allValues = reshape([reshape(data,1,[]); reshape(data1,1,[])],1,[]);
%time elapsed = 13.126 seconds
Then my original code:
j = 1;
allValues = zeros(1,length(data1)*2);
for i = 1:length(data1)
    allValues(j) = data(i);
    j = j+1;
    allValues(j) = data1(i);
    j = j+1;
end
%time elapsed = 12.609 seconds
Then I tried using numel with my original code:
j = 1;
allValues = zeros(1,numel(data1)*2);
for i = 1:length(data1)
    allValues(j) = data(i);
    j = j+1;
    allValues(j) = data1(i);
    j = j+1;
end
%time elapsed = 8.564 seconds
I then tried another matrix method:
allValues = [data;data1];
allValues = allValues(:);
%time elapsed = 8.242 seconds
So the interleaving method does impact the time, but the interpolation still takes over a minute, which is where the majority of the memory is consumed.
I also attempted the above tests using tall arrays, but I killed the program after 45 minutes on the gather step. I may just not understand how tall arrays work yet.
Do you have any more ideas on how to break the data up into chunks so I could potentially feed it into the GPU? I only have 4 GB of RAM on my P1000.
David Goodmanson on 9 Jun 2020
Edited: David Goodmanson on 9 Jun 2020
Hi David,
For the decimation, do you mean the following?
filteredData = filteredData(1:1000:length(filteredData));
You have not mentioned the lengths of importedFilterValues1 and importedFilterValues2, but from the very fact that you have not done so, I am speculating that those lengths are reasonable: much less than the size of data and data1, and fitting into memory with no problem. Assuming that is the case, is there any reason that you can't proceed section by section? interp generally does not depend on data that is a long distance away in the array, the filter function can be applied section by section, and so can every other step that you show (with the help of a moderate-sized buffer).
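A minimal sketch of what that section-by-section loop could look like (the file name, chunk size, and 'double' read precision are placeholders; interp applied to isolated sections will show small edge effects at the section boundaries unless you add some overlap; filter's state output is what lets the filtering continue across sections):
r        = 1000;                % interpolation factor
chunkLen = 1e6;                 % input samples per section (tune to fit in RAM)
zi       = [];                  % filter state carried from one section to the next
out      = [];                  % decimated result, grown section by section
fid = fopen('ThirtyMillionElementFile.bin','r');
while true
    x = fread(fid, chunkLen, 'double');   % read the next section
    if isempty(x), break; end
    y = interp(x, r);                     % interpolate this section only
    if isempty(zi)                        % first section: zero initial filter state
        [y, zi] = filter(importedFilterValues1, importedFilterValues2, y);
    else                                  % later sections: continue the filter
        [y, zi] = filter(importedFilterValues1, importedFilterValues2, y, zi);
    end
    out = [out; y(1:r:end)];              % decimate by r (grows out each pass)
end
fclose(fid);
The decimated result is only one sample per input sample, so it fits in memory easily; preallocating out instead of growing it inside the loop would tighten things up further.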


Answers (0)
