How to plot billions of points efficiently?
I have 3 billion 2D points to plot, which requires a lot of memory, so I can only do this on a server with about 1 TB of RAM. The server does not have a decent graphics card, so figure export is rendered on the CPU and takes more than 5 hours. This procedure needs to be repeated many times, because I need to adjust the axis scales according to the shape of the scatter diagrams. My desktop has a decent graphics card. Could I use the desktop's GPU when the data cannot fit into its memory?
Here is an example of my figures.

6 Comments
Eli4ph
on 4 Jul 2018
Stephen23
on 4 Jul 2018
@Eli4ph: I think it depends on which features of the distribution you really need to keep. For example, subsampling using indexing is trivial and very efficient, but it can easily miss some extrema of the plot. Merging close points is more complex, but it will keep the extrema. So the question comes down to what information you need to obtain from the plot.
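As a rough sketch of the "subsampling using indexing" idea (the variable names x and y are assumptions, as is the 1-in-1000 sampling rate), with the extrema appended explicitly so they are not lost:

```matlab
% x, y: full point set (assumed column vectors already in memory)
n   = numel(x);
idx = randperm(n, round(n/1000));        % uniform random subset of indices
% Append the extrema so the subsample keeps the full data range:
[~, ixmin] = min(x);  [~, ixmax] = max(x);
[~, iymin] = min(y);  [~, iymax] = max(y);
idx = unique([idx, ixmin, ixmax, iymin, iymax]);
scatter(x(idx), y(idx), 1, '.')          % tiny markers for dense data
```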
Eli4ph
on 4 Jul 2018
" In linear scale, it can be easily done by using round(). "
Yes, I also thought of using round, or some kind of tolerance.
"In linear scale, it can be easily done by using round(). But in log scale, I have no clean way to do this. Do you have any idea?"
Convert to linear scale, round to whatever precision you need, get the unique X-Y pairs, and use those indices to plot a subset of the data. With a few billion points I think this might be feasible within the memory you have available, but you would have to try.
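A minimal sketch of that round-then-unique reduction for a log-scale plot (x, y and the rounding tolerance tol are assumptions; rounding is done in log units so the point density is thinned evenly across the log axes):

```matlab
lx  = log10(x);                      % work in the scale you plot in
ly  = log10(y);
tol = 0.01;                          % rounding precision in log10 units
key = [round(lx/tol), round(ly/tol)];
[~, keep] = unique(key, 'rows');     % one representative per rounded cell
loglog(x(keep), y(keep), '.')        % plot only the surviving subset
```

Note that key needs 16 bytes per point, so for 3 billion points this intermediate array alone is about 48 GB; on a 1 TB server that should still fit, but it is worth processing in chunks if memory gets tight.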
Eli4ph
on 4 Jul 2018
Accepted Answer
More Answers (2)
Steven Lord
on 5 Jul 2018
Consider storing your data as a tall array and using the tall-array visualization capabilities introduced in release R2017b.
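A sketch of that approach, assuming the points live in a set of CSV files with columns named x and y (the file pattern and column names are assumptions). binScatterPlot works on tall arrays and evaluates in chunks, so the full data set never has to fit in memory, which also means it could run on the desktop:

```matlab
ds = datastore('points_*.csv');      % out-of-memory access to the files
t  = tall(ds);                       % tall table backed by the datastore
binScatterPlot(t.x, t.y)             % density plot computed chunk by chunk
```

Since the result is a binned density image rather than billions of individual markers, rendering and figure export are also far cheaper than a raw scatter plot.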
