Import part of a dataset in an HDF5 file by 'member' and/or 'logical array'

Dear friends,
I am trying to open fairly large (~3 GB) HDF5 files in MATLAB and process them in parallel. However, the files are so large that loading them takes a long time, and I run out of RAM because the workspace fills up. I hope there is a way to open just a small part of the data.
For example, if I do h5disp('Data.h5'), I get:
Group '/'
    Dataset 'data'
        Size:  12000
        MaxSize:  Inf
        Datatype:  H5T_COMPOUND
            Member 'A':  H5T_STD_U32LE (uint32)
            Member 'B':  H5T_STD_U32LE (uint32)
            Member 'C':  H5T_STD_U64LE (uint64)
            Member 'D':  H5T_ARRAY
                Size: 15
                Base Type: H5T_STD_U16LE (uint16)
            Member 'E':  H5T_ARRAY
                Size: 30000x15
                Base Type: H5T_STD_U32LE (uint32)
        ChunkSize:  1
        Filters:  deflate(1)
        FillValue:  H5T_COMPOUND
With the high-level function h5read() I can apparently import the data chunk by chunk. However, each chunk contains all members (A, B, C, D, and E). E accounts for most of the size (30000 x 15 uint32 values, about 1.8 MB per record, versus 16 bytes for A, B, and C together) and is the reason for the long import time. Is there any way to import only A, B, or C without loading D and E?
I have one more problem. I know that h5read(filename,ds,start,count,stride) can import just some of the chunks in the file. However, stride seems to allow only a single integer (one fixed spacing) per dimension. Can I import the portion of data defined by an index array, such as [1,100,121,400,3254,...], or by a logical array, such as [1 0 0 1 0 1 0 ...]?
I tried to solve this myself and even looked into the low-level functions, but they are beyond me. Many people in this community seem to have asked similar questions, but I have found no satisfying answer to this problem. If anyone can help, please answer.

Answers (2)

MJFcoNaN on 21 Jun 2022
Hello,
The start, count, stride arguments are suited to slicing a huge matrix. For example, the code below reads only one "vector" from a 2-D matrix, so much less RAM is needed.
% fix the 2nd dimension: read all rows of column 1
data=h5read('yourfile','needed dataset',[1 1],[inf 1]);
% or fix the 1st dimension: read all columns of row 1
data=h5read('yourfile','needed dataset',[1 1],[1 inf]);
Then you can process that slice in MATLAB.
MJFcoNaN on 22 Jun 2022
I may have misunderstood... Don't you want to limit RAM consumption?
PS: "Can I import the portion of data defined by an index array, such as [1,100,121,400,3254,...] or [1 0 0 1 0 1 0 ...]?" There is no direct way, but you can of course read the elements one at a time in a loop, with count set to 1...
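A minimal sketch of that loop, written as a hypothetical helper readSelected (not a built-in MATLAB function). It accepts either 1-based indices or a logical mask, and calls h5read with start set to each selected index and count = 1. Note that each call still returns whole records, so for this compound dataset every member (including E) is loaded, but only for the selected elements.

```matlab
function out = readSelected(filename, dsetname, sel)
%READSELECTED Hypothetical helper: read chosen elements of a 1-D dataset.
%   SEL is either a vector of 1-based indices, e.g. [1 100 121 400],
%   or a logical mask, e.g. logical([1 0 0 1 0 1 0]).
if islogical(sel)
    idx = find(sel);          % convert the mask to 1-based indices
else
    idx = sel;
end
idx = idx(:).';               % iterate over a row vector
parts = cell(numel(idx), 1);
for k = 1:numel(idx)
    % start = idx(k), count = 1: read exactly one element per call
    parts{k} = h5read(filename, dsetname, idx(k), 1);
end
% A numeric dataset gives a column vector; a compound dataset gives an
% N-by-1 struct array (one scalar struct per selected element).
out = vertcat(parts{:});
end
```

Usage, assuming the file and dataset names from the question: s = readSelected('Data.h5', '/data', [1 100 121 400]); then s(2).A is member A of element 100. One h5read call per element is simple but can be slow for thousands of indices.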
한범 on 22 Jun 2022
The dataset's datatype is H5T_COMPOUND and it has only one dimension, so your slicing method does not work.
What I want is to read and import only specific members of these H5T_COMPOUND chunks.



Walter Roberson on 22 Jun 2022
The approach seems to be to use the H5T utilities to create a prototype (a compound memory datatype) containing only the members that you want to read, and then pass that prototype to the low-level HDF5 read routine, H5D.read.
This is not convenient, but it does appear to be possible.
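A minimal sketch of that approach, as a hypothetical helper readMembers (not part of MATLAB). It builds a packed compound memory datatype with H5T.create and H5T.insert that contains only the requested members, then passes it to H5D.read. HDF5 matches compound members by name during conversion, so members absent from the memory type (here D and E) are not copied into the result. The memory type for each member is an assumption you supply, e.g. 'H5T_NATIVE_UINT' for uint32 and 'H5T_NATIVE_ULLONG' for uint64 on typical platforms. Because the chunks store whole records and are deflate-compressed, HDF5 still has to read and decompress every chunk, so this mainly saves RAM; read time may not drop much.

```matlab
function s = readMembers(filename, dsetname, names, memTypes)
%READMEMBERS Hypothetical helper: read selected members of a compound dataset.
%   NAMES    - cell array of member names, e.g. {'A','B','C'}
%   MEMTYPES - matching cell array of native HDF5 type names (assumed),
%              e.g. {'H5T_NATIVE_UINT','H5T_NATIVE_UINT','H5T_NATIVE_ULLONG'}
fid = H5F.open(filename, 'H5F_ACC_RDONLY', 'H5P_DEFAULT');
did = H5D.open(fid, dsetname);

% Copy each member type and record its size in bytes
ids = cell(size(names));
sizes = zeros(1, numel(names));
for k = 1:numel(names)
    ids{k} = H5T.copy(memTypes{k});
    sizes(k) = H5T.get_size(ids{k});
end

% Build a packed compound "prototype" holding only the wanted members
memtype = H5T.create('H5T_COMPOUND', sum(sizes));
offset = 0;
for k = 1:numel(names)
    H5T.insert(memtype, names{k}, offset, ids{k});
    offset = offset + sizes(k);
end

% Read the whole dataspace; only the listed members end up in the struct
s = H5D.read(did, memtype, 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT');

% Release HDF5 identifiers
H5T.close(memtype);
for k = 1:numel(ids)
    H5T.close(ids{k});
end
H5D.close(did);
H5F.close(fid);
end
```

Usage, assuming the names from the question: s = readMembers('Data.h5', '/data', {'A','B','C'}, {'H5T_NATIVE_UINT','H5T_NATIVE_UINT','H5T_NATIVE_ULLONG'}); returns a struct whose fields A, B, and C each hold one value per record.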


Release

R2021a
