h5create

Create HDF5 dataset

Syntax

h5create(filename,ds,sz)

h5create(filename,ds,sz,Name=Value)

Description

h5create(filename,ds,sz) creates a dataset ds of size sz in the HDF5 file filename. The dataset name ds includes the full path to the dataset.


h5create(filename,ds,sz,Name=Value) specifies options using one or more name-value arguments. For example, ChunkSize=[5 5] specifies 5-by-5 chunks of the dataset that can be stored individually in the HDF5 file.


Examples


Create Fixed-Size Dataset


Create a fixed-size 100-by-200-by-300 dataset myDataset with full path /g1/g2/myDataset.

h5create("myFile.h5","/g1/g2/myDataset",[100 200 300])

Write data to myDataset. Because the dimensions of myDataset are fixed and h5write is called without start and count arguments, the data must have the same size as the entire dataset.

myData = ones(100,200,300);
h5write("myFile.h5","/g1/g2/myDataset",myData)
h5disp("myFile.h5")

HDF5 myFile.h5 
Group '/' 
    Group '/g1' 
        Group '/g1/g2' 
            Dataset 'myDataset' 
                Size:  100x200x300
                MaxSize:  100x200x300
                Datatype:   H5T_IEEE_F64LE (double)
                ChunkSize:  []
                Filters:  none
                FillValue:  0.000000
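To confirm that the write succeeded, you can read the dataset back with h5read. This minimal sketch repeats the example in a temporary file (the use of tempname is an assumption, chosen so that myFile.h5 is not overwritten) and compares the data read back with the data written.

```matlab
% Sketch: round-trip a fixed-size dataset through a temporary HDF5 file.
fname = [tempname '.h5'];                   % unique temporary file name (char)
h5create(fname,"/g1/g2/myDataset",[100 200 300])
myData = ones(100,200,300);
h5write(fname,"/g1/g2/myDataset",myData)

% Read the entire dataset back and compare it with the original array
readBack = h5read(fname,"/g1/g2/myDataset");
roundTripOK = isequal(readBack,myData);
readSize = size(readBack);

% Remove the temporary file
delete(fname)
```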

Create and Compare Datasets with Compression


Create two HDF5 files, each containing a 1000-by-2000 dataset. Use the deflate filter with maximum compression for the first dataset, and use the SZIP filter with entropy encoding for the second. You must specify a chunk size when applying compression filters.

h5create("myFileDeflate.h5","/myDatasetDeflate",[1000 2000], ...
         ChunkSize=[50 80],Deflate=9)
h5create("myFileSZIP.h5","/myDatasetSZIP",[1000 2000], ...
         ChunkSize=[50 80],SZIPEncodingMethod="entropy")

Display the contents of the two files and observe the different filters.

h5disp("myFileDeflate.h5")

HDF5 myFileDeflate.h5 
Group '/' 
    Dataset 'myDatasetDeflate' 
        Size:  1000x2000
        MaxSize:  1000x2000
        Datatype:   H5T_IEEE_F64LE (double)
        ChunkSize:  50x80
        Filters:  deflate(9)
        FillValue:  0.000000

h5disp("myFileSZIP.h5")

HDF5 myFileSZIP.h5 
Group '/' 
    Dataset 'myDatasetSZIP' 
        Size:  1000x2000
        MaxSize:  1000x2000
        Datatype:   H5T_IEEE_F64LE (double)
        ChunkSize:  50x80
        Filters:  szip
        FillValue:  0.000000

Write randomized data to each dataset.

myData = rand([1000 2000]);
h5write("myFileDeflate.h5","/myDatasetDeflate",myData)
h5write("myFileSZIP.h5","/myDatasetSZIP",myData)

Compare the compression filters by examining the sizes of the resulting files. For this data, the deflate filter provides greater compression.

deflateListing = dir("myFileDeflate.h5");
SZIPListing = dir("myFileSZIP.h5");
deflateFileSize = deflateListing.bytes

deflateFileSize = 
15117631

SZIPFileSize = SZIPListing.bytes

SZIPFileSize = 
16027320

sizeRatio = deflateFileSize/SZIPFileSize

sizeRatio = 
0.9432
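Both filters are lossless, so compression changes only the file size, not the data. This sketch (using a hypothetical temporary file) checks that the deflate-compressed data reads back unchanged and computes the uncompressed size of the array, 8 bytes per double element, as a reference point for the file sizes above.

```matlab
% Sketch: verify that deflate compression is lossless.
fname = [tempname '.h5'];                   % hypothetical temporary file
h5create(fname,"/myDatasetDeflate",[1000 2000],ChunkSize=[50 80],Deflate=9)
myData = rand([1000 2000]);
h5write(fname,"/myDatasetDeflate",myData)

% The data read back must match the original exactly
readBack = h5read(fname,"/myDatasetDeflate");
isLossless = isequal(readBack,myData);

% Uncompressed size of the array in memory: 8 bytes per double
rawBytes = numel(myData)*8;
listing = dir(fname);
compressionRatio = listing.bytes/rawBytes;

delete(fname)
```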

Create Dataset with Unlimited Dimension


Create a two-dimensional dataset myDataset3 that is unlimited along the second dimension. You must specify the ChunkSize name-value argument when setting any dimension of the dataset to Inf.

h5create("myFile.h5","/myDataset3",[200 Inf],ChunkSize=[20 20])

Write data to myDataset3. You can write data of any size along the second dimension because this dimension is unlimited. Additionally, because one dimension of the dataset is unlimited, you must specify the start and count arguments when writing data to the dataset.

myData = rand(200,500);
h5write("myFile.h5","/myDataset3",myData,[1 1],[200 500])

Display the entire contents of the HDF5 file.

h5disp("myFile.h5")

HDF5 myFile.h5 
Group '/' 
    Dataset 'myDataset3' 
        Size:  200x500
        MaxSize:  200xInf
        Datatype:   H5T_IEEE_F64LE (double)
        ChunkSize:  20x20
        Filters:  none
        FillValue:  0.000000
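Because the second dimension is unlimited, a later call to h5write can start writing past the current extent, and the dataset grows to hold the new data. This sketch, which uses a hypothetical temporary file, appends 100 columns after the first 500 and then inspects the dataspace with h5info.

```matlab
% Sketch: extend an unlimited dimension by writing past its current extent.
fname = [tempname '.h5'];                   % hypothetical temporary file
h5create(fname,"/myDataset3",[200 Inf],ChunkSize=[20 20])

% First write: columns 1 through 500
h5write(fname,"/myDataset3",rand(200,500),[1 1],[200 500])

% Second write: 100 more columns, starting at column 501
h5write(fname,"/myDataset3",rand(200,100),[1 501],[200 100])

% Inspect the current and maximum sizes of the dataset
info = h5info(fname,"/myDataset3");
currentSize = info.Dataspace.Size;
maxSize = info.Dataspace.MaxSize;

delete(fname)
```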

Input Arguments


`filename` — Name of HDF5 file
string scalar | character vector

Name of the HDF5 file, specified as a string scalar or character vector. If filename does not already exist, then the h5create function creates the file.

Depending on the location to which you are writing, filename can take one of these forms.

Current folder: To write to the current folder, specify the name of the file in filename.
Example: "myFile.h5"

Other folders: To write to a folder different from the current folder, specify the full or relative path name in filename.
Example: "C:\myFolder\myFile.h5"
Example: "/myFolder/myFile.h5"

Remote location: To write to a remote location, specify filename as a uniform resource locator (URL) of the form scheme_name://path_to_file/my_file.h5. Based on the remote location, scheme_name can be one of the values in this table.

Remote Location	`scheme_name`
Amazon S3™	`s3`
Windows Azure® Blob Storage	`wasb`, `wasbs`

For more information, see Work with Remote Data.

Example: "s3://my_bucket/my_path/my_file.h5"
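As a sketch of the "Other folders" form, the following code builds a full path with fullfile and lets h5create create the file there. The folder is a hypothetical unique folder created under the temporary directory with tempname; it is not part of the original example.

```matlab
% Sketch: create an HDF5 file in a folder other than the current folder.
outFolder = tempname;                        % hypothetical unique folder path
mkdir(outFolder);
fname = fullfile(outFolder,'myFile.h5');     % full path name

% The file does not exist yet, so h5create creates it
existedBefore = isfile(fname);
h5create(fname,"/myDataset",[10 10])
existsAfter = isfile(fname);

% Clean up the temporary file and folder
delete(fname)
rmdir(outFolder)
```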

`ds` — Dataset name
string scalar | character vector

Dataset name, specified as a string scalar or character vector containing the full pathname of the dataset to be created. If you specify a dataset that does not currently exist, then the h5create function creates the dataset. Additionally, if you specify intermediate groups that do not currently exist, then the h5create function creates those groups.

Example: "/myDataset"

Example: "/g1/g2/myNestedDataset"
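To see the intermediate groups that h5create creates, you can query the file with h5info. In this sketch (hypothetical temporary file), only the dataset path is specified, yet the groups /g1 and /g1/g2 appear in the file.

```matlab
% Sketch: h5create creates the missing intermediate groups /g1 and /g1/g2.
fname = [tempname '.h5'];                    % hypothetical temporary file
h5create(fname,"/g1/g2/myNestedDataset",[5 5])

% Group names are reported as full paths
g1Info = h5info(fname,"/g1");
childGroup = g1Info.Groups(1).Name;

% Dataset names are reported relative to their parent group
g2Info = h5info(fname,"/g1/g2");
datasetName = g2Info.Datasets(1).Name;

delete(fname)
```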

`sz` — Dataset size
scalar | row vector

Dataset size, specified as a scalar or row vector. To specify an unlimited dimension, specify the corresponding element of sz as Inf. In this case, you must also specify ChunkSize.

Example: 50

Example: [2000 1000]

Example: [100 200 Inf]

Data Types: double

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: h5create("myFile.h5","/dataset1",[1000 2000],ChunkSize=[50 80],CustomFilterID=307,CustomFilterParameters=6) creates the 1000-by-2000 dataset dataset1 in the HDF5 file myFile.h5 using 50-by-80 chunks, the registered bzip2 filter (identifier 307), and a compression block size of 6.

`Datatype` — Data type of dataset
`"double"` (default) | `"single"` | `"uint64"` | `"uint32"` | `"uint16"` | `…`

Data type of the dataset, specified as one of these values, representing MATLAB® data types:

"double"
"single"
"uint64"
"int64"
"uint32"
"int32"
"uint16"
"int16"
"uint8"
"int8"
"string"

Data Types: string | char
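The Datatype value determines the class of the data that h5read returns. This sketch creates a small int16 dataset in a hypothetical temporary file, writes int16 data, and checks the class of the data read back.

```matlab
% Sketch: create an int16 dataset and check the class of the data read back.
fname = [tempname '.h5'];                    % hypothetical temporary file
data = reshape(int16(1:12),4,3);             % 4-by-3 int16 array

h5create(fname,"/myInt16Dataset",[4 3],Datatype="int16")
h5write(fname,"/myInt16Dataset",data)

readBack = h5read(fname,"/myInt16Dataset");
readClass = class(readBack);

delete(fname)
```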

`ChunkSize` — Chunk size
scalar | row vector

Chunk size, specified as a scalar or row vector containing the dimensions of the chunk. If any entry of sz is Inf, then you must specify ChunkSize. The length of ChunkSize must equal the length of sz, and each entry of ChunkSize must be less than or equal to the corresponding entry of sz.

Example: 10

Example: [20 10 100]

Data Types: double
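The chunk size is stored with the dataset, so you can read it back with h5info, which reports it in the same dimension order as sz. This sketch uses a hypothetical temporary file and the [20 10 100] chunk size from the example above.

```matlab
% Sketch: create a chunked 3-D dataset and read its chunk size back.
fname = [tempname '.h5'];                    % hypothetical temporary file
h5create(fname,"/myChunkedDataset",[100 200 300],ChunkSize=[20 10 100])

info = h5info(fname,"/myChunkedDataset");
chunkSize = info.ChunkSize;

delete(fname)
```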

`Deflate` — Deflate compression level
`0` (default) | integer scalar value from 0 to 9

Deflate compression level, specified as an integer scalar value from 0 to 9. The default value of 0 indicates no compression. A value of 1 indicates the least compression, and a value of 9 indicates the most. If you specify Deflate, you must also specify ChunkSize.

You cannot specify both Deflate and SZIPEncodingMethod in the same function call.

Data Types: double
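The effect of Deflate depends on how redundant the data is. As a sketch, this code writes a constant array to two hypothetical temporary files with identical chunking, one without compression (Deflate defaults to 0) and one with Deflate=6, and compares the resulting file sizes.

```matlab
% Sketch: compare file sizes with and without deflate compression.
data = ones(1000,1000);                      % highly redundant data
plainFile = [tempname '.h5'];                % hypothetical temporary files
deflateFile = [tempname '.h5'];

% Same chunking for both; only the deflate level differs
h5create(plainFile,"/data",[1000 1000],ChunkSize=[100 100])
h5create(deflateFile,"/data",[1000 1000],ChunkSize=[100 100],Deflate=6)
h5write(plainFile,"/data",data)
h5write(deflateFile,"/data",data)

plainListing = dir(plainFile);
deflateListing = dir(deflateFile);
plainBytes = plainListing.bytes;
deflateBytes = deflateListing.bytes;

delete(plainFile)
delete(deflateFile)
```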

`FillValue` — Fill value for missing data
`0` (default) | numeric value

Fill value for missing data in numeric datasets, specified as a numeric value.
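Elements that are never written read back as the fill value. This sketch (hypothetical temporary file) sets FillValue to -1, writes only the top-left 3-by-3 block of a 5-by-5 dataset by using the start and count arguments of h5write, and then reads the whole dataset.

```matlab
% Sketch: unwritten elements read back as the fill value.
fname = [tempname '.h5'];                    % hypothetical temporary file
h5create(fname,"/myDataset",[5 5],FillValue=-1)

% Write only the top-left 3-by-3 block (start = [1 1], count = [3 3])
h5write(fname,"/myDataset",ones(3,3),[1 1],[3 3])

% Elements outside the written block contain the fill value -1
readBack = h5read(fname,"/myDataset");

delete(fname)
```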

`Fletcher32` — 32-bit Fletcher checksum filter
`false` or `0` (default) | `true` or `1`

32-bit Fletcher checksum filter, specified as a numeric or logical 1 (true) or 0 (false). The Fletcher checksum filter stores a checksum with each chunk when the chunk is written and verifies it when the chunk is read, so that corrupted data can be detected. If you specify Fletcher32, you must also specify ChunkSize.

Data Types: logical | double

`Shuffle` — Shuffle filter
`false` or `0` (default) | `true` or `1`

Shuffle filter, specified as a numeric or logical 1 (true) or 0 (false). The shuffle filter reorders the bytes in each chunk so that bytes of equal significance from different elements are stored together, which can improve the compression ratio of a compression filter such as deflate. If you specify Shuffle, you must also specify ChunkSize.

Data Types: logical | double
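The Shuffle, Deflate, and Fletcher32 options can be combined on one chunked dataset. This minimal sketch (hypothetical temporary file and dataset name) enables all three on an int32 dataset of smoothly increasing values; because every filter is lossless, the data reads back unchanged.

```matlab
% Sketch: combine the shuffle, deflate, and Fletcher32 filters on one dataset.
fname = [tempname '.h5'];                    % hypothetical temporary file
data = reshape(int32(1:200*300),200,300);    % smoothly increasing integers

h5create(fname,"/myFilteredDataset",[200 300],Datatype="int32", ...
    ChunkSize=[50 50],Shuffle=true,Deflate=6,Fletcher32=true)
h5write(fname,"/myFilteredDataset",data)

% The checksum of each chunk is verified as the chunk is read
readBack = h5read(fname,"/myFilteredDataset");

delete(fname)
```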

`TextEncoding` — Text encoding
`"UTF-8"` (default) | `"system"`

Text encoding, specified as one of these values:

"UTF-8" — Represent characters using UTF-8 encoding.
"system" — Represent characters as bytes using the system encoding (not recommended).

Data Types: string | char

`CustomFilterID` — Filter identifier
positive integer

Filter identifier for the registered filter plugin assigned by The HDF Group, specified as a positive integer. For a list of registered filters, see the Filters page on The HDF Group website.

If you do not specify a value for CustomFilterID, then the dataset does not use dynamically loaded filters for compression.

If you specify CustomFilterID, you must also specify ChunkSize.

`CustomFilterParameters` — Filter parameters
numeric scalar | numeric row vector

Filter parameters for third-party filters, specified as a numeric scalar or numeric row vector. If you specify CustomFilterID without also specifying this argument, then the h5create function passes an empty vector to the HDF5 library and the filter uses default parameters.

This name-value argument corresponds to the cd_values argument of the H5Pset_filter function in the HDF5 library.

If you specify CustomFilterParameters, you must also specify CustomFilterID.

`SZIPEncodingMethod` — Encoding method for SZIP compression
`"entropy"` | `"nearestneighbor"`

Since R2024b

Encoding method for SZIP compression, specified as "entropy" or "nearestneighbor". The entropy method is best suited for data that has already been processed; the nearestneighbor method preprocesses the data and then applies the entropy method. If you specify SZIPEncodingMethod, you must also specify ChunkSize.

You cannot specify both SZIPEncodingMethod and Deflate in the same function call.

Data Types: string | char

`SZIPPixelsPerBlock` — Number of pixels per block for SZIP compression
`16` (default) | even integer from 2 to 32

Since R2024b

Number of pixels (HDF5 data elements) per block for SZIP compression, specified as an even integer from 2 to 32. If you specify SZIPPixelsPerBlock, you must also specify SZIPEncodingMethod. The value of SZIPPixelsPerBlock must be less than or equal to the number of elements in each dataset chunk.

Example: 32
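A sketch of SZIP compression with nearest-neighbor preprocessing, assuming R2024b or later. The dataset name, int32 data, and temporary file are illustrative assumptions; each 50-by-80 chunk holds 4000 elements, so SZIPPixelsPerBlock=32 satisfies the requirement that it not exceed the number of elements in a chunk.

```matlab
% Sketch (R2024b or later): SZIP compression with nearest-neighbor preprocessing.
fname = [tempname '.h5'];                    % hypothetical temporary file
data = reshape(int32(1:100*80),100,80);

% Each 50-by-80 chunk has 4000 elements, at least SZIPPixelsPerBlock
h5create(fname,"/mySZIPDataset",[100 80],Datatype="int32",ChunkSize=[50 80], ...
    SZIPEncodingMethod="nearestneighbor",SZIPPixelsPerBlock=32)
h5write(fname,"/mySZIPDataset",data)

% SZIP is lossless, so the data reads back unchanged
readBack = h5read(fname,"/mySZIPDataset");

delete(fname)
```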

More About


Chunk Storage in HDF5

Chunk storage refers to a method of storing a dataset in an HDF5 file by dividing it into smaller pieces of data known as chunks. Chunking a dataset can improve performance when operating on a subset of the dataset, because each chunk can be read from and written to the HDF5 file individually. Chunking is also required for datasets with an unlimited dimension and for datasets that use filters such as compression.
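Reading a subset of a chunked dataset uses the start and count arguments of h5read. In this sketch (hypothetical temporary file), start = [51 81] and count = [50 80] select a region that lines up exactly with one 50-by-80 chunk.

```matlab
% Sketch: read one chunk-aligned region of a chunked dataset.
fname = [tempname '.h5'];                    % hypothetical temporary file
myData = rand(1000,2000);
h5create(fname,"/myChunkedDataset",[1000 2000],ChunkSize=[50 80])
h5write(fname,"/myChunkedDataset",myData)

% Rows 51-100 and columns 81-160 form exactly one 50-by-80 chunk
subset = h5read(fname,"/myChunkedDataset",[51 81],[50 80]);

delete(fname)
```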

Tips

To enable both the deflate and SZIP filters on the same dataset, use the low-level H5P.set_deflate and H5P.set_szip functions.
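A sketch of the low-level approach, assuming R2024b or later so that SZIP encoding is available. The file, dataset name, int32 data, and the choice of nearest-neighbor SZIP with 32 pixels per block are illustrative assumptions. The low-level functions use C (row-major) dimension order, so the MATLAB sizes are flipped with fliplr.

```matlab
% Sketch (assumes R2024b or later): enable SZIP and deflate on one dataset.
fname = [tempname '.h5'];                    % hypothetical temporary file
data = reshape(int32(1:100*80),100,80);      % MATLAB size 100-by-80

fid = H5F.create(fname,'H5F_ACC_TRUNC','H5P_DEFAULT','H5P_DEFAULT');
typeId = H5T.copy('H5T_NATIVE_INT');         % 32-bit integer file datatype

% Low-level dimensions are in C order, so flip the MATLAB sizes
h5Dims = fliplr(size(data));
spaceId = H5S.create_simple(2,h5Dims,h5Dims);

% Dataset creation property list with chunking and two filters
dcpl = H5P.create('H5P_DATASET_CREATE');
H5P.set_chunk(dcpl,fliplr([50 80]));
% Filters are applied in the order they are added: SZIP, then deflate
H5P.set_szip(dcpl,H5ML.get_constant_value('H5_SZIP_NN_OPTION_MASK'),32);
H5P.set_deflate(dcpl,6);

dsetId = H5D.create(fid,'/myDataset',typeId,spaceId,'H5P_DEFAULT',dcpl,'H5P_DEFAULT');
H5D.write(dsetId,'H5ML_DEFAULT','H5S_ALL','H5S_ALL','H5P_DEFAULT',data);

% Release all identifiers before reading with the high-level interface
H5D.close(dsetId); H5P.close(dcpl); H5S.close(spaceId);
H5T.close(typeId); H5F.close(fid);

readBack = h5read(fname,'/myDataset');
delete(fname)
```

Adding SZIP before deflate lets SZIP operate on the original typed data, while deflate works on any byte stream.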

Version History

Introduced in R2011a


R2024b: Create datasets with SZIP compression

You can create datasets with SZIP compression by using the SZIPEncodingMethod and SZIPPixelsPerBlock name-value arguments.

R2022a: Use dynamically loaded filters to create dataset

You can use the CustomFilterID and CustomFilterParameters name-value arguments to enable compression using dynamically loaded filters.

R2020b: Create HDF5 files at a remote location

You can create HDF5 files in remote locations, such as Amazon S3, Windows Azure Blob Storage, and HDFS™.

R2020b: Create HDF5 files with Unicode names

You can create HDF5 files whose names are encoded as Unicode characters.
