subsetByReadIndices

Class: matlab.io.datastore.Subsettable
Package: matlab.io.datastore

Create subset of datastore or file-set with the specified read indices

Syntax

subds = subsetByReadIndices(ds,indices)

Description

subds = subsetByReadIndices(ds,indices) creates a subset of the specified datastore or file-set using the specified read indices. The subset subds is of the same type as the input.

Input Arguments

ds: Input datastore or file-set, specified as a matlab.io.Datastore, FileSet, DsFileSet, or BlockedFileSet object.

indices: Indices of files to include in the subset, specified as a numeric vector of indices or a logical vector. For a numeric vector, subds contains the files at the listed indices. For a logical vector, subds contains the files corresponding to the elements that have a value of true.

  • numeric vector: Vector containing unique indices of files in the input datastore.

  • logical vector: Vector the same length as the number of files in the input datastore.
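The two index forms can be compared with a short sketch. It uses the built-in arrayDatastore and the public subset method purely as a convenient stand-in for a subsettable datastore; the variable names are illustrative, and the sketch does not rely on how arrayDatastore implements subsetting internally.

```matlab
% Stand-in datastore with five items (illustrative only).
ads = arrayDatastore((1:5)');

numericIdx = [1 3];                   % unique positions of the items to keep
logicalIdx = logical([1 0 1 0 0]);    % one element per item in the datastore

subA = subset(ads, numericIdx);
subB = subset(ads, logicalIdx);

dataA = readall(subA);
dataB = readall(subB);

% Both index forms should select the same items, and the subset
% should have the same class as the input datastore.
sameSelection = isequal(dataA, dataB);
sameType = strcmp(class(subA), class(ads));
```

The logical vector must have one element per item, whereas the numeric vector lists only the items to keep.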

Attributes

Abstract: true
Access: protected

To learn about attributes of methods, see Method Attributes.

Examples

Build a datastore with subset processing support and use it to bring your data into MATLAB®.

Create a class definition file that contains the code implementing your datastore. Save this file in your working folder or in a folder that is on the MATLAB path. The name of the .m file must be the same as the name of your object constructor function. In this example, create the MyHDF5Datastore class in a file named MyHDF5Datastore.m. The .m class definition contains the following steps:

  • Step 1: Inherit from the matlab.io.Datastore and matlab.io.datastore.Subsettable classes.

  • Step 2: Define the constructor and the read, hasdata, and reset methods, as well as the protected subsetByReadIndices and maxpartitions methods.

  • Step 3: Define your custom file-reading function. Here, the MyHDF5Datastore class creates and uses the listHDF5Datasets function.

%% STEP 1
classdef MyHDF5Datastore < matlab.io.Datastore ...
                       & matlab.io.datastore.Subsettable

    properties
        Filename            (1, 1) string
        Datasets            (:, 1) string {mustBeNonmissing} = "/"
        CurrentDatasetIndex (1, 1) double {mustBeInteger, mustBeNonnegative} = 1
    end

%% STEP 2
    methods
        function ds = MyHDF5Datastore(Filename, Location)
            arguments
                Filename (1, 1) string
                Location (1, 1) string {mustBeNonmissing} = "/"
            end

            ds.Filename = Filename;
            ds.Datasets = listHDF5Datasets(ds.Filename, Location);
        end

        function [data, info] = read(ds, varargin)
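            if ~hasdata(ds)
                % Throw with an explicit identifier and message text;
                % message() expects a message catalog ID, not free text.
                error("MyHDF5Datastore:noMoreData", "No more datasets to read.");
            end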

            dataset = ds.Datasets(ds.CurrentDatasetIndex);
            data = { h5read(ds.Filename, dataset, varargin{:}) };
            if nargout > 1
                info =   h5info(ds.Filename, dataset);
            end

            ds.CurrentDatasetIndex = ds.CurrentDatasetIndex + 1;
        end

        function tf = hasdata(ds)
            tf = ds.CurrentDatasetIndex <= numel(ds.Datasets);
        end

        function reset(ds)
            ds.CurrentDatasetIndex = 1;
        end
    end

    methods (Access = protected)
        function subds = subsetByReadIndices(ds, indices)
            datasets = ds.Datasets(indices);

            subds = copy(ds);
            subds.Datasets = datasets;
            reset(subds);
        end

        function n = maxpartitions(ds)
            n = numel(ds.Datasets);
        end
    end
end

%% STEP 3
function datasets = listHDF5Datasets(filename, location, args)
    arguments
        filename (1, 1) string
        location (1, 1) string
        args.IncludeSubGroups (1, 1) logical = true
    end

    if strlength(location) == 0
        location = "/";
    end

    info = h5info(filename, location);

    datasets = listDatasetsInH5infoStruct(info, location, IncludeSubGroups=args.IncludeSubGroups);
end

function datasets = listDatasetsInH5infoStruct(S, location, args)
    arguments
        S (1, 1) struct
        location (1, 1) string
        args.IncludeSubGroups (1, 1) logical = true
    end

    datasets = string.empty(0, 1);

    if isfield(S, "Datatype")
        datasets = location;
    elseif isfield(S, "Datasets")
        if ~isempty(S.Datasets)
            datasets = location + "/" + {S.Datasets.Name}';
        end

        if args.IncludeSubGroups
            listFcn = @(group) listDatasetsInH5infoStruct(group, group.Name, IncludeSubGroups=true);
        else
            listFcn = @(group) string(group.Name);
        end

        childDatasets = arrayfun(listFcn, S.Groups, UniformOutput=false);
        childDatasets = vertcat(childDatasets{:});

        datasets = [datasets; childDatasets];
    end

end
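
Once the class file is saved, the datastore can be exercised along the following lines. This is a minimal sketch, not part of the original example: it assumes MyHDF5Datastore.m is on the MATLAB path, the file, group, and dataset names are hypothetical, and it assumes that the public subset method inherited from matlab.io.datastore.Subsettable delegates to the protected subsetByReadIndices implementation.

```matlab
% Assumes MyHDF5Datastore.m (the class above) is on the MATLAB path.
% The file name, group name, and dataset names are hypothetical.
filename = fullfile(tempdir, "subsetDemo.h5");
if isfile(filename)
    delete(filename);    % h5create errors if a dataset already exists
end

% Write three small datasets under the /demo group.
names = ["/demo/a", "/demo/b", "/demo/c"];
for k = 1:numel(names)
    h5create(filename, names(k), [1 3]);
    h5write(filename, names(k), k * [1 2 3]);
end

% Build the datastore over the datasets found under /demo.
ds = MyHDF5Datastore(filename, "/demo");

% Call the public subset method; it is assumed to invoke the
% protected subsetByReadIndices method defined in the class.
subds = subset(ds, [1 3]);

disp(class(subds));       % class of the subset datastore
disp(subds.Datasets);     % dataset paths kept in the subset
data = readall(subds);    % one cell per dataset read
```

Because subsetByReadIndices is protected, code outside the class goes through subset rather than calling the method directly.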

Version History

Introduced in R2022b