LearningPipeline

Machine learning pipeline

Since R2026a

Description

A LearningPipeline object is a container that holds and connects multiple steps (components) of a machine learning workflow as a directed acyclic graph (DAG). The pipeline components can contain data-dependent parameters, such as learnables. You can train a pipeline with training data and then use the trained pipeline for inference by passing new data through it. Create a pipeline by connecting one or more components or pipelines in series or in parallel.

Creation

You can create a pipeline using automatic or manual connections.

Automatic connections

Create and change a pipeline automatically by using the series and parallel object functions. You can also use these functions to combine pipelines and components. For example, after creating three components c1, c2, and c3, you can specify p1 = series(c1,c2); p2 = series(p1,c3).

The insert object function places components in a pipeline by cutting and reconnecting the existing connections. The replace object function makes the correct connections when the new and replaced components are compatible.

The automatic creation functions resolve naming conflicts (such as multiple components or pipeline ports with the same name) and connect the components.

For more details on automatic connections, see Port Tags for Automatic Connection.

Manual connections

p = LearningPipeline creates an empty pipeline with no components or connections. You can build upon the pipeline by using the add, remove, connect, and disconnect object functions.

The add and remove functions do not connect the newly added components or the remaining components in the pipeline. After using add or remove, you must use the connect function to define connections.

Properties

The software sets pipeline properties when you create the pipeline. You can modify pipeline properties at any time using dot notation, except for Components, Connections, HasLearnables, and HasLearned, which you cannot modify directly.
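As a minimal sketch of dot notation on a modifiable property, the following creates an empty pipeline with the LearningPipeline syntax described in Creation and renames it. The name "preprocessing" is an illustrative value, not one taken from this page.

```matlab
% Create an empty pipeline (no components or connections).
p = LearningPipeline;

% Name is a modifiable property; "preprocessing" is an illustrative value.
p.Name = "preprocessing";

% Read the property back using the same dot notation.
disp(p.Name)
```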

`Name` — Pipeline identifier
character vector | string scalar

Identifier of the pipeline, specified as a character vector or string scalar.

Data Types: char | string

`Inputs` — Names of input ports
character vector | string array | cell array of character vectors

Names of the input ports, specified as a character vector, string array, or cell array of character vectors.

Data Types: char | string | cell

`Outputs` — Names of output ports
character vector | string array | cell array of character vectors

Names of the output ports, specified as a character vector, string array, or cell array of character vectors.

Data Types: char | string | cell

`InputTags` — Tags that enable automatic connection of pipeline inputs
nonnegative integer vector

Tags that enable the automatic connection of the pipeline inputs with other pipelines or components, specified as a nonnegative integer vector. If you specify InputTags, the number of tags must match the number of inputs in Inputs.

Data Types: single | double

`OutputTags` — Tags that enable automatic connection of pipeline outputs
nonnegative integer vector

Tags that enable the automatic connection of the pipeline outputs with other pipelines or components, specified as a nonnegative integer vector. If you specify OutputTags, the number of tags must match the number of outputs in Outputs.

Data Types: single | double

`Components` — Pipeline components
structure

Pipeline components, specified as a structure where each field contains a component.

Data Types: struct

`Connections` — Paths along which data passes between components
Read-only: two-column table

This property is read-only.

Paths along which data passes between components, returned as a two-column table. Each row corresponds to a connection between two ports, a source and a destination. Data passes from the source to the destination.

Data Types: table

`HasLearnables` — Indicator for learnables
Read-only: `0` (`false`) | `1` (`true`)

This property is read-only.

Indicator for the learnables in the pipeline, returned as 0 (false) or 1 (true). A value of 1 indicates that the pipeline contains learnables in at least one component.

Data Types: logical

`HasLearned` — Indicator showing learning status of pipeline
Read-only: `0` (`false`) | `1` (`true`)

This property is read-only.

Indicator showing the learning status of the pipeline, returned as 0 (false) or 1 (true). A value of 1 indicates that the learn object function has been applied to the pipeline, and the learnables of the pipeline components are nonempty.

Data Types: logical
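The two read-only indicators can be combined to check whether a pipeline still needs learning. The sketch below assumes the two-component pipeline from the example on this page (observationRemoverComponent followed by normalizerComponent) and only reads the properties. It does not call learn, whose syntax is not shown here.

```matlab
% Build the two-component pipeline used in the example on this page.
pipeline = series(observationRemoverComponent,normalizerComponent);

% HasLearnables and HasLearned are read-only logical indicators.
if pipeline.HasLearnables && ~pipeline.HasLearned
    disp("The pipeline contains learnables that have not been learned yet.")
end
```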

Object Functions

Automatic Connection

`insert`	Insert component or pipeline into existing pipeline
`parallel`	Connect components or pipelines in parallel to create pipeline
`replace`	Replace existing pipeline component with new component
`series`	Connect components in series to create pipeline

Manual Connection

`add`	Add new component or pipeline to existing pipeline
`connect`	Create connections between pipeline components
`disconnect`	Remove connections between ports in pipeline
`remove`	Remove existing components or pipelines from pipeline

Hierarchy

`expand`	Expand subpipelines in pipeline

Execution

`learn`	Initialize and evaluate pipeline or component
`run`	Execute pipeline or component for inference after learning
`prune`	Remove unnecessary components and dependencies from pipeline
`reset`	Reset pipeline or component
`crossvalidate`	Cross-validate pipeline

Visualization and Description

`view`	View diagram of pipeline inputs, outputs, components, and connections
`describe`	Display summary of pipeline components

Deployment

`package`	Create deployable archive or standalone application from pipeline

Examples

Create and Explore Simple Pipeline

Create and explore a pipeline with two components.

Create a component for removing missing data.

removeMissing = observationRemoverComponent

removeMissing = 

  observationRemoverComponent with properties:

              Name: "ObservationRemover"
            Inputs: ["DataIn1"    "DataIn2"]
         InputTags: [1 2]
           Outputs: ["DataOut1"    "DataOut2"]
        OutputTags: [1 2]

   
Structural Parameters (locked)
       NumDataFlow: 2
    ReferenceInput: 1
    FunctionHandle: @ismissing

   
Run Parameters (unlocked)
        RunRemoval: 0


Show all parameters

removeMissing has two inputs and two outputs.

Create a component for normalizing.

normalizer = normalizerComponent

normalizer = 

  normalizerComponent with properties:

             Name: "Normalizer"
           Inputs: "DataIn"
        InputTags: 1
          Outputs: "DataScaled"
       OutputTags: 1

   
Learnables (HasLearned = false)
            Scale: []
           Center: []
    UsedVariables: []


Show all parameters

The normalizer component has one input, one output, and three learnables: Scale, Center, and UsedVariables. The learnables have not been learned yet (the HasLearned value of the component is false).

Create a pipeline that contains the removeMissing and normalizer components. Use the series object function to create the pipeline automatically.

pipeline = series(removeMissing,normalizer)

pipeline = 

  LearningPipeline with properties:

             Name: "defaultName"
           Inputs: ["DataIn1"    "DataIn2"]
        InputTags: [1 2]
          Outputs: ["DataScaled"    "DataOut2"]
       OutputTags: [1 2]

       Components: struct with 2 entries
      Connections: [5×2 table]

    HasLearnables: true
       HasLearned: false


Show summary of the components

Explore the Components property.

pipeline.Components

ans = 

  struct with fields:

    ObservationRemover: [1×1 observationRemoverComponent]
            Normalizer: [1×1 normalizerComponent]

The Components property is a structure with one field per component. You can further index into the components to explore their properties. For example, you can enter pipeline.Components.ObservationRemover.
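Because Components is a structure with one field per component, you can also loop over the components with standard structure functions. The following sketch rebuilds the pipeline from this example and uses fieldnames with dynamic field references; the variable names are illustrative.

```matlab
% Rebuild the pipeline from this example.
pipeline = series(observationRemoverComponent,normalizerComponent);

% Each field of Components holds one component, keyed by component name.
componentNames = fieldnames(pipeline.Components);

for k = 1:numel(componentNames)
    % Dynamic field reference retrieves the component stored in the field.
    component = pipeline.Components.(componentNames{k});
    fprintf("%s: %s\n",componentNames{k},class(component));
end
```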

Explore the pipeline connections.

pipeline.Connections

ans =

  5×2 table

               Source                        Destination         
    _____________________________    ____________________________

    "DataIn1"                        "ObservationRemover/DataIn1"
    "DataIn2"                        "ObservationRemover/DataIn2"
    "ObservationRemover/DataOut1"    "Normalizer/DataIn"         
    "Normalizer/DataScaled"          "DataScaled"                
    "ObservationRemover/DataOut2"    "DataOut2"
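Because Connections is a regular table, you can filter it with standard table indexing. The sketch below assumes the Source and Destination variables shown in the display above, and that component ports are written as "ComponentName/PortName" while pipeline-level ports have no slash, as in that display.

```matlab
% Rebuild the pipeline from this example.
pipeline = series(observationRemoverComponent,normalizerComponent);
connections = pipeline.Connections;

% Connections that feed a port of the Normalizer component.
intoNormalizer = connections(startsWith(connections.Destination,"Normalizer/"),:)

% Connections that end at a pipeline output port (no "Component/" prefix).
toOutputs = connections(~contains(connections.Destination,"/"),:)
```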

View the pipeline.

view(pipeline)

View pipeline

Algorithms

Port Tags for Automatic Connection

To connect components automatically, the software uses port tags. A port tag is a numeric value assigned to each component or pipeline port. The software makes connections between ports that have the same tag. Tags are used only to connect pipelines automatically (with functions like series, parallel, or insert). After a component or pipeline is connected, tags do not have any effect on the execution of the pipeline. Automatic connections are overridden when you manually add or remove edges of a pipeline (with functions like connect or disconnect).

Tags associate component ports to a specific data path through the pipeline. Data passes through the pipeline in one direction only—from inputs, through components, to outputs. Many pipelines have only one unique data path through the entire pipeline, called the main data path (tag value of 1). Some pipelines require more than one data path because distinct data variables have different processing requirements. By assigning different numeric tags to different component ports, the software can automatically connect a pipeline with distinct data paths.

The built-in machine learning components have tagged ports that support typical workflows. Machine learning components assume that the main data path through a machine learning pipeline takes raw data, turns it into predictor data, and finally returns predictions (tag value of 1). Machine learning pipelines can have two other distinct data paths: one for the observed response (tag value of 2) and another for the observation weights (tag value of 3). By following these data path conventions, the software automatically connects the built-in components to create a machine learning pipeline that matches a typical workflow.

A pipeline component can also have ports with external tags (tag value of 0) or no tags (tag value of NaN). When the software attempts to connect ports automatically, it creates pipeline input and output ports for component ports with 0 tags, without trying to pass information through other components. The software ignores component ports with NaN tags.

To better understand how components are connected in a given pipeline, you can use the describe and view object functions.
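To see the tags that drive automatic connection, you can inspect the tag properties before and after calling series. This sketch uses the two components from the example on this page. In that example, the tag 1 path runs through both components, while the tag 2 output of the observation remover has no matching port on the normalizer and becomes a pipeline output. The table variable names Port and Tag are illustrative.

```matlab
% Components from the example on this page.
removeMissing = observationRemoverComponent;
normalizer = normalizerComponent;

% Tags on the upstream outputs and on the downstream input.
disp(removeMissing.OutputTags)
disp(normalizer.InputTags)

% series connects ports that have matching tags.
pipeline = series(removeMissing,normalizer);

% Summarize the resulting pipeline outputs next to their tags.
portSummary = table(pipeline.Outputs(:),pipeline.OutputTags(:), ...
    VariableNames=["Port","Tag"])
```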

Version History

Introduced in R2026a
