DriftDiagnostics

Diagnostics information of batch drift detection

    Description

    A DriftDiagnostics object stores the diagnostic information produced by permutation testing for batch drift detection.

    Creation

    Create a DriftDiagnostics object by using detectdrift to test for drift between baseline and target data sets.

    Properties

    Baseline

    This property is read-only.

    Baseline data set, specified as a numeric array, categorical array, or a table.

    Data Types: double | categorical | table

    CategoricalVariables

    This property is read-only.

    Indices of categorical variables in data, specified as a numeric array. If data does not have any categorical variables, then this property is empty ([]).

    Data Types: double

    ConfidenceIntervals

    This property is read-only.

    95% confidence interval bounds for estimated p-values for each variable, specified as a 2-by-k matrix of positive scalar values from 0 to 1, where k is the number of variables. The rows of ConfidenceIntervals correspond to the lower and upper bounds of the confidence intervals, respectively.

    If you set 'EstimatePValues' to false in the call to detectdrift, then the function does not compute the confidence interval bounds and the ConfidenceIntervals property has NaNs instead.

    Data Types: double

    DriftStatus

    This property is read-only.

    Drift status for each variable, specified as a string array with these possible values:

    Drift Status    Condition
    Drift           Upper < DriftThreshold
    Warning         DriftThreshold < Lower < WarningThreshold or DriftThreshold < Upper < WarningThreshold
    Stable          Lower > WarningThreshold

    Lower and Upper are the lower and upper confidence interval bounds for an estimated p-value.
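
    The conditions in the table can be sketched in plain MATLAB as follows. This is only an illustration of the stated rules, not the code detectdrift runs. The threshold values 0.05 and 0.1 are assumptions chosen for the sketch, and the interval bounds are copied from three rows (x1, x8, x15) of the summary output in the Examples section.

```matlab
% Illustrative thresholds (assumed values, not read from an object).
driftThreshold = 0.05;
warningThreshold = 0.1;

% Confidence interval bounds for three variables (x1, x8, x15 in the
% Examples section). Row 1 holds lower bounds, row 2 holds upper bounds.
ci = [2.5317e-05 0.84012 0.076629;
      0.0055589  0.88372 0.1138];
lower = ci(1,:);
upper = ci(2,:);

% Apply the three conditions from the table.
isDrift = upper < driftThreshold;
isWarning = (lower > driftThreshold & lower < warningThreshold) | ...
            (upper > driftThreshold & upper < warningThreshold);
isStable = lower > warningThreshold;

status = strings(1,size(ci,2));
status(isDrift) = "Drift";
status(isWarning) = "Warning";
status(isStable) = "Stable";
disp(status)
```

    With these inputs, the three variables fall into the Drift, Stable, and Warning categories, in that order, which agrees with the corresponding rows of the summary table.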

    Data Types: string

    DriftThreshold

    This property is read-only.

    Threshold to determine drift status, specified as a scalar value from 0 to 1. If the upper bound of the confidence interval for the estimated p-value is below DriftThreshold, then drift status is 'Drift'.

    Data Types: double

    Metrics

    This property is read-only.

    List of metrics detectdrift uses to quantify the difference between baseline and target data for each variable during permutation testing, specified as a string array.

    Data Types: string

    MetricValues

    This property is read-only.

    Metric values for the corresponding variables, specified as a row vector with the number of columns equal to the number of variables specified for drift detection. The metric corresponding to each variable is stored in the Metrics property.

    Data Types: double

    MultipleTestCorrection

    This property is read-only.

    Multiple hypothesis testing correction, specified as either 'Bonferroni' or 'FalseDiscoveryRate'.

    If you set 'EstimatePValues' to false in the call to detectdrift, then the function ignores the MultipleTestCorrection name-value argument.

    Data Types: string

    MultipleTestDriftStatus

    This property is read-only.

    Drift status for the overall data, which detectdrift estimates using the multiple test correction method in MultipleTestCorrection. Multiple test corrections provide a conservative estimate of drift status when testing is done for multiple variables.

    If you set 'EstimatePValues' to false in the call to detectdrift, then the function does not populate MultipleTestDriftStatus.

    Data Types: string
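
    To illustrate why a multiple test correction is conservative, here is a minimal sketch of a textbook Bonferroni adjustment in plain MATLAB. It is not necessarily how detectdrift computes MultipleTestDriftStatus; the function might, for example, work with confidence intervals rather than point estimates. The p-values are copied from the summary output in the Examples section, and the threshold of 0.05 is an assumed value.

```matlab
% Estimated p-values for x1 to x15, copied from the summary output
% in the Examples section.
pValues = [0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.863 ...
           0.726 0.001 0.496 0.249 0.001 0.574 0.094];
driftThreshold = 0.05;   % assumed value for this sketch

% Textbook Bonferroni adjustment: multiply each p-value by the number
% of tests and cap the result at 1.
numTests = numel(pValues);
adjustedPValues = min(1, pValues*numTests);

% Flag overall drift if any adjusted p-value falls below the threshold.
overallDrift = any(adjustedPValues < driftThreshold);
```

    Each adjusted p-value is at least as large as the original, so a variable must show stronger evidence of drift before the overall data is flagged.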

    NumPermutations

    This property is read-only.

    Number of permutation tests detectdrift performs for each variable to determine the drift status for that variable, specified as an array of integer values.

    If you set 'EstimatePValues' to false in the call to detectdrift, then NumPermutations is a row vector of ones. Each 1 corresponds to the baseline and target data as you provide them, and the metric values are the initial computations using the baseline and target data for each variable.

    Data Types: double

    PermutationResults

    This property is read-only.

    Permutation testing results for each variable, specified as a k-by-1 table, where k is the number of variables. Each row corresponds to one variable and holds a 1-by-1 cell array containing the metric values in a vector of size equal to the number of permutations for that variable. To access the metric values for the second variable, for example, use DDiagnostics.PermutationResults{2,1}{1,1}.

    If you set 'EstimatePValues' to false in the call to detectdrift, then PermutationResults holds only the initial metric values for each variable.

    You can visualize the test results using plotPermutationResults.

    Data Types: table
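
    As background on what a vector of permutation metric values represents, the following is a minimal sketch of a generic permutation test in base MATLAB. It is not the detectdrift implementation: the metric (absolute difference in means), the fixed number of permutations, the synthetic data, and the p-value formula are all stand-ins chosen for illustration.

```matlab
rng(0)   % For reproducibility

% Synthetic data: the target is shifted by 1 relative to the baseline.
baseline = randn(100,1);
target = randn(100,1) + 1;

% Stand-in metric for the sketch (detectdrift uses other metrics).
metricFcn = @(a,b) abs(mean(a) - mean(b));
initialMetric = metricFcn(baseline,target);

% Pool the data, shuffle it repeatedly, and recompute the metric on
% each random split into two groups of the original sizes.
pooled = [baseline; target];
numBaseline = numel(baseline);
numPermutations = 1000;
permutedMetrics = zeros(numPermutations,1);
for i = 1:numPermutations
    idx = randperm(numel(pooled));
    permutedMetrics(i) = metricFcn(pooled(idx(1:numBaseline)), ...
                                   pooled(idx(numBaseline+1:end)));
end

% One common p-value estimate: the fraction of permuted metrics at
% least as large as the initial metric, with a +1 correction.
pValue = (sum(permutedMetrics >= initialMetric) + 1)/(numPermutations + 1);
```

    Here permutedMetrics plays a role similar to the vector stored for one variable in PermutationResults: a collection of metric values obtained under the assumption that baseline and target data come from the same distribution.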

    PValues

    This property is read-only.

    Estimated p-value for each variable, specified as a vector of scalar values from 0 to 1.

    If you set 'EstimatePValues' to false in the call to detectdrift, then PValues is a vector of NaNs.

    Data Types: double

    Target

    This property is read-only.

    Target data set, specified as a numeric array, categorical array, or a table.

    Data Types: single | double | categorical | table

    VariableNames

    This property is read-only.

    Variables specified for drift detection in the call to detectdrift, specified as a string array.

    Data Types: string

    WarningThreshold

    This property is read-only.

    Threshold to determine warning versus drift status, specified as a scalar value from 0 to 1.

    Data Types: double

    Object Functions

    ecdf                      Compute empirical cumulative distribution function (ecdf) for baseline and target data specified for drift detection
    histcounts                Compute histogram bin counts for specified variables in baseline and target data for drift detection
    plotDriftStatus           Visualize p-values and confidence intervals
    plotEmpiricalCDF          Visualize empirical cumulative distribution function (ecdf) of a variable specified for drift detection
    plotHistogram             Visualize histogram for a variable in drift detection
    plotPermutationResults    Plot histogram of permutation results for a variable
    summary                   Summary table for DriftDiagnostics object

    Examples

    Load the sample data.

    load humanactivity

    For details on the data set, enter Description at the command line.

    Assign the first 250 observations as baseline data and the next 250 as target data for variables 1 to 15.

    baseline = feat(1:250,1:15);
    target = feat(251:500,1:15);

    Test for drift on all variables.

    DDiagnostics = detectdrift(baseline,target);

    Display a summary of the test results.

    summary(DDiagnostics)
        Multiple Test Correction Drift Status: Drift
    
               DriftStatus    PValue       ConfidenceInterval   
               ___________    ______    ________________________
    
        x1      "Drift"       0.001     2.5317e-05     0.0055589
        x2      "Drift"       0.001     2.5317e-05     0.0055589
        x3      "Drift"       0.001     2.5317e-05     0.0055589
        x4      "Drift"       0.001     2.5317e-05     0.0055589
        x5      "Drift"       0.001     2.5317e-05     0.0055589
        x6      "Drift"       0.001     2.5317e-05     0.0055589
        x7      "Drift"       0.001     2.5317e-05     0.0055589
        x8      "Stable"      0.863        0.84012       0.88372
        x9      "Stable"      0.726        0.69722       0.75344
        x10     "Drift"       0.001     2.5317e-05     0.0055589
        x11     "Stable"      0.496        0.46456       0.52746
        x12     "Stable"      0.249        0.22247       0.27702
        x13     "Drift"       0.001     2.5317e-05     0.0055589
        x14     "Stable"      0.574        0.54267       0.60489
        x15     "Warning"     0.094       0.076629        0.1138
    

    The summary table shows the drift status and the estimated p-value for each variable tested for drift. It also shows the 95% confidence interval bounds for the p-values.
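
    The same information can also be read from the object's properties. The sketch below assumes the property names DriftStatus, VariableNames, and PValues as they appear in this page's text and output, and it requires Statistics and Machine Learning Toolbox. Because permutation testing is random, the exact values can differ from run to run.

```matlab
% Recreate the object from the example (requires Statistics and
% Machine Learning Toolbox and the humanactivity sample data).
load humanactivity
baseline = feat(1:250,1:15);
target = feat(251:500,1:15);
DDiagnostics = detectdrift(baseline,target);

% Logical mask of variables whose status is "Drift".
isDrift = DDiagnostics.DriftStatus == "Drift";

% Names and estimated p-values of the drifted variables.
driftedVariables = DDiagnostics.VariableNames(isDrift);
driftedPValues = DDiagnostics.PValues(isDrift);
```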

    Plot drift status for variables x10 to x15.

    plotDriftStatus(DDiagnostics,Variables=(10:15))

    Figure: Estimated P-Values and Confidence Intervals. Error bars represent the Stable, Warning, and Drift variables; constant lines mark the Warning Threshold and Drift Threshold.

    Compute the ecdf values for variables x13 and x15.

    E = ecdf(DDiagnostics,Variables=["x13","x15"])
    E=2×3 table
                     x             F_Baseline         F_Target   
               ______________    ______________    ______________
    
        x13    {501x1 double}    {501x1 double}    {501x1 double}
        x15    {501x1 double}    {501x1 double}    {501x1 double}
    
    

    x contains the common domain over which ecdf computes the empirical cumulative distribution function for baseline and target data of a variable. Access the common domain for x13.

    E.x{1}
    ans = 501×1
    
        0.0420
        0.0420
        0.0423
        0.0424
        0.0424
        0.0425
        0.0425
        0.0426
        0.0426
        0.0426
          ⋮
    
    

    Access the ecdf values for x15 in the baseline data.

    E.F_Baseline{2}
    ans = 501×1
    
             0
             0
        0.0040
        0.0080
        0.0080
        0.0080
        0.0080
        0.0080
        0.0120
        0.0120
          ⋮
    
    

    Plot the ecdf values for variables x13 and x15.

    tiledlayout(1,2)
    ax1 = nexttile;
    plotEmpiricalCDF(DDiagnostics,ax1,Variable="x13")
    ax2 = nexttile;
    plotEmpiricalCDF(DDiagnostics,ax2,Variable="x15")

    Figure: two axes titled ECDF for x13 and ECDF for x15, each with stair plots for Baseline and Target.

    You can also visualize the permutation test results for a variable. Plot permutation results for variable x13.

    figure 
    plotPermutationResults(DDiagnostics,Variable="x13")

    Figure: Permutation Results for x13. Histogram and constant line objects, with legend entries < 0.00072112 and ≥ 0.00072112.

    The plot also shows the metric threshold value as a straight line. Based on the histogram of metric values obtained during permutation testing, the probability of a metric value being greater than the threshold value, if the baseline and target data for variable x13 had the same distribution, is very small. The plot also displays the estimated p-value, 0.001, and the drift status decision, Drift, right below the plot title.

    Generate baseline and target data with three variables, where the distribution parameters of the second and third variables change for target data.

    rng('default') % For reproducibility
    baseline = [normrnd(0,1,100,1),wblrnd(1.1,1,100,1),betarnd(1,2,100,1)];
    target = [normrnd(0,1,100,1),wblrnd(1.2,2,100,1),betarnd(1.7,2.8,100,1)];

    Compute the initial metrics for all variables between the baseline and target data without estimating p-values.

    DDiagnostics = detectdrift(baseline,target,EstimatePValues=false)
    DDiagnostics = 
      DriftDiagnostics
    
               VariableNames: ["x1"    "x2"    "x3"]
        CategoricalVariables: []
                     Metrics: ["Wasserstein"    "Wasserstein"    "Wasserstein"]
                MetricValues: [0.2022 0.3468 0.0559]
    
    
      Properties, Methods
    
    

    detectdrift computes only the initial metric value for each variable using the baseline and target data. The properties associated with permutation testing and p-value estimation are either empty or contain NaNs.
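
    One way to confirm this programmatically is sketched below. It assumes the property names PValues, ConfidenceIntervals, and MetricValues used elsewhere on this page and requires Statistics and Machine Learning Toolbox. The checks are written so that they hold whether a property is empty or filled with NaNs.

```matlab
% Recreate the object from this example.
rng('default') % For reproducibility
baseline = [normrnd(0,1,100,1),wblrnd(1.1,1,100,1),betarnd(1,2,100,1)];
target = [normrnd(0,1,100,1),wblrnd(1.2,2,100,1),betarnd(1.7,2.8,100,1)];
DDiagnostics = detectdrift(baseline,target,EstimatePValues=false);

% all(...) returns true for an empty input, so these flags are true
% for both the "empty" and the "all NaN" cases.
pValuesMissing = all(isnan(DDiagnostics.PValues));
intervalsMissing = all(isnan(DDiagnostics.ConfidenceIntervals),'all');

% The initial metric values are still available.
metricValues = DDiagnostics.MetricValues;
```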

    summary(DDiagnostics)
              MetricValue       Metric    
              ___________    _____________
    
        x1      0.20215      "Wasserstein"
        x2      0.34676      "Wasserstein"
        x3     0.055922      "Wasserstein"
    

    The summary function displays only the metrics used and the initial metric value for each of the specified variables.

    plotDriftStatus and plotPermutationResults do not produce plots and return warning messages. plotEmpiricalCDF and plotHistogram plot the ecdf and the histogram, respectively, for the first variable by default. They both return NaN for the p-value and drift status associated with the variable.

    plotEmpiricalCDF(DDiagnostics)

    Figure: ECDF for x1, with stair plots for Baseline and Target.

    plotHistogram(DDiagnostics)

    Figure: Histogram for x1, with bar plots for Baseline and Target.

    Version History

    Introduced in R2022a