bindata

Binned predictor variables

Description

bdata = bindata(sc) returns a table of binned predictor variables. bdata has the same size as the training data stored in the creditscorecard object, but only the predictors specified in the creditscorecard object's PredictorVars property are binned; the remaining variables are unchanged.

bdata = bindata(sc,data) returns a table of binned predictor variables for the input data, using the binning rules stored in sc. bdata has the same size as data, but only the predictors specified in the creditscorecard object's PredictorVars property are binned; the remaining variables are unchanged.

bdata = bindata(sc,Name,Value) returns a table of binned predictor variables using additional options specified by one or more name-value pair arguments. Except when 'OutputType' is set to 'WOEModelInput', bdata has the same size as the data being binned, but only the predictors specified in the creditscorecard object's PredictorVars property are binned; the remaining variables are unchanged.

Examples

This example shows how to use the bindata function to simply bin or discretize data.

Suppose bin ranges of

  • '0 to 30'

  • '31 to 50'

  • '51 and up'

are determined for the age variable (via manual or automatic binning). Binning a data point with age 41 means placing it in the bin that contains 41, which is the second bin, '31 to 50'. Binning is the mapping from the original data into discrete groups, or bins. In this example, a 41-year-old is mapped to bin number 2 or, equivalently, into the '31 to 50' category. If you know the Weight of Evidence (WOE) value for each of the three bins, you can also replace the data point 41 with the WOE value of the second bin. bindata supports these three binning formats:

  • Bin number (where the 'OutputType' name-value pair argument is set to 'BinNumber'); this is the default option, and in this case, 41 is mapped to bin 2.

  • Categorical (where the 'OutputType' name-value pair argument is set to 'Categorical'); in this case, 41 is mapped to the '31 to 50' bin.

  • WOE value (where the 'OutputType' name-value pair argument is set to 'WOE'); in this case, 41 is mapped to the WOE value of bin number 2.
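The three formats can be illustrated with the base MATLAB function discretize. This is a minimal sketch, not how bindata is implemented: the edges encode the hypothetical bins above (assuming integer ages, so [31,51) stands for '31 to 50'), and the values in woeByBin are made-up placeholders rather than estimated WOE values.

```matlab
% Hypothetical bins from the text: [0,31) '0 to 30', [31,51) '31 to 50', [51,Inf] '51 and up'
edges    = [0 31 51 Inf];
labels   = {'0 to 30','31 to 50','51 and up'};
woeByBin = [-0.4 0.1 0.5];   % placeholder WOE values, one per bin (not real estimates)

age = 41;

% 'BinNumber' format: index of the bin that contains the value
binNumber = discretize(age, edges);

% 'Categorical' format: label of that bin, returned as a categorical value
binLabel = discretize(age, edges, 'categorical', labels);

% 'WOE' format: replace the value with the WOE of its bin
woeValue = woeByBin(binNumber);

fprintf('Bin number: %d, label: %s, WOE: %g\n', binNumber, char(binLabel), woeValue);
```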

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). Use the 'IDVar' argument to indicate that 'CustID' contains ID information and should not be included as a predictor variable.

load CreditCardData 
sc = creditscorecard(data,'IDVar','CustID')
sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {'CustID'  'CustAge'  'TmAtAddress'  'ResStatus'  'EmpStatus'  'CustIncome'  'TmWBank'  'OtherCC'  'AMBalance'  'UtilRate'  'status'}
        NumericPredictors: {'CustAge'  'TmAtAddress'  'CustIncome'  'TmWBank'  'AMBalance'  'UtilRate'}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
           BinMissingData: 0
                    IDVar: 'CustID'
            PredictorVars: {'CustAge'  'TmAtAddress'  'ResStatus'  'EmpStatus'  'CustIncome'  'TmWBank'  'OtherCC'  'AMBalance'  'UtilRate'}
                     Data: [1200x11 table]

Perform automatic binning.

sc = autobinning(sc);

Show the bin information for 'CustAge'.

bininfo(sc,'CustAge')
ans=8×6 table
         Bin         Good    Bad     Odds        WOE       InfoValue
    _____________    ____    ___    ______    _________    _________

    {'[-Inf,33)'}     70      53    1.3208     -0.42622     0.019746
    {'[33,37)'  }     64      47    1.3617     -0.39568     0.015308
    {'[37,40)'  }     73      47    1.5532     -0.26411    0.0072573
    {'[40,46)'  }    174      94    1.8511    -0.088658     0.001781
    {'[46,48)'  }     61      25      2.44      0.18758    0.0024372
    {'[48,58)'  }    263     105    2.5048      0.21378     0.013476
    {'[58,Inf]' }     98      26    3.7692      0.62245       0.0352
    {'Totals'   }    803     397    2.0227          NaN     0.095205
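The Odds, WOE, and InfoValue columns can be recomputed from the Good and Bad counts. The sketch below assumes the standard definitions Odds = Good/Bad, WOE = ln((Good_i/sum(Good))/(Bad_i/sum(Bad))), and InfoValue = (Good_i/sum(Good) - Bad_i/sum(Bad))*WOE. These formulas are an assumption here rather than something this page states, but they reproduce the table above to the displayed precision.

```matlab
% Good and Bad counts per CustAge bin, copied from the bininfo output above
good = [70 64 73 174 61 263 98];
bad  = [53 47 47  94 25 105 26];

pGood = good / sum(good);   % share of all Good observations in each bin
pBad  = bad  / sum(bad);    % share of all Bad observations in each bin

odds      = good ./ bad;              % Odds column
woe       = log(pGood ./ pBad);       % WOE column (natural logarithm)
infoValue = (pGood - pBad) .* woe;    % InfoValue column

disp(table(odds', woe', infoValue', 'VariableNames', {'Odds','WOE','InfoValue'}))
fprintf('Total odds: %.4f, total InfoValue: %.6f\n', sum(good)/sum(bad), sum(infoValue));
```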

These are the first 10 age values in the original data, used to create the creditscorecard object.

data(1:10,'CustAge')
ans=10×1 table
    CustAge
    _______

      53   
      61   
      47   
      50   
      68   
      65   
      34   
      50   
      50   
      49   

Bin scorecard data into bin numbers (default behavior).

bdata = bindata(sc);

According to the bin information, the first age (53) should be mapped into the sixth bin, [48,58), the second age (61) into the seventh bin, [58,Inf], and so on. These are the first 10 binned ages, in bin-number format.

bdata(1:10,'CustAge')
ans=10×1 table
    CustAge
    _______

       6   
       7   
       5   
       6   
       7   
       7   
       2   
       6   
       6   
       6   
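The bin-number mapping can be reproduced with discretize, using the CustAge cut points from the bininfo output. This illustrative sketch assumes that each bin is closed on the left and open on the right (as labels such as '[33,37)' suggest) and that the last bin includes its right edge, which matches the default behavior of discretize; bindata itself is not called.

```matlab
% CustAge bin edges taken from the bininfo labels: [-Inf,33), [33,37), ..., [58,Inf]
edges = [-Inf 33 37 40 46 48 58 Inf];

% First 10 CustAge values from the training data, data(1:10,'CustAge')
age = [53 61 47 50 68 65 34 50 50 49]';

% Bin numbers: index of the interval that contains each age
binNumber = discretize(age, edges);
disp(binNumber')
```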

Bin the scorecard data and show their bin labels. To do this, set the bindata name-value pair argument for 'OutputType' to 'Categorical'.

bdata = bindata(sc,'OutputType','Categorical');

These are the first 10 binned ages, in categorical format.

bdata(1:10,'CustAge')
ans=10×1 table
    CustAge 
    ________

    [48,58) 
    [58,Inf]
    [46,48) 
    [48,58) 
    [58,Inf]
    [58,Inf]
    [33,37) 
    [48,58) 
    [48,58) 
    [48,58) 

Convert the scorecard data to WOE values. To do this, set the bindata name-value pair argument for 'OutputType' to 'WOE'.

bdata = bindata(sc,'OutputType','WOE');

These are the first 10 binned ages, in WOE format. Each age is replaced by the WOE value of its bin, as reported by the bininfo function.

bdata(1:10,'CustAge')
ans=10×1 table
    CustAge 
    ________

     0.21378
     0.62245
     0.18758
     0.21378
     0.62245
     0.62245
    -0.39568
     0.21378
     0.21378
     0.21378
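Because each bin has a single WOE value, the WOE format amounts to indexing the WOE column of the bininfo output with the bin numbers. The sketch below makes that explicit for the same 10 ages; the edges and WOE values are copied from the output above, and bindata is not called.

```matlab
% Bin edges and per-bin WOE values for CustAge, copied from the bininfo output
edges = [-Inf 33 37 40 46 48 58 Inf];
woe   = [-0.42622 -0.39568 -0.26411 -0.088658 0.18758 0.21378 0.62245];

% First 10 CustAge values from the training data
age = [53 61 47 50 68 65 34 50 50 49]';

binNumber = discretize(age, edges);                  % same as the 'BinNumber' output
ageWOE    = reshape(woe(binNumber), size(age));      % WOE of each observation's bin
disp(ageWOE)
```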

This example shows how to use the optional data input of bindata. If this input is not provided, bindata bins the creditscorecard training data. To bin a different dataset, for example some "test" data, pass it to bindata as the data input.

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). Use the 'IDVar' argument to indicate that 'CustID' contains ID information and should not be included as a predictor variable.

load CreditCardData 
sc = creditscorecard(data,'IDVar','CustID')
sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {'CustID'  'CustAge'  'TmAtAddress'  'ResStatus'  'EmpStatus'  'CustIncome'  'TmWBank'  'OtherCC'  'AMBalance'  'UtilRate'  'status'}
        NumericPredictors: {'CustAge'  'TmAtAddress'  'CustIncome'  'TmWBank'  'AMBalance'  'UtilRate'}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
           BinMissingData: 0
                    IDVar: 'CustID'
            PredictorVars: {'CustAge'  'TmAtAddress'  'ResStatus'  'EmpStatus'  'CustIncome'  'TmWBank'  'OtherCC'  'AMBalance'  'UtilRate'}
                     Data: [1200x11 table]

Perform automatic binning.

sc = autobinning(sc);

Show the bin information for 'CustAge'.

bininfo(sc,'CustAge')
ans=8×6 table
         Bin         Good    Bad     Odds        WOE       InfoValue
    _____________    ____    ___    ______    _________    _________

    {'[-Inf,33)'}     70      53    1.3208     -0.42622     0.019746
    {'[33,37)'  }     64      47    1.3617     -0.39568     0.015308
    {'[37,40)'  }     73      47    1.5532     -0.26411    0.0072573
    {'[40,46)'  }    174      94    1.8511    -0.088658     0.001781
    {'[46,48)'  }     61      25      2.44      0.18758    0.0024372
    {'[48,58)'  }    263     105    2.5048      0.21378     0.013476
    {'[58,Inf]' }     98      26    3.7692      0.62245       0.0352
    {'Totals'   }    803     397    2.0227          NaN     0.095205

For the purpose of illustration, take a few rows from the original data as "test" data and display the first 10 age values in the test data.

tdata = data(101:110,:);
tdata(1:10,'CustAge')
ans=10×1 table
    CustAge
    _______

      34   
      59   
      64   
      61   
      28   
      65   
      55   
      37   
      49   
      51   

Convert the test data to WOE values. To do this, set the bindata name-value pair argument for 'OutputType' to 'WOE', passing the test data (tdata) as an optional input.

bdata = bindata(sc,tdata,'OutputType','WOE')
bdata=10×11 table
    CustID    CustAge     TmAtAddress    ResStatus    EmpStatus    CustIncome    TmWBank     OtherCC     AMBalance    UtilRate    status
    ______    ________    ___________    _________    _________    __________    ________    ________    _________    ________    ______

     101      -0.39568     -0.087767     -0.095564      0.2418     -0.011271      0.76889    0.053364    -0.11274     0.048576      0   
     102       0.62245       0.14288      0.019329    -0.19947       0.20579     -0.13107    -0.26832    -0.11274     0.048576      1   
     103       0.62245       0.02263      0.019329      0.2418       0.47972     -0.12109    0.053364     0.24418     0.092164      0   
     104       0.62245       0.02263     -0.095564      0.2418       0.47972     -0.12109    0.053364     0.24418     0.048576      0   
     105      -0.42622       0.02263      0.019329      0.2418      -0.06843      0.76889    0.053364    -0.11274     0.092164      0   
     106       0.62245       0.02263      0.019329    -0.19947       0.20579     -0.13107    0.053364    -0.11274     -0.22899      0   
     107       0.21378     -0.087767     -0.095564      0.2418       0.47972      0.26704    0.053364    -0.11274     0.048576      0   
     108      -0.26411     -0.087767      0.019329    -0.19947      -0.29217     -0.13107    0.053364    -0.11274     0.048576      0   
     109       0.21378     -0.087767     -0.095564      0.2418     -0.026696     -0.13107    0.053364     0.24418     0.048576      0   
     110       0.21378     -0.087767      0.019329      0.2418       0.20579     -0.13107    0.053364    -0.29895     -0.22899      0   

These are the first 10 binned ages, in WOE format. Each age is replaced by the WOE value of its bin, as reported by bininfo.

bdata(1:10,'CustAge')
ans=10×1 table
    CustAge 
    ________

    -0.39568
     0.62245
     0.62245
     0.62245
    -0.42622
     0.62245
     0.21378
    -0.26411
     0.21378
     0.21378
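The test data is not re-binned: the cut points and WOE values estimated from the training data are reused unchanged. A minimal sketch of that idea packages the training rules for CustAge into an anonymous function (the name toWOE is hypothetical) and applies it to the 10 test ages.

```matlab
% Binning rules learned from the training data (copied from bininfo above)
edges = [-Inf 33 37 40 46 48 58 Inf];
woe   = [-0.42622 -0.39568 -0.26411 -0.088658 0.18758 0.21378 0.62245];

% Hypothetical helper: map raw ages to the WOE of their training bin
toWOE = @(x) reshape(woe(discretize(x, edges)), size(x));

% CustAge values of the test rows, data(101:110,'CustAge')
testAge = [34 59 64 61 28 65 55 37 49 51]';
disp(toWOE(testAge))
```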

Create a creditscorecard object using the dataMissing table from the CreditCardData.mat file. In dataMissing, the variables CustAge and ResStatus have missing values.

load CreditCardData.mat 
head(dataMissing,5)
    CustID    CustAge    TmAtAddress     ResStatus     EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance    UtilRate    status
    ______    _______    ___________    ___________    _________    __________    _______    _______    _________    ________    ______

      1          53          62         <undefined>    Unknown        50000         55         Yes       1055.9        0.22        0   
      2          61          22         Home Owner     Employed       52000         25         Yes       1161.6        0.24        0   
      3          47          30         Tenant         Employed       37000         61         No        877.23        0.29        0   
      4         NaN          75         Home Owner     Employed       53000         20         Yes       157.37        0.08        0   
      5          68          56         Home Owner     Employed       53000         14         Yes       561.84        0.11        0   

Use creditscorecard with the name-value argument 'BinMissingData' set to true to bin the missing numeric or categorical data in a separate bin. Apply automatic binning.

sc = creditscorecard(dataMissing,'IDVar','CustID','BinMissingData',true);
sc = autobinning(sc);

disp(sc)
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {'CustID'  'CustAge'  'TmAtAddress'  'ResStatus'  'EmpStatus'  'CustIncome'  'TmWBank'  'OtherCC'  'AMBalance'  'UtilRate'  'status'}
        NumericPredictors: {'CustAge'  'TmAtAddress'  'CustIncome'  'TmWBank'  'AMBalance'  'UtilRate'}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
           BinMissingData: 1
                    IDVar: 'CustID'
            PredictorVars: {'CustAge'  'TmAtAddress'  'ResStatus'  'EmpStatus'  'CustIncome'  'TmWBank'  'OtherCC'  'AMBalance'  'UtilRate'}
                     Data: [1200x11 table]

Display and plot bin information for numeric data for 'CustAge' that includes missing data in a separate bin labelled <missing>.

[bi,cp] = bininfo(sc,'CustAge');
disp(bi)
         Bin         Good    Bad     Odds       WOE       InfoValue 
    _____________    ____    ___    ______    ________    __________

    {'[-Inf,33)'}     69      52    1.3269    -0.42156      0.018993
    {'[33,37)'  }     63      45       1.4    -0.36795      0.012839
    {'[37,40)'  }     72      47    1.5319     -0.2779     0.0079824
    {'[40,46)'  }    172      89    1.9326    -0.04556     0.0004549
    {'[46,48)'  }     59      25      2.36     0.15424     0.0016199
    {'[48,51)'  }     99      41    2.4146     0.17713     0.0035449
    {'[51,58)'  }    157      62    2.5323     0.22469     0.0088407
    {'[58,Inf]' }     93      25      3.72     0.60931      0.032198
    {'<missing>'}     19      11    1.7273    -0.15787    0.00063885
    {'Totals'   }    803     397    2.0227         NaN      0.087112
plotbins(sc,'CustAge')

Figure: plotbins output for CustAge (y-axis: WOE), with bars representing Good and Bad.

Display and plot bin information for categorical data for 'ResStatus' that includes missing data in a separate bin labelled <missing>.

[bi,cg] = bininfo(sc,'ResStatus');
disp(bi)
         Bin          Good    Bad     Odds        WOE       InfoValue 
    ______________    ____    ___    ______    _________    __________

    {'Tenant'    }    296     161    1.8385    -0.095463     0.0035249
    {'Home Owner'}    352     171    2.0585     0.017549    0.00013382
    {'Other'     }    128      52    2.4615      0.19637     0.0055808
    {'<missing>' }     27      13    2.0769     0.026469    2.3248e-05
    {'Totals'    }    803     397    2.0227          NaN     0.0092627
plotbins(sc,'ResStatus')

Figure: plotbins output for ResStatus (y-axis: WOE), with bars representing Good and Bad.

For the 'CustAge' and 'ResStatus' predictors, there is missing data (NaNs and <undefined>) in the training data, and the binning process estimates a WOE value of -0.15787 and 0.026469 respectively for missing data in these predictors, as shown above.

For the purpose of illustration, take a few rows from the original data as test data and introduce some missing data.

tdata = dataMissing(11:14,:);
tdata.CustAge(1) = NaN;
tdata.TmAtAddress(2) = NaN;
tdata.ResStatus(3) = '<undefined>';
tdata.EmpStatus(4) = '<undefined>';
disp(tdata)
    CustID    CustAge    TmAtAddress     ResStatus      EmpStatus     CustIncome    TmWBank    OtherCC    AMBalance    UtilRate    status
    ______    _______    ___________    ___________    ___________    __________    _______    _______    _________    ________    ______

      11        NaN           24        Tenant         Unknown          34000         44         Yes        119.8        0.07        1   
      12         48          NaN        Other          Unknown          44000         14         Yes       403.62        0.03        0   
      13         65           63        <undefined>    Unknown          48000          6         No        111.88        0.02        0   
      14         44           75        Other          <undefined>      41000         35         No        436.41        0.18        0   

Convert the test data to WOE values. To do this, set the bindata name-value pair argument for 'OutputType' to 'WOE', passing the test data tdata as an optional input.

bdata = bindata(sc,tdata,'OutputType','WOE');
disp(bdata)
    CustID    CustAge     TmAtAddress    ResStatus    EmpStatus    CustIncome    TmWBank     OtherCC     AMBalance    UtilRate    status
    ______    ________    ___________    _________    _________    __________    ________    ________    _________    ________    ______

      11      -0.15787      0.02263      -0.095463    -0.19947      -0.06843     -0.12109    0.053364     0.24418     0.048576      1   
      12       0.17713          NaN        0.19637    -0.19947       0.20579     -0.13107    0.053364     0.24418     0.092164      0   
      13       0.60931      0.02263       0.026469    -0.19947       0.47972     -0.25547    -0.26832     0.24418     0.092164      0   
      14      -0.04556      0.02263        0.19637         NaN     -0.011271     -0.12109    -0.26832     0.24418     0.048576      0   

For the 'CustAge' and 'ResStatus' predictors, because there is missing data in the training data, the missing values in the test data get mapped to the WOE value estimated for the <missing> bin. Therefore, a missing value for 'CustAge' is replaced with -0.15787, and a missing value for 'ResStatus' is replaced with 0.026469.

For 'TmAtAddress' and 'EmpStatus', the training data has no missing values, so there is no bin for missing data and no way to estimate a WOE value for it. Therefore, for these predictors, the WOE transformation leaves missing values as missing (that is, it sets the WOE value to NaN).

These rules apply when 'OutputType' is set to 'WOE' or 'WOEModelInput'. The rationale is that if a data-based WOE value exists for missing data, it should be used for the WOE transformation and for subsequent steps (for example, fitting a logistic model or scoring).
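The rule for 'WOE' output can be sketched as follows for CustAge, using the bins and WOE values from the bininfo output obtained with 'BinMissingData' set to true. The variable woeMissing stands for the WOE of the <missing> bin; setting it to NaN mimics a predictor such as TmAtAddress whose training data had no missing values. This illustrates the described behavior and is not bindata's code.

```matlab
% CustAge bins and WOE values estimated with 'BinMissingData' set to true
edges = [-Inf 33 37 40 46 48 51 58 Inf];
woe   = [-0.42156 -0.36795 -0.2779 -0.04556 0.15424 0.17713 0.22469 0.60931];
woeMissing = -0.15787;   % WOE of the <missing> bin; use NaN if training data had no missing values

% CustAge values of the four test rows (the first one was set to NaN)
x = [NaN 48 65 44];

binNumber = discretize(x, edges);   % NaN inputs produce NaN bin numbers
out = NaN(size(x));
isBinned = ~isnan(binNumber);
out(isBinned)  = woe(binNumber(isBinned));   % values that fall in a regular bin
out(~isBinned) = woeMissing;                 % missing values get the <missing> bin WOE
disp(out)
```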

On the other hand, when 'OutputType' is set to 'BinNumber' or 'Categorical', bindata leaves missing values as missing, since this allows you to subsequently treat the missing data as you see fit.

For example, when 'OutputType' is set to 'BinNumber', missing values are set to NaN:

bdata = bindata(sc,tdata,'OutputType','BinNumber');
disp(bdata)
    CustID    CustAge    TmAtAddress    ResStatus    EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance    UtilRate    status
    ______    _______    ___________    _________    _________    __________    _______    _______    _________    ________    ______

      11        NaN            2             1            1           3            3          2           1           2          1   
      12          6          NaN             3            1           6            2          2           1           1          0   
      13          8            2           NaN            1           7            1          1           1           1          0   
      14          4            2             3          NaN           5            3          1           1           2          0   

And when 'OutputType' is set to 'Categorical', missing values are set to '<undefined>':

bdata = bindata(sc,tdata,'OutputType','Categorical');
disp(bdata)
    CustID      CustAge      TmAtAddress     ResStatus      EmpStatus      CustIncome       TmWBank     OtherCC      AMBalance       UtilRate      status
    ______    ___________    ___________    ___________    ___________    _____________    _________    _______    _____________    ___________    ______

      11      <undefined>    [23,83)        Tenant         Unknown        [33000,35000)    [23,45)        Yes      [-Inf,558.88)    [0.04,0.36)      1   
      12      [48,51)        <undefined>    Other          Unknown        [42000,47000)    [12,23)        Yes      [-Inf,558.88)    [-Inf,0.04)      0   
      13      [58,Inf]       [23,83)        <undefined>    Unknown        [47000,Inf]      [-Inf,12)      No       [-Inf,558.88)    [-Inf,0.04)      0   
      14      [40,46)        [23,83)        Other          <undefined>    [40000,42000)    [23,45)        No       [-Inf,558.88)    [0.04,0.36)      0   

bindata supports the following types of WOE transformation:

  • When the 'OutputType' name-value pair argument is set to 'WOE', bindata applies the WOE transformation to all predictors and keeps the remaining variables of the original data unchanged and in place.

  • When the 'OutputType' name-value pair argument is set to 'WOEModelInput', bindata returns a table that can be used directly as input for fitting a logistic regression model for the scorecard. In this case, bindata:

    • Applies the WOE transformation to all predictors.

    • Returns predictor variables, but no IDVar or unused variables are included in the output.

    • Includes the mapped response variable as the last column.

The fitmodel function calls bindata internally with the 'WOEModelInput' option to fit the logistic regression model for the creditscorecard object.
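The column layout that 'WOEModelInput' produces can be mimicked with ordinary table indexing. The sketch below assumes a tiny hypothetical table that is already WOE-transformed, with one ID variable, two predictors, and a response whose GoodLabel is 0. It only illustrates the layout (predictors first, mapped response last) and does not drop predictors with Inf or NaN WOE values.

```matlab
% Hypothetical, already WOE-transformed data: an ID column, two predictors, raw response
T = table([1;2;3], [0.21378;0.62245;-0.39568], [-0.087767;0.14288;0.02263], [0;1;0], ...
    'VariableNames', {'CustID','CustAge','TmAtAddress','status'});

predictorVars = {'CustAge','TmAtAddress'};   % plays the role of sc.PredictorVars
goodLabel = 0;                               % plays the role of sc.GoodLabel

% Keep only the predictors (drops CustID), then append the mapped response last
modelInput = T(:, predictorVars);
modelInput.status = double(T.status == goodLabel);   % Good -> 1, Bad -> 0
disp(modelInput)
```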

Create a creditscorecard object using the CreditCardData.mat file to load the data (using a dataset from Refaat 2011). Use the 'IDVar' argument to indicate that 'CustID' contains ID information and should not be included as a predictor variable.

load CreditCardData 
sc = creditscorecard(data,'IDVar','CustID')
sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {'CustID'  'CustAge'  'TmAtAddress'  'ResStatus'  'EmpStatus'  'CustIncome'  'TmWBank'  'OtherCC'  'AMBalance'  'UtilRate'  'status'}
        NumericPredictors: {'CustAge'  'TmAtAddress'  'CustIncome'  'TmWBank'  'AMBalance'  'UtilRate'}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
           BinMissingData: 0
                    IDVar: 'CustID'
            PredictorVars: {'CustAge'  'TmAtAddress'  'ResStatus'  'EmpStatus'  'CustIncome'  'TmWBank'  'OtherCC'  'AMBalance'  'UtilRate'}
                     Data: [1200x11 table]

Perform automatic binning.

sc = autobinning(sc);

Show the bin information for 'CustAge'.

bininfo(sc,'CustAge')
ans=8×6 table
         Bin         Good    Bad     Odds        WOE       InfoValue
    _____________    ____    ___    ______    _________    _________

    {'[-Inf,33)'}     70      53    1.3208     -0.42622     0.019746
    {'[33,37)'  }     64      47    1.3617     -0.39568     0.015308
    {'[37,40)'  }     73      47    1.5532     -0.26411    0.0072573
    {'[40,46)'  }    174      94    1.8511    -0.088658     0.001781
    {'[46,48)'  }     61      25      2.44      0.18758    0.0024372
    {'[48,58)'  }    263     105    2.5048      0.21378     0.013476
    {'[58,Inf]' }     98      26    3.7692      0.62245       0.0352
    {'Totals'   }    803     397    2.0227          NaN     0.095205

These are the first 10 age values in the original data, used to create the creditscorecard object.

data(1:10,'CustAge')
ans=10×1 table
    CustAge
    _______

      53   
      61   
      47   
      50   
      68   
      65   
      34   
      50   
      50   
      49   

Convert the scorecard training data to WOE values. To do this, set the bindata name-value pair argument for 'OutputType' to 'WOE'.

bdata = bindata(sc,'OutputType','WOE');

These are the first 10 binned ages, in WOE format. Each age is replaced by the WOE value of its bin, as reported by bininfo.

bdata(1:10,'CustAge')
ans=10×1 table
    CustAge 
    ________

     0.21378
     0.62245
     0.18758
     0.21378
     0.62245
     0.62245
    -0.39568
     0.21378
     0.21378
     0.21378

The size of the original data and the size of bdata output are the same because bindata leaves unused variables (such as 'IDVar') unchanged and in place.

whos data bdata
  Name          Size             Bytes  Class    Attributes

  bdata      1200x11            108987  table              
  data       1200x11             84603  table              

The response values are the same in the original data and in the binned data because, by default, bindata does not modify response values.

disp([data.status(1:10) bdata.status(1:10)])
     0     0
     0     0
     0     0
     0     0
     0     0
     0     0
     1     1
     0     0
     1     1
     1     1

When fitting a logistic regression model with WOE data, set the 'OutputType' name-value pair argument to 'WOEModelInput'.

bdata = bindata(sc,'OutputType','WOEModelInput');

The binned predictor data is the same as when the 'OutputType' name-value pair argument is set to 'WOE'.

bdata(1:10,'CustAge')
ans=10×1 table
    CustAge 
    ________

     0.21378
     0.62245
     0.18758
     0.21378
     0.62245
     0.62245
    -0.39568
     0.21378
     0.21378
     0.21378

However, the size of the original data and the size of bdata output are different. This is because bindata removes unused variables (such as 'IDVar').

whos data bdata
  Name          Size            Bytes  Class    Attributes

  bdata      1200x10            99167  table              
  data       1200x11            84603  table              

The response values are also modified in this case and are mapped so that "Good" is 1 and "Bad" is 0.

disp([data.status(1:10) bdata.status(1:10)])
     0     1
     0     1
     0     1
     0     1
     0     1
     0     1
     1     0
     0     1
     1     0
     1     0

Input Arguments

Credit scorecard model (sc), specified as a creditscorecard object. Use creditscorecard to create a creditscorecard object.

Data to bin (data) using the rules set in the creditscorecard object, specified as a table. By default, data is the creditscorecard object's training data (its Data property).

Before creating a creditscorecard object, prepare your data so that it is appropriately structured as input to a creditscorecard object.

Data Types: table

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: bdata = bindata(sc,'OutputType','WOE','ResponseFormat','Mapped')

Output format, specified as the comma-separated pair consisting of 'OutputType' and a character vector with one of the following values. The default is 'BinNumber'.

  • BinNumber — Returns the bin numbers corresponding to each observation.

  • Categorical — Returns the bin label corresponding to each observation.

  • WOE — Returns the Weight of Evidence (WOE) corresponding to each observation.

  • WOEModelInput — Use this option when fitting a model. This option:

    • Returns the Weight of Evidence (WOE) corresponding to each observation.

    • Returns predictor variables, but no IDVar or unused variables are included in the output.

    • Discards any predictors whose bins have Inf or NaN WOE values.

    • Includes the mapped response variable as the last column.

    Note

    When the bindata name-value pair argument 'OutputType' is set to 'WOEModelInput', the bdata output only contains the columns corresponding to predictors whose bins do not have Inf or NaN Weight of Evidence (WOE) values, and bdata includes the mapped response as the last column.

    Missing data (if any) are included in the bdata output as missing data as well, and do not influence the rules to discard predictors when 'OutputType' is set to 'WOEModelInput'.

Data Types: char
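A bin with zero Good or zero Bad observations has a non-finite WOE, because computing it involves log(0) or a division by zero. The sketch below, with hypothetical predictor names and counts and assuming WOE = ln((Good_i/sum(Good))/(Bad_i/sum(Bad))), shows the kind of filter described for 'WOEModelInput': predictors with any non-finite bin WOE are left out.

```matlab
% Hypothetical Good/Bad counts per bin for three predictors
names      = {'PredA','PredB','PredC'};
goodCounts = {[70 64 73], [120 87], [40 25  0]};
badCounts  = {[53 47 47], [ 80 67], [30  0 12]};

keep = false(size(names));
for k = 1:numel(names)
    g = goodCounts{k};
    b = badCounts{k};
    % WOE per bin (assumed definition); a zero count gives Inf or -Inf
    woe = log((g / sum(g)) ./ (b / sum(b)));
    keep(k) = all(isfinite(woe));
end

modelVars = names(keep);   % predictors that would remain in the model input
disp(modelVars)
```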

Response values format, specified as the comma-separated pair consisting of 'ResponseFormat' and a character vector with one of the following values. The default is 'RawData'.

  • RawData — The response variable is copied unchanged into the bdata output.

  • Mapped — The response values are modified (if necessary) so that "Good" is mapped to 1, and "Bad" is mapped to 0.

Data Types: char

Output Arguments

Binned predictor variables (bdata), returned as a table. bdata has the same size as the data input (see the exception in the following Note), but only the predictors specified in the creditscorecard object's PredictorVars property are binned; the remaining variables are unchanged.

Note

When the bindata name-value pair argument 'OutputType' is set to 'WOEModelInput', the bdata output only contains the columns corresponding to predictors whose bins do not have Inf or NaN Weight of Evidence (WOE) values, and bdata includes the mapped response as the last column.

Missing data (if any) are included in the bdata output as missing data as well, and do not influence the rules to discard predictors when 'OutputType' is set to 'WOEModelInput'.

References

[1] Anderson, R. The Credit Scoring Toolkit. Oxford University Press, 2007.

[2] Refaat, M. Credit Risk Scorecards: Development and Implementation Using SAS. lulu.com, 2011.

Version History

Introduced in R2014b