
ExhaustiveSearcher

Create exhaustive nearest neighbor searcher

Description

ExhaustiveSearcher model objects store the training data, distance metric, and parameter values of the distance metric for an exhaustive nearest neighbor search. The exhaustive search algorithm finds the distance from each query observation to all n observations in the training data, which is an n-by-K numeric matrix.

Once you create an ExhaustiveSearcher model object, find neighboring points in the training data to the query data by performing a nearest neighbor search using knnsearch or a radius search using rangesearch. The exhaustive search algorithm is more efficient than the Kd-tree algorithm when K is large (that is, K > 10), and it is more flexible than the Kd-tree algorithm with respect to distance metric choices. The ExhaustiveSearcher model object also supports sparse data.
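
For instance, a minimal sketch of this workflow, using hypothetical random data, looks like this:

% Hypothetical training and query data
X = rand(100,12);               % n = 100 observations, K = 12 predictors
Y = rand(3,12);                 % 3 query points
Mdl = ExhaustiveSearcher(X);    % exhaustive searcher with the default Euclidean metric
idx = knnsearch(Mdl,Y,'K',5);   % indices of the 5 nearest training points to each query point
idxR = rangesearch(Mdl,Y,0.8);  % indices of all training points within distance 0.8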

Creation

Use either the createns function or the ExhaustiveSearcher function (described here) to create an ExhaustiveSearcher object. Both functions use the same syntax, except that the createns function has the 'NSMethod' name-value pair argument, which you use to choose the nearest neighbor search method. The createns function can also create a KDTreeSearcher object. Specify 'NSMethod','exhaustive' to create an ExhaustiveSearcher object. The default value of 'NSMethod' is 'exhaustive' if K > 10, the training data is sparse, or the distance metric is not one of the Euclidean, city block, Chebychev, or Minkowski metrics.

Syntax

Mdl = ExhaustiveSearcher(X)
Mdl = ExhaustiveSearcher(X,Name,Value)

Description

Mdl = ExhaustiveSearcher(X) creates an exhaustive nearest neighbor searcher object (Mdl) using the n-by-K numeric matrix of training data (X).

Mdl = ExhaustiveSearcher(X,Name,Value) specifies additional options using one or more name-value pair arguments. You can specify the distance metric and set the distance metric parameter (DistParameter) property. For example, ExhaustiveSearcher(X,'Distance','chebychev') creates an exhaustive nearest neighbor searcher object that uses the Chebychev distance. To specify DistParameter, use the Cov, P, or Scale name-value pair argument.
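
For instance, assuming X is a numeric training matrix, the following sketch creates a searcher that uses the Minkowski distance with exponent 3:

Mdl = ExhaustiveSearcher(X,'Distance','minkowski','P',3);
Mdl.DistParameter   % returns 3, the Minkowski exponent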

Input Arguments

Training data used to prepare the exhaustive searcher algorithm, specified as a numeric matrix. X has n rows, each corresponding to an observation (that is, an instance or example), and K columns, each corresponding to a predictor (that is, a feature).

Data Types: single | double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Distance','mahalanobis','Cov',eye(3) specifies to use the Mahalanobis distance when searching for nearest neighbors and a 3-by-3 identity matrix for the covariance matrix in the Mahalanobis distance metric.

Distance metric used when you call knnsearch or rangesearch to find nearest neighbors for future query points, specified as the comma-separated pair consisting of 'Distance' and a character vector, string scalar, or function handle.

This table describes the supported distance metrics specified as character vectors or string scalars.

Value             Description
'chebychev'       Chebychev distance (maximum coordinate difference).
'cityblock'       City block distance.
'correlation'     One minus the sample linear correlation between observations (treated as sequences of values).
'cosine'          One minus the cosine of the included angle between observations (treated as row vectors).
'euclidean'       Euclidean distance.
'hamming'         Hamming distance, which is the percentage of coordinates that differ.
'jaccard'         One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ.
'minkowski'       Minkowski distance. The default exponent is 2. To specify a different exponent, use the 'P' name-value pair argument.
'mahalanobis'     Mahalanobis distance, computed using a positive definite covariance matrix. To change the value of the covariance matrix, use the 'Cov' name-value pair argument.
'seuclidean'      Standardized Euclidean distance. Each coordinate difference between rows in Mdl.X and the query matrix is scaled by dividing by the corresponding element of the standard deviation computed from Mdl.X. To specify another scaling, use the 'Scale' name-value pair argument.
'spearman'        One minus the sample Spearman's rank correlation between observations (treated as sequences of values).

For more details, see Distance Metrics.

You can specify a function handle for a custom distance metric by using @ (for example, @distfun); a sketch of such a function follows this list. A custom distance function must:

  • Have the form function D2 = distfun(ZI, ZJ).

  • Take as arguments:

    • A 1-by-K vector ZI containing a single row from X or from the query points Y, where K is the number of columns in X.

    • An m-by-K matrix ZJ containing multiple rows of X or Y, where m is a positive integer.

  • Return an m-by-1 vector of distances D2, where D2(j) is the distance between the observations ZI and ZJ(j,:).
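
For example, the following sketch defines a hypothetical weighted Euclidean metric, weuclidean, that satisfies these requirements (the weights assume K = 4 predictors), and passes its handle to ExhaustiveSearcher:

function D2 = weuclidean(ZI,ZJ)
% Hypothetical weighted Euclidean distance: ZI is 1-by-K, ZJ is m-by-K.
w = [1 1 2 2];                        % per-predictor weights (assumes K = 4)
D2 = sqrt(sum(((ZJ - ZI).^2).*w,2));  % m-by-1 vector of distances (implicit expansion, R2016b or later)
end

% Usage (assuming X has 4 columns and weuclidean.m is on the path):
Mdl = ExhaustiveSearcher(X,'Distance',@weuclidean);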

The software does not use the distance metric for creating an ExhaustiveSearcher model object, so you can alter the distance metric by using dot notation after creating the object.

Example: 'Distance','mahalanobis'

Covariance matrix for the Mahalanobis distance metric, specified as the comma-separated pair consisting of 'Cov' and a K-by-K positive definite matrix, where K is the number of columns in X. This argument is valid only if 'Distance' is 'mahalanobis'.

Example: 'Cov',eye(3)

Data Types: single | double

Exponent for the Minkowski distance metric, specified as the comma-separated pair consisting of 'P' and a positive scalar. This argument is valid only if 'Distance' is 'minkowski'.

Example: 'P',3

Data Types: single | double

Scale parameter value for the standardized Euclidean distance metric, specified as the comma-separated pair consisting of 'Scale' and a nonnegative numeric vector of length K, where K is the number of columns in X. The software scales each difference between the training and query data using the corresponding element of Scale. This argument is valid only if 'Distance' is 'seuclidean'.

Example: 'Scale',quantile(X,0.75) - quantile(X,0.25)

Data Types: single | double
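
For instance, assuming X is the training data matrix, this sketch replaces the default standard-deviation scaling with an interquartile-range scaling:

iqrScale = quantile(X,0.75) - quantile(X,0.25);   % hypothetical robust scale, one element per predictor
Mdl = ExhaustiveSearcher(X,'Distance','seuclidean','Scale',iqrScale);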

Properties

This property is read-only.

Training data used to prepare the exhaustive searcher algorithm, specified as a numeric matrix. X has n rows, each corresponding to an observation (that is, an instance or example), and K columns, each corresponding to a predictor (that is, a feature).

The input argument X of createns or ExhaustiveSearcher sets this property.

Data Types: single | double

Distance metric used when you call knnsearch or rangesearch to find nearest neighbors for future query points, specified as a character vector or string scalar ('chebychev', 'cityblock', 'correlation', 'cosine', 'euclidean', 'hamming', 'jaccard', 'minkowski', 'mahalanobis', 'seuclidean', or 'spearman'), or a function handle.

The 'Distance' name-value pair argument of createns or ExhaustiveSearcher sets this property.

The software does not use the distance metric for creating an ExhaustiveSearcher model object, so you can alter it by using dot notation.

Distance metric parameter values, specified as empty ([]), a positive scalar, a positive numeric vector, or a positive definite matrix, depending on the distance metric.

This table describes the distance parameters of the supported distance metrics.

Distance Metric    Parameter Description
'mahalanobis'

A positive definite matrix representing the covariance matrix used for computing the Mahalanobis distance. By default, the software sets the covariance using nancov(Mdl.X).

The 'Cov' name-value pair argument of createns or ExhaustiveSearcher sets this property.

You can alter DistParameter by using dot notation, for example, Mdl.DistParameter = CovNew, where CovNew is a K-by-K positive definite numeric matrix.

'minkowski'

A positive scalar indicating the exponent of the Minkowski distance. By default, the exponent is 2.

The 'P' name-value pair argument of createns or ExhaustiveSearcher sets this property.

You can alter DistParameter by using dot notation, for example, Mdl.DistParameter = PNew, where PNew is a positive scalar.

'seuclidean'

A positive numeric vector indicating the values used by the software to scale the predictors when computing the standardized Euclidean distance. By default, the software:

  1. Estimates the standard deviation of each predictor (column) of X using scale = nanstd(Mdl.X)

  2. Scales each coordinate difference between the rows in X and the query matrix by dividing by the corresponding element of scale

The 'Scale' name-value pair argument of createns or ExhaustiveSearcher sets this property.

You can alter DistParameter by using dot notation, for example, Mdl.DistParameter = sNew, where sNew is a K-dimensional positive numeric vector.

If Mdl.Distance is not one of the metrics listed in this table, then Mdl.DistParameter is [], which means that the specified distance metric formula has no parameters.

Data Types: single | double
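
The following sketch illustrates the default parameter values described in this table using Fisher's iris data; the values returned by nancov and nanstd should match the stored DistParameter values.

load fisheriris
X = meas;
MdlM = createns(X,'Distance','mahalanobis');
MdlM.DistParameter   % default covariance parameter
nancov(X)            % should match the matrix above
MdlS = createns(X,'Distance','seuclidean');
MdlS.DistParameter   % default scale parameter
nanstd(X)            % should match the vector above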

Object Functions

knnsearch         Find k-nearest neighbors using object
rangesearch       Find all neighbors within specified distance using object

Examples

Load Fisher's iris data set.

load fisheriris
X = meas;
[n,k] = size(X)
n = 150
k = 4

X has 150 observations and 4 predictors.

Prepare an exhaustive nearest neighbor searcher using the entire data set as training data.

Mdl1 = ExhaustiveSearcher(X)
Mdl1 = 
  ExhaustiveSearcher with properties:

         Distance: 'euclidean'
    DistParameter: []
                X: [150x4 double]

Mdl1 is an ExhaustiveSearcher model object, and its properties appear in the Command Window. The object contains information about the trained algorithm, such as the distance metric. You can alter property values using dot notation.

Alternatively, you can prepare an exhaustive nearest neighbor searcher by using createns and specifying 'exhaustive' as the search method.

Mdl2 = createns(X,'NSMethod','exhaustive')
Mdl2 = 
  ExhaustiveSearcher with properties:

         Distance: 'euclidean'
    DistParameter: []
                X: [150x4 double]

Mdl2 is also an ExhaustiveSearcher model object, and it is equivalent to Mdl1.

To search X for the nearest neighbors to a batch of query data, pass the ExhaustiveSearcher model object and the query data to knnsearch or rangesearch.

Load Fisher's iris data set. Focus on the petal dimensions.

load fisheriris
X = meas(:,[3 4]); % Predictors

Prepare an exhaustive nearest neighbor searcher. Specify the Mahalanobis distance metric.

Mdl = createns(X,'Distance','mahalanobis')
Mdl = 
  ExhaustiveSearcher with properties:

         Distance: 'mahalanobis'
    DistParameter: [2x2 double]
                X: [150x2 double]

Because the distance metric is Mahalanobis, createns creates an ExhaustiveSearcher model object by default.

Access properties of Mdl by using dot notation. For example, use Mdl.DistParameter to access the Mahalanobis covariance parameter.

Mdl.DistParameter
ans = 2×2

    3.1163    1.2956
    1.2956    0.5810

You can pass query data and Mdl to:

  • knnsearch to find indices and distances of nearest neighbors

  • rangesearch to find indices of all nearest neighbors within a distance that you specify, as in the sketch after this list
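
For instance, a brief rangesearch sketch with a hypothetical query point and radius:

Y = [5 1.45];                    % hypothetical query point (petal length, petal width)
[idx,D] = rangesearch(Mdl,Y,1);  % all training points within Mahalanobis distance 1 of Y
idx{1}                           % indices of those training points, closest first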

Create an ExhaustiveSearcher model object and alter the Distance property by using dot notation.

Load Fisher's iris data set.

load fisheriris
X = meas;

Train a default exhaustive searcher algorithm using the entire data set as training data.

Mdl = ExhaustiveSearcher(X)
Mdl = 
  ExhaustiveSearcher with properties:

         Distance: 'euclidean'
    DistParameter: []
                X: [150x4 double]

Specify that the neighbor searcher use the Mahalanobis metric to compute the distances between the training and query data.

Mdl.Distance = 'mahalanobis'
Mdl = 
  ExhaustiveSearcher with properties:

         Distance: 'mahalanobis'
    DistParameter: [4x4 double]
                X: [150x4 double]

You can pass Mdl and the query data to either knnsearch or rangesearch to find the nearest neighbors to the points in the query data based on the Mahalanobis distance.

Create an exhaustive searcher object by using the createns function. Pass the object and query data to the knnsearch function to find k-nearest neighbors.

Load Fisher's iris data set.

load fisheriris

Remove five irises randomly from the predictor data to use as a query set.

rng('default');             % For reproducibility
n = size(meas,1);           % Sample size
qIdx = randsample(n,5);     % Indices of query data
X = meas(~ismember(1:n,qIdx),:);
Y = meas(qIdx,:);

Prepare an exhaustive nearest neighbor searcher using the training data. Specify the Mahalanobis distance for finding nearest neighbors.

Mdl = createns(X,'Distance','mahalanobis')
Mdl = 
  ExhaustiveSearcher with properties:

         Distance: 'mahalanobis'
    DistParameter: [4x4 double]
                X: [145x4 double]

Because the distance metric is Mahalanobis, createns creates an ExhaustiveSearcher model object by default.

The software uses the covariance matrix of the predictors (columns) in the training data for computing the Mahalanobis distance. To display this value, use Mdl.DistParameter.

Mdl.DistParameter
ans = 4×4

    0.6547   -0.0368    1.2320    0.5026
   -0.0368    0.1914   -0.3227   -0.1193
    1.2320   -0.3227    3.0671    1.2842
    0.5026   -0.1193    1.2842    0.5800

Find the indices of the training data (Mdl.X) that are the two nearest neighbors of each point in the query data (Y).

IdxNN = knnsearch(Mdl,Y,'K',2)
IdxNN = 5×2

     5     6
    98    95
   104   128
   135    65
   102   115

Each row of IdxNN corresponds to a query data observation. The column order corresponds to the order of the nearest neighbors with respect to ascending distance. For example, based on the Mahalanobis metric, the second nearest neighbor of Y(3,:) is X(128,:).
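
As a sanity check (illustrative only, not part of the original workflow), you can reproduce these indices with pdist2, which computes the same exhaustive set of pairwise distances:

D = pdist2(X,Y,'mahalanobis',Mdl.DistParameter);  % 145-by-5 matrix of Mahalanobis distances
[~,ord] = sort(D,1);                              % sort training indices by distance to each query point
ord(1:2,:)'                                       % should reproduce IdxNN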

Introduced in R2010a