createns
Create nearest neighbor searcher object
Description
creates an NS
= createns(X
)ExhaustiveSearcher
model object or a
KDTreeSearcher
model object using the
n-by-K numeric matrix of the training data
X
.
specifies additional options using one or more name-value arguments. For example,
you can set NS
= createns(X
,Name=Value
)NSMethod
to specify the type of object to create,
such as NS = createns(X,NSMethod="hnsw")
to create an hnswSearcher
model object.
Examples
Train Default Exhaustive Nearest Neighbor Searcher
Load Fisher's iris data set.
load fisheriris
X = meas;
[n,k] = size(X)
n = 150
k = 4
X
has 150 observations and 4 predictors.
Prepare an exhaustive nearest neighbor searcher using the entire data set as training data.
Mdl1 = ExhaustiveSearcher(X)
Mdl1 = ExhaustiveSearcher with properties: Distance: 'euclidean' DistParameter: [] X: [150x4 double]
Mdl1
is an ExhaustiveSearcher
model object, and its properties appear in the Command Window. The object contains information about the trained algorithm, such as the distance metric. You can alter property values using dot notation.
Alternatively, you can prepare an exhaustive nearest neighbor searcher by using createns
and specifying 'exhaustive'
as the search method.
Mdl2 = createns(X,'NSMethod','exhaustive')
Mdl2 = ExhaustiveSearcher with properties: Distance: 'euclidean' DistParameter: [] X: [150x4 double]
Mdl2
is also an ExhaustiveSearcher
model object, and it is equivalent to Mdl1
.
To search X
for the nearest neighbors to a batch of query data, pass the ExhaustiveSearcher
model object and the query data to knnsearch
or rangesearch
.
Grow Default Kd-Tree
Grow a four-dimensional Kd-tree that uses the Euclidean distance.
Load Fisher's iris data set.
load fisheriris
X = meas;
[n,k] = size(X)
n = 150
k = 4
X
has 150 observations and 4 predictors.
Grow a four-dimensional Kd-tree using the entire data set as training data.
Mdl1 = KDTreeSearcher(X)
Mdl1 = KDTreeSearcher with properties: BucketSize: 50 Distance: 'euclidean' DistParameter: [] X: [150x4 double]
Mdl1
is a KDTreeSearcher
model object, and its properties appear in the Command Window. The object contains information about the grown four-dimensional Kd-tree, such as the distance metric. You can alter property values using dot notation.
Alternatively, you can grow a Kd-tree by using createns
.
Mdl2 = createns(X)
Mdl2 = KDTreeSearcher with properties: BucketSize: 50 Distance: 'euclidean' DistParameter: [] X: [150x4 double]
Mdl2
is also a KDTreeSearcher
model object, and it is equivalent to Mdl1
. Because X
has four columns and the default distance metric is Euclidean, createns
creates a KDTreeSearcher
model by default.
To find the nearest neighbors in X
to a batch of query data, pass the KDTreeSearcher
model object and the query data to knnsearch
or rangesearch
.
Grow Kd-Tree Using Minkowski Distance Metric
Grow a Kd-tree that uses the Minkowski distance with an exponent of five.
Load Fisher's iris data set. Create a variable for the petal dimensions.
load fisheriris
X = meas(:,3:4);
Grow a Kd-tree. Specify the Minkowski distance with an exponent of five.
Mdl = createns(X,'Distance','minkowski','P',5)
Mdl = KDTreeSearcher with properties: BucketSize: 50 Distance: 'minkowski' DistParameter: 5 X: [150x2 double]
Because X
has two columns and the distance metric is Minkowski, createns
creates a KDTreeSearcher
model object by default.
Search for Nearest Neighbors of Query Data Using Mahalanobis Distance
Create an exhaustive searcher object by using the createns
function. Pass the object and query data to the knnsearch
function to find k-nearest neighbors.
Load Fisher's iris data set.
load fisheriris
Remove five irises randomly from the predictor data to use as a query set.
rng('default'); % For reproducibility n = size(meas,1); % Sample size qIdx = randsample(n,5); % Indices of query data X = meas(~ismember(1:n,qIdx),:); Y = meas(qIdx,:);
Prepare an exhaustive nearest neighbor searcher using the training data. Specify the Mahalanobis distance for finding nearest neighbors.
Mdl = createns(X,'Distance','mahalanobis')
Mdl = ExhaustiveSearcher with properties: Distance: 'mahalanobis' DistParameter: [4x4 double] X: [145x4 double]
Because the distance metric is Mahalanobis, createns
creates an ExhaustiveSearcher
model object by default.
The software uses the covariance matrix of the predictors (columns) in the training data for computing the Mahalanobis distance. To display this value, use Mdl.DistParameter
.
Mdl.DistParameter
ans = 4×4
0.6547 -0.0368 1.2320 0.5026
-0.0368 0.1914 -0.3227 -0.1193
1.2320 -0.3227 3.0671 1.2842
0.5026 -0.1193 1.2842 0.5800
Find the indices of the training data (Mdl.X
) that are the two nearest neighbors of each point in the query data (Y
).
IdxNN = knnsearch(Mdl,Y,'K',2)
IdxNN = 5×2
5 6
98 95
104 128
135 65
102 115
Each row of IdxNN
corresponds to a query data observation. The column order corresponds to the order of the nearest neighbors with respect to ascending distance. For example, based on the Mahalanobis metric, the second nearest neighbor of Y(3,:)
is X(128,:)
.
Input Arguments
X
— Training data
numeric matrix
Training data, specified as a numeric matrix. X
has
n rows, each corresponding to an observation (that
is, an instance or example), and K columns, each
corresponding to a predictor (that is, a feature).
Data Types: single
| double
Name-Value Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN
, where Name
is
the argument name and Value
is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose
Name
in quotes.
Example: NS = createns(X,'Distance','mahalanobis')
creates an
ExhaustiveSearcher
model object that uses the Mahalanobis
distance metric when searching for nearest neighbors.
NSMethod
— Nearest neighbor search method
"kdtree"
| "exhaustive"
| "hnsw"
Nearest neighbor search method used to define the type of object
created, specified as "kdtree"
,
"exhaustive"
, or "hnsw"
.
"kdtree"
—createns
creates aKDTreeSearcher
model object using the Kd-tree algorithm."exhaustive"
—createns
creates anExhaustiveSearcher
model object using the exhaustive search algorithm."hnsw"
—createns
creates anhnswSearcher
model object using the Hierarchical Navigable Small Worlds (HNSW) algorithm. AnhnswSearcher
takes longer to create than the other model types, but performs the search more quickly. Typically, the results of anhnswSearcher
are approximate rather than exact results.
The default value is "kdtree"
when these three
conditions are true:
Otherwise, the default value is
"exhaustive"
.
Example: NSMethod="exhaustive"
Data Types: char
| string
Distance
— Distance metric
'euclidean'
(default) | character vector or string scalar of distance metric | function handle
Distance metric used when you call knnsearch
or rangesearch
to find nearest
neighbors for future query points, specified as a character vector or
string scalar of the distance metric name, or a function handle.
For all types of nearest neighbor searchers,
createns
supports these distance
metrics.
Value | Description |
---|---|
'chebychev' | Chebychev distance (maximum coordinate difference) |
'cityblock' | City block distance |
'euclidean' | Euclidean distance |
'minkowski' | Minkowski distance. The default exponent is 2. To specify a different exponent, use the
'P' name-value
argument. |
If createns
uses the exhaustive search
algorithm ('NSMethod'
is
'exhaustive'
), then
createns
also supports these distance
metrics.
Value | Description |
---|---|
'correlation' | One minus the sample linear correlation between observations (treated as sequences of values) |
'cosine' | One minus the cosine of the included angle between observations (treated as row vectors) |
'fasteuclidean' | Euclidean distance computed by using an alternative
algorithm that saves time when the number of predictors
is at least 10. In some cases, this faster algorithm can
reduce accuracy. Algorithms starting with
'fast' do not support sparse
data. For details, see Algorithms. |
'fastseuclidean' | Standardized Euclidean distance computed by using an
alternative algorithm that saves time when the number of
predictors is at least 10. In some cases, this faster
algorithm can reduce accuracy. Algorithms starting with
'fast' do not support sparse
data. For details, see Algorithms. |
'hamming' | Hamming distance, which is the percentage of coordinates that differ |
'jaccard' | One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ |
'mahalanobis' | Mahalanobis distance |
'seuclidean' | Standardized Euclidean distance |
'spearman' | One minus the sample Spearman's rank correlation between observations (treated as sequences of values) |
The HNSW algorithm (NSMethod="hnsw"
) supports the
same distance metrics listed for the exhaustive search algorithm except
those starting with fast
:
"fasteuclidean"
and
"fastseuclidean"
. The "hnsw"
algorithm does not support custom distance metrics.
If createns
uses the exhaustive search algorithm
('NSMethod'
is
'exhaustive'
), then you can also specify a function
handle for a custom distance metric by using @
(for
example, @distfun
). A custom distance function must:
Have the form
function D2 = distfun(ZI,ZJ)
.Take as arguments:
A 1-by-K vector
ZI
containing a single row fromX
or from the query pointsY
, where K is the number of columns inX
.An m-by-K matrix
ZJ
containing multiple rows ofX
orY
, where m is a positive integer.
Return an m-by-1 vector of distances
D2
, whereD2(
is the distance between the observationsj
)ZI
andZJ(
.j
,:)
For more details, see Distance Metrics.
Example: 'Distance','minkowski'
Data Types: char
| string
| function_handle
P
— Exponent for Minkowski distance metric
2
(default) | positive scalar
Exponent for the Minkowski distance metric, specified as a positive scalar. This argument is
valid only when Distance
is
"minkowski"
.
The value of P
sets the value of the
DistParameter
property in the model
object.
Example: P=3
Data Types: single
| double
Cov
— Covariance matrix for Mahalanobis distance metric
cov(X,"omitrows")
(default) | positive definite matrix
Covariance matrix for the Mahalanobis distance metric, specified as a
K-by-K positive definite matrix, where
K is the number of columns in X
. This
argument is valid only when Distance
is
"mahalanobis"
.
The value of Cov
sets the value of the
DistParameter
property in the model object.
Example: Cov=eye(3)
Data Types: single
| double
Scale
— Scale parameter value for standardized Euclidean distance metric
std(X,"omitnan")
(default) | nonnegative numeric vector
Scale parameter value for the standardized Euclidean distance metric, specified as a
nonnegative numeric vector of length K, where K is
the number of columns in X
. The software scales each difference
between the training and query data using the corresponding element of
Scale
. This argument is valid only when
Distance
is "seuclidean"
.
The value of Scale
sets the value of the
DistParameter
property in the model object.
Example: Scale=quantile(X,0.75) - quantile(X,0.25)
Data Types: single
| double
BucketSize
— Maximum number of data points in each leaf node
50
(default) | positive integer
Maximum number of data points in each leaf node of the
Kd-tree, specified as the comma-separated pair
consisting of 'BucketSize'
and a positive
integer.
This argument is valid only when you create a
KDTreeSearcher
model object.
Example: 'BucketSize',10
Data Types: single
| double
MaxNumLinksPerNode
— Number of connections created for each node
min(16,N)
, where N
is the number of rows in X
(default) | positive scalar
This property is read-only.
Number of connections created for each node, specified as a positive scalar.
Typically, set a number from about 5 to 48. MaxNumLinksPerNode
cannot
exceed TrainSetSize
.
Data Types: double
TrainSetSize
— Size of list for potential nearest neighbors
200
(default) | positive integer
This property is read-only.
Size of the list for potential nearest neighbors, specified as a positive
integer. Typically, set a number from about 100 to 2000.
TrainSetSize
must be at least
MaxNumLinksPerNode
and no more than
N
, the number of rows of
X
.
You can try to search for a good value of TrainSetSize
as
follows:
Specify a large enough value to achieve a very accurate search result (close to an exact search).
Run query tests with decreased values, until the accuracy reaches a reasonably good range (recall rate of 0.95–0.99).
Data Types: double
Output Arguments
NS
— Nearest neighbor searcher
ExhaustiveSearcher
model object | KDTreeSearcher
model object | hnswSearcher
model object
Nearest neighbor searcher, returned as an ExhaustiveSearcher
, KDTreeSearcher
, or hnswSearcher
model object.
After you create a nearest neighbor searcher model object, you can use it
to find points in the training data that are close to the query data by
performing a nearest neighbor search using knnsearch
. For
ExhaustiveSearcher
or
KDTreeSearcher
model objects only, you can also
perform a radius search using rangesearch
.
Algorithms
Fast Euclidean Distance Algorithm
The values of the Distance
argument that begin fast
(such as 'fasteuclidean'
and 'fastseuclidean'
)
calculate Euclidean distances using an algorithm that uses extra memory to save
computational time. This algorithm is named "Euclidean Distance Matrix Trick" in Albanie
[1] and elsewhere. Internal
testing shows that this algorithm saves time when the number of predictors is at least 10.
Algorithms starting with 'fast'
do not support sparse data.
To find the matrix D of distances between all the points xi and xj, where each xi has n variables, the algorithm computes distance using the final line in the following equations:
The matrix in the last line of the equations is called the Gram matrix. Computing the set of squared distances is faster, but slightly less numerically stable, when you compute and use the Gram matrix instead of computing the squared distances by squaring and summing. For a discussion, see Albanie [1].
References
[1] Albanie, Samuel. Euclidean Distance Matrix Trick. June, 2019. Available at https://www.robots.ox.ac.uk/%7Ealbanie/notes/Euclidean_distance_trick.pdf.
Version History
Introduced in R2010aR2024a: Create Hierarchical Navigable Small Worlds (HNSW) searcher
Create an HNSW searcher by specifying NSMethod="hnsw"
when
creating the searcher model object. The HNSW algorithm is an approximate nearest
neighbor searcher that searches more quickly than other searcher types. For details,
see hnswSearcher
.
An HNSW searcher supports the same Distance
metrics listed
for the exhaustive search algorithm except those starting with
fast
: "fasteuclidean"
and
"fastseuclidean"
. HNSW does not support custom distance
metrics.
R2023a: Fast Euclidean distance using a cache
The 'fasteuclidean'
and 'fastseuclidean'
distance metrics accelerate the computation of Euclidean distances by using a cache
and a different algorithm (see Algorithms).
MATLAB Command
You clicked a link that corresponds to this MATLAB command:
Run the command by entering it in the MATLAB Command Window. Web browsers do not support MATLAB commands.
Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: United States.
You can also select a web site from the following list
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.
Americas
- América Latina (Español)
- Canada (English)
- United States (English)
Europe
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)
Asia Pacific
- Australia (English)
- India (English)
- New Zealand (English)
- 中国
- 日本Japanese (日本語)
- 한국Korean (한국어)