# hnswSearcher

Hierarchical Navigable Small Worlds (HNSW) approximate nearest neighbor search

*Since R2024a*

## Description

An `hnswSearcher`

model object stores the training data, distance
metric, and parameter values of the distance metric for a Hierarchical Navigable Small Worlds
(HNSW) approximate nearest neighbor search. For an algorithm overview, see Approximate KNN Search Using Hierarchical Navigable Small Worlds (HNSW) Algorithm.

After you create an `hnswSearcher`

model object, you can use it to find
points in the training data that are close to the query data by performing a nearest neighbor
search using `knnsearch`

. The HNSW search algorithm is faster than
other KNN search algorithms. However, HNSW search results might not be exact.

## Creation

### Description

creates an HNSW searcher object based on the data in the matrix `Mdl`

= hnswSearcher(`X`

)`X`

. Each
row of `X`

represents one observation, and each column represents one
feature.

sets Properties and additional options
using name-value arguments. For example, to use the Minkowski distance, set
`Mdl`

= hnswSearcher(`X`

,`Name=Value`

)`Distance="minkowski"`

.

### Input Arguments

`X`

— Input data

numeric matrix

Input data, specified as a numeric matrix. The rows of `X`

correspond to observations, and the columns correspond to variables.

The value of `X`

sets the value of the `X`

property
in the model object.

**Data Types: **`single`

| `double`

**Name-Value Arguments**

Specify optional pairs of arguments as
`Name1=Value1,...,NameN=ValueN`

, where `Name`

is
the argument name and `Value`

is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.

**Example: **`Distance="minkowski",P=3`

specifies to use the Minkowski
distance metric with an exponent of 3.

`Distance`

— Distance metric

`"euclidean"`

(default) | character vector or string scalar

Distance metric used when you call `knnsearch`

to find nearest neighbors for future query points, specified
as a character vector or string scalar of the distance metric name.

The value of `Distance`

sets the value of the `Distance`

property in the model object.

Value | Description |
---|---|

`"chebychev"` | Chebychev distance (maximum coordinate difference) |

`"cityblock"` | City block distance |

`"correlation"` | One minus the sample linear correlation between observations (treated as sequences of values) |

`"cosine"` | One minus the cosine of the included angle between observations (treated as row vectors) |

`"euclidean"` | Euclidean distance |

`"hamming"` | Hamming distance, which is the percentage of coordinates that differ |

`"jaccard"` | One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ |

`"mahalanobis"` | Mahalanobis distance |

`"minkowski"` | Minkowski distance. The default exponent is 2. |

`"seuclidean"` | Standardized Euclidean distance |

`"spearman"` | One minus the sample Spearman's rank correlation between observations (treated as sequences of values) |

**Example: **`Distance="minkowski"`

**Data Types: **`char`

`Cov`

— Covariance matrix for Mahalanobis distance metric

`cov(X,"omitrows")`

(default) | positive definite matrix

Covariance matrix for the Mahalanobis distance metric, specified as a
*K*-by-*K* positive definite matrix, where
*K* is the number of columns in `X`

. This
argument is valid only when `Distance`

is
`"mahalanobis"`

.

The value of `Cov`

sets the value of the
`DistParameter`

property in the model object.

**Example: **`Cov=eye(3)`

**Data Types: **`single`

| `double`

`P`

— Exponent for Minkowski distance metric

`2`

(default) | positive scalar

Exponent for the Minkowski distance metric, specified as a positive scalar. This argument is
valid only when `Distance`

is
`"minkowski"`

.

The value of `P`

sets the value of the
`DistParameter`

property in the model
object.

**Example: **`P=3`

**Data Types: **`single`

| `double`

`Scale`

— Scale parameter value for standardized Euclidean distance metric

`std(X,"omitnan")`

(default) | nonnegative numeric vector

Scale parameter value for the standardized Euclidean distance metric, specified as a
nonnegative numeric vector of length *K*, where *K* is
the number of columns in `X`

. The software scales each difference
between the training and query data using the corresponding element of
`Scale`

. This argument is valid only when
`Distance`

is `"seuclidean"`

.

The value of `Scale`

sets the value of the
`DistParameter`

property in the model object.

**Example: **`Scale=quantile(X,0.75) - quantile(X,0.25)`

**Data Types: **`single`

| `double`

## Properties

You can set the `Distance`

, `MaxNumLinksPerNode`

, and
`TrainSetSize`

properties by using the name-value argument syntax when you
call `hnswSearcher`

. However, you cannot directly specify the
`DistParameter`

property. As noted earlier,
`DistParameter`

is set by the `Cov`

, `P`

,
or `Scale`

name-value argument.

`Distance`

— Distance metric

`'euclidean'`

(default) | character vector

This property is read-only.

Distance metric used when you call `knnsearch`

to find nearest neighbors for future query points, specified as
a character vector of the distance metric name.

Value | Description |
---|---|

`'chebychev'` | Chebychev distance (maximum coordinate difference) |

`'cityblock'` | City block distance |

`'correlation'` | One minus the sample linear correlation between observations (treated as sequences of values) |

`'cosine'` | One minus the cosine of the included angle between observations (treated as row vectors) |

`'euclidean'` | Euclidean distance |

`'hamming'` | Hamming distance, which is the percentage of coordinates that differ |

`'jaccard'` | One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ |

`'mahalanobis'` | Mahalanobis distance |

`'minkowski'` | Minkowski distance. The default exponent is 2. |

`'seuclidean'` | Standardized Euclidean distance |

`'spearman'` | One minus the sample Spearman's rank correlation between observations (treated as sequences of values) |

**Data Types: **`char`

`DistParameter`

— Distance metric parameter values

`[]`

| positive scalar | numeric vector | numeric matrix

This property is read-only.

Distance metric parameter values, specified as a positive scalar, numeric vector, or
numeric matrix. This property is nonempty only when you specify `Distance`

as
`"seuclidean"`

, `"minkowski"`

, or
`"mahalanobis"`

.

If

`Distance`

is`"seuclidean"`

,`DistParameter`

is a vector of scaling factors for each dimension, specified as a positive vector. The value is set by the`Scale`

name-value argument. The default value is`std(X,"omitnan")`

.If

`Distance`

is`"minkowski"`

,`DistParameter`

is the exponent of the Minkowski distance, specified as a positive scalar. The value is set by the`P`

name-value argument. The default value is 2.If

`Distance`

is`"mahalanobis"`

,`DistParameter`

is a covariance matrix, specified as a numeric matrix. The value is set by the`Cov`

name-value argument. The default value is`cov(X,"omitrows")`

.

**Data Types: **`single`

| `double`

`MaxNumLinksPerNode`

— Number of connections created for each node

`min(16,N)`

, where `N`

is the number
of rows in `X`

(default) | positive scalar

This property is read-only.

Number of connections created for each node, specified as a positive scalar.
Typically, set a number from about 5 to 48. `MaxNumLinksPerNode`

cannot
exceed `TrainSetSize`

.

**Data Types: **`double`

`TrainSetSize`

— Size of list for potential nearest neighbors

`200`

(default) | positive integer

This property is read-only.

Size of the list for potential nearest neighbors, specified as a positive
integer. Typically, set a number from about 100 to 2000.
`TrainSetSize`

must be at least
`MaxNumLinksPerNode`

and no more than
`N`

, the number of rows of
`X`

.

You can try to search for a good value of `TrainSetSize`

as
follows:

Specify a large enough value to achieve a very accurate search result (close to an exact search).

Run query tests with decreased values, until the accuracy reaches a reasonably good range (recall rate of 0.95–0.99).

**Data Types: **`double`

`X`

— Point data

real matrix

This property is read-only.

Point data, specified as a real matrix. Each row of `X`

represents
one point. The value of this property is set by the `X`

input
argument.

**Data Types: **`single`

| `double`

## Object Functions

`knnsearch` | Find k-nearest neighbors using searcher object |

## Examples

### Create HNSW Searcher

Create an HNSW searcher object using the data in a pseudorandom input matrix X.

rng default % For reproducibility X = 50*randn(1000,20); Mdl = hnswSearcher(X)

Mdl = hnswSearcher with properties: MaxNumLinksPerNode: 16 TrainSetSize: 200 Distance: 'euclidean' DistParameter: [] X: [1000x20 double]

`Mdl`

is an `hnswSearcher`

model object with default properties.

Find the closest data point to the point `ones(1,20)`

and display its distance.

[idx,D] = knnsearch(Mdl,ones(1,20))

idx = 174

D = 118.4583

Perform the same search using an exact KNN search, and compare the results.

[idx2,D2] = knnsearch(X,ones(1,20))

idx2 = 174

D2 = 118.4583

The approximate `hnswSearcher`

algorithm gives the same results as the exact KNN search algorithm

### Create HNSW Searcher with Nondefault Properties

Create an HNSW searcher object using the `"cityblock"`

distance and twice as many links per node as the default.

rng default % For reproducibility X = 50*randn(1000,20); Mdl = hnswSearcher(X,Distance="cityblock",MaxNumLinksPerNode=32)

Mdl = hnswSearcher with properties: MaxNumLinksPerNode: 32 TrainSetSize: 200 Distance: 'cityblock' DistParameter: [] X: [1000x20 double]

`Mdl`

has the specified nondefault properties.

Find the closest data point to the point `ones(1,20)`

and display its distance.

[idx,D] = knnsearch(Mdl,ones(1,20))

idx = 738

D = 420.2990

Perform the same search using an exact KNN search, and compare the results.

`[idx,D] = knnsearch(X,ones(1,20),Distance="cityblock")`

idx = 738

D = 420.2990

`hnswSearcher`

gives the same results as the exact searcher.

**HNSW Speed and Accuracy for Large Data**

Create a large data set and compare the speed of an HNSW searcher to the speed of an exhaustive searcher for 1000 query points.

Create the data.

rng default % For reproducibility A = diag(1:100); B = randn(100,10); K = kron(A,B); ss = size(K)

`ss = `*1×2*
10000 1000

The data `K`

has `1e4`

rows and `1e3`

columns.

Create an HNSW searcher object `h`

from the data `K`

.

tic; h = hnswSearcher(K); chnsw = toc

chnsw = 10.6544

Create 1000 random query points with 1000 features (columns). Specify to search for five nearest neighbors.

Y = randn(1000, 1000); tic; [idx, D] = knnsearch(h,Y,K=5); thnsw = toc

thnsw = 0.0797

Create an exhaustive searcher object `e`

from the data K.

tic e = ExhaustiveSearcher(K); ces = toc

ces = 0.0021

Creating an exhaustive searcher is much faster than creating an HNSW searcher.

Time the search using `e`

and compare the result to the time using the HNSW searcher `h`

.

tic; [idx0, D0] = knnsearch(e,Y,K=5); tes = toc

tes = 1.4841

In this case, the HNSW searcher computes about 20 times faster than the exhaustive searcher.

Determine how many results of the HNSW search differ in some way from the results of the exhaustive search.

v = abs(idx - idx0); % Count any difference in the five found entries vv = sum(v,2); % vv = 0 means no difference nz = nnz(vv) % Out of 1000 rows how many differ at all?

nz = 118

Here, 118 of 1000 HNSW search results differ from the exhaustive search results.

Try to improve the accuracy of the HNSW searcher by training with nondefault parameters. Specifically, use larger values for `MaxNumLinksPerNode`

and `TrainSetSize`

. These settings affect the speed of training and finding nearest neighbors.

tic; h2 = hnswSearcher(K,MaxNumLinksPerNode=48,TrainSetSize=2000); chnsw2 = toc

chnsw2 = 78.4836

With these parameters, the searcher takes about seven times as long to train.

tic; [idx2, D2] = knnsearch(h2,Y,K=5); thnsw2 = toc

thnsw2 = 0.1049

The speed of finding nearest neighbors using HNSW decreases, but is still more than ten times faster than the exhaustive search.

v = abs(idx2 - idx0); vv = sum(v,2); nz = nnz(vv)

nz = 57

For the slower but more accurate HNSW search, only 57 of 1000 results differ in any way from the exact results. Summarize the timing results in a table.

timet = table([chnsw;ces;chnsw2],[thnsw;tes;thnsw2],'RowNames',["HNSW";"Exhaustive";"HNSW2"],'VariableNames',["Creation" "Search"])

`timet=`*3×2 table*
Creation Search
_________ ________
HNSW 10.654 0.079741
Exhaustive 0.0021304 1.4841
HNSW2 78.484 0.10492

## Algorithms

The HNSW algorithm is described in Malkov and Yashunin [1]. A brief description of the algorithm appears in Approximate KNN Search Using Hierarchical Navigable Small Worlds (HNSW) Algorithm.

## Alternative Functionality

You can also create an HNSW searcher object by using `createns`

with the specification `NSMethod="hnsw"`

. The
resulting object is the same as the object you create using the
`hnswSearcher`

function.

## References

[1] Malkov, Yu. A., and D. A. Yashunin. *Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs.* Available at https://arxiv.org/abs/1603.09320.

## Version History

**Introduced in R2024a**

## MATLAB Command

You clicked a link that corresponds to this MATLAB command:

Run the command by entering it in the MATLAB Command Window. Web browsers do not support MATLAB commands.

Select a Web Site

Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .

You can also select a web site from the following list:

## How to Get Best Site Performance

Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.

### Americas

- América Latina (Español)
- Canada (English)
- United States (English)

### Europe

- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)

- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)