Fast k-NN in R^4 ?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Fast k-NN in R^4 ?

Martin Maechler
We are looking for a fast algorithm (implemented in C or even
available for R) to compute the k nearest neighbors of a given
point from n points in R^4 (i.e. 4-dim. Euclidean space).

Even more concretely, the n points in 4- space are in a (n x 4) matrix Y,
and he has m other points in 4-space as (m x 4) matrix X

The result should be an  (m x k) matrix Ik  where
Ik[i,] contains k indices in 1:n with the meaning that
Y[Ik[i,]] contains the k nearest neighbours of X[i,] in Y[,]

A simply implementation {also the one used inside of the
class package knn() function} will be of complexity O(m * n),
but the fast algorithms known really should provide O(m * log(n))
complexity.

I see that packages 'gstat' and 'spatstat' mention solutions to such
problems, but it seems this is just for 2-d point patterns, not
4-d or "general dim" ?

Pointers, hints, ... are very welcome.
       
Martin Maechler, Seminar fuer Statistik,
ETH (Federal Inst. Technology) Zurich, SWITZERLAND   <><



Reply | Threaded
Open this post in threaded view
|

Fast k-NN in R^4 ?

Roger Bivand
Administrator
On Tue, 6 Jul 2004, Martin Maechler wrote:

> We are looking for a fast algorithm (implemented in C or even
> available for R) to compute the k nearest neighbors of a given
> point from n points in R^4 (i.e. 4-dim. Euclidean space).
>
> Even more concretely, the n points in 4- space are in a (n x 4) matrix Y,
> and he has m other points in 4-space as (m x 4) matrix X
>
> The result should be an  (m x k) matrix Ik  where
> Ik[i,] contains k indices in 1:n with the meaning that
> Y[Ik[i,]] contains the k nearest neighbours of X[i,] in Y[,]
>
> A simply implementation {also the one used inside of the
> class package knn() function} will be of complexity O(m * n),
> but the fast algorithms known really should provide O(m * log(n))
> complexity.

Some time ago I ported ANN (http://www.cs.umd.edu/~mount/ANN/) to R as a
package, but never heard back from the authors when I wrote to ask for
permission to release. They claim to be about O(kd log n) time for
k-nearest neighbours in d dimensions (p. 31,
http://www.cs.umd.edu/~mount/Papers/dist.pdf) so this might be a starting
point. Please let me know if you'd like a copy of the draft package.

Roger

>
> I see that packages 'gstat' and 'spatstat' mention solutions to such
> problems, but it seems this is just for 2-d point patterns, not
> 4-d or "general dim" ?
>
> Pointers, hints, ... are very welcome.
>
> Martin Maechler, Seminar fuer Statistik,
> ETH (Federal Inst. Technology) Zurich, SWITZERLAND   <><
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/r-sig-geo
>

--
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Breiviksveien 40, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 93 93
e-mail: Roger.Bivand at nhh.no



Roger Bivand
Department of Economics
Norwegian School of Economics
Helleveien 30
N-5045 Bergen, Norway
Reply | Threaded
Open this post in threaded view
|

Fast k-NN in R^4 ?

Tim Keitt
In reply to this post by Martin Maechler
This is a really good resource:

http://www.cs.sunysb.edu/~algorith/major_section/1.6.shtml

T.

On Tue, 2004-07-06 at 11:20, Martin Maechler wrote:

> We are looking for a fast algorithm (implemented in C or even
> available for R) to compute the k nearest neighbors of a given
> point from n points in R^4 (i.e. 4-dim. Euclidean space).
>
> Even more concretely, the n points in 4- space are in a (n x 4) matrix Y,
> and he has m other points in 4-space as (m x 4) matrix X
>
> The result should be an  (m x k) matrix Ik  where
> Ik[i,] contains k indices in 1:n with the meaning that
> Y[Ik[i,]] contains the k nearest neighbours of X[i,] in Y[,]
>
> A simply implementation {also the one used inside of the
> class package knn() function} will be of complexity O(m * n),
> but the fast algorithms known really should provide O(m * log(n))
> complexity.
>
> I see that packages 'gstat' and 'spatstat' mention solutions to such
> problems, but it seems this is just for 2-d point patterns, not
> 4-d or "general dim" ?
>
> Pointers, hints, ... are very welcome.
>
> Martin Maechler, Seminar fuer Statistik,
> ETH (Federal Inst. Technology) Zurich, SWITZERLAND   <><
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at stat.math.ethz.ch
> https://www.stat.math.ethz.ch/mailman/listinfo/r-sig-geo
--
Timothy H. Keitt
Section of Integrative Biology
University of Texas at Austin
http://www.keittlab.org/



Reply | Threaded
Open this post in threaded view
|

Summary: Fast k-NN in R^4 ?

Martin Maechler
In reply to this post by Martin Maechler
Summarizing the reactions & findings to my question:

1)  Timothy Keitt said:

   This is a really good resource:
   http://www.cs.sunysb.edu/~algorith/major_section/1.6.shtml

which points to http://www.cs.sunysb.edu/~algorith/files/nearest-neighbor.shtml
which then mentions implementations  ANN, Ranger, LEDA, (and
less relevant ones).  Ranger seems a standalone visualization.
LEDA is well known but a bit "big" for me which leaves ANN, see
below.

2) Roger Bivand has an experimental version of an R package
   'ann' which interfaces to ANN mentioned above, aka,
   http://www.cs.umd.edu/~mount/ANN/
   
   Note that "A" stands for "Approximate", but the software can
   also compute the exactly k-NN solution.

   Roger sent me the package privately and is currently travelling for a
   while.  I hope that I will be eventually encourage him (:-)
   to release 'ann' to CRAN.  AFAIK, one major obstacle has been the
   non-standard copyright/licence and the difficulty to make
   contact with the ANN authors.

3) Frank Hardisty has been interested in the related problem of
   'MST in arbitrary dimensions' where he said he found
   ``that there is no known MST solution that is better than O(n^[R/2]+1,
     where n is the number of points and R the number of dimensions.
     Which is fairly awful. The reference is: Okabe, A., B. Bots,
     K. Sugihara and S. N. Chiu (2000). Spatial Tesselations.  NY, Wiley.
     It's a good book. ''

   (and he also attached relevant papers which I haven't studied yet).

   My conclusion here: The multivariate MST seems a harder
   problem than finding multivariate k-NN.

----

Conclusion: I hope (with good reason) that there will soon be an
            R package 'ann' (or similarly called) interfacing to ANN.

Thanks again for your support!
Martin Maechler, ETH Zurich



Reply | Threaded
Open this post in threaded view
|

Fast k-NN in R^4 ?

Edzer J. Pebesma
In reply to this post by Martin Maechler
Martin, gstat uses quadtrees for points in up to 3 dimensions; I think
that quadtrees are generally inefficient for higher dimensions, where
k-D trees are more efficient.

The gstat quadtree code is "burried" rather deep, and not available
as a simple R function (although it could be made as such).

Best regards,
--
Edzer

Martin Maechler wrote:

>We are looking for a fast algorithm (implemented in C or even
>available for R) to compute the k nearest neighbors of a given
>point from n points in R^4 (i.e. 4-dim. Euclidean space).
>
>Even more concretely, the n points in 4- space are in a (n x 4) matrix Y,
>and he has m other points in 4-space as (m x 4) matrix X
>
>The result should be an  (m x k) matrix Ik  where
>Ik[i,] contains k indices in 1:n with the meaning that
>Y[Ik[i,]] contains the k nearest neighbours of X[i,] in Y[,]
>
>A simply implementation {also the one used inside of the
>class package knn() function} will be of complexity O(m * n),
>but the fast algorithms known really should provide O(m * log(n))
>complexity.
>
>I see that packages 'gstat' and 'spatstat' mention solutions to such
>problems, but it seems this is just for 2-d point patterns, not
>4-d or "general dim" ?
>
>Pointers, hints, ... are very welcome.
>
>Martin Maechler, Seminar fuer Statistik,
>ETH (Federal Inst. Technology) Zurich, SWITZERLAND   <><
>
>_______________________________________________
>R-sig-Geo mailing list
>R-sig-Geo at stat.math.ethz.ch
>https://www.stat.math.ethz.ch/mailman/listinfo/r-sig-geo
>  
>