# Finding Intersection between Vector and Array

**QUESTION:** Are there any tricks to speed up the following
problem. Suppose I have a 3 x 10000 array, and I want to find which rows match an arbitrary three-element array. I can do this by cycling
through each column one at a time and doing an intersection:

row = [11, 12, 18] match = WHERE(array[0,*] EQ row[0] AND $ array[1,*] EQ row[1] AND $ array[2,*] EQ row[2])

But, this seems slow. Are there any tricks to make this run faster? I want to do this for a large image, so the overhead is going to be pretty significant if I can't find a trick.

** ANSWER:** The answer, as usual for this kind of problem, is provided by JD Smith.

The functions **REBIN **and **TOTAL **along a dimension does the trick:

match_rows = Where( Total(array EQ Rebin(row, 3, 10000, /SAMPLE), 1) EQ 3. )

What we really need is for **ARRAY_EQUAL **to allow a **DIMENSION **keyword,
similar to **MIN/MAX/MEDIAN** etc. This would be a boolean short-circuiting
array equal along any dimension. As soon as it finds an element along a
particular dimension that isn't equal, it aborts and moves to the next
set. This **TOTAL **trick is just a cheap way to do that.

You could also use **MIN **to accomplish the same thing:

match_rows = Where( Min(array EQ Rebin(row, 3, 10000, /SAMPLE), DIMENSION=1) )

This ensures the minimum along the short dimension is 1 (i.e. all
elements match). You could even use **PRODUCT**:

match_rows = Where( Product(array EQ Rebin(row, 3, 10000, /SAMPLE), 1) )

since for the product to be 1, all must have been 1.

If you are really concerned about speed, you should test these against
each other on your hardward/data and see which is faster. For long
integers, on my machine, the **TOTAL **version is actually faster by about
20%. You can gain another 5-10% or so by using the following uglier
version:

match_rows = Where( Total(array EQ Rebin(row, 3, 10000, /SAMPLE), 1, $ /PRESERVE_TYPE) EQ 3B )

This performs the **TOTAL **and comparison all in bytes, rather than
converting to floats (which is unnecessary since the maximum total here
is 3). Notice the use of **SAMPLE **to slightly speedup **REBIN**. Omitting
it results in about a 5% hit.

By the way, for 3x10000, all of these are very fast, around 1/500 of a second, so it shouldn't give you any trouble for interactive use.

Copyright © 2006 David W. Fanning

Last Updated 5 February 2006