Finding Intersection between Vector and Array
QUESTION: Are there any tricks to speed up the following problem. Suppose I have a 3 x 10000 array, and I want to find which rows match an arbitrary three-element array. I can do this by cycling through each column one at a time and doing an intersection:
row = [11, 12, 18] match = WHERE(array[0,*] EQ row AND $ array[1,*] EQ row AND $ array[2,*] EQ row)
But, this seems slow. Are there any tricks to make this run faster? I want to do this for a large image, so the overhead is going to be pretty significant if I can't find a trick.
ANSWER: The answer, as usual for this kind of problem, is provided by JD Smith.
The functions REBIN and TOTAL along a dimension does the trick:
match_rows = Where( Total(array EQ Rebin(row, 3, 10000, /SAMPLE), 1) EQ 3. )
What we really need is for ARRAY_EQUAL to allow a DIMENSION keyword, similar to MIN/MAX/MEDIAN etc. This would be a boolean short-circuiting array equal along any dimension. As soon as it finds an element along a particular dimension that isn't equal, it aborts and moves to the next set. This TOTAL trick is just a cheap way to do that.
You could also use MIN to accomplish the same thing:
match_rows = Where( Min(array EQ Rebin(row, 3, 10000, /SAMPLE), DIMENSION=1) )
This ensures the minimum along the short dimension is 1 (i.e. all elements match). You could even use PRODUCT:
match_rows = Where( Product(array EQ Rebin(row, 3, 10000, /SAMPLE), 1) )
since for the product to be 1, all must have been 1.
If you are really concerned about speed, you should test these against each other on your hardward/data and see which is faster. For long integers, on my machine, the TOTAL version is actually faster by about 20%. You can gain another 5-10% or so by using the following uglier version:
match_rows = Where( Total(array EQ Rebin(row, 3, 10000, /SAMPLE), 1, $ /PRESERVE_TYPE) EQ 3B )
This performs the TOTAL and comparison all in bytes, rather than converting to floats (which is unnecessary since the maximum total here is 3). Notice the use of SAMPLE to slightly speedup REBIN. Omitting it results in about a 5% hit.
By the way, for 3x10000, all of these are very fast, around 1/500 of a second, so it shouldn't give you any trouble for interactive use.
Copyright © 2006 David W. Fanning
Last Updated 5 February 2006