faster then where possible? [message #66276] |
Thu, 07 May 2009 08:06  |
rogass
Messages: 200 Registered: April 2008
|
Senior Member |
|
|
Hi,
i'm searching for some alternative approaches to compute the following
"much" faster:
-> matrix1 has m columns and n rows, matrix2 has 2 columns and n rows
-> the values in matrix2 are NOT in matrix1, but within the min-max-
range of matrix1
szm1=size(matrix1,/dimensions)
szm2=size(matrix2,/dimensions)
index={ind:ptr_new()}
indices=replicate(index,szm2[1])
for j=0ull,szm1[1] do begin
helpindex= where(matrix1[*,j] ge matrix2[0,j] and matrix1[*,j] le
matrix2[1,j],c)
if c gt 0 then begin
indices[j] = ptr_new(uintarr(c))
(*indices)[j]=helpindex
endif else continue
endfor
It seems to be a typical Nearest-Neighbor-Problem, but all alternative
approaches I tried were always slower. Maybe someone here has a good
idea?
Thank you and best regards
Christian
|
|
|
Re: faster then where possible? [message #66436 is a reply to message #66276] |
Mon, 11 May 2009 02:00  |
rogass
Messages: 200 Registered: April 2008
|
Senior Member |
|
|
On 8 Mai, 16:14, cmanc...@gmail.com wrote:
> On May 8, 7:58 am, Jeremy Bailin <astroco...@gmail.com> wrote:
>
>
>
>
>
>> On May 7, 11:06 am, rog...@googlemail.com wrote:
>
>>> Hi,
>>> i'm searching for some alternative approaches to compute the following
>>> "much" faster:
>
>>> -> matrix1 has m columns and n rows, matrix2 has 2 columns and n rows
>>> -> the values in matrix2 are NOT in matrix1, but within the min-max-
>>> range of matrix1
>
>>> szm1=size(matrix1,/dimensions)
>>> szm2=size(matrix2,/dimensions)
>>> index={ind:ptr_new()}
>>> indices=replicate(index,szm2[1])
>
>>> for j=0ull,szm1[1] do begin
>>> helpindex= where(matrix1[*,j] ge matrix2[0,j] and matrix1[*,j] le
>>> matrix2[1,j],c)
>>> if c gt 0 then begin
>>> indices[j] = ptr_new(uintarr(c))
>>> (*indices)[j]=helpindex
>>> endif else continue
>>> endfor
>
>>> It seems to be a typical Nearest-Neighbor-Problem, but all alternative
>>> approaches I tried were always slower. Maybe someone here has a good
>>> idea?
>
>>> Thank you and best regards
>
>>> Christian
>
>> I don't suppose the data in the rows of matrix1 are sorted? If so, you
>> could use VALUE_LOCATE to figure out the bounds.
>
>> -Jeremy.
>
> Yeah, this sounds like a job for value_locate + histogram, which can
> combine to make histograms with irregularly spaced bins (which, I
> think, is basically what you're doing). To use value_locate you just
> have to have the list you are searching within sorted. This is how
> you would make a histogram with irregularly spaced bins:
>
> bin_mins = findgen(15)+randomu(seed,15)*.2 - .1
> vals_to_bin = randomu(seed,200)*15
>
> find = value_locate( bin_mins, vals_to_bin )
> hist = histogram( find, reverse_indices=ri )
> nbins = n_elements(hist)
>
> for i=0,nbins-1 do begin
> if ri[i+1] eq ri[i] then continue ; no data in this bin
> inds = ri[ ri[i]:ri[i+1]-1 ]
> endfor
>
> So you use value_locate to find which elements belong to which minimum
> value, and then you use reverse_indices to pick out the indexes of the
> elements in each bin. Again, this depends on your list of bin
> minimums being sorted. So if your minimum value from matrix2[0,j] is
> equal to the maximum value in matrix2[1,j] (in the sense that the
> maximum for one bin is the minimum for the next), then the above code
> will exactly solve your problem, and the inds variable above has the
> exact same content as the helpindex variable in your code. If the
> maximum value (matrix2[1,j]) is smaller, such that the maximum value
> in a bin is smaller than the minimum value of the next bin, then you
> can just throw an additional where() in the for loop above. Since you
> only have to search over a small subset of your data set, rather than
> the full data set, this should still be much faster. I.e. the for
> loop above would change to:
>
> for i=0,nbins-1 do begin
> if ri[i+1] eq ri[i] then continue ; no data in this bin
> inds = ri[ ri[i]:ri[i+1]-1 ]
> t = where( vals_to_bin[inds] lt some_max_value, c )
> helpindex = inds[t]
> endfor
>
> In the case that there is overlap between bins, such that the maximum
> for a bin is larger than the minimum of the next bin... well in that
> case the above code wouldn't work at all :( Value locate always puts
> each item in exactly one bin, so if things can potentially be in more
> than one bin it clearly won't work...- Zitierten Text ausblenden -
>
> - Zitierten Text anzeigen -
Dear all,
thank you for your suggestions, especially for the idea due to
value_locate. I will test it carefully.
Best regards
Christian
|
|
|
Re: faster then where possible? [message #66456 is a reply to message #66276] |
Fri, 08 May 2009 07:14  |
Conor
Messages: 138 Registered: February 2007
|
Senior Member |
|
|
On May 8, 7:58 am, Jeremy Bailin <astroco...@gmail.com> wrote:
> On May 7, 11:06 am, rog...@googlemail.com wrote:
>
>
>
>> Hi,
>> i'm searching for some alternative approaches to compute the following
>> "much" faster:
>
>> -> matrix1 has m columns and n rows, matrix2 has 2 columns and n rows
>> -> the values in matrix2 are NOT in matrix1, but within the min-max-
>> range of matrix1
>
>> szm1=size(matrix1,/dimensions)
>> szm2=size(matrix2,/dimensions)
>> index={ind:ptr_new()}
>> indices=replicate(index,szm2[1])
>
>> for j=0ull,szm1[1] do begin
>> helpindex= where(matrix1[*,j] ge matrix2[0,j] and matrix1[*,j] le
>> matrix2[1,j],c)
>> if c gt 0 then begin
>> indices[j] = ptr_new(uintarr(c))
>> (*indices)[j]=helpindex
>> endif else continue
>> endfor
>
>> It seems to be a typical Nearest-Neighbor-Problem, but all alternative
>> approaches I tried were always slower. Maybe someone here has a good
>> idea?
>
>> Thank you and best regards
>
>> Christian
>
> I don't suppose the data in the rows of matrix1 are sorted? If so, you
> could use VALUE_LOCATE to figure out the bounds.
>
> -Jeremy.
Yeah, this sounds like a job for value_locate + histogram, which can
combine to make histograms with irregularly spaced bins (which, I
think, is basically what you're doing). To use value_locate you just
have to have the list you are searching within sorted. This is how
you would make a histogram with irregularly spaced bins:
bin_mins = findgen(15)+randomu(seed,15)*.2 - .1
vals_to_bin = randomu(seed,200)*15
find = value_locate( bin_mins, vals_to_bin )
hist = histogram( find, reverse_indices=ri )
nbins = n_elements(hist)
for i=0,nbins-1 do begin
if ri[i+1] eq ri[i] then continue ; no data in this bin
inds = ri[ ri[i]:ri[i+1]-1 ]
endfor
So you use value_locate to find which elements belong to which minimum
value, and then you use reverse_indices to pick out the indexes of the
elements in each bin. Again, this depends on your list of bin
minimums being sorted. So if your minimum value from matrix2[0,j] is
equal to the maximum value in matrix2[1,j] (in the sense that the
maximum for one bin is the minimum for the next), then the above code
will exactly solve your problem, and the inds variable above has the
exact same content as the helpindex variable in your code. If the
maximum value (matrix2[1,j]) is smaller, such that the maximum value
in a bin is smaller than the minimum value of the next bin, then you
can just throw an additional where() in the for loop above. Since you
only have to search over a small subset of your data set, rather than
the full data set, this should still be much faster. I.e. the for
loop above would change to:
for i=0,nbins-1 do begin
if ri[i+1] eq ri[i] then continue ; no data in this bin
inds = ri[ ri[i]:ri[i+1]-1 ]
t = where( vals_to_bin[inds] lt some_max_value, c )
helpindex = inds[t]
endfor
In the case that there is overlap between bins, such that the maximum
for a bin is larger than the minimum of the next bin... well in that
case the above code wouldn't work at all :( Value locate always puts
each item in exactly one bin, so if things can potentially be in more
than one bin it clearly won't work...
|
|
|
Re: faster then where possible? [message #66460 is a reply to message #66276] |
Fri, 08 May 2009 04:58  |
Jeremy Bailin
Messages: 618 Registered: April 2008
|
Senior Member |
|
|
On May 7, 11:06 am, rog...@googlemail.com wrote:
> Hi,
> i'm searching for some alternative approaches to compute the following
> "much" faster:
>
> -> matrix1 has m columns and n rows, matrix2 has 2 columns and n rows
> -> the values in matrix2 are NOT in matrix1, but within the min-max-
> range of matrix1
>
> szm1=size(matrix1,/dimensions)
> szm2=size(matrix2,/dimensions)
> index={ind:ptr_new()}
> indices=replicate(index,szm2[1])
>
> for j=0ull,szm1[1] do begin
> helpindex= where(matrix1[*,j] ge matrix2[0,j] and matrix1[*,j] le
> matrix2[1,j],c)
> if c gt 0 then begin
> indices[j] = ptr_new(uintarr(c))
> (*indices)[j]=helpindex
> endif else continue
> endfor
>
> It seems to be a typical Nearest-Neighbor-Problem, but all alternative
> approaches I tried were always slower. Maybe someone here has a good
> idea?
>
> Thank you and best regards
>
> Christian
I don't suppose the data in the rows of matrix1 are sorted? If so, you
could use VALUE_LOCATE to figure out the bounds.
-Jeremy.
|
|
|