comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Re: remove duplicate elements from a multi-dimensional array efficiently in IDL [message #66280] Wed, 06 May 2009 04:52
vino
Hello Jeremy,

Thanks for your idea here. Just when I was wondering how to do this,
I found this post.

Thanks to the OP as well!!

Regards,

Vino



On May 4, 7:15 pm, Jeremy Bailin <astroco...@gmail.com> wrote:
> On May 4, 11:12 am, "chenb...@gmail.com" <chenb...@gmail.com> wrote:
>
>>> Jeremy,
>
>>> Thanks for your kind and prompt help!
>>> It took my own routine 18 hours to do the job. I have just plug the
>>> codes you kindly offered into my codes, I'll let you know how
>>> efficient your routine is. Thanks!
>
>>> Bo
>
>> Hi Jeremy,
>
>> Your code helps me save 7 hours! That's a lot. Thanks!
>
>> Bo
>
> No problem! Glad it helped.
>
> -Jeremy.
Re: remove duplicate elements from a multi-dimensional array efficiently in IDL [message #66298 is a reply to message #66280] Mon, 04 May 2009 11:15
Jeremy Bailin
On May 4, 11:12 am, "chenb...@gmail.com" <chenb...@gmail.com> wrote:
>> Jeremy,
>
>>   Thanks for your kind and prompt help!
>>   It took my own routine 18 hours to do the job. I have just plug the
>> codes you kindly offered into my codes, I'll let you know how
>> efficient your routine is. Thanks!
>
>> Bo
>
> Hi Jeremy,
>
>   Your code helps me save 7 hours! That's a lot. Thanks!
>
> Bo

No problem! Glad it helped.

-Jeremy.
Re: remove duplicate elements from a multi-dimensional array efficiently in IDL [message #66300 is a reply to message #66298] Mon, 04 May 2009 08:12
chenbo09@gmail.com
On May 3, 12:47 pm, "chenb...@gmail.com" <chenb...@gmail.com> wrote:
> On May 2, 7:57 pm, Jeremy Bailin <astroco...@gmail.com> wrote:
>
>
>
>> On May 2, 8:47 pm, guillermo.castilla.castell...@gmail.com wrote:
>
>>> On May 1, 12:36 pm, Jeremy Bailin <astroco...@gmail.com> wrote:
>
>>>> On May 1, 1:47 pm, Jeremy Bailin <astroco...@gmail.com> wrote:
>
>>>> > On May 1, 12:13 pm, "chenb...@gmail.com" <chenb...@gmail.com> wrote:
>
>>>> > > Hello, everyone!
>
>>>> > > Is there anyone knows a routine in IDL that be capable to remove
>>>> > > duplicate elements from a multi-dimensional array efficiently? I 'm
>>>> > > now working with huge arrays, and I have written one by myself, it
>>>> > > works but is with low efficiency.
>
>>>> > > example of my problem:
>>>> > > the input array:
>>>> > > 1,10,9,100,200
>>>> > > 2,11,8,101,201
>>>> > > 2,11,8,101,201
>>>> > > 3,10,9,100,200
>>>> > > 4,7,12,99,199
>>>> > > 2,11,8,101,201
>
>>>> > > goal:
>>>> > > remove the duplicate elements with the same values for the second and
>>>> > > the third column.
>
>>>> > > expected output:
>>>> > > 1,10,9,100,200
>>>> > > 2,11,8,101,201
>>>> > > 4,7,12,99,199
>
>>>> > > Thanks for your help!
>
>>>> > > Bo
>
>>> If you don't have handy that ORD function Jeremy pointed out (I didn't
>>> know of it), and assuming your array is of byte type, you can do the
>>> following:
>
>>> input = [[1,10,9,100,200],[2,11,8,101,201],[2,11,8,101,201],$
>>> [3,10,9,100,200],[4,7,12,99,199],[2,11,8,101,201]]
>
>>> keep = Where(Histogram(1000L*input[1,*]+input[2,*], rev=r) GT 0)
>>> keep = r[r[keep]]
>>> print, input[*,keep[sort(keep)]]
>>> 1 10 9 100 200
>>> 2 11 8 101 201
>>> 4 7 12 99 199
>
>>> Cheers
>
>>> Guillermo
>
>> You can find ord at:
>
>> http://web.astroconst.org/jbiu/jbiu-doc/math/ord.html
>
>> -Jeremy.
>
> Jeremy,
>
> Thanks for your kind and prompt help!
> It took my own routine 18 hours to do the job. I have just plug the
> codes you kindly offered into my codes, I'll let you know how
> efficient your routine is. Thanks!
>
> Bo

Hi Jeremy,

Your code saved me 7 hours! That's a lot. Thanks!

Bo
Re: remove duplicate elements from a multi-dimensional array efficiently in IDL [message #66311 is a reply to message #66300] Sun, 03 May 2009 10:54
chenbo09@gmail.com
On May 2, 7:47 pm, guillermo.castilla.castell...@gmail.com wrote:
> On May 1, 12:36 pm, Jeremy Bailin <astroco...@gmail.com> wrote:
>
>
>
>> On May 1, 1:47 pm, Jeremy Bailin <astroco...@gmail.com> wrote:
>
>>> On May 1, 12:13 pm, "chenb...@gmail.com" <chenb...@gmail.com> wrote:
>
>>>> Hello, everyone!
>
>>>> Is there anyone knows a routine in IDL that be capable to remove
>>>> duplicate elements from a multi-dimensional array efficiently? I 'm
>>>> now working with huge arrays, and I have written one by myself, it
>>>> works but is with low efficiency.
>
>>>> example of my problem:
>>>> the input array:
>>>> 1,10,9,100,200
>>>> 2,11,8,101,201
>>>> 2,11,8,101,201
>>>> 3,10,9,100,200
>>>> 4,7,12,99,199
>>>> 2,11,8,101,201
>
>>>> goal:
>>>> remove the duplicate elements with the same values for the second and
>>>> the third column.
>
>>>> expected output:
>>>> 1,10,9,100,200
>>>> 2,11,8,101,201
>>>> 4,7,12,99,199
>
>>>> Thanks for your help!
>
>>>> Bo
>
> If you don't have handy that ORD function Jeremy pointed out (I didn't
> know of it), and assuming your array is of byte type, you can do the
> following:
>
> input = [[1,10,9,100,200],[2,11,8,101,201],[2,11,8,101,201],$
> [3,10,9,100,200],[4,7,12,99,199],[2,11,8,101,201]]
>
> keep = Where(Histogram(1000L*input[1,*]+input[2,*], rev=r) GT 0)
> keep = r[r[keep]]
> print, input[*,keep[sort(keep)]]
> 1 10 9 100 200
> 2 11 8 101 201
> 4 7 12 99 199
>
> Cheers
>
> Guillermo
Hi Guillermo,

Thanks for your kind suggestion! Have a nice weekend!

Bo
Re: remove duplicate elements from a multi-dimensional array efficiently in IDL [message #66312 is a reply to message #66311] Sun, 03 May 2009 10:50
chenbo09@gmail.com
On May 2, 7:47 pm, guillermo.castilla.castell...@gmail.com wrote:
> On May 1, 12:36 pm, Jeremy Bailin <astroco...@gmail.com> wrote:
>
>
>
>> On May 1, 1:47 pm, Jeremy Bailin <astroco...@gmail.com> wrote:
>
>>> On May 1, 12:13 pm, "chenb...@gmail.com" <chenb...@gmail.com> wrote:
>
>>>> Hello, everyone!
>
>>>> Is there anyone knows a routine in IDL that be capable to remove
>>>> duplicate elements from a multi-dimensional array efficiently? I 'm
>>>> now working with huge arrays, and I have written one by myself, it
>>>> works but is with low efficiency.
>
>>>> example of my problem:
>>>> the input array:
>>>> 1,10,9,100,200
>>>> 2,11,8,101,201
>>>> 2,11,8,101,201
>>>> 3,10,9,100,200
>>>> 4,7,12,99,199
>>>> 2,11,8,101,201
>
>>>> goal:
>>>> remove the duplicate elements with the same values for the second and
>>>> the third column.
>
>>>> expected output:
>>>> 1,10,9,100,200
>>>> 2,11,8,101,201
>>>> 4,7,12,99,199
>
>>>> Thanks for your help!
>
>>>> Bo
>
> If you don't have handy that ORD function Jeremy pointed out (I didn't
> know of it), and assuming your array is of byte type, you can do the
> following:
>
> input = [[1,10,9,100,200],[2,11,8,101,201],[2,11,8,101,201],$
> [3,10,9,100,200],[4,7,12,99,199],[2,11,8,101,201]]
>
> keep = Where(Histogram(1000L*input[1,*]+input[2,*], rev=r) GT 0)
> keep = r[r[keep]]
> print, input[*,keep[sort(keep)]]
> 1 10 9 100 200
> 2 11 8 101 201
> 4 7 12 99 199
>
> Cheers
>
> Guillermo
Hi Guillermo,

Thanks for your suggestion! Have a nice weekend!

Bo
Re: remove duplicate elements from a multi-dimensional array efficiently in IDL [message #66313 is a reply to message #66312] Sun, 03 May 2009 10:47
chenbo09@gmail.com
On May 2, 7:57 pm, Jeremy Bailin <astroco...@gmail.com> wrote:
> On May 2, 8:47 pm, guillermo.castilla.castell...@gmail.com wrote:
>
>
>
>> On May 1, 12:36 pm, Jeremy Bailin <astroco...@gmail.com> wrote:
>
>>> On May 1, 1:47 pm, Jeremy Bailin <astroco...@gmail.com> wrote:
>
>>>> On May 1, 12:13 pm, "chenb...@gmail.com" <chenb...@gmail.com> wrote:
>
>>>> > Hello, everyone!
>
>>>> > Is there anyone knows a routine in IDL that be capable to remove
>>>> > duplicate elements from a multi-dimensional array efficiently? I 'm
>>>> > now working with huge arrays, and I have written one by myself, it
>>>> > works but is with low efficiency.
>
>>>> > example of my problem:
>>>> > the input array:
>>>> > 1,10,9,100,200
>>>> > 2,11,8,101,201
>>>> > 2,11,8,101,201
>>>> > 3,10,9,100,200
>>>> > 4,7,12,99,199
>>>> > 2,11,8,101,201
>
>>>> > goal:
>>>> > remove the duplicate elements with the same values for the second and
>>>> > the third column.
>
>>>> > expected output:
>>>> > 1,10,9,100,200
>>>> > 2,11,8,101,201
>>>> > 4,7,12,99,199
>
>>>> > Thanks for your help!
>
>>>> > Bo
>
>> If you don't have handy that ORD function Jeremy pointed out (I didn't
>> know of it), and assuming your array is of byte type, you can do the
>> following:
>
>> input = [[1,10,9,100,200],[2,11,8,101,201],[2,11,8,101,201],$
>> [3,10,9,100,200],[4,7,12,99,199],[2,11,8,101,201]]
>
>> keep = Where(Histogram(1000L*input[1,*]+input[2,*], rev=r) GT 0)
>> keep = r[r[keep]]
>> print, input[*,keep[sort(keep)]]
>> 1 10 9 100 200
>> 2 11 8 101 201
>> 4 7 12 99 199
>
>> Cheers
>
>> Guillermo
>
> You can find ord at:
>
> http://web.astroconst.org/jbiu/jbiu-doc/math/ord.html
>
> -Jeremy.

Jeremy,

Thanks for your kind and prompt help!
It took my own routine 18 hours to do the job. I have just plugged the
code you kindly offered into my own code, and I'll let you know how
efficient your routine is. Thanks!

Bo
Re: remove duplicate elements from a multi-dimensional array efficiently in IDL [message #66318 is a reply to message #66313] Sat, 02 May 2009 17:57
Jeremy Bailin
On May 2, 8:47 pm, guillermo.castilla.castell...@gmail.com wrote:
> On May 1, 12:36 pm, Jeremy Bailin <astroco...@gmail.com> wrote:
>
>
>
>> On May 1, 1:47 pm, Jeremy Bailin <astroco...@gmail.com> wrote:
>
>>> On May 1, 12:13 pm, "chenb...@gmail.com" <chenb...@gmail.com> wrote:
>
>>>> Hello, everyone!
>
>>>> Is there anyone knows a routine in IDL that be capable to remove
>>>> duplicate elements from a multi-dimensional array efficiently? I 'm
>>>> now working with huge arrays, and I have written one by myself, it
>>>> works but is with low efficiency.
>
>>>> example of my problem:
>>>> the input array:
>>>> 1,10,9,100,200
>>>> 2,11,8,101,201
>>>> 2,11,8,101,201
>>>> 3,10,9,100,200
>>>> 4,7,12,99,199
>>>> 2,11,8,101,201
>
>>>> goal:
>>>> remove the duplicate elements with the same values for the second and
>>>> the third column.
>
>>>> expected output:
>>>> 1,10,9,100,200
>>>> 2,11,8,101,201
>>>> 4,7,12,99,199
>
>>>> Thanks for your help!
>
>>>> Bo
>
> If you don't have handy that ORD function Jeremy pointed out (I didn't
> know of it), and assuming your array is of byte type, you can do the
> following:
>
> input = [[1,10,9,100,200],[2,11,8,101,201],[2,11,8,101,201],$
>         [3,10,9,100,200],[4,7,12,99,199],[2,11,8,101,201]]
>
> keep = Where(Histogram(1000L*input[1,*]+input[2,*], rev=r) GT 0)
> keep = r[r[keep]]
> print, input[*,keep[sort(keep)]]
>        1      10       9     100     200
>        2      11       8     101     201
>        4       7      12      99     199
>
> Cheers
>
> Guillermo

You can find ord at:

http://web.astroconst.org/jbiu/jbiu-doc/math/ord.html

-Jeremy.
Re: remove duplicate elements from a multi-dimensional array efficiently in IDL [message #66319 is a reply to message #66318] Sat, 02 May 2009 17:47
guillermo.castilla.ca
On May 1, 12:36 pm, Jeremy Bailin <astroco...@gmail.com> wrote:
> On May 1, 1:47 pm, Jeremy Bailin <astroco...@gmail.com> wrote:
>
>
>
>> On May 1, 12:13 pm, "chenb...@gmail.com" <chenb...@gmail.com> wrote:
>
>>> Hello, everyone!
>
>>> Is there anyone knows a routine in IDL that be capable to remove
>>> duplicate elements from a multi-dimensional array efficiently? I 'm
>>> now working with huge arrays, and I have written one by myself, it
>>> works but is with low efficiency.
>
>>> example of my problem:
>>> the input array:
>>> 1,10,9,100,200
>>> 2,11,8,101,201
>>> 2,11,8,101,201
>>> 3,10,9,100,200
>>> 4,7,12,99,199
>>> 2,11,8,101,201
>
>>> goal:
>>> remove the duplicate elements with the same values for the second and
>>> the third column.
>
>>> expected output:
>>> 1,10,9,100,200
>>> 2,11,8,101,201
>>> 4,7,12,99,199
>
>>> Thanks for your help!
>
>>> Bo
>

If you don't have the ORD function that Jeremy pointed out handy (I
didn't know of it), and assuming your array is of byte type, you can
do the following:

input = [[1,10,9,100,200],[2,11,8,101,201],[2,11,8,101,201],$
         [3,10,9,100,200],[4,7,12,99,199],[2,11,8,101,201]]

keep = Where(Histogram(1000L*input[1,*]+input[2,*], rev=r) GT 0)
keep = r[r[keep]]
print, input[*,keep[sort(keep)]]
       1      10       9     100     200
       2      11       8     101     201
       4       7      12      99     199

Cheers

Guillermo
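
Editorial note on the snippet above: the fixed multiplier 1000L only gives a unique key when every value in the third column lies in 0..999, which the byte-type assumption guarantees. For other integer data, one option is to derive the multiplier from the data itself. The following is a minimal sketch under that assumption, not code from the thread; the names a, b and key are illustrative, and it still assumes integer-valued columns whose combined range is small enough for HISTOGRAM to allocate.

```idl
; Illustrative sketch (not from the thread): build the key from the data range
input = [[1,10,9,100,200],[2,11,8,101,201],[2,11,8,101,201],$
         [3,10,9,100,200],[4,7,12,99,199],[2,11,8,101,201]]

; Shift both key columns so they start at 0 (this also handles negative values)
a = long(reform(input[1,*]))
a = a - min(a)
b = long(reform(input[2,*]))
b = b - min(b)

; (max(b)+1) is the smallest multiplier that keeps distinct (a,b) pairs distinct
key = a*(max(b)+1L) + b

; Same reverse-indices trick as above: first member of each non-empty bin
h = histogram(key, reverse_indices=r)
keep = r[r[where(h gt 0)]]
keep = keep[sort(keep)]
print, input[*,keep]
```

For the example array this selects rows 0, 1 and 4, the same three rows as the original snippet. If the product of the two column ranges gets large, the key would need a 64-bit type and the histogram itself becomes the memory bottleneck.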
Re: remove duplicate elements from a multi-dimensional array efficiently in IDL [message #66329 is a reply to message #66319] Fri, 01 May 2009 11:36
Jeremy Bailin
On May 1, 1:47 pm, Jeremy Bailin <astroco...@gmail.com> wrote:
> On May 1, 12:13 pm, "chenb...@gmail.com" <chenb...@gmail.com> wrote:
>
>
>
>> Hello, everyone!
>
>> Is there anyone knows a routine in IDL that be capable to remove
>> duplicate elements from a multi-dimensional array efficiently? I 'm
>> now working with huge arrays, and I have written one by myself, it
>> works but is with low efficiency.
>
>> example of my problem:
>> the input array:
>> 1,10,9,100,200
>> 2,11,8,101,201
>> 2,11,8,101,201
>> 3,10,9,100,200
>> 4,7,12,99,199
>> 2,11,8,101,201
>
>> goal:
>> remove the duplicate elements with the same values for the second and
>> the third column.
>
>> expected output:
>> 1,10,9,100,200
>> 2,11,8,101,201
>> 4,7,12,99,199
>
>> Thanks for your help!
>
>> Bo
>
> How's this:
>
> input = [[1,10,9,100,200],[2,11,8,101,201],[2,11,8,101,201],$
>   [3,10,9,100,200],[4,7,12,99,199],[2,11,8,101,201]]
>
> ; Step 1: Map your columns 2 and 3 into a single unique index
> ; (requires ORD from JBIU):
> col2ord = ord(input[1,*])
> col3ord = ord(input[2,*])
> index = col2ord + (max(col2ord)+1)*col3ord
>
> ; Step 2: Use histogram to find which ones have the same unique index
> h = histogram(index, reverse_indices=ri)
>
> ; Step 3: Get the first one in each bin, and put back in sorted order
> keep = ri[ri[where(h gt 0)]]
> keep = keep[sort(keep)]
>
> ; Step 4: Print them out:
> print, input[*,keep]
>
>        1      10       9     100     200
>        2      11       8     101     201
>        4       7      12      99     199
>
> -Jeremy.

Incidentally, if you're dealing with huge arrays and run into memory
problems with the histogram, you can replace:

index = col2ord + (max(col2ord)+1)*col3ord

with

index = ord(col2ord + (max(col2ord)+1)*col3ord)

which will make the histogram as compact as possible.

-Jeremy.
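
For readers without JBIU: the compaction step relies on ORD replacing each value by its rank among the distinct values, so that the histogram needs only one bin per distinct key. That reading of ORD is an assumption based on how it is used in this thread; the JBIU documentation has its actual definition. A rough stand-in using only built-in routines might look like the sketch below, where ord_sketch is a hypothetical name and not the JBIU routine.

```idl
; Hypothetical stand-in for ORD (assumed behaviour: rank among distinct values)
function ord_sketch, x
  compile_opt idl2
  s = sort(x)                 ; indices that sort x
  u = uniq(x[s])              ; last index of each run of equal sorted values
  uvals = x[s[u]]             ; sorted distinct values
  ; Every element of x equals some entry of uvals, so VALUE_LOCATE
  ; returns that entry's position, which is the rank we want
  return, value_locate(uvals, x)
end
```

With this definition, ord_sketch([10,11,11,10,7,11]) should give 1 2 2 1 0 2, since the distinct values are 7, 10 and 11. The sketch has not been checked against inputs with a single distinct value or against floating-point NaNs.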
Re: remove duplicate elements from a multi-dimensional array efficiently in IDL [message #66330 is a reply to message #66329] Fri, 01 May 2009 10:47
Jeremy Bailin
On May 1, 12:13 pm, "chenb...@gmail.com" <chenb...@gmail.com> wrote:
> Hello, everyone!
>
> Does anyone know of a routine in IDL that can remove duplicate
> elements from a multi-dimensional array efficiently? I'm now
> working with huge arrays, and I have written one myself; it
> works, but it is very slow.
>
> example of my problem:
> the input array:
> 1,10,9,100,200
> 2,11,8,101,201
> 2,11,8,101,201
> 3,10,9,100,200
> 4,7,12,99,199
> 2,11,8,101,201
>
> goal:
> remove the duplicate rows that have the same values in the second and
> the third column.
>
> expected output:
> 1,10,9,100,200
> 2,11,8,101,201
> 4,7,12,99,199
>
> Thanks for your help!
>
> Bo

How's this:

input = [[1,10,9,100,200],[2,11,8,101,201],[2,11,8,101,201],$
         [3,10,9,100,200],[4,7,12,99,199],[2,11,8,101,201]]

; Step 1: Map your columns 2 and 3 into a single unique index
; (requires ORD from JBIU):
col2ord = ord(input[1,*])
col3ord = ord(input[2,*])
index = col2ord + (max(col2ord)+1)*col3ord

; Step 2: Use histogram to find which ones have the same unique index
h = histogram(index, reverse_indices=ri)

; Step 3: Get the first one in each bin, and put back in sorted order
keep = ri[ri[where(h gt 0)]]
keep = keep[sort(keep)]

; Step 4: Print them out:
print, input[*,keep]

       1      10       9     100     200
       2      11       8     101     201
       4       7      12      99     199


-Jeremy.
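
A note on Step 3, since ri[ri[...]] is the least obvious part: HISTOGRAM's REVERSE_INDICES vector starts with one offset per bin (plus a final end marker), followed by the original array indices grouped by bin. So ri[i] points at the start of bin i's group, and ri[ri[i]] is the first original index that fell into bin i. The sketch below traces this on a small hand-picked index vector; the values are chosen for illustration and are not claimed to be what ORD returns.

```idl
; Illustrative index vector (hand-picked, one entry per input row)
index = [4,2,2,4,6,2]

h = histogram(index, reverse_indices=ri)
nb = n_elements(h)          ; 5 bins, covering the values 2..6

print, h                    ; counts per bin:       3 0 2 0 1
print, ri[0:nb]             ; offsets into ri:      6 9 9 11 11 12
print, ri[nb+1:*]           ; row indices by bin:   1 2 5 0 3 4

; First row in each non-empty bin, restored to original order
keep = ri[ri[where(h gt 0)]]
keep = keep[sort(keep)]
print, keep                 ; 0 1 4
```

Rows 0 and 3 share a bin, as do rows 1, 2 and 5, so only the first of each group survives. Empty bins (here the ones for values 3 and 5) have equal consecutive offsets, which is why they are filtered out with where(h gt 0) before indexing.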