comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Re: How to Sort/Uniq a list and keep its original index [message #50698] Thu, 12 October 2006 11:16
JD Smith
Messages: 850
Registered: December 1999
Senior Member
On Wed, 11 Oct 2006 17:12:07 -0600, David Fanning wrote:
>
> I haven't tested this, but just off the top of my head:
>
> I = Where(Histogram(indexU, Min=0, Max=N_Elements(testTotal)-1) $
> EQ 0, count)

I think that will leave out one of the duplicates in each set (since
UNIQ, by definition, returns one subscript from each set).

If you're going to use HISTOGRAM, you could use it to do the whole
thing:

h=histogram(testTotal,REVERSE_INDICES=ri)
wh=where(h gt 1,cnt) ;; bins with duplicates
for i=0,cnt-1 do do_something_with,ri[ri[wh[i]]:ri[wh[i]+1]-1]

since it's faster than SORT for well-behaved data. Notice that I didn't
explicitly test for empty bins, since I'm only looping over those bins
with 2 or more entries. If most of your duplicate counts are low (2x, 3x,
etc.), you can see another big speedup by binning the resulting histogram.
Standard sparse data warnings apply.
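The HISTOGRAM loop above relies on the REVERSE_INDICES layout: for nbins bins, the first nbins+1 elements of ri are offsets into ri itself, and ri[ri[i]:ri[i+1]-1] holds the original subscripts of the elements that fell into bin i. A minimal sketch of that bookkeeping in Python rather than IDL, assuming integer data, a bin size of 1 and a minimum equal to the smallest value; the function names are hypothetical. Note that nbins is max-min+1, so widely spread values mean many empty bins, which is the sparse-data caveat.

```python
def histogram_reverse_indices(data):
    """Emulate h = HISTOGRAM(data, REVERSE_INDICES=ri) for integer data.

    Assumes a bin size of 1 and a minimum equal to min(data).
    Returns (h, ri, lo) where lo is the value that falls in bin 0.
    """
    lo = min(data)
    nbins = max(data) - lo + 1
    h = [0] * nbins
    for v in data:
        h[v - lo] += 1
    # First nbins+1 entries of ri: offsets into ri where each bin's members start.
    offsets = [nbins + 1]
    for count in h:
        offsets.append(offsets[-1] + count)
    # Remaining entries: original subscripts grouped by bin, in input order.
    members = [0] * len(data)
    fill = [o - (nbins + 1) for o in offsets[:-1]]
    for i, v in enumerate(data):
        b = v - lo
        members[fill[b]] = i
        fill[b] += 1
    return h, offsets + members, lo


def duplicate_groups(data):
    """Return one list of original subscripts per value that occurs more than once."""
    h, ri, _ = histogram_reverse_indices(data)
    groups = []
    for b, count in enumerate(h):
        if count > 1:  # same test as where(h gt 1)
            # IDL's inclusive ri[ri[b]:ri[b+1]-1] is Python's ri[ri[b]:ri[b+1]]
            groups.append(ri[ri[b]:ri[b + 1]])
    return groups


groups = duplicate_groups([3, 1, 3, 2, 1, 3])
```

Here groups is [[1, 4], [0, 2, 5]]: the subscripts of the repeated 1s and 3s, already in terms of the original array, so no mapping through a sort order is needed. Each list plays the role of the argument JD passes to do_something_with.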

If you want to use SORT anyway (for simplicity, or for instance
because the data could be very sparse), you could just do the
opposite of what UNIQ does:

indexDUP=where((test eq shift(test,-1)) OR (test eq shift(test,1)))
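The SORT-based test can be sketched the same way in Python (again not IDL, with a hypothetical function name). It flags every position in the sorted array whose value matches either neighbor, then maps those positions back through the sort order, which is the step that recovers the original subscripts (indexS[indexDUP] in IDL terms). The modulo arithmetic reproduces SHIFT's wraparound: for sorted data the first and last elements only compare equal when every element is equal, so the wraparound is harmless, except that a one-element array is flagged as its own duplicate.

```python
def duplicate_positions_sorted(values):
    """Sort values, then flag sorted positions equal to a neighbor.

    Mirrors indexDUP = where((test eq shift(test,-1)) OR (test eq shift(test,1)))
    with test = values[sort(values)].
    """
    # Sort order (like IDL SORT); Python's sort is stable.
    index_s = sorted(range(len(values)), key=lambda i: values[i])
    test = [values[i] for i in index_s]
    n = len(test)
    dup = []
    for k in range(n):
        nxt = test[(k + 1) % n]  # shift(test, -1)[k], wrapping at the end
        prv = test[(k - 1) % n]  # shift(test, 1)[k], wrapping at the start
        if test[k] == nxt or test[k] == prv:
            dup.append(k)
    return index_s, dup


values = [5, 3, 5, 7, 3, 5]
index_s, dup = duplicate_positions_sorted(values)
original = [index_s[k] for k in dup]  # same idea as indexS[indexDUP] in IDL
```

With these values, dup is [0, 1, 2, 3, 4] and the original subscripts are [1, 4, 0, 2, 5]; only the single 7 is excluded.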

JD
Re: How to Sort/Uniq a list and keep its original index [message #50703 is a reply to message #50698] Thu, 12 October 2006 09:19
Jean H.
Messages: 472
Registered: July 2006
Senior Member
Without HISTOGRAM, you could try:

tmp = lindgen(n_elements(test))
tmp[indexU] = -1
duplicate = tmp[where(tmp ne -1)]
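A Python sketch of this complement-of-UNIQ idea (not IDL; function names are hypothetical). IDL's UNIQ returns the subscript of the last element of each run of equal values, so blanking those positions and keeping the rest returns every duplicate position except one per set, the same behavior JD describes for the HISTOGRAM(indexU) version.

```python
def uniq_last(test):
    """Emulate IDL UNIQ on sorted data: subscript of the last element of each run."""
    n = len(test)
    return [k for k in range(n) if k == n - 1 or test[k] != test[k + 1]]


def complement_of_uniq(test):
    """Mirror tmp = lindgen(n) ; tmp[indexU] = -1 ; tmp[where(tmp ne -1)]."""
    tmp = list(range(len(test)))
    for k in uniq_last(test):
        tmp[k] = -1
    return [k for k in tmp if k != -1]


test = [3, 3, 5, 5, 5, 7]              # already sorted, like testTotal[indexS]
index_u = uniq_last(test)              # [1, 4, 5]
duplicate = complement_of_uniq(test)   # [0, 2, 3]
```

Positions 1 and 4 (one 3 and one 5) are missing from duplicate, so getting every member of each duplicate set needs the neighbor test or the HISTOGRAM approach instead.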

Jean

Dilkushi@gmail.com wrote:
> Dear all
> I have to sort a file with 650,000 records in search of duplicate
> records, and I need a list of duplicates (not a list without
> duplicates).
> indexS=sort(testTotal)
> test=testTotal[indexS]
> indexU=uniq(test)
>
> indexU is an index with no duplicates.
> How do I get an index pertaining to the duplicates only?
> Please help.
> Thanks in advance,
> dilkushi
>
Re: How to Sort/Uniq a list and keep its original index [message #50726 is a reply to message #50703] Wed, 11 October 2006 16:12
David Fanning
Messages: 11724
Registered: August 2001
Senior Member
Dilkushi@gmail.com writes:

> I have to sort a file with 650,000 records in search of duplicate
> records, and I need a list of duplicates (not a list without
> duplicates).
> indexS=sort(testTotal)
> test=testTotal[indexS]
> indexU=uniq(test)
>
> indexU is an index with no duplicates.
> How do I get an index pertaining to the duplicates only?

I haven't tested this, but just off the top of my
head:

I = Where(Histogram(indexU, Min=0, Max=N_Elements(testTotal)-1) $
EQ 0, count)
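The HISTOGRAM version marks positions the same way: binning indexU with MIN=0 and a bin size of 1 counts how often each subscript appears, and the empty bins are the subscripts UNIQ did not return. A Python sketch (not IDL; the function name is hypothetical), with the top bin at the last valid subscript, n-1, so that no bin lies past the end of the array:

```python
def histogram_complement(index_u, n):
    """Mirror Where(Histogram(indexU, Min=0, Max=n-1) EQ 0).

    index_u: subscripts returned by UNIQ; n: number of elements in the array.
    """
    h = [0] * n                  # one bin per subscript 0..n-1
    for k in index_u:
        h[k] += 1
    return [k for k in range(n) if h[k] == 0]  # subscripts never hit


missing = histogram_complement([1, 4, 5], 6)  # UNIQ output for [3, 3, 5, 5, 5, 7]
```

For the sorted data [3, 3, 5, 5, 5, 7] this gives [0, 2, 3], one subscript short of each duplicate set.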

Cheers,

David

--
David Fanning, Ph.D.
Fanning Software Consulting, Inc.
Coyote's Guide to IDL Programming: http://www.dfanning.com/
Sepore ma de ni thui. ("Perhaps thou speakest truth.")
Re: How to Sort/Uniq a list and keep its original index [message #50752 is a reply to message #50698] Wed, 18 October 2006 11:36
Dilkushi@gmail.com
Messages: 21
Registered: August 2006
Junior Member
Thank you, JD.
This is what I was looking for. Perfect!
dilkushi
