comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Home » Public Forums » archive » Sorting a matrix
Show: Today's Messages :: Show Polls :: Message Navigator
E-mail to friend 
Switch to threaded view of this topic Create a new topic Submit Reply
Sorting a matrix [message #83346] Fri, 01 March 2013 04:45 Go to next message
Mats Löfdahl is currently offline  Mats Löfdahl
Messages: 263
Registered: January 2012
Senior Member
I'd like to sort a matrix M in such a way that the order of the rows is determined by the value in the first column. When the first column values are the same, the second column should be used, etc.

Something like Craig Marqwardt's multisort (http://cow.physics.wisc.edu/~craigm/idl/down/multisort.pro), with M[0,*] used as key1, M[1,*] as key2, etc. But without actually having to specify the keys one by one. (Never mind which index counts as the column index :o)

As the matrices I have in mind right now are integer arrays and do not have that many possible values (just -1,0,1), I thought about turning the matrix into a 1D string array with the length of each string equal to the number of columns and translating the column values into characters in the string, like A for -1, B for 0, C for 1, and then sorting the string array. But I'd prefer a more general program, in case there is one out there.

Pointers, ideas?
Re: Sorting a matrix [message #83421 is a reply to message #83346] Fri, 01 March 2013 08:49 Go to previous messageGo to next message
Mats Löfdahl is currently offline  Mats Löfdahl
Messages: 263
Registered: January 2012
Senior Member
Den fredagen den 1:e mars 2013 kl. 17:28:07 UTC+1 skrev Jeremy Bailin:
> On 3/1/13 7:45 AM, Mats Löfdahl wrote:
>
>> I'd like to sort a matrix M in such a way that the order of the rows is determined by the value in the first column. When the first column values are the same, the second column should be used, etc.
>
>>
>
>> Something like Craig Marqwardt's multisort (http://cow.physics.wisc.edu/~craigm/idl/down/multisort.pro), with M[0,*] used as key1, M[1,*] as key2, etc. But without actually having to specify the keys one by one. (Never mind which index counts as the column index :o)
>
>>
>
>> As the matrices I have in mind right now are integer arrays and do not have that many possible values (just -1,0,1), I thought about turning the matrix into a 1D string array with the length of each string equal to the number of columns and translating the column values into characters in the string, like A for -1, B for 0, C for 1, and then sorting the string array. But I'd prefer a more general program, in case there is one out there.
>
>>
>
>> Pointers, ideas?
>
>>
>
>
>
> I've done something like this before by generating a single unique
>
> index... something like this:
>
>
>
> matrix = [[4,5,6], [4,6,8], [2,3,4], [4,6,7]]
>
>
>
> matrixshape = size(matrix, /dimen)
>
> ; this gives you the range of each column:
>
> matrixord = lonarr(matrixshape)
>
> for i=0l,matrixshape[0]-1 do matrixord[i,*] = ord(matrix[i,*])
>
> ordmax = max(matrixord, dimen=2)
>
>
>
> ; what do you need to multiply by to get a unique range?
>
> column_multiply = [reverse(product(reverse(ordmax[1:*]+1), /int,
>
> /cumul)), 1]
>
>
>
> ; create a unique key and sort on it
>
> sortkey = total(matrixord * rebin(column_multiply,matrixshape, /sample),
>
> /int, 1)
>
> newmatrix = matrix[*, sort(sortkey)]
>
>
>
> IDL> print, newmatrix
>
> 2 3 4
>
> 4 5 6
>
> 4 6 7
>
> 4 6 8
>
>
>
> You'll need ORD(), which is part of JBIU which is currently inaccessible
>
> because I'm in the process of moving domains. But here's a stub:
>
>
>
> function ord, values
>
>
>
> nvalues=n_elements(values)
>
> sortvalues = sort(values)
>
> uniqvalues = uniq(values[sortvalues])
>
>
>
> nuniq = n_elements(uniqvalues)
>
> ordlist = lindgen(nuniq)
>
>
>
> ; this is basically the histogram(total(/cumulative)) trick
>
> h = histogram(uniqvalues,bin=1,min=0,reverse=ri)
>
> outp = lonarr(size(values, /dimen))
>
> outp[sortvalues] = ordlist[ri[0:nvalues-1]-ri[0]]
>
>
>
> return, outp
>
>
>
> end

Oh, this looks pretty clever. I think I understand the idea but I'll have to digest the details. If I'm right, you find out for each column how many different values there are and then multiply the index for that column with the proper number to avoid overlap with the other column indices when adding.

Thanks!
Re: Sorting a matrix [message #83422 is a reply to message #83346] Fri, 01 March 2013 08:28 Go to previous messageGo to next message
Jeremy Bailin is currently offline  Jeremy Bailin
Messages: 618
Registered: April 2008
Senior Member
On 3/1/13 7:45 AM, Mats L�fdahl wrote:
> I'd like to sort a matrix M in such a way that the order of the rows is determined by the value in the first column. When the first column values are the same, the second column should be used, etc.
>
> Something like Craig Marqwardt's multisort (http://cow.physics.wisc.edu/~craigm/idl/down/multisort.pro), with M[0,*] used as key1, M[1,*] as key2, etc. But without actually having to specify the keys one by one. (Never mind which index counts as the column index :o)
>
> As the matrices I have in mind right now are integer arrays and do not have that many possible values (just -1,0,1), I thought about turning the matrix into a 1D string array with the length of each string equal to the number of columns and translating the column values into characters in the string, like A for -1, B for 0, C for 1, and then sorting the string array. But I'd prefer a more general program, in case there is one out there.
>
> Pointers, ideas?
>

I've done something like this before by generating a single unique
index... something like this:

matrix = [[4,5,6], [4,6,8], [2,3,4], [4,6,7]]

matrixshape = size(matrix, /dimen)
; this gives you the range of each column:
matrixord = lonarr(matrixshape)
for i=0l,matrixshape[0]-1 do matrixord[i,*] = ord(matrix[i,*])
ordmax = max(matrixord, dimen=2)

; what do you need to multiply by to get a unique range?
column_multiply = [reverse(product(reverse(ordmax[1:*]+1), /int,
/cumul)), 1]

; create a unique key and sort on it
sortkey = total(matrixord * rebin(column_multiply,matrixshape, /sample),
/int, 1)
newmatrix = matrix[*, sort(sortkey)]

IDL> print, newmatrix
2 3 4
4 5 6
4 6 7
4 6 8

You'll need ORD(), which is part of JBIU which is currently inaccessible
because I'm in the process of moving domains. But here's a stub:

function ord, values

nvalues=n_elements(values)
sortvalues = sort(values)
uniqvalues = uniq(values[sortvalues])

nuniq = n_elements(uniqvalues)
ordlist = lindgen(nuniq)

; this is basically the histogram(total(/cumulative)) trick
h = histogram(uniqvalues,bin=1,min=0,reverse=ri)
outp = lonarr(size(values, /dimen))
outp[sortvalues] = ordlist[ri[0:nvalues-1]-ri[0]]

return, outp

end
Re: Sorting a matrix [message #83434 is a reply to message #83346] Thu, 07 March 2013 05:32 Go to previous messageGo to next message
Mats Löfdahl is currently offline  Mats Löfdahl
Messages: 263
Registered: January 2012
Senior Member
Den onsdagen den 6:e mars 2013 kl. 06:22:01 UTC+1 skrev Jeremy Bailin:
>> Oh, this looks pretty clever. I think I understand the idea but I'll have to digest the details. If I'm right, you find out for each column how many different values there are and then multiply the index for that column with the proper number to avoid overlap with the other column indices when adding.
>
> Yes, that's exactly right!

OK, good. But I have problems with the ord() function. When called with an array of only zeros, the call to histogram() does not work:

HISTOGRAM: Expression must be an array in this context: UNIQVALUES.
% Execution halted at: ORD 14

Where line 14 is

h = histogram(uniqvalues,bin=1,min=0,reverse=ri)

The reason is that when there is only one value, uniqvalues has only one element and is therefore turned into is a scalar.

The call to histogram works if I chance it to

h = histogram([uniqvalues],bin=1,min=0,reverse=ri)

Could this have bad consequences in other cases or is this what I should do? You described the ord() code you gave as a stub. Does this mean there is a more complete version that maybe checks for this problem (and other problems as well)?
Re: Sorting a matrix [message #83446 is a reply to message #83421] Tue, 05 March 2013 21:22 Go to previous messageGo to next message
Jeremy Bailin is currently offline  Jeremy Bailin
Messages: 618
Registered: April 2008
Senior Member
> Oh, this looks pretty clever. I think I understand the idea but I'll have to digest the details. If I'm right, you find out for each column how many different values there are and then multiply the index for that column with the proper number to avoid overlap with the other column indices when adding.

Yes, that's exactly right!

-Jeremy.
Re: Sorting a matrix [message #83473 is a reply to message #83346] Fri, 08 March 2013 16:14 Go to previous messageGo to next message
Mats Löfdahl is currently offline  Mats Löfdahl
Messages: 263
Registered: January 2012
Senior Member
Den fredagen den 8:e mars 2013 kl. 23:24:44 UTC+1 skrev Gianguido Cianci:
> I was curious to compare this method with a more straightforward way. multisort.pro below does both.
>
> Jeremy's method is twice as fast for large arrays. However, the two methods only give the same result for smallish arrays.

When the results differ, does any of the methods sort correctly?
Re: Sorting a matrix [message #83475 is a reply to message #83346] Fri, 08 March 2013 14:24 Go to previous messageGo to next message
cgguido is currently offline  cgguido
Messages: 195
Registered: August 2005
Senior Member
I was curious to compare this method with a more straightforward way. multisort.pro below does both.
Jeremy's method is twice as fast for large arrays. However, the two methods only give the same result for smallish arrays.

Any ideas what's going on?

;;;;;;;;;;;;;; EXAMPLES ;;;;;;;;;;;;;
multisort, round(randomu(s,20,1e1)*10)
; 1
; 0.00027489662 0.00026106834

multisort, round(randomu(s,20,1e2)*10)
; 0
; 0.00072312355 0.00039005280


;;;;;;;;;;;;;; CODE ;;;;;;;;;;;;;;;
PRO multisort, matrix

compile_opt idl2


matrixshape = size(matrix, /dimen)

t1 = systime(1)


; this gives you the range of each column:
matrixord = lonarr(matrixshape)
for i=0l,matrixshape[0]-1 do matrixord[i,*] = ord(matrix[i,*])
ordmax = max(matrixord, dimen=2)

; what do you need to multiply by to get a unique range?
column_multiply = [reverse(product(reverse(ordmax[1:*]+1), /int, /cumul)), 1]

; create a unique key and sort on it
sortkey = total(matrixord * rebin(column_multiply,matrixshape, /sample), /int, 1)
newmatrix = matrix[*, sort(sortkey)]
t2 = systime(1)


newmatrix2 = matrix
FOR i = matrixshape[0]-1, 0, -1 DO BEGIN
s = bsort(newmatrix2[i, *]);use bsort which maintains order of identical elements
newmatrix2 = newmatrix2[*, s]
ENDFOR
t3 = systime(1)

print,array_equal( newmatrix, newmatrix2)

print, t3-t2, t2-t1


RETURN
END
Re: Sorting a matrix [message #83489 is a reply to message #83434] Thu, 07 March 2013 16:34 Go to previous messageGo to next message
Jeremy Bailin is currently offline  Jeremy Bailin
Messages: 618
Registered: April 2008
Senior Member
On 3/7/13 7:32 AM, Mats L�fdahl wrote:
> Den onsdagen den 6:e mars 2013 kl. 06:22:01 UTC+1 skrev Jeremy Bailin:
>>> Oh, this looks pretty clever. I think I understand the idea but I'll have to digest the details. If I'm right, you find out for each column how many different values there are and then multiply the index for that column with the proper number to avoid overlap with the other column indices when adding.
>>
>> Yes, that's exactly right!
>
> OK, good. But I have problems with the ord() function. When called with an array of only zeros, the call to histogram() does not work:
>
> HISTOGRAM: Expression must be an array in this context: UNIQVALUES.
> % Execution halted at: ORD 14
>
> Where line 14 is
>
> h = histogram(uniqvalues,bin=1,min=0,reverse=ri)
>
> The reason is that when there is only one value, uniqvalues has only one element and is therefore turned into is a scalar.
>
> The call to histogram works if I chance it to
>
> h = histogram([uniqvalues],bin=1,min=0,reverse=ri)
>
> Could this have bad consequences in other cases or is this what I should do? You described the ord() code you gave as a stub. Does this mean there is a more complete version that maybe checks for this problem (and other problems as well)?
>

Ah, interesting! No, I didn't check for that - the code is actually
complete, it's just that the documentation is missing because I didn't
want to obscure the post. ;-)

Thanks for the bugfix!

-Jeremy.
Re: Sorting a matrix [message #83565 is a reply to message #83475] Fri, 08 March 2013 15:51 Go to previous messageGo to next message
Jeremy Bailin is currently offline  Jeremy Bailin
Messages: 618
Registered: April 2008
Senior Member
On 3/8/13 4:24 PM, Gianguido Cianci wrote:
> I was curious to compare this method with a more straightforward way. multisort.pro below does both.
> Jeremy's method is twice as fast for large arrays. However, the two methods only give the same result for smallish arrays.
>
> Any ideas what's going on?
>
> ;;;;;;;;;;;;;; EXAMPLES ;;;;;;;;;;;;;
> multisort, round(randomu(s,20,1e1)*10)
> ; 1
> ; 0.00027489662 0.00026106834
>
> multisort, round(randomu(s,20,1e2)*10)
> ; 0
> ; 0.00072312355 0.00039005280
>
>
> ;;;;;;;;;;;;;; CODE ;;;;;;;;;;;;;;;
> PRO multisort, matrix
>
> compile_opt idl2
>
>
> matrixshape = size(matrix, /dimen)
>
> t1 = systime(1)
>
>
> ; this gives you the range of each column:
> matrixord = lonarr(matrixshape)
> for i=0l,matrixshape[0]-1 do matrixord[i,*] = ord(matrix[i,*])
> ordmax = max(matrixord, dimen=2)
>
> ; what do you need to multiply by to get a unique range?
> column_multiply = [reverse(product(reverse(ordmax[1:*]+1), /int, /cumul)), 1]
>
> ; create a unique key and sort on it
> sortkey = total(matrixord * rebin(column_multiply,matrixshape, /sample), /int, 1)
> newmatrix = matrix[*, sort(sortkey)]
> t2 = systime(1)
>
>
> newmatrix2 = matrix
> FOR i = matrixshape[0]-1, 0, -1 DO BEGIN
> s = bsort(newmatrix2[i, *]);use bsort which maintains order of identical elements
> newmatrix2 = newmatrix2[*, s]
> ENDFOR
> t3 = systime(1)
>
> print,array_equal( newmatrix, newmatrix2)
>
> print, t3-t2, t2-t1
>
>
> RETURN
> END
>

Bwahahah.... oh dear.

The problem is that in this case, column_multiply ends up being a LONG64
in order to fit the maximum value of the PRODUCT. But when you multiply:
matrixord * rebin(column_multiply, matrixshape, /sample)
the values are larger than the maximum LONG64... and wrap around to
become negative numbers!

IDL> print, minmax(matrixord)
0 10
IDL> print, minmax(column_multiply)
1 5054470284992937710
IDL> print, minmax(matrixord * rebin(column_multiply, matrixshape, /sample))
-8337803503723676196 8596744417517336158
IDL> help, matrixord, column_multiply
MATRIXORD LONG = Array[20, 100]
COLUMN_MULTIPLY LONG64 = Array[20]

In principle there are 10^20 possible rows, and it's trying to make sure
it has a unique integer for each possibility.

I guess my advice is to cap it at 19 columns and maybe do it in chunks
if there are more columns.

-Jeremy.
Re: Sorting a matrix [message #83567 is a reply to message #83346] Fri, 08 March 2013 16:31 Go to previous message
Mats Löfdahl is currently offline  Mats Löfdahl
Messages: 263
Registered: January 2012
Senior Member
Den lördagen den 9:e mars 2013 kl. 01:24:00 UTC+1 skrev Gianguido Cianci:
> hmmm try it for yourself :-) I check by plotting the data and checking it is non-strictly monotonically increasing:
>
>
>
> plot, newmatrix[0,*]; looks strange
>
> plot, newmatrix2[0,*]; looks ok

That's interesting. I've also seen problems with the former method for larger arrays but I haven't tracked them down yet. One thing I would check is if some quantity needs to be larger that what is accommodated by its integer type.

But it's really late in Sweden now...
Re: Sorting a matrix [message #83569 is a reply to message #83473] Fri, 08 March 2013 16:24 Go to previous message
cgguido is currently offline  cgguido
Messages: 195
Registered: August 2005
Senior Member
hmmm try it for yourself :-) I check by plotting the data and checking it is non-strictly monotonically increasing:

plot, newmatrix[0,*]; looks strange
plot, newmatrix2[0,*]; looks ok


On Friday, March 8, 2013 6:14:49 PM UTC-6, Mats Löfdahl wrote:
> Den fredagen den 8:e mars 2013 kl. 23:24:44 UTC+1 skrev Gianguido Cianci:
>
>> I was curious to compare this method with a more straightforward way. multisort.pro below does both.
>
>>
>
>> Jeremy's method is twice as fast for large arrays. However, the two methods only give the same result for smallish arrays.
>
>
>
> When the results differ, does any of the methods sort correctly?
  Switch to threaded view of this topic Create a new topic Submit Reply
Previous Topic: where, or, and loops. There has to be a better way.
Next Topic: SPAWN with AWK

-=] Back to Top [=-
[ Syndicate this forum (XML) ] [ RSS ] [ PDF ]

Current Time: Fri Oct 10 13:15:10 PDT 2025

Total time taken to generate the page: 2.32121 seconds