 Sum of Diagonals

QUESTION: Todd Clememts has made asking tough questions on the IDL newsgroup something of an art form. Here is one of his latest.

I thought I'd ask this question, more out of curiosity as to a possible result than for really needing to optimize it. Every once in a while (not often enough to make me worry about optimizing too much), I want to take a not necessarily square matrix and get the sum along the diagonals, such as the following, with the theoretical function Sum_Diag:

IDL> blah = indgen( 4, 4 )
IDL> print, blah
0       1       2       3
4       5       6       7
8       9      10      11
12      13      14      15
IDL> print, sum_diag( blah )
0       5      15      30      30      25      15

which is the series [0, 4+1, 8+5+2, 12+9+6+3, ... ]

Of course, to be difficult, I'd like it to work for non-square matrices as well:

IDL> blah = indgen( 5, 3 )
IDL> print,blah
0       1       2       3       4
5       6       7       8       9
10      11      12      13      14

and the result would be the series [0, 5+1, 10+6+2, 11+7+3, ... ]

The best I've come up with is to count the diagonals, loop through those, and then loop again through the possible row and column indices in that array. It's pretty ugly, and very time consuming for a 512x512 array (most of what I want). It just occured to me now that for the squre matrix, you could take "blah + transpose( blah )" and now you only have to worry about the lower half of the matrix (and the doubled diagonal elements), so it might be a bit better, but still messy.

So if you're bored, I'd love to see a better solution than a bunch of long, nested loops, even if just only for the square case (since that is 90% of the cases in my data). ANSWER: The IDL newsgroup's Math Guy, Craig Markwardt, takes the bait and provides an answer.

How about this solution. It's not a one-liner, and it uses two loops, but remember loops are not always bad if you can do a lot of work inside one iteration. This one makes NX+NY-1 iterations.

;; Set up the problem with some fake data, a NX x NY array
nx = 4 & ny = 3 & mm = indgen(nx, ny)

;; Output array
tt = fltarr(nx+ny-1)

;; Do the work
ll = lindgen(nx>ny)
for i = 0, ny-1 do tt(i)      = total((mm(0+ll,i-ll))(0:i<(nx-1)))
for i = 1, nx-1 do tt(i+ny-1) = total((mm(i+ll,ny-1-ll))(0:(nx-1-i)<(ny-1)))

For the gobledy-gook impaired, first note that the two loops march down the left side of the array and then across the bottom, respectively. Second, I'm using array indexing along two dimensions simultaneously. The expression, mm(0+ll,i-ll), is the actual diagonal of interest. The bit at the end of the expression, (0:i<(nx-1)), is used to trim off the end, which may contain the wrong data if the array is not square.

There you go! This is a speedy devil on my machine.

Editor's Note: Todd reported later that with Craig's solution "the running time went from 3.66 seconds (ahem!) down to 0.15 seconds for each 512x512 image. It took a 2700x2700 "image" to make your code take 3.6 seconds!" Well, yes, but that's we expect from Craig. :-) 