This is a comparison of different for-loop and for-loopless approaches to an indexing problem. A lot can be learned from the timing results. Once more, regarding the original posting from S Bhattacharyya: > Q2) I have a generic array foo=fltarr(a,b). I'd like to copy findgen(b) > into every column. Any way of doing this without loops ? I stand corrected - there are a couple of ways to accomplish the second of the two examples (Q2) without using for-loops. I wrote the various methods into subroutines, and testing the times for each method, using first a small size that easily fit inside memory, and then using large arrays where my machine was forced to page a lot. I ran them on an HP 715 / 64, with 96 Mbyte RAM and 320 Mbyte swapspace. I ran them both under IDL and PV-Wave, versions: IDL. Version 3.1.0 and PV-WAVE v6.01. Sometimes one was faster, sometimes the other (often by a factor of two or more!). I repeated the tests several times, and the results were quite consistent. I didn't perform any averaging, and didn't do anything to make it all particularly accurate. This should not be considered a benchmark between the two programs. My goal was just to get some feel which methods were more effective. I compared both the problem of inserting columns (that of Q2), plus the problem of inserting rows of findgen into an array. These two problems turn out to be quite different, with different optimal solutions. The methods compared are those proposed by: 1. Kenneth P. Bowman 2. me 3. by both David Fanning and Dan Carr (plus some recent news posters) 4. Paul C. Sorenson (paulcs@netcom.com) 5. a brute-force, double-for-loop method (to demonstrate what not to do) 6. exactly like 5 except that the long in the loop variable that is assigned to each element not explicitly converted to float (this happens explicitly). 7. as in 5., but with the loops reversed, causing thrashing 8., 9. Robert Cannon I wrote them each into a similar form (see the listing at the end of this note). Here is the algorithm for each. The version written to insert a row instead of a column is similar in form to these. Compare, for example: insert rows: one_column = findgen(1, columns) for i = 0L, rows-1 do array(i, 0) = one_column with insert columns: one_row = findgen(rows) for j = 0L, columns-1 do array(0, j) = one_row All routines are listed at the end of this note. The algorithms: 1. array = fltarr(rows, columns, /nozero) for j = 0L, columns-1 do array(*, j) = float(j) 2. array = fltarr(rows, columns, /nozero) one_row = findgen(rows) for j = 0L, columns-1 do array(0, j) = one_row 3. array = FINDGEN(rows) # REPLICATE(1, columns) 4. array = (findgen(1,columns))(bytarr(rows),*) 5. array = fltarr(rows, columns, /nozero) for j = 0L, columns-1 do $ for i = 0L, rows-1 do $ array(i, j) = float(i) 6. as in 5., but '= i' instead of '= float(i)' 7. as in 5., but reversed order of the two for statements 8. array = rebin(findgen(1, columns), rows, columns) 9. array = transpose(findgen(columns, rows) mod columns ) Here are the execution times, in seconds, under IDL and PV-Wave, for the specified array sizes, for each of the algorithms above. The left two columns show the times for inserting columns, the second two columns involve inserting rows. Tested using the following number of rows and columns: 3000, 3000 Insert columns Insert rows IDL PV-Wave IDL PV-Wave 1. 22.84 17.58 30.48 21.31 2. 22.77 30.81 4.45 1.48 3. 3.88 9.73 3.87 9.70 4. 12.05 11.93 5.67 7.17 5. 89.24 164.56 87.51 163.15 6. 141.93 385.72 140.45 389.53 7. 92.75 166.44 92.82 169.71 8. 1.30 4.96 9.78 10.29 9. 30.75 28.71 18.40 21.60 Tested using the following number of rows and columns: 1000, 30000 Insert columns Insert rows IDL PV-Wave IDL PV-Wave 1. 84.95 71.09 Inf Inf 2. Inf Inf 28.52 22.69 3. 27.65 42.95 30.88 50.10 4. 129.53 138.84 124.39 126.26 5. 300.62 547.51 296.74 556.55 6. - 1284.83 - 1300.17 7. Inf Inf Inf Inf 8. 20.07 28.76 Inf Inf 9. - 3100.02 158.07 161.23 Where here I poetically define "Inf" to be anything that takes longer than a day (I suspect these would all end up taking about 10^4 seconds). A few conclusions can be drawn from the timings: The most important thing is to differentiate between problems that fit fully within physical memory (RAM) and those that require the use of virtual memory. For example, on my 96 Mbyte machine the first example of using arrays of 3000x3000 floats (= ~36 Mbytes) fit within my machine. However, increasing the problem to 1000x30000 (= ~120 Mbytes) makes it exceed my machines RAM, and it becomes necessary for my machine to swap the array in and out of memory as it works through the elements. This is a very slow operation, that will almost always be the dominant factor in determining how long the routine takes. Although this problem is only about three times larger than the original, the fastest solutions were 10 times longer. Some solutions, that had taken 20 seconds with the first problem, were still not finished after the machine had run for a full day. For problems that do not fit into physical memory it is crucial that all operations scan over the leftmost dimension first (i.e. rows) before later dimensions (columns, etc.). The statement array(i, *) = i inside a for-loop causes the machine to scan through the second dimension before the first dimension. That turned this seemingly simple statement into a problem that was not finished a day later. The slow approach of using double for-loops was many times faster than this more elegant approach when the for-loops were done in the proper order: for j=0L, rows-1 do for i=0L, columns-1 do array(i, j) = ... but reversing the order of the for-loops turned it into another problem that was not finished a day later. The best solution of problems that are inherently row-oriented (such as inserting something into each row of a matrix) are different from the best solution of column-oriented problems. These optimal solutions ended up being the same solutions that were optimal for the problem that fit within physical memory. The only solution that performed pretty well for both row-oriented and column-oriented problems was the elegant cross-product approach (approach 3.), despite the fact that it requires an extra multiply for each array element: FINDGEN(rows) # REPLICATE(1, columns) Comparing the times of solutions 5. and 7., one notes that placing the for-loops in the wrong order has very little effect on execution time as long as the problem fits within physical memory, but has a drastic effect when that is not the case. The optimal solution to the problem of inserting findgen into rows of the array ended up being the for-loop looping over the columns, solution 2.: one_row = findgen(rows) for j = 0L, columns-1 do array(0, j) = one_row The optimal solution for the problem of inserting findgen into the columns was accomplished with rebin, solution 8.: array = rebin(findgen(1, columns), rows, columns) The clever vector indexing approach, solution 4.: (findgen(1,columns))(bytarr(rows),*) performed surprisingly poorly. I can't explain why; it's simply a function of how the interpreter functions with such an indexing scheme. It was interesting to note the burden that a function call creates when it is inside an inner for-loop. In solution 5. an implicit type conversion is made from long integer to float: array(i, j) = i when this is changed to array(i, j) = float(i) (solution 6.), the execution times are much slower, sometimes more than doubly slow. One can see from the execution times that at least part of the code for IDL and PVWave are diverging (presumably the memory administration and indexing systems, and perhaps the interpreter); often one language is twice as fast as the other in a given solution, but then the other language is twice as fast on another problem. Often the times are essentially identical. It's clear that neither company has set a goal of perfecting execution speed under all conditions. As I mentioned, when one does operations along a higher dimension first for a problem that doesn't fit into memory, the problem becomes very slow. I was curious what was happening, because after two days I still hadn't gotten the result from one of these. So I added an extra print statement to my test routine test1_rows.pro to look at the timing as it inserts each row. The best time for this example was about 20 seconds, but each of the 1000 rows took longer than this, due to the thrashing to swap memory in and out. Here are the timings, in seconds for the first few row computations. The time per row keeps growing, but always slower: test1_rows, 1000, 30000 row elapsed time average time/row 1 13.296000 13.327000 2 81.090000 40.560500 3 150.33500 50.121667 4 218.95900 54.750000 5 294.14200 58.832600 6 365.92100 60.992000 7 435.88800 62.274143 8 505.99300 63.253000 9 579.73700 64.418556 10 649.93200 64.997300 11 721.52000 65.595636 12 790.99900 65.919167 13 863.28500 66.408923 14 936.90500 66.923929 15 1008.2590 67.219333 16 1080.0630 67.505813 17 1154.3780 67.906412 18 1227.2110 68.180111 19 1297.6890 68.301053 20 1369.7120 68.487150 21 1444.2740 68.776429 22 1517.4780 68.977636 23 1589.7140 69.119348 I'm guessing that completion will take something like 100,000 seconds, or about 5,000 times longer than the best time for this solution. Morals of the story: A good optimized C compiler worries about a lot of the details of making these things efficient. IDL / PVWave are not designed to optimize the problem for you, so in the case where time becomes crucial, you have to do the optimizing: 1. Avoiding accessing in order of higher dimension first (i.e., columns before rows). 2. Try to avoid an inner for-loop - vectorize the fastest-changing index when possible. 3. Avoid function calls within an inner for-loop. (but this falls under the last, avoid the inner for-loop in general). 4. Careful choice of approach can then optimize a problem between different fairly effective approaches. Interestingly, these solutions can be different between IDL and PV-Wave. One just has to experiment. ************ The same method used in (4) can be applied to the first question: > Q1) I have a generic array foo(x,y). I'd like to divide each column > by its max. Can this be done without looping ? > And provides a solution where the main looping can be done without a for-loop. I still see no way to avoid using a for-loop to get the maximums for each column. ; make an example array: rows = 3 columns = 5 foo = findgen(rows, columns) ; create a vector of 1 / maxes for each column: inv maxes = fltarr(rows) for i = 0L, rows-1 do inv_maxes(i) = 1. / max(foo(i, *)) ; ; Now perform the scaling, scanning through the elements of 'foo' and ; repeatedly through the inv_maxes vector: foo = foo * inv_maxes(*, bytarr(columns)) I didn't check the times on this, but assume it will unfortunately also, like solution 4., perform slowly. ____________________________________________________________________ ; Here are the actual routines I used for the time testing: PRO TEST1, rows, columns strt = systime(1) ; array = fltarr(rows, columns, /nozero) for j = 0L, columns-1 do array(*, j) = j ; print, 'elapsed time: ',systime(1) - strt return END PRO TEST2, rows, columns strt = systime(1) ; array = fltarr(rows, columns, /nozero) one_column = findgen(1, columns) for i = 0L, rows-1 do array(i, 0) = one_column ; print, 'elapsed time: ',systime(1) - strt return END PRO TEST3, rows, columns strt = systime(1) ; array = replicate(1.0, rows) # findgen(1, columns) ; print, 'elapsed time: ',systime(1) - strt return END PRO TEST4, rows, columns strt = systime(1) ; array = (findgen(1,columns))(bytarr(rows),*) ; print, 'elapsed time: ',systime(1) - strt return END PRO TEST5, rows, columns strt = systime(1) ; array = fltarr(rows, columns, /nozero) for j = 0L, columns-1 do $ for i = 0L, rows-1 do $ array(i, j) = j ; print, 'elapsed time: ',systime(1) - strt return ; END PRO TEST6, rows, columns strt = systime(1) ; array = fltarr(rows, columns, /nozero) for j = 0L, columns-1 do $ for i = 0L, rows-1 do $ array(i, j) = float(j) ; print, 'elapsed time: ',systime(1) - strt return ; END PRO TEST7, rows, columns strt = systime(1) ; array = fltarr(rows, columns, /nozero) for i = 0L, rows-1 do $ for j = 0L, columns-1 do $ array(i, j) = j ; print, 'elapsed time: ',systime(1) - strt return ; END PRO TEST8, rows, columns strt = systime(1) ; array = rebin(findgen(1, columns), rows, columns) ; print, 'elapsed time: ',systime(1) - strt return ; END PRO TEST9, rows, columns strt = systime(1) ; array = transpose(findgen(columns, rows) mod columns ) ; print, 'elapsed time: ',systime(1) - strt return ; END ; the following are similar to the above, but insert rows intead of ; columns of 'findgen's: PRO TEST1_rows, rows, columns strt = systime(1) ; array = fltarr(rows, columns, /nozero) for i = 0L, rows-1 do array(i, *) = i ; print, 'elapsed time: ',systime(1) - strt return END PRO TEST2_rows, rows, columns strt = systime(1) ; array = fltarr(rows, columns, /nozero) one_row = findgen(rows) for j = 0L, columns-1 do array(0, j) = one_row ; print, 'elapsed time: ',systime(1) - strt return END PRO TEST3_rows, rows, columns strt = systime(1) ; array = findgen(rows) # replicate(1.0, 1, columns) ; print, 'elapsed time: ',systime(1) - strt return END PRO TEST4_rows, rows, columns strt = systime(1) ; array = (findgen(rows))(*, bytarr(columns)) ; print, 'elapsed time: ',systime(1) - strt return END PRO TEST5_rows, rows, columns strt = systime(1) ; array = fltarr(rows, columns, /nozero) for j = 0L, columns-1 do $ for i = 0L, rows-1 do $ array(i, j) = i ; print, 'elapsed time: ',systime(1) - strt return ; END PRO TEST6_rows, rows, columns strt = systime(1) ; array = fltarr(rows, columns, /nozero) for j = 0L, columns-1 do $ for i = 0L, rows-1 do $ array(i, j) = float(i) ; print, 'elapsed time: ',systime(1) - strt return ; END PRO TEST7_rows, rows, columns strt = systime(1) ; array = fltarr(rows, columns, /nozero) for i = 0L, rows-1 do $ for j = 0L, columns-1 do $ array(i, j) = i ; print, 'elapsed time: ',systime(1) - strt return ; END PRO TEST8_rows, rows, columns strt = systime(1) ; array = rebin(findgen(rows), rows, columns) ; print, 'elapsed time: ',systime(1) - strt return ; END PRO TEST9_rows, rows, columns strt = systime(1) ; array = findgen(rows, columns) mod rows ; print, 'elapsed time: ',systime(1) - strt return ; END __________________________________________________________________