comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Optimizing code for faster calculation [message #88032] Wed, 12 March 2014 23:33
Kenneth D
I've been looking at this block of code now for... ever.

I've been editing a program created by my Adviser to reduce run time wherever possible. So far I've reduced the run time by nearly half, and I'm trying to juice any performance I can get from absolutely anywhere. My final project will use an array roughly 17,000 by 17,000. And I have to iterate through the program at least 17,000*10 times. If I'm lucky it won't take a month to process my data-sets now. This code is about all I have left to work with:

exceed_subs = where(min_rmse GT rmse_threshold, counter)
if counter GT 0 then modeled_class(exceed_subs) = "unmodeled"

min_rmse is an array Float[200], such as [0.347272, 0.312437, 0.360164,...]
rmse_threshold = 0.025
modeled_class is an array String[200], such as ["soil","quag","soil","grass",...]

The code finds the locations where min_rmse is greater than a threshold value, and replaces the elements at those index locations in the string array (modeled_class) with "unmodeled".

This may well be the most efficient way to do this (this code will run a minimum of 17,000 times) but a look at the histograms page at Exelis:
http://www.exelisvis.com/docs/HISTOGRAM.html

shows:
For example, make the histogram of array A:
H = HISTOGRAM(A, REVERSE_INDICES = R)
;Set all elements of A that are in the ith bin of H to 0.
IF R[i] NE R[i+1] THEN A[R[R[I] : R[i+1]-1]] = 0

;The above is usually more efficient than the following:
bini = WHERE(A EQ i, count)
IF count NE 0 THEN A[bini] = 0

This looks very similar to what I'm trying to do. I tried to implement it with no luck (maybe because of the strings?). Is there anything else I can do, besides taking out iterations? Those simply must be there to do what I need.
Re: Optimizing code for faster calculation [message #88033 is a reply to message #88032] Thu, 13 March 2014 00:55
Helder Marchetto
Hi,
I think you should have a look at
http://www.idlcoyote.com/tips/histogram_tutorial.html
You will find the information you need in there.

That said, my guess is that you will need to set the proper binsize in your histogram command. Depending on the type of values you have, you might try using binsize = 0.025, but I don't have time to check if that is a good option.
Hope it helps.
Cheers,
Helder
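
A minimal sketch of the binsize suggestion above, using made-up values. It assumes the RMSE values are non-negative, so that MIN=0.0 puts everything below the threshold into bin 0. Note that a value exactly equal to the threshold lands in bin 1, so this behaves like GE rather than GT, and with only 200 elements it may well not be faster than the WHERE version:

```idl
; Example values only.
min_rmse = [0.347272, 0.012437, 0.360164, 0.020000]
rmse_threshold = 0.025

; One bin per multiple of the threshold, starting at zero.
h = histogram(min_rmse, min=0.0, binsize=rmse_threshold, reverse_indices=r)
nbins = n_elements(h)

; Bin 0 holds values below the threshold; bins 1 and up hold the rest.
; r[1] and r[nbins] bracket the subscripts of everything outside bin 0.
if r[1] ne r[nbins] then print, r[r[1] : r[nbins]-1]
```

The printed subscripts play the role of exceed_subs in the original code and can be used to index an array of the same length as min_rmse.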
Re: Optimizing code for faster calculation [message #88034 is a reply to message #88032] Thu, 13 March 2014 05:28
David Fanning
Kenneth D writes:

> I've been looking at this block of code now for... ever.

I think you are in good shape here. I'd go worry about something else.
Chances are you are only going to have to run this program once. :-)

Cheers,

David

--
David Fanning, Ph.D.
Fanning Software Consulting, Inc.
Coyote's Guide to IDL Programming: http://www.idlcoyote.com/
Sepore ma de ni thue. ("Perhaps thou speakest truth.")
Re: Optimizing code for faster calculation [message #88035 is a reply to message #88032] Thu, 13 March 2014 06:34
Heinz Stege
On Wed, 12 Mar 2014 23:33:31 -0700 (PDT), Kenneth D wrote:

> Is there anything else I can do? That is, besides taking out iterations, they simply must be there to do what I need.

Since the array min_rmse has only 200 elements, I doubt that it is
possible to make this part of the code significantly faster.

Otherwise, it seems to me that "modeled_class" being a string array
is more of a bottleneck than the WHERE function. It may help to change
the code to something like the following. Start with these lines, which
can be executed once somewhere in the header of your code:

modeled_class=lonarr(200)
temp=["soil","quag","grass",...,"unmodeled"] ; unique values only!
modeled_class_names=temp[sort(temp)]
unmodeled=value_locate(modeled_class_names,"unmodeled")

Then write the respective indices into the long integer array
modeled_class.

This way the lines within the iteration can be changed to

exceed_subs = where(min_rmse GT rmse_threshold, counter)
if counter GT 0 then modeled_class(exceed_subs) = unmodeled

Are such changes possible with your code?

Cheers, Heinz
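
A minimal sketch of this index-based approach, assuming a hypothetical four-entry class list and made-up RMSE values. The names class_names and unmodeled_index are illustrative and do not come from the original program:

```idl
; Hypothetical class list; the real program would use its own unique names.
class_names = ['grass', 'quag', 'soil', 'unmodeled']
unmodeled_index = (where(class_names eq 'unmodeled'))[0]

; Integer class codes instead of strings (example values only).
modeled_class = byte([2, 1, 2, 0])      ; soil, quag, soil, grass
min_rmse = [0.347272, 0.012437, 0.360164, 0.020000]
rmse_threshold = 0.025

; Inside the iteration: integer assignments only, no string handling.
exceed_subs = where(min_rmse gt rmse_threshold, counter)
if counter gt 0 then modeled_class[exceed_subs] = unmodeled_index

; After the iteration: convert the codes back to names once.
print, class_names[modeled_class]
```

With these values the first and third elements exceed the threshold, so they end up labeled "unmodeled" while the other two keep their original class names.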
Re: Optimizing code for faster calculation [message #88036 is a reply to message #88035] Thu, 13 March 2014 06:46
chris_torrence@NOSPAM
> This way the lines within the iteration can be changed to
>
> exceed_subs = where(min_rmse GT rmse_threshold, counter)
> if counter GT 0 then modeled_class(exceed_subs) = unmodeled
>
> Are such changes possible with your code?
>
> Cheers, Heinz

I would agree with Heinz. Strings in general are slow. Heck, if you only have 200, I would just use a byte array for the modeled_class index values. You can then convert them at the very end (outside of the loop!) to the string values.

Also, not sure what version of IDL you're using (I'm a bit worried that you are using parentheses for indexing, which is never a good sign...), but if you have IDL 8.0 or higher, you could eliminate the "if counter" check by using the NULL keyword to where:

modeled_class[WHERE(min_rmse gt rmse_threshold, /NULL)] = unmodeled

This will be a "no-op" if none of the values match, and will do the right thing if some of the values match.

Cheers,
Chris
ExelisVIS
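
A small sketch of the /NULL behavior described above, with made-up values in which nothing exceeds the threshold. The class code 9B is a hypothetical placeholder. It also shows why the plain WHERE result should not be used as a subscript unchecked: WHERE returns -1 when nothing matches, and in IDL 8.0 and later a subscript of -1 refers to the last element of the array:

```idl
; Example values only; none of them exceeds the threshold.
min_rmse = [0.010, 0.020, 0.015]
rmse_threshold = 0.025
modeled_class = bytarr(3)
unmodeled = 9B                          ; hypothetical class code

; Without /NULL, WHERE returns -1 when nothing matches.
print, where(min_rmse gt rmse_threshold)

; With /NULL (IDL 8.0+), WHERE returns !NULL instead.
idx = where(min_rmse gt rmse_threshold, /null)
print, idx eq !null

; Subscripting with !NULL leaves the array untouched.
modeled_class[idx] = unmodeled
print, modeled_class
```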
Re: Optimizing code for faster calculation [message #88043 is a reply to message #88036] Thu, 13 March 2014 17:46
Kenneth D
Thank you so much!

Converting my strings to index values and processing like this:

modeled_class[where(min_rmse GT rmse_threshold, /NULL)] = unmodeled_index

Totally made a difference! Now the biggest ding on my program is the built-in function min():

min_rmse = min(rmse_subset, min_subs, dimension=1)

I'm going to see if a histogram of this might be faster or something...

Additional Info:
My test case is a 200x200 matrix. My real-world case is a 17000x17000 matrix.
I'm using IDL 8.2.#. I tried my code on 8.3 at our university and the index (i) gave me an error, so I changed those to [i]. I blame it on switching between Python and IDL.
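
For reference, a small sketch of what the min() call above returns, using a made-up 3-by-2 array in place of rmse_subset. The names n_first and pos_in_row are illustrative. With the DIMENSION keyword set, the subscripts in min_subs refer to the whole array rather than to positions within each row, so they need a MOD (or ARRAY_INDICES) before they can be used as per-row positions:

```idl
; Small made-up array: 3 columns by 2 rows (IDL dimensions [3, 2]).
rmse_subset = [[0.30, 0.10, 0.20], [0.05, 0.40, 0.25]]

; Minimum over the first dimension: one value per row.
min_rmse = min(rmse_subset, min_subs, dimension=1)

; min_subs holds one-dimensional subscripts into the whole array,
; so reduce them modulo the first dimension to get the position in each row.
n_first = (size(rmse_subset, /dimensions))[0]
pos_in_row = min_subs mod n_first

print, min_rmse      ; minimum of each row
print, min_subs      ; subscripts into rmse_subset
print, pos_in_row    ; position along the first dimension
```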