comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Home » Public Forums » archive » Re: Quickest method for calculation
Show: Today's Messages :: Show Polls :: Message Navigator
E-mail to friend 
Return to the default flat view Create a new topic Submit Reply
Re: Quickest method for calculation [message #71058 is a reply to message #71057] Mon, 24 May 2010 08:58 Go to previous messageGo to previous message
Karl[1] is currently offline  Karl[1]
Messages: 79
Registered: October 2005
Member
On May 24, 8:51 am, David Fanning <n...@dfanning.com> wrote:
> Gray writes:
>> However, instead of squaring, you should do y.rho*y.rho.
>
> Why is that?
>
> Cheers,
>
> David
>
> --
> David Fanning, Ph.D.
> Fanning Software Consulting, Inc.
> Coyote's Guide to IDL Programming:http://www.dfanning.com/
> Sepore ma de ni thue. ("Perhaps thos speakest truth.")

The IDL interpreter is probably going to compute y.rho^2 by calling
the pow() function, which is a standard C library function. The pow()
function probably uses exp( y * ln(x) ) and probably needs to check/
handle bad input values for these functions. The pow() function
itself may be optimized to spot the easy multiply or use some other
faster calculation, but that is going to vary from platform to
platform. In any case, is is easy to see that a single multiply is
better at this point.

Also, there may be some float -> double and double -> float
conversions going on, which are going to strain memory and CPU even
further. The pow() function takes double arguments and returns
double.

IDL has always been a "do what I say" language, and so it would
probably expect you to write y.rho * y.rho instead of doing this
optimization for you, which it could. But I'd guess that it doesn't
in this case.

Minimizing memory access would be the other thing to look at. If you
go with the multiply, you are going to read the y.rho array once to
compute its square and then store the result in a temp. Then you will
read the temp back again when multiplying it by the temp. You end up
looping through some of the data twice. This is not the optimum cache
behavior.

If it was Really Important for this to run fast, I would consider
writing the function calculation in C so that I end up making only one
pass through all the data:

for (i=0; i<n; i++)
top_tem[i] += y.rho[i] * y.rho[i] * temperature[i]

There are a lot of docs and examples for coding functions in C that
you can call from IDL, so this is not very hard to do and may be worth
doing just to see how much faster it is.

If your compiler is really good, you may be able to turn on an option
that generates SSE code, which may give you another 2x to 3x. The
above pattern is an easy one for the compiler to recognize as SIMD-
exploitable. If the compiler doesn't do this for you and you have the
time, you can write the SSE code yourself.

But yeah, using the multiply instead of the ^ will have the most ROI.

Karl
[Message index]
 
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Read Message
Previous Topic: Landsat HDF
Next Topic: SAVE again

-=] Back to Top [=-
[ Syndicate this forum (XML) ] [ RSS ] [ PDF ]

Current Time: Sat Oct 11 10:09:55 PDT 2025

Total time taken to generate the page: 0.72098 seconds