comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Re: Help, no improvement in FFT speed on a multiprocessor system [message #67897] Mon, 07 September 2009 14:30
Kenneth P. Bowman
In article <4aa56605$1@darkstar>, "Marco" <null@null.net> wrote:

> The arrays are Nx8192 on a side with N a power of 2.
>
> I increased N until the speed dropped off a cliff. Presumably cache/memory
> thrashing.
>
>
> "Kenneth P. Bowman" <k-bowman@null.edu> wrote in message
> news:k-bowman-5D1FE7.08403607092009@news.tamu.edu...
>> In article <4aa34391$1@darkstar>, "Marco" <null@null.net> wrote:
>>
>>> I'm running IDL 7.1 on Linux 2.6. This is an HP quad-processor system,
>>> with each processor having 6 cores, for 24 cores total.
>>>
>>> Doing large 2-D FFTs (>8Kx8K) I get no benefit from the extra
>>> processors.

I don't know how the Intel cache architecture works, but on some
processors (e.g., IBM Power), a cache miss causes a whole cache
line to be loaded from memory. If you are working on large arrays
and taking large strides through memory, every memory access
can cause a cache miss. This has the effect of completely
destroying the advantages of having a cache. Arrays dimensioned
by powers of two can be the worst cases.
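
The power-of-two effect described above can be illustrated with a toy model of how addresses map onto cache sets. This is only a sketch: the cache geometry (64-byte lines, 64 sets) and the 8-byte element size are assumptions picked for illustration, not measurements of the poster's machine, and `sets_touched` is a hypothetical helper name.

```python
# Assumed cache geometry, for illustration only.
LINE_BYTES = 64   # bytes per cache line
NUM_SETS = 64     # number of cache sets
ELEM_BYTES = 8    # one single-precision complex value

def sets_touched(stride_elems, n_accesses=64):
    """Count the distinct cache sets hit by n_accesses memory accesses
    that are stride_elems array elements apart."""
    stride_bytes = stride_elems * ELEM_BYTES
    sets = set()
    for i in range(n_accesses):
        address = i * stride_bytes
        sets.add((address // LINE_BYTES) % NUM_SETS)
    return len(sets)

power_of_two = sets_touched(8192)   # stride is a power of two
padded = sets_touched(8192 + 8)     # stride slightly larger than 8192
print(power_of_two, padded)
```

In this model the stride of 8192 elements sends every access to a single set, so the lines keep evicting each other, while the slightly larger stride spreads the same accesses over all 64 sets.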

I don't know an easy solution. You could do N 1-D FFTs of size
8192, transpose the output, and then do 8192 1-D FFTs of size N.
That is, "manually" make a 2-D FFT by looping over the second
dimension. It might possibly be faster than doing a 2-D FFT with
miserable cache performance.

Ken Bowman
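
The "manual" 2-D transform suggested above can be sketched as follows. This is plain Python rather than IDL, and `dft_1d` is a naive O(n^2) stand-in for a real FFT routine, so it shows only the structure (transform the rows, transpose, transform the rows again) and says nothing about speed. All function names are hypothetical, and the final transpose back to the original orientation is an addition of this sketch.

```python
import cmath

def dft_1d(values):
    """Naive 1-D DFT; a stand-in for a real 1-D FFT routine."""
    n = len(values)
    return [
        sum(values[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]

def transpose(rows):
    """Swap the two dimensions of a list-of-lists array."""
    return [list(col) for col in zip(*rows)]

def dft_2d_by_rows(array):
    """2-D DFT built only from 1-D transforms along contiguous rows."""
    pass1 = [dft_1d(row) for row in array]             # transform each row
    pass2 = [dft_1d(row) for row in transpose(pass1)]  # transform former columns
    return transpose(pass2)                            # restore orientation

def dft_2d_direct(array):
    """Direct double-sum 2-D DFT, used here only as a check."""
    n_rows, n_cols = len(array), len(array[0])
    return [
        [
            sum(
                array[r][c]
                * cmath.exp(-2j * cmath.pi * (u * r / n_rows + v * c / n_cols))
                for r in range(n_rows)
                for c in range(n_cols)
            )
            for v in range(n_cols)
        ]
        for u in range(n_rows)
    ]

# Small 4x8 test array; compare the two results element by element.
sample = [[float(r * 8 + c) for c in range(8)] for r in range(4)]
by_rows = dft_2d_by_rows(sample)
direct = dft_2d_direct(sample)
max_error = max(
    abs(a - b) for row_a, row_b in zip(by_rows, direct) for a, b in zip(row_a, row_b)
)
print(max_error < 1e-9)
```

Each pass works along contiguous rows, so the only large-stride memory traffic is confined to the transposes. Whether that is actually faster than a built-in 2-D FFT would have to be measured.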
Re: Help, no improvement in FFT speed on a multiprocessor system [message #67898 is a reply to message #67897] Mon, 07 September 2009 12:58
Marco
The arrays are Nx8192 on a side with N a power of 2.

I increased N until the speed dropped off a cliff. Presumably cache/memory
thrashing.


"Kenneth P. Bowman" <k-bowman@null.edu> wrote in message
news:k-bowman-5D1FE7.08403607092009@news.tamu.edu...
> In article <4aa34391$1@darkstar>, "Marco" <null@null.net> wrote:
>
>> I'm running IDL 7.1 on Linux 2.6. This is an HP quad-processor system,
>> with each processor having 6 cores, for 24 cores total.
>>
>> Doing large 2-D FFTs (>8Kx8K) I get no benefit from the extra
>> processors.
>> I can vary the number of processors used in IDL from 24 down to 1, and see
>> that the number of processors actually showing a load is the correct number,
>> but from 4 to 24 threads the speed is the same, and no faster than Matlab,
>> which uses only a single processor out of the 24.
>>
>> I've tried varying IDL_CPU_TPOOL_NTHREADS and IDL_CPU_TPOOL_MIN_ELTS, but
>> have not been able to improve the results.
>>
>> Any suggestions?
>>
>> Thanks in advance,
>
> Do your array dimensions have only small prime factors (2, 3, or 5)?
>
> Ken Bowman
Re: Help, no improvement in FFT speed on a multiprocessor system [message #67901 is a reply to message #67898] Mon, 07 September 2009 06:40
Kenneth P. Bowman
In article <4aa34391$1@darkstar>, "Marco" <null@null.net> wrote:

> I'm running IDL 7.1 on Linux 2.6. This is an HP quad-processor system, with
> each processor having 6 cores, for 24 cores total.
>
> Doing large 2-D FFTs (>8Kx8K) I get no benefit from the extra processors.
> I can vary the number of processors used in IDL from 24 down to 1, and see
> that the number of processors actually showing a load is the correct number,
> but from 4 to 24 threads the speed is the same, and no faster than Matlab,
> which uses only a single processor out of the 24.
>
> I've tried varying IDL_CPU_TPOOL_NTHREADS and IDL_CPU_TPOOL_MIN_ELTS, but
> have not been able to improve the results.
>
> Any suggestions?
>
> Thanks in advance,

Do your array dimensions have only small prime factors (2, 3, or 5)?

Ken Bowman
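
One way to answer the question about prime factors for a given array dimension is to factor it and check the largest prime. The sketch below uses plain Python with hypothetical helper names; the cutoff of 5 for a "small" factor is taken from the question and is not a documented property of any particular FFT library.

```python
def prime_factors(n):
    """Return the prime factorization of n (n >= 2) as a sorted list."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors

def has_only_small_factors(n, largest=5):
    """True if no prime factor of n exceeds `largest`."""
    return all(p <= largest for p in prime_factors(n))

for size in (8192, 8000, 8191):
    print(size, prime_factors(size), has_only_small_factors(size))
```

Here 8192 (a power of 2) and 8000 (2^6 * 5^3) pass the check, while 8191, which is prime, does not.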