comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Multi-core techniques [message #70427] Thu, 15 April 2010 18:15
Tim B
I am working with various satellite datasets (e.g. Pathfinder SST) where most
of the prior work has produced 50 km resolution analyses. Even an intensive
piece of code run over the full 30-odd year data set could still be finished
overnight. The algorithms I've seen don't make heavy use of array processing
(no FFTs or the like), and there is a lot of data independence, i.e. values
over the course of a few days are used rather than values over a complete
year.

With a move to higher resolution, i.e. 4 km, there is significantly more data
(approximately 150 4 km pixels per 50 km pixel). Given the data independence,
my usual approach would be to create a thread pool sized to the number of
available cores and calculate each data 'piece' independently. However, IDL
doesn't expose threads to the programmer, so I'd value any thoughts on how to
take advantage of multi-core CPUs (heck, even my laptop is a dual-core
machine). My thoughts are:

- use C(?) to manage forking separate processes that start IDL and pass
parameters to the appropriate procedure to run in IDL

- run a number of IDL programs in parallel, the same code but processing
different temporal regions of the dataset. I can start IDL in different
windows on an 8-core machine and each seems to be a separate process (see
the sketch after this list).

- use a different language/architecture completely :-)
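Roughly what I have in mind for the second option is something like this; the
procedure name, keywords and the year range are just made up for illustration:

; Each of, say, 8 IDL sessions would be started in its own window with
;   idl -e "process_years, chunk=0, nchunks=8"
;   idl -e "process_years, chunk=1, nchunks=8"
;   ... and so on up to chunk=7
pro process_years, chunk=chunk, nchunks=nchunks
  years = 1981 + indgen(30)            ; the full 30-odd year record
  mine = where((indgen(n_elements(years)) mod nchunks) eq chunk, nmine)
  for i = 0, nmine - 1 do begin
    print, 'processing year ', years[mine[i]]
    ; ... run the existing per-year analysis here ...
  endfor
end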

I'd be interested to hear from anyone else trying to take advantage of
multi-core CPUs.

Tim Burgess
Re: Multi-core techniques [message #70505 is a reply to message #70427] Fri, 16 April 2010 10:03
natha
Hi Allard,

Thank you for your explanation. It helped me understand what you are doing in
your code.
I started doing some tests and now I have a problem.

On my first execution I forgot to destroy the IDL_IDLBridges, and now if I
try to create another bridge I get a "bus error" and can't continue.

Your code is not bad, and it would definitely be better if cgi_process_client
could know which procedure/function to start. Creating .SAV files to pass the
parameters is a little bit tricky, but I can't imagine another way to pass
objects, structures, etc. to the bridge.
Perhaps another feature to add would be the possibility to collect output
data. It should be possible to use GetVar from the IDL_IDLBridge; I don't
know exactly how I would add this, but it would be cool!
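What I had in mind is roughly this (only a sketch, and it assumes the client
procedure leaves a variable called "result" at the main level of the bridge):

status = bridges[i]->Status()
if status eq 0 || status eq 2 then begin   ; bridge is idle or has completed
  result = bridges[i]->GetVar('result')
  ; ... collect "result" from this bridge here ...
endif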

Thanks again Allard. Have fun !

nata
Re: Multi-core techniques [message #70514 is a reply to message #70427] Fri, 16 April 2010 08:28
wita
Dear Bernat,

There is actually an example included in the header of
cgi_process_manager.pro, but I will try to explain this in more
detail.

To start with, you need to split your calculations into a set of independent
tasks. Usually this means splitting by parameter range, by date, by region,
by tile or whatever. These tasks need to be coded as an array of structures.
So if you want to split your tasks over various input files you could do
something like this:
tasks = Replicate({inputfile:""}, <number_of_input_files>)
tasks[0].inputfile="datafile1.dat"
tasks[1].inputfile="datafile2.dat"
etc.

Then you start "cgi_process_manager" and provide "tasks" as a parameter. The
process manager will determine your machine setup and start as many bridges
as it finds CPUs. Next it will start the task manager, which keeps track of
which tasks have been processed and which have not. The idea behind it is
that you can separate the logic of handling tasks from the execution of
tasks. For example, if you have an IDL DataMiner license you could easily
extend the task manager to read the tasks from a database table. I actually
have a Python implementation of such a task manager that I use a lot.
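So the call itself is basically just this (see the header of
cgi_process_manager.pro for the exact keywords):

cgi_process_manager, tasks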

cgi_process_manager will now try to execute the tasks on the IDL_IDLBridges
that it has initialized. The first problem here is that you can only send
simple variables (scalars, arrays) directly over an IDL_IDLBridge, no
structures or objects. The workaround I chose is to first save the data in
the task structure to a temporary .SAV file. The name of this .SAV file is
sent over to the IDL_IDLBridge and the procedure "cgi_process_client" is
executed on the bridge. The cgi_process_client procedure restores the
contents of the .SAV file, and the parameters in your task will be available
on the bridge as a variable "task".

The cgi_process_manager will continuously monitor the availability of the
bridges and start new tasks when a bridge becomes available.
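Conceptually the monitoring loop does something like this (just the idea,
not the actual implementation):

next = 0
while next lt n_elements(tasks) do begin
  for i = 0, n_elements(bridges) - 1 do begin
    status = bridges[i]->Status()
    if (status eq 0 || status eq 2) && (next lt n_elements(tasks)) then begin
      ; save tasks[next] to a .SAV file and launch it on bridges[i] ...
      next += 1
    endif
  endfor
  wait, 1.0        ; poll once a second instead of busy-waiting
endwhile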

Finally, there is one pitfall here: how does the cgi_process_client procedure
know the name of the procedure/function it has to start? In the current
implementation it doesn't, and you need to provide it yourself by hardcoding
it into the cgi_process_client procedure. So in this example, line 70 of
cgi_process_client.pro needs to be changed into something like:
my_cpu_intensive_process, inputfile=task.inputfile

I may change this in the future, by coding the execution string into the
task structure itself and using something like:
ok = Execute(task.execstring)

A few further remarks:
- The process manager does not collect output from the client processes, so
your client process must contain the logic to store its own results (see
the sketch after these remarks).
- The approach with the .SAV file seems a bit tricky, but I found it to be
quite robust. I also looked at the possibility of using shared memory
between bridges, but that is tricky as well. Moreover, in order to map the
shared memory on the bridge you first need to know the layout of the
structure, which you cannot send over the bridge either. So you end up
with a chicken-and-egg problem.
- Split your tasks into a couple of large chunks instead of many small
tasks. Executing each task involves a small overhead, which can become
significant if you have to execute many tasks.
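For the first remark, the easiest approach is to let each client write its
own output file, for example (purely illustrative, with "result" standing in
for whatever the routine computes):

; e.g. at the end of my_cpu_intensive_process, write the result next to
; the input file it was derived from
save, result, filename = inputfile + '.result.sav'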

Hope this helps.

Allard