comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

h5_parse() in the profiler [message #93414] Fri, 08 July 2016 09:50
MarioIncandenza
Hi IDL Wizards,

I'm working on an application requiring chunking through a huge quantity of HDF5 files. For (EXTREME) ease of coding, my code does
IDL> H5DATA = H5_PARSE(HDF5_file,/READ_DATA)
and then operates on H5DATA. That is very easy to code, but the call to H5_PARSE() is very time-consuming. I ran the IDL Profiler (as elegantly described here: http://www.idlcoyote.com/code_tips/whyslow.html) and found that all the time was being spent in two routines:
Routine             Calls   Only (s)    Avg (s)    Total (s)   Avg (s)
CREATE_STRUCT (S)    1320  61.130619   0.046311   61.130619   0.046311
H5D_READ (S)           92  53.353344   0.579928   53.353344   0.579928
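
A profile like the one above could be collected roughly as follows. This is only a sketch: the file name is a hypothetical placeholder, and it assumes H5_PARSE has to be compiled before the profiler can track it. The /SYSTEM call is what brings system routines (the ones marked "(S)") into the report.

```idl
HDF5_file = 'example.h5'        ; hypothetical file name, substitute your own

; Make sure H5_PARSE is compiled so the profiler can see it
resolve_routine, 'h5_parse', /is_function, /no_recompile

profiler, /reset                ; discard any earlier profiling results
profiler                        ; profile all currently compiled user routines
profiler, /system               ; also profile system routines (H5D_READ, CREATE_STRUCT, ...)

H5DATA = H5_PARSE(HDF5_file, /READ_DATA)

profiler, /report               ; print the per-routine timing table
```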

The 'H5D_READ' I understand: that is the low-level I/O, and it is constrained by the system. But the 'CREATE_STRUCT' surprised me.

I guess CREATE_STRUCT() is where the memory allocation is occurring, but does it seem right that this takes more time than the actual disk I/O?

Any insights are welcome. I could rewrite the code to pull specific data out of the HDF5 file by hand, but that would be hundreds of lines of code, and I'd really rather not...
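
For reference, pulling one dataset out by hand with the low-level HDF5 routines looks roughly like the sketch below. The function name read_h5_dataset and the dataset path in the usage line are hypothetical; a real application would need one such call per dataset it cares about, plus error handling.

```idl
; Read a single dataset by its path, without parsing the whole file.
; read_h5_dataset is a hypothetical helper name used for illustration.
function read_h5_dataset, filename, dataset_path
  compile_opt idl2
  file_id = h5f_open(filename)                ; open the file read-only
  dset_id = h5d_open(file_id, dataset_path)   ; open the dataset by path
  data    = h5d_read(dset_id)                 ; the actual I/O happens here
  h5d_close, dset_id
  h5f_close, file_id
  return, data
end
```

Usage would be something like data = read_h5_dataset(HDF5_file, '/some_group/some_dataset'), where the path is a placeholder for a dataset that exists in your file.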

--Edward H.

Re: h5_parse() in the profiler [message #93416 is a reply to message #93414] Fri, 08 July 2016 10:46
Markus Schmassmann
create_struct is called far more often than h5d_read (1320 calls versus 92). Possibly, without having looked into h5d_read, the struct is being built like this:
temp=create_struct(tagname[0],tagvalue[0])
for i=1,n-1 do temp=create_struct(temp,tagname[i],tagvalue[i])
struct=temp
That is terribly inefficient; it is better to create a string and then use
execute(string)
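
A minimal sketch of the execute() idea, assuming tagname is a string array and tagvalue is an array of the same length (the three-element inputs here are made up for illustration). It builds one command string so that CREATE_STRUCT is called once instead of n times. Whether a very long argument list is accepted has not been checked here, and EXECUTE may not be usable in every deployment.

```idl
; Hypothetical inputs, for illustration only
tagname  = ['A', 'B', 'C']
tagvalue = [1.0, 2.0, 3.0]
n = n_elements(tagname)

; Build the argument list 'tagname[0],tagvalue[0],tagname[1],tagvalue[1],...'
idx  = strtrim(lindgen(n), 2)
args = strjoin('tagname[' + idx + '],tagvalue[' + idx + ']', ',')

; A single CREATE_STRUCT call, run through EXECUTE
ok = execute('struct = create_struct(' + args + ')')
if ~ok then message, 'EXECUTE failed'

help, struct
```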

--Markus Schmassmann, IDL wizard apprentice - at best ;-)

Re: h5_parse() in the profiler [message #93417 is a reply to message #93416] Fri, 08 July 2016 21:38
MarioIncandenza
On Friday, July 8, 2016 at 10:47:02 AM UTC-7, Markus Schmassmann wrote:
> for i=1,n-1 do temp=create_struct(temp,tagname[i],tagvalue[i])
> terribly inefficient, better to create a string and then use
> execute(string)

Hmmm... Yes! But EXECUTE() is a non-starter: this needs to be fully usable in compiled code. Still, I'm sure there is some clever way to do this with fewer calls to CREATE_STRUCT().
If I come up with something that actually is faster, I'll post to this thread.

Re: h5_parse() in the profiler [message #93418 is a reply to message #93417] Sat, 09 July 2016 07:05
Jim Pendleton

Have you tried returning an ordered hash instead?

Each structure array in IDL represents a chunk of contiguous memory. That is, each of the consecutive tags in the structure is consecutive in memory, with some indirection for items such as strings. The nested calls to CREATE_STRUCT will be much like an array append operation, a = [a, newstuff], which can become quite inefficient for large arrays due to the need to make a new copy of the data at each iteration.
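
The append cost described above can be seen with a small timing sketch. The element count is arbitrary and the timings will vary by machine; the point is only that the append loop re-copies the growing array on every iteration, while the preallocated loop does not. The a = [] syntax assumes IDL 8.0 or later.

```idl
n = 20000L                       ; arbitrary size chosen for illustration

; Grow the array by appending: each iteration copies everything so far
t0 = systime(/seconds)
a = []
for i = 0L, n-1 do a = [a, i]
t_append = systime(/seconds) - t0

; Preallocate once, then fill in place
t0 = systime(/seconds)
b = lonarr(n)
for i = 0L, n-1 do b[i] = i
t_prealloc = systime(/seconds) - t0

print, 'append (s):      ', t_append
print, 'preallocated (s):', t_prealloc
```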

By using the /ORDEREDHASH keyword to H5_PARSE (added in 2014), the storage of the individual values is not restricted to contiguous memory and the overhead of repeated copying is no longer present.
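
A minimal sketch of that suggestion, assuming the /ORDEREDHASH keyword is accepted by H5_PARSE in your IDL version. The file name is a hypothetical placeholder, and the key names printed depend on the contents of the file, so none are assumed here.

```idl
HDF5_file = 'example.h5'         ; hypothetical file name, substitute your own

; Ask for an ordered hash instead of a nested structure
h5data = h5_parse(HDF5_file, /READ_DATA, /ORDEREDHASH)

help, h5data                     ; confirm the type that came back

; List the top-level keys; nested entries can be walked the same way
foreach value, h5data, key do print, key
```

Code written against the structure result would need its tag accesses changed to hash lookups, so this is not a drop-in replacement.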

Jim P.