comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Re: Some questions of efficiency when matching items in lists [message #83011] Fri, 01 February 2013 04:27
Mats Löfdahl
Messages: 263
Registered: January 2012
Senior Member
On Friday, 1 February 2013 at 08:55:18 UTC+1, Mats Löfdahl wrote:
> On Friday, 1 February 2013 at 02:15:19 UTC+1, Bogdanovist wrote:
>> On Friday, 1 February 2013 11:59:16 UTC+11, Craig Markwardt wrote:
>>> On Thursday, January 31, 2013 7:05:40 PM UTC-5, Bogdanovist wrote:
>>>> I have a couple of questions about how to efficiently match items in lists. There are two operations that are done many thousands of times in my processing and are causing a bottleneck. Even small improvements would be welcome.
>>>
>>> For your first question, the obvious thing is that you are reading and writing a file for every operation. File I/O is slow. If you can, keep your values in memory and only write them out when necessary. At the very least, only rewrite your file when you have to; otherwise just /APPEND to the end.
>>>
>>> For your second question: if DATA_ADD has a small number of elements, then you are probably doing the best that you can do. You might check out MATCH or MATCH2 in the IDL Astronomy Library, which have been optimized for cross-matching a lot of elements. Another possibility is to create a hash table indexed by time; this has the benefit of rapid access, but you lose the ability to perform vector operations upon the whole table.
>>>
>>> Craig
>>
>> Thanks for the info; unfortunately I can't store the values in memory, as each write to file occurs during a separate 'run' of the processing software, which is fired off regularly from cron (Linux machine).
>
> Maybe you could append both new data and corrections, and then deal with the corrections properly later, in the collating phase. That should help with the bottleneck in the near-real-time part.

Or, instead of writing a single file, write each item to a file named after its time stamp. Then corrections happen automatically as older files are overwritten, and the collating could proceed much as you do it currently, except from these individual files.
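A minimal sketch of this one-file-per-timestamp idea, in Python for illustration (in IDL, OPENW/WRITEU and FILE_SEARCH with a sorted result play the same roles). The directory layout and filename format here are assumptions, not anything from the original post:

```python
import glob
import os


def write_item(data_dir, timestamp, line):
    # One file per timestamp: rewriting an item simply overwrites the
    # old file, so corrections need no separate bookkeeping.
    path = os.path.join(data_dir, "%s.dat" % timestamp)
    with open(path, "w") as f:
        f.write(line + "\n")


def collate(data_dir):
    # Because the names are timestamps, a sorted listing returns the
    # items in time order, with each correction already in place.
    out = []
    for path in sorted(glob.glob(os.path.join(data_dir, "*.dat"))):
        with open(path) as f:
            out.append(f.read().strip())
    return out
```

Each cron run then only touches its own small files, and the expensive read-modify-write of one big file disappears from the near-real-time path.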
Re: Some questions of efficiency when matching items in lists [message #83012 is a reply to message #83011] Thu, 31 January 2013 23:55
Mats Löfdahl
Messages: 263
Registered: January 2012
Senior Member
On Friday, 1 February 2013 at 02:15:19 UTC+1, Bogdanovist wrote:
> On Friday, 1 February 2013 11:59:16 UTC+11, Craig Markwardt wrote:
>> On Thursday, January 31, 2013 7:05:40 PM UTC-5, Bogdanovist wrote:
>>> I have a couple of questions about how to efficiently match items in lists. There are two operations that are done many thousands of times in my processing and are causing a bottleneck. Even small improvements would be welcome.
>>
>> For your first question, the obvious thing is that you are reading and writing a file for every operation. File I/O is slow. If you can, keep your values in memory and only write them out when necessary. At the very least, only rewrite your file when you have to; otherwise just /APPEND to the end.
>>
>> For your second question: if DATA_ADD has a small number of elements, then you are probably doing the best that you can do. You might check out MATCH or MATCH2 in the IDL Astronomy Library, which have been optimized for cross-matching a lot of elements. Another possibility is to create a hash table indexed by time; this has the benefit of rapid access, but you lose the ability to perform vector operations upon the whole table.
>>
>> Craig
>
> Thanks for the info; unfortunately I can't store the values in memory, as each write to file occurs during a separate 'run' of the processing software, which is fired off regularly from cron (Linux machine).

Maybe you could append both new data and corrections, and then deal with the corrections properly later, in the collating phase. That should help with the bottleneck in the near-real-time part.
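The append-now, fix-later idea can be sketched like this (Python for illustration; the record shape and function names are hypothetical, but the principle is just "last record per timestamp wins"):

```python
def append_record(log, timestamp, value):
    # The near-real-time cron runs only ever append: a correction is
    # simply a later record carrying the same timestamp.
    log.append((timestamp, value))


def collate_log(log):
    # Corrections are resolved in the offline collating phase: a dict
    # keyed by timestamp keeps only the last record for each one.
    latest = {}
    for timestamp, value in log:
        latest[timestamp] = value
    return [latest[t] for t in sorted(latest)]
```

The O(file size) rewrite is thereby moved out of the time-critical path and into the collation, which runs far less often.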
Re: Some questions of efficiency when matching items in lists [message #83013 is a reply to message #83012] Thu, 31 January 2013 17:15
Matt Francis
Messages: 94
Registered: May 2010
Member
On Friday, 1 February 2013 11:59:16 UTC+11, Craig Markwardt wrote:
> On Thursday, January 31, 2013 7:05:40 PM UTC-5, Bogdanovist wrote:
>> I have a couple of questions about how to efficiently match items in lists. There are two operations that are done many thousands of times in my processing and are causing a bottleneck. Even small improvements would be welcome.
>
> For your first question, the obvious thing is that you are reading and writing a file for every operation. File I/O is slow. If you can, keep your values in memory and only write them out when necessary. At the very least, only rewrite your file when you have to; otherwise just /APPEND to the end.
>
> For your second question: if DATA_ADD has a small number of elements, then you are probably doing the best that you can do. You might check out MATCH or MATCH2 in the IDL Astronomy Library, which have been optimized for cross-matching a lot of elements. Another possibility is to create a hash table indexed by time; this has the benefit of rapid access, but you lose the ability to perform vector operations upon the whole table.
>
> Craig

Thanks for the info; unfortunately I can't store the values in memory, as each write to file occurs during a separate 'run' of the processing software, which is fired off regularly from cron (Linux machine).

For the second part I'll check out MATCH and MATCH2; it sounds like I'll have to profile carefully to make sure they're actually quicker. Indeed, DATA_ADD typically has fewer than 100 elements, so maybe they will be no quicker. Thanks again.
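For a sense of what a hash-based cross-match does, here is a rough Python sketch that returns matching subscripts of both lists, in the spirit of MATCH's SUBA/SUBB outputs (the function name and return convention here are illustrative, not MATCH's actual interface):

```python
def crossmatch(times_a, times_b):
    # Hash-based cross-match, O(n + m): index one list by value, then
    # probe the index with each element of the other list.
    index = {t: i for i, t in enumerate(times_a)}
    sub_a, sub_b = [], []
    for j, t in enumerate(times_b):
        if t in index:
            sub_a.append(index[t])
            sub_b.append(j)
    return sub_a, sub_b
```

For lists under ~100 elements, per-element overheads can easily swamp the asymptotic win over a simple WHERE-in-a-loop approach, which is exactly why profiling on realistic sizes is the right call.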
Re: Some questions of efficiency when matching items in lists [message #83014 is a reply to message #83013] Thu, 31 January 2013 16:59
Craig Markwardt
Messages: 1869
Registered: November 1996
Senior Member
On Thursday, January 31, 2013 7:05:40 PM UTC-5, Bogdanovist wrote:
> I have a couple of questions about how to efficiently match items in lists. There are two operations that are done many thousands of times in my processing and are causing a bottleneck. Even small improvements would be welcome.


For your first question, the obvious thing is that you are reading and writing a file for every operation. File I/O is slow. If you can, keep your values in memory and only write them out when necessary. At the very least, only rewrite your file when you have to; otherwise just /APPEND to the end.
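The cost difference is easy to see in a sketch (Python for illustration; the IDL analogue is OPENW with the /APPEND keyword): appending writes a constant amount per run, while rewriting the file costs time proportional to its current size.

```python
def log_value(path, value):
    # Open in append mode: the existing contents are never re-read or
    # re-written, so each cron run pays O(1) I/O instead of O(file size).
    with open(path, "a") as f:
        f.write("%s\n" % value)
```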

For your second question: if DATA_ADD has a small number of elements, then you are probably doing the best that you can do. You might check out MATCH or MATCH2 in the IDL Astronomy Library, which have been optimized for cross-matching a lot of elements. Another possibility is to create a hash table indexed by time; this has the benefit of rapid access, but you lose the ability to perform vector operations upon the whole table.
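The trade-off reads like this in a minimal sketch (Python dict standing in for IDL's built-in HASH(), available since IDL 8.0; the record shape is made up for illustration):

```python
# A table keyed by time gives O(1) average lookup and makes a
# correction a plain overwrite -- but, unlike an array, it cannot be
# fed to whole-array vector operations.
table = {}


def add_record(time, record):
    table[time] = record        # insert or correct in one step


def lookup_record(time):
    return table.get(time)      # no scan of the whole list needed
```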

Craig