comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Re: Some questions of efficiency when matching items in lists [message #83011] Fri, 01 February 2013 04:27
Mats Löfdahl
Messages: 263
Registered: January 2012
Senior Member
On Friday, 1 February 2013 at 08:55:18 UTC+1, Mats Löfdahl wrote:
> On Friday, 1 February 2013 at 02:15:19 UTC+1, Bogdanovist wrote:
>> On Friday, 1 February 2013 11:59:16 UTC+11, Craig Markwardt wrote:
>>> On Thursday, January 31, 2013 7:05:40 PM UTC-5, Bogdanovist wrote:
>>>> I have a couple of questions about how to efficiently match items in lists. There are two operations that are done many thousands of times in my processing and are causing a bottleneck. Even small improvements would be welcome.
>>>
>>> For your first question, the obvious thing is that you are reading and writing a file for every operation. File I/O is slow. If you can, keep your values in memory and only write them out when necessary. At the very least, only rewrite your file when you have to; otherwise just /APPEND to the end.
>>>
>>> For your second question: if DATA_ADD has a small number of elements, then you are probably doing the best that you can do. You might check out MATCH or MATCH2 in the IDL Astronomy Library, which have been optimized for cross-matching a lot of elements. Another possibility is to create a hash table indexed by time; this has the benefit of rapid access, but you lose the ability to perform vector operations upon the whole table.
>>>
>>> Craig
>>
>> Thanks for the info; unfortunately I can't store the values in memory, as each write to file occurs during a separate 'run' of the processing software, which is fired off regularly from cron (Linux machine).
>
> Maybe you could append both new data and corrections, and then deal with the corrections properly later, in the collating phase. That should help with the bottleneck in the near-real-time part.

Or, instead of writing a single file, write each item to a file named after its time stamp. Then corrections happen automatically as older files are overwritten, and the collating could proceed much as you do it currently, except from these individual files.
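A minimal sketch of this one-file-per-timestamp idea, in Python for illustration (in IDL, OPENW/WRITEU and FILE_SEARCH with a sorted result play the same roles). The directory layout and filename format here are assumptions, not anything from the original post:

```python
import glob
import os


def write_item(data_dir, timestamp, line):
    # One file per timestamp: rewriting an item simply overwrites the
    # old file, so corrections need no separate bookkeeping.
    path = os.path.join(data_dir, "%s.dat" % timestamp)
    with open(path, "w") as f:
        f.write(line + "\n")


def collate(data_dir):
    # Because the names are timestamps, a sorted listing returns the
    # items in time order, with each correction already in place.
    out = []
    for path in sorted(glob.glob(os.path.join(data_dir, "*.dat"))):
        with open(path) as f:
            out.append(f.read().strip())
    return out
```

Each cron run then only touches its own small files, and the expensive read-modify-write of one big file disappears from the near-real-time path.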
Re: Some questions of efficiency when matching items in lists [message #83012 is a reply to message #83011] Thu, 31 January 2013 23:55
Mats Löfdahl
Messages: 263
Registered: January 2012
Senior Member
On Friday, 1 February 2013 at 02:15:19 UTC+1, Bogdanovist wrote:
> On Friday, 1 February 2013 11:59:16 UTC+11, Craig Markwardt wrote:
>> On Thursday, January 31, 2013 7:05:40 PM UTC-5, Bogdanovist wrote:
>>> I have a couple of questions about how to efficiently match items in lists. There are two operations that are done many thousands of times in my processing and are causing a bottleneck. Even small improvements would be welcome.
>>
>> For your first question, the obvious thing is that you are reading and writing a file for every operation. File I/O is slow. If you can, keep your values in memory and only write them out when necessary. At the very least, only rewrite your file when you have to; otherwise just /APPEND to the end.
>>
>> For your second question: if DATA_ADD has a small number of elements, then you are probably doing the best that you can do. You might check out MATCH or MATCH2 in the IDL Astronomy Library, which have been optimized for cross-matching a lot of elements. Another possibility is to create a hash table indexed by time; this has the benefit of rapid access, but you lose the ability to perform vector operations upon the whole table.
>>
>> Craig
>
> Thanks for the info; unfortunately I can't store the values in memory, as each write to file occurs during a separate 'run' of the processing software, which is fired off regularly from cron (Linux machine).

Maybe you could append both new data and corrections, and then deal with the corrections properly later, in the collating phase. That should help with the bottleneck in the near-real-time part.
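The append-now, fix-later idea can be sketched like this (Python for illustration; the record shape and function names are hypothetical, but the principle is just "last record per timestamp wins"):

```python
def append_record(log, timestamp, value):
    # The near-real-time cron runs only ever append: a correction is
    # simply a later record carrying the same timestamp.
    log.append((timestamp, value))


def collate_log(log):
    # Corrections are resolved in the offline collating phase: a dict
    # keyed by timestamp keeps only the last record for each one.
    latest = {}
    for timestamp, value in log:
        latest[timestamp] = value
    return [latest[t] for t in sorted(latest)]
```

The O(file size) rewrite is thereby moved out of the time-critical path and into the collation, which runs far less often.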
Re: Some questions of efficiency when matching items in lists [message #83013 is a reply to message #83012] Thu, 31 January 2013 17:15
Matt Francis
Messages: 94
Registered: May 2010
Member
On Friday, 1 February 2013 11:59:16 UTC+11, Craig Markwardt wrote:
> On Thursday, January 31, 2013 7:05:40 PM UTC-5, Bogdanovist wrote:
>> I have a couple of questions about how to efficiently match items in lists. There are two operations that are done many thousands of times in my processing and are causing a bottleneck. Even small improvements would be welcome.
>
> For your first question, the obvious thing is that you are reading and writing a file for every operation. File I/O is slow. If you can, keep your values in memory and only write them out when necessary. At the very least, only rewrite your file when you have to; otherwise just /APPEND to the end.
>
> For your second question: if DATA_ADD has a small number of elements, then you are probably doing the best that you can do. You might check out MATCH or MATCH2 in the IDL Astronomy Library, which have been optimized for cross-matching a lot of elements. Another possibility is to create a hash table indexed by time; this has the benefit of rapid access, but you lose the ability to perform vector operations upon the whole table.
>
> Craig

Thanks for the info; unfortunately I can't store the values in memory, as each write to file occurs during a separate 'run' of the processing software, which is fired off regularly from cron (Linux machine).

For the second part I'll check out MATCH and MATCH2; it sounds like I'll have to profile carefully to make sure they're actually quicker. Indeed, DATA_ADD typically has fewer than 100 elements, so maybe they will be no quicker. Thanks again.
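For a sense of what a hash-based cross-match does, here is a rough Python sketch that returns matching subscripts of both lists, in the spirit of MATCH's SUBA/SUBB outputs (the function name and return convention here are illustrative, not MATCH's actual interface):

```python
def crossmatch(times_a, times_b):
    # Hash-based cross-match, O(n + m): index one list by value, then
    # probe the index with each element of the other list.
    index = {t: i for i, t in enumerate(times_a)}
    sub_a, sub_b = [], []
    for j, t in enumerate(times_b):
        if t in index:
            sub_a.append(index[t])
            sub_b.append(j)
    return sub_a, sub_b
```

For lists under ~100 elements, per-element overheads can easily swamp the asymptotic win over a simple WHERE-in-a-loop approach, which is exactly why profiling on realistic sizes is the right call.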
Re: Some questions of efficiency when matching items in lists [message #83014 is a reply to message #83013] Thu, 31 January 2013 16:59
Craig Markwardt
Messages: 1869
Registered: November 1996
Senior Member
On Thursday, January 31, 2013 7:05:40 PM UTC-5, Bogdanovist wrote:
> I have a couple of questions about how to efficiently match items in lists. There are two operations that are done many thousands of times in my processing and are causing a bottleneck. Even small improvements would be welcome.


For your first question, the obvious thing is that you are reading and writing a file for every operation. File I/O is slow. If you can, keep your values in memory and only write them out when necessary. At the very least, only rewrite your file when you have to; otherwise just /APPEND to the end.
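The cost difference is easy to see in a sketch (Python for illustration; the IDL analogue is OPENW with the /APPEND keyword): appending writes a constant amount per run, while rewriting the file costs time proportional to its current size.

```python
def log_value(path, value):
    # Open in append mode: the existing contents are never re-read or
    # re-written, so each cron run pays O(1) I/O instead of O(file size).
    with open(path, "a") as f:
        f.write("%s\n" % value)
```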

For your second question: if DATA_ADD has a small number of elements, then you are probably doing the best that you can do. You might check out MATCH or MATCH2 in the IDL Astronomy Library, which have been optimized for cross-matching a lot of elements. Another possibility is to create a hash table indexed by time; this has the benefit of rapid access, but you lose the ability to perform vector operations upon the whole table.
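The trade-off reads like this in a minimal sketch (Python dict standing in for IDL's built-in HASH(), available since IDL 8.0; the record shape is made up for illustration):

```python
# A table keyed by time gives O(1) average lookup and makes a
# correction a plain overwrite -- but, unlike an array, it cannot be
# fed to whole-array vector operations.
table = {}


def add_record(time, record):
    table[time] = record        # insert or correct in one step


def lookup_record(time):
    return table.get(time)      # no scan of the whole list needed
```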

Craig