comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Re: Sorting and comparing 2 files [message #47762] Wed, 01 March 2006 07:15
enod
The files may be too large for IDL to hold in memory, especially
under Windows, so it is a good idea to read and compare a few tens of
lines at a time. Doing this in IDL is simple but somewhat
time-consuming.
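A minimal sketch of the "a few tens of lines at a time" idea, written in Python (one of the languages the original poster said was fine) rather than IDL. The function name `read_in_batches` and the default batch size of 50 are illustrative assumptions, not part of the original post. Only the current batch of lines is held in memory.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def read_in_batches(lines: Iterable[str], batch_size: int = 50) -> Iterator[List[str]]:
    """Yield successive lists of at most batch_size lines, line endings stripped.

    `lines` may be an open file object; it is consumed lazily, so only the
    current batch is ever held in memory.
    """
    it = iter(lines)  # a file is its own iterator; a list gets a fresh one
    while True:
        batch = [line.rstrip("\r\n") for line in islice(it, batch_size)]
        if not batch:
            return  # input exhausted
        yield batch
```

With a real file this would be used as `with open("fileA.csv") as f:` followed by `for batch in read_in_batches(f): ...`.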

For each line in FileA, search for the matching line in FileB by
comparing the first two columns as strings. You need to split each
line on the commas into a string array first.
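Splitting a line and pulling out the comparison key could look like the following Python sketch. It assumes plain comma-separated fields with no quoting and no commas inside a field (so `str.split` is enough and the `csv` module is not needed); the helper name `key_of` is hypothetical.

```python
from typing import Tuple


def key_of(line: str, ncols: int = 2) -> Tuple[str, ...]:
    """Return the first `ncols` comma-separated fields of a line as a tuple.

    Assumes simple CSV: no quoting and no commas inside fields.
    """
    fields = line.rstrip("\r\n").split(",")
    return tuple(fields[:ncols])
```

Comparing the resulting tuples compares both columns at once, so a FileA line `a` and a FileB line `b` match when `key_of(a) == key_of(b)`.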

However, because both files are sorted, the search for the match to the
next line of FileA can start right after the position of the previous
match in FileB, so FileB only has to be read once, from start to end.
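The same idea can be sketched in Python as a single-pass merge over the two sorted inputs. This is an illustration under stated assumptions, not Tian's code: both files are assumed to be sorted on the first two columns in the order Python uses to compare strings (for ASCII keys, for example, `LC_ALL=C sort -t, -k1,1 -k2,2`), every line is assumed to have at least two comma-separated fields, and the names `key_of` and `merge_match` are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def key_of(line: str) -> Tuple[str, str]:
    """First two comma-separated fields; assumes each line has at least two."""
    fields = line.rstrip("\r\n").split(",")
    return fields[0], fields[1]


def merge_match(lines_a: Iterable[str], lines_b: Iterable[str]) -> Iterator[str]:
    """Yield each FileB line whose first two columns equal those of some FileA line.

    Assumes both inputs are sorted by key_of in the same order. Each input
    is read once, front to back; FileA never moves backwards.
    """
    it_a = iter(lines_a)
    a = next(it_a, None)
    for b in lines_b:
        kb = key_of(b)
        # Skip FileA keys smaller than this FileB key; later FileB lines
        # have keys at least as large, so those FileA lines can never match.
        while a is not None and key_of(a) < kb:
            a = next(it_a, None)
        if a is None:
            return  # FileA exhausted: nothing left in FileB can match
        if key_of(a) == kb:
            yield b  # line is passed through unchanged, newline included
```

Because FileA is only advanced past keys smaller than the current FileB key, several FileB records sharing one key are all written out. With open files, `out.writelines(merge_match(fa, fb))` writes the matching FileB records while keeping memory use constant.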

Regards,
Tian

Craig Markwardt wrote:

> "sanam" <ajay.pillai@wachovia.com> writes:
>
>> I have a scripting problem and hope you could help me.
>>
>> Perl, Python or shell scripting is fine.
>>
>> I have 2 files: fileA is a list of assets owned by us, and fileB is
>> a list of assets held by the whole industry, which includes our
>> assets too.
>>
>> I tried fgrep, but it is painfully slow:
>>
>> /usr/xpg4/bin/fgrep -f sorted_car_mark_history.csv sorted_duedates.csv > /tmp/output.csv
>>
>> FileA is 2456320 bytes (153520 lines)
>> FileB is 100028430 bytes (3334281 lines)
>>
>> FileA has 2 columns separated by commas, whereas fileB has 4 columns
>> separated by commas.
>>
>> I need to match the 2 columns in fileA with the first 2 columns in
>> fileB (column positions 1 and 2) and, for every matched record in
>> FileB, write it out to an output file.
>
> This really sounds like a database problem, not an IDL problem. It's
> possible to download a very nifty database program like sqlite3 (from
> sqlite.org), load these two files into it as tables, and then perform
> a JOIN operation. sqlite3 is completely public domain.
>
> Since you are doing only a very simple match between the first 13
> character columns of the two tables, it is possible to make the match
> in IDL, with a function like CMSET_OP() and set-intersection.
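> For example:
>
> ;; Read data into IDL
> spawn, 'cat a.txt', A
> spawn, 'cat b.txt', B
> ;; Extract matching columns (characters 0-12)
> B2 = strmid(B, 0, 13)
> ;; Locate the matches; /INDEX returns subscripts into the first set (B2)
> II = cmset_op(B2, 'AND', A, /index, count=nmatch)
>
> ;; Make the output (II is -1 when nothing matched, so check COUNT)
> if nmatch GT 0 then begin
>   openw, lun, 'output.txt', /get_lun
>   printf, lun, B[II], format='(A)'
>   free_lun, lun
> endif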
>
> Of course this needs error checking, etc. And it *assumes* that the
> two sets can be matched by their first 13 columns.
>
> CMSET_OP() can be found on my web page.
> http://cow.physics.wisc.edu/~craigm/idl/idl.html
>
> Good luck!
> Craig
>
> --
> --------------------------------------------------------------------------
> Craig B. Markwardt, Ph.D. EMAIL: craigmnet@REMOVEcow.physics.wisc.edu
> Astrophysics, IDL, Finance, Derivatives | Remove "net" for better response
> --------------------------------------------------------------------------
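Craig's sqlite3 suggestion can also be tried from Python's standard-library `sqlite3` module. The sketch below rests on a few assumptions: it keys on the first two comma-separated columns (as the original poster described) rather than on a fixed character prefix, it uses an in-memory database (passing a file path to `sqlite3.connect` instead would suit data that does not fit in RAM), and the table, column, and function names are made up for the example. The primary key on the FileA table drops duplicate FileA keys, so no FileB record is written twice.

```python
import sqlite3
from typing import Iterable, List, Tuple


def _row(line: str) -> Tuple[str, str, str]:
    """Split a line into (col1, col2, whole line); assumes at least 2 fields."""
    text = line.rstrip("\r\n")
    k1, k2 = text.split(",")[:2]
    return k1, k2, text


def sqlite_join(lines_a: Iterable[str], lines_b: Iterable[str]) -> List[str]:
    """Return the FileB lines (without newlines) whose first two columns appear in FileA."""
    con = sqlite3.connect(":memory:")
    try:
        # The primary key removes duplicate FileA keys and indexes them.
        con.execute("CREATE TABLE a (k1 TEXT, k2 TEXT, PRIMARY KEY (k1, k2))")
        con.execute("CREATE TABLE b (k1 TEXT, k2 TEXT, line TEXT)")
        con.executemany("INSERT OR IGNORE INTO a VALUES (?, ?)",
                        (_row(l)[:2] for l in lines_a))
        con.executemany("INSERT INTO b VALUES (?, ?, ?)",
                        (_row(l) for l in lines_b))
        rows = con.execute(
            "SELECT b.line FROM b "
            "JOIN a ON a.k1 = b.k1 AND a.k2 = b.k2 "
            "ORDER BY b.rowid"  # keep FileB's original order
        )
        return [line for (line,) in rows]
    finally:
        con.close()
```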
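For comparison, the set-intersection idea behind Craig's CMSET_OP() example can be sketched in Python with a built-in `set`. Two assumptions differ from his IDL code: the key is the first two comma-separated columns rather than the first 13 characters, and only the FileA keys are held in memory while FileB is streamed line by line. With FileA at about 2.5 MB (153,520 lines), that set should fit comfortably, and FileB never has to be loaded whole. Unlike the sorted search Tian describes, this needs neither file to be sorted. The names `key_of` and `set_match` are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple


def key_of(line: str) -> Tuple[str, str]:
    """First two comma-separated fields; assumes each line has at least two."""
    fields = line.rstrip("\r\n").split(",")
    return fields[0], fields[1]


def set_match(lines_a: Iterable[str], lines_b: Iterable[str]) -> Iterator[str]:
    """Yield FileB lines whose key occurs in FileA, in FileB's order."""
    # Hash set of FileA keys: membership tests are O(1) on average.
    keys_a: Set[Tuple[str, str]] = {key_of(a) for a in lines_a if a.strip()}
    for b in lines_b:
        if b.strip() and key_of(b) in keys_a:
            yield b  # line is passed through unchanged, newline included
```

With open files, `out.writelines(set_match(fa, fb))` writes the matching FileB records.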