comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Re: Sorting and comparing 2 files [message #47765 is a reply to message #47762] Tue, 28 February 2006 22:47
Craig Markwardt
"sanam" <ajay.pillai@wachovia.com> writes:

> I have a scripting problem and hope you could help me.
>
> Perl, Python or shell scripting is fine.
>
> I have 2 files fileA is a list of assets owned by us and fileB is a
> list of assets held by the whole industry which includes our assets
> too.
>
> I tried fgrep but it is painfully slow
>
> /usr/xpg4/bin/fgrep -f sorted_car_mark_history.csv sorted_duedates.csv
>> /tmp/output.csv
>
> FileA is 2456320 bytes (153520 lines)
> FileB is 100028430 bytes (3334281 lines)
>
> FileA has 2 columns separated by comma where as fileB has 4 columns
> separated by commas.
>
> I need to match the 2 columns in fileA with the 2 columns in fileB (BTW
> the column position is 1 and 2) and for every matched record in FileB
> spool it out to an output file.

This really sounds like a database problem, not an IDL problem. You
could download a very nifty database program like sqlite3 (from
sqlite.org), load the two files into it as tables, and then perform a
JOIN operation. sqlite3 is completely public domain.
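
As a rough illustration of the JOIN idea, here is a minimal sketch in Python (which the original question said was acceptable), using the standard-library sqlite3 module rather than the sqlite3 command-line program. The function name join_assets, the table names a and b, and the column names k1, k2, c3, c4 are all made up for this example; it also assumes every row of fileA has at least 2 fields and every row of fileB at least 4.

```python
import csv
import sqlite3


def join_assets(path_a, path_b, path_out):
    """Write every row of path_b whose first two columns also occur in path_a.

    Hypothetical helper: assumes rows of path_a have at least 2 fields and
    rows of path_b at least 4. Returns the number of rows written.
    """
    con = sqlite3.connect(":memory:")  # pass a file name instead if memory is tight
    con.execute("CREATE TABLE a (k1 TEXT, k2 TEXT)")
    con.execute("CREATE TABLE b (k1 TEXT, k2 TEXT, c3 TEXT, c4 TEXT)")

    # Load both CSV files, skipping completely empty lines.
    with open(path_a, newline="") as fa:
        con.executemany("INSERT INTO a VALUES (?, ?)",
                        (row[:2] for row in csv.reader(fa) if row))
    with open(path_b, newline="") as fb:
        con.executemany("INSERT INTO b VALUES (?, ?, ?, ?)",
                        (row[:4] for row in csv.reader(fb) if row))

    # DISTINCT keeps duplicate keys in a from duplicating rows of b;
    # ORDER BY b.rowid keeps the output in the original order of path_b.
    query = """
        SELECT b.k1, b.k2, b.c3, b.c4
        FROM b
        JOIN (SELECT DISTINCT k1, k2 FROM a) AS owned
          ON owned.k1 = b.k1 AND owned.k2 = b.k2
        ORDER BY b.rowid
    """
    written = 0
    with open(path_out, "w", newline="") as out:
        writer = csv.writer(out, lineterminator="\n")
        for row in con.execute(query):
            writer.writerow(row)
            written += 1
    con.close()
    return written
```

With the file names from the question, a call might look like join_assets('sorted_car_mark_history.csv', 'sorted_duedates.csv', '/tmp/output.csv'), assuming the first file is fileA and the second is fileB.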

Since you are doing only a very simple match on the first 13
characters of each line in the two tables, it is also possible to make
the match in IDL, with a function like CMSET_OP() and set-intersection.
For example:

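;; Read data into IDL (each file becomes a string array, one line per element)
spawn, 'cat a.txt', A
spawn, 'cat b.txt', B
;; Extract matching columns (characters 0-12) from B;
;; this assumes each line of A is exactly that 13-character key
B2 = strmid(B, 0, 13)
;; Locate the matches (CMSET_OP's keyword is INDEX: return indices, not values)
II = cmset_op(B2, 'AND', A, /index)

;; Make the output: write the matching lines of B, one per line
openw, 50, 'output.txt'
printf, 50, B[II], format='(A)'
close, 50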

Of course this needs error checking, etc. And it *assumes* that the
two sets can be matched by their first 13 characters.
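
If the fixed-width assumption does not hold, the same set-membership idea can be applied to the comma-separated fields instead. The following is only a sketch in Python, not the CMSET_OP() approach itself: the function name match_by_key and the n_key_cols parameter are invented for illustration, and it assumes plain comma-separated lines with no quoted fields.

```python
def match_by_key(path_a, path_b, path_out, n_key_cols=2):
    """Copy each line of path_b whose leading fields form a key found in path_a.

    Hypothetical helper: assumes simple comma-separated lines without quoting.
    Returns the number of lines written.
    """
    def key(line):
        # The key is the first n_key_cols comma-separated fields of the line.
        return tuple(line.rstrip("\r\n").split(",")[:n_key_cols])

    # Hold only the smaller file (fileA) in memory, as a set of keys.
    with open(path_a) as fa:
        keys = {key(line) for line in fa if line.strip()}

    # Stream the larger file (fileB) and keep the lines whose key is in the set.
    matched = 0
    with open(path_b) as fb, open(path_out, "w") as out:
        for line in fb:
            if key(line) in keys:
                out.write(line)
                matched += 1
    return matched
```

Unlike a set intersection on the keys alone, this keeps every matching line of fileB, including several lines that share the same key, and it never needs the whole of fileB in memory at once.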

CMSET_OP() can be found on my web page.
http://cow.physics.wisc.edu/~craigm/idl/idl.html

Good luck!
Craig

--
--------------------------------------------------------------------------
Craig B. Markwardt, Ph.D. EMAIL: craigmnet@REMOVEcow.physics.wisc.edu
Astrophysics, IDL, Finance, Derivatives | Remove "net" for better response
--------------------------------------------------------------------------