comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Re: Locating sequence of bytes within binary file [message #71319] Wed, 16 June 2010 06:46
Craig Markwardt
On Jun 15, 7:30 am, medd <med...@googlemail.com> wrote:
> Hi,
>
> I need to locate a given sequence of bytes within a binary file. I do
> not manage to do it efficiently, and I wanted to ask if somebody here
> has a clue.
>
> I saw that there are no functions in IDL to look for a given sequence
> within a byte array, but there are very powerful functions to look for
> a sequence within a string using regular expressions. This is what I
> tried:
>
> ; Variable to read the file into
> fcontent = BYTARR((FILE_INFO(fn)).size, /NOZERO)
> OPENU, unit, fn, /GET_LUN;, /SWAP_ENDIAN
> READU, unit, fcontent
> IF (STREGEX(STRING(fcontent), STRING(sequence_searched)) LT 0) THEN $
>   print, 'sequence not found'
>
> This works!! ... But only as long as the file does not contain a byte
> with the value 0 (which, too bad!, it does...)
>
> After looking a while, I found in this forum (message "Null terminated
> strings") and in the IDL help that a string is truncated as soon as
> this value is found. This explains why this method fails. But it does
> not propose solutions... :(
>
> Do you know some smart workaround? Or do you know other efficient ways
> in IDL to locate a sequence of bytes within a binary file?

You can use FFT cross-correlation to search for matching segments.

;; Sample byte data
haystack = byte(randomu(seed,1000000)*255)

;; This is the search string to be found
needle = haystack[12345:12444]

;; Cross-correlation from the IDL astronomy library
cc = convolve(haystack+0.,needle+0., /correl)

Then look for correlation peaks. At that stage, once you have
identified candidate peaks, you can do a refined search to make sure
you have an exact match. The peak will be located at the center of
the string, not the beginning.

I hadn't thought of this before, but this gives a way to do fuzzy
matching because the correlation technique does not require exact
numerical match at every point. However, this mostly works for longer
search strings.
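
The refinement step mentioned above could look roughly like the sketch
below. It is only an illustration: verify_candidates is a hypothetical
helper name, not part of IDL or the astronomy library, and the mapping
from a correlation peak to a start offset (peak minus half the needle
length, plus a small window) is an assumption that should be checked
against the CONVOLVE output actually obtained.

```idl
; Hypothetical helper (not from the thread): keep only those candidate
; start offsets at which needle matches haystack byte for byte.
function verify_candidates, haystack, needle, starts
  compile_opt idl2
  nh = n_elements(haystack)
  nn = n_elements(needle)
  keep = bytarr(n_elements(starts))
  for i = 0L, n_elements(starts) - 1 do begin
    s = starts[i]
    ; Skip candidates that would run off either end of the data
    if (s ge 0) and (s + nn le nh) then $
      keep[i] = array_equal(haystack[s:s+nn-1], needle)
  endfor
  idx = where(keep, count)
  return, (count gt 0) ? starts[idx] : -1L
end

; Possible use, assuming cc comes from the CONVOLVE call above and the
; peak index marks roughly the middle of the match:
;   peak = max(cc, ipeak)
;   starts = ipeak - n_elements(needle)/2 + lindgen(5) - 2
;   print, verify_candidates(haystack, needle, starts)
```

The function returns -1 when none of the candidates is an exact match,
mirroring the convention of WHERE.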


Good luck,
Craig
Re: Locating sequence of bytes within binary file [message #71320 is a reply to message #71319] Wed, 16 June 2010 05:05
medd
On Jun 16, 11:40 am, Steve <f...@k.e> wrote:
> medd wrote:
>> Hi,
>
>> I need to locate a given sequence of bytes within a binary file. I do
>> not manage to do it efficiently, and I wanted to ask if somebody here
>> has a clue.
>
>> I saw that there are no functions in IDL to look for a given sequence
>> within a byte array, but there are very powerful functions to look for
>> a sequence within a string using regular expressions. This is what I
>> tried:
>
>> ; Variable to read the file into
>> fcontent = BYTARR((FILE_INFO(fn)).size, /NOZERO)
>> OPENU, unit, fn, /GET_LUN;, /SWAP_ENDIAN
>> READU, unit, fcontent
>> IF (STREGEX(STRING(fcontent), STRING(sequence_searched)) LT 0) THEN $
>>   print, 'sequence not found'
>
>> This works!! ... But only as long as the file does not contain a byte
>> with the value 0 (which, too bad!, it does...)
>
>> After looking a while, I found in this forum (message "Null terminated
>> strings") and in the IDL help that a string is truncated as soon as
>> this value is found. This explains why this method fails. But it does
>> not propose solutions... :(
>
>> Do you know some smart workaround? Or do you know other efficient ways
>> in IDL to locate a sequence of bytes within a binary file?
>
>> Thanks!
>
>> PS. I thought about replacing all 0's by 1's, but it is a really dirty
>> solution, which might find the sequence at the wrong place in case
>> there is a similar sequence which really contains a 1 instead...
>
> How about doing the "dirty solution" as a first pass, then filtering the
> returned results to check if they are "real" solutions?

Hi Steve,

Thanks, good idea. It is not the most elegant, but it should work for now!

Still, this approach is slower and uses more memory, since the original
data has to be kept as well... there should somehow be a more elegant
implementation possible in IDL. It corresponds more or less to the
search function in any hex editor, so I would guess that other people
have faced the same problem before.
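
One way to avoid the string conversion altogether, sketched here as a
possibility rather than as anything proposed in the thread, is to work
on the byte array directly: take every position of the first needle
byte as a candidate and narrow the list one needle byte at a time with
WHERE. The function name find_bytes_where is hypothetical.

```idl
; Hypothetical sketch: exact byte-sequence search without strings.
; Returns all start offsets of needle in haystack, or -1 if none.
function find_bytes_where, haystack, needle
  compile_opt idl2
  nh = n_elements(haystack)
  nn = n_elements(needle)
  if nn gt nh then return, -1L
  ; Candidate starts: positions where the first needle byte occurs
  cand = where(haystack[0:nh-nn] eq needle[0], count)
  ; Narrow the candidate list one needle byte at a time
  for k = 1L, nn - 1 do begin
    if count eq 0 then break
    keep = where(haystack[cand + k] eq needle[k], count)
    if count gt 0 then cand = cand[keep]
  endfor
  return, (count gt 0) ? cand : -1L
end
```

For data in which the first needle byte is rare, the candidate list
shrinks quickly and only a few passes touch more than a handful of
elements. For very repetitive data the loop can be slow, so this is a
starting point, not a tuned implementation.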
Re: Locating sequence of bytes within binary file [message #71322 is a reply to message #71320] Wed, 16 June 2010 02:40
Steve
medd wrote:
> Hi,
>
> I need to locate a given sequence of bytes within a binary file. I do
> not manage to do it efficiently, and I wanted to ask if somebody here
> has a clue.
>
> I saw that there are no functions in IDL to look for a given sequence
> within a byte array, but there are very powerful functions to look for
> a sequence within a string using regular expressions. This is what I
> tried:
>
> ; Variable to read the file into
> fcontent = BYTARR((FILE_INFO(fn)).size, /NOZERO)
> OPENU, unit, fn, /GET_LUN;, /SWAP_ENDIAN
> READU, unit, fcontent
> IF (STREGEX(STRING(fcontent), STRING(sequence_searched)) LT 0) THEN $
>   print, 'sequence not found'
>
> This works!! ... But only as long as the file does not contain a byte
> with the value 0 (which, too bad!, it does...)
>
> After looking a while, I found in this forum (message "Null terminated
> strings") and in the IDL help that a string is truncated as soon as
> this value is found. This explains why this method fails. But it does
> not propose solutions... :(
>
> Do you know some smart workaround? Or do you know other efficient ways
> in IDL to locate a sequence of bytes within a binary file?
>
> Thanks!
>
> PS. I thought about replacing all 0's by 1's, but it is a really dirty
> solution, which might find the sequence at the wrong place in case
> there is a similar sequence which really contains a 1 instead...

How about doing the "dirty solution" as a first pass, then filtering the
returned results to check if they are "real" solutions?
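
A minimal sketch of that two-pass idea follows. It assumes the whole
file is already in a byte array and small enough that STRPOS offsets
fit in a LONG. The name find_bytes_twopass is hypothetical, and
replacing 0 by 1 via the > (maximum) operator is just one way to do the
first pass.

```idl
; Hypothetical sketch: string search on zero-free copies, then an exact
; check of each hit against the original bytes.
function find_bytes_twopass, haystack, needle
  compile_opt idl2
  nn = n_elements(needle)
  ; Pass 1: map 0 -> 1 in copies so STRING() is not cut at a null byte
  hs = string(haystack > 1B)
  ns = string(needle > 1B)
  pos = -1L
  repeat begin
    pos = strpos(hs, ns, pos + 1)
    if pos ge 0 then begin
      ; Pass 2: accept the hit only if the original bytes really match
      if array_equal(haystack[pos:pos+nn-1], needle) then return, pos
    endif
  endrep until pos lt 0
  return, -1L
end
```

The function returns the offset of the first true match, or -1 when
every string hit turns out to be a false positive caused by the 0 -> 1
substitution.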