comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Re: Locating sequence of bytes within binary file [message #71319] Wed, 16 June 2010 06:46
Craig Markwardt
On Jun 15, 7:30 am, medd <med...@googlemail.com> wrote:
> Hi,
>
> I need to locate a given sequence of bytes within a binary file. I do
> not manage to do it efficiently, and I wanted to ask if somebody here
> has a clue.
>
> I saw that there are no functions in IDL to look for a given sequence
> within a byte array, but there are very powerful functions to look for
> a sequence within a string using regular expressions. This is what I
> tried:
>
> fcontent = BYTARR((FILE_INFO(fn)).size, /NOZERO) ;Variable where to read in the file
> OPENU, unit, fn, /GET_LUN;, /SWAP_ENDIAN
> READU, unit, fcontent
> IF(STREGEX(STRING(fcontent), STRING(sequence_searched)) LT 0) THEN $
>   print, 'sequence not found'
>
> This works!! ... But only as long as the file does not contain a byte
> with the value 0 (which, too bad!, it does...)
>
> After looking a while, I found in this forum (message "Null terminated
> strings") and in the IDL help that a string is truncated as soon as
> this value is found. This explains why this method fails. But it does
> not propose solutions... :(
>
> Do you know some smart workaround? Or do you know other efficient ways
> in IDL to locate a sequence of bytes within a binary file?

You can use FFT cross-correlation to search for matching segments.

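;; Sample byte data: one million random bytes
haystack = byte(randomu(seed,1000000)*255)

;; This is the search string to be found (100 bytes copied from the data)
needle = haystack[12345:12444]

;; Cross-correlation using CONVOLVE from the IDL astronomy library;
;; adding 0. converts the byte arrays to floating point
cc = convolve(haystack+0., needle+0., /correl)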

Then look for correlation peaks. Once you have identified candidate
peaks, do a refined search around each one to make sure you have an
exact match. Note that the peak will be located at the center of the
search string, not at its beginning.
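
For anyone without the astronomy library, the same idea can be sketched with IDL's built-in FFT. This is only a minimal sketch under a few assumptions: FIND_BYTES_FFT is a hypothetical helper name, not an existing routine; the means are subtracted before correlating (my addition, so that the peak reflects matching structure rather than just large byte values); and only the single strongest peak is checked. In this formulation the needle is zero-padded at the front, so the peak index is the start of the match rather than its center.

```idl
;; Hypothetical helper, not part of IDL or the astronomy library.
;; Returns the offset of the strongest correlation peak if the bytes
;; there match the needle exactly, otherwise -1.
function find_bytes_fft, haystack, needle
   compile_opt idl2
   nh = n_elements(haystack)
   nn = n_elements(needle)
   if (nn eq 0) or (nn gt nh) then return, -1L

   ;; Assumption: subtract the means so the peak is not dominated by
   ;; regions that merely contain large values
   h = double(haystack) - mean(double(haystack))
   n = dblarr(nh)                ; needle zero-padded to haystack length
   n[0] = double(needle) - mean(double(needle))

   ;; Circular cross-correlation via the correlation theorem:
   ;; cc[k] is proportional to total(h[k:k+nn-1] * n[0:nn-1])
   cc = real_part(fft(fft(h, -1) * conj(fft(n, -1)), 1))

   ;; Ignore offsets where the needle would wrap around the end
   peak = max(cc[0:nh-nn], p)

   ;; Refined check: exact byte-for-byte comparison at the peak
   if array_equal(haystack[p:p+nn-1], needle) then return, long(p)
   return, -1L
end
```

With the sample data above, print, find_bytes_fft(haystack, needle) should report the offset the needle was copied from. A needle whose bytes are all identical correlates to zero after mean subtraction, so that case would need a different candidate test.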

I hadn't thought of this before, but this gives a way to do fuzzy
matching, because the correlation technique does not require an exact
numerical match at every point. However, this mostly works for longer
search strings.
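
If fuzzy matching is what you want, the refined check at a candidate offset could count differing bytes instead of demanding equality. A small sketch, where COUNT_MISMATCHES is a hypothetical helper name and the caller is assumed to pass an offset p that leaves room for the whole needle:

```idl
;; Hypothetical helper: number of bytes that differ between the needle
;; and the haystack segment starting at offset p
function count_mismatches, haystack, needle, p
   compile_opt idl2
   nn = n_elements(needle)
   return, long(total(haystack[p:p+nn-1] ne needle))
end
```

A candidate could then be accepted when, say, count_mismatches(haystack, needle, p) LE 3; the tolerance of 3 is an arbitrary value for illustration and would depend on the data.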


Good luck,
Craig