Working with Unsigned Integers

QUESTION: I have unsigned integers in an unformatted data file. I know IDL doesn't have an unsigned integer data type. How can I work with this data in IDL?

ANSWER: Since there is no unsigned data type in IDL, you must first read your unsigned data into signed integers. What happens next depends on whether you have 16-bit integers or 32-bit integers. Take the 16-bit case first.

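[Note: Research Systems introduced 16-bit, 32-bit, and 64-bit unsigned integers as new data types in IDL 5.2. In IDL 5.2 and later you can read such data directly into UINTARR or ULONARR arrays and skip the workarounds below.]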

16-Bit Unsigned Integers

Suppose you have 100 unsigned 16-bit integers. Your code to read the data from a file will look something like this:

   OpenR, lun, datafile, /Get_Lun
   data = INTARR(100)
   READU, lun, data
   Free_Lun, lun

If you need the true unsigned values, you will have to convert this data to LONG integers and keep only the low 16 bits. LONG alone is not enough, because it sign-extends the negative values. Use a command like this:

   data = LONG(data) AND 'FFFF'xL

(Yes, those of you uncomfortable with hexadecimal numbers may use 65535L.)
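The masking step can be checked outside IDL. The following is a minimal Python sketch, not IDL code, and the function names are hypothetical. It stores unsigned 16-bit values, reads them back as signed (modeling READU into an INTARR, assuming the file's byte order matches the reader), and then recovers them by keeping only the low 16 bits, which is what LONG(data) AND 'FFFF'xL does.

```python
import struct

def as_signed16(values):
    """Reinterpret unsigned 16-bit values as signed, as reading them into an INTARR would."""
    raw = struct.pack("<%dH" % len(values), *values)      # store as unsigned 16-bit
    return list(struct.unpack("<%dh" % len(values), raw))  # read back as signed 16-bit

def to_unsigned16(signed_values):
    """Model of LONG(data) AND 'FFFF'xL: widen, then keep only the low 16 bits."""
    return [s & 0xFFFF for s in signed_values]

original = [0, 1, 32767, 32768, 65535]
signed = as_signed16(original)
print(signed)                 # [0, 1, 32767, -32768, -1]
print(to_unsigned16(signed))  # [0, 1, 32767, 32768, 65535]
```

Values up to 32767 pass through unchanged; only the ones with the top bit set come back negative and need the mask.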

If memory is a concern, include the TEMPORARY function in the command, which lets IDL release the original array rather than holding an extra copy of it during the conversion:

   data = LONG(TEMPORARY(data)) AND 'FFFF'xL

If the unsigned values themselves are not terribly important to you, but the position of each value relative to the others is (e.g., you want to display the data as an image and don't care what the real values are), then you can keep the data as 16-bit integers. You just have to (as they say) "twiddle," or flip, the top-most bit. In effect, this subtracts 32768 from each unsigned value (that is, it adds an offset of -32768), which moves the whole unsigned range into the signed range.

The unsigned value 0 becomes the signed value -32768. The unsigned value 32768 becomes the signed value 0. The unsigned value 65535 becomes the signed value 32767. And so forth.

According to a wonderful IDL newsgroup post by Struan Gray (and explained to me in a private e-mail from Mitchell Grunes, which I greatly appreciated), this is most easily done with a command like this:

   ; FIX makes the mask a 16-bit INT, so the result stays a 16-bit INT too
   data = TEMPORARY(data) XOR FIX(-32768)

If you are going to use this 16-bit data set for some real-world purpose, remember that every stored value is now 32768 less than the true unsigned value. Add 32768 back (in a LONG) to recover it.
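To see why flipping the top bit is the same as subtracting 32768, here is a minimal Python sketch, not IDL code; the function name is hypothetical. It applies XOR 0x8000 to the 16-bit pattern of each signed value and reinterprets the result as signed 16-bit, which is what the XOR command does to an INT array.

```python
def flip_top_bit16(signed_values):
    """Model of XOR-ing 16-bit signed integers with -32768 (bit pattern 0x8000)."""
    out = []
    for s in signed_values:
        bits = (s & 0xFFFF) ^ 0x8000  # flip bit 15 of the 16-bit pattern
        # Reinterpret the 16-bit pattern as a signed value again
        out.append(bits - 0x10000 if bits & 0x8000 else bits)
    return out

# Signed values as read from the file for unsigned 0, 32768 and 65535
print(flip_top_bit16([0, -32768, -1]))  # [-32768, 0, 32767]
```

Because the mapping is a plain shift by 32768, the ordering of the values is preserved, which is all an image display needs.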

32-Bit Unsigned Integers

Working with 32-bit unsigned integers is a little more problematic. Suppose you have 100 unsigned 32-bit integers in a data file. Read them into IDL signed LONG integers, like this:

   OpenR, lun, datafile, /Get_Lun
   data = LONARR(100)
   READU, lun, data
   Free_Lun, lun

First of all, if all of your data values are no larger than 2^31-1 (2,147,483,647), you are home free; don't worry about a thing. If any values are larger than that, things get a little dicey. Note that MAX(data) won't help you spot them, because unsigned values of 2^31 and higher show up as negative values in a signed LONG array. Look for negative values instead: if MIN(data) is less than zero, some of your unsigned values were 2^31 or larger.
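The following minimal Python sketch (not IDL code; the function name is hypothetical) models reading unsigned 32-bit values into a signed LONG array. It shows that the maximum looks harmless while the large values hide among the negatives.

```python
import struct

def as_signed32(values):
    """Reinterpret unsigned 32-bit values as signed, as reading them into a LONARR would."""
    raw = struct.pack("<%dI" % len(values), *values)
    return list(struct.unpack("<%di" % len(values), raw))

signed = as_signed32([5, 2**31 - 1, 2**31, 2**32 - 1])
print(signed)                      # [5, 2147483647, -2147483648, -1]
print(max(signed))                 # 2147483647, so the large values stay hidden
print(any(v < 0 for v in signed))  # True: some unsigned values were >= 2**31
```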

Now, before I show you how to do this in an array-oriented way, take the case of a single 32-bit unsigned integer. To turn it into its real value, use the BYTE function to extract each of the four bytes of the 32-bit integer individually, then reconstruct those bytes into a double-precision value. The code looks like this:

   number = data(0)
   factor = 256.0D
   ; On a big endian machine, byte 0 is the most significant byte
   realNumber = BYTE(number, 0)*factor^3 + BYTE(number,1)*factor^2 +$
      BYTE(number,2)*factor^1 + BYTE(number,3)*factor^0

This is for a big endian machine, like most UNIX machines. If you are on a little endian machine (like a PC), you will have to reverse the order in which the real number is constructed. Your code will look like this:

   number = data(0)
   factor = 256.0D
   ; On a little endian machine, byte 0 is the least significant byte
   realNumber = BYTE(number, 0)*factor^0 + BYTE(number,1)*factor^1 +$
      BYTE(number,2)*factor^2 + BYTE(number,3)*factor^3
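Here is a minimal Python sketch (not IDL code; the function name is hypothetical) of the same weighted byte sum. It uses the struct module to produce the byte layout a big endian and a little endian machine would hold for one value above 2^31-1, then rebuilds the value from each layout with the matching byte order.

```python
import struct

def from_bytes_manual(b, big_endian):
    """Rebuild an unsigned 32-bit value from four bytes, mirroring the BYTE()*factor^n sums."""
    factor = 256
    if big_endian:
        # Byte 0 is the most significant byte
        return b[0]*factor**3 + b[1]*factor**2 + b[2]*factor**1 + b[3]*factor**0
    # Little endian: byte 0 is the least significant byte
    return b[0]*factor**0 + b[1]*factor**1 + b[2]*factor**2 + b[3]*factor**3

value = 3000000000                  # larger than 2**31 - 1
big = struct.pack(">I", value)      # byte layout on a big endian machine
little = struct.pack("<I", value)   # byte layout on a little endian machine
print(from_bytes_manual(big, True), from_bytes_manual(little, False))
# 3000000000 3000000000
```

Applying the wrong byte order to a layout gives a different (wrong) number, which is why the two IDL versions differ only in the exponents.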

This could be done in a loop for the 100 values in our 32-bit unsigned integer array, but it would be better to do it in an array-oriented fashion. Peter Berdeklis and James Tappin each supplied a similar algorithm to do this. For big endian machines, it looks like this:

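   ; Weights 256^3, 256^2, 256^1, 256^0 for bytes 0-3 (big endian)
   factor = 256.0D ^ (3 - INDGEN(4))
   ; View the LONG array as a 4 x N byte array, four bytes per value
   byteArray = BYTE(data, 0, 4, N_ELEMENTS(data))
   ; Replicate the weights for all N values, then sum over the four bytes
   realNumbers = TOTAL(byteArray * factor(*, INTARR(N_ELEMENTS(data))), 1)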

For little endian machines, the factor will have to be written like this:

   ; Weights 256^0, 256^1, 256^2, 256^3 for bytes 0-3 (little endian)
   factor = 256.0D ^ INDGEN(4)
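The array version can be modeled the same way. In this minimal Python sketch (not IDL code; the function names are hypothetical), a byte buffer is split into groups of four and each group is multiplied by a weight vector and summed, which is what TOTAL(byteArray * factor(...), 1) does. Note that the little endian weights run 1, 256, 256^2, 256^3, the reverse of the big endian ones.

```python
import struct

def weights(big_endian):
    """Per-byte weights: [256**3, ..., 1] for big endian, [1, ..., 256**3] for little endian."""
    return [256 ** (3 - i) for i in range(4)] if big_endian else [256 ** i for i in range(4)]

def unsigned_from_buffer(raw, big_endian):
    """Take the buffer four bytes at a time and sum byte * weight, like TOTAL(..., 1)."""
    w = weights(big_endian)
    return [sum(raw[k + i] * w[i] for i in range(4)) for k in range(0, len(raw), 4)]

values = [1, 2**31, 2**32 - 1]
print(unsigned_from_buffer(struct.pack(">3I", *values), True))   # [1, 2147483648, 4294967295]
print(unsigned_from_buffer(struct.pack("<3I", *values), False))  # [1, 2147483648, 4294967295]
```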

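Kevin Ivory from the Max-Planck-Institut fuer Aeronomie has supplied me with an algorithm that doesn't worry about the endian nature of your machine, because it works on the values rather than on their bytes. Every unsigned value of 2^31 or more was stored as a negative LONG that is exactly 2^32 too small, so converting to double precision and adding 2^32 to the negative values restores it. It looks like this: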

   realNumbers = Double(data)
   index = Where(data LT 0L, count)
   IF count GT 0 THEN realNumbers(index) = realNumbers(index) + 2D0^32
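For comparison, here is a minimal Python sketch (not IDL code; the function name is hypothetical) of the same idea: convert each signed value to floating point and add 2^32 wherever the value is negative. No byte order is involved at any step.

```python
def unsigned_from_signed32(signed_values):
    """Model of Kevin Ivory's method: convert to float, add 2**32 where the value is negative."""
    real = [float(v) for v in signed_values]
    for i, v in enumerate(signed_values):
        if v < 0:  # only unsigned values >= 2**31 were stored as negatives
            real[i] += 2.0 ** 32
    return real

print(unsigned_from_signed32([5, 2147483647, -2147483648, -1]))
# [5.0, 2147483647.0, 2147483648.0, 4294967295.0]
```

Double precision represents every 32-bit unsigned value exactly, so no precision is lost in the conversion.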