# Are These Arrays Equal?

**QUESTION:** I have data in two floating point arrays. I would like to know if the arrays
are “equal ”or not. By equal, I mean they have the same number of elements and that
matching elements in the two arrays are equal. Since these are floating point arrays, I notice
that I do not get good results when I use the **EQ **operator to compare them. I presume this is because
of different round-off errors in the calculations that fill the two arrays. Can you suggest a method
to compare these two arrays and establish equality?

** ANSWER:** In general, the method is to subtract the two arrays from one another, take
the absolute value of the difference, and compare that to a number that is “sufficiently close” to zero.
The only trick, if there is one, is how to choose a number that is “sufficiently close” to zero. Here
is how you might write a function to do the comparison for you.

FUNCTION FLOATS_EQUAL, array_1, array_2 ; Error handling. Return to caller on error. ON_ERROR, 2 IF N_Params() NE 2 THEN Message, 'Must pass two arrays to compare.' ; Arrays not equal if they are not the same length. IF N_Elements(array_1) NE N_Elements(array_2) THEN RETURN, 0 ; Choose a number "sufficiently close" to zero for comparison. NUMBER = 1e-6 ; Compare the arrays. IF Total(Abs(array_1 - array_2) LE NUMBER) EQ N_Elements(array_1) THEN $ RETURN, 1 ELSE RETURN, 0 END

There was an interesting discussion in the IDL newsgroup (see reference below) on how
**NUMBER **should be chosen. Mati Meron also had some relevant comments on this issue in
this
discussion.

Basically, the arguments boil down to two mutually exclusive choices that both appear to be correct. So, you can choose. :-)

One argument is that **NUMBER **should be chosen empirically. You know what your data is about.
You know the magnitude of the expected values in the arrays. You know what it is you are trying to
accomplish with your data analysis. You ought to be able to choose a number that is sufficiently close
to zero for your purposes. That is the kind of number I chose in the example above.

The other argument is that the difference in floating point values is related to how good
your computer is at storing floats internally, and this can be discovered by using the **MACHAR **function.
So **NUMBER **should be calculated something like this.

epsilon = (MACHAR()).eps NUMBER = epsilon

In practice, this generally works, but it depends on the magnitude of your values. For “normal magnitude” values (i.e., in the range of counting numbers) it works great. Unfortunately, not all values are in this range. In fact, science numbers used in research can vary over many, many magnitudes. Thus, it is thought that a better solution is to use the machine precision, but scaled by the size the numbers you are comparing.

Mati Meron explains the situation like this.

The floating number is stored as two parts, mantissa and power (for the garden variety float you have 24 bits for the mantissa and 8 for the power). The mantissa specifies the significant digits, which are then multiplied by the appropriate power. The storage is binary, of course, but for the purpose of this argument we may look at decimal. So, if you store, say, 7 significant digits, your number is of the form 0.abcdefg * 10^p, where a...f are digits between 0 and 9. If you take two numbers such that their true (as opposed to stored) expansion has same first 7 significant digits while differing at the 8th, they'll be stored as same number.[For example, 1234567e28 and 1234567e-28.]So, roughly, one can say that the accuracy of the stored number is 0.00000005 *10^p (note, 7 zeroes for the significant digits, then half the maximum for the next). So, the storage error, for fixed number of decimal places, is relative, not absolute, it is around 0.00000005/0.abcdefg. As the magnitude of the number grows, so does the error.

This implies that **NUMBER **should be chosen like this.

epsilon = (MACHAR()).eps NUMBER = (Abs(array_1) > Abs(array_2)) * epsilon

If you are comparing double precision values, you will want to make sure
you are getting the machine precision in double precision, too. So you will
need to set the **DOUBLE **keyword on **MACHAR** in this case.

epsilon = (MACHAR(/DOUBLE)).eps NUMBER = (Abs(array_1) > Abs(array_2)) * epsilon

Finally, there is often some kind of a “scaling factor” added
to the **NUMBER**. In FORTRAN code this is often called the Unit in the Last Place,
or ULP. It is the gap or difference between two floating point numbers in the last
significant digit. This number should be a positive integer. I'll add this value as
a **ULP **keyword to the program, and give it a default value of 1. If you are
comparing double precision values, make sure the ULP value gets passed in as a double, too.

The final code looks like this. You can find the final program here.

FUNCTION FLOATS_EQUAL, array_1, array_2, ULP=ulp ; Error handling. Return to caller on error. ON_ERROR, 2 IF N_Params() NE 2 THEN Message, 'Must pass two arrays to compare.' ; Are we comparing double precision values? IF Size(array_1, /TNAME) EQ 'DOUBLE' OR Size(array_2, /TNAME) EQ 'DOUBLE' THEN $ double = 1 ELSE double = 0 ; Check keyword. IF N_Elements(ulp) EQ 0 THEN ulp = 1.0D ELSE ulp = ROUND(ABS(ulp)) ; Arrays not equal if they are not the same length. IF N_Elements(array_1) NE N_Elements(array_2) THEN RETURN, 0 ; Choose a number "sufficiently close" to zero for comparison. epsilon = (MACHAR(DOUBLE=double)).eps NUMBER = (Abs(array_1) > Abs(array_2)) * epsilon * ulp ; Compare the arrays. IF Total(Abs(array_1 - array_2) LE NUMBER) EQ N_Elements(array_1) THEN $ RETURN, 1 ELSE RETURN, 0 END

# Source Discussion

If you are interested in following an IDL newsgroup discussion of this topic, you can find it here.

Copyright © 2007 David W. Fanning

Last Updated 29 August 2007