Calculating the Center of Mass of a 2D Array

QUESTION: Can you show me how to calculate the center of mass or centroid of a 2D array?

ANSWER: Thanks to David Foster at UCSD for this algorithm. You can use this Centroid program to return a two-element array containing the center of mass of a 2D array. The program code is extremely simple.

   FUNCTION Centroid, array
   s = Size(array, /Dimensions)
   totalMass = Total(array)
   xcm = Total( Total(array, 2) * Indgen(s[0]) ) / totalMass
   ycm = Total( Total(array, 1) * Indgen(s[1]) ) / totalMass                                                                        
   RETURN, [xcm, ycm]
   END

ADDENDUM: This question recently came up on the IDL newsgroup, where the questioner wanted to eventually extend this idea to a 3D array. As often happens, that led some readers to the N-dimesional case, at which time I became completely lost. :-)

So here is the discussion verbatium for those of you who are interested.

   Subject: Re: Center of mass???
   From: Paul Hick 
   Date: Mon, 08 Nov 1999 19:21:26 -0800

   David Fanning wrote:
   > 
   > Anders Wennerberg (anders@mrc.ks.se) writes:
   > 
   > >    I have tried to find a routine that could calculate the center of
   > > mass, in preliminary 2D but in near future 3D.   I would be happy if
   > > anyone could give direction where to find that kind of routine.
   > 
   > Here is how you do it in 2D. I've never had to extend
   > this to 3D, but when you figure it out, please let us
   > know. I'll write up an article on it. :-)
   > 
   >   s = Size(array, /Dimensions)
   >   totalMass = Total(array)
   >   xcm = Total(Total(array,2) * Indgen(s[0])) / totalMass
   >   ycm = Total(Total(array,1) * Indgen(s[1])) / totalMass
   > 
   > Cheers,
   > 
   > David

   Reasoning by analogy to the 2D case, this should work, I think:

   xcm = Total( Total(Total(array,3),2) * Indgen(s[0])) / totalMass
   ycm = Total( Total(Total(array,3),1) * Indgen(s[1])) / totalMass
   zcm = Total( Total(Total(array,2),1) * Indgen(s[2])) / totalMass

   -- 
    _________________________________________________________________
   |                 Paul Hick (pphick@ucsd.edu)                     |
   | Office : SERF Rm. 302          Smail  : UCSD/CASS/0424          |
   | Phone  : (858) 534-8965                 9500 Gilman Drive       |
   | Fax    : (858) 534-0177                 La Jolla  CA 92093-0424 |
   | WWW    : http://casswww.ucsd.edu/personal/Plh.html              |
   |_________________________________________________________________|

   Subject: Re: Center of mass???
   From: "J.D. Smith" 
   Date: Tue, 09 Nov 1999 15:51:06 -0500

   David Fanning wrote:
   > 
   > Paul Hick (pphick@ucsd.edu) writes:
   > 
   > > Reasoning by analogy to the 2D case, this should work, I think:
   > >
   > > xcm = Total( Total(Total(array,3),2) * Indgen(s[0])) / totalMass
   > > ycm = Total( Total(Total(array,3),1) * Indgen(s[1])) / totalMass
   > > zcm = Total( Total(Total(array,2),1) * Indgen(s[2])) / totalMass
   > 
   > Well, it seems to work in the simple-minded cases I
   > tried it on. :-)
   > 
   > I'll write it up in an article if no one else has
   > serious objections.
   > 

   Why stop at 3 dimensions?  How about any dimension:

   function com, arr,DOUBLE=dbl
      s=size(arr,/DIMENSIONS)
      n=n_elements(s) 
      tot=total(arr,DOUBLE=dbl)
      if keyword_set(dbl) then ret=dblarr(n,/NOZERO) else
   ret=fltarr(n,/NOZERO)
      for i=0,n-1 do begin 
         tmp=arr
         for j=0,i-1   do tmp=total(tmp,1,DOUBLE=dbl)
         for j=i+1,n-1 do tmp=total(tmp,2,DOUBLE=dbl)
         ret[i]=total(findgen(s[i])*tmp,DOUBLE=dbl)/tot
      endfor 
      return,ret
   end

   I would have posted this yesterday, but I couldn't help but feel there
   must be a better way to do it.  

   Here's why:

   In the 3D example above, Paul calculates total(array,3) twice, which
   means once too many.  It could have been saved between steps.  

   The number of times total() is called on the array in this
   straightforward, brute-force method is n*(n-1) (for each of n dimensions
   total the array over the other n-1), not counting the final index
   total.  And many of those are repeats to the exact same total() call
   with the exact same array!  Since we needed to do all directional
   sub-totals, there must be a more efficient way, analogous to the 3D case
   of saving total(array,3).

   After a bit of scratching on paper, I found that I could do it with only
   (n-1)(n+2)/2 calls to total(), by working simultaneously from the last
   dimension backward and the first dimension forward, saving sub-totals
   which can be shared by subsequent calls in both cases.   For 10
   dimensions that becomes 54 vs. 90 calls to total() (though only 5 vs. 6
   for 3 dimens -- 1 saved call as described above).  The problem is how to
   code this model.  Two reciprocating mutually recursive functions should
   do the trick, but I haven't had time to explore.  Any takers?  Anybody
   think they can save more calls to total()?

   This may be folly, since few will use it for more than 3D, but it's an
   interesting problem.

   JD

   P.S. Another wrinkle: The total()'s you'd prefer to save are the ones
   that do the most work... those which total the "largest" dimensions.  So
   sorting by dimension size first of all might speed things up even more.

   -- 
    J.D. Smith                             |*|      WORK: (607) 255-5842    
    Cornell University Dept. of Astronomy  |*|            (607) 255-6263
    304 Space Sciences Bldg.               |*|       FAX: (607) 255-5875 
    Ithaca, NY 14853                       |*|

   Subject: Re: Center of mass???
   From: Jonathan Joseph 
   Date: Tue, 09 Nov 1999 19:05:07 -0500

   Well, I'm not going to take up JD's challenge,
   but are you all sure you are answering the right
   question?

   I mean sure, great, if you happen to have
   an MxNx.... array of masses then you've got
   everything you need.  But when I first read
   Anders' post, I thought, "gee that sounds simple."

   I thought of N masses at N locations,

   m = 1D array of N masses
   pos = D x N array of locations of the masses in D dimensions

   then:

   s=size(pos, /dimensions)
   mm = m ## replicate(1,s(0))
   cm = total(pos * mm, 2) / total(m)

   Please someone correct me if I'm wrong.
   Also, Is there a better way of multiplying
   an MxN array by a one dimensional array of
   length N such that each row of the MxN array
   is multiplied by the corresponding element
   of the one dimensional array?

   -Jonathan

   Subject: Re: Center of mass???
   From: "J.D. Smith" 
   Date: Wed, 10 Nov 1999 13:16:25 -0500

   Jonathan Joseph wrote:
   > 
   > Well, I'm not going to take up JD's challenge,
   > but are you all sure you are answering the right
   > question?
   > 
   > I mean sure, great, if you happen to have
   > an MxNx.... array of masses then you've got
   > everything you need.  But when I first read
   > Anders' post, I thought, "gee that sounds simple."
   > 
   > I thought of N masses at N locations,
   > 
   > m = 1D array of N masses
   > pos = D x N array of locations of the masses in D dimensions
   > 
   > then:
   > 
   > s=size(pos, /dimensions)
   > mm = m ## replicate(1,s(0))
   > cm = total(pos * mm, 2) / total(m)
   > 
   > Please someone correct me if I'm wrong.
   > Also, Is there a better way of multiplying
   > an MxN array by a one dimensional array of
   > length N such that each row of the MxN array
   > is multiplied by the corresponding element
   > of the one dimensional array?

   You can use :

   rebin(reform(m,1,N),D,N,/SAMP)*pos

   but the array multiplication method also works.  The relative speeds are system
   dependent.

   As far as your method of C.O.M. calculation, it's clearly good when you have
   such a DxN array, or even a sparse array of almost all zeroes (which you can
   safely ignore in the calculation).  However, the application I imagine is some
   plane or cube or hypercube of data for which the C.O.M. is required.  In that
   case, to use your method, you'd have to inflate your data by a factor of D. 
   That is, you're paying for all those repeated indices being multiplied *before*
   totalling the data.  This is an Index first rather than total first method.

   Here is a replacement routine using your idea which takes regular MxNx... arrays
   of data and makes the DxN index array.  

   function com2, arr,DOUBLE=dbl
      s=size(arr,/DIMENSIONS)
      d=n_elements(s) 
      n=n_elements(arr) 
      inds=lonarr(d,n,/NOZERO)

      fac=1
      for i=0,d-1 do begin 
         inds[i,*]=lindgen(n)/fac mod s[i]
         fac=fac*s[i]
      endfor 

      return, total(inds*rebin(reform(arr,1,n),d,n,/SAMP),2,DOUBLE=dbl)/ $
       total(arr,DOUBLE=dbl)
   end

   You see the work here is in generating the index array and inflating the data
   array.  I compared this routine to my other one for a random array of size
   10x10x10x10x10.  The results were:

   Index First Method Time: 0.45424998
   Total First Method Time: 0.048227489

   How about 1024x1024:

   Index First Method Time: 1.9181580
   Total First Method Time: 0.23625147

   And for something really ludicrous... 5x5x5x5x5x5x5x5

   Index First Method Time: 2.7887635
   Total First Method Time: 0.44504005

   So you see, even for many dimension, for which the Total First routine is
   currently inefficient, it is always faster.  And for big arrays, like
   100x100x100x20, I couldn't even get the Index First method to run (memory
   issues).  Too much copying of data.

   JD

   -- 
    J.D. Smith                             |*|      WORK: (607) 255-5842    
    Cornell University Dept. of Astronomy  |*|            (607) 255-6263
    304 Space Sciences Bldg.               |*|       FAX: (607) 255-5875 
    Ithaca, NY 14853                       |*|