comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Check for duplicate locations [message #92730] Thu, 18 February 2016 08:48
Med Bennett
I have X,Y,Z data for several thousand points that I need to check for duplicate locations. I cannot have duplicated locations in the sample data, as it breaks the kriging algorithm I am using. I've always used a brute force method of computing a distance function between each point and all subsequent points, and flagging any points for which the distance is zero, or some small threshold. This method is very slow for larger numbers of points, however. Does anyone have a method for doing this more efficiently? I've found simple methods for one-dimensional data, but not for points in 3-space.

Thanks!
Re: Check for duplicate locations [message #92745 is a reply to message #92730] Mon, 22 February 2016 11:36
Russell[1]
This is a tough answer to explain in just a few words....

Are the (x,y,z) values exactly the same? Either way, I would map them into a single coordinate (basically the inverse operation of array_indices) and compute the histogram. Consider a 2-d example...

(x,y)=(1,2)

and the number of possible values along each axis is (Nx,Ny)=(100,100), so that 0 <= x < Nx and 0 <= y < Ny. Then you can combine (x,y) into a single value: xy = x+Nx*y. In 3-d the same idea gives xyz = x+Nx*(y+Ny*z).

Now use the histogram function on that new variable; any bin in the histogram with a count of 2 or more has multiple entries. At that point you can do just about whatever you like to them.
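
The mapping described above can be sketched as follows. This is a minimal illustration in Python rather than IDL, and it assumes the coordinates are non-negative integers with 0 <= x < nx and 0 <= y < ny; the function name duplicate_groups and the sample points are hypothetical. A dictionary of lists stands in for the histogram, so that each "bin" also remembers which points fell into it.

```python
from collections import defaultdict

def duplicate_groups(points, nx, ny):
    """Group point indices by a single combined key.

    Assumes integer coordinates with 0 <= x < nx and 0 <= y < ny,
    so that key = x + nx*(y + ny*z) is unique per location.
    """
    groups = defaultdict(list)
    for i, (x, y, z) in enumerate(points):
        key = x + nx * (y + ny * z)   # collapse (x, y, z) into one value
        groups[key].append(i)         # the "histogram bin" for this key
    # Any bin holding 2 or more indices is a duplicated location.
    return [idx for idx in groups.values() if len(idx) >= 2]

# Hypothetical sample data: points 0, 2, 5 coincide, as do points 1 and 4.
points = [(1, 2, 0), (5, 5, 5), (1, 2, 0), (99, 0, 3), (5, 5, 5), (1, 2, 0)]
print(duplicate_groups(points, nx=100, ny=100))   # [[0, 2, 5], [1, 4]]
```

This only catches exact matches. Floating-point coordinates would first have to be rounded or binned onto an integer grid, which is an extra assumption about the data.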
Re: Check for duplicate locations [message #92752 is a reply to message #92730] Wed, 24 February 2016 06:14
Craig Markwardt
There's no simple answer to this. Usually you need to do some kind of filtering to make it a faster process. For example, if the spread in "Z" values in your 3-space is the greatest, sort by Z first, then you can limit the range over which you perform the 3D distance computation. But there are lots of potential gotchas when you do this! Another solution is to get a faster CPU!

Craig
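
One way to read the sort-by-Z suggestion is sketched below, in Python rather than IDL. It is an illustration under assumptions, not a definitive implementation: the function name near_duplicates_sorted_by_z, the tolerance tol, and the sample points are all hypothetical. After sorting on Z, each point is compared only with later points whose Z lies within tol, since anything further away in Z cannot be within tol in 3D.

```python
import math

def near_duplicates_sorted_by_z(points, tol):
    """Return index pairs (i, j), i < j, whose 3D distance is <= tol."""
    # Indices of the points, ordered by their Z coordinate.
    order = sorted(range(len(points)), key=lambda i: points[i][2])
    pairs = []
    for a in range(len(order)):
        i = order[a]
        for b in range(a + 1, len(order)):
            j = order[b]
            # Later points only get further away in Z, so stop early.
            if points[j][2] - points[i][2] > tol:
                break
            if math.dist(points[i], points[j]) <= tol:
                pairs.append((min(i, j), max(i, j)))
    return sorted(pairs)

# Hypothetical sample data: points 0 and 2 differ only by rounding noise.
points = [(0.0, 0.0, 1.0), (5.0, 5.0, 3.0), (0.0, 0.0, 1.0000001), (5.0, 5.0, 9.0)]
print(near_duplicates_sorted_by_z(points, tol=1e-3))   # [(0, 2)]
```

One of the gotchas: if many points share nearly the same Z, the inner loop barely shortens and the cost drifts back toward the brute-force case, which is why sorting on the axis with the greatest spread matters.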
Re: Check for duplicate locations [message #92762 is a reply to message #92730] Thu, 25 February 2016 12:35
Med Bennett
Thanks for your replies - computationally, it's not really a problem, but I always like to find the most 'elegant' solution to a problem, and thought there might be a better way than brute force distance calculations. One issue is that sometimes the duplicate points differ by a small amount due to different rounding or storage, requiring use of a threshold. Thanks again!
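
For the thresholded case, one possible approach (not something proposed in the thread) is to bin the points into cubic cells of side tol and compare each point only with points in its own and the 26 adjacent cells. The sketch below is in Python rather than IDL; the function name near_duplicates_grid, the tolerance, and the sample points are hypothetical. Checking the neighbouring cells is what handles two near-identical points that happen to fall on opposite sides of a cell boundary.

```python
import math
from collections import defaultdict
from itertools import product

def near_duplicates_grid(points, tol):
    """Return index pairs (j, i), j < i, whose 3D distance is <= tol."""
    cells = defaultdict(list)   # cell coordinates -> indices of points seen so far
    pairs = set()
    for i, p in enumerate(points):
        cell = tuple(math.floor(c / tol) for c in p)
        # Points within tol of p can only lie in this cell or an adjacent one.
        for offset in product((-1, 0, 1), repeat=3):
            neighbour = tuple(c + o for c, o in zip(cell, offset))
            for j in cells.get(neighbour, ()):
                if math.dist(p, points[j]) <= tol:
                    pairs.add((j, i))
        cells[cell].append(i)
    return sorted(pairs)

# Hypothetical sample data: points 0, 1 and 3 all lie within 0.01 of each other.
points = [(1.0, 2.0, 3.0), (1.004, 2.0, 3.0), (7.0, 8.0, 9.0), (0.999, 2.0, 3.0)]
print(near_duplicates_grid(points, tol=0.01))   # [(0, 1), (0, 3), (1, 3)]
```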