comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Re: Duplicates - a new twist [message #39439] Tue, 18 May 2004 12:35
R.G. Stockwell
"Martin Doyle" <m.doyle@uea.ac.uk> wrote in message news:d33d6a4b.0405171324.1272c4e0@posting.google.com...
> Hello all,
...
> which are within their countries. However, some of the latitude,
> longitude coordinates lie on the borders of countries and therefore an
> emission is sometimes reported by 2 or more countries for the same
> coordinate (i.e. there are multiple instances of the same coordinate
> within the dataset).

> What I need to do is to look through the dataset and sum the emissions
> when the coordinate is the same, resulting in a dataset with unique
> coordinates and a total emission for each grid point.

You could quickly make a one-dimensional "index" array from the coordinates,
e.g. coord = 1000*lat + lon, and then use uniq() and where() on that single
column. Of course, handle the decimal places appropriately.
(Or make it a string array of coordinates, perhaps.)

Offhand, it looks like you will need to loop through uniq(coords) and
take the total (or the mean) of the where()'d points.

Cheers,
bob
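
[A minimal IDL sketch of the combined-key idea above, for illustration only. The variable names lon, lat and emis and the 0.1-degree grid spacing are assumptions, not from the original post:]

```idl
; Sketch of the combined-key approach. Assumes lon/lat lie on a regular
; 0.1-degree grid, so scaling to integers is exact and each (lat,lon)
; pair maps to a unique long-integer key.
key = (round(lat*10) + 900L)*4000L + (round(lon*10) + 1800L)
s   = sort(key)                    ; bring duplicate keys together
u   = uniq(key[s])                 ; last index of each run of equal keys
n   = n_elements(u)
ulon = lon[s[u]] & ulat = lat[s[u]]
tot  = dblarr(n)
lo = 0L
for i = 0L, n-1 do begin
  tot[i] = total(emis[s[lo:u[i]]]) ; sum the emissions of the duplicates
  lo = u[i] + 1
endfor
```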
Re: Duplicates - a new twist [message #39440 is a reply to message #39439] Tue, 18 May 2004 12:34
btt
Bruce Bowler wrote:

> On Tue, 18 May 2004 08:32:45 -0400, Ben Tupper put fingers to keyboard and
> said:
>
>
>> Martin Doyle wrote:
>>
>>
>>> I have a dataset which consists of 3 columns: longitude, latitude and
>>> a value for an emission of an air pollutant. European countries report
>>> the emission of this pollutant for the latitude longitude coordinates
>>> which are within their countries. However, some of the latitude,
>>> longitude coordinates lie on the borders of countries and therefore an
>>> emission is sometimes reported by 2 or more countries for the same
>>> coordinate (i.e. there are multiple instances of the same coordinate
>>> within the dataset).
>>>
>>> What I need to do is to look through the dataset and sum the emissions
>>> when the coordinate is the same, resulting in a dataset with unique
>>> coordinates and a total emission for each grid point.
>>>
>>> Does anyone have any ideas about how to go about this? I've seen posts
>>> on this newsgroup which have had problems with duplicate values in one
>>> column of data, but I'm unsure about how to go about it when there are
>>> 2 columns which need to be examined.
>>>
>>
>> Hello,
>>
>> You should consider using GRID_INPUT. This is from the docs...
>>
>>
>> The GRID_INPUT procedure preprocesses and sorts two-dimensional
>> scattered data points, and removes duplicate values.
>>
>> Ben
>
>
> But Ben, he doesn't want to remove dup's, he wants to sum them...
> (personally, I would have thought that average was better based on the
> description, but what the heck...)
>

Awww! I was duped!

The DUPLICATES keyword for GRID_INPUT does everything BUT 'SUM'. Then
again, setting DUPLICATES = 'all' should sort the data pairs so that the
duplicates are adjacent in the list; finding the pairwise difference
between consecutive points should then reveal where the duplicates are
located. I have a vague memory of making a feature request for an INDEX
output keyword holding the indices of the points retained by GRID_INPUT
(relative to the input vectors). I remember getting a response at the
time, but can't recall what it was... and obviously there is no such
keyword in the current release.
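
[For what it's worth, the pairwise-difference idea might look something like this; xs and ys are illustrative names for the sorted coordinate vectors, not keywords from GRID_INPUT:]

```idl
; Sketch: once sorted, duplicates sit next to each other, so comparing
; each point with its neighbour flags them.
n    = n_elements(xs)
same = (xs[1:n-1] eq xs[0:n-2]) and (ys[1:n-1] eq ys[0:n-2])
dup  = where(same, count)   ; i such that point i+1 duplicates point i
```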
Re: Duplicates - a new twist [message #39441 is a reply to message #39440] Tue, 18 May 2004 12:10
Bruce Bowler
On Tue, 18 May 2004 08:32:45 -0400, Ben Tupper put fingers to keyboard and
said:

> Martin Doyle wrote:
>
>> I have a dataset which consists of 3 columns: longitude, latitude and
>> a value for an emission of an air pollutant. European countries report
>> the emission of this pollutant for the latitude longitude coordinates
>> which are within their countries. However, some of the latitude,
>> longitude coordinates lie on the borders of countries and therefore an
>> emission is sometimes reported by 2 or more countries for the same
>> coordinate (i.e. there are multiple instances of the same coordinate
>> within the dataset).
>>
>> What I need to do is to look through the dataset and sum the emissions
>> when the coordinate is the same, resulting in a dataset with unique
>> coordinates and a total emission for each grid point.
>>
>> Does anyone have any ideas about how to go about this? I've seen posts
>> on this newsgroup which have had problems with duplicate values in one
>> column of data, but I'm unsure about how to go about it when there are
>> 2 columns which need to be examined.
>>
>
> Hello,
>
> You should consider using GRID_INPUT. This is from the docs...
>
>
> The GRID_INPUT procedure preprocesses and sorts two-dimensional
> scattered data points, and removes duplicate values.
>
> Ben

But Ben, he doesn't want to remove dup's, he wants to sum them...
(personally, I would have thought that average was better based on the
description, but what the heck...)

Bruce

--
+---------------------+---------------------------------------------------+
| Bruce Bowler        | What garlic is to salad, insanity is to art.      |
| 1.207.633.9600      |   - Augustus Saint-Gaudens                        |
| bbowler@bigelow.org |                                                   |
+---------------------+---------------------------------------------------+
Re: Duplicates - a new twist [message #39447 is a reply to message #39441] Tue, 18 May 2004 05:32
btt
Martin Doyle wrote:

> I have a dataset which consists of 3 columns: longitude, latitude and
> a value for an emission of an air pollutant. European countries report
> the emission of this pollutant for the latitude longitude coordinates
> which are within their countries. However, some of the latitude,
> longitude coordinates lie on the borders of countries and therefore an
> emission is sometimes reported by 2 or more countries for the same
> coordinate (i.e. there are multiple instances of the same coordinate
> within the dataset).
>
> What I need to do is to look through the dataset and sum the emissions
> when the coordinate is the same, resulting in a dataset with unique
> coordinates and a total emission for each grid point.
>
> Does anyone have any ideas about how to go about this? I've seen posts
> on this newsgroup which have had problems with duplicate values in one
> column of data, but I'm unsure about how to go about it when there are
> 2 columns which need to be examined.
>

Hello,

You should consider using GRID_INPUT. This is from the docs...


The GRID_INPUT procedure preprocesses and sorts two-dimensional
scattered data points, and removes duplicate values.

Ben
Re: Duplicates - a new twist [message #39450 is a reply to message #39447] Tue, 18 May 2004 04:42
Chris[1]
The easiest way is to take copies of the lats & lons and reduce them to a
resolution at which the same station gets the same coordinate (independent
of which country reports it); sort on one (say latitude), then look for
unique values using uniq() on the sorted values. Then use the output of
uniq() to check whether points with the same latitude also have the same
longitude. It's an exercise in indexing :)

If you want a more robust technique - one that doesn't fall apart near the
poles or the dateline, for example - use a spherical to Cartesian
coordinate conversion, and do similarly, except now with the three
coordinates.
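
[A rough sketch of the two-pass sort/uniq indexing described above; names are illustrative, and the coordinates are assumed already rounded to a common resolution:]

```idl
; Outer pass: group by latitude. Inner pass: group by longitude within
; each equal-latitude run, totalling the emissions of each group.
s    = sort(lat)
slat = lat[s] & slon = lon[s] & sem = emis[s]
ul = uniq(slat)                  ; end index of each equal-latitude run
p = 0L
for i = 0L, n_elements(ul)-1 do begin
  k  = p + sort(slon[p:ul[i]])   ; order this latitude run by longitude
  uo = uniq(slon[k])             ; end of each equal-longitude run
  q = 0L
  for j = 0L, n_elements(uo)-1 do begin
    print, slat[k[0]], slon[k[q]], total(sem[k[q:uo[j]]])
    q = uo[j] + 1
  endfor
  p = ul[i] + 1
endfor
```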

Cheers;
Chris


"Martin Doyle" <m.doyle@uea.ac.uk> wrote in message
news:d33d6a4b.0405171324.1272c4e0@posting.google.com...
> Hello all,
>
> I have a problem which I've searched everywhere to try and
> solve...many posters on this newsgroup have had _similar_ problems but
> the resolutions didn't help me...anyway, here goes;
>
> I have a dataset which consists of 3 columns: longitude, latitude and
> a value for an emission of an air pollutant. European countries report
> the emission of this pollutant for the latitude longitude coordinates
> which are within their countries. However, some of the latitude,
> longitude coordinates lie on the borders of countries and therefore an
> emission is sometimes reported by 2 or more countries for the same
> coordinate (i.e. there are multiple instances of the same coordinate
> within the dataset).
>
> What I need to do is to look through the dataset and sum the emissions
> when the coordinate is the same, resulting in a dataset with unique
> coordinates and a total emission for each grid point.
>
> Does anyone have any ideas about how to go about this? I've seen posts
> on this newsgroup which have had problems with duplicate values in one
> column of data, but I'm unsure about how to go about it when there are
> 2 columns which need to be examined.
>
> Thanks guys...
>
> All the best,
>
> Martin
Re: Duplicates - a new twist [message #39452 is a reply to message #39450] Tue, 18 May 2004 00:59
Chris Lee
In article <d33d6a4b.0405171324.1272c4e0@posting.google.com>, "Martin
Doyle" <m.doyle@uea.ac.uk> wrote:


> Hello all,
> I have a problem which I've searched everywhere to try and solve...many
> posters on this newsgroup have had _similar_ problems but the
> resolutions didn't help me...anyway, here goes; I have a dataset which
> consists of 3 columns: longitude, latitude and a value for an emission
> ....
> What I need to do is to look through the dataset and sum the emissions
> when the coordinate is the same, resulting in a dataset with unique
> coordinates and a total emission for each grid point. Does anyone have
> ...
> Martin

Hi,

; data is fltarr(3,n): data[0,*] = lon, data[1,*] = lat, data[2,*] = value

; find possible collisions
d = sqrt(data[0,*]^2 + data[1,*]^2)
threshold = 1.0                 ; digits kept after the decimal point
d = round(d*10^threshold)/10^threshold

Apply the single-vector functions (sort(), uniq(), where()) to d here. The
result is a *possible* list of collisions, since any point on a circle
centred on the origin will match; hopefully, since you're dealing with
European data, you won't collide with anything in the South Atlantic :)

There are probably better ways.

Chris.