comp.lang.idl-pvwave archive: archive » k-mean clustering idl


k-mean clustering idl [message #94170]

Sun, 12 February 2017 15:13

smnadoum

I was wondering if I could get some help with clustering in IDL. I found a good example on the Harris Geospatial site that explains the method, but I am unsure how to run the k-means analysis on my own data (ASCII). How can I use my data in place of the RANDOMN calls that generate random numbers?
Below is the code I found on Harris:

n = 50
c1 = RANDOMN(seed, 3, n)
c1[0:1,*] -= 3
c2 = RANDOMN(seed, 3, n)
c2[0,*] += 3
c2[1,*] -= 3
c3 = RANDOMN(seed, 3, n)
c3[1:2,*] += 3
array = [[c1], [c2], [c3]]
; Compute cluster weights, using three clusters:
weights = CLUST_WTS(array, N_CLUSTERS = 3)
; Compute the classification of each sample:
result = CLUSTER(array, weights, N_CLUSTERS = 3)

My data is in ASCII format, and I have already written code that opens and reads the ASCII file (below), but I am not sure how to run the k-means clustering analysis on my data. I can't find good IDL resources that explain the clustering.

pro read_text, file, dir

  dir = 'path'
  file = 'path*file'

  n = file_lines(file)
  gv = fltarr(n)
  npv = fltarr(n)
  soil = fltarr(n)
  gv0 = 0.0
  npv0 = 0.0
  soil0 = 0.0

  openr, iunit, file, /get_lun
  for i = 0, n-1 do begin
    readf, iunit, gv0, npv0, soil0
    gv[i] = gv0
    npv[i] = npv0
    soil[i] = soil0
  endfor
  free_lun, iunit

  for i = 0, n-1 do print, gv[i], npv[i], soil[i]

  mwell=gv[0,*,*] ;this doesn't work
  mwell=gv[1,*,*]
  mwell=gv[2,*,*]

end

Thank you.


Re: k-mean clustering idl [message #94177 is a reply to message #94170]

Wed, 15 February 2017 22:16

Dick Jackson

Hi Cheryl,

I answered another question by you the other day, and now it seems (on Google Groups anyway) that the question was deleted, leaving only my answer. Curious.

Find it here; it can help you get your data into the "mwell" array that you seem to want to create (use the gv, npv and soil variable names you have below, if those are better).
https://groups.google.com/forum/#!topic/comp.lang.idl-pvwave/Gsp1JNUZSxs
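
In case the link is unavailable, here is a minimal sketch of one way to build a (3, n) array from three n-element vectors. The sample values are made up and only stand in for the gv, npv and soil vectors filled by the READF loop in the original post.

```idl
; Hypothetical stand-ins for the vectors filled by the READF loop.
gv   = [0.10, 0.55, 0.80, 0.20]
npv  = [0.60, 0.25, 0.10, 0.50]
soil = [0.30, 0.20, 0.10, 0.30]

; [[gv], [npv], [soil]] has dimensions (n, 3);
; TRANSPOSE turns it into the (3, n) layout used for clustering.
mwell = TRANSPOSE([[gv], [npv], [soil]])

HELP, mwell            ; dimensions should be [3, 4]
PRINT, mwell[*, 0]     ; gv, npv, soil for the first sample
```

If the file holds exactly three numeric values per line and nothing else, it should also be possible to skip the loop and let a single READF fill a FLTARR(3, n) directly, since IDL stores the first index fastest.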

Once you have "mwell" as a (3, n) array, you simply run:

nClusters = 3 ; or as many as you wish
centroidsXYZ = CLUST_WTS(mwell, N_CLUSTERS = nClusters)

Now centroidsXYZ is a (3, nClusters) array where each row describes a point in the space of (gv, npv, soil) values, which is the "centroid" of the cluster. To determine which cluster the "n" original points belong to (and you can do this with any points you're interested in):

clusterIDs = CLUSTER(mwell, centroidsXYZ, N_CLUSTERS = nClusters)

"clusterIDs" will be a (1, n) array (since mwell is (3, n)), with values from 0 to nClusters-1, telling which of the centroids each of the "n" original points was nearest to.
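
As a rough illustration of using "clusterIDs", the sketch below counts and extracts the members of each cluster with WHERE. The procedure name cluster_members_demo and the synthetic data are assumptions made for the example; with real data, mwell would be the (3, n) array built from the file.

```idl
pro cluster_members_demo
  ; Synthetic (3, n) data standing in for mwell, in the style of the Harris example.
  n = 50
  c1 = RANDOMN(seed, 3, n)
  c1[0:1, *] -= 3
  c2 = RANDOMN(seed, 3, n)
  c2[1:2, *] += 3
  mwell = [[c1], [c2]]              ; dimensions (3, 2*n)

  nClusters = 2
  centroidsXYZ = CLUST_WTS(mwell, N_CLUSTERS = nClusters)
  clusterIDs = CLUSTER(mwell, centroidsXYZ, N_CLUSTERS = nClusters)

  ; For each cluster, find which samples were assigned to it.
  for k = 0, nClusters - 1 do begin
    idx = WHERE(clusterIDs EQ k, count)
    PRINT, 'Cluster', k, ':', count, ' points'
    ; mwell[*, idx] holds the (gv, npv, soil) values of that cluster's members.
    if count gt 0 then PRINT, '  first member: ', mwell[*, idx[0]]
  endfor
end
```

Save this in a .pro file, compile it, and call cluster_members_demo. The counts will vary from run to run because the data are random.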

Be aware that if the ranges of values in your three variables (gv, npv and soil) are very different, k-means clustering may not give the results you expect. You can normalize the variables to have similar ranges, work in the normalized space, and then reverse the normalization when you need real-world numbers for them again.
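
One possible way to do that is min-max scaling of each variable to [0, 1], sketched below. This is only one choice of normalization; the procedure name normalize_cluster_demo and the synthetic ranges are assumptions for the example, and it presumes no variable is constant (otherwise the range is zero and the division fails).

```idl
pro normalize_cluster_demo
  ; Synthetic (3, n) data whose three variables have very different ranges.
  n = 100
  mwell = RANDOMU(seed, 3, n)
  mwell[1, *] *= 1000.0
  mwell[2, *] = mwell[2, *] * 0.01 + 5.0

  ; Per-variable minimum and maximum across samples (dimension 2).
  vMin = MIN(mwell, DIMENSION = 2, MAX = vMax)   ; 3-element vectors
  vRange = vMax - vMin

  ; Scale each variable to [0, 1]; REBIN copies the 3-vectors across all n samples.
  scaled = (mwell - REBIN(vMin, 3, n, /SAMPLE)) / REBIN(vRange, 3, n, /SAMPLE)

  nClusters = 3
  centroidsScaled = CLUST_WTS(scaled, N_CLUSTERS = nClusters)
  clusterIDs = CLUSTER(scaled, centroidsScaled, N_CLUSTERS = nClusters)

  ; Reverse the scaling so the centroids are in the original units.
  centroidsXYZ = centroidsScaled * REBIN(vRange, 3, nClusters, /SAMPLE) $
               + REBIN(vMin, 3, nClusters, /SAMPLE)
  PRINT, centroidsXYZ
end
```

The cluster IDs need no conversion back, since they are just labels; only the centroid coordinates have to be un-normalized.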

I hope this helps!

Cheers,
-Dick

Dick Jackson Software Consulting Inc.
Victoria, BC, Canada --- http://www.d-jackson.com



Re: k-mean clustering idl [message #94196 is a reply to message #94177]

Fri, 17 February 2017 14:39

smnadoum


Thank you so much.

I think there is a technical problem with my previous post, because I can only see what I posted and can't see your answer anymore. But I really appreciate your help. I will carefully read your explanation about clustering and try to work on it.
Thanks again.
