comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

Re: Chi-square decision trees [message #30324] Fri, 19 April 2002 14:22
James Kuyper
Dick Jackson wrote:

> "James Kuyper" <kuyper@gscmail.gsfc.nasa.gov> wrote in message
> news:3CC04E6E.7060304@gscmail.gsfc.nasa.gov...
>
>> Dick Jackson wrote:
>>
>>
>>> The ID3 (Iterative Dichotomizer - 3) method of Ross Quinlan may be what
>>> you're thinking of [...]
>>
>> However, I noticed that the web page had no links to an actual
>> implementation.
>
>
> I just remembered that Quinlan followed up ID3 with an enhancement in 1993
> called C4.5, and published a book and C code to implement it (ISBN:
> 1558602380). If it's of any use, the code's home should be at
> http://www.cse.unsw.edu.au/~quinlan but I'm having trouble reaching it. A
> mirror and very nice tutorial are at
> http://www.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/c4.5/tutorial.html
>
> This code works only with text files as input and output, which might not
> help at all in getting it to work through IDL.

Thanks! I'll give it a try. I'm not committed to using IDL for this
purpose. C code designed for a Unix platform is acceptable, and getting
the data into text format won't be difficult.
If I run into any problems (or better yet, if it works!), I'll report
back to you.
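
Getting discrete records into text for C4.5 might look like the sketch below (Python rather than IDL). It assumes the layout usually described for C4.5: a .names file that lists the class labels and then one line per attribute with its allowed values, and a .data file with one comma-separated case per line, class last. The function name to_c45_text, the toy records, and the exact punctuation are assumptions of this sketch and should be checked against the tutorial linked above.

```python
def to_c45_text(rows, attributes, target):
    """Build the text of a .names and a .data file (assumed C4.5 layout).

    rows       -- list of dicts mapping column name to a string value
    attributes -- names of the discrete independent variables, in order
    target     -- name of the discrete dependent variable (the class)
    """
    # First line of .names: all class labels, comma-separated, ending in '.'
    classes = sorted({r[target] for r in rows})
    names_lines = [", ".join(classes) + "."]

    # One line per attribute: "name: value1, value2."
    for attr in attributes:
        values = sorted({r[attr] for r in rows})
        names_lines.append(f"{attr}: {', '.join(values)}.")

    # One case per line in .data: attribute values, then the class
    data_lines = [
        ", ".join([r[a] for a in attributes] + [r[target]]) for r in rows
    ]
    return "\n".join(names_lines) + "\n", "\n".join(data_lines) + "\n"


# Hypothetical toy records, for illustration only
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "rain", "windy": "yes", "play": "yes"},
]
names_text, data_text = to_c45_text(rows, ["outlook", "windy"], "play")
print(names_text)
print(data_text)
```

The two strings can then be written to files sharing one stem (say golf.names and golf.data, a hypothetical name) before running the C program on them.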
Re: Chi-square decision trees [message #30326 is a reply to message #30324] Fri, 19 April 2002 14:36
Dick Jackson
"James Kuyper" <kuyper@gscmail.gsfc.nasa.gov> wrote in message
news:3CC04E6E.7060304@gscmail.gsfc.nasa.gov...
> Dick Jackson wrote:
>
>> The ID3 (Iterative Dichotomizer - 3) method of Ross Quinlan may be what
>> you're thinking of [...]
>
> However, I noticed that the web page had no links to an actual
> implementation.

I just remembered that Quinlan followed up ID3 with an enhancement in 1993
called C4.5, and published a book and C code to implement it (ISBN:
1558602380). If it's of any use, the code's home should be at
http://www.cse.unsw.edu.au/~quinlan but I'm having trouble reaching it. A
mirror and very nice tutorial are at
http://www.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/c4.5/tutorial.html

This code works only with text files as input and output, which might not
help at all in getting it to work through IDL.

Cheers,
--
-Dick

Dick Jackson / dick@d-jackson.com
D-Jackson Software Consulting / http://www.d-jackson.com
Calgary, Alberta, Canada / +1-403-242-7398 / Fax: 241-7392
Re: Chi-square decision trees [message #30331 is a reply to message #30324] Fri, 19 April 2002 10:05
James Kuyper
Dick Jackson wrote:

> Hi James,
>
> "James Kuyper" <kuyper@gscmail.gsfc.nasa.gov> wrote in message
> news:3CC030E0.9010302@gscmail.gsfc.nasa.gov...
>
>> There's a standard dataset characterization technique I used a couple
>> of decades ago, and I want to use it again, and I can't remember the
>> name of the technique.
>>
>> The context is that you have a discrete dependent variable, and a large
>> number of discrete independent variables. [...]
>>
>> Each basic step of the process involved choosing the particular variable
>> that had the most significant chi-squared value. Then, the process would
>> repeat in a hierarchical fashion on each subset determined by that
>> variable. [...]
>>
>> Does anyone recognise the technique I'm describing? Do you remember what
>> the name is? Is there an IDL routine that implements it?
>
>
> The ID3 (Iterative Dichotomizer - 3) method of Ross Quinlan may be what
> you're thinking of, although it's usually described in terms of 'information
> content' rather than 'chi-squared value', but the difference may be moot.
> It's also possible to use this method for continuous variables, with the
> extra trick of finding a split point.
>
> I once gave a talk on this method to a group of colleagues when I was doing
> work mainly in Lisp, and I had a pretty nice graphical implementation in
> object-oriented Macintosh Common Lisp. I don't know of any IDL code for it,
> but it shouldn't be too hard to do.
>
> I found this summary of the method through Google:
> http://www.dcs.napier.ac.uk/~peter/vldb/dm/node11.html


I'm positive that this is a different algorithm than the one I was
talking about. It may be an equivalent one; that's hard to tell without
careful analysis. It may be better; the chi-squared criterion sounded a
bit ad-hoc to me; the information-theoretic derivation of this algorithm
seems better-founded. However, as long as it does what it sounds like it
does, I'd be willing to at least try it out.
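
To make the comparison concrete, here is a minimal sketch (Python, not IDL) that scores each candidate variable both ways on one subset of the data: the Pearson chi-squared statistic of the variable against the dependent variable, and the information gain used by ID3. The function names and the toy records are hypothetical. Comparing raw chi-squared statistics is only fair when the variables have the same degrees of freedom, as they do here; picking the "most significant" variable in general would mean comparing p-values, which this sketch does not compute.

```python
from collections import Counter
from math import log2


def chi_squared(rows, attr, target):
    """Pearson chi-squared statistic and degrees of freedom, attr vs target."""
    n = len(rows)
    attr_counts = Counter(r[attr] for r in rows)
    class_counts = Counter(r[target] for r in rows)
    observed = Counter((r[attr], r[target]) for r in rows)
    stat = 0.0
    for a, na in attr_counts.items():
        for c, nc in class_counts.items():
            expected = na * nc / n  # count expected under independence
            stat += (observed[(a, c)] - expected) ** 2 / expected
    dof = (len(attr_counts) - 1) * (len(class_counts) - 1)
    return stat, dof


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum(k / n * log2(k / n) for k in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy of target minus its weighted entropy after splitting on attr."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder


# Hypothetical toy data: "outlook" determines "play"; "windy" is unrelated.
rows = [
    {"outlook": o, "windy": w, "play": p}
    for o, p in (("sunny", "no"), ("rain", "yes"))
    for w in ("no", "yes", "no", "yes")
]

for attr in ("outlook", "windy"):
    print(attr, chi_squared(rows, attr, "play"),
          information_gain(rows, attr, "play"))

# One step of the hierarchical process: pick the best variable, then repeat
# on each subset of rows that shares a value of that variable.
best = max(("outlook", "windy"), key=lambda a: chi_squared(rows, a, "play")[0])
```

On this toy data both criteria prefer the same variable, but that agreement is not guaranteed in general.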

However, I noticed that the web page had no links to an actual
implementation. It mentioned a commercial package, but that doesn't help
much. My current need for this tool has essentially no budget behind it.
If the tool's not hidden away in one of the libraries we already have
installed here (such as the IMSL or IDL libraries), I have to settle for
a freeware solution, or write it myself (and that has to be low cost,
too - I couldn't afford to put in more than a day or two on it). Aren't
budgets fun! :-(
Re: Chi-square decision trees [message #30332 is a reply to message #30331] Fri, 19 April 2002 09:38
Dick Jackson
Hi James,

"James Kuyper" <kuyper@gscmail.gsfc.nasa.gov> wrote in message
news:3CC030E0.9010302@gscmail.gsfc.nasa.gov...
> There's a standard dataset characterization technique I used a couple
> of decades ago, and I want to use it again, and I can't remember the
> name of the technique.
>
> The context is that you have a discrete dependent variable, and a large
> number of discrete independent variables. [...]
>
> Each basic step of the process involved choosing the particular variable
> that had the most significant chi-squared value. Then, the process would
> repeat in a hierarchical fashion on each subset determined by that
> variable. [...]
>
> Does anyone recognise the technique I'm describing? Do you remember what
> the name is? Is there an IDL routine that implements it?

The ID3 (Iterative Dichotomizer - 3) method of Ross Quinlan may be what
you're thinking of, although it's usually described in terms of 'information
content' rather than 'chi-squared value', but the difference may be moot.
It's also possible to use this method for continuous variables, with the
extra trick of finding a split point.
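
The split-point trick for a continuous variable can be sketched as follows (Python, not IDL): sort the values, try a threshold between each pair of distinct neighbours, and keep the one with the largest information gain. Using the midpoint between neighbours as the candidate threshold is an assumption of this sketch, and best_split and the sample numbers are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum(k / n * log2(k / n) for k in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, gain) for the binary split value <= threshold
    that maximises information gain; (None, 0.0) if no split helps."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - (len(left) / n * entropy(left)
                       + len(right) / n * entropy(right))
        if gain > best[1]:
            best = ((lo + hi) / 2, gain)
    return best


# Hypothetical sample: low values belong to class "a", high ones to "b".
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
threshold, gain = best_split(values, labels)
print(threshold, gain)
```

Once a threshold is chosen, the continuous variable behaves like a two-valued discrete one (at or below the threshold, or above it) for the rest of the method.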

I once gave a talk on this method to a group of colleagues when I was doing
work mainly in Lisp, and I had a pretty nice graphical implementation in
object-oriented Macintosh Common Lisp. I don't know of any IDL code for it,
but it shouldn't be too hard to do.

I found this summary of the method through Google:
http://www.dcs.napier.ac.uk/~peter/vldb/dm/node11.html

Cheers,
--
-Dick

Dick Jackson / dick@d-jackson.com
D-Jackson Software Consulting / http://www.d-jackson.com
Calgary, Alberta, Canada / +1-403-242-7398 / Fax: 241-7392