comp.lang.idl-pvwave archive
Messages from Usenet group comp.lang.idl-pvwave, compiled by Paulo Penteado

parse subdirectories [message #92215] Mon, 13 March 2017 09:32
Helder Marchetto
Hi,
I have a widget application that takes a directory as input and generates a widget_tree of the (sub)directory structure.
I've done this using something like the following (there's more to it than this...):

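pro dirParser::addSubTree, dir, parent
  ; immediate subdirectories of dir only; /MARK_DIRECTORY appends a trailing
  ; separator, so sub + '*' is a valid pattern on the next recursion level
  subs = file_search(dir + '*', /test_directory, /mark_directory, count=cnt)
  if cnt eq 0 then return
  foreach sub, subs do begin
    node = widget_tree(parent, value=file_basename(sub), /folder)
    self->addSubTree, sub, node   ; recurse with the new node as parent
  endforeach
end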

This works fine, meaning it returns the directory structure, and that's great. However, I would like to switch to Windows PowerShell to get the tree structure. Why? Because the above is really slow.

So, I can call PowerShell like this:
spawn, 'powershell -WindowStyle Hidden "Get-ChildItem -Recurse | ?{ $_.PSIsContainer } | Select-Object FullName"', result, /noshell

which gives me a very quick response with something like:
IDL> print, transpose(result)

FullName
--------
K:\data\sub-1\2002
K:\data\sub-1\2004
K:\data\sub-1\2005
K:\data\sub-1\2017
K:\data\sub-1\2002\02_01_26
K:\data\sub-1\2002\02_01_28
K:\data\sub-1\2004\04_12_02
K:\data\sub-1\2004\04_12_03

and so on (there are many more subdirectories).
Does anybody have a good suggestion how to parse the text contained in the above result array?

I know how to handle strings, but I don't have a good way to sort the subdirectories (for instance, "K:\data\sub-1\2002\02_01_26" is a subdirectory of "K:\data\sub-1\2002", but it is listed only after all the other directories at the same level as "K:\data\sub-1\2002").

I would appreciate any suggestion on how to solve the directory listing chaos.

Regards,
Helder

PS: Just for the time comparison: running the above powershell command over a structure of >2000 directories/subdirectories took "just" 8 seconds. The older method took minutes...
Re: parse subdirectories [message #92312 is a reply to message #92215] Mon, 13 March 2017 10:41
wlandsman
I don't have an answer to your question, but do want to point out that the slowness of FILE_SEARCH() on Windows is a long-standing problem.

http://www.idlcoyote.com/code_tips/fastsearch.php

and one that I've ranted about before

https://groups.google.com/forum/#!search/file_search$20wayne/comp.lang.idl-pvwave/ABttLQ5NHHU/xgSFcodk4N0J

And just yesterday I became aware of another group at Goddard encountering this problem. Please Harris/Exelis can we get a fix for FILE_SEARCH?

Anyway, for a recursive directory search, I'm not sure that FINDFILE() or David Fanning's listfile.pro are a suitable solution. I suspect -- but am not certain -- that you do have to sort the output of the powershell as you are doing. You might have to count the number of backslashes to get the directory level. --Wayne
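Wayne's backslash-counting idea can be sketched in IDL as follows. This is a minimal illustration, not code from the thread; the function name hm_pathDepth is hypothetical, and it assumes plain Windows paths like those in Helder's listing, with '\' as the only separator.

```idl
; Hypothetical helper: directory level of each path = number of '\' characters.
function hm_pathDepth, paths
  compile_opt idl2
  ; BYTE() turns a string array into a [maxLength, nPaths] byte array,
  ; zero-padded on the right; 92b is the ASCII code of '\'
  isSep = byte(paths) eq 92b
  ; sum along the first (character) dimension -> one count per path
  return, long(total(isSep, 1))
end
```

For ['K:\data\sub-1\2002', 'K:\data\sub-1\2002\02_01_26'] it returns [3, 4]. Because the counting is done on the whole byte array at once, no loop over the (possibly thousands of) paths is needed.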

Re: parse subdirectories [message #92808 is a reply to message #92312] Mon, 13 March 2017 16:04
Helder Marchetto
Thanks for the insight Wayne.
I think I will go for the sort/count slashes option. Otherwise programming is no fun.
Cheers, Helder
Re: parse subdirectories [message #94259 is a reply to message #92808] Wed, 15 March 2017 04:06
Helder Marchetto
Hi,
I was just about to give up on this because I noticed that if a directory path is too long, the output is truncated and ends with three dots. Something like this:
K:\firstDir\secondDir\thirdDir\four...
instead of
K:\firstDir\secondDir\thirdDir\fourthDir\

This is literally what is returned by the spawn command:
spawn, 'powershell -WindowStyle Hidden "Get-ChildItem -Recurse | ?{ $_.PSIsContainer } | Select-Object FullName"', result, /noshell

After fiddling around with cmd and PowerShell, I finally found the answer on Stack Overflow:
http://stackoverflow.com/questions/9528039/powershell-why-does-out-file-break-long-line-into-smaller-lines

So this now works better:
spawn, 'powershell -WindowStyle Hidden "Get-ChildItem -Recurse | ?{ $_.PSIsContainer } | Select-Object FullName | Format-list *"', result, /noShell

There is, however, still a small problem with wrapped lines. This is easily dealt with by searching for the lines that do not contain ":" and appending their content to the previous line...

Cheers,
Helder
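The repair Helder describes (a line without ":" continues the previous path) might look like this in IDL. It is a sketch, not Helder's code: the name hm_joinWrapped is hypothetical, and it assumes each record has the form 'FullName : <path>' and that continuation lines differ only by their indentation. Because STRTRIM removes the indentation, a space that falls exactly at a wrap point would be lost.

```idl
; Hypothetical helper: rebuild full paths from the output of
;   ... | Select-Object FullName | Format-List *
; where each record reads 'FullName : K:\...' and an over-long value
; continues on the next (indented) line.
function hm_joinWrapped, lines
  compile_opt idl2
  paths = list()
  foreach line, lines do begin
    s = strtrim(line, 2)               ; drop indentation and trailing blanks
    if s eq '' then continue           ; skip the empty separator lines
    if strpos(s, ':') ge 0 then begin
      ; new record: keep only the value after 'FullName : '
      sep = strpos(s, ' : ')
      if sep ge 0 then paths.add, strmid(s, sep + 3) else paths.add, s
    endif else if paths.count() gt 0 then begin
      ; no ':' -> continuation of the previous, wrapped path
      last = paths.count() - 1
      paths[last] = paths[last] + s
    endif
  endforeach
  if paths.count() eq 0 then return, !null
  return, paths.toArray()
end
```

Relying on ":" is safe for Windows paths because a colon cannot occur in a file or directory name, only after the drive letter at the start of a record.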
Re: parse subdirectories [message #94260 is a reply to message #92808] Wed, 15 March 2017 08:02
Helder Marchetto
Ok,
so I managed this and it is working fine. There is an improvement: crawling through 2000 subdirectories takes ~9 seconds, against >14 seconds with the old FILE_SEARCH method.
The execution time is now limited by the spawn command (8.5 s), whereas before it was limited by FILE_SEARCH (13 s).
If anybody requests it and I have time, I'll put the code together in a "nicer" way.
Cheers,
Helder
Re: parse subdirectories [message #94261 is a reply to message #94260] Wed, 15 March 2017 11:23
lecacheux.alain
Maybe you could simply reorder your string array first by using SORT?

print, result[sort(result)]

K:\data\sub-1\2002                        
K:\data\sub-1\2002\02_01_26              
K:\data\sub-1\2002\02_01_28              
K:\data\sub-1\2004                        
K:\data\sub-1\2004\04_12_02              
K:\data\sub-1\2004\04_12_03              
K:\data\sub-1\2005                        
K:\data\sub-1\2017                        
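Plain SORT gives the right order for this sample, but it compares raw ASCII codes, and '\' (code 92) sorts after digits, upper-case letters, '-', '.' and space. A sibling such as 'K:\data\sub-1\20021' would therefore land between 'K:\data\sub-1\2002' and that directory's children. A minimal sketch of one workaround, assuming '\' is the only separator (the name hm_treeSort is hypothetical), sorts on a key in which '\' is replaced by byte 1, which precedes every printable character:

```idl
; Hypothetical helper: sort paths so that every directory is immediately
; followed by all of its descendants (depth-first tree order).
function hm_treeSort, paths
  compile_opt idl2
  ; build the sort key: '\' (92b) -> byte 1, so the separator sorts
  ; before any printable character that can appear in a name
  key = byte(paths)
  sep = where(key eq 92b, nSep)
  if nSep gt 0 then key[sep] = 1b
  ; STRING() converts the byte array back to one string per path
  return, paths[sort(string(key))]
end
```

With this key, ['K:\d\2002\02_01_26', 'K:\d\20021', 'K:\d\2002'] comes back as ['K:\d\2002', 'K:\d\2002\02_01_26', 'K:\d\20021'], whereas plain SORT would put 'K:\d\20021' second.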
Re: parse subdirectories [message #94262 is a reply to message #94261] Wed, 15 March 2017 13:05
Helder Marchetto
Hi Alx,
that's indeed *one* of the things I have to do. However, there's more to it than plain sorting. This is because I want to maintain the physical tree structure: therefore I have to determine whether any given directory is a node or a leaf and index it accordingly.
Since I like to use this a lot - and wonder why this isn't already available - I will try to order/clean things up and share it when it's done.

What it does not do, and will not do, is follow symbolic links, and I will not test it on a Linux/Mac machine (I simply don't have one). If anybody is interested, we can share the load :-)

Cheers,
Helder
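One way to get the node/leaf information mentioned above, sketched under two assumptions: the paths are already in tree order (every directory directly followed by its descendants), and every directory below the top level appears in the list, as it does in Get-ChildItem -Recurse output. The name hm_treeIndex and the IS_LEAF keyword are hypothetical.

```idl
; Hypothetical helper: for tree-ordered paths, return the index of each
; entry's parent (-1 for top-level entries) and flag the leaves.
function hm_treeIndex, paths, IS_LEAF=isLeaf
  compile_opt idl2
  n = n_elements(paths)
  ; directory level = number of '\' characters (92b) in each path
  depth = long(total(byte(paths) eq 92b, 1))
  parent = replicate(-1L, n)
  ; lastAt[d] = index of the most recent entry seen at level d
  lastAt = replicate(-1L, max(depth) + 1)
  for i = 0L, n - 1 do begin
    d = depth[i]
    ; in tree order, the latest entry one level up is the parent
    if d gt 0 then parent[i] = lastAt[d - 1]
    lastAt[d] = i
  endfor
  ; an entry has children iff the next entry is deeper; otherwise it is a leaf
  isLeaf = replicate(1b, n)
  if n gt 1 then isLeaf[0:n-2] = depth[1:n-1] le depth[0:n-2]
  return, parent
end
```

For ['K:\a', 'K:\a\b', 'K:\a\b\c', 'K:\a-x'] the parent indices are [-1, 0, 1, -1] and IS_LEAF is [0, 0, 1, 1]. Since a parent always precedes its children, the WIDGET_TREE nodes can then be created in a single pass, using the already-created node at the parent index (or the tree root for -1) as the parent.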
Re: parse subdirectories [message #94269 is a reply to message #94262] Thu, 16 March 2017 05:29
Helder Marchetto
Ok, so I loaded this up on github with an example:
https://github.com/heldermarchetto/IDL
I'm not very experienced with GitHub, so if you want to contribute to or improve the procedure, you're welcome. Let me know if I can do anything about it.
The notifications are not really set up, and I don't think I need them... I'll check that later.

I had someone test this for me with a much bigger directory tree, and the time went from 2 min 17 s to 51 s. Nice.

I think this procedure will not be useful for small directory trees! There it will probably be slower, because of the overhead of the spawn call to PowerShell.

Cheers,
Helder
Re: parse subdirectories [message #94270 is a reply to message #94269] Thu, 16 March 2017 08:07
Matthew Argall
Another option would be to populate the widget tree only when the expand node button is clicked. Like an "ls" instead of an "ls -r". This would (I think) involve destroying and regenerating the widget tree each time while updating the widget base via widget_control, [/UPDATE | /REDRAW].
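A minimal sketch of this "ls instead of ls -r" idea, using a different technique from destroying and regenerating the whole tree: each folder gets a dummy child so that it shows an expand control, and the dummy is replaced by the real subdirectories the first time the folder is expanded (a WIDGET_TREE_EXPAND event). The names lazyTree_addFolder and lazyTree_event and the 'placeholder' UNAME are hypothetical, and this is not the code of the widgets posted later in the thread.

```idl
; Hypothetical names: lazyTree_addFolder / lazyTree_event.
; Each folder gets a dummy child so the tree shows an expand control;
; the real subdirectories are listed only when the folder is first expanded.
pro lazyTree_addFolder, parent, dir
  compile_opt idl2
  node = widget_tree(parent, value=file_basename(dir), /folder, uvalue=dir)
  dummy = widget_tree(node, value='...', uname='placeholder')
end

pro lazyTree_event, event
  compile_opt idl2
  ; react only to folders being expanded (not collapsed, not selected)
  if tag_names(event, /structure_name) ne 'WIDGET_TREE_EXPAND' then return
  if event.expand eq 0 then return
  ; populate a folder only once: its sole child must still be the placeholder
  kids = widget_info(event.id, /all_children)
  if n_elements(kids) ne 1 || kids[0] eq 0 then return
  if widget_info(kids[0], /uname) ne 'placeholder' then return
  widget_control, kids[0], /destroy
  widget_control, event.id, get_uvalue=dir
  ; like "ls" rather than "ls -r": immediate subdirectories only
  subs = file_search(filepath('*', root_dir=dir), /test_directory, count=n)
  for i = 0L, n - 1 do lazyTree_addFolder, event.id, subs[i]
end
```

To try it, create a base containing a WIDGET_TREE, call lazyTree_addFolder on the tree with a starting directory, realize the base and register it with XMANAGER, 'lazyTree', base, so that the tree events propagate to lazyTree_event. Only one FILE_SEARCH call per expanded folder is made, so the cost grows with what the user actually opens rather than with the size of the whole tree.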
Re: parse subdirectories [message #94271 is a reply to message #94270] Thu, 16 March 2017 10:29
bradgom
I ran into this issue several years ago while trying to find a way to select files in a tree for quick analysis in IDL. (Boy it would be nice if IDL had a fast file listing function in Windows, and also some built-in file management widgets..) I also don't use GitHub, but I've posted a widget object here:
https://github.com/bradgom/BGDirtree_widget

This doesn't solve the directory crawling time issue, but does only dig down into the currently selected branch instead of the whole directory tree.

Also, since Spawn doesn't work in the VM, this code is still slow when run in VM applications.
Re: parse subdirectories [message #94272 is a reply to message #94271] Fri, 17 March 2017 01:17
Helder Marchetto
Hi,
great work! I still have to learn how to write compound widgets :-)
I should point out that SPAWN DOES work in the VM.

Cheers,
Helder
Re: parse subdirectories [message #94422 is a reply to message #94272] Thu, 18 May 2017 02:09
Helder Marchetto
Hi,
I've just finished modifying Bradgom's BGDirtree_widget compound widget.
The new version (HMDirTree_Widget) is largely based on his, with some important changes:
1) HMDirTree_Widget does not issue one spawn command per folder. Issuing so many spawn commands made the Windows taskbar go crazy.
2) HMDirTree_Widget does not handle files, only folders.
3) HMDirTree_Widget has no dependencies.
4) HMDirTree_Widget always starts by searching for the available fixed drives (hard drives and network drives) and creating a list. To avoid hanging on slow network connections, the network drives are listed but only explored when the user selects them.
5) HMDirTree_Widget allows only single directories to be selected, not multiple.
6) HMDirTree_Widget works only with IDL 8.0 or higher.
7) HMDirTree_Widget works on Windows only.

If anybody is interested, the code can be found here: https://github.com/heldermarchetto/HMDirTree_Widget
As I said, it is largely based on Bradgom's work, so thanks to him for the hard work and for making his code available.

Regards,
Helder
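For point 4 in the list above, IDL itself can list drives by type on Windows with the GET_DRIVE_LIST function. The sketch below is an assumption about one way to do it, not code taken from HMDirTree_Widget (which may do this differently); the procedure name hm_listDrives is hypothetical.

```idl
; Hypothetical helper: separate local fixed drives, which can be explored
; at once, from network drives, which are only listed and explored on demand.
; GET_DRIVE_LIST is available on Windows only.
pro hm_listDrives, fixed, remote
  compile_opt idl2
  fixed  = get_drive_list(/fixed,  count=nFixed)
  remote = get_drive_list(/remote, count=nRemote)
  ; normalise "no drives of this type" to !NULL so callers can test n_elements()
  if nFixed  eq 0 then fixed  = !null
  if nRemote eq 0 then remote = !null
end
```

Keeping the two lists apart is what makes the "list, but explore only on selection" behaviour for network drives straightforward: only the entries in fixed are searched eagerly.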