ncdf + x-connection + many files = crazy. [message #74933]
Tue, 08 February 2011 12:37
Matt[2]
Hey all,
I've got a great riddle for you today. (Long story short, can you run
the test code included below to completion on your machine?) I've
already filed a bug report, but I want to see if other people can
reproduce these errors.
I'm experiencing machine/version-dependent, non-deterministic behavior
that seems to show up somewhere unrelated to whatever triggers it.
In other words, I've got a crazy-making problem.
I experienced this in working code, but through patient debugging
I've pared it down until I've got a relatively small program that
demonstrates the problem behavior.
The undesirable behavior is an NCDF_OPEN error (NC_ERROR=-31) after
some *random* number of open/close iterations on a netcdf file that
was created with IDL (or possibly a segfault and core dump on other
machines). The second feature is that I don't see this behavior
until I make an x-connection of some sort ("window, /free", or even a
"device, depth=depth" will cause errors).
Machine 1: (only has idl 7.0.1 64bit installed)
savoie@snow:~/tmp> uname -a
Linux snow 2.6.27.54-0.2-default #1 SMP 2010-10-19 18:40:07 +0200 x86_64 x86_64 x86_64 GNU/Linux
Machine 2 (fails with 32 bit idl7.0.1 & idl6.4):
savoie@snowblower:~/tmp> uname -a
Linux snowblower 2.6.34.7-0.5-default #1 SMP 2010-10-25 08:40:12 +0200 i686 i686 i386 GNU/Linux
All the code does is open and close a netcdf file, many, many times.
But after an x-connection (via "window, /free"), the code errors out
at a different spot in the iteration.
Below is the output I get on the problem machines and below that, the
source code for you to try at home. For extra credit, modify the code
to open a netcdf file created with a different program than IDL, NCO
for example.
--OUTPUT--------------------------------------
savoie@snow:~/tmp> idl70
IDL Version 7.0.1 (linux x86_64 m64). (c) 2008, ITT Visual Information Solutions
Installation number: 100-431.
Licensed for use by: University of Colorado
IDL> .run ./crash_ncdf.pro
% Compiled module: CREATE_HAND_SAMPLE.
% Compiled module: CRASH_NCDF.
IDL> crash_ncdf
% Loaded DLM: NCDF.
Successfully completed first iterations
i = 143017
ncdf_fid = 5
% CRASH_NCDF: NCDF_OPEN: Unable to open the file "/projects/NRTSI-G/tmp_crashing_ncdf/sample_hand.nc". (NC_ERROR=-31)
% Execution halted at: CRASH_NCDF 44 /homes/snowblower/savoie/tmp/crash_ncdf.pro
% $MAIN$
----------------------------------------
Sample Test Code:
----------------------------------------
;+============================================
; :Author: Matt Savoie <savoie@nsidc.org>
; :Copyright: (C) <2011> University of Colorado.
; :Version: $Id:$
;
; Created 02/03/2011
; National Snow & Ice Data Center, University of Colorado, Boulder
;-============================================
;+
; Generate a very simple netcdf file.
;-
pro create_hand_sample, file
  compile_opt idl2, logical_predicate
  sample_var = dindgen( 4, 3 )
  ncid = ncdf_create( file, /CLOBBER )
  dimidx = ncdf_dimdef( ncid, 'x', 4 )
  dimidy = ncdf_dimdef( ncid, 'y', 3 )
  ; Attach the x and y dimensions so the variable can hold the 4x3 array.
  varid = ncdf_vardef( ncid, 'variable', [dimidx, dimidy], /double )
  ncdf_control, ncid, /ENDEF
  ncdf_varput, ncid, varid, sample_var
  ncdf_close, ncid
end
;+
; This is just a sample looping program that shows a problem
; Opening and Closing a single netcdf file many times.
;-
pro crash_ncdf
  compile_opt idl2, logical_predicate
  ; On any error, report the iteration and file id, then re-raise.
  catch, theError
  if theError ne 0 then begin
    catch, /cancel
    print, "i = ", i
    print, "ncdf_fid = ", ncdf_fid
    message, !ERROR_STATE.msg
  endif
  file = './sample.nc'
  create_hand_sample, file
  long_iteration = 1900000L
  ; First pass: open/close before any x-connection has been made.
  for i = 0l, long_iteration do begin
    ncdf_fid = ncdf_open( file, write = 0 )
    ncdf_close, ncdf_fid
  endfor
  print, 'Successfully completed first iterations'
  ; Make (and immediately drop) an x-connection.
  window, /free
  wdelete
  ; Second pass: the same loop, now after the x-connection.
  for i = 0l, long_iteration do begin
    ncdf_fid = ncdf_open( file, write = 0 )
    ncdf_close, ncdf_fid
  endfor
  print, 'Successfully completed second iterations'
end

Re: ncdf + x-connection + many files = crazy [message #77462 is a reply to message #74933]
Wed, 07 September 2011 10:03
Fabzou
Hi,
I will also subscribe, if it can help...
http://www.rhinocerus.net/forum/lang-idl-pvwave/657163-random-netcdf-i-o-error.html
On 09/07/2011 11:27 AM, Reimar Bauer wrote:
> On 06.09.2011 16:54, savoie@nsidc.org wrote:
>>
>> Yun<bernat.puigdomenech@gmail.com> writes:
>>
>>> https://groups.google.com/group/comp.lang.idl-pvwave/browse_thread/thread/98cfca51043318d8/d175f2e21203a5bc?hl=en&lnk=gst&q=ncdf+connection#d175f2e21203a5bc
>>>
>>> I am reading 100 files from network volumes, and I get the error
>>> (NC_ERROR=-31) randomly. I am using the following version of NCDF /DLM
>>> on a Linux Machine.
>>>
>>> Any help will be appreciated,
>>
>>
>> Yun,
>>
>>
>> Hi there, that was me who posted that problem. It would be excellent if
>> you could also file a bug report with ITTVIS. I continue to be told it
>> will be fixed in the next version, but they don't have a work-around in
>> the meantime. I've begun the work-around of not using IDL for my
>> NETCDF work, which is becoming more and more of the work I must do. It
>> would be helpful for this problem to be solved and if I'm not the only
>> person having the issue it might help actually get it solved.
>>
>>
>> Thanks,
>> Matt
>>
>
> Can you share the url of the tracked issue?
>
> Maybe subscribing, or reporting who is affected, will bring it to the top.
>
>
> Reimar

Re: ncdf + x-connection + many files = crazy [message #77554 is a reply to message #77462]
Thu, 08 September 2011 00:12
R.Bauer
On 07.09.2011 19:03, Fabzou wrote:
> Hi,
>
> I will also subscribe, if it can help...
>
> http://www.rhinocerus.net/forum/lang-idl-pvwave/657163-random-netcdf-i-o-error.html
>
>
>
Hi there,
Yesterday I experimented with NFS and probably found a solution, at least for me.
Some time ago I had already figured out that I can read many times without a
problem if the block size is 4 K, which is the usual value on local storage.
The problematic NFS mount reports:
stat -f .
File: "."
ID: 0 Namelen: 255 Type: nfs
Block size: 1048576 Fundamental block size: 1048576
Blocks: Total: 153525 Free: 24690 Available: 24690
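
To check a mount without reading the full report, the relevant fields can be printed on their own. This is only a minimal sketch, assuming GNU coreutils `stat` on Linux; the helper name `fs_block_info` is made up for illustration and is not part of the original post.

```shell
# Print the file system type and block sizes for a directory.
# Assumes GNU coreutils stat: with -f, the format sequences are
#   %T = file system type, %s = block size, %S = fundamental block size.
fs_block_info() {
    stat -f -c 'type=%T block_size=%s fundamental_block_size=%S' "${1:-.}"
}

# Report on the current directory (run this inside the mount to be checked).
fs_block_info .
```

Going by the report above, the problematic NFS mount would show a block size of 1048576, while local storage usually shows 4096.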
Currently I am experimenting with changing the rsize and wsize block sizes to a lower
value on the client side.
I tried several values and found another fishy number.
4096 * 255 works! But the old default value is 4096 * 256!
With 255 * 4096 I am not able to crash IDL by reading ncdf files from NFS
storage.
These are the parameters for my NFS test mount:
rw,rsize=1044480,wsize=1044480,nodev,nosuid,exec,auto,nouser,async
If I enter the default value of 1048576, IDL crashes when reading many files
(or one file in a loop of 10K iterations).
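
The arithmetic behind the two values can be checked in the shell. This is a sketch for illustration only: the helper `nfs_opts` and the `server:/export` and `/mnt/test` names in the commented mount line are hypothetical placeholders, and only the rsize/wsize part of the option string is built here.

```shell
# NFS transfer sizes discussed above, as multiples of a 4 KiB block.
block=4096
crashing_size=$((block * 256))   # 1048576, the default that crashed IDL
working_size=$((block * 255))    # 1044480, the value that worked

# Build the rsize/wsize part of a client-side mount option string.
nfs_opts() {
    echo "rw,rsize=$1,wsize=$1"
}

echo "default:    $(nfs_opts "$crashing_size")"
echo "workaround: $(nfs_opts "$working_size")"

# Hypothetical usage; server:/export and /mnt/test are placeholders:
#   mount -t nfs -o "$(nfs_opts "$working_size")" server:/export /mnt/test
```

The remaining options from the test mount (nodev, nosuid, and so on) would be appended to the same comma-separated string.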
To understand this further, someone with more NFS expertise than me is
needed. I am not sure whether it really makes sense to have such big values
as defaults. I also don't understand why this causes a problem for IDL,
because I have not received any problem reports for other languages (python,
fortran, c, c++) reading netCDF files.
Can you please verify whether this workaround helps?
cheers
Reimar