Re: strange behaviour of bytscl by large arrays [message #79953]
Mon, 23 April 2012 10:06
David Fanning
fawltylanguage@gmail.com writes:
> One learns from the others' mistakes :-)
Yes, I understand. Now comes the (usually futile)
process of convincing the "other" that the
mistake is theirs and not yours. My rule of thumb
is that this takes about five times as long (if it can
be done!) as discovering and correcting the mistake
yourself. ;-)
Cheers,
David
--
David Fanning, Ph.D.
Fanning Software Consulting, Inc.
Coyote's Guide to IDL Programming: http://www.idlcoyote.com/
Sepore ma de ni thui. ("Perhaps thou speakest truth.")
Re: strange behaviour of bytscl by large arrays [message #79955 is a reply to message #79954]
Mon, 23 April 2012 09:50
David Fanning
fawltylanguage@gmail.com writes:
> I think IDL's FINDGEN() implementation is wrong: it uses a float counter instead of an integer one. The following test shows the difference:
>
> pro test
> cpu, tpool_nthreads=1
> n=10l^8
> nn=n-1
> a1=findgen(n) ; real FINDGEN()
> a2=fltarr(n)
> count=0.0
> for j=0l, nn do a2[j]=count++ ; IDL's implementation
> a3=fltarr(n)
> count=0ll
> for j=0l, nn do a3[j]=count++ ; better implementation
> print, a1[nn], a2[nn], a3[nn], format='(3F15.3)'
> end
>
> (Multithreading must be disabled because the starting values for the threads are calculated as an integer. So the result of FINDGEN() depends on the number of your CPU cores, too :-)
Wow! The things you learn about IDL when you look into it, huh? ;-)
Cheers,
David
Re: strange behaviour of bytscl by large arrays [message #79956 is a reply to message #79955]
Mon, 23 April 2012 09:18
Lajos Foldy
On Monday, April 23, 2012 6:14:21 PM UTC+2, fawltyl...@gmail.com wrote:
> On Monday, April 23, 2012 4:07:00 PM UTC+2, Klemen wrote:
>> Hi folks,
>>
>> is there any explanation of why I don't get the same or at least similar results using the code below by:
>> a) using DINDGEN in line 3
>> b) using FINDGEN in line 3
>>
>> pro test
>> s = 10000
>> a = sin(findgen(s, s)/100000.)
>> b = bytscl(a)
>> write_tiff, 'b.tif', b
>> end
>>
>> The tif file I get using the DINDGEN function has waves all over the image. The option using FINDGEN produces strange results (a couple of waves and then wide bands of constant values). See the following link for the (resized) results.
>> https://picasaweb.google.com/112572300011512591455/Eumetsat#5734593216558178098
>>
>> I came across this problem as I tried to scale (using HIST_EQUAL and BYTSCL functions) 16-bit 5-band RapidEye data to 24-bit RGB image. Scaling the whole image produced results that were all black, smaller subsets seemed ok.
>>
>> Does anybody have a suggestion how to handle this issue?
>>
>> Cheers, Klemen
>
> I think IDL's FINDGEN() implementation is wrong: it uses a float counter instead of an integer one. The following test shows the difference:
>
> pro test
> cpu, tpool_nthreads=1
> n=10l^8
> nn=n-1
> a1=findgen(n) ; real FINDGEN()
> a2=fltarr(n)
> count=0.0
> for j=0l, nn do a2[j]=count++ ; IDL's implementation
> a3=fltarr(n)
> count=0ll
> for j=0l, nn do a3[j]=count++ ; better implementation
> print, a1[nn], a2[nn], a3[nn], format='(3F15.3)'
> end
>
> (Multithreading must be disabled because the starting values for the threads are calculated as an integer. So the result of FINDGEN() depends on the number of your CPU cores, too :-)
>
> regards,
> Lajos
Oops! The results were missing:
IDL> test
16777216.000 16777216.000 100000000.000
regards,
Lajos
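For context on the value 16777216 in the first two columns: it is 2^24, the point beyond which a 32-bit float can no longer represent every consecutive integer. A minimal sketch, assuming IEEE 754 single precision for IDL's FLOAT type and the default round-to-nearest mode:

```idl
; 2^24 is the last point where consecutive integers are all exact in a FLOAT.
x = 2.0^24                      ; 16777216.0, single precision
print, x, format='(F15.3)'
print, (x - 1.0) + 1.0 eq x     ; 1: 16777215 + 1 is still exact
print, x + 1.0 eq x             ; 1: 16777217 is not representable, rounds back to x
print, x + 1d eq x              ; 0: the same sum carried out in double precision is exact
```

A float counter incremented by 1.0 therefore stops advancing once it reaches 16777216, which is consistent with the a1 and a2 results printed above.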
Re: strange behaviour of bytscl by large arrays [message #79957 is a reply to message #79956]
Mon, 23 April 2012 09:14
Lajos Foldy
On Monday, April 23, 2012 4:07:00 PM UTC+2, Klemen wrote:
> Hi folks,
>
> is there any explanation of why I don't get the same or at least similar results using the code below by:
> a) using DINDGEN in line 3
> b) using FINDGEN in line 3
>
> pro test
> s = 10000
> a = sin(findgen(s, s)/100000.)
> b = bytscl(a)
> write_tiff, 'b.tif', b
> end
>
> The tif file I get using the DINDGEN function has waves all over the image. The option using FINDGEN produces strange results (a couple of waves and then wide bands of constant values). See the following link for the (resized) results.
> https://picasaweb.google.com/112572300011512591455/Eumetsat#5734593216558178098
>
> I came across this problem as I tried to scale (using HIST_EQUAL and BYTSCL functions) 16-bit 5-band RapidEye data to 24-bit RGB image. Scaling the whole image produced results that were all black, smaller subsets seemed ok.
>
> Does anybody have a suggestion how to handle this issue?
>
> Cheers, Klemen
I think IDL's FINDGEN() implementation is wrong: it uses a float counter instead of an integer one. The following test shows the difference:
pro test
cpu, tpool_nthreads=1
n=10l^8
nn=n-1
a1=findgen(n) ; real FINDGEN()
a2=fltarr(n)
count=0.0
for j=0l, nn do a2[j]=count++ ; IDL's implementation
a3=fltarr(n)
count=0ll
for j=0l, nn do a3[j]=count++ ; better implementation
print, a1[nn], a2[nn], a3[nn], format='(3F15.3)'
end
(Multithreading must be disabled because the starting values for the threads are calculated as an integer. So the result of FINDGEN() depends on the number of your CPU cores, too :-)
regards,
Lajos
Re: strange behaviour of bytscl by large arrays [message #79960 is a reply to message #79959]
Mon, 23 April 2012 08:10
David Fanning
Klemen writes:
> I read "the sky is falling" some time ago, and I think I understand how numbers are stored in the computer. Perhaps this is not the best example, but as far as I understand the float type, the first 7 significant figures of a value should be correct. I don't really care about the precision of the 7th figure; I have problems with the 1st figure. Look at this example.
>
> IDL> a=findgen(10000,10000)
> IDL> print, min(a), max(a)
> 0.000000 7.50000e+007 ;a huge error
> IDL> h=histogram(a, BINSIZE=10.^5)
> IDL> print, h
>
> It is not just that the maximum value is far too small; the histogram looks strange too. I don't show the whole histogram output here. As expected, h starts with 100 000 values per bin, but only for the first 167 bins. Bin 168 holds 8 300 000 values, the following 80 or so bins are empty, then comes one with 25 000 000 values, then zeros again, then 25 000 000 again.
>
> So this is not the precision. Is there a point that I am missing, perhaps to do with the mantissa of a number? If I use DINDGEN at the beginning, everything seems OK. Or is the sky falling anyway? :)
I just think you are dealing with numbers that don't
have enough significant digits to do what you want to
do. You are going to have to multiply (or divide, as
the case may be) by a number that restores enough
significant values for you to perform your operation.
Or, alternatively, use double precision arrays on
the numbers you do have.
Cheers,
David
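One way to read the double-precision suggestion, offered as a sketch rather than a tested recipe: build the index in double precision, or from an integer index, so that every element is distinct before SIN and BYTSCL are applied. The array size is reduced here to keep memory use modest (the original example used s = 10000), and the variable names are only illustrative.

```idl
s = 2000                                  ; reduced from 10000 for illustration
a = sin(dindgen(s, s) / 100000d)          ; double-precision index, exact far beyond 2^24
b = bytscl(a)                             ; scale to 0..255 for display
help, a, b

; Alternative that keeps a FLOAT result: generate an integer index and
; convert it. Each element is then the nearest FLOAT to its true index
; instead of the value of a counter that may have stopped advancing.
af = float(lindgen(s, s))
print, max(af) eq float(long(s)*s - 1)
```

The double-precision route costs twice the memory of the FLOAT array, which matters at the 10000 x 10000 size used in the thread.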
Re: strange behaviour of bytscl by large arrays [message #79961 is a reply to message #79960]
Mon, 23 April 2012 07:59
Klemen
Hi David,
I read "the sky is falling" some time ago, and I think I understand how numbers are stored in the computer. Perhaps this is not the best example, but as far as I understand the float type, the first 7 significant figures of a value should be correct. I don't really care about the precision of the 7th figure; I have problems with the 1st figure. Look at this example.
IDL> a=findgen(10000,10000)
IDL> print, min(a), max(a)
0.000000 7.50000e+007 ;a huge error
IDL> h=histogram(a, BINSIZE=10.^5)
IDL> print, h
It is not just that the maximum value is far too small; the histogram looks strange too. I don't show the whole histogram output here. As expected, h starts with 100 000 values per bin, but only for the first 167 bins. Bin 168 holds 8 300 000 values, the following 80 or so bins are empty, then comes one with 25 000 000 values, then zeros again, then 25 000 000 again.
So this is not the precision. Is there a point that I am missing, perhaps to do with the mantissa of a number? If I use DINDGEN at the beginning, everything seems OK. Or is the sky falling anyway? :)
Cheers, Klemen
Re: strange behaviour of bytscl by large arrays [message #80040 is a reply to message #79961]
Tue, 24 April 2012 00:59
Carsten Lechte
On 23/04/12 16:59, Klemen wrote:
> So this is not the precision.[...]
It totally is:
IDL> a=findgen(10000,10000)
IDL> plot, a
IDL> print, a[20000000l], a[20000000l] EQ a[20000000]+1
1.67772e+07 1
Since I tried it on a 2-cpu machine, the second half starts with the
correct value, which then stays the same for the whole second part of
the array:
IDL> print, a[60000000l], a[60000000l] EQ a[60000000]+1
5.00000e+07 1
chl
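The plateaus follow from the spacing of single-precision values. A sketch, assuming (as reported elsewhere in this thread) that each thread starts its chunk at an exactly computed offset and then increments a FLOAT counter by 1.0. The even split of a 10^8-element array into 2 or 4 chunks is an illustrative assumption, not taken from IDL's source:

```idl
; Hypothetical chunk start values for a 10^8-element array.
starts2 = [0.0, 5.0e7]                    ; two threads
starts4 = [0.0, 2.5e7, 5.0e7, 7.5e7]      ; four threads

; A counter is stuck at a value v when v + 1.0 rounds back to v.
print, (starts2 + 1.0) eq starts2         ; 0 1
print, (starts4 + 1.0) eq starts4         ; 0 1 1 1

; The first chunk counts normally until it reaches 2^24, then it sticks too.
print, (2.0^24 + 1.0) eq 2.0^24           ; 1
```

With two chunks the second half stays at 5.0e7, as in the output above; with four chunks the largest value would be 7.5e7.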
Re: strange behaviour of bytscl by large arrays [message #80045 is a reply to message #79953]
Mon, 23 April 2012 12:00
Klemen
Lajos, thank you. This is the same thing I observed, and it's nice to know that somebody else has had a similar experience. You mentioned multithreading: is it possible that the four areas in the example I showed (waves first, then three constant areas) are due to my processor having two cores (four threads)?
David, sorry that I wasn't clear in my first post. My problem is that I have a dataset of 5 × 7523 × 11727 pixels. All the values are long integers and relatively small (no more than 10 000), so the values themselves are not the problem. But converting my array to byte made me really question what the hell is going on: the conversion worked well on a subset but not on the whole image. The same happens when converting large float arrays to byte.
So could it be that some IDL functions don't work properly because they use FINDGEN in their code? Or is it some similar bug? Today I have observed strange behaviour on large arrays with BYTE, BYTSCL, HIST_EQUAL, and also the beloved HISTOGRAM. Working with double-precision data does not seem to be affected.
Cheers, Klemen
Re: strange behaviour of bytscl by large arrays [message #80046 is a reply to message #79957]
Mon, 23 April 2012 13:22
chris_torrence@NOSPAM
On Monday, April 23, 2012 10:14:21 AM UTC-6, fawltyl...@gmail.com wrote:
>
> I think IDL's FINDGEN() implementation is wrong: it uses a float counter instead of an integer one. The following test shows the difference:
>
> pro test
> cpu, tpool_nthreads=1
> n=10l^8
> nn=n-1
> a1=findgen(n) ; real FINDGEN()
> a2=fltarr(n)
> count=0.0
> for j=0l, nn do a2[j]=count++ ; IDL's implementation
> a3=fltarr(n)
> count=0ll
> for j=0l, nn do a3[j]=count++ ; better implementation
> print, a1[nn], a2[nn], a3[nn], format='(3F15.3)'
> end
>
> (Multithreading must be disabled because the starting values for the threads are calculated as an integer. So the result of FINDGEN() depends on the number of your CPU cores, too :-)
>
> regards,
> Lajos
Well, wrong is perhaps too strong of a word. The real word is "fast". I just did a test where I changed the internal implementation of FINDGEN to use an integer counter. The "float" counter is 4 times faster than using an integer counter and converting it to floats.
However, perhaps we could look at the size of the input array, and switch to using the slower integer counter if it was absolutely necessary. I'll give it a thought.
Thanks for reporting this!
Cheers,
Chris
Exelis VIS
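The size-based switch described above could look roughly like the following at user level. This is a hypothetical wrapper (the name safe_findgen is made up for illustration) and says nothing about how FINDGEN is or will be implemented internally:

```idl
; Hypothetical wrapper, not part of IDL: use the fast FINDGEN while every
; index is exactly representable as a FLOAT, otherwise convert an integer
; index so each element is the nearest FLOAT to its true position.
function safe_findgen, n
  compile_opt idl2
  if n le 2LL^24 then return, findgen(n)
  return, float(l64indgen(n))
end
```

Called as `a = safe_findgen(10L^8)`, the slow path temporarily needs a 64-bit integer array as well as the float result, so it trades memory and speed for correct values.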
Re: strange behaviour of bytscl by large arrays [message #80048 is a reply to message #79953]
Mon, 23 April 2012 13:54
Klemen
Hmm, today I don't just have the usual problems with IDL, but also with this post... I would have assumed that everything is online once you press post. Anyway, here is the short version once again.
Lajos, thank you very much. It's nice to know that I am not the only one seeing a miracle. :) Although the sky is not falling. :) And what I find really interesting is that you also observed the correlation with the number of CPUs.
David, sorry that I was not clear in the first post. The thing is that BYTSCL and HIST_EQUAL have some difficulties when processing large arrays. Using Lajos's example above, see the output of the histogram.
pro test
cpu, tpool_nthreads=1
n=10l^8
nn=n-1
a3=fltarr(n)
count=0ll
for j=0l, nn do a3[j]=count++ ; better implementation
h = histogram(a3, binsize=10L^6)
print, h
b = byte(a3)
h = histogram(b)
print, h
end
IDL prints:
1000000 1000000 1000000 1000000 1000000 1000000 1000000
1000000 1000000 1000000 1000000 1000000 1000000 1000000
1000000 1000000 999999 1000000 1000000 1000000 1000000
1000000 1000000 1000000 1000000 1000000 1000000 1000000
1000000 1000000 1000000 1000000 1000000 999999 1000000
1000000 1000000 1000000 1000000 1000000 1000000 1000000
1000000 1000000 1000000 1000000 1000000 1000000 1000000
1000000 1000000 1000000 1000000 1000000 1000000 1000000
1000000 1000000 1000000 1000000 1000000 1000000 1000000
1000000 1000000 1000000 1000000 999998 1000000 1000000
1000000 1000000 1000000 1000000 1000000 1000000 1000000
1000000 1000000 1000000 1000000 1000000 1000000 1000000
1000000 1000000 1000000 1000000 1000000 1000000 1000000
1000000 1000000 1000000 1000000 1000000 1000000 1000000
1000000 1000000 4
2073833 65536 131072 65536 655360 65536 131072
65536 1816871 65536 131072 65536 655360 65536
131072 65536 2073833 65536 131072 65536 655360
65536 131072 65536 1816871 65536 131072 65536
655360 65536 131072 65536 2073833 65536 131072
65536 655360 65536 131072 65536 1816871 65536
131072 65536 655360 65536 131072 65536 2073833
65536 131072 65536 655360 65536 131072 65536
1816871 65536 131072 65536 655360 65536 131072
65536 2073833 65536 131072 65536 655360 65536
131072 65536 1816871 65536 131072 65536 655360
65536 131072 65536 2073833 65536 131072 65536
655360 65536 131072 65536 1816871 65536 131072
65536 655360 65536 131072 65536 2073833 65536
131072 65536 655360 65536 131072 65536 1816871
65536 131072 65536 655360 65536 131072 65536
2073833 65536 131072 65536 655360 65536 131072
65536 1816871 65536 131072 65536 655360 65536
131072 65536 2073833 65536 131072 65536 655360
65536 131072 65536 1816871 65536 131072 65536
655360 65536 131072 65536 2073833 65536 131072
65536 655360 65536 131072 65536 1816871 65536
131072 65536 655360 65536 131072 65536 2073833
65536 131072 65536 655360 65536 131072 65536
1816871 65536 131072 65536 655360 65536 131072
65536 2073833 65536 131072 65536 655360 65536
131072 65536 1816871 65536 131072 65536 655360
65536 131072 65536 2073833 65536 131072 65536
655360 65536 131072 65536 1816871 65536 131072
65536 655360 65536 131072 65536 2073833 65536
131072 65536 655360 65536 131072 65536 1816871
65536 131072 65536 655360 65536 131072 65536
2073833 65536 131072 65536 655360 65536 131072
65536 1816871 65536 131072 65536 655360 65536
131072 65536 2073833 65536 131072 65536 655360
65536 131072 65536 1816871 65536 131072 65536
655360 65536 131072 65536
Isn't that strange? After scaling to BYTE, each histogram bin should contain the same number of values, just like the original histogram. This is not a problem if I set n to 10^7.
Cheers, Klemen
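A possible reading of the second histogram, offered as an assumption rather than a confirmed diagnosis: BYTE() only converts the type and does not rescale, and above 2^24 the FLOAT values are restricted to even numbers (then multiples of 4, then 8), so their low-order bits cannot be evenly distributed. BYTSCL is the routine that maps a data range onto 0..255. A small sketch contrasting the two on an array short enough that every index is exact; how BYTE() treats out-of-range floats may be platform dependent:

```idl
n  = 10L^6                                   ; below 2^24, so every index is exact
a  = float(lindgen(n))
b1 = byte(a)                                 ; type conversion only, no rescaling
b2 = bytscl(a, min=0.0, max=float(n - 1))    ; maps [min, max] onto 0..255
h1 = histogram(b1, min=0, max=255)
h2 = histogram(b2, min=0, max=255)
print, min(h1), max(h1)                      ; spread of bin counts after BYTE()
print, min(h2), max(h2)                      ; spread of bin counts after BYTSCL
```

Comparing the two spreads for n below and above 2^24 should show whether the uneven bins come from the conversion or from the float values themselves.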