Maximum/minimum representable integers.
The maximum representable integer is the largest integer "i" for which "i+1>i" holds true. Using the "while" loop, determine your maximum integer and compare it with "int.MaxValue".
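Something like
    int i=1; while(i+1>i) {i++;}      // i+1 wraps around to int.MinValue once i reaches int.MaxValue
    Write("my max int = {0}\n",i);    // Write is System.Console.Write, via "using static System.Console;"
It can take a few seconds to calculate. The loop terminates because integer arithmetic in C# is unchecked by default, so "i+1" silently wraps around instead of throwing an exception. The result should be equal to 2³¹-1=2147483647.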
The minimum representable integer is the most negative integer "i" for which "i-1<i" holds true. Using the "while" loop, determine your minimum integer and compare it with "int.MinValue".
It should be equal to -2³¹=-2147483648, since "int" in C# is always the 32-bit two's-complement type System.Int32.
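The minimum can be searched for with the same kind of loop as the maximum, only counting downward. Below is a minimal sketch; the class name "IntLimits" and the optional "start" argument (handy for starting close to the limit instead of waiting a few seconds) are hypothetical additions, not part of the exercise. It relies on C# integer arithmetic wrapping around on overflow; the explicit "unchecked" keeps that behaviour even if the project is compiled with overflow checking turned on.

```csharp
using System;

// Hypothetical helper class: a sketch of the "while" loop search for the minimum.
public static class IntLimits
{
    // Counts down until i-1 wraps around to int.MaxValue; at that point
    // i-1<i is false and i is the most negative representable int.
    // "start" is a hypothetical parameter: starting near the limit only saves time.
    public static int FindMinInt(int start = -1)
    {
        int i = start;
        while (unchecked(i - 1) < i) { i--; }
        return i;
    }

    public static void Main()
    {
        int min = FindMinInt();   // full search from -1, takes a few seconds
        Console.WriteLine($"my min int   = {min}");
        Console.WriteLine($"int.MinValue = {int.MinValue}");
        Console.WriteLine($"equal ? {min == int.MinValue}");
    }
}
```

The body "i--" never overflows, because it only runs while "i-1<i", that is, while "i" is not yet the minimum.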
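The machine epsilon is the difference between 1.0 and the next representable floating-point number. Using the "while" loop, calculate the machine epsilon for the types "float" and "double".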
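Something like
    double x=1; while(1+x!=1){x/=2;} x*=2;                 // ends with x = smallest power of two such that 1+x != 1
    float y=1F; while((float)(1F+y) != 1F){y/=2F;} y*=2F;  // the (float) cast forces rounding to float precision
There seem to be no predefined constants for these numbers in C# (I couldn't find any, in any case; "double.Epsilon" and "float.Epsilon" are the smallest positive subnormal numbers, about 4.9e-324 and 1.4e-45, not the machine epsilon). However, an IEEE 754 64-bit floating-point number (double) reserves 1 bit for the sign and 11 bits for the exponent, leaving 52 bits for the fraction; therefore the double machine epsilon should be "System.Math.Pow(2,-52)" (about 2.22e-16). For single precision ("float": 1 sign bit, 8 exponent bits, 23 fraction bits) the machine epsilon should be "System.Math.Pow(2,-23)" (about 1.19e-7).
Check this.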
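The two loops can be checked against "System.Math.Pow(2,-52)" and "System.Math.Pow(2,-23)" in a few lines. The sketch below wraps the loops from the text into methods of a hypothetical "EpsilonCheck" class. As an independent cross-check it also uses "Math.BitIncrement" and "MathF.BitIncrement", which return the next representable value above their argument; these assume .NET Core 3.0 or later.

```csharp
using System;

// Hypothetical class: checks the machine-epsilon loops against 2^-52 and 2^-23.
// Assumes .NET Core 3.0 or later for Math.BitIncrement / MathF.BitIncrement.
public static class EpsilonCheck
{
    // Same loop as in the text: halve x until 1+x rounds to 1, then undo the last halving.
    public static double DoubleEpsilon()
    {
        double x = 1;
        while (1 + x != 1) { x /= 2; }
        return x * 2;
    }

    // Float version; the (float) cast forces the sum to be rounded to float precision.
    public static float FloatEpsilon()
    {
        float y = 1F;
        while ((float)(1F + y) != 1F) { y /= 2F; }
        return y * 2F;
    }

    public static void Main()
    {
        double de = DoubleEpsilon();
        float fe = FloatEpsilon();
        Console.WriteLine($"double: loop {de:e}, 2^-52 {Math.Pow(2, -52):e}, equal ? {de == Math.Pow(2, -52)}");
        Console.WriteLine($"float:  loop {fe:e}, 2^-23 {Math.Pow(2, -23):e}, equal ? {fe == Math.Pow(2, -23)}");
        // Distance from 1 to the next representable number, computed directly.
        Console.WriteLine($"Math.BitIncrement(1.0)-1.0 = {Math.BitIncrement(1.0) - 1.0:e}");
        Console.WriteLine($"MathF.BitIncrement(1f)-1f  = {MathF.BitIncrement(1f) - 1f:e}");
    }
}
```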
Suppose "tiny=epsilon/2". Calculate the two values
    a=1+tiny+tiny; b=tiny+tiny+1;
which should seemingly be the same, and check whether "a==b", "a>1", and "b>1". Something like
    // assumes "using static System.Math;" (Pow) and "using static System.Console;" (Write)
    double epsilon=Pow(2,-52); double tiny=epsilon/2;
    double a=1+tiny+tiny; double b=tiny+tiny+1;
    Write($"a==b ? {a==b}\n"); Write($"a>1 ? {a>1}\n"); Write($"b>1 ? {b>1}\n");
Explain the results.
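To see where the two sums part ways, it can help to evaluate them one addition at a time. In C# the "+" operator is left-associative, so "1+tiny+tiny" means "(1+tiny)+tiny" and "tiny+tiny+1" means "(tiny+tiny)+1". A sketch along these lines, where the class "TinyDemo" and the method "Sums" are hypothetical names:

```csharp
using System;

// Hypothetical demo class: evaluates the two sums from the text step by step.
public static class TinyDemo
{
    public static (double a, double b) Sums(double tiny)
    {
        double a1 = 1 + tiny;      // first partial sum of a = 1+tiny+tiny
        double a  = a1 + tiny;
        double b1 = tiny + tiny;   // first partial sum of b = tiny+tiny+1
        double b  = b1 + 1;
        return (a, b);
    }

    public static void Main()
    {
        double epsilon = Math.Pow(2, -52);
        double tiny = epsilon / 2;
        Console.WriteLine($"1+tiny == 1 ? {1 + tiny == 1}");
        Console.WriteLine($"tiny+tiny == epsilon ? {tiny + tiny == epsilon}");
        var (a, b) = Sums(tiny);
        Console.WriteLine($"a-1 = {a - 1:e}, b-1 = {b - 1:e}");
        Console.WriteLine($"a==b ? {a == b}, a>1 ? {a > 1}, b>1 ? {b > 1}");
    }
}
```

Comparing the partial sums with 1 and with "epsilon" is usually enough to explain the three results.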
Consider the two doubles
    double d1 = 0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1; double d2 = 8*0.1;
Both "d1" and "d2" should be equal to 0.8, and the "==" operator should then produce the "true" result. However, try
    WriteLine($"d1={d1:e15}"); WriteLine($"d2={d2:e15}"); WriteLine($"d1==d2 ? => {d1==d2}");
and see that this is not the case (not on my box, in any case). That is because the decimal number 0.1 has no finite binary representation (0.000110011…₂): it is stored rounded to 53 significant bits (the 52 fraction bits plus the implicit leading 1), and the rounding errors accumulate differently in "d1" and "d2".
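One way to see how far apart "d1" and "d2" actually are is to look at their bit patterns: for two finite doubles of the same sign, the difference of the integers returned by "BitConverter.DoubleToInt64Bits" counts how many representable doubles separate them. A sketch, with the hypothetical class "DoubleBits" and helper "UlpDistance":

```csharp
using System;

// Hypothetical class for inspecting how far apart two doubles are.
public static class DoubleBits
{
    // Number of representable steps between x and y, read off the IEEE 754 bit patterns.
    // Only meaningful for finite values of the same sign.
    public static long UlpDistance(double x, double y) =>
        Math.Abs(BitConverter.DoubleToInt64Bits(x) - BitConverter.DoubleToInt64Bits(y));

    public static void Main()
    {
        double d1 = 0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1;
        double d2 = 8*0.1;
        // "R" prints a string that round-trips to the same double.
        Console.WriteLine($"d1 = {d1:R}, d2 = {d2:R}");
        Console.WriteLine($"bits: {BitConverter.DoubleToInt64Bits(d1):X16} vs {BitConverter.DoubleToInt64Bits(d2):X16}");
        Console.WriteLine($"d1 and d2 are {UlpDistance(d1, d2)} representable step(s) apart");
    }
}
```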
For this reason, one needs a more careful comparison algorithm. Two doubles in a finite-digit representation can only be compared to within a given absolute and/or relative precision (the appropriate precision values depend on the task at hand and generally must be supplied by the user).
Implement a function
    bool approx(double a, double b, double acc=1e-9, double eps=1e-9)
that returns "true" if the numbers "a" and "b" are equal either with absolute precision "acc",
    |a-b| ≤ acc,
or with relative precision "eps",
    |a-b|/Max(|a|,|b|) ≤ eps,
and returns "false" otherwise.
Something like
    // assumes "using static System.Math;" (Abs, Max)
    public static bool approx(double a, double b, double acc=1e-9, double eps=1e-9){
        if(Abs(b-a) <= acc) return true;                      // equal within absolute precision
        if(Abs(b-a) <= Max(Abs(a),Abs(b))*eps) return true;  // equal within relative precision
        return false;
    }
Compare our "d1" and "d2" from above with your approx function.
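A complete program for that comparison might look like the following sketch. "ApproxDemo" is a hypothetical class name, "approx" is the function from above, and "using static System.Math" supplies "Abs" and "Max". The last call passes a deliberately tight relative precision to show that the answer depends on the tolerance the user supplies.

```csharp
using System;
using static System.Math;

// Hypothetical demo class comparing d1 and d2 with "==" and with approx.
public static class ApproxDemo
{
    // approx from the text: equal within absolute precision acc
    // or within relative precision eps.
    public static bool approx(double a, double b, double acc = 1e-9, double eps = 1e-9)
    {
        if (Abs(b - a) <= acc) return true;
        if (Abs(b - a) <= Max(Abs(a), Abs(b)) * eps) return true;
        return false;
    }

    public static void Main()
    {
        double d1 = 0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1;
        double d2 = 8*0.1;
        Console.WriteLine($"d1==d2 ? {d1 == d2}");
        Console.WriteLine($"approx(d1,d2) ? {approx(d1, d2)}");
        // With no absolute tolerance and a very tight relative one, the difference shows up again.
        Console.WriteLine($"approx(d1,d2,0,1e-17) ? {approx(d1, d2, 0, 1e-17)}");
    }
}
```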