
Exercise "machine epsilon"

Tasks

  1. Maximum/minimum representable integers.
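
    The task statement above is terse. One possible reading, offered here only as an assumption, is to find the largest and the smallest "int" with a "while" loop and compare them with the predefined "int.MaxValue" and "int.MinValue". A minimal sketch follows. The names "IntLimits", "MaxInt" and "MinInt" are made up for illustration, and each loop runs about two billion iterations, so expect it to take a few seconds.

```csharp
using System;

public static class IntLimits {
    // Largest int: keep incrementing while i+1 is still greater than i.
    // unchecked(...) makes the wrap-around explicit: int.MaxValue+1 becomes int.MinValue.
    public static int MaxInt() {
        int i = 1;
        while (unchecked(i + 1) > i) { i++; }
        return i;
    }

    // Smallest int: keep decrementing while i-1 is still smaller than i.
    public static int MinInt() {
        int i = -1;
        while (unchecked(i - 1) < i) { i--; }
        return i;
    }

    public static void Main() {
        int max = MaxInt();
        int min = MinInt();
        Console.WriteLine($"my max int = {max}, int.MaxValue = {int.MaxValue}");
        Console.WriteLine($"my min int = {min}, int.MinValue = {int.MinValue}");
    }
}
```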

  2. The machine epsilon is the difference between 1.0 and the next representable floating-point number. Using a "while" loop, calculate the machine epsilon for the types "float" and "double". Something like
    double x=1; while(1+x!=1){x/=2;} x*=2;
    float y=1F; while((float)(1F+y) != 1F){y/=2F;} y*=2F;
    
    There seem to be no predefined values for these numbers in C# (I couldn't find any, in any case; note that "double.Epsilon" is something else, namely the smallest positive subnormal double). However, in an IEEE 64-bit floating-point number (double), where 1 bit is reserved for the sign and 11 bits for the exponent, there are 52 bits remaining for the fraction, therefore the double machine epsilon must be System.Math.Pow(2,-52). For single precision (float), with 23 fraction bits, the machine epsilon should be System.Math.Pow(2,-23). Check this.
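
    One way to carry out this check is sketched below: the two loops are wrapped in functions and their results are compared with the corresponding powers of two. The class and method names ("EpsilonCheck", "DoubleEpsilon", "FloatEpsilon") are made up for illustration, and the sketch assumes that double arithmetic is done in plain 64-bit IEEE precision, as is usual on 64-bit systems.

```csharp
using System;

public static class EpsilonCheck {
    // Halve x until adding it to 1 no longer changes the sum, then step back once.
    public static double DoubleEpsilon() {
        double x = 1;
        while (1 + x != 1) { x /= 2; }
        return x * 2;
    }

    // Same loop for float; the cast forces the sum to be rounded to single precision.
    public static float FloatEpsilon() {
        float y = 1F;
        while ((float)(1F + y) != 1F) { y /= 2F; }
        return y * 2F;
    }

    public static void Main() {
        double de = DoubleEpsilon();
        double dref = Math.Pow(2, -52);
        float fe = FloatEpsilon();
        float fref = (float)Math.Pow(2, -23);
        Console.WriteLine($"double: loop={de:e15} Pow(2,-52)={dref:e15} equal? {de == dref}");
        Console.WriteLine($"float : loop={fe:e7} Pow(2,-23)={fref:e7} equal? {fe == fref}");
    }
}
```
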
  3. Suppose "tiny=epsilon/2". Calculate the two values,

    a=1+tiny+tiny;
    b=tiny+tiny+1;
    
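    which should seemingly be the same, and check whether "a==b", "a>1", and "b>1" hold. Something like (with "using static System.Math;" and "using static System.Console;" in scope, so that "Pow" and "Write" can be called unqualified),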
    double epsilon=Pow(2,-52);
    double tiny=epsilon/2;
    double a=1+tiny+tiny;
    double b=tiny+tiny+1;
    Write($"a==b ? {a==b}\n");
    Write($"a>1  ? {a>1}\n");
    Write($"b>1  ? {b>1}\n");
    
    
    Explain the results.
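
    As a starting point for the explanation, here is a self-contained sketch of the snippet above that also prints the intermediate sums. The parentheses only spell out the left-to-right order in which C# evaluates "1+tiny+tiny" and "tiny+tiny+1" anyway. The names "TinyCheck", "A" and "B" are made up for illustration.

```csharp
using System;
using static System.Console;
using static System.Math;

public static class TinyCheck {
    public static readonly double epsilon = Pow(2, -52);
    public static readonly double tiny = epsilon / 2;

    // a = 1+tiny+tiny is evaluated as (1+tiny)+tiny
    public static double A() { return (1 + tiny) + tiny; }

    // b = tiny+tiny+1 is evaluated as (tiny+tiny)+1
    public static double B() { return (tiny + tiny) + 1; }

    public static void Main() {
        double firstStepA = 1 + tiny;     // intermediate result on the way to a
        double firstStepB = tiny + tiny;  // intermediate result on the way to b
        WriteLine($"1+tiny    = {firstStepA:G17}");
        WriteLine($"tiny+tiny = {firstStepB:G17} (epsilon = {epsilon:G17})");
        double a = A();
        double b = B();
        WriteLine($"a==b ? {a == b}");
        WriteLine($"a>1  ? {a > 1}");
        WriteLine($"b>1  ? {b > 1}");
    }
}
```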

  4. Comparing doubles: introduction

    The equality operator "==" works well on integer types but is not very useful on floating-point types. Indeed, most real numbers, including simple decimal fractions like 0.1, have no exact representation as a double; they must be rounded to be stored. Because of this rounding, comparing two doubles with the "==" operator often produces an unexpected result. For example, in this code
    double d1 = 0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1;
    double d2 = 8*0.1; 
    both doubles "d1" and "d2" should be equal to 0.8, and the "==" operator should then produce "true". However, try
    WriteLine($"d1={d1:e15}");
    WriteLine($"d2={d2:e15}");
    WriteLine($"d1==d2 ? => {d1==d2}"); 
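    and see that this is not the case (not on my box, in any case). That is because the decimal number 0.1 cannot be represented exactly with a finite number of binary digits, let alone with the 52 fraction bits of a double, so the repeated additions in "d1" and the single multiplication in "d2" can round differently.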

    For this reason, one needs a more complex comparison algorithm. Two doubles in a finite-digit representation can only be compared to within a given absolute and/or relative precision (where the values for the precision depend on the task at hand and generally must be supplied by the user).

    Comparing doubles: task

    Therefore, implement a function with the signature
    bool approx(double a, double b, double acc=1e-9, double eps=1e-9)
    
    that returns "true" if the numbers "a" and "b" are equal either with absolute precision "acc",
    |a-b| ≤ acc
    
    or with relative precision "eps",
    |a-b|/Max(|a|,|b|) ≤ eps
    
    and returns "false" otherwise. Something like
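    public static bool approx
    (double a, double b, double acc=1e-9, double eps=1e-9){
    	// Abs and Max are System.Math methods (assumes "using static System.Math;")
    	if(Abs(b-a) <= acc) return true;                    // absolute precision
    	if(Abs(b-a) <= Max(Abs(a),Abs(b))*eps) return true; // relative precision, written without the division
    	return false;
    }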
    
    Compare our "d1" and "d2" from above with your approx function.
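
    A minimal self-contained sketch of this comparison is given below. The class name "ApproxDemo" and the two extra test pairs are made up for illustration; the extra pairs are there only to exercise the relative-precision branch and the "false" branch.

```csharp
using System;
using static System.Console;
using static System.Math;

public static class ApproxDemo {
    public static bool approx
    (double a, double b, double acc = 1e-9, double eps = 1e-9) {
        if (Abs(b - a) <= acc) return true;                       // absolute precision
        if (Abs(b - a) <= Max(Abs(a), Abs(b)) * eps) return true; // relative precision
        return false;
    }

    public static void Main() {
        double d1 = 0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1;
        double d2 = 8*0.1;
        WriteLine($"d1={d1:e15}");
        WriteLine($"d2={d2:e15}");
        WriteLine($"d1==d2        ? {d1 == d2}");
        WriteLine($"approx(d1,d2) ? {approx(d1, d2)}");

        // Large numbers: the absolute difference is huge, the relative one is small.
        WriteLine($"approx(1e20,1e20+1e5) ? {approx(1e20, 1e20 + 1e5)}");
        // Numbers that differ in the third decimal place fail both tests.
        WriteLine($"approx(1,1.001)       ? {approx(1.0, 1.001)}");
    }
}
```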