@database beginner.guide
@Master beginner
@Width 75

This is the AmigaGuide® file beginner.guide, produced by Makeinfo-1.55 from
the input file beginner.

@NODE "main" "Floating-Point Numbers"
@Next "Recursion.guide/main"
@Prev "Memory.guide/main"
@Toc "Contents.guide/main"

Floating-Point Numbers
**********************

@{fg shine }Floating-point@{fg text } or @{fg shine }real@{fg text } numbers can be used to represent both very
small fractions and very large numbers. However, unlike a @{b }LONG@{ub }, which can
hold every integer in a certain range (see @{"Variable types" Link "Introduction.guide/Variable types" }),
floating-point numbers have limited @{fg shine }accuracy@{fg text }. Be warned, though: using
floating-point arithmetic in E is quite complicated and most problems can
be solved without using floating-point numbers, so you may wish to skip
this chapter until you really need to use them.

 @{" Floating-Point Values " Link "Floating-Point Values" }
 @{" Floating-Point Calculations " Link "Floating-Point Calculations" }
 @{" Floating-Point Functions " Link "Floating-Point Functions" }
 @{" Accuracy and Range " Link "Accuracy and Range" }

@ENDNODE

@NODE "Floating-Point Values" "Floating-Point Values"
@Next "Floating-Point Calculations"
@Toc "main"

Floating-Point Values
=====================

Floating-point values in E are written just like you might expect and are
stored in @{b }LONG@{ub } variables:

     DEF x
     x:=3.75
     x:=-0.0000367
     x:=275.0

You must remember to use a decimal point (without any spaces around it) in
the number if you want it to be considered a floating-point number, and
this is why a trailing @{b }.0@{ub } was used on the number in the last assignment.
At present you can't express every floating-point value in this way; the
compiler may complain that the value does not fit in 32 bits if you try to
use more than about nine digits in a single number.
You can, however, use the various floating-point maths functions to
calculate any value you want (see @{"Floating-Point Functions" Link "Floating-Point Functions" }).

@ENDNODE

@NODE "Floating-Point Calculations" "Floating-Point Calculations"
@Next "Floating-Point Functions"
@Prev "Floating-Point Values"
@Toc "main"

Floating-Point Calculations
===========================

Since a floating-point number is stored in a @{b }LONG@{ub } variable it would
normally be interpreted as an integer, and this interpretation will
generally not give a number anything like the intended floating-point
number. To use floating-point numbers in expressions you must use the
(rather complicated) floating-point conversion operator, which is the @{b }!@{ub }
character. This converts expressions and the normal maths and comparison
operators to and from floating-point.

All expressions are, by default, integer expressions. That is, they
represent @{b }LONG@{ub } integer values, rather than floating-point values. The
first time a @{b }!@{ub } occurs in an expression the value of the expression so far
is converted to floating-point and all the operators and variables after
this point are considered floating-point. The next time it occurs the
(floating-point) value of the expression so far is converted to an
integer, and the following operators and variables are considered integer
again. You can use @{b }!@{ub } as often as necessary within an expression. Parts of
an expression in parentheses are treated as separate expressions, so are,
by default, integer expressions (this includes function call arguments).

The integer/floating-point conversions performed by @{b }!@{ub } are not simple.
They involve rounding and also bounding. Conversion, for example, from
integer to floating-point and back again will not always result in the
original integer value.

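As a rough illustration of that last point, the sketch below converts an
integer to floating-point and back again. The variable names and the value
123456789 are chosen purely for illustration. Since a floating-point number
holds only about eight significant digits, @{b }j@{ub } is not expected to come out
equal to @{b }i@{ub }, although the exact result depends on the rounding performed
by the conversions.

```e
PROC main()
  DEF i, f, j
  i:=123456789   -> an integer with nine significant digits
  f:=i!          -> integer i converted to floating-point
  j:=!f!         -> floating-point f converted back to integer
  WriteF('i is \\d, j is \\d\\n', i, j)
ENDPROC
```

A small value such as 10 would be expected to survive the same round trip
unchanged, which is why problems of this kind can go unnoticed in testing.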
Here are a few commented examples, where @{b }f@{ub } always holds a floating-point
number, and @{b }i@{ub } and @{b }j@{ub } always hold integers:

     DEF f, i, j
     i:=1
     f:=1.0
     f:=i!    -> i converted to floating-point (1.0)
     f:=6.2
     i:=!f!   -> the expression f is floating-point,
              -> then converted to integer (6)

In the first assignment, the integer value one is assigned to @{b }i@{ub }. In the
second, the floating-point value 1.0 is assigned to @{b }f@{ub }. The expression on
the right-hand side of the third assignment is considered to be an integer
until the @{b }!@{ub } is met, at which point it is converted to the nearest
floating-point value. So, @{b }f@{ub } is assigned the floating-point value of one
(i.e., 1.0), just like it is by the second assignment. The expression in
the final assignment needs to start off as floating-point in order to
interpret the value stored in @{b }f@{ub } as floating-point. The expression
finishes by converting back to integer. The overall result is to turn the
floating-point value of @{b }f@{ub } into the nearest integer (in this case, six).

The assignments below are more complicated, but should be straightforward
to follow. Again, @{b }f@{ub } always holds a floating-point number, and @{b }i@{ub } and @{b }j@{ub }
always hold integers.

     f:=!f*f        -> the whole expression is floating-point,
                    -> and f is squared (6.2*6.2)
     f:=!f*(i!)     -> the whole expression is floating-point,
                    -> i is converted to floating-point and
                    -> multiplied by f
     j:=!f/(i!)!    -> the whole division is floating-point,
                    -> with the result converted to integer
     j:=!f!/i       -> floating-point f is converted to integer
                    -> and is (integer) divided by i
     IF !f<230.0 THEN RETURN 0  -> floating-point comparison <
     IF !f>(i!) THEN RETURN 0   -> i converted to floating-point,
                                -> then compared to f

If the @{b }!@{ub } were omitted from the first assignment, then not only would the
value in @{b }f@{ub } be interpreted (incorrectly) as integer, but the
multiplication performed would be integer multiplication, rather than
floating-point.

In the second assignment, the parentheses around the expression involving
@{b }i@{ub } are crucial. Without the parentheses the value stored in @{b }i@{ub } would be
interpreted as floating-point. This would be wrong because @{b }i@{ub } actually
stores an integer value, so parentheses are used to start a new expression
(which defaults to being integer). The value of @{b }i@{ub } is then interpreted
correctly, and finally converted to floating-point (by the @{b }!@{ub } just before
the closing parenthesis). The (floating-point) multiplication then takes
place with two floating-point values, and the result is stored in @{b }f@{ub }.

In the last two assignments (using division), @{b }j@{ub } is assigned roughly the
same value. However, the expression in the first of these allows for
greater accuracy, since it uses floating-point division. This means the
result will be rounded, whereas it is truncated when integer division is
used.

One important thing to know about floating-point numbers in E is that the
following assignments store the same value in @{b }g@{ub } (again, @{b }f@{ub } stores a
floating-point number). This is because no computation is performed and no
conversion happens: the value in @{b }f@{ub } is simply copied to @{b }g@{ub }. This is
especially important for function calls, as we shall see in the next
section. Strictly speaking, however, the second version is better, since
it shows (to the reader of the code) that the value in @{b }f@{ub } is meant to be
floating-point.

     g:=f
     g:=!f

@ENDNODE

@NODE "Floating-Point Functions" "Floating-Point Functions"
@Next "Accuracy and Range"
@Prev "Floating-Point Calculations"
@Toc "main"

Floating-Point Functions
========================

There are functions for formatting floating-point numbers to E-strings (so
that they can be printed) and for decoding floating-point numbers from
strings. There are also a number of built-in, floating-point functions
which compute some of the less common mathematical functions, such as the
various trigonometric functions.

@{b }RealVal(@{ub }@{fg shine }string@{fg text }@{b })@{ub }
     This works in a similar way to @{b }Val@{ub } for extracting integers from a
     string. The decoded floating-point value is returned as the regular
     return value, and the number of characters of @{fg shine }string@{fg text } that were read
     to make the number is returned as the first optional return value.
     If a floating-point value could not be decoded from the string then
     zero is returned as the optional return value and the regular return
     value will be zero (i.e., 0.0).

@{b }RealF(@{ub }@{fg shine }e-string@{fg text }@{b },@{ub }@{fg shine }float@{fg text }@{b },@{ub }@{fg shine }digits@{fg text }@{b })@{ub }
     Converts the floating-point value @{fg shine }float@{fg text } into a string which is
     stored in @{fg shine }e-string@{fg text }. The number of digits to use after the decimal
     point is specified by @{fg shine }digits@{fg text }, which can be zero to eight. The
     floating-point value is rounded to the specified number of digits.
     A value of zero for @{fg shine }digits@{fg text } gives a result with no fractional part
     and no decimal point. The @{fg shine }e-string@{fg text } is returned by this function,
     and this makes it easy to use with @{b }WriteF@{ub }.

          PROC main()
            DEF s[20]:STRING, f, i
            f:=21.60539
            FOR i:=0 TO 8
              WriteF('f is \\s (using digits=\\d)\\n', RealF(s, f, i), i)
            ENDFOR
          ENDPROC

     Notice that the floating-point argument, @{b }f@{ub }, to @{b }RealF@{ub } does not need a
     leading @{b }!@{ub } because we are simply passing its value and not performing
     a computation with it. The program should generate the following
     output:

          f is 22 (using digits=0)
          f is 21.6 (using digits=1)
          f is 21.61 (using digits=2)
          f is 21.605 (using digits=3)
          f is 21.6054 (using digits=4)
          f is 21.60539 (using digits=5)
          f is 21.605390 (using digits=6)
          f is 21.6053900 (using digits=7)
          f is 21.60539000 (using digits=8)

@{b }Fsin(@{ub }@{fg shine }float@{fg text }@{b })@{ub }, @{b }Fcos(@{ub }@{fg shine }float@{fg text }@{b })@{ub }, @{b }Ftan(@{ub }@{fg shine }float@{fg text }@{b })@{ub }
     These compute the sine, cosine and tangent (respectively) of the
     supplied @{fg shine }float@{fg text } angle, which is specified in radians.

@{b }Fabs(@{ub }@{fg shine }float@{fg text }@{b })@{ub }
     Returns the absolute value of @{fg shine }float@{fg text }, much like @{b }Abs@{ub } does for
     integers.

@{b }Ffloor(@{ub }@{fg shine }float@{fg text }@{b })@{ub }, @{b }Fceil(@{ub }@{fg shine }float@{fg text }@{b })@{ub }
     The @{b }Ffloor@{ub } function rounds a floating-point value down to the
     nearest, whole floating-point value. The @{b }Fceil@{ub } function rounds it
     up.

@{b }Fsqrt(@{ub }@{fg shine }float@{fg text }@{b })@{ub }
     Returns the square root of @{fg shine }float@{fg text }.

@{b }Fpow(@{ub }@{fg shine }x@{fg text }@{b },@{ub }@{fg shine }y@{fg text }@{b })@{ub }, @{b }Fexp(@{ub }@{fg shine }float@{fg text }@{b })@{ub }
     The @{b }Fpow@{ub } function returns the value of @{fg shine }x@{fg text } raised to the power of @{fg shine }y@{fg text }
     (which are both floating-point values). The @{b }Fexp@{ub } function returns
     the value of e raised to the power of @{fg shine }float@{fg text }, where e is the
     mathematically special value (roughly 2.718282).
     `Raising to a power' is known as @{fg shine }exponentiation@{fg text }.

@{b }Flog10(@{ub }@{fg shine }float@{fg text }@{b })@{ub }, @{b }Flog(@{ub }@{fg shine }float@{fg text }@{b })@{ub }
     The @{b }Flog10@{ub } function returns the log to base ten of @{fg shine }float@{fg text } (the
     @{fg shine }common logarithm@{fg text }). The @{b }Flog@{ub } function returns the log to base e of
     @{fg shine }float@{fg text } (the @{fg shine }natural logarithm@{fg text }). @{b }Flog10@{ub } and @{b }Fpow@{ub } are linked in the
     following way (ignoring floating-point inaccuracies):

          x = Fpow(10.0, Flog10(x))

     @{b }Flog@{ub } and @{b }Fexp@{ub } are similarly related (@{b }Fpow@{ub } could be used again,
     using 2.718282 as the first argument in place of 10.0).

          x = Fexp(Flog(x))

Here's a small program which uses a few of the above functions, and shows
how to define functions which use and/or return floating-point values.

     DEF f, i, s[20]:STRING

     PROC print_float()
       WriteF('\\tf is \\s\\n', RealF(s, !f, 8))
     ENDPROC

     PROC print_both()
       WriteF('\\ti is \\d, ', i)
       print_float()
     ENDPROC

     /* Square a float */
     PROC square_float(f) IS !f*f

     /* Square an integer */
     PROC square_integer(i) IS i*i

     /* Converts a float to an integer */
     PROC convert_to_integer(f) IS Val(RealF(s, !f, 0))

     /* Converts an integer to a float */
     PROC convert_to_float(i) IS RealVal(StringF(s, '\\d', i))

     /* This should be the same as Ftan */
     PROC my_tan(f) IS !Fsin(!f)/Fcos(!f)

     /* This should show float inaccuracies */
     PROC inaccurate(f) IS Fexp(Flog(!f))

     PROC main()
       WriteF('Next 2 lines should be the same\\n')
       f:=2.7;  i:=!f!
       print_both()
       f:=2.7;  i:=convert_to_integer(!f)
       print_both()
       WriteF('Next 2 lines should be the same\\n')
       i:=10;  f:=i!
       print_both()
       i:=10;  f:=convert_to_float(i)
       print_both()
       WriteF('f and i should be the same\\n')
       i:=square_integer(i)
       f:=square_float(f)
       print_both()
       WriteF('Next 2 lines should be the same\\n')
       f:=Ftan(.8)
       print_float()
       f:=my_tan(.8)
       print_float()
       WriteF('Next 2 lines should be the same\\n')
       f:=.35
       print_float()
       f:=inaccurate(f)
       print_float()
     ENDPROC

The @{b }convert_to_integer@{ub } and @{b }convert_to_float@{ub } functions perform similar
conversions to those done by @{b }!@{ub } when it occurs in an expression. To make
things more explicit, there are a lot of unnecessary uses of @{b }!@{ub }, and these
are when @{b }f@{ub } is passed directly as a parameter to a function (in these
cases, the @{b }!@{ub } could safely be omitted). All of the examples have the
potential to give different results where they ought to give the same,
and this is due to the inaccuracy of floating-point numbers. The last
example has been carefully chosen to show this.

@ENDNODE

@NODE "Accuracy and Range" "Accuracy and Range"
@Prev "Floating-Point Functions"
@Toc "main"

Accuracy and Range
==================

A floating-point number is just another 32-bit value, so can be stored in
@{b }LONG@{ub } variables. It's just the interpretation of the 32 bits which makes
them different. A floating-point number can range from numbers as small
as 1.3E-38 to numbers as large as 3.4E+38 (that's very small and very
large if you don't understand the scientific notation!). However, not
every number in this range can @{fg shine }accurately@{fg text } be represented, since the
number of significant digits is roughly eight.

Accuracy is an important consideration when trying to compare two
floating-point numbers and when combining floating-point values after
dividing them. It is usually best to check that a floating-point value is
in a small range of values, rather than just a particular value. And when
combining values, allow for a small amount of error due to rounding etc.

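The advice about checking for a small range of values can be sketched as
follows. The function @{b }nearly_equal@{ub }, its parameter @{b }eps@{ub } and the tolerance of
0.0001 are hypothetical names and values invented for this illustration;
a suitable tolerance depends on the size of the numbers being compared.

```e
/* TRUE if the floats a and b differ by no more than the float eps */
PROC nearly_equal(a, b, eps)
  IF !Fabs(!a-b)<=eps THEN RETURN TRUE
ENDPROC FALSE

PROC main()
  DEF f
  f:=Fexp(Flog(0.35))   -> may not be exactly 0.35
  IF nearly_equal(f, 0.35, 0.0001)
    WriteF('close enough\\n')
  ELSE
    WriteF('too far apart\\n')
  ENDIF
ENDPROC
```

Note that the subtraction inside the parentheses needs its own leading @{b }!@{ub },
since a function call argument is a separate expression and would
otherwise be treated as integer.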
See the `Reference Manual' for more details about the implementation of
floating-point numbers.

@ENDNODE