@database beginner.guide

@Master beginner

@Width 75


This is the AmigaGuide® file beginner.guide, produced by Makeinfo-1.55 from
the input file beginner.
@NODE "main" "Floating-Point Numbers"
@Next "Recursion.guide/main"
@Prev "Memory.guide/main"
@Toc "Contents.guide/main"

Floating-Point Numbers
**********************

@{fg shine }Floating-point@{fg text } or @{fg shine }real@{fg text } numbers can be used to represent both very
small fractions and very large numbers. However, unlike a @{b }LONG@{ub }, which can
hold every integer in a certain range (see @{"Variable types" Link "Introduction.guide/Variable types" }), floating-point
numbers have limited @{fg shine }accuracy@{fg text }. Be warned, though: using floating-point
arithmetic in E is quite complicated, and most problems can be solved
without using floating-point numbers, so you may wish to skip this chapter
until you really need to use them.


@{" Floating-Point Values " Link "Floating-Point Values" }
@{" Floating-Point Calculations " Link "Floating-Point Calculations" }
@{" Floating-Point Functions " Link "Floating-Point Functions" }
@{" Accuracy and Range " Link "Accuracy and Range" }


@ENDNODE

@NODE "Floating-Point Values" "Floating-Point Values"
@Next "Floating-Point Calculations"
@Toc "main"

Floating-Point Values
=====================

Floating-point values in E are written just as you might expect, and
are stored in @{b }LONG@{ub } variables:

     DEF x
     x:=3.75
     x:=-0.0000367
     x:=275.0

You must remember to use a decimal point (without any spaces around it) in
the number if you want it to be considered a floating-point number, and
this is why a trailing @{b }.0@{ub } was used on the number in the last assignment
(without it, @{b }275@{ub } would be an ordinary integer). At present you can't
express every floating-point value in this way; the compiler may complain
that the value does not fit in 32 bits if you try to use more than about
nine digits in a single number. You can, however, use the various
floating-point maths functions to calculate any value you want
(see @{"Floating-Point Functions" Link "Floating-Point Functions" }).
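As a minimal sketch of that approach, the program below builds 1.0E20
and 1.0E-20, neither of which can sensibly be written out as a constant,
from @{b }Fpow@{ub } and a floating-point division. The names @{b }big@{ub } and @{b }tiny@{ub } are
only illustrative, and the @{b }!@{ub } operator it uses is explained in
@{"Floating-Point Calculations" Link "Floating-Point Calculations" }.

     PROC main()
       DEF s[20]:STRING, big, tiny
       big:=Fpow(10.0, 20.0)   -> 1.0E20 (21 digits if written out)
       tiny:=!1.0/big          -> 1.0E-20, by floating-point division
       WriteF('tiny*big is \\s\\n', RealF(s, !tiny*big, 4))
     ENDPROC

Multiplying the two back together should give a value very close to one;
any small difference is due to the limited accuracy discussed in
@{"Accuracy and Range" Link "Accuracy and Range" }.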

@ENDNODE

@NODE "Floating-Point Calculations" "Floating-Point Calculations"
@Next "Floating-Point Functions"
@Prev "Floating-Point Values"
@Toc "main"

Floating-Point Calculations
===========================

Since a floating-point number is stored in a @{b }LONG@{ub } variable, it would
normally be interpreted as an integer, and this interpretation will
generally not give a number anything like the intended floating-point
value. To use floating-point numbers in expressions you must use the
(rather complicated) floating-point conversion operator, which is the @{b }!@{ub }
character. This converts expressions and the normal maths and comparison
operators to and from floating-point.

All expressions are, by default, integer expressions. That is, they
represent @{b }LONG@{ub } integer values, rather than floating-point values. The
first time a @{b }!@{ub } occurs in an expression, the value of the expression so
far is converted to floating-point and all the operators and variables
after this point are considered floating-point. The next time it occurs,
the (floating-point) value of the expression so far is converted to an
integer, and the following operators and variables are considered integer
again. You can use @{b }!@{ub } as often as necessary within an expression. Parts
of an expression in parentheses are treated as separate expressions, and
so are, by default, integer expressions (this includes function-call
arguments).

The integer/floating-point conversions performed by @{b }!@{ub } are not simple.
They involve rounding and also bounding. Converting a floating-point value
to an integer and back again, for example, will generally not result in
the original value, because any fractional part is lost to rounding.
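For instance, here is a minimal sketch in which a floating-point value is
converted to an integer and back again, losing its fractional part on the
way. It uses the forms of @{b }!@{ub } explained in the examples that follow, and
relies on @{b }!@{ub } rounding to the nearest integer as described there.

     PROC main()
       DEF f, i, s[20]:STRING
       f:=6.7
       i:=!f!    -> rounded to the nearest integer, 7
       f:=i!     -> back to floating-point, 7.0 (not 6.7)
       WriteF('i is \\d, f is \\s\\n', i, RealF(s, !f, 1))
     ENDPROC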

Here are a few commented examples, where @{b }f@{ub } always holds a floating-point
number, and @{b }i@{ub } and @{b }j@{ub } always hold integers:

     DEF f, i, j
     i:=1
     f:=1.0
     f:=i!     -> i converted to floating-point (1.0)
     f:=6.2
     i:=!f!    -> the expression f is floating-point,
               -> then converted to integer (6)

In the first assignment, the integer value one is assigned to @{b }i@{ub }. In the
second, the floating-point value 1.0 is assigned to @{b }f@{ub }. The expression on
the right-hand side of the third assignment is considered to be an integer
until the @{b }!@{ub } is met, at which point it is converted to the nearest
floating-point value. So, @{b }f@{ub } is assigned the floating-point value of one
(i.e., 1.0), just as it is by the second assignment. The expression in
the final assignment needs to start off as floating-point in order to
interpret the value stored in @{b }f@{ub } as floating-point. The expression
finishes by converting back to integer. The overall result is to turn the
floating-point value of @{b }f@{ub } into the nearest integer (in this case, six).

The assignments below are more complicated, but should be
straightforward to follow. Again, @{b }f@{ub } always holds a floating-point
number, and @{b }i@{ub } and @{b }j@{ub } always hold integers.

     f:=!f*f                    -> the whole expression is floating-point,
                                -> and f is squared (6.2*6.2)
     f:=!f*(i!)                 -> the whole expression is floating-point,
                                -> i is converted to floating-point and
                                -> multiplied by f
     j:=!f/(i!)!                -> the whole division is floating-point,
                                -> with the result converted to integer
     j:=!f!/i                   -> floating-point f is converted to integer
                                -> and is (integer) divided by i
     IF !f<230.0 THEN RETURN 0  -> floating-point comparison <
     IF !f>(i!) THEN RETURN 0   -> i converted to floating-point,
                                -> then compared to f

If the @{b }!@{ub } were omitted from the first assignment, then not only would the
value in @{b }f@{ub } be interpreted (incorrectly) as an integer, but the
multiplication performed would be integer multiplication, rather than
floating-point. In the second assignment, the parentheses around the
expression involving @{b }i@{ub } are crucial. Without the parentheses, the value
stored in @{b }i@{ub } would be interpreted as floating-point. This would be wrong
because @{b }i@{ub } actually stores an integer value, so parentheses are used to
start a new expression (which defaults to being integer). The value of @{b }i@{ub }
is then interpreted correctly, and finally converted to floating-point (by
the @{b }!@{ub } just before the closing parenthesis). The (floating-point)
multiplication then takes place with two floating-point values, and the
result is stored in @{b }f@{ub }. In the last two assignments (using division), @{b }j@{ub }
is assigned roughly the same value. However, the first of these allows for
greater accuracy, since it uses floating-point division: its result is
rounded to the nearest integer, whereas integer division truncates.
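A small sketch of the difference, with values picked (purely for
illustration) so that the two forms disagree: 8.0 divided by 3 is about
2.67, which rounds to 3 but truncates to 2.

     PROC main()
       DEF f, i, j, k
       f:=8.0
       i:=3
       j:=!f/(i!)!   -> floating-point division, about 2.67, rounded to 3
       k:=!f!/i      -> f becomes 8, then integer division gives 2
       WriteF('j is \\d, k is \\d\\n', j, k)
     ENDPROC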

One important thing to know about floating-point numbers in E is that
the following assignments store the same value in @{b }g@{ub } (again, @{b }f@{ub } stores a
floating-point number). This is because no computation is performed and
no conversion happens: the value in @{b }f@{ub } is simply copied to @{b }g@{ub }. This is
especially important for function calls, as we shall see in the next
section. Strictly speaking, however, the second version is better, since
it shows (to the reader of the code) that the value in @{b }f@{ub } is meant to be
floating-point.

     g:=f
     g:=!f


@ENDNODE

@NODE "Floating-Point Functions" "Floating-Point Functions"
@Next "Accuracy and Range"
@Prev "Floating-Point Calculations"
@Toc "main"

Floating-Point Functions
========================

There are functions for formatting floating-point numbers to E-strings
(so that they can be printed) and for decoding floating-point numbers from
strings. There are also a number of built-in floating-point functions
which compute some of the less common mathematical functions, such as the
various trigonometric functions.

@{b }RealVal(@{ub }@{fg shine }string@{fg text }@{b })@{ub }
     This works in a similar way to @{b }Val@{ub } for extracting integers from a
     string. The decoded floating-point value is returned as the regular
     return value, and the number of characters of @{fg shine }string@{fg text } that were read
     to make the number is returned as the first optional return value.
     If a floating-point value could not be decoded from the string, then
     zero is returned as the optional return value and the regular return
     value will be zero (i.e., 0.0).
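     A minimal sketch, assuming E's multiple-assignment form (@{b }f,n:=@{ub }...) is
     used to collect the optional return value; the variable names and
     test strings are only illustrative:

          PROC main()
            DEF f, n, s[20]:STRING
            f,n:=RealVal('3.25 metres')
            WriteF('\\d characters read, value \\s\\n', n, RealF(s, f, 2))
            f,n:=RealVal('metres')
            WriteF('\\d characters read, value \\s\\n', n, RealF(s, f, 2))
          ENDPROC

     The first call should decode 3.25; as described above, the second
     call should give zero for both return values.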

@{b }RealF(@{ub }@{fg shine }e-string@{fg text }@{b },@{ub }@{fg shine }float@{fg text }@{b },@{ub }@{fg shine }digits@{fg text }@{b })@{ub }
     Converts the floating-point value @{fg shine }float@{fg text } into a string which is stored
     in @{fg shine }e-string@{fg text }. The number of digits to use after the decimal point
     is specified by @{fg shine }digits@{fg text }, which can be from zero to eight. The
     floating-point value is rounded to the specified number of digits. A
     value of zero for @{fg shine }digits@{fg text } gives a result with no fractional part and
     no decimal point. The @{fg shine }e-string@{fg text } is returned by this function, and
     this makes it easy to use with @{b }WriteF@{ub }.

          PROC main()
            DEF s[20]:STRING, f, i
            f:=21.60539
            FOR i:=0 TO 8
              WriteF('f is \\s (using digits=\\d)\\n', RealF(s, f, i), i)
            ENDFOR
          ENDPROC

     Notice that the floating-point argument, @{b }f@{ub }, to @{b }RealF@{ub } does not need a
     leading @{b }!@{ub } because we are simply passing its value and not performing
     a computation with it. The program should generate the following
     output:

          f is 22 (using digits=0)
          f is 21.6 (using digits=1)
          f is 21.61 (using digits=2)
          f is 21.605 (using digits=3)
          f is 21.6054 (using digits=4)
          f is 21.60539 (using digits=5)
          f is 21.605390 (using digits=6)
          f is 21.6053900 (using digits=7)
          f is 21.60539000 (using digits=8)

@{b }Fsin(@{ub }@{fg shine }float@{fg text }@{b })@{ub }, @{b }Fcos(@{ub }@{fg shine }float@{fg text }@{b })@{ub }, @{b }Ftan(@{ub }@{fg shine }float@{fg text }@{b })@{ub }
     These compute the sine, cosine and tangent (respectively) of the
     supplied @{fg shine }float@{fg text } angle, which is specified in radians.
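     Angles in degrees must therefore be converted first. A minimal
     sketch, using 3.141593 as an approximation to pi (the variable names
     are only illustrative):

          PROC main()
            DEF s[20]:STRING, deg, rad
            deg:=30.0
            rad:=!deg*3.141593/180.0   -> degrees to radians
            WriteF('sin(30 degrees) is \\s\\n', RealF(s, Fsin(rad), 4))
          ENDPROC

     The printed value should be very close to 0.5.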

@{b }Fabs(@{ub }@{fg shine }float@{fg text }@{b })@{ub }
     Returns the absolute value of @{fg shine }float@{fg text }, much like @{b }Abs@{ub } does for
     integers.

@{b }Ffloor(@{ub }@{fg shine }float@{fg text }@{b })@{ub }, @{b }Fceil(@{ub }@{fg shine }float@{fg text }@{b })@{ub }
     The @{b }Ffloor@{ub } function rounds a floating-point value down to the
     nearest whole floating-point value. The @{b }Fceil@{ub } function rounds it up.
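     `Down' means towards the more negative value, so for negative numbers
     @{b }Ffloor@{ub } moves away from zero. A minimal sketch:

          PROC main()
            DEF s[20]:STRING, f
            f:=-2.5
            WriteF('Ffloor gives \\s\\n', RealF(s, Ffloor(f), 1))  -> -3.0
            WriteF('Fceil gives \\s\\n', RealF(s, Fceil(f), 1))    -> -2.0
          ENDPROC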

@{b }Fsqrt(@{ub }@{fg shine }float@{fg text }@{b })@{ub }
     Returns the square root of @{fg shine }float@{fg text }.
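     As a sketch, here is a hypothetical @{b }hypotenuse@{ub } procedure (it is not
     built in). Note the @{b }!@{ub } inside the inner parentheses: they start a new
     expression, which would otherwise be integer.

          PROC hypotenuse(a, b) IS Fsqrt(!a*a+(!b*b))

          PROC main()
            DEF s[20]:STRING
            WriteF('hypotenuse is \\s\\n', RealF(s, hypotenuse(3.0, 4.0), 2))
          ENDPROC

     For sides of 3.0 and 4.0 this should print 5.00.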

@{b }Fpow(@{ub }@{fg shine }x@{fg text }@{b },@{ub }@{fg shine }y@{fg text }@{b })@{ub }, @{b }Fexp(@{ub }@{fg shine }float@{fg text }@{b })@{ub }
     The @{b }Fpow@{ub } function returns the value of @{fg shine }x@{fg text } raised to the power of @{fg shine }y@{fg text }
     (which are both floating-point values). The @{b }Fexp@{ub } function returns
     the value of e raised to the power of @{fg shine }float@{fg text }, where e is the
     mathematical constant that is the base of natural logarithms (roughly
     2.718282). `Raising to a power' is known as @{fg shine }exponentiation@{fg text }.
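     A minimal sketch of both functions:

          PROC main()
            DEF s[20]:STRING
            WriteF('2 to the power 10 is \\s\\n', RealF(s, Fpow(2.0, 10.0), 1))
            WriteF('e is roughly \\s\\n', RealF(s, Fexp(1.0), 6))
          ENDPROC

     This should print values very close to 1024.0 and 2.718282.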

@{b }Flog10(@{ub }@{fg shine }float@{fg text }@{b })@{ub }, @{b }Flog(@{ub }@{fg shine }float@{fg text }@{b })@{ub }
     The @{b }Flog10@{ub } function returns the log to base ten of @{fg shine }float@{fg text } (the
     @{fg shine }common logarithm@{fg text }). The @{b }Flog@{ub } function returns the log to base e of
     @{fg shine }float@{fg text } (the @{fg shine }natural logarithm@{fg text }). @{b }Flog10@{ub } and @{b }Fpow@{ub } are linked in the
     following way (ignoring floating-point inaccuracies):

          x = Fpow(10.0, Flog10(x))

     @{b }Flog@{ub } and @{b }Fexp@{ub } are similarly related (@{b }Fpow@{ub } could be used again, with
     2.718282 as the first argument in place of 10.0).

          x = Fexp(Flog(x))

Here's a small program which uses a few of the above functions, and
shows how to define functions which use and/or return floating-point
values.

     DEF f, i, s[20]:STRING

     PROC print_float()
       WriteF('\\tf is \\s\\n', RealF(s, !f, 8))
     ENDPROC

     PROC print_both()
       WriteF('\\ti is \\d, ', i)
       print_float()
     ENDPROC

     /* Square a float */
     PROC square_float(f) IS !f*f

     /* Square an integer */
     PROC square_integer(i) IS i*i

     /* Converts a float to an integer */
     PROC convert_to_integer(f) IS Val(RealF(s, !f, 0))

     /* Converts an integer to a float */
     PROC convert_to_float(i) IS RealVal(StringF(s, '\\d', i))

     /* This should be the same as Ftan */
     PROC my_tan(f) IS !Fsin(!f)/Fcos(!f)

     /* This should show float inaccuracies */
     PROC inaccurate(f) IS Fexp(Flog(!f))

     PROC main()
       WriteF('Next 2 lines should be the same\\n')
       f:=2.7; i:=!f!
       print_both()
       f:=2.7; i:=convert_to_integer(!f)
       print_both()

       WriteF('Next 2 lines should be the same\\n')
       i:=10; f:=i!
       print_both()
       i:=10; f:=convert_to_float(i)
       print_both()

       WriteF('f and i should be the same\\n')
       i:=square_integer(i)
       f:=square_float(f)
       print_both()

       WriteF('Next 2 lines should be the same\\n')
       f:=Ftan(.8)
       print_float()
       f:=my_tan(.8)
       print_float()

       WriteF('Next 2 lines should be the same\\n')
       f:=.35
       print_float()
       f:=inaccurate(f)
       print_float()
     ENDPROC

The @{b }convert_to_integer@{ub } and @{b }convert_to_float@{ub } functions perform similar
conversions to those done by @{b }!@{ub } when it occurs in an expression. To make
things more explicit, the program uses @{b }!@{ub } in many places where it is not
needed: wherever @{b }f@{ub } is passed directly as a parameter to a function, the
@{b }!@{ub } could safely be omitted. All of the examples have the potential to
give different results where they ought to give the same, and this is due
to the inaccuracy of floating-point numbers. The last example has been
carefully chosen to show this.


@ENDNODE

@NODE "Accuracy and Range" "Accuracy and Range"
@Prev "Floating-Point Functions"
@Toc "main"

Accuracy and Range
==================

A floating-point number is just another 32-bit value, so it can be stored
in @{b }LONG@{ub } variables. It's just the interpretation of the 32 bits which
makes it different. A floating-point number can range from numbers as
small as 1.3E-38 to numbers as large as 3.4E+38 (in scientific notation,
1.3E-38 means 1.3 divided by ten 38 times, and 3.4E+38 means 3.4
multiplied by ten 38 times). However, not every number in this range can
be represented @{fg shine }accurately@{fg text }, since only about seven significant digits
are kept.
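To see this in action, here is a minimal sketch, assuming the usual
32-bit format: adding 1.0 to 100000000.0 makes no difference to the
stored value, because 100000001 would need nine significant digits.

     PROC main()
       DEF big, bigger
       big:=Fpow(10.0, 8.0)   -> 100000000.0
       bigger:=!big+1.0       -> the extra one is lost to rounding
       IF !bigger=big
         WriteF('big+1.0 is stored as the same value as big\\n')
       ELSE
         WriteF('big+1.0 is different from big\\n')
       ENDIF
     ENDPROC

On a typical single-precision implementation the first message should
be printed.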

Accuracy is an important consideration when trying to compare two
floating-point numbers and when combining floating-point values after
dividing them. It is usually best to check that a floating-point value
lies within a small range of values, rather than testing for one exact
value. And when combining values, allow for a small amount of error due
to rounding. See the `Reference Manual' for more details about the
implementation of floating-point numbers.
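A minimal sketch of such a range check follows. The @{b }nearly_equal@{ub }
procedure is hypothetical (it is not built in), and the tolerance of
0.00001 is just an example value; a suitable tolerance depends on the
size of the numbers being compared.

     /* Hypothetical helper: TRUE if a and b differ by less than tol */
     PROC nearly_equal(a, b, tol) IS !Fabs(!a-b)<tol

     PROC main()
       DEF sum, i
       sum:=0.0
       FOR i:=1 TO 10
         sum:=!sum+0.1        -> 0.1 cannot be stored exactly
       ENDFOR
       IF nearly_equal(sum, 1.0, 0.00001)
         WriteF('sum is close enough to 1.0\\n')
       ELSE
         WriteF('sum is not close to 1.0\\n')
       ENDIF
     ENDPROC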


@ENDNODE