@database beginner.guide

@Master beginner

@Width 75


This is the AmigaGuide® file beginner.guide, produced by Makeinfo-1.55 from
the input file beginner.


@NODE "main" "Recursion"
@Next "OOE.guide/main"
@Prev "FloatingPoint.guide/main"
@Toc "Contents.guide/main"

Recursion
*********

   A @{fg shine }recursive@{fg text } function is very much like a function which uses a loop.
Basically, a @{fg shine }recursive@{fg text } function calls itself (usually after some
manipulation of data) rather than iterating a bit of code using a loop.
There are also recursive types, which are objects with elements which have
the object type (in E these would be pointers to objects).  We've already
seen a recursive type: linked lists, where each element in the list
contains a pointer to the next element (see @{"Linked Lists" Link "Types.guide/Linked Lists" }).

   Recursive definitions are often easier to understand than the
equivalent iterative definitions, and it's usually easier to use recursive
functions to manipulate the data in a recursive type.  However,
recursion is by no means a simple topic.  Read on at your own peril!


 @{" Factorial Example " Link "Factorial Example" }
 @{" Mutual Recursion " Link "Mutual Recursion" }
 @{" Binary Trees " Link "Binary Trees" }
 @{" Stack (and Crashing) " Link "Stack (and Crashing)" }
 @{" Stack and Exceptions " Link "Stack and Exceptions" }


@ENDNODE

@NODE "Factorial Example" "Factorial Example"
@Next "Mutual Recursion"
@Toc "main"

Factorial Example
=================

   The normal example for a recursive definition is the factorial
function, so let's not be different.  In school mathematics the symbol @{b }!@{ub }
is used after a number to denote the factorial of that number (and only
positive integers have factorials).  n! is n-factorial, which is defined
as follows:

     n! = n * (n-1) * (n-2) * ... * 1     (for n >= 1)

So, 4! is 4*3*2*1, which is 24.  And, 5! is 5*4*3*2*1, which is 120.

   Here's the iterative definition of a factorial function (we'll @{b }Raise@{ub } an
exception if the number is not positive, but you can safely leave this
check out if you are sure the function will be called only with positive
numbers):

     PROC fact_iter(n)
       DEF i, result=1
       IF n<=0 THEN Raise("FACT")
       FOR i:=1 TO n
         result:=result*i
       ENDFOR
     ENDPROC result

We've used a @{b }FOR@{ub } loop to generate the numbers one to @{b }n@{ub } (the parameter to
@{b }fact_iter@{ub }), and @{b }result@{ub } holds the intermediate and final results.  The
final result is returned, so check that @{b }fact_iter(4)@{ub } returns 24 and
@{b }fact_iter(5)@{ub } returns 120 using a @{b }main@{ub } procedure something like this:

     PROC main()
       WriteF('4! is \\d\\n5! is \\d\\n', fact_iter(4), fact_iter(5))
     ENDPROC

   If you're really observant you might have noticed that 5! is 5*4!, and,
in general, n! is n*(n-1)!.  This is our first glimpse of a recursive
definition--we can define the factorial function in terms of itself.  The
`...' in the previous definition is not sufficiently precise for a
mathematical definition, so the real definition of factorial is:

     1! = 1
     n! = n * (n-1)!    (for n > 1)

Notice that there are now two cases to consider.  The first case is called
the @{fg shine }base@{fg text } case and gives an easily calculated value (i.e., no recursion
is used).  The second case is the @{fg shine }recursive@{fg text } case and gives a definition
in terms of a number nearer the base case (i.e., (n-1) is nearer 1 than n,
for n>1).  The most common mistake when using recursion is forgetting the
base case.  Without the base case the definition is meaningless.
Without a base case in a recursive program the machine is likely to crash!
(See @{"Stack (and Crashing)" Link "Stack (and Crashing)" }.)

   We can now define the recursive version of the @{b }fact_iter@{ub } function
(again, we'll use a @{b }Raise@{ub } if the number parameter is not positive):

     PROC fact_rec(n)
       IF n=1
         RETURN 1
       ELSEIF n>=2
         RETURN n*fact_rec(n-1)
       ELSE
         Raise("FACT")
       ENDIF
     ENDPROC

Notice how this looks just like the mathematical definition, and is nice
and compact.  We can even make a one-line function definition (if we omit
the check on the parameter being positive):

     PROC fact_rec2(n) RETURN IF n=1 THEN 1 ELSE n*fact_rec2(n-1)
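
   To see why this works, it can help to expand a small call by hand.  The
following is only a sketch of how @{b }fact_rec(4)@{ub } unwinds, applying the
recursive definition one step at a time:

     fact_rec(4) = 4*fact_rec(3)
                 = 4*(3*fact_rec(2))
                 = 4*(3*(2*fact_rec(1)))
                 = 4*(3*(2*1))
                 = 24

Each line replaces one call using the recursive case, until the base case
is reached and the multiplications can finally be carried out.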

You might be tempted to omit the base case and write something like this:

     /* Don't do this! */
     PROC fact_bad(n) RETURN n*fact_bad(n-1)

The problem is the recursion will never end.  The function @{b }fact_bad@{ub } will
be called with every number from @{b }n@{ub } to zero and then all the negative
integers.  A value will never be returned, and the machine will crash
after a while.  The precise reason why it will crash is given later (see
@{"Stack (and Crashing)" Link "Stack (and Crashing)" }).


@ENDNODE

@NODE "Mutual Recursion" "Mutual Recursion"
@Next "Binary Trees"
@Prev "Factorial Example"
@Toc "main"

Mutual Recursion
================

   In the previous section we saw the function @{b }fact_rec@{ub } which called
itself.  If you have two functions, @{b }fun1@{ub } and @{b }fun2@{ub }, and @{b }fun1@{ub } calls @{b }fun2@{ub },
and @{b }fun2@{ub } calls @{b }fun1@{ub }, then these two functions are @{fg shine }mutually@{fg text } recursive.
This extends to any number of functions linked in this way.

   This is a rather contrived example of a pair of mutually recursive
functions.

     PROC f(n)
       IF n=1
         RETURN 1
       ELSEIF n>=2
         RETURN n*g(n-1)
       ELSE
         Raise("F")
       ENDIF
     ENDPROC

     PROC g(n)
       IF n=1
         RETURN 2*1
       ELSEIF n>=2
         RETURN 2*n*f(n-1)
       ELSE
         Raise("G")
       ENDIF
     ENDPROC

Both functions are very similar to the @{b }fact_rec@{ub } function, but @{b }g@{ub } returns
double the normal values.  The overall effect is that every other value in
the long version of the multiplication is doubled.  So, @{b }f(n)@{ub } computes
n*(2*(n-1))*(n-2)*(2*(n-3))*...*2 which probably isn't all that
interesting.
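
   A quick way to check this is to work a small case by hand and compare
it with what the program prints.  Using the definitions above, @{b }f(4)@{ub } is
4*g(3), which is 4*(2*3*f(2)), which is 4*(2*3*(2*g(1))), and since @{b }g(1)@{ub }
is 2 this comes to 4*6*2*2, or 96.  A @{b }main@{ub } procedure something like this
(just a sketch for testing, not part of the example itself) should agree:

     PROC main()
       /* f(4) alternates between f and g on the way down to g(1) */
       WriteF('f(4) is \\d\\n', f(4))
     ENDPROC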


@ENDNODE

@NODE "Binary Trees" "Binary Trees"
@Next "Stack (and Crashing)"
@Prev "Mutual Recursion"
@Toc "main"

Binary Trees
============

   This is an example of a recursive type and the effect it has on
functions which manipulate this type of data.  A @{fg shine }binary tree@{fg text } is like a
linked list, but instead of each element containing only one link to
another element there are two links in each element of a binary tree
(which point to smaller trees called @{fg shine }branches@{fg text }).  The first link points
to the @{i }left@{ui } branch and the second points to the @{i }right@{ui } branch.  Each
element of the tree is called a @{fg shine }node@{fg text } and there are two kinds of special
node: the start point, called the @{fg shine }root@{fg text } of the tree (like the head of a
list), and the nodes which do not have left or right branches (i.e., @{b }NIL@{ub }
pointers for both links), called @{fg shine }leaves@{fg text }.  Every node of the tree
contains some kind of data (just as the linked lists contained an E-string
or E-list in each element).  The following diagram illustrates a small
tree.

                 +------+
                 | Root |
                 +--*---+
                   / \\
             Left /   \\ Right
                 /     \\
         +------*       *------+
         | Node |       | Node |
         +--*---+       +--*---+
           /              / \\
     Left /         Left /   \\ Right
         /              /     \\
     +--*---+     +----*-+   +-*----+
     | Leaf |     | Leaf |   | Leaf |
     +------+     +------+   +------+

Notice that a node might have only one branch (it doesn't have to have
both the left and the right).  Also, the leaves on the example were all at
the same level, but this doesn't have to be the case.  Any of the leaves
could easily have been a node which had a lot of nodes branching off it.

   So, how can a tree structure like this be written as an E object?
Well, the general outline is this:

     OBJECT tree
       data
       left:PTR TO tree, right:PTR TO tree
     ENDOBJECT

The @{b }left@{ub } and @{b }right@{ub } elements are pointers to the left and right branches
(which will be @{b }tree@{ub } objects, too).  The @{b }data@{ub } element is some data for each
node.  This could equally well be a pointer, an @{b }ARRAY@{ub } or a number of
different data elements.

   So, what use can be made of such a tree?  Well, a common use is for
holding a sorted collection of data that needs to be able to have elements
added quickly.  As an example, the data at each node could be an integer,
so a tree of this kind could hold a sorted set of integers.  To make the
tree sorted, constraints must be placed on the left and right branches of
a node.  The left branch should contain only nodes with data that is @{i }less@{ui }
than the parent node's data, and, similarly, the right branch should
contain only nodes with data that is @{i }greater@{ui }.  Nodes with the same data
could be included in one of the branches, but for our example we'll
disallow them.  We are now ready to write some functions to manipulate our
tree.

   The first function is one which starts off a new set of integers (i.e.,
begins a new tree).  This should take an integer as a parameter and return
a pointer to the root node of the new tree (with the integer as that node's
data).

     PROC new_set(int)
       DEF root:PTR TO tree
       NEW root
       root.data:=int
     ENDPROC root

The memory for the new tree element must be allocated dynamically, so this
is a good example of a use of @{b }NEW@{ub }.  Since @{b }NEW@{ub } clears the memory it
allocates, all elements of the new object will be zero.  In particular, the
@{b }left@{ub } and @{b }right@{ub } pointers will be @{b }NIL@{ub }, so the root node will also be a leaf.
If the @{b }NEW@{ub } fails, a @{b }"MEM"@{ub } exception is raised; otherwise the data is set to
the supplied value and a pointer to the root node is returned.

   To add a new integer to such a set we need to find the appropriate
position to insert it and set the left and right branches correctly.  This
is because if the integer is new to the set it will be added as a new
leaf, and so one of the existing nodes will change its left or right
branch.

     PROC add(i, set:PTR TO tree)
       IF set=NIL
         RETURN new_set(i)
       ELSE
         IF i<set.data
           set.left:=add(i, set.left)
         ELSEIF i>set.data
           set.right:=add(i, set.right)
         ENDIF
         RETURN set
       ENDIF
     ENDPROC

This function returns a pointer to the set to which it added the integer.
If this set was initially empty a new set is created; otherwise the
original pointer is returned.  The appropriate branches are corrected as
the search progresses.  Only the last assignment to the left or right
branch is significant (all others do not change the value of the pointer),
since it is this assignment that adds the new leaf.  Here's an iterative
version of this function:

     PROC add_iter(i, set:PTR TO tree)
       DEF node:PTR TO tree
       IF set=NIL
         RETURN new_set(i)
       ELSE
         node:=set
         LOOP
           IF i<node.data
             IF node.left=NIL
               node.left:=new_set(i)
               RETURN set
             ELSE
               node:=node.left
             ENDIF
           ELSEIF i>node.data
             IF node.right=NIL
               node.right:=new_set(i)
               RETURN set
             ELSE
               node:=node.right
             ENDIF
           ELSE
             RETURN set
           ENDIF
         ENDLOOP
       ENDIF
     ENDPROC

As you can see, it's quite a bit messier.  Recursive functions work well
with manipulation of recursive types.

   Another really neat example is printing the contents of the set.  It's
deceptively simple:

     PROC show(set:PTR TO tree)
       IF set<>NIL
         show(set.left)
         WriteF('\\d ', set.data)
         show(set.right)
       ENDIF
     ENDPROC

The integers in the nodes will get printed in order (providing they were
added using the @{b }add@{ub } function).  The left-hand nodes contain the smallest
elements so the data they contain is printed first, followed by the data
at the current node, and then that in the right-hand nodes.  Try writing
an iterative version of this function if you fancy a really tough problem.
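
   Searching the set is another function that follows the shape of the
type.  The following is only a sketch (the name @{b }member@{ub } is our own choice,
it is not a built-in function), and it assumes the tree was built using
@{b }add@{ub } so that the ordering constraints hold:

     PROC member(i, set:PTR TO tree)
       IF set=NIL
         RETURN FALSE   /* Empty tree: i cannot be in it */
       ELSEIF i<set.data
         RETURN member(i, set.left)   /* Only the left branch can hold i */
       ELSEIF i>set.data
         RETURN member(i, set.right)  /* Only the right branch can hold i */
       ENDIF
     ENDPROC TRUE       /* Neither less nor greater, so i is at this node */

Because each comparison discards one whole branch, only the nodes on a
single path from the root are examined.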

   Putting everything together, here's a @{b }main@{ub } procedure which can be used
to test the above functions:

     PROC main() HANDLE
       DEF s, i, j
       Rnd(-999999)    /* Initialise seed */
       s:=new_set(10)  /* Initialise set s to contain the number 10 */
       WriteF('Input:\\n')
       FOR i:=1 TO 50  /* Generate 50 random numbers and add them to set s */
         j:=Rnd(100)
         add(j, s)
         WriteF('\\d ',j)
       ENDFOR
       WriteF('\\nOutput:\\n')
       show(s)         /* Show the contents of the (sorted) set s */
       WriteF('\\n')
     EXCEPT
       IF exception="MEM" THEN WriteF('Ran out of memory\\n')
     ENDPROC


@ENDNODE

@NODE "Stack (and Crashing)" "Stack (and Crashing)"
@Next "Stack and Exceptions"
@Prev "Binary Trees"
@Toc "main"

Stack (and Crashing)
====================

   When you call a procedure you use up a bit of the program's @{fg shine }stack@{fg text }.
The stack is used to keep track of procedures in a program which haven't
finished, and real problems can arise when the stack space runs out.
Normally, the amount of stack available to each program is sufficient,
since the E compiler handles all the fiddly bits quite well.  However,
programs which use a lot of recursion can quite easily run out of stack.

   For example, @{b }fact_rec(10)@{ub } will need enough stack for ten calls of
@{b }fact_rec@{ub }, nine of which are recursively called.  This is because each call
does not finish until the return value has been computed, so all recursive
calls up to @{b }fact_rec(1)@{ub } need to be kept on the stack until @{b }fact_rec(1)@{ub }
returns one.  Then each call will be taken off the stack as it
finishes.  If you try to compute @{b }fact_rec(40000)@{ub }, not only will this take a
long time, but it will probably run out of stack space.  When it does run
out of stack, the machine will probably crash or do other weird things.
The iterative version, @{b }fact_iter@{ub }, does not have these problems, since it
only takes one procedure call to calculate a factorial using this function.

   If there is the possibility of running out of stack space you can use
the @{b }FreeStack@{ub } (built-in) function call (see @{"System support functions" Link "BuiltIns.guide/System support functions" }).
This returns the amount of free stack space.  If it drops below about 1KB
then you might like to stop the recursion or whatever else is using up the
stack.  Also, you can specify the amount of stack your program gets (and
override what the compiler might decide is appropriate) using the @{b }OPT
STACK@{ub } option.  See the `Reference Manual' for more details on E's
stack organisation.
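
   As a sketch of how such a check might look, here is a version of
@{b }fact_rec@{ub } which gives up when the stack is nearly exhausted.  The name
@{b }fact_safe@{ub } and the exception identifier @{b }"DEEP"@{ub } are made up for this
example, and the threshold is just the rough 1KB figure mentioned above:

     PROC fact_safe(n)
       /* Stop before the stack runs out (the threshold is an assumption) */
       IF FreeStack()<1024 THEN Raise("DEEP")
       IF n=1
         RETURN 1
       ELSEIF n>=2
         RETURN n*fact_safe(n-1)
       ELSE
         Raise("FACT")
       ENDIF
     ENDPROC

A caller can then handle @{b }"DEEP"@{ub } like any other exception instead of
letting the machine crash.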


@ENDNODE

@NODE "Stack and Exceptions" "Stack and Exceptions"
@Prev "Stack (and Crashing)"
@Toc "main"

Stack and Exceptions
====================

   The concept `recent' used earlier is connected with the stack (see
@{"Raising an Exception" Link "Exceptions.guide/Raising an Exception" }).  A recent procedure is one which is on the stack,
the most recent being the current procedure.  So, when @{b }Raise@{ub } is called it
looks through the stack until it finds a procedure with an exception
handler.  That handler will then be used, and all procedures more recent
than the selected one are taken off the stack.

   Therefore, a recursive function with an exception handler can use @{b }Raise@{ub }
in the handler to call the handler in the previous (recursive) call of the
function.  So anything that has been recursively allocated can be
`recursively' deallocated by exception handlers.  This is a very powerful
and important feature of exception handlers.
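
   The following sketch shows the idea.  The procedure @{b }build@{ub } and the
exception identifier @{b }"DONE"@{ub } are invented for this example; each call
allocates a small block with @{b }NewR@{ub }, and the innermost call then raises an
exception:

     PROC build(n) HANDLE
       DEF mem=NIL
       IF n=0 THEN Raise("DONE")  /* Innermost call starts the unwinding */
       mem:=NewR(100)             /* This call's own allocation */
       build(n-1)
     EXCEPT
       IF mem THEN Dispose(mem)   /* Free only what this call allocated */
       WriteF('Leaving level \\d\\n', n)
       Raise(exception)           /* Pass it on to the previous call */
     ENDPROC

     PROC main() HANDLE
       build(3)
     EXCEPT
       WriteF('Back in main\\n')
     ENDPROC

Each handler frees its own block and then raises the exception again, so
the calls are cleaned up one at a time, most recent first, until the
handler in @{b }main@{ub } is reached.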


@ENDNODE