FORTRAN compiler limitations

Many FORTRAN compilers impose a limit on the number of continuation lines a given expression may contain. Mathematica's FortranForm has two limitations in this respect. The first is the way that FortranForm breaks expressions into multiple lines: it attempts to format aesthetically by avoiding breaks inside sub-expressions where possible, and so produces many more lines of code than are necessary. Compilers impose no restrictions on where expressions are broken (other than the ANSI standard formatting in columns 7 to 72). Secondly, FortranForm imposes no limit on the number of lines of code produced. A compiler's restriction on continuation lines therefore typically makes FortranForm useless for translating large expressions.
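The point that a statement may be broken at any column can be illustrated with a small sketch. It is written in Python rather than Mathematica and is not part of the package; the function name wrap_fixed_form and the continuation mark "&" are assumptions made for the illustration. It assumes the usual fixed-form layout, with the statement text in columns 7 to 72 and a continuation character in column 6, and it breaks purely by column, ignoring sub-expression boundaries, which gives the fewest possible continuation lines.

```python
def wrap_fixed_form(statement, first_col=7, last_col=72, mark="&"):
    """Split one statement into fixed-form lines, breaking purely by column.

    Assumed layout: statement text in columns first_col..last_col and a
    continuation mark in the column just before first_col.
    """
    width = last_col - first_col + 1  # 66 usable characters per line
    chunks = [statement[i:i + width]
              for i in range(0, len(statement), width)] or [""]
    # Initial line: blanks up to the statement field.
    lines = [" " * (first_col - 1) + chunks[0]]
    # Continuation lines: blanks, then the mark, then the next chunk.
    for chunk in chunks[1:]:
        lines.append(" " * (first_col - 2) + mark + chunk)
    return lines


# A long right-hand side, purely for illustration.
stmt = "x=" + "+".join(f"a{i}*b{i}" for i in range(40))
lines = wrap_fixed_form(stmt)
print("\n".join(lines))
print("continuation lines:", len(lines) - 1)
```

Counting len(lines) - 1 gives the number of continuation lines that a compiler limit would be compared against; a formatter that refuses to break inside sub-expressions can only produce more.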

Code has been developed to automatically extract sub-expressions and introduce temporary variables when expressions exceed a specified tolerance. The syntax of the original expression is maintained by ensuring that only syntactically correct sub-expressions are extracted. This was the most difficult feature to implement. After several different strategies had been attempted, the following approach was adopted because it is efficient while still providing a degree of user control.

Sub-expression extraction is performed recursively. A user-specified tolerance is supplied through the option AssignMaxSize, which sets the maximum number of bytes in memory that a sub-expression may occupy. The test is based on Mathematica's ByteCount, because its evaluation is very efficient. Starting at the root of the expression tree, each level in an expression is traversed. Branches in the expression tree are tested to see if they exceed the bound. If the bound is exceeded, a sub-expression within tolerance is found (and extracted) by recursing further down the expression tree. The sub-expression is assigned to a new temporary variable, and the position it occupied is replaced by that temporary variable. This process is repeated until the expression is fully traversed and what remains is within bounds.
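The recursion described above can be sketched roughly as follows. This is a minimal illustration in Python, not the package's Mathematica implementation. Expressions are modelled as nested tuples (head, arg1, ...) with symbols as strings. The function size is a simplified stand-in for ByteCount, and the constants OVERHEAD, POINTER and NUMBER, like the names extract and shrink, are assumptions chosen for the illustration.

```python
OVERHEAD = 8   # assumed per-expression overhead in bytes (illustrative)
POINTER = 4    # assumed bytes per pointer (illustrative)
NUMBER = 12    # assumed size of any number (illustrative)


def size(e):
    """Rough size estimate standing in for ByteCount."""
    if isinstance(e, str):            # symbols are counted as zero
        return 0
    if isinstance(e, (int, float)):
        return NUMBER
    # Overhead, one pointer for the head and each element, plus the elements.
    return OVERHEAD + POINTER * len(e) + sum(size(a) for a in e[1:])


def extract(expr, max_size, prefix="t"):
    """Return (assignments, remainder); assignments are (name, expr) pairs."""
    assignments = []

    def shrink(e):
        if not isinstance(e, tuple) or size(e) <= max_size:
            return e
        head, *args = e
        args = [shrink(a) for a in args]      # recurse down the tree first
        while size((head, *args)) > max_size:
            # Only compound elements are candidates for extraction.
            cands = [k for k, a in enumerate(args) if isinstance(a, tuple)]
            if not cands:
                break                          # bound cannot be attained
            i = max(cands, key=lambda k: size(args[k]))
            name = f"{prefix}{len(assignments) + 1}"
            assignments.append((name, args[i]))
            args[i] = name                     # temporary takes its place
        return (head, *args)

    return assignments, shrink(expr)


expr = ("sin", ("plus",
                ("pow", "a", 3),
                ("times", 3, "a", ("exp", ("plus", "b", "c"))),
                ("exp", ("times", 3, "b"))))
assignments, rest = extract(expr, max_size=100)
for name, sub in assignments:
    print(name, "=", sub)
print("result =", rest)
```

Only whole elements of the tree are ever moved into temporaries, so every extracted piece is itself a syntactically valid expression. The early exit in the loop corresponds to a bound that is too small to be met by extraction alone.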

The test used to estimate the size of sub-expressions is similar to the current strategy implemented by M. Monagan in Maple [17]; the recursive extraction, however, is quite different. ByteCount has several deficiencies for our purposes, which should be mentioned. Firstly, it takes no account of the length of a variable name.


In[4]:= Map[ByteCount,{a,aa,aaa}]

Out[4]= {0, 0, 0}

Secondly, ByteCount resolves very little information concerning numbers and their precision.


In[5]:= Map[ByteCount,{1, 123456789, 123.456}]

Out[5]= {12, 12, 20}

Whilst the strategy based upon ByteCount is only a rough heuristic, it works well in practice and is justified on this basis. Before proceeding with an example, it seems appropriate to explain the above behaviour by giving some more detailed information on how ByteCount is implemented.

ByteCount[integer] for machines with -bit integers returns bytes plus an byte expression overhead. ByteCount[real] returns the byte expression overhead plus the memory used for a machine real plus whatever is needed for memory alignment. On computers with bit reals, this works out as bytes. On computers like the Apple Macintosh which typically use bit reals and byte memory alignment, the result is bytes.

ByteCount[symbol] always returns zero. The reasoning behind this is that symbols are always shared, so multiple copies of the same symbol in an expression are not counted twice. As was illustrated above, they are not even counted once. This allowance for shared expressions disagrees with the documentation for ByteCount [35].

The overhead for a normal expression is essentially bytes. The rest of the memory is normally -byte pointers for the head and each of the elements. Since symbols are taken to use zero memory, ByteCount[f[]] uses bytes, ByteCount[f[x]] uses , and ByteCount[f[x,y]] uses .

A more complicated example is f[f[x],f[x]], which consists of an outer normal expression with two elements. The overhead of this expression is bytes, plus a byte pointer for the head and each of the two elements, yielding a total of bytes. The first element, f[x], is a normal expression with one element, and uses bytes, as does the second element. The total for the entire expression is bytes.
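The counting rule for normal expressions can be written down as a short model. The sketch below is in Python and is only an illustration of the rule: a fixed overhead, one pointer for the head and for each element, symbols counted as zero, and elements counted recursively. The default values overhead=8 and pointer=4 are assumptions picked for the example, not values taken from the text, and byte_count_model is a hypothetical name. Numbers are left out for simplicity.

```python
def byte_count_model(e, overhead=8, pointer=4):
    """Model of the counting rule for symbols and normal expressions.

    Expressions are tuples (head, elem1, ...); symbols are strings.
    The overhead and pointer sizes are illustrative assumptions.
    """
    if isinstance(e, str):
        return 0                      # symbols are shared, counted as zero
    if not isinstance(e, tuple):
        raise TypeError("only symbols and normal expressions are modelled")
    own = overhead + pointer * len(e)  # head pointer plus one per element
    return own + sum(byte_count_model(a, overhead, pointer) for a in e[1:])


fx = ("f", "x")
for label, e in [("f[]", ("f",)),
                 ("f[x]", fx),
                 ("f[x,y]", ("f", "x", "y")),
                 ("f[f[x],f[x]]", ("f", fx, fx))]:
    print(label, byte_count_model(e))
```

Under these assumed constants each extra element adds one pointer, and the nested example is the cost of the outer expression plus twice the cost of f[x]; the two copies of f[x] are counted separately even though the symbols inside them contribute nothing.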

Some comments also need to be made concerning the temporary variables introduced during recursive decomposition of an expression. The option AssignTemporary specifies a symbol or string name temp for the temporary assignment variable introduced during sub-expression extraction. The temporary variables are introduced sequentially as temp1, temp2, and so on. The user should take care not to assign to symbols with the same names.

In FORTRAN, implicit declarations may be used to avoid individual variable type declarations. For example, when declaring data types in a FORTRAN program, the code below would be preceded by a declaration such as:


implicit double precision(a-h,o-z)

The following example illustrates the use of the auto-extraction code.


In[6]:= example = {Sin[Expand[(a+Exp[(b-c-d)])^3]]};

In[7]:= FortranAssign[x,example,AssignMaxSize->400]

Out[7]//OutputForm=
        t1=a**3+exp(3.d0*b-3.d0*c-3.d0*d)
        t2=3.d0*a*exp(2.d0*b-2.d0*c-2.d0*d)
        t3=3.d0*a**2*exp(b-c-d)
        x(1)=sin(t1+t2+t3)

The default setting of AssignMaxSize should be adjusted by the user if overly long statements are produced. Experimentation with ByteCount (utilising the previous information) should give a more precise indication. There is a minimum setting, which prevents the user from specifying unattainable bounds.
