\documentstyle[12pt]{article}

\setlength{\parindent}{0pt}

\begin{document}
\title{Massively Parallel Programming \\ with Parallaxis}

\author{Thomas Br\"aunl \\ \\
Universit\"at Stuttgart, Institut f\"ur Informatik \\
Azenbergstr. 12, D-7000 Stuttgart 1, FRG \\ \\
}
\maketitle
\begin{abstract}
{\parindent 0pt} In the Parallaxis programming language, the model of a parallel architecture 
is included as an integral part of the problem solution. That is, the 
combination of algorithm and machine model accounts for an entire 
specification. Since the choice of the computer architecture largely determines
the structure of the algorithm applied, a structured parallel programming 
language should be given the expressive power of stating, or even better, 
selecting the structure of the parallel architecture, for which an algorithm 
is bound. Our language model Parallaxis operates on the class of SIMD array 
processors. In addition to the concurrent algorithm, Parallaxis allows the 
specification of arbitrary network topologies by means of a functional 
description with configuration and connection specifications. Simple concepts 
for concurrent execution of statements and message passing are based on these 
definitions. Variable declaration takes the parallel machine model into account
and therefore splits into variables for the control unit (scalars) and 
variables for each of the parallel processing units (vectors).
\end{abstract}

\section{Introduction}

The Parallaxis language model was designed to allow structured parallel programming in a high 
level language, similar to sequential Pascal or Modula-2. The compiler should 
be machine-independent, so it can be used for a wide range of parallel architectures. In addition, it 
should be useful in exploiting the parallel resources of any particular architecture satisfying the 
basic machine model, in order to achieve optimal performance. Both goals (which might look 
contradictory, at first) are achieved by a translation into a low level intermediate parallel 
language which is  processed by a machine-dependent interpreter or compiler. This flexible model 
is restricted to SIMD structures; it combines the internal representation of the hardware structure 
with the topology- or structure-specific algorithm to form a complete problem solution. 
Thus, it is possible to specify number and arrangement of processing elements, as well as their 
communication network in Parallaxis. \\ \\
Here, we get a new meaning of the term structured. The whole problem solution is structured, 
including the algorithm, as well as the architecture for which the algorithm is bound. Given this 
semantics, it is necessary to include a machine description, if the language should not be 
restricted to a single architecture that is implicitly assumed. An algorithm for the same problem 
will look totally different if designed for a hypercube instead of a ring-topology. Besides, for a 
certain problem there may be structures well suited and others that do not match very well. If 
there is a choice (as for reconfigurable parallel systems), the architecture specification can be 
used for creating the best matching architecture. \\ \\
In this paper, I will address the language concepts supporting the Parallaxis user in 
programming a SIMD computer on a high level of abstraction. I will not be concerned at this 
point, however, what strategy or algorithm to apply in order to perform an efficient mapping 
between a topology specified for a certain application and the topology provided by the 
physical hardware structure. This problem is handled automatically on a lower operating system 
level, transparent to the user. However, it will greatly affect the efficiency of an application 
with heavy communication. H. J. Siegel developed efficient mapping algorithms for simulating 
one topology on a different topology. \\
Corresponding to the machine model of a processor-network, controlled by a single control 
unit, variables may be declared either for the control unit (using the declaration keyword 
scalar), or for each of the PEs (using vector). Each variable is strictly typed, like in Modula-2, 
so there may be no vector variables in an expression that is to be assigned to a scalar variable. 
Vectors may be used inside explicitly marked parallel blocks or operations only while scalars 
may also appear in parallel vector expressions (requiring a duplication or broadcast of that 
particular value).

\section{Specification of Network Topologies}

Parallaxis makes the following assumptions about its abstract parallel machine, whether it is 
physically present or simulated:
\begin{itemize}
\item SIMD structure with
\begin{itemize}
\item central control unit
\item variable number of processing elements (PEs)
\item flexible (reroutable) communication network  that is, it can be used as if forming any topological structure
\end{itemize}
\item all operations occur synchronously
\item all PEs are identical in processor and memory structure
\item each PE owns the same number of bi-directional ports for sending and receiving messages
\item one process per PE
\end{itemize}
                              
Within the realm of this machine model, the programmer may now choose a certain topological 
structure for the communication network to achieve the best possible match between 
architecture and algorithm of a given problem. For each application, the number of PEs and the 
network topology is static, that is, it has to be specified prior to compilation and will remain 
unchanged during program execution. \\ \\
The specification of the network structure's logical form takes two steps: First, Parallaxis is 
told the number of processing elements one wants to use and how they will be arranged in 
dimensions. This almost exactly takes the form of Modula-2's array declaration, except that we 
are dealing with processors here instead of data elements. Although we create some neighbor 
relation between the PEs by this declaration, this does not specify any connections between 
processors which is reserved for the second step. There, we may specify a transfer-function 
(from a general PE position to its relative neighbor) for every processor-port in this topology. 
Or, stated differently, the number of transfer-functions provided equals the number of ports per 
PE. Each transfer-function has a name (the exit-port's name) and also states the name of the 
corresponding entry port on its neighbor-PE after a period. \\ \\
Parallaxis allows the specification of arbitrary network topologies. The examples shown in the 
following sections should be guidelines for modeling other topologies.


\subsection{Ring Structure}
The simple topological structure of a ring may be specified as follows in Parallaxis:
\begin{figure}[h]
\scriptsize
\begin{verbatim}                      		 
CONFIGURATION  ring [12];
CONNECTION  c_wise : ring[i]  ->  ring[(i+1) mod 12].cc_wise; 
            cc_wise: ring[i]  ->  ring[(i-1) mod 12].c_wise;

          11  ---  0  ---  1
           |               |
          10               2
           |               |
           9               3             <-- PE -->
           |               |          cc_wise   c_wise
           8               4
           |               |
           7  ---  6  ---  5

\end{verbatim}     		 
\normalsize
\caption{ring topology}
\end{figure}\\
With the configuration specification the user tells the system to reserve twelve processors, 
labeled zero to eleven, that are arranged in a single dimension. Each PE owns two ports, called 
$c\_wise$ (for clockwise mapping) and $cc\_wise$ (for counter-clockwise mapping), corresponding 
to these two transfer-functions. While the function $c\_wise$ maps any processing element to the 
next higher PE, the function $cc\_wise$ maps any element to the next lower one. Using the 
modulo-operator results in a closed topology that is, every element has a neighbor at every port.


\subsection{Two-Dimensional Grids}

Now, let us take a look at some simple two-dimensional network structures: the grid, an open 
topology that might be useful in application areas ranging from image processing to air flow 
analysis for wing profile design, and its closed counterpart the torus (see figure 2).

\begin{figure}[h]
\scriptsize
\begin{verbatim}                      		 
                                                   ^    ^    ^    ^    ^
                                                   |    |    |    |    |
grid   X -- X -- X -- X -- X         torus      <- X -- X -- X -- X -- X ->
       |    |    |    |    |         (wrapped      |    |    |    |    |
       X -- X -- X -- X -- X          around)   <- X -- X -- X -- X -- X ->
       |    |    |    |    |                       |    |    |    |    |
       X -- X -- X -- X -- X                    <- X -- X -- X -- X -- X ->
       |    |    |    |    |                       |    |    |    |    |
       X -- X -- X -- X -- X                    <- X -- X -- X -- X -- X ->
                                                   |    |    |    |    |
                                                   v    v    v    v    v

\end{verbatim}     		 
\normalsize
\caption{two-dimensional grid and torus}
\end{figure}
Their specification in Parallaxis is straightforward, as shown in figure 3. It takes just four 
transfer-functions to specify the whole grid with each PE owning four data-ports (here called: 
north, south, east, and west). The torus specification is almost identical; the additional modulo-
operator connects opposite boundary-PEs. When considering the grid specification, there are 
PEs whose neighbor function evaluates to an invalid PE-number: e.g. the PE at position [2,4] 
would have a right neighbor [2,5], but this is out of bounds as of the grid size fixed in the 
CONNECTION specification (rows labeled 0 to 3, and columns labeled 0 to 4). By definition, 
these PEs (called ''boundary-PEs'') simply do not have a connection in that particular direction. 
Any data exchange to or from the outside will have no effect (see section 4 for more detail).
\begin{figure}[th]
\scriptsize
\begin{verbatim}                      		 
	CONFIGURATION grid [4] [5];
	CONNECTION    north: grid[i,j]	->  grid[i+1, j].south;
	              south: grid[i,j]	->  grid[i-1, j].north;
	              east : grid[i,j]	->  grid[i, j+1].west;
	              west : grid[i,j]	->  grid[i, j-1].east;
	
	CONFIGURATION torus [4] [5];
	CONNECTION    north: torus[i,j]	 ->  torus[(i+1) mod 4, j].south;
	              south: torus[i,j]	 ->  torus[(i-1) mod 4, j].north;
	              east : torus[i,j]	 ->  torus[i, (j+1) mod 5].west;
	              west : torus[i,j]	 ->  torus[i, (j-1) mod 5].east;
	
\end{verbatim}     		 
\normalsize
\caption{grid and torus specification}
\end{figure}
\section{Variable Declaration}

As seen before, we have to deal with a basically two-fold system structure:

\begin{itemize}
\item a controlling host  and
\item a network of processing elements.
\end{itemize}

Variables that are used for controlling the execution sequence, such as a counting variable in a 
for-loop,  should exist only once at the host computer while a variable used for vector 
computations might be required once for each PE. To distinguish these two types of variable 
declarations, Parallaxis offers the phrases scalar for declaring a variable at the host, and vector for 
declaring a variable to appear in every PE's memory structure, thus overall creating a vector (or 
some higher order data structure, like a matrix, etc., depending on the specified network 
topology). An example follows:

\scriptsize
\begin{verbatim}     		 
  SCALAR  i,j: integer;
  VECTOR  a,b: real;
          c,d: integer;
\end{verbatim}     		 
\normalsize

The distinction between scalar and vector variables is also necessary for formal procedure 
parameters and local variable declarations.


\section{Parallel Programming Concepts}

Parallaxis provides language primitives to account for the parallel machine model which build 
on the topological structure specified by means of configuration and connection. There are 
three basic concepts for parallel processing in Parallaxis:

\begin{enumerate}
\item parallel execution 	(''parallel block'')
\item parallel data exchange	(''propagate operation'')
\item vector reduction	(''reduce function'')
\end{enumerate}

\subsection{Parallel Execution}

In analogy to the begin-end block used to group statements to be executed sequentially, there is 
the parallel-endparallel block for synchronous parallel statements. The semantics hereby is that 
every processing element executes the same statement with its own local values of the variables 
involved, thus obtaining individual results. So the phrase parallel mustn't lead to an erroneous 
interpretation: all statements contained in this block are executed sequentially under central 
control, but they are performed data-parallel. Together with a unique control flow this just 
reflects the single instructionJP multiple data (SIMD) machine model. \\ \\
Because of the SIMD restriction, each PE may execute the current instruction or remain idle; a 
concurrent execution of different instructions is not possible. There are two ways of selecting 
PEs for parallel execution in Parallaxis which may also be combined:

\begin{description}
\item[A)] by explicitly stating their network position at the entrance of a parallel block 
for each dimension specified e.g., for a one-dimensional structure:

\scriptsize
\begin{verbatim}
		PARALLEL [22..44]
		  <parallel statements>
		ENDPARALLEL
\end{verbatim}
\normalsize
\item[B)] by using if-, case-selections or while-, repeat-loops with vector conditions e.g.:
\scriptsize
\begin{verbatim}
		PARALLEL
		  IF <vector expression> THEN <parallel statements> END
		ENDPARALLEL
\end{verbatim}
\normalsize
\end{description}

In case A, only the PEs within the selected range execute the statements inside the parallel 
block, all others remain idle during that time. PE selections may be constant or variable 
positional expressions, such as subrange, enumeration, and set. One selection is required for every 
dimension. \\ \\
In case B, a conditional expression determines which PEs will execute the statements of the 
then-branch and which will remain idle. While the branch\-ing-condition evaluates to true for one 
processor, it might be false for others. The condition is evaluated for each PE individually in 
parallel and only those PEs for which the condition evaluates to true will execute the then-
branch; all other PEs remain idle. If there exists an else-branch, it will be executed 
subsequently with the inverse PE group. The two branches of an if-selection cannot be executed in 
parallel because of the previously mentioned ''single control flow'' SIMD restriction. Therefore, 
they have to be serialized. Processors that execute the then-part may continue while processors 
executing the else-part are blocked with their identifications pushed onto a global stack at the 
controlling host. They remain inactive while the host supervises execution of the then-part. 
Afterwards, when executing the else-part, the processor sets change places, that is, the ''then-
processors'' now become inactive for some time while the ''else-processors'' are active. The use 
of a dynamic stack also accounts for nested if-statements. Each stack-entry corresponds to a 
nesting-level of if-statements. Since serialization degrades the performance of a parallel system, 
the user should keep this fact in mind when designing a parallel algorithm. \\ \\
The semantics of a parallel loop is analogous: only those PEs satisfying the loop-condition 
execute the loop-statements. A loop can only be terminated when none of the active PEs 
satisfy the loop-condition. As long as a single PE remains, the loop is being continued while all 
PEs excluded by the loop-condition are idle. This means, the controlling host has to get a 
feedback from the network whether there are PEs remaining or not. The implementation of this 
feedback depends on the hardware facilities of the target system. In any case, the OR-reduction 
of the condition-vector (see also section 4.3) returns this information to the host.

\subsection{Parallel Data Exchange}

The propagate-operation accounts for data exchange between the parallel processors. Propagate 
behaves like a compound statement of send followed immediately by receive. All selected (or 
active) PEs participate in this parallel data exchange. In its simple form, the 
propagate-operation reads:
\scriptsize
\begin{verbatim}
	propagate.<direction> ( <vector variable> )
\end{verbatim}
\normalsize
	e.g., for the ring structure of section 2.1:
\scriptsize
\begin{verbatim}
	propagate.c_wise (x)
\end{verbatim}
\normalsize

The semantics of this statement is that the value of a vector (here: ''x'') is propagated through 
the network by one step in the stated direction (here: ''$c\_wise$'' for clockwise). The direction 
names directly refer to the CONNECTION specification of the network where they were defined 
as exit-ports. For simple network structures like ring, grid, hypercube, and so on, the 
corresponding entry-port is unambiguous. Therefore, it can be determined automatically by 
Parallaxis (in our ring example: ''$cc\_wise$''). For complex topologies (as the tree structure, see 
section 6) both send-port and receive-port have to be supplied, in order to perform the 
propagate-operation. \\ \\
Let us now assume that the PEs labeled 3 to 8 of the twelve ring PEs have been selected 
(activated) to perform the above propagate-operation. Each PE sends a message (here: the local 
value of variable ''x'') to its clockwise neighbor and then reads a message from its counter-
clockwise neighbor, since the port $cc\_wise$ was defined to be the entry for $c\_wise$ in the 
connection specification. Disregarding those basic sends and receives, we recognize that 
information has been propagated one shift in clockwise direction all around the selected sector, thus 
reflecting the operation's name. Though the ring represents a closed topology, the sector 
selected here is equivalent to an open topology. Interesting is the behavior at the boundary, which 
in our example are PEs no. 3 and 8. PE no. 3 does not have an active neighbor 
for receiving (direction ''$cc\_wise$'') while PE no. 8 does not have a neighbor for sending 
(direction ''$c\_wise$''). As of the propagate-operation, this is equivalent to not having a neighbor 
at all for that direction in an open topology. PE no. 3 first sends its message along to PE no. 4. 
Since it cannot perform a subsequent receive, its local value of ''x'' remains unchanged. PE 
no. 8 's send goes without any effect (its neighbor is inactive) while the receive operation 
can be performed normally. \\ \\
Other syntactical form of the propagate-operation allow:
\begin{itemize}
\item the separation between send-expression and receive-variable (in case one 
does not want to overwrite the variable as in the previous form) e.g.: $propagate.c\_wise$ (x+1,y)
\item the propagation of several steps along a fixed di\-rec\-tion e.g.: \\ $propagate.c\_wise$ $\hat{ }$ 5 (x)
\end{itemize}

\subsection{Vector Reduction}

Sometimes not the whole vector is of interest, but only a scalar reduction of it like its sum. 
Without a specialized operation, this scalar information is awkward to get. Parallaxis provides 
load and download operations to move a scalar array at the host to and from a vector distributed 
over the network. So, in order to get the sum of a vector, one has to download the complete 
vector to the host and then add it up iteratively. The reduce-function enables a vector reduction 
in a single statement and without involving additional variables. The desired reduction 
operation has to be specified after a period. Predefined are sum, product, and, or, min, max, first, 
and last; user defined functions with appropriate parameters may be used as well. \\ \\
The reduce-operation is completely independent of the specified topology; it is a primitive, one-
dimensional vector operation. Specialized components of a parallel system may execute this 
command in time O(log2 n) in a tree-like operation mode. \\ \\
Assume ''s'' being a real scalar and ''x'' being a real vector:
\begin{center}
\scriptsize
{\tt s := REDUCE.sum (x)}
\normalsize
\end{center}
This statement assigns the component-wise sum of vector ''x'' to scalar ''s''.


\section{Sample Programs}

Now, we are going to look at a couple of Parallaxis programs. The first shows 
a parallel sorting algorithm, called ''odd-even transposition sorting'',
the second is a parallel version of the ''Sieve of Eratosthenes'' for
generating prime numbers. \\ \\
''Odd-Even Transposition Sorting'' (OETS) is a parallel sorting algorithm that is able to sort n 
numbers on n PEs in time O(n). The PEs are connected in a bi-directional, open linear list;  I/O 
instructions have been omitted for clarity. In odd iteration steps, the PE-pairs 1-2, 3-4, and so 
on are compared in parallel while in even iteration steps the pairs 2-3, 4-5, and so on are 
handled.
\scriptsize
\begin{verbatim}
	SYSTEM sorting;
	CONST  n = 1000;
	CONFIGURATION list [1..n];
	CONNECTION   left : list[i]  ->  list[i-1].right;
       		     right: list[i]  ->  list[i+1].left;

	SCALAR  k       : integer;
	VECTOR  val,r,l : integer;
       		swap    : boolean;

	BEGIN
	  ... (* read input data *)
	  FOR k:=1 TO n DO
	    PARALLEL
	      PROPAGATE.right(val,l);
	      PROPAGATE.left (val,r);
	      (* l/r now hold the left/right neighbors' values *)
	      swap := false;

	      IF odd(k) THEN  (* compare 1-2, 3-4, ... *)
	        IF odd(dim1) AND (r < val) THEN
	          val  := r;
	          swap := true
	        END
	      ELSE  (* even (k)  compare 2-3, 4-5, ... *)
	        IF even(dim1) AND (r < val) THEN
	          val  := r;
	          swap := true
	        END;
	      END;

	      PROPAGATE.right(swap);
	      IF swap AND (id_no>1) THEN val := l END;
	    ENDPARALLEL
	  END;
	  ... (* write output data *)
	END sorting.
\end{verbatim}
\normalsize
Each PE holds one component of the vector ''val'' that is to be sorted, as well 
as local copies of each one's left and right neighbor. The marker variable 
''swap'' is used for the bookkeeping of swap operations to be finished at the 
right neighbor PEs. A different approach without marker propagation is 
possible, but complicates the program. \\
The program for generating prime numbers is very much straight-forward and
does not need a lot of explanation. In each iteration through the ''while''-
loop, all multiples of the current number are eliminated in parallel. Execution
of the loop continues until no ''candidate'' (on any PE) is left.
\scriptsize
\begin{verbatim}
	SYSTEM sieve;
	CONFIGURATION list [1000];
	CONNECTION (* none *);

	SCALAR prime: integer;
	VECTOR candidate: boolean;

	BEGIN
	  PARALLEL
	    candidate := id_no >=2;
	    WHILE candidate DO
	      prime:= REDUCE.First(id_no);
	      WriteInt(prime,10); WriteLn;
	      IF id_no MOD prime = 0 THEN candidate:=FALSE END
	    END
	  ENDPARALLEL
	END sieve.
\end{verbatim}
\normalsize

\section{Parallaxis Syntax}

Parallaxis Language Definition \\
\copyright \ Thomas Braunl, Universit\"at Stuttgart, 1989 \\
\scriptsize
\begin{verbatim}
System          =       SYSTEM  sys_ident ";"
                        { ConstantDecl  |  TypeDecl }
                        HardwareDecl  SoftwareDecl 
                        sys_ident "." .

ConstantDecl    =       CONST  { ident "=" ConstExpr ";" } .
TypeDecl        =       TYPE  { ident "=" type ";" } .

HardwareDecl    =       CONFIGURATION  conf_ident  IntRange  { "," IntRange }  ";" 
                        CONNECTION  [ TransferFunc  { ";"  TransferFunc } ] ";" .
IntRange        =       "["  range  "]" .
range           =       int_ConstExpr  [ ".." int_ConstExpr ] .
TransferFunc    =       out_direction  ":" conf_ident
                        "[" source   { "," source } "]"  ( "->" | "<->" )
                        destination  { "," destination } .
direction       =       ident  [ "(" (integer | const_ident) ")" ] .
source          =       ident  |  integer.
destination     =       [ discriminant ]
                        conf_ident  "["  ExprList  "]"  "."  in_direction.
discriminant    =       "{"  bool_expr  "}".

SoftwareDecl    =       VariableDecl  { ProcedureDecl ";" }
                        block .
VariableDecl    =       [ ControlVarDecl ]  [ LocalVarDecl ] .
ControlVarDecl  =       SCALAR { ident { "," ident } ":" type ";" } .
LocalVarDecl    =       VECTOR { ident { "," ident } ":" type ";" } .
ProcedureDecl   =       PROCEDURE proc_ident [FormalParams] ";"
                        { ConstantDecl  |  TypeDecl }
                        SoftwareDecl proc_ident .
FormalParams    =       "(" [ parameters  { ";" parameters } ] ")"
                        [ ":" ( SCALAR  |  VECTOR )  function_type ] .
parameters      =       SCALAR  [ VAR ]  ident  { "," ident } ":" type  |
                        VECTOR  [ VAR ]  ident  { "," ident  } ":" type .

block           =       BEGIN
                        StatementSeq
                        END .
StatementSeq    =       statement  { ";" statement } .
statement       =       [  assignment  |  ProcedureCall  | 
                          IfSelection  |  CaseSelection  |  WhileLoop  |
                          RepeatLoop   |   LoopStatement |  ForLoop    |
                          WithStatement |  EXIT   |  RETURN  [ expr ]  |
                          ParallelExec  |  Propagate |  Load  |  Store  ] .

ParallelExec    =       PARALLEL  selection
                        StatementSeq
                        ENDPARALLEL .
selection       =       [  "[" entry  "]"  { "," "[" entry  "]" }  ] .
entry           =       range  { "," range }  |  expr  [ ".." expr ]  | "*" .

Propagate       =       PROPAGATE "." out_dirvar  [ "^"  ( integer | ident) ]
                                         [      "." in_dirvar ]
                          "("  vector_designator  [ "," vector_designator ]  ")" .
dirvar          =       ident [ "(" expr ")" ] .
Load            =       LOAD  selection  "(" vector_designator "," scalar_designator
                                [ "," length_designator ] ")" .
Store           =       STORE  selection  "(" vector_designator "," scalar_designator
                                [ "," length_designator ] ")" .
Reduce          =       REDUCE  "."  operator_ident  selection  "(" expr ")" .

assignment      =       designator  ":="  expr .
designator      =       ident { "[" ExprList "]"  |  "." ident } .
ProcedureCall   =       proc_ident [ "("  ExprList  ")" ] .
FunctionCall    =       proc_ident   "("  [ ExprList ]  ")" .
IfSelection     =       IF bool_expr THEN StatementSeq
                        { ELSIF bool_expr THEN StatementSeq }
                        [ ELSE StatementSeq ] END .
CaseSelection   =       CASE  expr  OF  case { "|" case }
                        [ ELSE  StatementSeq ]  END.
case            =       CaseLabels  { "," CaseLabels }  ":"  StatementSeq .
CaseLabels      =       ConstExpr [ ".." ConstExpr ] .
WhileLoop       =       WHILE  bool_expr  DO  StatementSeq  END .
RepeatLoop      =       REPEAT  StatementSeq  UNTIL  bool_expr .
LoopStatement   =       LOOP  StatementSeq  END .
ForLoop         =       FOR ident  ":="  expr  TO expr  [ BY ConstExpr ] 
                        DO StatementSeq END .
WithStatement   =       WITH  designator  DO  StatementSeq  END .

type            =       SimpleType | ArrayType| RecordType | SetType .
SimpleType      =       type_ident |  enumeration  |  subrange .
enumeration     =       "(" const_ident  { "," const_ident } ")" .
subrange        =       "["  ConstExpr  ".."  ConstExpr  "]" .
ArrayType       =       ARRAY SimpleType { "," SimpleType } OF type .
RecordType      =       RECORD FieldListSeq END .
FieldListSeq    =       FieldList  { ";"  FieldList } .
FieldList       =       [ ident { "," ident } ":" type ] .
                        (* variant records not yet supported *)
SetType         =       SET OF SimpleType .

ExprList        =       expr  { "," expr } .
expr            =       SimpleExpr   { relation SimpleExpr } .
relation        =       "=" | "<>" | "#" | "<" | ">" | "<=" | ">=" | IN .
SimpleExpr      =       [ "+" | "-" ]  term  { AddOperator term } .
AddOperator     =       "+" | "-" | OR .
term            =       power  { MulOperator power} .
MulOperator     =       "*" | "/" | DIV | MOD | AND | "&" .
power           =       factor  { "^"  factor } .
factor          =       FunctionCall  |  Reduce  |  string  |  set  |
                        number  |  designator  |  structure  |
                        "(" expr ")"  |  NOT  factor .
string          =       "'" { character } "'"  |  '"' { character } '"' .
set             =       [type_ident] "{" [ element  { "," element } ] "}" .
element         =       ConstExpr [ ".." ConstExpr ] .
structure       =       record_ident  "("  ExprList  ")" .

CExprList       =       [ ConstExpr  { ","  ConstExpr } ] .
ConstExpr       =       SimpleConstExpr  { relation  SimpleConstExpr } .
SimpleConstExpr =       [ "+" | "-" ]  ConstTerm  { AddOperator  ConstTerm } .
ConstTerm       =       ConstPower  { MulOperator  ConstPower } .
ConstPower      =       ConstFactor  { "^"  ConstFactor } .
ConstFactor     =       const_ident  |  number  |  string  |  set  |
                        recarr_ident "(" CExprList ")"  |
                        stdfct_ident "(" CExprList ")"  |
                        "(" ConstExpr ")"  |  NOT  ConstFactor .

number          =       integer  | real .
integer         =       digit {digit} .
real            =       digit {digit} "." {digit} ["E" ["+" | "-"] digit {digit}] .
ident           =       letter { letter | digit } .
character       =       letter | digit | "$" | .. (* any character, e.g. ASCII *) .
digit           =       "0" | . . | "9" .
letter          =       "A" | . . | "Z" | "a" | . . | "z" | "_" .
\end{verbatim}
\clearpage
\tableofcontents
\end{document}