Tuples and Sequences

Tuples

Tuples are a simple mathematical concept which turns out to be useful in programming languages as well. For instance, if a function receives multiple parameters, this is equivalent to receiving a tuple of values as a single parameter. Similarly, a function could easily return multiple values by returning a tuple of values, without needing to define a particular record or class type for that purpose or resorting to VAR, out, or reference parameters.

Combined with a template or generics mechanism, the approach to interpret multiple parameters as a single tuple parameter opens the way for type-safe variable parameter lists. In the example below, an overloaded function print is defined for several different parameter types, such as char, int, double, etc. Furthermore, a templated version of print is defined which accepts any type of parameter T including tuple types:

    // Simple definitions.
    void print (char c) {
        cout << c;
    }
    void print (int i) {
        cout << i;
    }
    void print (double d) {
        cout << d;
    }
 
    // Template definition.
    template <typename T>
    void print (T values) {
        print(^values);
        print($values);
    }
Calls to print with a single argument, such as print(1) or print(2.0), directly invoke the corresponding simple definition whose parameter type is matched exactly. A call with multiple arguments, such as print('x', 1, 2.0), is interpreted as a call with a single tuple argument ('x', 1, 2.0) that invokes the template definition of the function with the parameter values representing this tuple value.
The prefix operators ^ and $ are assumed to extract the first member of the given tuple (as a simple value) and the remaining members (as a tuple if necessary), respectively. Therefore, ^values returns the character value 'x', while $values returns the tuple (1, 2.0). Calling print with the former invokes the simple definition with parameter type character, while calling it with the latter ``recursively'' invokes the template definition with parameter values equal to the shortened tuple (1, 2.0).
In this invocation, ^values and $values correspond to the simple values 1 and 2.0, respectively, i. e., the calls to print will invoke the simple definitions with parameter type int and double, respectively.

Sequences

A sequence is a kind of tuple whose elements are all of the same type, similar to an array. In contrast to arrays (and other standard container types), however, sequences are immutable, similar to strings in Java and other languages. There are operators to determine the length of a sequence s (#s), to retrieve a single element (s[i]), to select arbitrary subsequences (s(i, j)), and to concatenate two sequences (s # t).
By using these primitive operations, it is easily possible to perform more complex operations, e. g.:

    s = s # x                       // Append x to s.
    s = x # s                       // Prepend x to s.
 
    s = s(2, #s)                    // Remove first element of s.
    s = s(1, #s - 1)                // Remove last element of s.
 
    s = s(1, i-1) # s(i+1, #s)      // Remove i-th element of s.
In the examples above, the expression on the right hand side of the assignment operator constructs a new sequence by selecting subsequences of the existing sequence s (which is immutable) and/or concatenating sequences to form larger sequences. Afterwards, the resulting sequence is assigned to the variable s that contained the original sequence.

The fact that sequences are immutable has at least two important conceptual advantages over mutable containers: First, an iteration through a sequence can never be disturbed by update operations. Second, if some type T is a subtype of some other type S (e. g., Student is a subtype of Person), i. e., objects of type T can polymorphically be used as objects of type S, sequences of Ts can safely be used as sequences of Ss, too (e. g., a sequence of Students can be used where a sequence of Persons is required). This is similar to the subtyping relationship between the array types T[] and S[] in Java, but without the possibility of ever receiving an ArrayStoreException, since assignments to individual sequence elements are generally forbidden.
Furthermore, immutable sequences offer several possibilities for an efficient implementation:

Strings are nothing else but sequences of characters. In fact, sequences are a generalization of ``Ropes: An Alternative to Strings,'' proposed by H.-J. Boehm et al. in Software-Practice and Experience 25 (12) December 1995.

Virtual Sequences

It is possible to define sequences whose elements are not specified explicitly, but are computed on demand by invoking a user-defined function with the element's index as a parameter. Such virtual sequences are useful in various ways.
For example, an operating system file might be represented as a virtual sequence of bytes or characters, whose computation function reads and returns the character at a specified index position. In contrast to reading the entire file at once and constructing a normal sequence containing all its characters, this approach can result in significant space and execution time savings. In an application such as a text editor, such a virtual sequence can be used in the same way as a normal sequence, e. g., by selecting subsequences, concatenating them with normal or other virtual sequences, etc.

Implementation

Sequences are implemented as a tree-like structure whose leaf nodes contain small to medium-sized arrays of elements. Not too large subsequences and concatenations are constructed by copying their elements, while larger result sequences are represented by tree nodes which simply reference (parts of) the trees representing the original sequences. To make sure that accessing individual elements still remains reasonably fast, care must be exercised to produce trees of limited depth whose subtrees are similar in size. In contrast to Boehm et al. (cited above), who use binary trees, a data structure more akin to B* trees, combined with a clever indexing strategy, is used in order to speed up random access to individual elements.


Christian Heinlein, 22.09.09