Null Values

Motivation

Many programming languages provide a distinguished null value for pointer or reference types to indicate that an object reference currently does not refer to any object. In contrast, it is usually not possible to indicate that a variable of a primitive type (such as int or char) currently does not contain any value. Yet there are many circumstances in which null values for all types of a language would be useful:

Variables which are not explicitly initialized naturally contain no value, i. e., null.
Functions which do not explicitly return a value naturally return no value, i. e., null.
This is particularly useful for global virtual functions: if the first branch of such a function calls its previous branch, an automatically generated branch zero is called, whose body is empty and which therefore does not return any value.
If an instance of an open type does not possess a value for a particular attribute, reading this attribute naturally returns no value, i. e., null.
If an index value for an array (or a similar container object) is out of range, reading the corresponding element naturally returns no value, i. e., null, which in many cases is more appropriate than throwing an exception or aborting the entire program.
If a function cannot return a meaningful value for any reason (for instance, a function that should return the first element of a container matching a given search predicate, but no such element exists), it might naturally return no value, i. e., null. Again, in some cases this might be more appropriate than throwing an exception that has to be caught explicitly.

It is important to note that in all these circumstances, there is an essential difference between a null value representing no value at all and a default value such as zero for integral types.
It is also important to note that, in general, the first two cases (uninitialized variables and functions returning no value) cannot be detected statically at compile time, since the corresponding statements might be executed conditionally. Trying to do so anyway necessarily results in conservative approximations by the compiler (as, e. g., in Java), which sometimes force a programmer to provide unnecessary dummy initializations or return statements just to get a program to compile.

Concept

Every type of an advanced procedural programming language, whether built-in or user-defined, primitive or structured, possesses a unique null value representing no value at all. (Strictly speaking, the term "null value" is therefore a contradiction in terms.)

Null values are implicitly used in the following circumstances:

Variables which are not explicitly initialized are implicitly initialized with the null value of their type.
Functions which do not explicitly return a value, either because they execute a return statement without an expression or because they execute no return statement at all, implicitly return the null value of their result type.
In particular, the automatically generated branch zero of a global virtual function returns null.
Reading an attribute of an object for which no value has been written yet implicitly returns the null value of the attribute's type.
In particular, reading an attribute of a “null object” always returns null.
Accessing a non-existent array or container element returns the null value of the element type.
In particular, accessing an element of a “null array” or a “null container” always returns null.
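The following C++ sketch approximates some of these rules with std::optional, whose default-constructed (empty) state plays the role of null. It is only an approximation under stated assumptions: C++ has no built-in null value for int, and a non-void C++ function must not simply fall off its end, so the "no return value" case has to be written as an explicit return {}. The names element_at, find_first and read_attribute are hypothetical, and modeling an instance of an open type as a map from attribute names to values is likewise an assumption made for illustration.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// std::optional<int> stands in for "int with a null value": a
// default-constructed optional holds no value, like an uninitialized variable.
using NullableInt = std::optional<int>;

// Hypothetical accessor: an out-of-range index yields null instead of
// throwing an exception or aborting the program.
NullableInt element_at(const std::vector<int>& v, std::size_t i) {
    if (i >= v.size()) return std::nullopt;
    return v[i];
}

// Hypothetical search: returns the first element satisfying pred,
// or null if no element matches.
template <typename Pred>
NullableInt find_first(const std::vector<int>& v, Pred pred) {
    for (int x : v)
        if (pred(x)) return x;
    return {};  // explicit in C++; the concept would make this implicit
}

// Hypothetical "open object" modeled as a map from attribute names to
// values: reading an attribute that was never written yields null.
NullableInt read_attribute(const std::map<std::string, int>& obj,
                           const std::string& name) {
    auto it = obj.find(name);
    if (it == obj.end()) return std::nullopt;
    return it->second;
}
```

A caller can then test the result for null instead of catching an exception, e.g. if (!element_at(v, 5)) { /* handle the missing element */ }.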

Furthermore, there is a generic null value constant, null, which is compatible with any type and can be used to indicate a missing value explicitly.

Null values are propagated through all arithmetic operations on integral and floating-point values, i. e., if one operand of an arithmetic expression is null, the entire expression's value will be null, too.
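A minimal sketch of this propagation rule, again assuming std::optional<long> as a stand-in for a null-valuable integer type: a generic helper (the names lift, add and mul are hypothetical) returns null as soon as either operand is null and otherwise applies the operation to the contained values. Named functions are used rather than overloaded operators, since adding operators for standard library types is poor C++ practice.

```cpp
#include <optional>

// Hypothetical null-valuable integer type.
using NInt = std::optional<long>;

// Apply a binary arithmetic operation with null propagation:
// if either operand is null, the whole result is null.
template <typename Op>
NInt lift(const NInt& a, const NInt& b, Op op) {
    if (!a || !b) return std::nullopt;
    return op(*a, *b);
}

NInt add(const NInt& a, const NInt& b) {
    return lift(a, b, [](long x, long y) { return x + y; });
}

NInt mul(const NInt& a, const NInt& b) {
    return lift(a, b, [](long x, long y) { return x * y; });
}
```

Note how null differs from zero here: add(3, 0) is 3, whereas adding null to 3 yields null, and nesting such as add(mul(a, b), c) stays null if any operand is null.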

The null value of a particular type is equal to itself, but different from all other values of the type. (This is in contrast to a floating-point NaN value, which is different from all values, including itself and other NaN values.) Furthermore, a null value is neither less than nor greater than any other value of the type, i. e., it is incomparable to other values.
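These comparison rules can be sketched in C++ as follows, assuming std::optional<double> as the null-valuable type; null_equal, null_less and null_greater are hypothetical names. One caveat worth noting: std::optional's built-in ordering does not match the rule above, because it treats an empty optional as less than every value, so the ordering functions are written out explicitly.

```cpp
#include <optional>

// Hypothetical null-valuable floating-point type.
using NDouble = std::optional<double>;

// Equality: null equals null, and null differs from every non-null value.
// A contained NaN still compares unequal to itself, as usual.
bool null_equal(const NDouble& a, const NDouble& b) {
    if (!a || !b) return !a && !b;
    return *a == *b;
}

// Ordering: null is incomparable, so any comparison involving null is false.
bool null_less(const NDouble& a, const NDouble& b) {
    return a && b && *a < *b;
}

bool null_greater(const NDouble& a, const NDouble& b) {
    return null_less(b, a);
}
```

With these definitions, neither null_less(null, 1.0) nor null_greater(null, 1.0) holds, whereas std::optional's own operator< would report the empty value as smaller.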

Any value of any type is implicitly convertible to a Boolean value by interpreting null as false and all other values as true. Consequently, the Boolean null value is equivalent to the Boolean false value, i. e., the Boolean true value is the only “other” value of the Boolean type.
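A small sketch of this conversion, assuming std::optional<T> as the null-valuable type and a hypothetical helper to_bool. Because null and zero are distinct, a non-null integer 0 converts to true. For the Boolean type itself the rule collapses false into null, so only true converts to true; the non-template overload below handles that case and is preferred by overload resolution.

```cpp
#include <optional>

// Generic rule: null converts to false, every other value to true
// (including a non-null 0, since null and zero are different values).
template <typename T>
bool to_bool(const std::optional<T>& v) {
    return v.has_value();
}

// Boolean type: null and false coincide, so true is the only value
// that converts to true.
inline bool to_bool(const std::optional<bool>& v) {
    return v.value_or(false);
}
```

In C++ the generic case corresponds to the contextual conversion already used in if (opt) { ... }, which also treats a contained 0 as true.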

Implementation

Ideally, null values, especially those of numeric types, should be supported directly by the hardware so that arithmetic operations can be implemented without performance penalties. Since off-the-shelf hardware usually does not support them, however, software implementations must be used instead, which either reserve a single bit of a value's representation as a null indicator or store values as pairs consisting of a Boolean null indicator and the actual value.
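The two software representations can be contrasted in a short C++ sketch; PairInt, TaggedInt and add are hypothetical names. PairInt stores a separate Boolean indicator next to the value, which usually costs extra space due to padding. TaggedInt instead reserves the lowest bit of a 64-bit word as a "non-null" flag, so an all-zero word is null and the value range shrinks to 63-bit integers. The sign-restoring right shift assumes two's-complement conversions and an arithmetic shift, which C++20 guarantees and mainstream compilers provide in earlier modes.

```cpp
#include <cstdint>

// Representation A: explicit Boolean null indicator plus the value.
struct PairInt {
    bool is_null = true;
    std::int32_t value = 0;
};

// Representation B: one bit of the word serves as the null indicator.
// Bit 0 set means non-null; the upper 63 bits hold the value.
class TaggedInt {
    std::uint64_t bits_ = 0;  // all-zero bits encode null

public:
    TaggedInt() = default;
    // Assumes v fits into 63 bits; the top bit of v is shifted out.
    explicit TaggedInt(std::int64_t v)
        : bits_((static_cast<std::uint64_t>(v) << 1) | 1u) {}

    bool is_null() const { return (bits_ & 1u) == 0; }

    // Arithmetic right shift drops the flag bit and restores the sign.
    std::int64_t value() const {
        return static_cast<std::int64_t>(bits_) >> 1;
    }
};

// Every arithmetic operation pays for an extra null check in software.
TaggedInt add(TaggedInt a, TaggedInt b) {
    if (a.is_null() || b.is_null()) return TaggedInt{};
    return TaggedInt{a.value() + b.value()};
}
```

The branch in add is exactly the overhead that hardware support for null values would avoid.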

In C++, it is possible to define wrapper types for all primitive types and to overload the arithmetic operators for these in order to implement “null-valuable primitive types.” Furthermore, it is possible to define a “null type” with a single instance that is implicitly convertible to any other type in order to implement the generic null value constant null.
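A minimal sketch of this approach, under the assumption that a single class template Nullable<T> serves as the wrapper for every primitive type and that conversion from the null type is realized by a converting constructor of the wrapper; null_t, null and Nullable are hypothetical names, and only + and * are shown.

```cpp
// Hypothetical "null type" with exactly one named instance, null.
// The constructor is explicit so that no literal converts to null_t.
struct null_t {
    constexpr explicit null_t(int) {}
};
inline constexpr null_t null{0};

// Hypothetical wrapper turning a primitive type T into a null-valuable type.
template <typename T>
class Nullable {
    bool has_value_ = false;  // false encodes the null value
    T value_{};

public:
    Nullable() = default;               // uninitialized variables are null
    Nullable(null_t) {}                 // implicit conversion from null
    Nullable(T v) : has_value_(true), value_(v) {}

    bool is_null() const { return !has_value_; }
    T value() const { return value_; }  // meaningful only if !is_null()

    // Arithmetic propagates null: any null operand yields a null result.
    friend Nullable operator+(const Nullable& a, const Nullable& b) {
        if (a.is_null() || b.is_null()) return null;
        return Nullable(a.value_ + b.value_);
    }
    friend Nullable operator*(const Nullable& a, const Nullable& b) {
        if (a.is_null() || b.is_null()) return null;
        return Nullable(a.value_ * b.value_);
    }
};
```

With these definitions, Nullable<int> a = 3; makes a * 4 + 1 equal to 13, while a + null is null and Nullable<double> d = null; compiles without a cast. Defining the operators as friends inside the class lets plain literals such as 4 convert implicitly to Nullable<int>.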

Publications

[1] C. Heinlein: "Null Values in Programming Languages." In: H. R. Arabnia (ed.): Proc. Int. Conf. on Programming Languages and Compilers (PLC'05) (Las Vegas, NV, June 2005), 123–129.
Describes the concept of null values in more detail.
