In type theory, given two types A and B, a sum type S is
formed as S ≡ A+B. Values in S are introduced by taking them from
either A or B. Some record keeping (ina, inb) is added to
remember which alternative was used.
S ≡ A + B // formation
a:A ⇒ (ina a):S // introduction 1/2
b:B ⇒ (inb b):S // introduction 2/2
s:S ∧ s = (ina a) ⇒ a:A // elimination 1/2
s:S ∧ s = (inb a) ⇒ b:B // elimination 2/2
Sum types may involve more than just two alternatives. In general, a sum type is denoted as follows.
Σn Ai ≡ A1 + A2 + … + An
With special cases denoted as follows.
Σ1 A ≡ A // type A itself
Σ0 ≡ void // type without values
In C++ sum types occur naturally in function overloading. Let’s look
at a function from the standard C++ library: abs.
abs has multiple overloaded declarations (just listing two here).
int abs (int);
long abs (long);
All abs functions perform the same operation: return the absolute
value of the argument. abs takes as argument either an int or a
long. Therefore, the type of the argument of the function abs is
effectively, the sum type int + long.
// obviously not real C++
// @return the absolute value of the argument
auto abs(<int + long> arg) -> <int + long>;
Sum types are called sum types, because of how their cardinality
adds up. If A has two values (say false and true), and B has three values
(say red, green, and blue) then A+B has five values: (false,
true, red, green, and blue).
Compare this to the product type A×B which has six (2×3) values:
((false, red), (false, green), (false, blue), (true, red),
(true, green), (true, blue)).
Recursive formation is allowed; as long as at least one of the alternatives is free from recursion.
S ≡ A + B + S // recursive formation
In order to use S as the domain of a function f, said function
must be defined for the domains of both A and B. Function
application is delegated to the appropriate domain. The co-domains of
f(A) and f(B) may differ.
let f : A → X
and f : B → Y
then f : A + B → X + Y
ina a ↦ f(a)
inb b ↦ f(b)
Recursive function definitions are allowed; as long as at least one of the alternatives is free from recursion.
f : A + B + S → X + Y
ina a ↦ f(a)
inb b ↦ f(b)
ins s ↦ f(s) // recursion
In C++ a sum type is an imaginary type, as opposed to a real type, in the sense that it exists only in the imagination of the programmer. It is not accessible to the compiler.
In principle, there is union as in
union S {
A a;
B b;
};
But union is forgetful in the sense that it does not remember whether
it holds A or B. The information is contained in the union; but
without outside help, it cannot be recovered.
Since the C++ union is forgetful, and additional code would be
required to make it an actual sum type, it is not being considered a
proper implementation of a sum type. The C++ standard library has an
implementation of a sum type in std::variant. But at the language
level, sum types are imaginery, not real.
Without proper representation of sum types, it falls to the programmer to manage sums of types. Usually the alternatives of a sum types are declared and defined together, and very probably refer to each other. Compare maybes, lists, or trees:
// maybe = _nothing | _some<A>
struct _nothing {
};
template <typename A>
struct _some {
A value;
};
// list = _nil | _cons<A, list>
struct _nil {
};
template <typename A, typename B>
struct _cons {
A a;
B b;
};
// tree = _leaf | _node<A, tree, tree>
struct _leaf {
};
template <typename A, typename B, typename C>
struct _node {
A a;
B b;
C c;
};
Even though C++ has no explicit means of declaring sum types, sums types occur implicitly in C++ in the form of function overloading. See the example of a counting function below.
unsigned count(_nothing&&) {
return 0;
}
template <typename A>
unsigned count(_some<A>&&) {
return 1;
}
unsigned count(_nil&&) {
return 0;
}
template <typename A, typename B>
unsigned count(_cons<A, B>&& cons)
{
return 1 + count(cons.b);
}
unsigned count(_leaf&&)
{
return 0;
}
template <typename A, typename B, typename C>
unsigned count(_node<A,B,C>&& node)
{
return 1 + count(node.b) + count(node.c);
}
Classifier types and concepts can be used to improve declarations.
template <typename>
struct is_list {
static constexpr bool const value = false;
};
template <>
struct is_list<_nil> {
static constexpr bool const value = true;
};
template <typename A>
struct is_list<_cons<A, B>> {
static constexpr bool const value = is_list<B>::value;
};
template <typename A>
concept list_type = is_list<A>::value;
template <typename A, list_type B>
unsigned count(_cons<A, B>&& cons)
{
return 1 + count(cons.b);
}
template <typename>
struct is_tree {
static constexpr bool const value = false;
};
template <>
struct is_tree<_leaf> {
static constexpr bool const value = true;
};
template <typename A, typename B, typename C>
struct is_tree<_node<A, B, C>> {
static constexpr bool const value = is_tree<B>::value && is_tree<C>::value;
};
template <typename A>
concept tree_type = is_tree<A>::value;
template <typename A, tree_type B, tree_type C>
unsigned count(_node<A, B, C>&& node)
{
return 1 + count(node.b) + count(node.c);
}
But there can never be concise declarations like these
unsigned count(maybe&&);
unsigned count(list&&);
unsigned count(tree&&);
because those types are imaginary, and not known to the compiler.
Even though imaginary, C++ handles sum types just fine with one small caveat. In C++ variables have a type, one type, not two, nor more. Therefore, to define a variable of a sum type, sum special treatment is needed.
The following code snippet is not valid C++, becauce variables can’t be defined having a concept as type.
maybe m; // not valid
list l; // not valid
tree t; // not valid
In the following code snippet, variable definitions are valid, but also determined.
Once m is defined as _nothing; it can never be a _cons; and vice versa.
Similar holds for l and t.
_nothing m;
_nil l;
_leaf t;
We have to resort to C++ unions.
template<typename A>
struct maybe_var {
enum { is_nothing, is_some } _d;
union { _nothing _n; _some<A> _s; };
maybe_var() : _d{is_nothing} {}
maybe_var(A a) : _d{is_some}, _s{a} {}
operator bool() { return _d == is_some; }
};
maybe_var<int> m0;
maybe_var<int> m1 = 7;
And functions that shall accept such a C++ union as argument, require some consideration.
template<typename R, typename A>
struct FunctionOnMaybe {
R operator()(_nothing) { /* … */ }
R operator()(_some<A> s) { /* … */ }
};
template<typename R, typename A>
R someFunction(maybe_var<A> m) {
FunctionOnMaybe<R,A> f;
return m ? f(m._s) : f(m._n);
}