Chapter 3 Preliminaries
The notion of proximity among entities in a set \(S\) holds significant relevance in various machine learning applications as it facilitates the comprehension of inter-entity relationships in \(S\). For example, clustering algorithms aim to group points that are proximal to each other. In recommendation systems, the objective is to suggest items that exhibit similarity to those for which a user has already expressed interest. However, the question is how do we precisely quantify the notion of proximity?
Consider a set \(S\) consisting of a collection of abstract entities, as depicted in Figure 3.1(a). Consider the red entity (vertex) \(x\) in the same figure. We wish to identify the entities (vertices) in \(S\) that are ‘closely related’ or ‘in close proximity’ to \(x\). However, a set has no inherent notion of proximity or relations between its entities.
Defining binary relations (edges) among the entities of set \(S\) provides one way of introducing the concept of proximity, which results in a graph whose vertex set is \(S\), as shown in Figure 3.1(b). Using this ‘auxiliary structure’ of edges defined on top of set \(S\), one may declare the ‘local neighborhood’ of \(x\), denoted by \(\mathcal{N}(x)\), to be the subset of \(S\) consisting of all entities that are adjacent to \(x\) via an edge. In Figure 3.1(b), the neighborhood of vertex \(x\) (in red) in \(S\) consists of all vertices colored in yellow.
The choice of neighborhood \(\mathcal{N}(x)\) in Figure 3.1(b) is arbitrary. For example, an alternative valid notion of neighborhood is given by defining \(\mathcal{N}(x)\) to contain all vertices whose distance from the red vertex \(x\) is at most two, represented by all vertices colored in yellow in Figure3.1(c). In Figure 3.1(d), the neighborhood \(\mathcal{N}(x)\) is chosen to consist of all yellow vertices that form triangles incident to red vertex \(x\). In Figure 3.1(e), the neighborhood \(\mathcal{N}(x)\) is chosen to consist of all yellow vertices that form trapezoids incident to the red vertex \(x\). It becomes clear from Figures 3.1(d) and (e) that additional auxiliary structures, such as triangles and trapezoids, can be used to define the notion of neighborhood. In practice, the choice of neighborhood typically depends on the application. Recent research has explored using graph geometry to obtain richer neighborhood notions, as seen in (Morris et al. 2019; Hajij, Istvan, and Zamzmi 2020; L. Zhao et al. 2022). However, the fundamental concept remains the same: one starts by introducing an auxiliary structure defined on a vertex set, and then utilizes the auxiliary structure to induce a well-defined notion of proximity.
Upon generalizing graphs to higher-order networks, it is natural to also generalize the notion of neighborhood for graphs to higher-order networks. The precise notion of neighborhood or proximity among entities of a set \(S\) has been studied in topology (Munkres 1974). A topology defined on \(S\) allows us to meaningfully describe the proximity of the elements in \(S\) to each other. This section introduces topological notions and definitions with the aim of generalizing graphs to higher-order networks.
3.1 Neighborhood functions and topological spaces
There are several equivalent ways to define a topological space. For example, topological spaces are commonly defined in terms of ‘open’ or ‘closed’ sets; for example, see (Munkres 1974). In this work, we opt to define topological spaces in terms of ‘neighborhoods’. This definition is more compatible with the message-passing paradigm that is usually defined on graphs (Gilmer et al. 2017), and it can be generalized to higher-order networks. We refer the reader to (Brown 2006) for an explanation of why the definition in terms of neighborhoods is equivalent to the definition in terms of open sets.
Definition 3.1 (Neighborhood function) Let \(S\) be a nonempty set. A neighborhood function on \(S\) is a function \(\mathcal{N}\colon S\to\mathcal{P}(\mathcal{P}(S))\) that assigns to each point \(x\) in \(S\) a nonempty collection \(\mathcal{N}(x)\) of subsets of \(S\). The elements of \(\mathcal{N}(x)\) are called neighborhoods of \(x\) with respect to \(\mathcal{N}\).
Definition 3.2 (Neighborhood topology) Let \(\mathcal{N}\) be a neighborhood function on a set \(S\). \(\mathcal{N}\) is called a neighborhood topology on \(S\) if it satisfies the following axioms:
- If \(N\) is a neighborhood of \(x\), then \(x\in N\).
- If \(N\) is a subset of \(S\) containing a neighborhood of \(x\), then \(N\) is a neighborhood of \(x\).
- The intersection of two neighborhoods of a point \(x\) in \(S\) is a neighborhood of \(x\).
- Any neighborhood \(N\) of a point \(x\) in \(S\) contains a neighborhood \(M\) of \(x\) such that \(N\) is a neighborhood of each point of \(M\).
Definition 3.3 (Topological space) A pair \((S,\mathcal{N})\) consisting of a nonempty set \(S\) and a neighborhood topology \(\mathcal{N}\) on \(S\) is called a topological space.
Hence, a topological space is a set \(S\) equipped with a neighborhood function \(\mathcal{N}\), which satisfies the properties specified in Definition 3.2. In Section 4.4, we introduce a similar notion of proximity in the context of higher-order networks. Further, the choice of neighborhood function \(\mathcal{N}\) is the first and most fundamental step in the construction of a deep learning model supported on a higher-order domain (see Chapter 5.
3.2 Bridging the gap among higher-order networks
Given a finite set \(S\) of abstract entities, a neighborhood function \(\mathcal{N}\) on \(S\) can be induced by equipping \(S\) with an auxiliary structure, such as edges, as demonstrated in Figure 3.1(b). Edges provide one way of defining relations among the entities of \(S\)3. Specifically, each edge defines a binary relation (i.e., a relation between two entities) in \(S\). In many applications, it is desirable to permit relations that incorporate more than two entities. The idea of using relations that involve more than two entities is central to higher-order networks. Such higher-order relations allow for a broader range of neighborhood functions to be defined on \(S\) to capture multi-way interactions among entities of \(S\).
To describe more intricate multi-way interactions, it is necessary to employ more intricate neighborhood functions and topologies. With an eye towards defining a general higher-order network in Chapter 4 (as motivated in Chapter 2), this section reviews the definitions, advantages, and disadvantages of some commonly studied higher-order networks, including (abstract) simplicial complexes, regular cell complexes, and hypergraphs. In Chapter 4, we introduce combinatorial complexes, which generalize and bridge the gaps between all of these commonly studied topological domains.
Simplicial complexes are one of the simplest higher-order domains with many desirable properties, extending the corresponding properties of graphs. For instance, Hodge theory is naturally defined on simplicial complexes, extending similar notions on graphs (Barbarossa and Sardellitti 2020a; Schaub et al. 2020, 2021).
Definition 3.4 (Simplicial complex) An abstract simplicial complex on a nonempty set \(S\) is a pair \((S,\mathcal{X})\), where \(\mathcal{X}\) is a subset of \(\mathcal{P}(S) \setminus \{\emptyset\}\) such that \(x \in \mathcal{X}\) and \(y \subseteq x\) imply \(y \in \mathcal{X}\). Elements of \(\mathcal{X}\) are called simplices.
Figure 2.6(c) displays examples of triangular meshes, which are special cases of simplicial complexes with many applications in computer graphics. The reader is referred to (Schaub et al. 2021; Crane et al. 2013) for a relevant introduction to simplicial complexes. As it is evident from Definition 3.4, each relation \(x\) on \(S\) must contain all relations \(y\) with \(y\subseteq x\). Hence, a simplicial complex may encode a relatively large amount of data, taking up a lot of memory (T. Mitchell Roddenberry, Schaub, and Hajij 2022). Further, real-world higher-order data (e.g., traffic-flows on a rectangular street network) may not admit a meaningful simplicial complex structure due to the inherent lack of available simplices on the underlying data space. To address this limitation, cell complexes (Hatcher 2005; Hansen and Ghrist 2019) can generalize simplicial complexes, and overcome many of their drawbacks.
Definition 3.5 (Regular cell complex) A regular cell complex is a topological space \(S\) with a partition into subspaces (cells) \(\{x_\alpha\}_{\alpha \in P_{S} }\), where \(P_{S}\) is an index set, satisfying the following conditions:
- \(S= \cup_{\alpha \in P_{S}} \mbox{int}(x_{\alpha})\), where \(\mbox{int}(x)\) denotes the interior of cell \(x\).
- For each \(\alpha \in P_S\), there exists a homeomorphism \(\psi_{\alpha}\), called an attaching map, from \(x_\alpha\) to \(\mathbb{R}^{n_\alpha}\) for some \(n_\alpha\in \mathbb{N}\), called the dimension \(n_\alpha\) of cell \(x_\alpha\).
- For each cell \(x_{\alpha}\), the boundary \(\partial x_{\alpha}\) is a union of finitely many cells, each having dimension less than the dimension of \(x_{\alpha}\).
For the sake of brevity, we henceforth refer to a ‘regular cell complex’ as a ‘cell complex’. Cell complexes encompass a wide range of higher-order networks. Several types of higher-order networks can be viewed as instances of cell complexes. For instance, cell complexes are a natural generalization of graphs, simplicial complexes, and cubical complexes (Hajij, Istvan, and Zamzmi 2020). Figure 2.6(d) shows examples of cell complexes. Intuitively, a cell complex is a disjoint union of cells, with each of these cells being homeomorphic to the interior of a \(k\)-dimensional Euclidean ball for some \(k\). These cells are attached together via attaching maps in a locally suitable manner. The information of attaching maps of a regular cell-complex can be stored combinatorially in a sequence of matrices, called the incidence matrices (Hatcher 2005). These matrices are described in detail in Subsection 4.4.1.
Condition 3 in Definition 3.5 is known as the regularity condition for regular cell complexes. The regularity condition implies that the topological information of a cell complex can be realized combinatorially by equipping the index set \(P_{S}\) with a poset structure given by \(\alpha\leq\beta\) if and only if \(x_{\alpha} \subseteq \overline{x_{\beta}}\), where \(\overline{x}\) denotes the closure of a cell \(x\). This poset structure is typically called the face poset (Hansen and Ghrist 2019). It can be shown that the topological information encoded in a cell complex is completely determined by the face poset structure (Hansen and Ghrist 2019), which allows cell complexes to be represented combinatorially in practice via a poset (Aschbacher 1996; Klette 2000; Basak 2010; Savoy 2021).
Definition 3.5 implies that the boundary cells of each cell in a cell complex are also cells in the cell complex. Hence, one may think about a cell complex as a collection of cells of varying dimensions, which are related via their boundaries. In terms of relations, this implies that the boundary of a cell in a cell complex must also be a cell in the cell complex. While cell complexes form a general class of higher-order networks, this property sets a constraint on the relations of a cell complex. Such a constraint may not be desirable in certain applications if the data do not satisfy it. To remove all constraints on the relations among entities of a set, hypergraphs are typically considered.
Definition 3.6 (Hypergraph) A hypergraph on a nonempty set \(S\) is a pair \((S,\mathcal{X})\), where \(\mathcal{X}\) is a subset of \(\mathcal{P}(S)\setminus\{\emptyset\}\). Elements of \(\mathcal{X}\) are called hyperedges.
A hyperedge of cardinality two is called an edge. Hypergraphs can be considered as a generalization of simplicial and cell complexes. However, hypergraphs do not directly entail the notion of the dimension of a cell (or of a relation), which is explicitly encoded in the definition of cell complex and is also indicated by the cardinality of a relation in a simplicial complex. As we demonstrate in Section 4.3, the dimensionality of cells and relations in simplicial and cell complexes can be used to endow these complexes with hierarchical structures, which can be utilized for (un)pooling type computations on these structures.
3.3 Hierarchical structure and set-type relations
The properties of simplicial complexes, cell complexes and hypergraphs, as outlined in Section 3.2, give rise to two main features of relations on higher-order domains, namely hierarchies of relations and set-type relations. In the present subsection, we formalize these two features.
Definition 3.7 (Rank function) A rank function on a higher-order domain \(\mathcal{X}\) is an order-preserving function \(\mbox{rk}\colon \mathcal{X}\to \mathbb{Z}_{\ge 0}\); i.e., \(x\subseteq y\) implies \(\mbox{rk}(x) \leq \mbox{rk}(y)\) for all \(x,y\in\mathcal{X}\).
Intuitively, a rank function \(\mbox{rk}\) on a higher-order domain \(\mathcal{X}\) attaches a ranking, represented by a non-negative integer value, to every relation in \(\mathcal{X}\) such that set inclusion in \(\mathcal{X}\) is preserved via \(\mbox{rk}\). Effectively, a rank function induces a hierarchical structure on \(\mathcal{X}\). Cell and simplicial complexes are common examples of higher-order domains equipped with rank functions and therefore with hierarchies of relations.
Definition 3.8 (Set-type relations) Relations in a higher-order domain are called set-type relations if the existence of a relation is not implied by another relation in the domain.
Hypergraphs constitute examples of higher-order domains equipped with set-type relations. Given the modeling limitations of simplicial complexes, cell complexes, and hypergraphs, we develop the combinatorial complex in Chapter 4, a higher-order domain that features both hierarchies of relations and set-type relations.
References
Recall that a relation on \(S\) is a nonempty subset of \(S\).↩︎