What Galois connections have to do with learning

Two functions $f \colon P \to Q$ and $g \colon Q \to P$ between partially ordered sets form a Galois connection if, for all $p \in P$ and $q \in Q$ ,

f(p) \leq q \iff p \leq g(q).

That single biconditional has unreasonably many consequences. The compositions $g \circ f$ and $f \circ g$ are closure operators on their respective sides. The fixed points of those closures form isomorphic complete lattices. And — most usefully for the present purpose — every Galois connection is determined by a binary relation, and every binary relation determines a Galois connection.

This last fact is the engine inside Formal Concept Analysis: the incidence relation of a context induces a Galois connection between subsets of objects and subsets of attributes, and the concepts are exactly the pairs of fixed points (Ganter & Wille, 1999). The classical theory of these objects is more than half a century old; its order-theoretic foundations are laid out in (Birkhoff, 1967) and (Davey & Priestley, 2002).

What is younger, and what this essay will pursue, is the observation that many learning algorithms are also computing closure operators — without naming them as such. Version space learning closes a hypothesis set under specialisation. Rule learners close a feature set under implication. Some attention mechanisms can be viewed as soft Galois maps between query and key spaces. Once one starts looking for the closure-operator skeleton, it appears in more places than the literature on FCA usually claims. The forthcoming long form will work several of these correspondences out in careful detail.