Open cybernetic systems II: parametrised optics and agency

This post is the second in a series about open cybernetic systems, i.e. the categorical framework for cybernetics we have been developing at MSP, which underlies the last two papers I coauthored.

Here’s the whole plan of the series:

  1. Open cybernetic systems I: feedback systems as optics
  2. Open cybernetic systems II: parametrised optics and agency
  3. Open cybernetic systems III: control mechanisms and equilibria (coming soon)
  4. Bonus: Amazing backward induction (coming soon)

Parametrised optics and agency

Last time, I described how optics can be a good formalism for feedback systems, i.e. systems whose dynamics happen in two stages (which I dubbed ‘do’ and ‘propagate’), like the undertow on a beach. In practice, it often happens that a system’s dynamics is not set in stone: someone can turn a knob and change the dynamics at will:

Ideally, an agent would fully control the dynamics, as a kid does with their toy plane. In practice, they don’t, as pilots know: the knobs, switches and levers in the cockpit are the only means they have to influence the plane. This is often enough to do what they want, but they surely can’t specify arbitrary trajectories for their plane. The extent to which an agent can actually control a system is what control theorists call controllability [0].

You can add as many knobs as you want, Airbus, and yet the kid will beat you every time. (credit: left, right)

The ‘someone’ who gets to play with knobs is usually an agent of some kind: players in a game, training algorithms in a neural network, or ‘controllers’ in control theory. Observe that in all these examples, agents not only have a knob to turn but also some kind of sensor that measures the state of the system, to which they can, ideally, react accordingly (indeed, a plane cockpit is full of knobs, gauges, screens, windows, etc.).

Mathematically speaking, a knob selects a parameter p from a parameter space P, and a sensor shows an observation q from an observation space Q. In the case of the plane, p is the position of all the knobs and levers in the cockpit (and elsewhere, if there are any), while q is the displayed reading of all the sensors in the plane (radar, speed, pressure, etc.).

How do P and Q interact with an optic representing the system being controlled? First of all, recall that an optic (X,S) \rightleftarrows (Y,R) is given by two morphisms \mathrm{do} : X \to M \bullet Y and \mathrm{propagate} : M \bullet R \to S in a category \mathbf C, where M is an object of a monoidal category \mathbf M called the residual. This object is attached to Y and R using a ‘heterogeneous monoidal product’ \bullet, commonly known as an action of \mathbf M on \mathbf C.
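To make this concrete, here is a minimal Haskell sketch (my own rendering, not from the papers), specialised to \mathbf C = \mathbf M = Hask with \bullet given by the product:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- An optic (X,S) ⇄ (Y,R): a forward and a backward map sharing a hidden
-- residual type m. Here the action • is simply the product (,).
data Optic x s y r = forall m. Optic
  (x -> (m, y))     -- do:        X -> M • Y
  ((m, r) -> s)     -- propagate: M • R -> S
```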

Back to parameters and observations: we are going to assume they live in the same category \mathbf M as the residuals, and we attach parameters to X and observations to S, again using \bullet. Thus we get a parametrised optic (X,S) \overset{(P,Q)}{\rightleftarrows} (Y,R) as a pair of morphisms \mathrm{do} : P \bullet X \to M \bullet Y and \mathrm{propagate} : M \bullet R \to Q \bullet S. Intuitively, the system now receives not just the ‘environment state’ X but also the ‘control state’ P, and returns not just the ‘environment feedback’ S but also the ‘control feedback’ Q. A more enthralling point of view is that the state has been decomposed into a part we consider as coming from the control and a part we consider as coming from the environment, and likewise for the feedback.
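Continuing the Haskell sketch (same assumptions as before: everything lives in Hask, • is the product), a parametrised optic just widens both maps:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- A (P,Q)-parametrised optic (X,S) ⇄ (Y,R): 'do' additionally consumes a
-- parameter p, and 'propagate' additionally produces an observation q.
data ParaOptic p q x s y r = forall m. ParaOptic
  ((p, x) -> (m, y))      -- do:        P • X -> M • Y
  ((m, r) -> (q, s))      -- propagate: M • R -> Q • S
```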

Visually, parameters and observations live in the same vertical direction we used to represent residuals: [1]

The construction of a ‘parametrised version’ of a category is called Para (the mathematical aspects of which are the topic of a forthcoming paper with Bruno and Toby). It’s a very cool construction, and quite general too. It takes a monoidal category \mathbf M of parameters and an \mathbf M-actegory (meaning a category with an action \bullet of \mathbf M) \mathbf C, and produces another \mathbf M-actegory \mathbf{Para}_\bullet(\mathbf C) of \mathbf M-parametrised \mathbf C-morphisms. Explicitly, this category has the same objects as \mathbf C, and a morphism X \to Y is given by a choice of P in \mathbf M together with a morphism f : P \bullet X \to Y in \mathbf C. It turns out that if \mathbf C is itself monoidal and \bullet interacts nicely with that structure, then \mathbf{Para}_\bullet(\mathbf C) is again monoidal.
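In code, the bare Para construction (before optics enter the picture) is tiny; here is a minimal sketch over Hask with the cartesian action:

```haskell
-- Para over Hask: a parametrised map X -> Y is a choice of parameter type p
-- together with an ordinary map p × X -> Y.
newtype Para p x y = Para ((p, x) -> y)

-- Composition stacks the parameters: running the composite needs both.
compP :: Para p x y -> Para p' y z -> Para (p, p') x z
compP (Para f) (Para g) = Para (\((p, p'), x) -> g (p', f (p, x)))
```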

The instance of the Para construction we propose in the cybernetics paper (and which I outlined above) hinges upon the following fact: if \mathbf M acts on \mathbf C, then we get an action \circledast of \mathbf{Optic}(\mathbf M) on \mathbf{Optic}_{\bullet}(\mathbf C) [2]. The resulting category \mathbf{Para}_\circledast(\mathbf{Optic}_\bullet(\mathbf C)) is what I described above: a category whose morphisms are optics with parameters and observations attached.

The graphical calculus of teleological categories, which we used for optics, extends naturally to parametrised optics. Vertical wires/boxes represent objects/morphisms in the parametrising category, while horizontal wires/boxes represent objects in the parametrised category. This means I can finally show you how parametrised morphisms compose, directly in the case of interest, parametrised optics:
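Concretely, in the Haskell sketch (ParaOptic re-declared for self-containment), sequential composition pairs up parameters, observations and residuals, exactly as the diagram suggests:

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data ParaOptic p q x s y r = forall m.
  ParaOptic ((p, x) -> (m, y)) ((m, r) -> (q, s))

-- Sequential composition of parametrised optics.
compPara :: ParaOptic p q x s y r -> ParaOptic p' q' y r z t
         -> ParaOptic (p, p') (q, q') x s z t
compPara (ParaOptic f f') (ParaOptic g g') = ParaOptic fwd bwd
  where
    fwd ((p, p'), x) =
      let (m, y) = f (p, x)        -- run the first stage forward
          (n, z) = g (p', y)       -- then the second
      in ((m, n), z)
    bwd ((m, n), t) =
      let (q', r) = g' (n, t)      -- propagate back through the second stage
          (q,  s) = f' (m, r)      -- then through the first
      in ((q, q'), s)
```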

Reparametrisation

The most important fact about the Para construction, however, is that the resulting category is actually a bicategory, i.e. a category with an additional layer of ‘morphisms between morphisms’. Such 2-morphisms are called reparametrisations, so you might already guess what this is all about.

Formally, given \varphi : (X,S) \overset{(P,Q)}{\rightleftarrows} (Y,R) and \psi : (X,S) \overset{(P',Q')}{\rightleftarrows} (Y,R), a reparametrisation \varphi \Rightarrow \psi is an optic \alpha : (P', Q') \rightleftarrows (P,Q) in \mathbf M (hence a morphism in \mathbf{Optic}(\mathbf M), the parametrising category) such that \psi = (\alpha \circledast (X,S)) \fatsemi \varphi. Usually, we start from \varphi and \alpha and obtain \psi by interpreting the aforementioned equation as an assignment [3].
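In the running Haskell sketch, this is a precomposition on the parameter wires; given \alpha and \varphi we compute \psi (types re-declared for self-containment):

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data Optic a b c d = forall n. Optic (a -> (n, c)) ((n, d) -> b)
data ParaOptic p q x s y r = forall m.
  ParaOptic ((p, x) -> (m, y)) ((m, r) -> (q, s))

-- Reparametrise phi (parameters (p, q)) along alpha : (p', q') ⇄ (p, q),
-- obtaining psi = (alpha ⊛ (X,S)) ; phi with parameters (p', q').
reparam :: Optic p' q' p q -> ParaOptic p q x s y r -> ParaOptic p' q' x s y r
reparam (Optic af ab) (ParaOptic f f') = ParaOptic fwd bwd
  where
    fwd (p', x) =
      let (n, p) = af p'        -- alpha turns p' into p, keeping residual n
          (m, y) = f (p, x)
      in ((n, m), y)
    bwd ((n, m), r) =
      let (q, s) = f' (m, r)
      in (ab (n, q), s)         -- alpha turns the observation q back into q'
```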

Visually, this looks like stacking \alpha on top of \varphi:

This higher structure is crucial, as it provides an expressive way to model agency mathematically. That is, if parameters are the way an agent can control and observe a system, reparametrisations (seen as ‘optics in parameters-land’) are the way agents process and react to the information going to and coming from the system of interest. Indeed, being optics themselves, reparametrisations can be considered ‘systems within systems’, a point of view I enthusiastically espouse.

Reparametrisations are also the crucial ingredient for reproducing a distinctive feature of systems with agents, namely the non-compositional effects arising from the long-distance correlations that the persistent identity of an agent induces in a system. These effects were fundamental in our paper about open games with agency, because this is how imperfect information (and the very concept of ‘player’) manifests in classical game theory: at different points of the game, the possible decisions a player can make are not independent. For example, players might not be able to distinguish among some of the states of the game (states which might be ‘causally’ very far apart), hence they are forced to play the same strategy in distant parts of the game. By ‘reparametrising along a copy’, one can reproduce this situation, which would otherwise be impossible to express.

Reparametrising along a copy forces the same action to be taken at two ‘distant’ points in the system.
This phenomenon is known as ‘weight tying’ in machine learning.

I called these non-compositional effects because the resulting morphism \Delta^* (\varphi \fatsemi \psi) lies outside the image of the composition map \fatsemi : \mathbf{Optic}((X,S), (Y, R)) \times \mathbf{Optic}((Y,R), (Z,T)) \to \mathbf{Optic}((X,S),(Z,T)), hence it literally can’t be obtained by composition alone.
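Here is a minimal sketch of this in the same Haskell notation, assuming for concreteness that parameters are weights and observations are gradients (both Double): the copy duplicates the parameter on the way in and accumulates feedback on the way out, which is exactly weight tying.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data Optic a b c d = forall n. Optic (a -> (n, c)) ((n, d) -> b)
data ParaOptic p q x s y r = forall m.
  ParaOptic ((p, x) -> (m, y)) ((m, r) -> (q, s))

reparam :: Optic p' q' p q -> ParaOptic p q x s y r -> ParaOptic p' q' x s y r
reparam (Optic af ab) (ParaOptic f f') = ParaOptic
  (\(p', x) -> let (n, p) = af p'; (m, y) = f (p, x) in ((n, m), y))
  (\((n, m), r) -> let (q, s) = f' (m, r) in (ab (n, q), s))

-- Reparametrising along a copy: one weight drives both occurrences, and the
-- two gradient contributions are summed, i.e. weight tying.
tie :: ParaOptic (Double, Double) (Double, Double) x s y r
    -> ParaOptic Double Double x s y r
tie = reparam (Optic (\w -> ((), (w, w)))               -- Δ: copy the weight
              (\((), (g1, g2)) -> g1 + g2))             -- sum the feedbacks
```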

Mereology of agency

Agency is the ability to bring about changes in a system. Oftentimes, this ability is exerted in order to bring about specific states of the system, those deemed optimal by whoever embodies agency. Agency is usually exerted through a control, as I described above with the plane example.

I prefer speaking about agency rather than agents, since the latter is a much more delicate concept. Agents presuppose agency and further decorate it with identity, but identity comes in a continuous spectrum, not in discrete quantities [4]. Therefore we end up talking about an amorphous blob of agents anyway. Most importantly, agents are not ‘blackboxable’ while agency is: by definition, I can’t look into a black box and discern identity information about agents. In sum, from now on ‘agent(s)’ will be shorthand for ‘a distinguished part of a system imbued with agency’, and does not presuppose anything about their number or identity.

This whole premise already hints at the fact that agency isn’t an intrinsic property of a system. The mathematical manifestation of this fact is that parametrised optics are, at the end of the day, just optics, though presented in a particular form. We factor state and rewards into an ‘environment’ part and an ‘agents’ part, although this factorisation is somewhat arbitrary: we could put every agent in the environment without any noticeable mathematical difference.

This reflects an even deeper, if obvious, fact: agents have to place themselves in the environment in order to act on and observe a system. Therefore agents are a distinguished part of a given system, which we (the modellers) decide to treat separately from the rest. This manifests as an additional ‘arbitrary’ boundary between system, environment, and agents:

As hinted last time, when we covered environment–system boundaries, these are arbitrarily chosen in designing a model, and can be reabsorbed at any time. When dealing with the environment–system boundary, reabsorption amounted to closing up a system. Adding a second boundary effectively triples the possible reabsorptions we can perform, depending on the point of view we adopt.

Let’s look again at a parametrised optic:

Where vertical wires (representing agents) meet horizontal boxes (representing the system), an action is used to substantiate agents as actual system-level entities, hence as system parts. As I anticipated above, this can equivalently be seen as decomposing state and feedback into environment and agents parts. Thus, from the point of view of the system, agents’ decisions are part of the state of the environment. In other words, agents and environment are both external to the system, so the former can be reabsorbed by the latter, restoring a single environment–system boundary.

Considering the point of view of the environment, we get a similar picture, except now agents are reabsorbed by the system:

Finally, one can consider the point of view of the agents. To them, system and environment are both external, hence they might as well be conflated together:

This is the most interesting of the three reabsorption operations (perhaps because we usually are the agents, so we want to take their side). It shows really clearly how agents dealing with a (partially) closed system are effectively interacting with a black box. It gets even more interesting when the parametrising category \mathbf M comes with a triple adjunction L \dashv (-) \bullet I \dashv R : \mathbf M \rightleftarrows \mathbf C. In this case, the closed system the agents interact with (which turns out to be given by a pair of maps P \to M \bullet I \to Q) lifts to a literal costate in \mathbf{Optic}(\mathbf M) (given by the map L(P) \to M \to R(Q)), therefore showing quite literally that such a situation is completely described by the agency dynamics.
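A toy version of this collapse in the running Haskell sketch (where C = M = Hask and • = product, so the adjunction subtleties trivialise): once every horizontal boundary is trivial, all that remains of the system is a single map from parameters to observations.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

data ParaOptic p q x s y r = forall m.
  ParaOptic ((p, x) -> (m, y)) ((m, r) -> (q, s))

-- A fully closed parametrised system is exactly a map P -> M -> Q:
-- a 'costate in parameters'.
transpose :: ParaOptic p q () () () () -> (p -> q)
transpose (ParaOptic f f') p =
  let (m, _) = f (p, ())      -- do: only the residual carries information
      (q, _) = f' (m, ())     -- propagate: the residual becomes the observation
  in q
```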

The full mathematical treatment of this situation is not published yet, but we agree it’s one of the most intriguing parts of the framework. We refer to the operation that turns a parametrised closed system into a costate in parameters as arena lifting, or transposition (hence the notation). This also reunites our treatment with other treatments of open dynamical systems, in which systems were modelled simply as lenses: we can imagine them as systems of agents whose environment has not been modelled explicitly [5].

The addition of further mereological distinctions can continue beyond system and agents. Indeed, agents themselves can be subdivided into hierarchies of increasing detail. For instance, we can model the pilot–plane system as a parametrised optic, with the pilot’s control given by whatever the cockpit allows them to do. But then, we may observe, there are actually two pilots, so we might decide to focus on one of them. And we can go on: a pilot acts on the environment through their body, so we can consider the ‘pilot’ system as a cybernetic system in which the pilot’s brain is in charge of the body, which acts in the pilot–copilot duo, which acts on the plane. And so on, ad libitum. The mathematical reflection of these ‘higher systems’ is an iterated \mathbf{Para} construction: by letting the parametrising category itself be a parametrised category, we add one new mereological level, and so on. Higher systems we expect to describe include coalitional games and things like deep Q-learning (hence a learning theory for games).

Examples

A game

The framework described above works wonderfully for representing games. Indeed, games are feedback systems with a very natural concept of agency in them: players. This crucial ingredient had so far been missing from compositional game theory (open games), and we introduced it in our latest paper on the subject [6]. To stress this, we baptised the new, extended framework ‘open games with agency’.

The key difference between open games with agency and ‘classical’ open games is the explicit use of parametrised lenses (let’s work with \mathbf C = \mathbf M = (\mathbf{Set}, \times, 1) here). In open games, parameters are called strategies (and we denote them by \Omega_1, \ldots, \Omega_n, where n is the number of players), and observations are limited to real-valued payoffs (so, aggregating everyone’s payoff, a vector in \mathbb R^n). The system in which players act is what used to be called the game itself, though we started calling it the arena to stress the fact that it’s just part of a game (the horizontal ‘system’). Arenas are often built by composing ‘decisions’, in which players observe the state of the game and implement the chosen strategy to yield a move (this is the do part, usually called play in the open games literature), and in which they observe and propagate payoffs (this is the propagate part, usually called coplay). In games, payoff propagation is usually quite boring, since players don’t do anything to it. It doesn’t have to be like this, though, and there are situations in which non-trivial propagation is fundamental, such as repeated Markov games with discounting:

The dots ‘…’ mean we can repeat the \mathcal G + \mathcal U unit as many times as we want, potentially infinitely many [7]. The ground symbol is the discard operation, i.e. the unique morphism !_A : A \to 1 from a given set A.

In a Markov game, the only thing remembered from one round to the next is the ‘final state’ of the game, which becomes the initial state of the next round. Crucially, players do not observe the game in between rounds. So \mathcal G here acts like a kind of state machine, where ‘transitions’ (moves) are decided by the strategies of each player. After a round is completed, payoffs are distributed among the players: \mathcal U is indeed observing the state of the game to generate a payoff vector (hence \mathcal G + \mathcal U behaves like a Mealy machine).

In the vertical direction, we handle the distribution of strategies and rewards. Strategies are copied so as to be the same in each round, while rewards are computed by summing the payoffs obtained in each round. Moreover, we apply discounting, which means that payoffs from round k are multiplied by \delta^k, where 0 < \delta < 1. This models the fact that ‘future gains are less and less valuable’, which is both a reasonable modelling assumption (people tend to behave this way) and a useful technical condition (because then you can repeat the game infinitely many times and still have a convergent sum).
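As a quick numerical sketch (my notation, nothing from the paper): the vertical bookkeeping for rewards is just a discounted sum.

```haskell
-- Discounted aggregation of per-round payoffs: round k is weighted by delta^k.
discounted :: Double -> [Double] -> Double
discounted delta = sum . zipWith (*) (iterate (* delta) 1)

-- e.g. discounted 0.9 [10, 10, 10] = 10 + 9 + 8.1 = 27.1
-- With 0 < delta < 1 the weights shrink geometrically, which is what makes the
-- infinite-horizon sum converge (truncate the list to actually evaluate it).
```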

This is how we model the dynamics of a game. In my next blog post, I’ll describe how equilibria can be put in the picture.

A learner

GANs are very interesting examples for this framework since they lie at the intersection of game theory and machine learning. Indeed, they’re learners involved in a game: their joint behaviour is governed by game-theoretical laws but their dynamics is interpreted as that of a machine learning model.

This example is taken from our paper on categorical cybernetics:

The system represented by this diagram is composed of two agents interacting in a very simple way. The generator, g, is infused with random noise z_i (the latent vector) and produces a fake vector x_i \in \mathbb R^x. Here ‘vector’ often means ‘image’, but it could be any kind of data. The discriminator, d, is fed both a true vector d_i (usually coming from a training set) and the fake vector x_i coming from g. The goal of the discriminator is to discern whether its inputs are real or fake.

The feedback system is given by reverse differentiation, hence the feedback signal is a gradient over the action signal. In particular, in this example there’s no explicit loss function: the costates dx simply emit a 1. The magic happens in the vertical direction: g performs gradient descent on its weights (that’s the box gd_\alpha, where \alpha is the learning rate), while d performs gradient ascent (ga_\alpha = gd_{-\alpha}). In this way, g is minimizing the ‘fakeness’ value d assigns to fake vectors, while d is maximizing the cost assigned to fake images and minimizing the cost assigned to real ones (that’s why there’s a -dx in the bottom right costate).

Notice how the two d boxes are tied together by the vertical wiring. In particular, the wire \mathbb R^q, which represents the weights of d, is copied and fed to both boxes, in order to make them embody the same discriminator both times. This is crucial to get the training right!

Notice also how, from the perspective of g, what’s happening is normal training: everything concerning d is, to g, just a loss function, and g is learning to minimize it, a task that amounts to generating realistic-looking vectors.
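Putting the vertical pieces together, here is a toy numerical sketch of one training step (everything scalar, gradients supplied externally; an illustration of the wiring, not the paper's construction):

```haskell
type Weight = Double
type Grad   = Double

-- Gradient descent and ascent steps (ga_alpha = gd_{-alpha}, as above).
gd, ga :: Double -> Weight -> Grad -> Weight
gd alpha w g = w - alpha * g
ga alpha     = gd (negate alpha)

-- One GAN step: g descends its gradient, while d ascends the SUM of the
-- gradients from its two occurrences; the copied weight wire ties them.
ganStep :: Double -> (Weight, Weight) -> (Grad, Grad, Grad) -> (Weight, Weight)
ganStep alpha (wg, wd) (gradG, gradDfake, gradDreal) =
  ( gd alpha wg gradG
  , ga alpha wd (gradDfake + gradDreal) )
```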

Network communication

Recently, André Videla gave a talk about structuring REST APIs as parametrised optics, which was very exciting since he came up with this idea independently. In this example, agents are computers in a network, which interact through a REST API implementation. REST is an architectural style for data exchange over HTTP, which allows clients to fetch and update data on a server. Unsurprisingly, these operations are easily modelled by optics (though it’s non-trivial to map such bidirectional data accessors to an actual REST API implementation; kudos to André for figuring this out). Parametrisation becomes necessary to ‘populate’ the endpoints with actual data from the server; in other words, the vertical direction represents the agents’ state.
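I don't know the details of André's implementation, but the basic shape is easy to guess in the running Haskell sketch: a resource is a lens from server state to the data it exposes (all names here are hypothetical).

```haskell
-- A REST resource as a lens on the server state (hypothetical names).
data Endpoint s a = Endpoint
  { getResource :: s -> a        -- GET: read the resource out of the state
  , putResource :: s -> a -> s   -- PUT: write new data back into the state
  }

-- Toy example: a user-name endpoint over a toy server state.
data Server = Server { userName :: String, visits :: Int }

nameEndpoint :: Endpoint Server String
nameEndpoint = Endpoint userName (\s n -> s { userName = n })
```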

Conclusions

I hope I managed to convince you that, when combined, the \mathbf{Para} and \mathbf{Optic} constructions can model feedback systems with agency. Their mathematics beautifully showcases deep intuitions about the concepts of agency and control of a system. It captures parametrisation and observations, and accounts for the non-compositional effects typical of systems with agency.

In the next post, I’m going to show you how complex ‘control mechanisms’ can be described in this framework, thereby allowing us to analyze equilibria of games (and of other systems), to describe the training of machine learning models, and to derive equations of motion for Hamiltonian systems.

Footnotes

[0] Let me say that limited controllability is, in many cases, a feature and not a bug: systems with many degrees of freedom are very hard to govern, so a limited amount of control can be a boon. The classic example that comes to mind (and, ultimately, the reason we care about all this) is the way machine learning handles ‘learning a function’: we parametrise the space of functions and learn a parameter that best fits the true objective function. This is because function spaces (1) are intractably large, since they have infinitely many degrees of freedom, and (2) do not admit easy (or even any) representations of their elements.
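For instance (a deliberately trivial sketch): instead of searching the whole function space, one searches a finite-dimensional family.

```haskell
-- A two-parameter family standing in for the intractable space of all
-- functions Double -> Double; learning searches over (a, b) instead.
linearModel :: (Double, Double) -> Double -> Double
linearModel (a, b) x = a * x + b
```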

[1] The usual way we draw these diagrams, that is, with vertical wires, can be misleading. Anything we draw above a ‘horizontal’ box is actually thought of as living in the parametrising category, as the later diagrams describing reparametrisation show. This is unsound, though. What’s actually going on is that diagrams in the parametrised category should live on their own plane, while diagrams in the parametrising category should live in the (3D) space surrounding that plane, which we picture as directed orthogonally to the plane. Actions (\bullet) describe what happens when wires in the space cross the plane. There’s a developing theory behind this calculus, accompanied by several results about the interaction of parametrised and parametrising monoidal categories (‘what happens at the interface’).

[2] Actually, in the paper we work in the generality of mixed optics, so the result is rather more general.

[3] Indeed, this is a reindexing operation: each hom-category in the bicategory of parametrised optics is 2-fibred over the delooping of \mathbf{Optic}_\bullet(\mathbf C).

[4] This idea is expressed, for instance, by Integrated Information Theory, which can be regarded as a ‘theory of individuality’, as explained in this beautiful Quanta article.

[5] Let me expand a bit on this. The simplest way an open dynamical system can be modelled is as a lens (S,S) \rightleftarrows (O,I). Here, S is a ‘private’ state, O is an output given to the environment, and I is an input received from the environment. In their book, Myers and Spivak call S \to O the expose function, since it exposes some observable of the internal state, and S \times I \to S the update function, as it updates S once feedback from the environment is received.
I interpret the asymmetry between the left and right boundaries of such a lens as witnessing the fact that this simple system really describes a control mechanism for a system embedded (and lost) in the environment, whose parameter and observation spaces are given, respectively, by O and I. To put it simply, I believe such a system should be ‘vertical’ and not ‘horizontal’.
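In code, a minimal sketch following this description:

```haskell
-- An open dynamical system as a lens (S,S) ⇄ (O,I).
data DynSys s o i = DynSys
  { expose :: s -> o        -- publish an observable of the internal state
  , update :: s -> i -> s   -- fold environment feedback into the state
  }

-- One step of the loop against an environment response o -> i.
step :: DynSys s o i -> (o -> i) -> s -> s
step sys env s = update sys s (env (expose sys s))
```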

[6] The problem is subtle, rich and interesting, hence solving it has been very thrilling. Actually (as the name ‘open games with agency’ testifies), we didn’t put players in games, but simply agency. As argued above, this concept is more fluid and flexible, and allows us to treat players without worrying much about their identity. We leave this concern to the only person who can look inside black boxes: the user. Also, we expect that such a fluid concept of agency will pay dividends when doing cooperative game theory, in which players can ‘merge’ into coalitions, which have all the characteristics of monolithic players.

[7] Iterated games can be treated coalgebraically. An approach is sketched in this MSP paper, though the framework used there was still rudimentary. However, the same construction (up to adapting the notion of 2-cell used there) can be replayed in the new framework to yield similar results.
