Creatures Caves

Steve recently posted a massive update about a breakthrough he's had on the brain map problem that has stumped him for the better part of a year.

The tldr of this is that making a brain that can plan a series of actions towards a goal, while keeping track of thousands of neurons representing variables such as "my leg is bent 10 degrees" and "it's raining" and "I'm slightly hungry" and "I want to be over there" and "where did I last see that carrot" is very difficult, because how do you tell a virtual brain which of those variables is actually important in the task of, say, satisfying a creature's Hunger?

The last time he ate a carrot, a bird was flying overhead. Does that mean a bird has to fly overhead for him to be able to eat carrots? How does a virtual brain learn this when it has no idea if birds are important to eating?

The solution he came up with allows a creature to imagine a sequence of events, guess at the likely outcome, decide how it feels about an imagined course of action, and then choose whether or not to follow it, all while staying aware of unimportant details (I walked past a pretty flower) and disregarding them unless they later become important (I want to pick flowers, where do I find one?).

It also allows the creature to dream - reliving previously experienced patterns of events - when it sleeps.

Full update below. It's a long and complex read, but he does a good job of explaining it in relatively simple terms. Relatively.

*********

Steve wrote:

This post is about the new, much agonized-over, executive planning system, the 'conscious', 'deliberate', 'thinking' part of a gloop's brain (the scare quotes are to avoid being sued by the Union of Easily Annoyed Philosophers). But to provide some context and bring the newbies up to speed, I'd better start by explaining the more general principles behind how a gloop’s brain works. After that I'll describe a few of the unexpected problems I faced when trying to extend these to planning, and why my initial brilliant ideas turned out not to be so brilliant after all. And then finally I'll be able to explain roughly what the solution was, and a bit about why I think it's really quite interesting.

Part 1: How I thought it was all going to work when I was still young and foolish

Imagine a paper map and two small pebbles, which we'll call yin and yang. The map might represent a literal map of our local area or it might be more abstract, like a map of the different directions we can point our head, or a map of different kinds of object, arranged by similarity or function.

The yin pebble in this analogy marks where we are now on the map (e.g. which kind of object we’re looking at or which way our head is pointing). The yang pebble, on the other hand, marks where the map itself or one of its parent maps would like us to be. The aim of the map (because this is a map with parents, children and intentions!) is to try to bring where we are now into sync with where we want to be, by changing the world, either directly or more often via other maps.

In reality, a map in a gloop's brain is made from little neural circuits that I call columns. Each column memorizes a specific pattern of locations on several other (child) maps or sensory/motor organs. Imagine a neuron sending out fibers, one into each child map, that each point to a particular spot. The nearer each child’s yin pebble is to our fiber at any moment, the more electrical current the fiber receives. In a real neuron, the fiber would end in a big splodge of even tinier fibers, called a dendritic arbor, which spreads out over some distance, and thus the pebble is basically always somewhere on this ‘receptive field’, but the nearer it is to the main fiber, the more influence it has. The pebbles aren't pebbles at all, of course. They just represent the position of peak nerve activity on each map.

For instance, a map representing all possible body postures will have neurons whose fibers point to other maps representing the possible angles of each limb joint. The yin pebbles on those maps tell us the angle each joint is presently at, and hence a specific pattern of these pebbles will represent 'sitting down'.

If the joint-angle maps' yin pebbles are at exactly the same places pointed to by a particular neuron on the posture map above them, that neuron will fire very strongly and the posture map will conclude that we’re sitting down. But this isn’t an all-or-nothing affair. If we’re half-way through the process of sitting down after standing, all of the posture map’s neurons will be firing to some extent, but the peak of this activity will probably lie somewhere between the points on the map representing sitting and standing.

The map constantly measures the amount of activity across every one of these pattern-detecting neurons, finds the current location of the peak activity, and then 'places its yin pebble' at this point. This is the map's yin state. Its own parent(s) will then use this yin pebble as part of recognizing their own patterns in turn, and thus as we go up the hierarchy the yin states become more abstract and global. At a higher level, ‘sitting fishing’ might be a different point on a map from ‘sitting thinking’, say.
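As a rough illustration of the 'find the peak and place the yin pebble' step, here is a minimal Python sketch. It is not the actual gloop code: the Gaussian falloff, the `spread` value, the averaging of fiber responses and the three posture columns are all assumptions made up for the example. It also only picks the single most active column, whereas the real map's peak can lie between columns.

```python
import math

def column_activity(preferred, child_yins, spread):
    """Mean Gaussian response of one column's fibers to the child yin positions."""
    responses = [
        math.exp(-((p - y) ** 2) / (2 * spread ** 2))
        for p, y in zip(preferred, child_yins)
    ]
    return sum(responses) / len(responses)

def place_yin(columns, child_yins, spread=30.0):
    """Return the most active column's name plus every column's activity."""
    activity = {
        name: column_activity(preferred, child_yins, spread)
        for name, preferred in columns.items()
    }
    peak = max(activity, key=activity.get)
    return peak, activity

# Hypothetical posture columns: preferred (hip, knee) angles in degrees.
columns = {
    "standing": (180.0, 180.0),
    "sitting": (90.0, 90.0),
    "crouching": (45.0, 30.0),
}

# The joint-angle maps currently report roughly, but not exactly, 'sitting'.
peak, activity = place_yin(columns, child_yins=(95.0, 85.0))
print(peak)  # sitting
```

Every column fires to some extent, which matches the 'not all-or-nothing' point above; the yin pebble simply goes wherever the activity is highest.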

Maps and pebbles are a much easier analogy to think about than neurons and activity surfaces and receptive fields! But just remember that unlike real maps, every point on a brain map represents a pattern of points on some other maps.

So, every map knows a little bit about the current state of the world or the creature’s body and systems, and represents this state using its yin pebble. This is where we are. Meanwhile, the yang pebble represents where some part of the brain wants us to get to. Perhaps the map has its own opinions about this (intrinsic goals), or more likely one of its parents is telling it what it wants to achieve (an extrinsic goal).

If the yin pebble says that we’re standing up with a fishing rod in our hand and the yang pebble says we want to be sitting down fishing, there’s obviously some work to be done. We need to change the world so that it reflects the state we want it to be in. In other words, we (the map of recreational activities) want to create a set of goals for our children, by positioning their own yang pebbles, such that their yin states, and hence ours, eventually move across the map. They, in turn, do the same to their own children. Once everybody’s yin pebble is right on top of their yang pebble we know we’re sitting down fishing, because one of our children knows we’re sitting down, one of its children knows we’ve got our left knee bent, and so on.

The task constantly facing a map is to figure out which output signals to send to its children in order to bring the world, and hence its own yin state, into alignment with its desired goal state and keep it there.

In many cases (and this still kind of amazes me) all the yang pebble has to do is trigger the very same pattern that has already been learned by the neuron directly beneath it, except this time it outputs that pattern instead of comparing something to it. If this pattern will be true when we are in the required state, then this is exactly the pattern of goals we should send to our children to cause us to be in that state. It’s a natural consequence of the rule that each map is trying to make its yin the same as yang. If we send our joint-measuring-and-controlling maps the 'sitting-down' set of joint angles, we’ll end up sitting down. Problem solved!

But it’s not always so easy. Even trying to arrange ourselves into a sitting position from standing, at the maximum speed our joints can flex, will likely end up with us looking like a flying yogi for a moment, before collapsing in a heap on the floor. But if we morph more slowly from the standing pattern to the sitting one it’ll happen much more gracefully. And since the map of all possible postures is likely to have arranged itself quite neatly, this simply involves sliding the nerve activity from the current yin state across the map to the desired yang state, activating each neuron we cross on the way.
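The 'morph slowly instead of jumping' idea can be sketched as a simple interpolation. This assumes, purely for illustration, that the posture map is smooth enough for a straight line between two joint-angle patterns to be meaningful; the function name and the angles are hypothetical.

```python
def morph(yin, yang, steps):
    """Return the intermediate goal patterns met while sliding from yin to yang."""
    return [
        tuple(a + (b - a) * i / steps for a, b in zip(yin, yang))
        for i in range(1, steps + 1)
    ]

standing = (180.0, 180.0)   # hypothetical (hip, knee) angles in degrees
sitting = (90.0, 90.0)

# Each tuple would be sent to the joint maps as their next yang pebble.
goals = morph(standing, sitting, steps=3)
print(goals)
```

The last pattern in the list is the sitting pattern itself, so the children end up with the same final goal as in the 'send it all at once' case, just approached gradually.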

In other cases it’s more complex still. Even a mere joint can’t always just be told where to go. Sometimes we have to account for the weight and inertia of the limb, so that we don’t overshoot, and the unpredictable resistance caused by obstructions, wind, carried objects, etc. Hence we need the peak of motor activity to dance around quite dynamically, constantly working hard to minimize the distance between yin and yang.

I’ll give you one more example like this, because it’s an important one for understanding what I’ll be talking about next. Suppose we have a brain map that’s literally a map of our local territory. The ‘morph-by-sliding-motor-activity-smoothly-from-one-place-to-another’ rule applies perfectly well to such a map, because if we want to end up all the way over there, we inevitably want to pass through all the intermediate locations in turn – that’s how physical space works. But only if there are no obstacles! The shortest route between two points might be through a door behind us. So I came up with a scheme in which a flowing 'current' (imagine water) is emitted from the point we want to get to, and flows out in all directions. It will get deflected by any obstacles it meets (neurons on the map that we’ve temporarily deactivated, because we’ve detected a physical obstacle at that spot using our eyes). And hence by the time it reaches the present yin state it might be coming at it from an unexpected direction. If we continually move from the present yin state one neuron ‘upstream’, we’ll end up walking around all the obstacles, following the current to its source.
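For readers who think in code, the flowing-current scheme behaves much like a breadth-first wavefront spreading out from the goal. The sketch below is an assumption about the general shape of the idea, not the neural implementation: the grid, the `#` marker for deactivated neurons and both function names are invented for the example.

```python
from collections import deque

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def flow_from_goal(grid, goal):
    """Spread 'current' outward from the goal; '#' cells are deactivated."""
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in STEPS:
            cell = (r + dr, c + dc)
            inside = 0 <= cell[0] < rows and 0 <= cell[1] < cols
            if inside and grid[cell[0]][cell[1]] != "#" and cell not in dist:
                dist[cell] = dist[(r, c)] + 1
                queue.append(cell)
    return dist

def walk_upstream(grid, start, goal):
    """From the present yin state, keep stepping one cell 'upstream'."""
    dist = flow_from_goal(grid, goal)
    if start not in dist:
        return None  # the current never reached us, so no route exists
    path = [start]
    while path[-1] != goal:
        r, c = path[-1]
        reachable = [(r + dr, c + dc) for dr, dc in STEPS if (r + dr, c + dc) in dist]
        path.append(min(reachable, key=dist.get))
    return path

grid = [
    ".....",
    ".###.",
    "..#..",
]
path = walk_upstream(grid, start=(2, 1), goal=(2, 3))
print(path)
```

The goal is only two cells to the right of the start, but the wall forces the route up and around, so the current arrives at the start 'from an unexpected direction', exactly as described.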

In all of the above cases it’s been assumed that similar states are closer together than wildly different states, and hence moving from one place on the map to another will involve passing in a relatively simple way through the states that lie between, even if this is not a straight line because of obstacles. This is a safe bet for simple maps consisting of no more than two dimensions, like a map of the different directions in which we can point our head, say. But whilst a map of whole-body postures can be expected to have some level of ‘smoothness’ like this, such that 'sitting' is a point on the map somewhere between 'standing' and 'lying', the more dimensions there are, the more convoluted and ‘subdivided’ the map inevitably has to be.

We see this in real brains, too. One of the earliest maps in the visual system is mostly arranged by where a stimulus is on the retina. Straight ahead is represented in the middle, bottom-left to the bottom-left, and so on (although in reality this mapping is divided in two, turned inside-out and squished around quite a bit). But the same map also has to represent the various possible angles of the edges and textures we might see at each point on the retina, because those are the patterns it’s supposed to be looking out for. So it has to scrunch these up into lots of repeated mini-maps, spread across the bigger map of position. The result is a bit like a tray with a large number of spiral seashells arrayed on it, where the angle of the spiral at any given point represents one possible angle of an edge seen at the shell’s location.

By the time we get well up the hierarchy of maps, perhaps to a map containing relatively complex behaviors like picking something up, pointing at it, hitting it and so on, there’s no good way to group all the patterns in a smooth arrangement, and so it no longer follows that to get from yin to yang it makes sense to activate all of the patterns that lie between them.

So this is where planning comes in. If we’re in state A and want to be in state G, we might have to get ourselves from state A to state X first, then from there to state R and from R to state G. The states are probably spread all over the map. So how do we find the best route? How do we learn which states lead successfully to which other states and under what circumstances? And just as importantly, which ones don't?

When I first started to think about this problem early in the project, I had quite a lot of other stuff on my mind! But it seemed rational that we could learn how to get across such a complex, tangled map by simply linking up states that we discover from experience tend to follow each other in time. In other words, while all the other maps are similar to a contour map of a wide open area, we now needed to build a road map.

And pointing at things is what neurons are good at, after all, so as well as neurons that point to a pattern of locations on multiple child maps, all I'd have to do is add some neurons that can point to other neurons in the same map.

Easy.

Part 2: Where it all went wrong.

The obstacle-avoiding map I mentioned above seemed to my mind like a good starting point for planning, too. Admittedly we now have to stick to the roads, rather than just avoid obstacles in a continuous space, but a very similar ‘current flow’ idea can still work very well for finding routes across a road map, which is why similar algorithms are used inside your satnav.

All I had to do was let current flow outwards from a single desirable goal (or even many of them at once), along these one-way streets, essentially tracing all possible routes simultaneously, going backwards in time from a desired end-state. By the time any of these various currents reach our present yin state it’s possible to know which is the best state to try to get into next, in order that we reach our ultimate destination eventually. And the intensity of the current can even represent how much pain or pleasure we can expect to experience in total while getting there.
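The backwards flow along one-way streets can be sketched as a shortest-path search on the reversed road map. This is a stand-in, not the original mechanism: it uses Dijkstra's algorithm, treats each road as having a non-negative 'pain' cost, and reuses the state names A, X, R and G from earlier. The extra state B, all the costs and the function names are hypothetical.

```python
import heapq

def flow_back_from_goal(roads, goal):
    """roads maps state -> {next_state: cost}. Returns cheapest cost-to-goal per state."""
    incoming = {}
    for src, outs in roads.items():
        for dst, cost in outs.items():
            incoming.setdefault(dst, []).append((src, cost))
    best = {goal: 0.0}
    heap = [(0.0, goal)]
    while heap:
        cost_so_far, state = heapq.heappop(heap)
        if cost_so_far > best[state]:
            continue  # stale heap entry
        for src, cost in incoming.get(state, []):
            candidate = cost_so_far + cost
            if candidate < best.get(src, float("inf")):
                best[src] = candidate
                heapq.heappush(heap, (candidate, src))
    return best

def next_state(roads, best, here):
    """Pick the outgoing road whose total expected cost to the goal is lowest."""
    options = {
        dst: cost + best[dst]
        for dst, cost in roads.get(here, {}).items()
        if dst in best
    }
    return min(options, key=options.get) if options else None

# Hypothetical one-way roads with made-up 'pain' costs.
roads = {
    "A": {"X": 1.0, "B": 1.0},
    "X": {"R": 2.0},
    "B": {"G": 5.0},
    "R": {"G": 1.0},
}
best = flow_back_from_goal(roads, "G")
print(next_state(roads, best, "A"))  # X
```

The value stored for each state plays the role of the current's intensity: it summarizes how costly the whole remaining journey is expected to be, not just the next step.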

Even at the time, I was aware of a few awkward snags with this idea, but it seemed pretty feasible and I expected to have it all working within a week or so.

Ha!

My biggest mistake, I think, was to allow myself to be misled by the nice, tidy progression of ideas, from simple joint maps, through obstacle-avoidance maps, to planning. Every one of these other maps contains a single yin pebble, and so I assumed this one would too. Personally, I'm in one particular state right now, and since I want to be eating my lunch I’ll soon be in a different state. I only have one body, so I can only be in one state at a time!

But I was wrong. Or rather I’m right – I am only ever in one state at a time. For want of a better name, and perhaps misleadingly, I refer to this total state as the current Gestalt. But which particular aspects of my current state are important for deciding which state will follow next, and which ones really don't matter?

The state I’m in now is incredibly complex. I have a keyboard in front of me and a beer to my left, it’s sunny outside, my annoying neighbor is playing her TV too loud, etc., etc., etc. If I decide to put the bottle to my lips, my thirst will go down. But will this still happen if it’s raining? Does my neighbor’s TV have to be on? If her TV goes quiet while I’m drinking, did drinking make that happen?

To an experienced adult human it’s pretty easy to infer which parts of our gestalt state are likely to be important preconditions for the rule to be true, and which of the myriad aspects of the following state were actually caused by the action we took. Mostly it's common sense. But it took us years to perfect this ability. There's nothing more complex than common sense! For a gloop or a newborn baby it’s all magic. Sometimes crying gets you fed and sometimes it might get you yelled at. Which of the many facts you're aware of at the time actually determine what the outcome will be?

So clearly it’s not the case that we can just calculate our total yin state in the way that all the other maps do, and record it using a single pebble. If nothing else, having learned once that putting a bottle to my lips on a sunny day makes me less thirsty, I’d have to discover this all over again from scratch when it rains, because I’ll be in a completely different start state.

And this problem is compounded even more once we have a ‘mind’s eye’.

Up until this stage of the development I’d been dealing with maps that in many cases are fed live visual information from the eyes. But already I’d found the need for a form of short-term memory, because the eyes are constantly moving. It’s all very well seeing an apple and deciding that it might be a good thing to pay attention to because you’re hungry. But what if you don’t get hungry until later? It would be crazy to expect a gloop to blunder around until he happens to catch sight of the apple again. He should already know he’s seen one, and furthermore should remember where it is. All he should need to do is discover that he’s hungry, wonder whether he’s seen any apples lately (i.e. turn his mental attention to the subject of apples) and thus know where he has to go to find the thing that will solve his problem.

Having both physical attention (e.g. gaze) and mental attention via short-term memory is a tremendously valuable thing, for many reasons. It means we essentially have a nice, stable, unobstructed mental world to look around, rather than the fickle, momentary, one-thing-after-another world our dancing eyes see. Our eyes are mostly only needed for updating this virtual world with new information as we come across it. We’re better off looking around in our minds first, and only then looking at the actual world.

This is great, but unfortunately I now not only have a beer and a neighbor and sunshine to think about; I also know where my car is, where my toothbrush is and a billion other facts simultaneously. Yes, I can decide to mentally look at them one at a time, using some rule or another for filtering and selecting among memories, but this whole idea of being in a single, comprehensive yin state is starting to look increasingly ridiculous!

Now, there are two main ways to approach this problem, and believe me I tried all 400 of them!

The first way is to break the gestalt down into smaller fragments. Perhaps there’s a neuron that fires best when I see a beer bottle glinting in the sun, and another that associates sunshine with my neighbor’s TV, and so on. For every situation that a gloop finds himself in, he could momentarily record a whole bunch of single, paired and triplet combinations of raw facts, rather than form a single gestalt representation of the entire situation.

This takes a lot of neurons and it scales pretty badly, but it’s essentially how I solved the problem in my Creatures game, and so I did at least have some experience with this kind of approach. But Norns don’t make plans. Their brains simply have to learn which mini-pattern of sub-facts predicts the best impulsive response. They don’t have to chain these actions together into complete narratives. And therefore they don’t have to deal with the problem that, if you have 100 neurons firing right now, representing various fragmentary combinations of facts about the current situation, you’ll likely have another 100 neurons firing a moment later. Many of them will be identical but some will be different, and since you don’t know which of these actually matter yet or are related to each other, you’re going to have to connect every one of those first hundred to many of the hundred that follow, just in case. And that's quite a lot of neurons, just to represent two moments in time…
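To put rough numbers on why the fragment approach scales badly, here is a small counting sketch. The fact names and the cap of three facts per fragment follow the description above; the function names are hypothetical and nothing here reflects how Norn brains are actually coded.

```python
from itertools import combinations
from math import comb

def fragments(facts, max_size=3):
    """Every single, pair and triplet of raw facts for one situation."""
    ordered = sorted(facts)
    return [
        combo
        for size in range(1, max_size + 1)
        for combo in combinations(ordered, size)
    ]

def fragment_count(n_facts, max_size=3):
    """How many fragments a situation with n_facts raw facts produces."""
    return sum(comb(n_facts, size) for size in range(1, max_size + 1))

facts = {"beer", "sunny", "tv_loud", "keyboard"}
print(len(fragments(facts)))   # 14
print(fragment_count(20))      # 1350

# Linking every fragment neuron firing now to every one firing a moment later:
links_between_two_moments = 100 * 100
print(links_between_two_moments)  # 10000
```

Four facts give 14 fragments, twenty facts give 1,350, and wiring 100 active neurons to the 100 that follow needs up to 10,000 links for just two moments in time.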

The other basic approach is to learn a single gestalt using a single neuron with lots of input fibers, and connect this to the gestalt that happens next. But we design this system in such a way that these neurons can gradually learn which parts of their total pattern really matter for the transition to remain true and which can become set to more of a 'don't care' condition. This essentially ends up with the same result as the approach above, except instead of starting out with zillions of small fragments and eventually being able to discard many of them, we start with one gestalt and gradually turn it into a meaningful fragment. It sounds like a reasonably workable and scalable approach, even though we now need a way to control how selective or tolerant each neuron’s input fiber is. (Not coincidentally, this is akin to controlling the spread of a dendritic arbor in real neurons. Remind me one day to tell you about Q, which is how I represent the distribution of dendritic arbors).
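A minimal sketch of a gestalt that learns 'don't care' conditions might look like the following. It is an assumption-laden toy, not the Q mechanism mentioned above: features are numbers between 0 and 1, each feature has its own tolerance, and the `relax` rule (widen the tolerance whenever the outcome held despite a mismatch) is invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Gestalt:
    pattern: dict                                   # feature -> remembered value in [0, 1]
    tolerance: dict = field(default_factory=dict)   # feature -> allowed deviation
    default_tolerance: float = 0.1

    def _allowed(self, feature):
        return self.tolerance.get(feature, self.default_tolerance)

    def matches(self, situation):
        """True if every feature is within its tolerance of the remembered value."""
        return all(
            abs(situation.get(feature, 0.0) - value) <= self._allowed(feature)
            for feature, value in self.pattern.items()
        )

    def relax(self, situation):
        """The outcome held anyway: stop being fussy about the features that differed."""
        for feature, value in self.pattern.items():
            gap = abs(situation.get(feature, 0.0) - value)
            if gap > self._allowed(feature):
                self.tolerance[feature] = gap

# Learned on a sunny day with the neighbor's TV on.
drinking = Gestalt({"bottle_at_lips": 1.0, "sunny": 1.0, "tv_on": 1.0})

rainy_day = {"bottle_at_lips": 1.0, "sunny": 0.0, "tv_on": 1.0}
matched_before = drinking.matches(rainy_day)   # too fussy about the weather
drinking.relax(rainy_day)                      # drinking still quenched thirst
matched_after = drinking.matches(rainy_day)

no_bottle = {"bottle_at_lips": 0.0, "sunny": 1.0, "tv_on": 1.0}
print(matched_before, matched_after, drinking.matches(no_bottle))
```

After one rainy-day success the weather becomes a 'don't care' input, while the bottle still matters, so the single gestalt has started turning into a meaningful fragment.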

But even this solution has some hidden snags, which is why I haven't tried to explain the idea with an example. The examples don't work! The biggest snag is that we don’t really know what the next state is any more. The gestalt we connect to next has to be just as flexible about certain of its input conditions as the present one. But it’s not guaranteed that the gestalt best describing the outcome of the first one is also the one that best represents what we should do next.

This is a bit non-obvious, perhaps. It certainly didn’t occur to me for ages. But suppose that picking up a kettle means we’re now in the state of having a kettle in our hands. It doesn’t matter whether the faucet was running or not when we did this, and so we end up connecting two gestalts, neither of which cares about the state of the faucet. All well and good. But if we’re trying to make some tea then we now need to pour water into the kettle. If the faucet is off, we need to get it into the state of being on first. So now the state of the faucet does matter. Yet that’s not necessarily the state we’re in! Or at least, it’s not the state we predicted we’d be in. There will be a different gestalt somewhere else that better describes the state we’re in as far as filling the kettle with water is concerned, but if this rule has no connection to the previous one, how do we find a route from our goal back to our starting point?

Argh!

Ok, I won't go on to list the other 17,482 snags I found with this approach, because hopefully you get the drift. These ideas seem simple until you try to work out the fine details, and then it all goes horribly wrong. Every single time. You know how it is when you're trying to juggle a million things to solve a problem in your own life, and someone walks up and says 'well why don't you just fix it?' and you want to asphyxiate them with a pillow? It was like that, except I'd have had to asphyxiate myself.

Part 3: The kind of ‘breakthrough’ you get if you bang your head repeatedly against your desk for long enough

The solution to hard problems is generally to be found by looking at them in an unusual way. Sometimes even by turning them completely inside-out. I knew this, so I tried looking at this one from all sorts of angles. And yet for a very long time I unintentionally managed to stick to a pretty big assumption that I really should have questioned earlier. This was a costly mistake, even though I think I’m pretty good at this sort of thing and know how to interrogate myself for any unquestioned assumptions. I guess I just got so overwhelmed by the complexity of the whole system, and by how much of the story I had to keep in my head, that I became trapped in a particular narrative about it.

To explain this assumption, I need to back up a bit and introduce another idea that hasn’t come up yet. Remember the ‘current flow’ model I use for parts of the brain involved in navigation? I took it for granted that I would use something similar to this for making abstract plans too, except this time applied to a roadmap. The purpose of the current flow would be to find a reasonably optimal route, no matter how tortuous, between where I am now (cold and hungry) and a nicer state to be in (Starbucks), even though this involves a succession of states that have nothing to do with coffee at all (like getting in my car).

This flow idea is a nice, massively parallel method – a way of searching zillions of possibilities ‘at once’ without the problem getting exponentially harder as the number of learned situations or steps in the plan increases. Our own brains do this kind of thing a lot (at the level beneath our consciousness), because the brain is inherently a massively parallel machine.

But I also had it in mind to add an important but rather more serial process to the planning mechanism. For instance, I want to be in Starbucks right now and I know how to get there. But does my car have any gas? Am I too tired to bother? Am I expecting a phone call? Is it the middle of the night and Starbucks will be closed? These are not questions the planning part of my brain can answer alone, because it doesn’t have the specialist knowledge. All sorts of different parts of my brain would have useful opinions about this plan, if only they knew what their part in it was supposed to be.

And so, even if I come to a general conclusion about where I’d like to be and approximately how to get there, those other, semi-independent parts of my brain will only be able to provide opinions and judgments about feasibility, applying their specialist knowledge about geography, the aches and pains of my body, etc. if I think it through, one step at a time. At some level, perhaps in a fragmentary and nonlinear way, I must act out the plan to see how the details might play out.

This is called rehearsal, and it’s as important to the brain of a reasonably complex animal as it is to a theater troupe about to perform a play, or a school or an airline pilot rehearsing for a possible emergency situation.

Now, it seems to me that brain-wide rehearsal is necessarily a one-step-after-another process. We’re not a trillion single-celled creatures acting in parallel now; we have to see ourselves as a single, unitary actor. We can’t be in two places at once, and although some of us might be able to walk and chew gum at the same time, we can’t walk east and west at the same time. In short, we need to be of one mind, and we need to tell ourselves a serial story about the future, with only one plot and starring only one person, in order to see if that particular future is likely to pan out.

This rehearsal can’t (I assumed) be done at the same time as coming up with the plan in the first place, because unlike the general planning process it involves the whole brain. We can’t ask the navigation parts of the brain how they’d feel about going from A to B at the very same time as they’re busy trying to work out how they'd feel about going from A to C.

I’d figured out an approximate solution to this rehearsal problem a long time ago. I’d found a way in which the various maps of the brain could not only coordinate while actually performing a complex task but could also pretend to be doing it, and in the process check that everyone is happy and confident about the script, well before Opening Night. I won't describe how this works in any detail, because it'll just add a lot more complication.

And so I had myself a massively parallel planning process, based on flow, plus a brain-wide mechanism for rehearsal that needed to be approached serially, even though (and in fact because) it ironically involves mass-coordination across all modules of the brain at once. This serial part did feel a bit ‘tacked-on’ and not really integrated with the parallel part, but it seemed like I needed both, and because I’d come at this from the direction of the massively parallel flow idea, that part seemed like a given.

I did go ahead and develop the rehearsal part (in fact I’ve had to bear it in mind throughout the entire development process), and it worked pretty well, on the whole, but although I already had the current-flow technique available for forming the plans prior to rehearsal, I kept getting stuck on this business of how to represent the yin state or states, and how to turn a system filled with ambiguity and tolerance into a nice, neat road map.

It just didn’t occur to me, for over a year, to try turning this setup completely inside-out. Or rather completely rearranging the parts.

Part 4: How it works now, sort of, I hope

Alright. So let’s go with the idea that we record each gestalt – the totality of each situation we experience at some relatively high level of abstraction – as a single entity. But through trial and error we learn which aspects of that gestalt seem to matter for the outcome to be the same. Our sensory experience is unitary - a unique assemblage of facts - but because these learned rules are flexible, we still end up in multiple states at a time. Each firing neuron only 'cares' about some aspects of the gestalt but they may be different aspects in each case.

So we need lots of yin pebbles - we're in multiple states simultaneously - but we already knew we'd have to allow that. We don't need to know which memory best represents the situation we're in, because all of them are potentially relevant. We're more interested in what each one says the outcome is likely to be. Imagine standing at the junction of three roads. We could go down any of them, but what we're interested in is which one leads to our destination.

The problem is, we can't record the roads by connecting junctions together. Remember the kettle and faucet problem above?

So how about, instead of pointing to another state as the outcome, we simply keep a record of the pattern of child inputs we saw next? In other words, each rule stands completely alone. It says, ‘here is the input pattern that suggests we have a good prediction to offer, given some level of tolerance to variations, and here is the pattern we expect next, if we do X’.

The snag with this, though, and the reason it took me so long to consider it seriously, is that there’s apparently no way any longer for us to construct a plan using a massively parallel process. The whole ‘current flow down roads’ business relies on there actually being a road from state A to state B, and without connecting one state to another, there’s nothing for the current to flow along.

Obviously, we could work out which gestalt’s input pattern best represents another gestalt’s output pattern, by comparing them fiber by fiber every single time we want to know, but it’s now an expensive, slow process. Hey, I represent having just done A to a particularly B-looking C whilst in mood D and weather E. Anyone else recognize some of those conditions? For each possible outcome, we must repeatedly check each possible situation. In a map with a thousand rules we'd repeatedly need to do a million comparisons. And that's the least of the problems.
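The cost of the brute-force comparison is easy to see in a sketch. Everything here is illustrative: rules are reduced to (input pattern, output pattern) pairs of feature dictionaries, and the `similarity` measure is a made-up stand-in for comparing patterns fiber by fiber.

```python
def similarity(output_pattern, input_pattern):
    """Average closeness over the features the two patterns share (0 if none)."""
    shared = set(output_pattern) & set(input_pattern)
    if not shared:
        return 0.0
    return sum(
        1.0 - abs(output_pattern[f] - input_pattern[f]) for f in shared
    ) / len(shared)

def link_all(rules):
    """For each rule's outcome, find the rule whose input pattern matches it best."""
    comparisons = 0
    links = {}
    for i, (_, outcome) in enumerate(rules):
        scores = []
        for j, (inputs, _) in enumerate(rules):
            comparisons += 1
            scores.append((similarity(outcome, inputs), j))
        links[i] = max(scores)[1]
    return links, comparisons

# A toy ring of 100 rules: rule i predicts the situation that rule i + 1 expects.
n = 100
rules = [({"a": i / n}, {"a": ((i + 1) % n) / n}) for i in range(n)]
links, comparisons = link_all(rules)
print(comparisons)  # 10000
```

With 100 rules a single pass already costs 10,000 comparisons. With a thousand rules it would be a million, and the pass has to be repeated whenever the answer is needed.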

This is already partly what makes rehearsal into a serial process, as it happens. During rehearsal we have a single candidate outcome pattern at any particular moment, which we send out of the map as a pretend goal, and eventually receive the same pattern back again from our children as if it has actually happened. For every step of the plan, this allows the other maps of the brain to ask themselves whether they could achieve what they're being asked to imagine, and how they feel about doing it. But in the case of rehearsal we only had one such situation to consider per step of the plan, and only one plan to rehearse. We’d already done the basic planning in a nicely scalable, massively parallel way, and we only needed to push the various steps in this story through the serial, imagine-doing-this-particular-thing-next bottleneck once.

Nevertheless, let’s run with the idea and this time try to do the rehearsal process at the very same time as the planning process. We’ve just got to make sure we don’t have to try forming a plan by 'simultaneously' comparing every possible rule with every other possible rule, just in order to see which ones connect to each other, because it would be completely impractical to feed all possible plans through the slow and narrow bottleneck of rehearsal.

Ok, so we'll stick to a bunch of gestalt rules. Each rule has an input pattern and a predicted outcome, which is also a pattern. Both patterns can learn to be more or less tolerant to variations in the conditions. For the input pattern this means that some of the inputs learn not to be too fussy, because a wide variety of inputs have turned out to lead to the same outcome. For the output patterns the tolerance level has something to say about whether and how the outcome changes the inputs, but don't worry about how this works for now.

Because of this tolerance, any number of the rules may be triggerable at a given moment. Each triggerable rule represents some action we might take in the current circumstances, and predicts that taking this action will cause one or more changes to the situation.
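
To make the idea of tolerant rules concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the actual gloop code: the `Rule` class, the feature names, the 0-to-1 value range and the simple "within tolerance" test are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    pattern: dict    # feature -> value this rule expects to see (0.0 to 1.0)
    tolerance: dict  # feature -> how far the real value may stray from that

    def matches(self, situation: dict) -> bool:
        """True if every feature lies within this rule's learned tolerance."""
        for feature, expected in self.pattern.items():
            actual = situation.get(feature, 0.0)  # absent features count as 0.0
            if abs(actual - expected) > self.tolerance.get(feature, 0.0):
                return False
        return True


def triggerable(rules, situation):
    """Names of all the rules that could be triggered in this situation."""
    return [rule.name for rule in rules if rule.matches(situation)]


# Hypothetical rules; a tolerance of 1.0 means "don't care about this input".
rules = [
    Rule("eat", {"hungry": 1.0, "food_visible": 1.0, "raining": 0.0},
         {"hungry": 0.3, "food_visible": 0.0, "raining": 1.0}),
    Rule("shelter", {"raining": 1.0}, {"raining": 0.2}),
    Rule("sleep", {"tired": 1.0}, {"tired": 0.2}),
]
now = {"hungry": 0.9, "food_visible": 1.0, "raining": 1.0}
print(triggerable(rules, now))  # ['eat', 'shelter']
```

The "eat" rule fires even though it is raining, because it has learned not to be fussy about the weather, and more than one rule is triggerable at the same moment.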

The rules only describe the pattern of features that they predict will happen next, rather than connecting directly to other rules. Actually that's a lie: in practice I do have a wire that links each state to (one) other state, and I do flow current down this wire. What’s more, a lot of it even works in parallel! But this is all done in a subtly different way, which I would never have figured out if I hadn't broken away from my older paradigm and started thinking about it from this new angle. Forget these wires for a moment. They'll come in handy later.

Crumbs, I don’t know how much of this you’re following. Very little, probably, and it's about to get worse. It’s hard enough to figure out how I got here myself, and I’ve had a lot more time to think about it than you! But if you can come away with a general picture then that's enough. It'll start to make more sense in the months and years ahead, if it interests you enough! There are people out there, twenty years down the line, who know a great deal more about Norns than I do.

Ok, so never mind the rest of the issues and details, let’s watch the new system in action:

So, to repeat: the map is an array of independent ‘rules’, each consisting of an input pattern and an output pattern. The input pattern describes (with greater or lesser tolerance) the aspects of a situation to which this rule applies, and the output pattern describes our prediction about what we think the world will look like next, after we’ve performed a specific action.

Oh dammit! Before we go too far, we need to examine this prediction and action a bit more closely.

For one thing it turns out that we don’t need the action part to be handled separately from everything else any more, as we did when we were saying that ‘this situation leads to that situation when we do X’. In that method, we needed X to be part of the connection, not part of either state - it was the name of the road. But now it can be part of the prediction pattern itself, essentially because an action is also a prediction!

Say we want to send the command to EAT. We’ll know that this has been carried out when the yin signal we receive back is also EAT – i.e. when yin equals yang. That's the law. So a command is really just a prediction of a future state. When we’ve eaten, we will have eaten! This perspective is related to a psychological concept called ideomotor action.

And something similar is true for everything that might merely be changed by taking action X, as well. In fact there are three kinds of prediction altogether, even though they all look identical:

There’s the action we can predict we will shortly have carried out; there are the things we can predict will have changed as a result of that action; and there are the things we’ve decided we don’t care about as triggers, which we can generally assume will remain unchanged by the action, yet still form part of the following gestalt and hence have to be predicted.

In this latter case - things we don't care much about - our prediction shouldn’t be based on whatever we’ve learned tends to happen, but on what is happening now. If we have a rule involving cooking some food, where food is a pretty broad concept, and we’re currently paying attention to a carrot, then we should predict that there will be a carrot involved in the outcome, not an apple. This is important, because not only are predictions fundamentally the same thing as actions, they’re also the same thing as attention. Our short-term memory might be thinking of that carrot we just saw, but if we were to predict that the outcome involves an apple then it’s not merely the case that our prediction will fail; instead it will succeed in making us think about apples. Which is pretty silly when we’ve already got a carrot.
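
The three kinds of prediction can be sketched in a few lines of Python. This is a toy illustration under assumptions of my own (situations as plain dictionaries, an `action` key, a `learned_changes` argument); the real maps work on patterns of signals rather than named fields.

```python
def predict(now: dict, action: str, learned_changes: dict) -> dict:
    """Build a rule's live prediction from what is true right now."""
    prediction = dict(now)              # don't-care features: copy the present
    prediction.update(learned_changes)  # things we have learned will change
    prediction["action"] = action       # the action we will have carried out
    return prediction


# Hypothetical situation: we are attending to a carrot, not an apple.
now = {"action": None, "attending_to": "carrot", "food_state": "raw"}
outcome = predict(now, "COOK", {"food_state": "cooked"})
print(outcome)
# {'action': 'COOK', 'attending_to': 'carrot', 'food_state': 'cooked'}
```

Because the don't-care features are copied from the present rather than from memory, the prediction keeps the carrot in mind instead of dragging attention off to apples.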

You don’t need to know stuff like this, I guess, but it took me an entire trip to Australia to figure out what to do about just that one part, so it demonstrates that these details can be more convoluted than you might think!

Anyway, so we have a rule that can learn the conditions under which it can be triggered, even when some of these conditions are ‘don’t care’ or ‘any old food’. And the rule can dynamically form a prediction about what those parameters will look like next, given what is true now, what our actions and attention shifts might be, and our expectations about what changes as a result of these.

And here’s what we do with these rules:

We start out in a particular situation – our present reality – and every rule does its usual thing of matching itself against this input pattern to see how well it recognizes it. Most of the rules will turn out to be useless (unable to be triggered) but quite a number might describe subsets of the current situation sufficiently well and so will fire. We mark all of these rules as ‘depth zero’, meaning that we could choose to trigger any one of them right now if we wanted to.

It may be that one of these rules is already so desirable (predicts such a reduction in our needs) that we just want to go ahead and do it right away – to act impulsively. But on the whole, as long as we can afford some time to think about it, we’d like to wait and see if one or more of these rules might lead to other situations that themselves are desirable. Maybe the first step is of no benefit but leads to a situation in which a second step will take us to a happier situation. Or maybe it will require a third step or a fourth.

So, we pick one of the rules we just marked – one of the things we could legitimately choose to do now. Ideally we’ll pick one that looks pretty promising or has some other encouraging characteristics (e.g. we’re very confident about its predictability), but either way we pick one probabilistically. Imagine them competing to be chosen. Me, me, Miss!
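
One plausible way to do this kind of biased, probabilistic pick is a weighted random choice. The sketch below is an assumption for illustration only: the `promise` and `confidence` fields, the way they are multiplied, and the small constant that keeps every rule in the running are all made up, not taken from the gloop brain.

```python
import random
from collections import Counter


def pick_rule(candidates, rng):
    """Pick one marked rule at random, favouring promising, reliable ones."""
    weights = [max(c["promise"], 0.0) * c["confidence"] + 0.01
               for c in candidates]  # +0.01: even dull rules get a small chance
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical depth-zero candidates.
candidates = [
    {"name": "eat_carrot", "promise": 0.9, "confidence": 0.9},
    {"name": "wander_off", "promise": 0.1, "confidence": 0.5},
]
rng = random.Random(0)
counts = Counter(pick_rule(candidates, rng)["name"] for _ in range(1000))
print(counts)
```

Over many picks the promising rule wins most of the time, but the unpromising one still gets an occasional turn to be imagined.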

And then, since we're (hopefully) not busy actually doing something else at the moment, we pretend to trigger it. We send its live prediction to the map’s output, just as if we were acting it out (remember that predictions are also actions and attentional goals). This is only an imaginary intention, not a real intention, but apart from having that characteristic it gets sent down to our children in the usual manner.

Most of our children will then follow these imaginary orders just as if they were real ones, but at some stage, one or more levels down and for reasons I won't go into, they don’t act on these goals, they just pretend they’ve done what they were told to do, which simply means that they reflect their yang signal straight back as yin. When yin equals yang, a map can legitimately claim that it’s done what it was told to do, even though in this case it’s just pretending.

So eventually the planning map gets back a bunch of yin signals that come to match the ones it sent down, and so it, too, thinks it has achieved its imaginary goals. The point of all this is that it gives the children and grandchildren an opportunity to play out every step of what they would have to do if the rule we triggered in our imagination was triggered for real, and then provide feedback on how they feel about complying, or indeed whether they think they could comply at all. If we have to go through the Dark Wood to get to Starbucks, our navigation map will figure this out and proffer an opinion on the risk.

When this reflected yin arrives, a fraction of a second later, the yin inputs to the planning map will no longer represent the present state of the world; they'll represent a hypothetical next state of the world. The creature is thus now 'aware of the future' instead of being aware of the present, and I call this the virtual now. The real now is actually just a special case of this, and the virtual now can go back into the past as well as forward into the future (see below), but either way it’s the moment in time the creature's planning map is currently ‘experiencing’.

Once again, presented with a new virtual now, we match all of the unmarked rules against this new input pattern to see which rules we predict we could legitimately trigger after triggering the first one. These we mark as depth 1, because they’re one step further into the web of possible futures.

And in addition we make all of them point, temporarily, to the rule we just triggered to form our virtual now. We'll need this link later, for flow, but it also means they have access to the dynamic prediction that their predecessor computed, and hence can work out their own predictions too. If in reality we were thinking of a carrot and the first rule was happy that any old food would do, we'll base our own prediction on carrots. These salient but tolerant facts ripple through the whole chain of predictions in much the same way that we might tell a predetermined story to our children, but let them choose the name and sex of the character.

So now we have some rules that could be triggered right away, and a bunch of other rules that could be triggered if we were to carry out a specific one of the first set. All of the second set of rules have taken note of which rule represents the virtual now, and hence which depth 0 rule would need to be triggered before they could be triggered too.

Now we pick another marked rule to act as the virtual now. We might go back to square one and pick a different depth 0 rule, or we might think a bit further down a branch of the future by choosing a rule we’ve just marked as depth 1.

We repeat this process as long as we feel we can afford to think about the future instead of responding to the present: We pick a virtual now, present it as a set of pretend goals to our children, get back some feedback on how they feel about it, and at the same time have our own prediction reflected back to us as a new virtual now. We check which rules could be triggered if this future situation were true, and mark them as being one step further into the future. Sometimes we follow this chain of reasoning and sometimes we jump back and consider a different branch. Rinse and repeat.
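
The loop just described can be sketched roughly as follows. This is a heavily simplified toy under my own assumptions: situations are sets of facts, each hypothetical rule is a (needs, adds, removes) triple, the pick is uniformly random, and the round trip through the children is replaced by simply using the chosen rule's stored prediction as the next virtual now.

```python
import random

# Hypothetical rules: name -> (facts needed, facts added, facts removed).
RULES = {
    "walk_to_kitchen": ({"in_lounge"}, {"in_kitchen"}, {"in_lounge"}),
    "open_fridge": ({"in_kitchen"}, {"fridge_open"}, set()),
    "eat": ({"fridge_open"}, {"fed"}, set()),
    "nap": ({"in_lounge"}, {"rested"}, set()),
}


def think(present, steps, rng):
    """Mark rules with a depth and a predecessor by trying out virtual nows."""
    marks = {}

    def mark(now, depth, predecessor):
        # Match every unmarked rule against this (real or virtual) now.
        for name, (needs, adds, removes) in RULES.items():
            if name not in marks and needs <= now:
                marks[name] = {"depth": depth, "pred": predecessor,
                               "prediction": (now - removes) | adds}

    mark(present, 0, None)                  # what we could do right now
    for _ in range(steps):                  # as long as we can afford to think
        chosen = rng.choice(sorted(marks))  # follow a branch or jump back
        virtual_now = marks[chosen]["prediction"]
        mark(virtual_now, marks[chosen]["depth"] + 1, chosen)
    return marks


marks = think(frozenset({"in_lounge"}), steps=200, rng=random.Random(1))
for name, m in sorted(marks.items(), key=lambda kv: kv[1]["depth"]):
    print(m["depth"], name, "after", m["pred"])
```

Each marked rule ends up knowing how far into the future it sits and which rule would have to be triggered first, which is all the later steps need.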

After a few iterations of this, we have a fully comprehensive guide to what we could do right now, plus we're starting to build up a more sparsely populated guide as to what we’d be able to do next, if we did some of those other things first, and then what we might do after that. And we’ve folded the rehearsal process right into this – we spend time considering one virtual now after another, brain-wide.

It’s as if we’re constantly sending out a growing web of feelers into the future. We can only be thinking of one possible future at a time, but the longer we do this process, the more possible futures we consider and the further into the future we think. We can’t afford to follow the entire tree of possible futures this way, but very importantly, as you’ll see in a moment, our ability to consider a range of possible futures continues to improve indefinitely, even as we interact with the present, just as long as we don’t get surprised along the way.

Meanwhile, as we’re trying out these virtual nows, building predictions, and marking up rules that we think could get triggered at some point in the future given what we know now, we also let 'current flow' and feedback from our children evaluate these potential paths and discover routes to good situations.

Every time we marked a rule as being triggerable as the result of the current virtual now, we kept a link from it to the prior rule that defined this virtual now, so not only does a rule know that it’s currently at depth N into the future, it also knows which specific rule has to be triggered first. It knows its predecessor.

(This link, incidentally, is dynamic and only applies to futures extending out from the actual present, which means it doesn’t suffer from all the problems that the more permanent and learned ‘state X leads to state Y when I do Z’ models did.)

Because each marked rule knows who would have to be triggered beforehand, and because every rule is constantly aware of whether it predicts a good state to be in or a bad or neutral one (via the affect layer, which I won’t go into), we can use the affect system as our supply of ‘current’ and the links to predecessors as temporary roads along which to flow it. The net result of this is that eventually we’ll come across some potential futures that are more desirable than the situation we are in now, and we can let this reward flow backwards through time, combining it with the pros and cons of prior rules and feedback we’ve received from our children during rehearsal, in order to discover which rules are both triggerable now and will lead us towards a desirable future. We have a plan!
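
Here is a small sketch of reward flowing backwards along the predecessor links. It assumes things the text does not specify: a single number per rule for affect, a `decay` factor so that distant rewards count for a little less, and taking the best available continuation at each step. Feedback from the children during rehearsal is left out entirely.

```python
def flow_back(marks, affect, decay=0.9):
    """Push each rule's 'current' back to its predecessor, deepest first."""
    current = dict(affect)
    deepest_first = sorted(marks, key=lambda n: marks[n]["depth"], reverse=True)
    for name in deepest_first:
        pred = marks[name]["pred"]
        if pred is not None:
            # A predecessor is worth its own affect plus its best continuation.
            current[pred] = max(current[pred],
                                affect[pred] + decay * current[name])
    return current


# Hypothetical marked rules and how good each predicted state feels.
marks = {
    "walk_to_kitchen": {"depth": 0, "pred": None},
    "nap": {"depth": 0, "pred": None},
    "open_fridge": {"depth": 1, "pred": "walk_to_kitchen"},
    "eat": {"depth": 2, "pred": "open_fridge"},
}
affect = {"walk_to_kitchen": 0.0, "nap": 0.2, "open_fridge": 0.0, "eat": 1.0}

current = flow_back(marks, affect)
best = max((n for n in marks if marks[n]["depth"] == 0), key=current.get)
print(best)  # walk_to_kitchen
```

Walking to the kitchen has no benefit in itself, but it inherits most of the reward from eating two steps later, so it beats the mildly pleasant nap.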

And so we go ahead and trigger the best one of these depth-zero rules. If its prediction doesn’t come true then we’ll need to start again from a clean sheet, by unmarking all the rules and clearing the roads of their current. But assuming it does come true, we simply have to shift everybody’s depth value down by one, to account for the fact that time has moved on. All the depth 1 rules will now become depth 0, and they'll retain the current they inherited from rules further into the future, and hence we’ll find that we already know what to do next!
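
A minimal sketch of that bookkeeping, assuming each marked rule is stored as a small dictionary holding its `depth` and the `current` it has accumulated (both names are invented for the example):

```python
def after_action(marks, prediction_came_true):
    """Update the plan once the triggered rule's prediction has been checked."""
    if not prediction_came_true:
        return {}  # surprise: unmark everything and clear the current
    # Time has moved on: everything is one step closer, current is retained.
    return {name: {**m, "depth": m["depth"] - 1} for name, m in marks.items()}


# Hypothetical plan state just after triggering "walk_to_kitchen".
marks = {
    "walk_to_kitchen": {"depth": 0, "current": 0.81},
    "open_fridge": {"depth": 1, "current": 0.9},
    "eat": {"depth": 2, "current": 1.0},
}
marks_now = after_action(marks, prediction_came_true=True)
options = [name for name, m in marks_now.items() if m["depth"] == 0]
print(options)  # ['open_fridge']
```

Rules whose depth falls below zero are no longer options for the future; this sketch simply leaves them in place and does not model what else happens to them.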

Any extra time we have spare for thinking will allow us to continue building up our picture of possible futures and how we feel about them, but meanwhile we just carry on following our prepared plan upstream.

Oh, and if we can’t think of anything to do next then this may be because we've reached the end of the plan and we can sit and contemplate our navels until something else occurs to us. But it might simply mean that we can't find a route from where we are to a better situation, because we haven't yet learned what to do. This is ok if we're not especially hungry or whatever, but what if we're getting really hungry or scared and still have no idea what to do? What if we're not feeling particularly bad yet but don't know what to do about anything at all, because we're babies, and hence we need to explore and experiment or else we'll learn nothing? Luckily I have a solution to this surprisingly sticky problem too!

We decide to trigger a rule, not just because it is desirable, but because its desirability exceeds some threshold. If we start with a high threshold and let it rhythmically fall to a low value and reset again, in a ramp waveform, then we can make it start out so high that no rule is going to be exciting enough to get triggered for a little while, and this means we get some time to think instead, trying out more virtual nows and adding to our tree of possible futures. As the threshold falls, a rule may find that it is firing above the threshold, in which case it will be triggered and cause some action. It will inevitably be the best rule that gets there first, and the more important it is, the less time we'll spend thinking and the faster we get to act.

But if the ramp falls to its base level with nothing getting triggered, the creature will simply do nothing except think about life. When there's nothing good to do, doing nothing is usually a good strategy, so that's fine. However, hunger and suchlike will continue to rise, and eventually the gloop will start to get distressed. It clearly has problems but doesn’t yet know anything it can do about them. So if this happens, I simply start lowering the base level for the ramp. Now the threshold will drop further each time, and eventually Gloop is likely to act on even the least confident or barely beneficial rules. At the limit, he will simply pick a rule on account of random noise and try it. So Gloop will never do anything too dumb under normal circumstances, when he confidently knows what to do, but when he gets desperate he’ll at least try to do something, in the hope that it will turn out to help. When, inevitably, this random, zero-confidence prediction turns out to be wrong, he'll at least have a valid life experience to replace it with. And from that one experience of action causing change, learning will eventually hone this memory into a more reliable and general case. An episodic memory (see below) will thus gradually turn into a semantic memory - a rule.
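
The falling threshold can be illustrated with a simple ramp function. The numbers, the linear shape of the ramp and the function names below are assumptions chosen for the example; the random noise that takes over at the very bottom is not modelled.

```python
def threshold(tick, period, top, base):
    """Ramp waveform: starts at `top`, falls towards `base`, then resets."""
    phase = (tick % period) / period
    return top - (top - base) * phase


def first_trigger(desirability, period, top, base):
    """Return (tick, rule) for the first rule to exceed the falling threshold,
    or None if nothing gets triggered during one ramp."""
    for tick in range(period):
        level = threshold(tick, period, top, base)
        ready = {n: d for n, d in desirability.items() if d > level}
        if ready:
            return tick, max(ready, key=ready.get)
    return None


# Hypothetical desirabilities, with a ramp falling from 1.0 towards 0.5.
print(first_trigger({"eat": 0.88, "nap": 0.6}, 10, 1.0, 0.5))  # (3, 'eat')
print(first_trigger({"eat": 0.97}, 10, 1.0, 0.5))              # (1, 'eat')

# A feeble rule never fires, until distress lowers the base level.
print(first_trigger({"poke_rock": 0.25}, 10, 1.0, 0.5))        # None
print(first_trigger({"poke_rock": 0.25}, 10, 1.0, 0.0))        # (8, 'poke_rock')
```

The more desirable the rule, the earlier in the ramp it fires and the less time is spent thinking; lowering the base lets a desperate creature act on rules it would normally ignore.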

Final part: Waxing psychological

This is not as exciting a solution as I was hoping for, I admit, but I do think there’s something quite attractive and maybe even psychologically significant about it. For one thing, we’re gradually and continually constructing a narrative. Not just a standalone plan, or a narrative about what might happen next, but also about what is happening in general. If you’ve ever woken up in hospital and suddenly wondered where you are, how you got there, and maybe even who you are and who these people are that are standing by your bed, you’ll know what it feels like when our personal narratives break down and have to be reconstructed from scratch. Hypnotists can even deliberately break down our narratives by surprising us with a suspiciously Trump-like handshake or similar, and as a result of this sudden shock they're able to make us believe the most ridiculous things. Yet when we have a good sense of what’s going on, when one prediction after another proves true, and when we even have a recent past that makes narrative sense alongside our predictions for the future, we tend to feel confident and calm, we act quickly and efficiently, and we even do things on autopilot, and hence can afford to spend more time looking inward instead of trying to figure out what happens next.

Such a narrative memory of the past, incidentally, comes for free in this system! Every time we slide all the depth values back by one because we've successfully completed a rule and so what were once options for the future have now become an immediate possibility, we can also slide the one-and-only depth zero rule that we actually just triggered back by one as well, to become a negative value. All the other depth-zero rules that we didn’t trigger – the possible futures that never happened – can be unmarked and recycled as potential rules for planning the future. But until these negative-depth rules also need to be reassigned to the future (because history inevitably repeats itself), a chain of them will remain marked as depth -1, -2, -3 and so on, acting as a linked history of the path we actually took – an episodic, partly autobiographical memory. 'Here's what I did on my holidays: first I went to the beach and then I bought an ice-cream, and then it got sand in it...'
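
As a toy illustration of how negative depths turn into a linked history, here is a short sketch. It assumes the marks are just a dictionary from rule name to depth; the holiday rule names are made up to match the example above.

```python
def advance(depths, triggered):
    """Slide every depth back by one after `triggered` has come true."""
    new_depths = {}
    for name, depth in depths.items():
        if depth == 0 and name != triggered:
            continue                    # a future that never happened: unmark
        new_depths[name] = depth - 1    # the triggered rule becomes depth -1
    return new_depths


def recall(depths):
    """The path actually taken, oldest first, read off the negative depths."""
    past = [name for name, depth in depths.items() if depth < 0]
    return sorted(past, key=lambda name: depths[name])


# Hypothetical plan for a holiday.
depths = {"go_to_beach": 0, "stay_home": 0, "buy_ice_cream": 1, "drop_it": 2}
depths = advance(depths, "go_to_beach")
depths = advance(depths, "buy_ice_cream")
print(recall(depths))  # ['go_to_beach', 'buy_ice_cream']
print(depths)          # {'go_to_beach': -2, 'buy_ice_cream': -1, 'drop_it': 0}
```

The same counter that tells a rule how far ahead it lies also records how long ago it happened, which is why the episodic memory costs nothing extra.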

I haven’t actually used this episodic memory for anything yet but I can certainly see some ‘memory consolidation tasks’ that might take advantage of it, requiring that we stop letting our virtual now wander off into the future and instead use it to trigger scraps of narrative from the past, and re-experience them. A really good time to do this sort of reflection would be when we don’t need to be planning the future and hence, when we’re asleep, we can afford to dream.

This whole process I've described is massively parallel, at one level, but the 'bottleneck of the virtual now' means that it also has to happen serially; we have to experience fragments of this web of possible futures or pasts one by one, sometimes in a direct sequence of cause and effect and sometimes looping back to consider another branch entirely. We end up with a constantly emerging and readjusting mental narrative about who we are, where we’ve been, where we’d (collectively, as a body and brain) like to go next, and why.

This is definitely not cognition, I freely admit, let alone metacognition. It’s currently incapable of, say, multiplying two arbitrary numbers, or saying what the letter R looks like if you rotate it 270 degrees. It can’t (yet) route signals around its own brain to achieve different computations using circuitry that developed for something else, nor in any sophisticated way can it monitor and consider its own mental state.

Moreover, it’s definitely not the superhuman AI that many people insist (and perhaps even believe, if they’re totally insane) they are very close to achieving Any Day Now(TM). I’m merely working down at the squirrel level, here (and honestly I don't know why they think they can skip over this straight to humans, let alone superhumans. They don't seem to understand the problem, although maybe they know something I don't).

So gloops certainly aren't very smart. Yet I do think that anyone who studies the mind will recognize some of the key features and even oddities of this mechanism. And assuming I can get all the detailed wrinkles ironed out of it I think it would at least be intellectually interesting to posit that, despite being stupid and nothing more than a collection of independent little neural maps moving virtual pebbles around, the gloops are somewhat situated as unitary beings within their own little autobiographical narratives of their past and future. They’ll have primitive beliefs, memories, expectations and hopes, along with worries and disappointments and surprise, all combined into a personal life history. And for us humans this is a significant part of being a ‘me’.

Phew.

So did any of you manage to get this far? :-) If you did, then I hope you can at least see why it took me a while to figure this (and a bunch of other things I haven't even dared mention) out! Sorry it took so horribly long! It doesn't work properly yet, but I'm pretty sure it will. Now I can get on and write that game I promised!