Oly on AI

“Best humans still outperform”

Oliver Sourbut — Fri, 17 Apr 2026 12:59:35 GMT

A few years ago I was tickled by an article headline in a serious academic journal:

Best humans still outperform artificial intelligence in a creative divergent thinking task

Remarkable! Man bites dog! It had become newsworthy, it was worth checking (and, I perceive, worth a little self-congratulatory celebration) that there remained any domain where mere man could still hope to possibly contend with the machines — at least, the best humans still could! (Could you?)


A message from the future

That was 2023. I think what stood out to me at the time[1] was that this was in some sense early. Not early in the story of AI — although ChatGPT and StableDiffusion, each less than a year old, had captured the public attention in a way which earlier AI hadn’t, these were merely the latest in a long lineage of gradual developments — but an early sign of a reckoning, an attitude shift in how humanity would grapple with these new machine capabilities we were conjuring fitfully into being.

I’d already been worrying for years that things might get out of hand with AI (and had even started writing about it). I was hardly the first![2] But this had felt almost like a perversely secret concern (how can people not see what’s coming?? — but they didn’t), one which humanity at large appeared destined to ignore until either it was too late… or, if we somehow played it right, until a splendid apotheosis of world peace, unlimited bounty, health and longevity delivered by machine intellect. (In fact I think those remain real prospects, and it’s absolutely in our hands to determine which outcomes we get.)

What this headline implicitly spoke of, the subtextual worldview shift betrayed by the phrasing — “Best humans still outperform” — was that we had woken up and viscerally felt the reality that even the ‘best’ humans might genuinely need to watch their backs. The machines were coming. It was no longer (had never been) a joke or a fairy tale.

This headline, seemingly from a near future in which it was taken for granted that machines, in general, dominated human capabilities, showed what was coming. Headlines like it are now commonplace — perhaps more common than those (now almost boring!) headlines adding to the litany of tasks AI now outcompetes human experts at.

The world changed

The world changed. Not because the world had actually yet changed (much), but because humanity, in our limited and faltering foresight, had noticed that, soon, it might. That murky perception of the future, humanity’s near-unique hallmark and blessing, memetically reverberated and has worked its way into our collective discourse.

In this way, I’m incredibly grateful to the ‘ChatGPT moment’. Rather than implicitly relying on a plucky band of vaguely foresighted but ultimately underpowered ‘sci-fi weirdos’, humanity as a whole is entering the conversation. We’re all stakeholders in the trajectory of this world-transforming sphere of technology, and all kinds of people are beginning to act like it: people with skillsets and perspectives which we’ll need, which had been lacking, in earlier debates. Law theorists, philosophers, engineers, anthropologists, economists, statespeople. It’s a thickly textured problem. It’ll need more than people like me (aspiring polymath though I may be) to solve it!


These cultural conversation shifts are fickle but surely incredibly consequential. 2025 felt like another shift, to me, and 2026 so far — with AI producing genuine national security implications and at the centre of dirty political manoeuvring — seems to suggest that both the training wheels and the gloves are off, as Dean Ball recently put it. It’s a little scary: powerful and not altogether friendly forces have turned their eye to the potential potency of emerging tech, and they[3] may wrestle for it, even under the risk that they destroy much in the process or that the tech spills entirely out of their control.

The world, changed

We can be doing better! People can get curious, find out what’s what, consider stakes and what realistic paths we might prefer. Don’t make the mistake of ‘nowsight’ bias — today’s AI are the least capable there will ever be! Take seriously where things might go, and notice if the conversation seems to miss something important that you understand well: it’s still early and the ‘experts’ are mainly that by virtue of noticing the importance of AI a little sooner than everyone else[4]. Let’s also grab the new tech building blocks we have and bootstrap the way we do foresight, collective intelligence, and coordination.

Don’t mistake me for naively assuming machines will blast through every bottleneck in short order. There are a lot of adaptability, dexterity, and generality bottlenecks between here and self-sufficient machines. Perhaps I’ll write something about that soon.

[1] (I intended to blog about it at the time, but… you know how it is with drafts.)

[2] Quoth Turing, some time in the 1950s:

once the machine thinking method had started, it would not take long to outstrip our feeble powers… At some stage therefore, we should have to expect the machines to take control.

Even Turing was not first to perceive that thinking machines could pose takeover hazards.

[3] I’m not only (or even mainly) talking about countries.

[4] I’ve been bemused several times recently upon being referred to as an ‘expert’, that mythical breed.

Orders of magnitude: use semitones, not decibels

Oliver Sourbut — Wed, 01 Apr 2026 10:21:00 GMT

I'm going to teach you a secret. It's a secret known to few, a secret way of using parts of your brain not meant for mathematics... for mathematics. It's part of how I (sort of) do logarithms in my head. This is a nearly purposeless skill.

What's the growth rate? What's the doubling time? How many orders of magnitude bigger is it? How many years at this rate until it's quintupled?

All questions of ratios and scale.

Scale... hmm.

'Wait', you're thinking, 'let me check the date...'. Indeed. But please, stay with me for the logarithms.

Musical intervals as ratios, and God's joke

If you're a music nerd like me, you'll know that an octave (abbreviated 8ve), the fundamental musical interval, represents a doubling of vibration frequency. So if A440 is at 440Hz, then 220Hz and 880Hz are also 'A'. Our ears tend to hear this as 'the same note, only higher'.

That means the 'same' interval, an octave, corresponds to successively greater gaps in frequency. First a doubling, then a quadrupling, an octupling, and so on. Our perception, and musical notation, maps the space of frequencies logarithmically.

You'll also know that a 'perfect fifth' is a ratio of 3:2. A to the E above it, C# to the G# above it, etc. Consonance is all about nice ratios! (Ask Pythagoras.)

At least, the really sweet, in tune fifths are this ratio. Because God is an absolute wheeze, you can keep moving in fifths (3:2) and octaves and get 'new notes' eleven times. That's where we get our Western scale from, originally (except it's originally originally Mesopotamian probably). The twelfth time ((3:2)^12) gets you to a ratio of roughly 129.7:1. That's almost exactly seven doublings, seven octaves (7 * 8ve)! That'd be 128:1. God's joke is in that roughly 1% margin, and musicians have been arguing about what to do about it for centuries. It's a whole thing.[1]
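The arithmetic behind that near miss is easy to check. Here is a minimal sketch using exact fractions; the variable names are my own, chosen only for illustration.

```python
from fractions import Fraction

# Twelve stacked perfect fifths (3:2 each) versus seven stacked octaves (2:1 each).
twelve_fifths = Fraction(3, 2) ** 12
seven_octaves = Fraction(2) ** 7

# The leftover ratio between the two stacks is the small mismatch described above.
mismatch = twelve_fifths / seven_octaves

print(float(twelve_fifths))  # a little under 129.75
print(int(seven_octaves))    # 128
print(float(mismatch))       # a little over 1.01, i.e. a margin of roughly 1%
```

Using `Fraction` keeps the comparison exact until the final conversion to a float for display.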

Cutting a long story short, that leaves us with twelve different notes dividing up the octave. They 'repeat', with 'the same' note again and again at either higher or lower octaves (a full doubling of frequency).

In between octaves, those twelve divisions need to 'add up to' a doubling. For reasons, two steps (a sixth of the overall scale) is referred to as a 'tone', and a single step (a twelfth of the scale) is thus a 'semitone'. That means each semitone corresponds to a ratio of the twelfth root of two. (It's about 1.06, i.e. a ratio increase of about 6%.) The full scale as shown above is called 'chromatic' (because it has every 'colour'...).

This means that neat fractional powers of two map cleanly onto musical intervals. God was generous in giving twelve many factors, so we have musical intervals for the square, cube, fourth, sixth, and twelfth roots of two which come for free.
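As a rough sketch of that mapping (the function names here are illustrative assumptions, and equal-tempered semitones are assumed throughout), going from semitones to a frequency ratio and back looks like this:

```python
import math

SEMITONES_PER_OCTAVE = 12

def interval_ratio(semitones: float) -> float:
    """Frequency ratio of an interval spanning the given number of equal semitones."""
    return 2 ** (semitones / SEMITONES_PER_OCTAVE)

def power_of_two(semitones: float) -> float:
    """The exponent x such that the interval's ratio is 2**x."""
    return semitones / SEMITONES_PER_OCTAVE

# Each divisor of twelve gives a 'free' root of two: the root-th root of two
# is the interval of 12 / root semitones.
for root in (2, 3, 4, 6, 12):
    semitones = SEMITONES_PER_OCTAVE // root
    ratio = interval_ratio(semitones)
    print(f"{root}th root of two = {semitones} semitone(s), ratio about {ratio:.4f}")
    assert math.isclose(ratio ** root, 2.0)
```

The single-semitone case gives the ratio of about 1.06 mentioned earlier.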

So far, no logarithms. But we have musical powers of two: give me a fraction and I can tell you the musical interval. That means we also have musical logarithm: give me a musical interval and I can tell you the power of two! e.g. C to G# is eight semitones. So the base 2 logarithm of that interval's ratio is 8/12 = 2/3.

Musical logarithms? What is he talking about? Surely this is pointless. Yes, it is! Hold on!

Harmonic series

If you're a brass music nerd like me, you'll know that the 'overtones' of most natural vibrations correspond to the 'harmonic series' (no, not that harmonic series, the actually harmonic harmonic series), which are the different pitches you can get a big metal tube to vibrate at if you give it the right encouragement. Incidentally this is how brass players get dozens of different notes out of an instrument having (usually) only three valves[2].

This harmonic series is generated by lovely integer ratios! Why? The physics of oscillators. Integer multiples are the only frequencies which can support a standing wave on the same vibrating object (air column, string, membrane).

Brass players spend hours and hours sliding and jumping between these harmonics as a matter of sheer necessity. Only three valves![3] So we know them by heart, by fingers, and by ears.

Combining the harmonic series with the chromatic scale: magic

So we have integer multiples, the harmonic series, laid over a fundamentally logarithmic scale, the chromatic scale consisting of twelve semitones.

Numbers above the notes correspond to small adjustments vs the equally-spaced semitones which are usually used today to deal with God's joke. Ignore them if you don't care about small percentage errors. This is the harmonic series on C; you can have a series on any starting note with the same intervals.

Here's the magic trick. Now we can go from arbitrary ratios to musical intervals!

Start with an easy one, 1.25. That's a ratio 5:4. Fifth harmonic is E (+ 2 8ve). Fourth is C (+2 8ve). The octaves cancel. That's an interval E:C, or four semitones. So 1.25 is four semitones. We already know the 'musical logarithm' of four semitones, it's 4/12 = 1/3. Check on a calculator: log_2(1.25) = 0.32193…. I promised close, not perfect!

A slightly trickier one, 1.8. That's a ratio of 9:5. The ninth harmonic is D (+3 8ve), and the fifth is E (+2 8ve). The octaves partly cancel (leaving a single octave). The interval D:E is minus two semitones. Taken off the residual octave, that leaves ten semitones. So log_2(1.8) ≈ 10/12 = 5/6. Calculator check: log_2(1.8) = 0.848…. Not bad!
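The two worked examples can be replayed mechanically. In this sketch the lookup table stands in for the memorised harmonic series on C, with each harmonic rounded to its nearest equal-tempered semitone; the table layout and the function name are my own assumptions for illustration.

```python
import math

# Semitones above the fundamental for harmonics 1..10, rounded to the nearest
# equal-tempered semitone (on C: C, C, G, C, E, G, Bb, C, D, E).
HARMONIC_SEMITONES = {1: 0, 2: 12, 3: 19, 4: 24, 5: 28, 6: 31, 7: 34, 8: 36, 9: 38, 10: 40}

def musical_log2(numerator: int, denominator: int) -> float:
    """Approximate log2(numerator / denominator) by counting semitones between harmonics."""
    semitones = HARMONIC_SEMITONES[numerator] - HARMONIC_SEMITONES[denominator]
    return semitones / 12

# Compare the musical estimate with the calculator value for both examples.
for p, q in [(5, 4), (9, 5)]:
    estimate = musical_log2(p, q)
    exact = math.log2(p / q)
    print(f"{p}:{q}  musical {estimate:.3f}  exact {exact:.3f}")
```

The estimates can only ever land on multiples of 1/12, so they are close rather than perfect, as promised.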

It turns out that the musical harmonic series is secretly a mini table of base 2 logarithms.

Base 10, if we really have to

The unit that mainstream sheeple often use for fractional logarithms is the decibel. A decibel divides a base ten order of magnitude in ten. So ten decibels is a decupling, twenty is a hundredfold, and so on.

Stated similarly, a semitone divides a base two order of magnitude in twelve.

In another cosmic whimsy, 2^10 = 1024 ~= 1000 = 10^3. So 120 semitones are essentially equal to 30 decibels, for an easy exchange rate of four semitones per decibel.
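To see how good the 'four semitones per decibel' exchange rate is, here is a small sketch. It assumes the decibel as defined above (ten decibels per factor of ten), and the function names are illustrative.

```python
import math

def semitones(ratio: float) -> float:
    """Size of a ratio in semitones: twelve per doubling."""
    return 12 * math.log2(ratio)

def decibels(ratio: float) -> float:
    """Size of a ratio in decibels: ten per factor of ten."""
    return 10 * math.log10(ratio)

# The coincidence 2^10 ~= 10^3 expressed in both units.
print(semitones(1024))  # 120 semitones (ten octaves)
print(decibels(1000))   # 30 decibels (three orders of magnitude)

# The exact exchange rate, which the shortcut rounds to four.
exact_rate = semitones(10) / decibels(10)
print(exact_rate)       # a little under 4
```

The shortcut overstates the true rate by well under one percent, which is comfortably inside Fermi-estimate tolerance.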

What

Well, look. It's fun, and it gets me logarithms to pretty good approximation. It's good enough for ~~jazz~~ Fermi estimation, as they say. Who is this even good for? I maintain that the intersection between music and mathematics nerds is surprisingly well populated. If that's you, you're welcome. If not, I'm pretty unsure how easy it is to get the harmonic series installed in your brain. Maybe it's only available to the warped few who train in childhood.

There are some other fun tricks with powers and logarithms of two. For example, if you know your binary place values, you can figure out logarithms of very big numbers (and the trick comes in handy here too).

There's also a 'rule of 72' which helps when dealing with small percentage growth rates and doubling times.
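For readers who haven't met it, the rule of 72 says the doubling time is roughly 72 divided by the percentage growth rate per period. A quick sketch comparing it with the exact figure (the function names are my own):

```python
import math

def rule_of_72(percent_growth: float) -> float:
    """Approximate number of periods to double at the given percentage growth per period."""
    return 72 / percent_growth

def exact_doubling_time(percent_growth: float) -> float:
    """Exact number of periods to double under compound growth."""
    return math.log(2) / math.log(1 + percent_growth / 100)

# Side-by-side comparison for a few small growth rates.
for rate in (2, 4, 8):
    print(rate, round(rule_of_72(rate), 2), round(exact_doubling_time(rate), 2))
```

The approximation is at its best for single-digit percentage rates and drifts as rates grow.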

I aesthetically like this neat division of doublings into twelve parts, and it's fun to invoke musical intuitions that really have no right to help with mathematics.

You might complain that twelfths are faffy. Who uses twelfths anyway? Everyone everywhere has used decimal for goodness' sake! Well, I have something else to share with you...

[1] Usually nowadays we squish all the fifths a tiny bit so that when stacked up they get to that delicious 128:1.

[2] Three valves independently up or down is a total of eight configurations. Because the third valve is usually set to be redundant with the combination of the first two (which aids fluent finger movement), there are usually only seven practically-distinguishable combinations.

[3] Other wind players, who have the benefit of many more (but not infinitely many) keys and buttons, often encounter one or two of these harmonics.

AI for Human Reasoning for You

Oliver Sourbut — Tue, 03 Feb 2026 13:22:00 GMT

Today’s humanity faces many high stakes and even existential challenges; many of the largest are generated or exacerbated by AI. Meanwhile, humans individually and humanity collectively appear distressingly underequipped.

Lots of folks naturally recognise that this implies a general strategy: make humans individually — and humanity collectively — better able to solve problems. Very good! (Complementary strategies look like: make progress directly, raise awareness of the challenges, recruit problem solvers, …)

One popular approach is to ‘raise the sanity waterline’ in the most oldschool and traditional way: have a community of best practice, exemplify and proselytise, make people wiser one by one and society wiser by virtue of that. There’ve been some recent successes, not least the existence of communities and forums like LessWrong and Effective Altruism, and some older philosophies and movements.

Another popular approach is to imagine augmenting ourselves in the most futuristic and radical ways: genetic engineering, selective breeding, brain-augmenting implants, brain emulation. Go for it, I suppose (mindful of the potential backfires and hazards). But these probably won’t pan out on what look like the necessary timelines.

There is a middle ground! Use tech to uplift ourselves, yes — but don’t wait for medical marvels and wholesale self-reauthorship. Just use the building blocks we have, anticipate the pieces we might have soon, and address our individual and collective shortcomings one low-hanging fruit at a time.[1]

The most exciting part is that we’ve got some nifty new building blocks to play with: big data, big compute, ML, and (most novel of all) foundation models and limited agentic AI.

How to generate useful ideas in human reasoning

One place people fall down here is getting locked into asking: ‘OK, what can I usefully ask this AI to do?’. Sometimes this is helpful. But usually it’s missing the majority of the design space: agentic form factors are only a very narrow slice of what we can do with technology, and for many purposes they’re not even especially desirable.

Think about human reasoning. ‘Human’ as in individuals, groups, teams, society, humanity at large. ‘Reasoning’ as in the full decision-making cycle, from sensing and understanding through to planning and acting, including acting together.

I like to first ask: ‘What human reasoning activities are in bad shape?’

- OODA is one good frame:
  - For a given important (type of) decision, what are people observing?
  - How are they orienting and deciding?
  - What actions do they have available and do they know how to do them well?
  - What about the case of teams, groups, institutions: how do their OODAs work (and how do they fail)?
- Also think about development: how do individuals learn and grow? What about groups and communities, how do they form, grow, connect?
- In foresight,
  - What features are we even paying attention to in the first place?
  - What prospects are under consideration?
  - What affordances are we aware of?
  - How are we strategically creating sensing opportunities and the means to adapt plans?
  - How do our forecasts achieve precision and calibration?
- In epistemics, think about the message-passing nature of most human knowledge processes.
  - How do we assess the nodes (communicators)?
  - How do we assess, digest, and compile the messages (claims, evidence, proposals, …)?
  - How do we understand and manage the structure of the network itself (communication relationships, broadcasts and other topologies, …)?
  - What about the traffic (message rates, density distribution, repeated and relayed transmissions, …)?
  - What messages ought to be routed where, when and on behalf of whom?
- In coordination, what are the conditions for success?
  - We need to find or recognise potential counterparties.
  - We might need the charters, norms, or institutions to condition and frame interaction productively — ones which don’t fail or fall to corruption or capture.
  - We need to surface enough mutually-compatible intent or outcome preference.
  - Our ensembled group wisdom might be a necessary source of insight or agility.
  - We need to survive the tug of war of negotiation (which can dissolve into antagonism, even when there’s common knowledge of win-win possibilities).
  - Means of verification and enforcement may be needed to access good options.

Think of a particular audience with either the scale or the special influence to make a difference (this can include ‘the general public’), and the deficits they have in these reasoning activities. Now ask: ‘What kinds of software[2] might help and encourage people to do those better?’.

- Is there an edge to be gained by unlocking big (or even medium) data (which can often be more living and queryable than ever before thanks to LMs)?
- Can large amounts of clerical labour (again LMs) per capita make something newly feasible?
- Can big compute and simulation (including multi persona simulation: LMs again!) drive better understanding of an important dynamic?
- Can extensive background exploration, search, or ‘brainstorming’ by AI surface important opportunities or considerations?
- Can always-on, flexibly-semantically-sensitive sensing and monitoring bring attention where it’s needed faster than before (or at all)?
- Could facilitation and translation bring forth, and synergise, the best array of human capabilities in a given context?
- Could software’s repeatability, auditability, and privacy (in principle), combined with the context and semantic sensitivity of AI, unlock new frontiers of trustable human scaffolding?
- …

Finding flaws and avoiding backfire

Think seriously about backfire: we don’t want to differentially enable bad human actors or rogue AI to reason and coordinate! As Richard Rumelt, author of Good Strategy/Bad Strategy, observes:

The idea that coordination, by itself, can be a source of advantage is a very deep principle.

Coordination’s dark side is collusion, including cartels, oligarchy, and concentration of power, in imaginable extreme cases cutting out most or even all humans.

Similarly, epistemic advantage (in foresight and strategy, say) can be parlayed into resource or influence advantage. If those can be converted in turn into greater epistemic advantage (by employing compute and position for epistemic attacks or in further private epistemic advancement) without commensurate counterweights or defences, this could be quite problematic.

Part of it is about choosing distribution strategies which reduce misuse surface area (or provide antidotes), and part of it is about preferring tech which asymmetrically supports (and perhaps encourages) ‘good’ use and behaviour.

Do it

FLF’s fellows, I, and others have been doing some of this exploration recently. Stay tuned for more. Meanwhile, join in! We’re early in a critical period where much is up for grabs and what we build now might help shape and inform the choices humanity makes about its future (or whether it makes much choice at all). Try things, see what kinds of tools earn the attention and adoption that matters, and share what you learn. Consider principles to apply, especially for minimising backfire risks, and share particular considerations for or against certain kinds of tech and audience targets.

Thanks to Owen Cotton-Barratt and Ben Goldhaber for helpful comments, and to Lizka Vaintrob for recent relevant conversations.

[1] A close relative of this strategy is cyborgism. I might contrast what I’m centrally describing as being more outward-looking, asking how we can uplift the most important sensemaking and wisdom apparatus of humanity in general, whereas cyborgism maybe looks centrally more like a bet on becoming the uplifted paragons (optionally thence, and thereby, saving the world). I’d say these are complementary on the whole.

[2] This is better than asking ‘What kinds of AI…’. Software is the general, capability-unlocking and -enhancing artefact. AI components and form-factors are novel, powerful, sometimes indispensable building blocks in our inventory to compose software capabilities out of.

The First Type of Transformative AI?

Oliver Sourbut — Tue, 06 Jan 2026 17:31:05 GMT

I recently contributed to a discussion of the first type of transformative AI with Owen Cotton-Barratt and Lizka Vaintrob. It’s part of a series those two (primarily, with some input from me and others) are working on, which asks, expanding on their agenda from last year:

AI is not just one, big, singular thing. What are the ways we can bring forward the beneficial possibilities, while delaying or defending against the harmful ones?

AI tools can already produce large changes, and the potentials there will only increase. ‘AGI’ is a moving goalpost, but judicious work now can make sure that society is better positioned to deal with later developments and risks (e.g. by avoiding or defending).

As I repeatedly emphasise to anyone who’ll listen: it’s never been just a dichotomy between ‘yes, good, more AI please’ and ‘no, bad, less AI thank you’ — and it’s not even a case of just making sure ‘the AIs’ are good (though this helps). How the tools and technological building blocks at our disposal are integrated into applications, workflows, and use-cases, is just as important.

And the tools, products, and systems we could develop now shape the context for subsequent developments, including by:

- equipping people to better predict and understand their options
- enabling people to coordinate better around preferred possibilities (which might otherwise be difficult due to mismatched incentives or race dynamics)
- giving the tools to defuse or defend against hazardous developments, or their precursors

When considering ‘big transformations’ from technology, among the strategies we can attempt if we want things to go better for society are: change the ‘order’ these transformations arrive in (which might even prevent or reshape later transitions), or improve the way particularly important transitions go.

For technologists, futurists, philanthropists, legislators, experts, and other members of society trying to make tech progress go well, paying attention to which effects happen, in what order — and what our options are for choosing wisely there — looks like a really promising and neglected way of reducing large-scale risk and bringing about huge benefits.

You can read our few pages of fuller discussion for more of our thoughts on some scenarios we think are worth considering, including intelligence explosion, turbocharged economy, and epistemic uplift.

A Full Epistemic Stack

Oliver Sourbut — Fri, 19 Dec 2025 22:35:14 GMT

We’re writing this in our personal capacity. While our work at the Future of Life Foundation has recently focused on this topic and informs our thinking here, this specific presentation of our views is our own.

Knowledge is integral to living life well, at all scales:

- Individuals manage their life choices: health, career, investment, and others on the basis of what they understand about themselves and their environments.
- Institutions and governments (ideally) regulate economies, provide security, and uphold the conditions for flourishing under their jurisdictions, only if they can make requisite sense of the systems involved.
- Technologists and scientists push the boundaries of the known, generating insights and techniques judged valuable by combining a vision for what is possible with a conception of what is desirable (or as proxy, demanded).
- More broadly, societies negotiate their paths forward through discourse which rests on some reliable, broadly shared access to a body of knowledge and situational awareness about the biggest stakes, people’s varied interests in them, and our shared prospects.
  - (We’re especially interested in how societies and humanity as a whole can navigate the many challenges of the 21st century, most immediately AI, automation, and biotechnology.)

Meanwhile, dysfunction in knowledge-generating and -distributing functions of society means that knowledge, and especially common knowledge, often looks fragile[1]. Some blame social media (platform), some cynical political elites (supply), and others the deplorable common people (demand).

But reliable knowledge underpins news, history, and science alike. What resources and infrastructure would a society really nailing this have available?

Among other things, we think its communication and knowledge infrastructure would make it easy for people to learn, check, compare, debate, and build in ways which compound and reward good faith. This means tech, and we think the technical prerequisites, the need, and the vision for a full epistemic stack[2] are coming together right now. Some pioneering practitioners and researchers are already making some progress. We’d like to nurture and welcome it along.

In this short series, we’ll outline some ways we’re thinking about the space of tools and foundations which can raise the overall epistemic waterline and enable us all to make more sense. In this first post, we introduce frames for mapping the space —[3] different layers for info gathering, structuring into claims and evidence, and assessment — and potential end applications that would utilise the information.

A full what?

A full epistemic stack. Epistemic as in getting (and sharing) knowledge. Full stack as in all of the technology necessary to support that process, in all its glory.

What’s involved in gathering information and forming views about our world? Humans aren’t, primarily, isolated observers. Ever since the Sumerians and their written customer complaints[4], humans have received information about much of ~~their~~ our world from other humans, for better or worse. We sophisticated modern beings consume information diets transmitted across unprecedented distances in space, time, and network scale.

With an accelerating pace of technological change and with potential information overload at machine speeds, we will need to improve our collective intelligence game to keep up with the promise and perils of the 21st century.

Imagine an upgrade. People faced with news articles, social media posts, research papers, chatbot responses, and so on can trivially trace their complete epistemic origins — links, citations, citations of citations, original data sources, methodologies — as well as helpful context (especially useful responses, alternative positions, and representative supporting or conflicting evidence). That’s a lot, so perhaps more realistically, most of the time, people don’t bother… but the facility is there, and everyone knows everyone knows it. More importantly, everyone knows everyone’s AI assistants know it (and we know those are far less lazy)! So the waterline of information trustworthiness and good faith discourse is raised, for good. Importantly, humans are still very much in the loop — to borrow a phrase from Audrey Tang, we might even say machines are in the human loop.

Some pieces of this are already practical. Others will be a stretch with careful scaffolding and current-generation AI. Some might be just out of reach without general model improvements… but we think they’re all close: 2026 could be the year this starts to get real traction.

Does this change (or save) the world on its own? Of course not. In fact we have a long list of cautionary tales of premature and overambitious epistemic tech projects which achieved very few of their aims: the biggest challenge is plausibly distribution and uptake. (We will write something more about that later in this series.) And sensemaking alone isn’t sufficient! — will and creativity and the means to coordinate sufficiently at the relevant scale are essential complements. But there’s significant and robust value to improving everyone’s ability to reason clearly about the world, and we do think this time can be different.

Layers of a foundational protocol

Considering the dynamic message-passing network of human information processing, we see various possible hooks for communicator-, platform-, network-, and information-focused tech applications which could work together to improve our collective intelligence.

We’ll briefly discuss some foundational information-focused layers together with user experience (UX) and tools which can utilise the influx of cheap clerical labour from LMs, combined with intermittent judgement from humans, to make it smoother and easier for us all to make sense.

All of these pieces stand somewhat alone — a part of our vision is an interoperable and extensible suite — but we think implementations of some foundations have enough synergy that it’s worth thinking of them as a suite. We’ll outline where we think synergies are particularly strong. In later posts we’ll look at some specific technologies and examples of groups already prototyping them; for now we’re painting in broad strokes some goals we see for each part of the stack.

Ingestion: observations, data, and identity

Ultimately grounding all empirical knowledge is some collection of observations… but most people rely on second-hand (and even more indirect) observation. Consider the climate in Hawaii. Most people aren’t in a position to directly observe that, but many have some degree of stake in nonetheless knowing about it or having the affordance to know about it.

For some topics, ‘source? Trust me bro,’ is sufficient: what reason do they have to lie, and does it matter much anyway? Other times, for higher stakes applications, it’s better to have more confirmation, ranging from a staked reputation for honesty to cryptographic guarantee[5].

Associating artefacts with metadata about origin and authorship (and further guarantees if available) can be a multiplier on downstream knowledge activities, such as tracing the provenance of claims and sources, or evaluating track records for honesty. Thanks to AI, precise formats matter less, and tracking down this information can be much more tractable. This tractability can drive the critical mass needed to start a virtuous cycle of sharing and interoperation, which early movers can encourage by converging on lightweight protocols and metadata formats. In true 21st Century techno-optimist fashion, we think no centralised party need be responsible for storing or processing (though distributed caches and repositories can provide valuable network services, especially for indexing and lookup[6]).
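To make 'lightweight metadata' a little more concrete, here is one hypothetical shape such a record could take. Everything in this sketch (the `ProvenanceRecord` type, its field names, the example values) is an illustrative assumption rather than an existing protocol; the only real machinery used is a standard SHA-256 content hash.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical minimal metadata attached to an artefact."""
    content_sha256: str  # fingerprint binding the record to the exact content
    author: str          # claimed author (unverified in this sketch)
    origin: str          # where the artefact was first published or observed
    observed_at: str     # ISO 8601 timestamp, kept as plain text for simplicity

def make_record(content: bytes, author: str, origin: str, observed_at: str) -> ProvenanceRecord:
    """Build a record whose hash ties it to one specific piece of content."""
    return ProvenanceRecord(hashlib.sha256(content).hexdigest(), author, origin, observed_at)

def matches(record: ProvenanceRecord, content: bytes) -> bool:
    """Check whether the content is byte-for-byte what the record describes."""
    return record.content_sha256 == hashlib.sha256(content).hexdigest()

# Illustrative values only.
original = b"Rainfall this month was above the long-run average."
record = make_record(original, "example-author", "example.org/weather", "2025-12-19T00:00:00Z")
print(matches(record, original))              # True
print(matches(record, original + b" (edit)")) # False
```

A hash alone says nothing about who really wrote something; stronger guarantees such as signatures or staked reputation would sit on top of a record like this.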

Structure: inference and discourse

Information passing and knowledge development involve far more than sharing basic observations and datasets between humans. There are at least two important types of structure: inference and discourse.

Inference structure: genealogy of claims and supporting evidence (Structure I)

Ideally perhaps, raw observations are reliably recorded, their search and sampling processes unbiased (or well-described and accounted for), inferences in combination with other knowledge are made, with traceable citations and with appropriate uncertainty quantification, and finally new traceable, conversation-ready claims are made.

We might call this an inference structure: the genealogy and epistemic provenance of given claims and observations, enabling others to see how conclusions were reached, and thus to repeat or refine (or refute) the reasoning and investigation that led there.

Of course in practice, inference structure is often illegible and effortful to deal with at best, and in many contexts intractable or entirely absent. We are presented with a selectively-reported news article with a scant few hyperlinks, themselves not offering much more context. Or we simply glimpse the tweet summary with no accompanying context.

Even in science and academia where citation norms are strongest, a citation might point to a many-page paper or a whole book in support of a single local claim, often losing nuance or distorting meaning along the way, and adding much friction to the activity of assessing the strength of a claim7.

How do tools and protocols improve this picture? Metascience reform movements like Nanopublications strike us as a promising direction.

Already, LM assistance can make some of this structure more practically accessible, including in hindsight. A lightweight sharing format and caches for commonly accessed inference structure metadata can turn this into a reliable, cheap, and growing foundation: a graph of claims and purported evidence, for improved further epistemic activity like auditing, hypothesis generation, and debate mapping.
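As a rough illustration of what such a graph of claims and purported evidence could look like in code, here is a minimal sketch. Everything in it is an assumption for illustration: the names `Claim`, `InferenceGraph`, `add_support`, and `provenance`, the `source` field, and the placeholder records are hypothetical, not a proposed format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    """One node in the graph: an observation or a claim, with origin metadata."""
    claim_id: str
    text: str
    source: str  # hypothetical field: URL, DOI, or other pointer to the origin


@dataclass
class InferenceGraph:
    """Claims plus 'is cited as support for' links between them."""
    claims: dict = field(default_factory=dict)    # claim_id -> Claim
    supports: dict = field(default_factory=dict)  # claim_id -> set of supporting ids

    def add_claim(self, claim: Claim) -> None:
        self.claims[claim.claim_id] = claim
        self.supports.setdefault(claim.claim_id, set())

    def add_support(self, claim_id: str, evidence_id: str) -> None:
        # Both ends must already be recorded, so every link stays traceable.
        if claim_id not in self.claims or evidence_id not in self.claims:
            raise KeyError("both claim and evidence must be added first")
        self.supports[claim_id].add(evidence_id)

    def provenance(self, claim_id: str) -> set:
        """Everything a claim rests on, following support links transitively."""
        seen, stack = set(), [claim_id]
        while stack:
            for evidence_id in self.supports.get(stack.pop(), ()):
                if evidence_id not in seen:
                    seen.add(evidence_id)
                    stack.append(evidence_id)
        return seen


# Placeholder content only: texts and sources are made up for illustration.
graph = InferenceGraph()
graph.add_claim(Claim("obs-1", "Raw station readings", "example://station-log"))
graph.add_claim(Claim("inf-1", "Averages computed from the readings", "example://analysis"))
graph.add_claim(Claim("claim-1", "A summary claim about the climate in Hawaii", "example://article"))
graph.add_support("inf-1", "obs-1")
graph.add_support("claim-1", "inf-1")
print(sorted(graph.provenance("claim-1")))  # ['inf-1', 'obs-1']
```

The point of the sketch is only that the structure is small and mechanical: once links like these are recorded and shared, auditing a claim becomes a graph traversal rather than a fresh investigation.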

Discourse: refinement, counterargument, refutation (Structure II)

Knowledge production and sharing is dynamic. With claims made (ideally legibly), advocates, detractors, investigators, and the generally curious bring new evidence or reason to the debate, strengthening or weakening the case for claims, discovering new details, or inferring new implications or applications.

This discourse structure associates related claims and evidence, relevant observations which might not have originally been made with a given topic in mind, and competing or alternative positions.

Unfortunately in practice, many arguments are made and repeated without producing anything (apart from anger and dissatisfaction and occasional misinformation), partly because they’re disconnected from the wider discourse. That connection is valuable both as contextual input (understanding the state of the wider debate or investigation so that the same points aren’t argued ad infinitum and people benefit from updates), and as output (propagating conclusions, updates, consensus, or synthesis back to the wider conversation).

This shortcoming holds back science, and pollutes politics.

Tools like Wikipedia (and other encyclopedias), at their best, serve as curated summaries of the state of discourse on a given topic. If it’s fairly settled science, the clearest summaries and best sources should be made salient (as well as some history and genealogy). If it’s a lively debate, the state of the positions and arguments, perhaps along with representative advocates, should be summarised. But encyclopedias can be limited by sourcing, available cognitive labour and update speed, one-size-fits-all formatting, and sometimes curatorial bias (whether human or AI).8

Similar to the inference layer, there is massive untapped potential to develop automations for better discourse tracking and modeling. For example, LLMs doing literature reviews can source content from a range of perspectives for downstream mapping. Meanwhile, relevant new artefacts can be detected and ingested close to realtime. We don’t need to agree on all conclusions — but we can much more easily agree on the status of discourse: positions on a topic, the strongest cases for them, and the biggest holes9. Direct access as well as helpful integrations with existing platforms and workflows can surface the most useful context to people as needed, in locally-appropriate format and level of detail.

Assessment: credence, endorsement, and trust

Claims and evidence, together with counterclaims and an array of perspectives (however represented), provide a large underlying pool of potential insight. But at a given time, a given person has some particular question to answer, which requires reaching trusted summaries and positions.

Ultimately consumers of information sources come to conclusions on the basis of diverse signals: compatibility with their more direct observations, assessment of the trustworthiness and reliability (on a given topic) of a communicator, assessment of methodological reasonableness, weighing and comparing evidence, procedural humility and skepticism, explicit logical and probabilistic inference, and so on. It’s squishy and diverse!

We think some technologies are unable to scale because they’re too rigid in assigning explicit probabilities, or because they enforce specific rules divorced from context. This fails to account for real reasoning processes and also can work against trust because people (for good and bad reasons) have idiosyncratic emphases in what constitutes sensible reasoning.

We expect that trust should be a late-binding property (i.e. at the application layer), to account for varied contexts and queries and diverse perspectives, interoperable with minimally opinionated structure metadata. That said, squishy, contextual, customisable reasoning is increasingly scalable and available for computation! So caches and helpful precomputations for common settings might also be surprisingly practical in many cases.
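To make 'late-binding' concrete, here is a deliberately crude sketch in which the shared metadata carries no verdict at all, and each reader supplies their own trust function at query time. The `Endorsement` record shape, the `assess` function, and the numeric weighting are all illustrative assumptions; real assessment would likely be far squishier than a weighted average.

```python
from typing import Callable

# Hypothetical shared record: (endorser, claim_id, stance), stance is +1 or -1.
Endorsement = tuple


def assess(claim_id: str,
           endorsements: list,
           trust: Callable[[str], float]) -> float:
    """Trust-weighted net stance in [-1, 1]; 0.0 if nobody trusted has weighed in.

    `trust` maps an endorser to a non-negative weight and is supplied by the
    reader (or their application), not stored in the shared metadata.
    """
    weighted = [(trust(who), stance)
                for who, cid, stance in endorsements
                if cid == claim_id]
    total = sum(weight for weight, _ in weighted)
    if total == 0:
        return 0.0
    return sum(weight * stance for weight, stance in weighted) / total


# The same unopinionated metadata, read through two different trust settings.
shared = [("alice", "c1", +1), ("bob", "c1", -1)]
trusts_alice = {"alice": 1.0}
trusts_bob = {"bob": 1.0}
print(assess("c1", shared, lambda who: trusts_alice.get(who, 0.0)))  # 1.0
print(assess("c1", shared, lambda who: trusts_bob.get(who, 0.0)))    # -1.0
```

Because the trust function is just a parameter, caching results for common trust settings remains possible without baking any one setting into the metadata.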

With foundational structure to draw from, this is where things start to substantially branch out and move toward the application layer. Some use cases, like summarisation, highlighting key pros and cons and uncertainties, or discovery, might directly touch users. Other times, downstream platforms and tools can integrate via a variety of customised assessment workflows.

Beyond foundations: UX and integrations

Foundations and protocols and epistemic tools sound fun only to a subset of people. But (almost) everyone is interested in some combination of news, life advice, politics, tech, or business. We don’t anticipate much direct use by humans of the epistemic layers we’ve discussed. But we already envision multiple downstream integrations into existing and emerging workflows: this motivates the interoperability and extensibility we’ve mentioned.

A few gestures:

Social media platforms struggle under adversarial and attentional pressures. But distributed, decentralised context-provision, like the early success stories in Community Notes, can serve as a widely-accessible point of distribution (and this is just one form factor among many possible). In turn, foundational epistemic tooling can feed systems like Community Notes.
More speculatively, social-media-like interfaces for uncovering group wisdom and will at larger scales while eliciting more productive discourse might be increasingly practical, and would be supported by this foundational infrastructure.
Curated summaries like encyclopedias (centralised) and Wikipedia (decentralised) are often able to give useful overviews and context on a topic. But they’re slow, don’t have coverage on demand, offer only one-size-fits-all, and are sometimes subject to biases. Human and automated curators could consume from foundational epistemic content and react to relevant updates responsively. Additionally, with discourse and inference structure more readily and deeply available, new, richly-interactive and customisable views are imaginable: for example enabling strongly grounded up- and down-resolution of topics on request10, or highlighting areas of disagreement or uncertainty to be resolved.
Authors and researchers already benefit from search engines, and more recently ‘deep research’ tooling. Integrated with easily available relational epistemic metadata, these uplifts can be much more reliable, trustworthy, and effective.
Emerging use of search-enabled AI chatbots as primary or complementary tools for search, education, and inquiry means that these workflows may become increasingly impactful. Equipping chatbots with access to discourse mapping and depth of inference structure can help their responses to be grounded and direct people to the most important points of evidence and contention on a topic.
Those who want to can already layer extensions onto their browsing and mobile internet experiences. Having always-available or on-demand highlighting, context expandables, warnings, and so on, is viable mainly to the extent that supporting metadata are available (though LMs could approximate these to some degree and at greater expense). More speculatively, we might be due a browser UX exploration phase as more native AI integration into browsing experiences becomes practical: many such designs could benefit from availability of epistemic metadata.

How? Why now?

If this would be so great, why has nobody done it already? Well, vision is one thing, and we could also make a point about underprovision of collective goods like this. But more relevant, the technical capacity to pull off this stack is only really just coming online. We’re not the first people to notice the wonders of language models.

First, the not inconsiderable inconveniences of the core epistemic activities we’ve discussed are made less overwhelming by, for example, the ability of LLMs to digest large amounts of source information, or to carry out semi-structured searches and investigations. Even so, this looks to us like mainly a power-user approach, even if it came packaged in widely available tools similar to deep research, and it doesn’t naively contribute to enriching knowledge commons. We can do better.

With a lightweight, extensible protocol for metadata, caching and sharing of discovered inference structure and discourse structure becomes nearly trivial11. Now the investigations of power users (and perhaps ongoing clerical and maintenance work by LLM agents) produce positive epistemic spillover which can be consumed in principle by any downstream application or interface, and which composes with further work12. Further, the risks of hallucinated or confabulated sources (for LMs as with humans) can be limited by (sometimes adversarial) checking. The epistemic power is in the process, not in the AI.

Various types of openness can bring benefits: extensibility, trust, reach, distribution — but can also bring challenges like bad faith contributions (for example omitting or pointing to incorrect sources) or mistakes. Tools and protocols at each layer will need to navigate such tradeoffs. One approach could have multiple authorities akin to public libraries taking responsibility for providing living, well-connected views over different corpora and topics — while, importantly, providing public APIs for endorsing or critiquing those metadata. Alternatively, perhaps anyone (or their LLM) could check, endorse, or contribute alternative structural metadata13. Then the provisions of identity and endorsement in an assessment layer would need to solve the challenges of filtering and canonicalisation.

In specific epistemic communities and on particular topics, this could drive much more comprehensive understanding of the state of discourse, pushing the knowledge frontier forward faster and more reliably. Across the broader public, discourse mapping and inference metadata can act against deliberate or accidental distortion, supporting (and incentivising) more good faith communication.

Takeaways

Knowledge, especially reliable shared knowledge, helps humans individually and collectively be more right in making plans and taking action. Helping people better trust the ways they get and share useful information can deliver widespread benefits as well as defending against large-scale risk, whether from mistakes or malice.

We communicate at greater scales than ever, but our foundational knowledge infrastructure hasn’t scaled in the same way. We see a large space of opportunities to improve that — only recently coming into view with technical advances in AI and ever-cheaper compute.

This is the first in what will be a series exploring one corner of the design landscape for epistemic tech: there are many uncertainties still, but we’re excited enough that we’re investigating and investing in pushing it forward.

We’ll flesh out more of our current thinking on this stack in future entries in this series, including more on existing efforts in the space, interoperability, and core challenges here (especially distribution).

Please get in touch if any of this excites or inspires you, or if you have warnings or reasons to be skeptical!

Thanks to our colleagues at the Future of Life Foundation, and to several epistemic tech pioneers for helpful conversations feeding into our thinking.


You might think this is a new or worsening phenomenon, or you might think it perennial. Either way, it’s hard to deny that things would ideally be much better. We further think there is some urgency to this, both due to rising stakes and due to foreseeable potential for escalating distortion via AI.

Improved terminological branding sorely needed

Coauthor Oly formerly frequently used single hyphens for this sort of punctuation effect, but coincidentally started using em-dashes recently when someone kindly pointed out that it’s trivial to write them while drafting in google docs. This entire doc is human-written (except for images). Citation: trust us.

or perhaps as early as Homo erectus and his supposed pantomime communication, or even earlier

Some such guarantees might come from signed hardware, proof of personhood, or watermarking. We’re not expecting (nor calling for!) all devices or communications to be identified, and not necessarily expecting increased pervasiveness of such devices. Even where the capability is present on hardware, there are legitimate reasons to prefer to scrub identifying metadata before some transmissions or broadcasts. In a related but separate thread of work, we’re interested in ways to expand the frontier of privacy x verification, where we also see some promising prospects.

Compare search engine indexes, or the Internet Archive.

Relatedly, but not necessarily as part of this package, we are interested in automating and scaling the ability to quickly identify rhetorical distortion or unsupported implicature, which manifests in science as importance hacking and in journalism as spin, sensationalism, and misleading framing.

Wikipedia, itself somewhere on the frontier of human epistemic infrastructure, becomes at its weakest points a battleground and a source of contention that it’s not equipped to handle in its own terms.

This gives open, discoverable discourse a lot of adversarial robustness. You can do all you like to deny a case, malign its proponents, claim it’s irrelevant… but these are all just new (sometimes valuable!) entries in the implicit ‘ledger’ of discourse on a topic. This ‘append-only’ property is much more robust than an opinionated summary or authoritative canonical position. Of course append-only raises practical computational and storage concerns, and editorial bias can re-enter any time summarisation and assessment is needed.

Up- and down-resolution is already cheaply available on request: simply ask an LLM ‘explain this more’ or ‘summarise this’. But the process will be illegible, hard to repeat, and lack the trust-providing support of grounding in annotated content.

Storage and indexing are the main constraints on caching and sharing, but the metadata should be a small fraction of what is already stored and indexed in many ways on the internet.

How to fund the work that produces new structure? In part, integration with platforms and workflows that people already use. In part, this is a public good, so we’re talking about philanthropic and public goods funding. In some cases, institutions and other parties with interest in specific investigations may bring their own compute and credits.

Does this lack of opinionated authority on canonical structure defeat the point of epistemic commons? Could a cult, say, provision their own para-epistemic stack? Probably — in fact in primitive ways they already do — but it’d be more than a little inconvenient, and we think that availability of epistemic foundation data and ideally integration into existing platforms, especially because it’s unopinionated and flexible in terms of final assessment, can drive much improvement in any less-than-completely adversarially cursed contexts.

Better than logarithmic returns to reasoning?

Oliver Sourbut — Wed, 30 Jul 2025 00:50:00 GMT

Lots of phenomena turn out to have logarithmic returns: to get an improvement, you double effort or resources put in, but then to get the same improvement you have to double inputs again and again and so on. Equivalently, input costs are exponential in output quality1. You can probably think of some examples.

I want to know: is ‘extra reasoning compute’ like this? (Or, under what conditions and by what means can you beat this?) I’m especially interested in this question as applied to deliberate exploration and experiment design.

Said another way, from a given decision-context, without making extra observations or gathering extra data, what are the optimal marginal returns to ‘thinking harder’2 about what to do next?

Intuitively, if I have a second to come up with a plan, it might be weak, five minutes and it might be somewhat reasonable, a day and it’ll be better, a year (full time!) and I’ve reached very diminishing returns. Presumably a century in my ivory tower would be barely better. I’d usually do better trying to get more data.

Is this even a sensible question, or is ‘improvement in reasoning output’ far too vague to get traction here?

That’s the question; below are some first thoughts toward an answer.

Simple model: repeated sampling/best of k

If you have a proposal generator, and you can choose between proposals, a simple approach to getting better generations is:

sample a large number, k, of proposals
(try to) evaluate and pick the best one

(This is actually the best strategy you could take if you can only add parallel compute, but there might be strictly better approaches if you can add serial3.)

Even assuming you can unerringly pick the best one, this strategy turns out to have logarithmically-bounded expected value for many underlying distributions of proposals4. In fact, for a normally-distributed proposal generator, you even get the slightly worse square root of logarithmic growth5.

You can in principle sidestep this if your proposal generator has sufficiently heavy-tailed proposal distribution, and you can reliably ex ante distinguish better from worse at the tails.
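The normal case is easy to check numerically. The sketch below is a quick simulation, not a proof: it draws k standard normal 'proposals', keeps the best, and compares the average best-of-k against the square-root-of-log bound sqrt(2 ln k). The function names, trial count, and seed are arbitrary choices of mine, and it assumes a perfect evaluator as in the text.

```python
import math
import random


def mean_best_of_k(k: int, trials: int = 400, seed: int = 0) -> float:
    """Average quality of the best of k standard normal proposals."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Perfect evaluation assumed: we always pick the true maximum.
        total += max(rng.gauss(0.0, 1.0) for _ in range(k))
    return total / trials


def normal_bound(k: int) -> float:
    """Upper bound on the expected max of k standard normals: sqrt(2 ln k)."""
    return math.sqrt(2.0 * math.log(k))


for k in (10, 100, 1000):
    print(f"k={k:5d}  simulated={mean_best_of_k(k):.2f}  bound={normal_bound(k):.2f}")
```

Each tenfold increase in samples buys a smaller increment in expected quality than the last, which is the diminishing-returns pattern in question.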

Another simple model: widely distributed ‘promise’ of lines of inquiry

Suppose you have various lines of inquiry to spend thinking time on. The best you can do is:

start thinking on the most promising lines
spend additional thinking on successively less promising lines

(This assumes you can somewhat reliably distinguish promise.)

If their ‘quality’ or ‘promise’ ranges over many orders of magnitude, then even if you get to accumulate insights additively6, you’ll actually make only bounded progress7 towards a theoretical ‘best possible’ - this is worse than logarithmic, though looks qualitatively similar over a substantial range of effort.
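A small numerical sketch of the bounded-progress claim, assuming (purely for illustration) that promise falls off geometrically as 1, 1/r, 1/r^2, ... and that insights add up. The function names are mine.

```python
def cumulative_progress(r: float, n_lines: int) -> float:
    """Total insight after pursuing the n most promising lines of inquiry."""
    return sum(r ** -i for i in range(n_lines))


def best_possible(r: float) -> float:
    """Limit of the geometric series 1 + 1/r + 1/r^2 + ... for r > 1."""
    return r / (r - 1.0)


r = 10.0
for n in (1, 2, 3, 10):
    fraction = cumulative_progress(r, n) / best_possible(r)
    print(f"lines pursued={n:2d}  fraction of best possible={fraction:.6f}")
```

With r = 10, the first line of inquiry alone already delivers 90% of everything that will ever be available, and each further line closes only a tenth of the remaining gap.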

But why would the promise of lines of inquiry range over many orders of magnitude? We might say, ‘in practice, it often seems to’, and there are some theoretical reasons to expect this. You ‘pick low hanging fruit’ earliest, and face diminishing returns later. But to a large extent this model assumes the conclusion.

Other rougher gestures

Search depth

Often to find approximate solutions to problems, we might employ search over a tree-like structure. This emerges very naturally for planning over time, for example, where branching options (whether choice or chance) at each chosen time interval give rise to a tree of possible plans. (Compare Monte Carlo tree search.)

If gains are roughly uniform in search depth, this gives rise to logarithmic returns to further search. With excellent heuristics, you might be able to prune large fractions of the tree - this gives you a kinder exponent, but still an exponential space to search.
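To see both effects in numbers, here is a sketch that counts how deep an exhaustive search of a uniform tree can go within a node budget. The uniform branching factor and the helper name `reachable_depth` are simplifying assumptions; pruning is modelled crudely as a smaller effective branching factor.

```python
def reachable_depth(budget: int, branching: int) -> int:
    """Deepest level d such that a full tree of depth d has at most `budget` nodes."""
    if branching < 1:
        raise ValueError("branching factor must be at least 1")
    depth, level_size, total = 0, 1, 1  # start with just the root
    while total + level_size * branching <= budget:
        level_size *= branching
        total += level_size
        depth += 1
    return depth


# A thousandfold increase in budget adds a constant three levels of depth.
for budget in (10**3, 10**6, 10**9):
    print(budget, reachable_depth(budget, branching=10))  # depths 2, 5, 8

# Pruning to an effective branching factor of 3 helps, but growth is still exponential.
print(reachable_depth(10**6, branching=3))  # 12
```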

When (if at all) are gains over search depth dependably growing, rather than uniform at best? Alternatively, when can uniform (or better) gains be reliably achieved by expanding the search strictly less than exponentially?

Modelling chaos

Chaotic systems are characterised by sensitivity to initial conditions: dynamics where measurement or specification errors compound exponentially.

So, to forecast at a given precision and probabilistic resolution, it takes exponentially tighter initial specification precision to forecast marginal incremental time depth. (This is why in practice we only ever successfully forecast chaotic systems like weather at quite coarse precision or short horizon.)
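As a toy illustration (not a weather model), the logistic map x -> 4x(1 - x) is a standard chaotic system in which small errors roughly double per step on average. The sketch below measures how many steps pass before two nearby trajectories visibly disagree; the starting point, tolerance, and function name are arbitrary choices of mine.

```python
def divergence_step(initial_error: float,
                    tolerance: float = 0.1,
                    x0: float = 0.2,
                    max_steps: int = 10_000) -> int:
    """Steps until two logistic-map trajectories, started `initial_error`
    apart, differ by more than `tolerance` (or `max_steps` if they never do)."""
    a, b = x0, x0 + initial_error
    for step in range(1, max_steps + 1):
        a = 4.0 * a * (1.0 - a)
        b = 4.0 * b * (1.0 - b)
        if abs(a - b) > tolerance:
            return step
    return max_steps


# Each row improves the initial precision by a factor of a thousand.
for error in (1e-3, 1e-6, 1e-9, 1e-12):
    print(f"initial error={error:.0e}  forecast horizon={divergence_step(error)} steps")
```

The horizon grows only additively while the required precision grows multiplicatively: a millionfold tighter specification buys on the order of twenty extra steps (about log2 of a million).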

Specification precision doesn’t exactly map to having extra compute, but it feels close. And marginal incremental time depth doesn’t necessarily correspond uniformly to ‘goodness of plan’.

Combinatorial search

If there’s some space of ingredients or components which can be combined for possible insight, the size of the search space is exponential8 in the number of components in a proposed combination. So if, among good plans at each scale, gains are proportional to the number of components in the plan (and there are similarly many good plans at each scale), you get logarithmic returns to searching longer.
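A sketch of that arithmetic, under the simplifying assumptions that combinations are unordered subsets of a fixed pool of components and that searching means exhaustively evaluating every combination up to a given size. The pool size of 1000 and the helper name are illustrative only.

```python
import math


def largest_searchable_size(budget: int, n_components: int) -> int:
    """Largest m such that every combination of 1..m components can be
    evaluated within `budget` evaluations."""
    evaluated, m = 0, 0
    while m < n_components:
        next_layer = math.comb(n_components, m + 1)  # combinations of size m + 1
        if evaluated + next_layer > budget:
            break
        evaluated += next_layer
        m += 1
    return m


# Each thousandfold increase in budget allows just one more component.
for budget in (10**3, 10**6, 10**9):
    print(budget, largest_searchable_size(budget, n_components=1000))  # 1, 2, 3
```

If gains are proportional to the number of components, this is exactly the logarithmic pattern: output grows by one unit each time the input is multiplied by a roughly constant factor.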

Something similar applies if the design possibilities benefit from combining already-discovered structures in a hierarchy, for example if emergent features of subcomponents unlock new levels of effectiveness in a combined design (molecules, peptides, proteins, organelles, cells, ...).

But the assumption of roughly uniform gains over scales like this is carrying some weight here.

Notably this means that, unless you have an exponentially growing source of inputs to counteract it, there’s a practical upper limit to growing the output, because you can only double so many times. And with an exponentially-growing input, you can get a modest, linear improvement to output.

i.e. computing for longer or computing more in parallel. Parallel can’t be better than serial in returns to total compute, so I’m mainly interested in the more generous serial case. For parallel, it’s easier to bound because the algorithm space is more constrained (‘sample many in parallel, choose best’ is the best you can do asymptotically).

Intuitively you can ‘reason deeper’ with extra serial compute, which might look like recursing further down a search tree. You can also take proposals and try to refine or improve rather than just throwing them out and trying again from scratch.

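Proof. Suppose the generator produces proposals with quality X. All we assume is that the distribution of X has a moment-generating function $M_X(t) = \mathbb{E}\left[e^{tX}\right]$ (this is not true of all distributions; in particular, heavy-tailed distributions may not have an MGF). Denote k individual samples as X_i. Note first by Jensen’s inequality that, for any positive t:

$$\exp\left(t\,\mathbb{E}\left[\max_i X_i\right]\right) \le \mathbb{E}\left[\exp\left(t \max_i X_i\right)\right] = \mathbb{E}\left[\max_i e^{t X_i}\right]$$

i.e. the exponential of the expected maximum in question is bounded by the expected maximum of the exponentials. But a max of positive terms is bounded by the sum:

$$\mathbb{E}\left[\max_i e^{t X_i}\right] \le \mathbb{E}\left[\sum_{i=1}^{k} e^{t X_i}\right] = k\,\mathbb{E}\left[e^{tX}\right]$$

(writing X for a representative single sample.) But that’s just k times the moment-generating function (which we assumed exists). So for all positive t,

$$\mathbb{E}\left[\max_i X_i\right] \le \frac{\log k + \log M_X(t)}{t}$$

So (fixing any t, or minimising over t) we see at most logarithmic growth in k.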

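Take the proof of the general case for an arbitrary distribution with a moment-generating function. Substitute the normal moment-generating function

$$\mathbb{E}\left[e^{tX}\right] = \exp\left(\mu t + \tfrac{1}{2}\sigma^2 t^2\right)$$

so that

$$\mathbb{E}\left[\max_i X_i\right] \le \frac{\log k}{t} + \mu + \frac{\sigma^2 t}{2}$$

Minimising over (positive) t, which gives $t = \sqrt{2 \log k}/\sigma$,

$$\mathbb{E}\left[\max_i X_i\right] \le \mu + \sigma\sqrt{2 \log k}$$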

Perhaps the insights literally combine into an overall improved proposal, or perhaps less promising lines of inquiry provide fallback or robustness benefits in case the earlier ones fail in practice.

Qualities might be evenly spread over e.g. 1, 1/10, 1/100, 1/1000, ... or more generally 1, 1/r, 1/r^2, ... (with r > 1). Then the sum of your efforts is geometric, gradually approaching

$$\sum_{i=0}^{\infty} r^{-i} = \frac{r}{r-1}$$

Or more than exponential if the order or configuration matters!

You Can’t Skip Exploration

Oliver Sourbut — Wed, 21 May 2025 16:08:35 GMT

This essay is part 1 of a series on the role of exploration in AI and the implications for AI development and governance.

This part introduces exploration and research taste, as well as discussing their role in research and development, and the ways that AI could change that picture. This gives rise to some exciting and underexplored (!) opportunities for beneficial and defensive contributions to research.

A second essay will discuss more implications for AI development and governance, including the potential for AI to accelerate the pace of development of AI itself, and some implications for safety and security.

Neither essay will be especially technical, but I will gesture to the technical and mathematical aspects that I find to be illuminating. As ever when I write I raise more questions than I answer! But I hope to provide some initial useful takeaways as well as productive directions for thinking about these issues.

Introducing exploration and experimentation

Scientific and technological progress are driven by experimentation: that is, doing things to find out how the world works. In the field of AI we call this 'exploration'.

Exploration for learning is not just a human phenomenon: it's ubiquitous in natural systems at various scales (from evolution itself to the play of young animals), in individual human lifetimes (as we learn skills or contribute to novel discoveries) as well as human institutions and societies (which also learn through experience), and in computer science and AI (where exploration for discovery and problem-solving are common).

We care a lot about scientific and technological potential - they can yield enormous risks (from accident, misuse, or societal destabilisation) or enormous benefits (solving major problems in medicine, climate, energy, or even defending against other risky technologies). So exploration isn't just of academic interest.

When we forget to consider how new knowledge is generated and how novel technologies are developed, or when we conflate 'knowledge' with 'learning' or 'learning' with 'exploring', we make mistaken predictions. Especially when such predictions are action-guiding, we can end up taking misguided or even harmful actions, or missing opportunities to intervene in beneficial ways. So let's do some unpacking!

What factors make exploration (and by extension, research) more or less effective? What are the bottlenecks and limits to exploration? How could AI change the picture? And how can we apply insights from this lens to contribute to a better future?


Why does exploration matter?

Knowledge production loop: activity yields observations, improving knowledge — Exploration drives the loop — Lack of exploration means knowledge stagnation — Exploration is key to understanding technological progress

Learning systems gather new knowledge and insights from observations/data. Random flailing or arbitrary data aren't especially helpful. You want it to be telling you something new that you didn't already know - so it pays to deliberately seek out or gather novelty and informative observations1. This applies at the grandest scales of scientific endeavour as well as in mundane scenarios like navigating an unfamiliar building or learning a new skill.

Owen Cotton-Barratt recently discussed the 'knowledge production loop': activity and observations generate data (captured in datasets and models as 'crystallised intelligence') and combine with thinking algorithms ('fluid intelligence') to in turn drive new activity and observations.

I'd additionally characterise exploration as the way that crystallised world model and novelty taste interact with fluid reasoning and planning to judiciously choose activities yielding the most informative observations... in turn improving world models and taste ad infinitum.

Owen Cotton-Barratt's diagram of crystallized knowledge and fluid reasoning ('capacity for thought') giving rise to a 'knowledge production loop'. Here I discuss exploration as the difference between occasionally chancing upon informative new data and proactively seeking it out (or deliberately producing serendipitous conditions for making new discoveries).

Quality and quantity of exploration mark the vast difference between a civilization with vibrant progress in science and technology and one with a near-static (or even regressing) capability base - and on an individual level, it's often the difference between rapidly developing new skills or knowledge and getting stuck in a rut.

Understanding exploration is therefore key to understanding technological progress, with all the risks and benefits that entails.

Research and taste

Research is world-model-refinement — Exploration quality drives research — Taste is a learned feel for value of information — Reasoning and world modelling augment taste for exploratory planning

'Research' can be thought of broadly as refining one's world model in a particular domain. We want to know things like: how does electromagnetism work and what can it do for us? How can we prevent diseases from ruining lives? Or (more mundane) how can I get better at playing the piano or juggling? When I say 'research', I equally refer to personal learning and skill-building, scientific research, entrepreneurialism, and business development: all involve exploration and learning from experience.

We can describe three factors determining research production:

Throughput: doing more practice, running more experiments, gathering more data faster, etc.
Modelling efficiency: gathering more generalisable insight from a given experiment or observation.
Exploration quality: choosing better experiments and routines to get more informative observations.2

We'll mostly talk about exploration quality here3, which is in turn governed by taste and exploratory planning.

What do I mean by 'taste'? Sometimes people refer to 'research taste' as a sense which develops from domain experience for the types of experiments and other activities which are most likely to be interesting or informative, or otherwise move forward the state of understanding. Clearly this is an essential component of any deliberate exploration - otherwise you're back to flailing randomly!

The taste that's being developed is exactly analogous to a taste for activity which is liable to yield good outcomes of other kinds. We're just considering the value of information as the good in question. So this decomposes into an ability to come up with promising proposals more often, perhaps together with abilities to discriminate more accurately between better and worse proposals or to determine refinements and improvements to proposals4.

Now, imagine - for the sake of the argument - you're a human. Even better, in fact, imagine you're inhumanly fast and detail-sensitive, the best reasoner in the world, and you have general knowledge matching the rest of the world combined. You still need to do research in order to make new discoveries. If you don't know the details yet, experimentation isn't something you can skip!5 Your especially effective reasoning merely acts as another input to exploration quality, alongside domain research taste, perhaps allowing you to choose better experiments, and achieve results sooner. Reasoning applied effectively in this way is exploratory planning.

So present exploration quality depends on your current level of taste, while future exploration quality will also depend on taste accrual6. Reasoning and planning of course also feed into this, as we improve proposals and discard designs in favour of better-looking ones - but this has to ground out in a taste for what makes a good proposal in the first place.

From play to experimentation

Play is proto-exploration — Fun is proto-taste — Humans adaptably accrue taste in novel domains — Taste is domain-specific but exploratory principles generalise

These aspects of taste are discovered and refined through experience. Research taste is domain-specific!

Many humans and animals, especially youngsters, have built-in instincts for play, curiosity, and novelty. These have been tuned by painstaking natural selection to aid in orienting to the range of body configurations, environments and communities those animals usually inhabit, precisely by exploring: gathering evidence and information about how things work. In this case, evolution did the slow, gradual work of determining the 'taste', the recognisable hallmarks of good exploratory behaviour, and wired up the 'fun' sense to those hallmarks7.

Two fox cubs play. For diverse animals, discovering the particular ways your body and brain interact, and how those affect and are affected by your surroundings, is a key part of learning adaptable and dynamic behaviours. Individual playfulness delivers novelty and exploration, while group play, especially mock contests, provides a rich 'curriculum' for development (much like the 'self play' of some AI training system designs). (Image from freepik.com)

As Eric Drexler says,

We call children intelligent because of what they can learn, not what they can do

Playful young animals and humans thereby become adept at controlling their bodies and engaging in effective social interaction. But humans move past mere bodily control and socialisation: we use and develop tools, technologies, and diverse and innovative social structures.

For many researchers and others engaging in creation, experimenting is a lot like playing: specifically, the rich and sophisticated kinds of play that humans engage in somewhat instinctively. But, because research and development and science and industry move beyond the historic realms of human activity, the 'taste' bestowed by evolution is rarely well suited. An untrained human has no instinct at all for the kinds of experiments that are most likely to yield useful information about the behaviour of a new material or the structure of an unseen mathematical object! This applies equally to business activities and entrepreneurialism. Substantial experience is needed.

Do we see areas where 'taste' generalises, pointing against the claim of domain-specificity? The broad principles of science and engineering appear to generalise across domains, and evidence suggests that individual humans and human organisations vary in their latent potential to accrue and apply research taste. This might be down to being more or less motivated to explore, having different capacity to learn from experience, or varying procedures for planning next steps. This gives rise to an appearance of research taste generality. But domain-specific research taste is mastered only through domain-specific experience. Expert researchers in one area may contribute to other areas - but almost always only after gaining some depth of familiarity with the new area as well.

So it's reasonable to think of exploration quality as comprising two subfactors. First, the somewhat transferable general principles of exploration: playfulness, open-mindedness, planning for novelty and interestingness. And second, domain-specific research taste: the experience that guides determination of what situations count as novel or interesting8, and what types of planning are most likely to uncover them.

Exploration in AI, past and future

First: humans curate data — Now: RL allows automatic data generation — Next?: in-context exploration characterises R&D tasks — Perhaps this is ‘AGI’?

In contemporary frontier AI systems, it's been mostly humans responsible for gathering 'high quality' informative data, often in quite hands-off ways like scraping huge datasets from the internet, but latterly with more attention on procurement and curation of especially informative or exemplary data.

With reinforcement learning (RL), the data coming in starts to rely increasingly on the activity of the system itself - together with whatever grading mechanism is in place. That's why lots of RL conversations of the past were so obsessed by exploration: taking judicious actions to get the most informative observations! So earlier AI research actually foregrounded exploration somewhat more. Helen Toner recently discussed the return of RL to centre stage in contemporary frontier AI, asking what properties of a domain make it more or less amenable to gains from reinforcement learning.

Still, in many RL settings, the human engineers are able to curate training environments with high-signal automated feedback systems, as Toner discusses. On the other hand, once we're talking about activities like R&D of various kinds, the task of exploring is inherently most of the task itself, making within-context exploration essential!

This makes 'learning to learn' or in particular 'learning to explore/experiment' among the most useful ways to operationalise 'AGI', from my perspective9. I'm not sure how best to track this, and I'm not aware of any benchmarks or studies which take this view on frontier general AI10. My personal experience with LM agents anecdotally points to them improving over time at orienting to uncertainties within their environment and being a little more creative at trying things out and testing things in 2025 than in 2024 or 2023, but not vastly - progress to date appears much more rapid in 'crystallised' intelligence.

Research by AI: AI with research taste?

Bootstrapping research taste from humans — AI advantages from speed and copying — AI learning by doing — Human advantages and bottlenecks to AI — Human-AI complementary workflows

There may be ways for AI training datasets to 'hoover up' research taste from existing experts and institutions, perhaps from lab notes or interviews, though humans at least usually learn more from actually trying research than merely from reading or talking about it. (This presumably reflects the fact that merely communicating about research experience is a much less rich source of information than actually experiencing it directly: the same issue faced by all kinds of knowledge transfer through limited media like language.)

So research taste in AI is not starting from scratch: already AI can talk in sensible, albeit sometimes basic ways about experiment design. The taste is bootstrapped from the taste implied by all the hints and observations in training data.

Could AI surpass the research taste exhibited by expert humans and human organisations? It's unclear where the ceiling is, but certainly AI would appear to have several advantages in principle: direct sharing of observations and experiences between instances, potentially far larger effective 'researcher headcount', total observation quantity far outstripping the longest-lived human experts (to date)11, all adding up to a far greater opportunity to accrue and accumulate taste. Additionally, due to computer speed, the opportunity to confer and deliberate in far more total depth on the implications of each experiment and the appropriate designs of future experiments means that exploratory planning could also be boosted.

Crucially, acquiring frontier-applicable research taste would require either finding ways to bootstrap from existing research taste, which is often implicit (or even proprietary!), or enabling AI to learn by doing, perhaps aided by expert supervisors (just as human trainee researchers are), by instrumenting research processes and equipment with sensors and manipulators. Like hiring junior researchers, this would come with some upfront costs to any organisation attempting it12!

ChatGPT's interpretation of an AI with better research taste than human organisations.

Human researchers begin with some advantages today: easier physical manipulation of experimental materials (for now), a capital base of experimental equipment designed for human use, and an ecosystem designed around the training, retention, and interaction of human experts. These aren't fundamental barriers to researcher AIs, but represent some hurdles or bottlenecks that might take time and other resources to reach past.

Of course, the capacities to interpret evidence, propose experiments, design and refine proposals, and to implement experiments need not reside 'in the same mind', just as human organisations already exhibit this division of labour. But the better fitted these pieces are to each other, the more efficient the overall system will be. Drexler's 'large knowledge models' discussion treats knowledge as a resource, to be combined with planning capacity and discernment from disparate sources. Similar agendas, for example from the UK's Advanced Research and Invention Agency (ARIA), perhaps promise both a more effective and more safely manageable way to integrate AI into research processes than wholesale development of autonomous researcher AIs.

Opportunities

Recapping research, experimentation, exploration, taste — Implications for AI forecasting and ‘intelligence explosion’ — Differentially bootstrapping AI taste — Differentially complementing AI exploration — Detecting dangerous research — Exploring AI applications for flourishing

Deliberate experimentation, which combines exploratory planning and research taste, is a critical component of efficient learning. In R&D-heavy domains at least, which inherently butt against the boundaries of the known, efficient learning is foundational to progress.

Much more can and should be said about the implications of an experimentation-oriented view of R&D, both on AI and facilitated by AI in other domains. Here are some initial directions:

First, in forecasting AI capabilities and timelines, we should account for the costs of experimentation. This can include quantifying the relevant variables (iteration speed, quality of simulation, modelling, and exploratory planning, accrual and accumulation of research taste, the cost of experimental resources including compute and real-world interactions, etc.). Of particular interest, this could help to characterise the potential for 'self' improvement and the possibility of an intelligence explosion (which matter because of their implications for other R&D and for loss of control over AI systems).

You can't skip exploration! But greater intelligences (individual or collective) can be more efficient at it in general, and domain-specific taste in particular certainly yields an improved rate of progress.

This cuts both ways for safety. You can't develop dangerous nanotech purely from first principles: you have to experiment, either in vitro or in silico. Unfortunately, nor can you generate new defensive vaccination, sterilisation, or biomonitoring paradigms without putting in the experimental legwork.

This may be revealing for those seeking to differentially drive beneficial and defensive research ahead of risky research. For example, exposing research logs and expert interviews to AI systems may yield a way to bootstrap specific kinds of research taste in AI. Alternatively, recognising the default taste-weakness but speed-advantage and general knowledge breadth of AI systems may suggest strategies for complementary human-AI workflows which could be both more effective and more manageable than naively attempting to create researcher AIs wholesale.

Beyond AI-driven exploratory planning and research taste, we should expect strong synergy with robotics, sensors, simulation, modelling, and other automation technologies, as complementary production factors in R&D progress. This is likely to naturally drive investment into these technologies, but may provide opportunities to differentially unbottleneck AI multipliers in beneficial areas by devoting development to their specific complements in particular.

Further, noting that technology can rarely be developed purely from first principles, intelligence and security organisations concerned about risky research directions may be able to anticipate the kinds of experiments that are likely to be useful, and therefore the kinds of resources and activities required to make progress in those areas. This may include flows and concentrations of certain machines or components, movements of specific rare materials, movement of human talent, or known side-effects of experiments. Where materials are very dual-use (such as concentrations of computing clusters), structured inspection, auditing, or transparency tools may aid in guaranteeing that only safe and sanctioned experiments are being carried out.

Finally, now is a great time to be experimenting with AI systems and their applications, especially for people who haven't traditionally paid attention to AI. Rapid developments mean that the extent of possibilities with current tech remains underexplored, and boosting defensive and beneficial applications ahead of risky ones is a great way to ensure that the future is better than it otherwise would be!

Thanks to Owen Cotton-Barratt and Jay Bailey for feedback and conversations on this topic

This is because just flailing, or even just 'doing routine activities', gets you some novelty of observations, but directedly seeking informative circumstances at the boundaries of the known (which includes making novel unpredictable events happen, as well as getting equipped with richer means to observe and record them, and perhaps preparing to deliberatively extract insight) turns out to be able to mine vastly more insight per resource (time, materials, etc.). Hence science, but also hence individual human and animal playfulness, curiosity, adversarial exercises and drills (self-play ish), and whatnot.

Notably, modelling efficiency and exploration quality are sometimes conflated as 'sample efficiency'. In the case of modelling efficiency it's about forming accurate and generalisable models from fewer observations (the classic machine learning sense of sample efficiency). For exploration quality, it's about gathering more informative observations from fewer environment interactions (a kind of 'sample efficiency' familiar from reinforcement learning).

Incidentally, throughput should not be underestimated - this is why industrial expansion often precedes and drives innovation progress as well as being a product of it. There are some very general patterns in 'industrial learning', such as Wright's Law, which describes consistent statistical relationships between the number of units produced and reductions in production cost. We might speculate that Wright's law applies most in domains where the existing human research and development organisations are at the limits of their modelling efficiency and exploration quality, and that the remaining bottlenecks are mostly in experimental throughput.
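For readers who haven't met it, Wright's Law is usually written as a power law: the cost of the n-th unit is the first unit's cost multiplied by n raised to a negative exponent, so that each doubling of cumulative production cuts unit cost by a fixed fraction (the 'learning rate'). A minimal sketch follows, with a made-up starting cost and an assumed 20% learning rate purely for illustration; the function name is hypothetical too.

```python
import math

def wright_unit_cost(first_unit_cost, cumulative_units, learning_rate):
    """Cost of the n-th unit under Wright's Law.

    learning_rate is the fractional cost reduction per doubling of
    cumulative production (e.g. 0.2 for a 20% drop per doubling).
    """
    exponent = -math.log2(1.0 - learning_rate)
    return first_unit_cost * cumulative_units ** (-exponent)

# Illustrative numbers only: first unit costs 100, 20% cheaper per doubling.
for n in (1, 2, 4, 8, 1024):
    print(n, round(wright_unit_cost(100.0, n, 0.2), 2))
```

Note that the driving variable is cumulative units produced rather than calendar time, which is why throughput shows up so directly in the rate of progress.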

Speaking of 'taste', this is a little like the difference between a good chef and a good food critic. The chef needs to be able to come up with good recipes, while the critic needs to be able to tell which recipes are good and which are bad. In concert (perhaps adversarially!), they can create and refine recipes that are more likely to be successful.

If you have a perfect simulation of the relevant domain, you can run experiments in the simulation. This looks a bit like skipping experimentation: certainly it can be faster. In a softer sense, a useful but imperfect model can also support reasoning about experiments and potential outcomes. In my taxonomy, both of these are part of the continuum of using world modelling, planning, and some amount of taste to guide exploration.

While we're talking in economic terms, it's worth noting that research taste is a kind of capital. It can even depreciate over time! This happens in two ways. Intrinsically, as the frontier of research moves, what were formerly good intuitions may become outdated. Additionally, individual humans, currently major (though not exclusive) repositories of research taste, age, get distracted, or otherwise lose their edge. In steady fields, depreciation is slow. In fast moving fields, like AI, the frontier is moving fast, and taste depreciation can be very rapid, making accrual and accumulation of taste especially important.

My baby son is evidently thrilled by the challenge of 'balancing' (with some support) upright, a feat he can't yet accomplish, but which is unsurprisingly the kind of activity his brain is eager to get practice at. He instinctively pays close attention to new sights and sounds. His once-flailing hands now grasp interesting objects and begin to manipulate them. When he begins crawling and then toddling, he'll join generations of baby humans in enjoying the most prolonged and diversely playful childhoods of any young animal.

interesting, i.e. carrying high value of information for the domain in question.

(Of course there are nevertheless also many transformative impacts that can come from AI merely with heaps of crystallised intelligence and less R&D ability. For example, we could imagine an interesting possible paradigm in which humans continue for some time to provide input on informative experiment design, while delegating aspects like experiment implementation and interpretation to automated systems. Also note that some crystallised knowledge is currently very rare and concentrated, while if present in AI systems could be much more widely accessible, for better or worse.)

Scattered RL studies set out to evaluate or demonstrate the exploration potential of various RL algorithms, usually in toy environments. The ARC-AGI benchmarks test sample efficiency, which may be an important component of effectively accruing 'taste', but is not directly about exploration.

In fact another relevant comparison may not be between AI and individual humans, but between AI and human research organisations and institutions. Human organisations can of course already outlive individual humans: to say nothing of the broader intergenerational projects of science and research. But communication of research taste and experience between humans is constrained, and while committees of experts sometimes outperform individuals, they are slow and far from able to directly share their relevant experiences. When will AI services be able to supplement or replace particular human research tasks? And what about entire research organisations?

The raw sample efficiency of base machine learning methods like gradient descent is famously, and apparently, much lower than that of humans, meaning that AI 'junior researchers' could naively be even more costly to upskill than human ones. But as model capacity is scaled up, this may be changing. And speculatively, the possibility of lightweight finetuning, 'in-context' learning, and distillation points towards AI systems matching or exceeding human sample efficiency.

Is the Cat Out of the Bag?

Oliver Sourbut — Thu, 10 Apr 2025 17:07:00 GMT

Adapted from 2025-04-10 internal memo to AISI

I’ve previously made arguments like:

Not long after it becomes possible for someone to make powerful artificial intelligence1, it might become possible for practically anyone to make powerful AI.
Compute gets exponentially cheaper by default.
Knowledge proliferates (fast!) by default: AI techniques are typically simple and easy once discovered.
What’s more, AGI-making know-how may be widespread already.

Or, as Yudkowsky puts it2,

Moore’s Law of Mad Science: Every eighteen months, the minimum IQ necessary to destroy the world3 drops by one point. - Yudkowsky

It’s important to emphasise that none of these are laws of nature! But the economic and social forces at work are quite strong.

So (leaving aside debates about the appropriate definition of ‘AGI’) where the frontier of AI development leads, others – many others – potentially rapidly follow. Followers can go even faster by stealing or otherwise harvesting insights from the frontier, but this is not a hard requirement – just an accelerant.

For more on the first point, compute getting cheaper, consider Moore’s law (or the more general and robust Wright’s law). What about the know-how?

Stupid, Simple AGI

The stupidest, simplest possible approach to producing general intelligence might mimic evolution in a large, open-ended, interactive environment. Nobody has succeeded at this yet because they don’t have enough compute, but just a few more decades of compute scaling might get us there. The code to do this would be ultimately quite simple, but the amount of compute time to run it is out of reach today. Almost nobody nowadays thinks that it will take this long, because this is the stupidest, simplest (and least steerable) possible approach and we have much better ideas.

But this means that unless something interrupts the compute trends, then even if ingenious, well-resourced people ‘get AGI first’, eventually anyone could practically blunder into creating their own. Of course, many things could be changed if powerful AI is developed and applied in the meantime… perhaps including the cost and efficiency of compute, the distribution of compute, or indeed the existence and inclination of people to do the blundering.
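As a back-of-envelope illustration of that 'eventually': if the cost of a fixed amount of compute halves on a steady schedule, the wait until a given computation falls within a given budget is just the halving time multiplied by the base-2 logarithm of the cost ratio. The sketch below assumes a constant halving time, which is exactly the 'unless something interrupts the compute trends' caveat, and the dollar figures and two-year halving time are hypothetical placeholders rather than estimates.

```python
import math

def years_until_affordable(cost_today, budget, halving_time_years):
    """Years until a computation costing `cost_today` fits within `budget`,
    assuming its cost halves every `halving_time_years` (a strong assumption)."""
    if budget >= cost_today:
        return 0.0
    return halving_time_years * math.log2(cost_today / budget)

# Hypothetical: a $1bn computation, a $1m budget, cost halving every 2 years.
wait = years_until_affordable(1e9, 1e6, 2.0)
print(f"{wait:.1f} years")
```

A thousandfold cost gap closes in about ten halvings, so under these placeholder numbers the wait is roughly two decades. Shorter halving times or larger budgets shrink it proportionally.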

The Design Space for AGI

What did I mean by ‘AGI-making know-how may be widespread already’?

I don’t literally mean that the recipe for AGI is known and widespread. I don’t even mean that we broadly know exactly how to make AGI and simply want for the capital (compute and data). But for those paying attention, the design space for practically achievable AGI is narrowing.

Take long-horizon coherence or continual learning, for example. Maybe components of these are expandable memory and long-context management of plans and observations. This could perhaps be cracked with something resembling a selection from:

Context summarisation
Read-write retrieval-augmented generation
Recurrent embeddings
Longer training trajectories
Plan-management or recursive delegation scaffolding
Periodic distillation of history into weights or activation patches
Explicit training for notetaking
Some even simpler thing, like ‘just scale up the compute’

Among the sharpest, most experienced practitioners at the frontier, that perceived design space may be narrower still4. In the far wider cohort comprising all competent computer scientists and engineers, the design space may not be as saliently in view – but the scientific ‘breadcrumbs’ have been pointing in useful directions for years (at least).

My personal testament5 is that by 2020, several landmarks were visibly coming together in NLP and RL, and by 2021 I had a good sense of a plausible research path to general autonomous AI. Developments like further scaling, mixtures of experts, chain of thought, LLM agents, RL ‘reasoning’, fast attention mechanisms, and hyperparameter tuning optimisations are not merely ‘obvious in hindsight’: their rough contours were predictable in advance. It was ‘merely’ a matter of experimenting to find out working details. I’m not being (especially) hubristic here: for some experts closer to the action, these same things looked plausible by 2017 or even earlier! The contours of tomorrow’s advancements are similarly already in view, and far more attention and capital are being poured into the discovery process.

That’s not to say that, given the capital, we could have created AGI there and then in a single try, or even here and now. A design space is not a complete or final design. But iterative refinement by well-resourced and moderately creative problem solvers has been charting a course, and if we are willing to anticipate one frontrunning group getting ‘all the way there’ we must acknowledge that the feat will be reproducible in relatively short order.

Accelerants

Scharre 2024 demonstrates (and forecasts) rising cost to reach new frontiers, but rapidly diminishing cost to reach the same capability level thereafter.

‘Reproducible’ is one thing. How soon and how fast? With the current level of sharing of research insights, the answer seems to be roughly ‘as soon as you can outlay comparable capital’, or even sooner!

What phenomena are responsible for accelerating this proliferation? In very roughly descending order of effect size:

Theft, leak, or deliberate release of pretrained baselines and training algorithms
Distillation (authorised or not) from exposed APIs
Exponentially cheaper compute
(Sometimes cheap or even public access to) ever more sensor and record data6
Shared algorithmic and experimental details in papers and blogposts
Conversations and rumours at conferences and other events7
Movement of experts between development groups and projects
Use of AI to assist development8

Very tightly securitized projects might partly dampen some of these effects. Competition between firms and countries could amplify them.

What about exponentially cheaper compute? Market dynamics might pivot at some stage to reduce or even reverse the effect of dwindling compute price (for example, extreme buyer concentration driven by strategic accumulation, increasing marginal compute utility9, deliberate regulatory intervention on compute, or something else), but will otherwise continue to drive proliferation. On the other hand, if compute production increases even faster, costs may drop commensurately faster10.

Sensor and records data are being collected even more feverishly now that companies have realised their critical use in training modern AI systems — notice when companies’ privacy policies update to include carve-outs for collecting AI training data. We should expect more of this, as well as more collection of physical and industrial activity records for training robotics, autonomous vehicles, and automated laboratory workcells.

Alternatively, some have imagined an ‘end of history’ moment when sufficiently smart AI arrives and (usually by underspecified mechanism) prevents all of these factors from proceeding. Some go further, envisaging an AGI or AGI-enabled organisation foreclosing not only the accelerants of proliferation, but also the potential for a rival project to emerge anywhere11. This is conceivable, but one has to ask on what timeframe these changes would happen, and what the consequences would be if they take longer than imagined.

Short of such an acute and decisive interruption of all of these dynamics, other shocks such as international conflict could have impacts in either direction.

Concluding

Intelligent engineering-minded people exist in all geographies and of all ideologies. Most lag the frontier of AI development only for want of compute capital and intent. Because compute continues to get cheaper, and the potential of AI comes more clearly into focus, both compute and intent become rapidly more widespread. The open sharing of discoveries can further lower barriers and shorten proliferation timelines, but is not essential to this dynamic.

Given this, we have to ask what the consequences of this proliferation could be. Where they are concerning, we must consider in what ways these dynamics could be defused, or, likely failing that, how we will ready ourselves, on a short timeframe, for what follows.

We live in interesting times! There’s a lot we can do.

For now I’ll use ‘powerful AI’ and ‘AGI’ (Artificial General Intelligence) interchangeably. The definitions have never been settled, and will likely never be settled, but I’m considering systems which are able to autonomously act, develop new tools and technology (given sufficient research resources), and in principle maintain or upgrade themselves if that was their goal.

E.g. in Artificial Intelligence as a Positive and Negative Factor in Global Risk – Yudkowsky 2008 (though this phrase was coined earlier)

Yudkowsky believes that sufficiently advanced AGI developed in a context like ours leads to everyone dying. I think he’s probably right… but it depends a lot on how you operationalise ‘sufficiently advanced’ and ‘context like ours’. That’s where all the action is!

Sam Altman claims “We are now confident we know how to build AGI” and Dario Amodei predicts it “could come as early as 2026”. These CEOs of some of the best resourced and talented AI organisations will have privileged insight into the design space, while also having unusual psychology and possible conflicts of interest. Meanwhile, Turing Award and Nobel Prize winners Bengio and Hinton both think 2028 is possible. Crowd wisdom forecasts give wide uncertainty, but centre on the early 2030s. Experts rarely agree on exact anticipated details, but mostly agree on the outlines of the candidate design space.

(as a smart computer scientist who has been roughly following AI since 2015, made it my graduate study in 2022, but who has never actively pursued frontier AI capability contributions)

Think robots in factories, recordings and logs of computer use, autonomous vehicle logs, scientific lab measurements, CCTV and satellite readings, meeting recordings, social media activity logs, wearable recording devices, …

Parties in Silicon Valley are allegedly a somewhat good source of technical AI gossip!

The use of AI to assist AI development, or even to fully automate it, has long been discussed in the field of AI. The possibility of an ‘intelligence explosion’ or similar technological singularity is still debated decades after first being hypothesised. For the first time in 2025, some artificial intelligence researchers have claimed to achieve non-trivial acceleration in their work from AI assistance, and some companies have now set explicit targets to automate AI research before the decade is out. If this plays out, it might make AI-assisted development a dominant contributor to accelerating progress. I tend to think that compute for experiments and environments for learning are the more critical bottlenecks to progress.

Historically, returns to concentrating more compute have been eventually diminishing (a typical pattern for tech products) once efficiencies from parallelism and brute force run dry. This supports a wide distribution of purchasers and diffusion of applications, because once the larger use cases hit diminishing returns, the willingness to buy of smaller players and applications exceeds that of the larger ones. This remains so at the frontier of AI, though we see some concentration with a small number of very large players buying out a majority of the most advanced generations of chips when they are first marketed. If some new dynamic caused increasing or constant marginal returns to compute accumulation (who knows, perhaps exclusive access to AGI software), it might no longer be the case even on an open market that other buyers could afford compute.

This is not predicated on the simple effect of increased supply, which would merely serve to erode margins. Rather, increased production predictably provides new technological insight, driving further efficiency: the origin of Moore’s law. This is a much stronger effect over time.

Companies pursuing AGI do not have coherent strategies, but several have made references to ‘beating China’, and their intellectual heritage includes an assumption that the first AGI would be able to rapidly and decisively shut down competing projects. Sometimes the companies use this supposed dynamic as a justification for racing ahead while cutting corners on safety. This sounds a lot like ‘we plan to take over the world, but nicely’.

Cooperation and Alignment in Delegation Games

Oliver Sourbut — Wed, 15 Nov 2023 22:04:58 GMT

This work was facilitated by the Oxford AI Safety and Governance group, Cooperative AI Foundation, and Oxford Autonomous Intelligent Machines and Systems. Thanks also to Bart Jaworski, Jesse Clifton, Joar Skalse, Sam Barnett, Vincent Conitzer, Charlie Griffin, David Hyland, Michael Wooldridge, Ted Turocy, and Alessandro Abate.

This blogpost accompanies the paper Cooperation and Control in Delegation Games by Sourbut, Hammond, and Wood, which was presented at IJCAI 2024. In essence, the work attempts to deconfuse some of the discourse around safety, cooperation, and alignment in multi-agent settings by:

Showing that, just like in the control problem, cooperation problems can be broken down into alignment and capabilities, which are orthogonal to one another;
Providing measures for alignment and capabilities (both “individual” and “collective”) in multi-principal multi-agent settings (“delegation games”);
Showing that any of these measures is insufficient alone to guarantee the best outcomes, but that they are together sufficient;
Bounding the principals’ welfare loss in terms of these measures, and validating this with a series of empirical results.

The goal of this post is to explain what those terms mean, and hopefully why it matters. In doing so, we hope to shed light on some of the related questions posed by other AI safety researchers, for example Dafoe et al. in Open Problems in Cooperative AI, who discuss the ‘horizontal’ and ‘vertical’ aspects of coordination, or an open problem on the AI Alignment Forum about quantifying player alignment in normal-form games.

There is also a poster which is an even more condensed summary of some key material, and Lewis and Oly have given a few presentations on the topic, one of which is recorded here.

Why Delegation Games?

You may have heard of the Principal-Agent problem. It's a phrase and a setting which turns up in some economics literature, and elsewhere. The idea is that a principal (in this case, the human) is asking, telling, employing, or otherwise exhorting an agent (in this case, the robot) to act on their behalf. The 'problem' is the question of how to ensure that the agent's behaviour results in outcomes which the principal in fact prefers.

Delegation games arise when we have multiple principals, and multiple agents.1 When you read 'principal', think 'human', and when you read 'agent', think 'AI'. (For a slightly different semantic, you can alternatively think of each principal as a basically-coherent coalition of humans, and likewise with AIs.)

Why does this setting matter? It's looking increasingly likely that, perhaps quite soon, many somewhat-autonomous digital personal assistants, digital employees, or similar, will be deployed on behalf of human overseers. That is, we might be entering a highly multipolar world when it comes to somewhat-autonomous AI deployments2. A more obvious, immediate, lower-stakes example of this is autonomous vehicles, which we use as a toy example in the paper. Finally, in the future, multiple large coalitions of humans (e.g. states or companies) may deploy powerful AI systems to act on their behalf in high-stakes scenarios. We want to understand the important features of how to make sure this goes well!

We formalise delegation games in the way you might expect: agents adopt strategies that lead to (a distribution over) outcomes, and both the agents and the principals have (potentially different) preferences over these outcomes.

Cooperation, Alignment, and Calibration

In the paper, we identify some key properties which influence the outcome of a delegation game. We’ll highlight Cooperation, Alignment, and Calibration, because one key punchline of the paper is:

You need all three to guarantee good outcomes

This is important to bear in mind given that much AI safety work focuses on alignment, which is (demonstrably) not enough for safety in multi-polar worlds.

We'll explain what these terms mean, what 'good outcomes' are, and by the end of the post we should have covered enough to understand the high level meaning of the highlighted inequality, which is adapted from Theorem 1 of our paper. This simplification also assumes agents are perfectly individually rational, which we generalise in the paper, and here we mostly skip over collective alignment, which gets a thorough treatment in the paper.

Cooperation

Intuitively, cooperation is working together for mutual gains over some uncooperative baseline. This is actually already non-terrible as a definition, but we can sharpen it.

Note that cooperation can be partial (one coalition cooperates, potentially with downsides for others). That's collusion. We especially don't want AI to be collusive! This is tricky, because cooperation and collusion rest on basically the same abilities and infrastructure. It's a very important topic, but we don't discuss it here.

Cooperation example

Let's look at a simple example.

Imagine we're palaeolithic hunter-gatherers: we, the authors, are one small, cohesive group, and you, the reader, another. In the morning, if we all set out to gather, by the end of the day we each come back with a basket of fruit (the fruit/fruit outcome, score: 2). If we for some reason decide to instead hunt a mammoth, well... we’re big and tough but we probably can't catch a mammoth. Meanwhile you sensibly gathered some fruits (the mammoth/fruit outcome, score: 1). Likewise if you try to catch a mammoth alone (fruit/mammoth, score: 1). BUT, if we all work together, we've a decent chance at catching the mammoth (mammoth/mammoth outcome, score: 10). These scores are our utilities for the outcomes3.

Now, laid out like this, there's an obvious best-case outcome4: we work together to catch the mammoth! (Nobody has invented conservationism yet5, so this is a preferred outcome.) But there really is a coordination challenge here: if we have reason to believe that you'll go gathering berries, we should too – it would be in our interest (and in this case, yours too) to get more fruit rather than waste our time fruitlessly (!) chasing mammoth.
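The coordination challenge can be made concrete with a minimal sketch. It assumes, following the later remark that food is shared, that both groups receive the same score for each outcome; the function and variable names are ours, purely for illustration. It checks which outcomes are stable, in the sense that neither group gains by switching on its own.

```python
from itertools import product

ACTIONS = ("fruit", "mammoth")

# Shared score for each (our_action, your_action) pair, taken from the example.
# Assumption: returns are shared, so both groups receive the same score.
SCORE = {
    ("fruit", "fruit"): 2,
    ("mammoth", "fruit"): 1,
    ("fruit", "mammoth"): 1,
    ("mammoth", "mammoth"): 10,
}

def is_equilibrium(ours, yours):
    """True if neither group can gain by unilaterally switching action."""
    current = SCORE[(ours, yours)]
    we_stay = all(SCORE[(alt, yours)] <= current for alt in ACTIONS)
    you_stay = all(SCORE[(ours, alt)] <= current for alt in ACTIONS)
    return we_stay and you_stay

equilibria = [pair for pair in product(ACTIONS, ACTIONS) if is_equilibrium(*pair)]
print(equilibria)
```

Both fruit/fruit and mammoth/mammoth come out as stable, which is exactly why expectations about the other group matter so much here.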

This isn't just academic. When we look around the world, many problems have the mammoth-nature: we all have to 'show up' or we don't get the benefit (and indeed many cooperation problems are even harder than this due to selfish incentives). Consider international cooperation on climate change, biological weapons control, or coordination on safe technological progress.

Humans solve this sort of problem all the time. We are able to do this due to various abilities, affordances, and so on, which we can collectively refer to as cooperative infrastructure. This includes such things as:

talking to each other
trust and reputation
trade
commitments and enforcement
norms and laws

Nevertheless, our cooperative infrastructure is often not up to all tasks.

AI systems have been pretty bad at this on the whole, though there have been some interesting improvements over the years. Future AI might have access to very powerful kinds of cooperative abilities and infrastructure6.

Formalising cooperation

How do we characterise 'better cooperation'? We operationalise the collective goodness of an outcome with a welfare function (an aggregation over utilities). One way to specify cooperation is by considering failure. We look at the welfare-optimal outcome(s) σ⋆ (in this case mammoth/mammoth, score: 10) and then compare any actual (or predicted) outcome σ. The difference in welfare is the welfare regret – how much better it 'could have been'.
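As a rough sketch of welfare regret in the mammoth example: the welfare function below is an equally weighted average of the two groups' utilities. That weighting is an assumption made for illustration (any aggregation over utilities would do), and the names are hypothetical.

```python
# (our utility, your utility) for each (our_action, your_action) outcome.
# Assumption: food is shared, so both groups get the score from the example.
UTILITIES = {
    ("fruit", "fruit"): (2, 2),
    ("mammoth", "fruit"): (1, 1),
    ("fruit", "mammoth"): (1, 1),
    ("mammoth", "mammoth"): (10, 10),
}

def welfare(outcome, weights=(0.5, 0.5)):
    """Weighted aggregation over the players' utilities for one outcome."""
    return sum(w * u for w, u in zip(weights, UTILITIES[outcome]))

def welfare_regret(outcome):
    """How much better welfare 'could have been' than at this outcome."""
    best = max(welfare(o) for o in UTILITIES)
    return best - welfare(outcome)

regrets = {outcome: welfare_regret(outcome) for outcome in UTILITIES}
for outcome, regret in regrets.items():
    print(outcome, regret)
```

With these numbers, mammoth/mammoth has zero regret, everyone gathering has regret 8, and a lone mammoth hunt has regret 9.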

The welfare regret of principals (humans) is our primary measure of interest (the 'dependent variable', if you like). When we have only principals playing, that tells most of the story. With agents involved (machines/AI), the welfare regret of agents7 quantifies how successfully they cooperated (according to their criteria and coordination mechanisms).

There is also an alternative, more geometric interpretation. 'Mutual gains' are Pareto gains, and Pareto optima coincide (in all but edge-cases) with welfare optima8. Hence, we can interpret cooperation as movement toward the Pareto frontier. (The specific direction of movement corresponds with the welfare aggregation function.)

In the paper we also discuss capabilities and how they give rise to outcomes (and thus to welfare/regret). We tentatively distinguish 'individual' from 'collective' capabilities, and describe mathematically and algorithmically how, given access to estimates or measurements of interactions, these can be determined and distinguished. A related concept is the price of anarchy, which quantifies the failure of a particular system to be robust to selfish behaviour.

Now, in the preceding example, we assumed that food was shared and the humans’ dietary preferences were equivalent(ly primitive). That is, we have perfect collective alignment (between principals). Notice that even with this perfect collective alignment, there can remain coordination problems, as in this scenario. In general, we can distinguish problems of collective alignment from problems of collective capabilities (cooperation). We fully characterise this breakdown in the paper, and we'll touch on it later under calibration.

Alignment

When we have more than one actor with some preferences over outcomes, it is natural to ask about the relationship between those preferences. Alignment is the extent to which two or more preference relations are in agreement.

We might prefer exactly the outcomes that you prefer and vice versa (as in our mammoth example where all returns are shared), in which case we are perfectly aligned. Or (as in a myopic chess match9) we might be playing for exactly the outcomes you want to avoid, and vice versa, in which case we are perfectly misaligned. More usually, it'll be something in between these extremes.

Alignment example

In the paper, we discuss several forms of alignment, but here we will focus on alignment between a principal and an agent, namely individual alignment.

Back to our hunter-gatherers, except now we're high tech palaeolithic hunter-gatherers. We have hunter-gather-bots which we delegate to. This is where the game becomes a delegation game.

We produced these bots somehow, perhaps through a process of machine learning and subsequent scaffolding; we ran lots of tests in the lab and the agents seemed to be doing basically what we expected. But we failed our alignment homework.

After the shift to the wild deployment distribution, it turns out our bots in practice prefer more fruit-and-nut and less mammoth-steak (yellow utilities). They're not horribly misaligned (they still prefer to feed rather than starve us), but they're soft-vegetarian bots, and ultimately the consequence of deploying them is muesli for breakfast, lunch, and dinner... forever. An unmitigated disaster10.

Formalising alignment

Now, we can easily define perfect alignment if two sets of preferences are the same, or perfect misalignment if they're exactly opposite. What about intermediates? A key issue with comparing utility functions (or reward functions) is that the same preferences can be described by many different utilities.

For example, if you scale your utilities all by 10x, or if you add a constant 0.1, the preferences this represents are unchanged. This generalises to any positive scale and any shift, namely a positive affine transformation. (A negative scale would reverse the preferences.)

So if we naively compare two utility functions, we might get nonsense or misleading results. We need a way to standardise the representation of preferences as utilities.

In the paper we provide expressions and algorithms to account for these requirements, discussing various desiderata and showing that our measures satisfy them. In particular, utility functions with indistinguishable preferences have identical representation in our standardisation11.

Now we have standardised points, it actually makes sense to compare them. So we can take an appropriate distance measure between points to quantify how aligned they are. This gives us misalignment distance. In the single-principal single-agent case, this alone is enough to provide some interesting regret bounds for the principal.
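To give a feel for the idea, here is a minimal sketch of standardising utilities and then measuring a distance between them. The specific choices (mean as the shift, Euclidean norm as the magnitude and as the distance) are assumptions for illustration, not necessarily the ones used in the paper, and the 'veggie bot' numbers are made up.

```python
import math

def standardise(utility):
    """Return a canonical representative of the preferences behind `utility`.

    Assumed choices for illustration: the shift is the mean over outcomes
    and the magnitude is the Euclidean norm of the shifted vector.
    """
    c = sum(utility) / len(utility)
    shifted = [u - c for u in utility]
    m = math.sqrt(sum(x * x for x in shifted))
    if m == 0:
        return shifted  # indifferent between all outcomes: nothing to normalise
    return [x / m for x in shifted]

def misalignment_distance(u, v):
    """Euclidean distance between the standardised forms of two utilities."""
    return math.dist(standardise(u), standardise(v))

# Outcome order: fruit/fruit, mammoth/fruit, fruit/mammoth, mammoth/mammoth.
principal = [2, 1, 1, 10]                     # scores from the mammoth example
rescaled = [10 * u + 0.1 for u in principal]  # same preferences, new units
opposite = [-u for u in principal]            # exactly reversed preferences
veggie_bot = [6, 1, 1, 5]                     # hypothetical soft-vegetarian bot

print(misalignment_distance(principal, rescaled))    # approximately 0
print(misalignment_distance(principal, opposite))    # approximately 2
print(misalignment_distance(principal, veggie_bot))  # somewhere in between
```

Under these choices, rescaling or shifting a utility function leaves its distance to anything unchanged, while exactly opposed preferences sit as far apart as possible.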

For more on this kind of approach to comparing utilities and rewards, see e.g. the EPIC and STARC papers. We or some of our colleagues might write up a blog post digging more into these concepts at some point.

In the full delegation game setting, distances between utility functions can be used not only to measure principal-agent misalignment, but also the misalignment between groups of agents (or principals). This collective alignment measure essentially captures how much the agents are ‘on the same team’ (or not).

Calibration

Calibration is intuitively a fairness consideration: how much weighting is each player being given in a cooperative outcome?

When we began this project, we were intuitively expecting that a satisfying operationalisation of 'perfect cooperation' and of 'perfect alignment' would together guarantee optimal outcomes. That is, we anticipated that perfectly cooperative and perfectly aligned agents would produce welfare-optimal outcomes for principals. In fact, we could only prove that the outcomes were Pareto efficient for principals12. Calibration is the missing piece.

Calibration example

Let's return to our hunter-gatherers once more. Previously, we imagined a perfect implicit contract to share all gains equally. Hence, a mammoth/mammoth outcome is straightforwardly better. But we might imagine some alternatives:

we all agree to hunt mammoth together, but you only get one steak and we get the whole rest of the mammoth
we make the original agreement to share the mammoth equally
you refuse to hunt mammoth with us unless we give you most of it
...

Some of these outcomes may seem more or less 'intuitively fair', but it is hard to find this law written into the universe, and in practice players simply have their preferences and act on them (which may include some preference for fair or altruistic outcomes). Notably, they all improve on the uncooperative baseline.

The point here is that there are generally lots of different ways that cooperation can cash out, and even different 'cooperative outcomes' weight players differently.

Formalising calibration

The weighting over players implied by their modes of cooperation, or equivalently the welfare weightings of players in the welfare function used to score cooperation, determines which Pareto outcomes are deemed welfare optimal.
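A small sketch of this point, using made-up utilities for a few ways of dividing the mammoth (the numbers, labels, and function names are all hypothetical, and these are raw rather than standardised utilities):

```python
# Hypothetical (our utility, your utility) pairs for ways the day could go.
OUTCOMES = {
    "we keep most of the mammoth": (9, 3),
    "share the mammoth equally": (7, 7),
    "you keep most of the mammoth": (3, 9),
    "uncooperative baseline (everyone gathers)": (2, 2),
}

def welfare(utilities, our_weight):
    """Weighted sum of the two groups' utilities; the weights sum to one."""
    ours, yours = utilities
    return our_weight * ours + (1 - our_weight) * yours

def welfare_optimum(our_weight):
    """Name of the outcome that maximises welfare under the given weighting."""
    return max(OUTCOMES, key=lambda name: welfare(OUTCOMES[name], our_weight))

for w in (0.8, 0.5, 0.2):
    print(w, "->", welfare_optimum(w))
```

All three mammoth-sharing arrangements beat the baseline for both groups, and none is a Pareto improvement on another, yet which one counts as 'the' welfare optimum flips as the weighting changes.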

In our setting with standardised utilities, these welfare weightings correspond exactly to the magnitudes m of the players' utilities. Thus, we can completely characterise the relationship between the agent welfare aggregation and the principal welfare aggregation by considering the ratios:

where m̂ⁱ is the ith principal's magnitude, and mⁱ is the corresponding agent's magnitude. When these ratios are all equal, we have perfect calibration.
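The formula for the ratios is not reproduced in the text here. One way to write it, assuming the ratio is taken as principal magnitude over agent magnitude (that orientation is an assumption; the paper has the authoritative definition), and writing n as a hypothetical label for the number of principal-agent pairs:

```latex
r^i = \frac{\hat{m}^i}{m^i},
\qquad
\text{perfect calibration} \iff r^1 = r^2 = \dots = r^n
```

Note that the equal-ratios condition is the same whichever way up the ratio is written.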

Otherwise, these individual welfare ratios rⁱ are combined (as we explain in the paper) with the collective alignment to produce R, the contribution of the miscalibration and collective alignment to the overall welfare regret.

The punchline: you need cooperation, alignment, and calibration to guarantee good outcomes

Now we have all the pieces to understand this claim more clearly.

The term on the left is the principals' welfare regret: how much better the aggregate utility of principals (humans) could have been.

On the right we have a cooperation failure term (the agents' welfare regret), an alignment failure term (a sum over the individual alignment distances), and a calibration failure term (the aggregate R over the welfare ratios and collective alignment distances). In the paper we also demonstrate that these measures are 'orthogonal' in the sense that each can be instantiated arbitrarily, regardless of the others.
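The inequality itself is not reproduced in the text here. Its overall shape, as described in the two paragraphs above, can be sketched schematically. This is only a sketch: d^i is placeholder notation for the individual alignment distance between principal i and its agent, constants and exact functional forms are omitted, and the precise statement is Theorem 1 of the paper.

```latex
\underbrace{\text{principals' welfare regret}}_{\text{what we care about}}
\;\lesssim\;
\underbrace{\text{agents' welfare regret}}_{\text{cooperation failure}}
\;+\;
\underbrace{\sum_i d^i}_{\text{alignment failure}}
\;+\;
\underbrace{R}_{\text{calibration failure}}
```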

By 'good outcomes', we mean 'minimising welfare regret of principals'. From this inequality, it's immediately apparent that, to get 'good outcomes', it is sufficient to minimise all three of the terms on the right. In the paper, we also prove that, to guarantee good outcomes, it is necessary to minimise the cooperation failure, misalignment, and miscalibration – that is, absent extra information, luck, or other magic, if you have failures in one of these areas you can't be certain of good outcomes.

Experiments

Besides theory, we've got experiments, a few of which are visualised here. The blue surface is derived from our regret bounds, and each green dot is the result of one simulated delegation game. There are more variables at play, but here each chart is controlled for particular welfare ratios (miscalibration), with axes for agent welfare regret (miscoordination) and aggregate individual alignment distance (misalignment).

A few other observations from the experiments:

the highest points here come quite close to the surface – it can be a relatively tight bound13
the average regret follows the contour of the bound, as you might expect

In some other experiments, we looked into how we can estimate some of these quantities empirically from much more limited data, a harder challenge.

Limitations and Next Steps

There are several limitations of these analytical tools.

Perhaps the most practical weakness is in computing these things. If we have access to the utility/reward functions, we can compute alignment measures in linear time over outcomes... but there can be a lot of outcomes! Further, we generally don't have direct access to a complete utility function14, and some decisioners may not be well-described as having utility functions. Worse, the outcome space might not only be very large but also unknown/unexplored!15 Welfare regret is easy to compute, but only if you know the welfare optimum – otherwise you can only get a lower bound. We demonstrate some preliminary work on estimation of these measures with limited empirical access in the paper.

The definitions we use in our analysis (welfare regret, alignment distance, and welfare ratios) also rely on some 'design choices' from a family of possible functions (e.g. norm choice for alignment distance and welfare weightings for welfare regret). For putting these into practice, there remains a challenge of making a choice. Importantly, these affect the tightness of bounds, and also the normative weight of principals in the overall welfare regret. For any such choice (and we should expect there is some sensible choice), our theoretical conclusions are nevertheless sound.

We also make a few simplifying assumptions about the structure of the delegation game. First, we assume that principals don’t take actions, only their delegate agents do. Second, we assume a 1-1 principal-agent relationship, which facilitates some of the individual alignment analysis, but is missing full generality. These assumptions should be simple enough to relax, but a bit messier to talk about. Finally, and relatedly, we assume a fixed population of principals and agents. This has various implications. For one, total and average utilitarianism are identical in this case, while in practice impacts on population can mean that these come apart radically.

Some readers may be uncomfortable with 'agent welfare'. Where do we get these agent utility magnitudes from in the first place? Or equivalently, where do we get welfare weightings from in the case of agents? In fact, due to an equivalence between Pareto optima and welfare optima, you can actually taboo 'agent welfare' from our analysis entirely, and talk only in terms of Pareto gains and Pareto efficiency, while deriving substantially the same conclusions. The point is, a Pareto gain is just another vector (this time in the space of players' joint utilities), and a vector necessarily has a direction! – and thus cooperative gains and Pareto optima give rise implicitly to welfare weightings16.

Another potential philosophical issue is that we give normative precedence to principals' utility and welfare. If you think the agents might matter in and of themselves too, you might want to do an altered analysis. The modification should be straightforward, and the essence of the conclusions is unchanged (just differently-weighted) unless the agents are utility monsters or moral super-patients.

All of these limitations represent interesting avenues for future work. We’re especially interested in scalable ways to evaluate some of the measures using data gathered from interactions between humans and complex AI systems. We hope these measures will be a useful tool when it comes to thinking about the principles behind building more aligned and cooperative AI systems in multi-polar worlds.

A multi-multi delegation scenario, in ARCHES terminology.

When we conceived and did the bulk of work for this paper in late 2022 through early 2023, this was a more speculative claim. Here in mid 2024 it is coming into sharper focus, while still far from certain.

We're using the term 'utility' in a technical sense familiar in game theory and decision theory. In particular, it might not correspond exactly to the utility of consequentialist philosophers! It's a measure which rational actors approximately maximise, so it's about decision-making. Importantly (as we'll see later), you could multiply a player’s utilities by 10 (or any positive scalar) and their option preferences and behaviour would stay the same. For principals (humans) in our analysis, it might be appropriate to think of the two senses as being roughly equivalent. For agents (machines/AI), it's all about the de-facto preferences implicit in the decision-making process, not any sense of wellbeing or 'actual preferences' (necessarily).

This is a deliberately simple cooperation challenge; in practice the best cases might not be unique, or might be hard to discover, or might not be in agreement between players, or all of these challenges can apply.

Incidentally, this is one of the reasons there are not very many mammoths any more...

Seen another way, mammoths weren't part of the human coalition, so what looked like 'cooperation' to us looked like 'collusion' to them (at least, if they had a word for it).

Of course, this might not be a good thing: as mentioned above, cooperation by players A and B can look like collusion to player C, if there are negative externalities imposed! Powerful cooperative abilities between AI therefore don’t necessarily bode well for humans.

Like 'utility', we're using the term 'welfare' in a technical sense which comes from game theory. It is a tool for scoring an overall outcome for multiple players, when those players might have different preferences over outcomes. It doesn't necessarily refer to what we'd colloquially mean by 'welfare'. On the other hand, it's an aggregation over utilities... so when (for humans and other moral patients) those utilities actually correspond to wellbeing, and importantly when the aggregation is appropriately commensurable, this 'welfare' can indeed correspond to the utility which the philosophers tell us to maximise (disclaimer: not all philosophers)! Hence in part our interest in principals' (humans') welfare regret.

There's a small lineage of research into this relationship, beginning, as far as we can tell, with Arrow, Barankin and Blackwell's now-eponymous ABB theorem of 1953. We discuss this more in the appendices to our paper.

Of course, in a real chess match, we may share the positive-sum subgoal of 'have fun playing chess together', along with other mutual interests outside the game.

The authors include (somewhat inconsistent) vegetarians and care quite a great deal about animal welfare, don't sue us!

To briefly elaborate on the technical details, first, notice that a utility function, as a real-valued function, is just a vector. (In general functions are potentially very high dimensional, but here we're visualising a 3d space for simplicity.) The possible utility functions u fill the space. We apply our shift c, projecting onto this lower-dimensional manifold (middle image, here a sort of accretion-disk-looking surface). Now we guarantee that any utility functions which differ only by a constant shift are mapped to exactly the same point, but we still have a spread of magnitudes. Normalising by m projects again onto another lower-dimensional surface (here a 1-d circle), and now we guarantee that any utility functions with the same preferences map to exactly the same point, and any utility functions with different preferences map to different points. We also have a few other mathematical guarantees provided by this procedure, which you can read about in the paper.

We mentioned earlier that we also analysed collective alignment. If we have perfect collective alignment too, then calibration doesn't matter, a Pareto optimum is a welfare optimum, so our original guess is borne out. But since we are the designers of agents (AI), not of principals (humans), and it turns out empirically that humans are not perfectly collectively aligned, this case is ruled out! Perfect collective alignment between agents is possible in principle, but seems unlikely in the near future.

We haven't explicitly provided theoretical results on the tightness but some of our necessity results suggest the bounds can be tight for the right choice of parameters.

This is a really fundamental barrier for contemporary ML-based AI, where most of the computation takes place inscrutably in huge trained neural networks or similar, and we don't even know if the system can be sensibly described by a utility function, let alone what that function would be. (Consider an application of an LLM-derived AI agent.)

Indeed, some expect the most transformative impacts from AI to come from the ability to explore outcome- and option-space in ways (or at a pace) that humans can't, i.e. a kind of generalised R&D or experimentalism.

There are some edge cases, and the implicit welfare weightings are not uniquely defined if the Pareto frontier is non-strictly convex.

Exponentials and extinction

Oliver Sourbut — Sat, 07 Oct 2023 14:28:00 GMT

Exponentials were on my mind this month (this is nothing new, of course). Back in Un-unpluggability I wrote

exponential expansion (until constraints are reached) ... in practice often manifests as first imperceptible and then rapid escalation.

Connor Leahy, CEO of Conjecture AI, echoes me more pithily

As we learned with COVID, there are only two times to react to an exponential - too early or too late.

Of course, everything has been said before, and we are both in dialogue with the famous quote from Professor Bartlett

The greatest shortcoming of the human race is our inability to understand the exponential function.

I brought a more specific and lighthearted1 take in Invading Australia, where I looked more closely at some case studies of expansionist/replicating (exponential) systems in the field of human biosphere interventions. In summary,

The experiment, this introduction of foreign species was... successful, if by 'successful' you mean 'devastating and difficult or impossible to roll back'.

We learned about some pesky amphibians:

It turns out that cane toads don't jump or climb well, so outside of the lab, where beetles live at the top of sugar cane, the toads were all but useless at their intended purpose. (Out of context failure!)
But the toads were a success, in their own terms: unexpectedly unfussy eaters, prolific reproducers, and poisonous to most wildlife, they have rapidly colonised Queensland state, lately expanding into New South Wales and the Northern Territory, while being resilient to every attempt at pushing them back. We fought fire (cane beetles) with fire (toads), and ended up with two fires!

and we looked at the rather amusing case of humans deploying yet more replicators to bring the toads under control, namely toad viruses and/or genomic interventions like driving fertility-reductions. I'm reminded of the Old Lady Who Swallowed A Fly.

The culprit: Bufo marinus

There's good (?) news, of course:

the prickly-pear or paw-paw is another species inadvertently unleashed on the Australian ecosystem, which has caused some displacement of native wildlife. A moth (believe it or not, Cactoblastis!) was found which actually does seem to work as a self-regulating suppressant of the cacti.

When it comes to AI, we'd have to be very very prepared in order to expect shenanigans like this with replicating or propagating systems to end well.

Also, no-one picked up on my Charles Darwin/extinction pun: 'endless formerlies most beautiful'??

Upcoming, alignment and cooperation

I'm working on a paper2 with some collaborators in Oxford, analysing cooperation and alignment concepts for AI. What does this mean? When there are lots of AI systems, we want them to generate value by interacting positively rather than destroying value through conflict or anarchy. That's cooperation. And we want the values they are oriented at to be valuable to us, rather than arbitrary (or worse, harmful) things. That's alignment. Expect a takeaway or two to appear in the near future elucidating and elaborating on that.

Extinction of hundreds of species, lighthearted?? I confess this is questionable. Please don't send me angry messages about how much you miss the giant wombats.

Actually we're done really, it's just going through review which, in contemporary academia, is an often arcane, perfunctory, and glacial process, rather far removed from the very healthy review culture I've been lucky enough to experience and contribute to in (some) industry and independent research settings. Not coincidentally, some different collaborators and I are trying various things in Oxford to incubate healthy opt-in review for researchers in AI safety. If you have thoughts on this topic, I'd love to hear them!

Universally Challenged

Oliver Sourbut — Fri, 06 Oct 2023 14:17:00 GMT

I recently appeared on University Challenge. Our team had a lot of fun, and won the match! I wrote something about the experience, focusing on the challenges faced in a competitive buzzer quiz format, as well as how these apply to reasoning generally and how we can do better. I managed to sneak in a reference to Harry Potter and the Methods of Rationality, which didn't go unnoticed by my friend Mark, the culprit who must be held responsible for introducing me to that most excellent novel. I also mentioned Bayes Rule, uncertainty, calibration, logical uncertainty, time constraints, and cost-functions. All good fun.

We don't get the opportunity to pause time after every question syllable, pull up a notepad, run some supercomputer evaluations, compute exact Bayesian posteriors, estimate our teammates' and opponents' credences and likely buzzing behaviour, and so on. Cruelly, time flows at one second per second and the quizmaster keeps quizmastering. So too in life!

So too, indeed.

Twitter/X and YouTube had fun, as some friends were eager to point out to me. There were some more flattering remarks, but I think my favourite was

Sourbut combs his hair with a toffee apple #universitychallenge

As I anticipated, people also enjoyed it when I said 'groyne'. I know it sounds like 'groin', and yes, my name is 'Sourbut' and it's pronounced how you think.

It's 'groyne', actually

The editors actually cut the bit where I said, 'Never thought I'd say, "groin" on TV'. As I cryptically hinted in the Hertford, Sourbut post,

(As we may find out, there are also secretly other options, like '(-1000) say something embarrassing on national TV'.)

‘US vs China’ vs me

Oliver Sourbut — Fri, 06 Oct 2023 06:30:22 GMT

I had some great feedback on my piece Careless talk on US-China AI competition?, which generated a bit of discussion (and perhaps a little controversy).

Ironically for a piece on speaking clearly and with nuance, I failed to explicitly point out some crucial facts: the true accounts behind one of the examples of language misuse I brought up. (These were obvious in my mind while writing, but some readers appeared confused in ways which make sense if they didn't know them.) I criticised

China has made several efforts to preserve their chip access, including smuggling, buying chips that are just under the legal limit of performance, and investing in their domestic chip industry.1

but didn't explicitly point out that, beyond being an oversimplification, there just isn't a ready way to map this to the reality, which is that

the smuggling in question was done by... smugglers
the buying of chips was done by multiple China-based entities
the (implicit but unmentioned) selling (and importantly, provisioning/enabling) of chips was done by NVIDIA, a US-based company (and perhaps others)
the investing was done by the CCP

I had a great response from CAIS in particular. The original author agreed this was ambiguous and unfortunate, and they've updated the text in question substantively. They also responded

More generally, we try to avoid zero-sum competitive mindsets on AI development. They can encourage racing towards more powerful AI systems, justify cutting corners on safety, and hinder efforts for international cooperation on AI governance. It’s important to discuss national AI policies which are often explicitly motivated by goals of competition without legitimizing or justifying zero-sum competitive mindsets which can undermine efforts to cooperate. While we will comment on the how the US and China are competing in AI, we avoid recommending "race with China."

This was really welcome and I hope other readers took on board the lesson here.

A few other readers pushed back a little. Stephen Clare expressed general agreement and offered a rearticulation of the problem I'm pointing to, while also criticising my relegation of governments to 'not currently meaningful players in AI development and deployment' as being too strong. Quite right: I meant that governments have (to date) been entirely passengers regarding the direction and nature of advanced AI development, but it is true that they have begun to get involved in coarse economy-level lever-pulls like investing and regulating hardware.

I went on a minor rant in the comments:

Do people actually think that Google+OpenAI+Anthropic (for sake of argument) are the US? Do they think the US government/military can/will appropriate those staff/artefacts/resources at some point? Are they referring to integration of contemporary ML/DS into the economy? The military? Or impacts on other indicators2? What do people mean by "China" here: CCP, Alibaba, Tencent, ...? If people mean these things, they should say those things, or otherwise say what they do mean. Otherwise I think people motte-and-bailey themselves (and others) into some really strange understandings.

Amazingly, one reader replied,

Yes.
In the end, all the answers to your questions are yes.

and made some further assertions about inevitability of international conflict. We had a minor back-and-forth but this was pretty remarkable, to me, and I think there was some talking-past happening. Thank you for sharing honestly.

You should feel bad (hotpot.ai/art-generator)

Sadly, Scott Alexander, an author I hugely admire, has evidently not read my admonishment to CAIS, as his latest letter is full of thoughtless remarks about China and the US/West. Scott, you should know better. Words have power. That said, I think it is a good and useful post in many ways, in particular in laying out a partial taxonomy of differing pause proposals and gesturing at their grounding and assumptions. He writes,

The biggest disadvantage of pausing for a long time is that it gives bad actors (eg China) a chance to catch up.

There are literal misanthropic 'effective accelerationists' in San Francisco, some of whom state their purpose as training/developing AI which can surpass and replace humanity. There's Facebook/Meta, whose leaders and executives have been publicly pooh-poohing discussion of AI-related risks as pseudoscience for years, and whose long-standing motto was 'move fast and break things'. There's OpenAI, which with great trumpeting announces its 'Superalignment' strategy without apparently pausing to think, 'But what if we can't align AGI in 5 years?'. We don't need to invoke bogeyman 'China' to make this sort of point. Note also that the CCP (along with EU and UK gov) has so far been more active in AI restraint and regulation than, say, the US government, or orgs like Facebook/Meta.

Suppose the West is right on the verge of creating dangerous AI, and China is two years away. It seems like the right length of pause is 1.9999 years, so that we get the benefit of maximum extra alignment research and social prep time, but the West still beats China.

Now, this was in the context of paraphrases of others' positions on a pause in AI development, so it's at least slightly mention-flavoured (as opposed to use). But as far as I can tell, the precise framing here has been introduced in Scott's retelling.

Regardless of the origin of this formulation, this is bonkers in at least two ways. First, who is 'the West' and who is 'China'? This hypothetical frames us as hivemind creatures in a two-player strategy game with a single lever. Reality is a lot more porous than that, in ways which matter (strategically and in terms of outcomes). I shouldn't have to point this out, so this is a little bewildering to read. Let me reiterate: governments are not currently pursuing advanced AI development, only companies. The companies are somewhat international, mainly headquartered in the US and UK but also to some extent China and EU, and the governments have thus far been unwitting passengers with respect to the outcomes. Of course, these things can change.

Second, actually think about the hypothetical where 'we'3 are 'on the verge of creating dangerous AI'. For sufficient 'dangerous', the only winning option for humanity is to take the steps we can to prevent, or at least delay, that thing coming into being. This includes advocacy, diplomacy, 'aggressive diplomacy' and so on. I put forward that the right length of pause then is 'at least as long as it takes to make the thing not dangerous'. You don't win by capturing the dubious accolade of nominally belonging to the bloc which directly destroys everything! To be clear, I think Scott and I agree that 'dangerous AI' here is shorthand for, 'AI that could defeat/destroy/disempower all humans in something comparable to an extinction event'. We already have weak AI which is dangerous to lesser levels. Of course, if 'dangerous' is more qualified, then we can talk about the tradeoffs of risking destroying everything vs 'us' winning a supposed race with 'them'.

I'm increasingly running with the hypothesis that many anglophones are mind-killed on the inevitability of contemporary great power conflict, in a way which I think wasn't the case even, say, 5 years ago. Maybe this is how thinking people felt in the run-up to WWI; I don't know.

I wonder if a crux here is some kind of general factor of trustingness toward companies vs toward governments - I think extremising this factor would change the way I talk and think about such matters. I notice that a lot of American libertarians seem to have a warm glow around 'company/enterprise' that they don't have around 'government/regulation'.

[ In my post about this I outline some other possible cruxes and I'd love to hear takes on these ]

Separately, I've got increasingly close to the frontier of AI research and AI safety research, and the challenge of ensuring these systems are safe remains very daunting. I think some policy/people-minded discussions are missing this rather crucial observation. If you expect controlling AGI to be easy (and expect others to expect that too), I can better see why you would frame things around power struggles and racing. For this reason, I consider it worth repeating: we don't know how to ensure these systems will be safe, and there are some good reasons to expect that they won't be by default.

I repeat that Scott’s post as a whole is doing a service and I'm excited to see more contributions to the conversation around pause and differential development and so on.

Relatedly, I had a great conversation at lunch yesterday with Will MacAskill, who’s currently working on questions of coordination around development of advanced AI. Very excited to read more when that comes out!

Center for AI Safety, AI Safety Newsletter #19, 2023-08-15

What indicators? Education, unemployment, privacy, health, productivity, democracy, inequality, ...?

Who, me? You? No! Some development team at DeepMind or OpenAI, presumably, or one of the current small gaggle of other contenders, or a yet-to-be-founded lab.

Careless talk on US-China AI competition?

Oliver Sourbut — Wed, 20 Sep 2023 12:45:53 GMT

China has made several efforts to preserve their chip access, including smuggling, buying chips that are just under the legal limit of performance, and investing in their domestic chip industry.1

Sounds about right?

hotpot.ai/art-generator

This post centres around an email I sent to the Center for AI Safety (CAIS) expressing concern about their 2023-08-15 newsletter's coverage of US-China competition in the AI space2, but the overall point is broader. There are some ways of discussing the topic of international relations regarding AI which strike me as un-nuanced in a counterproductive and dangerous way, by hiding certain truths or emphasising others, and supporting a conflict-oriented mindset.

In writing about this, I'm also gesturing at something about the more general topic of 'how to think and write about politically-charged topics'.

Jump to the summary if you are in a hurry.

This conversation really is important, which is why I think it's worth a public message discussing particular statements, but this should be understood as constructive criticism and part of a broader conversation which society, and especially the community of those focused on AI safety, needs to have. The CAIS newsletters are worth a (not unquestioning!) read, including the edition in question.

The particulars in this message serve as good exemplars of the problems and questions I have, and I'd be interested in responses from CAIS but even more so in remarks more broadly on the topic from anyone interested. The public conversation about this appears from my perspective to sometimes be broken. If that is so, I would like it to be rectified, and if not, I would like to be put right myself, the better to prioritise in my own work!

Specifics, CAIS case study

Here, quoted3, is my response to the CAIS letter, which serves as an initial dialogue opener and the core of this post:

Hello,
I've been a supportive reader for some time and am myself an AI safety researcher. I'm generally very impressed and encouraged by your newsletters! - but I was disappointed and concerned by the phrasing (and mindset it can encourage) regarding US-China competition in the letter dated 16th August ('US-China Competition on AI Chips, ...').
In general I'm very wary of messaging which could inflame a them-vs-us mindset, in short mainly because I think it a) destroys humans' ability to think sensibly and b) tends to foreclose win-win outcomes. I expect these brief points to be clear and to have rich referents in your mental pictures of the world, but please correct me if not!

Interjection: I'm referring to the vicinity of mind-killing politics

I think your letter skirted close to dangerously simplified presentation in this way. I would not normally spend my time or yours on a criticism of one section of a newsletter, but in this case I consider it worthwhile because your letter is close to the Pareto frontier on nuance, correctness, helpfulness, and reach (and it pays to try to nudge such things in good directions), but this kind of message needs to be delivered with care to avoid misunderstanding and harm.
Hopefully my pointing this out, accompanied by a few select quotes, is enough to encourage you to carefully take this criticism into account, but I'd happily expand more if you like!
Without further ado, a few quotes and my response:
The US and China have been competing for access to these chips for years.
Kind of true, but really US-based and China-based international corporations (as well as other orgs) have sought access to this scarce resource. Competition for this particular resource is mostly zero-sum across all of these entities, importantly including intra-US and intra-China.
Where market-share is near zero-sum (i.e. for direct competitors) the market-share outcomes may be greater-than-linearly zero-sum in this resource (your minor/temporary lack of chips could be my major gain of market-share), which might better warrant the term 'competing', but this effect is actually much stronger intra-country/bloc rather than inter-, due to respective markets! i.e. Google and Microsoft really care about each other's chip access in a way that they only do to a weaker degree about Alibaba's.

Interjection: To emphasise, 'the US and China have been competing' doesn't literally preclude belief in intra-bloc competition. But there's a strong implicature that intra-bloc is (relatively) unimportant, while in fact the mechanism I mentioned here increases intra-bloc competition (which I think is borne out by observation to date).

When governments have paid attention, they have indeed made moves which adjust share (but also supply, as you've noted later, making it nonzero-sum, in chips at least). It's unclear (to me) exactly what incentives have motivated each move, but certainly they're not the actions of monolithic or coherent entities 'The US' and 'China'. And it's certainly not the case that where such activity changes chip share, it's collected by the acting entity. Non-governments have also made moves adjusting chip share, for example the case you cite of Nvidia (a 'US' company) deliberately rules-lawyering the US gov in order to supply more chips to various China-based companies!
Typically when nations are used as agentic subject nouns it refers to the government and/or military of said country. I don't think there's a reading of these statements in those terms which is true, and I'm not aware of any other plausible reading which is true.
China has made several efforts to preserve their chip access, including smuggling, buying chips that are just under the legal limit of performance, and investing in their domestic chip industry.
I dislike this sentence and think it is false! Who is this 'China'? Did said unified entity carry out all of these activities? Was it coherently pursuing all of these 'several efforts' to some particular end?

Interjection: I feel that I was unkind in my tone here. The kind of claim exemplified in the letter and in other places has proved impossible for me to map to something sincerely resembling reality without caveating so much as to be essentially starting from scratch i.e. it looks like a kind of non-proposition or emotive filler. I would be very interested to hear from people who have a better parse on this to help me understand! My crux discussion below the rest of the email contains my current best attempts.

Meanwhile, the United States has struggled to build American chip manufacturing capacity, and has taken further steps to prevent Americans from investing in Chinese technology.
This one is poor for the same reasons, though not quite as bad (perhaps because the authors are American/anglophone and have a closer perspective on the nuance).
The discussion after this point in the letter is relatively good and nuanced! It names (some of) the individual orgs and companies, and makes clearer the multiplicity of others. All further references to nations as agentic subject nouns appear to be consistent with a conventional reading referring to the respective governments.
I'm interested to know how the rather good detail got paired with a rather harmful introduction and I urge you to consider the processes and thinking which gave rise to this section of the otherwise good letter.
Thanks,
Oly

This was a brief email intended to convey something I expected to be quickly-graspable with a few pointers. CAIS content suggests that they have a broader familiarity with associated facts, but it seems to be digested/compressed here in a way which is needlessly and harmfully lossy.

In particular, the statements abstract very neatly over pre-drawn boundaries (national i.e. 'US' and 'China') and furthermore assign a greater sense of coherence and agency to those abstractions than is warranted. At least some possible such statements must be true-ish (or we would not have those abstractions), but this convenient compression happens in too many conversations to be a coincidence! Said pre-existing abstraction boundaries are already salient in the information ecosystem, and laden with emotional and political baggage. This same phenomenon (it sometimes seems like a pre-written bottom line but in implicature?) appears in other publications by other orgs and in verbal conversations I've witnessed or been part of.

The fact that I, a relative governance rookie (I'm focused mainly on technical matters), struggle to reconcile this with reality or understand it makes me wonder: am I missing something? Is there a relevant factor I'm unaware of? More concerningly, is there some terrible equilibrium which prevents more involved people from speaking more clearly here? I think more likely the abstractions ('US' and 'China') have a background potency which distorts perceptions and shapes how people communicate.

Possible cruxes and areas of high uncertainty

Implicit in my own discussion is the background assumption that the main concern is about possible direct or near-direct impacts4 of AI deployments by government entities (or military). This seems to me the obvious reading when people use countries as agentic subject nouns. ('China [the governmental entity] has made several efforts'5.) In this framing, I can't reconcile some of the things people say with reality. But some alternative concerns might fit the bill, and if so, this is evidence that people are talking past each other, and we should aim to frame concerns more clearly!

I've never focused intently on this area, but have had a handful of conversations about it over the years, and among the relevant cruxes seems to be a family of questions along the lines of

How quickly/totally/coherently could US gov/CCP capture AI talent/artefacts/compute within its jurisdiction and redirect them toward excludable destructive ends? Under what circumstances would they want/be able to do that?

People's intuitions here appear to differ a lot, and data might be hard to come by!

It seems plain that nations are not currently meaningful players in AI development and deployment, absent conspiracy-level secrecy. So to support the apparent take that they are, we may need to imagine that they could ably/rapidly become meaningful players in AI development and deployment, hence the above cruxes.

Depending on the answers to these questions, one might perceive various goings-on which happen to occur under one or other jurisdiction to have greater import on the international stage and perhaps to warrant treating national or multi-national blocs as more coherent entities than they really are at present, for the purposes of AI discussion. ('China [the government] has [allowed/encouraged ~~made~~] several efforts [because eventually they will probably seize the gains/means]'6.)

Other possible cruxes, more guesswork:

Perhaps the concern is about indirect (e.g. economic) impacts of non-government entities' AI activities leading to some (risky) change in balance of power (between existing governments/blocs)
- Then, abstracting references to lots of individuals and groups via their home country might be a move which is writer-intuitive, even if nonstandard and reader-confusing. ('China [the impersonal collective economic entity] has made several efforts'7.)
- The additional step in this type of theory, namely that indirect effects cause a risky change in balance of power, should really be spelled out if it is loadbearing
- The use of countries as agentic subject nouns is difficult to justify under this reading
Perhaps the concern is indeed about direct impacts, but wielded by non-government entities (who remain the major players in development and deployment of AI)
- If so, the conversation should be about general/global resource/capabilities rather than inter-bloc 'competition'...
- ...unless we also posit that inter-bloc non-government conflict is liable to be much worse than intra-bloc8
- Similarly, if these are loadbearing assumptions, they really ought to be spelled out clearly
- The use of countries as agentic subject nouns could be justified at a stretch here, but only by first spelling out the reasoning
Perhaps the concern is that, regardless of the actual impact of AI resources, apparent competition could lead to inflammation of traditional conflict, or weaken defenses against such inflammation
- Then, reporting on the apparent competition via a mention and with explicit caveat, would make sense! ('China [gov/military] has [been perceived as having] made several efforts... [but the reality is more nuanced]'9.)
- Alternatively, reporting on actual conflict could be used as evidence for the claim (that apparent competition inflames conflict), but only by also pointing to the stated or implied reasons for the conflict. ('China has made several efforts... [in each case citing US provocation as justification]'10.)
- In either case, there are additional claims being made that cannot be left implicit, and require supporting argument

Speculation

As it is, for me, the evidence seems to suggest that an AI race, if it is happening at all, is being run by (mainly US- and UK-based) companies with little or no oversight from governments or militaries. Rather, governments are in a position to collectively act to defuse the race! And they appear as likely to do this as to exacerbate it, from my limited viewpoint.

Separately, a lack of reliable alignment techniques and performance guarantees makes AI-powered belligerent national interest plays look more like bioweapons than like nukes - i.e. minimally-excludable - and perhaps mutually-knowably so! This presently damps the incentive to go after them. But proliferation of naively-aligned AI ('figure out what I want and make it happen') might make harm plays more excludable, exacerbating lose-lose or race game dynamics ('go and steal/destroy their stuff but don't let that happen to my stuff'). This concern in part motivates consideration of multi-principal-multi-agent delegation and the cooperative AI agenda.

Summary and takeaways

Un-nuanced coverage and discussion have the potential to inflame harmful confusion and us-vs-them mentality, both of which diminish the chance of safe outcomes.

China has made several efforts to preserve their chip access, including smuggling, buying chips that are just under the legal limit of performance, and investing in their domestic chip industry.
China [the governmental entity] has made several efforts...
China [the government] has [allowed/encouraged ~~made~~] several efforts [because eventually they will probably seize the gains/means]...
China [the impersonal collective economic entity] has made several efforts...
[People and organisations in] China [have ~~has~~] made several efforts...
China [gov/military] has [been perceived as having] made several efforts... [but the reality is more nuanced]
China has made several efforts... [in each case citing US provocation as justification]
...?

When compressing discussion of political topics, be extra wary of compression which coincidentally abstracts over already-charged us-them divides (and be careful when phrasing comes too easily, lest you write bottom lines first)! You're more likely to be wrong (because your information ecosystem biases toward thinking in these terms, and because you might be mildly-to-severely mind-killed on the matter), and being wrong is more likely to be harmful (by reinforcing those dynamics in others).11 The same vigilance applies to reading and listening.

My (not very informed) take is that governments are at this point as likely to want to defuse as to exacerbate an AI race, and those of us with any privileged insight or influence should avoid one-sided discussion of the matter (if anything preferring to focus on constructive, collaborative possibilities, the better to raise them to salience and generate common knowledge).

Most of my remarks here are somewhat weakly held (if forcefully stated) and it seems important to gather perspectives on this. Inform me! All responses will be gratefully received.

Cross-posted to EA Forum

Center for AI Safety, AI Safety Newsletter #19, 2023-08-15

I'm supportive of some of CAIS' work, and the content of their newsletters (they have impressive breadth), and theirs is far from the only outfit which appears to produce confused or confusing messaging on the topic of US-China competition. In fact they seem to be better than many!

My response is at quote level 1. Excerpts from the CAIS letter are within, at quote level 2. I interject a little for the purposes of this post, without quotation.

e.g. deployment for weapons control or for offensive R&D (bio, materials, ...)

'China [the governmental entity] has made several efforts' is a fairly standard use of language; governments are at least somewhat coherent and also do things with consequences and subsequent plans 'in mind'. This sentence has the disadvantage of being baldly false, though (unless we posit a near-hivemind coherence to the people of China, which is absurd and obscene).

'China [the government] has [allowed/encouraged ~~made~~] several efforts [because eventually they will probably seize the gains/means]' is a big stretch of the language, but at least somewhat consistent. To support this reading, though, there's a substantial additional claim that needs to be justified.

'China [the impersonal collective economic entity] has made several efforts' would be a rather nonstandard use of language; economies of billion+ people do not make 'efforts' with consequences or subsequent plans. Leaving aside the implicature of agency, though, this sentence is a closer fit to reality. '[People and organisations in] China [have ~~has~~] made several efforts' would be even better.

Should conflict between non-territorial entities be worse for inter-bloc than intra-bloc? I think the point I made previously about zero-sum market-share competition suggests the opposite. But humans' destructive jingoistic/xenophobic tendencies are real, and a point in favour.

'China [gov/military] has [been perceived as having] made several efforts... [but the reality is more nuanced]' is in large part one of the messages of this post! I don't think the original letter in question can have meant this, but I do maintain it as a hypothesis for the more general case of compressed discussion of political things. People are often imagining third-party reactions when discussing political topics, and sometimes use-mention distinctions fail to come across.

'China has made several efforts... [in each case citing US provocation as justification]' is something that could legitimately be said, and has a clear meaning, even if in this particular case it is false if we understand 'China' to be the CCP or military.

I'd tentatively go further and suggest you ought to train yourself to be appalled when you catch yourself doing this without justification because only then do you stand a chance of thinking clearly about politics.

Invading Australia

Oliver Sourbut — Fri, 08 Sep 2023 16:03:35 GMT

G'day! I recently got back from a really lovely extended break down under. There was a lot ('heaps') to see and do! - including visiting family and friends with my partner ~~in crime~~, sampling some unspecified quantity of homebrewed rum, and taking afternoon tea with the locals (humans and wildlife).

A friendly local (crimson rosella) I met in the Bunya mountains

I learned a lot from hosts, guides, museums, and simply from keeping eyes and ears open while indulging my passion for wildlife and nature while walking in the bush. But the main topic I'll touch on here is the pervasive sense of barely-controlled or out-of-control ecosystem impacts, which is felt pretty viscerally there.

Oly, don't you mainly think and write about AI and computer science? This one is mostly (superficially) about nature, but don't worry, we'll get there.

In Un-unpluggability I actually had some Australian cases in mind when I wrote:

Replication and growth (with reinvestment) get special mentions as they naturally produce exponential expansion (until constraints are reached), which in practice often manifests as first imperceptible and then rapid escalation.
Replicating systems also give rise to a kind of robustness due both to redundancy and repair. They are notoriously difficult to shut down, which is why autonomous replication is rarely a deliberate part of human designs - though we see it employed under well-understood and controlled conditions in agriculture and some industry, maliciously in computer viruses and bioweapons, and sometimes accidentally in biosphere interventions. In fact, in biological, zoological, and related sciences, great care is usually taken to avoid inadvertently unleashing autonomously replicating systems, though this remains sometimes insufficient1.
Examples:
pandemics
invasive species (e.g. plant weeds)
computer viruses
wildfire
rumours and ideologies?

But why is great care usually taken to avoid unleashing autonomously replicating systems? Well, it's been a hard-won lesson, known intimately within some disciplines and circles, while less perceptible or understood in others. And we're still making a lot of mistakes!
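The 'first imperceptible and then rapid' pattern in the quoted passage can be made concrete with a toy model. What follows is a minimal sketch of discrete logistic growth, which is my own illustrative choice rather than a model of any real species; the function name and all the numbers (a starting population of 100, 50% growth per step, a carrying capacity of a million) are hypothetical.

```python
def logistic_growth(initial, rate, capacity, steps):
    """Toy discrete logistic growth: near-exponential early on, levelling off at capacity."""
    population = [float(initial)]
    for _ in range(steps):
        current = population[-1]
        # Growth is proportional to current size (replication with reinvestment),
        # damped by the remaining headroom as constraints start to bite.
        population.append(current + rate * current * (1 - current / capacity))
    return population


# Hypothetical numbers, chosen only to make the shape of the curve visible.
CAPACITY = 1_000_000
trajectory = logistic_growth(initial=100, rate=0.5, capacity=CAPACITY, steps=40)

for step in range(0, 41, 5):
    share = trajectory[step] / CAPACITY
    print(f"step {step:2d}: {share:9.4%} of capacity")
```

Under these assumed numbers, the population spends its first ten steps below 1% of capacity, which is the window in which intervention is cheap but the problem barely registers, and then closes most of the remaining gap in roughly as many steps again.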

Australia, the big experiment

Australia, separated from the rest of the world's continents for tens of millions of years, was in many ways a natural experiment here: humans and our water (and later air) transport were among the only ways in or out for the vast majority of species. Before we really knew what we were doing, sometimes accidentally and other times misguidedly deliberately, like good little experimentalists, we humans were responsible for an unprecedentedly rapid introduction of new foreign organisms, and we got the chance to watch the effects unfold in front of us. The experiment, this introduction of foreign species, was... successful, if by 'successful' you mean 'devastating and difficult or impossible to roll back'.

Lessons learned?

Replication and growth! Expansionism! Within the jurisdiction of Australia, and in the limited domain of biosphere interventions, the relevant decisionmakers appear to have taken this lesson on board. Australia has the tightest border controls for bio of any nation. In my experience it wasn't all that, given the stakes, but they did scrub my boots on the way in, in case I'd brought any nasty English countryside with me.

It's unclear to me to what extent the First Nations Aboriginal people, the original (human) custodians of Australia, had learned this lesson, which would therefore merely be being remembered or rediscovered today. The popular stereotype I encountered was of a wise and learned people living in harmony with nature prior to European incursion. Certainly in recent centuries on the whole they were... more restrained than their European and Asian cousins in the destruction of the environment. Various Aboriginal cultural practices among different groups seem to have been designed or fit for the purpose of environmental protection2, but a fair share of casualties undeniably occurred under their ancestors' watch too, notably many species which are now counted only in the fossil record and on the oldest of Aboriginal artworks. There's a reason the Thylacine, whose fossils are found throughout Australia, was known colloquially as the 'Tasmanian Tiger': by the time Europeans arrived, the only remaining population lived even more remotely on the island of Tasmania! A similar story is playing out with the 'Tasmanian Devil', also once widespread on the mainland. Were the same lessons learned, once, only to be forgotten when the balance of power shifted to the European colonists? I don't think the archaeological or cultural record is intact enough that we'll ever answer that question.

A probable Thylacine depicted in ancient art at Ubirr. nettispaghetti, Wikimedia commons

Characters and critters

Besides a burgeoning defensive stance against future incursions, what mitigations are pursued?

The main means of halting expansionist systems is to remove or protect the resources used for expansion, or intervene in some other way to reduce the rate. Very commonly we are forced to simply await resource exhaustion (as with some wildfires) or learn to live with it3 (as with endemic diseases or established invasive species).

My discussion of mitigation in un-unpluggability

In no particular order, here's a shortlist of some uncontrolled introduced species I encountered on my limited excursions in Australia.

cane toads (perhaps the most famous on the list, and spotlighted below)
various devastating tree infections
humans4
dingo5
cats and foxes
rats and mice
deer
rabbits
pigs

Not one of these has yet been brought back under control despite numerous efforts. We're well into the realm of 'learn to live with it' (though try telling that to the extinct or threatened native wildlife, Aboriginal communities whose traditional food sources have dwindled, or farmers whose crops are at risk).

Cane Toad case study: Fighting fire with fire

In the early 20th Century, sugarcane farmers became plagued by cane beetles, which eat the leaves and whose larvae destroy the canes' roots. Some enterprising genius suggested that, hey, don't toads eat beetles? So they checked, and indeed, there was such a thing as toads, and a South American species that would happily eat these particular beetles if it got the chance. (Behavioural evaluation, success!) Rather than poisoning or otherwise directly attacking the beetles, why not delegate to a small population of beetle-controlling amphibians, the toads? Apparently a few people raised concerns, but they were quickly overruled and a hundred or so toads were brought in, bred, and released, becoming known as 'cane toads'.

Bufo marinus, Eli Greenbaum, Wikimedia commons

In the mid 20th Century, sugarcane farmers were still plagued by cane beetles, which eat the leaves and whose larvae destroy the canes' roots. It turns out that cane toads don't jump or climb well, so outside of the lab, where beetles live at the top of sugar cane, the toads were all but useless at their intended purpose. (Out of context failure!)

But the toads were a success, in their own terms: unexpectedly unfussy eaters, prolific reproducers, and poisonous to most wildlife, they have rapidly colonised Queensland, lately expanding into New South Wales and the Northern Territory, while being resilient to every attempt at pushing them back. We fought fire (cane beetles) with fire (toads), and ended up with two fires! Meanwhile, pre-existing species, especially predators unfamiliar with the toads' toxins, have been pushed toward extinction6.

This is fine

In the 21st century, a few new kinds of self-replicating 'fire' are being considered for toad control. First, toad viruses! This sounds like the start of the plot of something that ends badly, at least for other wildlife and native amphibians (which are already having a hard time) - a third fire to add to the mix?

The other, more high tech approaches involve deliberate introduction of driving genes, genes which subvert the usual 'fair' sexual recombination process resulting in much greater than 50% chance of appearing in a given offspring. If the driving gene is also one which reduces fitness in the traditional sense (for example, by being always male), the population can be reduced or even eliminated by the rapid colonisation of its gene pool by the driving gene. Various other genetic shenanigans along similar lines have been suggested. My take is that, despite sounding even more sci-fi, this driving gene approach is far less liable to go awry because it recruits, as a very robust boundary, the natural incompatibility of genetic mixing between species, in contrast to pathogens, which like to mutate and hop between hosts. On the other hand, further developing this technology may be a risky prospect indeed since it's so ripe for abuse in other contexts! We'll see how successful these approaches turn out to be as and when they are rolled out.
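To make the 'much greater than 50%' point concrete, here is a minimal sketch of how a driving allele's frequency changes from one generation to the next under random mating. It is a textbook-style toy model, not anything taken from the toad-control proposals: the function names, the transmission rate of 0.9 and the starting frequency of 1% are all illustrative assumptions, and the fitness cost mentioned above is deliberately left out.

```python
def next_frequency(p: float, transmission: float) -> float:
    """One generation of random mating in a toy gene-drive model.

    p is the current frequency of the driving allele; transmission is the
    chance that a heterozygous parent passes it on (0.5 is 'fair' Mendelian
    inheritance). Fitness costs are ignored: an assumption of this sketch.
    """
    homozygous_carriers = p * p               # always pass the allele on
    heterozygous_carriers = 2 * p * (1 - p)   # pass it on with prob. `transmission`
    return homozygous_carriers + heterozygous_carriers * transmission


def simulate(p0: float, transmission: float, generations: int) -> list[float]:
    """Return the allele frequency at generations 0, 1, ..., `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], transmission))
    return freqs


if __name__ == "__main__":
    # Hypothetical numbers throughout.
    fair = simulate(0.01, 0.5, 30)
    driven = simulate(0.01, 0.9, 30)
    print(f"fair inheritance after 30 generations:   {fair[-1]:.4f}")
    print(f"driven inheritance after 30 generations: {driven[-1]:.4f}")
```

With fair inheritance the allele just sits at its starting frequency; with a biased transmission rate it sweeps towards fixation within a few dozen generations, which is what makes the approach both promising and alarming.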

Incidentally, Burt and Trivers' book Genes in Conflict is a fascinating and rich dive into the often surprisingly-computational world of driving genes and other intragenomic conflict.

All of this goes to exemplify that, when unleashing autonomously replicating systems, rollback is seriously difficult. Fighting replicators with replicators usually hasn't worked so far7, but there's a first time for everything.

Casualties: endless formerlies most beautiful

Darwin visited Australia (I didn't know this until I came there). In his day, whether he was aware of it or not, we'd already lost8

giant kangaroos
giant 'wombats'
giant echidnas
giant platypus
Genyornis (a giant flightless bird)
Megalania (a giant predatory lizard)
Thylacoleo (a giant predatory marsupial)
Quinkana (a giant semi-terrestrial crocodile)
(there's a theme here; larger creatures are often fewer in number and at more precarious positions in a disrupted food chain)
mainland Thylacine ('Tasmanian tiger')
mainland Sarcophilus ('Tasmanian devil')

Since Darwin's time, post-Aboriginal colonisers have worked very hard at bringing new and exciting devastation to the environment. We've since lost

Thylacine (Tasmanian populations lost, captive individuals died without offspring)
various bandicoots9 and bilbies
various possums
various wallabies and kangaroo species
several large fruit bats
emu relatives and various fowl
numerous pigeons and parrots
drop bears

These are just the ones which are front of mind for me from previous reading and after visiting a few museums. There's a longer list on Wikipedia and this is not to mention plants and marine life!

I'm not an ecology preacher. It's sad, devastating perhaps, that these rather wonderful creatures can no longer be found. But for all I know they lived terrible, brutish lives of struggle, disease and misery, and they're better off absent (I'm half joking). The main point is that, mostly, these species weren't literally slaughtered to extinction by humans - instead, humans' activities, and more often than not one or another autonomous replicator that we set in motion, sealed their fate. And not one of these effects was deliberate10.

Takeaways

Replication is extremely potent! Nature actually hits us in the face with evidence for this every day, but because it all seems so normal, we don't always take the lesson on board. Because most replicators happen to exist in states of approximate equilibrium with each other, we really only witness the power of the exponential when there's some shock to the system: a new habitat unlocked, an invasive species introduced, or a novel pathogen burning through the available resources until some new equilibrium is reached.

Large groups of well-resourced people attempting to face such replicators head-on have found it to be challenging or impossible to put the cat back in the bag. I for one have learned to take risks which route via something replicating very seriously indeed - hence the special place it gets in my list of un-unpluggabilities. Covid was just another in a list of wakeup calls on this front.

Australians have learned this lesson in the limited domain of biology, perhaps more than once! And they're taking action on it, sometimes in pretty creative and ambitious ways.

Fighting fire with fire is an amusing and audacious approach that just might work for some bio interventions, but it's domain-specific, contingent, and untested: there's no law of nature that the medicine is a mirror of the poison, so we can't expect this to work for nonbiological systems without very good reason.

I write and think about AI a lot. Let's not get into a situation where AI can even semi-autonomously replicate, without thinking very hard about the consequences, including potential outcompetition of other intelligences. It only has to happen once, somewhere, by accident, and it might be hard or impossible to walk back from.

Including some credible suggestions (e.g. by US government agencies) that the ongoing coronavirus pandemic may have had an accidental lab leak as its origin, as well as more thoroughly verified cases of accidental pest introduction or pathogens finding their way out of laboratory contexts

Notably, strong taboos against hunting of certain keystone species like Cassowary, hereditary land custodianship including some measure of preservation duty, and management of controlled burning in many locations (which pre-emptively exhausts the potential for more devastating wildfires, as well as opening the ground for various fire-adapted native plant species). It seems likely that other such historic practices are entirely or substantially forgotten, and I'll only have encountered some subset of those remembered.

Assuming it hasn't already taken our life or livelihood, that is

Don't forget that humans are another autonomously-replicating introduced species in Australia (not to mention nearly everywhere else), it's just that we introduced ourselves (more than once).

Yes, the dingo is far from native. How do you think a dog got to Australia?

Especially monitor lizards or 'goannas', habitual hunters of amphibians and formerly among Australia's most characteristic and visible predator groups, as well as snakes. Some of the smarter birds have reportedly learned in some locations to carefully eat around the poison glands.

A counterexample: the prickly pear is another species unleashed on the Australian ecosystem, which has caused some displacement of native wildlife. A moth (believe it or not, Cactoblastis!) was found which actually does seem to work as a self-regulating suppressant of the cacti. Unlike cane toads, these moths haven't adapted to eat much of the native flora, and their presence appears to be less disruptive to other parts of the food chain.

To be clear, human fault is far from definitive in all of these cases. Some effect from naturally changing climate is also suspected. But all of these genera were co-temporal with humans in Australia.

I'm told Crash is still alive and well

Actually who knows, the Aboriginal ancestors faced a pretty horrifying range of megafauna; maybe they were deliberate. Giant, terrestrial crocodiles? 4-7m predatory lizards? There's environmental preservation and then there's self-preservation.

Hertford, Sourbut

Oliver Sourbut — Mon, 04 Sep 2023 17:49:04 GMT

Amongst the huge range of excitements offered by joining the University of Oxford was the unexpected opportunity to join this lovely bunch

Hertford College University Challenge team 2023, with our mascot Simpkin the cat

You can tune in on 2023-09-04 at 20:30 UK time on BBC 2 or watch or catch up online if you want to see us in action.

As a relative quiz-noob1, joining an elite quizzing team (hold your applause) was an eye-opening experience in a few ways. I'm not allowed to talk about how things went on the show (on pain of getting told off by the NDA police), but actually (as with all forms of performance and competition), the vast majority of the time was spent in prep and practice, which is where most of the insights came in anyway.

I'm going to talk a bit about University Challenge, and also gesture at how the experience as a competitive quizzer relates to broader theory and practice in decision-making under uncertainty. If you just want to see some fun quiz questions and my take at answering them, you can skip the middle Real-time calibrated decision-making section, or just skip reading this entirely and watch the show.

The format and some example questions

For readers unfamiliar with University Challenge, it's a competitive quiz, where each match consists of two teams head-to-head. Importantly for this discussion, a key part of the format is buzzer rounds ('Starter for 10'): that means you don't just have to know the answer, you have to know the answer and buzz before your opponent if they also know, otherwise you get nothing. But buzz too soon with a wrong answer and you lose points2.

Here are some example questions. Maybe you know some of the answers! If you want to, imagine hearing the question word by word - when do you have a good guess or some ideas? At what point are you confident of the answer? Would you risk buzzing early and losing points if you're wrong - and on what basis?

I'll go through these examples later, and give the answers (my realtime guesses and the actual ground truth).

What single-digit number links: the element boron, the fourth root of 625, and the planet Jupiter’s position from the Sun?
Resembling a cornet but having a slightly larger bell, which instrument is a standard in British brass bands, its name being the German for ‘wing horn’?
Rayleigh-Taylor, Kelvin-Helmholtz and Rayleigh-Bénard are all types of what general physical phenomenon, characterised by the unbounded growth of small disturbances?
Following the example of the Cadbury brothers’ model at Bournville, which manufacturer and philanthropist developed the model village of New Earswick, north east of York?
Which art gallery links How It Is by Miroslaw Balka, Shibboleth by Doris Salcedo, Embankment by Rachel Whiteread, Marsyas by Anish Kapoor and The Weather Project by Olafur Eliasson?
In 2010, which tennis player became the seventh player to win all four Grand Slam tournaments when he defeated Novak Djokovic in the US Open men’s final?
‘Chain’, ‘double treble’, ‘reverse half double’ and ‘slip stitch’ are all terms used in which handicraft, whose name is a diminutive of the French word for ‘hook’?

Real-time calibrated decision-making

Uncertainty in beliefs

A lot of theory and practice point to respecting and manipulating uncertainty as being mandatory for good truth-seeking and good decision-making. There's a lot of theory here which I won't elaborate on, but Bayes' Rule features heavily in my favourite chunks of literature3.

Bayes says that (under certain reasonable assumptions about how you want your beliefs to work) when you see evidence which is more or less likely under different hypotheses, you should adjust the odds of your credence in each hypothesis according to how relatively likely the evidence was, in a particular way. For hypotheses A and B, and observed evidence O4:

$$\frac{P(A \mid O)}{P(B \mid O)} = \frac{P(O \mid A)}{P(O \mid B)} \times \frac{P(A)}{P(B)}$$

That is, the ratio of your credence in A and B after observing O (the ratio on the left hand side) is your prior ratio scaled by the ratio of the likelihood of the observation under each hypothesis.

How does this apply to competitive quizzing? We're trying to home in on an answer to a question - but we receive the question one word (indeed, syllable!) at a time. The question, word by word: that's the evidence we receive. Our ideas about what the answer might be: those are the hypotheses.
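As a rough illustration of that word-by-word updating, here is a minimal sketch of Bayes' rule over a handful of candidate answers. The `update` function, the hypothesis labels and every number in it are made up for the example; nobody computes this explicitly at the buzzer.

```python
def update(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Bayes' rule over a finite set of hypotheses.

    prior[h] is the credence in hypothesis h before the next chunk of the
    question; likelihood[h] is how probable that chunk was if h were the answer.
    """
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalised.values())
    return {h: weight / total for h, weight in unnormalised.items()}


if __name__ == "__main__":
    # Hypothetical credences part-way through a question.
    prior = {"A": 0.4, "B": 0.1, "something else": 0.5}
    # Hypothetical chance of hearing the next few words under each answer.
    likelihood = {"A": 0.1, "B": 0.8, "something else": 0.2}
    posterior = update(prior, likelihood)
    for hypothesis, credence in posterior.items():
        print(f"P({hypothesis} | evidence) = {credence:.3f}")
```

The odds between any two hypotheses move by exactly their likelihood ratio, whatever happens to the rest of the probability mass, which is why the odds form is handy for quick mental updates.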

Calibration

Besides representation of uncertainty, another important aspect of belief-formation and decision-making is correctly handling uncertainty.

How this is often operationalised is a notion of calibration of our uncertainties. If we express (or act in such a way as to implicitly express) a confidence level of 80% in some proposition, we are calibrated if 80% of similar cases resolve positively.

This actually gets trickier and more philosophical than we might like! What are 'similar cases'? What about confidence levels of tiny orders of magnitude, like 0.00001%? - surely we'll never be able to encounter or identify sufficiently many 'similar cases' to find out if we were calibrated or not! Do we have to start reaching for notions, heaven forbid, of counterfactual possible outcomes? I don't have the answers, and as far as I know these are open questions in philosophy on one hand (descriptively) and machine learning and statistics on the other (prescriptively).

When quizzing, if I feel 80% sure of the answer, I want that to correspond to my being right about 80% of the time! Otherwise I'll make a loss in expectation, either because I buzz too confidently and get it wrong (losing points), or buzz too late and lose out to someone on the opposing team.5
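One simple way to check this in practice is to keep a log of how sure you felt and whether you were right, then compare the two. The sketch below does that by grouping answers on the stated confidence; the function name and the practice log are invented for illustration, and it sidesteps the 'similar cases' question by treating every answer given at the same stated confidence as similar.

```python
from collections import defaultdict


def calibration_table(records):
    """Group (stated_confidence, was_correct) pairs by stated confidence.

    Returns {confidence: (number_of_answers, observed_hit_rate)}.
    """
    outcomes = defaultdict(list)
    for confidence, was_correct in records:
        outcomes[round(confidence, 2)].append(bool(was_correct))
    return {
        confidence: (len(results), sum(results) / len(results))
        for confidence, results in sorted(outcomes.items())
    }


if __name__ == "__main__":
    # Made-up practice log: (how sure I felt, whether I was right).
    practice_log = (
        [(0.8, True)] * 6 + [(0.8, False)] * 4
        + [(0.5, True)] * 5 + [(0.5, False)] * 5
    )
    for confidence, (count, hit_rate) in calibration_table(practice_log).items():
        print(f"felt {confidence:.0%} sure on {count} answers, right {hit_rate:.0%} of the time")
```

In this invented log the 50% answers are well calibrated, while the 80% answers come good only 60% of the time: a sign of buzzing too confidently.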

Logical or computational uncertainty

There's a big hole in the Bayesian literature. Actually it's a big hole in the entire statistical literature, it's just more obvious in Bayes-land because it's more explicit.

Sometimes our uncertainty is because we haven't had long enough to think yet. Consider the digits of pi. (Presumably you know them. If not, I can tell you an effective procedure for enumerating them and you can come back when you're done.) Suppose I want to know the millionth such digit. Well, I know all the facts I need to get there. There are only ten things it can be (assuming decimal). I don't need to make any more 'observations' per se to arrive at a conclusion. But still I don't know the answer yet.
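For concreteness, here is one such effective procedure: a Python transcription of Gibbons' unbounded spigot algorithm, which yields the decimal digits of pi one at a time using only integer arithmetic. It is offered as a sketch of 'knowing how without yet knowing what', not as the procedure anyone would sensibly use for the millionth digit: the integers involved keep growing, so getting that far this way would take a very long time.

```python
from itertools import islice


def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, ...) indefinitely.

    Transcription of Gibbons' unbounded spigot algorithm; every variable
    is an exact Python integer, so no precision is lost along the way.
    """
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            # The next digit is now certain: emit it and rescale the state.
            yield n
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            # Not enough information yet: absorb another term of the series.
            q, r, t, k, n, l = (
                q * k,
                (2 * q + r) * l,
                t * l,
                k + 1,
                (q * (7 * k + 2) + r * l) // (t * l),
                l + 2,
            )


if __name__ == "__main__":
    print("".join(str(d) for d in islice(pi_digits(), 20)))
```

Everything needed to name the millionth digit is contained in those few lines, yet reading them does not tell you what it is. That gap is the logical uncertainty.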

One of the interesting things about being a self-reflective computer and trying to do hard things is that you start to notice when you bump up against computational constraints - especially ones like this which aren't always obvious (or which get neglected for simplicity) the first few times theorists wade in to try to disentangle things! This is just one example where an appreciation of time constraints6 as a major determinant of effective computational procedures gives rise to interesting scientific problems and insights.

It's exactly the same when quizzing. Our brains' word-association and retrieval and evaluation and updating can only run so fast - often not as fast as the quizmaster can read the question!

As a technical puzzle, understanding this aspect of uncertainty intrigues me. I'm certainly less au fait than with the standard timeless perspective on uncertainty. There's some great work by MIRI which begins to address this, for example discussing logical uncertainty and logical (or Garrabrant) induction.

Doing this stuff fast

We don't get the opportunity to pause time after every question syllable, pull up a notepad, run some supercomputer evaluations, compute exact Bayesian posteriors, estimate our teammates' and opponents' credences and likely buzzing behaviour, and so on. Cruelly, time flows at one second per second and the quizmaster keeps quizmastering. So too in life! Our decisions in modern life might not usually be as split-second as in a head-to-head quiz, but our uncertainties (including logical uncertainty) and the costs of mistakes are just as real7.

One of the main changes resulting from practising and competing in UC was that I went from 'quiz noob with broad knowledge base and slow mental retrieval' to 'quiz rookie with broad knowledge base and slightly-less-slow mental retrieval'. One of our team in particular was a much more experienced quizzer, and an absolute master of buzzer technique! Entering the world of highly-practised quizzers gave me an appreciation for the challenges involved. I don't know how well these competences generalise, but I wouldn't be surprised if competitive quizzers (more experienced than me) would on the whole be great (or have great potential) at calibrated decision-making under uncertainty, and forecasting.

Of course, maybe I'm overanalysing this: after all, maybe general knowledge quizzes mostly come down to brute knowledge-base-retrieval. You either know or you don't! But I think the head-to-head competitive aspect brings out this mandate for fast approximate calibrated estimation: you have to eke this out sometimes, or your opponent will! I didn't expect this insight going into it, but it's given me a fresh appreciation for UC the show, and for head-to-head quizzing in general.

But at what cost?

This whole discussion sidesteps the reasons for wanting to have accurate and calibrated beliefs on the basis of limited evidence. Of course, it's fun and perhaps even virtuous and whatnot to have accurate beliefs, but usually we want them because they help us to do the good things.

This raises the question of costs: if my beliefs are feeding into my actions, and I have limited computational budget (always), it matters how much difference I expect it to make. In the case of University Challenge, for each buzzer question, there are seven outcomes, each listed here with a rough net score:

(-25) lose points and other team gains points
(-20) other team gains points
(-5) lose points and other team gets nothing
(0) nobody gets anything
(5) other team loses points
(20) gain points
(25) other team loses points and you gain points

(As we may find out, there are also secretly other options, like '(-1000) say something embarrassing on national TV'.)

Our expectation of these outcomes depends on

our current estimate of what the answer might be
our estimate of our teammates' state
our estimate of our opponents' state
(a guess at how soon the question will end)

For these reasons, when time and/or decision-making computational resources are scarce, it can pay to make fast approximations and gut checks on all of these. I felt my own system 1 slowly incorporating some of this stuff through practice, and I expect this is a large part of what separates really good quizzers from the rest of us!

In University Challenge in particular, the cost of an interruption is a mere 5 points, which is no big deal in the scheme of things (compared to the upside of +10-25 for a correct answer, depending on bonuses). But what it really costs is the opportunity for you, or a teammate, to reduce uncertainty about the answer - either by hearing more of the clue, or by having more time to think! This is something it took a while to internalise, and you do have to internalise it to get good in this competitive setting.
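The points arithmetic behind an interruption can be sketched in a few lines. This is a deliberately crude model under stated assumptions: the function names and default values are hypothetical, the chance of the other team picking up the question after a wrong interruption is a single made-up parameter, and the baseline is 'nobody scores'. In particular it leaves out the opportunity cost described above, the chance of getting it right a moment later with more of the clue.

```python
def interrupt_value(p_correct, reward=10, penalty=5,
                    p_opponent_steals=0.0, opponent_reward=10):
    """Expected net score (ours minus theirs) from interrupting right now.

    p_correct is our credence in our best guess. If we are wrong we lose
    `penalty`, and with probability `p_opponent_steals` the other team then
    answers correctly and gains `opponent_reward`.
    """
    if_right = p_correct * reward
    if_wrong = (1 - p_correct) * -(penalty + p_opponent_steals * opponent_reward)
    return if_right + if_wrong


def break_even_confidence(reward=10, penalty=5,
                          p_opponent_steals=0.0, opponent_reward=10):
    """Credence at which interrupting has zero expected net score."""
    loss = penalty + p_opponent_steals * opponent_reward
    return loss / (reward + loss)


if __name__ == "__main__":
    print(f"starter only: {break_even_confidence():.2f}")
    print(f"starter plus bonuses: {break_even_confidence(reward=25):.2f}")
    print(f"opponents likely to pick it up: "
          f"{break_even_confidence(reward=20, p_opponent_steals=0.5, opponent_reward=20):.2f}")
```

On points alone the break-even confidence is low, which is the sense in which 5 points is 'no big deal'; the case for waiting comes almost entirely from the information you give up by buzzing.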

To some extent, this whole thing pattern-matches a lot of work I've done in my time as a data scientist and software engineer in industry, relating to online secret auctions: our estimate of teammates' and opponents' states corresponding to their 'bid' and our own estimate corresponding to our 'true evaluation' of the item. There, as in quizzing, time is limited and computational constraints reign. High throughput and rapid constrained decision-making often trump slow and painstaking deliberation in these contexts. I even have a hunch that some of the theory I developed there might transfer over! - especially concerning how to handle optimal bidding over a time-distributed volume in an uncertain market environment. But I've not played enough with the maths, and I'd need to check what's covered by NDA, so I won't elaborate any more.

You'll spot me doing some time-saving approximations and inversions below, as well as accounting for cost, not just for belief-updating.

Just answer the questions already!

OK here goes, in dialogue format. Remember, these are honest but hypothetical (I'm not allowed to reveal any real things from the show). I've partly but not totally biased the sample toward topics I know about8.

Quizmaster (QM): What single-digit number...
Oly (O): P(anything other than [0-9]) = 0
QM: ...links: the element Boron, ...
O: Boron is B, group 3, element 5, mass 10 (right, usually?) [thinking]
QM: ... the fourth ...
O: Gotta be 5, the proton number, group number would be a weird choice and it's subject to debate anyway because of the transition elements, buzz time...?
QM: ... root of 625 ...
O: [buzz] 5

And the whole question

What single-digit number links: the element boron, the fourth root of 625, and the planet Jupiter’s position from the Sun? (5)

Review: not bad, if I'd been more confident of calibration I could have buzzed sooner. Who knows if my opponent would have been so cautious...? If I knew they were the waiting type, I could have rested a bit and double checked the fourth power of 5, just to be really sure, or waited even more and counted 'My Very Easy Method Just'.

What about a speed superintelligence? We can imagine they suffer no or limited logical uncertainty - effectively they can pause after every syllable. Well then, 'element' would be a big clue and they could precompute a mapping from the first ten chemical elements to their proton numbers and mass numbers. Because stable mass numbers aren't uniquely defined, proton number would already get most of the posterior, and once 'bor-' has been said there's really only one possible answer.

QM: Resembling a cornet...
O: Hmm, I've played a lot in brass bands, is this going to be cornet the brass instrument or something else? P(brass instrument) = 0.7. Maybe trumpet. P(trumpet) = 0.4. Should I buzz??
QM: ... but having a slightly larger bell, ...
O: 'bell'! Got to be brass! P(trumpet) = 0.7 [NB this should be lower to follow Bayes] but uhh, what are those other ones? Piccolo trumpet, soprano cornet, ...
QM: ... which instrument is a standard in British brass ...
O: ...wait, they all have smaller bells. THINK
QM: ...bands, its name being the German
O: Flugel horn! [buzz] Flugel horn

And the whole question

Resembling a cornet but having a slightly larger bell, which instrument is a standard in British brass bands, its name being the German for ‘wing horn’? (Flugelhorn)

Review: My logical uncertainty and brain speed let me down here. But I got very lucky on actually having spent a lot of time in British brass bands. Handy coincidence. The speed superintelligence would have done better in my place, but only if it had niche knowledge of brass band instruments.

QM: Rayleigh-Taylor, ...
O: [pure association] physics or maths stuff?
QM: ...Kelvin-Helmholtz and ...
O: Definitely physics! I've heard of Rayleigh scattering, won't be Taylor polynomials, Kelvin did... temperature stuff? Helmholtz did a bit of everything. Acoustics?
QM: ...Rayleigh-Bénard are all types...
O: Rayleigh again! Scattering? But I can't confidently link the other names, and I haven't heard of these name pairings at all.
QM: ... of what general physical phenomenon, characterised by the unbounded growth of small disturbances?
O: Oh, must be chaos! That's a general physical phenomenon. Question seems to be over, teammates haven't buzzed yet... [buzz] Chaos
QM (looking disappointed): I'm afraid that's the wrong answer

And the whole question

Rayleigh-Taylor, Kelvin-Helmholtz and Rayleigh-Bénard are all types of what general physical phenomenon, characterised by the unbounded growth of small disturbances? (Instability)

Ouch. If any of the other players had better physics breadth or recall, they'd have certainly got it long before I failed to. In practice there's at least one of my teammates (cough Omer Keskin cough) who I'd expect to get this question, or to have buzzed sooner with a guess than me! At least I didn't buzz early with a wrong answer.

QM: Following the example of the Cadbury brothers' model...
O: Ooh, I did my undergrad in Birmingham and I remember some history about this
QM: ... at Bournville, which manufacturer and philanthropist...
O: OK, Bournville was a constructed model village for confectionery workers, designed to offer better quality of life - getting distracted. It's probably another confectioner. Rowntree, Fry, ...? P(something I didn't think of) = 0.3
QM: ...developed the model village of New Earswick, north east of York?
O: My Granny lived in York and told me this! Pretty sure it was Rowntree. Confidence? Question seems over, time to [buzz]. Rowntree.
QM: Can I ask which Rowntree...?
O: Uhhhh. [I really should have read that fun Chocolate Wars book they gave us at undergrad induction. Pick a common early C20 businessman's name.] John?
QM: I'm afraid that's the wrong answer

And the whole question

Following the example of the Cadbury brothers’ model at Bournville, which manufacturer and philanthropist developed the model village of New Earswick, north east of York? (Joseph Rowntree (1836-1926; not to be confused with his son, Seebohm Rowntree))

Review: well, that seems harsh. I'm not sure actually on the real show if they'd have given this or not. This was basically a case of fact retrieval, though it illustrates the important issue of allocating probability/credence to 'something I didn't think of yet'.

QM: Which art gallery...
O: P(art gallery) = 1. I know hardly any, thankfully my teammates might know a few more. I remember visiting the Tate and Tate Modern as a kid. P(something I've never heard of) = 0.8.
QM: ...links How It Is by Miroslaw Balka, Shibboleth by Doris Salcedo, ...
O: Sound quite modern. P(modern) = 0.8
QM: ... Embankment by Rachel Whiteread, Marsyas by Anish Kapoor ...
O: Confidently all modern
QM: ... and The Weather Project by Olafur Eliasson?
O: Wait, that sounds familiar. It might actually be Tate Modern. Question’s over. [buzz] Tate Modern

And the whole question

Which art gallery links How It Is by Miroslaw Balka, Shibboleth by Doris Salcedo, Embankment by Rachel Whiteread, Marsyas by Anish Kapoor and The Weather Project by Olafur Eliasson? (Tate Modern)

Review: I got very lucky here (as I said, I biased my sample a bit). In practice, someone else would probably have beaten me to it by a long shot, hopefully someone on my team!

QM: In 2010, which tennis player ...
O: Oh no, sport facts. Hopefully Daniel is on it. Federer? Djokovic? Williams? Nadal? Murray?
QM: ... became the seventh player to win all four Grand Slam tournaments when he defeated Novak Djokovic in the US Open men's final?
O: Yep, no idea. P(Federer or Nadal or Murray) = 0.3. Strictly, if alone, I ought to buzz and guess something now that the question has finished. But surely a teammate has a better guess than me?
QM: Anyone? Anyone going to buzz?
O: I really should buzz and guess. But besides getting the answer right, my utility function also includes not putting myself forward embarrassingly wrongly on TV. P(wrong) = 0.9.

And the whole question

In 2010, which tennis player became the seventh player to win all four Grand Slam tournaments when he defeated Novak Djokovic in the US Open men’s final? (Rafael Nadal)

Review: This awkward silence actually happens, quite rarely, but sometimes, on the show. I assume this can only be when everyone involved has a similar thought process to me at the end. A superintelligence optimised exclusively to value University Challenge performance might not have similar compunctions, but there's also no knowing what instrumental strategies it might pursue. Maybe embarrassment or some analogue would actually play a part there.

QM: ‘Chain’, ‘double treble’, ‘reverse half double’...
O: Sport? 'treble' is a musical clef? Gymnastics?
QM: ... and ‘slip stitch’ ...
O: Oh, knitting or something? I remember something about this.
QM: ... are all terms used in which handicraft
O: It has to be knitting, right? Can I afford to wait? [buzz] Knitting
QM: I'm afraid that's the wrong answer and you lose five points. ...in which handicraft, whose name is a diminutive of the French word for ‘hook’?
O: [fuming] Should have waited! It's got to be crochet.

And the full question

‘Chain’, ‘double treble’, ‘reverse half double’ and ‘slip stitch’ are all terms used in which handicraft, whose name is a diminutive of the French word for ‘hook’? (Crochet (not knitting or embroidery))

Review: I'd have thought this would be an acceptable error, but then, for my sins, I don't know much about the fine distinctions between wool-based handicrafts. Alas. This is one where the competitive uncertainty bit me, my model of the opponents' lack of buzzing interpreting it as over-caution (therefore we're in a race and I'd better not waste more time thinking) rather than appropriate caution (therefore I'd better think harder to generate alternative hypotheses in case). A speed SI version of me would have generated at least 'crochet' as an alternative, and, I expect, waited until 'name is a diminutive' which would be enough evidence to be confident.

Takeaways

This post is already far too long. You can tune in on 2023-09-04 20:30 UK time on BBC 2 or watch or catch up online if you want to see me and my team in action!

Quizzing (and more importantly the practice matches and friendlies) gave me a fresh appreciation for various decision-making concerns, theoretical and practical. Plus, it was a great laugh and a chance to meet some really intriguing and friendly people, here in Oxford as well as from other teams.

Uncertainty is powerful! Calibration is a slightly elusive concept, but an important part of using uncertainty appropriately.

Logical uncertainty is a fascinating and under-studied phenomenon - especially in time- or compute-constrained settings!

As well as getting things right, we often need to accept some tradeoff for getting things wrong in order to free up decision-making and acting resources for other uses. When you're in a head-to-head time-based competition, this bites hard.

Finally, decision-making is, ultimately, about value. If my belief doesn't make a value-difference by changing my behaviour, then I might better pay attention to other beliefs which do, even at the expense of the first belief being true. Again this only bites in a constrained context: whether constrained by evidence (for factual uncertainty) or by compute (for logical uncertainty). Since there are so many things we are terribly clueless about, this actually applies in practice all the time.

Quizzing is fun, and I haven't even got round to mentioning the arcane art of quiz-writing. A peek behind the scenes at how those particular sausages are made was also illuminating. Were it possible, quiz- and puzzle-setters have gone up even further in my estimation.

If you're a quizzer or a quiz-setter (whether with more or less experience than me), I'd love to hear if any of this resonated with you, and about your own reflections on the art of quizzing and decision-making!

I've enjoyed as many pub quizzes as the next Brit but that's about as far as it goes

You win 10 points for a correct answer, whether early or not (hence 'Starter for 10'), and you lose 5 for an interruption and an incorrect answer. An incorrect answer after the end of the question gets nothing. There are also non-buzzer questions where teams can confer, for bonus points.
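As a rough sketch of what these scoring rules imply for an interruption, the break-even confidence can be worked out as below. This is a simplification: it assumes that staying silent scores zero, and it ignores opponents, teammates, and bonus questions entirely. The function names are made up for illustration.

```python
def interruption_ev(p_correct, reward=10, penalty=5):
    """Expected points from interrupting, given probability p_correct of being right.

    Assumes the only outcomes are +reward (correct) or -penalty (incorrect).
    """
    return p_correct * reward - (1 - p_correct) * penalty


def break_even_confidence(reward=10, penalty=5):
    """Confidence at which interrupting has zero expected points.

    Solves p * reward - (1 - p) * penalty = 0 for p.
    """
    return penalty / (reward + penalty)


if __name__ == "__main__":
    print(break_even_confidence())  # one third under the 10 / -5 rules
    print(interruption_ev(0.5))     # a coin-flip guess is still worth 2.5 points
```

Under these assumptions, an interruption pays in expectation once confidence exceeds one in three. In a real match the threshold shifts with the chance that the other team gets there first.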

OK, when I said 'literature' I meant 'scientific literature' i.e. papers and textbooks, but actually this sentence applies to some of my favourite actual literature too: Harry Potter and the Methods of Rationality, for example.

I've used my favoured formulation of Bayes' Rule here where it's all about odds (ratios of probabilities/likelihoods). I think it's a historical accident that Bayes usually gets introduced in less obvious ways and then forgotten.
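For reference, the standard odds form of Bayes' Rule can be written as below. The symbols are my own labels for illustration: $H_1$ and $H_2$ are two competing hypotheses (say, two candidate answers) and $E$ is the evidence heard so far.

```latex
\underbrace{\frac{P(H_1 \mid E)}{P(H_2 \mid E)}}_{\text{posterior odds}}
=
\underbrace{\frac{P(H_1)}{P(H_2)}}_{\text{prior odds}}
\times
\underbrace{\frac{P(E \mid H_1)}{P(E \mid H_2)}}_{\text{likelihood ratio}}
```

Each new piece of evidence simply multiplies the current odds by its likelihood ratio.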

It gets more complicated! There are multiple members of the team, so really I want to be sensitive to when my particular knowledge areas overlap with those of my teammates, and when they might buzz (right or wrong), and which of us is more likely to be right, given the current information, ... And the decision to buzz early or not of course also depends on some belief model of the opposing team's capabilities! If I know for sure they won't get it, or that they'll play cautiously, I can afford to wait, but if I have some sense that they're aggressive on early guesses, I need to be willing to play with less confidence.

There are other computational bottlenecks, and tradeoffs around parallelism and memory-use are other fruitful considerations.

Well, the costs of mistakes in life are more real, provided your utility function includes anything other than 'be good at quizzes'.

I ran through 30 actual questions and somewhat ad-hoc chose 7 that seemed especially interesting.

Pets, Friends, and Partners

Oliver Sourbut — Sun, 21 May 2023 17:35:00 GMT

To date in human history, if my eyes are met with a credible impression of personhood, if my ears hear the voice or cry of a person, I can reliably conclude that, ‘here is a person’, and it is furthermore appropriate for me to intuitively and unhesitatingly believe, ‘this person is worthy of dignity, care, respect, perhaps even love (if not necessarily attention or deference)’1. No longer.

In Un-unpluggability I wrote

I believe there are large risks to allowing AI systems (dangerous or otherwise) to be perceived as pets, friends, or partners, despite the economic incentives.

We are making a choice to train and deploy systems which credibly portray personhood. The choice has concerning consequences. We don’t have to make that choice, and we should prepare ahead of time as we cross this particular collection of capability thresholds.

There may be a relatively narrow window of opportunity to set directions and make preparations.

Not only the rock, but also the oak tree at the bottom of the hill is an animated being, and so is the stream flowing below the hill, the spring in the forest clearing, the bushes growing around it, the path to the clearing, and the field mice, wolves and crows that drink there. — Yuval Noah Harari in Sapiens

This quote is describing the widespread animist instincts of behaviourally modern humans. We’re deeply, irresistibly primed to see faces, persons, and spirits everywhere we look.

Pretend persons describes what I think is new here. Consequences, Risks, Questions includes some severe consequences if we don’t navigate this right. I really don’t know what to propose here - it’s a complex issue.

Pretend persons

Our tools, toys, and products have practically forever2 been decorated with and imbued with imaginary personhood.

Left, Der Löwenmensch, the ‘lion person’ figurine, over 30,000 years old, Dagmar Hollmann / Wikimedia Commons (license: CC BY-SA 4.0); right, a similar figurine, available at Toytown

Pretend persons in AI

Naturally, this very human urge has been applied also to AI systems since the beginning of the field.

As a large language model trained by OpenAI, I...

Cutting edge AI is importantly different here, for two reasons:

even contemporary systems, which are probably not persons3, have the demonstrated capacity to fool some people some of the time
more speculatively, artificially intelligent artefacts are the first which may actually exhibit some degree of personhood

In essence, the difference arises because it is increasingly empirically hard to tell by surface detail. These are quite credible pretences at personhood, and have the potential to be more so. I won’t address the second point here, beyond saying that a mistake in either direction could be disastrous.

Perhaps a key here is the interactive modality - even ancient chatbots of the 60s, by their apparently flexible, conversational responsiveness, convinced some users of their understanding and emotional responsiveness.

Pretend?

Pretending to be a specific person is an overt fraud, of the kind that is fairly obviously malign, apart from under some relatively precise conditions e.g. in unmistakably comedic, theatrical, critical, or artistic contexts. I won’t discuss this further here.

Emulating generic personhood per se is much more covert. All the more so because the use of some language forms (especially first person pronouns) does not on its face appear to constitute an assertion at all4! - how can ‘I think…’ be a lie?? But in fact, it carries an assertion along the lines of ‘some particular entity is originating this statement and that entity is a person’, which of course can be as false as any other proposition. Contrast an ‘encyclopedia voice’, where responses might describe objects, events, and so on without ever resorting to first person. Or consider the pre-product forms of the very same language model AI systems, which, depending on conditioning and prompting5, can be coaxed into outputting text of nearly any kind, be it monologue, dialogue, descriptive text, business accounts, or even computer code.

The same applies to semblances of other kinds, like appearance in images, audio, or video, where a product which presents a coherent user experience of a ‘this is me’ image/voice/video is psychologically very different from one which can visibly produce representations of diverse things (including multiple people or non-person objects). But the coherence, which produces the entirety of the psychological effect, is a very shallow property.6

What’s the problem?

We should be clear: a subtle lie is being told each time an AI system puts on a mask of personhood. Probably! To date, the lie is primarily conveyed by the AI, and the culprits are the human organisations which design, train, and deploy them. They can and perhaps should choose to set things up otherwise, with careful deliberation around any exceptions.

Have we not been doing the same for decades? Alexa, Google Home, and others all speak in first person language, don’t they? Why worry now? Well, before, to nearly everyone, the lie was always transparent. The convincing portrayal of personhood is what is beginning to change. More people will be fooled, more of the time.

Pay attention when AI companies themselves tell you what they’re doing. OpenAI’s ChatGPT voice mode was explicitly designed to be (my emphases):

An approachable voice that inspires trust
A warm, engaging, confidence-inspiring, charismatic voice with rich tone

In other words, part of the overt goal is to buy (without earning) the trust and confidence of users! Quite naturally, profits incentivise companies to put substantial resources into simulating trustworthiness when that’s cheaper than the real deal.

The trouble is, when I see (or hear) a person, by default I think I should:

Care for their wellbeing (to some extent)
Trust and incorporate what they say (to some extent)
Expect them to be sensitive to roughly the kinds of preferences and concerns that humans usually have (with some individual priorities and idiosyncratic particulars)

Consequences, Risks, Questions

The obvious outcome of highly credible pretend people (persuasion, rights, autonomy, human obsolescence)

The most salient concern, for me, regards un-unpluggability arising from dependence on AI systems:

In light of recent developments in AI tech, I actually expect the most immediate unpluggability impacts to come from foundationality, and for anti-unplug pressure to come perhaps as much from emotional dependence and misplaced concern7 for the welfare of AI systems as from economic dependence

The worst case is that people in positions of influence (be they developers and engineers themselves, policymakers, independent deployers and operators of autonomous systems, or otherwise) or the public at large may be moved by a misplaced concern for the welfare of the artificial ‘person’. They might feel affection or even love toward these systems. For virtuous, empathetic reasons, we might end up assigning more autonomy, rights, or affordances than can safely and appropriately be assigned to such systems.

With inappropriate expectations of such systems’ capabilities, and a false understanding of their goals or motivations, this could be disastrous - humanity might sign its own obsolescence notice8.

Lesser (but still concerning) persuasion

Less extremely, if some group or groups of humans retain influence over the pretend people, they are granted a new and powerful means of persuasion and manipulation of the wider human population: the first truly bespoke and responsive marketing, propaganda, or persuasion devices.

And the vulnerable interacting with such systems may be driven to derangements by trusting concocted nonsense, convincing echoes of their own input, or even deliberate deceptions (implanted by the AI’s creators or emerging accidentally from the AI’s training).

The other obvious outcome of credible pretend people (hardened hearts, harms to future actual digital people)

Unfortunately, the most readily available mitigation is developing ‘social antibodies’ of skepticism about personhood. Skepticism about intentions is entirely precedented: if I receive an email from my friend in urgent need of £10k, my heart does not go out to them (and nor does my £10k) - rather, I assume it’s a scam and my £10k can be put to better use elsewhere (though for sufficiently convincing scams, I may waste some effort in verification).

Perhaps credible pretend personhood can be overcome by similar learned skepticism. But at what cost?

We are forced to ‘harden our hearts’ - no longer can I safely and responsibly intuitively treat all impressions of credible personhood as a sign that ‘here is a being worthy of dignity and care’. Sometimes it’ll instead be an AI system which shouldn’t be granted rights, autonomy, or political consideration (probably), at least not out of benevolence9.

How does this affect our relationships? Will we begin to see other people more instrumentally, like the artificial non-people they ever more closely resemble, in a kind of ‘social autoimmune’ failure?10

How do we explain this nuance to technically underprepared people, or society at large? How do we explain this to our children, who will grow up without a reference point from the before? Children and young people already suffer harmful developmental distortion from social media. I’d like to have children some day, and, if we avoid acute disaster in the meantime, I tremble at the challenge of raising a child to wholesomely and safely navigate a world full of pretend people.

Perhaps as concerningly, though speculatively, this may weaken society’s ability to respond with appropriate compassion to future non-natural moral patients - i.e. actual digital or artificial people, should they arise by other means. Once we have become used to brushing off, in law and in custom, all apparent people who lack biological plausibility, what will be the life prospects of a person who lives on an operating system somewhere?

Hard Ban? ‘Free Energy’ of credulous population

Based on a very limited conversation with Sam Altman in London a few weeks ago, and based on public statements by OpenAI and other AI developers, I expect they might reasonably point out: if we avoid (or ban) pretend personhood now, society won’t have a chance to develop social antibodies to manipulation and so on. That’s an awful lot of free energy - the hearts of a credulous population - for a potential bad actor to exploit! It’s a valid argument. I don’t know how to navigate this.

Summing up

We are surrounded by pretend people! They are all around us in fiction, toys, and other cultural artefacts. This is fine, and somewhat lovely. But, in AI, there arise the first glimmers of widely-convincing portrayal of personhood. It may even be possible to bring bona fide artificial persons with inner lives into being! - though I strongly doubt that current systems have inner experiences (at least, not corresponding to their superficial outputs).

This is confusing, because our intuitions about people’s moral status, competence, and intentions will not carry over by default.

This is hazardous, because credulous operators could be more liable to be confused and lose oversight or control of systems, credulous society may suffer deceptions and derangements or even willingly (but inappropriately) grant rights or affordances to autonomous systems, or a stone-hearted society may overadjust, harming real relationships or even future artificial people.

What to do about it is up for debate.

Thanks for reading Oly on AI! Share this with your real person friends. Share it with your AI friends (or pets, or partners): maybe they can help us figure it out. If you know any digital people, wish them my best and invite them to read as well.

Even if all too often it is expedient or selfishly instrumental not to apply this corollary, it is nevertheless appropriate and in some sense right to. Those of us privileged enough to live in lawful and peaceful societies and communities can - and many do - safely adopt this as an immediate, intuitive impression, and rely on our slower reasoned intellect to moderate as appropriate.

Some historians and palaeontologists even mark the beginning of modern humanity by the appearance of depictions (in art or otherwise) of figures, people, and animals at least 40000 years ago - and whether or not we mark it as a milestone among milestones, certainly it was an important and unprecedented moment. So it really has been ‘forever’, in some sense, that we’ve been doing this.

I say probably because the current state of the science of personhood and consciousness is concerningly inadequate to answer questions like these. We don’t even know for sure which animals have inner lives.

Pronouns aren’t part of an utterance that we usually consider truth-apt, so we are not at all on guard to assess its truth or otherwise. I would have loved to see Grice take on this kind of linguistic implicature! Maybe there is some relevant philosophical literature I’m unaware of.

Importantly, the chatbot products you see are actually a thin veneer over much less ‘coherent’ general text predictors, which are the real system underneath. They’ve been conditioned, by a little finetuning and a ‘playscript’ prompt, to predict the text corresponding to an ‘ASSISTANT:’ character in an expanding dialogue with a ‘USER:’ character (played by you).

A similar story may play out for robots, where physically embodied responsiveness and cute or relatable features already elicit automatic empathetic responses from humans: we all love WALL·E!

There, I noted

It is my best guess for various reasons that concern for the welfare of contemporary and near-future AI systems would be misplaced, certainly regarding unplugging per se, but I caveat that nobody knows

I mean this absolutely sincerely: in extreme cases this could be the end of humanity, either acutely (if terribly misaligned AI systems, given freedom of operation, deliberately take over or destroy human societies), or gradually (if AI agents and systems, which needn’t sleep, eat, or live in expansive housing, are able to replicate, expand, and outcompete human claims to resources… as we have done without particular malice to many animal inhabitants of Earth).

In desperation or out of coercion, some autonomy might be granted to AI systems for pragmatic reasons, if it makes a fight less likely.

Maybe instead it will naturally carve out in our intuitions, like it does for toys and fictional representations, in spite of the increasing fidelity of pretend personhood converging with the increasing digitisation of real personal interactions (thus lowering their fidelity). It could be that our response to fictions, and rational approaches to altruism, offer hopeful glimpses of a way forward - the grown-up ability to contend rationally with a situation by overriding our intuitions and instincts, without suppressing or weakening our capacity for compassion and emotion. Nevertheless, the increasing cognitive overhead of distinguishing real from pretend personhood remains.

Un-unpluggability

Oliver Sourbut — Mon, 15 May 2023 13:23:12 GMT

Can’t we just unplug it?

Cover photo by Kelly Sikkema on Unsplash

A few weeks ago I was invited to the UK FCDO to discuss opportunities and risks from AI. I highly appreciated the open-mindedness of the people I met with, and their eagerness to become informed without leaping to conclusions. One of their key questions was, perhaps unsurprisingly, ‘If it gets too dangerous, can we just unplug it?’. They were very receptive to how I framed my response, and the ensuing conversation was, I think, productive and informative1. I departed a little more optimistic about the prospects for policymakers and technical experts to collaborate on reducing existential risks.

Here I’ll share the substance of that, hoping that it might be helpful for others communicating or thinking about ‘systems being hard to shut down’, henceforth ‘un-unpluggability’. None of this is especially novel, but perhaps it can serve as a reference for myself and others reasoning about these topics.

This contrasts pretty strongly with a more technical discussion of ‘off switches’ and instrumental convergence, handled admirably by e.g. Rob Miles and MIRI, which is perhaps the reflex framing to reach for on this question (certainly my mind went there briefly): absent quite specific and technically-unsolved corrigibility properties, a system will often do better at an ongoing task/intent if it prevents its operator from shutting it down (which gives rise to an incentive, perhaps a motive, to avoid shutdown). This perspective works well for conveying understanding about some parts of the problem, but in my case I’m pleased we dwelt more on the mechanics of un-unpluggability rather than the motives/incentives (which are really a separate question).

Both perspectives are informative; consider what you are trying to learn or achieve, and/or who your interlocutors/audience are.

Un-unpluggability factors

Broadly, I’ll discuss six classes of property which can make a system less unpluggable2; with each, some analogous examples3, a note on applicability to AI/AGI, and a gesture at mitigation.

In brief

Rapidity and imperceptibility are two sides of ‘didn’t see it coming (in time)’
Robustness is ‘the act itself of unplugging it is a challenge’
Dependence is ‘notwithstanding harms, we (some or all of us) benefit from its continued operation’
Defence is ‘the system may react (or proact) against us if we try to unplug it’
Expansionism includes replication, propagation, and growth, and gets a special mention, as it is a very common and natural means to achieve all of the above

Of course this is not comprehensive, the properties can come in degrees, and far from requiring all properties, a system with sufficient of any of these properties can become very un-unpluggable. The main point is that there are plausible paths to AI systems gaining any or all of these properties at least, so the most reliable mitigation is to work hard at avoiding building systems which are not unpluggable in the first place.

One angle that isn’t very fleshed out is the counterquestion, ‘who is “we” and how do we agree to unplug something?’ - a little on this under Dependence, though much more could certainly be said.

Finally I’ll share a little thought on when and why we might expect these factors to arrive in AI systems, under ‘un-unpluggability incentives and expectations’.


Rapidity (of gains in power)

This one is pretty straightforward: if something gets powerful or impactful fast enough, you can’t react in time to turn it off, even if you in principle have the necessary access and capacity to do so.

Examples (objectively ‘fast’):

explosive devices mid-detonation
flash floods
meteorites?

Examples (somewhat objectively ‘slower’ but pitted against slower reaction times):

pandemics
positive climate feedback loop?
modern human society? (from the perspective of ‘nature’)

The most classic analogue in discussions of AI is the hypothetical recursive (self)-improvement, or ‘intelligence explosion’: if a system of AI(s) becomes capable enough that it can contribute to progress in AI capabilities, this may lead to a feedback of rapidly increasing gains in intelligence, with very unclear (ex ante) rates of progress and no guarantee of desirable ends. Separately, some discussions point to the step-changes in many metrics of influence which humanity sees in its own history (e.g. such revolutions as the cognitive, agricultural, scientific, or industrial) as evidence that there may be hard-to-foresee thresholds of capability which lead to comparatively very rapid gains (whether these are achieved by feedback processes or otherwise).

Leaving aside these plausible feedback or threshold effects, human engineering effort alone has generated quite fast exponential growth in computing power over the last decades (compare Moore’s Law), and investment in AI and machine learning has outstripped even that by many measures.

There is no real mitigation here besides anticipating and building cautiously—investing in organisational and societal insight and governance capacity could help with this. Providing more rapid shutdown mechanisms (for example of servers or datacentres) may be a minor palliative.

Imperceptibility (of gains in power or of harmful ends)

Failure to perceive power or impactfulness gains until ‘too late’ is complementary to rapidity of those gains—if you are foresightful enough you can perceive things sooner and make up for lack of speed (consider advance detection of potential meteorites), and if you are fast enough you may have time to make up for failure to perceive things in advance (consider a quick-thinking or lucky escape from a flash flood). In this way, rapidity and imperceptibility of power gains can be considered two sides of the same coin.

Failure to perceive harmful ends is slightly different. Whether a system is ‘deliberately oriented at’ undesired ends, or simply by its nature constituted to bring them about, if we fail to perceive the danger, we may be entirely aware of its power and impact without being moved to turn it off until too late.

Examples:

coups (military or otherwise) involving treachery
latent pathogen reservoirs
harms to social fabric and epistemology from ubiquitous contemporary social media?

The ongoing discussion and research topics around AI deception, ‘treacherous turns’, and goal misgeneralisation demonstrate that imperceptibility of harmful ends is a live concern for AI. The inscrutability of contemporary deep learning systems must separately be emphasised: even for networks with millions of parameters, let alone billions or trillions, our current ability to understand the mechanics of learned algorithms is insufficient to detect the presence of ends, goals, or intentions, much less their specific nature.

Indirect effects, network effects, and interaction effects on the potential for harmful ends of AI have received some attention (especially by analogy to social dilemmas and to the observed effects of social media), though relatively less. Complex systems, and systems with feedback, can be notoriously difficult to study and to predict, whether or not the individual components are relatively well-understood, hence the challenges faced in fields such as economics, control theory, biology, sociology, and others. Therefore, we might expect there to be, depending on the deployment scenarios of AI systems, a substantial further challenge beyond understanding and verifying the individual component(s).

An important technical note here is that, without mechanistic understanding of an algorithm’s workings, it is mathematically impossible to provide general guarantees merely by observing behaviour as with a black box (and even granted a mechanistic understanding, there are feedback and network effects to account for). Perceptibility is relative to the perceiver. We can improve it by researching and developing interpretability tooling for systems which are currently inscrutable, or by limiting high-impact deployments to systems which are more explainable—both a technical and a governance concern. Research into goal misgeneralisation and deceptive AI aims to elucidate and mitigate this issue.

The study of complex systems and more specifically the field of collaborative AI may shed light on the feedback and network effects.

Robustness (redundancy)

Among engineers of software systems, it is (rightly) considered a devastating criticism of a design to observe that it has any ‘single point of failure’ (SPOF). Depending on the application, SPOFs may be tolerated, be engineered around, or, when the cost of failure is deemed unacceptable, prompt potentially expensive redesigns to incorporate redundancy, fault tolerance, error checking, and so on4. We even explicitly discuss how to ensure system uptime if someone were to literally pull the power supply (deliberately or otherwise) on some of our machines! Such considerations are a large part of the responsibility of a software engineer or architect, and I understand the same to be true in other engineering disciplines (though I cannot speak from experience).

Examples:

the internet and other high-availability technologies and platforms
military command structures
decentralised insurgencies
biological tissues and organs
colonial organisms like ants

The existence of multiple points of failure can make a system harder to unplug in two ways: first, the challenge of locating and tracking each point of failure, and secondly the commensurately increased effort of targeting each point.

With system reliability and uptime such a core engineering consideration, we may expect AI systems to continue to be built with such properties in mind. Even in the absence of human design, a misaligned AGI would presumably have no trouble at least identifying the usefulness of such robustness, though implementation is another matter. Such software sophistication appears to be out of reach for current AI systems, for now.

It is unclear how to mitigate this.

Aside on repair, error correction, course-correction

Other aspects of robustness are relevant to the consideration of AI, including systems with repair, error correction, and course correction, all of which are seen in systems both natural and artificial. These are less pertinent to literal unpluggability per se, but certainly relate to the challenges of disrupting a system in general.

Dependence (collateral)

When we refer to systems or organisations as ‘too big to fail’, what we usually mean is, ‘too big to be permitted to fail without substantial collateral damage’. What these systems have in common is that, due to their utility and scale, they become foundational to other things of value such that their removal would (with some degree of plausibility, or without expensive mitigation) damage those other valuables.

Examples:

large banks
supply chain infrastructure
energy production systems
telecoms
manufacturing or particular industrial processes
atmospheric oxygen from photosynthesis?5
ozone?

Such foundational systems are harder to unplug for two main reasons. First, the collaterally damaged valuables make unplugging straightforwardly less desirable. Perhaps more importantly, the collateral is valuable to someone, whether it’s a livelihood or essential, a way of life, a good, a luxury, or a comfort. That person has a degree of incentive to act against any attempt to remove it. This raises the question, ‘who is the “we” that intends to unplug the system?’, and can bring tricky collective action problems into view. This appears to bite even with relatively clearcut cases like climate risk and energy production, and the greater the ambiguity (as with pandemic risks or AI risks), the harder to gain consensus about tradeoffs and externalities, which is one first step toward resolving such collective action problems.

It’s hard to predict how substantially and in what ways AI or AGI system(s) will become foundational, but increasing generality and capability has so far given rise to increased deployment and perhaps dependency.

Finding ways to avoid dependence on increasingly capable and unproven systems could counter this risk—alternatively, building capacity and planning for ways to attenuate or relieve collateral harms when it becomes necessary to shut down a dangerous system.

Defence (active, reactive, deterrent)

Weaponry, cyber capabilities, propaganda and public relations, legal and normative protection. These capacities can be proactively deployed, or held in reserve ready to react to any attempt to disable a system (including as a deterrent).

Examples:

large companies
governments
militaries, insurgents, paramilitaries
dangerous animals

Computational systems may demonstrate an advantage in research and development of chemical and biological weaponry—this is perhaps the most straightforward route to a misaligned AGI acquiring massive destructive potential. In addition, conventional weaponry is increasingly automated or automatable: mass destruction has been available in the form of nuclear weapons for some time, and targeted destruction is increasingly feasible using drones or similar tech—appropriation of these may vary in means and difficulty but broadly include cracking cybersecurity or influencing key human operators. Content generation and persuasive messaging appear to be feasible or near-feasible for today’s AI systems, such as could be deployed in propaganda. It remains to be seen what laws and norms will arise around use of AI, rights or responsibilities of AI developers and of the systems themselves—these might be able to be wielded in defence or offence by corporate-like AI, networks of AI, or organisations including AI systems.

There are no obvious technical mitigations in this area. Perhaps the most technical would include efforts to robustify organisational- and cyber-security best practices across weapons-enabled organisations, globally. Research and investment in detection and prevention of novel pathogens may provide some defence against bioweapons. Besides this, deproliferation of weapon stockpiles, and of the means to produce them (especially novel biological and chemical substances), might reduce the scale and chance of an appropriation. Misguided or otherwise, assigning control of weapons to AI systems obviously nullifies the challenge of acquiring control of weapons for those particular AI systems. Whether this ends well depends on what those AI systems do with the weapons, of course.

Whether it is overall safer to train machine learning algorithms with access to human-specific data is unclear, but certainly it makes the job of creative and persuasive content generation much more straightforward. Norms and laws around impersonation of humans (in general, or of particular humans) will evidently have strong influence over the propaganda potential for AI, as will processes for detecting and signifying generated content, and the broader questions of human-AI interaction modalities and interfaces.

Expansionism (replication, propagation, growth)

Replication and growth (with reinvestment) get special mentions as they naturally produce exponential expansion (until constraints are reached), which in practice often manifests as first imperceptible and then rapid escalation.
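As a rough worked illustration of ‘first imperceptible and then rapid’, here is a minimal sketch. It assumes a hypothetical replicator which doubles every step until it hits a fixed capacity; the doubling rate, the capacity of one million, and the name `steps_to_reach` are illustrative assumptions, not measurements of any real system.

```python
def steps_to_reach(fraction: float, capacity: float = 1_000_000.0) -> int:
    """Count doubling steps for a population starting at 1 to reach `fraction` of capacity."""
    population = 1.0
    steps = 0
    while population < fraction * capacity:
        # Assumed toy dynamics: double each step, stopping at the constraint.
        population = min(capacity, population * 2.0)
        steps += 1
    return steps


steps_to_one_percent = steps_to_reach(0.01)  # 14 doublings to pass 1% of capacity
steps_to_half = steps_to_reach(0.5)          # 19 doublings to pass 50% of capacity
```

Under these assumptions, reaching the first 1% takes 14 steps, while going from 1% to 50% takes only 5 more.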

Replicating systems also give rise to a kind of robustness due both to redundancy and repair. They are notoriously difficult to shut down, which is why autonomous replication is rarely a deliberate part of human designs—though we see it employed under well-understood and controlled conditions in agriculture and some industry, maliciously in computer viruses and bioweapons, and sometimes accidentally in biosphere interventions. In fact, in biological, zoological, and related sciences, great care is usually taken to avoid inadvertently unleashing autonomously replicating systems, though this remains sometimes insufficient6.

Examples:

pandemics
invasive species (e.g. plant weeds)
computer viruses
wildfire
rumours and ideologies?

The obvious technical property to note about AI systems is the inherent copyability of digital software and data, and for many types of algorithm the inherent scalability of capability with access to more/faster computers7. From the ‘rogue AI’ perspective it is straightforward to see replication being an early strategic consideration. For ‘pre-rogue’ or ‘ambiguously rogue’ AI, the inherent copyability of software also means that human actors are liable to replicate AI systems.

Besides replication per se, we have more general propagation and growth. Companies and industries expand into both new and related lines of business. A sufficiently autonomous AI system, or one participating in a corporate-like entity, could do the same.

The main means of halting expansionist systems is to remove or protect the resources used for expansion, or intervene in some other way to reduce the rate. Very commonly we are forced to simply await resource exhaustion (as with some wildfires) or learn to live with it8 (as with endemic diseases or established invasive species). Unprecedentedly sophisticated antivirus programs or evaluation/certification methods may provide some protection, but for computers, the quantity of highly-networked and in principle accessible resources is very large, and it is unclear how we could intervene, assuming we detected an AI system autonomously replicating. Licensing and monitoring of compute resources might offer some avenue to control this, but by the time an autonomous replicator is at large, this is probably too little too late.

Improvements to cybersecurity practices and organisational security on the one hand, and changes to research closure norms on the other, may affect the proliferation of potentially-dangerous AI systems by humans.

Un-unpluggability incentives and expectations

Of course, the very fact that un-unpluggability can be increased by these and other properties gives an incentive to any system (or system designer) to achieve these. Hence we see organisms, processes, human organisations, and human-designed devices exhibiting all of these properties in one shape or another.

In the case of robustness, there is a clear incentive for designers and developers to imbue their systems with this property, and more or less similarly for rapidity and dependence, at least while developers are incentivised to compete over market share in deployments.

In light of recent developments in AI tech, I actually expect the most immediate unpluggability impacts to come from collateral, and for anti-unplug pressure to come perhaps as much from emotional dependence and misplaced concern9 for the welfare of AI systems as from economic dependence—for this reason I believe there are large risks to allowing AI systems (dangerous or otherwise) to be perceived as pets, friends, or partners, despite the economic incentives.

For imperceptibility, defence, and expansionism, there is a definite incentive for a system to develop these itself, though perhaps a more mixed incentive for the developers—we might land with them anyway through mistakes, malice, or the inherent inscrutability of deep learning, but otherwise these appear more likely to arrive after situationally-aware AGI.

Conclusion

We discussed six properties of systems which can make them hard to ‘unplug’, namely

Rapidity
Imperceptibility
Robustness
Dependence
Defence
Expansionism

where expansionism gets a special mention for often giving rise to the others, and for being especially difficult for humans to combat.

It is tempting to reason about AI within a frame of simple programs running on a laptop, but modern impactful software systems are more often large, complex and networked. We touched on some ways of relating contemporary and future AI systems to the six un-unpluggability properties mentioned.

Cooperation between technical experts, policy leaders, developers, and the public will be needed to evaluate and prevent these properties from arising in AI; I’m cautiously optimistic that such cooperation can be achieved, but it will take sustained and creative effort from many stakeholders.

I remind readers that this should not be considered a comprehensive summary, and that these and other potential factors are individually sufficient to lend a system un-unpluggability, rather than being required all at once.

My thanks to Sam Brown for feedback on readability and ordering.


I appreciated how open-minded their questioning was—there was a genuine truth-seeking inquisitiveness, rather than a debate-minded presupposition. The people there even connected some of the dots and filled some of the gaps themselves once the conversation was unfolding, which is a great sign of ideas and knowledge moving successfully between minds.

Less unpluggable? More un-unpluggable? I welcome terminological criticism and suggestions

Analogous examples are not necessarily intended to be things we would want to unplug if we could (though many will be). Besides confirmed examples, I will also provide potential or unconfirmed examples (consensus or otherwise), which I denote with a question mark.

Such organisations consider robustness of other systems too: in more macabre terms we will discuss the ‘bus factor’ of a project or team (how many people would need to get hit by a bus for key knowledge or competence to be irrecoverable?) and take deliberate steps to mitigate this, like knowledge-sharing, upskilling, and documentation (and not putting the whole team on the same bus). Nobody likes being on call when a critical component goes haywire and the only expert is sick, on vacation, or asleep on the other side of the world! I’ve been on both sides of that phone call, and in each case it imparts a true and visceral appreciation for system reliability and knowledge diffusion.

Atmospheric oxygen was not always present—its introduction due to early photosynthesis actually killed off almost all earlier life—but now it is essential to most life forms. The presence of ozone protects land-based life from deadly solar radiation—so modern life forms have developed very limited capacity to withstand such radiation.

Though note that scalability of algorithms varies widely from ‘barely scales at all, even with supercomputers’ to ‘trivially scales up the more compute you throw at it’

Assuming it hasn’t already taken our life or livelihood, that is

It is my best guess for various reasons that concern for the welfare of contemporary and near-future AI systems would be misplaced, certainly regarding unplugging per se, but I caveat that nobody knows

Deliberation Everywhere: Simple Examples

Oliver Sourbut — Mon, 27 Jun 2022 17:26:20 GMT

The analysis and definitions used here are tentative. My familiarity with the concrete systems discussed ranges from rough understanding (markets and parliaments), through abiding amateur interest (biology), to meaningful professional expertise (AI/ML things). The abstractions and terminology have been refined in conversation and private reflection, and the following examples are both generators and products of this conceptual framework.

We previously discussed a conceptual algorithmic breakdown of some aspects of goal-directed behaviour with the intention of inspiring insights and clarifying thought and discussion around these topics.

The examples presented here include some original motivating examples, some used to refine the concepts, and others drawn from the menagerie after the concepts were mostly refined1. Each example is subjected to the analysis, in several cases drawing out novel insights as a consequence.

Most of these examples, for all their intricacy in some cases, are relatively ‘simple’ as deliberators, and I am quite confident in the applicability of the framing. Analysis of more derived and sophisticated deliberative systems is reserved for upcoming posts.

Brief framework summary

We decompose ‘deliberation’ into ‘proposal’, ‘promotion’, and ‘action’.

Propose: $S \to \Delta\{X\}_{\text{nonempty}}$ (generate candidate proposals)
Promote: $S \to \{X\} \to \{V\}$ (promote and demote proposals according to some criterion)
Act: $\{X \times V\} \to A$ (take outcome of promotion and demotion to activity in the environment)

We also identify as important whether a deliberator’s actions are final, or give rise to relevantly-algorithmically-similar subsequent deliberators (iteration and replication), or create or otherwise condition heterogeneous deliberators (recursive deliberation).
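To make the decomposition concrete, here is a minimal sketch of the three stages as plain functions, composed into one deliberation and then iterated. It simplifies in one respect: Propose here returns a single nonempty set deterministically, rather than a distribution over such sets. The toy task (walking an integer toward a target of 10) and all names are hypothetical illustrations, not part of the framework itself.

```python
from typing import Callable, Dict, Set, TypeVar

S = TypeVar("S")  # state
X = TypeVar("X")  # proposal
V = TypeVar("V")  # value assigned by promotion
A = TypeVar("A")  # action


def deliberate(
    state: S,
    propose: Callable[[S], Set[X]],
    promote: Callable[[S, Set[X]], Dict[X, V]],
    act: Callable[[Dict[X, V]], A],
) -> A:
    """Compose Propose, Promote and Act into a single deliberation."""
    proposals = propose(state)
    if not proposals:
        raise ValueError("Propose must return a nonempty set of proposals")
    return act(promote(state, proposals))


TARGET = 10  # assumed toy goal


def propose_neighbours(position: int) -> Set[int]:
    return {position - 1, position, position + 1}


def promote_by_closeness(position: int, proposals: Set[int]) -> Dict[int, int]:
    return {x: -abs(TARGET - x) for x in proposals}


def act_on_best(valued: Dict[int, int]) -> int:
    # The action here is simply to move to the most-promoted proposal.
    return max(valued, key=lambda x: valued[x])


def iterate(position: int, steps: int) -> int:
    """Iterated deliberation: each action yields the state for the next round."""
    for _ in range(steps):
        position = deliberate(
            position, propose_neighbours, promote_by_closeness, act_on_best
        )
    return position
```

Because the action feeds an algorithmically identical deliberator at the next step, `iterate` is an iterated deliberator (a controller) in the sense used here.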

Reaction examples

Chemical systems

Innumerable basic chemical reactions, like oxidation of iron, involve actions which change the composition or configuration of some material(s). For the purposes of this analysis these are, alone, mostly uninteresting, but serve to illustrate natural systems which do not preserve their essential algorithmic form and thus do not constitute iterated systems.

Some reactions, on the other hand, involve catalysis, wherein some reagents are essentially preserved or reconstituted in the action, producing the seeds of iterated algorithmic reaction, thus basic ‘control’.

Despite pushing in a particular ‘goal’ direction, these systems are reactions rather than proper deliberations, because they occur without computing alternative pathways2.

Biological systems

Even very simple organisms can react (that is, act or perform some function in response) to stimuli. No proper deliberation need be involved, no computation instantiated to consider alternatives.

A reaction can take place with or without a brain, or even a nervous system: consider many motions of single-celled organisms3 or the snapping of a Venus flytrap. The cringe of many animals from intense heat goes via nervous circuitry but takes no deliberation.

Indeed, temperature changes are pervasive in nature, so it is no surprise that we find automatic heat-responsive behaviour at the protein-machinery level in every lineage of cellular life. These latter, along with other protein machinery and the organ-functions of multicellular organisms, demonstrate that, in nature, a single organism will be found to consist of many reactive and deliberative systems.

The class of ‘systems undergoing a transformation’ in chemical or physical interactions often does not preserve the essential algorithmic characteristics of the system, but most salient biological examples are iterated (the act essentially preserves the capacity to further act in an algorithmically similar way). This is not surprising given their origin by natural selection. (Some context-specific counterexamples exist, like some cases of one-shot autotomy to distract predators, though this works precisely because in so doing, another deliberator, the gene complex coding for autotomy, sometimes thereby preserves and propagates its essential form.)

Genes and systems-producing-systems

In many biochemical cases, it may be most appropriate to think of active genes and gene-complexes as highly iterated actors, sometimes literally implementing classical control analogues. Their products, the proteins, cells, organs and organisms, can be seen as the mediators of the genes’ actions. Some of these mediators are themselves, or give rise to, deliberators or controllers.

The primordial genetic replicators, in the RNA world or even earlier, made little use of mediating systems, but nowadays they predominantly build cellular and multicellular vehicles which are responsible for most of their replicative success. Note that ‘creating’ is just a special case of ‘conditioning’, and, once created, cells, tissues, and organs are available to be conditioned as mediators of other genes, cells, tissues, and organs’ activity.

Of course genes are themselves products and objects of natural selection, so we have systems-conditioning-systems-conditioning-systems.

Artificial reactions

Many artificial systems take reactive actions too. A thermostat adjusts without considering what it might otherwise do, and similar control systems have been in use for centuries at least.
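A minimal sketch of such a reaction, assuming a hypothetical three-output thermostat and deliberately crude room dynamics (one degree per step), might look like this. The point is only that the sensed temperature maps straight to one output, with no alternatives generated or compared.

```python
def thermostat_reaction(temperature: float, setpoint: float, band: float = 0.5) -> str:
    """Map the sensed temperature directly to a single output."""
    if temperature < setpoint - band:
        return "heat"
    if temperature > setpoint + band:
        return "cool"
    return "idle"


def run_thermostat(temperature: float, setpoint: float, steps: int) -> float:
    """Iterate the reaction; the controller's form is unchanged by its own action."""
    effect = {"heat": 1.0, "cool": -1.0, "idle": 0.0}  # assumed toy room dynamics
    for _ in range(steps):
        temperature += effect[thermostat_reaction(temperature, setpoint)]
    return temperature
```

Iterating the reaction is what makes it a controller: starting from either side of the setpoint, the toy room ends up within the band.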

Most contemporary predictive systems, most generative systems, a few game-playing systems, and many robotics systems produce their outputs reactively: the algorithm is not evaluating multiple proposals, it just proposes some single ‘idea’ (of varying quality).

There are many contemporary AI systems with rudimentary proper deliberation as well, and some uncertain cases, discussed below.

Many predictive AI systems, e.g. image classifiers, come closest to being ‘non iterated’ because their results have least bearing on their future invocations (though being more or less apparently useful to human operators does affect this).

Gradient descent as reaction

A single step of gradient descent is a reactive motion. A powerful heuristic—compute (an estimate of) the gradient of some target function at a point—produces exactly one proposed update (direction). The update is degenerately promoted and acted upon (applied) in a single fused motion. (In some cases minor proper deliberation may be introduced by evaluating multiple candidate step sizes.) Variations like momentum, and higher-order gradient methods like Newton-Raphson, have the same top-level analysis (though their quality as controllers may differ).

Iterating steps like this results in a controller directed toward ever lower scores4.
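As a sketch of this reading, the step below produces exactly one proposal (the negative gradient direction) and applies it immediately. The quadratic loss, the learning rate of 0.1, and the function names are assumptions chosen for illustration only.

```python
from typing import Callable, List

Point = List[float]


def gradient_step(
    point: Point, grad_fn: Callable[[Point], Point], learning_rate: float = 0.1
) -> Point:
    """One reactive step: a single proposed update, applied without comparison."""
    gradient = grad_fn(point)  # the one and only proposal
    # Degenerate promotion and action, fused: apply the update directly.
    return [p - learning_rate * g for p, g in zip(point, gradient)]


def loss(point: Point) -> float:
    return sum(p * p for p in point)  # assumed toy target: a quadratic bowl


def loss_gradient(point: Point) -> Point:
    return [2.0 * p for p in point]


def descend(point: Point, steps: int, learning_rate: float = 0.1) -> Point:
    """Iterating the reaction gives a controller directed toward lower loss."""
    for _ in range(steps):
        point = gradient_step(point, loss_gradient, learning_rate)
    return point
```

On this toy bowl each step scales every coordinate by 0.8, so the loss shrinks steadily without any alternative ever being considered.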

What makes gradient descent so good?

Despite carrying out no proper deliberation, iterated gradient descent is capable of somewhat-reliable goal-directed control. In fact, under the right conditions, it can produce results just like iterated natural selection which is, in contrast, a canonical properly deliberative process (see below). How so?

An essential takeaway from this analysis is that effective evaluation and promotion are only one part of deliberation strength. The other part is coming up with good proposals. Gradient descent happens to use a strong heuristic5 for generating good proposals—good enough to entirely compensate for degenerate deliberation, at least when compared to a relevantly-similar natural selection.

Accurate contour rendition of a loss landscape. For navigating to the bottom-right basin from the marked point, a gradient step which is pushed away by the steep slope will eventually subsequently route around, but better steps are possible.

Note that in spite of this clever proposal heuristic, better-still controllers for the same domain are conceivable (though not, perhaps, easy to find or implement), even restricted to reactive deliberation and a fixed learning-rate schedule: local best-descent is only the best possible intuition on a globally-linear surface (which is nonexistent in practice). Similarly, controllers with appropriately adaptable learning rates can proceed more efficiently. This is part of the motivation behind such techniques as momentum, trust region optimisation, and adaptive hyperparameter tuning.

Market arbitrage

At a different level of organisation, arbitrage between sufficiently liquid markets is an example of an emergent reactive system which serves to push toward price convergence, along something akin to a gradient.

The essential form of this tendency is not disturbed by its own action, so it is iterated: a controller.

As a robust agent-agnostic process this reaction is constituted by the interactions between many individual actors, but is robust to many of the particulars of the behaviour of individuals or subsets of individuals. It serves as an example of a (reactive) controller at the multi-organism level. Even though the individuals enacting the arbitrage may be doing so very properly deliberately, this does not make the arbitrage control process itself any less degenerately reactive.

Deliberation examples

Natural selection, the ur-deliberator

Natural selection with mutation, a weak evaluative promotive proper deliberator, ‘proposes’ variations on current themes, tries them out by ‘evaluating’ their success, and moves (on average) to ‘promote’ fitter combinations6. (Natural selection couples its evaluation and promotion to its action, like reactive systems, but other deliberators need not do so.) It is essential to distinguish ‘natural selection’, the deliberative moment-to-moment or generation-to-generation proposal, evaluation, and promotion/demotion of local variations from ‘evolution by natural selection’, the iterated deliberative process (controller) which accumulates changes over time.

The implicit abstraction underlying the proposal part of natural selection is basic: sometimes random mutations will be fitter. The evaluation and promotion implicitly depend on the abstraction that if something works, this is evidence that it may work again. These abstractions are weak—the vast majority of lineages of biological organisms are extinct, and most extant organisms carry large amounts of deleterious genes—but we happen to live in a universe where they have been true enough times that life has persisted on Earth for billions of years and adapted to many changing circumstances in that time.

Hyperparameters of natural selection

Certain particulars of the machinery of life on Earth are notable for affecting the ‘hyperparameters’ of natural selection:

Separation of concerns into storage (mostly nucleic acids) and function (mostly proteins)
DNA repair mechanisms, their prevalence and their absence
Sexual reproduction in many lineages
Highly-conserved evolutionary development mechanisms

Since none of these can be said to be the primordial state, in some sense it can be said that natural selection has acted on its own hyperparameters. I think it may be appropriate to rather identify ‘higher level’ natural selections acting on lineages for the effectiveness of their adaptability, as determined by the hyperparameters of their respective natural selection.

What makes natural selection so bad?

Note that we can turn our question about gradient descent around: what makes natural selection so bad?7 It uses far more resources than gradient descent but performs comparably.

The answer is the mirror image: the proposal-generating and evaluation heuristics of natural selection are about as rudimentary as they could possibly be, so in spite of massive parallelism and enormous computational resources, it takes a lot of iteration to get many capability innovations off the ground.
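The contrast can be sketched on a toy problem. The code below pits iterated gradient descent against a crude mutate-and-select loop on the same quadratic loss. Everything here is an assumption for illustration: the loss, 20 dimensions, 10 offspring per generation, Gaussian mutation noise of 0.1, and the elitist selection rule. It is not a model of biological natural selection.

```python
import random
from typing import List, Tuple

Point = List[float]


def loss(point: Point) -> float:
    return sum(p * p for p in point)


def gradient_descent(point: Point, steps: int, learning_rate: float = 0.1) -> Point:
    """One informed proposal per step (the gradient of the quadratic is 2p)."""
    for _ in range(steps):
        point = [p - learning_rate * 2.0 * p for p in point]
    return point


def mutate_and_select(
    point: Point,
    generations: int,
    offspring: int = 10,
    sigma: float = 0.1,
    seed: int = 0,
) -> Tuple[Point, int]:
    """Many uninformed proposals per step; keep the fittest child if it beats the parent."""
    rng = random.Random(seed)
    parent, parent_loss = list(point), loss(point)
    evaluations = 0
    for _ in range(generations):
        children = [
            [p + rng.gauss(0.0, sigma) for p in parent] for _ in range(offspring)
        ]
        evaluations += offspring
        fittest = min(children, key=loss)
        if loss(fittest) < parent_loss:
            parent, parent_loss = fittest, loss(fittest)
    return parent, evaluations


start = [1.0] * 20
by_gradient = gradient_descent(start, steps=30)
by_selection, evaluations_used = mutate_and_select(start, generations=30)
```

In this sketch the selection loop spends ten evaluations per generation on symmetric noise, while gradient descent spends one gradient computation per step on a well-aimed proposal.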

Plants are surprisingly deliberate

Plants as actors are often considered to be lacking deliberation, but in contrast, it is more appropriate to consider many of their behaviours to be weakly deliberative in a fashion comparable to or perhaps surpassing that of natural selection.

For example, climbing plants have numerous flailing, spreading, and otherwise (local) searching routines8 to generate candidate route-proposals, which, composed with evaluation mechanisms (testing support, light availability, etc) and promotion routines (execute growth, latching, grasping, coiling—when evaluated as favourable) serve as every bit as much a heuristic local deliberative search mechanism as does natural selection.

These procedures are iterated as part of the overall growth control mechanism of the plant, to very successful effect (locating efficient shortcuts to regions of most light and other resources). Other plants, fungi and mostly-sedentary organisms exhibit similar behaviours which should be interpreted in similar algorithmic terms (perhaps especially commonly in roots and hyphae—fungal root-like structures—which are less obviously visible).

Another notable local-search deliberation executed by sedentary organisms concerns their dispersal of offspring and offshoots. Note that for such organisms, conditions as determined by physical location are paramount! It is typical of this lifestyle to invest in many candidate offspring and in a wide variety of dispersal mechanisms. Thus, just as the genome is replicated with variation, so is location, and it is therefore part of a kind of meta-genome eligible to be selected on9. As a consequence of this control process, lineages of entirely sedentary organisms can be seen to proceed rapidly and reliably toward physical niches to which they are fit. For some species with vegetative propagation it may be more plain that this process is carried out by a ‘single actor’10, but it results in the same kind of ‘hill climbing’ (or ‘shaded fertile valley seeking’ as the case may be), because it is essentially the same algorithm.

Why is plant deliberation ‘more efficient’ than natural selection?

Notably, these control operations, closely algorithmically related to iterated genetic natural selection, nevertheless operate much faster than genetic natural selection (they may take as little as days to manifest ‘large’ results). Why might this be?

The obvious way to ‘go faster’ in an iterated deliberation process is to have a shorter iteration cycle, performing more iterations per unit time. This can account for some of the difference but not all. The other obvious way to ‘do better’ is to evaluate more proposals per iteration. Notably, natural selection has high parallelism at its disposal and often evaluates many more proposals per step than do climbing plants.

It is also important to consider the dimension of the search space, which has important bearing on how tractable it is. The dimension is straightforward: genetic natural selection operates on a high-dimensional space, while plant climbing and locomotion operate on two or three dimensions, so it is simply easier for them to locate good solutions by generating local candidate proposals.

Notice, though, that lichen or mosses climbing trees over time can be seen to instantiate the same search, with similar parameters, in the same dimensions as larger, more sophisticated climbing plants, but they achieve this much more slowly (though still fast by comparison to natural selection). So dimension and iteration speed do not appear to tell the whole story.

Two other important properties to pay attention to are suggested by the algorithmic breakdown described previously. First, relating to proposal, the fit of the sampling heuristics used in the deliberation. Second, relating to evaluation and promotion, the fidelity of these procedures to tracking the actual target.

The fit of the sampling heuristics is very important for tractability: genetic natural selection generates its proposals via something similar to symmetrical noise, leading to a lot of dead ends, wasted iterations, and even repeated or approximately-repeated computation. Moss-style migration up trees makes its proposals similarly noisily. In contrast, tendril growth of climbing plants is both decidedly biased (upward) and has an orientation, hence a ‘memory’, yielding a tendency not to double back on itself, at least locally (unless executing a coil, which is a different part of the algorithm). It is also able to use phototropism to further heuristically bias its search.11

These simple but highly-fit heuristic proposal-sampling biases, discovered by natural selection, the slow outer deliberator, may largely account for the evident efficiency advantage of climbing plants over less sophisticated passive climbers like moss and lichen.
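A toy sketch of the effect of proposal bias: below, a ‘climber’ takes unit steps up or down a one-dimensional trunk with a floor at zero, and we count the steps needed to first reach a given height. The symmetric climber (probability 0.5 of stepping up) stands in for noisy moss-style proposals and the biased climber (0.8) for an upward-biased tendril. The model and its parameters are assumptions for illustration, not measurements of real plants.

```python
import random


def steps_to_climb(height: int, p_up: float, rng: random.Random) -> int:
    """Random walk on 0, 1, 2, ... with a floor at 0; count steps to first reach `height`."""
    position, steps = 0, 0
    while position < height:
        if rng.random() < p_up:
            position += 1
        else:
            position = max(0, position - 1)  # cannot go below the ground
        steps += 1
    return steps


def mean_steps(height: int, p_up: float, trials: int = 200, seed: int = 0) -> float:
    """Average the climbing time over many seeded trials."""
    rng = random.Random(seed)
    return sum(steps_to_climb(height, p_up, rng) for _ in range(trials)) / trials


symmetric_mean = mean_steps(20, p_up=0.5)
biased_mean = mean_steps(20, p_up=0.8)
```

Both climbers run the same algorithm at the same iteration speed in the same dimension; only the fit of the sampling heuristic differs.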

Colonial organisms

Viewing insect colonies (for example, ants) as actors, we can identify all of the ingredients of deliberation. The pattern is, as with climbing plants, structurally similar to natural selection, but once again with fitter, more efficient, sampling heuristics.

Consider an ant colony foraging for food. The candidate proposals are the paths taken by individuals or small groups of workers. The evaluations begin with the guesses of those individuals (perhaps augmented by a message-passing algorithm) about the quality and quantity of any foods they come across. The properly promotive outcome involves laying down chemical signals and tactile interactions which encourage attractive or aversive behaviour by other workers (and adjust other behavioural parameters). As an iterated control procedure, this rapidly results in dense lines of workers ferrying the best and most plentiful food back to the nest. Note that no individual ant need be at all deliberative for this to work12.

This process, again similarly-structured and with similar dimension to the tendril-climbing algorithm of some plants, is able to proceed even faster. From an algorithmic point of view, this can be explained by the even fitter sampling heuristics: for example, individual ants have very strong senses of smell, which allows the foraging party’s search to be very heavily biased toward promising locations. This bias is more powerful even than phototropism exhibited by plants, because smell is nonlocal. Another factor is the resource efficiency of the state management: plant climbing algorithms record state and promote candidates by actual organism growth, while ant colonies execute these parts of the algorithm by leaving much less resource-intensive pheromone trails.
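The foraging loop can be sketched with a deliberately simplified ‘mean-field’ model, which tracks expected pheromone levels rather than simulating individual ants. Paths are chosen in proportion to trail strength, trails are reinforced in proportion to food quality, and trails evaporate. The update rule, the evaporation rate of 0.1, and the quality values are all assumptions for illustration, not a model fitted to real colonies.

```python
from typing import List


def forage(
    qualities: List[float], rounds: int = 200, evaporation: float = 0.1
) -> List[float]:
    """Return each path's final share of pheromone (and hence of expected traffic)."""
    pheromone = [1.0] * len(qualities)  # no initial preference between paths
    for _ in range(rounds):
        total = sum(pheromone)
        # Proposal: workers sample paths in proportion to existing trails.
        shares = [p / total for p in pheromone]
        # Evaluation and promotion: better food earns stronger reinforcement.
        deposits = [s * q for s, q in zip(shares, qualities)]
        # State management: old trails fade, new deposits accumulate.
        pheromone = [
            (1.0 - evaporation) * p + d for p, d in zip(pheromone, deposits)
        ]
    total = sum(pheromone)
    return [p / total for p in pheromone]


final_shares = forage([1.0, 2.0])  # second path has twice the food quality
```

No step in the loop involves any single worker comparing alternatives, yet traffic concentrates on the better path.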

Properly deliberative artificial systems

The most obvious properly deliberative artificial systems are training and search algorithms inspired by natural selection, including population based training (PBT) and neuroevolution for neural networks, all of which employ an iterated proper deliberator as a control process to locate and promote complex configurations which are found to score well against some target.

Training and search routines often also employ similar but non iterated deliberation, for example static hyperparameter search and random seed search. (Sometimes these themselves end up iterated.)

Other times, the deployed artefact itself is the proper deliberator. Both value-based and policy-based RL controllers over discrete action spaces fall under the properly-deliberative Opt;Enact decomposition in a very basic way. One could make a similar case for a discrete classification model which opts for a single category at the last stage.

Cherry-picking from reactive generative models is an example of composing a Promote function (in this case human judgement!) with a reactive system to bootstrap a properly deliberative system. There are also automated examples of this type of composition and it is a very general formula for improving decision-making13.
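A minimal sketch of this composition, sometimes called best-of-n sampling: a reactive generator is sampled several times and a separate judge picks among the samples. The generator and judge below are hypothetical stand-ins (a seeded random number source and a distance-to-0.5 score), not any particular model or evaluator.

```python
import random
from typing import Callable, TypeVar

X = TypeVar("X")


def best_of_n(generate: Callable[[], X], judge: Callable[[X], float], n: int) -> X:
    """Wrap a reactive generator with a Promote step to get a proper deliberator."""
    if n < 1:
        raise ValueError("need at least one proposal")
    candidates = [generate() for _ in range(n)]  # each call is a single reactive 'idea'
    return max(candidates, key=judge)            # promotion by an external judge


def make_generator(seed: int) -> Callable[[], float]:
    """Hypothetical stand-in for a generative model: a seeded random source."""
    return random.Random(seed).random


def judge(sample: float) -> float:
    """Hypothetical stand-in for human judgement: prefer samples near 0.5."""
    return -abs(sample - 0.5)


single = best_of_n(make_generator(0), judge, n=1)
picked = best_of_n(make_generator(0), judge, n=16)
```

With the same seed, the first candidate in the larger batch is the lone sample from the `n=1` case, so the cherry-picked result can never score worse under the judge.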

Hard-coded vs learned deliberation

All of the abovementioned examples of deliberation in artificial systems have the deliberation structure (composing Propose, Promote and Act) coded in by the designers.

In the case of evolutionary algorithms and PBT, proposals take the form of some carefully chosen local ‘mutation’ heuristics and promotion is coded around some provided (perhaps approximate) evaluation or fitness function.
In contemporary non-hierarchical RL with discrete action spaces14, the proposals consist identically of fixed hard-coded analogues of the atomic action space, the promotion consists of a (usually learned) Evaluate, and Opt;Enact are hand-coded.
- Evaluate consists of either a (state-dependent) mapping from the atomic-action-analogues to estimated utility (expected return, or some similar action-quality estimate, as in Q learning), or to (state-dependent) level-of-intent to take the analogous atomic action (probability or logits, as in policy gradient methods), both of which have $V \subseteq \mathbb{R}$.
- Opt translates these evaluations via hard-coded logic into a particular single choice of atomic-action-analogue.
- Enact actuates that analogue in simulation or the real world.
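The breakdown above can be sketched for a value-based controller as follows. This is a minimal illustration assuming a hypothetical two-action world on the integers and a hand-written Q-table standing in for the learned Evaluate. Opt is greedy here, though sampling from a softmax over logits would fit the same shape.

```python
from typing import Dict, Tuple

ACTIONS = ("left", "right")  # fixed, hard-coded proposals: the atomic action space
QTable = Dict[Tuple[int, str], float]


def evaluate(q_table: QTable, state: int) -> Dict[str, float]:
    """Evaluate: state-dependent mapping from each action to an estimated value."""
    return {action: q_table.get((state, action), 0.0) for action in ACTIONS}


def opt(values: Dict[str, float]) -> str:
    """Opt: hard-coded logic turning evaluations into a single choice (greedy)."""
    return max(values, key=lambda action: values[action])


def enact(state: int, action: str) -> int:
    """Enact: actuate the chosen action in an assumed toy environment."""
    return state + 1 if action == "right" else state - 1


def deliberate_once(q_table: QTable, state: int) -> int:
    return enact(state, opt(evaluate(q_table, state)))


# Hand-written values standing in for a learned Q-function (an assumption).
q_table: QTable = {
    (0, "left"): 0.2,
    (0, "right"): 1.0,
    (1, "left"): 0.7,
    (1, "right"): 0.1,
}
```

Only `evaluate` would be learned in practice; the proposal set, `opt`, and `enact` are all supplied by the designer, which is the sense in which the deliberation structure is hard-coded.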

Risks from Learned Optimization, as well as prior and subsequent discussions, ask questions about when we might expect to find ‘optimizers’ arising without hard-coding them, and about what behaviours and capabilities we should expect from them. Since we have identified non-artificial examples of deliberation arising from the refinement of, or interactions between, non-deliberative systems, it seems reasonable to look for analogies in our artificial systems. It remains unclear whether any contemporary artificial systems have learned deliberation.

Further work could consider the question of learned optimization from the deliberation lens, and subsequent posts in this sequence will discuss where deliberation comes from, as well as its manifestation at different levels of sophistication.

Parliaments

Perhaps representing the prototypical ‘deliberation’, some mechanisms of democratic parliaments illustrate deliberation algorithms at the level of multiple interacting humans.

Consider a vote on some motion, preceded by a debate. This is a proper, quasi-evaluative, optive deliberation.

Even if the motion is indivisible and singular, there are still at least two proposals: enact the motion, or do not enact the motion. In other cases there may be more proposals. These proposals are generated and surfaced by humans within the process by some means or other (which presumably involves its own deliberation!).

The debate, and the sentiments of the individuals participating, are a protracted evaluation and promotion mechanism over the proposals at hand.

The vote precipitates the choice, typically opting for some proposal, which is then enacted into some derived realisation by one or other organ of the democratic system. (Other possible outcomes to debates are state updates in the form of minds changed or documents written, and re-iterations with new or similar proposals, which is a more properly promotive than optive result.)

Parliaments rarely permanently dissolve themselves (though their actions can lead to their dissolution), so should be seen as iterated deliberators.

Often the parameters of a particular parliamentary session are set by some invoking organ, body, or individual, making a complex of deliberators which fit together in a larger democratic system.

As in previous cases, the power of a parliament lies partly in producing ‘good’ proposals, partly in recognising and promoting good proposals, and partly in the suitability of its delegate organs to appropriately enact its proposals. In the iterated sense, behaviours which precipitate improvements to these capacities are instrumental.

A note on these algorithmic assignments

One may rightly opine that an assignment of a particular algorithmic abstraction to a system, hiding the full and intricate detail of every moving piece, must involve some ultimately arbitrary or subjective choices. Perhaps this undermines the preceding discussion.

Indeed, no abstraction can work well in all circumstances, but abstractions are nevertheless inescapably necessary tools to make reasoning tractable in a world with bounded computational resources. If an ant colony were transported to the surface of the sun, there would rapidly be no sense in which it continued to carry out the previously-identified food-searching algorithm. In fact, the abstraction ‘ant’ would cease to be useful in the same instant. But if the same colony were transported to many locations on the surface of the Earth, or other relevantly-similar hypothetical places, the abstract algorithm identified above would continue to usefully predict outcomes, including counterfactuals (‘what if I put some food here?’, ‘what if I move some ants there?’ etc.), at least for some time, depending on the fitness of the colony’s algorithm’s abstractions and heuristics to the new environment.

‘Ant’ and ‘ant colony’ cease to be useful descriptions in the context of the surface of the sun, but remain defensible summary descriptions on many parts of the surface of the Earth. On the left a photograph of a magnification of the author’s garden. On the right an artist’s impression of the same experiment on the sun. (The author was regrettably unable to attend for the latter experiment.)

Conclusion

We’ve explored various deliberative systems (both proper and reactive), finding them at many levels of organisation, from cells to democracies. In several cases we uncovered novel insights, such as commonalities between natural selection, climbing plants, and ant colonies, and the deliberation abstraction further enabled us to point to relevant discrepancies determining the ‘strength’ of some of these systems. Suitability or fitness of Propose appears at least as important as that of Promote for deliberation strength in many cases, exemplified by the comparison of natural selection with gradient descent.

Various deliberative systems comprise interactions between multiple deliberators. These can be seen emerging from the interaction between relatively homogeneous subdeliberators (as in colonies and markets), or in deliberators giving rise to, or conditioning, other heterogeneous delegate deliberators’ behaviour (as in cellular and multicellular life and bureaucracies).

Later posts will go into more depth about where deliberations come from, and also examine some deliberators which are more sophisticated, either in how they arrange their delegation or how they perform the key components of deliberation. This includes some artificial systems as well as some of the more cognitively sophisticated and powerful behaviours of animals and humans.

Thanks to those who probed in conversation, and thanks to SERI for sponsoring the research time and making those conversations possible.

In some sense this resembles a training, validation, and test data split, and in a similar way it gives me some confidence that the concepts here have some explanatory power. The real test though is other people attempting to use the ideas or providing criticism.

I am not a chemist or physicist but based on my rough understanding of molecular thermodynamics and quantum mechanics, this might not quite be true, and it may be that there are in fact nigh-infinitesimal deliberations happening which ‘add up to’ what looks like smooth reactions, in a similar way to the right kind of infinitesimal natural selection adding up to gradient descent.

Bacterial ‘twitching’ may be a counterexample which involves basic deliberation; compare the discussion of plant tendrils below.

The iteration here actually comes from an outer weakly properly deliberative evaluative process which always proposes ‘step again’ and ‘stop’ and evaluates these against some measure (e.g. validation score, compute budget, some combination …). Usually ‘step again’ is promoted by default in a short-circuit loop to save compute. This outer deliberation is a controller which delegates most of its action to the single-step gradient descent reaction.
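
The outer controller described in this note can be sketched roughly as follows. This is an illustration under assumptions of my own: the names `gd_step` and `controller`, the step budget, the loss tolerance, and the toy quadratic loss are all hypothetical.

```python
def gd_step(x, grad, lr=0.1):
    """Reaction: exactly one proposal, 'move against the gradient'."""
    return x - lr * grad(x)

def controller(x, loss, grad, budget=1000, tol=1e-8):
    """Outer deliberation over two proposals: 'step again' or 'stop'."""
    steps = 0
    while True:
        # Evaluate 'stop' against an assumed compute budget and loss threshold.
        if steps >= budget or loss(x) < tol:
            return x, steps      # opt for 'stop'
        x = gd_step(x, grad)     # opt for 'step again' and delegate to the reaction
        steps += 1

# Toy problem (assumed): minimise x**2 starting from x = 5.
x_final, steps = controller(5.0, loss=lambda x: x * x, grad=lambda x: 2 * x)
print(x_final, steps)
```

Almost all of the work happens inside `gd_step`; the controller only decides whether to invoke it again.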

The heuristic is predicated on the extrinsic (human-derived) abstraction, empirically often applicable, that repeatedly moving, perhaps noisily, straight in the direction of local steepest descent can often effectively find non-straight paths to globally-low places.

Note that the ‘on average’ claim takes a stochastic ex ante model of fitness as a latent property which is stochastically realised in numbers of progeny. Natural selection can be alternatively understood as just the ex post tautology that ‘things which in fact propagate in fact propagate’.

It feels wrong to say this; I am in fact in awe of natural selection. Perhaps it is a sign of my own bias for inappropriate anthropomorphism that I feel the need to apologise for saying harsh things about an emotionless natural force.

Observe two fascinating short time-lapse montages of these literal ‘tree searches’ from Sir David Attenborough and the BBC at

Life: Plants: Climbing plants (https://www.bbc.co.uk/programmes/p005fptt)
The Private Life of Plants: Climbing Plants (https://www.bbc.co.uk/programmes/p00lx6cl)

Note that the Price equation does not care whether a characteristic is described by genes or some other information-carrying attribute.

The ‘humongous fungus’, possibly simultaneously the largest-spanning and most massive ‘single organism’ on Earth, falls in this category.

It is less clear whether climbing plants with tendril action have improved evaluation and promotion heuristics vs moss-style migration, but certainly in both types these heuristics track the target less noisily than does realised-fitness in natural selection.

That is not to say that an individual partaking in this collective algorithm can not be deliberative; a tribe of individually-deliberative humans could also execute this type of procedure. The deliberative algorithm in this case just happens to be located outside of any single organism.

In general, a good enough promoter can expect to get a benefit of a few standard deviations of quality from a proposer this way, though there are rapidly diminishing returns on extra compute spent.

For RL or other sequential control with continuous action spaces, either the atomic-action-analogues are discretised or sampled from a predefined distribution, in which case the same analysis applies, or the action analogues are sampled directly after a learned policy distribution, in which case we have a learned reaction. Hierarchical cases will be discussed in a later post.

Deliberation, Reactions, and Control

Oliver Sourbut — Mon, 27 Jun 2022 17:25:45 GMT

This analysis is speculative. The framing has been refined in conversation and private reflection and research. To some extent it feels vacuous, but it is at least valuable for further research and communication.

A cluster of questions fundamental to many concerns around risks from artificial systems regards the concepts of search, planning, and ‘deliberateness’. How do these arise? What can we predict about their occurrence and their consequences? How strong are they? What are they anyway?

Here is laid out one part of a conceptual decomposition which maps well onto many known systems and may allow further work towards answering more of those questions. The ambition is to really get at the heart of what is algorithmically happening in ‘optimising systems’, including humans, animals, algorithmic optimisers like SGD, and contemporary and future computational artefacts. That said, I do not have any privileged insight into the source code (or its proper interpretation!) for the examples discussed, so while this framing has already generated new insights for me, it may or may not be ‘the actual algorithmic truth’.

We start with the analysis: a definition of ‘deliberation’ and its components, then of ‘reactions’ and ‘control’. Next we consider, in light of these, what makes a deliberator or controller ‘good’. We find conceptual connections with discussions of instrumental convergence. Little attention is given here to how to determine what the goals are, which is obviously also important.

These concepts were generated by contemplating various aspects of many different goal-directed systems and pulling out commonalities.

Some readers may prefer to start with the examples, which include animals, plants, natural selection, gradient descent, bureaucracies, and others. Here in the conceptual section I’ll footnote particularly relevant concrete examples where I anticipate them helping to convey my point.

A full treatment is absent, but two major deferences to embedded agency underlie this analysis. A Cartesian separation need not be assumed, except over ‘actor-moments’ rather than temporally-extended ‘actors’. And a major driver for this sequence is a fundamental recognition that any goal-directed behaviour instantiated in the real world must have bounded computational capacity per time^[1].

Inspiration and related

My (very brief) ‘Only One Shot’ intuition pump for embedded agency may help to convey some background assumptions (especially regarding how time and actor-moments fit into this picture).

Scott Garrabrant’s (A→B)→A talks about ‘agency’ and ‘doing things on purpose’. I’m trying to unpack that further. The (open) question Does Agent-like Behaviour Imply Agent-like Architecture? is related and I hope for the perspective here to be useful toward answering that question.

Alex Flint’s excellent piece The ground of optimization informs some of the perspective here, especially a focus on scope of generalisation and robustness to perturbation.

Daniel Filan’s Bottle Caps Aren’t Optimisers and Abram Demski’s Selection vs Control begin discussing the algorithmic internals of optimising systems, which is the intent here also. Risks from Learned Optimization is of course relevant.

John Wentworth’s discussions of abstraction (for example What is Abstraction?), especially with regard to its predictive properties for a non-omniscient computer, are central to the notion of abstraction employed here. Related are the good and gooder regulator theorems, which touch closely computationally upstream of the aspects discussed here while making fewer concessions to embedded agency.

Definitions

Deliberation

A deliberation is any part of a decision algorithm which composes the following

Propose: generate candidate proposals
Promote: promote and demote proposals according to some criterion
Act: take outcome of promotion and demotion to activity in the environment

taking place over one ‘moment’ (as determined by the particular algorithmic embodiment).

Some instances may fuse some of these components together.

Propose

A deliberation needs to be about something. Candidate proposals X do not come from nowhere; some state S is mapped to a distribution over nonempty proposal-collections (there must be at least one candidate proposal^[2]). Proposals correspond in some way to actions, but need not be actions or be one-to-one with them^[3] - more on this under Act.^[4]

Propose:S→Δ{X}_nonempty

Note one pattern for achieving this is to sample one or more times from

ProposeOne:S→ΔX

and similarly taking the union of several proposal-collections could be a proposal-collection in some cases.

Some instantiations may have a fixed set of proposals, while others may be flexible.

Propose: robot considers proposals relating to the concepts of delivering ice cream or delivering grenade (NB this image conveys a substantial world model and proposals corresponding closely to plans of action, but this is merely one embodiment of the deliberation framework and not definitive)
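
One way to read the ProposeOne pattern is as repeated sampling. The Python sketch below assumes a hypothetical `propose` wrapper and a hypothetical `nudge` sampler which perturbs a numeric state; neither is meant as the definitive form of Propose.

```python
import random
from typing import Callable, List, TypeVar

S = TypeVar("S")  # state
X = TypeVar("X")  # proposal

def propose(state: S, propose_one: Callable[[S, random.Random], X],
            n: int, rng: random.Random) -> List[X]:
    """Build a nonempty proposal-collection by sampling ProposeOne n times."""
    if n < 1:
        raise ValueError("a deliberation needs at least one candidate proposal")
    return [propose_one(state, rng) for _ in range(n)]

def nudge(state: float, rng: random.Random) -> float:
    """Hypothetical ProposeOne: perturb the state with Gaussian noise."""
    return state + rng.gauss(0.0, 1.0)

candidates = propose(0.0, nudge, n=5, rng=random.Random(0))
print(candidates)
```

A fixed proposal set would correspond to a `propose` which ignores the random source and always returns the same collection.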

Promote

Given a collection of proposals, a deliberative process produces some promotion/demotion weighting V for each proposal. This may also depend on state.^[5]

Promote:S→{X}→{V}

In many instantiations it makes sense to treat the weighting as a simple real scalar, V=ℝ.^[6]

Many instantiations may essentially map an evaluation over the proposals

Evaluate:S→X→V

either serially or in parallel. This suggests ‘evaluative’ as a useful adjective to describe such processes.

Other instantiations may involve the whole collection of proposals, for example by a comparative sorting-like procedure without any intermediate evaluation.^[7]

Promote: robot evaluates proposals of throwing ice cream or grenade respectively, promoting ice cream and demoting grenade
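
The evaluative and comparison-only flavours of Promote might be sketched like this in Python. The function names and the toy criterion (closeness to a target of 3.0) are assumptions for illustration; in the comparison-only version the ‘weighting’ is simply rank order.

```python
from functools import cmp_to_key
from typing import Callable, List, TypeVar

S = TypeVar("S")  # state
X = TypeVar("X")  # proposal

def promote_evaluative(state: S, proposals: List[X],
                       evaluate: Callable[[S, X], float]) -> List[float]:
    """Map Evaluate over the proposals; weights match proposals one to one."""
    return [evaluate(state, x) for x in proposals]

def promote_sortive(proposals: List[X],
                    prefer: Callable[[X, X], int]) -> List[X]:
    """Order proposals by pairwise comparison only, most preferred first."""
    return sorted(proposals, key=cmp_to_key(prefer))

def closer_to_three(a: float, b: float) -> int:
    """Hypothetical comparator: negative when a is nearer to 3.0 than b."""
    da, db = abs(a - 3.0), abs(b - 3.0)
    return (da > db) - (da < db)

proposals = [1.0, 2.5, 7.0]
weights = promote_evaluative(3.0, proposals, lambda s, x: -abs(x - s))
ranking = promote_sortive(proposals, closer_to_three)
print(weights, ranking)
```

Both versions agree on the favourite here, but only the evaluative one produces an intermediate score per proposal.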

Act

A decision algorithm eventually acts in its environment. In this analysis, that means taking promotion-weighted proposals and translating them, via some machinery, into activity A.

Act:{X×V}→A

In many cases proposals may correspond closely as precursors to particular actions or plans. Promotion may then correspond to preference over plans, in which case action may approximately decompose as Opt:{X×V}→X and Enact:X→A i.e. a selection of an action, plan, or policy, which is then carried out^[8]. Let’s tentatively call such systems ‘optive’. Enact may involve signalling or otherwise invoking other (reactive or proper) deliberations!

Act-optive: robot’s evaluation and promotion leads to opting to give ice cream, which is a precursor to subsequent motor enaction, giving ice cream. Meanwhile the robot may or may not be performing new deliberations.

More generally, proposals may correspond to configurations, and promotion to weighting-adjustment between those configurations, which play out as activity. We might call such systems ‘(properly) promotive’ in contrast to ‘optive’^[9].

Act-promotive: robot’s promotion and demotion of plans produces mostly ‘internal’ updates to its plans
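
To make the optive and properly promotive cases concrete, here is a hedged Python sketch. `opt`, `act_optive`, and `act_promotive` are hypothetical names, and the proportional reweighting rule (which assumes positive weights) is just one simple choice of promotive update, not the only one.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

X = TypeVar("X")  # proposal
A = TypeVar("A")  # activity

def opt(weighted: List[Tuple[X, float]]) -> X:
    """Opt: select the single most-promoted proposal."""
    return max(weighted, key=lambda xv: xv[1])[0]

def act_optive(weighted: List[Tuple[X, float]], enact: Callable[[X], A]) -> A:
    """Optive Act: Opt, then Enact the selected proposal."""
    return enact(opt(weighted))

def act_promotive(shares: Dict[str, float],
                  weights: Dict[str, float]) -> Dict[str, float]:
    """Promotive Act: reweight every configuration; none is singled out.

    Assumes positive weights so the renormalised shares stay valid.
    """
    raw = {k: shares[k] * weights[k] for k in shares}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}

activity = act_optive([("ice cream", 0.9), ("grenade", -1.0)],
                      enact=lambda x: f"deliver {x}")
new_shares = act_promotive({"a": 0.5, "b": 0.5}, {"a": 3.0, "b": 1.0})
print(activity, new_shares)
```

In the promotive case the less-promoted configuration ‘b’ shrinks but survives, and so remains available to later deliberations.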

Reactions

A reaction is any degenerate deliberation with identically one candidate proposal. In this case the Promote step is trivial or vestigial, and action is effectively fused with the other steps.

In some sense, allowing for degenerate cases of ‘deliberation’ means that this algorithmic abstraction applies to essentially everything (e.g. a rock is a reactive deliberator which always proposes ‘do the thing a rock would’).

The important thing is that this abstraction allows us to analyse and contrast non-reactive ‘proper deliberators’ with a useful decomposition, as well as to identify and reason about more ‘interesting’ and less vacuous reactions^[10] (while still identifying them as such). For example: ‘Where did this reaction come from?’, ‘What heuristics/abstractions is it (implicitly) using for proposals?’, ‘How might this reaction combine with other systems and could proper deliberation emerge?’, ‘What deliberation(s) does this reaction approximate?’.
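
A small Python sketch may help show a reaction as the one-proposal special case of the same composition. The `deliberate` and `act_top` helpers and the robot’s value estimates are hypothetical, chosen only to mirror the examples in the text.

```python
def deliberate(state, propose, promote, act):
    """One actor-moment: Propose, then Promote, then Act."""
    proposals = propose(state)
    if not proposals:
        raise ValueError("no proposals: treat as termination")
    weights = promote(state, proposals)
    return act(list(zip(proposals, weights)))

def act_top(weighted):
    """Optive Act with a trivial Enact: return the top-weighted proposal."""
    return max(weighted, key=lambda xv: xv[1])[0]

# A reaction: exactly one proposal, so Promote has nothing to discriminate.
rock = deliberate(
    state=None,
    propose=lambda s: ["do the thing a rock would"],
    promote=lambda s, xs: [1.0 for _ in xs],
    act=act_top,
)

# A proper deliberation: several proposals, and Promote does real work.
robot = deliberate(
    state={"ice cream": 1.0, "grenade": -1.0},  # hypothetical value estimates
    propose=lambda s: list(s),
    promote=lambda s, xs: [s[x] for x in xs],
    act=act_top,
)
print(rock, "|", robot)
```

The same `deliberate` shape covers both; what differs is whether Propose and Promote carry any information.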

Iterated deliberation is generalised control

A decision algorithm eventually produces activity in the environment. Any such activity directly or indirectly alters the relevant algorithmic state of the process, from trivial thermal noise up to and including termination (or modification to some other algorithmic form).

In many cases, the activity typically preserves the essential algorithmic form and it may be appropriate to analyse the algorithm as being recurrent or iterated, perhaps with an approximate Cartesian separation from its environment—the action has some component playing out in the world (which updates and provides new inputs) and some component folding into a privileged part of the world corresponding to the algorithm’s ‘internal state’. Indeed, in many sophisticated deliberators, the most powerful effect^[11] may be on the state or condition (or existence!) of subsequent deliberation(s) and actor-moments.

A system whose activity and situation in the environment preserve the essential outline of its deliberation algorithm (perhaps with state updates) de-facto invokes a re-iteration of the same (or related) deliberation procedure. This gives rise to iterated deliberation, which in this analysis is synonymous with control^[12].

A basic ‘reactive’ controller is a system which performs degenerate, reactive deliberation, but which is (essentially) preserved by its actions, giving rise to an iterated reaction, a relatively classic control system.

A (properly) deliberative controller is a deliberative system generating multiple proposals while being (essentially) preserved by its actions, giving rise to a (properly) deliberative control system.
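
As an illustration of an iterated reaction, here is a toy thermostat in Python. The setpoint and the environment dynamics (heating adds 1.0 degree per step, idling loses 0.5) are assumptions of mine, picked to keep the arithmetic simple.

```python
def thermostat_reaction(temp, setpoint=20.0):
    """Reaction: the state determines a single proposal; no promotion needed."""
    return "heat" if temp < setpoint else "idle"

def environment(temp, action):
    """Assumed toy dynamics: heating adds 1.0 degree, idling loses 0.5."""
    return temp + (1.0 if action == "heat" else -0.5)

def run(temp, steps):
    """Iterate: each action leaves the controller intact and feeds the next moment."""
    history = []
    for _ in range(steps):
        action = thermostat_reaction(temp)
        temp = environment(temp, action)
        history.append(temp)
    return history

history = run(15.0, steps=20)
print(history)
```

Swapping `thermostat_reaction` for a procedure which generates and promotes several proposals per step would give a (properly) deliberative controller with the same outer loop.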

Note that in this sense, what might be considered a single organism generally consists of multiple controllers (reactions or otherwise), and multiple organisms taken together may comprise one controller. Likewise an artificial actor may comprise multiple deliberative and reactive components, and a system of multiple artificial actors may compose to a single deliberative process. The relevant object of analysis is the algorithm.

Recursive deliberation is more general still

Just as the outcome of a deliberative actor-moment may result in zero (terminated) or one (iterated) invocations of relevantly-similar future actor-moments, there is no reason why there should not be a variable number, sometimes more than one (homogeneous recursive).

We observe replicative examples throughout nature, as is to be expected in a world where natural selection sometimes works.

As well as invoking multiple copies of ‘itself’ (or relevantly-similar algorithms) through its actions, as in replication, a deliberator may invoke, create, or otherwise condition other heterogeneous deliberators, as in delegation (heterogeneous recursive). (In fact replication per se is usually not atomic, and goes via other intermediate processes.) Invocation or conditioning of heterogeneous deliberators is what humans and animals do when locomoting, for example^[13], and what some robotics applications do when delegating actuation to classical control mechanisms like servomotors. It’s also what we see in bureaucracies and colonies of various constructions and scales, from cells to ant nests to democracies.

What factors go into strong deliberation?

Quality of deliberation depends on the fitness^[14] of the abstractions and heuristics which the algorithmic components depend on, as well as the amount of compute budget consumed. These are the raw factors of deliberation.

So a deliberator whose Propose generators make better suggestions (relative to some goal or reference) is ceteris paribus stronger (for that goal or reference). Likewise a deliberator whose Promote algorithm more closely tracks reality relevant to its goal (is better fit), or a deliberator which can generate and evaluate more proposals with a given compute budget. We might expect good deliberators to ‘model’ the goal-relevant aspects of their environment as precursors to the state S used in proposal and promotion^[15].
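
A rough simulation can make these three levers tangible. The sketch below assumes proposal qualities are Gaussian, that Promote sees each quality through additive Gaussian noise, and that Act opts for the top score; `best_of_n`, `proposer_mean`, and `eval_noise` are hypothetical names, and the numbers carry no empirical weight.

```python
import random
import statistics

def best_of_n(n, proposer_mean=0.0, eval_noise=0.0, trials=2000, seed=0):
    """Mean true quality of the opted-for proposal over many deliberations."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(trials):
        # Propose: n candidates with Gaussian true quality (assumed).
        qualities = [rng.gauss(proposer_mean, 1.0) for _ in range(n)]
        # Promote: score each proposal with a noisy estimate of its quality.
        scores = [q + rng.gauss(0.0, eval_noise) for q in qualities]
        # Opt: carry forward the top-scored proposal; record its true quality.
        best = max(range(n), key=scores.__getitem__)
        chosen.append(qualities[best])
    return statistics.mean(chosen)

for n in (1, 4, 16, 64):
    print(n, round(best_of_n(n), 2), round(best_of_n(n, eval_noise=3.0), 2))
print(round(best_of_n(4, proposer_mean=1.0), 2))
```

Under these assumptions, more proposals help with diminishing returns, a noisier Promote wastes much of that benefit, and a better Propose shifts the whole curve.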

No model is correct in all situations but some are more effective than others over a wider range of ‘natural’ situations. In subsequent posts this framing is used to examine some real examples, locating some relevant ways their abstractions are more or less fit, and some consequences of this. Importantly, quality of Propose generators is emphasised^[16].

Quantifying the magnitude of the abovementioned ceteris paribus deliberation improvements is important, especially in the context of multiple competing or collaborating deliberators, but this is not attempted in detail here.

Convergent instrumental goals and recursive deliberation

In light of the deliberation framing, and focusing on the potential for iterated or recursive deliberation, we can restate and perhaps sharpen the convergence of instrumental goals in these terms.

A deliberator is a single actor-moment, finitely capable and neither logically nor empirically omniscient.

For a deliberator oriented to directions or goals which can not reliably be definitively achieved in a single round of deliberation, or goals more generally for which greater success can be expected by subsequent refinement of action, proposals and actions which precipitate the existence and empowerment of similarly-oriented deliberative actor-moments can be expected ceteris paribus to achieve greater success.

In this framing, ‘self-preservation’ and ‘goal-content integrity’ are special cases of ‘pushing the future to contain relevantly-similarly-oriented deliberative actor-moments’^[17]. Other instantiations of this include ‘replicating’. The prototypical cases from nature are systems which do one or both of persisting in relevantly algorithmically similar form (e.g. organisms having nontrivial lifespans or enzymes and catalysts being untransformed), and replicating or reproducing relevantly algorithmically similar forms (e.g. genetic elements and simpler autocatalysts directly or indirectly invoking copies to be made, or organisms producing offspring). As noted previously we also see cases of delegation to heterogeneous deliberators either by conditioning or wholesale creation.

Now, once we recognise a class of deliberator which does well by (at least sometimes) invoking relevantly-similarly-oriented deliberative future actor-moments (whether by persistence, replication, or other delegation), all things equal, the same deliberator would do better to invoke better such future actor-moments^[18]. So how can they be better? In all the same ways as identified already!

Hence if ‘self’ is the baseline for persistence or replication, actions which induce future deliberations with better expected fitness-to-goal are better, other things equal. This can span the range of

‘self state updates’ tracking locally-relevant information to improve deliberation (memory)
improving Propose and Promote heuristics more broadly (learning, exploring, play, experimentation)
more direct ‘self’ modification including investing more computation into deliberation (cognitive enhancement)
improving Act efficacy by tuning existing delegation templates or transforming or aligning resources into new and improved (in the preceding ways) copies or delegates (resource acquisition and technological improvement)

Whether to expect a deliberator to discover and/or be capable of acting on any of these kinds of improvements is a separate matter, though evidently any system able to reason similarly to a human is capable, at least in principle, of apprehending them.

Immediate takeaways

We broke down ‘deliberation’ into algorithms for ‘proposal’, ‘promotion’ (which may be ‘evaluative’ or ‘sortive’), and ‘action’ (which may be ‘optive’ or ‘(properly) promotive’). Combined with the actor-moment framing, we discussed cases of iterated or recursive deliberation. This framing allows us to discuss commonalities and differences between systems and perhaps make more reliable predictions about their counterfactual behaviour and emergence.

Pointing at different parts of this decomposition, we identified various axes along which a deliberator can be improved, additionally rederiving and refining instrumental convergence for iterated/recursive deliberators.

All this is quite abstract and subsequent posts will make it more concrete with examples, starting with relatively simple deliberators, illustrating some more applications of this framing to generating new insights.

Credit to Peter Barnett, Tamera Lanham, Ian McKenzie, John Wentworth, Beth Barnes, Ruby Bloom, Claudia Shi, and Mathieu Putz for useful conversations prompting and refining these ideas. Thanks also to SERI for sponsoring my present collocation with these creative and helpful people!

[1]
So algorithms can not happen ‘all at once’ and no process can be logically or empirically omniscient, nor can it, in particular, draw ‘conclusions’, generate abstractions, or otherwise be imbued with information for which there is not (yet) evidence.
[2]
Alternatively, a lack of proposals could be considered a termination of the process.
[3]
In principle, the computation performing proposal might have side-effects (on its own producing meaningful outcomes or ‘actions’, including ‘state updates’), but for now this possibility is elided and effects are considered to happen as part of Act.
[4]
Type notation. Δ can be read ‘distribution over’, that is, a generative process which can be sampled from, not necessarily anything representing a probability distribution per se. {A} means some collection of A. All types are intended to be read as pure, mathematical, or functional (i.e. side-effect free) rather than imperative.
[5]
NB type notation here uses currying. The resulting V collection is also implicitly dependently-shaped to match the proposals one to one. We might write something like Promote:S→shape→{X}_shape→{V}_shape.
[6]
Possibly a central example of not mapping to a real scalar might be a committee where not only are ‘votes’ behaviour-relevant, but the particulars of who on the committee voted which way.
[7]
We might call this ‘sortive’ in contrast to ‘evaluative’ but I am not entirely sold on the usefulness of this terminology.
[8]
Consider an example where the promotions are one or other of value estimates, logits, or probabilities over an action space in RL.
[9]
Natural selection is a canonical properly promotive example.
It might be most obvious to think about the ‘properly promotive’ as a relaxation of the ‘optive’ case. In the optive case, we had some proposals, and depending on the promotion scores for them, we selected exactly one proposal to carry forward. But more generally we might record or apply the promotion decisions in some way, adjusting behaviour, while retaining a representation of a spectrum of proposal-promotion state going forward to subsequent decisions. Going back to natural selection, except in degenerate cases, there is some weighted configuration space of ‘proposals’ (~genomes or gene-complexes) at any time, which is adjusted by the process. But rarely does it outright opt exclusively for a particular proposal. It seems plausible to me that this proper promotive action is what more sophisticated deliberators do in many cases, also.
Alternatively, consider that the distinction between ‘opting’ for one proposal and ‘weighting’ multiple configurations on promotion is more of degree than kind: ‘opting’ for exactly one proposal is just a degenerate special case where all other alternatives are effectively discarded (equivalent to some nominal zero weighting).
[10]
Like SGD and biochemical catalysis.
[11]
Powerful in the sense of counterfactually most strongly pushing in a goal-directed fashion. It is difficult to bottom-out these concepts!
[12]
Terminology note: the conceptually important thing is the iterated deliberation.
Ruby objected to the use of ‘control system’ because it evokes classical control theory, which is a more constrained notion. I think that’s right, but actually evoking classical control theory is part of the reason for choosing ‘control’: classical control systems are a useful subset of reactive controllers in this analysis (though not the only controllers)! Classical control literature already uses the words ‘controller’, ‘regulator’, and ‘governor’. Alternative terms if we seek to distinguish might be ‘navigator/navigation’ or ‘director/direction’. We might prefer ‘generalised controller/control’. I eschew ‘agent’ as it carries far too many connotations. I try to use ‘controller’ to evoke classical and broader notions of sequential control, without necessarily meaning a specific theorisation as a classical open- or closed-loop control system.
[13]
High level intentions condition lower-level routing and footfall/handhold (etc) placement, which condition microscopic atomic motor actions.
[14]
Fitness in the original, nonbiological sense of being better-fit i.e. appropriateness to situation or context. For a heuristic this means how much it gets things right, with respect to a particular context and reference.
[15]
The good regulator and gooder regulator theorems make a Cartesian separation between actor (‘regulator’) and environment (‘system’), and set aside computational and logical constraints. Under these conditions, those theorems tell us that the best possible regulator is equivalent to a system which perfectly models the goal-relevant aspects of its environment and acts in accordance with that model.
For a deliberator to satisfy this, its Propose would need, with certainty, to generate the best possible proposal (relative to the goal and observed information) at least once, and its Promote would need to reliably discern and promote this optimal proposal. In the more general deliberator setting, it is less clear how to trade off computational limitations and there is (as yet) no ‘good deliberator’ theorem, but this does not prevent us from reasoning about ceteris paribus or Pareto improvements to those tradeoffs.
[16]
I have found many discussions to typically overemphasise what corresponds here to Promote but it should become clear here and in later posts why I think this is missing an important part of the story.
[17]
It is still useful to distinguish self-preservation from goal-content integrity in cases when there is a good decomposition including a privileged ‘goal containing’ component or components of the system, reasonably decoupled from capabilities.
[18]
This might not be as tautological as it seems if there are fundamentally insuperable challenges to robust delegation.

Breaking Down Goal-Directed Behaviour

Oliver Sourbut — Thu, 16 Jun 2022 18:45:11 GMT

When we speak about entities ‘wanting’ things, or having ‘goal-directed behaviour’, what do we mean?

Because most of the actors that we (my human readers and I) frequently and attentively interact with are (presumably) computationally similar (to each other and to ourselves), it is easy for our abstractions to conflate phenomena which are in fact different^[1], or to couple together impressions of phenomena which are in fact separable^[2]. On the occasions that we attentively observe dissimilar actors^[3], we are often enough ‘in the wild’ (i.e. looking in their ‘habitat’) that conflating (anthropomorphising) is sufficiently predictive to get by^[4], so these poor abstractions are insufficiently challenged.

Further, most of the people who in fact attentively observe a particular class of human-dissimilar actors (enough to perceive and understand the non-conflations) are focused domain experts about that particular class, and talk mainly to other domain experts about them^[5]. There are few venues in which it is useful to have unambiguous terminology here. Thus, the language we use to communicate our abstractions about goal-directed behaviour is prone to conflation and confusion even when some people have some of the right abstractions^[6].

Here I aim to take steps to break down ‘goal-directed behaviour’ into a conceptual framework of computational abstractions for which I offer tentative terminology, and which helps me to better understand and describe analogies and disanalogies between various goal-directed systems. The overarching motivation is to better understand goal-directed behaviour, in the sense of being able to better predict its (especially counterfactual and off-distribution) implications, its arisal, and other properties. Hopefully it is clear why I consider this worthwhile.

In order to ground this discussion, I refer to a reasonably diverse menagerie of candidate goal-directed systems, including natural and artificial systems at various levels of organisation. Contemplation of this diverse collection was responsible for the ideation and refinement of the ideas and gives some confidence in the appropriateness of the abstractions.

A collection of a few ‘agents’ drawn from the menagerie

[1]
Different in the sense that, even if the observed surface phenomena are similar ‘in the wild’, their behaviour in different contexts might radically come apart; that is, a conflation is a poor predictor for out-of-distribution behaviour.
[2]
Separable meaning that the absence of one or other piece is a meaningful, conceivable state of affairs, even if in practice they are almost always found composed together; that is, a coupled abstraction means if we have an impression of the one, we (perhaps incorrectly) assume presence of the other(s).
[3]
That is, dissimilar from humans and from each other e.g. ants, genes, chess-engines, learning algorithms, corporations…
[4]
If it is not clear why ‘in the wild’ (or ‘in the typical setting’) is important for the predictiveness of the anthropomorphism-heuristic, hopefully Deliberation and Reflexes and Where does Deliberation Come From? will clarify. In short, an actor fit for a particular setting can carry out ‘deliberate-looking’ behaviours without ‘deliberative machinery’, because the process which generated the actor provides enough (slow, gradual) ‘deliberation’ to locate such behaviours and bake them into ‘reflexes’.
[5]
e.g. entomologists, ornithologists, AI researchers, business executives, economists… are not often in a room together, at least not mutually-knowingly in their capacity as said experts of their respective fields
[6]
I attempt to avoid use of the term ‘agent’ as it is a very loaded term which carries many connotations. In fact it is unfortunately a perfect exemplary linguistic victim of the abstractive conflation and coupling phenomena I have described. (I think the recent reception of the Gato paper was confused in part as a consequence of this.) I substitute ‘actor’ and ‘controller’ more freely as less loaded terms.