Mon, 19 Feb 2018 15:41:03 EST

A New Framework

(Thanks to Valentine for a discussion leading to this post, and thanks to CFAR for running the CFAR-MIRI cross-fertilization workshop. Val provided feedback on a version of this post. Warning: fairly long.)

Eliezer's A Technical Explanation of Technical Explanation, and moreover the sequences as a whole, used the best technical understanding of practical epistemology available at the time -- the Bayesian account -- to address the question of how humans can try to arrive at better beliefs in practice. The sequences also pointed out several holes in this understanding, mainly having to do with logical uncertainty and reflective consistency.

MIRI's research program has since then made major progress on logical uncertainty. The new understanding of epistemology -- the theory of logical induction -- generalizes the Bayesian account by eliminating the assumption of logical omniscience. Bayesian belief updates are recovered as a special case, but the dynamics of belief change are non-Bayesian in general. While it might not turn out to be the last word on the problem of logical uncertainty, it has a large number of desirable properties, and solves many problems in a unified and relatively clean framework.

It seems worth asking what consequences this theory has for practical rationality. Can we say new things about what good reasoning looks like in humans, and how to avoid pitfalls of reasoning?

First, I'll give a shallow overview of logical induction and possible implications for practical epistemic rationality. Then, I'll focus on the particular question of A Technical Explanation of Technical Explanation (which I'll abbreviate TEOTE from now on). Put in CFAR terminology, I'm seeking a gears-level understanding of gears-level understanding. I focus on the intuitions, with only a minimal account of how logical induction helps make that picture work.

Logical Induction

There are a number of difficulties in applying Bayesian uncertainty to logic. No computable probability distribution can give non-zero measure to the logical tautologies, since you can't bound the amount of time you need to think to check whether something is a tautology, so updating on provable sentences always means updating on a set of measure zero. This leads to convergence problems, although there's been recent progress on that front.

Put another way: Logical consequence is deterministic, but due to Gödel's first incompleteness theorem, it is like a stochastic variable in that there is no computable procedure which correctly decides whether something is a logical consequence. This means that any computable probability distribution has infinite Bayes loss on the question of logical consequence. Yet, because the question is actually deterministic, we know how to point in the direction of better distributions by doing more and more consistency checking. This puts us in a puzzling situation where we want to improve the Bayesian probability distribution by doing a kind of non-Bayesian update. This was the two-update problem.

You can think of logical induction as supporting a set of hypotheses which are about ways to shift beliefs as you think longer, rather than fixed probability distributions which can only shift in response to evidence.

This introduces a new problem: how can you score a hypothesis if it keeps shifting around its beliefs? As TEOTE emphasises, Bayesians outlaw this kind of belief shift for a reason: requiring predictions to be made in advance eliminates hindsight bias. (More on this later.) So long as you understand exactly what a hypothesis predicts and what it does not predict, you can evaluate its Bayes score and its prior complexity penalty and rank it objectively. How do you do this if you don't know all the consequences of a belief, and the belief itself makes shifting claims about what those consequences are?

The logical-induction solution is: set up a prediction market. A hypothesis only gets credit for contributing to collective knowledge by moving the market in the right direction early. If the market's odds on prime numbers are currently worse than those which the prime number theorem can provide, a hypothesis can make money by making bets in that direction. If the market has already converged to those beliefs, though, a hypothesis can't make any more money by expressing such beliefs -- so it doesn't get any credit for doing so. If the market has moved on to even more accurate rules of thumb, a trader would only lose money by moving beliefs back in the direction of the prime number theorem.
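To make the market metaphor concrete, here is a deliberately crude sketch. The real construction in the logical induction paper is far more subtle (traders see the market's history, and prices come from a fixed-point computation); everything below -- the trader names, the wealth-weighted pricing rule, the likelihood-based payouts -- is my own simplification, not the actual algorithm. One trader always says 50%; the other uses the prime number theorem's density estimate 1/ln(n) as its probability that n is prime.

```python
import math

def is_prime(n):
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def clip(p, lo=0.01, hi=0.99):
    """Keep predictions away from 0 and 1 (1/ln(n) > 1 for n <= 2)."""
    return min(max(p, lo), hi)

# Two toy traders betting on "is n prime?".
traders = {
    "uniform": lambda n: 0.5,
    "pnt": lambda n: clip(1.0 / math.log(n)),  # prime number theorem density
}
wealth = {name: 1.0 for name in traders}

for n in range(2, 1001):
    total = sum(wealth.values())
    # Market price: wealth-weighted average of the traders' predictions.
    price = sum(wealth[t] * traders[t](n) for t in traders) / total
    outcome = is_prime(n)
    # Redistribute wealth by likelihood; the factor of 2 makes a
    # 50% guess break exactly even.
    for t in traders:
        p = traders[t](n)
        wealth[t] *= (p if outcome else 1.0 - p) * 2.0

print(wealth["pnt"] > wealth["uniform"])  # True: the informed trader gets rich
```

Once the "pnt" trader dominates the wealth, the market price has absorbed the prime number theorem, and a newcomer expressing the same beliefs can no longer profit from them.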

Mathematical Understanding

This provides a framework in which we can make sense of mathematical labor. For example, a common occurrence in combinatorics is that there is a sequence which we can calculate, such as the Catalan numbers, by directly counting the number of objects of some specific type. This sequence is boggled at like data in a scientific experiment. Different patterns in the sequence are observed, and hypotheses for the continuation of these patterns are proposed and tested. Often, a significant goal is the construction of a closed form expression for the sequence.

This looks just like Bayesian empiricism, except for the fact that we already have a hypothesis which entirely explains the observations. The sequence is constructed from a definition which mathematicians made up, and which thus assigns 100% probability to the observed data. What's going on? It is possible to partially explain this kind of thing in a Bayesian framework by acting as if the true formula were unknown and we were trying to guess where the sequence came from, but this doesn't explain everything, such as why finding a closed form expression would be important.

Logical induction explains this by pointing out how different time-scales are involved. Even if all elements of the sequence are calculable, a new hypothesis can get credit for calculating them faster than the brute-force method. Anything which allows one to produce correct answers faster contributes to the efficiency of the prediction market inside the logical inductor, and thus, to the overall mathematical understanding of a subject. This cleans up the issue nicely.
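The Catalan numbers make a concrete illustration (a small sketch; I'm using balanced bracket sequences as the "objects" being counted). Brute-force enumeration takes exponential time, while the closed form is nearly instant -- both are "100% correct" hypotheses, but the closed form earns its keep by being fast:

```python
from math import comb

def catalan_brute(n):
    """Count balanced bracket sequences of length 2n by brute-force
    enumeration -- the 'scientific data' a combinatorialist starts from."""
    def count(opens, closes):
        if opens == n and closes == n:
            return 1
        total = 0
        if opens < n:
            total += count(opens + 1, closes)   # place an '('
        if closes < opens:
            total += count(opens, closes + 1)   # place a ')'
        return total
    return count(0, 0)

def catalan_closed(n):
    """The closed form: C_n = (2n choose n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

# Both agree, but the closed form answers in O(1) arithmetic operations.
for n in range(10):
    assert catalan_brute(n) == catalan_closed(n)
print([catalan_closed(n) for n in range(8)])  # [1, 1, 2, 5, 14, 42, 132, 429]
```

In logical-inductor terms, a trader implementing `catalan_closed` can beat the market to the right price long before a `catalan_brute` trader finishes computing.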

What other epistemic phenomena can we now understand better?

Lessons for Aspiring Rationalists

Many of these could benefit from a whole post of their own, but here are some fast-and-loose corrections to Bayesian epistemology which may be useful:

  • Hypotheses need not make predictions about everything. Because hypotheses are about how to adjust your odds as you think longer, they can leave most sentences alone and focus on a narrow domain of expertise. Everyone was already doing this in practice, but if you actually look at the math of Bayesian probability theory, it requires each hypothesis to make a prediction about every observation. Allowing a hypothesis to remain silent on some issues in standard Bayesianism can cause problems: if you're not careful, a hypothesis can avoid falsification by remaining silent, so you end up incentivising hypotheses to remain mostly silent (and you fail to learn as a result). Prediction markets are one way to solve this problem.
  • Hypotheses buy and sell at the current price, so they take a hit for leaving a now-unpopular position which they initially supported (but less of a hit than if they'd stuck with it) or coming in late to a position of growing popularity. Other stock-market type dynamics can occur.
  • Hypotheses can be like object-level beliefs or meta-level beliefs: you can have a hypothesis about how you're overconfident, which gets credit for smoothing your probabilities (if this improves things on average). This allows you to take into account beliefs about your calibration without getting too confused about Hofstadter's-law type paradoxes.
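The last bullet -- a meta-level hypothesis that profits by correcting your overconfidence -- can be sketched in a few lines. The numbers and the shrinkage rule here are invented for illustration, not taken from any formal treatment:

```python
def brier(forecasts, outcomes):
    """Mean squared error of probabilistic forecasts (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def smooth(p, weight=0.4):
    """A meta-level 'calibration' correction: shrink a probability
    toward 0.5."""
    return (1 - weight) * p + weight * 0.5

# An overconfident object-level forecaster: says 99% every time,
# but is right only 4 times out of 5.
outcomes = [1, 1, 1, 1, 0]
raw = [0.99] * 5
shrunk = [smooth(p) for p in raw]

print(brier(raw, outcomes) > brier(shrunk, outcomes))  # True: smoothing helps
```

A trader betting the smoothed odds outperforms the raw forecaster here, which is exactly the sense in which the meta-level hypothesis "you are overconfident" gets credit.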

You may want to be a bit careful and Chesterton-fence existing Bayescraft, though, because some things are still better about the Bayesian setting. I mentioned earlier that Bayesians don't have to worry so much about hindsight bias. This is closely related to the problem of old evidence.

Old Evidence

Suppose a new scientific hypothesis, such as general relativity, explains a well-known observation such as the perihelion precession of Mercury better than any existing theory. Intuitively, this is a point in favor of the new theory. However, the probability for the well-known observation was already at 100%. How can a previously-known statement provide new support for the hypothesis, as if we are re-updating on evidence we've already updated on long ago? This is known as the problem of old evidence, and is usually levelled as a charge against Bayesian epistemology. However, in some sense, the situation is worse for logical induction.

A Bayesian who endorses Solomonoff induction can tell the following story: Solomonoff induction is the right theory of epistemology, but we can only approximate it, because it is uncomputable. We approximate it by searching for hypotheses, and computing their posterior probability retroactively when we find new ones. It only makes sense that when we find a new hypothesis, we calculate its posterior probability by multiplying its prior probability (based on its description length) by the probability it assigns to all evidence so far. That's Bayes' Law! The fact that we already knew the evidence is not relevant, since our approximation didn't previously include this hypothesis.
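The Solomonoff advocate's scoring rule is easy to write down (a toy sketch; the description lengths and likelihoods below are made-up numbers, chosen only to show the shape of the computation):

```python
import math

def log_posterior(description_length_bits, log_likelihood_of_evidence):
    """Unnormalized log posterior for a newly discovered hypothesis:
    a Solomonoff-style complexity prior, 2^-description_length,
    multiplied by the likelihood of *all* evidence so far -- old or new."""
    log_prior = -description_length_bits * math.log(2)
    return log_prior + log_likelihood_of_evidence

# H1 is short but fits 20 old observations poorly; H2 is discovered
# later, is longer, but fits the same old evidence much better.
h1 = log_posterior(10, 20 * math.log(0.5))
h2 = log_posterior(25, 20 * math.log(0.9))
print(h2 > h1)  # True: the new hypothesis wins, old evidence and all
```

On this accounting, there is no distinction between old and new evidence at all -- which is precisely what logical induction refuses to allow.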

Logical induction speaks against this way of thinking. The hypothetical Solomonoff induction advocate is assuming one way of approximating Bayesian reasoning via finite computing power. Logical induction can be thought of as a different (more rigorous) story about how to approximate intractable mathematical structures. In this new picture, propositions are bought and sold at the market prices of the time. If a new hypothesis is discovered, it can't be given any credit for 'predicting' old information. The price of known evidence is already at maximum -- you can't gain any money by investing in it.

There are good reasons to ignore old evidence, especially if the old evidence has biased your search for new hypotheses. Nonetheless, it doesn't seem right to totally rule out this sort of update.

I'm still a bit puzzled by this, but I think the situation is improved by understanding gears-level reasoning. So, let's move on to the discussion of TEOTE.

Gears of Gears

As Valentine noted in his article, it is somewhat frustrating how the overall idea of gears-level understanding seems so clear while remaining only heuristic in definition. It's a sign of a ripe philosophical puzzle. If you don't feel you have a good intuitive grasp of what I mean by "gears level understanding", I suggest reading his post.

Valentine gives three tests which point in the direction of the right concept:

  1. Does the model pay rent? If it does, and if it were falsified, how much (and how precisely) could you infer other things from the falsification?
  2. How incoherent is it to imagine that the model is accurate but that a given variable could be different?
  3. If you knew the model were accurate but you were to forget the value of one variable, could you rederive it?

I already named one near-synonym for "gears", namely "technical explanation". Two more are "inside view" and Elon Musk's notion of reasoning from first principles. The implication is supposed to be that gears-level understanding is in some sense better than other sorts of knowledge, but this is decidedly not supposed to be valued to the exclusion of other sorts of knowledge. Inside-view reasoning is traditionally supposed to be combined with outside-view reasoning (although Elon Musk calls it "reasoning by analogy" and considers it inferior, and much of Eliezer's recent writing warns of its dangers as well, while allowing for its application to special cases). I suggested the terms gears-level & policy-level in a previous post (which I actually wrote after most of this one).

Although TEOTE gets close to answering Valentine's question, it doesn't quite hit the mark. The definition of "technical explanation" provided there is a theory which strongly concentrates the probability mass on specific predictions and rules out others. It's clear that a model can do this without being "gears". For example, my model might be that whatever prediction the Great Master makes will come true. The Great Master can make very detailed predictions, but I don't know how they're generated. I lack the understanding associated with the predictive power. I might have a strong outside-view reason to trust the Great Master: their track record on predictions is immaculate, their Bayes-loss minuscule, their calibration supreme. Yet, I lack an inside-view account. I can't derive their predictions from first principles.

Here, I'm siding with David Deutsch's account in the first chapter of The Fabric of Reality. He argues that understanding and predictive capability are distinct, and that understanding is about having good explanations. I may not accept his whole critique of Bayesianism, but that much of his view seems right to me. Unfortunately, he doesn't give a technical account of what "explanation" and "understanding" could be.

First Attempt: Deterministic Predictions

TEOTE spends a good chunk of time on the issue of making predictions in advance. According to TEOTE, this is a human solution to a human problem: you make predictions in advance so that you can't make up what predictions you could have made after the fact. This counters hindsight bias. An ideal Bayesian reasoner, on the other hand, would never be tempted into hindsight bias in the first place, and is free to evaluate hypotheses on old evidence (as already discussed).

So, is gears-level reasoning just pure Bayesian reasoning, in which hypotheses have strictly defined probabilities which don't depend on anything else? Is outside-view reasoning the thing logical induction adds, by allowing the beliefs of a hypothesis to shift over time and to depend on the wider market state?

This isn't quite right. An ideal Bayesian can still learn to trust the Great Master, based on the reliability of the Great Master's predictions. Unlike a human (and unlike a logical inductor), the Bayesian will at all times have in mind all the possible ways the Great Master's predictions could have become so accurate. This is because a Bayesian hypothesis contains a full joint distribution on all events, and an ideal Bayesian reasons about all hypotheses at all times. In this sense, the Bayesian always operates from an inside view -- it cannot trust the Great Master without a hypothesis which correlates the Great Master with the world.

However, it is possible that this correlation is introduced in a very simple way, by ruling out cases where the Great Master and reality disagree without providing any mechanism explaining how this is the case. This may have low prior probability, but gain prominence due to the hit in Bayes-score other hypotheses are taking for not taking advantage of this correlation. It's not a bad outcome given the epistemic situation, but it's not gears-level reasoning, either. So, being fully Bayesian or not isn't exactly what distinguishes whether advanced predictions are needed. What is it?

I suggest it's this: whether the hypothesis is well-defined, such that anyone can say what predictions it makes without extra information. In his post on gears, Valentine mentions the importance of "how deterministically interconnected the variables of the model are". I'm pointing at something close, but importantly distinct: how deterministic the predictions are. You know that a coin is very close to equally likely to land on heads or tails, and from this you can (if you know a little combinatorics) compute things like the probability of getting exactly three heads if you flip the coin five times. Anyone with the same knowledge would compute the same thing. The model includes probabilities inside it, but how those probabilities flow is perfectly deterministic.
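The coin computation is the kind anyone can reproduce from the model alone:

```python
from math import comb

# Model: fair coin, five independent flips. The model contains
# probabilities, but how they flow is perfectly deterministic:
# P(exactly 3 heads) = C(5,3) * (1/2)^3 * (1/2)^2.
p_three_heads = comb(5, 3) * 0.5**3 * 0.5**2
print(p_three_heads)  # 0.3125
```

No private knowledge enters anywhere: the same number falls out for anyone who sits down with the model.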

This is a notion of objectivity: a wide variety of people can agree on what probability the model assigns, despite otherwise varied background knowledge.

If a model is well-defined in this way, it is very easy (Bayesian or no) to avoid hindsight bias. You cannot argue about how you could have predicted some result. Anyone can sit down and calculate.

The hypothesis that the Great Master is always correct, on the other hand, does not have this property. Nobody but the Great Master can say what that hypothesis predicts. If I know what the Great Master says about a particular thing, I can evaluate the accuracy of the hypothesis; but, this is special knowledge which I need in order to give the probabilities.

The Bayesian hypothesis which simply forces statements of the Great Master to correlate with the world is somewhat more gears-y, in that there's a probability distribution which can be written down. However, this probability distribution is a complicated mish-mosh of the Bayesian's other hypotheses. So, predicting what it would say requires extensive knowledge of the private beliefs of the Bayesian agent involved. This is typical of the category of non-gears-y models.

Objection: Doctrines

Unfortunately, this account doesn't totally satisfy what Valentine wants.

Suppose that, rather than making announcements on the fly, the Great Master has published a set of fixed Doctrines which his adherents memorize. As in the previous thought experiment, the word of the Great Master is infallible; the application of the Doctrines always leads to correct predictions. However, the contents of the Doctrines appear to be a large mish-mosh of rules with no unifying theme. Despite their apparent correctness, they fail to provide any understanding. It is as if a physicist took all the equations in a physics text, transformed them into tables of numbers, and then transported those tables to the Middle Ages with explanations of how to use the tables (but none of where they come from). Though the tables work, they are opaque; there is no insight as to how they were determined.

The Doctrines are a deterministic tool for making predictions. Yet, they do not seem to be a gears-level model. Going back to Valentine's three tests, the Doctrines fail test three: we could erase any one of the Doctrines and we'd be unable to rederive it by how it fit together with the rest. Hence, the Doctrines have almost as much of a "trust the Great Master" quality as listening to the Great Master directly -- the disciples would not be able to derive the Doctrines for themselves.

Second Attempt: Proofs, Axioms, & Two Levels of Gears

My next proposal is that having a gears-level model is like knowing the proof. You might believe a mathematical statement because you saw it in a textbook, or because you have a strong mathematical intuition which says it must be true. But, you don't have the gears until you can prove it.

This subsumes the "deterministic predictions" picture: a model is an axiomatic system. If we know all the axioms, then we can in theory produce all the predictions ourselves. (Thinking of it this way introduces a new possibility, that the model may be well-defined but we may be unable to find the proofs, due to our own limitations.) On the other hand, we don't have access to the axioms of the theory embodied by the Great Master, and so we have no hope of seeing the proofs; we can only observe that the Great Master is always right.

How does this help with the example of the Doctrines?

The concept of "axioms" is somewhat slippery. There are many equivalent ways of axiomatizing any given theory. We can often flip views between what's taken as an axiom vs what's proved as a theorem. However, the most elegant set of axioms tends to be preferred.

So, we can regard the Doctrines as one long set of axioms. If we look at them that way, then adherents of the Great Master have a gears-level understanding of the Doctrines if they can successfully apply them as instructed.

However, the Doctrines are not an elegant set of axioms. So, viewing them in this way is very unnatural. It is more natural to see them as a set of assertions which the Great Master has produced by some axioms unknown to us. In this respect, we "can't see the proofs".

In the same way, we can consider flipping any model between the axiom view and the theorem view. Regarding the model as axiomatic, to determine whether it is gears-level we only ask whether its predictions are well-defined. Regarding it in "theorem view", we ask if we know how the model itself was derived.

Hence, two of Valentine's desirable properties of a gears-level model can be understood as the same property applied at different levels:

  • Determinism, which is Val's property #2, follows from requiring that we can see the derivations within the model.
  • Reconstructability, Val's property #3, follows from requiring that we can see the derivation of the model.

We might call the first level of gears "made out of gears", and the second level "made by gears" -- the model itself being constructed via a known mechanism.

If we change our view so that a scientific theory is a "theorem", what are the "axioms"? Well, there are many criteria which are applied to scientific theories in different domains. These criteria could be thought of as pre-theories or meta-theories. They encode the hard-won wisdom of a field of study, telling us what theories are likely to work or fail in that field. But, a very basic axiom is: we want a theory to be the simplest theory consistent with all observations. The Great Master's Doctrines cannot possibly survive this test.

To give a less silly example: if we train up a big neural network to solve a machine learning problem, the predictions made by the model are deterministic, predictable from the network weights. However, someone else who knew all the principles by which the network was created would nonetheless train up a very different neural network -- unless they use the very same gradient descent algorithm, data, initial weights, and number and size of layers.

Even if they're the same in all those details, and so reconstruct the same neural network exactly, there's a significant sense in which they can't see how the conclusion follows inevitably from the initial conditions. It's less doctrine-y than being handed a neural network, but it's more doctrine-y than understanding the structure of the problem and why almost any neural network achieving good performance on the task will have certain structures. Remember what I said about mathematical understanding. There's always another level of "being able to see why" you can ask for. Being able to reproduce the proof is different from being able to explain why the proof has to be the way it is.
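The "deterministic given seed, data, and hyperparameters" point can be shown with a deliberately tiny stand-in for neural network training (a one-parameter model trained by stochastic gradient descent; the dataset and learning rate are invented for illustration):

```python
import random

def train(seed, steps=200, lr=0.1):
    """Stochastic gradient descent on the one-parameter model y = w * x,
    fit to data generated by y = 2x. A toy stand-in for neural net
    training: deterministic given seed, data, and hyperparameters."""
    rng = random.Random(seed)
    data = [(x, 2.0 * x) for x in [0.1, 0.5, 1.0, 1.5]]
    w = rng.uniform(-1, 1)  # random initialization, governed by the seed
    for _ in range(steps):
        x, y = data[rng.randrange(len(data))]  # sample a training example
        grad = 2 * (w * x - y) * x             # gradient of squared error
        w -= lr * grad
    return w

# Same seed, same data, same hyperparameters: bit-identical weights.
print(train(0) == train(0))  # True
# A different seed reaches (nearly) the same answer by a different path.
print(abs(train(0) - 2.0) < 0.01 and abs(train(1) - 2.0) < 0.01)  # True
```

Reproducing the run is trivial; saying why *any* seed ends up near w = 2 requires the kind of structural understanding the paragraph above is pointing at.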

Exact Statement?

Gears-ness is a matter of degree, and there are several interconnected things we can point at, and a slippage of levels of analysis which makes everything quite complicated.

In the ontology of math/logic, we can point at whether you can see the proof of a theorem. There are several slippages which make this fuzzier than it may seem. First: do you derive it only from the axioms, or do you use commonly known theorems and equivalences (which you may or may not be able to prove if put on the spot)? There's a long continuum between what one mathematician might say to another as proof and a formal derivation in logic. Second: how well can you see why the proof has to be the way it is? This is the spectrum between following each proof step individually (but seeing them as almost a random walk) vs seeing the proof as an elementary application of a well-known technique. Third: we can start slipping the axioms. There are small changes to the axioms, in which one thing goes from being an axiom to a theorem and another thing makes the opposite transition. There are also large changes, like formalizing number theory via the Peano axioms vs formalizing it in set theory, where the entire description language changes. You need to translate from statements of number theory to statements of set theory. Also, there is a natural ambiguity between taking something as an axiom vs requiring it as a condition in a theorem.

In the ontology of computation, we can point at knowing the output of a machine vs being able to run it by hand to show the output. This is a little less flexible than the concept of mathematical proof, but essentially the same distinction. Changing the axioms is like translating the same algorithm to a different computational formalism, like going between Turing machines and lambda calculus. Also, there is a natural ambiguity between a program vs an input: when you run program XYZ with input ABC on a universal Turing machine, you input XYZABC to the universal Turing machine; but, you can also think of this as running program XY on input ZABC, or XYZA on input BC, et cetera.
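The program/input ambiguity can be made concrete with a toy "universal machine" (everything here is invented for illustration: a minimal stack machine whose tape is just the concatenation of program and input):

```python
def run(program, data=""):
    """A toy stack machine: digits push themselves; '+' and '*' pop two
    values and push the result. The machine only ever sees one tape,
    the concatenation of program and data."""
    tape = program + data
    stack = []
    for tok in tape:
        if tok.isdigit():
            stack.append(int(tok))
        elif tok == "+":
            stack.append(stack.pop() + stack.pop())
        elif tok == "*":
            stack.append(stack.pop() * stack.pop())
    return stack[-1]

# "23*" as program with "4+" as input, or "2" as program with "3*4+" as
# input: the split is a choice of description, not a fact about the run.
print(run("23*4+") == run("23*", "4+") == run("2", "3*4+"))  # True
print(run("23*4+"))  # (2*3) + 4 = 10
```

The boundary between "the model" and "what you feed it" is drawn by us, which is exactly the slippage the paragraph above describes.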

In the ontology of ontology, we could say "can you see why this has to be, from the structure of the ontology describing things?" "Ontology" is less precise than the previous two concepts, but it's clearly the same idea. A different ontology doesn't necessarily support the same conclusions, just like different axioms don't necessarily give the same theorems. However, the reductionist paradigm holds that the ontologies we use should all be consistent with one another (under some translation between the ontologies) -- or at least that they should aspire to eventual consistency. Analogous to axiom/assumption ambiguity and program/input ambiguity, there is ambiguity between an ontology and the cognitive structure which created and justifies the ontology. We can also distinguish more levels; maybe we would say that an ontology doesn't make predictions directly, but provides a language for stating models, which make predictions. Even longer chains can make sense, but it's all subjective divisions. However, unlike the situation in logic and computation, we can't expect to articulate the full support structure for an ontology; it is, after all, a big mess of evolved neural mechanisms which we don't have direct access to.

Having established that we can talk about the same things in all three settings, I'll restrict myself to talking about ontologies.

Two-level definition of gears: A conclusion is gears-like with respect to a particular ontology to the extent that you can "see the derivation" in that ontology. A conclusion is gears-like without qualification to the extent that you can also "see the derivation" of the ontology itself. This is contiguous with gears-ness relative to an ontology, because of the natural ambiguity between programs and their inputs, or between axioms and assumptions. For a given example, though, it's generally more intuitive to deal with the two levels separately.

Seeing the derivation: There are several things to point at by this phrase.

  • As in TEOTE, we might consider it important that a model make precise predictions. This could be seen as a prerequisite of "seeing the derivation": first, we must be saying something specific; then, we can ask if we can say why we're saying that particular thing. This implies that models are more gears-like when they are more deterministic, all other things being equal.
  • However, I think it is also meaningful and useful to talk about whether the predictions of the model are deterministic; the standard way of assigning probabilities to dice is very gears-like, despite placing wide probabilities. I think these are simply two different important things we can talk about.
  • Either way, being able to see the derivation is like being able to see the proof or execute the program, with all the slippages this implies. You see the derivation less well to the extent that you rely on known theorems, and more to the extent that you can spell out all the details yourself if need be. You see it less well to the extent that you understand the proof only step-by-step, and more well to the extent that you can derive the proof as a natural application of known principles. You cannot see the derivation if you don't even have access to the program which generated the output, or are missing some important inputs for that program.

Seeing the derivation is about explicitness and external objectivity. You can trivially "execute the program" generating any of your thoughts, in that your thinking is the program which generated the thoughts. However, the execution of this program could rely on arbitrary details of your cognition. Moreover, these details are usually not available for conscious access, which means you can't explain the train of thought to others, and even you may not be able to replicate it later. So, a model is more gears-like the more replicable it is. I'm not sure if this should be seen as an additional requirement, or an explanation of where the requirements come from.

Conclusion, Further Directions

Obviously, we only touched the tip of the iceberg here. I started the post with the claim that I was trying to hash out the implications of logical induction for practical rationality, but secretly, the post was about things which logical inductors can only barely begin to explain. (I think these two directions support each other, though!)

We need the framework of logical induction to understand some things here, such as how you still have degrees of understanding when you already have the proof / already have a program which predicts things perfectly (as discussed in the "mathematical understanding" section). However, logical inductors don't look like they care about "gears" -- it's not very close to the formalism, in the way that TEOTE gave a notion of technical explanation which is close to the formalism of probability theory.

I mentioned earlier that logical induction suffers from the old evidence problem more than Bayesianism. However, it doesn't suffer in the sense of losing bets it could be winning. Rather, we suffer, when we try to wrap our heads around what's going on. Somehow, logical induction is learning to do the right thing -- the formalism is just not very explicit about how it does this.

The idea (due to Sam Eisenstat, hopefully not butchered by me here) is that logical inductors get around the old evidence problem by learning notions of objectivity.

A hypothesis you come up with later can't gain any credibility by fitting evidence from the past. However, if you register a prediction ahead of time that a particular hypothesis-generation process will eventually turn up something which fits the old evidence, you can get credit, and use this credit to bet on what the hypothesis claims will happen later. You're betting on a particular school of thought, rather than a known hypothesis. "You can't make money by predicting old evidence, but you may be able to find a benefactor who takes it seriously."

In order to do this, you need to specify a precise prediction-generation process which you are betting in favor of. For example, Solomonoff Induction can't run as a trader, because it is not computable. However, the probabilities which it generates are well-defined (if you believe that halting bits are well-defined, anyway), so you can make a business of betting that its probabilities will have been good in hindsight. If this business does well, then the whole market of the logical inductor will shift toward trying to make predictions which Solomonoff Induction will later endorse.

Similarly for other ideas which you might be able to specify precisely without being able to run right away. For example, you can't find all the proofs right away, but you could bet that all the theorems which the logical inductor observes have proofs, and you'd be right every time. Doing so allows the market to start betting it'll see theorems if it sees that they're provable, even if it hasn't yet seen this rule make a successful advance prediction. (Logical inductors start out really ignorant of logic; they don't know what proofs are or how they're connected to theorems.)

This doesn't exactly push toward gears-y models as defined earlier, but it seems close. You push toward anything for which you can provide an explicit justification, where "explicit justification" is anything you can name ahead of time (and check later) which pins down predictions of the sort which tend to correlate with the truth.

This doesn't mean the logical inductor converges entirely to gears-level reasoning. Gears were never supposed to be everything, right? The optimal strategy combines gears-like and non-gears-like reasoning. However, it does suggest that gears-like reasoning has an advantage over non-gears reasoning: it can gain credibility from old evidence. This will often push gears-y models above competing non-gears considerations.

All of this is still terribly informal, but is the sort of thing which could lead to a formal theory. Hopefully you'll give me credit later for that advanced prediction.


Sat, 17 Feb 2018 19:57:18 EST

Circling is a practice, much like meditation is a practice.

There are many forms of it (again, like there are many forms of meditation). There are even life philosophies built around it. There are lots of intellectual, heady discussions of its theoretical underpinnings, often centered on Ken Wilber's Integral Theory. Subcultures have arisen from it. It is mostly practiced in the US and Europe. It attracts lots of New Age-y, hippie, self-help-guru types. My guess is that the median age of practitioners is in their 30s. I sometimes refer to practitioners of Circling as relationalists (or just Circlers).

In recent years, Circling has caught the eye of rationalists, and that's why this post is showing up here, on LessWrong. I can hopefully direct people here who have the question, "I've heard of this thing called Circling, but... what exactly is it?" And further, people who ask, "Why is this thing so ****ing hard to explain? Just tell me!"

You are probably familiar with the term inferential distance.

Well, my friend Tiffany suggested a similar term to me, experiential distance—the gap in understanding caused by the distance between different sets of experiences. Let's just say that certain Circling experiences can create a big experiential distance, and this gap isn't easily closed using words. Much of the relevant "data" is in the nonverbal, subjective aspects of the experience, and even if I came up with a good metaphor or explanation, it would never close the gap. (This is annoyingly Postmodern, yes?)

[Ho ho~ how I do love poking fun at Postmodernism~]

But! There are still things to say, so I will say them. Just know that this post may not feel like eating a satisfying meal. I suspect it will feel more like licking a Pop-Tart, on the non-frosted side.

Some notes first.

Note #1: I'm not writing this to sell Circling or persuade you that it's good. I recommend using your own sense of curiosity, intuition, and intelligence to guide you. I don't want you to "put away" any of your thinking-feeling parts just to absorb what I'm saying. Rather, try remaining fully in contact with your awareness, your sensations, and your thoughts. (I hope this makes sense as a mental move.)

Note #2: The best introduction to Circling is to actually try it. It's like if I tried to explain watching Toy Story to someone who's never seen a movie. You don't explain movies to people; you just sit them down and have them watch one. So, I encourage you to stop reading at any time you notice yourself wanting to try it. My words will be mere pale ghosts. Pale ghosts, I tell you!

Note #3: This post is written by a rationalist who's done 400+ hours of Circling and has tried all the main styles / schools of Circling.

OK, I will try to explain what a circle is (the activity, not the general practice), but I also want to direct your attention to this handy 100-page PDF I found that attempts to explain everything Circling, if you're willing to skim it. (It is written by a relative newcomer to the Circling world and contains many disputed sentences, but it is thorough. Just take it all with a grain of salt.)

So what is a circle?

You start by sitting with other people in a circle. So far, so good!

Group sizes can be as small as 2 and as large as 50+, but 4-12 is perhaps more expected.

There are often explicitly stated agreements or principles. These help create common knowledge about what to expect. The agreements aren't the same across circles or across schools of Circling. But a few common ones include "Honor self", "Own your experience", "Stay with the level of sensation", ...

There is usually at least one facilitator. They are responsible for tracking time and declaring the circle's start and end. Mostly they function as extra-good, extra-mindful participants—they're not "in charge" of the circle.

Then the group "has a conversation." Or maybe more accurately, it experiences what it’s like to be together, and sometimes intra-reports what that experience is like.

[^I'm actually super proud of this description! It's so succinctly what it is!]

Two common types of circles: Organic vs Birthday

Organic circles are more like a loose hivemind, where the group starts with no particular goal or orientation. Sometimes, a focal point emerges; sometimes it doesn't. Each individual has the freedom to point their attention however they will, and each individual can try to direct the group's attention in various ways. What happens when you put a certain selection of molecules into a container? How do they react? Do they bond? Do they stay the fuck away? What is it like to be a molecule in this situation? What is it like to be the molecule across from you?

Birthday circles start with a particular focal point. One person is chosen to be birthday circled, and the facilitator then gently cradles the group's attention towards this person, much like you can guide your attention back to your breath in meditation. And then the group tries to imagine/embody what it's like to be this person and "see through their eyes"—while also noticing what it's like to be themselves trying to do this.

Circling is often called a "relational practice."

It's a practice about questions like: What is it like to be me? What is it like to be me, while with another? What is it like for me to try to feel what the other is feeling? How might I express me? How does the other receive me and my expression?

In other words, it's a practice that explores what it means to be a sentient entity, among other sentient entities. And in particular what it means to be a human, among other humans.

If you haven't thought to yourself, "Being sentient is pretty weird; being a human is super weird; being a human around other humans is super-duper crazy weird," then I suspect you haven't explored this space to its fullest extent. Circling has helped me feel more of the strangeness of this existence.

How is Circling related to rationality?

I notice I feel trepidation and fear as I prepare to discuss this. I'm afraid I won't be able to give you what you want, that you'll become bored or start judging me.

[^This is a Circling move I just made: revealing what I'm feeling and what I'm imagining will happen.]

If this were an actual circle, I could ask you and check if it's true—are you feeling bored? [I invite you to check.]

I felt afraid just now—that fear was born of some assumptions about reality I was implicitly making. But without having to know and delineate what those assumptions are, I can check them by asking you—you who are part of reality and have relevant data.

By asking you while feeling my fear and anticipation, I open up the parts of me that can update, like opening so many eyes that usually stay closed. And depending on how you respond, I can receive the data any number of ways (including having the data bounce off, integrating the data, or disbelieving the data).

So, perhaps one way Circling is related to rationality is that it can:

  1. put me in a state of being open to an update,
  2. train me to straightforwardly ask for the data, from the world, and
  3. respond to and receive the data—with all my faculties available.

What does it mean to be open to an update?

If you've experienced a more recent iteration of CFAR's Comfort Zone Exploration class (aka CoZE), it is just that.

There are parts of me that are scared of looking over the fence, where there might be dragons in the territory. (Why is the fence even there? Who knows. It belongs to Chesterton.)

My job, then, is not to shove the scared parts over the fence, or to suggest they shut their eyes and jump over it, or to destroy the fence. I walk next to the fence with my scared part, and I sit with and acknowledge the fear. Then I play around with getting closer to the fence; I play with waving my arms above the fence; I play with peeking over it; I play with touching the fence.

And this whole time, I'm quite aware of the fear; I do not push it down or call it inappropriate or dissociate. I listen to it, and I try to notice all my internal sensations and my awareness. I am fully exposed to new information, like walking into an ice bath slowly with all my senses awake. In my experience, being in an SNS-activated (sympathetic nervous system) state really primes me for new information in a way that being calm (parasympathetic activation) does not.

And this is when I am most open to receiving new inputs from the world, where I might be the most affected by the new data.

I can practice playing around with this during Circling, and it can be quite powerful.

What does it mean to receive data with all my faculties available?

This means I'm not mindlessly "accepting" whatever is happening in front of me. All of me is engaged, such that I can notice and call bullshit if that's what's up.

If I'm actually in touch with my body and my felt senses, I can notice all the small niggling parts that are like, "Uhhh" or "Errgh." Often they're nonverbal. Even the tiniest flinches of discomfort or retraction I will use as signals of something, even if I don't really understand what they mean. And I can then also choose to name them out loud, if I want to. And see how the other person reacts to that.

In other words, my epistemic defense system is online and running. It's not taking a break during any of this, nor do I want it to be. If things still manage to slip past, I want to be able to notice it later on and investigate. Sometimes slowing things down helps. My mind will also automatically defend itself—in circles, I've fallen asleep, gotten distracted, failed to parse sentences, become aggressively confused or bored, among other things. What's cool is being able to notice all this as it's happening.

However, if I'm not in touch with my body—if I'm dissociated, if I don't normally feel my body/emotions, if I'm overwhelmed, if I'm solely in my thoughts—then that is a skill area I'd want to work on first. How to learn to stay aware of myself and my felt-sense body, even when I'm uncomfortable or my nervous system is activated. Circling can also train this, similar to Focusing.

The more I train this skill, the more I'll be able to engage with the universe. Rather than avoid the parts of it I don't like or don't want to acknowledge or don't want to look at.

I suspect some people might not even realize what they're missing out on here. People who've lived their entire lives without much of an "emotional library" or without understanding that their body is giving them all kinds of data. Usually these people don't go looking for the "missing thing" until some major problems crop up in their lives that they can't explain.

Circling as a rationality training ground

Circling can be a turbocharged training ground for a variety of rationality skills, including:

  • Real-time introspection
  • Surrendering to the unknown / being at the edge
  • Exploring unknown, unfamiliar, or avoided parts of the territory (like in CoZE)
  • Looking at parts of the territory that make you flinch (Mundanification)
  • Having the Double Crux spirit: being open to being wrong / updating, seeing other people as having relevant bits of map

I've also found it to be powerful in combination with:

  • Internal Double Crux (a CFAR technique for resolving internal conflict that involves lots of introspection)
  • Immunity to Change mapping (a Kegan-Lahey technique for making lasting change by looking for big assumptions)
  • CT Charting (a Leverage technique for mapping your beliefs and finding hidden assumptions)
  • or any other formal attempt to explore my aliefs and find core assumptions I've been holding onto

After using one of the above techniques to find a core assumption, I can use Circling to test its validity. (My core assumptions often have something to do with other people, e.g. "Nobody can understand me, and even if they could, they wouldn't want to.") I can sometimes feel those assumptions being challenged during a circle.

So, if I try being in any ole circle, will I get all of the above?

Probably not.

Circles are high-variance. (The parameters of each circle matter a lot. Like who's in it, who's facilitating, what school of Circling is it based on, what are the lighting conditions, etc.)

I've circled about a hundred times by now, and a lot of those were in 3-day chunks. I guess multi-day immersions are a pretty good way to really try it out, so maybe try that and see? They reduce the variance in some dimensions.

What are some pitfalls of Circling?

1) You might become a "connection junkie".

Circling is (in its final form) a truth-seeking practice. IMO. But a lot of folks flock to it as a way to feel connected to other people.

This is not necessarily a bad thing. In fact I suspect human-to-human contact is something many of us are seriously lacking, possibly starving for. It might be good for us to get more of this in our lives.

That said, there can be such a thing as "too much of a good thing."

2) You might obtain false beliefs.

I think this is always a risk, for humans, in life. But Circling does have a way of making things more salient than usual, and if some of those super-salient things lead you to believe, somehow, the "wrong" things, then maybe that's more of a problem.

I think this isn't actually a huge problem, as long as one has a good meta- or meta-meta-process for arriving eventually at true beliefs. (See the rest of this website for more!)

I also think this is mitigated by exposing yourself to a wide range of data. Like, consciously avoid being in a bubble. Join multiple cult-ures [sic].

3) Circles can be bad / harmful.

IMO, there is a qualitative difference between good and bad circles.

Concretely, the good facilitators understand the nuances of mental health and have done at least some research on therapy modalities. Circling isn't therapy, but psychological stuff comes up a fair amount. And if you vulnerably open up in a situation where they're not actually equipped to navigate your mental health issues, that could be quite bad indeed.

A good facilitator will also not force you to open up or try to get you to be vulnerable (this goes against Circling's principles). Instead they will tune into your nervous system and try to tell when you're feeling stressed or anxious or frozen and will probably reflect this back at you to check. Circling is not about "getting somewhere" or "healing you" or "solving a problem." So ... if you encounter a circle where that seems to be what's happening, try saying something out loud like "I have a story we're trying to fix something."

Good facilitation often costs money—there's a correlation, anyway. I wouldn't assume the facilitation will be good just because it costs money, but it's an easy signpost.

Final thoughts

It's not like Circling has taken over the world or anything. So the same question posed to rationality has to be posed to it: given that it hasn't, why do you think it's real?

And like with rationality, for me the answer is kind of like, I dunno because my inside view says it is?

/licks a Pop-Tart


Sat, 17 Feb 2018 4:06:35 EST

Someone walking around stabbing people can cause a lot of damage before anyone stops them. That’s a bummer, but some technologies make the situation much worse.

If we don’t want to ban dangerous technologies outright (because they have legitimate purposes, or because we really love guns), we could instead expand liability insurance requirements. In the case of guns I think this is an interesting compromise; in the case of sophisticated consumer robotics, I think it’s probably the right policy.

Example: firearms

Anyone can buy a gun, but first they have to put down a $100M deposit. If you use the gun to cause damage or kill people then the damages are deducted from your deposit (in addition to other punishments).

Most individuals can’t afford this deposit, so they would need to purchase firearm liability insurance. An insurer could perform background checks or tests in order to better assess risk and offer a lower rate, or could get buyers to agree to monitoring that reduces the risk of killing many people. Insurance rates might be lower for weapons that aren’t well-suited for murder, or for purchasers who have a stronger legitimate interest in firearms. Insurance rates would be lower if you took appropriate precautions to avoid theft.

When all is said and done, about 10M firearms are made in the US per year, and about 30k people are murdered. So if $10M is charged per murder, the average cost of firearm insurance should end up being around $30k (with significantly lower costs for low-risk applicants, and prohibitively large costs for the highest risk applicants). In practice I would expect the number of firearms, and the risk per firearm, to fall.
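The arithmetic here checks out; a quick sanity check using only the figures the text gives:

```python
# Back-of-the-envelope check of the firearm-insurance figures above.
firearms_per_year = 10_000_000     # ~10M firearms made in the US per year
murders_per_year = 30_000          # ~30k murders per year
charge_per_murder = 10_000_000     # $10M deducted per murder

total_payouts = murders_per_year * charge_per_murder
avg_premium = total_payouts / firearms_per_year

print(f"total payouts:   ${total_payouts:,}")    # $300,000,000,000
print(f"average premium: ${avg_premium:,.0f}")   # $30,000
```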

Some details

Rather than putting down a separate deposit for every gun they insure, an insurer only needs to demonstrate that they can cover their total obligations. For example, an insurer who insures 5M firearms might be required to keep $50B in reserves (rather than a 5M * $100M = $500 trillion deposit), based on a conservative estimate of the correlated risk. $50B may sound like a lot, but if insurers charge a 15% markup and the total payouts are 30k * $10M = $300B, then the gun insurance industry is making $50B/year.
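These figures can be reproduced as follows (my own sketch; I'm assuming "15% markup" means 15% of premium income is retained as margin, which is one plausible reading):

```python
insured_firearms = 5_000_000
deposit_per_firearm = 100_000_000

# The naive per-gun deposit that the reserve scheme replaces:
naive_deposit = insured_firearms * deposit_per_firearm
print(f"${naive_deposit:,}")  # $500,000,000,000,000 -- the $500 trillion figure

# Industry-wide margin if 15% of premium income is retained:
total_payouts = 30_000 * 10_000_000       # $300B/year in expected payouts
premiums = total_payouts / (1 - 0.15)     # ~$353B/year collected
margin = premiums - total_payouts         # ~$53B/year, roughly the $50B cited
```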

When you manufacture a gun, you automatically assume liability for it. In order to legally make guns you need insurance or a deposit, and you must ensure that your gun is traceable (e.g. via a serial number). Most of the time when you sell someone a gun, they will assume liability (by putting down their own deposit, replacing your deposit) as a condition of purchase. You are welcome to sell to someone who won’t assume liability, but that’s a recipe for losing $100M. Likewise, whoever buys the gun can resell it without transferring liability, but their insurer is going to try to stop them (e.g. by confiscating a smaller deposit with the insurer, by signing additional legally binding agreements, by background checks, by monitoring).

This amounts to privatizing regulation of destructive technologies. The state could continue to participate in this scheme as an insurer—if they wanted, they could sell insurance to anyone who is allowed to buy a gun under the current laws. They’d be losing huge amounts of money though.

Example: Robots

We are approaching the world where $50 of robotics and a makeshift weapon can injure or kill an unprotected pedestrian. Cheap robotics could greatly increase the amount of trouble a trouble-maker can make (and greatly decrease their legal risk). We could fix this problem by tightly controlling access to robots, but robots have plenty of legitimate uses.


A less drastic solution would be to require liability insurance, e.g. $2M for a small robot or $20M for a large one. Manufacturers could make their robots cheap to insure by building in restrictions that make them hard to use for crime or that limit their usefulness for trouble-making. (This could be coupled with the same mechanisms described in the section on firearms, including monitoring to make it more difficult to circumvent restrictions.)

It makes sense to have different requirements for different robots, but they should err on the side of simplicity and conservativeness. Insurers can make a more detailed assessment about whether a particular robot really poses a risk when deciding how much to charge for insurance.

Whether or not liability insurance is required for owning a robot, I think it would be good to require it for operating a robot in a public space. This doesn’t require sweeping legal changes or harmonization: local governments could simply decide that uninsured robots will be destroyed or confiscated on sight.

Fri, 16 Feb 2018 13:42:27 EST

Cross-posted to the EA Forum & my personal blog.

This is the fourth (and final) post in a series exploring consequentialist cluelessness and its implications for effective altruism:

  • The first post describes cluelessness & its relevance to EA, arguing that for many popular EA interventions we don’t have a clue about the intervention’s overall net impact.
  • The second post considers a potential reply to concerns about cluelessness.
  • The third post examines how tractable cluelessness is – to what extent can we grow more clueful about an intervention through intentional effort?
  • This post discusses how we might do good while being clueless to an important extent.

Consider reading the previous posts (1, 2, 3) first.

The last post looked at whether we could grow more clueful by intentional effort. It concluded that, for the foreseeable future, we will probably remain clueless about the long-run impacts of our actions to a meaningful extent, even after taking measures to improve our understanding and foresight.

Given this state of affairs, we should act cautiously when trying to do good. This post outlines a framework for doing good while being clueless, then looks at what this framework implies about current EA cause prioritization.

The following only makes sense if you already believe that the far future matters a lot; this argument has been made elegantly elsewhere, so we won’t rehash it here.[1]

An analogy: interstellar travel

Consider a spacecraft, journeying out into space. The occupants of the craft are searching for a star system to settle. Promising destination systems are all very far away, and the voyagers don’t have a complete map of how to get to any of them. Indeed, they know very little about the space they will travel through.

To have a good journey, the voyagers will have to successfully steer their ship (both literally & metaphorically). Let's use "steering capacity" as an umbrella term that refers to the capacity needed to have a successful journey.[2]

"Steering capacity" can be broken down into the following five attributes:[3]

  • Intent: The voyagers must have a clear idea of what they are looking for.
  • Coordination: The voyagers must be able to reach agreement about where to go.
  • Wisdom: The voyagers must be discerning enough to identify promising systems as promising, when they encounter them. Similarly, they must be discerning enough to accurately identify threats & obstacles.
  • Capability: Their craft must be powerful enough to reach the destinations they choose.
  • Predictive power: Because the voyagers travel through unmapped territory, they must be able to see far enough ahead to avoid obstacles they encounter.

This spacecraft is a useful analogy for thinking about our civilization’s trajectory. Like us, the space voyagers are somewhat clueless – they don’t know quite where they should go (though they can make guesses), and they don’t know how to get there (though they can plot a course and make adjustments along the way).

The five attributes given above – intent, coordination, wisdom, capability, and predictive power – determine how successful the space voyagers will be in arriving at a suitable destination system. These same attributes can also serve as a useful framework for considering which altruistic interventions we should prioritize, given our present situation.

The basic point

The basic point here is that interventions whose main known effects do not improve our steering capacity (i.e. our intent, wisdom, coordination, capability, and predictive power) are not as important as interventions whose main known effects do improve these attributes.

An implication of this is that interventions whose effectiveness is driven mainly by their proximate impacts are less important than interventions whose effectiveness is driven mainly by increasing our steering capacity.

This is because any action we take is going to have indirect & long-run consequences that bear on our civilization’s trajectory. Many of the long-run consequences of our actions are unknown, so the future is unpredictable. Therefore, we ought to prioritize interventions that improve the wisdom, capability, and coordination of future actors, so that they are better positioned to address future problems that we did not foresee.

What being clueless means for altruistic prioritization

I think the steering capacity framework implies a portfolio approach to doing good – simultaneously pursuing a large number of diverse hypotheses about how to do good, provided that each approach maintains reversibility.[4]

This approach is similar to the Open Philanthropy Project’s hits-based giving framework – invest in many promising initiatives with the expectation that most will fail.

Below, I look at how this framework interacts with focus areas that effective altruists are already working on. Other causes that EA has not looked into closely (e.g. improving education) may also perform well under this framework; assessing causes of this sort is beyond the scope of this essay.

My thinking here is preliminary, and very probably contains errors & oversights.

EA focus areas to prioritize

Broadly speaking, the steering capacity framework suggests prioritizing interventions that:[5]

  • Further our understanding of what matters
  • Improve governance
  • Improve prediction-making & foresight
  • Reduce existential risk
  • Increase the number of well-intentioned, highly capable people

To prioritize – better understanding what matters

Increasing our understanding of what’s worth caring about is important for clarifying our intentions about what trajectories to aim for. For many moral questions, there is already broad agreement in the EA community (e.g. the view that all currently existing human lives matter is uncontroversial within EA). On other questions, further thinking would be valuable (e.g. how best to compare human lives to the lives of animals).

Myriad thinkers have done valuable work on this question. Particularly worth mentioning are the Foundational Research Institute, the Global Priorities Project, and the Qualia Research Institute, as well as the Open Philanthropy Project’s work on consciousness & moral patienthood.

To prioritize – improving governance

Improving governance is largely aimed at improving coordination – our ability to mediate diverse preferences, decide on collectively held goals, and work together towards those goals.

Efficient governance institutions are robustly useful in that they keep focus oriented on solving important problems & minimize resource expenditure on zero-sum competitive signaling.

Two routes towards improved governance seem promising: (1) improving the functioning of existing institutions, and (2) experimenting with alternative institutional structures (Robin Hanson’s futarchy proposal and seasteading initiatives are examples here).

To prioritize – improving foresight

Improving foresight & prediction-making ability is important for informing our decisions. The further we can see down the path, the more information we can incorporate into our decision-making, which in turn leads to higher quality outcomes with fewer surprises.

Forecasting ability can definitely be improved from baseline, but there are probably hard limits on how far into the future we can extend our predictions while remaining reliable.

Philip Tetlock’s Good Judgment Project is a promising forecasting intervention, as are prediction markets like PredictIt and polling aggregators like 538.

To prioritize – reducing existential risk

Reducing existential risk can be framed as “avoiding large obstacles that lie ahead”. Avoiding extinction and “lock-in” of suboptimal states is necessary for realizing the full potential benefit of the future.

Many initiatives are underway in the x-risk reduction cause area. Larks’ annual review of AI safety work is excellent; Open Phil has good material about projects focused on other x-risks.

To prioritize – increasing the number of well-intentioned, highly capable people

Well-intentioned, highly capable people are a scarce resource, and will almost certainly continue to be highly useful going forward. Increasing the number of well-intentioned, highly capable people seems robustly good, as such people are able to diagnose & coordinate on future problems as they arise.

Projects like CFAR and SPARC are in this category.

In a different vein, psychedelic experiences hold promise as a treatment for treatment-resistant depression, and may also improve the intentions of highly capable people who have not reflected much about what matters (“the betterment of well people”).

EA focus areas to deprioritize, maybe

The steering capacity framework suggests deprioritizing animal welfare & global health interventions, to the extent that these interventions’ effectiveness is driven by their proximate impacts.

Under this framework, prioritizing animal welfare & global health interventions may be justified, but only on the basis of improving our intent, wisdom, coordination, capability, or predictive power.

To deprioritize, maybe – animal welfare

To the extent that animal welfare interventions expand our civilization’s moral circle, they may hold promise as interventions that improve our intentions & understanding of what matters (the Sentience Institute is doing work along this line).

However, following this framework, the case for animal welfare interventions has to be made on these grounds, not on the basis of cost-effectively reducing animal suffering in the present.

This is because the animals that are helped in such interventions cannot help “steer the ship” – they cannot contribute to making sure that our civilization’s trajectory is headed in a good direction.

To deprioritize, maybe – global health

To the extent that global health interventions improve coordination, or reduce x-risk by increasing socio-political stability, they may hold promise under the steering capacity framework.

However, the case for global health interventions would have to be made on the grounds of increasing coordination, reducing x-risk, or improving another steering capacity attribute. Arguments for global health interventions on the grounds that they cost-effectively help people in the present day (without consideration of how this bears on our future trajectory) are not competitive under this framework.


In sum, I think the fact that we are intractably clueless implies a portfolio approach to doing good – pursuing, in parallel, a large number of diverse hypotheses about how to do good.

Interventions that improve our understanding of what matters, improve governance, improve prediction-making ability, reduce existential risk, and increase the number of well-intentioned, highly capable people are all promising. Global health & animal welfare interventions may hold promise as well, but the case for these cause areas needs to be made on the basis of improving our steering capacity, not on the basis of their proximate impacts.

Thanks to members of the Mather essay discussion group and an anonymous collaborator for thoughtful feedback on drafts of this post. Views expressed above are my own.


[1]: Nick Beckstead has done the best work I know of on the topic of why the far future matters. This post is a good introduction; for a more in-depth treatment see his PhD thesis, On the Overwhelming Importance of Shaping the Far Future.

[2]: I'm grateful to Ben Hoffman for discussion that fleshed out the "steering capacity" concept; see this comment thread.

[3]: Note that this list of attributes is not exhaustive & this metaphor isn't perfect. I've found the space travel metaphor useful for thinking about cause prioritization given our uncertainty about the far future, so am deploying it here.

[4]: Maintaining reversibility is important because given our cluelessness, we are unsure of the net impact of any action. When uncertain about overall impact, it’s important to be able to walk back actions that we come to view as net negative.

[5]: I'm not sure of how to prioritize these things amongst themselves. Probably improving our understanding of what matters & our predictive power are highest priority, but that's a very weakly held view.


Fri, 16 Feb 2018 5:13:07 EST


“A chaotic evil character tends to have no respect for rules, other people’s lives, or anything but their own desires, which are typically selfish and cruel. They set a high value on personal freedom, but do not have much regard for the lives or freedom of other people. Chaotic evil characters do not work well in groups because they resent being given orders and do not usually behave themselves unless there is no alternative.”

Wikipedia: Alignment (Dungeons and Dragons)



Well, then. Hmm.

I knew in advance that any exploration of Unity wouldn’t be all kumbaya and smiles and roses… of course, there’s plenty of uncontroversial and straightforward points that need to be hit in order to establish those really exceptional, highly productive, and joyful cooperations and collaborations.

And yet, there’s some points that risk controversy.

Scratch that.

There’s some points about unity that are fundamentally controversial.

In this issue, I put forward the proposition that some, relatively small percentage of people are fundamentally incapable of unity in their current stage of life. Those people are incredibly detrimental to the well-being and thriving of teams and the individuals on those teams.

This would be bad enough if we could easily spot those people, but therein lies the problem — it’s pretty well-proven at this point that most people overestimate their ability to sense honesty and dishonesty and to evaluate other people accurately.

The American novelist Tom Clancy once remarked,

“The difference between fiction and reality is that fiction has to make sense.”

People that behave in a more dishonest and anti-social way, unfortunately, have much greater freedom of action than people who behave more fundamentally honestly and lawfully — and to our great detriment, we often fail to judge correctly.

This is one of the more important skillsets to establish for anyone who wants great cooperations and collaborations in their life — the ability to assess who is capable of cooperating effectively, and who is not.

And, perhaps due to its high level of potential controversy, it’s covered very poorly in general literature for the public — to be sure, law enforcement, lawyers, judges, diplomats, professional negotiators, and a host of other professions receive specific training in this, but I’ve frankly never read a good general treatise on the topic putting the issues into the frank and stark light of day.

Distasteful and dangerous ground occasionally, it’s nevertheless essential to learn. Shall we begin?



The tabletop role-playing game Dungeons and Dragons was first published in 1974.

The forerunner and a major influence on most role-playing video games, D&D became very popular when it combined two unrelated elements to make a unique sort of game.

Various board games and tabletop war games have been around for a very long time — Chess, in its modern form, has been around for about a thousand years; the beautifully elegant game of Go was invented in Ancient China over 2,000 years ago.

Dungeons and Dragons combined the tactical elements of boardgames and wargames with a “role-playing” angle — in Chess, for instance, you don’t become particularly attached to your knight or rook; you’re indifferent to whether these pieces are captured by the opponent so long as it advances your position and chance of winning.

Contrasting that, in D&D, each player would generate a fictional character with a fictional backstory, history, and code of values. Typically the character would start inexperienced — Level 1 — with very basic equipment and not much money.

Throughout a given game, each of these Level 1 characters would look to grow stronger and defeat enemies and puzzles in encounters, similar to a board game — but additionally, there were the narrative and role-playing elements of the story. Players were expected to act in accordance with the fictional backstory and morals of their character while playing the game.

Hence, a character oriented around good and justice and loyalty wouldn’t sacrifice an ally in trouble to gain an advantage in a combat. A devout member of a knightly order would be expected to act accordingly, not merely taking the best tactical actions in a board game sense.

This proved a very immersive experience and the game got very popular. It included all the elements of problem-solving and tactical play that a board game would produce, but the narrative elements made games richer, more immersive, and more interesting than most board games could ever hope to be.



As part of creating a Dungeons and Dragons character, you’d define a set of attributes before starting the game — you’d choose a name, race (human, elf, dwarf, etc), gender, and an alignment before starting.

In the first 1974 version of the game, there were only three alignments — Lawful, Neutral, and Chaotic.


“The original version of D&D allowed players to choose among three alignments when creating a character: lawful, implying honor and respect for society’s rules; chaotic, implying rebelliousness and individualism; and neutral, seeking a balance between the extremes.”

But after some playtesting and refinement, it was found that the lawful-chaotic alignment axis didn’t get the job done very well — typically lawful characters would be the good guys, and chaotic characters the bad guys, but it didn’t always work out so well in practice. Would Robin Hood be classed as chaotic for breaking society’s rules, or lawful for behaving honorably?

In 1977, the more familiar two-axis alignment system was introduced —

Good, Neutral, Evil

Lawful, Neutral, Chaotic

The Dungeons and Dragons concept of good and evil roughly mapped to modern Western morality. From Wikipedia:

“Good implies altruism, respect for life, and a concern for the dignity of sentient beings. Good characters make personal sacrifices to help others.

Evil implies harming, oppressing, and killing others. Some evil creatures simply have no compassion for others and kill without qualms if doing so is convenient or if it can be set up. Others actively pursue evil, killing for sport or out of duty to some malevolent deity or master.

People who are neutral with respect to good and evil have compunctions against killing the innocent but lack the commitment to make sacrifices to protect or help others. Neutral people are committed to others by personal relationships.”

You could of course criticize the system for the lack of nuance — it’s obviously a highly simplified picture of how life would run, since few of us are universally altruistic or expedient.

And yet, this new schema worked and made intuitive sense to players — Robin Hood would be chaotic good, the Sheriff of Nottingham would be lawful evil.



Among people who like Dungeons and Dragons, there’s periodically internet threads debating the alignment of real-life historical figures and characters from other fictional universes.

The canonical “chaotic evil” character from outside of Dungeons and Dragons was Heath Ledger’s portrayal of The Joker in Christopher Nolan’s 2008 film The Dark Knight.

The Joker: “Their morals, their code; it’s a bad joke. Dropped at the first sign of trouble. They’re only as good as the world allows them to be. You’ll see- I’ll show you. When the chips are down these, uh, civilized people? They’ll eat each other. See I’m not a monster, I’m just ahead of the curve.”

Batman’s butler, Alfred, described the Joker like this —

“Some men aren’t looking for anything logical, like money. They can’t be bought, bullied, reasoned, or negotiated with. Some men just want to watch the world burn.”

And of course that’s borne out by the action in the movie — perhaps the most stunning scene to me was when The Joker had finally accumulated tens of millions of dollars in a giant pile of money, and then takes out the can of gasoline and starts pouring it on the cash…

“It’s not about the money. It’s about sending a message — everything burns.”

This is widely agreed to be the most extreme canonical representation of chaotic evil.

And likewise, in Dungeons and Dragons, most chaotic evil creatures and humans simply look evil — desiccated undead zombies, devils and demons, warlords in black spiked armor coated in dried blood.

This is convenient as a storytelling trope — if a group of players is exploring a ruined castle and comes across an undead, rotting, seemingly demonic figure — well, 99 times out of 100, that’s an antagonist.

Is there a problem with this concept?



Before we continue, please take the 3 minutes and 53 seconds to watch this —


Seriously, please watch it. The skills and lessons you can derive from analysis of those four minutes might be life-changing, even.

(If you’re reading this piece in the future and that Youtube link isn’t working, do take the time and search for ‘Golden Balls 100,000 split or steal’ — I imagine some version of this will be online forever.)

And for reference, 100,000 Great British Pounds in mid-2008 was around $200,000 USD. Or, as they said on the show, “This is a life-changing amount of money.”

I knew, before starting the Unity series, that I wanted to illustrate some of the more nuanced and controversial points with game theory, so I started looking for good examples of people cooperating or defecting when significant amounts of money were at stake — finding Golden Balls not only provided the perfect illustrative example, but wound up teaching me more than I had expected.

Specifically, the final round of the British game show “Golden Balls” is an example of what students of game theory call “The Prisoner’s Dilemma.”

I imagine most TSR readers are relatively sophisticated in this area and already know about the Prisoner’s Dilemma, but if you haven’t heard about it, take the time to read up on it at some point.

The short version is that in the prisoner’s dilemma, you can choose to cooperate (“Split” on Golden Balls) or defect (“Steal” on Golden Balls). If both parties cooperate, they get the best possible group outcome.

If your partner cooperates and you defect, you do better than them.

If both people defect, both people do worse than if both cooperated.

Golden Balls is a fascinating game show, because the final round is almost a pure version of the Prisoner’s Dilemma from game theory — multiple academic economics papers have been written about the show.

At the final round of Golden Balls, Sarah and Stephen had 100,000 pounds sitting in front of them. ($200,000 USD.) If both chose split, they’d each walk away with 50,000 pounds — anywhere from one to four years of salary for most Britons at the time.

If one person chose split and the other chose steal, the stealer would get all 100,000 pounds ($200,000) and the splitter would get nothing.

If both chose steal, they’d both get zero.

It breaks down like this —

If Stephen and Sarah both split, Stephen gets 50k and Sarah gets 50k.

If Stephen steals and Sarah splits, Stephen gets 100k and Sarah gets 0.

If Stephen splits and Sarah steals, Stephen gets 0 and Sarah gets 100k.

If Stephen and Sarah both steal, Stephen gets 0 and Sarah gets 0.

You can’t control what the other person does, and both votes are secret and revealed at the same time.
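The payoff structure above can be written down as a tiny function — a toy sketch, using the 100,000-pound pot from the episode described above:

```python
# Toy model of the Golden Balls final round as a Prisoner's Dilemma.
# Pot size taken from the Sarah/Stephen episode discussed in the text.
POT = 100_000

def payoff(choice_a, choice_b):
    """Return (payoff_a, payoff_b) for choices 'split' or 'steal'."""
    if choice_a == "split" and choice_b == "split":
        return POT // 2, POT // 2       # cooperate/cooperate: best group outcome
    if choice_a == "steal" and choice_b == "split":
        return POT, 0                   # defector takes everything
    if choice_a == "split" and choice_b == "steal":
        return 0, POT
    return 0, 0                         # both defect: both get nothing

print(payoff("split", "split"))  # (50000, 50000)
print(payoff("steal", "split"))  # (100000, 0)
print(payoff("steal", "steal"))  # (0, 0)
```

Note the defector's temptation: whatever the other player does, stealing never pays less than splitting — which is exactly what makes the cooperate/cooperate outcome fragile.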

Before reading on, did you watch the clip? I strongly recommend you watch it before continuing this piece. There’s huge implications.



Before choosing Split or Steal on the show, the contestants are allowed to talk to each other for a few minutes — to explain what they’re going to do, negotiate, persuade each other… to talk about whatever they want.

The journalist Joe Posnanski wrote an analysis of the outcome of that show in “Golden Balls Revisited.” Again, I can’t recommend highly enough that you actually watch the clip before reading it in text —

“Her: Steven I just hope that those weren’t puppy dog tears, that they were real tears, and you’re genuinely going to split that money.

Him: I am going to split. That’s just … 50,000 … that’s just unbelievable. I’m very, very happy to go home with 50,000.

Her: Will you split that money?

Him: If I stole that money, every single person over there would go over here and lynch me.

Her: There’s no way I could … I mean everyone who knew me would be disgusted if I stole.

Him: When people watch this they’re not going to believe it.

Her: Please … I … please …

Him: Sarah I can look you straight in the eye and tell you that I’m going to split. I swear to you.

Her (Nods).

Him (as they hold the balls): “We’re going home with 50 grand each, I promise you that.”

Of course, Stephen… didn’t go home with 50 grand.

It’s been called “the most savage betrayal in TV history.”



Well, there’s a whole host of lessons available here.

The first, most controversial, and most important is that most chaotic evil people do not look like monsters.

Observe Sarah. She’s not an undead creature, a demon from another world, a dark knight clad in spikes and dried blood.

She looks like a nice, young, middle class, slightly pretty woman. She pours on charm and vulnerability, begging Stephen to do the right thing.

He does… she does not.

After finding Golden Balls, I went on to watch over 30 hours of it over two months — and another couple dozen hours re-watching specific clips repeatedly to try to formulate patterns of honesty and dishonesty. I don’t like television and don’t watch much video at all, but I got a number of profitable lessons from it — most profitable of all is realizing just how broken my naive intuition is about who is honest and who is not.

If you want to understand how the show works, I’d pick this episode as the best introduction —


(If that link isn’t working in the future, the contestants on that Golden Balls episode were Fred, Leanne, Scott, and Victoria — you might be able to find it by searching.)

Players are eliminated by a mix of pure math/randomness and their ability to negotiate, persuade, bluff, and influence.

It’s not purely about human factors — oftentimes, independently of trust and believability, a player would go forward simply because they have a large amount of cash in front of them.

Other times, it becomes purely about deception and figuring out who is lying.

In the first round of this example, Fred and Leanne had strong hands, and clearly it would be Scott or Victoria who was lying. Scott swears up and down that he’s telling the truth, and mathematically, it’s more likely he’s telling the truth than Victoria, whose showing hand is worse.

I believed Scott. He swore up and down, repeatedly, that he was telling the truth. And again — good-looking, respectable-seeming, well-dressed middle-class guy.

He was lying.

Not even lying a little bit — lying a lot.

If you’d asked me to guess before the first reveal, I would have bet heavily on Scott telling the truth. He was swearing up and down that he was being honest.

I got it wrong.

Most people who behave badly don’t look like monsters at all.



To clarify, I don’t use concepts like “chaotic evil” in everyday life — I don’t think I’ve ever called anyone chaotic evil. It’s a bit… hyperbolic, no?

And yet, as a mental model, it’s worth exploring —

On the law/chaos axis, you see whether someone prefers to respect or break rules — including being consistent with their own word over time.

As for the good/evil axis, it certainly sounds a bit extreme in 2018 — but the name of it aside, there’s undoubtedly patterns of whether people behave more altruistically or selfishly, and certainly, some people care more for other people, and other people less so.

While obviously not as extreme as the literally-cartoonish Joker from Batman, you’ll often see participants on Golden Balls seemingly personifying the different alignment types.

One episode stood out to me in particular, both as a good example of the different alignment types and as a demonstration that your initial impressions can be rather flawed.

The episode had John (retired taxi driver with 15 grandchildren), Tina (accountant and mother), Carolann (young children’s entertainer), and Steve (young buffalo rancher).


Their bios described them —

Tina (accountant): “My plan is to play the game as honest as I can, because I don’t really feel happy about telling lies.”

John (retired grandpa): “I plan to play the nice old man card and win the confidence of the other contestants.”

Carolann (children’s entertainer): “My plan is to try to make the group laugh, make friends with my fellow contestants, before weeding out the weakest player.”

Steve (buffalo rancher): “My plan today is to play the game as honestly as possible, but if I do have to lie to save myself, I don’t have a problem with that.”

I’ll spoil this episode for you — there’s dozens more to watch if you want to do analysis — Tina, the accountant, was the only one that was honest the whole way through. She classes as lawful good to me.

Both Steve and Carolann got eliminated before the final round — both of them are likely neutral or worse.

As for John, who looks like the very embodiment of the “nice old man” stereotype as he described himself — well… chaotic evil once again.

When they privately described what they’d do if they reached the finals, Tina and John said —

Tina: “If I was lucky enough to get to the final, my ideal situation would be to split with the person. I wouldn’t want to be called the greedy one.”

John: “No matter what happens, if I get through to the final — I will steal, every time.”

John was one of the more dastardly people on the show, outright lying multiple times after he’d already committed to defect at the end —

“[Tina], I’ll not let you down. I promise you. I’m a gentleman and I won’t let you down.”

Golden Balls has some inherent structural pressures towards dishonesty, as we discussed in Issue #1. Likewise, the show producers obviously cast interesting and varied personalities onto the show.

But the biggest lesson you can take from it, if you study multiple episodes, is that your eyes lie to you. We all form stereotypes and preconceptions of people — John looked like a nice old grandpa, really a handsome and dignified looking older guy. Despite being paired at the end with a woman who was honest the entire time and seemed genuinely committed to cooperation, he stole. She (lawful good) got nothing; he (chaotic evil) took home the money.



“A chaotic evil character tends to have no respect for rules, other people’s lives, or anything but their own desires, which are typically selfish and cruel. They set a high value on personal freedom, but do not have much regard for the lives or freedom of other people. Chaotic evil characters do not work well in groups because they resent being given orders and do not usually behave themselves unless there is no alternative.”

Dungeons and Dragons is obviously a fictional game and a massive simplification of life — but as far as chaotic evil goes, I think it accurately captures some of reality. When a person is erratic and selfish, unity is not possible.

I mean, obviously, no?

Erratic and pro-social people — chaotic good — might be workable.

Consistent selfish people — lawful neutral or lawful evil — you might even be able to do business with those people.

But erratic and selfish is a very bad combination for teamwork. In any partnership, in any organization, in any company, in any collaboration, in any endeavor — inevitably there will come a time when you can add more to the group at some sacrifice to yourself.

If everyone is competent and everyone does everything they can for the team, you wind up in cooperate/cooperate situations — and you get the most gains possible for the group.

When “chaotic evil” people are matched with each other, contrarily, they wind up with Steal/Steal outcomes in the game — and each walks away with nothing at the end of the show.

The biggest danger is to those who want to behave lawfully and pro-socially — in these cases, it’s critical to be able to identify chaotic evil people and deal with them appropriately.

From my vantage point, I think you can analyze aspects of the law/chaos and good/evil spectrum separately — chaotic behavior in someone’s current day and recent past predicts more chaotic behavior; lawful and conscientious behavior predicts more lawful and conscientious behavior.

Thus, investigating to see whether someone keeps their word and stays consistent with the codes they live by is useful for predicting future behavior in the group.

Likewise, people who you observe to act pro-socially and altruistically now and in the very recent past are more likely to continue acting pro-socially and altruistically.

Over time, if you’re looking to build fantastic teams and collaborations, you’ll need to learn and train yourself how to identify people who won’t be good teammates and colleagues.

First and foremost, that means acknowledging that there’s people who are non-lawful about their words and promises, and who are self-centered to the point of being willing to defect from cooperative groups — these types of people, of course, make bad teammates.

Simply acknowledging that this is true goes a long way towards prompting you to learn how to detect it and take appropriate measures.



As already mentioned, you need to be very careful against just judging someone on surface appearances. Often, our snap-judgment intuitions are wrong — that’s the biggest thing I got from watching Golden Balls, and frankly, it seems most people aren’t too good at it.

It’s a natural temptation for people to “Generalize From One Example” — lawful-type people tend to assume others feel as strongly about keeping their word as they do.

(See “Generalizing From One Example”: http://lesswrong.com/lw/dr/generalizing_from_one_example/)

As I watched multiple episodes, a stunning pattern emerged — it was very often lawful good people who got eliminated quickly in Round 1 or Round 2, since the dishonest people would go on the attack right away, and the lawful good people wouldn’t defend themselves very well. (I’ll make a recommended watching list if you’re curious to dig deeper and learn these lessons.)

There’s dozens of assessments and breakdowns of language and body language on Youtube — I watched a number of them. No single one stands out to me as excellent, but it’s worth searching and watching a varied mix of them. There’s tells like noticing a suppressed smirk from someone who is lying, or noting that they’re using obfuscating language instead of stating directly what they’re going to do — this takes significant practice to start spotting the patterns and is a bit beyond the scope of this piece, but you’ll benefit if you take the time to do some diligent studying on it at some point.



Though Unity is more about team dynamics and less about you as an individual, it should be noted that in both Dungeons and Dragons and in real life, people are capable of shifting alignment over time.

I’d always be skeptical of someone saying they shifted alignment over a very short period of time, especially around lawfulness and chaoticness — but much of chaotic behavior can be put down to bad life skills rather than an inherent flaw in someone’s character.

And even when there’s problems on a character level, there’s some hope — for instance, often young people revel in rulebreaking and defiance and dislike working within structure. But sooner or later, many young people with a more chaotic orientation towards the world realize that this type of behavior is actually more of a prison than establishing some rules and processes in one’s life to consistently follow.

If you find yourself having some “chaotic evil” elements to your personality and behavior, I’d start with the chaos before the selfishness. Establishing a baseline of fundamental competence and consistency is required, either way, for doing big things.

As for altruism and cooperation, it can be useful in any collaboration to try to do more than one’s fair share as your default target behavior. Oftentimes, we overrate our own contributions because we’re aware of everything we do and attempt, and underrate others’ contributions because we don’t see all the work they’re doing. Aiming to be the most significant contributor to any project or collaboration, being the person that works the hardest and is the most reliable, will mean you’re never the weak link on a team, and will allow you to eventually get into harmony with exceptional people.

Of course, you still have to establish collaborations with the right type of people — it’s de facto nearly impossible to deal cooperatively with people on the ‘chaotic evil’ end of the spectrum, but making yourself consistent in performance and incredibly pro-social in terms of contribution goes a long way towards establishing yourself on the ‘lawful good’ end of the spectrum, and makes you appealing for other people of the same type to work with.



We’re discussing Unity, and dealing with people who are fundamentally incompatible with unity is a little beyond the scope of the series.

But let’s share one last excellent video from Golden Balls, the most skillful endgame ever played on the show —


You can watch that one even if you haven’t watched the past episodes — you’ll get it right away.

Nick: “Ibrahim, I want you to trust me… 100% I’m going to pick the steal ball.”

Ibrahim: “Sorry, you’re going to do what?”

Nick: “I’m going to choose the steal ball. I want you to choose to Split, and I promise you I will split the money with you.”

Ibrahim: “After the show?”

Nick: “Yup. I promise you I’ll do that. If you do Steal, we both walk with nothing.”

It’s a fantastic clip, and Nick’s play worked — they split the money. There was a followup Radiolab interview some years later where Ibrahim admitted, “I was always going to steal. Never going to split. Never. … I’ve never been a good guy.”

And Nick — hilariously — coerced him into doing the right thing.

That clip is well-worth watching. When dealing with someone with that chaotic evil personality — Ibrahim’s future Radiolab interview made his intentions quite clear — Nick was able to appeal to his self-interest and get a split. Ibrahim hated it — he shouts at him and insults him in the clip… but the cooperative outcome happened in the end. While this type of thing is beyond the scope of our investigation into Unity, it should also prompt some smart thinking.



There’s two vastly simplified axes of human morality — law/chaos and good/evil.

As someone moves from more lawful to chaotic, they become harder to get into unity with for long term cooperation.

As someone moves from altruistic/concerned-with-others (“good”) towards more selfish/doesn’t-care-about-others (“evil”), they become harder to get into unity with for long term cooperation.

In fiction, “chaotic evil” people literally look like monsters — undead, demons, dripping with blood. They look and act like The Joker from Batman.

But in real life, there’s all sorts of chaotic evil people — Sarah looked like a nice young lady; John looked like a very respectable and sterling grandpa. Impressions can be deceiving, and you should learn that your first impressions can’t necessarily be trusted when sizing these things up.

For obvious reasons, “chaotic evil” type people make bad teammates, partners, and colleagues.

You should recognize the fundamental truth of this and learn to avoid them over time — both through detection of dishonesty when possible (it’s worth studying some; Golden Balls offers some fine lessons)… and through having rigorous selection procedures, which we’ll discuss in the next issue of Unity.

If you find yourself leaning towards the chaotic and evil end of the spectrum, perhaps meditate on that and gradually start changing your patterns of behavior — at least, if you care about being effective within cooperative teams, which is certainly both highly productive and one of life’s greatest joys. Large epiphanies usually aren’t necessary — it’s the hard work of regularly doing the right thing, in dozens and hundreds of little ways.

As a bonus lesson, if you catch yourself dealing with fundamentally dishonest or uncooperative people, then leverage and incentives matter far more than appeals to humanity or consistency — which obviously don’t matter to people like that. But really, just learn to identify and avoid chaotic evil type people whenever possible.

“It’s just a gameshow!” — no, I don’t think so. Not at all. I think it’s a very informative slice of life. Study unity and study human nature — it’s the way to that truly fantastic, life-affirming, productive, and joyful cooperation that lets us do large-scale endeavors.

Until next time, yours,

Sebastian Marshall
Editor, TheStrategicReview.net


Want to learn more about game theory and human nature?

I have about 50 hours of research I did for this issue that didn’t make it in — click here if you want a copy of it.

It includes three academic papers and datasets on honesty/dishonesty from UChicago, Shippensburg University, and MIT — as well as some strategy and viewing/learning guides to episodes from Golden Balls and a couple similar shows.

I also wrote down what I’d do if I was ever on the final round of a show like that — you might dig it.

You can get all that here —



Unity #3: Chaotic Evil was originally published in The Strategic Review on Medium, where people are continuing the conversation by highlighting and responding to this story.

Wed, 14 Feb 2018 12:02:48 EST

In Grit: The Power of Passion and Perseverance, Angela Duckworth argues that outstanding achievement comes from a combination of passion – a focused approach to something you deeply care about – and perseverance – a resilience and desire to work hard. Duckworth calls this combination of passion and perseverance “grit”.

For Duckworth, grit is important as focused effort is required to both build skill and turn that skill into achievement. Talent plus effort leads to skill. Skill plus effort leads to achievement. Effort appears twice in the equation. If one expends that effort across too many domains (no focus through lack of passion), the necessary skills will not be developed and those skills won’t be translated into achievement.
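In the book, Duckworth states these relationships multiplicatively (talent × effort = skill; skill × effort = achievement), which makes the double role of effort concrete — a toy sketch, with made-up numbers:

```python
# Toy sketch of Duckworth's two equations: effort enters twice, so
# achievement scales with effort squared but only linearly with talent.
def skill(talent, effort):
    return talent * effort

def achievement(talent, effort):
    return skill(talent, effort) * effort

print(achievement(talent=10, effort=2))  # 40
print(achievement(talent=10, effort=4))  # 160: doubling effort quadruples it
print(achievement(talent=20, effort=2))  # 80: doubling talent only doubles it
```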

While sounding almost obvious written this way, Duckworth’s claims go deeper. She argues that in many domains grit is more important than “talent” or intelligence. And she argues that we can increase people’s grit through the way we parent, educate, coach and manage.

Three articles from 2016 (in Slate, The New Yorker, and npr) critiquing Grit and the associated research make a lot of the points that I would. But before turning to those articles and my thoughts, I will say that Duckworth appears to be one of the most open recipients of criticism in academia that I have come across. She readily concedes good arguments, and appears caught between her knowledge of the limitations of the research and the need to write or speak in a strong enough manner to sell a book or make a TED talk.

That said, I am sympathetic with the Slate and npr critiques. Grit is not the best predictor of success. To the extent there is a difference between “grit” and the big five trait of conscientiousness, it is minor (making grit largely an old idea rebranded with a funkier name). A meta-analysis (working paper) by Marcus Credé, Michael Tynan and Peter Harms makes this case (and forms the basis of the npr piece).

Also critiqued in the npr article is Duckworth’s example of grittier cadets being more likely to make it through the seven-week West Point training program Beast Barracks, which features in the book’s opening. As she states, “Grit turned out to be an astoundingly reliable predictor of who made it through and who did not.”

The West Point research comes from two papers by Duckworth and colleagues from 2007 (pdf) and 2009 (pdf). The difference in drop out rate is framed as rather large in the 2009 article:

“Cadets who scored a standard deviation higher than average on the Grit-S were 99% more likely to complete summer training”

But to report the results another way, 95% of all cadets made it through. 98% of the top quartile in grit stayed. As Marcus Credé states in the npr article, there is only a three percentage point difference between the average drop out rate and that of the grittiest cadets. Alternatively, you can consider that 88% of the bottom quartile made it through. That appears a decent success rate for these low grit cadets. (The number reported in the paper references the change in odds, which is not the way most people would interpret that sentence. But on Duckworth being a great recipient of criticism, she concedes in the npr article she should have put it another way.)
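The gap between the two framings is easy to check. A minimal sketch using the rates quoted above (95% overall, 98% for the top grit quartile; illustrative only, not the paper's actual regression):

```python
# Illustrative: how "more likely" differs between probability and odds framings,
# using the completion rates quoted above (95% overall, 98% top grit quartile).

def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

p_avg, p_top = 0.95, 0.98

relative_risk = p_top / p_avg           # probability framing
odds_ratio = odds(p_top) / odds(p_avg)  # odds framing

print(f"Probability framing: {100 * (relative_risk - 1):.0f}% more likely to complete")
print(f"Odds framing: {100 * (odds_ratio - 1):.0f}% higher odds of completing")
```

With near-ceiling completion rates, a modest difference in probability (about 3%) becomes a dramatic-sounding difference in odds (over 150%), which is why reporting odds changes reads as misleading here.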

Having said this, I am sympathetic to the argument that there is something here that West Point could benefit from. If low grit were the underlying cause of cadet drop-outs, reducing the drop out rate of the least gritty half to that of the top half could cut the drop out rate by more than 50%. If they found a way of doing this (which I am more sceptical about), it could be a worthwhile investment.

One thing that I haven’t been able to determine from the two papers with the West Point analysis is the distribution of grit scores for the West Point cadets. Are they gritty relative to the rest of the population? In Duckworth’s other grit studies, the already high achievers (spelling bee contestants, Stanford students, etc.) look a lot like the rest of us. Why does it take no grit to enter into domains which many people would already consider to be success? Is this the same for West Point?

Possibly the biggest question I have about the West Point study is why people drop out. As Duckworth talks about later in the book (repeatedly), there is a need to engage in search to find the thing you are passionate about. Detours are to be expected. When setting top-level goals, don’t be afraid to erase an answer that isn’t working out. Finishing what you begin could be a way to miss opportunities. Be consistent over time, but first find a thing to be consistent with. If your mid-level goals are not aligned with your top level objective, abandon them. And so on. Many of the “grit paragons” that Duckworth interviewed for her book explored many different avenues before settling on the one that consumes them.

So, are the West Point drop-outs leaving because of low grit, or are they shifting to the next phase of their search? If we find them later in their life (at a point of success), will they then score higher on grit as they have found something they are passionate about that they wish to persevere with? How much of the high grit score of the paragons is because they have succeeded in their search? To what extent is grit simply a reflection of current circumstances?

One of the more interesting sections of the book addresses whether there are limits to what we can achieve due to talent. Duckworth’s major point is that we are so far from whatever limits we have that they are irrelevant.

On the one hand, that is clearly right – in almost every domain people could improve through persistent effort (and deliberate practice). But another consideration is where their personal limits lie relative to the degree of skill required to successfully achieve a person’s goals. I am a long way from my limits as a tennis player, but my limits are well short of that required to ever make a living from it.

Following from this, Duckworth is of the view that people should follow their passion and argues against the common advice that following your passion is the path to poverty. I’m with Cal Newport on this one, and think that “follow your passion” is horrible advice. If you don’t have anything of value to offer related to your passion, you likely won’t succeed.

Duckworth’s evidence behind her argument is mixed. She notes that people are more satisfied with jobs when they follow a personal interest, but this is not evidence that people who want to find a job that matches their interest are more satisfied. Where are those who failed? Duckworth also notes that these people perform better, but again, what is the aggregate outcome of all the people who started out with this goal?

One chapter concerns parenting. Duckworth concedes the evidence here is thin and incomplete, and that there are no randomised controlled trials. But she then suggests that she doesn’t have time to wait for the data to come in (which I suppose you don’t if you are already raising children).

She cites research on supportive versus demanding parenting, derived from measures such as surveys of students. These demonstrate that students with more demanding parents have higher grades. Similarly, research on world-class performers shows that their parents are models of work ethic. The next chapter reports on the positive relationship between extracurricular activities while at school and job outcomes, particularly where they stick with the same activity for two or more years (i.e. consistent parents).

But Duckworth does not address the typical problem of studies in this domain – they all ignore biology. Do the students receive higher grades because their parents are more demanding, or because they are the genetic descendants of two demanding people? Are they world-class performers because their parents model a work ethic, or because they have inherited a work ethic? Are they consistent with their extracurricular activities because their parents consistently keep them at it, or because they are the type of people likely to be consistent?

These questions might appear speculation in themselves, but the large catalogue of twin, adoption and now genetic studies points to the answers. To the degree children resemble their parents, this is largely genetic. The effect of the shared environment – i.e. parenting – is low (and in many studies zero). That is not to say interventions cannot be developed. But they are not reflected in the variation in parenting that is the subject of these studies.

Duckworth does briefly turn to genetics when making her case for the ability to change someone’s grit. Like a lot of other behavioural traits, the heritability of grit is moderate: 37% for perseverance, 20% for passion (the study referenced is here). Grit is not set in stone, so Duckworth takes this as a case for the effect of environment.

However, a heritability less than one provides little evidence that deliberate changes in environment can change a trait. The same study finding moderate heritability also found no effect of shared environment (e.g. parenting). The evidence of influence is thin.

Finally, Duckworth cites the Flynn effect as evidence of the malleability of IQ – and how similar effects could play out with grit – but she does not reference the extended trail of failed interventions designed to increase IQ (although a recent meta-analysis shows some effect of education). I can understand Duckworth’s aims, but feel that the literature in support of them is somewhat thin.

Other random points or thoughts:

  • As for any book that contains colourful stories of success linked to the recipe it is selling, the stories of the grit paragons smack of survivorship bias. Maybe the coach of the Seattle Seahawks pushes toward a gritty culture, but I’m not sure the other NFL teams go and get ice-cream every time training gets tough. Jamie Dimon, CEO of JP Morgan, is praised for the $5 billion profit JP Morgan gained through the GFC (let’s skate over the $13 billion in fines). How would another CEO have gone?
  • Do those with higher grit display a higher level of sunk cost fallacy, being unwilling to let go?
  • Interesting study – Tsay and Banaji, Naturals and strivers: Preferences and beliefs about sources of achievement. The abstract:

To understand how talent and achievement are perceived, three experiments compared the assessments of “naturals” and “strivers.” Professional musicians learned about two pianists, equal in achievement but who varied in the source of achievement: the “natural” with early evidence of high innate ability, versus the “striver” with early evidence of high motivation and perseverance (Experiment 1). Although musicians reported the strong belief that strivers will achieve over naturals, their preferences and beliefs showed the reverse pattern: they judged the natural performer to be more talented, more likely to succeed, and more hirable than the striver. In Experiment 2, this “naturalness bias” was observed again in experts but not in nonexperts, and replicated in a between-subjects design in Experiment 3. Together, these experiments show a bias favoring naturals over strivers even when the achievement is equal, and a dissociation between stated beliefs about achievement and actual choices in expert decision-makers.

  • A follow-up study generalised the naturals and strivers research over some other domains.
  • Duckworth reports on the genius research of Catharine Cox, in which Cox looked at 300 eminent people and attempted to determine what it was that makes them a genius. All 300 had an IQ above 100. The average of the top 10 was 146. The average of the bottom 10 was 143. Duckworth points to the trivial link between IQ and ranking within that 300, with the substantive differentiator being level of persistence. But note those average IQ scores…

Wed, 14 Feb 2018 12:02:35 EST

I have been reading the thought-provoking Elephant in the Brain, and will probably have more to say on it later. But if I understand correctly, a dominant theory of how humans came to be so smart is that they have been in an endless cat and mouse game with themselves, making norms and punishing violations on the one hand, and cleverly cheating their own norms and excusing themselves on the other (the ‘Social Brain Hypothesis’ or ‘Machiavellian Intelligence Hypothesis’). Intelligence purportedly evolved to get ourselves off the hook, and our ability to construct rocket ships and proofs about large prime numbers is just a lucky side product.

As a person who is both unusually smart, and who spent the last half hour wishing the seatbelt sign would go off so they could permissibly use the restroom, I feel like there is some tension between this theory and reality. I’m not the only unusually smart person who hates breaking rules, who wishes there were more rules telling them what to do, who incessantly makes up rules for themselves, who intentionally steers clear of borderline cases because it would be so annoying to think about, and who wishes the nominal rules were policed predictably and actually reflected expected behavior. This is a whole stereotype of person.

But if intelligence evolved for the prime purpose of evading rules, shouldn’t the smartest people be best at navigating rule evasion? Or at least reliably non-terrible at it? Shouldn’t they be the most delighted to find themselves in situations where the rules were ambiguous and the real situation didn’t match the claimed rules? Shouldn’t the people who are best at making rocket ships and proofs also be the best at making excuses and calculatedly risky norm-violations? Why is there this stereotype that the more you can make rocket ships, the more likely you are to break down crying if the social rules about when and how you are allowed to make rocket ships are ambiguous?

It could be that these nerds are rare, yet salient for some reason. Maybe such people are funny, not representative. Maybe the smartest people are actually savvy. I’m told that there is at least a positive correlation between social skills and other intellectual skills.

I offer a different theory. If the human brain grew out of an endless cat and mouse game, what if the thing we traditionally think of as ‘intelligence’ grew out of being the cat, not the mouse?

The skill it takes to apply abstract theories across a range of domains and to notice places where reality doesn’t fit sounds very much like policing norms, not breaking them. The love of consistency that fuels unifying theories sounds a lot like the one that insists on fair application of laws, and social codes that can apply in every circumstance. Math is basically just the construction of a bunch of rules, and then endless speculation about what they imply. A major object of science is even called discovering ‘the laws of nature’.

Rules need to generalize across a lot of situations—you will have a terrible time as rule-enforcer if you see every situation as having new, ad-hoc appropriate behavior. We wouldn’t even call this having a ‘rule’. But more to the point, when people bring you their excuses, if your rule doesn’t already imply an immovable position on every case you have never imagined, then you are open to accepting excuses. So you need to see the one law manifest everywhere. I posit that technical intelligence comes from the drive to make these generalizations, not the drive to thwart them.

On this theory, probably some other aspects of human skill are for evading norms. For instance, perhaps social or emotional intelligence (I hear these are things, but will not pretend to know much about them). If norm-policing and norm-evading are somewhat different activities, we might expect to have at least two systems that are engorged by this endless struggle.

I think this would solve another problem: if we came to have intelligence for cheating each other, it is unclear why general intelligence per se is the answer to this, but not to other problems we have ever had as animals. Why did we get mental skills this time rather than earlier? Like that time we were competing over eating all the plants, or escaping predators better than our cousins? This isn’t the only time that a species was in fierce competition against themselves for something. In fact that has been happening forever. Why didn’t we develop intelligence to compete against each other for food, back when we lived in the sea? If the theory is just ‘there was strong competitive pressure for something that will help us win, so out came intelligence’, I think there is a lot left unexplained. Especially since the thing we most want to explain is the spaceship stuff, which on this theory is a random side effect anyway. (Note: I may be misunderstanding the usual theory, as a result of knowing almost nothing about it.)

I think this Principled Intelligence Hypothesis does better. Tracking general principles and spotting deviations from them is close to what scientific intelligence is, so if we were competing to do this (against people seeking to thwart us) it would make sense that we ended up with good theory-generalizing and deviation-spotting engines.

On the other hand, I think there are several reasons to doubt this theory, or details to resolve. For instance, while we are being unnecessarily norm-abiding and going with anecdotal evidence, I think I am actually pretty great at making up excuses, if I do say so. And I feel like this rests on the same skill as ‘analogize one thing to another’ (my being here to hide from a party could just as well be interpreted as my being here to look for the drinks, much as the economy could also be interpreted as a kind of nervous system), which seems like it is quite similar to the skill of making up scientific theories (these five observations being true is much like theory X applying in general), though arguably not the skill of making up scientific theories well. So this is evidence against smart people being bad at norm evasion in general, and against norm evasion being a different kind of skill to norm enforcement, which is about generalizing across circumstances.

Some other outside view evidence against this theory’s correctness is that my friends all think it is wrong, and I know nothing about the relevant literature. I think it could also do with some inside view details – for instance, how exactly does any creature ever benefit from enforcing norms well? Isn’t it a bit of a tragedy of the commons? If norm evasion and norm policing skills vary in a population of agents, what happens over time? But I thought I’d tell you my rough thoughts, before I set this aside and fail to look into any of those details for the indefinite future.

Wed, 14 Feb 2018 12:02:16 EST

Someday we may be able to create brain emulations (ems), and someday later we may understand them sufficiently to allow substantial modifications to them. Many have expressed concern that competition for efficient em workers might then turn ems into inhuman creatures of little moral worth. This might happen via reductions of brain systems, features, and activities that are distinctly human but that contribute less to work effectiveness. For example Scott Alexander fears loss of moral value due to “a very powerful ability to focus the brain on the task at hand” and ems “neurologically incapable of having their minds drift off while on the job”.

A plausible candidate for em brain reduction to reduce mind drift is the default mode network:

The default mode network is active during passive rest and mind-wandering. Mind-wandering usually involves thinking about others, thinking about one’s self, remembering the past, and envisioning the future.… becomes activated within an order of a fraction of a second after participants finish a task. … deactivate during external goal-oriented tasks such as visual attention or cognitive working memory tasks. … The brain’s energy consumption is increased by less than 5% of its baseline energy consumption while performing a focused mental task. … The default mode network is known to be involved in many seemingly different functions:

It is the neurological basis for the self:

Autobiographical information: Memories of collection of events and facts about one’s self
Self-reference: Referring to traits and descriptions of one’s self
Emotion of one’s self: Reflecting about one’s own emotional state

Thinking about others:

Theory of Mind: Thinking about the thoughts of others and what they might or might not know
Emotions of others: Understanding the emotions of other people and empathizing with their feelings
Moral reasoning: Determining just and unjust result of an action
Social evaluations: Good-bad attitude judgments about social concepts
Social categories: Reflecting on important social characteristics and status of a group

Remembering the past and thinking about the future:

Remembering the past: Recalling events that happened in the past
Imagining the future: Envisioning events that might happen in the future
Episodic memory: Detailed memory related to specific events in time
Story comprehension: Understanding and remembering a narrative

In our book The Elephant in the Brain, we say that key tasks for our distant ancestors were tracking how others saw them, watching for ways others might accuse them of norm violations, and managing stories of their motives and plans to help them defend against such accusations. The difficulty of this task was a big reason humans had such big brains. So it made sense to design our brains to work on such tasks in spare moments. However, if ems could be productive workers even with a reduced capacity for managing their social image, it might make sense to design ems to spend a lot less time and energy ruminating on their image.

Interestingly, many who seek personal insight and spiritual enlightenment try hard to reduce the influence of this key default mode network. Here is Sam Harris from his recent book Waking Up: A Guide to Spirituality Without Religion:

Psychologists and neuroscientists now acknowledge that the human mind tends to wander. … Subjects reported being lost in thought 46.9 percent of the time. … People are consistently less happy when their minds wander, even when the contents of their thoughts are pleasant. … The wandering mind has been correlated with activity in the … “default mode” or “resting state” network (DMN). … Activity in the DMN decreases when subjects concentrate on tasks of the sort employed in most neuroimaging experiments.

The DMN has also been linked with our capacity for “self-representation.” … [it] is more engaged when we make such judgements of relevance about ourselves, as opposed to making them about other people. It also tends to be more active when we evaluate a scene from a first person point of view. … Generally speaking, to pay attention outwardly reduces activity in the [DMN], while thinking about oneself increases it. …

Mindfulness and loving-kindness meditation also decrease activity in the DMN – and the effect is most pronounced among experienced meditators. … Expert meditators … judge the intensity of an unpleasant stimulus the same but find it to be less unpleasant. They also show reduced activity in regions associated with anxiety while anticipating the onset of pain. … Mindfulness reduces both the unpleasantness and intensity of noxious stimuli. …

There is an enormous difference between being hostage to one’s thoughts and being freely and nonjudgmentally aware of life in the present. To make this shift is to interrupt the process of rumination and reactivity that often keeps us so desperately at odds with ourselves and with other people. … Meditation is simply the ability to stop suffering in many of the usual ways, if only for a few moments at a time. … The deepest goal of spirituality is freedom from the illusion of the self. (pp.119-123)

I see a big conflict here. On the one hand, many are concerned that competition could destroy moral value by cutting away distinctively human features of em brains, and the default network seems a prime candidate for cutting. On the other hand, many see meditation as a key to spiritual insight, one of the highest human callings, and a key task in meditation is cutting the influence of the default network. Ems with a reduced default network could more easily focus, be mindful, see the illusion of the self, and feel more at peace and less anxious about their social image. So which is it, do such ems achieve our highest spiritual ideals, or are they empty shells mostly devoid of human value? Can’t be both, right?

By the way, I was reading Harris because he and I will record a podcast Feb 21 in Denver.

Thu, 8 Feb 2018 7:15:25 EST

Rereading The Hungry Brain, I notice my review missed one of my favorite parts: the description of the motivational system. It starts with studies of lampreys, horrible little primitive parasitic fish:

How does the lamprey decide what to do? Within the lamprey basal ganglia lies a key structure called the striatum, which is the portion of the basal ganglia that receives most of the incoming signals from other parts of the brain. The striatum receives “bids” from other brain regions, each of which represents a specific action. A little piece of the lamprey’s brain is whispering “mate” to the striatum, while another piece is shouting “flee the predator” and so on. It would be a very bad idea for these movements to occur simultaneously – because a lamprey can’t do all of them at the same time – so to prevent simultaneous activation of many different movements, all these regions are held in check by powerful inhibitory connections from the basal ganglia. This means that the basal ganglia keep all behaviors in “off” mode by default. Only once a specific action’s bid has been selected do the basal ganglia turn off this inhibitory control, allowing the behavior to occur. You can think of the basal ganglia as a bouncer that chooses which behavior gets access to the muscles and turns away the rest. This fulfills the first key property of a selector: it must be able to pick one option and allow it access to the muscles.

Many of these action bids originate from a region of the lamprey brain called the pallium…

Spoiler: the pallium is the region that evolved into the cerebral cortex in higher animals.

Each little region of the pallium is responsible for a particular behavior, such as tracking prey, suctioning onto a rock, or fleeing predators. These regions are thought to have two basic functions. The first is to execute the behavior in which it specializes, once it has received permission from the basal ganglia. For example, the “track prey” region activates downstream pathways that contract the lamprey’s muscles in a pattern that causes the animal to track its prey. The second basic function of these regions is to collect relevant information about the lamprey’s surroundings and internal state, which determines how strong a bid it will put in to the striatum. For example, if there’s a predator nearby, the “flee predator” region will put in a very strong bid to the striatum, while the “build a nest” bid will be weak…

Each little region of the pallium is attempting to execute its specific behavior and competing against all other regions that are incompatible with it. The strength of each bid represents how valuable that specific behavior appears to the organism at that particular moment, and the striatum’s job is simple: select the strongest bid. This fulfills the second key property of a selector – that it must be able to choose the best option for a given situation…

With all this in mind, it’s helpful to think of each individual region of the lamprey pallium as an option generator that’s responsible for a specific behavior. Each option generator is constantly competing with all other incompatible option generators for access to the muscles, and the option generator with the strongest bid at any particular moment wins the competition.
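The bid-and-select mechanism described in this passage can be sketched in a few lines. This is my toy rendering of the description, not code from the book: option generators submit bids, and the striatum releases inhibition only on the strongest one.

```python
# A minimal sketch of the lamprey selector described above: every behavior is
# inhibited by default, and only the strongest bid gains access to the muscles.

def striatum_select(bids):
    """Pick the single behavior with the strongest bid; all others stay inhibited."""
    return max(bids, key=bids.get)

# Bid strengths reflect the animal's surroundings and internal state.
bids = {
    "track prey": 0.4,
    "suction onto rock": 0.2,
    "flee predator": 0.9,  # a predator is nearby, so this bid is strong
    "build nest": 0.1,
}

print(striatum_select(bids))  # the winning behavior gets access to the muscles
```

This captures both key properties of a selector from the passage: exactly one option is allowed through, and it is the option that looks most valuable at that moment.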

The next subsection, which I’m skipping, quotes some scientists saying that the human motivation system works similarly to the lamprey motivation system, except that the human cerebrum has many more (and much more flexible/learnable) options than the lamprey pallium. Humans have to “make up our minds about things a lamprey cannot fathom, like what to cook for dinner, how to pay off the mortgage, and whether or not to believe in God”. It starts getting interesting again when it talks about basal ganglia-related disorders:

To illustrate the crucial importance of the basal ganglia in decision-making processes, let’s consider what happens when they don’t work.

As it turns out, several disorders affect the basal ganglia. The most common is Parkinson’s disease, which results from the progressive loss of cells in a part of the basal ganglia called the substantia nigra. These cells send connections to the dorsal striatum, where they produce dopamine, a chemical messenger that plays a very important role in the function of the striatum. Dopamine is a fascinating and widely misunderstood molecule that we’ll discuss further in the next chapter, but for now, its most relevant function is to increase the likelihood of engaging in any behavior.

When dopamine levels in the striatum are increased – for example, by cocaine or amphetamine – mice (and humans) tend to move around a lot. High levels of dopamine essentially make the basal ganglia more sensitive to incoming bids, lowering the threshold for activating movements…Conversely, when dopamine levels are low, the basal ganglia become less sensitive to incoming bids and the threshold for activating movements is high. In this scenario, animals tend to stay put. The most extreme example of this is the dopamine-deficient mice created by Richard Palmiter, a neuroscience researcher at the University of Washington. These animals sit in their cages nearly motionless all day due to a complete absence of dopamine. “If you set a dopamine deficient mouse on a table,” explains Palmiter, “it will just sit there and look at you. It’s totally apathetic.” When Palmiter’s team chemically replaces the mice’s dopamine, they eat, drink, and run around like mad until the dopamine is gone.

The same can happen to humans with basal ganglia injuries:

Consider Jim, a former miner who was admitted to a psychiatric hospital at the age of fifty-seven with a cluster of unusual symptoms. As recorded in his case report, “during the preceding three years he had become increasingly withdrawn and unspontaneous. In the month before admission he had deteriorated to the point where he was doubly incontinent, answered only yes or no questions, and would sit or stand unmoving if not prompted. He only ate with prompting, and would sometimes continue putting spoon to mouth, sometimes for as long as two minutes after his plate was empty. Similarly, he would flush the toilet repeatedly until asked to stop.”

Jim was suffering from a rare disorder called abulia, which is Greek for “an absence of will”. Patients who suffer from abulia can respond to questions and perform specific tasks if prompted, but they have difficulty spontaneously initiating motivations, emotions, and thoughts. A severely abulic patient seated in a bare room by himself will remain immobile until someone enters the room. If asked what he was thinking or feeling, he’ll reply, “Nothing”…

Abulia is typically associated with damage to the basal ganglia and related circuits, and it often responds well to drugs that increase dopamine signaling. One of these is bromocriptine, the drug used to treat Jim…Researchers believe that the brain damage associated with abulia causes the basal ganglia to become insensitive to incoming bids, such that even the most appropriate feelings, thoughts, and motivations aren’t able to be expressed (or even to enter consciousness). Drugs that increase dopamine signaling make the striatum more sensitive to bids, allowing some abulic patients to recover the ability to feel, think, and move spontaneously.

All of this is standard neuroscience, but presented much better than the standard neuroscience books present it, so much so that it brings some important questions into sharper relief. Like: what does this have to do with willpower?

Guyenet describes high dopamine levels in the striatum as “increasing the likelihood of engaging in any behavior”. But that’s not really fair – outside a hospital, almost nobody just sits motionless in the middle of a room and does no behaviors. The relevant distinction isn’t between engaging in behavior vs. not doing so. It’s between low-effort behaviors like watching TV, and high-effort behaviors like writing a term paper. We know that this has to be related to the same dopamine system Guyenet’s talking about, because Adderall (which increases dopamine in the relevant areas) makes it much easier to do the high-effort behaviors. So a better description might be “high dopamine levels in the striatum increase the likelihood of engaging in high-willpower-requirement behaviors”.

But what counts as a high willpower requirement? I’m always tempted to answer this with some sort of appeal to basic calorie expenditure, but taking a walk requires less willpower than writing a term paper even though the walk probably burns way more calories. My “watch TV” option generator, my “take a walk” option generator, and my “write a term paper” option generator are all putting in bids to my striatum – and for some reason, high dopamine levels privilege the “write a term paper” option and low dopamine levels privilege the others. Why?

I don’t know, and I think it’s the most interesting next question in the study of these kinds of systems.

But here’s a crazy idea (read: the first thing I thought of after thirty seconds). In the predictive processing model, dopamine represents confidence levels. Suppose there’s a high prior on taking a walk being a reasonable plan. Maybe this is for evo psych reasons (there was lots of walking in the ancestral environment), or for reinforcement related reasons (you enjoy walking, and your brain has learned to predict it will make you happy). And there’s a low prior on writing a term paper being a reasonable plan. Again, it’s not the sort of thing that happened much in the ancestral environment, and plausibly every previous time you’ve done it, you’ve hated it.

In this case, confidence in your new evidence (as opposed to your priors) is a pretty important variable. If your cortex makes its claims with high confidence (i.e. in a high-dopaminergic state), then its claim that it’s a good idea to write a term paper now may be so convincing that it’s able to overcome the high prior against this being true. If your cortex makes claims with low confidence, then it will tentatively suggest that maybe we should write a term paper now – but the striatum will remain unconvinced due to the inherent implausibility of the idea.
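To make the picture concrete, here is a toy sketch (my own construction, not anything from the post or from the predictive processing literature) where each candidate action gets a prior score plus an evidence score scaled by a single “dopamine” confidence parameter. The specific numbers and the `select_action` function are illustrative assumptions.

```python
# Toy model: actions compete on prior log-odds plus confidence-weighted
# evidence log-odds. "Confidence" stands in for striatal dopamine level.

def select_action(priors, evidence, confidence):
    """Pick the action with the highest prior + confidence * evidence score."""
    scores = {a: priors[a] + confidence * evidence[a] for a in priors}
    return max(scores, key=scores.get)

# Strong priors favor low-effort defaults; cortical evidence argues that
# writing the term paper is the right plan now.
priors   = {"watch TV": 2.0, "take a walk": 3.0, "write term paper": -2.0}
evidence = {"watch TV": -1.0, "take a walk": 0.0, "write term paper": 4.0}

print(select_action(priors, evidence, confidence=0.5))  # low dopamine
print(select_action(priors, evidence, confidence=2.0))  # high dopamine
```

With low confidence the high-prior “take a walk” wins; with high confidence the evidence-backed “write term paper” overcomes its implausible prior, matching the story above.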

In this case, sitting in a dark room doing nothing is just an action plan with a very high prior; you need at least a tiny bit of confidence in your planning ability to shift to anything else.

I mentioned in Toward A Predictive Theory Of Depression that I didn’t understand the motivational system well enough to be able to explain why systematic underconfidence in neural predictions would make people less motivated. I think the idea of evolutionarily-primitive and heavily-reinforced actions as a prior – which logical judgments from the cortex have to “override” in order to produce more willpower-intensive actions – fills in this gap and provides another line of evidence for the theory.

Wed, 7 Feb 2018 11:48:23 EST

Technology may reach a point where free use of one person’s share of humanity’s resources is enough to easily destroy the world. I think society needs to make significant changes to cope with that scenario.

Mass surveillance is a natural response, and sometimes people think of it as the only response. I find mass surveillance pretty unappealing, but I think we can capture almost all of the value by surveilling things rather than surveilling people. This approach avoids some of the worst problems of mass surveillance; while it still has unattractive features, it’s my favorite option so far.

This post outlines a very theoretical and abstract version of this idea. Any practical implementation would be much messier. I haven’t thought about this topic in great depth and I expect my views will change substantially over time.

The idea

We’ll choose a set of artifacts to surveil and restrict. I’ll call these heavy technology and everything else light technology. Our goal is to restrict as few things as possible, but we want to make sure that someone can’t cause unacceptable destruction with only light technology. By default something is light technology if it can be easily acquired by an individual or small group in 2017, and heavy technology otherwise (though we may need to make some exceptions, e.g. certain biological materials or equipment).

Heavy technology is subject to two rules:

  1. You can’t use heavy technology in a way that is unacceptably destructive.
  2. You can’t use heavy technology to undermine the machinery that enforces these two rules.

To enforce these rules, all heavy technology is under surveillance, and is situated such that it cannot be unilaterally used by any individual or small group. That is, individuals can own heavy technology, but they cannot have unmonitored physical access to that technology.

For example, a modern factory would be under surveillance to ensure that its operation doesn’t violate these rules. As a special case of rule #2, the factory could not be used to produce heavy technology without ensuring that technology is appropriately registered and monitored. The enforcement rules would require the factory be defended well enough that a small group (including the owner of the factory!) could not steal heavy machinery from the factory or use it illicitly. Because a small group would not have unrestricted access to any heavy technology, a small amount of heavy technology might suffice to defend the factory.

The cost of this enforcement would be paid by people who make heavy technology. Because heavy technology can only be created by using other heavy technology (which is under surveillance) or by large groups (which have limited ability to coordinate illegal activities), it is feasible for law enforcement to be aware of all new heavy technology and ensure that it is monitored.

Sometimes this surveillance and enforcement can be provided by the technology itself. For example, computers could be built so that they can only perform approved computations (I realize this is objectively dystopian). An attacker using heavy technology would almost certainly be able to circumvent such restrictions, but a would-be attacker has access only to light technology. The technology may need to monitor its environment, phone home to law enforcement if things look weird, and potentially be prepared to disable itself (e.g. with explosives).

In order to relax requirements on some type of heavy machinery, e.g. to release it to unmonitored consumers, the producer needs to convince regulators that it doesn’t constitute a threat to these rules. This could be due to inherent limitations of the technology (e.g. a genetically modified extra-juicy pineapple is not threatening) or because of restrictions that will be hard for an individual or small group to circumvent (e.g. the computer described above that only runs software that has been approved by law enforcement, and the defense looks solid). If I want to release heavy technology, I have to pay for the costs of the evaluation process to determine whether it is safe.

These evaluations could be organized hierarchically. At the “root” are very detailed and cautious evaluations that are expensive and carried out rarely. A root evaluation wouldn’t just approve a single object, it would specify a cheaper process that could be used to approve certain kinds of items. For example, I could propose a simpler process for evaluating new materials, and perform an extensive evaluation to convince regulators that this simple process is reasonably secure. Then rather than having an extensive evaluation when I want to release a new plastic, I can follow this much cheaper new-material-approval-process. There could be several levels of delegated evaluations.
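The delegation structure above can be sketched as a simple tree of approval processes, where each cheaper process derives its authority from a more expensive parent. This is purely my own illustration of the idea; the class, names, and costs are hypothetical, not part of the proposal.

```python
# Hypothetical sketch of the delegated-evaluation hierarchy: each process
# points at the process that authorized it, terminating at a root evaluation.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ApprovalProcess:
    name: str
    cost: float                              # cost of running one evaluation
    parent: Optional["ApprovalProcess"] = None  # process that authorized this one

    def chain(self):
        """Walk back to the root evaluation that ultimately grants authority."""
        node, path = self, []
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path

root = ApprovalProcess("root evaluation", cost=1_000_000)
materials = ApprovalProcess("new-material approval", cost=10_000, parent=root)
plastic = ApprovalProcess("approve new plastic", cost=500, parent=materials)

print(plastic.chain())
```

Approving a new plastic then costs only the cheap leaf process, while its legitimacy traces back through the new-material process to the expensive root evaluation.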

Ideally we’d engage in red team exercises to probe enforcement mechanisms and ensure that they were adequate. These could apply to every step of the process, from finding creative ways to use light technology to cause unacceptable destruction, to beating the enforcement and security mechanisms around heavy technology, to making unsound proposals for relaxing restrictions on heavy technology. Red teams could be better-financed and organized than plausible criminal organizations and should have a much lower standard for success than “could have destroyed the whole world.”


This proposal has several advantages, relative to mass surveillance:

  • It allows people to continue living unmonitored lives, doing all the things they were able to do in 2017. In particular, people are free to say what they want, organize politically, trade with each other, study whatever topics they want, and so on.
  • It allows individuals to continue understanding and improving technology, without requiring a high degree of secrecy about key technologies or radically slowing technological progress.
  • It provides a (relatively) clear and limited mandate for surveillance—with appropriate laws, it would require significant overreach for this surveillance to e.g. have a decisive political effect.
  • It probably allows releasing lots of heavy technology to consumers without much extra burden, by inserting safeguards that are secure against attackers with only light technology.

Relative to no surveillance, the advantage of this proposal is that it stops some random person from killing everyone. Realistically I think “don’t do anything” is not an option.

This proposal does give states a de facto monopoly on heavy technology, and would eventually make armed resistance totally impossible. But it’s already the case that states have a massive advantage in armed conflict, and it seems almost inevitable that progress in AI will make this advantage larger (and enable states to do much more with it). Realistically I’m not convinced this proposal makes things much worse than the default.

This proposal definitely expands regulators’ nominal authority and seems prone to abuses. But amongst candidates for handling a future with cheap and destructive dual-use technology, I feel this is the best of many bad options with respect to the potential for abuse.

This proposal puts things in the “dangerous” category by default and considers them safe only after argument. We could take a different default stance; I don’t have a strong view on this. In reality I expect that most particular domains will be governed by more specific norms that overrule any global default.

The cost of this proposal grows continuously as the amount of heavy technology increases, starting from a relatively modest level where only a few kinds of technology need to be monitored. Even once heavy technology is ubiquitous, this proposal is probably not much more expensive than mass surveillance and might be cheaper. Any kind of surveillance is made much cheaper by sophisticated AI, and this proposal is no different.

