Sat, 17 Mar 2018 6:29:38 EDT


  • Epistemic status: trying to share a simplified model of a thing to make it easier to talk about; confident there’s something there, but not confident that my read of it or this attempt at simplification is good.
  • This post is a rewrite of a talk I gave at a CFAR event that seemed well-received; a couple of people who weren’t there heard about it and asked if I’d explain the thing. I tried to write this relatively quickly and keep it relatively short, which may mean it’s less clear than ideal - happy to hash things out in the comments if so.
  • The thing is much easier to describe if I occasionally use some woo-y language like “aura” and “energy” to gesture in the direction of what I mean. I’ll put these words in quotes so you know I know I’m being woo-y; feel free to mentally insert “something-you-might-call-” before each instance, if that helps.

Rationalists love talking about status. And that’s great - it’s a useful idea, for sure.

But I think in our eagerness to notice and apply the concept of status, we end up conflating it with a related-but-different thing. I also think the different thing is super useful in its own right, and important to understand, and I hope sharing my relatively basic thoughts on it will also let others build off that beginning.

So this post is my attempt to explain that different thing. I’m going to call it “making yourself big” and “making yourself small”.

This post:

  1. Horses, goats, bulls
  2. A framework
  3. What to do with it

1. Horses, goats, bulls

Let’s start with an animal showing how it’s done. This video popped up in my feed recently, and is an amazing example of an animal (the goat) “making himself big”. To us, it’s obvious that the bull could flatten the goat in any real contest. But the bull doesn’t know that! He’s not reasoning about relative mass or propulsive power; he’s responding purely to how “big” the goat is making his “aura”.

Watch the video and see if you can notice specifically how the goat is doing this.

0:08 - where the goat rears up - is an obvious moment, but I claim 0:30-0:36 is even better, where the bull feints forward a couple of times while the goat stands firm, using his posture to “project” his “energy” irrepressibly forward. The bull is simply unable to continue toward him.

The next two examples are of a human working with horses, to show what it looks like for a human to be big or small.

(Full disclosure: this post is actually describing a concept I ported over directly from horsemanship. Due to the amount of time I spent as a kid thinking about horse training, my brain is basically just a bunch of horse metaphors stacked on top of each other.)

First, getting big. The clip I’ll link to shows a stallion who was abnormally poorly trained, and who probably suffered some brain damage as a foal; as a result, he is abnormally aggressive. The trainer therefore needs to make himself much bigger than would normally be necessary to keep the stallion away from him.

Watch 0:30-0:50 of this video (warning: the video is slightly graphic if you watch all the way to the end).

See how the trainer uses his flag and motion/”energy” towards the stallion to make himself bigger, which pushes the horse away from him - without using any physical contact? Notice that the trainer does not hit the horse. (The motions he’s making may look like threats to strike, and it’s true that “making yourself big” ultimately rests on implied threat, but it’s the same flavor of threat that the goat is making in the video above - made much more of bluster than of capacity to harm.) I’m pretty confident this horse has never been struck by a human in his life, and certainly not by this trainer. He’s not recalling previous pain caused by this human and moving back to avoid it; he’s just instinctively making space for how “big” the trainer has made himself.

I found it harder to find a good clip of getting small, but I think this one of the same trainer working with a troubled mare is pretty good - watch 1:16-1:45 of this video.

Can you see the moments where he is “smallest”? The first is at 1:33, where he’s physically walking away - he’s actually making himself so small that a “vacuum” is created in his wake, and the mare walks towards him to fill it.

A more classic example is 1:41-1:45. Notice that his body faces away from the mare; he does not make eye contact with her; he moves slowly. In response, she’s able to be close to him, because his “aura” (unlike at 0:18-0:26, 1:16-1:21, or 1:39) is not pushing her away.

Hopefully you now feel like you have some intuition for what making yourself “big” or “small” could mean at all. The above examples show “bigness” and “smallness” causing other animals to physically move their bodies; I claim that this type of body language is a significant part of how social mammals like cows, goats, and horses communicate with each other.

So how does this apply to human-human interactions?

You guessed it: it turns out that humans are social mammals too! We just have more complicated ways of moving our bodies around. (e.g., wiggling our mouth-parts in ways intended to produce specific vibrations in the ears of people nearby.)

2. A framework

Before giving some human-to-human examples, here’s a simplified framework to distinguish high/low status from making yourself big/small.

High/low status is about (among other things):

  • How much power you have
  • How much attention you can expect
  • How much space you are entitled to

Making yourself big/small is about (among other things):

  • How much power you are exercising
  • How much attention you are demanding
  • How much space you are taking up

It’s pretty easy to think of examples of people who are high status and tend to make themselves big, or low status and tend to make themselves small. Here’s an example of each:


(That’s Elon Musk and Neville Longbottom, if you don’t recognize them.)

But what goes in the other corners? I encourage you to try to generate an example of each before scrolling down. (I’d love to hear in the comments what you came up with, and if you still think they’re good examples after reading the rest of the post.)







































My low/big example is the very same Neville Longbottom of low/small fame. But this time, it’s Neville in a very specific scene - the one at the end of the first Harry Potter book, where he tries to prevent his friends from leaving the common room. As you may remember, this doesn’t end particularly well for him; indeed, making yourself bigger than the “size” that corresponds to how much status you have is often not a very successful move.

(Which makes sense in the framework above. What would you expect to happen to someone who is trying to exercise more power than they have, demand more attention than they can expect, or take up more space than they’re entitled to? I do think low/big can sometimes be effective, but it’s tough to pull off.)

The high/small example is even more interesting. The image is of Anna Salamon, one of the cofounders of CFAR. I don’t want to refer to teacher-Anna, who stands in front of a room and commands attention, but to mentor-Anna.

Mentor-Anna (whom you probably meet in a setting where she is fairly high-status, as a teacher or organizer or generally a person-whom-others-seem-to-respect) sits in a circle with you, or across from you one-on-one, and makes herself small. She doesn’t play low-status - she doesn’t act scared or powerless or shy. Instead, she talks slowly, leaves plenty of silence for you to fill, physically takes up a small amount of space (with knees to her chest, or legs crossed and hands in front of her, or similar), often looks away from you, and doesn’t interrupt. In response, the people she’s talking with tend to be drawn out of themselves; they have space to reflect; they share half-baked plans and half-acknowledged insecurities. They “expand” to fill the space she has created.

3. What to do with it

As with concepts like status or SNS/PSNS activation, I think what’s useful about having this concept in your mental toolbox is that you can practice:

1) noticing it at play in yourself and others

2) moving where you are on the spectrum

For the latter, probably the most important thing is your mental/emotional state - a friend suggested “not wanting to startle a small bird” as a mindset to inhabit, to encourage yourself to become “smaller”.

If you want more concrete/physical suggestions, here are a few:

  • Interrupt less
  • Be silent more (both by pausing while you’re speaking, and by waiting a little longer before speaking after someone else)
  • Use less eye contact (both with the person you’re speaking with, and with others nearby - e.g. avoid the classic move of looking at everyone around the circle to see if they found your joke funny)
  • Take up less physical space (curl your body in rather than puffing it out, even if only slightly; lean away rather than toward)
  • Make hedged suggestions or share tentative ideas, rather than using more command-like language (example “smaller” language: “what do you think of the idea of…”; “how would it be if we…”; “maybe one option would be…”)

(By contrast, tips on how to play low status would be more like “make yourself seem defenseless/weak/submissive”.)

I’ve focused on making yourself small in this post, since I think it’s undervalued relative to making yourself big. But since every piece of advice can be reversed, maybe also consider whether you should be making yourself big more often, and how you would do that? (Suggested mindset: be a matador, owning the arena despite the bull charging at you.)

This is the part of the post where videos of human-to-human examples would be really helpful. Unfortunately I found it tricky to come up with examples I could search for that aren’t just high/big or low/small (e.g. the classic “new kid takes down the bully” scene in lots of high school movies usually involves the new kid “getting big”, but also doing a bunch of high-status behavior, which doesn't seem very helpful for explaining). Given that this post has been sitting in “drafts” for a couple of weeks now, waiting for me to get around to finding better examples, I decided to go ahead and post it without more videos.

But to point a little more towards why you should care at all, here are some brief descriptions of example situations where I claim this concept is relevant and useful (please mentally insert additional “I claim”s in any unintuitive-seeming places):

  • Per the Anna example above, making yourself small is a really good way to non-explicitly encourage someone who seems shy/intimidated/reticent to feel more comfortable coming out of their shell.
  • At the top, I said people often conflate high/low status with making yourself big/small. As an example, when meeting new people (e.g. at a party or networking event), you might go from thinking “higher status is better” to “I should make myself as big as possible”. But this often backfires, either because you become intimidating (see previous bullet) or because you infringe upon the “space”/”aura” of others, causing them to feel hostile/defensive/aggressive. Decoupling making yourself smaller from playing low status can help you make a much better impression - neither “loud and brash” nor “scared and shy”; more like “self-contained and confident”.
  • In any context where you want the people around you to pay attention to someone else (e.g. if you’re making a joint presentation of some kind, and it’s their turn to hold the floor), making yourself small will make it easier for that person to take up the space and hold the attention of others.
  • As with lots of interpersonal concepts, this can also be useful internally: if you’re familiar with internal double crux / internal family systems / other “parts-work”, play around with the motion of having parts of yourself make themselves small (or big).
  • More generally, directing your “energy”/”aura” in other ways (beyond just “bigger”/”smaller”) - and noticing others doing it - can be useful in tons of situations. As a trivial example, try it the next time you’re in that situation where you bump into someone walking in the other direction, and the two of you can’t figure out which side to pass each other on.

I hope those brief descriptions make sense just based on this post; if not, I can expand on them in the comments if there’s interest. I’d also be curious for what examples you can come up with (or notice in your daily life after reading the post).

If you know me personally, I’m also happy to share more examples of specific mutual acquaintances who are noticeably good or bad at making themselves big or small. I think those would be difficult and potentially privacy-invading to try to describe to strangers, though, so I’m not including them here.

That’s all I have for now.


LOL j/k, there are also these *~optional horsemanship notes~* (which I could rant on about for pages but which you should feel free to skip):

[1] For the record, the style of horsemanship I like is pretty niche; you should not expect most people who work with horses to have heard the phrase “make yourself small” or to agree with me about what good horse training looks like.

[2] The horsemanship clips above are of Buck Brannaman, a trainer I highly respect. There’s a lot of skill and subtlety to what he’s doing in each clip (which I’d love to discuss with anyone interested), so I’d suggest not drawing strong conclusions about his methods based just on these short videos. If you’re really interested, I recommend this documentary about him, which I highly enjoyed but which might make less sense if you have less context on training horses.

[3] If you’re confused and/or curious about what Buck is doing in the video with the troubled mare: very roughly speaking, he’s 1) making himself big enough that the mare pays attention to him (which - do you see it? - is much less big than he needed to be with the stallion, because she’s not nearly as aggressive/oblivious); 2) showing her that if she’s paying attention to him, nothing bad will happen, and she can relax; 3) making himself small to allow her to approach him while in that relaxed and attentive state.

[4] I originally learned about the idea of making yourself big or small from The Birdie Book, by Dr. Deb Bennett. (The book is named for one of Bennett’s key ideas, which is that working with the horse’s attention/focus - which she nicknames its “birdie” - is a key part of understanding and communicating with horses.) I’m copying here a long passage from the book about getting small - feel free to skip, but I thought it might add some helpful color.

One of the most moving things I ever witnessed in horsemanship was watching Harry Whitney help a frightened weanling filly. She had come from a breeding farm whose operators cynically demonstrate to clients their horses' "brilliance" and "fire" by frightening them until they retreat with rolling eyes, trembling limbs and terrified sweating, to the back corner of a large stall. The filly's new owner, a woman from Arizona, very wisely brought her to us at a nearby ranch in California, for she knew that asking this animal to make the fifteen-hour trailer trip south in such a stressed and terrified state would likely kill her.
The moment Harry entered the pen where the filly had been placed, she began desperate attempts to flee. The pen, which was much too high for her to jump out of, was enclosed by strong wire netting. This was fortunate for it did not permit her to injure herself, which she would most certainly have done otherwise. As it was, she crashed into and bounced off of it once and then, in blind terror, ran straight at Harry, half knocking him down.
Harry's response was to retreat, very slowly, as far as he could get from her while she did likewise with respect to him. Physically as well as energetically, he made himself as small as possible. His flag (Harry's is made out of a collapsible fishing rod), remained stowed in his high-topped boots, as far out of sight as possible.
Then, from a position squatting close to the ground in one corner of the enclosure, Harry began to help the filly make some changes. Every time she would glance out of the pen, Harry would reach down to his boot and just barely crinkle the flag. The first time he did this, the filly stared at him, the whites of her eyes showing, her feet frozen to the ground, her tense and trembling body leaning stiffly away. As soon as she rolled her eyes toward him, Harry would stop the crinkling sound and resume waiting quietly. Those of us who stood watching hardly dared to breathe.
Our apprehension, however, proved unnecessary. As she spent more time regarding Harry, she began to relax. Soon she could stand still, relaxed, when Harry stood up completely straight. In another few minutes, he could take a step toward her - and then reward her for not fleeing by stepping away from her again. In half an hour, she was able to stretch her neck out to sniff his outstretched hand. A few minutes after that, Harry was petting her muzzle and her forehead. She found out it wasn't so bad. In fact, she liked it.
The second day, Harry repeated the first lessons and in a few minutes the filly was able to permit Harry to place the halter around her muzzle, and then buckle it on her head. In the same way, he then taught her to lead: a little pressure from the rope, let her feel of it, let her figure out how to relieve the pressure by stepping up, then release even more slack to her. After each bout, the filly worked her jaws as she chewed things over in her mind.
On the second day, the filly had two sessions with Harry, one in the morning and one in the afternoon, each of about 30 minutes' duration. By the end of the second lesson that day, the filly was allowing Harry to touch her all over, pick up all four feet, and lead her anywhere in her enclosure, which included a stall plus a run. She could follow Harry in and out of the door, stepping daintily over the threshold connecting the stall to the run.
On the morning of the third day, Harry led her out of the stall. They walked all over the farm. If she showed indications that she might be getting "lost," Harry would crinkle his flag, or merely reach out to touch her. With this reminder of where her teacher was, she could relax again. It was clear that she wanted to be with Harry more than she wanted to be anywhere else. By the same internal process that underlies all affection - or if you like by the same miracle - he had become her trusted friend. He led her in and out of the owner's horse trailer, up and down the ramp, letting her find out all about it, and especially that it wasn't going to hurt her.
Using his human powers of pre-planning and foresight, Harry never got this filly into trouble, never came close to breaking her thread. This allowed her to begin to develop a much wider scale for adjustment. Some people call this "equanimity," "resiliency," or "inner calm." Others call it "emotional maturity."
On the afternoon of the third day, Harry handed the lead line to the owner. She had already learned much by watching the whole process for three days, and with a little support from Harry, she found to her delight that she too could pet, halter, lead, and load her filly and handle her feet. We all realized that they were both going to make it just fine to their new home in Arizona.
When I expressed my admiration to Harry later in private, he said, "my biggest worry was that I might not be able to make myself small enough."


Wed, 7 Mar 2018 7:53:37 EST

Nuclear weapons seem like the marquee example of rapid technological change after crossing a critical threshold.

Looking at the numbers, it seems to me like:

  • During WWII, and probably for several years after the war, the cost per TNT-equivalent of manufacturing nuclear weapons was comparable to the cost of conventional explosives (AI Impacts estimates a manufacturing cost of $25M each).
  • Amortizing out the cost of the Manhattan Project, dropping all nuclear weapons produced in WWII would be cost-competitive with traditional firebombing (which this thesis estimates at 5k GBP (=$10k?) / death, vs. ~100k deaths per nuclear weapon), and by 1950, when stockpiles had grown to >100 weapons, was an order of magnitude cheaper. (Nuclear weapons are much easier to deliver, and at that point the development cost was comparable to manufacturing cost.)
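As a sanity check on those bullets, here is my own back-of-the-envelope sketch of the amortization. The ~$2B Manhattan Project cost and the assumption of 3 weapons produced by the end of WWII are my inputs, not from the post; the $25M/weapon, ~100k deaths/weapon, and ~$10k/death firebombing figures come from the text above.

```python
# Back-of-the-envelope check of the cost-per-death claims (all figures
# approximate; the $2B development cost and 3 WWII weapons are assumptions).
manhattan_project = 2e9      # ~$2B development cost (assumed)
cost_per_weapon = 25e6       # AI Impacts' manufacturing estimate
deaths_per_weapon = 1e5      # ~100k deaths per weapon (from the text)
firebombing = 10e3           # ~$10k per death (5k GBP, from the thesis)

def cost_per_death(n_weapons):
    """Amortize development cost over n weapons, plus manufacturing."""
    total = manhattan_project + n_weapons * cost_per_weapon
    return total / (n_weapons * deaths_per_weapon)

print(cost_per_death(3))    # WWII-era: ~$6.9k/death, comparable to firebombing
print(cost_per_death(100))  # 1950 stockpile: ~$450/death, ~20x cheaper
```

On these rough numbers the two claims check out: roughly cost-parity with firebombing during the war, and better than an order of magnitude cheaper once the development cost is spread over a 100-weapon stockpile.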

Separately, it seems like a 4 year lead in nuclear weapons would have represented a decisive strategic advantage - a much shorter lead than for any other technology. My best guess is that a 2 year lead wouldn't do it, but I'd love to hear an assessment of the situation from someone who understands the relevant history/technology better than I do.

So my understanding is: it takes about 4 years to make nuclear weapons and another 4 years for them to substantially overtake conventional explosives (against a 20 year doubling time for the broader economy). Having a 4 year lead corresponds to a decisive strategic advantage.

Does that understanding seem roughly right? What's most wrong or suspect? I don't want to do a detailed investigation since this is pretty tangential to my interests, but the example is in the back of my mind slightly influencing my views about AI, and so I'd like it to be roughly accurate or tagged as inaccurate. Likely errors: (a) you can get a decisive strategic advantage with a smaller lead, (b) cost-effectiveness improved more rapidly after the war than I'm imagining, or (c) those numbers are totally wrong for one reason or another.

I think the arguments for a nuclear discontinuity are really strong, much stronger than any other technology. Physics fundamentally has a discrete list of kinds of potential energy, which have different characteristic densities, with a huge gap between chemical and nuclear energy densities. And the dynamics of war are quite sensitive to energy density (nuclear power doesn't seem to have been a major discontinuity). And the dynamics of nuclear chain reactions predictably make it hard for nuclear weapons to be "worse" in any way other than being more expensive (you can't really make them cheaper by making them weaker or less reliable). So the continuous progress narrative isn't making a strong prediction about this case.
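The chemical/nuclear gap mentioned above is easy to quantify with standard physics figures (mine, not from the post): TNT releases about 4.2 MJ/kg by definition, while fission of U-235 releases roughly 200 MeV per nucleus, around seven orders of magnitude more per kilogram.

```python
# Rough chemical-vs-nuclear energy density comparison, using standard
# textbook figures (not from the post).
MEV_TO_J = 1.602e-13
AMU_TO_KG = 1.661e-27

tnt = 4.184e6                         # J/kg, definition of a "ton of TNT"
fission_energy = 200 * MEV_TO_J       # ~200 MeV released per U-235 fission
u235_mass = 235 * AMU_TO_KG           # kg per U-235 nucleus
u235 = fission_energy / u235_mass     # ~8e13 J/kg for complete fission

print(u235 / tnt)  # ~2e7: nuclear fuel is tens of millions of times denser
```

A factor of ~20 million between adjacent rungs of the energy-density ladder is the kind of gap that makes the nuclear case look genuinely discontinuous rather than like fast-but-continuous progress.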

(Of course, progress in nuclear weapons involves large-scale manufacturing. Today the economy grows at roughly the same rate as in 1945, but information technology can change much more rapidly.)


Mon, 5 Mar 2018 14:42:45 EST

This is a guest post summarizing Paul Christiano’s proposed scheme for training machine learning systems that can be robustly aligned to complex and fuzzy values, which I call Iterated Distillation and Amplification (IDA) here. IDA is notably similar to AlphaGoZero and expert iteration.

The hope is that if we use IDA to train each learned component of an AI then the overall AI will remain aligned with the user’s interests while achieving state of the art performance at runtime — provided that any non-learned components such as search or logic are also built to preserve alignment and maintain runtime performance. This document gives a high-level outline of IDA.

Motivation: The alignment/capabilities tradeoff

Assume that we want to train a learner A to perform some complex fuzzy task, e.g. “Be a good personal assistant.” Assume that A is capable of learning to perform the task at a superhuman level — that is, if we could perfectly specify a “personal assistant” objective function and trained A to maximize it, then A would become a far better personal assistant than any human.

There is a spectrum of possibilities for how we might train A to do this task. On one end, there are techniques which allow the learner to discover powerful, novel policies that improve upon human capabilities:

  • Broad reinforcement learning: As A takes actions in the world, we give it a relatively sparse reward signal based on how satisfied or dissatisfied we are with the eventual consequences. We then allow A to optimize for the expected sum of its future rewards.
  • Broad inverse reinforcement learning: A attempts to infer our deep long-term values from our actions, perhaps using a sophisticated model of human psychology and irrationality to select which of many possible extrapolations is correct.

However, it is difficult to specify a broad objective that captures everything we care about, so in practice A will be optimizing for some proxy that is not completely aligned with our interests. Even if this proxy objective is “almost” right, its optimum could be disastrous according to our true values.

On the other end, there are techniques that try to narrowly emulate human judgments:

  • Imitation learning: We could train A to exactly mimic how an expert would do the task, e.g. by training it to fool a discriminative model trying to tell apart A’s actions from the human expert’s actions.
  • Narrow inverse reinforcement learning: We could train A to infer our near-term instrumental values from our actions, with the presumption that our actions are roughly optimal according to those values.
  • Narrow reinforcement learning: As A takes actions in the world, we give it a dense reward signal based on how reasonable we judge its choices are (perhaps we directly reward state-action pairs themselves rather than outcomes in the world, as in TAMER). A optimizes for the expected sum of its future rewards.

Using these techniques, the risk of misalignment is reduced significantly (though not eliminated) by restricting agents to the range of known human behavior — but this introduces severe limitations on capability. This tradeoff between allowing for novel capabilities and reducing misalignment risk applies across different learning schemes (with imitation learning generally being narrowest and lowest risk) as well as within a single scheme.

The motivating problem that IDA attempts to solve: if we are only able to align agents that narrowly replicate human behavior, how can we build an AGI that is both aligned and ultimately much more capable than the best humans?

Core concept: Analogy to AlphaGoZero

The core idea of Paul’s scheme is similar to AlphaGoZero (AGZ): We use a learned model many times as a subroutine in a more powerful decision-making process, and then re-train the model to imitate those better decisions.

AGZ’s policy network p is the learned model. At each iteration, AGZ selects moves by an expensive Monte Carlo Tree Search (MCTS) which uses policy p as its prior; p is then trained to directly predict the distribution of moves that MCTS ultimately settles on. In the next iteration, MCTS is run using the new more accurate p, and p is trained to predict the eventual outcome of that process, and so on. After enough iterations, a fixed point is reached — p is unable to learn how running MCTS will change its current probabilities.

MCTS is an amplification of p — it uses p as a subroutine in a larger process that ultimately makes better moves than p alone could. In turn, p is a distillation of MCTS: it learns to directly guess the results of running MCTS, achieving comparable performance while short-cutting the expensive computation. The idea of IDA is to use the basic iterated distillation and amplification procedure in a much more general domain.
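To make the fixed-point dynamic concrete, here is a toy numeric analogy of my own (an illustration, not Paul's scheme or AGZ itself): the "model" is a table of position values, amplification is a one-ply search that consults the table at a node's children, and distillation copies the amplified answers back into the table. Iterating converges to the true minimax values, after which further amplification no longer changes the model.

```python
# Toy amplify/distill loop on a tiny minimax game tree (my own illustration).
# Nodes 0..6 form a complete binary tree; 3..6 are leaves with known payoffs.
children = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
leaf_value = {3: 1.0, 4: -2.0, 5: 0.5, 6: 3.0}
maximizing = {0: True, 1: False, 2: False}  # whose turn at each internal node

# The "learned model": a value table, initially ignorant about internal nodes.
p = {n: 0.0 for n in range(7)}
for n, v in leaf_value.items():
    p[n] = v  # leaf payoffs are directly observable

def amplify(p, n):
    """One-ply lookahead that uses the current model p at the children."""
    if n in leaf_value:
        return leaf_value[n]
    vals = [p[c] for c in children[n]]
    return max(vals) if maximizing[n] else min(vals)

def distill(p):
    """Retrain the model to directly predict the amplified answers."""
    return {n: amplify(p, n) for n in p}

for _ in range(3):  # a few amplify-then-distill iterations
    p = distill(p)

print(p[0])  # the root value has converged to the true minimax value, 0.5
```

After convergence, `p == distill(p)`: the model can no longer learn anything from running the search, which is exactly the fixed point described for AGZ above.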

The IDA Scheme

IDA involves repeatedly improving a learned model through an amplification and distillation process over multiple iterations.

Amplification is interactive and human-directed in IDA

In AGZ, the amplification procedure is Monte Carlo Tree Search — it’s a simple and well-understood algorithm, and there’s a clear mechanism for how it improves on the policy network’s original choices (it traverses the game tree more deeply). But in IDA, amplification is not necessarily a fixed algorithm that can be written down once and repeatedly applied; it’s an interactive process directed by human decisions.

In most domains, humans are capable of improving their native capabilities by delegating to assistants (e.g. because CEOs can delegate tasks to a large team, they can produce orders of magnitude more output per day than they could on their own). This means if our learning procedure can create an adequate helper for the human, the human can use the AI to amplify their ability — this human/AI system may be capable of doing things that the human couldn’t manage on their own.

Below I consider the example of using IDA to build a superhuman personal assistant. Let A[t] refer to the state of the learned model after the end of iteration t; the initial agent A[0] is trained by a human overseer H.

Example: Building a superhuman personal assistant

H trains A[0] using a technique from the narrow end of the spectrum, such as imitation learning. Here we are imagining a much more powerful version of “imitation learning” than current systems are actually capable of — we assume that A[0] can acquire nearly human-level capabilities through this process. That is, the trained A[0] model executes all the tasks of a personal assistant as H would (including comprehending English instructions, writing emails, putting together a meeting schedule, etc).

Even though A[0] cannot discover any novel capabilities, it has two key advantages over H: it can run much faster, and many copies or versions of it can be run at once. We hope to leverage these advantages to construct a larger system — involving H and many copies of A[0] — that will substantially improve on H’s capabilities while preserving alignment with H’s values.

H can use calls to A[0] (along with other tools such as external memory) to become a better personal assistant. For example, H could assign one copy of A[0] to figuring out the best time to schedule the client’s recurring team meetings, another copy to figure out what to order the client for lunch, another copy to balance the client’s personal budget, etc. H now has the ability to get very quick solutions to sub-problems that are roughly as good as the ones H would have come up with on their own over a longer time period, and can combine these results to make much better decisions than an unaided human.

Let Amplify(H, A[0]) refer to the larger system of H + many copies of A[0] + aids. Compared to A[0] alone, the Amplify(H, A[0]) system has much higher time and resource costs but its eventual decisions are much better. Moreover, because in each of its individual decisions each copy of A[0] continues to act just as a human personal assistant would act, we can hope that Amplify(H, A[0]) preserves alignment.

In the next iteration of training, the Amplify(H, A[0]) system takes over the role of H as the overseer. A[1] is trained with narrow and safe techniques to quickly reproduce the results of Amplify(H, A[0]). Because we assumed Amplify(H, A[0]) was aligned, we can hope that A[1] is also aligned if it is trained using sufficiently narrow techniques which introduce no new behaviors. A[1] is then used in Amplify(H, A[1]), which serves as an overseer to train A[2], and so on.


def IDA(H):
    A <- random initialization
    repeat:
        A <- Distill(Amplify(H, A))

def Distill(overseer):
    Returns an AI trained using narrow, robust techniques to perform
    a task that the overseer already understands how to perform.

def Amplify(human, AI):
    Interactive process in which human uses many calls to AI to
    improve on human's native performance at relevant task(s).

What properties must hold for IDA to work?

The IDA scheme is a template with “slots” for Amplify and Distill procedures that have not been fully specified yet — in fact, they rely on capabilities we don’t yet have. Because IDA itself is not fully specified, it’s not clear what minimal set of properties are necessary for it to succeed.

Achieving alignment and high capability

That said, here are some general properties which seem necessary — though likely not sufficient — for IDA agents to achieve robust alignment and high capability:

  1. The Distill procedure robustly preserves alignment: Given an aligned agent H we can use narrow safe learning techniques to train a much faster agent A which behaves as H would have behaved, without introducing any misaligned optimization or losing important aspects of what H values.
  2. The Amplify procedure robustly preserves alignment: Given an aligned agent A, it is possible to specify an amplification scheme which calls A multiple times as a subroutine in a way that reliably avoids introducing misaligned optimization.
  3. At least some human experts are able to iteratively apply amplification to achieve arbitrarily high capabilities at the relevant task: a) there is some threshold of general capability such that if someone is above this threshold, they can eventually solve any problem that an arbitrarily intelligent system could solve, provided they can delegate tasks to similarly-intelligent assistants and are given arbitrary amounts of memory and time; b) at least some human experts are above this threshold of generality — given enough time and resources, they can figure out how to use AI assistants and tools to improve their capabilities arbitrarily far.

The non-profit Ought is working on gathering more evidence about assumptions 2 and 3.

Achieving competitive performance and efficiency

Paul aims for IDA agents to be competitive with traditional RL agents in time and resource costs at runtime — this is a reasonable expectation because an IDA agent is ultimately just another learned model whose weights were tuned with an unusual training procedure.

Resource and time cost during training is a more open question; I haven’t explored the assumptions that would have to hold for the IDA training process to be practically feasible or resource-competitive with other AI projects.


Iterated Distillation and Amplification was originally published in AI Alignment on Medium, where people are continuing the conversation by highlighting and responding to this story.

Sun, 4 Mar 2018 13:26:15 EST

There’s been much discussion of income inequality over the last few years. However, I just randomly came across what should be a seminal related result, published in 2010 but mostly ignored. Let me do my bit to fix that.

People often presume that policy can mostly ignore income inequality if key individual outcomes like health or happiness depend mainly on individual income. Yes, there may be some room for promoting insurance against income risk, but not much room. However, people often presume that policy should pay a lot more attention to inequality if individual outcomes depend more directly on the income of others, such as via envy or discouragement.

However, there’s a simple and plausible income interdependence scenario where inequality matters little for policy: when outcomes depend on rank. If individual outcomes are a function of each person’s percentile income rank, and if social welfare just adds up those individual outcomes, then income policy becomes irrelevant, because this social welfare sum is guaranteed to always add up to the same constant. Income-related policy may influence outcomes via other channels, but not via this channel. This applies whether the relevant rank is global, comparing each person to the entire world, or local, comparing each person only to a local community.
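This invariance is easy to check directly. The sketch below assumes a toy welfare function that is an arbitrary function of percentile rank summed over individuals; all names and numbers are mine, purely for illustration.

```python
# If outcomes depend only on income rank, total welfare is the same
# for any income distribution (with distinct incomes): the multiset
# of ranks is always {1/n, ..., n/n}, so the sum is a constant.

def percentile_ranks(incomes):
    order = sorted(range(len(incomes)), key=lambda i: incomes[i])
    ranks = [0.0] * len(incomes)
    for position, i in enumerate(order):
        ranks[i] = (position + 1) / len(incomes)
    return ranks

def total_welfare(incomes, outcome=lambda rank: rank ** 0.5):
    # outcome() is an arbitrary illustrative function of rank
    return sum(outcome(r) for r in percentile_ranks(incomes))

unequal       = [10, 20, 40, 80, 160]
redistributed = [30, 40, 50, 70, 120]  # flatter incomes, same ranking
reshuffled    = [160, 10, 80, 20, 40]  # same incomes, different people

print(total_welfare(unequal),
      total_welfare(redistributed),
      total_welfare(reshuffled))  # all three sums are equal
```

Any income policy that only moves money around leaves the multiset of ranks, and hence this welfare sum, unchanged.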

That 2010 paper, by Christopher Boyce, Gordon Brown, and Simon Moore, makes a strong case that in fact the outcome of life satisfaction depends on the incomes of others only via income rank. (Two followup papers find the same result for outcomes of psychological distress and nine measures of health.) They looked at 87,000 Brits, and found that while income rank strongly predicted outcomes, neither individual (log) income nor an average (log) income of their reference group predicted outcomes, after controlling for rank (and also for age, gender, education, marital status, children, housing ownership, labor-force status, and disabilities). These seem to me remarkably strong and robust results. (Confirmed here.)
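The paper’s control strategy can be mimicked on simulated data. In the sketch below the true model generates satisfaction from rank alone; regressing on both rank and log income then recovers a near-zero income coefficient, which is the signature the authors found. The data here is simulated by me, not theirs.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
log_income = rng.normal(0.0, 1.0, n)
# percentile rank of each person's income (rank and log income are
# strongly correlated, but not collinear)
rank = (np.argsort(np.argsort(log_income)) + 1) / n

# True model: satisfaction depends ONLY on rank, plus noise
satisfaction = 2.0 * rank + rng.normal(0.0, 0.3, n)

# OLS of satisfaction on an intercept, rank, and log income
X = np.column_stack([np.ones(n), rank, log_income])
beta, *_ = np.linalg.lstsq(X, satisfaction, rcond=None)
print(beta)  # rank coefficient near 2, log-income coefficient near 0
```

If the true model instead loaded on log income, the same regression would show the reverse pattern, so the comparison is informative in both directions.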

The irrelevance of individual income and reference group income remained true whether the group within which a person was ranked was the entire sample, one of 19 geographic regions, one of 12 age groups, or one of six gender-schooling groups. This suggests that the actual relevant comparison group is relatively narrow. If people cared mainly about their global rank in the whole sample, then analyses of rank within groups should have missed an effect of the rank of the group, which should have appeared as an effect of reference group income. But such effects weren’t seen.

If these statistical models were the correct model of the world, then income policy could only influence social welfare via the control variables of age, gender, education, marital status, children, housing ownership, labor-force status, and disabilities. You couldn’t improve social welfare directly by redistributing income, though redistribution or taxation might help by changing control variables.

But even that conclusion seems premature. The key idea here is that people care about their social status rank, and income should only be one of many factors contributing to social status. So we should really be looking at models where all of a person’s observable features can contribute to their status. For each feature, such as personality or marital status, we should ask if our data is best described as that factor contributing directly to social status, which is then ranked to produce individual outcomes, or whether that factor also influences individual outcomes via some other channel, that doesn’t pass through social status. It is only effects via those other channels that might change overall social welfare.

This seems a straightforward statistical exercise, at least for someone with access to relevant data. Who’s up for it?

Wed, 28 Feb 2018 5:16:29 EST

This is part 19 of 30 of Hammertime. Click here for the intro.

As is Hammertime tradition, I’m making a slight change of plans right around the scheduled time for Planning. My excuse this time:

Several commenters pointed out serious gaps in my knowledge of Focusing. I will postpone Internal Double Crux, an advanced form of Focusing, to the next cycle. Instead, we will have two more posts on making and executing long-term plans.

Day 19: TDT for Humans

Previously on planning: Day 8, Day 9, Day 10.

Today I’d like to describe two orders of approximation to a working decision theory for humans.

TDT 101

Background reading: How I Lost 100 Pounds Using TDT.

Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation.

~ Eliezer

In other words, every time you make a decision, pre-commit to making the same decision in all conceptually similar situations in the future.

The striking value of TDT is: make each decision as if you would immediately reap the long-term rewards of making that same decision repeatedly. And if it turns out you’re an updateless agent, this actually works! You actually lose 100 pounds by making one decision.

I encourage readers who have not tried to live by TDT to stop here and try it out for a week.

TDT 201

There are a number of serious differences between timeless agents and human beings, so applying TDT as stated above requires an unacceptable (to me) level of self-deception. My second order of approximation is to offer a practical and weak version of TDT based on the Solitaire Principle and Magic Brain Juice.

Three objections to applying TDT in real life:


A human is about halfway between “one monolithic codebase” and “a loose confederation of spirits running a random serial dictatorship.” Roughly speaking, each spirit is the piece of you built to satisfy one primordial need: hunger, friendship, curiosity, justice. At any given time, only one or two of these spirits are present and making decisions. As such, even if each individual spirit is updateless and deterministic, you don’t get to make decisions for the spirits currently inactive. You don’t have as much control over the other spirits as you would like.

Different spirits have access to different data and beliefs. I’ve mentioned, for example, that I have different personalities speaking Chinese and English. You can ask me what my favorite food is in English, and I’ll say dumplings, but the true answer 饺子 feels qualitatively better than dumplings by a wide margin.

Different spirits have different values. I have two friends who reliably provoke my “sadistic dick-measuring asshole” spirit. If human beings really have utility functions, this spirit has negative signs in front of the terms for other people. It’s uncharacteristically happy to engage in negative-sum games.

It’s almost impossible to predict when spirits will manifest. Recently, I was on a 13-hour flight back from China. I started marathoning Game of Thrones after exhausting the comedy section, and a full season of Cersei Lannister left me in “sadistic asshole” mode for a full day afterwards. If Hainan Airlines had stocked more comedy movies this might not have occurred.

Spirits can lie dormant for months or years. Meeting up with high school friends this December, I fell into old roles and received effortless access to a large swathe of faded memories.

Conceptual Gerrymandering

Background reading: conceptual gerrymandering.

I can make a problem look either big or small by drawing either a big or small conceptual boundary around it, then identifying my problem with the conceptual boundary I’ve drawn.

TDT runs on an ambiguous “conceptual similarity” clause: you pre-commit to making the same decision in conceptually similar situations. Unfortunately, you will be prone to motivated reasoning and conceptual gerrymandering to get out of timeless pre-commitments made in the past.

This problem can be reduced but not solved by clearly stating boundaries. Life is too high-dimensional to even figure out what variables to care about, let alone where to draw the line for each of them. What information becomes salient is a function of your attention and noticing skills as much as of reality itself. These days, it’s almost a routine experience to read an article that sufficiently alters my capacities for attention as to render situations I would previously have considered “conceptually similar” altogether distinct.

Magic Brain Juice

Background reading: Magic Brain Juice.

Every action you take is accompanied by an unintentional self-modification.

The human brain is finicky code that self-modifies every time it takes an action. The situation is even worse than this: your actions can shift your very values in surprising and illegible ways. This bug is an inherent obstacle to applying TDT as a human.

Self-modification happens in multiple ways. When I wrote Magic Brain Juice, I was referring to the immediate strengthening of neural pathways that are activated, and the corresponding decay through time of all pathways not activated. But other things happen under the hood. You get attached to a certain identity. You get sucked into the nearest attractor in the social web. And also:

Exposure therapy is a powerful and indiscriminate tool. You can reduce any aversion to almost zero just by voluntarily confronting it repeatedly. But you have fears and aversions in every direction!

Every move you make is exposure therapy in that direction.

That’s right.

Every voluntary decision nudges your comfort zone in that direction, squashing aversions (endorsed or otherwise) in its path.



I hope I’ve convinced you that the human brain is sufficiently broken that our intuitions about “updateless source code” don’t apply, and trying to make decisions from TDT will be harder (and may have serious unintended side effects) as a result. What can be done?

First, I think it’s worth directly investing in TDT-like behaviors. Make conscious decisions to reinforce the spirits that are amenable to making and keeping pre-commitments. Make more legible decisions and clearly state conceptual boundaries. Explore virtue ethics or deontology. Zvi’s blog is a good place to start.

In the same vein, practice predicting your future behavior. If you can become your own Omega, problems you face start looking Newcomb-like. Then you’ll be forced to give up CDT and the failures it entails.

Second, I once proposed a model called the “Ten Percent Shift”:

The Ten Percent Shift is a thought experiment I’ve successfully pushed to System 1 that helps build long-term habits like blogging every day. It makes the assumption that each time you make a choice, it gets 10% easier.

Suppose there is a habit you want to build such as going to the gym. You’ve drawn the pentagrams, sprinkled the pixie dust, and done the proper rituals to decide that the benefits clearly outweigh the costs and there are no superior alternatives. Nevertheless, the effort to make yourself go every day seems insurmountable.

You spend 100 units of willpower dragging yourself there on Day 1. Now, notice that you have magic brain juice on your side. On Day 2, it gets a little bit easier. You spend 90 units. On Day 3, it only costs 80.

A bit of math and a lot of magic brain juice later, you spend 550 units of willpower over the first 10 days, and the habit is free for the rest of time.

The exact number is irrelevant, but I stand by this model as the proper weakening of TDT: act as if each single decision rewards you with 10% of the value of making that same decision indefinitely. One decision only loses you 10 pounds, and you need to make 10 consecutive decisions before you get to reap the full rewards.
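The arithmetic behind this is short. The sketch below assumes the cost falls by a flat 10% of the initial cost per repetition, matching the 100/90/80 sequence above.

```python
# Ten Percent Shift arithmetic: each repetition costs 10% of the
# initial cost less than the previous one, hitting zero on day 11.

def willpower_costs(initial_cost=100, shift_percent=10, days=10):
    step = initial_cost * shift_percent // 100  # 10 units per day
    return [initial_cost - step * day for day in range(days)]

costs = willpower_costs()
print(costs)       # [100, 90, 80, 70, 60, 50, 40, 30, 20, 10]
print(sum(costs))  # 550 units total, then the habit is free
```

Changing `shift_percent` shows how sensitive the total is to the model: a 5% shift roughly doubles the up-front willpower bill before the habit becomes free.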

The Ten Percent Shift guards against spirits. Once you make the same decision 10 times in a row, you’ll have made it from a wide range of states of mind, and the exact context will have differed in every situation. You’ll probably have to convince a majority of spirits to agree with making the decision.

The Ten Percent Shift also guards against conceptual gerrymandering. Having made the same decision from a bunch of different situations, the convex hull of these data points is a 10-dimensional convex region that you can unambiguously stake out as a timeless pre-commitment.

Daily Challenge

This post is extremely tentative and theoretical, so I’ll just open up the floor for discussion.

Sun, 25 Feb 2018 5:33:37 EST

Futurists have argued for years about whether the development of AGI will look more like a breakthrough within a small group (“fast takeoff”), or a continuous acceleration distributed across the broader economy or a large firm (“slow takeoff”).

I currently think a slow takeoff is significantly more likely. This post explains some of my reasoning and why I think it matters. Mostly the post lists arguments I often hear for a fast takeoff and explains why I don’t find them compelling.

(Note: this is not a post about whether an intelligence explosion will occur. That seems very likely to me. Quantitatively I expect it to go along these lines. So e.g. while I disagree with many of the claims and assumptions in Intelligence Explosion Microeconomics, I don’t disagree with the central thesis or with most of the arguments.)

(See also: AI Impacts page on the same topic.)

Slow takeoff

Slower takeoff means faster progress

Fast takeoff is often justified by pointing to the incredible transformative potential of intelligence; by enumerating the many ways in which AI systems will outperform humans; by pointing to historical examples of rapid change; etc.

This gives the impression that people who expect a slow takeoff think AI will have a smaller impact, or will take longer to transform society.

But I think that’s backwards. The main disagreement is not about what will happen once we have a superintelligent AI, it’s about what will happen before we have a superintelligent AI. So slow takeoff seems to mean that AI has a larger impact on the world, sooner.


In the fast takeoff scenario, weaker AI systems may have significant impacts but they are nothing compared to the “real” AGI. Whoever builds AGI has a decisive strategic advantage. Growth accelerates from 3%/year to 3000%/year without stopping at 30%/year. And so on.

In the slow takeoff scenario, pre-AGI systems have a transformative impact that’s only slightly smaller than AGI. AGI appears in a world where everything already happens incomprehensibly quickly and everyone is incredibly powerful. Being 12 months ahead in AGI might get you a decisive strategic advantage, but the world has accelerated so much that that’s just about as hard as getting to airplanes 30 years before anyone else.

Operationalizing slow takeoff

There will be a complete 4 year interval in which world output doubles, before the first 1 year interval in which world output doubles. (Similarly, we’ll see an 8 year doubling before a 2 year doubling, etc.)
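This operationalization can be checked against any smooth-acceleration model. Here is a minimal sketch, assuming (arbitrarily) that the instantaneous growth rate starts at 3%/year and itself doubles every 5 years; under those assumptions a complete 4-year doubling finishes well before the first 1-year doubling begins.

```python
import math

# Toy model: instantaneous growth rate g(t) = G0 * 2**(t / T),
# i.e. the growth rate itself doubles every T years. Both G0 and T
# are illustrative assumptions, not forecasts.
G0 = 0.03  # 3%/year initial growth
T = 5.0    # growth rate doubles every 5 years

def log_output(t):
    # closed-form integral of g(s) ds from 0 to t
    return G0 * T / math.log(2) * (2 ** (t / T) - 1)

def first_doubling_start(window, dt=0.001, horizon=100.0):
    # earliest t such that output doubles over [t, t + window]
    t = 0.0
    while t < horizon:
        if log_output(t + window) - log_output(t) >= math.log(2):
            return t
        t += dt
    return None

start_4yr = first_doubling_start(4.0)
start_1yr = first_doubling_start(1.0)
print(start_4yr + 4 < start_1yr)  # True: 4-year doubling completes first
```

With sharper acceleration (in this model, the growth rate doubling in roughly a year or less) the inequality flips, which is one way to see that the disagreement is quantitative: how abruptly does growth accelerate?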

At some point there will be incredibly powerful AI systems. They will have many consequences, but one simple consequence is that world output will grow much more quickly. I think this is a good barometer for other transformative effects, including large military advantages.

I believe that before we have incredibly powerful AI, we will have AI which is merely very powerful. This won’t be enough to create 100% GDP growth, but it will be enough to lead to (say) 50% GDP growth. I think the likely gap between these events is years rather than months or decades.

In particular, this means that incredibly powerful AI will emerge in a world where crazy stuff is already happening (and probably everyone is already freaking out). If true, I think it’s an important fact about the strategic situation.

(Operationalizing takeoff speed in terms of economic doublings may seem weird, but I do think it gets at the disagreement: proponents of fast takeoff don’t seem to expect the 4 year doubling before takeoff, or at least their other beliefs about the future don’t seem to integrate that expectation.)

The basic argument

The prima facie argument for slow takeoff is pretty straightforward:

  • Before we have an incredibly intelligent AI, we will probably have a slightly worse AI.
    • Lots of people will be trying to build powerful AI.
    • For most X, it is easier to figure out how to do a slightly worse version of X than to figure out how to do X.
      • The worse version may be more expensive, slower, less reliable, less general… (Usually there is a tradeoff curve, and so you can pick which axes you want the worse version to be worse along.)
    • If many people are trying to do X, and a slightly worse version is easier and almost-as-good, someone will figure out how to do the worse version before anyone figures out how to do the better version.
    • This story seems consistent with the historical record. Things are usually preceded by worse versions, even in cases where there are weak reasons to expect a discontinuous jump.
      • The best counterexample is probably nuclear weapons. But in that case there were several very strong reasons for discontinuity: physics has an inherent gap between chemical and nuclear energy density, nuclear chain reactions require a large minimum scale, and the dynamics of war are very sensitive to energy density.
  • A slightly-worse-than-incredibly-intelligent AI would radically transform the world, leading to growth (almost) as fast and military capabilities (almost) as great as an incredibly intelligent AI.

This simple argument pushes towards slow takeoff. But there are several considerations that could push towards fast takeoff, which we need to weigh against the basic argument.

Obviously this is a quantitative question. In this post I’m not going to get into the numbers because the substance of the disagreement seems to be about qualitative models.

Reasons to expect fast takeoff

People have offered a variety of reasons to expect fast takeoff. I think that many of these arguments make sense, but I don’t think they support the kind of highly concentrated, discontinuous progress which fast takeoff proponents seem to typically have in mind.

I expect there are other arguments beyond these, or that I’ve misunderstood some of these, and look forward to people pointing out what I’m missing.

Humans vs. chimps

Summary of my response: chimps are nearly useless because they aren’t optimized to be useful, not because evolution was trying to make something useful and wasn’t able to succeed until it got to humans.

Chimpanzees have brains only ~3x smaller than humans, but are much worse at making technology (or doing science, or accumulating culture…). If evolution were selecting primarily or in large part for technological aptitude, then the difference between chimps and humans would suggest that tripling compute and doing a tiny bit of additional fine-tuning can radically expand power, undermining the continuous change story.

But chimp evolution is not primarily selecting for making and using technology, for doing science, or for facilitating cultural accumulation.  The task faced by a chimp is largely independent of the abilities that give humans such a huge fitness advantage. It’s not completely independent—the overlap is the only reason that evolution eventually produces humans—but it’s different enough that we should not be surprised if there are simple changes to chimps that would make them much better at designing technology or doing science or accumulating culture.

If we compare humans and chimps at the tasks chimps are optimized for, humans are clearly much better but the difference is not nearly as stark. Compare to the difference between chimps and gibbons, gibbons and lemurs, or lemurs and squirrels.

Relatedly, evolution changes what it is optimizing for over evolutionary time: as a creature and its environment change, the returns to different skills can change, and they can potentially change very quickly. So it seems easy for evolution to shift from “not caring about X” to “caring about X,” but nothing analogous will happen for AI projects. (In fact a similar thing often does happen while optimizing something with SGD, but it doesn’t happen at the level of the ML community as a whole.)

If we step back from skills and instead look at outcomes we could say: “Evolution is always optimizing for fitness, and humans have now taken over the world.” On this perspective, I’m making a claim about the limits of evolution. First, evolution is theoretically optimizing for fitness, but it isn’t able to look ahead and identify which skills will be most important for your children’s children’s children’s fitness. Second, human intelligence is incredibly good for the fitness of groups of humans, but evolution acts on individual humans for whom the effect size is much smaller (who barely benefit at all from passing knowledge on to the next generation). Evolution really is optimizing something quite different than “humanity dominates the world.”

So I don’t think the example of evolution tells us much about whether the continuous change story applies to intelligence. This case is potentially missing the key element that drives the continuous change story—optimization for performance. Evolution changes continuously on the narrow metric it is optimizing, but can change extremely rapidly on other metrics. For human technology, features of the technology that aren’t being optimized change rapidly all the time. When humans build AI, they will be optimizing for usefulness, and so progress in usefulness is much more likely to be continuous.

Put another way: the difference between chimps and humans stands in stark contrast to the normal pattern of human technological development. We might therefore infer that intelligence is very unlike other technologies. But the difference between evolution’s optimization and our optimization seems like a much more parsimonious explanation. To be a little bit more precise and Bayesian: the prior probability of the story I’ve told upper bounds the possible update about the nature of intelligence.

AGI will be a side-effect

Summary of my response: I expect people to see AGI coming and to invest heavily.

AI researchers might be optimizing for narrow forms of intelligence. If so we could have the same dynamic as with chimps—we see continuous progress on accomplishing narrow tasks in a narrow way, leading eventually to a jump in general capacities as a side-effect. These general capacities then also lead to much better progress on narrow tasks, but there is no reason for progress to be continuous because no one is optimizing for general intelligence.

I don’t buy this argument because I think that researchers probably will be optimizing aggressively for general intelligence, if it would help a lot on tasks they care about. If that’s right, this argument only implies a discontinuity if there is some other reason that the usefulness of general intelligence is discontinuous.

However, if researchers greatly underestimate the impact of general intelligence and so don’t optimize for it, I agree that a fast takeoff is plausible. It could turn out that “will researchers adequately account for the impact of general intelligence and so try to optimize it?” is a crux. My intuition is based on a combination of (weak) adequacy intuitions and current trends in ML research.

Finding the secret sauce

Summary of my response: this doesn’t seem common historically, and I don’t see why we’d expect AGI to be more rather than less like this (unless we accept one of the other arguments).

Another common view is that there are some number of key insights that are needed to build a generally intelligent system. When the final pieces fall into place we may then see a large jump; one day we have a system with enough raw horsepower to be very smart but critical limitations, and the next day it is able to use all of that horsepower.

I don’t know exactly how to respond to this view because I don’t feel like I understand it adequately.

I’m not aware of many historical examples of this phenomenon (and no really good examples)—to the extent that there have been “key insights” needed to make something important work, the first version of the insight has almost always either been discovered long before it was needed, or discovered in a preliminary and weak version which is then iteratively improved over a long time period.

To the extent that fast takeoff proponents’ views are informed by historical example, I would love to get some canonical examples that they think best exemplify this pattern so that we can have a more concrete discussion about those examples and what they suggest about AI.

Note that a really good example should be on a problem that many people care about. There are lots of examples where no one is thinking about X, someone uncovers an insight that helps a lot with X, and many years later that helps with another task Y that people do care about. That’s certainly interesting, but it’s not really surprising at all on the slow-change view unless it actually causes surprisingly fast progress on Y.

Looking forward to AGI, it seems to me like if anything we should have a somewhat smaller probability than usual that a final “key insight” will make a huge difference.

  • AGI was built by evolution, which is more likely if it can be built by iteratively improving simple ingredients.
  • It seems like we already have a set of insights that are sufficient for building an autopoietic AGI, so we won’t be starting from 0 in any case.
  • Historical AI applications have had a relatively small loading on key-insights and seem like the closest analogies to AGI.

The example of chimps or dumb humans seems like one of the best reasons to expect a key insight, but I’ve already discussed why I find that pretty unconvincing.

In this case I don’t yet feel like I understand where fast takeoff proponents are coming from, so I think it is especially likely that my view will change based on further discussion. But I would really like to see a clearer articulation of the fast takeoff view here as an early step of that process.

Universality thresholds

Summary of my response: it seems like early AI systems will cross universality thresholds pre-superintelligence, since (a) there are tradeoffs between universality and other desirable properties which would let people build universal AIs early if the returns to universality are large enough, (b) I think we can already build universal AIs at great expense.

Some cognitive processes get stuck or “run out of steam” if you run them indefinitely, while others are able to deliberate, improve themselves, design successor systems, and eventually reach arbitrarily high capability levels. An AI system may go from being weak to being very powerful as it crosses the threshold between these two regimes.

It’s clear that some humans are above this universality threshold, while chimps and young children are probably below it. And if you take a normal human and you inject a bunch of noise into their thought process (or degrade it) they will also fall below the threshold.

It’s easy to imagine a weak AI as some kind of handicapped human, with the handicap shrinking over time. Once the handicap goes to 0 we know that the AI will be above the universality threshold. Right now it’s below the universality threshold. So there must be sometime in between where it crosses the universality threshold, and that’s where the fast takeoff is predicted to occur.

But AI isn’t like a handicapped human. Instead, the designers of early AI systems will be trying to make them as useful as possible. So if universality is incredibly helpful, it will appear as early as possible in AI designs; designers will make tradeoffs to get universality at the expense of other desiderata (like cost or speed).

So now we’re almost back to the previous point: is there some secret sauce that gets you to universality, without which you can’t get universality however you try? I think this is unlikely for the reasons given in the previous section.

There is another reason I’m skeptical about hard takeoff from universality secret sauce: I think we already could make universal AIs if we tried (that would, given enough time, learn on their own and converge to arbitrarily high capability levels), and the reason we don’t is because it’s just not important to performance and the resulting systems would be really slow. This inside view argument is too complicated to make here and I don’t think my case rests on it, but it is relevant to understanding my view.

“Understanding” is discontinuous

Summary of my response: I don’t yet understand this argument and am unsure if there is anything here.

It may be that understanding of the world tends to click, from “not understanding much” to “understanding basically everything.”

You might expect this because everything is entangled with everything else. If you only understand 20% of the world, then basically every sentence on the internet is confusing, so you can’t make heads or tails of anything. This seems wrong to me for two reasons. First, information is really not that entangled even on the internet, and the (much larger) fraction of its knowledge that an AI generates for itself is going to be even less entangled. Second, it’s not right to model the AI as having a gradually expanding domain that it understands at all, with total incomprehension everywhere else. Unless there is some other argument for a discontinuity, then a generalist AI’s understanding of each domain will just continuously improve, and so taking the minimum across many domains doesn’t make things particularly discontinuous.

People might instead expect a click because that’s what they experience. That’s very unlike my experience, but maybe other people differ—it would be very interesting if this was a major part of where people were coming from. Or that may be how they perceive others’ thought processes as working. But when I look at others’ understanding, it seems like it is common to have a superficial or weak understanding which transitions gradually into a deep understanding.

Or they might expect a click because the same progress which lets you understand one area will let you understand many areas. But that doesn’t actually explain anything: you’d expect partial and mediocre understanding before a solid understanding.

Of course all the arguments in other sections (e.g. secret sauce, chimps vs. humans) can also be arguments about why understanding will be discontinuous. In the other sections I explain why I don’t find those arguments convincing.

Deployment lag

Summary of my response: current AI is slow to deploy and powerful AI will be fast to deploy, but in between there will be AI that takes an intermediate length of time to deploy.

When AI improves, it takes a while for the world to actually benefit from the improvement. For example, we need to adjust other processes to take advantage of the improvement and tailor the new AI system to the particular domains where it will be used. This seems to be an artifact of the inflexibility of current technology, and e.g. humans can adapt much more quickly to be useful in new settings.

Eventually, powerful AI will become useful in new situations even faster than people. So we may have a jump from narrow AI, that takes a long time to deploy, to general AI that is easily deployed.

I’ve heard this argument several times over the last few months, but don’t find the straightforward version convincing: without some other argument for discontinuity, I don’t see why “time to deploy” jumps from a large number to a small number. Instead, I’d expect deployment to become continuously easier as AI improves.

A slight variant that I think of as the “sonic boom” argument goes like this: suppose each month of AI research makes AI a little bit easier to deploy. Over time AI research gradually accelerates, and so the deployment time shrinks faster and faster. At some point, a month of AI research decreases deployment time by more than a month. At this point, “deploy AI the old-fashioned way” becomes an unappealing strategy: you will get to market faster by simply improving AI. So even if all of the dynamics are continuous, the quality of deployed AI would jump discontinuously.
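To illustrate the internal logic of the "sonic boom" argument, here is a toy numerical sketch. Everything in it is invented for illustration: the `quality` and `deploy_time` curves and all constants are arbitrary assumptions, not anything from the text. The best *deployed* system at month `t` is the best research snapshot whose deployment has finished by `t`.

```python
import math

def quality(t):
    # Research-frontier quality at month t (arbitrary: grows linearly).
    return t

def deploy_time(t):
    # Months needed to deploy month-t technology (arbitrary: shrinks
    # faster and faster as research accelerates).
    return 50.0 * math.exp(-t / 10.0)

def best_deployed(t, steps=1000):
    # Best quality among snapshots s whose deployment finishes by t,
    # i.e. s + deploy_time(s) <= t, found by a simple grid search.
    best = 0.0
    for i in range(steps + 1):
        s = t * i / steps
        if s + deploy_time(s) <= t:
            best = max(best, quality(s))
    return best

for month in range(0, 51, 5):
    print(f"month {month:2d}: best deployed quality = {best_deployed(month):5.1f}")
```

With these made-up curves, nothing finishes deploying before roughly month 26, and then the best deployed quality jumps from zero straight to the research frontier of several months earlier: the discontinuity the argument predicts. The objection in the next paragraph is that real developers would trade quality against deployment time, which smooths this jump out.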

This phenomenon only occurs if it is very hard to make tradeoffs between deployment time and other features like cost or quality. If there is any way to tradeoff other qualities against deployment time, then people will more quickly push worse AI products into practice, because the benefits of doing so are large. I strongly expect it to be possible to make tradeoffs, because there are so many obvious-seeming ways to trade off deployment time vs. usefulness (most “deployment time” is really just spending time improving the usefulness of a system) and I haven’t seen stories about why that would stop.

Recursive self-improvement

Summary of my response: Before there is AI that is great at self-improvement there will be AI that is mediocre at self-improvement.

Powerful AI can be used to develop better AI (amongst other things). This will lead to runaway growth.

This on its own is not an argument for discontinuity: before we have AI that radically accelerates AI development, the slow takeoff argument suggests we will have AI that significantly accelerates AI development (and before that, slightly accelerates development). That is, an AI is just another, faster step in the hyperbolic growth we are currently experiencing, which corresponds to a further increase in rate but not a discontinuity (or even a discontinuity in rate).

The most common argument for recursive self-improvement introducing a new discontinuity seems to be: some systems "fizzle out" when they try to design a better AI, generating a few improvements before running out of steam, while others are able to autonomously generate more and more improvements. This is basically the same as the universality argument in a previous section.

Train vs. test

Summary of my response: before you can train a really powerful AI, someone else can train a slightly worse AI.

Over the course of training, ML systems typically go quite quickly from “really lame” to “really awesome”—over the timescale of days, not months or years.

But the training curve seems almost irrelevant to takeoff speeds. The question is: how much better is your AGI than the AGI that you were able to train 6 months ago?

If you are able to raise $X to train an AGI that could take over the world, then it was almost certainly worth it for someone 6 months ago to raise $X/2 to train an AGI that could merely radically transform the world, since they would then get 6 months of absurd profits. Likewise, if your AGI would give you a decisive strategic advantage, they could have spent less earlier in order to get a pretty large military advantage, which they could then use to take your stuff.

In order to actually get a discontinuity, it needs to be the case that either scaling up the training effort slightly, or waiting a little while longer for better AI technology, leads to a discontinuity in usefulness. So we’re back to the other arguments.

Discontinuities at 100% automation

Summary of my response: at the point where humans are completely removed from a process, they will have been modestly improving output rather than acting as a sharp bottleneck that is suddenly removed.

Consider a simple model in which machines are able to do a fraction p of the subtasks of some large task (like AGI design) with constantly increasing efficiency, while humans are needed to perform the remaining (1-p) fraction. If humans are the dominant cost, and we hold the number of humans fixed as p increases, then total output grows like 1 / (1-p). As (1-p) approaches 0, productivity rapidly rises to the machine-only level. In the past I found this argument pretty compelling.
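As a minimal sketch of this toy model (the cap standing in for machine-only productivity is an arbitrary assumption, not a number from the text):

```python
# Toy model from the text: machines do a fraction p of subtasks, a fixed
# pool of humans does the remaining (1 - p). With humans as the dominant
# cost, output relative to the all-human baseline grows like 1 / (1 - p),
# up to some assumed machine-only productivity level.

MACHINE_ONLY = 1000.0  # arbitrary stand-in for machine-only productivity

def relative_output(p):
    """Output relative to the p = 0 baseline, capped at the machine-only level."""
    if p >= 1.0:
        return MACHINE_ONLY
    return min(1.0 / (1.0 - p), MACHINE_ONLY)

for p in [0.0, 0.5, 0.9, 0.99, 0.999, 1.0]:
    print(f"p = {p:5.3f}  ->  {relative_output(p):8.1f}x")
```

On this naive model the only discontinuity is at p = 1, where productivity jumps to the machine-only level; before that, output has already climbed smoothly through 2x, 10x, 100x, which is the point the surrounding paragraphs press on.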

Suppose that we removed the humans altogether from this process. On the naive model, productivity would jump from 0 (since machines can’t do the task) to some very large value. I find that pretty unlikely, and it’s precisely what we’ve discussed in the previous sections. It seems much more likely that at the first point when machines are able to do a task on their own, they are able to do it extremely poorly—and growth thereafter seems like it ought to accelerate gradually.

Adding humans to the picture only seems to make the change more gradual: at early times humans accelerate progress a lot, and as time goes on they provide less and less advantage (as machines replace them), so totally replacing humans seems to reduce acceleration.

Ultimately it seems like this comes down to whether you already expect discontinuous progress based on one of the other arguments, especially the secret sauce or universality threshold arguments. Phasing out humans seems to decrease, rather than increase, the abruptness of those changes.

This argument is still an important one, and it is true that if one of the other arguments generates a discontinuity then that discontinuity will probably be around the same time as 100% automation. But this argument is mostly relevant as a response to certain counterarguments about complementarity that I didn’t actually make in any of the other sections.

The weight of evidence

We’ve discussed a lot of possible arguments for fast takeoff. Superficially it would be reasonable to believe that no individual argument makes fast takeoff look likely, but that in the aggregate they are convincing.

However, I think each of these factors is perfectly consistent with the continuous change story and continuously accelerating hyperbolic growth, and so none of them undermine that hypothesis at all. This is not a case of a bunch of weak signs of fast takeoff providing independent evidence, or of a bunch of weak factors that can mechanically combine to create a large effect.

(The chimps vs. humans case is an exception—it does provide Bayesian evidence for fast takeoff that could be combined with other factors. But it’s just one.)

I could easily be wrong about any one of these lines of argument. So I do assign a much higher probability to fast takeoff than I would if there were fewer arguments (I put around 30% on fast takeoff). But if I change my mind, it will probably be because one of these arguments (or another argument not considered here) turns out to be compelling on its own. My impression is that other people in the safety community have more like a 70% or even 90% chance of fast takeoff, which I assume is because they already find some of these arguments compelling.

Why does this matter?

Sometimes people suggest that we should focus on fast takeoff even if it is less likely. While I agree that slow takeoff improves our probability of survival overall, I don’t think either: (a) slow takeoff is so safe that it’s not important to think about, or (b) plans designed to cope with fast takeoff will also be fine if there is a slow takeoff.

Neither takeoff speed seems unambiguously easier-to-survive than the other:

  • If takeoff is slow: it will become quite obvious that AI is going to transform the world well before we kill ourselves, we will have some time to experiment with different approaches to safety, policy-makers will have time to understand and respond to AI, etc. But this process will take place over only a few years, and the world will be changing very quickly, so we could easily drop the ball unless we prepare in advance.
  • If takeoff is fast: whoever develops AGI first has a massive advantage over the rest of the world and hence great freedom in choosing what to do with their invention. If we imagine AGI being built in a world like today, it’s easy to imagine pivotal actions that are easier than the open-ended alignment problem. But in slow takeoff scenarios, other actors will already have nearly-as-good-AGI, and a group that tries to use AGI in a very restricted or handicapped way won’t be able to take any pivotal action. So we either need to coordinate to avoid deploying hard-to-control AGI, or we need to solve a hard version of AI alignment (e.g. with very good security / competitiveness / scalability).

These differences affect our priorities:

  • If takeoff is more likely to be slow:
    • We should have policy proposals and institutions in place which can take advantage of the ramp-up period, because coordination is more necessary and more feasible.
    • We can afford to iterate on alignment approaches, but we need to solve a relatively hard version of the alignment problem.
  • If takeoff is more likely to be fast:
    • We shouldn’t expect state involvement or large-scale coordination.
    • We’ll have less time at the last minute to iterate on alignment, but it might be OK if our solutions aren’t competitive or have limited scalability (they only have to scale far enough to take a pivotal action).

Beyond the immediate strategic implications, I often feel like I have a totally different world in mind than other people in the AI safety community. Given that my career is aimed at influencing the future of AI, significantly changing my beliefs about that future seems like a big win.

Sat, 24 Feb 2018 13:51:57 EST

Follow-up to: Fake Frameworks, Kenshō

Related to: Slack, Newcomblike Problems are the Norm

I’d like to offer a fake framework here. It’s a little silly, and not fully justified, but it keeps producing meaningful results in my life when I use it. Some of my own personal examples are:

  • Overcoming a crushing depression
  • Learning how to set aside my “performance mode” and be more authentic and vulnerable when I want to be
  • Shifting my attachment style from anxious-preoccupied to mostly secure
  • Fixing a lifelong problem where I love athletics but I kept badly damaging my body almost any time I tried anything athletic
  • Setting myself up to experience kenshō

I recognize that this doesn’t address Oli’s hesitation stemming from his sense of a lemons problem. I’m afraid I don’t know how to, at least not yet. So in the meantime, please take these as my reports of my experience and how it seems to me they came about, rather than as an attempt to persuade.

Though I do hope y’all will benefit from this. If nothing else, I think this framework is fun. I’ve sometimes described one use of it as “overcoming personally meaningful challenges by living an epic life.” It’s an awesome framework to play with — especially when others join in while remembering that the framework is fake.

Here I’ll outline a general theory I’ll want to call on in some upcoming posts. Tomorrow I’ll separately post an application. (Originally they were written as one post, but I got some feedback on an earlier draft that convinced me they should be separate.) At that point we should have some meaningful tools for wrestling with self-deception.

So with that, let’s get started.

When you walk into an improv scene, you usually have no idea what role you’re playing. All you have is some initial prompt — something like:

“You three are in a garden. The scene has to involve a stuffed bear somehow. Go!”

So now you’re looking to the other people there. Then someone jumps forward and adds to the scene: “Oh, there it is! I’m glad we finally found it!” Now you know a little bit about your character, and about the character of the person who spoke, but not enough to fully define anyone’s role.

You can then expand the scene by adding something: “It’s about time! We’re almost late now.” Now you’ve specified more about what’s going on, who you are, and who the other players are. But it’s still the case that none of you knows what’s going on.

In fact, if you think you know, you’ll often quickly be proven wrong. Maybe you imagine in that scene you’re an uptight punctual person. And then the third person in the scene says to you, “What do you care, Alex? You’re always late to everything anyway!” Surprise! Now you need to flush who you thought you were from your mind, accept the new frame, and run with it as part of your newly evolving identity. Otherwise the scene sort of crashes.

It would go more smoothly if you didn't hold any preconceptions about who you are or what's going on. The scene tends to work better if you stay in the present moment and just jump in with the first thing that comes to mind (as long as it's shaped by what has happened so far). Then the collection of interactions and emerging roles spontaneously guides your behavior, which in turn helps guide others' behavior, all of which recursively defines the "who" and "what" of the scene. Your job as a player isn't to play a character; it's to co-create a scene.

We can sort of pretend that there’s a “director”: it’s the intelligence that emerges between the players via their interactions. It’s a distributed system that computes relationships and context by guiding each node in its network to act freely within constraints. From this vantage point, the network guides players, and the job of each player is to be guidable but not purely passive (since a passive node is just relaying information rather than aiding in the computation). As long as everyone involved is plugged into and responsive to this network, the scene will usually play out well.

I suspect that improv works because we’re doing something a lot like it pretty much all the time. The web of social relationships we’re embedded in helps define our roles as it forms and includes us. And that same web, as the distributed “director” of the “scene”, guides us in what we do.

A lot of (but not all) people get a strong hit of this when they go back to visit their family. If you move away and then make new friends and sort of become a new person (!), you might at first think this is just who you are now. But then you visit your parents… and suddenly you feel and act a lot like you did before you moved away. You might even try to hold onto this “new you” with them… and they might respond to what they see as strange behavior by trying to nudge you into acting “normal”: ignoring surprising things you say, changing the topic to something familiar, starting an old fight, etc.

In most cases, I don’t think this is malice. It’s just that they need the scene to work. They don’t know how to interact with this “new you”, so they tug on their connection with you to pull you back into a role they recognize. If that fails, then they have to redefine who they are in relation to you — which often (but not always) happens eventually.

I’m basically taking as an axiom of this framework that people need the “scene” to work — which is to say, they need to be able to play out their roles in relation to others’ roles within a coherent context. I don’t think why this is the case is relevant for using this framework… but I’ll wave my hands at a vague just-so story anyway for the sake of pumping intuition: human beings’ main survival strategy seems to be based on coordinating in often complex ways in tribes. For the individual, this means that fitting in becomes paramount. For the group, this means knowing what to expect from each person is critical. So a trade becomes possible: the individual can fit into and benefit from the group as long as they’re playing a role that fits well with the collective.

This can result in some pretty strange roles. From this vantage point, a person who repeatedly leaves one abusive relationship only to get into another roughly similar one actually makes a lot of sense: this is a role that this person knows how to play. It’s horrible, but it’s still better than not fitting into the social scene. It creates a coherent relationship with someone who’s willing to (or has to) play an “abuser” role, and often with people in “rescuer” roles too. The trap they’re in isn’t (just) that their current abusive partner is gaslighting or threatening them; it’s that they don’t have another role they can see how to play. Unless and until that person finds a different one that fits into the social web, the strands of that web will tug them back into their old role. They don’t have enough slack in the web around them to change their fate.

The same kind of web/slack dynamics show up in more pleasant-to-play roles too. The privilege of a middle-class American white man by default has him playing out some kind of roughly known story-like path (probably involving college and having kids and maybe a divorce) that, in the end, will probably still leave him being one of the richest people on Earth. And all the while, he might well have no clue that he has other options or even that he’s on a path — but he’ll still know, somehow, not to step off that path (“I have to go to college; are you crazy?”). Never mind that his lack of slack here is awfully convenient for him.

I’ve watched religious conversions and deconversions happen via basically the same mechanism. I knew a fellow many years ago (unattached to this community) who was a proud atheist. Then he started dating a Christian girl. Something like a month later, he started quoting the Bible — but “only because they’re handy metaphors” and not because he really believed any of that stuff, you see. It later turned out he’d been going to church with her. He kept offering reasons that seemed vaguely plausible (“It’s a neat group of people, and it matters to her, and I can take the time to read”), but there’s a pattern here that was obvious. A few months later he told me he’d converted. Last I heard they had moved to Utah.

The great part is, I knew this was going to happen when they started dating. Why? Because when I warned him that he might find himself wanting to believe her religion once they started having sex, his reaction was to reassure me by acting confident that he was immune to this. That meant he was more focused on managing my perception of him than he was in noticing how the social web was tugging him toward a transition of roles. I didn’t know if they’d stay together, but I was pretty sure that if they did, he’d convert.

I could give literally hundreds of examples like this. From where I’m standing, it looks like one of the great challenges of rationality is that people change their minds about meaningful things mostly only when the web tugs them into a new role. Actually thinking in a way that for real changes your mind in ways that defy your web-given role is socially deviant, and therefore personally dangerous, and therefore something you’re motivated not to learn how to do.

Ah, but if we’re immersed in a culture where status and belonging are tied to changing our minds, and we can signal that we’re open to updating our beliefs, then we’re good… as long as we know Goodhart’s Demon isn’t lurking in the shadows of our minds here. But surely it’s okay, right? After all, we’re smart and we know Bayesian math, and we care about truth! What could possibly go wrong?

Another challenge here is that the part of us that feels like it’s thinking and talking is (usually) analogous to a character in an improv scene. The players know they’re in a scene, but the characters they’re playing don’t. The characters also aren’t surprised about who or what they are: the not-knowing of identity and context is something only the players experience, to open themselves up to the guidance of the distributed “director”. This means that (a) the characters are actively wrong about why they do what they do, and (b) they are also deeply confused about how much sense everything makes and don’t know they’re confused.

I claim that most of us, most of the time, are playing out characters as defined by the surrounding web — and we usually haven’t a clue how to Look at this fact, much less intentionally use our web slack to change our stories.

I think this is also part of why improv is challenging: you have to set aside the character you would normally play in order to create room for something new.

There’s a way in which the social web holds the position of Omega in an ongoing set of Newcomblike problems. The web as a whole wants to know what kind of role you’re playing, and how well you’re going to play it, so that it can know what to expect of you. So, a lot of its distributed resources go into computing a model of you.

One of the more obvious transmission methods is chat — idle gossip, storytelling, speculation, small talk. People sync up their impressions of someone they’ve met, and try to make sense of surprising events in conversation. If a lover brings their partner some flowers and the recipient freaks out and runs off, suddenly there’s a need to understand, and the flower-giver might try asking a mutual friend for some help understanding. And even if they do come to understand (“Oh, that’s because their last partner brought them flowers to break up with them”), there’s often an impulse to share the story with friends, so that the web as a whole can hold everyone in sensible roles and make the scene work. (“Oh, we had a funny misunderstanding earlier, poor Sam….”)

A lot of this is transmitted more subtly too, in body language and facial expressions and vocal tone and so on. If Bob is “creepy” (i.e., is playing a “creepy” role in the web), then it speaks volumes if everyone who meets Bob then cringes just a tiny bit when he’s later mentioned even if they say only good things about him. This means that someone who has never met Bob can get a “vibe” about him from multiple people in a way that shapes how they interpret what Bob says and does when they finally do meet him.

Sometimes, some people with enough web-savvy weaponize this. It doesn’t mean anything for someone to “be creepy” except that they have a web-like impact on others — which is to say, they have a “creepy” role. In a healthy network, this correlates with something actually meaningfully bad that’s worth tracking. But because perceived roles shape what people expect of a person, it’s enough for a rumor to echo through the web in order for someone to be interpreted as “creepy”. So a sufficiently cunning person could actually cause someone to be slowly isolated and distrusted without there being any facts at all to justify this as Omega’s stance.

(And yes, I’ve seen this happen. Many times.)

The same kind of thing can happen with “positive” labels, too. What it means for someone to be fit for a leadership role, in Omega’s eyes, is that they are seen as compatible with that role. So if someone is tall, attractive, and either vicious or strong depending on how you choose to see it, it might be enough to have the “strong” interpretation echo more powerfully than the “vicious” one in order for the web to conspire to put them in a leadership position.

…which means that even people who are seen as good leaders might not, in fact, be good leaders in the sense of making good leadership decisions. But they are by definition good leaders in the sense of playing the role well. After all, if the general consensus is that Abraham Lincoln was a great President, then there’s a sense in which that makes it true, since that’s what “great” means here. The “explanations” thereafter are often stories to justify one’s holding of a popular opinion.

The same thing holds for when someone seems “rational”. This is one reason to worry deeply when members of subgroups internally agree with each other on who is a top-notch clear thinker or “really a rationalist” but disagree with people in other subgroups. This looks less to me like people seeking truth, and a lot more like groups engaging in a subtle memetic battle over what “rational” gets to mean.

From where I’m standing, it looks to me like we’re all immersed in not-knowing, while our “characters” keep talking as though they know what’s going on, implicitly following some hidden-to-them script.

The web encodes a lot of its guidance about what to expect and how to behave via the structure of stories. Or rather, story structures are what expectations about roles and scenes are.

The trouble is, a lot of the stories we talk about have the structure of what our characters are supposed to say rather than of what actually happens. Imagine a movie where the new kid at a school gets bullied by the popular kids and then makes friends with quirky outcasts. What happens to the bullies in the end? In real life, bullies often don’t get their comeuppance — but having this fictional story in our hearts lets us play out the indignant versions of our characters in the real-life version. Because the bullies aren’t supposed to get away with it, right? That wouldn’t be fair!

Some parts of our story-like intuitions are scripts. Some are things our scripts say we should think or talk about. Some are merely incidental details. Sussing out which parts are which is part of the trick of getting this framework to work for you. For instance, the stereotypical story of the worried nagging wife confronting the emotionally distant husband as he comes home really late from work… is actually a pretty good caricature of a script that lots of couples play out, as long as you know to ignore the gender and class assumptions embedded in it.

But it’s hard to sort this out without just enacting our scripts. The version of you that would be thinking about it is your character, which (in this framework) can accurately understand its own role only if it has enough slack to become genre-savvy within the web; otherwise it just keeps playing out its role. In the husband/wife script mentioned above, there’s a tendency for the “wife” to get excited when “she” learns about the relationship script, because it looks to “her” like it suggests how to save the relationship — which is “her” enacting “her” role. This often aggravates the fears of the “husband”, causing “him” to pull away and act dismissive of the script’s relevance (which is “his” role), driving “her” to insist that they just need to talk about this… which is the same pattern they were in before. They try to become genre-savvy, but there (usually) just isn’t enough slack between them, so the effort merely changes the topic while they play out their usual scene.

If you know how to Look at lived stories, then I think a way out can start becoming a lot more obvious. Unfortunately, I don’t think I can describe it very well for a general audience, because how anyone receives what I say is itself subject to the influence of lived stories. But if you can: try Looking in the present moment at your own sense of not-knowing, notice that the same thing is alive in others, and watch as the story arises and plays out across all of you.

Tomorrow I’ll share a weaker but easier-to-use partial solution that I think doesn’t require Looking.

This was long, so I’ll try to summarize:

  • You can choose to see social groups at all scales as running a distributed computation across the social web. If you view that process as generating an intelligent agent, you can think of this web-agent as the real-world Omega as it tries to predict and guide each person’s behavior.
  • Omega offers each person a trade: prioritize making the scene work, and you’ll be included in it. In fact, Omega is the aggregate efforts of all the people who have accepted that trade. And basically everyone we know about accepts this trade.
  • Everything about yourself that you have conscious access to is subject to your role as part of Omega. If you try to defy this, then your fate will play out through your defiance.
  • Room for interpretation in your role in the scene means your script has room to change. This is slack in the social web.
  • There’s a way of directly seeing how to change your fate by Looking. That’s not helpful unless and until you learn how to Look, though.

I’ll close this post by noting that there’s a meta-level to track here. In the story The Emperor’s New Clothes, the child’s utterance wasn’t enough on its own to pop the illusion:

"But the Emperor has nothing at all on!" said a little child.
"Listen to the voice of innocence!" exclaimed his father; and what the child had said was whispered from one to another.
"But he has nothing at all on!" at last cried out all the people. The Emperor was vexed, for he knew that the people were right; but he thought the procession must go on now! And the lords of the bedchamber took greater pains than ever, to appear holding up a train, although, in reality, there was no train to hold.

What if the father had instead responded “No, child, you’re just too foolish to see his fine garments”? He might have, out of fear of what those who were standing nearby might think of him and his kid. Then the child’s simple voice of reason would not be heard.

Or what if the people near the father/child pair had felt too uneasy to pass along what the child had said?

What if the Emperor could have instilled this kind of nervousness in his people ahead of time? He might have thought that there would be innocent children in the parade, and it might have occurred to some part of him that they had best not be taken seriously — to spare others their embarrassment, of course. Then, oh then what strange propaganda they all would see.

Some of the scripts Omega assigns work less well if they’re known. Because of this, Omega will often move to silence people who threaten to speak those fragile truths. This can show up, for instance, as people trying to dismiss and discredit the person saying the idea rather than just the idea. The arguments usually sound sensible on the surface, but the underlying tone ringing through the strands of the web is “Don’t listen to this one.”

If it’s not clear why I’m mentioning this, then I imagine it’ll become really obvious quite soon.


Mon, 19 Feb 2018 15:41:03 EST

A New Framework

(Thanks to Valentine for a discussion leading to this post, and thanks to CFAR for running the CFAR-MIRI cross-fertilization workshop. Val provided feedback on a version of this post. Warning: fairly long.)

Eliezer's A Technical Explanation of Technical Explanation, and moreover the sequences as a whole, used the best technical understanding of practical epistemology available at the time* -- the Bayesian account -- to address the question of how humans can try to arrive at better beliefs in practice. The sequences also pointed out several holes in this understanding, mainly having to do with logical uncertainty and reflective consistency.

MIRI's research program has since then made major progress on logical uncertainty. The new understanding of epistemology -- the theory of logical induction -- generalizes the Bayesian account by eliminating the assumption of logical omniscience. Bayesian belief updates are recovered as a special case, but the dynamics of belief change are non-Bayesian in general. While it might not turn out to be the last word on the problem of logical uncertainty, it has a large number of desirable properties, and solves many problems in a unified and relatively clean framework.

It seems worth asking what consequences this theory has for practical rationality. Can we say new things about what good reasoning looks like in humans, and how to avoid pitfalls of reasoning?

First, I'll give a shallow overview of logical induction and possible implications for practical epistemic rationality. Then, I'll focus on the particular question of A Technical Explanation of Technical Explanation (which I'll abbreviate TEOTE from now on). Put in CFAR terminology, I'm seeking a gears-level understanding of gears-level understanding. I focus on the intuitions, with only a minimal account of how logical induction helps make that picture work.

Logical Induction

There are a number of difficulties in applying Bayesian uncertainty to logic. No computable probability distribution can give non-zero measure to the logical tautologies, since you can't bound the amount of time you need to think to check whether something is a tautology, so updating on provable sentences always means updating on a set of measure zero. This leads to convergence problems, although there's been recent progress on that front.

Put another way: Logical consequence is deterministic, but due to Gödel's first incompleteness theorem, it is like a stochastic variable in that there is no computable procedure which correctly decides whether something is a logical consequence. This means that any computable probability distribution has infinite Bayes loss on the question of logical consequence. Yet, because the question is actually deterministic, we know how to point in the direction of better distributions by doing more and more consistency checking. This puts us in a puzzling situation where we want to improve the Bayesian probability distribution by doing a kind of non-Bayesian update. This was the two-update problem.
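The "infinite Bayes loss" point can be made concrete with log loss (the numbers below are my own illustration, not from the logical induction paper): hedged probabilities cost a bounded amount per question, but a computable distribution must put probability zero on some true logical statements, and a single such miss is an unbounded penalty.

```python
import math

def log_loss(p_assigned_to_truth):
    """Penalty for having assigned probability p to the outcome that occurred."""
    if p_assigned_to_truth == 0.0:
        # Assigning zero to something true is an unbounded loss.
        return math.inf
    return -math.log(p_assigned_to_truth)

# Hedging pays a finite, bounded cost per question...
hedged_cost = log_loss(0.5)
# ...but ruling out a true statement entirely costs infinitely much:
ruled_out_cost = log_loss(0.0)
assert math.isinf(ruled_out_cost)
```

Since no computable distribution can avoid assigning zero to some true logical facts, every such distribution eats this infinite penalty somewhere.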

You can think of logical induction as supporting a set of hypotheses which are about ways to shift beliefs as you think longer, rather than fixed probability distributions which can only shift in response to evidence.

This introduces a new problem: how can you score a hypothesis if it keeps shifting around its beliefs? As TEOTE emphasises, Bayesians outlaw this kind of belief shift for a reason: requiring predictions to be made in advance eliminates hindsight bias. (More on this later.) So long as you understand exactly what a hypothesis predicts and what it does not predict, you can evaluate its Bayes score and its prior complexity penalty and rank it objectively. How do you do this if you don't know all the consequences of a belief, and the belief itself makes shifting claims about what those consequences are?

The logical-induction solution is: set up a prediction market. A hypothesis only gets credit for contributing to collective knowledge by moving the market in the right direction early. If the market's odds on prime numbers are currently worse than those which the prime number theorem can provide, a hypothesis can make money by making bets in that direction. If the market has already converged to those beliefs, though, a hypothesis can't make any more money by expressing such beliefs -- so it doesn't get any credit for doing so. If the market has moved on to even more accurate rules of thumb, a trader would only lose money by moving beliefs back in the direction of the prime number theorem.
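A drastically simplified sketch of this credit-assignment scheme (the function and all numbers are invented for illustration; the real logical induction construction is far more involved): a trader buys or sells at the current market price and is settled at the true value, so profit is only possible to the extent the market was still wrong when the trade was made.

```python
def trade_profit(market_price, trader_estimate, true_value, stake=1.0):
    """Toy model: buy one unit if you think the market is too low,
    sell one unit if you think it is too high; settle at the true value."""
    direction = 1 if trader_estimate > market_price else -1
    return direction * (true_value - market_price) * stake

# The market underprices a claim that turns out true (settled at 1.0):
early = trade_profit(market_price=0.3, trader_estimate=0.9, true_value=1.0)
# Once the market has converged, the same belief earns nothing:
late = trade_profit(market_price=1.0, trader_estimate=0.9, true_value=1.0)
# Pushing back toward an old rule of thumb after the market moved past it loses:
overshoot = trade_profit(market_price=0.95, trader_estimate=0.9, true_value=1.0)
```

In this toy model `early` is positive, `late` is zero, and `overshoot` is negative -- mirroring the three prime-number-theorem cases above.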

Mathematical Understanding

This provides a framework in which we can make sense of mathematical labor. For example, a common occurrence in combinatorics is that there is a sequence which we can calculate, such as the Catalan numbers, by directly counting the number of objects of some specific type. This sequence is boggled at like data in a scientific experiment. Different patterns in the sequence are observed, and hypotheses for the continuation of these patterns are proposed and tested. Often, a significant goal is the construction of a closed form expression for the sequence.

This looks just like Bayesian empiricism, except for the fact that we already have a hypothesis which entirely explains the observations. The sequence is constructed from a definition which mathematicians made up, and which thus assigns 100% probability to the observed data. What's going on? It is possible to partially explain this kind of thing in a Bayesian framework by acting as if the true formula were unknown and we were trying to guess where the sequence came from, but this doesn't explain everything, such as why finding a closed form expression would be important.

Logical induction explains this by pointing out how different time-scales are involved. Even if all elements of the sequence are calculable, a new hypothesis can get credit for calculating them faster than the brute-force method. Anything which allows one to produce correct answers faster contributes to the efficiency of the prediction market inside the logical inductor, and thus, to the overall mathematical understanding of a subject. This cleans up the issue nicely.
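As a concrete instance of "faster than brute force": the Catalan numbers can be computed by direct counting via their standard recurrence, or, once the closed form is discovered, in a single step. A quick sketch (`math.comb` requires Python 3.8+):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def catalan_recurrence(n):
    """Brute-force counting via the recurrence C_n = sum_{i} C_i * C_{n-1-i}."""
    if n == 0:
        return 1
    return sum(catalan_recurrence(i) * catalan_recurrence(n - 1 - i)
               for i in range(n))

def catalan_closed_form(n):
    """The closed form C_n = (2n choose n) / (n + 1): same answers, far less work."""
    return comb(2 * n, n) // (n + 1)

# The closed form "predicts" exactly what the definition already entailed:
assert all(catalan_recurrence(n) == catalan_closed_form(n) for n in range(15))
```

The closed form assigns no extra probability to the data -- the definition already did that -- but it produces the answers much faster, and that speed-up is what earns it credit in the market.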

What other epistemic phenomena can we now understand better?

Lessons for Aspiring Rationalists

Many of these could benefit from a whole post of their own, but here's some fast-and-loose corrections to Bayesian epistemology which may be useful:

  • Hypotheses need not make predictions about everything. Because hypotheses are about how to adjust your odds as you think longer, they can leave most sentences alone and focus on a narrow domain of expertise. Everyone was already doing this in practice, but the math of Bayesian probability theory requires each hypothesis to make a prediction about every observation, if you actually look at it. Allowing a hypothesis to remain silent on some issues in standard Bayesianism can cause problems: if you're not careful, a hypothesis can avoid falsification by remaining silent, so you end up incentivising hypotheses to remain mostly silent (and you fail to learn as a result). Prediction markets are one way to solve this problem.
  • Hypotheses buy and sell at the current price, so they take a hit for leaving a now-unpopular position which they initially supported (but less of a hit than if they'd stuck with it) or coming in late to a position of growing popularity. Other stock-market type dynamics can occur.
  • Hypotheses can be like object-level beliefs or meta-level beliefs: you can have a hypothesis about how you're overconfident, which gets credit for smoothing your probabilities (if this improves things on average). This allows you to take into account beliefs about your calibration without getting too confused about Hofstadter's-law type paradoxes.
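The calibration bullet can be illustrated numerically (the frequencies below are toy numbers of my own): a meta-hypothesis that shrinks an overconfident predictor's probabilities toward 1/2 lowers expected log loss, and so would earn credit in the market.

```python
import math

def expected_log_loss(p_true, q_predicted):
    """Expected penalty for predicting q when the event really has probability p."""
    return -(p_true * math.log(q_predicted)
             + (1 - p_true) * math.log(1 - q_predicted))

p = 0.7                 # actual frequency of the event
overconfident = 0.95    # an overconfident object-level prediction
smoothed = 0.5 + 0.5 * (overconfident - 0.5)   # shrink halfway toward 1/2

# The smoothing meta-hypothesis improves things on average:
assert expected_log_loss(p, smoothed) < expected_log_loss(p, overconfident)
```

The smoothing hypothesis never looks at the object-level subject matter at all; it profits purely by correcting a systematic bias in the other traders.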

You may want to be a bit careful and Chesterton-fence existing Bayescraft, though, because some things are still better about the Bayesian setting. I mentioned earlier that Bayesians don't have to worry so much about hindsight bias. This is closely related to the problem of old evidence.

Old Evidence

Suppose a new scientific hypothesis, such as general relativity, explains a well-known observation such as the perihelion precession of Mercury better than any existing theory. Intuitively, this is a point in favor of the new theory. However, the probability for the well-known observation was already at 100%. How can a previously-known statement provide new support for the hypothesis, as if we are re-updating on evidence we've already updated on long ago? This is known as the problem of old evidence, and is usually levelled as a charge against Bayesian epistemology. However, in some sense, the situation is worse for logical induction.

A Bayesian who endorses Solomonoff induction can tell the following story: Solomonoff induction is the right theory of epistemology, but we can only approximate it, because it is uncomputable. We approximate it by searching for hypotheses, and computing their posterior probability retroactively when we find new ones. It only makes sense that when we find a new hypothesis, we calculate its posterior probability by multiplying its prior probability (based on its description length) by the probability it assigns to all evidence so far. That's Bayes' Law! The fact that we already knew the evidence is not relevant, since our approximation didn't previously include this hypothesis.
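The Solomonoff advocate's calculation looks schematically like this (the description lengths and likelihoods are made up for illustration): a newly discovered hypothesis is scored by its complexity prior times the probability it assigns to all evidence so far, old or new.

```python
import math

def retroactive_posterior_score(description_length_bits, likelihoods_of_past_evidence):
    """Unnormalized log-posterior for a newly discovered hypothesis:
    a 2^-length complexity prior, times the probability the hypothesis
    assigns to every piece of evidence seen so far -- plain Bayes' law,
    applied late."""
    log_prior = -description_length_bits * math.log(2)
    log_likelihood = sum(math.log(p) for p in likelihoods_of_past_evidence)
    return log_prior + log_likelihood

# A simple new theory that fits the old observations well...
new_theory = retroactive_posterior_score(100, [0.9] * 50)
# ...beats a more complex incumbent that fit them poorly:
old_theory = retroactive_posterior_score(120, [0.5] * 50)
assert new_theory > old_theory
```

On this view the age of the evidence is irrelevant: the approximation simply hadn't considered the hypothesis before. It is exactly this move that the logical-induction market disallows.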

Logical induction speaks against this way of thinking. The hypothetical Solomonoff induction advocate is assuming one way of approximating Bayesian reasoning via finite computing power. Logical induction can be thought of as a different (more rigorous) story about how to approximate intractable mathematical structures. In this new way, propositions are bought or sold at market prices at the time. If a new hypothesis is discovered, it can't be given any credit for 'predicting' old information. The price of known evidence is already at maximum -- you can't gain any money by investing in it.

There are good reasons to ignore old evidence, especially if the old evidence has biased your search for new hypotheses. Nonetheless, it doesn't seem right to totally rule out this sort of update.

I'm still a bit puzzled by this, but I think the situation is improved by understanding gears-level reasoning. So, let's move on to the discussion of TEOTE.

Gears of Gears

As Valentine noted in his article, it is somewhat frustrating how the overall idea of gears-level understanding seems so clear while remaining only heuristic in definition. It's a sign of a ripe philosophical puzzle. If you don't feel you have a good intuitive grasp of what I mean by "gears level understanding", I suggest reading his post.

Valentine gives three tests which point in the direction of the right concept:

  1. Does the model pay rent? If it does, and if it were falsified, how much (and how precisely) could you infer other things from the falsification?
  2. How incoherent is it to imagine that the model is accurate but that a given variable could be different?
  3. If you knew the model were accurate but you were to forget the value of one variable, could you rederive it?

I already named one near-synonym for "gears", namely "technical explanation". Two more are "inside view" and Elon Musk's notion of reasoning from first principles. The implication is supposed to be that gears-level understanding is in some sense better than other sorts of knowledge, but this is decidedly not supposed to be valued to the exclusion of other sorts of knowledge. Inside-view reasoning is traditionally supposed to be combined with outside-view reasoning (although Elon Musk calls it "reasoning by analogy" and considers it inferior, and much of Eliezer's recent writing warns of its dangers as well, while allowing for its application to special cases). I suggested the terms gears-level & policy-level in a previous post (which I actually wrote after most of this one).

Although TEOTE gets close to answering Valentine's question, it doesn't quite hit the mark. The definition of "technical explanation" provided there is a theory which strongly concentrates the probability mass on specific predictions and rules out others. It's clear that a model can do this without being "gears". For example, my model might be that whatever prediction the Great Master makes will come true. The Great Master can make very detailed predictions, but I don't know how they're generated. I lack the understanding associated with the predictive power. I might have a strong outside-view reason to trust the Great Master: their track record on predictions is immaculate, their Bayes-loss minuscule, their calibration supreme. Yet, I lack an inside-view account. I can't derive their predictions from first principles.

Here, I'm siding with David Deutsch's account in the first chapter of The Fabric of Reality. He argues that understanding and predictive capability are distinct, and that understanding is about having good explanations. I may not accept his whole critique of Bayesianism, but that much of his view seems right to me. Unfortunately, he doesn't give a technical account of what "explanation" and "understanding" could be.

First Attempt: Deterministic Predictions

TEOTE spends a good chunk of time on the issue of making predictions in advance. According to TEOTE, this is a human solution to a human problem: you make predictions in advance so that you can't make up what predictions you could have made after the fact. This counters hindsight bias. An ideal Bayesian reasoner, on the other hand, would never be tempted into hindsight bias in the first place, and is free to evaluate hypotheses on old evidence (as already discussed).

So, is gears-level reasoning just pure Bayesian reasoning, in which hypotheses have strictly defined probabilities which don't depend on anything else? Is outside-view reasoning the thing logical induction adds, by allowing the beliefs of a hypothesis to shift over time and to depend on the wider market state?

This isn't quite right. An ideal Bayesian can still learn to trust the Great Master, based on the reliability of the Great Master's predictions. Unlike a human (and unlike a logical inductor), the Bayesian will at all times have in mind all the possible ways the Great Master's predictions could have become so accurate. This is because a Bayesian hypothesis contains a full joint distribution on all events, and an ideal Bayesian reasons about all hypotheses at all times. In this sense, the Bayesian always operates from an inside view -- it cannot trust the Great Master without a hypothesis which correlates the Great Master with the world.

However, it is possible that this correlation is introduced in a very simple way, by ruling out cases where the Great Master and reality disagree without providing any mechanism explaining how this is the case. This may have low prior probability, but gain prominence due to the hit in Bayes-score other hypotheses are taking for not taking advantage of this correlation. It's not a bad outcome given the epistemic situation, but it's not gears-level reasoning, either. So, being fully Bayesian or not isn't exactly what distinguishes whether advanced predictions are needed. What is it?

I suggest it's this: whether the hypothesis is well-defined, such that anyone can say what predictions it makes without extra information. In his post on gears, Valentine mentions the importance of "how deterministically interconnected the variables of the model are". I'm pointing at something close, but importantly distinct: how deterministic the predictions are. You know that a coin is very close to equally likely to land on heads or tails, and from this you can (if you know a little combinatorics) compute things like the probability of getting exactly three heads if you flip the coin five times. Anyone with the same knowledge would compute the same thing. The model includes probabilities inside it, but how those probabilities flow is perfectly deterministic.

This is a notion of objectivity: a wide variety of people can agree on what probability the model assigns, despite otherwise varied background knowledge.

If a model is well-defined in this way, it is very easy (Bayesian or no) to avoid hindsight bias. You cannot argue about how you could have predicted some result. Anyone can sit down and calculate.
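For instance, anyone who knows the coin model above can sit down and reproduce the same number (`math.comb` requires Python 3.8+):

```python
from math import comb

def prob_exact_heads(flips, heads, p=0.5):
    """Binomial probability: the model's prediction is fixed by the model alone."""
    return comb(flips, heads) * p**heads * (1 - p)**(flips - heads)

# Exactly three heads in five fair flips -- the same 0.3125 for every calculator:
assert prob_exact_heads(5, 3) == 10 / 32
```

The model contains probabilities, but nothing about the calculation depends on who performs it or what else they believe.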

The hypothesis that the Great Master is always correct, on the other hand, does not have this property. Nobody but the Great Master can say what that hypothesis predicts. If I know what the Great Master says about a particular thing, I can evaluate the accuracy of the hypothesis; but, this is special knowledge which I need in order to give the probabilities.

The Bayesian hypothesis which simply forces statements of the Great Master to correlate with the world is somewhat more gears-y, in that there's a probability distribution which can be written down. However, this probability distribution is a complicated mish-mosh of the Bayesian's other hypotheses. So, predicting what it would say requires extensive knowledge of the private beliefs of the Bayesian agent involved. This is typical of the category of non-gears-y models.

Objection: Doctrines

Unfortunately, this account doesn't totally satisfy what Valentine wants.

Suppose that, rather than making announcements on the fly, the Great Master has published a set of fixed Doctrines which his adherents memorize. As in the previous thought experiment, the word of the Great Master is infallible; the application of the Doctrines always leads to correct predictions. However, the contents of the Doctrines appear to be a large mish-mosh of rules with no unifying theme. Despite their apparent correctness, they fail to provide any understanding. It is as if a physicist took all the equations in a physics text, transformed them into tables of numbers, and then transported those tables to the Middle Ages with explanations of how to use the tables (but none of where they come from). Though the tables work, they are opaque; there is no insight as to how they were determined.

The Doctrines are a deterministic tool for making predictions. Yet, they do not seem to be a gears-level model. Going back to Valentine's three tests, the Doctrines fail test three: we could erase any one of the Doctrines and we'd be unable to rederive it by how it fit together with the rest. Hence, the Doctrines have almost as much of a "trust the Great Master" quality as listening to the Great Master directly -- the disciples would not be able to derive the Doctrines for themselves.

Second Attempt: Proofs, Axioms, & Two Levels of Gears

My next proposal is that having a gears-level model is like knowing the proof. You might believe a mathematical statement because you saw it in a textbook, or because you have a strong mathematical intuition which says it must be true. But, you don't have the gears until you can prove it.

This subsumes the "deterministic predictions" picture: a model is an axiomatic system. If we know all the axioms, then we can in theory produce all the predictions ourselves. (Thinking of it this way introduces a new possibility, that the model may be well-defined but we may be unable to find the proofs, due to our own limitations.) On the other hand, we don't have access to the axioms of the theory embodied by the Great Master, and so we have no hope of seeing the proofs; we can only observe that the Great Master is always right.

How does this help with the example of the Doctrines?

The concept of "axioms" is somewhat slippery. There are many equivalent ways of axiomatizing any given theory. We can often flip views between what's taken as an axiom vs what's proved as a theorem. However, the most elegant set of axioms tends to be preferred.

So, we can regard the Doctrines as one long set of axioms. If we look at them that way, then adherents of the Great Master have a gears-level understanding of the Doctrines if they can successfully apply them as instructed.

However, the Doctrines are not an elegant set of axioms. So, viewing them in this way is very unnatural. It is more natural to see them as a set of assertions which the Great Master has produced by some axioms unknown to us. In this respect, we "can't see the proofs".

In the same way, we can consider flipping any model between the axiom view and the theorem view. Regarding the model as axiomatic, to determine whether it is gears-level we only ask whether its predictions are well-defined. Regarding it in "theorem view", we ask if we know how the model itself was derived.

Hence, two of Valentine's desirable properties of a gears-level model can be understood as the same property applied at different levels:

  • Determinism, which is Val's property #2, follows from requiring that we can see the derivations within the model.
  • Reconstructability, Val's property #3, follows from requiring that we can see the derivation of the model.

We might call the first level of gears "made out of gears", and the second level "made by gears" -- the model itself being constructed via a known mechanism.

If we change our view so that a scientific theory is a "theorem", what are the "axioms"? Well, there are many criteria which are applied to scientific theories in different domains. These criteria could be thought of as pre-theories or meta-theories. They encode the hard-won wisdom of a field of study, telling us what theories are likely to work or fail in that field. But, a very basic axiom is: we want a theory to be the simplest theory consistent with all observations. The Great Master's Doctrines cannot possibly survive this test.

To give a less silly example: if we train up a big neural network to solve a machine learning problem, the predictions made by the model are deterministic, predictable from the network weights. However, someone else who knew all the principles by which the network was created would nonetheless train up a very different neural network -- unless they use the very same gradient descent algorithm, data, initial weights, and number and size of layers.

Even if they're the same in all those details, and so reconstruct the same neural network exactly, there's a significant sense in which they can't see how the conclusion follows inevitably from the initial conditions. It's less doctrine-y than being handed a neural network, but it's more doctrine-y than understanding the structure of the problem and why almost any neural network achieving good performance on the task will have certain structures. Remember what I said about mathematical understanding. There's always another level of "being able to see why" you can ask for. Being able to reproduce the proof is different from being able to explain why the proof has to be the way it is.
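The exact-reconstruction point can be seen with a miniature "network" (one weight, plain Python, no ML library; the data and hyperparameters are invented for illustration): fixing the data, initialization, and descent procedure fixes the final weights exactly, yet the resulting number by itself explains nothing about the problem's structure.

```python
def train(data, w0=0.0, lr=0.1, steps=100):
    """Gradient descent on a one-parameter least-squares model y = w * x."""
    w = w0
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]

# Identical procedure, data, and initialization -> bit-identical weights:
assert train(data) == train(data)
# A different initialization follows a different trajectory but, for this
# convex problem, converges to (essentially) the same weights:
assert abs(train(data, w0=5.0) - train(data)) < 1e-6
```

Being able to rerun the training is "seeing the derivation" in only a weak sense -- one can execute every step without understanding why the solution had to come out this way.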

Exact Statement?

Gears-y-ness is a matter of degree. There are several interconnected things we can point at, and a slippage between levels of analysis which makes everything quite complicated.

In the ontology of math/logic, we can point at whether you can see the proof of a theorem. There are several slippages which make this fuzzier than it may seem. First: do you derive it only from the axioms, or do you use commonly known theorems and equivalences (which you may or may not be able to prove if put on the spot)? There's a long continuum between what one mathematician might say to another as proof and a formal derivation in logic. Second: how well can you see why the proof has to be? This is the spectrum between following each proof step individually (but seeing them as almost a random walk) vs seeing the proof as an elementary application of a well-known technique. Third: we can start slipping the axioms. There are small changes to the axioms, in which one thing goes from being an axiom to a theorem and another thing makes the opposite transition. There are also large changes, like formalizing number theory via the Peano axioms vs formalizing it in set theory, where the entire description language changes. You need to translate from statements of number theory to statements of set theory. Also, there is a natural ambiguity between taking something as an axiom vs requiring it as a condition in a theorem.

In the ontology of computation, we can point at knowing the output of a machine vs being able to run it by hand to show the output. This is a little less flexible than the concept of mathematical proof, but essentially the same distinction. Changing the axioms is like translating the same algorithm to a different computational formalism, like going between Turing machines and lambda calculus. Also, there is a natural ambiguity between a program vs an input: when you run program XYZ with input ABC on a universal Turing machine, you input XYZABC to the universal Turing machine; but, you can also think of this as running program XY on input ZABC, or XYZA on input BC, et cetera.

In the ontology of ontology, we could say "can you see why this has to be, from the structure of the ontology describing things?" "Ontology" is less precise than the previous two concepts, but it's clearly the same idea. A different ontology doesn't necessarily support the same conclusions, just like different axioms don't necessarily give the same theorems. However, the reductionist paradigm holds that the ontologies we use should all be consistent with one another (under some translation between the ontologies) -- or should at least aspire to eventual consistency. Analogous to axiom/assumption ambiguity and program/input ambiguity, there is ambiguity between an ontology and the cognitive structure which created and justifies the ontology. We can also distinguish more levels; maybe we would say that an ontology doesn't make predictions directly, but provides a language for stating models, which make predictions. Even longer chains can make sense, but the divisions are all somewhat subjective. However, unlike the situation in logic and computation, we can't expect to articulate the full support structure for an ontology; it is, after all, a big mess of evolved neural mechanisms which we don't have direct access to.

Having established that we can talk about the same things in all three settings, I'll restrict myself to talking about ontologies.

Two-level definition of gears: A conclusion is gears-like with respect to a particular ontology to the extent that you can "see the derivation" in that ontology. A conclusion is gears-like without qualification to the extent that you can also "see the derivation" of the ontology itself. This is contiguous with gears-ness relative to an ontology, because of the natural ambiguity between programs and their inputs, or between axioms and assumptions. For a given example, though, it's generally more intuitive to deal with the two levels separately.

Seeing the derivation: There are several things to point at by this phrase.

  • As in TEOTE, we might consider it important that a model make precise predictions. This could be seen as a prerequisite of "seeing the derivation": first, we must be saying something specific; then, we can ask if we can say why we're saying that particular thing. This implies that models are more gears-like when they are more deterministic, all other things being equal.
  • However, I think it is also meaningful and useful to talk about gears-like models whose predictions are not deterministic; the standard way of assigning probabilities to dice is very gears-like, despite placing wide probabilities. I think these are simply two different important things we can talk about.
  • Either way, being able to see the derivation is like being able to see the proof or execute the program, with all the slippages this implies. You see the derivation less well to the extent that you rely on known theorems, and more fully to the extent that you can spell out all the details yourself if need be. You see it less well to the extent that you understand the proof only step-by-step, and better to the extent that you can derive the proof as a natural application of known principles. You cannot see the derivation if you don't even have access to the program which generated the output, or are missing some important inputs for that program.

Seeing the derivation is about explicitness and external objectivity. You can trivially "execute the program" generating any of your thoughts, in that your thinking is the program which generated the thoughts. However, the execution of this program could rely on arbitrary details of your cognition. Moreover, these details are usually not available for conscious access, which means you can't explain the train of thought to others, and even you may not be able to replicate it later. So, a model is more gears-like the more replicable it is. I'm not sure if this should be seen as an additional requirement, or an explanation of where the requirements come from.

Conclusion, Further Directions

Obviously, we only touched the tip of the iceberg here. I started the post with the claim that I was trying to hash out the implications of logical induction for practical rationality, but secretly, the post was about things which logical inductors can only barely begin to explain. (I think these two directions support each other, though!)

We need the framework of logical induction to understand some things here, such as how you still have degrees of understanding when you already have the proof / already have a program which predicts things perfectly (as discussed in the "mathematical understanding" section). However, logical inductors don't look like they care about "gears" -- it's not very close to the formalism, in the way that TEOTE gave a notion of technical explanation which is close to the formalism of probability theory.

I mentioned earlier that logical induction suffers from the old evidence problem more than Bayesianism. However, it doesn't suffer in the sense of losing bets it could be winning. Rather, we suffer, when we try to wrap our heads around what's going on. Somehow, logical induction is learning to do the right thing -- the formalism is just not very explicit about how it does this.

The idea (due to Sam Eisenstat, hopefully not butchered by me here) is that logical inductors get around the old evidence problem by learning notions of objectivity.

A hypothesis you come up with later can't gain any credibility by fitting evidence from the past. However, if you register a prediction ahead of time that a particular hypothesis-generation process will eventually turn up something which fits the old evidence, you can get credit, and use this credit to bet on what the hypothesis claims will happen later. You're betting on a particular school of thought, rather than a known hypothesis. "You can't make money by predicting old evidence, but you may be able to find a benefactor who takes it seriously."

In order to do this, you need to specify a precise prediction-generation process which you are betting in favor of. For example, Solomonoff Induction can't run as a trader, because it is not computable. However, the probabilities which it generates are well-defined (if you believe that halting bits are well-defined, anyway), so you can make a business of betting that its probabilities will have been good in hindsight. If this business does well, then the whole market of the logical inductor will shift toward trying to make predictions which Solomonoff Induction will later endorse.

Similarly for other ideas which you might be able to specify precisely without being able to run right away. For example, you can't find all the proofs right away, but you could bet that all the theorems which the logical inductor observes have proofs, and you'd be right every time. Doing so allows the market to start betting it'll see theorems if it sees that they're provable, even if it hasn't yet seen this rule make a successful advance prediction. (Logical inductors start out really ignorant of logic; they don't know what proofs are or how they're connected to theorems.)

This doesn't exactly push toward gears-y models as defined earlier, but it seems close. You push toward anything for which you can provide an explicit justification, where "explicit justification" is anything you can name ahead of time (and check later) which pins down predictions of the sort which tend to correlate with the truth.

This doesn't mean the logical inductor converges entirely to gears-level reasoning. Gears were never supposed to be everything, right? The optimal strategy combines gears-like and non-gears-like reasoning. However, it does suggest that gears-like reasoning has an advantage over non-gears reasoning: it can gain credibility from old evidence. This will often push gears-y models above competing non-gears considerations.

All of this is still terribly informal, but is the sort of thing which could lead to a formal theory. Hopefully you'll give me credit later for that advanced prediction.


Sat, 17 Feb 2018 19:57:18 EST

Circling is a practice, much like meditation is a practice.

There are many forms of it (again, like there are many forms of meditation). There are even life philosophies built around it. There are lots of intellectual, heady discussions of its theoretical underpinnings, often centered in Ken Wilber's Integral Theory. Subcultures have risen from it. It is mostly practiced in the US and Europe. It attracts lots of New Age-y, hippie, self-help-guru types. My guess is that the median age of practitioners is in the 30's. I sometimes refer to practitioners of Circling as relationalists (or just Circlers).

In recent years, Circling has caught the eye of rationalists, and that's why this post is showing up here, on LessWrong. I can hopefully direct people here who have the question, "I've heard of this thing called Circling, but... what exactly is it?" And further, people who ask, "Why is this thing so ****ing hard to explain? Just tell me!"

You are probably familiar with the term inferential distance.

Well, my friend Tiffany suggested a similar term to me, experiential distance—the gap in understanding caused by the distance between different sets of experiences. Let's just say that certain Circling experiences can create a big experiential distance, and this gap isn't easily closed using words. Much of the relevant "data" is in the nonverbal, subjective aspects of the experience, and even if I came up with a good metaphor or explanation, it would never close the gap. (This is annoyingly Postmodern, yes?)

[Ho ho~ how I do love poking fun at Postmodernism~]

But! There are still things to say, so I will say them. Just know that this post may not feel like eating a satisfying meal. I suspect it will feel more like licking a Pop-Tart, on the non-frosted side.

Some notes first.

Note #1: I'm not writing this to sell Circling or persuade you that it's good. I recommend using your own sense of curiosity, intuition, and intelligence to guide you. I don't want you to "put away" any of your thinking-feeling parts just to absorb what I'm saying. Rather, try remaining fully in contact with your awareness, your sensations, and your thoughts. (I hope this makes sense as a mental move.)

Note #2: The best introduction to Circling is to actually try it. It's like if I tried to explain watching Toy Story to someone who's never seen a movie. You don't explain movies to people; you just sit them down and have them watch one. So, I encourage you to stop reading at any time you notice yourself wanting to try it. My words will be mere pale ghosts. Pale ghosts, I tell you!

Note #3: This post is written by a rationalist who's done 400+ hours of Circling and has tried all the main styles / schools of Circling.

OK, I will try to explain what a circle is (the activity, not the general practice), but I also want to direct your attention to this handy 100-page PDF I found that attempts to explain everything Circling, if you're willing to skim it. (It is written by a relative amateur to the Circling world and contains many disputed sentences, but it is thorough. Just take it all with a grain of salt.)

So what is a circle?

You start by sitting with other people in a circle. So far, so good!

Group sizes can be as small as 2 and as large as 50+, but 4-12 is perhaps more expected.

There are often explicitly stated agreements or principles. These help create common knowledge about what to expect. The agreements aren't the same across circles or across schools of Circling. But a few common ones include "Honor self", "Own your experience", "Stay with the level of sensation", ...

There is usually at least one facilitator. They are responsible for tracking time and declaring the circle's start and end. Mostly they function as extra-good, extra-mindful participants—they're not "in charge" of the circle.

Then the group "has a conversation." Or maybe more accurately, it experiences what it’s like to be together, and sometimes intra-reports what that experience is like.

[^I'm actually super proud of this description! It's so succinctly what it is!]

Two common types of circles: Organic vs Birthday

Organic circles are more like a loose hivemind, where the group starts with no particular goal or orientation. Sometimes, a focal point emerges; sometimes it doesn't. Each individual has the freedom to point their attention however they will, and each individual can try to direct the group's attention in various ways. What happens when you put a certain selection of molecules into a container? How do they react? Do they bond? Do they stay the fuck away? What is it like to be a molecule in this situation? What is it like to be the molecule across from you?

Birthday circles start with a particular focal point. One person is chosen to be birthday circled, and the facilitator then gently cradles the group's attention towards this person, much like you can guide your attention back to your breath in meditation. And then the group tries to imagine/embody what it's like to be this person and "see through their eyes"—while also noticing what it's like to be themselves trying to do this.

Circling is often called a "relational practice."

It's a practice that's about the question of: What is it like to be me? What is it like to be me, while with another? What is it like for me to try to feel what the other is feeling? How might I express me? How does the other receive me and my expression?

In other words, it's a practice that explores what it means to be a sentient entity, among other sentient entities. And in particular what it means to be a human, among other humans.

If you haven't thought to yourself, "Being sentient is pretty weird; being a human is super weird; being a human around other humans is super-duper crazy weird," then I suspect you haven't explored this space to its fullest extent. Circling has helped me feel more of the strangeness of this existence.

How is Circling related to rationality?

I notice I feel trepidation and fear as I prepare to discuss this. I'm afraid I won't be able to give you what you want, that you'll become bored or start judging me.

[^This is a Circling move I just made: revealing what I'm feeling and what I'm imagining will happen.]

If this were an actual circle, I could ask you and check if it's true—are you feeling bored? [I invite you to check.]

I felt afraid just now—that fear was borne out of some assumptions about reality I was implicitly making. But without having to know and delineate what the assumptions are, I can check those assumptions by asking you—you who are part of reality and have relevant data.

By asking you while feeling my fear and anticipation, I open up the parts of me that can update, like opening so many eyes that usually stay closed. And depending on how you respond, I can receive the data any number of ways (including having the data bounce off, integrating the data, or disbelieving the data).

So, perhaps one way Circling is related to rationality is that it can:

  1. put me in a state of being open to an update,
  2. train me to straightforwardly ask for the data, from the world, and
  3. respond to and receive the data—with all my faculties available.

What does it mean to be open to an update?

If you've experienced a more recent iteration of CFAR's Comfort Zone Exploration class (aka CoZE), it is just that.

There are parts of me that are scared of looking over the fence, where there might be dragons in the territory. (Why is the fence even there? Who knows. It belongs to Chesterton.)

My job, then, is not to shove the scared parts over the fence, or to suggest they shut their eyes and jump over it, or to destroy the fence. I walk next to the fence with my scared part, and I sit with and acknowledge the fear. Then I play around with getting closer to the fence; I play with waving my arms above the fence; I play with peeking over it; I play with touching the fence.

And this whole time, I'm quite aware of the fear; I do not push it down or call it inappropriate or dissociate. I listen to it, and I try to notice all my internal sensations and my awareness. I am fully exposed to new information, like walking into an ice bath slowly with all my senses awake. In my experience, being in an SNS-activated (sympathetic, fight-or-flight) state really primes me for new information in a way that being calm (PSNS, parasympathetic activation) does not.

And this is when I am most open to receiving new inputs from the world, where I might be the most affected by the new data.

I can practice playing around with this during Circling, and it can be quite powerful.

What does it mean to receive data with all my faculties available?

This means I'm not mindlessly "accepting" whatever is happening in front of me. All of me is engaged, such that I can notice and call bullshit if that's what's up.

If I'm actually in touch with my body and my felt senses, I can notice all the small niggling parts that are like, "Uhhh" or "Errgh." Often they're nonverbal. Even the tiniest flinches of discomfort or retraction I will use as signals of something, even if I don't really understand what they mean. And I can then also choose to name them out loud, if I want to. And see how the other person reacts to that.

In other words, my epistemic defense system is online and running. It's not taking a break during any of this, nor do I want it to be. If things still manage to slip past, I want to be able to notice it later on and investigate. Sometimes slowing things down helps. My mind will also automatically defend itself—in circles, I've fallen asleep, gotten distracted, failed to parse sentences, become aggressively confused or bored, among other things. What's cool is being able to notice all this as it's happening.

However, if I'm not in touch with my body—if I'm dissociated, if I don't normally feel my body/emotions, if I'm overwhelmed, if I'm solely in my thoughts—then that is a skill area I'd want to work on first. How to learn to stay aware of myself and my felt-sense body, even when I'm uncomfortable or my nervous system is activated. Circling can also train this, similar to Focusing.

The more I train this skill, the more I'll be able to engage with the universe. Rather than avoid the parts of it I don't like or don't want to acknowledge or don't want to look at.

I suspect some people might not even realize what they're missing out on here. People who've lived their entire lives without much of an "emotional library" or without understanding that their body is giving them all kinds of data. Usually these people don't go looking for the "missing thing" until some major problems crop up in their lives that they can't explain.

Circling as a rationality training ground

Circling can be a turbocharged training ground for a variety of rationality skills, including:

  • Real-time introspection
  • Surrendering to the unknown / being at the edge
  • Exploring unknown, unfamiliar, or avoided parts of the territory (like in CoZE)
  • Looking at parts of the territory that make you flinch (Mundanification)
  • Having the Double Crux spirit: being open to being wrong / updating, seeing other people as having relevant bits of map

I've also found it to be powerful in combination with:

  • Internal Double Crux (a CFAR technique for resolving internal conflict that involves lots of introspection)
  • Immunity to Change mapping (a Kegan-Lahey technique for making lasting change by looking for big assumptions)
  • CT Charting (a Leverage technique for mapping your beliefs and finding hidden assumptions)
  • or any other formal attempt to explore my aliefs and find core assumptions I've been holding onto

After using one of the above techniques to find a core assumption, I can use Circling to test out its validity. (My core assumptions often have something to do with other people, e.g., "Nobody can understand me, and even if they could, they wouldn't want to.") I can sometimes feel those assumptions being challenged during a circle.

So, if I try being in any ole circle, will I get all of the above?

Probably not.

Circles are high-variance. (The parameters of each circle matter a lot. Like who's in it, who's facilitating, what school of Circling is it based on, what are the lighting conditions, etc.)

I've circled about a hundred times by now, and a lot of those were in 3-day chunks. I guess multi-day immersions are a pretty good way to really try it out, so maybe try that and see? They reduce the variance in some dimensions.

What are some pitfalls of Circling?

1) You might become a "connection junkie".

Circling is (in its final form) a truth-seeking practice. IMO. But a lot of folks flock to it as a way to feel connected to other people.

This is not necessarily a bad thing. In fact I suspect human-to-human contact is something many of us are seriously lacking, possibly starving for. It might be good for us to get more of this in our lives.

That said, there can be such a thing as "too much of a good thing."

2) You might obtain false beliefs.

I think this is always a risk, for humans, in life. But Circling does have a way of making things more salient than usual, and if some of those super-salient things lead you to believe, somehow, the "wrong" things, then maybe that's more of a problem.

I think this isn't actually a huge problem, as long as one has a good meta- or meta-meta-process for arriving eventually at true beliefs. (See the rest of this website for more!)

I also think this is mitigated by exposing yourself to a wide range of data. Like, consciously avoid being in a bubble. Join multiple cult-ures [sic].

3) Circles can be bad / harmful.

IMO, there is a qualitative difference between good and bad circles.

Concretely, the good facilitators understand the nuances of mental health and have done at least some research on therapy modalities. Circling isn't therapy, but psychological stuff comes up a fair amount. And if you vulnerably open up in a situation where they're not actually equipped to navigate your mental health issues, that could be quite bad indeed.

A good facilitator will also not force you to open up or try to get you to be vulnerable (this goes against Circling's principles). Instead they will tune into your nervous system and try to tell when you're feeling stressed or anxious or frozen and will probably reflect this back at you to check. Circling is not about "getting somewhere" or "healing you" or "solving a problem." So ... if you encounter a circle where that seems to be what's happening, try saying something out loud like "I have a story that we're trying to fix something."

Good facilitation often costs money—there's a correlation, anyway. I wouldn't assume the facilitation will be good just because it costs money, but it's an easy signpost.

Final thoughts

It's not like Circling has taken over the world or anything. So the same question posed to rationality has to be posed to it: given that it hasn't, why do you think it's real?

And like with rationality, for me the answer is kind of like, I dunno because my inside view says it is?

/licks a Pop-Tart


Sat, 17 Feb 2018 4:06:35 EST

Someone walking around stabbing people can cause a lot of damage before anyone stops them. That’s a bummer, but some technologies make the situation much worse.

If we don’t want to ban dangerous technologies outright (because they have legitimate purposes, or because we really love guns), we could instead expand liability insurance requirements. In the case of guns I think this is an interesting compromise; in the case of sophisticated consumer robotics, I think it’s probably the right policy.

Example: firearms

Anyone can buy a gun, but first they have to put down a $100M deposit. If you use the gun to cause damage or kill people then the damages are deducted from your deposit (in addition to other punishments).

Most individuals can’t afford this deposit, so they would need to purchase firearm liability insurance. An insurer could perform background checks or tests in order to better assess risk and offer a lower rate, or could get buyers to agree to monitoring that reduces the risk of killing many people. Insurance rates might be lower for weapons that aren’t well-suited for murder, or for purchasers who have a stronger legitimate interest in firearms. Insurance rates would be lower if you took appropriate precautions to avoid theft.

When all is said and done, about 10M firearms are made in the US per year, and about 30k people are murdered. So if $10M is charged per murder, the average cost of firearm insurance should end up being around $30k (with significantly lower costs for low-risk applicants, and prohibitively large costs for the highest risk applicants). In practice I would expect the number of firearms, and the risk per firearm, to fall.
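The back-of-the-envelope arithmetic here is straightforward; a quick sketch, using only the figures quoted in the paragraph above:

```python
# Rough figures from the text above.
guns_per_year = 10_000_000       # firearms made in the US per year
murders_per_year = 30_000        # deaths charged against insurance
damages_per_murder = 10_000_000  # $10M charged per murder

total_payouts = murders_per_year * damages_per_murder  # $300B/year
avg_premium = total_payouts / guns_per_year            # spread over new guns

print(total_payouts)  # 300000000000
print(avg_premium)    # 30000.0 -> ~$30k average insurance cost per firearm
```

The $30k average is just total expected damages divided over the annual flow of new firearms; actual premiums would be risk-rated around it, as the text notes.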

Some details

Rather than putting down a separate deposit for every gun they insure, an insurer only needs to demonstrate that they can cover their total obligations. For example, an insurer who insures 5M firearms might be required to keep $50B in reserves (rather than a 5M * $100M = $500 trillion deposit), based on a conservative estimate of the correlated risk. $50B may sound like a lot, but if insurers charge a 15% markup and the total payouts are 30k * $10M = $300B, then the gun insurance industry is making $50B/year.
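Sanity-checking the reserve and markup figures (all numbers are taken from the paragraph above; note the 15% markup on $300B of payouts comes to $45B, which is roughly the $50B/year quoted):

```python
# Figures from the paragraph above.
insured_guns = 5_000_000
per_gun_deposit = 100_000_000                 # $100M each
naive_deposit = insured_guns * per_gun_deposit  # deposits taken literally
reserves = 50_000_000_000                     # conservative correlated-risk estimate

total_payouts = 30_000 * 10_000_000           # $300B/year in damages
markup = 0.15
industry_revenue_over_payouts = markup * total_payouts

print(naive_deposit)                   # 500000000000000 -> $500 trillion
print(industry_revenue_over_payouts)  # 45000000000.0 -> ~the $50B/year quoted
```

The gap between $500 trillion of literal deposits and $50B of pooled reserves is the whole point of letting insurers cover correlated obligations rather than each gun separately.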

When you manufacture a gun, you automatically assume liability for it. In order to legally make guns you need insurance or a deposit, and you must ensure that your gun is traceable (e.g. via a serial number). Most of the time when you sell someone a gun, they will assume liability (by putting down their own deposit, replacing your deposit) as a condition of purchase. You are welcome to sell to someone who won’t assume liability, but that’s a recipe for losing $100M. Likewise, whoever buys the gun can resell it without transferring liability, but their insurer is going to try to stop them (e.g. by confiscating a smaller deposit with the insurer, by signing additional legally binding agreements, by background checks, by monitoring).

This amounts to privatizing regulation of destructive technologies. The state could continue to participate in this scheme as an insurer—if they wanted, they could sell insurance to anyone who is allowed to buy a gun under the current laws. They’d be losing huge amounts of money though.

Example: Robots

We are approaching the world where $50 of robotics and a makeshift weapon can injure or kill an unprotected pedestrian. Cheap robotics could greatly increase the amount of trouble a trouble-maker can make (and greatly decrease their legal risk). We could fix this problem by tightly controlling access to robots, but robots have plenty of legitimate uses.


A less drastic solution would be to require liability insurance, e.g. $2M for a small robot or $20M for a large robot. Manufacturers could make their robots cheap to insure by placing restrictions that make them hard to use for crime or that limit their usefulness for trouble-making. (This could be coupled with the same mechanisms described in the section on firearms, including monitoring to make it more difficult to circumvent restrictions.)

It makes sense to have different requirements for different robots, but they should err on the side of simplicity and conservativeness. Insurers can make a more detailed assessment about whether a particular robot really poses a risk when deciding how much to charge for insurance.

Whether or not liability insurance is required for owning a robot, I think it would be good to require it for operating a robot in a public space. This doesn’t require sweeping legal changes or harmonization: local governments could simply decide that uninsured robots will be destroyed or confiscated on sight.
