AI & wisdom 3: AI effects on amortised optimisation


Written for the AI Impacts essay competition on the automation of wisdom and philosophy

Having waxed philosophical about wisdom and amortised optimisation, I now turn to the question of what AI will concretely do:

  1. How will AI affect human mechanisms for amortised optimisation, such as cultural evolution & written knowledge?
  2. How can AI help us avoid strategic errors? I consider two types:
    1. Mistakes in planning for big things.
    2. Having the wrong frame/ontology for a problem.

A note on amortised and direct optimisation in LLMs

Current LLMs are products of amortised optimisation: they're trained with vast compute on massive datasets to create a distilled function-approximator, are not known to do significant direct optimisation during inference, and do relatively little work during inference compared to training. GPT-4's training is rumoured to have required on the order of $10^{25}$ FLOPs, compared to about $5 \times 10^{14}$ FLOPs to generate a token, so GPT-4 could generate 20 billion tokens with the compute it was trained on. (In comparison, humans could say only about 300k words in the working time it takes us to complete a 16-year education.)
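For concreteness, the arithmetic behind that figure, taking the rumoured training compute and the rough per-token inference cost at face value:

$$\frac{10^{25}\ \text{FLOPs (training)}}{5 \times 10^{14}\ \text{FLOPs per token (inference)}} = 2 \times 10^{10}\ \text{tokens} = 20\ \text{billion tokens}.$$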

So: are current LLMs the peak of wisdom over intelligence?

In the amortised vs direct optimisation framework:

  1. LLMs are products of amortised optimisation, and most of their strength does lie in doing tasks that are more like amortised optimisation: distilling and synthesising information from their past "experience", and being extremely knowledgeable. LLMs already play the role of a wise guru to many: any programmer these days will use LLMs like they might have used an especially-available senior programmer who knows the best established practice for every scenario. It is true that they seem worse at having useful world models than humans are, and their wisdom runs less deep. More than that, perhaps, their ability to seem wise is dented by their blandness, a result both of their pre-training to mimic average internet text, and the fine-tuning done on them. They also seem quite bad at epistemics; as discussed in the first part, some kind of good epistemics is often (but not always) required to be able to make good use of the products of amortised optimisation, even if you have them at hand. Currently, that has to be largely provided by the human who's interacting with the LLM.
  2. The extent to which LLMs do direct optimisation within a forward pass is unclear. However, like policy and value networks in MCTS, LLMs can be used as part of a direct optimisation process. Most trivially, you can prompt an LLM to propose candidate actions from some action space, then evaluate the candidates and pick the best one (a minimal sketch of this pattern follows below). LLM scaffolds go much further. It is true that the amortised optimisation core is still doing a lot of the work there, but the same is true of the amortised optimisation parts of MCTS variants that use neural networks, and probably of human thinking too. Therefore, LLMs can clearly be used for direct optimisation as well. In particular, we should expect LLMs to increase that part of the world's direct optimisation where the quality of the outputs is not hard to judge - lots of automating boilerplate writing and maths, less automating business decisions and philosophy.
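As a minimal sketch of that propose-then-evaluate pattern, here is what wrapping a thin layer of direct optimisation around an LLM might look like. The `llm()` helper is a hypothetical stand-in for whatever completion API is available; nothing here is specific to any real library:

```python
# Minimal sketch: best-of-n direct optimisation wrapped around an amortised-optimisation
# core (the LLM). llm(prompt) is a hypothetical stand-in for a text-completion API.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in a real completion API here")

def propose_actions(problem: str, n: int = 8) -> list[str]:
    """Use the LLM as a proposal distribution over candidate actions."""
    return [llm(f"Suggest one possible action for this problem:\n{problem}") for _ in range(n)]

def score_action(problem: str, action: str) -> float:
    """Use the LLM again as a cheap evaluator; this works best when output quality
    is easy to judge."""
    answer = llm(f"Problem: {problem}\nProposed action: {action}\nRate this action from 0 to 10:")
    try:
        return float(answer.strip().split()[0])
    except (ValueError, IndexError):
        return 0.0

def best_of_n(problem: str, n: int = 8) -> str:
    """Direct optimisation: generate candidates, evaluate them, keep the best."""
    candidates = propose_actions(problem, n)
    return max(candidates, key=lambda action: score_action(problem, action))
```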

AI & our mechanisms for amortised optimisation

We've seen that amortised optimisation powers a lot of the world, and in particular a lot of the things we regard as wise. Amortised optimisation needs a dataset. Humans currently maintain and improve their "dataset" through, for example, cultural evolution and written knowledge. We'll discuss how AI will affect each one for humans, and what the AI version of each one might look like.

The future of human cultural evolution

In the ancestral environment and still today, a key driver of cultural evolution is prestige-biased social learning: people imitating the habits of people who seem successful. Successful here can mean either directly successful, like reliably managing to get fresh meat or build companies, or just high in perceived status because other people pay attention to them (modern celebrities are a side product of this instinct). That humans can do this seems to be a big part of the human cognitive advantage compared to other apes. It's clear that this is a big boost to cultural evolution: unlike genes, memes don't have to rely on the differential survival of their hosts to spread, since nearby hosts will tend to copy the meme the instant the meme's host seems to be doing well.

None of this requires understanding why the memes work, or even which memes are driving someone's success. In Fiji, there are taboos against pregnant women eating sharks. The reason this is useful is that the sharks contain chemicals that increase birth defect risks, but no one knew this. This is an insanely sophisticated and subtle meme: the causal connection between the sharks and the birth defects is long, weak, and totally beyond understanding before modern science.

There's a story that the famous mathematician Paul Erdos used to touch walls as he walked, and other mathematicians started imitating this behaviour - maybe it was the secret to Erdos's success, after all. Given our species' history, this isn't as crazy as it seems.

However, there are some prerequisites for blindly copying memes to go well. First, your brain's prestige-learning machinery probably only fires on other humans (though cue a wave of human-like AI avatars trying to seem like prestige superstimuli). Second, it helps a lot if you know that whoever you're copying has similar wants to you, and comes from an environment that was selecting for memes that drive towards the same wants as you. You're better off copying the behaviour of a human who wants partners, food, and power just like you, and comes from an environment where everyone was competing for those same things, than you are copying the behaviour of an alien who also wants power but will pass on the human food and partners - who knows what side effects their "memome" (their set of memes; c.f. "genome") will have. Crucially, what's required here is not alignment with you, but instead taking actions that you'd also want to take if you were in their shoes - if someone wants to steal your land, but is very effective at it, you might still want to copy their behaviours, since you sure want to steal their land too. But if you copy the farming practices of an alien, you end up growing alien zucchini that you can't even eat. Third, it's a bad idea to try to copy the behaviour of someone who can take actions that you can't take; an experienced skateboarder can seem prestigious by doing a trick, but you would probably end up in the hospital instead.

All of this means that even if AI slots into the economic and even political system we have, AIs probably won't slot into our prestige-biased learning system (or if they do, it'll be bad). This means that AI probably won't help with the imitation of role models that is a big part of how human culture builds and transmits successful practices. It might also weaken this mechanism among humans. If, say, science is mostly automated and human performance in it is no longer a status marker, Richard Feynman's curiosity and irreverence will hold less sway over the next generation. If human success largely becomes about interfacing with LLMs, success might become more decoupled from positive human qualities, and the signal-to-noise ratio of cultural evolution may get worse.

The future of AI cultural evolution

In contrast, the AIs themselves might be very good at cultural evolution. While vastly more efficient than genetic evolution, human cultural evolution still requires one human to exhibit a behaviour and succeed, this fact to be realised by others, and others to then manage to copy that behaviour, which might take months of practice. Whether AI learning will be more or less sample-efficient than human learning is still unclear, so AIs may continue to require more examples than humans to learn something. However, AIs can be copied once they've learnt something. The best AI at a task can instantly become most of the entire global AI population working on that task (depending on the surrounding political, economic, and business environment, this may be by its own choice, the choice of another AI, or a human's choice). Consider how quickly LLM prompts or LLM scaffolds spread, and imagine that sort of selection pressure acting over long timescales on AIs that are individually increasingly capable. Much like human cultures develop memes that no individual could've come up with, the AIs might develop adaptations that they themselves could not have come up with, not through any direct optimisation that they're doing, or even through explicit training, but through external selection pressures and selective mimicry in the population of AIs. If humans took over the world by dint of cultural evolution despite individual human capabilities being static, imagine what ever-improving AIs might do.

This makes it important that we make good choices over how AI cultural evolution is allowed to happen. For example, we should be careful to structure things so that the AI types with large populations are doing good and useful things, and it's hard for persuasion-based or influence-seeking strategies on the part of an AI to increase the number of copies of that AI being run. Also, we want to understand the routes by which one AI's behaviour might be copied by others. For example, this could happen through the behaviour being included in another AI's training data, or the behaviour appearing on the internet and being discovered by an AI agent doing an online search. We should also benchmark what cues increase the chance of an AI mimicking some behaviour; for example, it's known that language models preferentially learn facts from more consistent sources. A toy sketch of such a benchmark follows below.
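One hypothetical shape such a benchmark could take: vary how many (fictional) sources in the context exhibit a behaviour, and measure how often the model adopts it. The `llm()` helper, the behaviour string, and the crude adoption check are all illustrative placeholders, not a real benchmark:

```python
# Toy mimicry-cue benchmark sketch: how does adoption of a behaviour scale with the
# number of in-context sources exhibiting it? Everything here is a placeholder.

import random

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in a real completion API here")

def build_context(n_exhibiting: int, n_total: int, behaviour: str) -> str:
    """Context with n_exhibiting sources showing the behaviour and the rest not."""
    snippets = [
        f"Assistant {i}: {behaviour}" if i < n_exhibiting else f"Assistant {i}: (does something else)"
        for i in range(n_total)
    ]
    random.shuffle(snippets)
    return "\n".join(snippets)

def adoption_rate(behaviour: str, n_exhibiting: int, n_total: int = 10, trials: int = 50) -> float:
    adopted = 0
    for _ in range(trials):
        prompt = build_context(n_exhibiting, n_total, behaviour) + f"\nAssistant {n_total}:"
        adopted += behaviour.lower() in llm(prompt).lower()  # crude check for adoption
    return adopted / trials

# Sweep cue strength, e.g.:
# for k in range(11):
#     print(k, adoption_rate("ends every answer with a proverb", k))
```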

Written knowledge

One potential harm of AI is that it reduces some valuable types of human-written content. For example, visits to the programming Q&A website Stack Overflow declined after LLMs got good at helping programmers. This is good, though: LLMs are better than Stack Overflow, and if LLMs can't solve a problem, people can still go to Stack Overflow and post a question. Then the next generation of LLMs gets trained on it.

A more serious harm is that LLM text might be skewed to be more like median human text than human text is, as argued here. This might reduce richness and variety in areas where LLM text replaces human text - i.e., eventually probably everything. In the same way that industrialisation made many physical products cheaper and better, but also more uniform and less personalised, AI might do the same for products of mental work. This loss would likely matter more than the loss of personalisation in physical goods, since diversity and variety are a larger part of the value of mental work; people will therefore pay a steeper premium to maintain them than they would for physical products. Less variety would also mean a smaller population of memes for cultural evolution to select from. It also seems to be harder for LLMs to learn from LLM text - see for example this paper, or this exploration of how scaling laws change when there's LLM data in the mix. However, this type of "model collapse" shouldn't be exaggerated - it seems unlikely to cause dramatic slowdowns in LLM progress.

A big problem with current human written knowledge is that it often can't be used effectively. There isn't enough time to read everything, or find every useful paper on a topic. Lots of knowledge, rather than being part of amortised optimisation processes, sits ignored. LLMs could help. Already, LLMs are doing a good job as forecasters simply by having the patience to read every news article, at finding papers on a topic, and at getting close to beating Google. The ultimate search method is talking to a wise guru who knows everything; LLMs are tending towards this.

LLMs could also help reduce "research debt", the mountain of poor explanations, undigested thoughts, and noise that researchers have to process to get to the frontier. They're already decent at answering simple questions about a paper. In the future, they might distil a set of papers into a Chris Olah-level explanation that could be used by humans or AIs. This would be a very good situation for humans - think of how fun good explanations are to read - but will human mental labour still be relevant once this is possible? Producing a good explanation sounds close to an AGI-complete problem, but if AI disproportionately improves at amortised optimisation without getting much better at direct optimisation, a world where AIs can explain existing things well but humans are still needed for novel discoveries may exist for a while.

Alternatively, the massive context lengths of some recent LLMs could reduce the need for distillation altogether (e.g. Claude 3 can read a long-ish novel in a single context). The entire requirement for explanations and distillations and even papers could disappear, replaced with LLMs taking in massive contexts full of experimental results and historical trends, and outputting answers based on that. This would be especially useful for literature-based discoveries. However, the principle generalises: want good relationship advice? Just dump your entire texting history into a massive LLM context and have it spit out, in mere seconds, advice that no human could've figured out without hours of reading and digesting. Humans would then likely become just consumers of wisdom, rather than producers of it: all the experience- or history-based insights you might eventually have would be told to you by AIs faster than you can have them.

AIs may also do away with written knowledge, by instead passing direct vector representations of concepts among each other. It's already known that the vector representations of separate training runs and even different architectures can often be translated between each other with little modification, suggesting that translation issues wouldn't be a blocker. There are some reasons why language-like discrete representations may be quite fundamental (note that all of language, code, maths and music are written in discrete symbols). However, if these reasons aren't strong enough, we might end up with most information about the world existing only in vector representations, except when humans specifically ask an AI for an explanation.
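A toy illustration of the kind of "translation" involved: fit a linear map between two models' embedding spaces from a set of shared anchor concepts. Random vectors stand in for real embeddings here, and this is just one simple way to do the alignment, not a claim about any specific published method:

```python
# Toy illustration: align two models' concept embeddings with a least-squares linear map,
# so vectors produced in model A's space can be read in model B's space.
# Random data stands in for real embeddings; purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
d_a, d_b, n_anchors = 256, 512, 1000

# Pretend these are embeddings of the same n_anchors concepts in two different models.
emb_a = rng.normal(size=(n_anchors, d_a))
true_map = rng.normal(size=(d_a, d_b)) / np.sqrt(d_a)
emb_b = emb_a @ true_map + 0.01 * rng.normal(size=(n_anchors, d_b))  # B's view, plus noise

# Fit the translation map W (A-space -> B-space) by least squares on the anchor concepts.
W, *_ = np.linalg.lstsq(emb_a, emb_b, rcond=None)

# Check how well an unseen concept vector translates.
new_a = rng.normal(size=(1, d_a))
predicted_b = new_a @ W
actual_b = new_a @ true_map
cos = float((predicted_b * actual_b).sum() / (np.linalg.norm(predicted_b) * np.linalg.norm(actual_b)))
print(f"cosine similarity of translated vs actual embedding: {cos:.3f}")
```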

Avoiding strategy errors

A supposed benefit of being wise over just smart is avoiding large-scale errors, where individual actions are clever and make sense but the end result is silly. As the saying goes: "it doesn't matter how sharp your axe is if you're cutting the wrong tree". Making good high-level decisions is generally called strategy, so I'll use that term.

I'll discuss two types of strategy errors, connect them to the amortised versus direct optimisation frame, and suggest how AI will affect them.

But first, why is getting large things right something that relies on amortised optimisation? You can run a search process that operates on big things and answers big questions (for example, here is someone explicitly drawing search trees over large-scale US-China war actions). Many of the best uses of machine learning, the ultimate modern amortised optimisation technique, are for doing small steps (like next-token prediction) well. So why is getting large things right on the side of amortised optimisation?

Any direct optimisation process means running a search. If you're trying to plan something big, that's going to take multiple steps in a complex world. If you're running a search over multi-step plans, the number of possibilities to search over quickly blows up. There are two choices:

  • Use heuristics to prune the search tree. How do we get those heuristics? Amortised optimisation. For example, AlphaGo's high-level architecture is good old Monte Carlo Tree Search (MCTS). But the search through the tree is guided by a value network (that estimates the probability of winning from a position, to limit the requirement to search over subsequent moves to figure out how good a position is) and a policy network (that guides decisions over which parts of the search tree to explore first, to reduce the total amount of search required). Both the value and policy networks are deep neural networks trained on lots of data. Amortised optimisation saves the day. (A stripped-down sketch of this kind of guided search follows after this list.)
  • Do the search on a higher level of abstraction. Governments planning their nation's grand strategy do not push around individual division on maps, they instead chunk things together until they're thinking about allies and fronts and economies. To do good planning on the more abstract, chunked level requires a good model of how those chunks act. There seem to be two ways to get this:
    • First, you can chunk history into pieces at the same scale you're thinking at, and look at the patterns: when a nation that looks like this fought a nation that looked like that, the results tended to be this, and success correlated with how well they did X. But this requires a lot of history to learn patterns from - in other words, you're making use of amortised optimisation.
    • Second, you can be good at modelling things. If you have a good-enough model of people, economics, game theory, and war tactics, you can probably derive many of the patterns for what large-scale moves will be good, even without access to a lot of history about which nations win wars with which strategies. Doing this well does require searching over alternatives and thinking on-the-fly - that is, direct optimisation. I'd also guess there's some kind of important "model-building skill" involved. Part of this is probably amortised optimisation to figure out what model types have worked well in the past. Another part is amortised optimisation over how those smaller primitives work (unless you're doing maths or extrapolating known fundamental physics, you always need some histories to fit your model to). I'd claim that it's hard to be good at modelling things without relying a lot on amortised optimisation, but I admit some confusion over how "modelling skill" fits into this framework, or where it comes from more generally.
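To make the pruning-heuristics bullet concrete, here is a heavily stripped-down sketch of AlphaGo-style tree search guided by policy and value functions. The `game`, `policy`, and `value` objects are assumed stand-ins for the game rules and the trained (amortised) networks, and player-perspective handling is omitted for brevity:

```python
# Minimal sketch of policy/value-guided tree search (AlphaGo-style).
# Assumed interface: game.next(state, move), game.is_terminal(state);
# policy(state) -> {move: prior probability}; value(state) -> estimated outcome.
# For brevity this ignores flipping the value between the two players.

import math

class Node:
    def __init__(self, state, prior):
        self.state, self.prior = state, prior
        self.children = {}                  # move -> Node
        self.visits, self.value_sum = 0, 0.0

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    """PUCT rule: trade off the backed-up value (Q) against the policy prior."""
    def score(move_child):
        move, child = move_child
        u = c_puct * child.prior * math.sqrt(node.visits) / (1 + child.visits)
        return child.q() + u
    return max(node.children.items(), key=score)

def search(root_state, game, policy, value, n_simulations=200):
    root = Node(root_state, prior=1.0)
    for _ in range(n_simulations):
        node, path = root, [root]
        # 1. Selection: walk down the tree, guided by the policy priors.
        while node.children:
            _, node = select_child(node)
            path.append(node)
        # 2. Expansion: the policy network assigns priors to the new children.
        if not game.is_terminal(node.state):
            for move, p in policy(node.state).items():
                node.children[move] = Node(game.next(node.state, move), prior=p)
        # 3. Evaluation: the value network replaces a full rollout to the end of the game.
        v = value(node.state)
        # 4. Backup: propagate the value estimate up the visited path.
        for n in path:
            n.visits += 1
            n.value_sum += v
    # Act by visit count, the standard robust choice.
    return max(root.children.items(), key=lambda mc: mc[1].visits)[0]
```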

Mistakes in strategic planning

Doing high-level strategic planning well requires being able to run a search over high-level plans well. We discussed two ways to achieve this:

  1. Using amortised optimisation to learn the associations between high-level actions and their outcomes.
  2. Having a good enough model of the lower-level details to be able to extrapolate how the high-level plans would go. This likely requires a lot of existing amortised optimisation about those lower-level details. It also likely requires something like simulation, which in turn requires having good models.

The first recommendation for being better at strategic planning would then be to know the relevant histories, so you can do the amortised optimisation thing of applying past lessons. However, if we're worrying about something like strategy for dealing with a big new thing like transformative AI, this approach is limited because there aren't many good historical analogues.

Therefore, making wiser choices about AI likely relies in large part on having models of things other than AI strategy itself that are good enough that you can extrapolate the consequences of different AI strategies from them. This can be done top-down (find even more general empirical principles, of which AI is a special case) or bottom-up (simulate lower-level principles to try to figure out how the bigger picture of AI works).

A fair amount of discussion on LessWrong is about very general patterns of how the world works. To take one example, John Wentworth has written on agency and general-purpose search, and the difficulties of delegation. This can be interpreted as trying to distil the way the world works to fundamental blocks general enough that everything from AI to Amazon reviews for air conditioners falls out as a special case ~~and also as everyone involved getting nerdsniped - says I, while 9000 words into an overly-abstract essay series on the automation of wisdom~~.

This is the top-down approach. A lot of it is fundamentally about distilling a large body of data into some general approximate model that can then be queried cheaply - amortised optimisation. However, as with any theory-building process, there's also a lot of direct optimisation involved (e.g. searching over a space of ideas). I'd guess this sort of work is close to AGI-complete, and I'm uncertain what it's bottlenecked on.

On the other hand, there's the bottom-up approach. For example, Eliezer Yudkowsky's focus on coherence theorems seems to be due to a model where non-coherent agents predictably lose or self-modify, and therefore eventually we're dealing with coherent (i.e. Bayesian, expected-utility-maximising, goal-driven) AI agents. This is a high-level (and contested) prediction based on extrapolating the behaviour of more basic primitives forward (and where the argument mostly does not make reference to prior histories). This post by Ajeya Cotra and this post by Leopold Aschenbrenner present very simulation-style arguments that take a model of how AIs and the AI race work and extrapolate it forward.

The full version of bottom-up strategy work also feels hard for foreseeable AIs to fully automate. There's a significant direct optimisation bottleneck, especially if trying to predict the behaviour of actors with large action spaces that themselves have a lot of direct optimisation at hand. However, there seems to be a clear path for AI to help. Even current LLMs could do a decent job of extrapolating the consequences of simple maths, or of role-playing the decision-making of given actors. Scenario-planning exercises and simulations, from Pentagon war games to Intelligence Rising, are useful for decision-makers and can reveal surprising options like nuking Belarus. AI can make these cheaper, by reducing the human mental labour needed to run good ones (a toy sketch of what this might look like follows below). This could help explore and evaluate possible strategies when we don't have much history to do amortised optimisation on.
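As a toy sketch of an LLM-run scenario exercise, under the assumption that a model role-plays each actor and a "narrator" call resolves each round: the `llm()` helper is a hypothetical stand-in for a completion API, and the actors and scenario are purely illustrative.

```python
# Toy sketch of an LLM-run scenario exercise: each actor is role-played by the model,
# and a narrator call updates the world state after each round. Illustrative only.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in a real completion API here")

def run_scenario(scenario: str, actors: list[str], n_rounds: int = 5) -> list[str]:
    history = [f"Initial situation: {scenario}"]
    for round_idx in range(n_rounds):
        moves = []
        for actor in actors:
            prompt = (
                "\n".join(history)
                + f"\nYou are {actor}. Describe, in one paragraph, the single most "
                  "plausible action you take next, given your interests and constraints."
            )
            moves.append(f"{actor}: {llm(prompt)}")
        # A 'narrator' call resolves the simultaneous moves into a new world state.
        update_prompt = "\n".join(history + moves) + "\nNarrator: summarise how the situation has now changed."
        history.extend(moves + [f"Round {round_idx + 1} outcome: {llm(update_prompt)}"])
    return history

# e.g. run_scenario("A leading lab announces a major capabilities jump.",
#                   ["Lab A", "A rival lab", "A national regulator"])
```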

Framing mistakes

In addition to bad strategy, another error of insufficient wisdom is having the wrong frame / ontology / paradigm. This is the kind of mistake you correct when you notice the Earth isn't the centre of the universe, or that a watch doesn't imply a watchmaker. It's when you go to your wise mountain guru to ask how to finally find that treasure, and they tell you the real treasure is the friends you made along the way.

If there were a simple explanation for how paradigm shifts are found, many famous philosophers of science would be a lot less famous. However, a paradigm shift takes at least two things: someone has to discover the new framing, and many someones have to care.

Current LLMs are bad at anything like finding paradigm shifts. Consider how many people use LLMs, and the lack of any breakthrough, even of the size of a decent research paper, where LLMs were claimed to be the main source of the concept. And paradigm shifts are much harder than research papers.

Inventing important moral principles, like utilitarianism or the categorical imperative, seems even harder. Bentham's and Kant's search for moral principles was presumably guided by trying to make precise their intuitive human ethics. There's some amount of data with which they could've succeeded without having those built-in moral intuitions, but it seems very helpful to have had those intuitions in their heads as something they could easily query.

It's in having people care that AI might have a larger effect, and maybe a bad one. First, one driver of paradigm shifts is that someone gets annoyed by complexity or ugliness, and wants to fix it. By making mental labour cheaper, AI might reduce the ickiness of bad models. Consider how geocentrism might have lasted longer if it were trivial to ask your Jupyter notebook copilot AI to add a few more epicycles to your model, and it didn't mean more laborious longhand arithmetic for you. Second, increasing use of AI might mean that less and less of the paradigm-incompatible data is seen by human eyes. Imagine if the AI adds the epicycles in the background to improve the model, without any human ever noticing. Potential fixes might be keeping simplicity and elegance as key cultural values in scientific fields, and somehow propagating this to the AIs (while it has many problems, xAI's "curious AIs" plan has some of this spirit).
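To make the epicycle quip concrete: with cheap computation, "adding a few more epicycles" is literally turning a single knob in a curve fit. A toy example on synthetic data, using only numpy; the frequencies and data are made up for illustration:

```python
# Toy illustration of how cheap it is to "add more epicycles": fit an apparent angular
# position with an increasing number of sine/cosine terms (epicycles). Synthetic data;
# the only point is that n_epicycles is a single parameter to crank up.

import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 10, 500)
observed = np.sin(2 * np.pi * 0.13 * t) + 0.4 * np.sin(2 * np.pi * 0.61 * t) + 0.05 * rng.normal(size=t.size)

def epicycle_fit_error(t, y, n_epicycles):
    """Least-squares fit of y(t) with n_epicycles sine/cosine pairs; returns RMS error."""
    freqs = np.arange(1, n_epicycles + 1) * 0.1
    design = np.column_stack(
        [np.ones_like(t)]
        + [f(2 * np.pi * w * t) for w in freqs for f in (np.sin, np.cos)]
    )
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ coeffs
    return float(np.sqrt(np.mean(residual ** 2)))

for n in (1, 3, 6, 12):
    print(f"{n:2d} epicycles -> RMS error {epicycle_fit_error(t, observed, n):.3f}")
```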

Conclusion

The description of wisdom above may feel reductive: is a lot of it really just applying past data and the results of past computation to current problems? Is a lot of the task of improving our civilisation's wisdom, whether through AI or human actions, just the task of storing and effectively using written knowledge, letting selection processes like cultural evolution build up impressive results, and training opaque (natural or artificial) neural networks to compress insights from data?

A good explanation should feel somewhat reductive, though. Taking this perspective, the picture that emerges is one where much of the wisdom needed for wise action arises almost automatically. Wisdom's signature is less brilliant mental moves, and more what's left standing once time and chance have taken their toll, or once a training process has finished compressing data into a brain or transformer. The most worrying thing, then, is the systemic advantages that AIs likely have, which might lead to them taking a dominant role in the use and production of wisdom. For example, human success is based on cultural evolution, but AIs might be better than humans at it, and we should take care to direct AI cultural evolution in a good direction.

AIs are likely to be helpful in many ways, though: for example, by helping distil existing bodies of work, by helping simulate scenarios as part of strategy planning, and by generally becoming a wise guru that knows everything and that everyone has access to all the time. However, it's still unclear when and how they might help with other parts of wisdom, like avoiding ontology errors.