Two Buckets
I am now a ‘fieldbuilder’. I help run the Cambridge AI Safety Hub, and the switch has come with a mild case of tonal whiplash. I expect to say more about this later on, but for now I am here to announce the ‘Alignment Desk’ -- a writing accountability group that we are running. To cultivate a sense of kinship with the rest of the Alignment ‘deskers’, I am joining in on the fun: I will be writing four pieces in the next two months.
The first piece tries to consolidate my thoughts on the state of AI safety fieldbuilding. Given the nature of the programme, these posts are written in a hurry.
—
A few years ago, before all of the mass-media marketing attempts to make AI safety mainstream, I felt enthusiastic about the average person doing work in this field. Not because I thought their research areas would ultimately work out, but because they would learn from the work, and if they decided it was a dead end, they would accept that and deploy their talents elsewhere after some rigorous evaluation of their options. I know multiple people who took this counterfactual use of their time seriously in the earlier part of this decade.
Now the field has more attention and more money, and most people want to keep hammering away at things they don’t reasonably expect to work out, because it is now easy enough to publish. Given how little established research there is in AI safety, there is plenty of low-hanging fruit to spend a few months on, and you stand a decent chance of getting a paper into NeurIPS, ICML, or your favourite conference. Since publishing is this easy, most people naturally keep working on things that fit the mould. But not every fruit needs picking. Not every conference paper is useful. Not everyone should work on AI safety.[1] Most fieldbuilding programmes do not have good proxies for impact, and the proxies we do have -- conference papers, career switches -- aren’t necessarily the ones we should be optimising towards.
Most people are normal people. They want good pay, a stable life, and to feel like their work at least contributes something. The cost of quitting your job because you no longer believe the work will pan out is high, and I do not wish to pretend otherwise. Motivated reasoning has a strong hold on all of us. I do wish more people stopped and reassessed what value their work was generating. But the same attention and money that produce this dynamic also attract very talented, agentic people who would not have looked at AI safety before. I have to work with both of these realities. So I think of two buckets to fill.
The first is the ‘gets it’ bucket. Those who ‘get it’ have deep threat models, can navigate between different research directions, and can settle on what to prioritise. People like Marius Hobbhahn, Beth Barnes, or Neel Nanda. People in this bucket can set a good overall strategic direction and make tough calls. Apollo’s decision to move away from interpretability work last year is a good example. This bucket is extremely hard to fill, and harder now to spot because there is more noise. If CAISH finds the next Neel or Beth, I would consider it an outstanding success. Perhaps most of the impact of my life will turn out to be located here.
The second is the ‘cracked’ bucket: technically excellent people who can execute at a high level, but who haven’t developed deep views on what the field should prioritise and are not particularly interested in AI safety as an intellectual exercise. This looks like the CS student headed for Jane Street who is then presented with AI safety as an exciting problem they can get paid well to tackle. This bucket is far easier to fill. Cambridge has hundreds of people who fit the description. You cannot philosophise them into good strategic takes. You need to get them to work. Most cracked maths students want to do maths. They do not want, or perhaps even need, to hear about instrumental convergence.
Those who ‘get it’ need to set the strategic direction. They have a clearer sense of what to work on, so they should decide what gets prioritised. Those who are ‘cracked’ execute on that direction to a high standard. Getting this the wrong way round -- letting someone with no strategic sense lead an agenda because they’re competent -- would be an obvious mistake.
What does this mean for CAISH? We need programmes that cater for these buckets differently. For the ‘gets it’ bucket: reading groups, writing accountability setups like the Alignment Desk, and research fellowships where people can spend time developing their models of the world. Potentially giving them opportunities to found organisations too.
For the ‘cracked’ bucket: hackathons, compute that their university departments won’t give them, access to researchers, and concrete projects to work on. These people should not waste their time developing threat models. They should do an entry-level programme like MARS or SPAR and be ready to get to work afterwards. They do not need to walk through the long funnel of increasingly competitive programmes.[2]
The ‘cracked’ people also care about what they’re joining. Competent people want to interact with competent setups. CAISH has historically been disorganised, and a lot of my strategy right now is fixing this: putting good systems in place, making the website more professional, branding well on LinkedIn, and making CAISH an environment that feels exciting to be part of.
The two-bucket model also resolves something that has been vexing me. I have often been shocked, talking to smart participants on expensive programmes, by how poor their arguments are for why their work matters. Now I see that this is fine -- they are in the ‘cracked’ bucket.[3]
I am still workshopping this and expect my sense of it to clarify over time. The two-bucket model fits technical work well, but policy is harder. Policy work requires someone who ‘gets it’, but ‘getting it’ means more there -- it is not enough to understand the threat landscape, you also need to know how to navigate institutions, build coalitions, and win.[4]
Thanks to E for helping me put my thoughts on this together.
1. This could be wrong -- I at least get the sense that the bar has dropped.
2. I sometimes wonder whether the ‘gets it’ bucket is actually already fairly full -- whether we already have enough people with good threat models, and the real bottleneck is just getting more ‘cracked’ people to work on the right things.
3. Relatedly: if a programme cares about counterfactual value, then maybe it is fine if participants don’t yet know why what they are doing is worth it. They will develop this over time.
4. It is also suspiciously convenient that all of this separates into exactly two buckets. The buckets might have their own buckets.
—
Glad you wrote this out!
I think the buckets are less of a binary than you present them as. While these profiles are largely accurate imo, I think it is important that the cracked technical people actually do spend ~some time thinking about threat models and strategy. Dave Banerjee just wrote a post on this that I like here: https://davebanerjee.substack.com/p/if-you-dont-feel-deeply-confused (with some thoughts I add in the comments).
My guess is that we do still need more of the "gets it" people, but also that these people are not always best positioned to have impact. I think some end up in the latter bucket without realising their strategic edge. I also think some of these people likely work too much in isolation: you have lots of good thinkers but not much cross-talk between them, meaning the strategy and ideas do not spread as much as they could.