Category: Psychology

You can just do things, but you don’t have to
Every big software engineering team right now is racing to out-do themselves on their adoption of agentic coding practices, and ship faster. There is something more insidious going on with many of the software engineers I talk to¹ though. A lot of pressure to build “more! faster!” comes from themselves.

This shows up all over: the “you only have 2 years to escape the permanent underclass” meme², or the various breathless LinkedIn or Twitter posts of 996’ing startups, labs, or particularly obsessive interns.

Things that used to require teams can now be done by a sufficiently keen solo engineer with a gang of Claudes, or Codexes, or a K2 agentic-swarms. That is thrilling, and it opens up the door to projects that you wouldn’t normally have bothered building. But it also open the door to thinking you need to build those things, and that’s not quite the same.

One of the observations of most people that take an extended leave from a large corporation is that much of the work they were doing wasn’t all that important. Either no one did it while they were out, or how they left it was… fine. Yet, much of that work somehow regains urgency as they come back to the role.

It’s very hard to tease apart how much of your output actually matters. Coordinating a large group of people inevitably takes overhead, and so many annoying aspects of work are genuinely important. But, much like Wanamaker’s famous quote about advertising, half of the work you do doesn’t matter, the trouble is you don’t know which half.

Adding a helpful and harmless model to the mix can certainly accelerate the rate of output, but it doesn’t do much about determining which bucket the work goes into. In fact, I’d say that the problems you take on when given a Max subscription are mildly more likely to to be things that haven’t been done because they are not worth doing. The combination of increased capacity and a pervasive sense of urgency is not a great recipe for quality decision making, or for a healthy relationship with your work.

It can be helpful to take the outsider perspective, at work or with personal projects. Would ask you someone else to do whatever you are considering, even with the expectation they would leverage agents to help them?

It’s often easier to see the value in something, or lack thereof, if you have to convince someone else of it. That can save you from some rabbit-holes filled with a sense of obligation to “extract value” from the time you already sunk into a misguided project.

This doesn’t mean you should ignore all of the ideas you have: you really can just do things, and you sometimes should! Just be clear about whether you want to spend your time³ that way, regardless of what the agent is doing.

Footnote:
1. Including myself! ↩︎
2. I appreciate Scott Alexander’s contribution on this topic: “You only have X years to escape permanent moon ownership” ↩︎
3. I didn’t actually quote him but everything about this article feels like a poor software engineer’s Oliver Burkeman, so you should just read him. ↩︎
February 2, 2026
Values in AI
Daniel Schmachtenberger has made the argument:
1. All technologies embody value systems
2. Some technologies are obligate in a competitive environment
The example of his that stuck with me was the plough: many cultures were animistic (a belief in the spirit of the animal), but after the scaling up of agriculture enabled by the plough, most weren’t. The plough’s enablement of large-scale agriculture likely shifted societies toward sedentism (vs nomadism) and surplus, altering spiritual relationships with animals as they became tools for labor. The perspective shift — the value it encodes — is embedded in the technology.

The plough is also obligate. If one group uses it and other doesn’t, the group that does will be able to farm more per person. That surplus enables for more specialization, which yields an advantage either in terms of trading or conflict. If the second group doesn’t adopt the plough they will be taken-over, outgrown, or both, by the first.

AI may well be an obligate technology, which forces us to make deliberate ethical choices about its deployment and values. We are in the early stages of seeing that with software development. That’s going to change the nature of certain careers: changing what the day-to-day work looks like and impacting demand for software engineers. That isn’t necessarily negative: it will depend on the opportunities that replace the current ones. It also isn’t neutral: our approach to AI, how we deploy it, how it is used are all a series of choices that embed values.

Some of those values are encoded into the models by the training data and loss functions, some are encoded in the systems engineering, the choices of which tasks to apply it to, which interactions to explore and so on, and some are explicitly engineered in through fine tuning and reinforcement learning.

One way of looking at those values is through the study of ethics, how to live in a just way. This is a core topic for philosophers. One example is Kant’s Categorical Imperative, which requires actions to follow maxims that could be universally applied without contradiction, ensuring rational consistency.

It’s somewhat akin to asking the question: Would I still support this if I knew everyone else would act this way? Further, would I support this action if knew I would be born again randomly into the world, maybe in a much different situation than my one now?

The proliferation of useful AI agents adds a somewhat realistic flavor to the question: if, in the future, you are dependent on systems constrained by these specific guidelines or rules , are you happy about that?

Kantian (or deontological) thinking is far from the only ethical system. A lot of thinking about AI ethics has been consequentialist. Consequentialism is practical: the “goodness” of an action is whether it results in a good outcome! Inherently we judge AI training (at least for RL and supervised learning) by the achievement of the outcome encoded in a loss function, reward function or similar. Stuart Russel (of & Norvig fame from university courses of my youth) has written about “provably beneficial” AI where the AI maximizes a human-involved reward signal (a little like the Assistant Games pattern we discussed before).

The downside of all this is well documented — Nick Bostrom’s famous paperclip maximizer thought experiment is an AI that achieves the objective, but in a way that was undesirable. A more benign but annoying example might be a cleaning robot that pushes everything outside the house in order to make it tidier. Because outcome-based rules just judge the what, and now the how, they can also encourage power-seeking (as called out by Bostrom) in order to better achieve objectives.

standard forms of consequentialism recommend taking unsafe actions when such acts maximize expected utility. Adding features like risk-aversion and future discounting may mitigate some of these safety issues, but it’s not clear they solve them entirely.

Deontology and safe artificial intelligence – William D’Allesandro

Anthropic’s constitutional AI approach can be seen as a blend of approaches; the constitution is a set of principles that can be used by another AI to criticize and improve output in response to requests:

As AI systems become more capable, we would like to enlist their help to supervise other AIs. We experiment with methods for training a harmless AI assistant through self-improvement, without any human labels identifying harmful outputs. The only human oversight is provided through a list of rules or principles, and so we refer to the method as ‘Constitutional AI’.

The training still ultimately uses a form of reinforcement learning (which is inherently consequentialist), but the reward is given according to how well the outputs adhere to the constitutional principles.

A more recent philosopher, Derek Parfitt, argued that all moral systems were hill climbing towards a shared perspective, and you can evaluate an action on multiple in order to gain confidence. For example, when considering an option, you could ask:

a) Would it maximize overall good? (consequentialist)
b) Could everyone rationally will it? (Kantian)
c) Could anyone reasonably reject it? (contractualist¹)

“Rationally” here is doing a bit of work: it means “with reasoning”, as in there is a chain of thought that can support and justify the decision.

Part of the challenge with rationalism is that part of the reward signal here is coming from human raters. We have seen this play out with LMSys where models which are “friendlier” score better, and in a more extreme version in the ChatGPT 4O misalignment where the model became excessively sycophantic in a way that resulted in better rewards in short doses, and didn’t impact any of the quantitative evaluations, despite being an overall negative to the experience.

As we move into more agentic systems we often have fewer tools to evaluate or make visible the values we are encoding, but we are still doing it!

For example. Google’s recent AlphaEvolve project uses Gemini underneath, which is an LLM that can be evaluated and aligned. But on top of that it uses an evolutionary search scheme (another reminder of Rich Sutton’s bitter lesson) to generate different prompts and evaluations and iterate towards a new, externally defined goal: in that case generating better algorithms and code. We are searching for superior outcomes, but that search itself is -somewhat unconstrained by other values: it’s a more consequentialist approach.

The current crop of agentic coding tools often recommends encoding preference data into a project specific file. For example, Claude Code recommends a CLAUDE.md file
- Include frequently used commands (build, test, lint) to avoid repeated searches
- Document code style preferences and naming conventions
- Add important architectural patterns specific to your project
- CLAUDE.md memories can be used for both instructions shared with your team and for your individual preferences.
While it presents them as memory, the idea here is to guide the choices of the model in a way that aligns with the principles by which the project being modified is managed.

OpenAI have published work in allowing hierarchies of instruction: The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions | OpenAI

we argue that one of the primary vulnerabilities underlying these attacks is that LLMs often consider system prompts (e.g., text from an application developer) to be the same priority as text from untrusted users and third parties. To address this, we propose an instruction hierarchy that explicitly defines how models should behave when instructions of different priorities conflict.

As well as using a single model that can incorporate different safeguards, we can use models themself to verify actions and outputs. Verification is generally an easier problem than generation, so a model that is unable to consistently follow a set of principles may still be able to validate whether a given example does or does not follow them.

LlamaGuard is a good example of this kind of system, built and released by Meta’s GenAI team alongside Llama. One example of seeing this process in the wild is OpenAI’s safety systems on 4O image generation. Inherently agentic, 4O can generate image ideas, then the image itself. Despite the model having constraints on it, it will happily generate things which violate OpenAI’s content policy, necessitating a monitoring model that whisks them away before a use can access a violating image.

If AI becomes an obligate technology, we will benefit from encoding values intentionally, balancing outcomes, universal principles, and fairness. The challenge is ensuring these choices reflect the world we want, not just the one we’re competing in.
1. Another theory of ethics that weights mutuality heavily: it’s frames ethical considerations as something derived between people rather than just based on outcomes or on abstract principles. Its featured particularly in Scanlon’s What We Owe to Each Other for those, like me, who get all of their ethical understanding from watching The Good Place ↩︎
May 23, 2025
Useful Reasoning Behaviors
[2503.01307] Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

Very useful insight in this paper out of Stanford.

Test-time inference has emerged as a powerful paradigm for enabling language models to “think” longer and more carefully about complex challenges, much like skilled human experts. While reinforcement learning (RL) can drive self-improvement in language models on verifiable tasks, some models exhibit substantial gains while others quickly plateau.

The authors were running a reasoning post-training process on both Qwen 2.5 3B and Llama 3.2 3B. They noticed that while both learned, Llama was consistently worse than Qwen, which feels odd as both models are strong. In looking at the reasoning approaches exhibited they observed 4 distinct reasoning strategies:
- verification
- backtracking
- subgoal setting
- backward chaining
They noticed that Qwen exhibited these behaviors more from the base model, and those behaviors were enhanced more in the RL process.

While the larger Llama-3.1-70B showed generally increased activation of these behaviors compared to Llama-3.2-3B, this improvement was notably uneven — backtracking, in particular, remained limited even in the larger model.

They then generated some custom reasoning traces that intentionally demonstrated all 4 behaviors, using Claude.

We generate these datasets using Claude-3.5-Sonnet⁴, leveraging its ability to produce reasoning trajectories with precisely specified behavioral characteristics. While Claude does not always produce the correct answer (see Fig. 9), it consistently demonstrates the requested reasoning patterns, providing clean behavioral primitives for our analysis.

They found that when using the SFT set before RL they closed most of the gap between Llama and Qwen. They also found that it isn’t even important that the reasoning traces are correct – demonstrating the behavior is more important than the reasoning itself at the SFT stage.

Priming models with cognitive behaviors, by a small amount of finetuning, enabled significant performance gains even in models that initially lack these capabilities. Remarkably, this holds even when primed with incorrect solutions that exhibit the target behavioral patterns, suggesting that cognitive behaviors matter more than solution accuracy.

This fits with the elicitation idea — the SFT is training a style. By having increased activation of the reasoning styles the RL process is more able to explore these capabilities and reinforce extended reasoning generation.

This also fits with my mental model that a base model’s capabilities are often pretty underexplored: a combination of targeted SFT + RL seems to be a very powerful elicitation tool!
March 28, 2025
Engineering Culture at Meta

A question I’ve been asked recently is how Meta compares to other places I’ve worked, or what makes it different.

From my conversations and observations, those who disliked working at Meta often cited chaos, short-term focus, and internal politics, while those who liked it called out autonomy, speed, and the feeling they could work on important projects. To explain that disparity, I refer to three values or themes that shape the work culture.

Individuals are responsible for doing impactful work: “impact” is an important concept at Meta, and having a level-appropriate collection of impactful work at performance review time is important for every engineer. If you find yourself in a situation where your impact feels limited, you are generally responsible for exploring ways to address it, or make a change.

This means that ICs (individual contributors) at Meta are willing to cross team boundaries to find important work, and will also gravitate towards highly visible projects. They care about how their work is regarded and how it fits in to the wider organization. Internal mobility is fairly easy, so folks will leave teams if they can’t find the right kind of work. Its also reflected in the growth expectations: when I worked at Google and Lyft they also had expectations that IC3s would become IC4s, and IC4s become IC5s (though Google later removed this part), but the timelines were somewhat soft. At Meta, they are firm, and expectations ramp at defined intervals as you approach the boundaries.

Practically, this means ICs should expect to identify and collaborate on projects that align with organizational goals and take initiative to push them forward. Managers and leadership provide support, but success heavily depends on individuals ability and desire to chart a path, adapt as needed, and ensure their contributions are visible. In general, there’s a strong bias towards getting things done, getting things out, “rough consensus and running code”.

Dave Anderson has written about how much more helpful he found teams at Meta, vs Amazon where there was a lot of horse trading for collaboration. Part of that is driven, I think, by this responsibility for impact. Having another team’s thanks, or enabling results for them, allows you to claim some credit for their impact with relatively little effort. Conversely, intentionally blocking another team can be seen as gatekeeping, which is frowned upon.

No gatekeeping: Some version of “Move Fast” has been in Meta’s official values for a long time, and the company still operates at a good clip. Part of that is aided by generally making it easy to go make changes wherever they need to be made. One example of this I use with Google folks is OWNERS files. Google and Meta are both monorepo based, but at Google you have sets of services with clear owners, and touching code in another team’s service requires their full blessing. Meta also operates a monorepo, but there is much more fluidity — folks can land changes anywhere they need to. In part this is because the original Meta product, Facebook itself, is a monolith, but there is a deeper cultural aspect here.

For example, even very senior engineers will very rarely say “no” to something. Instead of outright rejection, feedback is often framed as suggestions or concerns to consider. This encourages risk-taking and innovation, but it also places a significant responsibility on engineers to weigh feedback carefully, address risks , and seek out champions from stakeholders. This is one source of the disconnect between folks who see Meta as a place where it’s ok to take risks and folks who don’t: if you take a risk, it fails, and at PSC (performance review) time someone affirms they called out the problems that occurred and you didn’t take appropriate measures, you will be dinged. I have seen people interpret this kind of feedback as nits or suggestions, rather than weighing it heavily and convincing others that the risk is well managed ahead of time.

In general, folks are expected to be helpful, to provide guidance to others, and not to put up walls, so it can be a tricky balance when outside teams or other engineers come in and ride roughshod over a team’s plans or projects. As an engineer, escalating misalignments on goals/priorities to management is usually well supported, and as a manager, putting engineers together to get to technical solutions across teams is expected. Enacting hard blocks where one team can’t achieve their goals because another was in the way is less so.

The heavy dependence on individuals and relationships, particularly for cross-team projects, is another key theme:

It’s a social company: Somewhat unsurprisingly for a company that started around a social network, Meta is a pretty social company. The internal social network is a firehose of information, and there are deep networks of connections across the company between senior engineers, managers and executives.

Its important for engineers to talk about their work in order to find folks interested in it, build connections and relationships with them, and have a good sense not just of their org but the universe of organizations that they operate within. A fairly common failure pattern is to build a good relationship with one side of an org and ignore another, developing a sizable blind spot that later comes back to be a problem.

The official org chart is the secondary and lagging structure at the company. The more important structure is the informal network of relationships that where many things get done, and decisions get made.

This dynamic can sometimes feel political, which is why some describe Meta this way. While there are large-company politics (it is a large, influential company), for most people, it’s less about traditional power politics and more about navigating cliques and informal networks of folks who have worked together on multiple projects and have mutual trust and respect. The company leans a lot on strong, senior engineers to drive projects to success, and those folks may not report into the org that is officially doing the work, or may be at a lower or higher level in the org chart than you might expect. They will often work by going directly to the people they know to unblock issues, drive important changes, or get alignment on a controversial decisions.

For big enough changes, org structural changes do follow, but they usually lag rather than lead the work itself.

What are the downsides and upsides?

As folks who have had a bad time can attest, Meta can be chaotic. There can be parallel implementations, people can swarm on important projects to the detriment of those trying to work on them, and less impactful projects can end up unowned and passed around. It can feel like information overload, with more being published than you can possibly follow. At the same time, there’s can be an information drought when truly important conversations happen in small, exclusive groups. For example, I’ve seen feedback about a project be shared openly between a small collection of engineers and leaders, without ever clearly reaching the team responsible for developing it.

The flip side is that this combination of values allows Meta to pivot surprisingly fast. Changes that would have required months at some companies can be kicked off in a day, particularly by very senior, well-connected leaders. Senior ICs can take problem descriptions, and quickly form an idea of who might have thoughts on it, and pull them in. Soon you have a loose group (often later structured into a “v-team” or virtual team) that can quickly align and drive change. The lack of gatekeeping, both technically and culturally, reduces the corporate immune reaction to large changes. The incentive to individual impact encourages folks to jump onboard important things without having to work out if that means a team change, or what their long-term situation might be.

December 11, 2024