Essay №07 AI · linguistic · strategy

Briefing Is Not Chatting

Chat-style AI compressed the skill gap; agents reverse it — briefing rewards decomposition, specification and evaluation, the same managerial competences that educational capital has always distributed unequally.

The 2023 studies were clear: AI reduces inequality. The least skilled gain the most. Brynjolfsson, Li and Raymond documented an average productivity gain of 15%, with even larger impacts for novices in customer-support centres — and close to nothing for experts.[11] Noy and Zhang confirmed it for professional writing: ChatGPT compressed the quality distribution by lifting the bottom of the curve.[12] The conclusion looked robust: generative AI diffuses codified knowledge downward, levels performance distributions, compresses the advantage of the best. That story is empirically valid — for one precise category of tool. It becomes false, and perhaps dangerous, the moment you change category.

The move from chat to agents does not slide a cursor along a continuous scale: it reverses a direction. The skills that sat on the less-qualified side under the chat regime — following scripted instructions, injecting context into a conversational prompt — are precisely what becomes insufficient under the agentic regime. What the agentic regime rewards goes by other names: task decomposition, explicit specification of intent, critical evaluation of intermediate outputs, automation strategy. These are skills whose distribution tracks educational capital, not willingness to adopt.

Ken Liu pushes this intuition to its limit in "The Shape of Thought." Two species meet there with no shared language; understanding each other never consists in translating word for word, but in learning the grammar that structures the other's thinking — in Liu, language does not carry meaning, it reconfigures it. Agentic briefing is of the same order. Mastering the register of instruction does not speed up a task already understood: it installs a different relationship to the problem. The fault line, then, does not separate those who brief quickly from those who brief slowly, but those who think within the grammar of delegation from those who remain in conversational paraphrase.

essay

The Articulation Gap

As I argued in an earlier essay, linguistic capital already conditions access to the benefits of LLMs in chat mode. What follows documents that the shift to agents amplifies — rather than reproduces — that mechanism.

Compression as backdrop

The compression result is neither false nor anecdotal. It corresponds exactly to what economic theory predicts for a non-autonomous AI regime. In a general-equilibrium model where individuals are distributed by knowledge level, Ide and Talamàs formally prove that AI co-pilots — assistive tools with no capacity for independent initiative — benefit the least qualified first, because they do not compete with their production work and provide problem-solving assistance at a near-zero opportunity cost.[1] This is precisely what the 2023 studies observe: novices gain more than experts, the distribution contracts.

The BCG study — 758 consultants, GPT-4, randomised experimental protocol — documents this compression with unusually fine precision: on tasks lying within the model's current capability frontier, subjects at the bottom of the skill distribution are "the largest beneficiaries of AI use."[2] The result is real. It is also confined to a precise configuration: bounded tasks, a tool in co-pilot mode, users operating in the zone where the model excels.

The distinction between regimes — non-autonomous and autonomous AI — is not speculative. Ide and Talamàs formalise it in unambiguous terms: "Autonomous AI primarily benefits the most knowledgeable individuals; non-autonomous AI benefits the least knowledgeable. However, output is higher with autonomous AI. These results reconcile conflicting empirical evidence and reveal trade-offs in regulating AI autonomy."[1] The compression studies and the amplification studies are not contradictory — they describe two regimes. Ide and Talamàs say so themselves, naming Dell'Acqua and Noy & Zhang as belonging to the co-pilot regime, and anticipating that their results do not extend to "autonomous AI capable of performing independent knowledge work."[1] What current organisational deployments treat as an upgrade from one regime to another may well be a change of nature.

The linguistic-capital framework — developed in the earlier essay from Agirdag's work[3] — predicts that the ability to formulate instructions in an elaborated, decontextualised register is unequally distributed by class and schooling. Under the agentic regime, that initial inequality does not disappear: it reproduces itself at each of the four stages that briefing structurally imposes.

What briefing an agent actually demands

The most honest metaphor comes from a CHI 2024 Best Paper. Tankelevitch et al. (2024) write: "The metacognitive demands of working with generative AI systems parallel those of a manager delegating tasks to a team. A manager must understand and clearly articulate their goals, decompose those goals into communicable tasks, confidently evaluate the quality of the team's outputs, and adjust plans accordingly. Moreover, they must decide whether, when and how to delegate the tasks."[4] The metaphor is not rhetorical. It is operational.

"The metacognitive demands of working with generative AI systems parallel those of a manager delegating tasks to a team."

— Tankelevitch et al. (2024), p. 2

To brief an agent is to exercise a managerial competence. And managerial competences are not randomly distributed across the population. Four structural demands separate agentic briefing from conversational interaction.

The first is task decomposition. In a chained workflow, a problem must be divided into self-contained sub-tasks, each mapped to a distinct step with a corresponding prompt.[6] Wu, Terry and Cai compared, in the lab, a visible-chaining interface against a standard chat interface, in a two-condition protocol with N=20 participants (UX designers, linguists, data analysts, non-ML engineers).[6] Blind raters judging perceived quality preferred the chaining-interface outputs in 85% and 80% of paired comparisons depending on the task, about 82% on average.[6] But the most instructive datum is not that preference; it is the difference between types of chaining: users who designed their own chains achieved higher quality than those using pre-defined chains. And designing your own chains requires precisely the decomposition skill the scaffolding is meant to teach. One participant's verdict on the open interface is curt: "Too much freedom can be a curse."[6] Nine of twenty participants reported increased complexity and a steeper learning curve, and this on an already highly selected sample of technical employees at a large technology company.

"Too much freedom can be a curse."

— Participant P9, in Wu, Terry & Cai (2022), p. 12
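The chained workflow described above can be pictured as a small pipeline. What follows is a minimal sketch under stated assumptions, not the AI Chains interface studied by Wu, Terry and Cai: `Step`, `stub_model` and `run_chain` are hypothetical names, and the stub stands in for a real model call so that the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One self-contained sub-task with its own prompt template."""
    name: str
    template: str  # must contain the placeholder {input}


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call: wraps the prompt in brackets."""
    return f"[{prompt}]"


def run_chain(steps: List[Step], first_input: str,
              model: Callable[[str], str] = stub_model) -> List[str]:
    """Run the steps in order, feeding each output into the next step."""
    outputs = []
    current = first_input
    for step in steps:
        prompt = step.template.format(input=current)
        current = model(prompt)
        outputs.append(current)  # keep intermediate outputs for inspection
    return outputs


# Illustrative two-step decomposition (the task itself is an assumption).
chain = [
    Step("extract", "List the complaints in: {input}"),
    Step("draft", "Draft a reply addressing: {input}"),
]
results = run_chain(chain, "customer email")
```

Keeping every intermediate output is the point of the design: the person who wrote the chain can evaluate each step separately, which is also where the decomposition skill is spent.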

The second demand is the explicit formulation of intent. In performing a task by hand, "many unstated goals and intentions can remain implicit without ever being verbalised," Tankelevitch et al. note.[4] An email to a senior colleague implies a certain tone: the user knows it without spelling it out. Many generative-AI systems require that specification to be made explicit. The capacity to make explicit what was implicit is exactly what institutional socialisation distributes unequally. Zamfirescu-Pereira et al. (2023) documented this empirically in a prompt-design study with N=10 non-experts (professors, designers, engineers, researchers, all from prestigious academic settings: Berkeley, Cornell, Georgia Tech).[5] Participants "almost exclusively adopted an opportunistic, ad hoc approach to prompt exploration."[5] None of the ten spontaneously used the available systematic-testing interface, even after it was demonstrated. The authors identify two fundamental sources of difficulty: over-generalisation from isolated observations (giving up after a first failure, stopping after a first partial success) and the application of a human-to-human social lens to interactions with the model, to the point that some participants avoided effective designs after the interviewer had demonstrated their superiority.[5] These behaviours were not observed in experts, yet Zamfirescu-Pereira et al.'s sample was already selected upward. In a population with lower educational attainment, the failures would likely be more pronounced, not less.

The third demand is evaluating outputs under uncertainty. "Many workflows have shifted from content generation to content evaluation," Tankelevitch et al. summarise.[4] That shift requires "well-calibrated confidence" in one's own evaluative capacity, which is exactly what novices structurally lack: "If you don't know what you're doing, it can confuse you more," one participant remarks of long code suggestions.[4] Dell'Acqua et al. add a counter-intuitive layer. Their protocol compares three conditions: participants could use GPT with a page presenting the model's limitations (GPT+overview), GPT alone with no warnings (GPT-only), or work without AI assistance (control). On tasks outside the model's frontier, GPT+overview participants produced 24.5% fewer correct answers than the no-AI control group, while GPT-only participants produced 13.9% fewer.[2] Worse: these users simultaneously achieved 25% higher subjective coherence on their wrong answers.[2] Better briefing skills amplify the gain inside the frontier and amplify the loss, along with the miscalibrated confidence, beyond it. The frontier is structurally invisible to the user: "Because AI capabilities are advancing rapidly and are poorly understood, it can be difficult, ex ante, for knowledge workers to grasp exactly where the frontier lies at any given moment."[2]

The fourth demand is automation strategy. Sharp et al. (2026) name the conceptual fault line: "Agents act as autonomous delegates rather than tools, generating new asymmetries through the large-scale delegation of goals."[7] Deciding whether, when and how to delegate a task to an agent is not a technical decision. It is a managerial decision that presupposes the capacity to model the delegate's competences, anticipate breaking points, and design evaluation criteria up front. Sharp et al. frame it in terms of complementary assets: even with identical formal access to the same agent, benefits diverge according to capacities for "managerial oversight, domain knowledge and strategic agent deployment."[7] These are not skills acquired in an afternoon's training.

Amplification at three scales

Three radically different field studies — Kenya, Harvard, ETH Zurich — converge on the same structure of results. On tasks with an open-judgment component, AI amplifies the initial skill gap instead of compressing it. Their methods have nothing in common. Their conclusions do.

In Kenya, Otis et al. (2024) ran a randomised experiment with 640 entrepreneurs (SMEs), comparing access to GPT-4 via WhatsApp against a control group receiving ILO training guides.[8] The average effect on revenues and profits is null: the standardised regression coefficient is near zero and not significant (β = 0.04, p = 0.34).[8] Behind that average lie two opposite trajectories: low performers fell by about 10% (β = −0.08, p = 0.01), while high performers rose by about 18%, a suggestive result that is not significant at the 5% threshold (β = 0.16, p = 0.07).[8] The difference between the two trajectories is significant (Δ = 0.23 standard deviations, p = 0.01). What makes this analytically decisive is the mechanism: both groups asked similar questions, received similar advice, and were equally likely to act on it. What differed was which advice they implemented. "Low performers were particularly inclined to implement generic advice focused on cutting prices and investing in advertising. High performers, by contrast, worked with the AI to uncover targeted, specific changes that benefited their businesses."[8] Access to advice was symmetric. The capacity to select was not.

"Only high performers were able to both effectively screen and implement valuable, as opposed to detrimental, AI-generated suggestions."

— Otis et al. (2024), p. 4
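How a null average can coexist with two opposite subgroup effects is simple arithmetic. The sketch below uses the coefficients quoted above and one loud assumption that the text does not state: that low and high performers are groups of equal size. The function name `pooled_effect` is hypothetical.

```python
def pooled_effect(effect_low: float, effect_high: float, share_low: float) -> float:
    """Weighted average of two subgroup effects."""
    return share_low * effect_low + (1.0 - share_low) * effect_high


# Standardised coefficients quoted in the text.
BETA_LOW = -0.08
BETA_HIGH = 0.16

# Assumption (not from the source): the two groups are the same size.
SHARE_LOW = 0.5

average = pooled_effect(BETA_LOW, BETA_HIGH, SHARE_LOW)
gap = BETA_HIGH - BETA_LOW

summary = (round(average, 2), round(gap, 2))
```

Under that assumption the pooled value comes out at 0.04, the same as the quoted average, and the gap at 0.24. The small distance from the reported Δ = 0.23 may simply reflect rounding of the quoted coefficients; this sketch cannot settle that.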

At Harvard, Weidmann, Xu and Deming randomly assigned the same leaders to human teams and to GPT-4o agent teams, in a counter-balanced two-condition protocol (N=249).[9] The central result is blunt: "More than half of the variance in group performance across both tests can be explained by the identity of the leader alone."[9] The disattenuated correlation between performance with agents and performance with human teams is 0.81.[9] The predictors common to both conditions — fluid intelligence, emotional perception, turn-taking management — are not AI-specific technical skills. They are cognitive and social skills. A good leader (one standard deviation above the mean) solves 53% of problems correctly; a poor leader solves 10%.[9] AI does not shift that gap: it applies it to a wider surface of action.
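A disattenuated correlation is an observed correlation corrected for measurement noise in both scores. The textbook version is Spearman's correction, sketched below. Whether the authors used exactly this estimator is not stated here, and the input values are hypothetical: they are not taken from the paper and are not meant to reproduce its 0.81.

```python
import math


def disattenuate(r_observed: float, reliability_x: float, reliability_y: float) -> float:
    """Spearman's correction: divide the observed correlation by the
    square root of the product of the two reliabilities."""
    if not (0 < reliability_x <= 1 and 0 < reliability_y <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Hypothetical inputs, for illustration only.
example = disattenuate(r_observed=0.50, reliability_x=0.80, reliability_y=0.80)
```

With these made-up inputs the corrected value is about 0.625: the noisier the two measures, the more the raw correlation understates the underlying one.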

At ETH Zurich, Thorgeirsson, Weidmann and Su measured the predictors of agentic vibe-coding performance in N=100 students (pre-registered cross-sectional study).[10] Computer-science achievement predicts performance (r = .39, p < .001), a correlation that survives cognitive controls (partial correlation = .28, p = .005). Writing skill also predicts performance (r = .29, p = .003), but that correlation becomes non-significant after controlling for general cognitive ability (partial r = .19, p = .066) — the writing effect operates partly through general cognition. The mediation is precisely located: prompt quality accounts for 52% of the association between writing skill and agentic performance.[10] In operational terms: participants with the best writing skills produced clearer, better-structured prompts, which in turn predicted better outcomes. Writing skill acts through the capacity to formulate structured instructions — a register, not a technique. A further, counter-intuitive result: frequency of LLM use correlates negatively with vibe-coding performance (r = −.26, p = .010) and with writing skill (r = −.28, p = .005).[10] Frequent use does not compensate for a deficit in initial capital — it may be a symptom of it.
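Two quantities carry this paragraph: a partial correlation (what remains of an association once a third variable is held constant) and a proportion mediated (the share of a total association that runs through an intermediate variable). The sketch below uses the standard first-order formulas. Only r = .29 comes from the text; every other input is a hypothetical placeholder, so the outputs are not the study's results.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denominator


def proportion_mediated(indirect_effect: float, total_effect: float) -> float:
    """Share of the total effect that passes through the mediator."""
    return indirect_effect / total_effect


R_WRITING_PERFORMANCE = 0.29   # quoted in the text
R_WRITING_COGNITION = 0.40     # hypothetical, not from the paper
R_PERFORMANCE_COGNITION = 0.40  # hypothetical, not from the paper

partial = partial_correlation(
    R_WRITING_PERFORMANCE, R_WRITING_COGNITION, R_PERFORMANCE_COGNITION
)

# Hypothetical path estimates: indirect effect 0.10 out of a total of 0.25.
share = proportion_mediated(indirect_effect=0.10, total_effect=0.25)
```

With these placeholders the partial correlation drops to roughly 0.15 and the mediated share is 0.40. The direction of the first result matches the paragraph's logic: when the control variable correlates with both measures, part of the raw association is absorbed by it.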

These three studies were not designed to answer one another. They reach the same result by incomparable methods, in three unrelated institutional contexts. Ide and Talamàs formalise the theoretical reconciliation: "Several studies suggest that AI disproportionately benefits the least knowledgeable individuals and reduces performance inequality [...] However, these studies focus on AI co-pilots. Our framework supports these findings, but also emphasises that they may not extend to autonomous AI capable of performing independent knowledge work, such as conducting research, drafting documents or analysing data."[1] The two bodies of evidence describe two regimes. The confusion between them is not theoretical — it is organisational.

The structural condition

None of the cited studies directly compares the magnitude of the skill effect between chat and agentic interfaces on matched tasks with a sample stratified by skill level. It is precisely this missing study that would let us quantify the move from the compression regime to the amplification regime — to measure whether the gap is twice as wide, three times, or of a different order of magnitude. Its absence reveals the speed at which agentic architectures were deployed before the tools to assess inequalities of access had been built.

The standard organisational response to this kind of disparity takes the form of prompt libraries, decomposition assistants and briefing training. The intent is right. The intervention is structurally insufficient, and the data show it directly. Wu et al. (2022) find that users who designed their own chains outperformed those using pre-defined chains.[6] And designing your own chains requires exactly the decomposition skill the scaffolding is meant to transmit: the barrier moves, it does not disappear. Zamfirescu-Pereira et al. document the same phenomenon: participants kept to their opportunistic, ad hoc approach even after watching the interviewer demonstrate the effectiveness of the systematic approach.[5] Information is not enough to transfer skill. Sharp et al. anticipate it: "In the short term, users may need considerable skill to specify complex goals and manage multi-step delegation, which will likely favour those with higher levels of digital literacy."[7] That prediction is not futuristic. It describes the current state of agentic deployment.

Chat-style AI did, in a precise sense, keep the promise of democratisation: it distributed codified knowledge downward, compressed performance distributions on bounded tasks, lowered the cost of accessing information once reserved for the better-trained. That result is real. It is also confined to a category of tool that is rapidly becoming a minority in enterprise deployments, as organisations migrate to semi-autonomous agent workflows.

Agents do not distribute codified knowledge. They multiply the capacity of those who already know how to formulate, decompose, evaluate and correct. These are the same skills that the sociology of education has identified for forty years as differentially distributed by class, schooling and professional socialisation — not by accident, but because they are the skills that elite professional-training institutions have spent decades producing and certifying.

The condition organisations refuse to name is this: deploying agents while assuming an equal capacity to brief them is deploying a productivity mechanism that amplifies pre-existing inequalities of human capital — and makes them harder to see, because everyone uses the same tool, in the same interface, at the same address.

This fault line is orthogonal to another I have mapped elsewhere: where organisational trust depletes collectively across failed deployments, independently of individual skill, the amplification described here widens the gap between people, independently of collective history. The two dynamics do not compete — they compound.

essay

The people in the middle of GenAI adoption

The collective half of the picture, in brief: each failed or over-promised rollout consumes a non-renewable stock of organisational trust — a depletion that operates on the whole workforce at once, regardless of any individual's skill.

essay

No Clean Slate

The full mechanics of that depletion: why trust in AI behaves like a partially non-renewable resource with asymmetric withdrawal rates, and why standard repair strategies do almost nothing to restore it once employees have reframed errors as integrity violations.

"Brief literacy" is not an optional skill to pick up on the job — it is a real condition of access to agentic value, as structuring as the mastery of professional reading. What this mechanism implies for organisations: invest explicitly in framing training (not "how to use the tool" but "how to structure a delegation"), design interfaces that scaffold the structure of the brief rather than inviting conversational chat, and recognise that unequal access to AI value is not technical — it is rhetorical and cognitive. Adoption programmes that skip this step are not deploying agentic AI: they are deploying chat with a more complex interface.
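One way to picture an interface that scaffolds the structure of a brief is a form that will not treat a one-line request as complete. The sketch below is an assumption-laden illustration, not an existing tool or API: the class `Brief`, its field names and the method `missing_parts` are all hypothetical, with one field for each of the four demands discussed earlier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Brief:
    """Hypothetical structure for a delegation brief (not a real API)."""
    goal: str = ""                                   # explicit intent
    subtasks: List[str] = field(default_factory=list)             # decomposition
    acceptance_criteria: List[str] = field(default_factory=list)  # evaluation
    delegation_rationale: str = ""                   # whether and why to delegate

    def missing_parts(self) -> List[str]:
        """Return the names of the parts that are still empty."""
        missing = []
        if not self.goal.strip():
            missing.append("goal")
        if not self.subtasks:
            missing.append("subtasks")
        if not self.acceptance_criteria:
            missing.append("acceptance_criteria")
        if not self.delegation_rationale.strip():
            missing.append("delegation_rationale")
        return missing


# A chat-style request fills only the goal.
chat_style = Brief(goal="Summarise the Q3 churn data")

# A structured brief fills every part (contents are illustrative).
structured = Brief(
    goal="Summarise the Q3 churn data for the retention team",
    subtasks=["Clean the export", "Segment by plan", "Draft a one-page summary"],
    acceptance_criteria=["Figures match the finance dashboard"],
    delegation_rationale="Routine analysis; a human reviews the final summary",
)

gaps_in_chat_style = chat_style.missing_parts()
gaps_in_structured = structured.missing_parts()
```

Here the chat-style request is flagged as missing three of the four parts, while the structured brief passes. A check like this only verifies that the parts exist; judging whether they are any good remains the human skill the essay is about.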

[1] Ide, E., & Talamàs, E. (2023). Artificial intelligence in the knowledge economy. arXiv:2312.05481v12. doi:10.48550/arXiv.2312.05481

[2] Dell'Acqua, F., McFowland III, E., Mollick, E. R., Lifshitz-Assaf, H., Kellogg, K., Rajendran, S., Krayer, L., Candelon, F., & Lakhani, K. R. (2026). Navigating the jagged technological frontier: Field experimental evidence of the effects of AI on knowledge worker productivity and quality. Organization Science. doi:10.1287/orsc.2025.21838

[3] Agirdag, O. (2026). Beyond prompt engineering: Prompting (l)iteracy, linguistic capital, and educational inequality. Educational Theory. doi:10.1111/edth.70057

[4] Tankelevitch, L., Kewenig, V., Simkute, A., Scott, A. E., Sarkar, A., Sellen, A., & Rintel, S. (2024). The metacognitive demands and opportunities of generative AI. Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), Article 902. ACM. doi:10.1145/3613904.3642902

[5] Zamfirescu-Pereira, J. D., Wong, R., Hartmann, B., & Yang, Q. (2023). Why Johnny can't prompt: How non-AI experts try (and fail) to design LLM prompts. Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '23), Article 437. ACM. doi:10.1145/3544548.3581388

[6] Wu, T., Terry, M., & Cai, C. J. (2022). AI Chains: Transparent and controllable human-AI interaction by chaining large language model prompts. Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '22), Article 385. ACM. doi:10.1145/3491102.3517582

[7] Sharp, M., Bilgin, O., Gabriel, I., & Hammond, L. (2026). Agentic inequality. arXiv:2510.16853v3. doi:10.48550/arXiv.2510.16853

[8] Otis, N. G., Clarke, R., Delecourt, S., Holtz, D., & Koning, R. (2024). The uneven impact of generative AI on entrepreneurial performance: Evidence from a field experiment in Kenya. HBS Working Paper No. 24-042. doi:10.2139/ssrn.4671369

[9] Weidmann, B., Xu, Y., & Deming, D. J. (2025). Measuring human leadership skills with artificially intelligent agents. arXiv:2508.02966v1. doi:10.48550/arXiv.2508.02966

[10] Thorgeirsson, S., Weidmann, T. B., & Su, Z. (2026). Computer science achievement and writing skills predict vibe coding proficiency. Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '26). ACM. doi:10.1145/3772318.3791666

[11] Brynjolfsson, E., Li, D., & Raymond, L. R. (2025). Generative AI at work. Quarterly Journal of Economics. doi:10.1093/qje/qjae044

[12] Noy, S., & Zhang, W. (2023). Experimental evidence on the productivity effects of generative artificial intelligence. Science, 381(6654), 187–192. doi:10.1126/science.adh2586