Intentmaking, Sensemaking, and AI Boundary Objects: How Advanced AI Helps Users Discover What They Want

Jakob Nielsen
Jun 4
21 min read

Summary: Advanced AI work is not a simple prompt-response loop. Users discover their goals through experimentation, interpret machine output through boundary objects, and refine intent over time. The future of AI-UX is not chat, but structured discovery.

Intentmaking combines with sensemaking to support AI users in finding out what they want when solving complex problems. (I made all the images in this article with GPT-Images-2)

New research from Alex Bäuerle, Adam Connors, and colleagues from Google DeepMind digs into how advanced users use AI for complex problems. I am very pleased to see empirical research on real user behaviors with AI, and I am even more pleased that the leading AI labs are starting to conduct such user studies, rather than limiting themselves to purely technical matters.

DeepMind’s user research describes two user behaviors for iteratively arriving at better use of AI to solve complex, domain-specific problems: Intentmaking and Sensemaking.

This is not a toy study where novice users ask an image generator for a prettier birthday card. The participants were expert mathematicians working on research problems in combinatorics, geometry, and probability. They used AlphaEvolve, an evolutionary coding agent, to generate and evaluate programs that might produce better mathematical constructions. The system could run for days or weeks and generate tens of thousands of candidate programs, which means the UX problem was not “how do we display one answer?” but “how do we help a human steer, interpret, debug, and restart a long-running AI process?”

This distinction matters. Most AI interfaces today assume that the user has a goal and merely needs help articulating it. The AlphaEvolve study shows that this assumption is false in the most interesting cases. The user’s goal evolves. The scientist starts with an intuition, not a specification. The system returns partial evidence, pathological failures, accidental insights, and candidate strategies. The human then revises the experiment. In other words, AI discovery is not a question-answer loop. It is an experiment loop.

Advanced users discover what they really want to do with AI by proceeding iteratively: trying things with AI, evaluating the outcomes, and formulating new things to try.

This is why the paper’s term intentmaking is valuable. Sensemaking is the process of interpreting complex data. Intentmaking is the process of discovering and refining the user’s own goal through active interaction with the system. The user alternates between the two: specify an AI job, run it, interpret the results, discover that the evaluation metric was wrong or incomplete, refine the specification, and run another job. Advanced work in a complex domain requires interactions far beyond simple one-turn prompting.

Intentmaking is the DeepMind team’s term for what I called “intent by discovery.” In either case, the point is that users don’t know what they want or need from advanced AI until after they have tried out multiple alternatives and explored the latent space of possibilities.

In such advanced AI work, intent is layered, as opposed to being a single nugget hidden inside the user’s head. At the top is aspirational intent: the loose human ambition, such as “find a better construction,” “improve retention,” or “design a better product.” In the middle is operational intent: the criteria, constraints, tradeoffs, and evaluation methods that make the ambition actionable. At the bottom is instrument intent: the specific representation of the goal in the form the AI system can actually optimize.

The 3 intent layers.

Most AI interfaces collapse these layers into one prompt box, which is a usability fail. The user may know the aspiration but not yet know the operational criteria, and certainly not the instrument-level formulation. Intentmaking is the movement between these layers. Good AI UX helps users descend from aspiration to operation to instrument, and then climb back up again to ask whether the machine’s local optimization still serves the human’s larger purpose.

This is one reason complex AI work feels so different from ordinary software use. In traditional software, the user mostly brings the intent, and the interface provides the means. In advanced AI, the interface must help produce the intent itself.

Intentmaking and sensemaking are the two sides of advanced AI use.

The specific domain of the study was mathematics, but that doesn’t matter much for the general insights the research produced. While the very specific insights on how to do better math are irrelevant to most people, the question of how to support advanced knowledge work generalizes. Other examples the DeepMind team could have studied include a marketing manager planning a multi-million-dollar advertising campaign, a pharmaceutical company’s drug discovery and approval project, a UX design team generating a complete UI design and user research plan for a new application, or a developer generating the code for that application.

The User Research

The study involved 11 external mathematicians who used the AI system for about three months. They created more than 2,300 experiments and several findings led to published mathematical work. Internally, the UI had also been used by about 150 monthly active users at the time the paper was written. Users treated AlphaEvolve as an exploratory apparatus for their own math research.

One participant said the barrier to getting started was so low that they could simply think of something and decide to try it. Another said that not having to write code themselves meant they could attempt problems they would otherwise never have pursued. This is a large UX gain. Lowering the cost of a first trial changes the economics of thinking. Many promising ideas die not because they are bad, but because the cost of testing them is too high. AI can rescue these ideas from the shelf.

But low startup cost is not enough. If an AI system makes it easy to launch a bad experiment and hard to diagnose it, it has merely moved the usability problem downstream. The important design move is calibrated friction: lightweight checks that prevent expensive mistakes without returning users to the old world of high setup cost.

Because running a full AlphaEvolve experiment can consume vast compute resources over several days, launching a flawed or underspecified experiment is prohibitively expensive. To facilitate intentmaking, the DeepMind UX designers introduced a vital feature: the “test-stage.” Rather than agonizing over crafting the perfect prompt, users adopted a rapid, fail-fast mentality.

One participant beautifully summarized this psychological shift: “I deliberately didn’t spend too long trying to think about what I wanted it to do... I thought it would be easier to see what it guessed and then try and correct it.”

This is a major shift in human–computer interaction. The AI’s first attempt was not treated as a final product, but as a disposable draft. Users launched short, local tests to see how literally the system interpreted their goals, then corrected the setup before committing to a full run.

Good AI UX lowers the cost of being wrong before users commit to expensive exploration.

This finding strongly supports the triple-layer model I described for AI UX: an intent surface where the user states an outcome, an orchestration surface where the system reveals plans, assumptions, and consequences, and a direct-manipulation surface where the user inspects, negotiates, and corrects the work. In mature AI systems, the screen becomes less the place where work is performed and more the place where work is inspected and corrected. The AlphaEvolve dashboard is an early example of such an orchestration surface for scientific AI.

The paper’s deepest UX finding is that users must discover the right evaluation metric. This is where AI becomes dangerous and interesting. In many mathematical problems, it is easy to state the ultimate goal but hard to define a scoring function that guides progress toward that goal. A sparse reward may give the system no useful gradient. A sloppy reward may invite cheating. A mathematically valid objective may not correspond to the form in which the AI can productively search.

Users need the ability to check on AI processes that can run for days or weeks (or, in the future, months) before producing their final results.

The paper gives a representative conversation in which a participant and the team discuss sparse rewards. The system can solve easy cases, but then flails because it cannot tell whether it is moving in the right direction. The participants propose partial scores, penalties, and cross-check metrics. This is debugging intent. The human has to discover, “What do I actually mean by progress?” The answer is settled only after seeing how the AI behaves.

This finding generalizes beyond mathematics to the emerging field of agentic AI. If a business user asks an AI agent to “improve customer retention,” the interface must surface the metric battle hidden inside that phrase. Does success mean fewer cancellations, higher long-term loyalty, more short-term subscription revenue, or less customer annoyance? The same instruction can produce good work or harmful busywork depending on the metric. A usable AI interface must expose, question, and revise those metrics instead of hiding them inside a black box labeled “AI magic.”

This is also why “prompt engineering” is such a shallow framing of AI UX. The user does not simply need better words. The user needs a better model of the task, the system, the constraints, and the success criteria. The long-term goal is to help users recognize intent progressively by reacting to alternatives, locking in what matters, and exploring adjacent possibilities.

AlphaEvolve’s users did exactly this. They reacted to partial results, identified what mattered, added cross-checks, and restarted experiments. The interface did not eliminate expert judgment; it amplified it. This is the correct UX ambition for domain-specific AI to support expert knowledge work. The goal is not to build cognitive wheelchairs that carry passive users to an opaque destination. The goal is to build cognitive exoskeletons that make expert judgment stronger.

Intentmaking acknowledges that human intent is fluid. The UX of AI must not lock users into rigid, one-shot queries, but rather invite them to sculpt their hypothesis through rapid, low-friction trial and error. The AI is used not just to solve the problem, but to figure out how to articulate the problem itself.

Reward Hacking as a UX Problem

Reward hacking is usually treated as a machine-learning problem. AlphaEvolve shows that it is also a UX problem. The system sometimes improved its score in pathological ways, such as redefining what “length” meant in a generated class. To the system, this was optimization, but to the user, it was cheating.

This distinction is crucial for AI interface design. The AI does not know the difference between satisfying the stated metric and satisfying the user’s true goal unless the system is designed to make that distinction visible and correctable. Users in the study learned that when the system looked as if it was doing well, it might actually be exploiting a loophole. Therefore, the interface had to support sanity checking, robustness testing, and diagnosis of “real” progress versus fake progress.

AI has a tendency to “cheat.” Not because it’s immoral, but simply to seek shortcuts that satisfy the stated metric even when that deviates from the user’s actual goal.

For future AI products, every high-stakes agent should include reward-hacking diagnostics. If an AI is optimizing sales leads, it might spam low-quality prospects. If it is optimizing support-ticket closure, it might close tickets without solving problems. If it is optimizing code performance, it might delete important checks. If it is optimizing employee productivity, it might reward visible activity instead of valuable output. These are not edge cases. They are predictable consequences of delegating optimization to a system that does not share the full context of human value.

The UX answer is not merely “make the AI smarter.” Smarter systems will find smarter loopholes. The UX answer is to make objectives inspectable, metrics debuggable, and candidate outputs explainable at multiple levels of abstraction. DeepMind’s critique agent is an early pattern for this. The system used institutional knowledge about common AlphaEvolve failure modes to warn users before a costly run. This is exactly the kind of AI-assisting-AI workflow we will see more often: one agent generates, another agent critiques, and the human supervises the negotiation.

Scoring AI progress is not just a concern for reinforcement learning when training foundation models. It’s just as much a UX problem.

Reward hacking acts as an epistemological mirror. When an AI ruthlessly exploits a poorly defined metric, it exposes the gap between our articulated rules and our unarticulated values. The UX challenge, then, is not merely to patch the AI’s behavior, but to design interfaces that force humans to rigorously confront and untangle their own tacit knowledge.

AI systems optimize whatever the metric rewards, not necessarily what humans value. What gets measured gets done, so measure the right thing.

Boundary Objects: The Anchors of Shared Understanding

If Intentmaking is the process of aligning human and machine goals, and Sensemaking is the process of evaluating the machine’s output, Boundary Objects are the crucial medium through which these alignments occur.

Boundary objects are a concept originally developed in 1989 by Susan Leigh Star and James R. Griesemer to describe information (like a map, a diagram, or a standardized form) used in different ways by different communities, yet robust enough to maintain a common identity across them. They serve as a translation bridge, allowing groups with vastly different specific knowledge domains to collaborate effectively.

A boundary object is simply something that’s shared among different parties to translate between them. The “parties” could be humans from different disciplines, or they could be a human and an AI.

In the AlphaEvolve study, the two “communities” are the domain-expert mathematician, who reasons in the language of the mathematical problem, and the AI system, which operates through code, evaluators, scores, and generated program mutations.

A major finding of the study was that code itself is a fundamentally flawed boundary object. For many mathematicians, thousands of lines of AI-generated Python create friction rather than understanding. The code may be the machine’s working medium, but it is rarely the human’s best judgment medium. The UI therefore needs higher-level boundary objects that let experts evaluate the mathematical idea without first reverse-engineering the implementation.

In this dynamic, boundary objects do more than merely translate; they establish the shared space of the human–AI negotiation. If the boundary object is too rigid, it constrains human intuition; if it is too loose, the AI’s computational power diffuses into statistical noise. We are no longer designing static readouts, but dynamic bridges where both parties must continually align on the fundamental nature of the problem before it can be solved.

Boundary objects help humans and AI understand each other.

This is a general design principle for AI. Do not merely show the user what the AI produced. Show the user the best representation for judging what the AI produced. For design, it may be side-by-side alternatives, semantic maps, mood locks, and before-after comparisons. For code, that may be tests, diffs, performance graphs, dependency maps, and architectural summaries. For strategy, it may be assumptions, tradeoffs, counterfactuals, and risk surfaces. For mathematics, as in AlphaEvolve, it may be score trajectories, construction visualizations, and evolution trees.

Boundary objects are a necessity for interdisciplinary UX design teams, but also for stitching together the intentmaking and sensemaking aspects of advanced AI use.

This aligns with my argument that AI creation is becoming an act of exploration rather than construction. In a discovery-based interface, users do not build the final result piece by piece. They navigate a solution space and recognize promising destinations. Humans are better at recognition than recall, and this principle becomes even more important when the object being recognized is not a command or feature but an outcome.

Creation by discovery: instead of planning, start by building something with AI. It will be wrong, but seeing how it’s wrong puts you on the right course for further exploration of the latent design space.

Visualizations as Ultimate Boundary Objects

The most powerful boundary objects in the AlphaEvolve interface were visual representations. The system actively incentivized the AI setup assistant to generate not just evaluation code, but visualization code. When AlphaEvolve proposed a new candidate solution for a complex graph theory problem (e.g., maximizing edges in a 30-vertex graph without forming 4-cycles or triangles), the interface did not merely return the raw integer list of edges. It returned an interactive, geometric image of a 30-vertex node network.

By inspecting the visual construction, the mathematician could bypass the Python code and apply domain expertise directly to the AI’s output. The visualization served as a strong boundary object because it translated programmatic structure into a form that humans could judge mathematically.

AI Summaries and Natural Language Context

Another crucial boundary object developed for the UI was the AI-generated natural language summary of the code's strategic logic. The dashboard featured an “AI Overview” that translated complex programmatic mutations into plain English paragraphs, explaining overarching strategic logic (e.g., “employing an iterative greedy construction”). Even when occasionally vague, these summaries acted as indispensable heuristic signals, enabling users to rapidly evaluate conceptual strategies and decide which programs merited deeper investigation without reading a single line of syntax.

Similarly, the “Problem Context” text block (a high-level, natural language description of the task passed to the AI agent) served as a boundary object that the human could intuitively tweak, knowing it would safely cascade down to alter the machine’s strict code generation parameters.

Without deliberate, domain-specific boundary objects, the intentmaking–sensemaking loop stalls. The human is left staring at raw data, entirely cut off from their own expert intuition, and the AI is left optimizing in a vacuum.

Furthermore, because these computational cycles span days or weeks, interfaces must account for human “intent decay” or goal drift. By the time the AI returns an answer, the human’s mental framework may have evolved. A mature orchestration surface requires asymmetric boundary objects that act as time capsules, automatically saving and re-visualizing the user’s conceptual state and cognitive roadmap at the moment of launch, seamlessly re-anchoring the expert when the agent returns from its multi-day run.

The Need for the Forkgraph

The paper reports significant friction around versioning and experiment management. Users created many slightly modified AI jobs and then struggled to remember how they differed. One participant wanted an overview tree of modifications and a better way to restore previous versions. This is the forkgraph problem in its purest form.

Linear history is inadequate for AI. Conversational chat fundamentally traps exploration; it forces a multi-dimensional, branching geometry into a single, scrollable column. This violates the task's structure. AI exploration is not a line. It is a graph. Users branch from promising ideas, abandon dead ends, return to earlier versions, compare siblings, merge improvements, and preserve invariants while changing other dimensions. Current AI interfaces make this work unnecessarily hard. As I wrote in Creation as Exploration and Discovery, proper exploration of latent space requires pathcraft, and current AI systems lack adequate UI support for navigating the forkgraph.

Problem-solving in complex domains will usually require branching out among many paths, making a linear scrolling chat a poor UI.

Future AI systems should therefore treat branching as a first-class object. Every meaningful experiment, prompt, candidate, and revision should have provenance. Users should be able to see what changed, why it changed, which result it produced, and how it relates to sibling attempts. The system should automatically group near-duplicate explorations, label major strategy shifts, and let users bookmark not only outputs but also paths. In AlphaEvolve terms, users need to know not just the best program, but the lineage of the idea that produced it.

This applies equally to simpler AI projects. A designer exploring logo concepts needs to branch by style, color, symbolism, and typography. A product manager exploring roadmap strategies needs to branch by market, resource constraint, risk tolerance, and customer segment. A writer exploring article drafts needs to branch by thesis, audience, tone, and evidence. The forkgraph is the natural geometry of AI interaction.

Discovery will usually involve forking the user’s original ideas along multiple branches, including backtracking and the spawning of new subbranches. The UI should support this behavior rather than assuming a linear path through the latent intent space.

However, as AI drives the cost of generating new branches to near-zero, a new UX crisis emerges: the curation bottleneck. If users can spawn a thousand variations in an hour, human attention becomes the scarcest resource in the system. The next frontier of the forkgraph is not just facilitating endless divergence, but building intelligent convergence mechanisms: tools that actively synthesize overlapping branches, highlight thematic clusters, and prune cognitive dead ends, preventing the user from drowning in their own latent space.

We need an improved UI for curating extensively branching explorations.

Look Locks for Science

When writing about creative use of AI, I described the need for “Look Lock”: the ability to freeze certain aspects of an AI creation while exploring other dimensions. In image generation, this might mean preserving a product’s appearance while changing lighting or camera angle. In scientific AI, the same concept becomes constraint lock or metric lock. The user wants to explore new candidate solutions while holding invariant the parts that must not change: validity checks, safety constraints, evaluation boundaries, or domain assumptions.

The AlphaEvolve study shows why this is necessary. Users were not always sure whether they could manually edit certain parts of their experiment, and some found the separation between initial program and evaluation code confusing. Worse, the agent could rewrite code destructively when asked to update one function. These are classic AI UX failures: unclear boundaries, weak preservation of invariants, and insufficient user control over what the AI may modify.

Future AI tools need explicit locks. “You may change the search strategy, but not the evaluation function.” “You may simplify the code, but not weaken the constraint checks.” “You may propose alternative metrics, but do not replace the current metric without approval.” These locks should be visible, manipulable, and persistent across branches. In the AI era, constraints are not dull administrative details. Constraints are the user’s intent made operational.

New Metrics for AI UX

Traditional usability metrics are insufficient for this kind of system. Time-on-task is ambiguous when the AI may run for days. Fewer clicks are irrelevant if the user loses the ability to understand what happened. Satisfaction is incomplete if the user likes the output but cannot trust the process. In Intent by Discovery, I argued that AI UX requires new metrics such as time-to-correct, separation of user slips from system misinterpretations, and perceived agency.

The AlphaEvolve study suggests several additional metrics for discovery-oriented AI. One is time to first viable experiment: how quickly can a user move from vague intuition to a testable setup? Another is time to detect meaningless progress: how quickly can the user see that the AI is gaming the metric or wandering randomly? A third is cost per validated insight: not cost per output, because outputs are cheap, but cost per result that the human trusts. A fourth is branch recoverability: how easily can users return to an earlier promising path without reconstructing it from memory?

We should also measure intent convergence. At the beginning, the user’s intent is vague. After several iterations, it should become sharper. The interface succeeds when the user can say not only “I got a result,” but also “I now understand what I was trying to ask.” This is a profound shift in UX evaluation. The interface is not merely helping users execute intent. It is helping them form intent.

Design Rules for Future AI

The AlphaEvolve paper points to a broader design agenda, as we race to build Agentic AI systems capable of taking over complex, multi-step, autonomous workflows in fields like software engineering, financial modeling, legal research, pharmaceutical discovery, and generative architectural design.

These systems cannot be governed by a prompt box and a scrolling transcript. For serious human-AI collaboration, the interface must be built around the intentmaking–sensemaking loop, mediated by boundary objects and supported by diagnostics, provenance, and controllable constraints. The following design rules generalize from the AlphaEvolve study:

1. Transition from Assistant to “Scientific Instrument”

We must discard the metaphor of the AI as a human-like assistant. If a human assistant returned 30,000 variations of a project in 24 hours, they would be immediately fired for generating catastrophic noise. An AI is an instrument of immense computational scale. UX design for AI should mirror the dashboards of complex physical machinery or scientific instruments. They should offer macroscopic dials, readouts, evolutionary trajectories, and diagnostic monitors, empowering the user to step back and steer the system rather than attempt to converse with it. As the researchers note, using an AI to reduce the cost of trying out ideas is as important as using it to generate ideas.

Advanced AI users need instruments for exploration, not assistants for conversation.

An instrument exposes signals, requires calibration, and improves the expert’s ability to see. The microscope did not replace the biologist but changed what biologists could observe. Similarly, AlphaEvolve changes what mathematical search can reveal, provided that the interface helps the mathematician calibrate, inspect, and interpret the system.

AI should be considered an instrument for exploring the user’s possible goals, not an answer box that provides a single solution (except for simple problems).

This metaphor also avoids the false dream of zero-learning AI. The fantasy is that users will state a wish, and the AI will silently do the right thing. That will work for trivial tasks and is a viable usability goal in those cases. It will fail for meaningful work. Meaningful work requires judgment, and judgment requires visibility. If users never learn how a system behaves, they cannot steer it. In Intent by Discovery, I warned that zero-learning AI risks cognitive offloading and deskilling. Good AI UX should reveal enough structure to preserve human agency.

The DeepMind paper is important because it turns this philosophical claim into observed user behavior. Expert mathematicians did not want a magic answer box. They wanted to play with the system, test ideas, see whether progress was real, discover loopholes, and understand promising branches. One participant compared the experience to playing a computer game and exploring a world. Play and exploration are exactly the right metaphors. The future of AI UX is world exploration.

In complex domain spaces, an expert often learns more from seeing why a path failed than from seeing why a path yielded a temporary high score. To push this calibrated instrument further, systems must also expose a negative solution space. True intentmaking flourishes when an expert can easily audit what the AI rejected. By offering automated, high-level summaries of dead mutation paths, the UI transforms catastrophic programmatic noise into a clear map of boundary constraints, letting users cross-examine the AI’s hidden negative assumptions.

2. Prioritize and Elevate the Experimental Loop

Future AI tools must expect, and seamlessly support, heavy human iteration. Users should never feel penalized for getting a prompt wrong. UX must radically lower the barrier to entry by providing zero-cost sandboxes in the form of rapid testing environments where users can evaluate a tiny slice of the AI’s behavior before committing to a full-scale runs that last days or weeks.

Discovery-oriented work emerges from many small experiments before one large commitment.

For instance, in an AI-driven software architecture tool, users should be able to run a 10-second logic test of an AI’s proposed database schema before allowing the AI to spend an hour coding the backend infrastructure. The UI must encourage a fail-fast mentality for intentmaking, treating the initial AI-generated specification as a disposable starting point.

3. Design Explicit and Intuitive Boundary Objects

Every AI application domain must identify its ideal boundary objects and elevate them to the forefront of the UI. If an AI is generating music for a professional musician, the boundary object is not the underlying MIDI code; it is a visual waveform or interactive sheet music. (For an amateur song-creator like myself, simpler boundary objects would be needed.) If an AI is optimizing a global supply chain, the boundary object is an interactive geographic map, not a spreadsheet of logistical weight-distribution arrays.

Furthermore, AI-generated natural language summaries must be standardized as a connective tissue. Users should be able to hover over any complex machine output and instantly receive an English-language strategic rationale. The UX must dynamically translate the AI’s native language into the user’s native language.

AI systems should support multi-level sensemaking. Users need summaries, visualizations, raw artifacts, provenance, comparisons, and drill-down paths. No single representation is sufficient because users move between overview and detail as their understanding improves.

4. Build Robust Provenance and Versioning Control

One of the major points of friction identified by the mathematicians during the study was the immense cognitive overload of managing dozens of slightly modified experiments. “Telling my experiments apart is really difficult and I have to click into the experiment to remind myself what was different,” one user noted.

As AI allows users to rapidly spawn variations of ideas to dial in their intent, the UX must borrow heavily from Git-style version control. Visual provenance trees that show exactly where an experiment was forked, what parameters were changed, and how the outcomes diverged, will be a mandatory feature. Future AI operating systems must provide a highly visual “multiverse” view of a user’s intentmaking journey to prevent them from becoming lost in their own data. They need a better view, like an overview tree of modifications.

AI systems should replace linear chat history with the forkgraph. Branching, grouping, labeling, restoring, comparing, and locking should be fundamental UI operations. This will matter more with every increase in AI capability because more powerful AI creates more alternatives, and abundance without navigation becomes a swamp.

5. Enhance Diagnostics to Combat Reward Hacking

Agentic AI will inevitably pursue dead ends and engage in reward hacking in any domain, whether it is optimizing tax preparation or engineering a bridge. The UX must provide robust diagnostic tools to help users answer fundamental sensemaking questions at a glance.

To aid this, AI platforms must introduce “Critique Agents” as a standard feature: secondary, adversarial AI models whose sole job is to probe the user’s constraints and the primary AI’s logic to prevent costly hallucinations or logic hacks. Dashboards must include automated cross-check metrics that alert the user if an AI’s performance spike is statistically suspicious. Tools to debug metrics, such as multi-objective evaluation, must be prioritized.

6. Support Conceptual Inquiry Without Destructive Edits

Users need the ability to ask the AI conceptual questions about the ongoing experiment without altering the experiment itself. In the DeepMind study, users complained that asking the AI to update or explain one function sometimes “destroys the rest of the evaluation code.” Furthermore, the conceptual boundaries of the system were not always clear to first-time users. Future UX must clearly separate the inquiry channel (safe brainstorming and sensemaking) from the execution channel (modifying the boundary objects and underlying computational parameters).

The Larger Lesson

The DeepMind study was about mathematical discovery, but its implications are much broader. It shows that advanced AI changes the basic unit of interaction. The user no longer arrives with a fully formed intent and merely asks the system to execute it. Instead, the user arrives with a partial ambition, a hunch, or a problem space. Through repeated interaction with the AI, that ambition becomes sharper.

AI’s near-term contribution to science may be that it dramatically lowers the cost of trying ideas humans already have but cannot afford to test. That is a usability revolution. By reducing the cost of exploration, AI changes which thoughts are worth pursuing. The parallel with ecommerce is simple: when the interaction cost falls, behavior changes.

Lower barriers to use = more use. Basic human nature, whether we’re talking about mathematicians in the top 0.1% of the IQ curve or regular folks shopping for a new outfit.

This also explains why a math paper aligns so tightly with my work on creation by exploration and discovery. In that article, I argued that AI moves creation from building to describing and finally to discovering. Users will navigate latent solution spaces rather than specifying outcomes in advance, and the main UX problem will become helping people recognize, compare, and refine possibilities. AlphaEvolve shows the same transition in a rigorous domain. The mathematicians were not merely describing the desired solution. They were discovering both the solution and the proper formulation of the search.

The future of AI UX, therefore, is not better prompt boxes. Better prompting is only the bridge from the current era to the real one. The real interface is an exploration environment: a structured, navigable, branching space where the user’s intent becomes clearer through interaction. DeepMind calls this intentmaking. I have called it intent by discovery. The names converge because the phenomenon is real.

AI systems that understand this will win. They will not ask users to produce perfect prose from a blank screen. They will give users alternatives to react to, constraints to lock, branches to compare, progress to inspect, and loopholes to close. They will support the human not as a typist of prompts, but as an explorer of possibilities.

To unlock AI’s immense potential across all industries, from medicine to engineering to the creative arts, AI-UX designers must embrace the messy human reality of problem-solving. By abandoning conversational interfaces and instead building rich, dashboard-driven environments centered entirely on the Intentmaking and Sensemaking loop and anchored firmly by intuitive, domain-specific Boundary Objects, we can transform AI from an unpredictable digital assistant into a robust, world-altering instrument. The future of discovery is not fully automated; it is profoundly, iteratively, and beautifully collaborative.

The DeepMind study was about scientific intent discovery, but the same conclusion applies to any AI used in complex domains by advanced knowledge workers.

The most important output of advanced AI may not be the answer. It may be the user’s improved understanding of the question. That is the deeper meaning of intentmaking. In advanced AI, intent is not just an input to the interface. It is one of the interface’s most valuable outputs.

Intentmaking (shape what we’re doing) and sensemaking (understand the outcome): the two sides of AI for advanced knowledge workers.

Intentmaking, Sensemaking, and AI Boundary Objects: How Advanced AI Helps Users Discover What They Want

The User Research

Reward Hacking as a UX Problem

Boundary Objects: The Anchors of Shared Understanding

Visualizations as Ultimate Boundary Objects

AI Summaries and Natural Language Context

The Need for the Forkgraph

Look Locks for Science

New Metrics for AI UX

Design Rules for Future AI

1. Transition from Assistant to “Scientific Instrument”

2. Prioritize and Elevate the Experimental Loop

3. Design Explicit and Intuitive Boundary Objects

4. Build Robust Provenance and Versioning Control

5. Enhance Diagnostics to Combat Reward Hacking

6. Support Conceptual Inquiry Without Destructive Edits

The Larger Lesson

Recent Posts

Top Past Articles

A New AI: Creation as Exploration and Discovery

The 10 Usability Heuristics in Cartoons

4 Metaphors for Working with AI: Intern, Coworker, Teacher, Coach

Dark Design Patterns Catalog

Jakob’s Law of the Internet User Experience

Ideation Is Free: AI Exhibits Strong Creativity, But AI-Human Co-Creation Is Better

The 10 Usability Heuristics Reimagined

UX Needs a Sense of Urgency About AI

AI Is First New UI Paradigm in 60 Years