
Estimating AI’s Rate of Progress with UX Design and Research Methods

  • Writer: Jakob Nielsen
  • 6 min read
Summary: When can AI replace UX designers? We need longitudinal research to compare AI with human experts across multiple UX methods and track the improvement rates. Results could define an AI scaling law for UX skills, helping companies decide when to delegate what UX work to AI versus humans.

 

We need to discover whether a scaling law exists for AI’s ability to perform UX and usability work, similar to the well-established general AI scaling laws. This includes facilitating and analyzing usability tests with human users, performing heuristic evaluations (like the present case study), or serving as “artificial users” in simulated testing.



My challenge to the HCI research community: do some useful research and provide estimates of how quickly AI improves its skills with various UX methods. (Seedream 4)


Right now (2025), my recommendation remains to involve human UX specialists at all stages of the UX design lifecycle and treat AI only as an accelerant. However, it is guaranteed that AI is getting better with each generation. For example, as AI is trained on larger sets of usability study findings, it will likely improve its ability to handle new design problems, just as human UX specialists do with experience.



Currently, the optimal approach is symbiotic. As AI improves, human practitioners must shift their focus toward higher-level strategic oversight, delegating tactical tasks to AI. Estimating the velocity of this shifting boundary is critical. (Seedream 4)



AI is an accelerant. Work with it, not against it. (Seedream 4)


Understanding this trajectory is not just an academic exercise; it’s a strategic and economic imperative. Organizations require empirical data to make informed decisions about workforce allocation, strategic investments, and the cost–benefit analysis of integrating AI tools versus human expertise. Without this data, companies risk either falling behind competitors by underutilizing AI or degrading product quality by over-delegating critical tasks before AI is ready. Furthermore, educational institutions need this data to evolve curricula, ensuring future practitioners are trained in the higher-level strategic thinking that remains uniquely human.



How fast does AI improve its skills with various UX design and research methods? Estimating the different rates of progress is essential for planning, but requires longitudinal studies with benchmarks that are withheld from the open Internet to keep them out of future AI training runs. (Seedream 4)


Somebody who starts a legacy university undergraduate education in 2026 will graduate in 2030: the predicted year of superintelligence. While nobody can say for sure what this world will look like or how human jobs will work, it’s guaranteed that most of the elements in current university curricula will be completely irrelevant.



Most of what today’s students learn will crumble into irrelevance in the world of superintelligence. (Seedream 4)


How much better will AI get at UX, and how fast? While data abounds on AI improvements in mathematics, image recognition, and complex code generation (and its ability to work independently for ever-longer stretches of time), we possess only scattered, anecdotal evidence regarding AI performance in the nuanced domains of UX design and user research.



AI scaling is exceptionally well documented: AI gets smarter with more compute, meaning that each new generation of AI models is more capable than the last. We don’t know how this general AI scaling translates into scaling AI’s skills specifically with UX design and user research. I am convinced that there will be some scaling, but how fast is uncertain, pending the research I call for in this article. My main prediction (see the conclusion) is that this scaling will exhibit the same kind of “jaggedness” as AI has in other fields and progress faster for some aspects of UX than for others. (Seedream 4)
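
For reference, the well-documented general scaling results are usually expressed as a power law relating model loss to training compute. Whether UX skill follows anything like such a curve is exactly what the research I call for would measure; the second line below is only an illustrative guess at the shape of a UX skill curve, not an established result.

```latex
% Established general form of AI scaling laws: loss falls as a power law in compute C
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}

% Hypothetical UX analogue (illustrative only): share of the expert gold standard
% that the best model recovers in year t, saturating at S_max
S_{\mathrm{UX}}(t) \approx S_{\max}\bigl(1 - e^{-k\,(t - t_0)}\bigr)
```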


Research Project Outline

One idea for a longitudinal research project, specifically targeting heuristic evaluation (as I discussed in Monday’s newsletter): Create a single user interface design as a clickable prototype that’s password-protected and not available on the open Internet (to avoid contaminating the training data for future AI models). Combine prototype access with a complete set of screenshots and the design specs to serve as the basis for heuristic evaluation. Have at least 10 human UX experts perform this heuristic evaluation now, to serve as a baseline. This baseline must account for the inherent variability among human experts (the “evaluator effect”) by establishing an aggregated gold standard of usability problems.
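
To make the baseline concrete, here is a minimal sketch of how the aggregated gold standard and the evaluator effect could be computed, assuming each expert’s findings have already been de-duplicated by hand and mapped to shared problem IDs. The IDs, names, and agreement threshold are illustrative assumptions, not a prescribed protocol.

```python
from collections import Counter

# Hypothetical input: each expert's de-duplicated findings, mapped by hand to
# canonical problem IDs so that "same problem, different wording" merges.
expert_findings = {
    "expert_01": {"P01", "P04", "P07", "P12"},
    "expert_02": {"P01", "P02", "P07"},
    # ... findings from the remaining experts
}

def gold_standard(findings: dict[str, set[str]], min_evaluators: int = 2) -> set[str]:
    """Aggregate problems reported by at least `min_evaluators` experts.

    The threshold is a judgment call: 1 keeps every singleton report,
    while higher values trade thoroughness for validity.
    """
    counts = Counter(pid for found in findings.values() for pid in found)
    return {pid for pid, n in counts.items() if n >= min_evaluators}

def evaluator_effect(findings: dict[str, set[str]], gold: set[str]) -> dict[str, float]:
    """Per-expert thoroughness: the share of the gold standard each expert found."""
    return {name: len(found & gold) / len(gold) for name, found in findings.items()}

gold = gold_standard(expert_findings)
print(sorted(gold))                              # the aggregated gold standard
print(evaluator_effect(expert_findings, gold))   # the spread is the evaluator effect
```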


Also run extensive usability testing with the prototype, involving at least 20 users, to oversaturate the empirical evidence on the design's usability problems. Save all this data in another password-protected repository.



Any longitudinal study of AI capabilities must ensure that benchmark information does not leak into the training data for new AI models. This is a difficult challenge that must be considered from the start. (Seedream 4)


Crucially, the research must establish a rigorous rubric for scoring quality. AI performance shouldn’t be measured solely by the number of issues identified, but by the validity of the findings, the accuracy of severity prioritization (a hallmark of expertise), the actionability of the recommendations, and the rate of false positives (issues flagged by AI that do not impact actual users). Save all this data, including the scoring rubrics, in another password-protected repository.
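
A sketch of how this rubric could be operationalized once the AI’s findings have been matched against the gold standard. The metric names and the 0–4 severity scale are assumptions for illustration; dimensions such as actionability would still require human judges.

```python
def score_evaluation(ai_findings: set[str],
                     ai_severity: dict[str, int],
                     gold: set[str],
                     gold_severity: dict[str, int]) -> dict[str, float]:
    """Score one AI heuristic evaluation against the human gold standard."""
    true_hits = ai_findings & gold
    false_positives = ai_findings - gold     # flagged issues that don't affect real users
    missed = gold - ai_findings

    thoroughness = len(true_hits) / len(gold)                              # recall
    validity = len(true_hits) / len(ai_findings) if ai_findings else 0.0   # precision
    fp_rate = len(false_positives) / len(ai_findings) if ai_findings else 0.0

    # Severity accuracy: mean absolute error on a 0-4 severity scale,
    # computed only over problems that both the AI and the experts identified.
    if true_hits:
        severity_mae = sum(abs(ai_severity[p] - gold_severity[p]) for p in true_hits) / len(true_hits)
    else:
        severity_mae = float("nan")

    return {
        "thoroughness": thoroughness,
        "validity": validity,
        "false_positive_rate": fp_rate,
        "severity_mae": severity_mae,
        "missed_problems": float(len(missed)),
    }
```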


Then, for the exciting data: Every year, run heuristic evaluations of the sample design with that year’s best AI models. Pay to use the highest-end models available to the public, even if that costs $200 or $500 per model. Track progress for several years (at least 3 years), and you can be the one to name the new law for AI scaling in usability skills.
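
Once a few annual data points exist, the “law” itself is little more than a curve fit plus an honest statement of uncertainty. A minimal sketch, using invented placeholder numbers and a deliberately naive linear trend (with more data points, a saturating curve would be more defensible):

```python
import numpy as np

# Invented placeholder data: year -> best public AI model's thoroughness score
# (share of gold-standard problems found), plus the average human expert score.
yearly_scores = {2025: 0.35, 2026: 0.44, 2027: 0.52}
human_expert_mean = 0.60

years = np.array(sorted(yearly_scores), dtype=float)
scores = np.array([yearly_scores[int(y)] for y in years])

# Simplest possible model: a linear trend in thoroughness per year.
slope, intercept = np.polyfit(years, scores, deg=1)
parity_year = (human_expert_mean - intercept) / slope

print(f"Improvement rate: {slope:.3f} thoroughness points per year")
print(f"Naive linear extrapolation reaches human-expert parity around {parity_year:.0f}")
```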


(Ideally, if this project is run in a well-funded lab, design a range of prototype products, so that the data is not limited to a single case study, which might be misleading as an indicator of general UX capabilities.)


Who’s up for my challenge?


Predicted AI Scaling Rates

My guess as to how it will turn out, if somebody actually does this research:


  • Fast progress in AI’s ability to conduct user interviews, analyze surveys or customer feedback, or any UX methods that are purely language-based.

  • Fast-to-Medium progress in AI’s ability to facilitate usability study sessions, observe what users do, and identify usability problems and design directions based on this empirical behavior. (The users serve to debug the UI, and all the AI has to do is to understand what’s going on. It doesn’t have to predict anything.)

  • Medium progress in generative design synthesis: AI’s ability to move from identified usability problems to effective design solutions. While AI can generate many variations quickly, creating truly innovative solutions that account for technical constraints, user context, and strategic goals will likely lag behind its analytical capabilities.

  • Medium-to-Slow progress in AI’s ability to conduct heuristic evaluation and accurately assess nuances in existing usability insights, and use them to judge the severity of usability problems. Such work requires applying abstract usability principles to specific, often novel contexts, demanding judgment and an understanding of user intent rather than just pattern recognition.

  • Slow progress in AI’s ability to simulate human users in detailed interaction behaviors or domain-specific behaviors. But fast progress in AI’s ability to simulate general human behavior that’s mostly genetically determined by evolution, such as whether people will find one visual design more attractive than another. This distinction is likely due to the embodied cognition gap, where AI lacks the lived, physical experience of navigating the world, making subtle, context-dependent behaviors and motivations difficult to simulate accurately.



What looks good? That’s mainly determined by genetics and thus should be easy for AI to learn. (Seedream 4)



Artificial users that eliminate the need to recruit real customers for usability research will likely be AI’s most formidable challenge. (Seedream 4)


We can talk again in 2030 and see whether I was right. I will count 4 of 5 as a good score and expect to be wrong in one of my five estimates.



When will AI take the lead on various UX methods? I expect it will be slower on the more analytical, judgment-based aspects of UX and faster on practical iteration. (Seedream 4)



Many luminaries, from Yogi Berra to Niels Bohr, are said to have noted that predictions are especially difficult to make about the future. If my predictions are 80% right, I will be ecstatic. If I’m more than half right, that’ll still be good. (Seedream 4)

 
