
The Usability Scaling Law: Death of User Testing?

  • Writer: Jakob Nielsen
  • 17 hours ago
  • 13 min read
Summary: The AI scaling law says that more training data creates smarter AI. Similarly, training AI on large volumes of user research data may scale up its ability to predict usability problems and create better design in the first place. This could change the balance between predictive usability and observational usability, leading to less need for empirical studies in the future.

 


I predict a future where AI, trained on vast amounts of usability data, will dramatically enhance our ability to predict user interface quality, potentially surpassing traditional empirical methods in many common design scenarios. This Usability Scaling Law suggests that as we systematically accumulate and apply more usability knowledge through AI, its predictive power for UI design will grow exponentially, leading to a new era of highly optimized user experiences. However, empirical observation will likely retain a vital, albeit smaller, role.


Despite the title of this article, I don’t predict the death of user testing. Just that it will decline in importance and be partly replaced by AI-driven usability predictions. Read on to find out why. (ChatGPT)


If you’re a regular reader, you know I am usually fairly assured in my pronouncements. This article is different in that I honestly don’t know whether the usability scaling law I’m predicting will truly take form. However, speculation is still useful, so let’s plunge right in.


The potential usability scaling law is parallel to the well-documented AI scaling laws that state that “when you spend more gold, AI gets smart and bold,” to quote my song about AI scaling. Or, more prosaically, as we add more compute, training data, and reasoning time, AI gets smarter and smarter, with no end in sight. Quantity has a quality all its own when it comes to training these complex models. Similarly, as we add more usability knowledge to an AI system — not just a handful of heuristics, but comprehensive data from countless user interactions — it may become significantly better at predicting which user interfaces will be best, potentially even designing them with a high degree of inherent usability.


In usability, we have a tension between predicting and observing what designs work the best. All the many empirical user research methods, from qualitative user testing with a handful of users to large-scale statistical analytics of website traffic, are based on observation. We design a UI, whether as a rough prototype or a polished shipping product, and throw it in front of a bunch of users to see what happens: where they click, where they stumble, what they complain about, and, hopefully, what they accomplish with ease. These methods are rich in context but can be time-consuming and expensive.


Conversely, usability prediction methods range from heuristic evaluation to extensive UX design guidelines and pattern libraries. With prediction methods, we reuse knowledge gained from past empirical user research and analyze how well a proposed design complies with those accumulated usability insights. The goal is to anticipate problems before users ever encounter them.


In the past, usability prediction methods were relatively weak, though still valuable. For example, my own 10 usability heuristics help human UX experts find usability problems in designs, but they are far from perfect. I have always recommended tag-teaming heuristic evaluation with user testing, alternating the two methods as we refine a UI through many rounds of iterative design. The same is true when AI systems perform heuristic evaluations: they are also not perfect and should currently be complemented by a follow-up round of user testing to catch the more nuanced or context-specific issues.


User interface design and human behavior are both so multidimensional that it was never feasible to capture all the knowledge needed for good interaction design in a list of 10 principles, even though I did develop the current list based on a factor analysis to capture the maximum explanatory power possible for such a small set of insights. The complexity is immense; think of all the variables: user goals, prior experience, cognitive load, cultural context, device characteristics, and the sheer variety of tasks and information domains.


User behavior — and thus UI design for humans — is a multifaceted puzzle. (ChatGPT)


Usability Scaling Until Now

The first step in usability scaling was to extend the codified set of recommendations from a limited (but manageable) list of 10 heuristics to thousands of more detailed and specific UX design guidelines and design pattern libraries. For example, the Baymard Institute has published more than 700 documented UX guidelines specifically for e-commerce websites, each a distillation of extensive research. Design pattern libraries get even bigger, with Mobbin supposedly documenting over 10,000 UX patterns across a quarter million user flows.


Cumulatively, across all the publishers of usability guidelines and UX design pattern libraries, we may be a little short of 100,000 knowledge items about usability. If we take the AI scaling laws as a role model, increasing usability knowledge by a factor of 10,000x (relative to the 10 heuristics) should have pushed us up the scaling curve by two generations, since each generation of AI scaling required 100x more compute and training data.


If we equate the 10 heuristics with GPT-1 as an analogy, the current cumulative mass of usability guidelines and design pattern libraries should equate to GPT-3. While this AI model was not nearly as good as today’s models, it showed early promise and proved that AI scaling was indeed creating more and more capable intelligence. It was a clear signal that the approach was valid.


This is where the analogy between AI scaling and usability scaling breaks down. I do think the world has scaled its usability knowledge by that factor of 10,000x since I did my early work. But unlike GPT-3 (which reached the general public as ChatGPT once OpenAI shipped the GPT-3.5 dot-release in late 2022), there is no “Usability-3” product with all the world’s usability knowledge in one AI system. Our usability knowledge is scattered across many guidelines publications, design libraries, internal company wikis, and research papers. Even worse, most of the truly detailed, context-rich usability knowledge exists only inside the skulls of experienced UX designers and user researchers who (hopefully) remember what they did in past projects, what worked, what didn’t, and why. This experience is rarely documented and published with the rigor and structure necessary for broad reuse.


We already have a veritable mountain of UX knowledge, but it’s scattered and not fully applied to design projects. No UX professional can read all the books published in the field, let alone consume all the other data sources. AI has no such limitations. (ChatGPT)


There is probably at least 100x more tacit usability knowledge trapped inside experienced UX staff’s heads than has ever been published. Think of all the A/B test results that never see the light of day outside a specific company, or the subtle interaction details tweaked based on observing just a few users, lessons that become ingrained intuition for that designer but aren’t formalized. Of course, there is much overlap between the experience of people who have worked on different projects, which is why I only estimate 100x for the trapped knowledge, as opposed to the 10,000x that would probably result if we could assign all the world’s UX experts to spend the next half year writing down an immense data dump of everything they could remember from past projects. Such an undertaking would be monumental, but illustrative of the scale of uncodified experience.


And when we account for the fallibility of human memory and the unfortunate tendency for knowledge to be lost when employees change jobs or projects are sunsetted, the total amount of usability knowledge ever generated in the 60-year history of the field might sum to 1 million times the amount that has been published. Sadly, most of that has been forgotten, dooming current UX projects to rediscover almost everything through new user research: insights that were already known at some point in the past by some UX specialist in another company, or even their own.


Future Generations of Usability Scaling

Now, to my prediction of a glorious future for the usability scaling law: What if, instead of forgetting all the usability knowledge that is generated by incessant empirical user research, it were all accumulated, structured, and used as AI training data? Imagine every usability study, every A/B test result, every field study observation contributing to a growing, learning system.


If we go by my estimate of 1Mx more usability knowledge than currently documented, that would correspond to an extra 3 generations of usability-prediction capability (each generation requiring a 100x improvement in data, roughly speaking), relative to the current state, if we can get all this knowledge into AI. This leap could fundamentally change how we approach user interface design.
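The generation arithmetic above (100x more data per generation, so 10,000x buys two generations and 1,000,000x buys three) can be checked with a tiny sketch. The function name and the 100x-per-generation constant simply restate the rule of thumb used in this article:

```python
import math

def generations(data_multiplier: float, per_generation: float = 100.0) -> float:
    """Number of scaling 'generations' implied by a data multiplier,
    assuming each generation requires a fixed multiple (100x here,
    per the article's rule of thumb) more training data."""
    return math.log(data_multiplier, per_generation)

# 10,000x more codified knowledge than the 10 heuristics:
print(round(generations(10_000), 3))     # 2.0 generations (10 heuristics -> today)
# 1,000,000x more knowledge than is currently documented:
print(round(generations(1_000_000), 3))  # 3.0 further generations
```

The logarithm just inverts the exponential growth assumption: if each generation multiplies the data requirement by 100, the number of generations a given data pile supports is the base-100 log of its size relative to the starting point.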




This article is not about current AI’s feeble usability prediction capabilities, but rather what we might get in 10 years, after another 3 generations of scaling up. (ChatGPT)


10 years from now, it is feasible that AI systems could analyze user interface designs (whether proposed or shipping) with greater accuracy than what we currently get from the combination of user testing, website analytics, and all the other empirical user research methods.


2025 (Current State): Usability observation still beats usability prediction. Prediction methods, including early AI-powered tools, help us save some research resources and get better prototypes ready for testing, but a design project that relies solely on usability prediction will not achieve a superior user experience. We still need user testing and other empirical methods to uncover the real issues. My analysis from last year holds true for now. While AI tools for UX design are emerging, they are aids, not replacements, for deep empirical insight.


2035: Usability prediction will beat usability observation for a wide range of common design tasks. User research will only be done on rare projects that break truly new ground, for example, by implementing new interaction paradigms or technologies (think direct brain interfaces or full sensory immersion). Most everyday design projects for websites, standard mobile apps, and enterprise software will achieve a better user experience by relying on AI analysis of design ideas than by spending extensive resources on traditional, broad-based user research for every iterative cycle. AI predictions will be faster, cheaper, and for many mainstream applications more accurate than user testing because they will be based on an unimaginably vast dataset of prior human-computer interactions.


Over the next decade, the balance will likely change from usability decisions being mainly based on empirical observations to being mostly done by AI-driven prediction. This doesn’t mean UX designers become obsolete; it means their role shifts to leveraging these powerful new tools, focusing on more strategic challenges and human agency. (ChatGPT)


Obviously, history won’t stop in 2035. Usability prediction can scale further. One reason to continue to perform some amount of user research will be to generate even more training data to ride the usability scaling law for additional generations of improvements. This ongoing data refresh will be crucial for keeping the AI’s knowledge current with evolving user expectations and technological landscapes, leading to the ability for AI to design user interfaces at a glorious usability level we can’t even imagine today, stuck as we are with often mediocre user experience, even in widely used products.


Generating Usability Training Data

The usability scaling law is currently stuck at a low level, bounded by individual humans’ ability to read, remember, and synthesize usability guidelines and other knowledge. These limits are set by time (only so many hours to read and study if you also have to do productive design or research work) and memory (who can remember the exact details, and the crucial context, of something they read a decade ago, or a user study they observed five years past?).


Collectively, we know a lot, but individually, we don’t. And products are made by individuals, or at most by a small group of UX professionals working on any given design team. Not by the world’s 3 million UX pros pooling their knowledge.


However, AI systems are already known to work well based on billions of parameters. AI could soak up all the world’s UX knowledge, both explicit (published guidelines) and implicit (derived from raw research data), and bring it to bear on the design of any individual dialog box you care to improve.


Future AI usability prediction systems will drink from a virtual firehose of usability data produced from a wide range of methods and sources. In contrast, human UX professionals build their knowledge by metaphorically sipping from private cups of tea. (ChatGPT)


Given that almost all of this potential training data exists as tacit knowledge or in proprietary, unstructured research recordings, how do we get these usability lessons into AI?

My assumption is that this will work the same way AI learned to drive cars at Waymo, Tesla, BYD, XPeng, and other companies: self-driving capability comes from AI training on millions of hours of video recordings of cars driving through all kinds of environments.


Similarly, we can build up AI’s usability knowledge base by having it watch recordings of old-school usability testing sessions and other user research studies. The big question is how many hours of study recordings will be needed. For human UX professionals, the answer seems to be in the low hundreds: anybody with decent talent who has carefully observed and analyzed 200–300 usability test sessions will be very well-versed in user behavior and usability principles. They develop a “nose” for problems.


I expect AI will need vastly more raw data than humans to learn, if we go by the analogy of AI training by reading the entire Internet versus humans gaining an education by reading a few thousand well-chosen books between kindergarten and graduate school. AI often compensates for a lack of innate understanding with sheer volume of data.


AI training can be expedited and made more effective by processing annotated recordings instead of just raw screen and audio. These annotations would need to be rich and structured:


  • Identifying specific UI elements users interact with.

  • Timestamping user hesitations, errors, or expressions of confusion/frustration.

  • Noting successful task completions and failures.

  • Linking observed behaviors to specific usability heuristics or principles being violated or upheld.

  • Capturing user quotes or think-aloud utterances related to specific interactions.

  • Defining task scenarios and user goals.


It could well become a decent job for many UX professionals in the coming years to watch user research videos and meticulously annotate them with the observed user behaviors and the deduced usability problems in the designs that were being tested. This data enrichment process will be crucial for building high-quality training sets.
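The bullet list above maps naturally onto a structured record per observed event. A minimal sketch of what one annotation might look like as data (the class and field names are my own illustrative assumptions, not an existing industry standard):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema for one annotated event in a usability-test
# recording; field names are illustrative, not an industry standard.
@dataclass
class UsabilityAnnotation:
    session_id: str                    # which recorded session
    timestamp_s: float                 # seconds into the recording
    ui_element: str                    # UI element the user interacted with
    event: str                         # e.g. "hesitation", "error", "task_success"
    task: str                          # task scenario / user goal being attempted
    heuristic: Optional[str] = None    # heuristic violated or upheld, if any
    user_quote: Optional[str] = None   # think-aloud utterance, if captured

# Example: annotating a hesitation on a checkout button
note = UsabilityAnnotation(
    session_id="s042",
    timestamp_s=312.5,
    ui_element="checkout-button",
    event="hesitation",
    task="complete a purchase as a guest user",
    heuristic="visibility of system status",
    user_quote="I'm not sure if my coupon was applied",
)
print(note.event)  # hesitation
```

Structured records like this are what would make the recordings usable as supervised training data, rather than just raw pixels and audio.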


The first rounds of usability training data for the prediction AIs will likely require substantial human annotation for better reinforcement learning. Later, AI may be able to learn from raw data directly. (ChatGPT)


Who will generate the needed AI training data? This is a critical bottleneck. Currently, companies that host and facilitate remote user testing, like UserTesting, or platforms for managing research data, like Dovetail, are well-positioned to dominate this future because of the likely need for at least 10,000 hours of richly annotated user-research recordings to get started with a truly powerful system.


I believe 100,000 hours is a more likely estimate for a training set that could yield transformative results. This is not a data set any individual UX team can generate or afford to annotate, which is why the centralized services, or large tech companies with vast internal research operations, are in a position to step up and help the world’s users through better future usability predictions. Of course, questions of data ownership, privacy, and competitive advantage will loom large.


The budget to develop a great AI system for usability predictions could easily stretch into the hundreds of millions of dollars. The ROI on this investment will still be high, since the AI can substitute for more than a million UX professionals, leading to annual savings in the billions. (Given my prediction for the growth in UX design, it’s likely that AI will substitute for 10 million human UX staff by 2045, generating close to $1T in savings.)
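A back-of-envelope sketch shows why annotation alone gets expensive. Only the 100,000-hour corpus size comes from this article; the annotation ratio and hourly rate below are my own illustrative assumptions:

```python
# Back-of-envelope cost of annotating a usability training corpus.
# Only the 100,000-hour corpus size comes from the article; the
# annotation ratio and hourly rate are illustrative assumptions.
corpus_hours = 100_000   # hours of user-research recordings (article's estimate)
annotation_ratio = 5     # assumed person-hours to richly annotate 1 recorded hour
hourly_rate_usd = 100    # assumed loaded cost per annotator-hour

annotation_cost = corpus_hours * annotation_ratio * hourly_rate_usd
print(f"${annotation_cost:,}")  # $50,000,000
```

Even under these modest assumptions, annotation alone runs to tens of millions of dollars, before adding model training and infrastructure, which is consistent with a total budget in the hundreds of millions.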


AI usability predictions will substitute for millions of human UX workers. These savings will fund the expensive AI training budgets many times over. (ChatGPT)


This substitution won’t create unemployment in the UX field, because the UX staff whom the AI usability prediction models augment will become much more productive than the current UX staff. They will also create much better products. When something becomes simultaneously better and cheaper, more will be bought.


Shifting the balance to having most UX work done by AI will not create unemployment, because many more UX projects will be done. (ChatGPT)


Will There Be Any User Research in the Future?

Even though I am not certain, I believe it’s likely that usability prediction will become much stronger in the future, as AI trains on more usability data. Will it become so strong that usability observation will vanish, except for a small effort to update the training data for new technologies and design patterns?


I think it’s unlikely that the need for user research will drop to zero, even in 10 years. Usability prediction will be powerful, possibly very powerful. But even the most knowledgeable AI will probably be unable to predict all user needs and pain points in highly domain-specific use cases, or for entirely novel types of interfaces or user populations for which little prior data exists.


Simple UI design problems will be handled by AI usability prediction systems first, leaving the more complex and domain-dependent design problems for future decades. (ChatGPT)


More likely, it will still be necessary to perform usability observation at actual customer locations (contextual inquiry, field studies) to see how people perform their work in their true environment, how complex socio-technical systems interact, and how teams handle critical, often unstated needs and the inevitable exceptions that always arise. AI might be able to predict usability for a standard e-commerce checkout flow with stunning accuracy, but designing a control system for a novel scientific instrument or a collaborative tool for a highly specialized profession will likely still demand direct human-led investigation.


Rather than a complete replacement, what will happen is a rebalancing of the ratio between usability prediction and observation. Currently, human-based prediction methods like heuristic evaluation and UX design guidelines can bring a design project maybe a third of the way to high-quality UX. Observation methods like field studies and user testing are required to uplift the design quality by the remaining two-thirds.


The balance between usability prediction generated by AI and usability observations by human researchers will flip over the next few years. (ChatGPT)


  • By 2035: At the minimum, this ratio will flip, with two-thirds (or possibly 80%) of UX design quality for many applications derived from AI predictions and only the remaining one-third (or 20%) needing to be filled in through focused observation methods. These observational studies will become more targeted, aiming to answer specific questions that AI cannot, or to explore truly uncharted territory.

  • By 2045: Looking further into the future, perhaps 20 years from now, I think that the ratio flip will become more drastic. By then, superintelligent AI, built with an accumulation of ever-richer usability training data, will generate sufficiently insightful usability predictions to account for maybe 90% of UX design quality in many established domains. This leaves only perhaps 10% to be added through observation methods.


No matter how far into the future we look, at least within the 40-year career planning horizon of today’s new UX staff, I doubt usability observation will drop below 10%. Understanding emerging behaviors, new technologies, and unique contexts will always be needed. Furthermore, the definition of “good usability” itself evolves, and AI will need fresh observational data to keep pace with these changing expectations.


Our expected future AI usability prediction systems will always require more training data to become even better and to stay in touch with changing technologies and user behavior patterns. (ChatGPT)


When we remember that immensely more design projects for software (and other technology, including AI interfaces themselves) will happen in the future as technology becomes even more pervasive, the absolute number of empirical usability observation studies may not drop much below half of the current number, even if they constitute a smaller percentage of the overall UX effort. The nature of UX work will transform: less routine testing of common interactions, more strategic investigation of the unknown, and more collaboration with AI design partners.


As we scale the contribution of AI usability predictions to UX design, we won’t just get more of the same. Scaling by orders of magnitude will result in drastic changes to workflows and the entire UX process. (ChatGPT)


This potential Usability Scaling Law offers a path to a future where high usability is not a luxury but a baseline expectation, built upon a systematic, AI-driven understanding of human–computer interaction. It’s a speculative vision, yes, but one with profound implications for how we design the digital world.


AI will do all the UX grunt work. Human UX staff will still be needed for strategy and agency, and to interface with the other humans in the organization. (ChatGPT)
