
UX Roundup: No More UI | Easy Quant | Personalized AI | State of AI: Growth but Risk | USA Gov Usability

Writer: Jakob Nielsen · 14 min read
Summary: User interfaces for human interaction will be replaced by AI agents | Quantitative user research methods made easy | Use AI to produce personalized user assistance | Surveying today’s AI landscape: unprecedented growth, but few moats to defend against newcomers using the next AI advance | USA Federal Government starts hard push for website usability

UX Roundup for August 25, 2025. (GPT Image-1)

 

No More User Interface

New video: No More UI (YouTube, 3 min.).


In the future, users will likely interact with their personal AI agents, but not with websites or traditional software applications. Of course, this doesn’t really mean the death of UI, because the agent will have an interface. However, the need for UI designers will be limited to a handful of AI agent vendors.



User interfaces may soon become museum exhibits of curiosity to people who never encounter them in their everyday lives. (GPT Image-1)


The trend toward “no UI” doesn’t mean “no UX,” though. Remember the distinction between UI and UX. We will need to design the underlying experience that will be transmitted to users through the AI agents, but those design efforts will require a completely different skillset than what’s used in today’s legacy UX design process. (Better pivot your career now, if you want to stay employed after 2030!)


It is interesting to compare this new video with the one I made only 6 weeks ago, using the same avatar, about developing your capacity for human agency. Even though it’s the same avatar (designed with Midjourney), the animation is much more lively now. Still not perfect, of course, but AI video is improving at an immense pace this year.


My AI stack for the new video: Suno 4.5+, Midjourney 7, HeyGen Avatar IV, Veo 3, Kling 2.1 Master, Hailuo 02, Seedance 1 Pro, Gemini 2.5 Pro, and GPT-5 Thinking (the latter for the thumbnail design, so not represented in the video itself). 9 AI tools, each used for its strength.


I worked on this video project for about a month, as I was not satisfied with the initial results for some of the clips. I am still not satisfied with everything. For instance, the avatar’s outfit lacks consistency between the intro clip and the rest of the video, and the drone shots used to show the audience suffer from resolution degradation. Emerging AI models like Google’s Genie 3 are introducing better world models to address issues like morphing outfits, but they are not ready for production use yet.


However, even though I am not fully satisfied with this video, it was time to release it. Partly because “real artists ship,” as Steve Jobs is supposed to have said: Execution must triumph over perfectionism. But mostly because AI video is improving so fast that the clips I made a month ago would soon be showing their age, requiring me to start all over on the project if I wanted to delay release by another month.



Ship now: only published work counts! Don’t worry about imperfections: AI will be better next month. (Old saying: the current AI is the worst you’ll have for the rest of your life.) Funnily enough, the left comic strip proves the point: the user loses her glasses between the top and bottom panels. (GPT Image-1; Image-2 will hopefully not commit that continuity error with the eyeglasses.)

Watch my new video: No More UI (YouTube, 3 min.).


Quant Is Easy

I’m famously (or maybe infamously) a qual-bigot: possibly the world’s leading champion of fast and cheap user testing with a handful of users for each design iteration you want usability feedback on. I’ve always said that companies should devote no more than 10% of their user research budgets to quantitative research — and that only if they are at a high level of UX maturity. When starting in UX, stick to Qual!


However, that 10% still has to be considered. The user research profession tends to attract people who are terrible at math, and thus many UXR staff are intimidated by the prospect of running their first Quant research study.


To the rescue: doctors Jeff Sauro and Jim Lewis, who are the world’s leading gurus for Quant user research and run the MeasuringU consultancy. They recently published a helpful article titled “How to Get Comfortable with Quantitative UX Research,” to help people get started. I really like that they demystify this otherwise intimidating topic.


Debates over whether numbers belong in UX, which metric to use, what statistical test applies, how many scale points are needed, or what the “correct” sample size is create more heat than light. This noise drives avoidance. Math can feel intimidating, but UX research is applied and full of flexibility regarding when to quantify and how to interpret numbers.


Deciding whether to use quant or qual starts with the main emphasis of your research question (your statement of what you want to learn). Words that indicate a preference for quant methods include terms like “preference,” “which,” “option,” “difference,” “time,” “completion,” “satisfaction,” “recommendation,” “relationship,” “correlation,” “cause,” “key driver,” “attitude,” “brand,” and “rate.” Qualitative questions tend to include terms such as “why,” “what,” “describe,” “reasons,” “explain,” “explore,” and “how.”


Quantitative gives you a number (which can be compared with previous numbers or your competition), and qualitative tells you why your design scored that way.



A single metric is useless on its own. Say you score 30%: is that good or bad? However, if you have a comparison metric, now you’re in business. If last month’s score was 40%, then you have improved by a decent amount, assuming you measured something bad, such as shopping cart abandonment. Continue to measure over a longer time, and you can spot trends. (GPT Image-1)


MeasuringU has catalogued over 70 UX metrics, but you don’t need to worry about most of them for your first several quant studies. Instead, Sauro and Lewis recommend focusing on the following 4 common, versatile metrics:


  • Task Completion Rate: Often called the success rate, this is a fundamental “gateway” metric that measures a design’s effectiveness. It’s a binary, pass/fail measurement indicating whether a user was able to successfully complete a given task. Because of its straightforward nature, it’s one of the most widely used and easiest metrics to capture. Its gateway status means that if users cannot complete the core tasks a product is designed for, other metrics like efficiency or satisfaction become secondary. The rate is calculated as a percentage: Completion Rate = (Number of tasks completed successfully) / (Total number of tasks attempted). A low completion rate is a clear signal of significant usability problems that prevent users from achieving their goals.

  • Time on Task is the primary metric for measuring user efficiency. It records the duration, typically in minutes and seconds, that a user takes to complete a specific task. While popular and easy to measure, analyzing this data requires careful consideration. This number often skews high because of a few particularly slow participants. To mitigate this, it’s often better to report the median time or use a geometric mean instead of the arithmetic mean. A major limitation is the lack of universal benchmarks; because different tasks have inherently different complexities, comparing times across unrelated products is not meaningful. Its true value lies in comparing the efficiency of different designs for the same task.

  • Single Ease Question (SEQ): a single 7‑point question used to measure a user’s perceived difficulty of a task. Immediately after a user attempts a task (whether they succeed or fail), they are asked: “Overall, how difficult or easy did you find this task?” The response options range from 1 (Very Difficult) to 7 (Very Easy). The SEQ is a powerful measure of user satisfaction because it is highly sensitive and effectively discriminates between poor and excellent user experiences. A key advantage of the SEQ is the existence of robust, publicly available benchmarks. An average score above 5.5 places a design in the top half of all tested user interfaces, providing a clear target for design teams and allowing them to compare their interface’s performance against a large dataset of other experiences.



The more questions you cut from a survey, the better: fewer questions translate into a higher response rate, which means stronger validity. The ultimate in cutting is to reduce the survey to a single question, as done with SEQ. (GPT Image-1)



The mathematical midpoint of a 1–7 scale is 4. Why then does it take a score of 5.5 to be in the top half of SEQ? Because users tend to be polite and often don’t realize if they are using the design in a suboptimal manner (not knowing how the design is intended to be used). In this example, the fox thinks, “Yes, my tail is on fire, but that’s better than having all my fur on fire,” so it gives the Forest Service a decent rating for controlling forest fires.


  • UX‑Lite: a compact and predictive subjective questionnaire that assesses both the usability and usefulness of a product. It consists of two simple items that are presented to the user after they have interacted with the product and scored on a 1–5 scale for agreement:

    • {Product} is easy to use (Measures usability)

    • {Product}’s features meet my needs (Measures usefulness)


By combining these two core components of user experience, UX-Lite provides a more holistic satisfaction score than a metric focused solely on ease of use. Its brevity makes it extremely easy to administer in surveys or at the end of a usability test without causing user fatigue. Despite its simplicity, the resulting score is highly predictive of overall user experience perceptions and satisfaction.


Together, these 4 metrics answer many common UX questions and build confidence in quant studies for new researchers.
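
If you want to see just how little math these four metrics require, below is a minimal Python sketch that computes all of them from raw study data. The sample numbers are made up for illustration, and the 0–100 rescaling of UX-Lite is an assumption of mine, not necessarily MeasuringU’s official formula:

```python
import statistics

# Hypothetical raw data from a small usability study (illustrative only).
task_outcomes = [True, True, False, True, True, False, True, True]  # pass/fail per attempt
task_times = [42.0, 55.0, 38.0, 47.0, 210.0, 51.0]  # seconds; note the one slow outlier
seq_ratings = [6, 5, 7, 4, 6, 6]        # Single Ease Question, 1-7
ux_lite_ease = [4, 5, 4, 4, 5]          # "{Product} is easy to use", 1-5
ux_lite_useful = [4, 4, 5, 3, 4]        # "{Product}'s features meet my needs", 1-5

# Task completion rate: successes divided by attempts.
completion_rate = sum(task_outcomes) / len(task_outcomes)

# Time on task: the arithmetic mean skews high because of slow outliers,
# so report the median or the geometric mean instead, as Sauro and Lewis advise.
mean_time = statistics.mean(task_times)
median_time = statistics.median(task_times)
geo_mean_time = statistics.geometric_mean(task_times)

# SEQ: compare the mean against the published benchmark
# (above roughly 5.5 puts a design in the top half of tested interfaces).
seq_mean = statistics.mean(seq_ratings)

# UX-Lite: mean of the two 5-point items, rescaled to 0-100 (assumed convention).
ux_lite_mean = statistics.mean(ux_lite_ease + ux_lite_useful)
ux_lite_score = (ux_lite_mean - 1) / 4 * 100

print(f"Completion rate: {completion_rate:.0%}")  # 75%
print(f"Time on task: mean {mean_time:.0f}s, median {median_time:.0f}s, "
      f"geometric mean {geo_mean_time:.0f}s")
print(f"SEQ mean: {seq_mean:.1f} (top half if above 5.5)")
print(f"UX-Lite: {ux_lite_score:.0f}/100")
```

Note how the single slow participant (210 seconds) drags the arithmetic mean up to about 74 seconds while the median stays at 49 seconds: exactly why the median or geometric mean is the better summary statistic for time on task.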



Four simple UX metrics. Begin with these when you first start dabbling in quantitative user research. (GPT Image-1)


MeasuringU has extensive resources on its website to help you learn more about these (and other) quantitative methods, along with well-regarded courses. In particular, their “UX Measurement Bootcamp 2025” starts September 9 and runs for 4 weeks with 2 full days of classes each week (8 days total). If you want to go really deep on metrics, as opposed to taking the easy path I just recommended, this bootcamp is the way to go.


The Meeting Hack: Personalized Jargon Help

“ParseJargon,” an AI-powered meeting helper, identifies domain jargon and explains it in a form personalized to each participant’s background. A controlled experiment and field study by Yifan Song and colleagues from the University of Illinois and Fujitsu Research of America show higher comprehension, engagement, and appreciation of colleagues’ work compared with no support. In contrast, a generic, non-personalized explainer actually hurt engagement.


If you’ve watched cross-functional teams stall because nobody wants to ask what “CTR uplift at p95” means, you know jargon is a usability bug. In a diary study of meetings, study participants identified 123 cases of unfamiliar jargon, but they only interrupted the speaker to ask for an explanation in 5 cases (4%).


The team behind ParseJargon designed the obvious fix (an AI that listens in on a meeting and explains jargon as it detects it) and then did the non-obvious thing: personalize the explanations. The results matter: personalized help improved comprehension and meeting experience the most; generic help was less helpful. Translation: contextless pop-ups are noise.


Self-rated comprehension on a 1-5 scale was as follows:

  • No AI help with jargon in meeting: 3.1

  • General (i.e., not personalized) AI help with jargon: 3.8

  • Personalized AI help: 4.4

The differences between these three means were all statistically significant at p < 0.05.
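
As an aside for readers who wonder what “statistically significant at p < 0.05” involves in practice: a pairwise comparison of two conditions is often run as a t-test, as in the short Python sketch below. The ratings here are invented for illustration (the paper’s raw data isn’t reproduced in this article), and the study’s actual statistical procedure may differ:

```python
from scipy import stats

# Invented 1-5 comprehension ratings for two conditions (illustrative only).
no_help = [3, 3, 4, 2, 3, 3, 4, 3]
personalized = [5, 4, 5, 4, 4, 5, 4, 5]

# Welch's t-test: is the difference in mean self-rated comprehension
# larger than chance alone would plausibly produce?
t_stat, p_value = stats.ttest_ind(personalized, no_help, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # p below 0.05 counts as significant
```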


Design recommendations:


  • User‑model first. The same definition should be shorter for a marketer and denser for a data scientist. Store role, past queries, and topical familiarity to tune explanations.

  • Right time, right place. Inline, tappable highlights beat walls of text. Don’t flood the transcript; keep the explanation affordances lightweight and optional.

  • “Explain this for me” control. Let people flag recurring terms for auto-explain and pin a one-line gloss. The best assistant feels like a personal memory prosthesis, not a lecturer.

  • Beware performative AI. The control condition here is the cautionary tale: unpersonalized helpers can degrade the experience, likely by interrupting flow and stating the obvious. If you can’t tailor, don’t ship.


This is classic usability: remove friction at the exact moment it hurts, for the exact person it hurts. If your enterprise agent already “translates” documents, add role-aware brevity and measure speaking-turn balance and clarification requests as success metrics. Good AI assistive UX doesn’t just answer questions; it prevents them proactively.
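
To make the “user-model first” recommendation concrete, here is a small Python sketch of a role-aware glossary lookup. This is emphatically not ParseJargon’s actual implementation: the data structure, role names, and canned explanations are illustrative assumptions, and a production system would generate the explanations with an LLM conditioned on the stored user model:

```python
from dataclasses import dataclass, field

@dataclass
class UserModel:
    role: str                                      # e.g., "marketer" or "data_scientist"
    known_terms: set = field(default_factory=set)  # jargon already explained to this user

# Illustrative glossary with per-role explanation depth: shorter and less
# technical for the marketer, denser for the data scientist.
GLOSSARY = {
    "CTR uplift at p95": {
        "marketer": "How much the click-through rate improved in the extreme cases.",
        "data_scientist": "Relative CTR gain measured at the 95th percentile of the distribution.",
    },
}

def explain(term, user):
    """Return a role-tailored gloss, or None when staying quiet is the better UX."""
    if term in user.known_terms:
        return None                                # don't interrupt flow with the obvious
    explanation = GLOSSARY.get(term, {}).get(user.role)
    if explanation is not None:
        user.known_terms.add(term)                 # explain each term at most once
    return explanation

marketer = UserModel(role="marketer")
print(explain("CTR uplift at p95", marketer))      # tailored gloss, first encounter only
print(explain("CTR uplift at p95", marketer))      # None: already known, so stay quiet
```

The two None paths encode the study’s main lesson: an explainer that cannot tailor, or that repeats the obvious, is worse than one that stays silent.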



Since AI can generate new information for each user, you can personalize assistance to each user’s needs. In this case study, explain those jargon terms that a given user likely won’t know, and optimize the explanation to that user’s background, for example, in terms of domain expertise, education, and IQ. (GPT Image-1)


Luke Live

A live one-hour talk by Luke Wroblewski, speaking on how to rethink website interactions using AI and large language models. Thursday, September 4, at 9:00 AM USA Pacific time (check corresponding time in your time zone). Free, but advance registration required.



Highly recommended one-hour online talk next week. (GPT Image-1)


AI Today: Faster than Hyperscaling

Great video published 3 days ago: the state of AI as of August 2025 (YouTube, 23 min.), by senior partners at a16z (Andreessen Horowitz, Silicon Valley’s leading VC firm).


My summary is that AI is growing much faster than previous technologies, because the value proposition for users is often so immensely compelling that new AI firms have no problem attracting customers. Many newcomers with an interesting new AI model or application grow to $100M ARR (Annual Recurring Revenue) much faster than was the case for even hyperscaling startups in the past. At the same time, AI technology improves so fast that the next startup can wipe out last year’s darlings unless those darlings use their early lead to build a “moat” (something that newcomers cannot replicate quickly).



The “moat” is VC language for differentiating factors that are hard for competing firms to overcome. This is an analogy with moats that have helped defend castles since ancient times: the wider the moat, the more difficult it is for attackers to cross. (Seedream)


The moat often requires a deep understanding of customer use cases and the design of domain-specific features. Traditional UX methods like task analysis and field studies will likely be helpful here.



AI-native companies are growing much faster than the hyperscalers of the SaaS era that were so impressive a few years ago. However, the certainty of even better AI next year means that the next AI startup will overtake the current winners unless they start digging a moat with traditional design tools. (GPT Image-1)


More detailed takeaways:


  • AI Market: Unprecedented Growth with Paradoxical Wipeout Potential: The current AI market presents a paradoxical landscape: unprecedented growth paired with significant risk. AI companies are expanding faster and achieving larger scale than even seasoned investors anticipated, with value accruing across all layers of the stack: models, infrastructure, and applications. However, this rapid value creation is accompanied by an equally swift “wipeout potential”: more companies can fail, and fail faster, than previously observed. Consequently, the stakes for investment and participation are considerably higher, demanding smarter, more selective strategies than ever before.



The one certainty when thinking about today’s AI is that it will quickly become obsolete due to vastly better models next year (or in a few months: Grok 5 starts training in September, and we’re expecting Gemini 3 shortly). Apologies to Rodin. (GPT Image-1)


  • “GPT Wrapper” Is Not a Derogatory Term; It’s Legitimate Software Development: Initially, the term 'GPT wrapper' was employed derogatorily, implying a lack of innovation or inherent value in applications merely built atop foundational models. However, this perspective is now categorically dismissed. The analogy is drawn to cloud computing: one does not label software leveraging cloud infrastructure as a 'cloud wrapper'. The core assertion is that foundational models serve as an evolution in infrastructure, and substantial value is created by building sophisticated software on top of them, including deep integrations, complex workflows, or even specialized custom models. This constitutes a newsworthy recalibration of market terminology and valuation. Effective application design, not merely model access, is the key differentiator.

  • AI Market Fragmentation Contradicts Early Aggregation Beliefs: Early market speculation posited a winner-take-all scenario where foundational models, particularly those from OpenAI, would aggregate all value. Current data, however, reveals the opposite trend: significant market fragmentation. Despite OpenAI's pioneering efforts in areas like code (Copilot), image generation (DALL-E), and video (Sora), leadership in these domains was quickly ceded to specialized competitors like Cursor and Midjourney. While OpenAI retains immense value in text, the market demonstrates that what were initially considered niche or minor applications are evolving into substantially large markets capable of sustaining multiple, highly valuable companies. This is newsworthy as it fundamentally reshapes competitive strategy. It invalidates simplistic aggregation theories, mandating that participants recognize AI not as a monolithic entity, but as a collection of diverse subspaces, each requiring a distinct, tailored approach to secure market leadership and generate value.

  • AI-Native Companies Outpace SaaS Counterparts Due to Innovator’s Dilemma: AI-native companies are demonstrating superior growth trajectories compared to their traditional SaaS counterparts, significantly reducing the time to achieve $100 million in Annual Recurring Revenue. This accelerated performance stems from several factors: AI solutions often deliver compelling 10x+ improvements in customer experience and ROI, a marked contrast to the incremental 25–50% gains typically seen with SaaS 2.0. Furthermore, AI-native firms are beginning to displace services budgets, not just software allocations. Crucially, established SaaS companies are grappling with the classic innovator’s dilemma: their existing revenue streams and resource commitments impede agile adaptation to new AI paradigms. This phenomenon signals a profound competitive reordering. It suggests that incumbency, with its inherent inertia, may be a liability in this rapidly evolving landscape, favoring the agility and focused vision of AI-native challengers who are building products from the ground up, optimized for AI capabilities.



The “Innovator’s Dilemma,” articulated by Clayton Christensen, describes the paradox wherein successful companies are overtaken by disruptive technologies that initially serve niche or low-end markets. The dilemma arises because the prudent choices for an established provider (such as sustaining incremental improvements for loyal clients and maximizing profitability) are, at the same time, the very decisions that prevent these firms from embracing disruptive innovations. (GPT Image-1)


  • AI Lacks Inherent Defensibility; Traditional Software Moats Are Essential: The inherent 'magic' of AI models effectively solves the bootstrap problem, rapidly attracting initial users. However, a critical observation is that this technological novelty does not, in itself, confer an 'endemic moat' or inherent defensibility in the AI stack that guarantees customer retention. To establish a lasting competitive advantage, AI companies must ultimately revert to building traditional software moats: examples include two-sided marketplaces, deep long-tail integrations, or specialized workflow lock-ins. Interestingly, in this nascent market, brand effects (where market recognition drives adoption over objectively superior alternatives) are re-emerging, akin to the early internet era. This analysis dispels a common misconception that AI technology automatically generates defensibility. It underscores the enduring relevance of established business strategy and traditional software principles for long-term success, despite the revolutionary nature of the underlying AI infrastructure.


USA Chief Design Officer

The United States Federal Government is launching a significant push to substantially improve the design and usability of its websites. The effort is backed at the highest level: it is spelled out in an Executive Order by the President of the United States.


Of course, I realize that the President didn’t write the document himself and may not have been the one to insist on the prominent use of the word “usability” in the Executive Order, but it’s still important that he is on the record for ordering government agencies to prioritize usability and design. The very first paragraph of the Executive Order says: “With a sprawling ecosystem of digital services offered to Americans, the Government has lagged behind in usability and aesthetics. There is a high financial cost to maintaining legacy systems, to say nothing of the cost in time lost by the American public trying to navigate them.”


I couldn’t have said it better myself, though I might not have placed aesthetics as a parallel priority at the same level as usability. Anyway, having usability named as one of the two top priorities is a huge win.



Me, when I saw that the Executive Order establishing the National Design Studio emphasized usability. (GPT Image-1)


The Federal government will now have a new role, that of Chief Design Officer. Joe Gebbia has been appointed to this position. Gebbia was a cofounder of Airbnb and its chief designer for many years. Whether or not you think that the Airbnb website has perfect usability, I am very happy to see the CDO position filled by somebody with a background in creating a website that survived only by making it easy for customers to do business. Much better than promoting an insider, given that government websites are mainly usability failures. (The UK government has done better, so we know it’s possible to design good government websites.)


I particularly like that the new design and usability initiative is explicitly timeboxed, with the new agency only living for 3 years. (Don’t cry for Gebbia when his new job evaporates in 2028: he’s probably rich enough from Airbnb not to depend on a government salary.) Two reasons this is good: First, a short timeframe (as government projects go) will light a fire of urgency under the new National Design Studio. Second, it's likely that starting around 2030, there will be few traditional user interfaces left, and no more need for web designers. People will be using AI agents instead of web browsers.



Hopefully, the new Federal Chief Design Officer will erase most of the crud that has built up over three decades of bad government websites and that makes them so hard to navigate. Paradoxically, the fact that the office expires in 2028 may increase the likelihood of success. (GPT Image-1)


Thus, a traditional web usability initiative (like those I spent decades promoting) only has a few more years left to make a positive impact. (But oh, what good it will do! Especially when redesigning websites used by large and broad audiences.) It is good to sunset a traditional design studio before it becomes obsolete, thereby freeing up resources for a new (but very different) initiative focused on improving citizen services after 2028.



I like the fact that the new government usability initiative only has 3 years to run. After that, a very different approach will likely be needed. (GPT Image-1)

 
