UX Roundup: Scope Creep | Demis Hassabis on AI | Interpreting Probability Terms | Why People Use AI & Internet | Google AI

Jakob Nielsen
May 25
15 min read

Summary: Video about scope creep | Demis Hassabis on the present and future of AI | Humans and AI have different interpretations of words to denote probabilities | Very different reasons why people use AI versus the Internet | Google’s disappointing AI announcements

UX Roundup for May 25, 2026 (GPT-Images-2)

Scope Creep Video

I released a new short video about scope creep (YouTube, 2 min): the tendency to keep adding requests to a design project that spoils any clean interaction architecture. Featuritis is a usability disease!

Scope creep, the video. (GPT-Images-2)

Featuritis is a usability disease. Scope creep is common in product design, but adding more features often degrades the user experience. Less is More! The scope-creep scenario in my video has a happy ending, but sadly, most companies end up with bloated software instead.

I made this video with two different animation styles, which are mixed throughout, alternating styles for each new scene:

2-D character animation
Photorealistic

Which style do you prefer? Let me know in the comments.

Ironically, using two animation styles in a single video is an example of scope creep in its own right, but I thought it was fun as an experiment, and I create these videos mainly to test out the steadily advancing capabilities of the AI models.

I reused three of the character designs from my comic strip about my predictions for 2026, which were in a 2D style. I then uploaded the character reference sheets for these characters to the GPT-Images-2 model and asked it to generate new photorealistic character sheets for them, as if they were real people. I am impressed with how well it brought the cartoon characters to life, and in fact, after creating video clips based on the two sets of character reference sheets, I tend to prefer the photorealistic parts of my video.

One of my character reference sheets, in the original comic-strip 2D cartooning style (top), was transformed into photorealism (middle) and transformed again into Pixar-style 3D animation (bottom). I didn’t use the last version because I ran out of AI credits and didn’t want to complicate my final video by switching between three styles. (GPT-Images-2)

I probably still prefer the 2D cartooning style for comic strips.

It is one of the characteristics of AI as a native media form that one can transform the content in various ways, such as changing the visual style. I didn’t do this, but of course it’s also possible to translate the language and have updated lip-syncing, making it look as if the characters are native speakers of, say, Danish or Korean. After one or two additional generations of AI advances, it will also be possible to update the sets and costumes in real time to reflect more culturally appropriate localization for each individual viewer.

AI can instantly remix a video into different languages and different settings. For now, this needs to be done before publishing the video, but in a few years, these adjustments will happen in real time based on each viewer’s individual preferences. (GPT-Images-2)

I used Seedance 2.0 for this project, since it remains the best video model on the market. It’s expensive, though: I ran through a full month’s worth of AI credits (roughly $200) to generate the final two minutes of useful footage. One problem is that even though Seedance generations usually look good (especially when working from attractive reference images), they often don’t sound good. This Chinese model often garbles English words. For example, I had to run through around 10 expensive generations to have it pronounce the word “readiness” correctly. Also, even though Seedance has decent physics, it doesn’t work from a full world model, so in multi-shot clips, when it cuts between camera angles, people often move around and don’t sit in the same place from shot to shot, or the direction in which they are looking suffers continuity errors.

Getting the words pronounced right is currently the biggest problem with Seedance. It also has a tendency to add extraneous words that are not in the specified dialogue. (GPT-Images-2)

Continuity problems permeate multi-shot generations with current video models. For a commercial project, you have to bite the bullet and spend more on a large number of rerolls. For my projects, which make no money, I let many such smaller problems slide and stop when the video is “good enough.” (GPT-Images-2)

Even though the underlying video model for Seedance seems to have remained the same since the version 2.0 release, the sound model must have been tweaked, as its English pronunciation keeps improving. Two months ago, it never pronounced the word “newsletter” correctly. Now, it pronounces it “news-later” about 20% of the time, which is hilarious, but it gets the word right most of the time.

Much as current limitations annoy me, Seedance is constantly improving. As is most AI. (GPT-Images-2)

Demis Hassabis on AI: Real Progress, Real Gaps, No Magic Required

The current AI paradigm of pre-training at scale, RLHF, and chain-of-thought reasoning is not a dead end. According to Demis Hassabis, co-founder of Google DeepMind and 2024 Nobel laureate, these components will form part of any future AGI architecture. The notion that we will discard them in two years and start over is, in his words, implausible.

Demis Hassabis is one of the world’s smartest people. It’s worth listening to his predictions and vision for how AI will help humanity. (GPT-Images-2)

But the architecture is not finished. In a recent conversation with Y Combinator’s Garry Tan (YouTube, 41 min.), Hassabis identified three concrete gaps: continual learning, long-term reasoning, and memory. He places roughly 50/50 odds that one or two more major conceptual breakthroughs are still required before we reach AGI. His personal timeline: around 2030.

The memory problem illustrates the wider issue. Today’s systems shove everything into the context window. Hassabis calls this “duct tape.” A million-token window sounds enormous until you try to record live video, at which point it covers about 20 minutes (only half of the interview on which I based this news item). Brains do not work this way. They retrieve selectively. AI systems do not yet do this well.
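As a back-of-the-envelope check on those numbers, the short sketch below works out the token rate implied by a million-token window that fills up after about 20 minutes of live video, and then what the full 41-minute interview would need at that same rate. Both inputs come from the text above; the function names are only illustrative, and the result is an implied average, not a published specification of any model.

```python
def implied_tokens_per_second(window_tokens: int, minutes_covered: float) -> float:
    """Average token rate implied by a window that fills up in the given time."""
    return window_tokens / (minutes_covered * 60)


def tokens_needed(minutes: float, tokens_per_second: float) -> float:
    """Tokens required to hold a recording of the given length at that rate."""
    return minutes * 60 * tokens_per_second


# Figures quoted in the article: 1 million tokens covers about 20 minutes.
rate = implied_tokens_per_second(1_000_000, 20)

# The interview itself runs 41 minutes.
full_interview = tokens_needed(41, rate)

print(f"Implied rate: about {rate:.0f} tokens per second")
print(f"41-minute interview: about {full_interview / 1_000_000:.2f} million tokens")
```

At roughly 833 tokens per second, the whole interview would take a little over two million tokens, which is why the window covers only about half of it.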

The same diagnosis applies to reasoning. Today’s models can solve gold-medal International Mathematical Olympiad problems while flunking elementary arithmetic problems that are phrased oddly. Hassabis calls this “jagged intelligence.” Watching Gemini play chess, he sees the model identify a blunder mid-thought, fail to find a better move, then play the blunder anyway. A precise reasoning system should not do that. The gap is not raw capability. It is introspection: an honest assessment of one’s own thought process.

Agents fare similarly. Hassabis agrees they are hyped. He also agrees they are just beginning. Most current deployments are experiments dressed up as products. Real value has started to appear only in the last few months. He has not yet seen a vibe-coded game top the charts. He has not seen a teenager ship a hit title using these tools (as he did at age 17). That absence is informative. Craft, taste, and human originality still matter.

This connects to Hassabis's deepest unresolved question. AlphaGo’s Move 37 was novel. But could a system invent the game of Go itself from a high-level description? Today’s models cannot. The leap from solving problems to generating problems worth solving remains unsolved, as does his “Einstein test,” of whether an AI trained on knowledge through 1901 could derive special relativity.

For builders, two practical observations. First, deep tech is no harder than shallow tech. It is differently hard. Pick problems whose absence would actually matter. Second, if AGI arrives around 2030 and your project takes ten years, AGI will arrive in the middle of your work. Plan for that. The likely architecture is general-purpose models orchestrating specialized tools like AlphaFold, not one giant brain doing everything. Build accordingly.

Smaller models will continue absorbing frontier capability through distillation, with no theoretical ceiling in sight. Inference will not become free; Jevons paradox guarantees demand scales to consume any supply. Open weights matter, particularly on edge devices where local execution serves privacy and latency.

In summary: real progress on AI, still real gaps, no magic required to reach AGI.

Summary of Garry Tan’s interview with Demis Hassabis. (GPT-Images-2)

How Likely is “Likely”? AI and Humans Disagree on Probability Words

Communication relies not just on words but on shared expectations about what those words mean. When a doctor tells a patient a treatment is “unlikely” to cause side effects, both parties assume a certain probability that may not be the same. But what happens when an AI interpreter sits in the middle? In their study, Mayank Kejriwal and colleagues from USC’s Information Sciences Institute posed this question. They compiled a list of 17 probability phrases, ranging from “almost certain” to “extremely unlikely,” and asked both human participants and several leading large language models to assign numerical probabilities to each term. The goal was to see whether machines and people share a common mental scale for uncertainty.

The results reveal a notable disconnect. For extreme phrases like “impossible” and “certain,” both humans and models aligned fairly well, mapping them to very low or high probabilities. But mid-range likelihood terms like “likely,” “possible” and “maybe” diverged significantly. Humans tended to interpret these words as conveying 50–70% confidence, whereas the models leaned toward 70–90%. The discrepancy was largest for “likely”: the average human estimation was about 66% while GPT-4 and its peers interpreted it closer to 80%. Interestingly, more advanced models did not converge toward human interpretations; some even amplified the differences.

Humans and AI interpret common words that indicate a degree of likelihood differently. (GPT-Images-2)

Why does this matter for user experience? Many AI applications, from medical diagnostics to weather forecasting, use natural language to communicate risk. This is probably (ha!) necessary, since most humans are notoriously incompetent at understanding probabilities, so resorting to formal statistics is a doomed idea.

If a chatbot tells a user that something is “unlikely,” the person may assume a lower probability than the model intended or vice versa. Misaligned expectations can lead to overconfidence or unwarranted caution. The USC team suggests that AI systems should avoid relying on vague probability words and instead either translate them into numerical ranges or ask users to specify what they mean. In other words, the system needs to “speak the user’s language” (following usability heuristic 2) rather than projecting its own calibration.
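To make the “translate into numerical ranges” suggestion concrete, here is a minimal sketch of a helper that attaches an explicit range to a probability word instead of leaving the word to stand alone. The two percentages for “likely” are the approximate figures quoted above; the function name and the 5-point margin are hypothetical choices made for illustration and do not come from the USC study.

```python
# Approximate readings of "likely" quoted in this article.
LIKELY_HUMAN_PCT = 66   # average human estimate
LIKELY_MODEL_PCT = 80   # approximate estimate from GPT-4 and its peers


def with_numeric_range(word: str, estimate_pct: int, margin_pct: int = 5) -> str:
    """Attach an explicit percentage range to a vague probability word.

    margin_pct is an arbitrary illustrative half-width, not a value from the study.
    """
    low = max(0, estimate_pct - margin_pct)
    high = min(100, estimate_pct + margin_pct)
    return f"{word} (about {low}% to {high}%)"


# What the model means by "likely", spelled out for the user.
print(with_numeric_range("likely", LIKELY_MODEL_PCT))

# Gap, in percentage points, that the bare word would have hidden.
print(LIKELY_MODEL_PCT - LIKELY_HUMAN_PCT)
```

With the numbers stated, a user who reads “likely” as about 66% can see that the system actually means something closer to 80%, a gap of 14 percentage points.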

From a design standpoint, the study underscores the importance of clarity and transparency in AI outputs. Interfaces that present probabilities should allow users to drill down into the underlying data or choose their preferred form of expression. Designers should also be aware that training data influences how models interpret language; if a model has been exposed to contexts where “likely” means “almost certainly,” it will carry that bias into other domains. Testing AI systems with diverse user populations can help surface these mismatches and inform correction strategies.

The study invites a broader reflection on how humans and machines align on meaning. As AI becomes ubiquitous, miscommunications may erode trust and hinder adoption. Proactive design choices, like prompting for clarification or offering explicit probability ranges, can mitigate these risks. The research therefore contributes to the ongoing conversation about making AI outputs not just accurate but interpretable and aligned with human mental models.

As always, more research is needed: in particular I would like to see experiments with better ways of illustrating and communicating probabilities to people with different levels of statistical numeracy, recognizing that 90% of the population is completely incompetent at statistics, and only 1% have a decent understanding of probabilities and other stats concepts. After all, cave age people didn’t stop to calculate the probability of being eaten by a saber-toothed tiger. If they saw one, they ran, or they would not have passed their DNA down to us.

Why present-day humans are bad at probabilities. (GPT-Images-2)

Why People Use AI vs. the Internet

GWI (formerly GlobalWebIndex) is a global consumer research platform covering 50 countries. Their survey data shows a substantial difference in the top reasons people use the Internet versus AI.

For both services, the number-one reason is “Finding Information,” mentioned by 60% of Internet users and 59% of AI users. The same number, given the inherent uncertainties in self-reported data.

The following reasons differ dramatically. Main reasons (after finding information) for using the Internet:

58% say “staying in touch with friends and family”
53% say “watching videos, TV shows, or movies”
51% say “keeping up to date with news and events”
49% say “researching how to do things”
45% say “accessing and listening to music”

Basically very passive use cases, except for researching how to do things. There was even a fairly popular survey response where 44% of Internet users gave “filling up spare time” as one of their main reasons for using the Internet. Given how little time we have on this Earth, I find it depressing that many people want to spend it on nothing.

Answers were very different for the main reasons given by AI users (again, after finding information):

38% say “get advice on problems”
37% say “learn or improve skills”
34% say “create or enhance images or video”
33% say “save time or increase efficiency”
30% say “create or enhance written content”

AI use is dramatically more active and creative than Internet use, which is dominated by the passive consumption of content created by others, mainly as entertainment.

The Internet has become a pacifying medium, whereas AI encourages active users, as shown by the top reasons people state for the two forms of computer use. (GPT-Images-2)

Internet = Veg out
AI = Create and improve yourself.

The traditional Internet has largely devolved into an engine for passive consumption, while AI has emerged as a catalyst for active creation and self-actualization.

When we designed the modern web during the dot-com bubble, my vision (and that of many others) was a highly interactive empowering environment that would spread knowledge and turn everyone into a creator. Instead, the Internet has largely become the ultimate digital couch.

Looking at the top secondary reasons people use the Web, the underlying theme is frictionless escapism: watching videos (53%), reading news (51%), and listening to music (45%). These are entirely “lean-back” activities where the user is simply an audience member, much like television and the other legacy media we were hoping to replace.

The most tragic statistic is the 44% of users who explicitly view the Internet as a way of “filling up spare time.” Human time is our only truly non-renewable resource. The fact that nearly half of the digital population uses the most advanced communication network in history simply to kill time is existentially depressing.

The goal of UX design has changed over the years: During the early years of the web, we tried to empower users and make them better at navigating the web and expressing themselves in this new media form. Then the goal became to monetize passive users and make them buy and scroll endlessly to see more ads. Now, in the AI era, UX design again aims to put users in charge, but this time with an emphasis on intellectual augmentation and creation. (GPT-Images-2)

Why did this happen? It is a feature of the modern web’s business model. The attention economy is funded by advertising, which requires keeping users on a platform for as long as possible. To do this, tech giants built infinite scrolls, auto-playing videos, and algorithmic feeds optimized to remove all cognitive friction. The Internet was designed to make us “veg out” because a passive, sedated consumer is easier to monetize than an active one.

By contrast, the AI usage data shows a “lean-forward” technology (like we wanted the Web to become). If the Internet is a television, AI is a workshop.

Lean-back vs. lean-forward is an eternal dilemma in user experience. When I originally worked on Web usability in the 1990s, the early Web was definitely a lean-forward experience, but sadly the dominant use has turned into lean-back. We get a second chance with AI. (GPT-Images-2)

The verbs dominating the AI usage data require high cognitive engagement: Get, learn, improve, create, enhance.

The Interface Dictates the Mindset: The core reason for this divergence lies in the user experience. The fundamental interface of the modern Internet is the feed; it requires zero input to receive a dopamine hit. The fundamental interface of AI is the intent. You cannot doomscroll a blank ChatGPT or Seedream text box. To get any value out of AI, you must bring an intention, a problem, or a creative vision to the table. This prompt-driven architecture forces the user to be the director rather than the audience.

We may think we design the UI, but the UI also designs us. It’s a duality, where each partner influences and shapes the other. (GPT-Images-2)

Furthermore, AI radically lowers the barrier to entry for creativity. Tasks that previously required specialized software and years of training (drafting polished essays, writing code, generating art) are now accessible to anyone with an idea. AI doesn’t just entertain you; it equips you to build.

An infinitely scrolling feed makes us passive, AI activates us. (GPT-Images-2)

Killing Time vs. Reclaiming Time: The most philosophical takeaway from the GWI data is how the two technologies interact with our human lifespan. While 44% of Internet users are actively trying to burn away their spare hours on “nothing,” 33% of AI users are actively using the technology to save time and increase efficiency.

While the Internet frequently functions as a waiting room where we passively pass the hours, AI functions as a personalized tutor and co-pilot. Users are deploying AI to offload digital drudgery so they can free up their cognitive bandwidth for real life. AI users are not looking to escape reality; they are asking for advice (38%) to help them navigate it.

Time is the ultimate scarce resource that runs out for all of us in the end. That’s why I have always been keen on the second usability quality attribute of efficient use. AI helps us get more out of our limited time. (GPT-Images-2)

The Future: Will the “Create and Improve” Era Last? Right now, the data reflects a beautiful, optimistic moment in digital history. It proves that humans do possess an innate drive to learn, build, solve, and improve themselves: they just needed a tool that scaled to their ambitions without getting bogged down by the attention economy.

However, there will inevitably be immense commercial pressure to integrate AI into passive feed mechanics. If tech companies use generative AI to create infinitely personalized, hyper-addictive streams of entertainment designed to keep us scrolling, AI could become the ultimate “vegging out” machine.

To ensure the data in future reports continues to reflect this current “create and improve yourself” paradigm, we must consciously fight to keep AI a tool of human agency. We must use our machines not to turn our brains off, but to turn our potential on.

The future doesn’t happen to us. We make it happen. It’s our responsibility, especially in UX, to create a positive AI future. (GPT-Images-2)

Google’s AI Announcements Were Disappointing

I was very disappointed with Google’s AI announcements at last week’s I/O conference. Very modest advances, compared to what we’ve been getting used to from the other AI labs this year, where AI has been accelerating and is approaching RSI (recursive self-improvement).

Google I/O was a disappointment. (GPT-Images-2)

To start on a positive note: It is great that Gemini 3.5 Flash is 4x faster: speed is a usability benefit in itself. (I wrote my first article on AI response times in August 2023; that’s how fundamental speed is to usability.)

Speed drives usability by allowing users to stay in a flow state. (GPT-Images-2)

So far, I’ve gotten less impressive results from Gemini 3.5 Flash Extended Thinking than from Gemini 3.1 Pro Deep Think, so I doubt I’ll use it much. (If I want a very simple question answered really fast, I’ll use ChatGPT 5.5 Instant or Grok Fast.) For example, I asked both models for critical feedback on the manuscript for an upcoming article, and Gemini 3.1 Pro Deep Think was more insightful, though GPT 5.5 Pro Extended Thinking was even better.

Google’s new video model is also a very limited upgrade, compared with what we have had from Chinese video models for months. Gemini Omni Flash can generate 10-second clips, which is an extremely modest upgrade compared with the 8-second clips from Veo 3.1. In contrast, Seedance 2.0, Kling 3.0, and HappyHorse 1.0 can all make clips that run 15 seconds: those extra 5 seconds are an eternity in video production. (Kling even generates in 4K natively, which produces immensely better-looking video than the puny 720p delivered by Gemini Omni Flash. Seedance and HappyHorse aren’t 4K yet, but at least they already generate in 1080p natively and will surely be 4K soon, given the cutthroat competition in China.) Even Wan 2.7 and Grok Imagine run to 15 seconds.

Google is falling behind in video generation. (GPT-Images-2)

I made a small compilation of the exact same prompt on Gemini Omni Flash and Seedance 2.0: speedrunning through Chinese history (Instagram, 1 min.). Which model’s clips do you like best?

At least Google seems to have realized it is falling behind in AI capabilities with these releases, as it reduced the price of the Ultra subscription (needed to access Deep Think) from US$250 to $200 per month. Even better, Google introduced a new, cheaper Ultra subscription plan at $100 per month that offers the same benefits as the $200 plan, but lower rate limits: you get 1/4 of the compute allocation from paying half the price, so if you’re a heavy user of Google AI, do stay on the most expensive plan. But I promptly downgraded to $100 since I’m currently using OpenAI and Seedance much more than I’m using Google’s models.
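The trade-off between the two Ultra plans is easier to see as a price per unit of compute. The sketch below uses only the figures given above ($200 for the full allocation, $100 for one quarter of it); the function name is illustrative, and “full allocation” is simply whatever the $200 plan provides, since the actual rate limits are not stated here.

```python
from fractions import Fraction


def cost_per_full_allocation(monthly_price: int, compute_share: Fraction) -> Fraction:
    """Monthly price scaled up to one full compute allocation."""
    return monthly_price / compute_share


# (monthly price in US$, share of the $200 plan's compute allocation)
plans = {
    "Ultra at $200": (200, Fraction(1)),
    "Ultra at $100": (100, Fraction(1, 4)),
}

for name, (price, share) in plans.items():
    print(f"{name}: ${cost_per_full_allocation(price, share)} per full allocation")
```

The $100 plan works out to $400 per full allocation, twice the unit price of the $200 plan, so it only makes sense for light users who would never exhaust the smaller allowance.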

Lower prices for Google AI subscriptions reflect its less competitive performance. (GPT-Images-2)

If Google becomes more competitive later, I can always upgrade back. Even though Google’s subscription UX has been terrible in the past, I should give them props for now having put a little bit of usability effort into separating customers from their money: it’s now easy to find the subscription plans and change your level.

Google’s subscription UI has changed from terrible to good. Nice! (GPT-Images-2)

So if you’re a Google Ultra subscriber but don’t use it enough to justify the high-cost plan, downgrade to $100/month. Google does not advertise the availability of this plan widely: they sent me an email touting the price reduction from $250 to $200, but without any mention of the $100 option. Borderline unethical customer communication.

Google has also reverted to its old bad habits, with a sprawling, confusing UX architecture for its AI offerings that are scattered ever more widely. First, Google’s video generation changed its name from Veo to Gemini Omni. Unnecessary brand confusion right there. Second, Google is launching a new image editing capability any day now, called Google Pics, that will seemingly not be integrated with Google’s main image-generation feature at Google Flow. Confused already?

The architecture and naming schemes conspire to create a highly confusing user experience for Google’s universe of AI offerings. (GPT-Images-2)

From the demo, Google Pics looks promising, offering object-level editing similar to what Reve has had for ages. But why does this image editing feature have to be moved into a new location rather than being integrated with image generation?

My prediction number 6 for 2026 was that Google would get its AI act together. Sadly, it seems I was wrong. (Though they still have half a year to turn the supertanker.)

Will the Google behemoth be able to mend its ways and have a coherent AI strategy with a clean user experience? They still have time, but need to take this problem more seriously. (GPT-Images-2)

UX Design Video Game Launch Poster

What if UX design were a video game? Here’s the launch poster for this game. Somebody should build this. (Prompt credit: LudovicCreator.)

(GPT-Images-2)
