
UX Roundup: GPT 5 | AI in Education | UXR Repositories | AI Cannibalization | Map vs. Territory | Patients and AI vs. Doctors | AI Speed | User Test Methodology

  • Writer: Jakob Nielsen
  • Aug 11
  • 18 min read
Summary: GPT 5 put to the test: Explain UXD and UXR to a 5-year-old | AI in Education | AI makes repositories of old user research findings more useful | AI summaries cannibalize clickthrough from web searches | The journey map is not the territory of user actions | Patients reveal more to clinical AI than when talking with human physicians | Response time matters for AI usability | Leading the participants in a usability study

UX Roundup for August 11, 2025. (GPT Image-1)


A Riddle for You


What is the meaning of this image, which I created with the GPT Image-1 native model? I’ll give you a hint at the end of this newsletter, but try to solve it yourself first. It’s a bit tough. I was very impressed with the AI’s ability to realize my vision for this riddle.

The answer will be in next week’s newsletter.


GPT 5 Put to the Test: Explain UXD and UXR to a 5-Year-Old

Initial feedback on the new GPT 5 model is mixed: most agree that it is indeed a better model than OpenAI’s previous models, and many also agree that it’s currently the world’s smartest AI model. (Which may only last until Gemini 3, Grok 5, or the first creation of Meta’s new superintelligence lab ships.)


On the other hand, excepting the narrow (but important) domain of software development and the even narrower domains such as solving Math Olympiad-level mathematics problems, many AI influencers also think that GPT 5 represents an incremental step and not the promised “I feel the AGI” revolutionary advance.


In judging the AI improvement represented by GPT 5, I think it’s wrong to compare it to whatever was our preferred model a week ago. Progress should be judged in generations of AI, which tend to be about two years apart. In this case, compare GPT 5 with the original version of GPT 4 that shipped in March 2023. With this comparison, GPT 5 is indeed a revolutionary advance in AI.


Pragmatically, it’s certainly nice for us users to get incremental advances in AI seemingly every week. Better reasoning, larger context windows, prettier images, richer instrumentation of AI songs, higher prompt adherence in video creation, or even the fact that AI videos come out in ever-higher resolutions and frame rates. And even more pragmatically, cheaper AI pricing all the time!


However, since each of these incremental advances only represents a small step forward, it’s easy to lose track of the cumulative advances between the AI generations. My analysis suggests we’re still on track to achieve superintelligence in another two generations of AI advances, likely around 2030. (AI generations used to be two years apart. In this latest case, 2 years and 5 months elapsed between GPT 4 and GPT 5, so the cadence may be slowing. That’s why I’m predicting ASI in 5 years and not 4, even though it likely only requires two generations' worth of AI advances over GPT 5.)



From our lived experience (left), GPT 5 seems like a small incremental advance over the more-or-less weekly upgrades we’ve been getting. From the perspective of AI generations (right), GPT 5 is a major step along the path to superintelligence. (GPT Image-1)


From the UX perspective, it’s interesting that OpenAI baked design knowledge and an aesthetic approach into GPT 5’s code generation, so that “vibe coding” will automatically include some degree of “vibe design” of the resulting application, even before taking any explicit steps to iteratively improve the design. As Greg Brockman (OpenAI President) said during the launch event (paraphrase from memory), “now there’s no more excuse for enterprise apps to be ugly.” Good design out of the box, though there are probably some usability aspects it’ll get wrong for now. There are still a few years left where we will need human designers and user researchers to improve software UX.



GPT 5 has a better sense of design built into the box, but we will probably still need to add human usability insights for a few more years, because good UX is more than pretty screens. The cycle should be generate-test-fix in an iterative loop. (GPT Image-1)


As my own test, I posed the following challenge to ChatGPT: “Explain what a UX Designer and a User Research Specialist do, but make it simple enough for a 5-year-old to understand.”


The Previous Best Model, GPT o3, Explains UXD and UXR:

Think of building a big, fun playground.


1. UX Designer — “The Playground Planner”

  • They decide where the slide goes and how high the swings are so kids can play safely and happily.

  • They sketch simple pictures (like coloring-book drawings) to show builders what the playground should look like.

  • Their main question is: “Will kids enjoy this?”


2. User Research Specialist — “The Kid Listener”

  • They watch children play on the new playground and ask, “Do you like the slide? Is the ladder too tall?”

  • They notice which toys kids ignore and which ones make them smile.

  • Their main question is: “What are kids telling us— with words or giggles— about our playground?”


How They Work Together

  1. The Kid Listener brings back the kids’ ideas.

  2. The Playground Planner changes the design so it’s even more fun.

  3. They repeat until the playground is a giggle factory.


That’s exactly how grown-ups make websites and apps easy and fun to use.


The New GPT 5-Pro Model Explains UXD and UXR:

Two helpers who make apps and websites easy


1) UX Designer — the planner

  • Think of a UX Designer like the person who plans a playground.

  • They decide where the buttons go, what the screens look like, and how you move around so it feels easy and fun.

  • They draw simple pictures (like doodles) of the screens to show the plan.


2) User Research Specialist — the detective

  • Think of a User Researcher like a friendly detective.

  • They ask people questions, watch them try the app or website, and listen to see what’s easy or tricky.

  • Then they tell the designer what to fix so it works better.


How they work together

  • Designer: “Here’s the playground plan.”

  • Researcher: “Let’s watch kids play and see what works.”

  • Together: “We’ll move the slide closer and make the ladder easier to climb.”


Short version: The UX Designer plans and draws how it should work. The User Researcher checks with real people to make sure it really does work. Together, they make things simple.


Jakob’s Analysis

Interestingly, both models picked the same analogy to explain UX to kids: playground design. This may be an obvious example when explaining UX to children, because Gemini picked the same metaphor (see below). Claude Opus used a different, but related, example: toy design.


However, GPT 5 grounded most of its explanation in the mainstream UX domain of screen design, which I think is appropriate for today’s 5-year-olds who all have experience using screen-based applications (mainly games, probably, but surely enough UI exposure to understand the concept of “where the buttons go”).


I think the two explanations are roughly equally good, except that GPT 5 is better at avoiding the methodology trap of having the researcher ask the users what they want, as opposed to watching them and figuring out what they need. In contrast, GPT o3 presented the main activity of the user researcher as asking children “do you like the slide?” — surely enough, that poor methodology was picked up when I asked for the descriptions to be further simplified as comic strips:



UXD and UXR explained for 5-year-olds by GPT o3 (left) and GPT-5 (right). Both comic strips were drawn by the GPT Image-1 model, but the image model was art-directed by GPT o3 for the left drawing and GPT-5 for the right drawing.


On balance, I don’t think that GPT 5 was much better than o3 at explaining the two jobs, but as a methodology freak, I do give it major props for avoiding the common pitfall of assuming that user research is mostly based on asking users. (No, you should watch users, not ask them what they want. They don’t know.)


For comparison, here are the answers from two other leading AI models.


Gemini 2.5 Pro Explains UXD and UXR

Imagine we are building a brand new, super-fun playground. We need two helpers to make it the best playground ever.


The User Research Specialist (The “Watcher and Asker”)

This person's job is to be a detective! 🕵️

Before we build anything, they go to other playgrounds and watch how kids play. They see what’s fun and what’s not fun.


  • They might see that kids keep tripping on a certain step.

  • They might notice that everyone wants to go on the swirly slide, but nobody uses the wobbly bridge.


They also ask questions, like:

  • “What’s your favorite thing to play on?”

  • “If you could invent a new ride, what would it be?”

  • “Is that swing too high?”


They gather all this information to figure out the secret plan for the most fun playground.


The UX Designer (The “Fun Planner”)

This person is like the playground architect. 🖍️

They take the secret plan from the “Watcher and Asker” and start designing the playground. They draw maps and decide where everything should go.


  • “Okay, kids love swirly slides, so let’s put a big one right in the middle!”

  • “Kids were tripping on that step, so let’s make our steps bigger and safer.”

  • “Let’s put the swings far away from the slide so nobody bumps into each other.”


Their job is to make sure the playground is easy to use, safe, and super fun, so you know exactly how to play on everything without getting confused or frustrated.


So, together:

The Watcher and Asker figures out what kids want.

The Fun Planner figures out how to build it so it’s awesome.


They are a team that makes sure everything from new toys to phone apps is easy and fun for you to use!


(Yes, Gemini committed an elementary subject-verb agreement error, pairing the singular verb “figures” with the compound subject “the Watcher and Asker.”)


Claude Opus 4.1 Explains UXD and UXR

Imagine you have a favorite toy or game that's really fun and easy to play with. A UX Designer is like someone who invents and builds those toys! But instead of toys, they make apps and websites (like the games on a tablet or phone). They think really hard about making buttons big enough for your fingers, putting things where you can find them easily, and making everything look pretty with nice colors. Just like how a good toy has the buttons in the right spots and fun colors, they make sure everything on the screen is in the perfect place so you don’t get confused or frustrated.


A User Research Specialist is like a detective who wants to know what makes kids happy! They’re the person who would come to your classroom and ask you and your friends questions like “What’s your favorite game?” or “What makes this toy hard to play with?” They watch how you play and listen really carefully to what you say. Then they go tell the toy makers (or in grown-up world, the app makers) what kids really want and what would make things better.


So they work together like a team — the Research Specialist finds out what people want by asking lots of questions and watching, and then the UX Designer uses all those ideas to build something awesome that everyone will love using! It's like one person figures out that kids want a slide that's not too scary but still fun, and the other person actually builds that perfect slide.


Flexibility & Efficiency of Use: The Music Video


For this video, my concept was a 1940s Hollywood nightclub. I am pleased with how it came out. In particular, I made the base image for the I2V animation with Grok’s new image model, which in general I have not liked, but it captured the 1940s glamour photography style perfectly. The singer looks ready to audition to play a mystery client walking into Humphrey Bogart’s detective agency to hire him to retrieve a missing statuette. (In contrast, Midjourney, which usually makes the best images, gave me a photo that looked like a present-day movie star cosplaying as a 1940s star.)


Following my concept, the avatar video is black-and-white footage, which looks great. The B-roll, on the other hand, is color. I particularly like the shock effect of cutting directly from the smooth grayscales to a very colorful tropical bird at the beginning of the first chorus. (I know Wizard of Oz did this first, so I’m not claiming creative genius, but it is an effect that works well if used sparingly.)


See for yourself, and let me know in the comments whether you would have preferred the consistency of B&W footage all the way.



Interaction flexibility is a vital usability criterion, especially for expert users. (GPT Image-1)


AI in Education: Two Models


This illustration presents two starkly different visions of AI in education, but one of them needs to be sent back to the drawing board. (GPT Image-1)


On the left, we have a cheerful but ultimately misguided scene of a classroom robot standing proudly as a replacement teacher. While charming as a cartoon, AI replacing human teachers entirely is a dreadful idea, and it misses the point of what artificial intelligence can achieve in education. Yes, it's tempting—particularly for budget-conscious administrators—to imagine tireless robots effortlessly delivering lectures, grading exams, and maintaining classroom order.


However, since the children are human, they will almost certainly require adult humans to serve as role models and mentors. Maybe even more important, the human teachers can push the students to continue to learn from AI, rather than tuning out the instructional lessons in favor of watching TikTok videos.


The right panel shows a more promising paradigm: AI as a personalized tutor. Here, the robot acts as a supportive tool, enabling tailored learning experiences. This aligns with user-centered design by adapting to individual needs, which is key to effective education.


AI-driven tutors can monitor progress in meticulous detail, dynamically adjusting difficulty and content to precisely match each student’s current understanding. The best students aren’t left bored, drumming their fingers while waiting for the rest of the class to catch up. Instead, they’re accelerated to new challenges, expanding their capabilities beyond what standardized curricula traditionally allow. Simultaneously, slower learners aren’t overwhelmed by standardized instruction soaring way above their heads. Instead, they receive patient repetition, simplified explanations, and targeted practice that reinforce foundational concepts and help them genuinely understand and retain core lessons.


Students progress at their own pace: top performers accelerate to advanced topics, while slower learners actually learn the most important material.


AI in education should not be a cold robotic takeover of education, but a warm augmentation of human teaching potential. Teachers remain central, freed from repetitive tasks to focus instead on higher-order instruction, mentorship, and relationship-building. Embrace AI as an ally in the classroom, personalizing learning to truly serve every student.


AI Makes Repositories of Old User Research Findings More Useful

Knowledge-management firm Stravito notes that AI is changing the way UX teams utilize their existing user research findings. Historically, repositories of old usability reports usually ended up gathering dust, but AI is resurfacing some of that hard-won existing knowledge.

Discovery tools can now surface patterns across both qualitative and quantitative data, reducing manual work and allowing researchers to spend more time on strategy.


Even more interesting, looking ahead to more powerful next-generation AI, Stravito predicts a shift from search‑driven retrieval to discovery‑oriented exploration:


  • Traditionally, if UX teams used their old data at all, it was question-based: Ask a question, get an answer, if that topic had already been studied. (For example, how do our customers use mega menus?)

  • In the future, AI agents can proactively scour the repository of existing user research, whether in synthesized form (those dreaded reports) or as raw data (facilitator notes from usability studies, or even the recordings from the sessions themselves). A single human researcher can only remember a small percentage of what he or she personally observed (with the very real risk of misremembering, especially for older studies) and a vanishingly small percentage of the synthesized insights from studies conducted by colleagues and reported in a team meeting. In contrast, AI knows it all, regardless of the data source and the number of years that have passed since the data was collected.


We should expect AI to discover patterns in user behavior across studies and also to identify emerging changes from past to current behavior.
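
Stravito’s predicted shift from question-based retrieval to proactive discovery can be illustrated with a toy sketch. Everything below is invented for illustration: the study names, the finding texts, and the bag-of-words similarity measure (a real system would use embeddings from an AI model, not word counts):

```python
# Toy sketch: proactively pairing up related findings across old studies.
# Study names and finding texts are hypothetical placeholders, not real data.
from collections import Counter
from math import sqrt

findings = {
    "2021-checkout-study": "users abandoned checkout when shipping costs appeared late",
    "2023-nav-study": "participants overlooked the mega menu and used site search instead",
    "2024-search-study": "users fell back on site search after failing with the mega menu",
}

def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity between two finding texts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm

# Rank every cross-study pair by similarity, most related first.
pairs = sorted(
    ((cosine(findings[x], findings[y]), x, y)
     for x in findings for y in findings if x < y),
    reverse=True,
)
for score, x, y in pairs:
    print(f"{score:.2f}  {x} <-> {y}")
```

Even this crude measure surfaces findings that share vocabulary across studies; the point is the workflow (AI scanning the whole repository for related observations) rather than the specific similarity function.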



AI can serve up a tasty dish of old insights made fresh. (GPT Image-1)


AI Cannibalization Is Real

Pew Research analyzed click data from 900 users who conducted searches on Google. People’s tendency to click through to a destination website differed, depending on whether one of Google’s new AI summaries was present on the SERP (search engine results page):


  • No AI summary: 15% clickthrough to a search result.

  • AI summary present: 8% clickthrough to a search result, plus 1% clicks on a link within the AI summary.


The good news for web usability: the AI summaries seem to help users! The bad news for websites: your traffic from Google is about to be cut in half.
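
The arithmetic behind that estimate, from Pew’s numbers above:

```python
# Pew's reported clickthrough rates for Google searches.
no_ai_ctr = 0.15        # clickthrough when no AI summary is shown
ai_ctr = 0.08 + 0.01    # clicks on results, plus clicks on links inside the summary

drop = 1 - ai_ctr / no_ai_ctr
print(f"Relative traffic loss when an AI summary appears: {drop:.0%}")
# → 40%
```

Strictly, that’s a 40% relative drop, so “cut in half” rounds the damage up slightly; and since AI summaries don’t appear on every SERP, the blended loss across all searches will be smaller still.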


Solution: Don’t cry over technical advances. Users will do what’s best for them, and we’ve known since November 2023 that AI is a better option for users than search. It’s simply that the chickens are only now coming home to roost for real, since chickens fly rather slowly when it comes to changing the behavior of the masses of Internet users.



Infographic summarizing this news item, automatically made with the new AI service Thoth. I think it’s too verbose to serve as a proper summary and lacks sufficient aesthetic appeal to capture users’ interest. However, it was made in one click, so there’s that.


To retain your Internet business, follow the advice in my articles:


A further interesting point in Pew’s study: too many of the citations in Google’s AI summaries are either from low-quality biased sources (Wikipedia with 6% of citations and Reddit with 5%) or Google’s own platform (YouTube with 4%).


So far, YouTube appears to be a fairly neutral platform that is indeed the best source for watching information that has been visualized and popularized in video form, so I don’t blame Google for featuring it prominently. (The antitrust authorities were asleep at the wheel when they approved Google’s purchase of YouTube in 2006.)



Will AI answers be the end of human-written websites? Websites will absolutely lose traffic, but they still have survival strategies that can stave off total doom. (GPT Image-1)



The infographic I made for this news item with GPT 5-Thinking (rendered by the GPT Image-1 model). Simpler, but I like it better than the Thoth infographic above.


Journey Map vs. Usage Territory


Every step the user takes matters. However, you must observe the steps people take in the territory of their real-world context, rather than assuming that any map you have created is accurate. (GPT Image-1)


There is a common corporate ritual that is an egregious waste of resources: the “Big Research Project.” A company spends a fortune to generate a set of personas and journey maps. These artifacts are printed, laminated, and celebrated. Then they are hung on a wall to gather dust, treated as immutable truth. This practice is fundamentally worthless.


Personas or journey maps developed last year may already be dangerously out of date. User expectations are not static. They evolve with every new app they use, every new technology they adopt, and every shift in the market. To base your current design decisions on a year-old snapshot of user behavior is to design for ghosts. An outdated map does not guide you; it misleads you. The belief that you “know the user” because you have a persona poster is a fallacy that leads directly to product failure.


The real territory of user behavior is messy, chaotic, and ruthlessly efficient. Users will always find the path of least resistance, which is rarely the pristine “happy path” you have designed. They use workarounds, ignore features, and get lost in ways your initial map never predicted. You cannot understand this complex terrain from the comfort of your conference room. The only ground truth is found by observing real users interacting with your product now, not a year ago.


Therefore, your design artifacts must be treated not as stone tablets, but as living documents: hypotheses that require constant validation. This is not a one-time project; it is a continuous operational discipline.


An outdated map may be more dangerous than no map at all because it provides a false sense of security, giving you the confidence to march your entire project in the wrong direction. Stop admiring your beautifully designed artifacts. Get out of the building and into the field. The goal is not to have a map; the goal is to have an accurate one. And accuracy demands constant, unrelenting contact with reality.


Patients Spill More Beans to AI Than to Human Doctors


A clinical trial at Lyell McEwin Hospital in Australia compared trainee human physicians with AI for interacting with patients to elicit their history. The AI in this study was based on Claude Sonnet 4, which is not Anthropic’s current top model (that would be Opus 4.1), let alone the top AI across all the labs. We should expect even better performance if using the best AI, which would be reasonable for real clinical use, considering the cost of the current healthcare system and the lack of qualified physicians in many locations.


One finding replicates what we’ve seen before: that patients feel that AI shows more empathy with their situation than human doctors do. Although this is not a new research finding, I am pleased to see the previous research replicated in a new setting with a new AI system and a different study methodology. Such replication substantially strengthens our faith that the research finding is correct and will generalize to yet more settings and the next generation of clinical AI.



In this study, the patient scores were as follows:

  • Empathy: AI 5.0, doctors 4.5 (significantly different at p=0.04 — it’s not clear from the paper whether a 5 or 7 point rating scale was used, but I suspect 5)

  • Politeness: AI 4.8, doctors 4.9 (the difference was not significant)

  • Net promoter score (likelihood to recommend): AI 9.5, doctors 8.1 (on a 0–10 scale; this difference was not significant)
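
A note on the last metric: scores of 9.5 and 8.1 on a 0–10 scale read as mean likelihood-to-recommend ratings, whereas a conventional Net Promoter Score buckets respondents and ranges from −100 to +100. For reference, the standard formula, illustrated with made-up ratings rather than the study’s data:

```python
# Standard Net Promoter Score: % promoters (9-10) minus % detractors (0-6).
# The ratings below are invented for illustration only.
def nps(ratings: list[int]) -> float:
    promoters = sum(r >= 9 for r in ratings)
    detractors = sum(r <= 6 for r in ratings)
    return 100 * (promoters - detractors) / len(ratings)

print(nps([10, 9, 9, 8, 7, 10]))   # mostly promoters, so a strongly positive score
```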


A significant new finding was that the AI was able to elicit crucial clinical information that was not divulged to human physicians. Specifically, “this additional information related to mental health concerns, with one patient disclosing a history of suicidality to AI-DOCS that had not been disclosed to the human examiners.” This highlights a potential benefit of AI in creating a perceived safe space for patients to disclose information that they may feel ashamed or reluctant to reveal to a human doctor.


There’s an actual benefit to the AI not being a person when it comes to divulging sensitive information.



Patients reveal more sensitive information when talking with a clinical AI than when talking to human doctors. (GPT Image-1)


As one of the senior doctors who scored the medical histories taken by AI and the trainee doctors said, “Dangerously exciting (will I be replaced by AI) to see what AI can do with complex medical information.”


Two limitations of the study:


  • Current AI can’t perform physical examinations, so the study was limited to the clinicians (whether AI or human doctors) interviewing patients. Once we get robots, this will change, but clinical robots are probably at least 5 years away, if not 10.

  • The patients were well-spoken in English, which doesn’t represent the full spectrum of real-world patients, such as people with cognitive impairments or language barriers. My guess is that AI will be better than humans at filling in the blanks when communicating with such patients, based on its superior pattern-matching abilities, honed by consuming trillions of tokens of training data. However, “more research is needed,” as most papers conclude.


Response Time Matters for AI Usability

Computer response time is one of the most important determinants of usability, so much so that I made it dishonorable mention A in my list of top-10 UI annoyances and made a special music video about one of the best ways to alleviate slow response times, the progress indicator.


See my analysis of time scales in UX, from 0.1 seconds to 100 years, for more details about response time limits. These response time limits have been documented for decades, and yet AI systems have broken them from the start. The advent of reasoning models has only made the AI Wait worse.
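
For reference, the three classic limits from that analysis (0.1 second, 1 second, 10 seconds) map onto feedback choices roughly like this. The rule of thumb below is a simplification for illustration, not a complete design guideline:

```python
# Feedback selection based on the classic 0.1 s / 1 s / 10 s response-time limits.
def feedback_for(expected_seconds: float) -> str:
    if expected_seconds <= 0.1:
        return "none needed: the response feels instantaneous"
    if expected_seconds <= 1.0:
        return "no special indicator: flow of thought survives, though the delay is noticed"
    if expected_seconds <= 10.0:
        return "busy indicator: keep the user's attention on the dialogue"
    return "percent-done progress bar: otherwise the user will switch tasks"

for t in (0.05, 0.5, 5, 60):
    print(f"{t:>6}s -> {feedback_for(t)}")
```

Most current AI models sit squarely in the last two buckets, which is exactly why the AI Wait hurts.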



The AI Wait severely degrades the usability of current AI models, causing users to iterate less. And it wastes millions of person-hours each year. (Grok 4)


I’m happy to bring some good news for once in the realm of AI response times: Grok’s new “Imagine” mode for image generation is fairly fast at generating images, very close to meeting the required 1-second limit for users to feel that they are moving freely. As expected, this improved speed is a game-changer in image-creation usability.


For any given prompt, such as “satirical cartoon about slow AI response times,” you can keep scrolling through Grok’s suggested images, where it generates new ones almost as fast as you can scroll. (Of course, the ideal would be to erase the word “almost” in this assessment. Maybe next year.)


If you like an image, you can click on it, after which Grok adjusts its image-generation to more closely match that image. Through multiple rounds of alternating scrolling through variations and clicking on preferred versions, you gradually narrow down the new images to slight tweaks of your final design. Two stages in this progression are shown in the following screenshots:



Two iterations of generating a satirical cartoon about slow AI response time with Grok 4’s “Imagine” model. In the beginning, the designs are fairly different, but as the user clicks on preferred images, the variability narrows.


A similar iterative design process is supported by other image models, such as Midjourney, but because they are slower, you don’t get that feeling of moving freely through a vast design space that you get from Grok.


Biasing a User Test

It is very easy to bias a usability study by letting the user suspect what you want to see or what the expected answers are. One example is included in my recent video “Usability Testing as Theater” (YouTube, 2 min.), in the segment set in a Renaissance palace with a court jester acting as the study facilitator.


When the user gets the task right, the facilitator says “wow,” and audibly adds a checkmark to his study guide. We can also hear the observers applaud the user.


These many cues clearly tell the user that she did the right thing. Conversely, if she later uses a “wrong” feature in the UI for the task, the absence of praise and applause will let her in on the mistake and make her change course earlier than she would naturally have done. Thus, we don’t get to observe whether the feedback mechanisms in the UI suffice to let users know when things are going wrong.


When in the room with a study participant, facilitators and observers must employ a poker face at all times and not let the user know whether he or she is doing well or poorly. (If the observers are in a separate soundproof observation room, it’s only natural to celebrate user success, especially if it comes after a long struggle with the UI. But only celebrate if you’re sure the participant won’t be able to notice.)


If you praise usability study participants when they complete a task successfully, they will quickly pick up on that feedback and also know when they are using the design wrong. You have thus biased the research results, which will be more favorable than if the test users had to figure things out for themselves without outside help. (GPT Image-1)


Riddle Hint


Here’s a hint to help you solve the riddle at the beginning of this newsletter. Be warned, though, that the hint is direct, whereas the solution to the original riddle is indirect, and thus more challenging.



Prompt credit to Umesh for the idea for this kind of image. The hint is in his style, whereas my main riddle is a tweak that came to my fevered imagination.

 
