
UX Roundup: Simplicity | Error Prevention | Educational AI | Heuristic Evaluation with AI | AI 4x Better at Medical Diagnosis | Grok 4 and AI Prices | AI Video

  • Writer: Jakob Nielsen
  • 14 min read
Summary: The Simplicity Poof | Error prevention explained by Vikings | AI in education | Heuristic Evaluation with AI | AI was 4x better than human doctors at diagnosing complex cases | Grok 4 sets new record for AI subscription pricing | Advances in AI Video

UX Roundup for July 14, 2025. (GPT Image-1)


The Simplicity Poof

The simplicity poof. (ChatGPT)


Don’t we all wish it were this easy to achieve simplicity in the user experience? Unfortunately, simple design is hard work. However, once achieved, simplicity does feel as if the good fairy is looking out for you.


Error Prevention

New video explaining Jakob Nielsen’s usability heuristic 5, Error Prevention (YouTube, 2 minutes). For this video, I took a hint from the popular influencer videos that are set in historical periods. I chose the Vikings, of course.


In my new video, the error prevention heuristic is explained by Vikings. (GPT Image 1)


I had a hard time getting AI to show Vikings without horns on their helmets (which were used in the Bronze Age, not by the Vikings). If you look closely, you’ll see that the Jarl (chief) of my Viking crew does sport horns. The horn stereotype is too strongly represented in the training data to avoid without a very large number of rerolls, which gets expensive for video generation.


AI in Education

Interesting video with experts in consumer AI and EdTech from A16Z about the emerging use of AI in schools (YouTube, 30 min.).


We have always known that individual tutoring is vastly superior to mass instruction, but it has been too expensive to hire one human tutor per student. AI now offers individualized instruction, which will vastly accelerate and deepen learning. (ChatGPT)


Schools have seen a major shift from initial skepticism to pragmatic adoption of AI. While major school districts initially banned AI, approximately 80% of K-12 districts now actively explore and procure new AI technologies, with higher-education institutions leading the transition, recognizing AI as a fundamental tool for future employment. Teachers, rather than students, are the primary drivers of AI adoption, using it extensively to automate administrative tasks such as grading, feedback, and curriculum design, which constitute the 90% of the job they dislike. This adoption significantly enhances teacher productivity, potentially reducing burnout and allowing for personalized curriculum generation per student.


The future of AI in education points towards a personalized learning landscape where students dictate their pace and preferred modality. While AI is not expected to fully replace human teachers in the foreseeable future, it will enable more effective active teaching and fundamentally alter student interaction with learning content. However, significant hurdles remain, including the slow commercialization of advanced AI tools, high integration costs, and the need for traditional education gatekeepers like textbook companies to rapidly innovate or risk irrelevance. Ultimately, building engaging, adaptive experiences around increasingly capable language models is crucial for AI to become a core, transformative element of the educational system.


AI will be much better than human teachers at teaching students a curriculum, mostly because it can provide individualized instruction at a different pace for each student and optimize delivery to that student’s needs. However, contrary to this cartoon, I don’t expect that this change will mean that human teachers will have nothing to do. They should pivot from content instructors to student mentors and pace setters. (ChatGPT)


The following were the three most interesting points in the video:


  • Most of the money spent on educational AI comes from teachers, not from parents/students or from educational institutions. This is because teachers see direct day-to-day benefits from their personal use of AI for things like creating lesson plans or worksheets. Unfortunately, these AI outputs are then used with legacy teaching methods, so the students don’t benefit: they get the same education as always.


Currently, a major use of AI in education is to grade student papers. Many teachers dislike this task, which is why they are willing to spend their personal money on an AI that does the grading on their behalf. (ChatGPT)


  • However, there’s a new crop of AI-Native schools, exemplified by Alpha School, that demonstrates AI’s potential: students are able to learn the traditional curriculum in two hours of classroom sessions per day, thanks to individualized teaching by AI tutors. This leaves the rest of the school day for more creative types of projects. I hope that these non-curricular projects will teach the kids the three job skills for the AI age: agency, judgment, and persuasion. In fact, I wonder why future-oriented schools bother teaching the old curriculum at all, even if they can dispense with it more efficiently through AI teaching. In the future, all traditional “hard-skills” knowledge work will be done by AI, so there is no need for students to learn things they will never use once they are released from the education system. (I think they still ought to learn to read, which is, interestingly, done poorly by the current curriculum: many students graduate from high school as functional illiterates.)


Some AI-Native schools can now teach the full traditional curriculum in only two hours per day, leaving the rest of the school day open for students to learn agency and other skills for the AI era. Unfortunately, pivoting the mainstream education establishment to teach students the skills they will need for the age of superintelligence will likely not happen until long after the advent of superintelligence, when it has become obvious to everybody that the old skills are no longer needed. Except for kids lucky enough to have parents who enroll them in AI-native schools, this inherent resistance to change will likely lead to a lost generation that graduates with obsolete skills. (ChatGPT)


  • Free, platform-agnostic edtech content, such as AI-generated deepfake videos of celebrities explaining complex topics, is becoming increasingly popular and motivates many people to learn, often explaining complex topics more effectively than traditional teaching methods. Such deepfakes separate content from delivery, enabling diverse and engaging multimodal learning experiences tailored to individual preferences and topics. (I am a strong opponent of deepfakes, so I hope that this type of content will work equally well once it is taught by non-celebrity avatars customized to each student’s learning style, as opposed to being presented by a faked Sydney Sweeney.)


Fake celebrity videos have become a popular form of instructional content. I cannot abide the use of deepfakes, but perhaps some celebrities will consent to the use of their likeness for this good purpose. (ChatGPT)


AI-generated videos can present realistic content, often by being set in a historical period and reenacting events with fictional characters from that time. I even dabbled in this genre myself by having a Senator from the Roman Republic and an Egyptian pyramid architect opine on my usability heuristics. An AI-animated talking dinosaur can be more entertaining than a traditional textbook and still convey accurate information. (ChatGPT)


The last point also raises the interesting question of whether we need specialized EdTech AI solutions, or whether it’s just as good to simply create better learning content and deliver it over traditional platforms, whether TikTok or YouTube.


The video asks whether the better education we can get from AI should be funded by spending less money on human teachers. In the video, the answer is that we will still need human teachers for many years to come, and I suspect this is correct. However, the role of humans will change from teaching the actual content (which is better done by individualized AI tutors) to being more of a mentor for the students, motivating them to put in the work while keeping order in the classroom. Maybe this new role can be done by 20% fewer humans, which would easily fund any possible AI spend.


Currently, AI frees teachers from much of the drudgery of the profession, which is why many of them willingly pay for AI subscriptions from personal funds, while the official budgets are closed. In the future, a more revolutionary change will free teachers from presenting curriculum materials (that’ll be done much better by personalized AI tutors), leaving them time to serve deeper roles as mentors for the children. I expect we will need human teachers for a long time, probably indefinitely, even as AI takes over most of their old jobs. (ChatGPT)


A recent survey by Gallup found that 60% of K-12 teachers in the U.S. had used AI tools during the 2024–25 school year. (This percentage will likely be higher next school year.) These teachers reported saving an average of 6 hours per week through this use of AI. Assuming a 40-hour work week, this is a productivity gain of 15%, which is less than expected for knowledge workers, but as long as the existing workflow isn’t changed to accommodate AI, teachers won’t get any productivity gains during their classroom time: each hour of class time before AI will still be an hour of work after AI.
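The 15% figure is simple arithmetic on the Gallup numbers:

```python
# Back-of-the-envelope check of the Gallup survey numbers quoted above:
# teachers reported saving 6 hours per week; assume a 40-hour work week.
hours_saved_per_week = 6
work_week_hours = 40

productivity_gain = hours_saved_per_week / work_week_hours
print(f"Productivity gain: {productivity_gain:.0%}")  # prints "Productivity gain: 15%"
```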


Heuristic Evaluation with AI

Noman Siddiqui and Angélica Morales conducted a small experiment in using AI for heuristic evaluation. They used ChatGPT, Claude, and Perplexity to conduct a heuristic evaluation of the Scotiabank mobile app.


Having 3 different AI tools perform the same task gave rise to an interesting observation: each AI tool could identify completely different issues within the same screens. ChatGPT might zero in on navigation, Claude might focus on visual hierarchy, and Perplexity might highlight banking accessibility standards.


This is essentially the same as I found in the original research on heuristic evaluation in the 1990s, when I studied human evaluators. Similarly, with humans, each additional evaluator would identify new usability problems in the user interface.
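As a rough sketch of this diminishing-returns effect, the classic Nielsen-Landauer model estimates the problems found by n evaluators as N × (1 - (1 - λ)^n), where λ is the share of problems a single evaluator finds. The λ = 0.35 below is an illustrative value only, not a measurement from the Scotiabank study:

```python
# Diminishing returns from adding evaluators, following the classic
# Nielsen-Landauer model: found(n) = N * (1 - (1 - L)**n).
# L (the share of problems one evaluator finds) = 0.35 is illustrative;
# N = 100 total problems makes the output read as a percentage.
def problems_found(n_evaluators, total_problems=100, single_rate=0.35):
    """Expected number of usability problems found by n evaluators."""
    return total_problems * (1 - (1 - single_rate) ** n_evaluators)

for n in (1, 2, 3, 5):
    print(f"{n} evaluator(s): ~{problems_found(n):.0f} of 100 problems")
```

With these illustrative numbers, three evaluators (whether humans or AI models) already uncover roughly three-quarters of the problems, which is why adding a second or third AI model is cheap insurance.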


However, adding human UX specialists to a usability evaluation is expensive, whereas it’s dirt cheap to subscribe to multiple AI models. (There are even aggregator services such as Poe that allow you to subscribe to a huge range of models for a single monthly fee.)


When using AI to help conduct a heuristic evaluation, ask two or more different AI models (say, ChatGPT o3 and Gemini 2.5 Pro) to evaluate the UI, because you will likely get additional usability findings from including that extra perspective. (Midjourney)


As we have seen with previous experiments using AI for heuristic evaluation, the main conclusion was that AI still doesn’t produce sufficiently deep usability insights, even though it identifies many more superficial usability problems. We need to await one more generation of AI advances, and likely two, before AI is good enough at heuristic evaluation. Again, leaning on my research from the 1990s, I am not surprised by this conclusion: heuristic evaluation is a particularly difficult endeavor in usability engineering, which is done best by what I called “double experts,” meaning people who are very experienced with both usability in general and the specific domain of the user interface under evaluation.


Recommendations for now:


  1. Do use AI for heuristic evaluation, but only as a starting point that frees you from documenting some of the simpler usability problems. You need to add the deeper insights yourself.

  2. Use two or three different AI tools when using AI to help you with a heuristic evaluation.

  3. Keep the output from some of these AI runs around so that you can compare when upgraded AI models are released and see whether they do better. (This also means that you should save a copy of the exact user interface design that was evaluated.)


AI Beats Human Doctors, Again

A new paper by Harsha Nori and 14 co-authors from Microsoft AI provides further evidence that AI is surpassing human doctors in medical diagnosis accuracy. (We’ve already seen several such studies, to the extent that I made two videos about it, one featuring a singing penguin and seal, and one with a singing parrot. The parrot reported on a study showing that human doctors still beat AI when it comes to interviewing patients in the clinic.)


The stimuli in the new research were 304 “diagnostically challenging” clinicopathological conference (NEJM-CPC) cases. The procedure was that a human doctor or an AI would get a clinical summary of a patient case, after which they could iteratively order a series of tests, stopping when they believed they had enough data to make an accurate diagnosis.

With this procedure, the doctor or AI must first prioritize which tests are most useful for diagnosing the patient and then decide when to stop ordering additional tests, as medical tests are expensive. (And cause aggravation for the patient.)


The Microsoft team introduced a new AI model called the “MAI Diagnostic Orchestrator” (abbreviated “MAI-DxO”) to manage this process. MAI-DxO could draw upon a variety of foundation AI models, but performed best when using GPT-o3. The researchers compared MAI-DxO with 21 human physicians (mostly primary care doctors) with a median of 12 years of experience.


Results:

  • Human physicians: 20% diagnostic accuracy while spending an average of $2,963 on tests.

  • AI on a budget: 80% diagnostic accuracy while spending an average of $2,396 on tests.

  • AI with no budget: 86% diagnostic accuracy while spending an average of $7,184 on tests.


Interestingly, the AI version that achieved 80% accuracy while spending an average of $2,396 was actually given a budget of $8,000. This means that the AI only went all out on ordering the full range of tests (within that budget) in a few cases: in most cases, the AI was able to get by with a much smaller (and thus cheaper) set of tests.


Bottom line: Even when AI was allowed to spend 2.7x more money than the human doctors did, it actually spent less on average and still performed 4x better. The AI with no spending limits only did a little better, while spending 3x the money, meaning that this medical AI was good at containing costs without substantial adverse effects.
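These ratios are easy to verify from the figures reported above (a quick check using only the numbers in this section):

```python
# Sanity-checking the headline ratios from the MAI-DxO figures above.
human_accuracy, human_spend = 0.20, 2963
ai_budget_accuracy, ai_budget_spend = 0.80, 2396  # given an $8,000 cap
ai_free_accuracy, ai_free_spend = 0.86, 7184      # no spending limit

print(f"AI vs. human accuracy: {ai_budget_accuracy / human_accuracy:.0f}x")
print(f"Allowed budget vs. human spend: {8000 / human_spend:.1f}x")
print(f"Unlimited vs. budgeted AI spend: {ai_free_spend / ai_budget_spend:.1f}x")
```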


A new study of challenging cases confirms previous research: AI is more accurate than human doctors at diagnosis. In this project, it was also better at containing the costs of diagnostic testing. (ChatGPT)


Even though they got the best results from o3, the research team ran the same problems through MAI-DxO when pairing it with other leading foundation models. The results show some variation, with the worst performance coming from using Deepseek-R1, Llama-4, and Gemini 2.5-flash (a distilled model). On the other hand, Gemini 2.5-Pro, Grok-3, and Claude 4-Opus all scored roughly as well as o3, when the orchestration provided by MAI-DxO improved their raw performance.


This latter result has deep implications for the AI industry as a whole, if it generalizes to other verticals besides medical diagnostics: the exact foundation model used for an AI product may be less important than the industry-specific “wrapper” built on top of the foundation model, since we can swap foundation models in and out without impacting the results much. Still, some foundation models are better than others (in this study, GPT-o3, Gemini 2.5-Pro, Grok-3, and Claude 4-Opus), and so you still want to choose from this set of good models. For important tasks like medical diagnosis, you can even eke out a few extra percent by picking the very best from the set of good models, but it could well be the case that this “very-best” model will change every time the labs launch updates, which is frequent.


Grok 4 Sets New Record for AI Subscription Pricing

Grok 4 just launched and immediately set new records for AI scores on the various benchmarks. Another record: the high-end version of Grok 4, the so-called SuperGrok Heavy, which does extensive reasoning and scored particularly high on the benchmarks, requires a US$300/month subscription fee. Every new release of a high-end AI model cranks up the subscription fee.


Every new high-end AI release is more expensive than the last. (GPT Image-1)


These high prices are worth paying, which is why AI companies continue to charge more. AI is now so useful that any business professional can add considerably more than $300 in monthly productivity from using the best-available AI.


The price tag for the best AI models continues to rise. (GPT Image-1)


My worry is not the $300 for SuperGrok Heavy, the $200 for OpenAI ChatGPT Pro, or the $250 for Google Ultra with extra Veo 3 generations. Any of these expenses can easily be recovered. (And any decent company should reimburse employees for spending that much on AI.) The problem is the cumulative cost of putting all the subscriptions together.


For example, for my recent music video on how to develop your capacity for human agency, I used 11 different AI models, requiring 9 different subscriptions (the two Google models and the two Midjourney models required only a single subscription each): CapCut 6.6 Upscaler, Claude Opus 4, Gemini 2.5 Pro, GPT Image-1, HeyGen Avatar IV, Kling 2.1 Master, Midjourney 7, Midjourney Video 1, Suno 4.5, Veo 3. Advanced professional use of AI requires you to know a wide variety of AI tools and models, to pick the one best suited for any given job, and to combine them appropriately to get the best result.


You should combine a wide variety of different AI tools for the best results on any given project. (GPT Image-1)


Creators or marketing folks may also need several of the following:


  • $61/month: Kling Premier subscription for expressive video

  • $48/month: Midjourney Pro plan for stylish images

  • $29/month: HeyGen subscription to make avatars

  • $22/month: ElevenLabs Creator plan for naturalistic speech synthesis

  • $8/month: Suno to make songs and background music

  • $8/month: Gamma to make slides


Total AI spend can quickly reach $1,000 per month. Even this is relatively inexpensive, considering the monthly loaded cost of a business professional. (Remember that it typically costs a company twice an employee’s take-home salary to keep that person around, when you add benefits and overhead costs.)
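To see how quickly the bill approaches four figures, here is a hypothetical stack that pairs the three high-end subscriptions mentioned earlier with the specialist tools listed above (all prices as quoted in this article and subject to change):

```python
# A hypothetical monthly AI stack for a creator: the three high-end
# frontier subscriptions discussed earlier plus the specialist tools
# priced above. Prices are the ones quoted in this article.
specialist_tools = {
    "Kling Premier": 61,
    "Midjourney Pro": 48,
    "HeyGen": 29,
    "ElevenLabs Creator": 22,
    "Suno": 8,
    "Gamma": 8,
}
frontier = {"SuperGrok Heavy": 300, "ChatGPT Pro": 200, "Google Ultra": 250}

total = sum(frontier.values()) + sum(specialist_tools.values())
print(f"Monthly AI spend: ${total}")  # prints "Monthly AI spend: $926"
```

Even this stack omits video tools like Veo 3 beyond the Google Ultra quota, so $1,000 per month is a realistic figure for a heavy professional user.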


Your AI bill can easily reach $1,000 per month, as higher-end products come out. (GPT Image-1)


Leaving aside my analysis of the business ROI of spending on high-end AI, we empirically see strong willingness among consumers to spend their own money on AI. Menlo Ventures (a VC firm) recently estimated that consumers currently pay $12 billion for AI, even though most people are still using the free versions. (Menlo Ventures estimates that the collective DAU [daily active users] for AI tools numbers about half a billion people.)


I put Grok 4 to the test, asking it to explain why the following cartoon is funny and to give us some takeaway lessons. (Humor is a deep human characteristic, and some evolutionary psychologists speculate that it developed as a mating signal to quickly demonstrate reproductive fitness in the mental arena, to supplement physical characteristics that are fast to assess visually.)


I think Grok 4 did well, even to the extent of quoting one of my heuristics without being prompted. What do you think?



The cartoon depicts a museum exhibit where an old-fashioned fax machine is mounted like a dinosaur skeleton, complete with bony legs and a coiled phone cord, roped off on a pedestal. An elderly man and a younger woman observe it with serious, contemplative expressions, as if viewing a prehistoric fossil.


The humor arises from the visual pun: fax machines, once ubiquitous for document transmission, are now largely obsolete due to digital alternatives like email and cloud sharing. By portraying the fax as an “extinct species,” the cartoon satirizes how quickly technology can become archaic, evoking nostalgia and irony—much like dinosaurs, faxes “roamed the earth” but couldn’t adapt, leading to their “extinction.” It’s funny because it highlights the absurdity of clinging to outdated tech in a fast-evolving world, resonating with anyone who’s dealt with relics like dial-up or floppy disks.


The Extinction of Poor UX: Lessons from a Fax Fossil

In usability, obsolescence strikes fast, much like the fax machine cartooned as a museum dinosaur skeleton. This relic reminds us that interfaces failing to evolve perish, leaving users to adapt or abandon. As UX professionals, we must heed these takeaways to avoid digital extinction:


  1. Prioritize Adaptability: Fax machines ignored user shifts toward digital speed and convenience. Design systems that iterate based on real-world feedback, using A/B testing to evolve features before they fossilize.

  2. Simplify Interactions: The fax's clunky paper jams and noisy dials exemplify complexity. Apply heuristic 4 (consistency and standards) to streamline workflows—think one-click sharing over multi-step processes.

  3. Anticipate Technological Shifts: Just as email supplanted faxing, AI and voice interfaces are rising. Conduct trend audits to integrate emerging tech, ensuring your product doesn’t become an exhibit.

  4. Focus on User Needs, Not Legacy: Museums preserve history; products must serve the present. Run user studies to identify pain points, redesigning for efficiency—e.g., mobile-first over desktop relics.

  5. Measure Longevity: Track metrics like abandonment rates. If users flee like dinosaurs from a meteor, refactor ruthlessly.


By embracing these principles, we ensure designs thrive in the ecosystem, not gather dust. Usability isn’t optional; it’s survival.


Advances in AI Video

While I was in the middle of editing my recent music video on Developing Human Agency in the AI Age (YouTube, 3 min.), HeyGen updated its Avatar IV animation model. Unfortunately, they don’t use dot-release version numbers, as we did for old-time software releases, so updates are not made explicit unless you follow the company on social media.


I made a short video to compare the animation quality before and after this update: June and July Avatar Animation Side-by-Side (YouTube). Very nice progress in less than two weeks! In general, AI video creation is improving at an amazing speed this year.


Comparing Avatar IV lip-synch animations made less than two weeks apart.
