UX Roundup: Developers Outrace UX | AI Traffic Safety | Dovetail Conference | Usability Poster | RIP Prompt Engineering | Comparing 6 Image Models
- Jakob Nielsen
- May 5
- 13 min read
Summary: Developers outracing UX because of AI acceleration | AI is 5x safer than humans when driving cars | Dovetail conference session | Usability poster | Prompt engineering no longer a viable job | Comparing Midjourney, Google Imagen, Gemini Flash, ChatGPT, Ideogram, and Reve on the same image-generation task

UX Roundup for May 5, 2025. (ChatGPT)
Developers Outracing UX Because of AI Acceleration
AI is accelerating software development work to a much greater extent than it is improving UX designers’ and researchers’ productivity. This mismatch risks exacerbating the imbalance that already exists in most product design projects, where engineering dominates human factors.

Developers currently enjoy more productivity gains from AI than UX staff do. (ChatGPT)
It is incumbent on UX professionals to accelerate our work further and to ensure that AI is developed with greater emphasis on usability in the future.
I discussed the risk of UX falling further behind in a short video (YouTube, 2.5 min.).

(ChatGPT)
AI Is 5x Safer than Humans when Driving Cars
Tesla has released safety statistics for Q1 of 2025 for its Autopilot AI driving technology. The crash rates for Tesla cars were as follows:
When driven by Autopilot: one crash for every 7.44 million miles
When driven by humans: one crash for every 1.51 million miles
In other words, AI was 4.9x safer than human drivers.
Coincidentally, Google’s Waymo self-driving cars reported 5.4 times fewer injury-causing accidents in 2024 per million miles than human-driven cars in the same cities where Waymo is operating (mainly San Francisco).
The two statistics are not fully comparable, so we can’t say for certain that Waymo is safer than Tesla, even though its number is better. Tesla compared Autopilot with human drivers of other Teslas, but Autopilot is not a fully autonomous vehicle: a human is supposed to monitor it and stay alert to take over the controls in case of trouble. (I doubt that this supposed safety precaution actually helps, since humans are notoriously bad at vigilance tasks.) Also, Autopilot is probably engaged more during freeway driving, whereas Waymo operates mainly in densely trafficked cities, where traffic moves more slowly.
However, even though the two statistics were measured differently, taken together they allow us to conclude that AI is currently about 5x safer than humans at driving cars. Since 40,000 people are killed in traffic accidents in the United States alone every year, I conclude that about 32,000 lives would be saved by going all-in on AI-driven cars and banning humans from driving outside closed racetracks.
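For readers who want to check the arithmetic, here is a quick back-of-the-envelope sketch using the figures cited above (rough numbers only, not a rigorous statistical comparison):

```python
# Back-of-the-envelope check of the crash statistics cited above.
tesla_autopilot_miles_per_crash = 7.44e6   # Q1 2025, Autopilot engaged
tesla_human_miles_per_crash = 1.51e6       # Q1 2025, human drivers

safety_ratio = tesla_autopilot_miles_per_crash / tesla_human_miles_per_crash
print(f"AI vs. human safety ratio: {safety_ratio:.1f}x")   # ≈ 4.9x

# If all driving were roughly 5x safer, annual U.S. traffic deaths would drop
# from about 40,000 to about 8,000 — roughly 32,000 lives saved per year.
us_annual_traffic_deaths = 40_000
lives_saved = us_annual_traffic_deaths * (1 - 1 / safety_ratio)
print(f"Estimated lives saved per year: {lives_saved:,.0f}")  # ≈ 31,900
```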

AI is about 5 times safer than human drivers. (ChatGPT native image mode)
Alleviating Anxiety in Surgery Patients Through AI Assistance
Using ChatGPT to assist with the informed consent process before total knee arthroplasty significantly reduced patients’ anxiety levels before the operation and improved their satisfaction with preoperative education and the overall hospital experience, according to a new paper in the International Journal of Surgery by Wenyi Gan and many colleagues from several hospitals in China.

Chinese knee surgery patients experienced less anxiety when the human doctors’ explanation of the procedure was supplemented with AI answers to their questions. (Midjourney)
During the informed consent process before a surgical procedure, doctors should clearly explain the disease’s cause, progression, and treatment options, detail the stages of treatment along with benefits and surgical risks, and thoroughly address any patient questions. However, in practice, patients often don’t understand what’s being said, or they don’t fully trust the doctor and turn to social media for additional information.
In the Chinese study, the control group discussed the surgery with the human physician as per normal procedures. In the “human and AI” condition, patients started out the same way but were then offered the option of asking additional questions to ChatGPT 4. In this study, the human doctor reviewed the AI answers with the patient to assess their accuracy.

In the future, with better AI, human review may not be necessary. Even in this study, which used the fairly primitive ChatGPT 4 (instead of “smarter,” more recent models, such as o3), the expert physicians assessed the AI answers to all of the 10 most common patient questions as scoring 5 out of 5 for accuracy, completeness, objectivity, and acceptance.
Here are the findings, using a variety of different scoring scales for each metric:
Anxiety scores (lower is better):
Human physicians only: 12.75.
Humans and AI: 10.48 (statistically significant difference at p = 0.04).
Satisfaction with pre-operation education (higher is better):
Human physicians only: 3.43.
Humans and AI: 4.22 (statistically significant difference at p < 0.001).
Satisfaction with overall hospital experience (higher is better):
Human physicians only: 3.46.
Humans and AI: 4.11 (statistically significant difference at p = 0.001).
Knee function 5 days after surgery (lower is better):
Human physicians only: 74.43.
Humans and AI: 73.33 (not significantly different).
Thus, the actual surgical outcome was the same for both groups, no matter how the procedure was explained to the patient. But when AI was used to enhance the explanations, anxiety was reduced and satisfaction increased.
This research is an interesting inquiry into the patient experience, showing that AI can improve it. This supplements the many earlier research studies finding that AI is better than human doctors at many forms of diagnosis and even at some forms of mental health therapy. See the song I made about AI in patient care (YouTube, 3 min. video) and about the limitations of current AI in interacting with patients (YouTube, 2 min. video).

Patient anxiety dropped from 12.75 to 10.48 when AI was used to supplement the human physician’s explanations of a surgery procedure. (ChatGPT)
Dovetail Conference Session
I spoke at Dovetail’s “Insight Out” conference in San Francisco in front of a huge audience, in a session superbly moderated by Kristine Yuen from LinkedIn. The fact that Dovetail, which is a specialized tool for user research, can attract so many people to its conference (twice the audience of last year) is evidence that we’re reaching the end of the “UX Angst of 2023-24” and seeing better times for UXR.

Kristine Yuen and Jakob on stage at Insight Out (photo credit: Agostina Albamonte).
Here are 5 highlights from my session. Unfortunately, I had to gloss over some points in the presentation, which I have added here since I can lavish infinite word count on my ideas in my own newsletter.
I slammed current AI tools for lousy usability, especially in error recovery, and pinned the blame on their technologist roots. Look at image generation, where OpenAI’s ChatGPT native image mode regenerates the entire image when you tweak one part, wrecking your workflow. Midjourney, despite its own UI messes, nails this with inpainting, fixing only what you mark. The difference? Technologists may be brilliant at algorithms, but don’t get human factors. They’re AI scientists, not UX pros. They’re cursed by knowing their systems too well and assume everyone else does too. I’ve seen this for decades: developers overestimate user IQ and underestimate real-world struggles. AI’s “say anything” freedom sounds great, but without guidance or error support, it’s a usability nightmare. UX folks have been sidelined in AI’s rush, and it shows. Ideas like the time-proven 10 usability heuristics could fix this, but only if the right people step in: UX professionals must engage more with AI and not remain on the sidelines as too many of our peeps have done.
Today’s UX practice faces two major challenges: it’s too costly, and its value isn’t fully recognized by other business units. Usability work can be expensive, requiring time, user recruitment, and skilled researchers, which often makes it a target when budgets are tight. These twin problems hinder our progress: high costs limit how much user research gets done, and a lack of recognition means UX teams struggle for buy-in. However, I also argued there’s a clear path forward. AI and automation are beginning to reduce the cost of many UX activities, making both research and design faster and cheaper. This technological boost can tackle the “too expensive” issue by amplifying our productivity, letting us do more UX work with less effort. And as we deliver more UX improvements efficiently, it becomes easier to demonstrate ROI. That, in turn, can address the second issue: by proving UX’s impact on product success, we earn greater respect and understanding from colleagues in product, engineering, and finance. In short, I said that AI offers a chance to break the cost barrier and show skeptics what UX is truly worth, moving usability from a marginalized concern to a central, highly valued part of product strategy.
What I wish I also had time to say: In the future, AI-driven analytics could calculate the revenue impact of each UX improvement. If this happens, every design change, from a simplified checkout flow to an accessible interface tweak, can be linked to metrics like conversion rates or customer retention in real time. This would effectively eliminate the “proof” problem by tying usability directly to dollars. I also envision UX becoming so inexpensive (thanks to AI automation) that cost objections might disappear entirely. Even small startups could afford robust UX research for every release if AI generates usability test results and user feedback summaries at a fraction of today’s cost. Finally, in the future, UX professionals will likely be embedded in every multi-disciplinary team by default: not as a luxury, but as an obvious necessity. With AI handling grunt work and proving value in quantitative terms, UX might gain the same unquestioned status as engineering or QA, fully overcoming the recognition gap that exists today.
Within about a decade, AI will be capable of taking over all the low-level, routine tasks in UX. By “low-level,” I meant the production-level, repetitive, or detail-oriented work that occupies so much of designers’ time today: things like generating multiple layout variations, applying style guides across dozens of screens, or producing assets and specs. These tasks follow rules or patterns, making them ripe for automation. There’s debate about whether AI can do these tasks well today, but I am confident it will soon. Essentially, designers will be able to offload the pixel-pushing and documentation drudgery to AI tools, while the human focuses on higher-level decisions. This isn’t science fiction since we already see early signs (like generative AI creating basic UI mockups on command), and it will only improve. By 2035 or sooner, I expect it will be routine for UX teams to have AI members implement design system rules, build draft prototypes, or even conduct initial heuristic evaluations automatically. The crux of my point was that anything in design that’s formulaic or grunt work will not require human effort for much longer. This represents a seismic shift in how designers work: our roles will be freed from the minutiae because AI will handle those bits flawlessly and at scale. It’s an optimistic take: instead of fearing this automation, I see it as a liberation from tedious tasks, allowing us to focus on what humans do best.
Taking these points further (which I didn’t have time to do in the session), in 10 years, we might get AI systems that not only execute low-level tasks but also learn a designer’s personal style or a brand’s design language so well that they can preemptively design new interfaces on their own. In this extended vision, a product team might start a project by simply feeding requirements to an AI, which produces a fully fleshed-out initial design that is 90% ready for launch. Designers would then act like editors or curators, tweaking the last 10% for nuance and polish. I also think that many user research tasks will be automated. For example, AI bots will conduct simple interviews and usability tests with users, then analyze the results. This will extend the “low-level” automation beyond design execution into research execution. Another likely future option is that AI will not just do tasks for designers, but teach itself from the best designers. An AI trained with reinforcement learning from thousands of design critiques and usability reports could eventually flag potential design problems on its own. Human designers will transition into more creative, strategic roles because AI handles the entry-level production work. This future underscores my original point: when all the grunt work is automated, the very nature of design careers and processes will evolve in ways we’re only beginning to grasp.
Amid all the talk of change, I made it a point to remind everyone that one thing isn’t changing: human beings. When asked which aspects of human-centered design will remain unchanged regardless of new technology, my answer was simple: “Humans.” Technology might evolve at breakneck speed, but fundamental human characteristics (people’s cognitive abilities, limitations, and psychological needs) don’t change. I elaborated that we still need to design for human limitations and behaviors: people have short attention spans, they get busy, they don’t read lengthy instructions, they make mistakes, and they seek ease of use. These facts were true in 1994, and they’re true in 2025, and they will be true in 2035 even if AI is everywhere. I noted that back in 1994, I formulated 10 usability heuristics, and these timeless principles still apply in the age of AI. For example, if an AI-powered app gives unpredictable results without explanation, it’s violating the principle of feedback/transparency, and users will be frustrated, just as they would with any poorly designed tool. So, I stressed that designers should carry forward the hard-won lessons of usability. No matter what new interfaces or intelligent agents we create, they must accommodate the unchanging aspects of human cognition and perception. In practical terms, that means we must still prioritize clarity, simplicity, and learnability. My overall point was to ground the audience: even as we innovate, we shouldn’t throw out the rulebook of human-centered design. The users are still flesh-and-blood humans with the same brains, and serving their needs is still our north star.
I said my greatest fear isn’t rogue AI or dystopian scenarios, but that negative and sensationalist media coverage of AI, driven by “if it bleeds, it leads,” will amplify rare failures and chill investment in life-saving applications. I argued that endless headlines about facial-recognition mix-ups, privacy scandals, or isolated self-driving crashes generate disproportionate public fear, delaying deployment of diagnostic AIs in remote clinics, adaptive learning tools in underserved schools, and other innovations that could help millions worldwide. Fueled by Hollywood’s conflict-first storytelling, that risk-aversion will do much more harm than the flaws themselves. Unlike humans, who perpetually repeat mistakes, AI systems allow us to identify, isolate, and patch every flaw, so they become safer with each update. Using self-driving cars as an example, I noted that every time an AI vision system misclassifies a pedestrian or fails to detect an obstacle, engineers can refine the algorithm and roll out fixes fleet-wide, eliminating entire classes of errors. This iterative bug-fix loop, which is impossible for billions of human drivers, turns autonomous vehicles into a moral imperative because each improvement directly translates into thousands of lives saved. AI-driven advances in medical research and healthcare delivery in poor or rural areas will save even more lives, and AI-fueled individualized education will make children learn much more, especially in developing countries, causing an immense long-term lift in living standards. We need these AI advances sooner, rather than later, so the only important ethical imperative is to accelerate AI as much as possible.

5 of the main topics I discussed at the Dovetail conference. (ChatGPT native image mode)
Dovetail CEO Benjamin Humphrey’s keynote was an aggressive statement of going all-in with AI to support the company’s customer insight mission. The platform now positions itself as an “always-on continuous intelligence platform” that assembles, analyzes, and activates customer data from all interactions. Is this hype or real? Can’t tell from a demo, but I look forward to seeing how Dovetail customers use these new AI capabilities for better user research insights.
The keynote ended with an honestly hokey demonstration of a future concept for AI-driven presentation of just-in-time user research insights: a video avatar can attend team meetings and answer questions about any data collected on the Dovetail platform. This was just a concept demo and would need more work to be real, but some people will probably prefer asking questions of the data instead of reading reports, even summarized ones.

A virtual meeting participant to answer questions from the repository of user research results. Might be a way to make research findings more salient. (ChatGPT)
Usability Poster
In my eternal quest for a usability poster, I experimented with using the style of a video game poster:

(ChatGPT native image mode)
Logo Explorations
One more logo variant: Make it into a gold pendant:

“A photorealistic close-up of a gold pendant necklace. The pendant features a bas-relief engraving of [your logo].” Prompt credit Amira Zairi. (ChatGPT native image mode)
RIP Prompt Engineering
The Wall Street Journal has an article (subscription required) reporting that “Two years ago, prompt engineering was one of the buzziest jobs in tech, fetching salaries of up to $200,000 on the promise of becoming any company’s ‘AI Whisperer.’ Now, the role is basically obsolete thanks to the breakneck speed of AI development and companies’ own maturity in terms of understanding how to use the technology.”
This is one of those cases where I must gloat and say, “I told you so.” In early 2023, I posted “don’t count on a long-lasting career in prompt engineering.”

Prompt engineering is no longer a job. I never thought this skill would be the basis for a career in AI. (ChatGPT)

In two years, AI use went from a rare skill to something all staff are expected to do. (ChatGPT)
Comparing 6 Image Models
I compared 6 different image models with the prompt “In a high-resolution 3D render on wide aspect ratio, three translucent glass fruit—a red apple, a yellow banana, and a green pear—are displayed with breathtaking clarity. Each fruit's multifaceted surface captures and refracts light, casting mesmerizing reflections and subtle sheens. Set against a softly reflective white background, the elegant simplicity enhances the brilliance and captivating beauty of these crystalline fruits, enveloping them in a warm, ethereal glow.”

The two images in the top row are both from Midjourney v. 7. The left image is the initial result, where the requested banana seems to have been replaced with something like a spherical lemon. After several rounds of inpainting of this middle fruit, asking for a banana, I ended up with the right image, with a rather misshapen fruit that has a slight resemblance to a banana.
The second row shows two images from Google. The left is Imagen 3 and the right is the native image model in Gemini 2 Flash Experimental. Both show the requested fruits in nice translucent glass. The right image makes the facets too big for my taste, but it does stay true to the prompt.
The third row has ChatGPT o3’s native image mode to the left and Ideogram 3 to the right. Both models had high prompt adherence from the start. ChatGPT also went overboard on the size of the facets according to my taste, but I could have prompted for a version with finer facets. I like Ideogram’s positioning of the three fruits as more of a still life composition instead of the linear display employed by the other image models.
The images in the bottom row are both from Reve. The left image is Reve’s initial generation, where the banana was a photorealistic fruit, as opposed to the requested glass. The right image shows the result after reprompting with the instruction to make the banana out of glass. Interestingly, Reve swapped the positions of the apple and pear in the redrawn image, even though this was not requested. It is a weakness of most prompt-driven image-revision models that they often alter parts of the image beyond what was requested when iterating through image versions. (Midjourney’s inpaint and outpaint features are honorable exceptions: they only modify the specified pixels, as you can see in the above example where the pear and 95% of the apple are unchanged between iterations. Midjourney thus wins for compliance with usability heuristic number 3, “user control and freedom.”)
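To make the contrast concrete, here is a minimal Python sketch of the general idea behind mask-based inpainting (a conceptual illustration only, not Midjourney’s actual implementation): only the pixels inside the user-drawn mask are replaced, and everything outside the mask is copied verbatim from the original image, which is why the apple and pear survive the revision untouched.

```python
# Conceptual sketch of mask-based inpainting (illustration only).
from PIL import Image, ImageDraw

# Stand-ins for the real assets: the original render, a freshly generated
# patch (e.g., a better banana), and the region the user marked for revision.
original = Image.new("RGB", (512, 512), "white")
regenerated_patch = Image.new("RGB", (512, 512), "gold")

mask = Image.new("L", (512, 512), 0)  # black = keep original pixels
ImageDraw.Draw(mask).rectangle((180, 150, 330, 400), fill=255)  # white = replace

# Composite: take regenerated pixels where the mask is white,
# original pixels everywhere else. Unmasked areas are bit-for-bit unchanged.
result = Image.composite(regenerated_patch, original, mask)
result.save("inpainted.png")
```

A model that instead regenerates the whole canvas on every tweak gives the user no such guarantee, which is exactly the “user control and freedom” problem noted above.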
What’s the result of this little experiment? ChatGPT, Google, and Ideogram win on prompt adherence. I still think Midjourney has the best aesthetics in image rendering, but its poor prompt adherence dooms it for most practical applications. At the start of 2025, Midjourney was the best image model, and now it’s the worst. Things change fast in AI.
For this specific image, I declare Ideogram the winner. However, for most of my image generation tasks I currently turn to ChatGPT’s native image mode because of its high prompt adherence and flexibility.

Even though I awarded Ideogram the first prize in my still-life contest, I still turned to ChatGPT’s native image mode for this picture of the art show.

Here’s Ideogram’s image for the same prompt as I gave ChatGPT for drawing a picture of the showdown between the top two image models in a still-life contest. While I like Ideogram’s idea to include one of the judges, its lower prompt adherence makes the two paintings a less accurate reflection of the contest (two bananas in the left painting!), and it also messed up the blue-ribbon prize for its own painting.