UX Roundup: Midjourney Usability Atrocity | AI Drives Traffic | Lip Synch Improvements | Progress in AI Music | You ≠ User
- Jakob Nielsen
Summary: Midjourney doubles down on usability atrocity | AI drives website traffic | Improvements in lip synch for AI avatar videos | AI music improves with Suno 4.5 | You are not the user

UX Roundup for May 12, 2025. (ChatGPT)
Midjourney Doubles Down on Usability Atrocity

What’s wrong with Midjourney’s UX design? Let’s zoom in on a specific design question: command parameters. (ChatGPT)
Midjourney recently launched two new features:
“Experimental Control” adds aesthetic complexity and energy to generated images. You use the --exp parameter to control the degree of complexity: it ranges from 0 to 100, with a default of 0.
“Omni-Reference” can insert a character or object from a reference image into newly generated images, which promotes character consistency and benefits storytelling. You use the “omni-weight” parameter, --ow, to control how strong the similarities should be: it ranges from 0 to 1,000, with a default of 100.

(ChatGPT)
After reading these two bullets, I’m pretty sure that most readers have spotted the usability atrocity: terrible inconsistency in the parameter ranges and default values, violating usability heuristic number 4, “consistency and standards.”
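To make the inconsistency concrete, here is a minimal sketch (my own illustration in Python, not anything from Midjourney; only the flags, ranges, and defaults come from the two announcements above). A user who wants “about half strength” of each effect must type numbers that differ by a factor of ten, and the defaults (0 versus 100) add to the confusion:

```python
# Minimal sketch of the two parameter scales described above (not Midjourney code).
# --exp: Experimental Control, range 0-100, default 0
# --ow:  Omni-Reference weight, range 0-1,000, default 100
PARAMS = {
    "--exp": {"min": 0, "max": 100, "default": 0},
    "--ow": {"min": 0, "max": 1000, "default": 100},
}

def half_strength(flag: str) -> int:
    """Value a user must type to ask for roughly 50% of the effect."""
    p = PARAMS[flag]
    return (p["min"] + p["max"]) // 2

for flag, p in PARAMS.items():
    print(f'{flag}: type {half_strength(flag)} for "half strength" (default is {p["default"]})')
# --exp: type 50 for "half strength" (default is 0)
# --ow: type 500 for "half strength" (default is 100)
```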
The inconsistency-induced usability problems are exacerbated by Midjourney’s earlier wildly different parameter ranges, which I described as early as September 2023 in my article “Classic Usability Important for AI.” 1.5 years later, and these guys make the same mistake again!

Making a design mistake once may be regarded as a misfortune, but doubling down and repeating the same mistake 1.5 years later looks like carelessness. (Apologies to Oscar Wilde)

An undercover tiger communicated this snapshot of a design meeting at Midjourney HQ. (ChatGPT)
AI Drives Traffic
Julia Moore compiled traffic analytics across the top 100 US websites and found that AI currently drives less than 5% of the referrals these websites receive. The percentage of referral traffic that derives from AI varies across site categories, with highs of 13% for user-generated content sites (e.g., Reddit) and 8% for job sites (e.g., LinkedIn), and lows of 2% for finance sites and legacy media.
While the percentage of traffic driven by AI tools is still low, it’s increasing every month. Most AI traffic currently originates from ChatGPT and Perplexity.
For my own website, www.uxtigers.com, AI accounted for 3% of referrals last month, whereas traditional search engines accounted for 48% of referrals. Google was the single biggest source, both among the search engines and among all referring sources. Among the AI tools, ChatGPT was 57% bigger than Perplexity and 124% bigger than Gemini. ChatGPT, Perplexity, and Gemini were the only AI tools to matter for my traffic, though a few other AI services also provided referrals.
If we look only at discovery-oriented referrals, meaning those that stem from users trying to find information, we can narrow the comparison to search engines and AI tools. Of this finding-derived traffic, AI now accounts for 6.4% of the referrals to the UX Tigers website. While this doesn’t sound like much, it represents 59% growth from the 4.0% share of finding-oriented referrals recorded in February 2025. Annualized, that pace would correspond to growth of more than a thousand percent. I don’t expect this to happen, because no hypergrowth lasts forever, but AI’s traffic share growing by 59% in a little more than two months is astounding.
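For readers who want to check the arithmetic, here is a back-of-the-envelope sketch in Python. The 4.0% and 6.4% shares are the figures reported above; treating the measurement window as roughly 2.2 months is my own approximation of “a little more than two months.”

```python
# Back-of-the-envelope annualization of AI's share of finding-oriented referrals.
share_feb_2025 = 0.040   # reported share in February 2025
share_now = 0.064        # reported share in the latest month

growth_over_period = share_now / share_feb_2025 - 1
print(f"Growth over the period: {growth_over_period:.0%}")   # ~60% with these rounded shares

# Assumption: the two measurements are about 2.2 months apart.
months_between = 2.2
periods_per_year = 12 / months_between

annualized = (1 + growth_over_period) ** periods_per_year - 1
print(f"If that pace continued for a year: {annualized:.0%}")   # roughly 1,200%
```

(The 59% figure quoted above presumably reflects unrounded underlying data; either way, compounding that pace over a full year lands well north of a thousand percent.)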

I still stand by my two earlier pieces about the change from search to AI as users’ preferred way to access digital information:
SEO Is Dead, Long Live AI-Summarized Answers (November 2023: why this is happening)
From PageRank to DeepRank: Attracting AI-Driven Traffic to Digital Properties (February 2025: what you should do about this trend)

Getting mentioned in the results generated by AI tools will be imperative for website survival in the future. Even now, a 5% lift in business is worth paying attention to. (ChatGPT)
On May 7, Apple’s senior vice president of services, Eddy Cue, stated that the use of search in Safari (Apple’s web browser) had declined in April 2025. This is the first time Apple has recorded a drop in its customers’ use of web search. Google’s stock promptly dropped by more than 7% when this information hit the markets.
Apple’s own stock dropped by 1% on this news, probably because Apple rakes in about $20 B per year from its deal with Google, and Google can be expected to pay less in the future as people search less. This same-day reaction from the stock market reduced the two companies’ combined market capitalization by $190 B ($140 B from Google and $50 B from Apple). The move from search to AI-driven answers is worth a lot of money.

Users are changing their daily habits and turning to AI instead of search engines when they need answers. The latest data proving this switch wiped out $190 B of stock market value, including from Apple (something I admit I had not seen coming — which is why you should not take investment advice from me). (ChatGPT)
(Disclosure: I used to own Google stock, which I had received as a member of their advisory board, but I sold long ago and now have no financial interest in either company.)
Google will certainly be able to capture some of the AI answering business. While it was off to a terrible start with its early consumer AI products, I have been impressed with Gemini recently. But Google must currently be having very difficult internal discussions about how to overcome the innovator’s dilemma, since faster progress in AI will expedite the decline in its search ad revenue. The old story of “trading analog dollars for digital pennies” (as legacy newspapers built websites) will repeat, this time replacing search dollars with AI pennies. Some old-school journalists will say this is cosmic justice.

Two analogous transitions drive immense profitability losses for incumbent companies: first, newspapers replaced printed advertising with website ads (which commanded much lower CPMs), and now search engine results pages (with sky-high pay-per-click advertising rates) will be replaced by AI answers that generate substantially lower revenue. (ChatGPT)

Legacy search engines will soon have much smaller money bags. (ChatGPT)
Lip Synch Improvements
I made a short video (YouTube, 29 secs.) to compare 3 leading AI lip-synch models: HeyGen's Avatar IV (released May 6, 2025), Synch Labs' lipsynch-2 (released April 1, 2025), and Kling's original model (released October 3, 2024).
Kling clearly needs an update for lip synch, though it generates the most natural video for the speaker’s gestures. Being half a year old makes you ancient in AI terms. Back during the dot-com bubble, we used to talk about "Internet years" the way people talk about "dog years," but "AI years" is something else. In fact, a year is too long a timeframe: AI operates on a monthly cadence.



(I also used the Kling video as the base for the Synch Labs clip.)
I used one of HeyGen’s built-in voices for its clip in this experiment, whereas I used MiniMax’s speech-02-hd text-to-speech model (released April 2, 2025) to generate the speech for the Synch Labs and Kling versions.
Which version do you prefer? Let me know in the comments.
My take: HeyGen Avatar IV is an impressive advance in AI-generated lip synchronization for avatar videos. Unfortunately, the feature is currently restricted to 30-second clips for higher-level subscribers (and 10 seconds for free subscribers), which limits its practical applicability. For example, my explainer video about AI Agents runs 4 minutes, and even my song about Vibe Coding and Vibe Design is 2 minutes long. Thus, making any of these videos with Avatar IV would currently require extensive editing.
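To put the 30-second cap in perspective, here is a rough sketch (my own illustration, not a HeyGen feature or API) of how many clips a longer script would break into, assuming a typical speaking rate of about 150 words per minute:

```python
# Rough illustration of the 30-second clip cap (my assumption: ~150 words/minute).
WORDS_PER_MINUTE = 150
CLIP_SECONDS = 30
WORDS_PER_CLIP = WORDS_PER_MINUTE * CLIP_SECONDS // 60   # 75 words fit in one clip

def split_into_clips(script: str) -> list[str]:
    """Split a narration script into chunks short enough for one 30-second clip each."""
    words = script.split()
    return [
        " ".join(words[i : i + WORDS_PER_CLIP])
        for i in range(0, len(words), WORDS_PER_CLIP)
    ]

# A 4-minute explainer is roughly 600 spoken words at this rate,
# so it would need about 8 separate clips, each generated and then
# stitched together manually in a video editor.
example_script = "word " * 600
print(len(split_into_clips(example_script)), "clips needed")   # -> 8 clips needed
```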

HeyGen’s new Avatar IV model won my avatar lip-synch contest, and Kling lost. Synch Labs placed in the middle. (ChatGPT)
Luckily, HeyGen’s normal avatar model allows you to make videos up to 30 minutes long. To be honest, current AI avatars are too boring for users to want to watch 30-minute avatar videos, so this is plenty!

Currently, the best avatars are limited to 30-second pre-generated videos. Another year and they’ll likely be real-time and participate in your Zoom meetings for as long as you care to talk. Business use? Who knows, but the use in therapy will be extensive, since many people already prefer to discuss sensitive personal matters with an AI instead of a human therapist. (ChatGPT)
What I really want is a combination of Kling 2’s more expressive avatar animation, ElevenLabs’ emotion-detecting script analysis and text-to-speech generation, and HeyGen’s new Avatar IV lip synch. Maybe next month, since progress in AI video continues at a rapid pace with the introduction of all these new features. Kling has clearly fallen far behind the competition, even though its lip-synch model is only half a year old. (Rumors in the creator community have it that Kling will be releasing version 2 of its lip-synch model soon.)

AI video is improving at an impressive rate in avatar creation and virtually all other creative abilities, such as camera controls. (ChatGPT)
Progress in AI Music
It’s not just AI video that’s advancing. AI music is also improving, though possibly at a slightly less dizzying pace than AI video. (Of course, if you know my liking for creating AI music videos, you’ll recognize that music is an important part of video creation.)
Most recently, Suno launched version 4.5, which is a substantial upgrade on its earlier 4.0 model for AI music generation. The main advances are:
Voices (the singers) have more depth, emotion, and range, from intimate whispers to full-on power hooks.
More complex, textured sound for the instrumentals: v4.5 picks up subtleties in layered instruments, tone shifts, and sonic details.
Better prompt adherence.
A prompt augmentation feature that provides a “creative boost” to the user’s prompt.
I particularly appreciate the new prompt augmentation since I suffer from a severe articulation barrier when attempting to express my musical intent in words. (The result of a long-neglected education in music theory and a particularly bad music teacher in high school.)
The improved prompt adherence comes through in reducing the number of generations needed to produce a song I like enough to publish. With Suno 4.0, I often needed to wade through 20 song versions to find a great take. Admittedly, this may not be that different from the experience of many old-school record producers, but it is time-consuming to listen to so many songs that almost get what I wanted. With Suno 4.5, I usually get a song I like in about 6 takes.
Based on my experiments so far, I think Suno delivered on its promise of richer sound for the singers and instruments. Compare the following two versions of the same song (YouTube, 2 min. per song):
Version made with Suno 4.0 (the original)
Version made with Suno 4.5, employing prompt augmentation for a more detailed prompt
(The song is about variations of the UX Tigers logo made with ChatGPT, so it’s a little silly, but it was the last song I made with Suno 4.0, so it presented the fairest test of how much Suno improved during the week between my creation of the two song versions.)
For good measure, I made two more variations of this song, but they are less striking, so unless you have a strong interest in AI music, I recommend comparing the two versions in the above bullets. The other versions are:
Retaining the same prompt for Suno 4.5 as I had used for the Suno 4.0 song. (I.e., not taking advantage of prompt augmentation.)
Remastering the Suno 4.0 song with 4.5. In this case, everything was the same as for the 4.0 song, without benefiting from the enhanced prompt adherence in 4.5. The only benefit was the richer voice and instruments, which performed exactly the same way as in the original song.
Finally, if you don’t want to listen to 8 minutes of the same song in 4 versions, I also made an edited cut that presents one verse from each of the 4 versions in a single music video. Because of the abrupt changes between versions, listening to this cut is a less enjoyable experience, but it allows a more direct comparison because the video alternates so quickly between the versions.
A distinct weakness of AI music is the difficulty of comparing different takes to select the best. Music is inherently a linear media form, which means that you cannot simultaneously inspect several clips. You must stop one before you can listen to the next clip.
This is in contrast to images, where it’s easy to compare multiple visuals at a glance, as shown by this example:

Thumbnails from Midjourney 7, generated from a prompt to draw a singer performing in front of a robot band. It’s easy to compare the alternatives and pick the version you want to use as the basis for the final artwork. (In my case, to animate with Kling to create B-roll for the dance breaks.) Midjourney’s terrible prompt adherence is clear from this example: the band is often insufficiently prominent and often includes human players, who must be changed to robots during subsequent inpainting.
Even if you want to enlarge images and compare details, you can rapidly switch back and forth between pictures. For music, it makes little sense to listen to a one-second segment and compare it to another one-second segment while it is fresh in your short-term memory. Professional music producers might do this, but such close analysis of extremely short music segments does not help me choose the song version I like best.

Linear media forms are inherently harder to compare than media forms that allow random access and easy scanning. This creates a usability problem for AI-generated media, which benefits from the initial generation of a large number of alternatives from which the user selects the best option. (ChatGPT)
You Are Not the User
One of my most classic slogans, originating in my book Usability Engineering from 1994. Now drawn with ChatGPT.

If you want to print out this poster (for example, to hang in your team room), I have a huge print file available for download (warning: 40 MB).