The Need for Speed in AI
Summary: AI tools need much faster response times to meet human demands, which remain unaltered from previous UI paradigms: subsecond results are ideal, and the UI requires progress indicators when this is unattainable due to technical constraints.
The same usability guidelines apply to AI tools as to all earlier UI generations. Specifically, the response times needed to retain the user’s attention and flow are dictated by the human brain, not technology. Sadly, current AI lumbers along with the energy of a lethargic sloth, often failing to meet even the least demanding of established response time thresholds.
Leonardo.AI routinely takes over a minute before it allows users to behold the result of a prompt. Midjourney isn’t much better regarding actual response time, but it provides a better user experience by displaying a percent-done indicator. For decades, it’s been a firm usability guideline to provide continuously updated feedback on the progress of any computer action that takes more than 10 seconds. The two primary designs are a progress bar (which incrementally fills as the computer approaches the objective) and a percent-done indicator (which states what fraction of the total work on the user’s command has been accomplished so far). See the screenshot below, captured when the image-generation process was 46% done.
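A progress bar and a percent-done indicator can be driven by the same counter. The following Python sketch renders both as a single line of text. The `done` and `total` work units are a hypothetical stand-in, because the article does not say what units Midjourney actually counts. Integer arithmetic keeps the display from claiming 100% before the work is finished.

```python
def render_progress(done: int, total: int, width: int = 20) -> str:
    """Render a text progress bar followed by a percent-done label.

    `done` and `total` are hypothetical work units (e.g., generation steps);
    what a real image generator counts is an assumption here.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    done = min(max(done, 0), total)        # clamp to the valid range
    filled = done * width // total         # floor: bar is full only when complete
    percent = done * 100 // total          # floor: 100% only when complete
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {percent}%"

# The same point in the run as the screenshot: 46 of 100 units done.
print(render_progress(46, 100))
```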
Midjourney also provides sporadic visual updates on the anticipated appearance of the requested artwork as it advances toward completion. Here, I requested the generation of a snail for this article. Although we have yet to reach the midpoint, we can already discern that Midjourney will likely create an acceptable illustration in this run.
First, let’s review the time-honored response time guidelines and then see how current AI tools stack up.
The 3 Response Time Limits
My advice for response times has remained unchanged since my 1993 book, Usability Engineering. And to give due credit, my guidelines were roughly based on even earlier recommendations by Robert Miller from 1968. When something has held for 55 years in the computer business, it’ll probably remain true for many more laps around the sun.
Why have response time guidelines been the same for 55 years? Because they are derived from neuropsychology: the speed at which signals travel in the human body and how the human brain has evolved to cope with these biologically determined speed limits.
(As a reminder, the “user experience” resides entirely within the user. The computer presents a user interface, but the human lives the experience. This is why human characteristics determine most UX guidelines and why the guidelines remain stable, even while our technology undergoes radical transformations.)
0.1 seconds (100 ms) creates the illusion of instantaneous response; that is, the outcome feels like it was caused by the user, not the computer. You press the “A” key on the keyboard, and the letter “A” appears on the screen. If this happens in under 0.1 s, you made that happen. (Of course, it’s the computer painting the screen, but it feels like you did so.) This level of responsiveness is crucial to support the sensation of physicality in computer-created objects, allowing the user to manipulate them directly in the simulated on-screen space. (Direct manipulation is one of the key GUI techniques that define the 4th-generation user interface, which has been dominant for almost 40 years, since the Macintosh launched in 1984.)
1 second allows the user to maintain a seamless flow of thought. You can tell there’s a delay, so you’ll feel that the computer (rather than yourself) is generating the outcome. With subsecond response times, users still feel in control of the overall experience and work freely rather than wait on the computer. The absence of any significant delay supports exploration and immersive creativity as users try out alternatives without the impression of wading through molasses.
10 seconds is the maximum delay before the user’s attention drifts. It’s incredibly taxing to stay focused and alert in the absence of activity, but users can keep their attention on the goal for about 10 seconds. They can also retain information about what they are doing in short-term memory so that when the computer is done, they can proceed to the next step while preserving the previous mental context. With delays between 1 and 10 seconds, users perceive themselves to be at the mercy of the computer and wish for greater speed, so they are not exploring as freely as with subsecond response times. But the crux is that a person can retain short-term memory and focus on the goal for up to 10 seconds, even if the wait is unpleasant. After 10 seconds, people start thinking about other things, making it harder to get their brains back on track once the computer finally does respond. Users will need to reorient themselves when resuming their task after a delay of more than 10 seconds.
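The three limits amount to a simple classification of any measured delay. This Python sketch encodes them; the text does not pin down exactly which side of each boundary a delay of precisely 0.1, 1, or 10 seconds falls on, so treating each limit as an inclusive upper bound is an assumption, as are the short labels.

```python
def perceived_experience(delay_s: float) -> str:
    """Classify a response delay (seconds) against the 3 response time limits.

    Assumption: each limit is treated as an inclusive upper bound.
    """
    if delay_s < 0:
        raise ValueError("delay cannot be negative")
    if delay_s <= 0.1:
        return "instantaneous"  # user feels they caused the outcome
    if delay_s <= 1.0:
        return "flow"           # delay noticed, but thought is uninterrupted
    if delay_s <= 10.0:
        return "attention"      # user waits but keeps focus and short-term memory
    return "lost"               # mind wanders; user must reorient afterward

for delay in (0.05, 0.5, 6.1, 41.3):
    print(delay, perceived_experience(delay))
```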
I recorded a video a few years ago demonstrating the 3 response time levels. As the guidelines have remained unchanged, this 4-minute video remains germane.
Subsecond response times: That’s the golden ticket — this picture is how you want your users to conceptualize your service if asked to draw it. (“Fast car” by Midjourney.)
How it feels when the computer takes more than 10 seconds to complete a user request. (“Snail” generated by Midjourney.)
Current AI = That Snail
No AI system I have tried delivers results in under a second. This takes a wrecking ball to users’ creativity by discouraging free exploration.
Image generation tools are the most egregious culprits, often failing to deliver even sub-minute response times. I assume the vendors have been swamped by the tidal wave of demand for their services and have not adequately scaled their servers. (It’s admittedly an exceptionally thorny engineering challenge to handle a thousand-fold increase in demand, so I can’t be overly critical of those companies, especially right now with GPU compute silicon in short supply. But they do have to fix this for the envisioned AI-generated economic boom to materialize.)
The large language models underpinning text generation seem more efficient than their image-creation siblings. (Or maybe the text-based companies have more competent engineering, which I’ll believe of Google and Microsoft.)
Bret Kinsella of Voicebot.AI recently published an analysis of the average response times for the major AI-powered chat services in late June 2023:
Gold winner: Google Search Generative Experience (SGE), at 4.4 seconds.
Silver: ChatGPT 3.5, at 6.1 seconds.
Bronze: Perplexity AI, at 7.4 seconds.
Honorable mention: Bing Chat, at 13.0 seconds.
Big loser: ChatGPT 4.0, at 41.3 seconds.
I ought to disqualify Bing because its 13-second response is above the maximum permissible 10 seconds. However, Bing does deserve an honorable mention since its early-June response time had been a truly snail-worthy 25.6 seconds. Nearly doubling its speed (cutting response time almost in half) in a few weeks is an engineering feat worthy of Hercules. (Granted, he was not a systems engineer but gained fame for banging lions over the head with a hefty club. However, Hercules was the preeminent hero of antiquity, so he came to mind when seeking a heroic metaphor.)
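The scores and Bing’s improvement are easy to check against the 10-second limit. This Python sketch hard-codes the Voicebot.AI averages quoted in this article, plus Bing’s early-June figure; the variable names are illustrative. Going from 25.6 to 13.0 seconds removes about half the waiting time, which is the same as nearly doubling the speed.

```python
# Average chat response times (seconds), late June 2023, as quoted in the article.
response_times = {
    "Google SGE": 4.4,
    "ChatGPT 3.5": 6.1,
    "Perplexity AI": 7.4,
    "Bing Chat": 13.0,
    "ChatGPT 4.0": 41.3,
}
ATTENTION_LIMIT_S = 10.0  # maximum delay before attention drifts

over_limit = [name for name, t in response_times.items() if t > ATTENTION_LIMIT_S]
print("Over the 10-second limit:", over_limit)

# Bing Chat: early June (25.6 s) versus late June (13.0 s).
before, after = 25.6, 13.0
time_cut = 1 - after / before   # share of the waiting time eliminated
speedup = before / after        # how many times faster answers now arrive
print(f"Bing: waiting time cut by {time_cut:.1%}, speedup {speedup:.2f}x")
```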
UX Workarounds for Sluggish Performance
The only true solution is reengineering the software to be snappier and scaling up backend compute power by orders of magnitude. We desperately need those subsecond response times.
Readers of a certain age may recall the search engine wars of the mid-to-late 1990s. Yahoo and Excite were battling it out with bloated, slow-to-download pages, often taking more than 10 seconds on dial-up modems. Search was assuredly not subsecond, even for users with high-speed connections. Then in 1998, nimble Google entered the fray and delivered subsecond response times. The rest is history, as any user I introduced to Google became a fast convert.
In the interest of full disclosure, I was on the advisory board for Google back when the company was a start-up so small that we held board meetings around the ping-pong table because it was the only table big enough in the entire company to host a meeting. I have since sold all the stock I received from them and have no financial interest in praising Google. But they were good in the early days and stated right on every SERP (Search Engine Results Page) how many milliseconds the server expended on each query to remind all employees that speed was Job #1.
Google’s search victory is a reminder of the spoils that await those companies that can transform their AI tools from snails to race cars.
While we wait for the underlying technology to improve, AI tools can deploy those same design elements that have long been beneficial in traditional GUI design. Remember, usability is usability, regardless of the platform, so we should transfer what works.
I already mentioned the most important guideline: to provide feedback to show that the computer is working and set expectations for when results will be delivered. If the response time exceeds 10 seconds, use a percent-done indicator or a progress bar. If the response time is between 2 and 10 seconds, show a lightweight “working on it” indicator, such as the infamous spinning beach ball.
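The guideline can be read as a rule for choosing the feedback style from the expected delay. This Python sketch assumes the UI can estimate that delay before the work starts (for example, from past timings of the same operation); the names are hypothetical. The guideline does not address delays between 1 and 2 seconds, so showing no indicator below 2 seconds is an assumption of this sketch.

```python
from enum import Enum


class Feedback(Enum):
    NONE = "no indicator"
    SPINNER = "lightweight 'working on it' indicator"
    PROGRESS = "percent-done indicator or progress bar"


def choose_feedback(expected_delay_s: float) -> Feedback:
    """Pick a feedback style for an operation with the given expected delay.

    Assumption: nothing is shown below 2 s, since the guideline covers only
    the 2-10 s range and delays above 10 s.
    """
    if expected_delay_s > 10.0:
        return Feedback.PROGRESS  # long wait: show how much work is done
    if expected_delay_s >= 2.0:
        return Feedback.SPINNER   # medium wait: signal that the system is busy
    return Feedback.NONE


for delay in (0.5, 4.4, 41.3):
    print(delay, choose_feedback(delay).value)
```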
You can also offer reduced functionality that can deliver the most critical results rapidly, leaving users to request further time-consuming computer work only when needed. As a variant, prioritized functionality can provide some information before the rest, even when a complete set is requested. For example, a command to generate 4 images might prioritize the first image on the backend to arrive sooner than the other 3. This will slightly placate impatient users by giving them something to do while waiting for the full set.
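Prioritized functionality can be sketched with `asyncio`. Here `render_image` is a hypothetical stand-in for a backend render that sleeps instead of computing, and the durations are made up. The first image is rendered alone, as if given the backend’s full attention, and is shown immediately; the remaining three are rendered together afterward.

```python
import asyncio


async def render_image(prompt: str, index: int, seconds: float) -> str:
    """Hypothetical backend render; the sleep stands in for GPU time."""
    await asyncio.sleep(seconds)
    return f"{prompt} #{index}"


async def generate_set(prompt: str, count: int = 4) -> list[str]:
    shown = []
    # Priority: render the first image by itself so it arrives as early as possible.
    first = await render_image(prompt, 1, seconds=0.05)
    shown.append(first)
    print("shown to user:", first)  # the user can start judging this one
    # Then render the remaining images concurrently.
    rest = await asyncio.gather(
        *(render_image(prompt, i, seconds=0.1) for i in range(2, count + 1))
    )
    for image in rest:
        shown.append(image)
        print("shown to user:", image)
    return shown


asyncio.run(generate_set("snail"))
```

The total time to get all four images is no shorter; only the wait for the first one shrinks, which is the point of the technique.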
Finally, while not providing individual results any faster, asynchronous operation allows users to focus on the next task before the previous task is completed. Most services allow users to enter their next prompt before the previous one has been completed. As a further asynchronous refinement, Midjourney could display the upscale buttons before it has finished computing the initial generation. As mentioned, Midjourney admirably shows preliminary results (another response-time-mitigating design technique), so users will often decide which of a set of images they want to refine, even before they’re fully baked.
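Asynchronous operation means accepting the next prompt without waiting for the previous result. This Python sketch uses `asyncio.create_task` to start each job and return control immediately; `run_prompt` is a hypothetical stand-in for a slow generation job, and the sleep duration is made up. Both prompts are accepted before either result is ready, and the results are collected later in submission order.

```python
import asyncio


async def run_prompt(prompt: str, seconds: float) -> str:
    """Hypothetical slow generation job; the sleep stands in for GPU time."""
    await asyncio.sleep(seconds)
    return f"result for {prompt!r}"


async def session(prompts: list[str]) -> list[str]:
    tasks = []
    for prompt in prompts:
        # Start the job without awaiting it, so the user can enter the next prompt.
        tasks.append(asyncio.create_task(run_prompt(prompt, seconds=0.1)))
        print("accepted:", prompt)
    # Collect all results later, in the order the prompts were submitted.
    return await asyncio.gather(*tasks)


results = asyncio.run(session(["a snail", "a fast car"]))
print(results)
```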
We can exploit the dead time while waiting for results to educate or entertain users with usage tips and other simplified instructions. Image-generation tools could also show top-rated images generated by other users for inspiration. Users often ignore such information as irrelevant to their current goals, but sometimes they pick up on a tip or enjoy what they are shown. Do tread carefully if aiming to entertain users because what you think is entertaining, many users may find annoying, which is often the case for animations or other particularly intrusive content. Don’t auto-play any audio without user permission.
Even though my last proposal lies outside the realm of UI design, it’s a UX technique that ought to be immediately embraced by all AI executives: offer premium subscriptions with guaranteed service levels. If making more money won’t make them act, I don’t know what will. (Tiered service levels are decidedly a UX consideration since “total user experience” encompasses business models and their impact on users.)
Future Vision: Fast AI Empowers Users
I hope that one joyful day, which can’t come soon enough, I will need a picture of “a happy user” for an article. So I experimented with Leonardo.AI to see what it would give me.
The following illustration shows some initial attempts. In the first run (top row), the left-hand user looks happy enough but would need to look more serious to represent my target business audience. Maybe she’s too jazzed from her two cups of coffee. The right-hand user does look earnest, but not happy, even though that had been my prompt.
Results of varying parameters for “happy user” in Leonardo.AI.
I tweaked some settings, obtaining the users shown in the bottom row. They are a more appropriate blend of happy and serious, though still not precisely what I was pursuing. Some more parameter tweaks might be in order. But the problem is that each run took about 90 seconds, preventing me from recalling precisely what I was doing with the parameters between runs. (Remember the need for results in less than 10 seconds to keep the user’s mind from wandering.)
Many of Leonardo’s parameters are represented as sliders in the UI. In principle, this GUI design is better than Midjourney’s command-line parameters. But the supposed direct-manipulation advantage of sliders requires subsecond response time so that the user feels a direct connection between moving the slider and seeing changes in the data. Take, for example, the “Resonance” parameter. I defy anybody who’s not a developer on the Leonardo team to understand what this parameter does. But if I could gradually move the slider and instantaneously see the impact of parameter modifications on the images, I might achieve at least a rudimentary understanding. More importantly, I could zero in on the image I needed.
This is my vision for the future of AI software and services: that they comply with the usability guidelines we’ve painstakingly discovered during my 40 years in the business. Whether discussing websites, GUI productivity applications like Excel, or AI services, the guidelines remain the same. Usability guidelines are invariable, irrespective of the technology, because they are people-centered rules derived from what works with humans: UX is People.