Jakob Nielsen

UX Roundup: Survey Response Options | Ideogram | Deepfaking User Test Videos | AI IA | LDM = Large Design Model | Web History | AI Writes 10% of Science Papers

Summary: Stick with discrete response options to measure users’ subjective satisfaction | Ideogram faithful AI image-generation | Disguising the users in usability test videos | Shopify AI helps merchants with IA | Large Design Model (LDM) approach to automating UI implementation | How the Web was built | AI has written 10% of published science papers

 

UX Roundup for July 8, 2024. (Ideogram)


Survey Responses for Subjective Satisfaction: Use a Discrete 1-7 Scale

Jim Lewis and Jeff Sauro from MeasuringU (the world’s leading authority on quant UX metrics) have published a nice study showing that there are no benefits from using a continuous response option when asking users to provide a subjective rating of a user interface.


The standard design is to simply show the seven options from 1 to 7 (if using the recommended 1-7 rating scale) and let users click their answer.


The potential problem? What if a user feels that 5.5 represents his or her reaction to the design better than either 5 or 6? On a discrete scale, you can’t provide ratings in between the integer response values.


A continuous scale, for example a slider, offers much more granular response options.

However, the new study shows that the researchers got the same results when measuring users’ subjective satisfaction either way. (They had 200 test participants rate 5 different tasks using a continuous survey instrument and compared the results with data collected with a single-click discrete instrument in previous studies.)


The two response designs generated virtually the same results. There was no benefit from adding the interaction overhead of the continuous slider.
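Why does coarser granularity cost nothing? A quick simulation makes the intuition concrete. The sketch below uses illustrative toy data (not the MeasuringU dataset): each participant has an underlying satisfaction level, recorded once as a continuous slider value and once forced onto the nearest integer of a 1-7 discrete scale.

```python
import random
import statistics

random.seed(42)

# Illustrative simulation (not the MeasuringU data): each of 200 participants
# has a "true" satisfaction level; a slider records it continuously, while a
# discrete scale forces a click on the nearest integer from 1 to 7.
true_scores = [min(7.0, max(1.0, random.gauss(5.2, 1.1))) for _ in range(200)]

continuous = true_scores                     # slider: value recorded as-is
discrete = [round(s) for s in true_scores]   # one-click: nearest integer

print(f"Continuous mean: {statistics.mean(continuous):.2f}")
print(f"Discrete mean:   {statistics.mean(discrete):.2f}")
```

The individual rounding errors (at most half a point each) average out across participants, so the aggregate means end up almost identical, which is why the discrete scale loses essentially nothing at the level where UX survey data is actually used.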


Conclusion: stick with a simple one-click response design.


Discrete vs. continuous response options. When measuring users’ subjective reactions to a design, the results are the same in either case, but a one-click UI makes discrete responses the better solution for UX surveys. (Ideogram)


Ideogram Generates High-Adherence AI Images

I continue to admire Ideogram for being the best AI image model for producing complex images that adhere to the prompt. Midjourney still makes the most beautiful images, but it has trouble with any scene containing several elements.


A16Z published an interesting interview with Ideogram cofounder Mohammad Norouzi titled “The Future of Image Models Is Multimodal.”


On the one hand, I wish Midjourney and Ideogram would merge. On the other hand, competition is good and will hopefully spur both to make even greater products, learning from each other. For example, Ideogram launched upscaling 10 days ago, whereas Midjourney and Leonardo have had this feature for ages.


Ideogram is currently the best AI tool for making images that adhere to the user’s concept. (Ideogram)


Disguising Users in Usability Test Videos

One key benefit of user testing is capturing videos showing real customers interacting with your products. Highlight reels from the studies can later be shown to stakeholders and at company events and have long proven to be one of the best persuasion tools for generating buy-in for UX.


Seeing is believing.


However, there is a potential privacy problem with sharing videos of study participants widely, even when the videos are only stored on internal networks. Worst of all is when video clips of users are published on YouTube, TikTok, and the like. It’s very tempting to publish user study videos because they can be very engaging and sometimes humorous. But I can’t tell you how many times study participants have reacted to the video-recording clause in the consent form by saying something like, “I hope you’re not going to show me on YouTube!”


AI to the rescue. Llewyn Paine is working on a system that uses AI to create a deepfake version of user test videos by replacing the user’s real image and voice with AI-generated replicas that do and say the same thing as the real user but look and sound different. Her demo video (linked above) is not fully convincing, and I don’t think it would be as persuasive as the real video.


However, this is only the first step, and AI video and voice generation are improving by leaps and bounds. Next year, it’ll be better.


Even though I referred to these altered videos as “deepfakes,” I feel that they are ethical and appropriate. Hopefully, it goes without saying that they only remain ethical if you don’t alter any of the original user behaviors or statements. Disguise the user; don’t misrepresent the user.


Mirror world: replacing the actual test user with a fake in the video recording of a usability study could be a way to preserve participant privacy. (Ideogram)

Shopify AI Helps Merchants With IA

Shopify has launched an AI assistant for the merchants on its ecommerce platform, dubbed “Sidekick.” One feature is suggested replies to customer emails, which will surely save much time for small vendors, as long as they review the emails before hitting “send.”


A more substantial feature is that Shopify’s AI will suggest a taxonomy for structuring the products offered on a merchant’s website. Big ecommerce vendors have dedicated information architecture efforts, but smaller merchants may never have heard the term “IA.” Helping them create better IA can only make their sites better, even if the AI-suggested IA is not as perfect as that hand-crafted by a team of human IA experts.


Remember that these small merchants don’t have an information architect on staff. The proper comparison is not between AI and the best human; it’s between the available AI and the available human(s). If no human experts are available, an AI-generated IA is an improvement over no IA.
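To make the underlying idea concrete, here is a toy sketch of grouping products into categories by similarity. This is an assumption about the general technique, not Shopify’s actual (unpublished) implementation; a real system would likely use text embeddings, whereas this sketch uses simple word overlap (Jaccard similarity) between product names.

```python
# Toy sketch of similarity-based product grouping. Hypothetical approach for
# illustration only -- not Shopify's actual method, which is not public.

def jaccard(a: set, b: set) -> float:
    """Word-overlap similarity: shared words / total distinct words."""
    return len(a & b) / len(a | b)

def group_products(names: list[str], threshold: float = 0.25) -> list[list[str]]:
    groups: list[tuple[set, list[str]]] = []   # (combined word set, members)
    for name in names:
        words = set(name.lower().split())
        for vocab, members in groups:
            if jaccard(words, vocab) >= threshold:
                members.append(name)
                vocab |= words                 # grow the group's vocabulary
                break
        else:
            groups.append((words, [name]))
    return [members for _, members in groups]

products = ["Red Cotton T-Shirt", "Blue Cotton T-Shirt",
            "Ceramic Coffee Mug", "Travel Coffee Mug", "Leather Belt"]
print(group_products(products))
# → [['Red Cotton T-Shirt', 'Blue Cotton T-Shirt'],
#    ['Ceramic Coffee Mug', 'Travel Coffee Mug'], ['Leather Belt']]
```

Even this crude version turns a flat product list into a first-draft taxonomy, which is the kind of head start that matters for a merchant with no IA expertise at all.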


AI can sort products into categories based on similarity, making it easier for ecommerce users to find related products on a website. (Leonardo)


Large Design Model (LDM) Approach to Automating UI Implementation

If you read this newsletter, you surely know of LLMs (Large Language Models), the technology behind leading AI products like Claude and GPT. I was recently at ADPList’s Design Leadership event in San Francisco, and one of the sponsors was Locofy (thanks, Locofy: I ate a lot of oysters that night on your dime), which uses an LDM (Large Design Model) as the basis for its AI product.


As a sponsor, Locofy gave a short talk about its product, which is AI-based “design to code in 1 click,” according to the tagline on its website. (As an aside, turning to my roots in web usability, it is deplorably rare for websites — especially for startups — to have a tagline that explains what the company does.)


For more info, you can read the company’s 30-page whitepaper about the LDM. They basically trained the model on several million existing user interface designs. Now, this AI probably knows more about UI design than human UX professionals, though it may not know which designs have the best usability. It only knows that those designs were released by companies that may or may not have done proper user testing.


What I found most interesting in Locofy’s presentation about their LDM was the data nugget that the AI model currently has 2 M parameters. (They continue to train the model, so future releases will likely be bigger — according to the scaling law that seems to define all AI, more parameters produce more intelligence.)


On the one hand, 2 M parameters are puny compared to the leading LLMs, which have up to 2 trillion parameters, or a million times more. On the other hand, a product like ChatGPT aims at encoding all humanity’s knowledge and skills, whereas Locofy wants to build websites.


What’s most striking to me is that user interface design takes 2 M parameters to encode. UI design is a highly complex discipline, which is why I have always said that your first design is not good enough, no matter how talented you are. There are (at least) 2 M considerations in the design space: it’s so enormously multidimensional that a first stab has zero probability of being optimal (with as many digits after the decimal point as you care to calculate). Iterative design rules, and this latest data point is more proof of that old finding.


Juggling user interface design requires the AI to have a huge number of balls in the air. Sadly, I can’t show you 2 M balls in the air, so this image will have to suffice to remind you of the multidimensionality of the UX design space. (Midjourney)


How the Web Was Built

Exciting 2-hour interview with Marc Andreessen about the history of the Internet, the Web, and the GUI Web browser, which he invented personally, at least in practice (other GUI browsers existed but never made it outside the research labs). Marc is one of the most insightful commentators on the future of AI and the Web, but here, he talks about the past and what we can learn from technology history. He was there to see it happen (and created much of it).


3 reasons to watch this video, even though it’s much longer than anything you would normally want to watch on YouTube:


  • You should know where we came from. To be ignorant of history is to lack an understanding of the context in which current events unfold. I was actively engaged in many of these developments and still learned a lot.

  • There are many analogies between the development of the Internet/Web and that of AI. Comparatively speaking, we’re still in the age of dial-up modems when it comes to AI.

  • Marc tells many entertaining and funny anecdotes: it’s good old-fashioned fun to watch this video.


Communication technology has come a long way. Marc Andreessen (and I) started out with acoustic coupler modems, which used suction cups attached to the two ends of a traditional telephone handset to transfer data by listening to and playing beeps. The bandwidth was a whopping 300 bits per second, so downloading this cartoon would take an hour. Now, we watch 4K video on our phones, streaming live through 5G cellular telephony. (Ideogram)


One of Marc’s many anecdotes was the big fight in 1993 about whether to design web browsers for regular people, meaning that web pages should include images. The old-school people in charge of the early Internet at places like the National Science Foundation wanted to reserve the Internet for use by serious scientists to exchange research data. They worried about diverting computer resources to the unwashed masses and didn’t like anything that smacked of commercial use of the Internet. The government originally prohibited commercial use of the Internet. Just think of how poor we would all be (in both GDP and life experience) if those government regulations had remained in place.


Luckily for the world, Marc Andreessen was developing the GUI web browser as an unauthorized side project, so he just did what he wanted, not what the science establishment wanted. He emphasized the user experience, making web browsing much easier than anything seen until that date, and even included the image tag to allow web pages to show pictures.


In liberating the Internet and making it possible for normal people to use it, Marc Andreessen probably became one of humanity’s biggest heroes during the last half-century.


Two visions for the Internet as of 1993: Marc Andreessen thought it should be for the people to do millions of different things, whereas the National Science Foundation wanted to keep it reserved for serious scientists to work with research data on expensive computers. Lucky for us, Marc defeated the establishment by shipping code that implemented his vision. (Ideogram)


AI Writes 10% of Science Papers

The Economist has an article reporting on studies finding that about 10% of recent science papers are at least partly written by AI. They also have an editorial about this, with the one-word conclusion, “Good.” (Subscription required.)


I agree with The Economist. Before proceeding, I need to state that it’s despicable to have AI make up research papers with false data and submit them to journals in the hope of passing peer review because the manuscript seems plausible. But it has always been despicable to falsify research data, with or without the help of AI.


Leaving aside fake papers, the point is that even real papers that report on real data collected through real research are wholly or partly written by AI. I don’t see the problem. (One more caveat is that the actual scientist(s) conducting the research must review any AI-written text in detail to ensure that it accurately describes the research and is free of hallucinations.)


More detailed data from the study shows why using AI to write science papers is good. The estimate is that papers from the USA or UK are AI-written in less than 5% of cases, whereas papers from China or South Korea are AI-written in more than 15% of cases. In other words, there’s a more than 3x difference in AI use, depending on whether the scientist is from an English-speaking country or a place using a very different language family and character set. (The papers analyzed were all published in English.)


It is obviously much harder for Chinese and Korean scientists to write fluent and understandable descriptions of complex topics in English than it is for American and British scientists. AI helps the resulting papers read better, which is a boon for communication and the ability for other scientists to understand the research presented in the paper. (After all, the true purpose of a science paper is not to gain tenure for the author but to communicate the study findings to other researchers so that they can build on the new knowledge and advance the field further.)


Broken English makes research findings less valuable than when the same findings are presented in a well-written paper. Using AI to improve writing is to everybody’s advantage and should be encouraged.


It’s great when researchers employ AI to improve the writing of their papers for clearer communication. An abstract written by AI may often be better than the author's. But do check the resulting manuscript for accuracy. (Leonardo)


The Economist’s article is based on a paper by Dmitry Kobak, Rita González Márquez, Emőke-Ágnes Horvát, and Jan Lause, mostly from the University of Tübingen in Germany, who analyzed 14 million papers from the PubMed database, published from 2010 to 2024. If you have access, the journalistic version is a much easier read than the academic paper!


The researchers used a clever method to estimate the percentage of papers written with AI assistance. We know from prior research that it’s impossible to determine whether an individual piece of writing was produced by a human or an AI. (Software that purports to do so often exhibits biases, such as claiming that writing by non-native speakers like myself can’t be human.)


Kobak et al. performed a statistical analysis of various style markers, including the frequency of words like “delve” that AI favors more than human writers do. I just used “delve” in the previous sentence, which I wrote myself, not AI. So, these markers can’t be used on an individual basis. But on an aggregate basis, multiple style markers combine into a strong statistical estimate of the percentage of documents written by AI (as a whole or in part).


Thus, we can’t say that paper X was written by AI. But we can estimate the AI percentage for large sets of, say, 100,000 papers by Chinese authors versus 100,000 papers by American authors.
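The mixture logic behind such aggregate estimates can be sketched in a few lines. The numbers below are hypothetical round figures for illustration, not Kobak et al.’s actual rates or their full method: if a marker word appears at a known baseline rate in pre-AI papers and at a higher rate in AI-assisted text, the observed corpus-wide rate pins down the proportion of AI-assisted papers.

```python
# Simplified mixture-model illustration (hypothetical numbers, not the
# actual figures or full methodology from Kobak et al.).

def estimate_ai_fraction(observed_rate: float,
                         baseline_rate: float,
                         ai_rate: float) -> float:
    """Solve observed = (1 - p) * baseline + p * ai_rate for p."""
    return (observed_rate - baseline_rate) / (ai_rate - baseline_rate)

# Hypothetical occurrences of "delve" per 1,000 papers: 10 in the pre-AI
# baseline, 190 in AI-assisted text, 28 observed in the current corpus.
p = estimate_ai_fraction(observed_rate=28.0, baseline_rate=10.0, ai_rate=190.0)
print(f"Estimated AI-assisted fraction: {p:.1%}")  # → 10.0%
```

No single paper’s “delve” count proves anything, but across 100,000 papers the excess frequency of many such markers yields a stable corpus-level estimate.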


