
UX Roundup: 2026 Predictions | AI Analyzing Usability | Character Consistency | DeepMind Documentary | Amazon Agent | AI Prescriptions | Mechanical Turk

  • Writer: Jakob Nielsen
  • 17 min read
Summary: New video with 2026 predictions for AI & UX | AI analyzing usability test recordings | Character consistency still difficult | Google DeepMind documentary film | Amazon’s ecommerce agent Rufus does well | AI renews drug prescriptions | Problems using Mechanical Turk for user research

 

UX Roundup for January 19, 2026. (Nano Banana Pro)


2026 Predictions: The Music Video

The article with my 18 predictions for AI and UX in 2026 is very long (almost 10,000 words), so I made a faster and more entertaining overview of my top predictions as a jazz song (YouTube, 5 min.).


I used the same C-pop idol avatar as my singer in this video, as I had used in my video with predictions for 2025. Compare the two videos to realize how much AI video animation has improved in just one year. (It’s a good question whether the music is much better this year. More a matter of taste. I like the saxophone solo in the 2025 video.)


AI Analyzing Usability Study Recordings

In a recent paper, Ahmed Alshehri from Al-Baha University in Saudi Arabia described a project to train an AI model to recognize users’ emotions during a usability study. He used the DAiSEE dataset of 9,068 10-second video snippets from 112 test participants in a test of e-learning content. The videos were recordings of the users via a front-facing webcam.


Human experts had previously annotated this video set with frame-level labels for each user’s emotional state, identifying 4 emotions: boredom, engagement, frustration, and confusion. (Each identified instance of a user exhibiting one of these 4 emotions was furthermore rated on a 4-level scale from very low to very high, but this data was not used in the present study.)


The AI model was trained on 70% of this data, with 15% being used for validation and the final 15% held back to be used as a test of the model.
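The 70/15/15 partition described above can be sketched in a few lines of Python. This is an illustrative sketch only: the paper does not publish its data pipeline, so the shuffling, the seed, and the helper name `train_val_test_split` are my assumptions, with only the ratios and the 9,068-clip count taken from the study.

```python
import random

def train_val_test_split(items, train=0.70, val=0.15, seed=42):
    """Shuffle and partition a dataset 70/15/15.
    Illustrative sketch; only the ratios come from the paper."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    shuffled = items[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train)
    n_val = int(n * val)
    return (shuffled[:n_train],                      # training set
            shuffled[n_train:n_train + n_val],       # validation set
            shuffled[n_train + n_val:])              # held-back test set

clips = list(range(9068))  # stand-ins for the DAiSEE video snippets
train_set, val_set, test_set = train_val_test_split(clips)
```

Holding the test set back until the very end is what makes the 86% agreement figure below meaningful: the model is scored on frames it never saw during training or validation.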


When the trained model classified the held-back data, it achieved high agreement with the human experts’ coding of those frames: the AI and the humans agreed 86% of the time.


Overview of the study. (Nano Banana Pro)


This specific example has only limited use: recognizing and coding a small number of emotions. However, because it’s AI, it can be applied at scale to encode user emotions in arbitrarily large studies. It’s also fast, requiring only 42 ms to recognize an emotion, meaning that it could be used in real time during a usability testing session.


Since the system is based on webcam video, it could also be used live. For example, it could recognize the emotions of students using learning software and offer extra help when a student becomes frustrated. The same could be done for visitors to e-commerce websites and in many other situations, assuming that users were willing to grant the site access to their webcam. That willingness is doubtful, unless such AI emotion recognition acquires an impeccable reputation for being highly ethical and on the users’ side, rather than exploiting their emotions to push more sales.


Possible use of the new system for AI classification of user emotions. (Nano Banana Pro)


Although automated emotion detection has some utility, I am more excited about generalizing it: this study shows that my idea of training AI on usability-testing data can work. We will need far more than 10,000 short clips to train an AI for the main job of user testing: identifying usability problems and opportunities for improving the design. (Emotions are easier to train for, since they are determined by evolution and don’t change.) But the current study is a good first step.


If your organization conducts usability testing at scale (there are probably fewer than 100 companies in the world that would qualify), make sure to retain all your raw usability recordings now. They may well prove invaluable in 2–3 years, when AI has become sufficiently powerful to conduct full-fledged usability studies on its own. The next step will be automated usability evaluation without the need for user testing, but that’s easily 5–6 years away and will require even more training data.


Character Consistency Still Difficult

Even though recent image models like Nano Banana Pro, Seedream 4.5, and GPT Image 1.5 have improved character consistency substantially, it is still hard to achieve across projects. For example, in the above news item about AI recognizing user emotions, I wanted a 3-page comic strip featuring the same design team I had used in my 7-page strip last month about my 2025 predictions revisited.


I needed to roll about 10 generations per page to get even decent character consistency, despite using character reference sheets for all the characters to supposedly lock their appearance.


For example, the character reference sheet for Davis (the manager) clearly specifies that he wears a smart suit but not a necktie:


(Nano Banana Pro character reference sheet)


Here’s a typical example of a page Nano Banana Pro drew for me:


Page 3 of my comic-book script with poor character consistency. (Nano Banana Pro)


Davis now wears a tie and is maybe 10 years younger. Lillian (the user researcher) is no longer Korean and has short brown hair instead of long black hair. Her dress is a new design and color. (In real life, few women would wear the same dress two comic book pages in a row, but in the cartoon universe, each character only owns a single outfit and wears it for decades for easy character identification.) Ella (the designer) changed from a blonde to a brunette and now wears a dress, which she doesn’t in the character reference sheet (at least she kept her signature red scarf). Rahul (the team geek) has lost his beard and appears about 10 years younger than his character reference sheet indicates, making him seem like an intern rather than a senior engineer. Finally, “Jakob” (the narrator character in the final frame) grew a full head of hair and is maybe 40 years younger.


For unknown reasons, Nano Banana Pro also duplicated frame 4 and drew a substantially similar frame 5.


DeepMind Documentary Film

The recent documentary film “The Thinking Game” about DeepMind (Google’s main AI lab) and its founder, Demis Hassabis, is now available for free on YouTube (also owned by Google, in what I’m sure is no coincidence). The documentary runs for 1 hour and 24 minutes, which is a long time, but worth it.


The documentary has much historically interesting footage from Hassabis’s childhood as a chess prodigy and from DeepMind’s many years of work on getting AI to play chess and go and solve the protein folding problem. I remember many of these events happening in the background during those years, but I never got much inside knowledge at the time. Now we have it.


DeepMind got its start playing very simple computer games like Pong, then progressed to beating the world Go champion and winning the Nobel Prize for solving the protein folding problem. It’s now the lab behind Google’s Gemini AI model. (GPT Image 1.5)


I almost stopped watching this documentary after the first minute, which is filmed on a highly sensationalist set that resembles a James Bond villain’s lair. Luckily, the film gets better. Toward the end, it becomes clear that this “James Bond villain’s lair” is actually a real room in DeepMind’s new offices (which we see being constructed halfway through). I am not sure I should be happy that the world’s premier AI lab deliberately designed its new HQ to look like a James Bond villain’s lair, from which the villain is usually plotting either world domination or world destruction.


Amazon’s E-Commerce Agent Rufus Does Well

On “Black Friday” (November 28, 2025 = the day after Thanksgiving), Amazon’s AI agent Rufus was used by 40% of users in its mobile app. Unfortunately, we don’t know how many of Amazon’s customers used the AI agent when accessing the site on a desktop computer, but I suspect that the percentage is lower than on mobile devices: even though mobile usability is now substantially better than it was in the early days, it is inherently more difficult to use websites on mobile devices with their small screens and awkward text input than it is on a desktop computer with a large monitor and a proper keyboard.


The usability hit on mobile is particularly harsh for comparison shopping, and this is an aspect of ecommerce where AI agents show great potential through their ability to pull out key differences between products and summarize them as briefly or as elaborately as the user requires.


Rufus also had high usage in the days surrounding Black Friday, but was used by only about 30% of users during most of November. This data comes from the analytics firm SensorTower, which specializes in data about mobile app use, which is presumably why they don’t have data about desktop users.


People using Rufus bought 117% more on Black Friday than on a day in early November, whereas non-Rufus users only bought 44% more on Black Friday than they did on the same normal day. As always, quant data only tells us “how much,” and not “why,” so we don’t know why the AI users increased their purchases more than the non-AI users. It could be the case that AI helped people find better bargains, causing them to buy more. Or it could equally well be the case that those users who were particularly in the bargain-hunting mode on Black Friday had a higher propensity to ask for AI help.


In either case, the data certainly shows that Amazon’s AI agent is a great success, and we can expect Amazon to continue to invest in this feature to improve it.


As an aside, I have never liked the name “Rufus” for a shopping assistant AI. Amazon named the feature after a dog owned by Amazon’s fifth software engineer in 1996. Rufus the dog faithfully accompanied his humans to the office every day, and supposedly loved sitting in on meetings, showing how he differed from humans. While being faithful and curious are great qualities for an AI assistant, the name still means nothing to the average user who doesn’t know Amazon’s early company history.


Rufus participating in a design meeting at Amazon, dreaming of the day Amazon’s AI assistant would be named after him. (Nano Banana Pro)


According to Hari Mehta, Amazon’s Vice President of Search and Conversational Shopping, “We conducted thorough research on the name Rufus across various marketplaces where Amazon operates and found it resonated with customers globally.” The naming strategy embodied deliberate choices about approachability and personality. By naming its assistant after an individual (albeit a dog), Amazon signaled to customers that this assistant would be helpful, friendly, and accessible in demeanor. (Similar to the early “Ask Jeeves” search engine.) This contrasts sharply with Sam Altman's stated reservations about overly personalized AI naming, which he articulated in 2023 when noting, “I personally really have deep misgivings about this vision of the future where everyone is super close to AI friends, and not more so with their human friends. We named it ChatGPT and not a person's name very intentionally.” (Indeed, ChatGPT is an even worse name than Rufus. At least “Rufus” is easy to pronounce. Of the major AI products, Gemini and Claude have the best names.)


Probably only a company as strong as Amazon could be successful with a name like Rufus for its AI. Anyway, success it is, which bodes well for the general future of AI agents.


The Rufus AI shopping agent works well for Amazon and its users. (Nano Banana Pro)


AI Prescribes Medication Renewals

The state of Utah is pioneering the ability for AI to prescribe medication refills. For now, the AI cannot prescribe new medications, but once a human doctor has issued the first prescription, renewals can be handled by AI rather than bothering the human doctor with this busywork.

While the inability to get new medications from AI is unfortunate (given that AI is often better than human doctors at diagnosing problems in the first place), the initial focus on renewals is still a major advance. About 80% of prescriptions are renewals of maintenance drugs for conditions like high blood pressure or high cholesterol that most patients take for life once their condition has been diagnosed.

AI prescription renewals will save large amounts of money. Remember that in countries where healthcare is “free,” this means that it’s paid by taxpayers, and in countries where it is “covered by insurance,” this means that premiums increase by more than the cost (since the insurance company needs to be profitable), and employees’ salaries decrease by the amount the company is charged in premiums. You pay in any case, whether directly or indirectly. Thus, paying for a service that does no good is a waste.


AI healthcare is a huge convenience, which in itself means that it’s better healthcare: lower barriers mean more access. (Nano Banana Pro)


The AI used in Utah is Doctronic, which, in one study, agreed with human doctors about patient treatment in 99.2% of cases. When humans and AI disagreed, the AI was superior to the human doctors 36% of the time (according to human experts retrospectively reviewing the cases), the humans were superior 9% of the time, and the two were equivalent the remaining 55% of the time.


Given this data, it’s plausible that moving to AI for prescription renewal will improve healthcare outcomes in itself, simply because of its superior decision-making. However, the main reason I expect patient outcomes to improve is because of the improved user experience of having AI issue prescription renewals rather than having to go through a human doctor who will often be busy or unavailable. (Or the patient may be between doctors for a variety of reasons.)


Mechanical Turk Data Pollution

In a recent paper, Cameron S. Kay from Union College argues that data collected on Amazon Mechanical Turk (MTurk) cannot be trusted due to extreme levels of careless responding that standard screening methods fail to fix. 


Key reasons for this lack of trust include:


  • Reversed Correlations: Kay administered 27 pairs of semantic antonyms: statements with contradictory content (e.g., “I talk a lot” vs. “I rarely talk”). The underlying logic is simple: attentive, honest respondents should agree with one statement and disagree with its opposite, producing negative correlations between paired items. While other online research platforms like Connect and Prolific showed expected negative correlations, over 96% of these pairs were positively correlated on MTurk, meaning participants gave similar responses to directly contradictory statements.

  • Ineffectiveness of Reputation Filters: Recruiting only “high-productivity” (those who had completed many previous tasks) or “high-reputation” participants (Master Workers with >95% approval ratings) did not remedy the issue. These elite workers, often recommended as best practice in MTurk research guides, produced data of similarly poor quality.

  • Dire State of the Platform: Kay’s findings suggest that careless responding on MTurk is widespread enough to not only weaken results but to reverse the direction of observed effects, creating completely spurious scientific conclusions. 
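The antonym-pair logic in the first bullet can be sketched as a simple correlation check. This is illustrative Python, not Kay’s actual analysis code: the sample responses and the plain Pearson formula are my assumptions; only the underlying logic (mirrored answers should correlate negatively) comes from the paper.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two response vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical 1-5 agreement ratings for the antonym pair
# "I talk a lot" vs. "I rarely talk" (five respondents each).
attentive_a = [5, 4, 1, 2, 5]
attentive_b = [1, 2, 5, 4, 1]   # mirrored answers -> negative correlation
careless_a  = [5, 4, 1, 2, 5]
careless_b  = [5, 4, 1, 2, 5]   # same answer to both -> positive correlation

r_attentive = pearson(attentive_a, attentive_b)
r_careless = pearson(careless_a, careless_b)
```

A platform dominated by attentive respondents should show `r_attentive`-style negative correlations across nearly all 27 pairs; Kay found the careless pattern instead on over 96% of MTurk’s pairs.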


Perhaps most troubling, Kay demonstrated that conventional data quality measures proved entirely inadequate. He applied common screening criteria used throughout the research literature:​


  • Removing participants who took less than two seconds per question.

  • Excluding those who gave identical responses to many consecutive items (straightlining).

  • Filtering out those who failed attention check questions (e.g., “Choose ‘strongly disagree’ for this item”).


Even after these exclusions removed 47% of the original data, 67% of the remaining antonym pairs still showed nonsensical positive correlations. The data that passed all standard quality checks remained fundamentally unreliable.
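A screening pass combining the three exclusion criteria above can be sketched as follows. The thresholds, field names, and data shape are illustrative assumptions (the paper’s exact cutoffs are only partially described); the point is that such filters are easy to apply and, per Kay, still insufficient.

```python
def passes_screening(resp, n_items, min_sec_per_item=2.0, max_streak=10):
    """Apply the three standard exclusion criteria to one respondent.
    resp: dict with 'answers' (list of 1-5 ratings), 'duration_sec',
    and 'attention_ok' (True if directed-response checks were passed).
    Thresholds are illustrative, not the paper's exact cutoffs."""
    if resp["duration_sec"] / n_items < min_sec_per_item:
        return False              # too fast to have read the items
    if not resp["attention_ok"]:
        return False              # failed an attention check question
    # Detect straightlining: a long run of identical consecutive answers.
    streak = longest = 1
    for prev, cur in zip(resp["answers"], resp["answers"][1:]):
        streak = streak + 1 if cur == prev else 1
        longest = max(longest, streak)
    return longest < max_streak

careful = {"answers": [1, 2, 3, 4, 5] * 4, "duration_sec": 90,
           "attention_ok": True}
speeder = {"answers": [1, 2, 3, 4, 5] * 4, "duration_sec": 10,
           "attention_ok": True}
flatliner = {"answers": [3] * 20, "duration_sec": 90,
             "attention_ok": True}
```

The trouble Kay documents is that respondents who sail through all three filters — plausible timing, varied answers, passed attention checks — can still be answering contradictory items identically.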


The study suggests that the vast majority of MTurk participants were either not reading the items, responding randomly, or employing automated tools to complete surveys as quickly as possible.


Kay concludes that because these issues are so pervasive and difficult to filter out, MTurk is no longer a reliable tool for behavioral studies compared to alternatives like Prolific or CloudResearch Connect. This is unfortunate, since in the early years, MTurk had some limited use in UX research.


I am not surprised by these new findings, since previous studies, going back as far as 2018, had already shown that MTurk is useless for user research: studies revealed that respondents outside the United States were using Virtual Private Servers (VPS) or Virtual Private Networks (VPN) to circumvent geographic restrictions and access studies intended for U.S. participants. These workers typically provided responses of substantially worse quality, often clicking through surveys as rapidly as possible with minimal attention to content.


Research by Ahler and colleagues documented that approximately 25% of MTurk responses in 2018 were potentially untrustworthy, coming from suspicious IP addresses or exhibiting trolling behavior. While the rate of suspicious IP addresses declined somewhat by 2020, it remained three to five times higher than on commercial survey panels, and the prevalence of apparent trolling had tripled.


A 2023 study by Veniamin Veselovsky and colleagues found that approximately one-third of MTurk participants had used AI to generate their responses. The accessibility of AI chatbots creates powerful incentives for workers seeking to maximize income by completing surveys quickly with minimal cognitive effort.


For now, CloudResearch Connect and Prolific seem to be clear of these problems, with only MTurk systematically implicated in untrustworthy survey completions. However, as it becomes more broadly known that scammers can take advantage of unsuspecting researchers’ willingness to compensate study participants, more fraud is likely. A clear case of “buyer beware.”


In contrast, observational research where you or your AI agent directly watches a study participant as he or she uses your design is somewhat scam-proof for now. In 2–3 years, real-time AI avatars may be so perfect that you can’t tell them from human participants in a video feed or recording. This will invalidate any remote research unless we develop methods for remote identity proof. For now, you “only” have to watch out for participants faking their answers to your recruiting screener in an attempt to cash in on the incentive for a study where they are not qualified. At least it’s usually quickly obvious if a person claiming to be, say, a cardiologist, doesn’t actually know enough about the domain to realistically use your cardiology application, and you can discard the data from that session.


Beware of remote research where you don’t observe the respondents. And soon, beware of any research where the participants aren’t physically present, so that you can verify that they are humans. (Nano Banana Pro)


Marc Andreessen vs. Jensen Huang on AI 2026

Two of the world’s absolutely greatest heroes were recently interviewed about trends in AI: Marc Andreessen’s “2026 Outlook” (on the a16z Podcast) and Jensen Huang’s appearance on the No Priors podcast (Episode 145).


Andreessen co-leads the world’s leading venture capital firm for investing in AI applications, Andreessen Horowitz (usually abbreviated a16z), and Huang leads NVIDIA, which is by far the world’s leading manufacturer of AI infrastructure. My two heroes thus come to the topic from almost opposite sides, making it interesting to contrast their perspectives. (Marc Andreessen was also one of the main heroes of the Internet revolution as the co-creator of Mosaic, the first practical graphical web browser. He’s thus a double hero of mine.)


Andreessen and Huang largely agreed on the economic legitimacy of the AI boom, the positive outlook for labor, and the counterproductive nature of “doomer” narratives and restrictive regulations. They mainly disagreed on the geopolitical strategy regarding China, the long-term competitive dynamics of the hardware (chip) market, and the terminology used to describe future AI models.


Topics of Agreement


The two were mainly in agreement on the following points. Agreement doesn’t guarantee correctness, but when two such smart people come from different angles and agree, at least we have to consider the possibility that they’re right.


1. The Validity of AI Investments (Rejection of the “AI Bubble”)


  • Agreement: Both speakers vigorously reject the notion that we are in an “AI bubble.” They argue that the massive infrastructure spending is rational, supply-constrained, and backed by tangible revenue and utility.

  • Marc Andreessen argued from a unit economics perspective. He noted that AI startups are generating actual customer revenue faster than any previous tech wave. He views the internet as a pre-built distribution network that allows AI adoption to happen at “light speed.”

  • Jensen Huang argued from an infrastructure perspective. He viewed the shift to AI as a fundamental upgrade of the $100 trillion global economy from general-purpose computing to “accelerated computing.” He cited structural demand across diverse fields like digital biology, robotics, and self-driving cars.

  • Nuance: Andreessen focuses on demand, noting that consumer AI creates monetization faster than the internet era did. Huang focuses on supply, noting a shift in which the entire $2 trillion global R&D market is moving toward accelerated computing, making the build-out rational rather than speculative.


2. The Impact of AI on Jobs


  • Agreement: Both are optimists who believe AI will not lead to mass unemployment but will instead solve labor shortages and increase overall productivity.

  • Marc Andreessen relies on the economic concept of “Revealed Preferences.” He notes that while people express fear of AI in polls, their behavior shows they eagerly use it to solve personal and professional problems. He views the anxiety as a recurring historical cycle that is always proven wrong.

  • Jensen Huang utilizes a “Task vs. Purpose” framework. He argues that AI automates specific tasks (like typing or reading scans) but does not replace the purpose of the job (solving problems or curing patients). He suggests this makes people more productive, leading companies to hire more workers to solve more problems.

  • Nuance: Andreessen argues from social dynamics and adoption behavior, whereas Huang argues from operational labor demand and how work expands when productivity rises.


3. The Importance of Open Source & The Dangers of Regulation


  • Agreement: Both view open source AI as critical for innovation and national competitiveness. They warn that “doomer” narratives are driving regulations that would disastrously stifle the US ecosystem by assigning unfair liability to developers.

  • Marc Andreessen focuses on the Talent Pipeline. He argues that open source is how the next generation of engineers learns. He views the proposed regulations as “draconian” and warns that holding open-source developers liable for downstream misuse would effectively ban the practice.

  • Jensen Huang suspects “Regulatory Capture.” He questions the intentions of those pushing doomsday narratives, suggesting they may be trying to “suffocate startups.” He argues that safety is achieved through more technology (better grounding and reasoning), not less.

  • Nuance: Huang warns about the negatives from banning open source; Andreessen stresses the positives from open source as a transmission mechanism for expertise.


4. The Capability Curve Is Compounding Through Research and Productization


  • Agreement: The frontier is moving fast enough that it routinely surprises experienced insiders.

  • Marc Andreessen describes a steady stream of new papers and new startups that keep redefining what’s possible, with “fits and starts” but a clear forward march.

  • Jensen Huang points to concrete capability improvements that made the systems more usable in practice (fewer hallucinations, better grounding, better reasoning, tighter coupling to retrieval/search).

  • Nuance: The difference is one of emphasis: Huang highlights reliability and “works as intended” behavior as a major qualitative shift, while Andreessen emphasizes breadth in the form of new modalities, new products, and constant recombination.


Marc Andreessen and Jensen Huang agreed on many aspects of AI in 2026. (Nano Banana Pro)

 

Topics of Disagreement


1. US–China Relations and Geopolitics


  • The Topic: How the United States should view and engage with China regarding technology, trade, and AI development.

  • Marc Andreessen’s Perspective (The Cold War Adversary): Andreessen frames the situation as a “New Cold War” and a “two-horse race.” While acknowledging the economic entanglement, he views Chinese open-source releases with suspicion, describing them potentially as “dumping” designed to commoditize the market. His focus is on the geopolitical necessity of the US winning the race to prevent Chinese tech dominance.

  • Jensen Huang’s Perspective (The Interdependent Partner): Huang calls the idea of decoupling “naïve” and advocates for a nuanced relationship. He emphasizes that the US and Chinese economies are inextricably linked, noting that China is a massive customer for US chips (thus funding US R&D) and that Chinese open-source contributions (citing DeepSeek) benefit US startups. He views China as a necessary partner in the global supply chain.

 

2. The Future of the Chip Market (Moats vs. Commodities)


  • The Topic: Whether Nvidia’s current dominance in AI hardware is durable or will be eroded by competition.

  • Jensen Huang’s Perspective (Programmability is the Moat): Huang implies that Nvidia’s lead is secure because they offer a “programmable architecture.” He argues that AI algorithms change so rapidly (from Transformers to SSMs to diffusion) that fixed, specialized chips (ASICs) become obsolete quickly. He believes customers need Nvidia’s flexibility to keep up with innovation.

  • Marc Andreessen’s Perspective (Commoditization is Coming): Andreessen argues that “shortages create gluts.” He views Nvidia’s high margins as a “Bat signal” (a Batman analogy) that is drawing fierce competition from hyperscalers (Google, Amazon) and startups to build their own chips. He explicitly states that GPUs are dominant due to historical happenstance and predicts that within five years, AI chips will be cheap and plentiful.


3. The Concept of “God AI” (AGI)


  • The Topic: The terminology and structure of future AI intelligence.

  • Marc Andreessen’s Perspective (Structural): Andreessen embraces the term “God Models” to describe the top tier of the AI hierarchy: massive supercomputers running the smartest possible models. He envisions a pyramid structure where these “God Models” sit at the top and trickle down capabilities to smaller, cheaper models.

  • Jensen Huang’s Perspective (Dismissive): Huang rejects the term “God AI” as “unhelpful” and “too extreme.” He dismisses the idea of a single, monolithic model that knows everything, arguing instead for a future of diverse, specialized agents (e.g., a biology model, a physics model) working together.


4. What “Tokens By The Drink” Implies For Downstream Pricing Strategy

 

  • The Topic: Usage-based pricing for AI services, as opposed to classic SaaS charging for the number of users accessing a service (“seats”).

  • Jensen Huang highlighted token profitability and willingness to pay as a sign that AI output is becoming valuable enough to sustain big buildouts.

  • Marc Andreessen treated token pricing as an enabling substrate: great for making the underlying intelligence widely available, but he argued that many application businesses should ultimately price on delivered value (replacement cost, productivity uplift, business outcomes) rather than mirroring underlying token costs.

  • Nuance: Huang focused on the health of the supply-side AI economics; Andreessen described how downstream companies defend margin and capture value.

 

Marc Andreessen and Jensen Huang disagreed on key aspects of AI in 2026. The two presented their analyses in separate podcasts, so getting them together for this debate is a figment of my imagination. (Nano Banana Pro)

 







