
UX Roundup: Generate in 4K | Research Reproducibility Crisis | NotebookLM Videos Perform Well | Midjourney v.8 | Micropayments | Microsoft Image Model | NVIDIA GTC

  • Writer: Jakob Nielsen
  • 14 min read
Summary: Why I generate images in 4K resolution | Half of UI academic research fails to reproduce | NotebookLM Cinematic Videos Do Well with Viewers | Personalized Exercises Enhance Learning | Midjourney Version 8 Disappoints | New protocol for AI agents to use micropayments | Microsoft releases its new MAI-Image-2 model | NVIDIA’s GTC Conference

UX Roundup for March 23, 2026 (Nano Banana 2)


Why I Generate in 4K

You may have noticed that almost all of the images I post in this newsletter are 1280 pixels wide. And yet I almost always generate in 4K resolution when using models that offer it, such as Nano Banana Pro and Nano Banana 2. Why?


The reason is simple: a very large percentage of AI-generated images are close, but no cigar, in terms of fulfilling my intent, meaning that I need to iterate on the AI’s first attempt. Yes, AI is intent-based outcome specification, as I’ve said since the beginning. I tell the model what I want, and it usually obeys, unless I run afoul of its puritan “safeguards,” such as Google’s censorship of images of Goldilocks at the three bears’ house.


However, there are still two problems:


  • Even when AI does what it’s told, it sometimes makes mistakes. For example, in a comic strip, the speech bubble doesn’t point to the character who is supposed to be saying that statement. Or it renders the text as both a speech bubble and a caption box. These mistakes are getting fewer and fewer with each improved model release, but I still get them, and such rendering mistakes need to be edited out.

  • I can’t tell the model my full intent, for two reasons: First, I don’t even know what I want, since creation is discovery with AI. Only after seeing the result of my initial request do I realize that I really want something slightly different. Second, the full specification of what I want could run for pages of overly detailed prompting. I just want to state my high-level goal and then modify any details where the AI’s interpretation of my intent was off by more than I will accept.


Thus, my workflow is almost always iterative and involves editing. For example, next week I have a story about the history of a company, including an infographic with the timeline, and I went through six iterations of this visual before arriving at the one I will be sharing with you on Monday.


Even though models like Nano Banana 2 are pretty good at preserving high image quality every time they modify an image, they don’t yet maintain perfect fidelity, so the image gradually degrades with each edit, somewhat like making repeated photocopies in the old days. This is why starting with excessive image resolution relative to the final published image is the way to go.
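In practice, this means doing all the iterative edits on the model’s full 4K output and downscaling exactly once, at publication time. Here is a minimal sketch of that downscale-last step, assuming the Pillow library (the filenames and the 1280-pixel target are just my own publication settings):

```python
from PIL import Image  # Pillow; any good resampling library works

# Keep the model's 4K output as the master and do all iterative edits at
# full resolution; downscale only once, when publishing, so resampling
# loss is never compounded across edit rounds.
master = Image.open("comic_4k.png")  # hypothetical filename for the 4K master
target_width = 1280                  # my newsletter's publication width
target_height = round(master.height * target_width / master.width)

final = master.resize((target_width, target_height), Image.LANCZOS)
final.save("comic_1280.png")         # the version that actually gets posted
```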


A typical first generation from Nano Banana 2: the model randomly decided to include the character’s name in every speech bubble. (More elegant visual speaker attribution is exactly the reason the speech bubble tail was invented.) There are a few typos, and in frame 5, the character has three arms.


Corrected comic strip. The character is monochrome in the final close-up frame, but that’s a creative choice I like, so I left it as is, despite it not being specified in my initial prompt. (Nano Banana 2)


Half Of UI Academic Research Fails to Reproduce

In a recent project, Olga Iarygina and colleagues from the University of Copenhagen attempted to reproduce the research results from 76 papers published at the CHI conference, which is the world’s leading outlet for publishing academic papers in human–computer interaction (HCI). I was the papers co-chair for the conference in 1993 and can attest to its very thorough review process and very low acceptance rate, so CHI papers are the best of the best in this field.


Despite these papers being the best of the best in academic user interface research, Dr. Iarygina and her colleagues could reproduce the findings from only 50% of them. Does this mean that the other half of the papers are fake news? Not necessarily. Their findings could still be correct (or mostly right), even if the study could not be replicated by other scientists.


Another caveat is that this replication project considered only papers that published sufficient data for independent outsiders to attempt a replication with the same method. Most CHI papers didn’t qualify for inclusion in this project. Of course, one might suspect that the papers where it was not even possible to attempt to replicate the outcome could have even less robust findings, but we don’t know.


Given this new study, I am not willing to conclude that half of all published user interface research is false and should be disregarded. However, the new study does provide support for a point I have made many times before: HCI research is often weak and generalizes poorly to practical design projects. It is safest to rely only on findings that have been replicated repeatedly by many independent researchers, using different methods across different application domains.


Even though it may seem shocking that half of HCI research fails to reproduce, this is actually par for the course in current academia, which is beset by an overwhelming replication crisis. Estimates from similar projects in other research fields are that 75% of research results in social psychology fail to replicate, and that 60% of results in education research fail to replicate.


Even “highly-cited clinical research” fails to replicate 17% of the time, despite clinical research being literally a matter of life and death, and the citation count being the best metric we have for research quality.


Two of the main reasons for the replication crisis are:


  • Scientists are rewarded for new and striking results, not for confirming old findings already published by other researchers. This discourages replication attempts, leaving many false results unchallenged.

  • The paper publishing process has a distinct positivity bias, preferring to accept and publish papers that assert a particular finding. Research projects that don’t find anything are rarely published. If we go by the simplistic approach of accepting a finding as “statistically significant” when it has only a 5% chance of being a random fluke, and 20 researchers study a theory that’s wrong, the 19 who reach a null conclusion don’t get published, whereas the one paper that erroneously finds that the theory is right makes it into the literature. (See the calculation after this list.)
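To make that second mechanism concrete, here is the 20-researcher scenario as a quick back-of-the-envelope calculation (the numbers are the ones from the bullet above):

```python
# Publication bias, made explicit: 20 labs test a theory that is actually
# false, each using the conventional significance threshold alpha = 0.05.
alpha = 0.05
labs = 20

expected_false_positives = alpha * labs   # 1.0: on average, one lab
                                          # "confirms" the false theory
p_at_least_one = 1 - (1 - alpha) ** labs  # ~0.64: more likely than not that
                                          # at least one lab gets a positive

print(expected_false_positives)           # 1.0
print(round(p_at_least_one, 2))           # 0.64
```

Since only positive results get published, that one false positive becomes the literature’s verdict on the theory.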


To conclude, HCI research quality is bad, but not as bad as much other academic research.


Academic research faces a replication crisis, so it should come as no surprise that the same is true of user interface research. At least HCI is not as bad as many other fields. (Nano Banana 2)


I continue to be impressed by the ability of Google’s NotebookLM to make sense of complicated content by explaining it in an animated video. I made one such video about Iarygina et al.’s replication paper that explains several technical details that I didn’t attempt to popularize in my own writeup here (YouTube, 8 min.).


NotebookLM Cinematic Videos Do Well with Viewers

As mentioned above, I am impressed by NotebookLM’s ability to transform complex information into a short animated explainer video. More importantly, the audience agrees and likes these videos, as shown by the YouTube analytics for the “cinematic overview videos” I have posted so far.


I have avoided these stereotypical YouTube thumbnails for my channel, but for my video about Design Critiques Gone Wrong, I thought the standard shocked influencer would be appropriate. I am sorry to say that the stereotype exists for a reason, because this thumbnail converted really well. (Nano Banana 2)


As shown in the following chart, YouTube’s analytics indicate that viewers keep watching the video at an unprecedented rate. 76% are still watching after 30 seconds, which is already high, and there is almost no further drop-off until we hit the 5-minute mark, which is very unusual. Empirically, people are drawn in by NotebookLM’s combination of visualizations and B-roll animations, complemented by an engaging voiceover.


Viewing retention for “Design Critiques Gone Wrong.” (YouTube)


NotebookLM excels at creating non-fiction videos, but I wanted to see how well it does with fiction, so I made a video version of Book 1 of Tolstoy’s classic War & Peace (YouTube, 7 min.). I think it came out quite well, though NotebookLM couldn’t completely confine itself to the plot and inserted a small amount of non-fiction literary analysis. I think this only enhances the video.


It’s been 50 years since I read War & Peace, so I don’t remember all the plot details. The abridged video version seems fine, but anyone with a better understanding of the book than mine, please comment on the accuracy of my video.


War & Peace is one of the true classics of world literature, but it’s massive. Book 1 alone, covering the year 1805, runs to 48,000 words in the English translation, or 13 times the length of today’s newsletter. Using AI to get an animated overview is a good way to grasp the main plotline. (Seedream 4.5)


My 7-minute video covers only Book 1 of War & Peace. The full story would run for almost two hours, which I think would test the audience’s patience, given the current quality of AI-generated video. But give my Book 1 video a view and see what you think.


Personalized Exercises Enhance Learning

A new study of 770 high school students in Taiwan by Angel Tsai-Hsuan Chung and colleagues (mainly from the University of Pennsylvania) found that students assigned exercises by AI performed 0.15 standard deviations better on the final exam than those progressing through the standard, fixed sequence of exercises.


The primary instruction was the same for both groups of students, and the workload for both teachers and students was the same. The only difference was which exercise problems the students were assigned: the same for everybody or personalized to each student’s achievement level.


On the one hand, an improvement of 0.15 SD in learning outcomes is fairly big: it corresponds to 2.25 IQ points, which in turn corresponds to about $50,000 in higher lifetime earnings for a median American worker. The cost of having an AI assign exercises during high school is almost certainly much less than $50K per person, even if generalized across all courses rather than just a single one. (Of course, learning more because of improved education doesn’t improve IQ, because the student’s brain remains the same, so this calculation is just an analogy to show the real-world importance of the statistical abstraction represented by standard deviations. There are two ways to learn more: to have a higher IQ with the same education, or to receive improved education with the same IQ. Pick better parents or a better school: either will work.)


On the other hand, 0.15 SD is much less than the Bloom “two sigma” finding that a fully individualized education improves outcomes by two full standard deviations. (Something that was only possible in the past for billionaires who could hire a full-time personal tutor for each child.)


Two standard deviations is the goal for fully AI-powered education. Individualizing exercise progression gets us 7.5% of the way, according to this research study. I think that’s reasonable, because the AI in this study didn’t even create fully personal exercises for each student: it only chose the best exercise from a set of predefined problems, given each student’s current level of learning. Modest, but clearly worth doing, particularly since this approach didn’t involve any extra work for the teachers. The students spent the same time in both conditions: the only difference in the AI condition was that they solved problems better suited to them, not that they did any more homework.
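For readers who want the arithmetic in the preceding paragraphs spelled out, here it is as a short calculation (the input figures are the ones cited above; the dollar figure is my own rough estimate from earlier in this section):

```python
# Converting the study's effect size into more tangible units.
sd_gain = 0.15         # improvement from AI-assigned exercises, in SD
iq_points_per_sd = 15  # the IQ scale is defined with SD = 15 points
bloom_two_sigma = 2.0  # Bloom's fully individualized tutoring benchmark

iq_equivalent = sd_gain * iq_points_per_sd     # 2.25 IQ points
fraction_of_bloom = sd_gain / bloom_two_sigma  # 0.075 = 7.5% of the way

print(iq_equivalent, fraction_of_bloom)        # 2.25 0.075
```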


Study summary. (Nano Banana 2)


An interesting twist in this study is that the AI-assigned problems were of most help to weak students. AI helped all students, but it helped the weaker ones the most. Weaker students (as measured by the Taiwanese high school admissions exam) improved by 0.17 SD with AI, whereas stronger students improved by only 0.04 SD.


This aligns with most early research on the benefits of AI: AI helps everyone, but it benefits poor performers more than strong performers. My analogy has always been that AI is a forklift for the mind: if you consider a traditional warehouse without a forklift, muscular warehouse workers could lift heavier boxes than the punier workers could, but when driving a forklift, both workers lift the same loads. (The analogy is not perfect, because all the forklift drivers end up lifting the same, whereas with AI, clever workers still outperform dull workers, just not by as much as they used to.)


AI is a forklift for the mind. (Nano Banana 2)


Midjourney Version 8 Disappoints

Midjourney released its long-awaited version 8 on March 17. The sense of the creator community is that it’s a disappointment. Text rendering is a little better, so one can often get a few words spelled correctly, but this is worlds away from the text rendering in models like Nano Banana 2, which almost always gets complete sentences right and often completes a full-page comic strip in a single run. (Once you get to infographics pushing a hundred words, NB2 does have more typos than are comfortable. NB3, soon, please!)


I let my Midjourney subscription lapse in February since I had not been using it much after the release of the GPT Image-1 model in April 2025. (I had stupidly paid for a full-year subscription in February 2025, thinking that this was a good deal. It would have been much cheaper to pay the higher by-the-month rate for maybe 4 months until I realized the superiority of the multimodal image models, relative to stand-alone image models.)


Midjourney can’t even get close to this kind of image, with several components that relate to each other and contain substantial text. Here, Nano Banana 2 rendered 158 words correctly in a single generation, though the duplication of the word “slightly” in frame 3 is an error that I could have corrected in an iterative rendering using, for example, Freepik’s inpaint feature. However, Midjourney fanboys will argue that any individual character would have been more artistic as a Midjourney render.


Lacking my own access to Midjourney, I have relied on the sample images posted by a wide range of creators. Uniformly, they show nice-enough images made with Midjourney 8, but nearly everybody says that they are hardly better than what could be made with Midjourney 7, which in turn was a disappointingly small incremental improvement over Midjourney 6. This means that the step up from Midjourney 6 to 8 is also hardly there.


Midjourney 6 was released in an early form in December 2023, meaning Midjourney has now gone 27 months without any significant improvement. In the AI world, 27 months is an eternity. In March 2024 (after Midjourney 6 had stabilized), I made a video demonstrating the improvements in Midjourney image quality from 2022 to 2024 (YouTube, 4 min.): an immense difference between the first and the last example I discussed. I ended the video by saying, “Just think about what we’ll get in two more years!” Sadly, this turned out to be virtually zero in the case of Midjourney, though the world of AI images in general did move ahead at breakneck speed, as I analyzed in a recent article.


Seeing the decline of Midjourney is sad, given that it was arguably the first important AI product, launching even before the original GPT-3.5-based ChatGPT.


The Midjourney saga proves the “bitter lesson” of AI once again: overwhelming compute dominance, as enjoyed by Google, OpenAI, and xAI, beats any intellectually satisfying, clever, human-crafted domain knowledge.


More advanced AI image models are grounded in world knowledge through their integration with a general-purpose AI language model. 20 hours after the launch of Midjourney 8, I asked Nano Banana 2 to search the Web and X for initial reactions to the launch, and it made me this image. There is a typo in frame 1 (“Wat”), but it is otherwise a decent visual summary.


Micropayments for AI Agents

My hope was always to support good web content through micropayments: if users pay, they become customers, rather than being the product, as happens when websites are paid by advertisers. I thought users would pay somewhere between 1 and 10 cents per article, depending on its value.


When you pay, you’re the customer. When you access a service for free, you become the product since the advertiser is the paying customer. (Seedream 5 lite)


However, micropayments never took off on the manually operated web because it was easier for users to pay by wasting time on ads, and easier for websites to get paid by having a single contract with an advertising provider. Setting up a special payment system was too much overhead, especially in the early phase when almost no improved content would be available for users and almost no paying customers would be available for websites.


Micropayments failed during the dot-com bubble because it was too much hassle for users to set up a micropayment service and activate it for every article they wanted to read. (Seedream 5 lite)


However, there is now hope that we can get micropayments for AI agents. Agents can be dramatically more useful when they can access high-quality proprietary content and aren’t limited to generic web articles. Content providers, meanwhile, lose traditional advertising revenue as a steadily higher percentage of website use happens through AI agents (no eyeballs on a page that’s scanned by AI), and they also get substantially less branding value, since AI reads many more websites than it ever mentions as sources to users, and users click through to very few of those sources.


The old Internet business models built on eyeballs are crumbling, since there won’t be any human eyes on digital content in the future when AI agents read everything on behalf of the users. (Nano Banana Pro)


AI skims hundreds of sources and shows only a few to users, meaning there is no branding value for the company behind the website accessed by the AI agent. (Nano Banana 2)


For example, when researching my recent article on the difference between median users and power users, I used four different Deep Research agents, which accessed almost 200 freely available sources. My research (and thus my article) would likely have been better if they could have also used proprietary sources. Let’s say I would pay $10 for better research, since publishing a free article is not a high-value project: that would allocate about 5 cents per source used.


However, for higher-value corporate projects, it would often be reasonable to pay $100 for a Deep Research report that answers, say, a competitive market analysis question. And for a drug discovery project, Deep Research could easily be worth $1,000 or more. In such cases, content providers might charge $1 or more per article or per data repository.


The agent budget for a Deep Research project will depend on the value to the company of solving that problem. (Nano Banana Pro top; Seedream 4.5 bottom)


AI agents don’t face the same issues with micropayments we had back in the days of manually-accessed websites:


  • No usability overhead in agreeing to pay a micropayment. The AI agent has a budget and spends it as it sees fit. (See the sketch below for what this flow could look like.)

  • Little chicken-and-egg dilemma in setting up agents or content providers: in the cutthroat AI market, it will be worth the small programming overhead for an agent provider to set up a micropayment option if this results in higher-quality reports for its users. Similarly, for a content provider, even if only a few agents support micropayments in the beginning, each of them will drive thousands of requests, assuming that the content is indeed of higher value than what’s available for free on the open web.


For these reasons, I am quite optimistic about the prospects for the new Machine Payments Protocol (MPP) for AI agents that Stripe (the biggest ecommerce payments provider) and Tempo (a blockchain company) launched last week.
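I have not seen MPP’s technical documentation, so purely to illustrate the agent-side flow from the bullets above, here is a hypothetical sketch built on the long-dormant HTTP 402 “Payment Required” status code. The header names, the pay() helper, and the budget figure are all invented for illustration; this is not the actual MPP API.

```python
import requests

BUDGET_USD = 10.00  # the agent's total content budget for this research run
spent = 0.0

def pay(url: str, price: float) -> str:
    """Stand-in for the protocol's wallet layer (Stripe/Tempo in MPP's case).
    A real implementation would return a signed proof of payment."""
    return "signed-payment-proof"  # hypothetical

def fetch_source(url: str) -> str | None:
    """Fetch one source, paying a per-article micropayment if the site asks."""
    global spent
    response = requests.get(url)
    if response.status_code == 402:  # HTTP "Payment Required"
        price = float(response.headers.get("X-Price-USD", "0"))  # hypothetical header
        if spent + price > BUDGET_USD:
            return None              # budget exhausted: skip this source
        token = pay(url, price)
        spent += price
        response = requests.get(url, headers={"X-Payment-Token": token})  # hypothetical
    return response.text if response.ok else None
```

The point of the sketch is the absence of any user-facing steps: the budget is set once, and every payment decision happens inside the agent’s research loop.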


Paid content means better content, which, in turn, means more useful AI. I root for the success of MPP.


AI will become much more useful once it can access proprietary, for-pay content. (Seedream 5 lite)


Microsoft MAI-Image-2 Model

Microsoft’s superintelligence group released an improved image model called “MAI-Image-2.” (MAI stands for Microsoft AI.) Virtually all the prompts I tried failed, so I can’t comment much on this model, but the images posted by various influencers are nice enough, though not rising to the standard of the leading models (Nano Banana 2 and Seedream 5).


What’s most interesting about this development is that Microsoft continues to invest in developing its own AI models, meaning it will not be dependent on OpenAI in the future.


Ironically, the only good image I got out of MAI-Image-2 was this comic strip about its failures.


Same prompt with Nano Banana 2. Which do you prefer? Note that the prompt only specified “funny animal characters” but not the species.


NVIDIA GTC 2026

NVIDIA’s annual GTC conference was last week. “GTC” used to stand for “GPU Technology Conference,” but the biggest news this year was NVIDIA’s integration of other chip types with its GPUs, especially the ones it got when buying Groq.


GTC has turned into the main AI event of the year, and NVIDIA CEO Jensen Huang has become the senior statesman of the industry, making his keynote as eagerly anticipated as Steve Jobs’ keynotes in the old days. I didn’t have time to watch the full video, so I did what any proficient AI user should do these days and had Nano Banana 2 make a comic strip to summarize the event:





