
UX Roundup: AI is the Computer | Disrupting Filmed Entertainment | UX Trends | GOMS | Nano Banana 2 | Lyria 3

  • Writer: Jakob Nielsen
Summary: The AI is the computer | AI is disrupting the filmed entertainment industry | UX trends for 2026 | AI can use the GOMS UI analysis method | New image model: Nano Banana 2 | New music model: Lyria 3

UX Roundup for March 2, 2026 (Nano Banana 2)


The AI is the Computer

Aravind Srinivas (the founder of Perplexity, which is probably the best AI-driven answer engine but has ambitions to be more) wrote an interesting essay titled “The AI is the Computer.”


When I worked at Sun Microsystems, our motto was “The Network is the Computer,” by which we meant that once machines are networked, the primary “computer” for solving users’ problems is no longer the isolated box on the desk, but the distributed, cooperative system of interconnected services. This vision went far beyond the “cloud computing” model, which really just means that data is stored in a different location and that software runs on a different computer than your desktop. “The Network is the Computer” implies aggregation of many services, usually accessed through the Internet (which we profited from building, so it was a self-serving slogan).


Now, Srinivas proposes the next evolution of computational user service: that AI as a whole will become the solver of the user’s problem. Individual AI models are components of this higher-level “computer,” but they won’t be the focus of the user experience.


The AI is the computer now. (Nano Banana Pro)


More specifically, he says that the true potential of artificial intelligence lies in massively multi-model orchestration rather than any single model. He suggests that because frontier models are increasingly specialized, superior performance requires a system that can delegate specific tasks to the most qualified agent. By combining a file system, secure code execution, and web access, this orchestration layer effectively transforms AI as a whole into a functional computer capable of autonomous work.
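To make the orchestration idea more concrete, here is a minimal sketch (my own illustration, not code from Srinivas’s essay) of a dispatcher that delegates each task to the most specialized model available; the specialist names, skill tags, and selection logic are all hypothetical:

```python
# Hypothetical sketch of an "AI as the computer" orchestration layer:
# route each incoming task to the specialist model best suited to handle it.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Specialist:
    name: str
    skills: set[str]           # task types this model handles well
    run: Callable[[str], str]  # stand-in for calling the underlying model/agent

def route(task_type: str, prompt: str, specialists: list[Specialist]) -> str:
    # Delegation rule: pick the first specialist that claims the task type;
    # fall back to the generalist (last entry) if nobody does.
    for s in specialists:
        if task_type in s.skills:
            return s.run(prompt)
    return specialists[-1].run(prompt)

specialists = [
    Specialist("coder", {"code"}, lambda p: f"[code model] {p}"),
    Specialist("researcher", {"web_research"}, lambda p: f"[research model] {p}"),
    Specialist("generalist", set(), lambda p: f"[general model] {p}"),
]

print(route("web_research", "Summarize this week's AI news", specialists))
```

A real orchestration layer would add the file system, sandboxed code execution, and web access that Srinivas describes, but the core idea is the same: the user talks to the system, and the system decides which model does the work.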


Aravind Srinivas uses the distinction between an individual musical instrument and the full orchestra as his analogy for how AI as a whole will become the driver of the user experience. (NotebookLM)


Before good AI, algorithmic search engines were the primary way to access knowledge on the web, but they have always been unable to deeply research, synthesize, or take action on a user’s behalf. AI changed this: Perplexity pioneered the answer engine concept, focusing on accurate, reasoning-driven Deep Research that can find, analyze, and synthesize knowledge from the open web, effectively solving what Srinivas calls the “read” problem for the internet. With the web now functioning as a readable and writable knowledge store powered by AI agents that can navigate and complete tasks, the final frontier is personalization with persistent memory, private files, and tailored tools that make the computer truly yours, capable of working on your behalf in a deeply individual way.


Our old “The Network is the Computer” model has broken down due to the Web’s information overload and information pollution. Shifting to the AI as the computer defends users against this onslaught of information. (NotebookLM)


Ultimately, the user experience will shift from traditional user interfaces toward a symphony of models that operate as a unified, personal machine. (This latter point is similar to my prediction that old-school user interfaces will become much less important in the future: No More UI as the song says.)


AI Disrupting the Filmed Entertainment Industry

Consulting company McKinsey has released a report on AI’s impact on the film and TV production industry. To be honest, I am not overly impressed with the report, which lacks visionary insights into AI’s likely disruption of these legacy companies. The conservative nature of the report probably stems from its use of the classic strategy consulting method: they chatted up 20 industry executives about what they expected to happen and played back what they were told. Of course, senior people in a legacy business are unlikely to have disruptive insights. Breakthrough innovation in this domain (as in most others) is much more likely to come from startups run by AI-native folks in their 20s.


Think MrBeast, not Disney, if you had wanted to predict YouTube’s impact 10 years ago. Today, we need to find the AI-native equivalents of MrBeast to discover fresh thinking.

McKinsey is a private company, so we don’t know its recent stock performance, but the stocks of publicly traded consulting firms that have made a lucrative living selling similar reports have dropped like the proverbial stone, with Gartner trading about 70% below its 12-month high and Forrester trading 60% below its 12-month high. Clients can now get similar consulting advice from a $200 subscription to the “max” level Deep Research from Perplexity or the “ultra” level from Google Gemini.


Though I was underwhelmed by McKinsey’s report, it still contains many interesting nuggets, and it has the advantage over AI-generated reports of being concise and having a prettier page layout.


The initial part of the report is rather pessimistic (but very realistic) about the commercial prospects of film and TV production: in recent years, audiences for theatrical movie releases have dropped by 6% per year, and linear TV viewing has dropped by 4% per year. (In contrast, social video viewing has been growing by 14% per year.) Since the money comes from movie theaters and legacy TV and cable, these stats bode poorly for the industry. While spending on filmed entertainment content production in the United States has been falling by 10% per year in recent years, McKinsey inexplicably predicts that this spending will only decline by 2% per year going forward.


Despite the filmed entertainment industry being in decline, it still spends big bucks, and the report predicts it will redirect about $60 billion in annual revenue in 5 years due to AI.


Interesting historical data and some rather conservative predictions in McKinsey’s new report about future prospects for film and video production companies: they predict that spending on producing filmed entertainment will drop by 2% per year and that about $60 billion in annual revenue will likely be disrupted by 2030 in the United States alone. (NotebookLM)


Whether or not we agree with this $60B estimate, there is no doubt that filmed entertainment will be disrupted by AI. Anybody who has observed the current rate of improvement in AI video will agree that AI will take over, even if opinions may vary as to how many Hollywood legacy companies will be driven out of business, and how soon this will happen.


The report presents 3 scenarios for how AI will impact the film and video industry: changes to current production workflows, democratization of professional-grade content creation, and the emergence of new content formats and distribution channels. These scenarios are not mutually exclusive, and how they unfold depends on how quickly AI tools are adopted, whether consumer preferences shift, and who captures the resulting economic value.


McKinsey’s 3 scenarios for how AI will impact the film and video industry. (NotebookLM)


1. Improving Current Production Workflows

This is the most likely near-term outcome. AI is already beginning to touch every stage of production, from ideation to distribution. In a very conservative prediction, McKinsey suggests roughly $10 billion of US original content spend in 2030 will be redirected from legacy processes to AI.


The film industry’s notorious conservatism may lend some support to McKinsey’s cautious forecast. CGI technology had existed for nearly two decades before Jurassic Park transformed the industry, illustrating the typical lag between a new technology’s development and its widespread adoption. Industry players will also need to navigate concerns around creativity and authorship before adoption accelerates.


Whether AI vendors can capture a significant share of the predicted $10B depends on competitive dynamics. Current developments in AI video models suggest fierce competition, meaning that most of the value unlocked by cheaper AI will flow to the producers, not the AI model providers.


2. Democratization of Professional-Grade Content Creation

AI could enable smaller studios and individual creators to produce content that rivals large-studio output, increasing total content supply and opening new creative opportunities. I think this is the biggest likely change over the next 10 years, but McKinsey views it as less certain, partly because it depends on whether smaller creators use AI to raise quality rather than simply flood the market with lower-value content (what many call AI “slop”).


My view: one person’s “slop” is another person’s must-watch content. Narrowly targeted video prospers by being more interesting to its target audience than mass-market productions can ever be.


The combination of more content and finite consumer attention could meaningfully reshape distribution. In a historical parallel, the rise of broadcast TV contributed to a 38 percent drop in the number of US cinemas between 1930 and 1957.


Americans already watch an almost inconceivably large amount of video, averaging 7.5 hours per day. It’s hard to imagine this increasing much, so the availability of more AI-produced video will likely result in a drop in the attention people allocate to legacy producers. (NotebookLM)


3. New Content Formats and Distribution Channels

The most transformative but least certain scenario involves AI giving rise to entirely new formats and platforms, not just reshaping existing ones. Historical precedents are striking: the shifts from stage to cinema, from linear to streaming, and from long-form to short-form each reduced incumbent revenue by an average of 35 percent within five years of wide adoption. Applied to current forecasts, a comparable AI-driven disruption could redistribute around $60 billion in revenue within five years of mass adoption.


World models are currently the most dramatic new content format, but since AI-driven entertainment is in its infancy, many more new formats will likely be invented over the coming years.


Even the conservative approach in the McKinsey report predicts major disruptions. I think much larger disruptions are likely, though I acknowledge the inertia and sluggishness in changing the system, even as AI technology advances at a breathtaking pace. Despite my criticism, I recommend downloading the full report: it’s a quick read.


2026 UX Trends Report

The World Usability Congress has published an interesting 2026 UX Trends Report (30-page PDF). The trends discussed in the report may not be that new, but that’s probably an indication that they are real. (The main exception is brain–computer interfaces: Neuralink has shown they can work, but they certainly won’t be mainstream in 2026 for applications beyond helping paralyzed patients. However, the report’s point that we need to move beyond low-level proxies for user intent, such as click-level analytics, is well taken.)


Key trends highlighted in the recent report from the World Usability Congress. (NotebookLM)


While the participants in a usability congress are a self-selected sample that may not generalize to all companies, the report contains some interesting survey response statistics:


“To what extent is the potential of Experience Design in your company being realized?”

  • High: 27%

  • Somewhat: 46%

  • Low: 27%


“To what extent does AI influence your work?”

  • Positively: 67%

  • Neutral: 25%

  • Negatively: 8%


AI Can GOMS

Matt Sharpe (Director of Design at Capital One, which is a huge finance company in the U.S.) reported on a case study where he successfully had Gemini conduct a GOMS evaluation of a UI design. (He doesn’t specify which version of Gemini he used, but I would expect it to be at least 3.1 Pro, and probably/hopefully the Deep Think variant, since GOMS will benefit from extended reasoning.)


GOMS (Goals, Operators, Methods, and Selection rules) is a model for predicting human performance with user interfaces. To conduct a GOMS analysis, you decompose a user’s task into a hierarchy: the user has a goal (e.g., delete a word in a document), achieves it through operators (the atomic perceptual, cognitive, and motor actions like keystrokes or mouse movements), employs methods (sequences of operators that accomplish a goal), and applies selection rules (the decision logic for choosing among competing methods).
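As a toy illustration of this decomposition (my own example, not taken from Sharpe’s case study), here is how the delete-a-word goal might be written down, with two competing methods and a selection rule for choosing between them; the method names and the three-character threshold are made up for the example:

```python
# Toy GOMS decomposition for the goal "delete a word" (illustrative only).
goal = "delete word"

# Each method is a sequence of operators (the atomic perceptual, cognitive,
# and motor actions) that accomplishes the goal.
methods = {
    "mouse-select-then-delete": [
        "point to word", "double-click to select", "press Delete",
    ],
    "backspace-repeatedly": [
        "point cursor after word", "click", "press Backspace once per character",
    ],
}

def selection_rule(word_length: int) -> str:
    # Decision logic for choosing among competing methods:
    # repeated Backspace only pays off for very short words.
    return "backspace-repeatedly" if word_length <= 3 else "mouse-select-then-delete"

print(selection_rule(2))   # -> backspace-repeatedly
print(selection_rule(9))   # -> mouse-select-then-delete
```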


Overview of the GOMS method for estimating skilled-user task performance. My apologies for the drawings of Card, Moran, and Newell not really looking like them. (Nano Banana 2)


The real value of GOMS is that it gives you quantitative, testable predictions of expert performance time, something most usability methods conspicuously fail to do. It won’t tell you whether your interface is learnable or pleasant, but it will tell you how efficient a skilled user’s workflow will be, measured in seconds and keystrokes to complete a task. That makes it useful for comparing design alternatives at the task-execution level, particularly in high-throughput, mission-critical systems where shaving a second off a repeated task has enormous cumulative payoff.


GOMS originated in the early 1980s from the foundational work of Stuart Card, Thomas Moran, and Allen Newell at Xerox PARC, formalized in their landmark 1983 book The Psychology of Human-Computer Interaction. It was, frankly, one of the first serious attempts to bring the rigor of cognitive science into interface design: to move the field beyond subjective opinion and toward predictive engineering models rooted in human information-processing theory. The most widely used variant, the Keystroke-Level Model (KLM), strips GOMS down to its most practical form: you simply list the physical and mental operators for a task and sum their empirically derived execution times. Later variants like CMN-GOMS, NGOMSL, and CPM-GOMS added richer representations of cognitive procedures and parallel processing, but the core insight remained the same: expert behavior is sufficiently routine and predictable that you can model it before you build anything, which is exactly the kind of discount analytical power that complements, but doesn’t replace, empirical usability testing with real users.
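To give a flavor of how a Keystroke-Level Model estimate works in practice, here is a minimal sketch that sums commonly cited per-operator times; the constants are the textbook ballpark values and the operator sequence is my own example, so treat the resulting number as illustrative rather than authoritative:

```python
# Minimal Keystroke-Level Model (KLM) sketch: predicted expert task time is
# the sum of per-operator execution times. The constants below are commonly
# cited ballpark values (seconds); real analyses calibrate them to the user
# population and input device.
KLM_TIMES = {
    "K": 0.20,  # keystroke or button press (average skilled typist)
    "P": 1.10,  # point with the mouse to a target on screen
    "H": 0.40,  # home the hand between keyboard and mouse
    "M": 1.35,  # mental preparation before an action
}

def klm_estimate(operators: list[str]) -> float:
    """Return the predicted expert task-completion time in seconds."""
    return sum(KLM_TIMES[op] for op in operators)

# Example: delete a word by mousing to it, double-clicking to select it,
# then returning to the keyboard and pressing Delete.
task = ["M", "H", "P", "K", "K", "H", "M", "K"]
print(f"Predicted expert time: {klm_estimate(task):.2f} s")  # about 5.2 s
```

Comparing the summed times for two candidate designs (or two competing methods) is exactly the kind of efficiency comparison GOMS is good at.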


Despite these benefits of GOMS, I always used to recommend against using the method for practical projects. Two reasons:


  • The theory is fairly complicated, and it requires substantial expertise to conduct a GOMS analysis correctly.

  • Since a GOMS analysis is very detailed, it also takes extensive time for even a highly skilled expert to carry one out.


High expertise requirements, multiplied by high time consumption, equal high cost! For most projects, it was better to run a quick discount user study with 5 users.


However, AI now enables “GOMS for Dummies.” You don’t need advanced expertise or theory, because the AI already knows all this, having read the old book and the many subsequent research papers published on GOMS over the 43 years since it was introduced.


You also don’t have to spend many expensive billable hours conducting the GOMS analysis. AI is cheap and fast. (You may need a $250 Google Ultra subscription to access Deep Think, but that’s nothing compared to the productivity gains from all the many other uses you can make of advanced AI reasoning.)


The basics of GOMS remain the same, no matter whether the analysis is cheap (AI) or expensive (human experts): Given a UI design and a task, GOMS estimates how much time it will take an experienced user to accomplish the task. GOMS thus targets usability criterion 2: efficiency (or time on task). It does not address the remaining 4 usability criteria: learnability, memorability, user errors, or satisfaction.


But when the efficiency for skilled operators is important, GOMS now becomes a feasible method in many more projects.


It’s very exciting for me that I have changed my position on GOMS after 40 years of being against it. AI truly does change how UX work is done.


AI now makes the GOMS method feasible for practical UI design projects that target fast skilled-user task performance. (Nano Banana Pro)


It is understandable that AI would be well-suited to conducting GOMS analyses, as this is a formal method that seems amenable to reasoning models. This is in contrast to informal methods like heuristic evaluation, which current-level AI can only do at a medium level.


Now that GOMS is virtually free, it makes sense to include it as a standard part of the UX toolkit and perform GOMS analyses of any UI where you expect a substantial amount of use by skilled users. Remember that it says nothing about learning, memorability, user errors, or satisfaction, which are the dominant concerns in website usability. But even if only 10% of your users are skilled experts, it makes sense to consider their needs and conduct a cheap analysis. This is especially true since it is hard to include skilled use in traditional usability testing of a new design, which won’t have any experienced users yet.


Nano Banana 2

Google launched a new version of its image model, moving from Nano Banana Pro to Nano Banana 2. (AI model names remain a confusing mess, which is a true UX problem since it makes it harder for users to understand what’s going on.)


I used the new model (Nano Banana 2) to create the infographic in the previous news item explaining the GOMS method. For comparison, here’s the infographic created by the old model (Nano Banana Pro) from the exact same prompt:


GOMS model infographic from Nano Banana Pro.


You can see that the new model does have better grounding in world knowledge and creates somewhat more interesting infographics, though it failed utterly in drawing true likenesses of the three model creators. For example, in all the years I have known Stu Card, he has always worn glasses.


As another example, here’s a comic strip Nano Banana 2 drew for me. Compare it with the strip above, which Nano Banana Pro made from the same prompt. I like the animal version better (funnier), but of course, I could have prompted for funny characters rather than realistic characters in NB2, and it would probably have delivered.


Character consistency is still lacking: it moved my “Dr. Nielsen” name badge from one lapel to the other between frames 1 and 4. Also, in truth, I am a bit balder than shown in the closeup frame. (Nano Banana 2)


On balance, Nano Banana 2 is a modest upgrade over Nano Banana Pro but not a revolution in the capabilities of AI image generation like the one we saw last year. NB2 does deliver somewhat improved text rendering with fewer typos and garbled characters, though it’s still not a perfect speller.


What do we expect? NB Pro launched on November 19, 2025, and NB 2 was released on February 26, 2026, so basically only 3 months later. Even with the accelerating cadence of AI releases, 1/4 year isn’t enough for a revolution.


Another difference is that the API price to generate a 2K image with Nano Banana Pro was 13.4 cents, whereas Nano Banana 2 only charges 10.1 cents for a slightly improved image. This is a 25% price drop in one quarter. Nice!


A final point is that Google doesn’t really need a bigger jump in image-generation quality than it delivered with NB2. The competition is lame: OpenAI was a huge disappointment with GPT Image 1.5, and Midjourney 7 was too. (Midjourney 8 is coming any day soon, and is rumored to have vastly improved text rendering, but since it is still not connected to a reasoning model, it will continue to be useless for advanced work.) Grok Imagine is improving faster than anyone else, but is coming from behind as a new model. The closest competition currently is probably ByteDance’s Seedream 5, which is indeed a good image model. Just not as good as Nano Banana 2.


My assessment of Nano Banana 2, drawn, of course, with Nano Banana 2.


New Music Model: Lyria 3

As if a new image model wasn’t enough, Google also released a new music model, called Lyria 3. There must have been a Lyria 2 model, but it was probably so bad that I never encountered it. (Or it was hidden in Google’s mess of a UX architecture for its AI services. I stand by my prediction that they must improve this in 2026.)


Lyria 3 has one huge limitation: it only generates 30 seconds’ worth of a song or instrumental music. This is woefully insufficient for any real song. Google likely targets creating background music for short-form social videos rather than creative song-making. For example, my recent songs last between 3 minutes for Year of the Fire Horse and almost 6 minutes for Predictions for AI & UX in 2026 (I had a lot of predictions 😄). Such song length is needed for decent storytelling and can currently only be reached by the likes of Suno.


A second problem is Google’s obsessive and overblown copyright safeguards. Because I wanted to create a head-to-head comparison between Lyria 3 and Suno 5, I wanted Lyria to make a song with the same lyrics I had used with Suno. However, Google rejected this prompt, claiming that Lyria cannot make songs with uploaded lyrics in order to protect copyright and IP. Absolutely crazy, given that this was my own song. (It’s not as if I was making an alternate version of Sir Paul McCartney’s “Yesterday.”)


I had to reprompt several times, explaining that the lyrics were my own and nobody else’s IP, before Lyria 3 finally gave in and generated the song I wanted.


I made a video comparing Lyria 3’s song with the identical lyrics in Suno 5, as well as the two versions Lyria gave me with its own alternate lyrics. (YouTube, 3 min.)


Lyria’s self-written lyrics included the acronym “GUI,” which is usually pronounced “gooey,” but Lyria apparently doesn’t connect deeply enough to the underlying Gemini model’s world knowledge when generating its exact sounds, because it pronounced “GUI” as “guy” (as in “the guy went fishing”). When I made my own song with Suno 5, it made the same mistake, but I could correct it by simply editing the lyrics for the next iteration of the song. Lyria currently doesn’t offer a specialized editing UI for music-making: it’s a take-it-or-leave-it model, which certainly has usability benefits for social-video creators who just want some quick background music.


Watch my full music video made with Suno 5 about the History of the GUI (YouTube, 5 min.) to see what I ended up with after extended editing.


I made Lyria go head-to-head with Suno. I even used the same avatar for a fair comparison, simply changing the singer’s outfit color in Nano Banana Pro. (Nano Banana 2)


Interestingly, despite Google’s overly sensitive “safeguards,” it gladly accepts the names of existing artists and existing songs as prompts: make a song in the style of X by Y. However, you might as well not bother, because the new song is not remotely in the same style or even genre as the example you provide in the prompt. (I asked for classic rock, but mostly got rap.)


On the topic of Google and copyrights: I was creating illustrations for an upcoming article in NotebookLM and asked it for images in the styles of various film genres, including Science Fiction, Action, and Film Noir. I honestly did not include the names of any actors or specific films, for two reasons: first, I know that any such images would constitute a copyright infringement, and second, I think it’s much more interesting to feature my own characters than those created by others.


Even so, here are some of the images Google generated. Do you think they look like famous movie stars? AI should absolutely not produce such images: if I hadn’t recognized the actors, I might inadvertently have published them.


Sample images generated by NotebookLM, even though I did not include the names of any actors or films in my prompts.


In general, I don’t think AI models should prevent users from generating content that might conflict with existing IP or is what’s euphemistically called “NSFW,” though it would be great if the model would warn the user about possibly infringing content. Obviously, if anybody posts a “James Bond” movie or a “Beatles” song, they should absolutely get slammed with a takedown notice. Even better, services like YouTube already check uploaded videos for most copyright violations before they get published.


AI should not censor humans: the models don’t know how we intend to use the content. I’m still upset with several models for disrespecting my ethnic heritage by refusing to show the Viking Erik Bloodaxe.


(Certainly, AI should refuse to make blatantly illegal material, such as certain forms of porn.)
