Summary: Learn user testing through in-person sessions | Dialectic writing with AI | Video of Tesla’s AI training cluster | Runway shows experimental user interfaces for video generation | AI taking the lead to guide users | Saudi Arabia has natural advantages for AI | Semantic AI video editing with motion brush in Kling
UX Roundup for September 23, 2024. Happy Northern Hemisphere autumn (Ideogram).
Learn User Testing by Running In-Person Sessions
David Hamill has a well-argued post recommending that people who are new to user research should conduct their first usability studies as in-person sessions instead of using one of the many excellent remote tools, such as UserTesting.com (which has incorporated the tools from my old favorite, UserZoom, since the two merged last year).
Once you are skilled at running user studies, you pick the best tool for the job. Depending on the circumstances, this will often be a remote study: cheaper, easier to set up, often easier recruiting, and international coverage without jetlag. Some studies are better done in person, especially exploratory research where you want to dig deep and are more interested in users’ behavior patterns than in debugging the design of specific features.
But things are different while you’re still learning how to run a study. David Hamill recommends running your first sessions in person, and I agree. He also points out that if you’ve never run an in-person study, you don’t know what you’re missing in remote sessions, which means you can’t compensate for problems such as the tendency for remote studies to attract more technically capable participants. (People don’t sign up to be remote test participants if they have trouble using computers.)
For learning usability testing, David Hamill recommends David Travis’ online Usability Testing Boot Camp on Udemy. I learned how to do user testing 41 years ago, so I have not taken this course myself, but I know David Travis and he’s solid. Even better, his course is laughably cheap at $20 for 8.5 hours of instruction. (You may worry that the course was recorded in 2021, but usability testing methodology has remained the same for at least 30 years, with the one exception that remote testing used to be difficult but became easy about 10 years ago.)
Remote user testing vs. in-person user testing. Both have their place, but if you are just learning how to conduct user research, you should run your first study as in-person sessions. (Ideogram)
Dialectic Writing with AI
There’s a new academic journal called Journal of Writing and Artificial Intelligence, focusing on how to use AI in the writing process. An interesting paper by Robert Deacon in the first issue is “Developing Thinking through LLM-Assisted Writing: Hegelian Synthesis and Critical Thinking” (42-page PDF).
Academics will be academics, and humanities professors have an especially nasty habit of complicating their language — paradoxically even when they teach writing. However, this paper has an exciting core idea that’s well worth exploring outside academic circles. The author points out that writing and thinking are linked and that both can be either linear or iterative. The latter is called dialectic writing and is associated with Hegel’s overly complicated philosophy. Progressing through thesis, antithesis, and synthesis provides deeper insight into an issue than simply writing and publishing your initial thoughts.
AI can hold up a mirror to writing students, helping them see alternatives such as counterarguments and varied structuring principles for their essays. (Ideogram)
All very well for a professor to say, but it’s hard for most people to challenge their own thinking systematically. This is where AI can shine and become an intellectual sparring partner.
The paper includes sample prompts for taking writing students through different essay structures.
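The core loop is easy to make concrete. Below is a minimal sketch of how a thesis-antithesis-synthesis exchange could be scripted against an LLM: it is my own illustration, not taken from Deacon’s paper, and it assumes the OpenAI Python SDK, with the model name and prompts as placeholders.

```python
# Hypothetical sketch of a Hegelian writing loop with an LLM as sparring partner.
# Assumes the OpenAI Python SDK; model name and prompts are illustrative only.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def ask(prompt: str) -> str:
    """Send one prompt to the model and return the text of its reply."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


thesis = "Remote usability testing should replace in-person testing entirely."

# Antithesis: ask the AI to argue against the writer's initial position.
antithesis = ask(
    f"Here is my thesis: {thesis}\n"
    "Act as a critical sparring partner: give the strongest counterarguments."
)

# Synthesis: ask the AI to reconcile both positions into a more nuanced claim.
synthesis = ask(
    f"Thesis: {thesis}\nAntithesis: {antithesis}\n"
    "Write a short synthesis that integrates the strongest points of both."
)

print(synthesis)
```

The specific prompts matter less than the structure: the writer’s initial position is deliberately confronted with a machine-generated counterposition before the final text gets written.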
The writing process is strongly intertwined with the mechanisms we use to think through a problem. AI can help with this process. (Midjourney)
Video of Tesla’s AI Training Cluster
Short video walkthrough of the AI training supercluster that Tesla built recently. (Tesla is a major electric car maker.) Just for fun, but amazing to see this large room crammed full of NVIDIA chips, organized so neatly. It is unclear whether this AI training cluster is separate from the one built by Tesla’s sister company xAI to train the next generation of Grok, which is a large language model (whereas Tesla presumably needs to train on driving data to power its autonomous cars).
This reminds me of the scene in the science-fiction movie “2001” where the astronaut enters the processing core of the HAL intelligent computer. One of the commenters on the video posted an alternate take with a fun historical photo: this AI supercluster taking up a large room can also remind us of the way mainframe computers in the 1960s filled an entire room. History repeats at some level, even in a field that changes as much as computing.
From room-sized mainframe computers to hand-held computers to room-sized AI training clusters. History rhymes. (Leonardo)
Runway Shows Experimental User Interfaces for Video Generation
I have long been impressed with the quality of the AI videos produced by Runway. (Here’s a music video I made with Runway Gen-3 for my song about dark design.)
At the same time, new video models are released constantly. Here’s another music video I made with Hotshot. Hotshot is a new service that’s notable for having been built by a super-efficient team of only 4 people in 13 months. I don’t think it’s as good as Runway, but given the team’s capabilities, I would not be surprised if future releases of Hotshot show major improvements.
New AI video tool Hotshot was built by a super-powered team of only 4 people. (Leonardo)
In fact, new AI video models are now releasing at a pace where I can’t keep up with trying them all. For example, the Chinese service MiniMax was released this month to rave reviews from many creators. (OK, I couldn’t help myself: inspired by the previous news item, I made a 5-second MiniMax video with the prompt “walking through an AI training cluster.” I’m pretty impressed with the video quality, especially considering that MiniMax is free, at least for now. I don’t know why so many of these AI video generators create slow-motion videos by default; if I needed this video for real, I would have to reprompt.) Reportedly, MiniMax was built by a team of only 6 people. While this is 50% more than Hotshot’s team, it’s still impressive, indicating how small, hyper-accelerated, AI-supported teams can beat large legacy companies.
Just like 2023 was the year of AI text, 2024 is proving to be the year of AI video. (Presumably, 2025 will be the year of AI text again, once the next-generation large-language models ship.) There’s a real race between many AI video models to improve video generation. (Midjourney)
When a handful of people can build the base functionality, and this is being done repeatedly around the world, competition is clearly heating up in the AI video space. For video (and many other AI services), the differentiating factor is no longer the AI itself, but rather the user interface that lets users accomplish their tasks with that AI.
Good UI = good results from using AI.
Bad UI = disappointing results from AI.
This last point explains why some people are negative about AI’s potential: they have only experienced a line-mode chat interface to AI. ChatGPT was the first good AI product, and it’s still the most widely used, which is unfortunate because chat is a limiting interaction style for many tasks. Hybrid UI is the way forward for AI.
Runway has released a whitepaper about alternate user interfaces they are working on for video generation. These design explorations aim to strengthen the following UX qualities:
Wonder & Discovery: support “generative daydreaming,” with an example UI (screenshot below) showing alternative imagery, color palettes, people who might appear in the video, and mood words — all based on ways to enrich the user’s prompt.
Control: granular controls to fine-tune a video, derived from an analysis of the design space suggested by a prompt. An example shows a user prompting for “a film about a crab,” which results in buttons naming different crab characteristics, from size to hairiness.
Feedback: showing users the video created from their prompt and from operations like those in the first two bullets, and then allowing users to select objects within that video for immediate modification.
Screenshot of one of the sample UI directions discussed in Runway’s whitepaper.
AI As Expert Tutor
A user named “Rachel V” posted an interesting observation about the new OpenAI o1 model, which relies on reasoning and inference-time compute to provide deeper answers to complex questions. She says that o1 may be more like a mentor who takes the lead and interprets what their human is asking, instead of being a mere copilot or assistant.
New AI with improved reasoning skills can interpret what users are asking and provide improved help with complex tasks. Like a good professor would do. (Midjourney)
Whether or not this is true for the current product, I think it’ll be true for future AI models once they achieve ASI (artificial super-intelligence, or better than all — or almost all — humans). We should think of three different models for human-AI collaboration:
Humans take the lead, and AI is a mere assistant. This is often the most appropriate model right now, as most AI performance is at the level of a smart high school student or the proverbial eager but untrained intern.
Humans and AI are copilots sharing the initiative. Despite the product name “Copilot” for software-development AI, we’re probably not there yet, though some programming tasks may approximate the pair programming model where two partners trade back and forth.
AI takes the lead, serving as an expert tutor to guide humans in the proper direction. This is already the case in situations where humans want to learn about unfamiliar topics: AI will give you a lesson plan and take you through the steps. AI taking this role in a domain where the user is already an expert will likely not happen for another 5 years.
When designing AI products, think about all 3 models. Each likely requires a different design.
Does the user direct the AI, or does the AI direct the user? Both models (and the intermediate copilot model) will have a role in future human-AI synergistic collaboration. A single task will likely swing between these extreme modes, as work progresses toward a solution. (Ideogram)
Saudi Arabia’s Natural Advantages for AI
A recent announcement that Groq is building a huge AI compute cluster in Saudi Arabia included some insights that I had never considered.
Groq with a Q should not be confused with Grok with a K. Grok is an AI model developed by xAI. It’s software and has an AI training cluster of its own in Texas, which I mentioned earlier in this newsletter. In contrast, Groq is a hardware company that develops AI-optimized chips that are reportedly more than 10 times as efficient as NVIDIA’s Hopper chip for AI inference, measured by the number of tokens per second the chip can generate when using the same AI model to answer user questions.
I tried to make Grok draw the difference between Groq and itself. Not too pleased with the result, but this is what I got to show that Groq is hardware whereas Grok is software. (Grok)
Why build one of the world's largest AI inference clusters in Saudi Arabia?
The Bloomberg article linked above lists 3 main Saudi advantages:
Low energy costs.
Plenty of empty land. (That famous desert.)
4 billion people within a 100-millisecond ping, allowing the compute cluster to deliver fast response times to this immense user base.
I had never thought of the population within a certain ping radius as bestowing geographical advantage on a country. (That goes to prove that I’m a usability guy, not a data center engineer.)
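As a rough sanity check of that 100-millisecond figure (my own back-of-envelope arithmetic, not from the Bloomberg article): a 100 ms round trip leaves 50 ms each way, and light in optical fiber travels at roughly 200,000 km/s, so the theoretical reach is about 10,000 km; real-world routing and switching shrink that considerably. A quick sketch:

```python
# Back-of-envelope estimate of the geographic radius reachable within a 100 ms ping.
# Figures are approximate; real routes add switching delay and are never straight lines.
SPEED_IN_FIBER_KM_PER_S = 200_000  # roughly 2/3 of the speed of light in vacuum
ROUND_TRIP_S = 0.100               # a 100-millisecond ping
one_way_s = ROUND_TRIP_S / 2

theoretical_radius_km = SPEED_IN_FIBER_KM_PER_S * one_way_s
practical_radius_km = theoretical_radius_km * 0.5  # assumed penalty for routing and switching

print(f"Theoretical radius: {theoretical_radius_km:,.0f} km")  # 10,000 km
print(f"Practical radius:   {practical_radius_km:,.0f} km")    # about 5,000 km
```

Even the smaller figure (with my assumed 50% penalty) reaches much of Europe, Africa, the Middle East, and South Asia from a Saudi data center, which is how the population count gets so large.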
The Saudi energy advantage will sadly be oil for now, which is unfortunate for global warming. However, since Saudi Arabia is one of the few countries that can build infrastructure when it decides to go for it, I expect that some of that empty land will be turned into homes for anywhere between ten and a hundred emission-free nuclear power plants over the next decade.
In contrast, the United States and Europe (with the possible exception of France) are incapable of building infrastructure. They are unlikely to be able to finish the required large number of nuclear power plants within 10 years, even if they start now.
Saudi Arabia is one of the few countries capable of building the necessary infrastructure fast enough to deliver ASI (artificial super-intelligence) in less than a decade. (Joined by a handful of other countries such as the United Arab Emirates, China, South Korea, and hopefully France.) (Midjourney.)
Semantic AI Video Editing with Motion Brush in Kling
The Chinese AI video model Kling has launched new features that are taking creators by storm. Tim (he never gives his last name!) from Theoretically Media has an excellent overview video explaining the changes and showing great examples of newly created AI videos (YouTube, 14-min. video; what can I say, it does no good to link to an article when discussing video creation).
One change is simply that videos are now 1080p instead of 720p. Still not 4K, which we’ll need for AI creators to replace the legacy movie studios, but there’s a distinct quality difference in the output. Improved video resolution is a good example of how more powerful AI creates better-quality products.
More conceptually interesting is the new motion brush tool. This allows the user to point to an object in a still image and draw the path along which they want it to move. For example, select a soldier next to a WWI trench and ask the model to make a video showing him either jumping across the trench or walking alongside it. (Or crouching down and crawling next to it: you can combine two directions with this tool, though this seems less reliable for now.)
This motion brush feature is an example of the semantic-level editing I discussed in last week’s newsletter.
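To make “semantic-level” concrete, here is a purely conceptual sketch of the kind of information a motion-brush edit captures: a selected object plus the trajectory the user drew, rather than pixel-level changes. The types, field names, and coordinates are invented for illustration and have nothing to do with Kling’s actual API.

```python
# Conceptual illustration only: the data a motion-brush edit boils down to.
# Not Kling's API; names, structure, and coordinates are invented for explanation.
from dataclasses import dataclass, field


@dataclass
class MotionBrushStroke:
    """One object selected in a still frame, plus the path it should follow."""
    object_label: str                    # what the user selected, e.g. "soldier"
    mask_polygon: list[tuple[int, int]]  # rough outline of the selection, in pixels
    path: list[tuple[int, int]]          # the trajectory the user drew


@dataclass
class VideoEditRequest:
    """A still image plus one or more semantic motion instructions."""
    source_image: str
    strokes: list[MotionBrushStroke] = field(default_factory=list)


# The soldier example from the text: jump across the trench...
jump = MotionBrushStroke(
    object_label="soldier",
    mask_polygon=[(410, 220), (470, 220), (470, 380), (410, 380)],
    path=[(440, 300), (520, 250), (600, 300)],
)
# ...or, as a second combined instruction, crawl alongside it.
crawl = MotionBrushStroke(
    object_label="soldier",
    mask_polygon=[(410, 220), (470, 220), (470, 380), (410, 380)],
    path=[(440, 300), (440, 360), (440, 420)],
)

request = VideoEditRequest(source_image="trench_still.png", strokes=[jump, crawl])
```

The point of the sketch is that the user edits objects and trajectories, not pixels: that is what makes the interaction semantic rather than mechanical.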
AI movie-making is scaling to bigger robot cameramen (bigger AI models) for higher-resolution video. (Ideogram)