Summary: A study of law students completing assignments shows that AI use helped weaker students the most, narrowing the skill gap | Apple Vision Pro early UI analysis | See Jakob live at two events this week, both streamed on the Internet | Google’s ImageFX image-generation tool offers a rare hybrid UI for AI | Highly specialized new UX jobs at Tesla | Google releases Gemini Advanced | Behind the scenes of a TV shoot | OpenAI hypergrowth | Goody-2 is an over-responsible AI
UX Roundup for February 12, 2024: Happy Year of the Dragon (Leonardo). See also: “The User Experience of Chinese New Year” by Vicky Pirker.
AI Narrows Skill Gaps
I have said this many times before, but here’s a new set of data to support the conclusion that AI narrows skill gaps. This time from a study of law students completing realistic legal tasks either with or without the assistance of GPT-4.
(Yes, the test participants were students and not practicing lawyers, which is a weakness if we want to estimate the productivity gains that a real law office would realize from employing AI tools. For my current purpose, the study is fine.)
In this research study, 59 law students from the University of Minnesota Law School completed “four basic lawyering tasks, representing a range of common tasks for entry-level lawyers.” For example, one task was to draft “a complaint for a fictional client to be filed in federal court on the basis of Section 1983, intentional interference with a business relationship, and malicious prosecution.”
The study participants were randomly divided into two groups. Each group did half of the assignments with the help of GPT-4 and the other half of the assignments without AI assistance. (I.e., the old-school way.)
The assignments were graded as normal, by an instructor who did not know which solutions had been produced with AI assistance and which without.
On a grading scale from 0 to 4, students without AI scored an average of 3.07, whereas students who used AI scored 3.17. This difference of 0.1 grade points is not huge, but it does indicate that the quality of the students’ legal work improved with AI, consistent with many other studies of AI use in knowledge work.
What’s more interesting is to look at the distribution of scores, as shown in the following figure of grades from one of the assignments. It’s clear that the mean grade improved because AI use moved a large number of mediocre students (grades from 2.0 to 2.8) into the good, but not great, range (grades from 3.0 to 3.8).
There was a small increase in students with great performance (grades from 3.8 to 4.0), but the biggest gains came at the lower end.
Distribution of grades awarded to law students who did not use AI (blue) and to students who received help from GPT-4 (orange) in completing the assignment. Source: Jonathan H. Choi, Amy Monahan, and Daniel Schwarcz (2024): “Lawyering in the Age of Artificial Intelligence” (version revised 22 Jan 2024). SSRN: https://ssrn.com/abstract=4626276.
It’s in the nature of grading on a 0–4 scale that there is a mathematically enforced ceiling effect: no student can score above 4.0, no matter how brilliant his or her solution. The very best students (who would already have scored 4.0 without AI assistance) thus likely improved more than the figure above indicates; their gains are simply invisible in the grades.
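To see why the ceiling hides improvement at the top, here is a minimal simulation sketch in Python. The effect sizes are invented purely for illustration (they are not the study’s numbers); the only point is that censoring grades at 4.0 makes most of the top students’ true gains invisible:

```python
import random

random.seed(1)

def observed(true_skill):
    # Grades are censored at 4.0: brilliance beyond the ceiling is invisible.
    return min(true_skill, 4.0)

def mean(xs):
    return sum(xs) / len(xs)

# Invented effect sizes, purely for illustration (not the study's data):
# assume mediocre students gain 0.8 "true" grade points from AI, great ones 0.3.
groups = {"mediocre": (2.4, 0.8), "great": (3.9, 0.3)}

for label, (base, boost) in groups.items():
    skills = [random.gauss(base, 0.2) for _ in range(10_000)]
    without_ai = mean([observed(s) for s in skills])
    with_ai = mean([observed(s + boost) for s in skills])
    print(f"{label}: observed gain {with_ai - without_ai:.2f}, true gain {boost}")
```

In this toy model, the great students’ observed gain is only a fraction of their true gain, while the mediocre students’ gain shows up in full.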
Bottom line, though: AI narrowed the skill gap between mediocre and great law students and compressed the grade curve to be less spread out and more centered around the mean. The mean increased, but not by nearly as much as the improvement realized by the mediocre students.
The time needed to complete all four assignments was 8 hours and 32 minutes without AI, and 6 hours and 55 minutes with AI assistance. This means that AI improved the productivity of junior legal work by 23% on average, measured by the number of tasks a junior lawyer can complete in a workday.
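(To check the arithmetic: 8 hours 32 minutes is 512 minutes and 6 hours 55 minutes is 415 minutes. The time saved per task is 97/512, or about 19%, but productivity is tasks per unit of time, so the gain is 512/415 ≈ 1.23, meaning 23% more tasks completed in the same workday.)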
Having junior lawyers draft documents with the help of AI narrows the skill gap between top and bottom performers. AI also slightly improves the quality of the resulting legal documents and raises lawyer productivity by 23%. (Midjourney)
Apple Vision Pro Early UI Analysis
Luke Wroblewski posted his first take on the user interface issues with the new Apple Vision Pro augmented-reality headset. He thinks that the fully immersive (virtual reality) mode is “really well done” and may provide a platform for compelling experiences. However, not much good content is available yet.
In contrast, the augmented-reality UI that projects on top of a view of the user’s actual environment is richly populated but not very compelling. Wroblewski speculates that retaining the traditional WIMP model (windows, icons, menus, and a pointing device) as the basis for the user interface is holding back the design of experiences that more fully deliver on the promise of spatial computing.
An AR/VR headset may work better with a different user interface style than the traditional GUI designs that are optimized for controlling desktop and mobile computers. (Midjourney)
Jakob Live This Week
I am doing two live appearances this week. Both are broadcast over the Internet and are free, but advance registration is required:
Wednesday, February 14, 10am US Pacific time (8pm Athens time): UX Greece Valentine’s Day Special. Link to convert live event time into your time zone. (Note: you can attend even if you’re not Greek. The event will be in English.)
Thursday, February 15, 11am US Pacific time: UX Ignite, hosted by UX Reactor. Also speaking at this event is Kate Moran, the most talented user researcher I have met in my 41-year career, so it should be an interesting debate. Link to convert live event time into your time zone.
Jakob will be live on UX Greece and UX Ignite this week. (Midjourney)
Google ImageFX Offers Hybrid UI for Image Generation
I gave Google’s new ImageFX service the task of making an image for this week’s event announcement. Here’s the result:
Google ImageFX takes on the challenge of doing better than the images I made with Midjourney above.
I must say that I prefer the artwork I made with Midjourney, especially for the bonfire. That said, ImageFX has a much nicer user interface than Midjourney:
Screenshot of Google ImageFX while I was creating one of the images shown above.
It’s easier to edit the prompt in a persistent text field than in the copy-paste workflow necessitated by Midjourney’s UI on Discord. (Midjourney is working on a proper GUI, which is currently in closed beta. From the demos I have seen, it will be a huge usability improvement once released to the general user population.)
Where ImageFX shines is in facilitating iterative experimentation, which is the crux of creative authorship with AI tools: since AI does the actual drawing, the user’s creative contribution rests on envisioning what will be best for his or her purpose.
The popup menus provide suggestions for alternative prompts that can be explored with a click. Below the prompt box are suggestions for additional modifications, many of which trigger their own popup menus of further alternatives if added to the prompt.
Examples of using ImageFX’s popup menus to explore alternative prompts.
This design does much to reduce the AI articulation barrier, because it frees users from having to think up their own keywords to modify the image. Presenting people with a number of alternatives follows usability heuristic 6, recognition rather than recall.
This blend of prompting, editable text boxes, popup menus, and clickable alternatives is one of the few examples of the hybrid user interface to AI that I have been calling for since early 2023.
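To make the pattern concrete, here is a minimal sketch of the interaction logic in Python. Everything in it is invented for illustration: the prompt, the suggestion categories, and the function names are mine, not ImageFX’s actual implementation or vocabulary:

```python
# Hybrid prompting sketch: a freely editable prompt plus clickable suggestion
# "chips" that rewrite parts of it. All prompts, tokens, and suggestions below
# are invented for illustration; this is not ImageFX's actual code.

prompt = "a dragon celebrating new year, watercolor, at night"

# Pre-generated alternatives for each editable token. Presenting these supports
# recognition rather than recall: users pick instead of inventing keywords.
suggestions = {
    "watercolor": ["oil painting", "pixel art", "photorealistic"],
    "at night": ["at dawn", "under fireworks", "in the rain"],
}

def apply_chip(prompt: str, token: str, choice: str) -> str:
    """Replace one token of the prompt with the alternative the user clicked."""
    return prompt.replace(token, choice)

# Simulate the user clicking "under fireworks" in the popup for "at night":
prompt = apply_chip(prompt, "at night", "under fireworks")
print(prompt)  # -> a dragon celebrating new year, watercolor, under fireworks
```

The key design point is that the clickable alternatives never take the text field away: the user can always fall back to free-form editing when the suggestions don’t fit.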
Once Google improves the image quality and adds a few features like wide and tall aspect ratios, ImageFX may well become the preferred image generation tool. Midjourney needs to improve its usability pronto!
Hyperspecialized UX
Tesla is a company that manufactures high-end electric cars. These cars are really computers on wheels, and if you are ever lucky enough to get a Tesla factory tour, you’ll see that their operations are also intensely digital, to a degree far beyond legacy automakers.
Tesla currently has openings for 8 UX jobs, and they are also hiring UX interns. (One piece of advice for any UX student reading this: get an internship, at Tesla or elsewhere. It’s the most important step you can take to advance your career prospects after graduation.)
Eight new UX jobs are a good indication of the importance of user experience for advanced cars. This chunky hiring round is also a shot of optimism for anybody who is still glum about the downturn in UX last year and the continuing layoffs at legacy tech companies. Any company that’s more than 20 years old probably needs to lay off half of its employees because legacy staffing is not attuned to the new AI-driven world. But many more new jobs will be created in new companies or at those legacy companies that succeed in pivoting. So the net result will be positive for jobs.
What’s more important than the size of Tesla’s current UX hiring round is a close reading of the job titles, which include:
UX of environment, health, safety, and security tools
Employee and customer identity verification and authentication process
Discovery (OK, that’s an oldie, though most companies don’t do it)
Design systems (another oldie for any company with a major UX effort)
Global sales, ordering, delivery and checkout (yes, yes, an oldie specialization as well, but one that’s worth billions in sales)
Employee experience (glad that they’re not giving usability for their own staff short shrift; too many companies only care about the design of customer-facing UX)
Supply chain processes
The UX of the supply chain is my favorite, because the supply chain is something many people would view as the ultimate backend process. Getting items to the right place at the right time is crucial for business operations, but you might not have thought of the UX implications.
The supply chain needs user experience attention. (Midjourney)
What’s more important than the individual specializations is the fact that there is demand for such narrowly targeted UX roles at all. UX is becoming pervasive, inching its way into all aspects of doing business.
I recently predicted substantial growth in jobs for UX Unicorns — people who are capable of executing on the full range of UX methods and concerns. I noted that since AI narrows skill gaps (see this week’s lead story), UX professionals who make extensive use of AI will be capable of good work across a range of skills and won’t need to be as specialized as the UX people of the past. I stand by this prediction, especially for UX jobs in small and medium enterprises, which will likely be the majority of UX jobs in the future. (In the US, SMEs accounted for 63% of net new jobs created from 1995 to 2021.)
But in big businesses, specialization will remain, and this latest datapoint indicates that we may see the emergence of hyperspecialized UX professionals to parallel the unicorns. Do we need a cute label for hyperspecialization? Maybe narwhals, since they have a long tusk that beats a unicorn’s horn in pointiness any day, and since they can dive deeper than any unicorn.
Maybe we should call hyperspecialized UX professionals narwhals. They can dive deeper than a unicorn. (Midjourney)
Google Releases “Gemini Advanced” AI
Google released its new “Gemini Advanced” AI product. Ethan Mollick has a good review. Here is Perplexity AI’s summary of the available reviews:
Multimodal Abilities: Gemini Advanced has been noted for its native multimodal capabilities, which allow it to create and interpret images, as well as integrate search functionalities.
User Experience: Some users have expressed frustrations with Gemini Advanced's tendency to make elaborate plans that it cannot always execute.
Benchmarks: Gemini Ultra, the model behind Gemini Advanced, scored 94.4% on the GSM8K benchmark, slightly higher than GPT-4’s 92% in a 5-shot chain-of-thought (CoT) setting.
(I’m not impressed with Perplexity’s UX analysis, though of course it can only work with the material it’s given from the underlying reviews, which don’t tend to be written by UX experts.)
The main conclusion seems to be that Gemini Advanced is about as good as GPT-4, but not better. Since GPT-4 was released in March 2023, it’s not particularly impressive to equal, but not surpass, it 11 months later. We’re all awaiting OpenAI’s next release with bated breath. When will we get GPT-5, and will it be a substantial advance over GPT-4, or just show small incremental improvements?
Right now, GPT-4 and Gemini Advanced are racing toward a photo finish, where it’s not clear who’s ahead. However, it is likely that OpenAI will regain the lead later this year. (Midjourney)
In the review I linked above, Mollick speculates on why Google didn’t do better. His main suggestion, which I find credible, is that Google was racing as hard as it could to equal GPT-4, because it would be embarrassing to release something that was obviously worse. But they also needed to ship something ASAP because they were falling so far behind. Thus, as soon as Google pulled even with GPT-4, they pressed the release button. They may or may not be able to do better, but they didn’t want to delay the release to find out.
TV Shoot Behind the Scenes
CBS came to my house to interview me for an upcoming show. This is how TV interviews really look, which you will never know from watching the clip on the air:
Real photo: my living room after the network crew moved out all my furniture and replaced it with their gear.
OpenAI Growth
Sam Altman, the head of OpenAI (the company behind ChatGPT), posted on X:
Not much to add, except that 100 billion words per day is incredible for a system this young. My newsletter this week contains 2,300 words, meaning that it corresponds to the text GPT generates every 2 milliseconds.
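(Checking that claim: 100 billion words per day is 100,000,000,000 ÷ 86,400 ≈ 1.16 million words per second, so 2,300 words take about 2,300 ÷ 1,160,000 ≈ 0.002 seconds, which is indeed 2 milliseconds.)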
AI's growth is a rocket ride (Midjourney).
Goody-2: An Over-Responsible AI
One of the more fun spoofs I have seen lately: Goody-2 is an AI that’s so responsible that even the most zealous doomer will like it. Here’s an example:
Midjourney’s interpretation of “ROTFL.” I guess it’s not as responsible as Goody-2.