Compare AI to Average Humans, Not the Best Human

Jakob Nielsen
Nov 10, 2023

Summary: To maximize the benefits to humanity, decisions to deploy AI solutions must be based on comparing AI with the average performance of unaided humans, not an unrealistic comparison with the single best human or with unachievable perfection.

The only ethical approach to AI is full speed ahead for the benefit of all humanity. I’ll present my arguments for fast AI adoption later, but first, let’s consider the arguments against AI:

AI kills jobs and will cause massive unemployment.
AI hallucinates and gives false answers.
AI is inhuman: it may pretend to empathize with users, but it’s just a machine.
AI regurgitates what it has scraped from the Internet: it’s not truly creative.
AI will kill us all: it’s the next step in evolution, and lesser species (like humans) always end up extinct.

The media and politicians exhibit much wailing along these 5 lines (and more), but I advise you to stop and pat the horse before succumbing to AI panic:

“Stop and pat the horse.” See my article, UX Angst of 2023, for more about this healthy Danish attitude, which means calm down, relax, and take it easy. Don’t panic or overreact. (Horse by Midjourney.)

AI Panic is False Panic

Research shows that AI immensely enhances productivity, from 33% for elite management consultants to 126% for programmers. If we look at a 50% productivity gain for a specific job, 2 employees can now do the work that used to take 3 employees. The redundant employee? Out the door!
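The headcount arithmetic works like this. It assumes, for simplicity, that total output stays fixed; that assumption is mine, not a finding of the cited research. If each employee becomes 1 + g times as productive, the same work needs the original headcount divided by 1 + g. A minimal Python sketch with a hypothetical helper name:

```python
def employees_needed(original_headcount: float, productivity_gain: float) -> float:
    """Headcount that produces the same total output after a productivity gain.

    productivity_gain is a fraction: 0.5 means each employee now produces
    1.5 times as much as before. Assumes total output stays fixed.
    """
    if productivity_gain <= -1:
        raise ValueError("productivity_gain must be greater than -1")
    return original_headcount / (1 + productivity_gain)


print(employees_needed(3, 0.50))            # 2.0
print(round(employees_needed(3, 1.26), 2))  # 1.33
```

At the 126% gain reported for programmers, the work of 3 programmers would need only about 1.33 people. Whether output really stays fixed is exactly what the next paragraph questions.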

Well, maybe. Or maybe the company will add staff and produce much more than before, now that everything it does is cheaper and thus sells more. Yes, AI will make many jobs obsolete, but the net effect of AI will be more jobs, not mass unemployment. 10,000 years of experience tells us that technological advances do not cause unemployment because of the many new jobs created in fields we could never have imagined before.

Hallucinations? Yes, also true. That’s why I recommend using human-AI symbiosis, where humans check and edit AI output.

Inhuman AI? One more fear that’s true (by definition, in this case) but irrelevant. I wish we could have avoided the original sin of John McCarthy coining the term “artificial intelligence” in 1956. If he had used a phrase like “automated cognition,” we would have saved much grief. I have an entire section later about why it’s only good that AI is not human.

Not creative? This fear is manifestly false. Much research demonstrates that AI is more creative than most humans on standard creativity metrics. More creative than the most creative of the 8 billion humans alive today or the most creative geniuses of the past, from Shakespeare to Mozart? Maybe not, but being more creative than 99% of people is good enough for practical purposes. In any case, remember that I recommend combining AI creativity with human effort.

Killing us all? I can’t disprove this one, except for the empirical fact that it hasn’t happened yet and hasn’t even come close to happening. Even if AI becomes superior to humans, it doesn’t have the evolutionary drive to multiply its own kind and eradicate all competitors for resources.

The future vs. the traditionalists. If you think things were better in the good old days, you don’t follow the data. People did wear better hats, though. (Dall-E.)

AI Now, AI Fast

While the fear of AI is false, the promise of AI is real. I have already linked to copious scientific research proving substantial productivity gains from AI deployments in companies and to the creativity research showing that ideation is free (Jakob’s Fourth Law of AI) when we start using AI for our projects.

Increased productivity and creativity mean that we will all get much more prosperous as soon as AI gets more widely used. We’ll get better products and services. The point about getting richer is not just a money-grubbing argument, though most people will be happy enough about the salary increases that will soon follow from AI use. The real benefit comes from higher GDP on a national and worldwide basis.

Higher GDP is associated with extensive improvements to the human condition, from lifting billions of people out of extreme poverty to improved health outcomes (including a lower death toll from earthquakes of a given magnitude) and pollution reduction. The one exception, until recently, was carbon emissions: people didn't mind them because CO2 is not an unpleasant pollutant, so global warming did increase with GDP growth. Now people worry, and the richer they are, the more they can spend on reducing climate change.

AI has an additional benefit besides making us all richer: we can deploy AI services much more widely than traditional services ever could, no matter the wealth of a country. Jakob’s Third Law of AI states that AI scales, humans don’t. This means that as soon as AI can deliver a service, from mental health therapy to regular medicine to education, those services can be provided to billions of people. Live in a tiny rural town? No matter, you’ll get the same service as the city slickers. Live in a developing country? The same expertise can come your way as is available in Tokyo or New York.

I can now hear the naysayers complain: “Jakob wants to deny farmers and people in developing countries access to the best human experts and have them make do with AI-delivered healthcare and education.” I’m not keeping the world’s top experts away from some small African farming village. It’s simply that there are many more villages than top experts. But every village can get AI because AI scales and will be cheap.

Fast AI deployment will make the world a happier place. (Midjourney)

A final benefit of AI is that almost all studies show that AI narrows skill gaps among human users. Some people will always perform better than others, but the difference between the top performers and the rest is reduced by using AI. AI serves as a forklift for the mind and uplifts those humans who used to have poor performance. (Certainly, AI also helps the best performers; it simply helps the poor performers the most.)

AI-Driven Cars Save Lives

A simple example of why the AI-naysayers are wrong: self-driving cars. (More appropriately named AI-driven cars.)

Every time a self-driving car is in an accident, it receives comprehensive press coverage. Due to the availability heuristic, you may think that self-driving cars are dangerous and should be heavily regulated and restricted. (The availability heuristic is when people rely on examples that come to mind because they have been frequently reported. If something is easy to remember, people think it’s important or common, leading to biased or inaccurate conclusions.)

Does this scare you? A car with no human driver. The media says you should be scared. The data says not to worry: prefer driverless cars if you want to be safe. (Image by Midjourney.)

The statistics paint a different picture. Self-driving cars are already much safer than human-driven cars. And self-driving cars will only become better, according to Jakob’s First Law of AI: Today’s AI is the worst we'll ever have. (In contrast, humans don’t get better every year.)

A recent study by insurance giant Swiss Re summarizes the situation nicely. They compared insurance claims from human-driven cars with those from Waymo-driven cars (Google’s self-driving car project). After adjusting for factors such as where the car was being driven, the bodily injury claims were as follows:

Self-driving cars: 0.08 per million miles driven (across 39 million miles of data)
Human-driven cars: 1.09 per million miles driven (across 125 billion miles of data)
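The 93% figure is the relative reduction between the two claim rates. A quick Python check, using only the two rates listed above:

```python
# Bodily injury claims per million miles driven, as reported in the Swiss Re comparison
ai_rate = 0.08      # Waymo-driven cars
human_rate = 1.09   # human-driven cars

relative_reduction = 1 - ai_rate / human_rate
print(f"Relative reduction: {relative_reduction:.1%}")  # Relative reduction: 92.7%
```

Because both figures are already normalized per million miles, the very different mileage totals (39 million vs. 125 billion) do not enter this point estimate; they matter only for how much uncertainty surrounds each rate.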

The reduction in bodily injury from using AI to drive cars was 93%. This data covered all insurance claims for bodily injury caused by the cars in question (AI-driven or human-driven). It is not certain that the results will generalize to the subset of accidents that cause deaths. But let’s assume that they do, to get a rough idea of the lives we could save.

The number of people killed every year in auto accidents is:

United States: 42,939 (in 2021)
European Union: 19,917 (in 2021)
India: 168,491 (2022)
Worldwide: 1.35 million (WHO estimate)

Under that assumption, using AI to drive all cars would save about 93% of these lives, meaning that the annual number of people killed by delaying the fast deployment of AI is:

United States: 39,933
European Union: 18,523
India: 156,697
Worldwide: 1.26 million
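These estimates are simple multiplications. The sketch below applies the 93% reduction to each fatality figure, rounds to whole lives, and also reports the deaths that would remain. The key assumption, stated above, is that the reduction in injury claims carries over to fatal accidents; the helper name is hypothetical.

```python
# Annual road deaths as listed in the text
FATALITIES = {
    "United States (2021)": 42_939,
    "European Union (2021)": 19_917,
    "India (2022)": 168_491,
    "Worldwide (WHO estimate)": 1_350_000,
}

# Assumption carried over from the injury-claim data: AI driving
# prevents 93% of fatal accidents as well.
REDUCTION = 0.93


def lives_saved(deaths: int, reduction: float = REDUCTION) -> int:
    """Deaths avoided per year, rounded to whole lives."""
    return round(deaths * reduction)


for region, deaths in FATALITIES.items():
    saved = lives_saved(deaths)
    print(f"{region}: {saved:,} saved, {deaths - saved:,} still killed")
```

For the United States, the remaining 3,006 deaths are the number in the hypothetical headline below.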

Saving more than a million lives per year. Not small fry. Admittedly, it’ll take many years before the last human-driven car is off the roads in the poorest countries, no matter how much we speed up AI development in rich countries. But it’s certainly realistic that we could save almost 40,000 American lives annually reasonably soon if we put the pedal to the metal on AI development in the United States.

And yes, this means that I am perfectly willing to see the headline, “Self-driving cars killed 3,006 Americans last year.” We know that’s what the press will write, ignoring the 40K lives saved by my hoped-for change to AI-driven cars.

Full speed ahead on AI will save almost 40K lives per year in the United States and more than 1.25 million worldwide, just from fewer automobile accidents. Many more lives will be saved by better AI-driven healthcare. (Car by Playground.AI.)

Skip the Turing Test; Use the Nielsen Test

We already met the fallacy of comparing AI to the very best human. Even if the world’s best human for a certain task can beat AI, this doesn’t mean that AI use will be detrimental. Jakob’s Fifth Law of AI states: To be useful, AI doesn't need to surpass the best human; it only needs to outperform the human you have available, who will, on average, be an average human.

This implies that the two most common ways of thinking about AI are false:

Turing Test: checks whether AI is equivalent to a human
Perfection Test (my name): checks whether AI is perfect and never makes mistakes

(As a reminder, the Turing Test, proposed by English computer pioneer Alan Turing in 1950, is a method for determining whether a computer can be said to “think.” The test involves a human judge engaging in a natural language conversation with a human and a machine, without knowing which is which, and then deciding which of the two is human. If the judge can’t tell the difference between the human and the machine, then the machine is said to have passed the Turing Test. For all practical purposes, AI thinks like a human if its expressed thoughts are like a human’s. We can’t look inside the black box and see how those expressions came to be.)

We don’t need AI to be the equivalent of a human or to function the same way humans do. We already have 8 billion humans, so the benefit of adding AI is for it to do something new.

But even people who agree that the Turing Test is not the measure of AI usefulness often insist on the Perfection Test, as evidenced by their comments, which condemn AI every time it makes a mistake.

In the Nielsen Test, an AI performs a specific task. We then rate the outcome quality on an appropriate metric for the task. We also rate the performance of those humans who typically perform that task on the same metric. If the AI score exceeds the average human score, the AI has passed the Nielsen Test.
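Stated as code, the Nielsen Test is a single comparison against the mean of the human scores. This sketch assumes a higher score is better on the chosen metric; the function names and the scores are hypothetical illustrations, not real data. It also contrasts the test with the “best human” comparison that the Fifth Law argues against:

```python
from statistics import mean


def nielsen_test(ai_score: float, human_scores: list[float]) -> bool:
    """Pass if the AI beats the average of the humans who normally do the task.

    Assumes a higher score is better on the chosen metric.
    """
    if not human_scores:
        raise ValueError("need at least one human score")
    return ai_score > mean(human_scores)


def beats_best_human(ai_score: float, human_scores: list[float]) -> bool:
    """The stricter comparison the Fifth Law argues against."""
    return ai_score > max(human_scores)


# Hypothetical quality scores, not real data
humans = [3.1, 4.0, 4.4, 5.2, 6.8]
ai = 5.0
print(nielsen_test(ai, humans))      # True: the human average is about 4.7
print(beats_best_human(ai, humans))  # False: the best human scored 6.8
```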

(A possible modification is to use multiple metrics and either to insist that AI beat the human average on all metrics or to have some way of weighting the individual scores and combining them into a single score.)
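Both multi-metric variants can be sketched in a few lines. This assumes every metric is scored so that higher is better and that all metrics share a common scale (here, hypothetical 1-7 ratings); the metric names, weights, and scores are illustrative assumptions only:

```python
from statistics import mean


def human_averages(human_scores: dict[str, list[float]]) -> dict[str, float]:
    """Average human score on each metric."""
    return {metric: mean(scores) for metric, scores in human_scores.items()}


def passes_all_metrics(ai: dict[str, float], humans: dict[str, list[float]]) -> bool:
    """Strict variant: AI must beat the human average on every metric."""
    averages = human_averages(humans)
    return all(ai[metric] > avg for metric, avg in averages.items())


def passes_weighted(ai: dict[str, float], humans: dict[str, list[float]],
                    weights: dict[str, float]) -> bool:
    """Weighted variant: compare one combined score per side."""
    averages = human_averages(humans)
    total = sum(weights.values())
    ai_combined = sum(w * ai[m] for m, w in weights.items()) / total
    human_combined = sum(w * averages[m] for m, w in weights.items()) / total
    return ai_combined > human_combined


# Hypothetical 1-7 ratings; accuracy is weighted twice as heavily as clarity
humans = {"accuracy": [5, 6, 4, 5], "clarity": [4, 5, 5, 6]}
ai = {"accuracy": 6, "clarity": 4.5}
weights = {"accuracy": 2, "clarity": 1}
print(passes_all_metrics(ai, humans))        # False: clarity 4.5 < human average 5
print(passes_weighted(ai, humans, weights))  # True: 5.5 > 5.0
```

The two variants can disagree, as here: the choice of rule and weights is a judgment call about what matters for the task.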

You will note that my proposal is very similar to using metrics in quantitative user research. For example, we may measure time on task: how quickly can the job be done with and without AI?
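For time on task, lower is better, so the comparison flips. A hedged sketch, with hypothetical task times in minutes and hypothetical helper names, that reports both the share of time saved and the equivalent productivity gain (tasks completed per unit of time):

```python
from statistics import mean


def time_saved(minutes_without_ai: list[float], minutes_with_ai: list[float]) -> float:
    """Fraction of average task time saved by using AI (lower time is better)."""
    return 1 - mean(minutes_with_ai) / mean(minutes_without_ai)


def productivity_gain(minutes_without_ai: list[float], minutes_with_ai: list[float]) -> float:
    """Increase in tasks completed per unit of time."""
    return mean(minutes_without_ai) / mean(minutes_with_ai) - 1


# Hypothetical task times in minutes
without_ai = [30, 40, 50]
with_ai = [15, 20, 25]
print(f"{time_saved(without_ai, with_ai):.0%} less time")              # 50% less time
print(f"{productivity_gain(without_ai, with_ai):.0%} more throughput")  # 100% more throughput
```

The two numbers are not interchangeable: halving the time per task doubles the number of tasks done in the same time.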

We can also use rating scales. For example, the quality of work products is usually assessed by asking a team of independent judges to rate each sample on a 1-7 scale.
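With a panel of judges, each work sample first gets its own average rating, and the producer’s score is the average over its samples. A minimal sketch that assumes every judge rates every sample on the 1-7 scale; the helper names and ratings are hypothetical:

```python
from statistics import mean


def sample_rating(judge_ratings: list[int]) -> float:
    """Average rating one work sample received from the judging panel."""
    if any(not 1 <= r <= 7 for r in judge_ratings):
        raise ValueError("ratings must be on the 1-7 scale")
    return mean(judge_ratings)


def producer_score(samples: list[list[int]]) -> float:
    """Average of the per-sample ratings for one producer (AI or humans)."""
    return mean(sample_rating(s) for s in samples)


# Hypothetical ratings: each inner list holds three judges' scores for one sample
ai_samples = [[5, 6, 5], [6, 6, 7]]
human_samples = [[4, 5, 4], [6, 5, 6], [3, 4, 4]]
print(round(producer_score(ai_samples), 2))     # 5.83
print(round(producer_score(human_samples), 2))  # 4.56
```

The two producer scores can then be compared exactly as in the single-metric Nielsen Test.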

Similarly, suppose we want to assess whether AI is indeed an uncaring machine or can exhibit empathy. In that case, we ask the people it’s dealing with to rate the level of empathy they perceive. (Thus, empathy is in the eye of the recipient, whereas the entity that’s exhibiting empathy is treated as a black box. We measure the results, not how they’re generated.)

In the empathy example, we’ll employ the standard user research methodology of simply asking people to rate the empathy they’re receiving on a 1-7 scale. Were they treated with brutal unfeeling (score 1), or do they feel they were treated with utmost empathy (score 7)?

For example, patients are told some bad news after a medical test. Did the doctors come up with empathic ways of communicating with the patients entirely from their innate bedside manner, or did they benefit from an AI's coaching and/or script-writing contributions? We would know if we conducted research on this problem, but the patients would not. All that matters to the patients is how they feel treated by the medical system. That’s what the patients would score, and that’s what we would measure in a Nielsen Test. Does the use of AI make patients feel better or worse?

Similarly, for a test of actual treatment outcomes (as opposed to how patients feel), we could measure how frequently medical imaging is diagnosed correctly by an AI or by the average radiologist. Or we could measure the final outcome, in terms of patient survival, depending on whether AI contributed to the diagnosis and treatment plan.

In all these cases, my claim is that what matters is the average outcome, not the outliers. Exceptions are still worth analyzing to improve the performance of AI in subsequent cases.

Thus, I propose that we treat AI as a metaphorical slalom skier: Maximum speed down the mountain for all cases where the average use of AI improves outcomes for humanity relative to the average result from relying solely on those humans who are otherwise available to perform the task. (Remember that in many poor countries, the “available humans” may be nobody, especially in remote areas.) But we don’t just hurtle downhill in a straight line. We exhibit agility and change direction as warranted by the data.