
When Humans Add Negative Value: AI Alone vs. Human–AI Synergy

  • Writer: Jakob Nielsen
Summary: The prevailing “human in the loop” paradigm for AI use must die or be drastically reduced. In many cases, human involvement already degrades AI performance, especially in analytical tasks. Creative fields still benefit from the positive synergistic effects of human-AI collaboration, but as AI improves, the scope for human involvement will likely shrink. Humans must move up the food chain to exhibit agency and strategic management, while letting superior AI systems operate without human interference to avoid introducing bias and error.

 

(Seedream 4)

Are you making your AI dumber? Is it often better for humans to get out of the way and let AI do its job without human interference? Stop trying to “help” AI. For a growing number of tasks, your input makes things worse. 


The standard advice for using AI for complex tasks was to keep a human in the loop, rather than letting AI do the job on its own without human interference. This is usually good advice, given the limitations of current-generation AI: there is often a synergy effect to be had by combining human talent and AI since the two forms of intelligence have different strengths.


My most recent example is the ability of Google’s NotebookLM model to one-shot a video without any human involvement, other than asking for the video to be made. (It still requires human agency to decide which videos to make.) No manuscript writing, no B-roll production, certainly no human actors or camera crew, and no editing required. One click and a final video appears (after a long wait). However, right now, I still feel that the videos I make with AI-human symbiosis are better than the AI-only videos on the same topic.

Two years from now, this may not be true. I am admittedly not a great filmmaker, so it may soon be possible for an AI trained on billions of hours of YouTube or TikTok videos to do better than I. If I were to meddle in the video production, the result might well be worse than what AI will make on its own. (If measured, say, by the number of viewing hours each video accumulated on the various platforms.)


There’s the rub: when does it add negative value to keep a human in the loop? When do we get optimal results from leaving the AI alone to do its job without human interference, because humans will only make things worse?


It is crucial here to distinguish between human agency and human judgment. Agency is the decision to initiate a task: for example, deciding which video to make. Judgment is the evaluation of quality or the selection of methods during the task: deciding how to make the video. The evidence suggests that the value of human judgment is rapidly declining as AI improves, while the value of human agency remains, for now, paramount.


Even when AI-only videos get to be better than AI+Jakob videos, they might still not be better than AI+Steven Spielberg movies. (Spielberg may be able to create better movies by dismissing all human actors and film crews and realizing his vision purely with AI. For example, he would be able to iterate through several completely different versions of a two-hour movie every day, likely resulting in a new masterwork every week that’s better than anything he could have made with humans, whose slower production process strictly limits creative exploration of the latent design space.)


This illustrates a critical dynamic: the “Trough of Mediocrity.” As AI improves, it surpasses the ability of the average human. In this phase, most people (those who are competent but not elite) will degrade the AI’s performance if they interfere. Yet, exceptional humans at the far right of the skill distribution (the Spielbergs of the world) may still add value. The challenge lies in recognizing that for many tasks, the vast majority of the workforce falls into the trough where their involvement is now detrimental, even if they were previously considered skilled professionals.


However, even in the case of Steven Spielberg, the quality of the resulting films may have an upper bound imposed by his limited human creativity. AI might surpass this in ten or twenty years, and it’s possible that the pure-AI movie would be better than what would happen if Spielberg were to interfere with the AI’s creativity.


Chess: Short-Lived Synergy Advantage

For more contained domains, we long ago passed the point where humans should stay out of the way. Chess is a famous example. For many years, no chess-playing computer could beat a human Grandmaster. Then, on May 11, 1997, IBM’s Deep Blue chess computer beat the reigning human world champion, Garry Kasparov. (Kasparov had won the previous match, in February 1996, in humanity’s last hurrah. A little more than a year’s worth of AI progress sufficed to make AI the world’s best chess player.)


Interestingly, for several years after that AI victory, the very best chess games were not played by AI alone. Rather, the combination of a human Grandmaster and a chess computer proved to be the superior player. At the time, this combo-player was referred to as a “centaur,” after the creatures of Greek mythology. However, just as the Lapiths eventually defeated the centaurs during the famous battle known as the Centauromachy, the chess centaurs didn’t live long.



A centaur is half human and half horse, making it a good fighter, and also a suitable metaphor for the synergy effect that sometimes arises from having an AI and a human work together on a problem. (As an aside, in the early days of generative AI, image models could not draw a centaur, but inevitably rendered a human riding a horse, as opposed to a creature that’s partly human and partly horse. This, of course, was due to the training data containing copious images of humans riding horses, but no photos and only a few fantasy illustrations of actual centaurs. As you can see, present-day AI passes the centaur-drawing test with flying colors. This image was made with Seedream 4, although it should be noted that of the eight images I made, one still depicted a woman riding a horse. 7 out of 8 is not bad for something impossible two years ago.)


In 2005, a team called ZackS defeated a combination of a Grandmaster, another highly rated human player, and chess-playing software, despite the absence of any strong human players on the ZackS team. Their strategy was straightforward: they would always play the move recommended by the chess AI, without any human analysis or interference. By now, this is the recognized best approach to chess: let the AI play alone, because anything humans do will only make things worse.

 

A 2024 analysis of elite chess matches showed that having a human Grandmaster deviate from AI-suggested moves causes win rates to drop by about 1 percent compared to letting the AI decide solo. It’s like having a GPS that knows every shortcut, yet insisting on taking the scenic route because you “feel” it’s better. In chess, human “creativity” now translates into creative ways to lose.


(Elo ratings provide another way of looking at chess-playing prowess. The best AI models currently have Elo ratings of around 3,550, whereas top humans such as Magnus Carlsen are around 2,850.)
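For readers who want a sense of what such a rating gap means in practice, the standard Elo expected-score formula (a property of the rating system itself, not of any particular engine) translates a roughly 700-point difference into an expected score per game. A minimal calculation:

```python
# Standard Elo expected-score formula, applied to the ratings cited above
# (roughly 3,550 for the best engines vs. 2,850 for top humans).

def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (win = 1, draw = 0.5, loss = 0) for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

print(round(elo_expected_score(2850, 3550), 3))  # ~0.017: under a 2% expected score per game
```

In other words, the human champion would be expected to score fewer than two points per hundred games against the engine, which is why “help” from even the strongest human is statistically far more likely to hurt than to help.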



For about a decade, “centaurs” (half human, half AI) were the best chess players in the world, but that period didn’t last long. (GPT Image-1)


Chess is a very restricted domain, which is why AI started dominating humans so early. Let’s turn to more varied domains next.


When Two Heads Aren’t Better: Cases Where AI Alone Wins

In many traditional decision-making and analytical tasks, researchers are finding that the AI on its own can outperform a human–AI team. These are the scenarios where the human partner, despite good intentions, introduces errors, bias, or delays. It’s as if the AI is a master chef and the human is an apprentice who keeps tossing in the wrong spices. Here are several such cases documented in recent studies:



Too many cooks spoil the broth. There’s often a kernel of truth in old sayings, and in cases where the AI has already seasoned the broth perfectly, adding human interference can only make it taste worse. (GPT Image-1)


  • Online Content Judgment (Fake Reviews): An illuminating example comes from a 2024 meta-analysis by MIT’s Center for Collective Intelligence. In one experiment detecting fake hotel reviews, the AI was more accurate on its own (73% accuracy) than either a human alone or the human-AI combo. Humans averaged only 55% accuracy on their own, and even with AI assistance, their performance was 69%, meaning that adding human “judgment” reduced the value of the AI’s results by four percentage points. In other words, whenever people overruled or doubted the algorithm, they often erred. The researchers noted people weren’t good at knowing when to trust the AI vs. their own judgment, leading the combined team to miss the mark more often than the autonomous AI. Here, well-intentioned human oversight actually spoiled the results.



In one experiment, AI acting alone detected fake reviews with 73% accuracy, but when humans “helped” the AI, accuracy dropped to 69%: human involvement subtracted four percentage points. (GPT Image-1)


  • Forecasting and Predictions: This same comprehensive review reported that tasks like demand forecasting showed a similar pattern, with the algorithms out-predicting the humans, and any human adjustments to the AI’s prediction tended to degrade accuracy. In business forecasting scenarios, analysts often have the ability to override statistical forecasts (for instance, adjusting a sales prediction based on personal hunches or anecdotal info). Studies indicate that unless humans have clear additional insight, these overrides introduce noise. The MIT meta-study found that AI alone outperformed both humans alone and human–AI teams in demand prediction tasks. The likely culprit is overconfidence or misplaced intuition: people might add “gut feeling” changes to the AI’s numbers, but those feelings can be dead wrong.



Acting on human “gut feel” will often lower accuracy, compared to the more neutral AI. The human “gut” (as a metaphor for our intuition) evolved to help us avoid being eaten by saber-toothed tigers by making split-second decisions, but the resulting cognitive biases serve us poorly for making more strategic decisions based on complex contexts and long time frames. (GPT Image-1)


  • Medical Diagnosis (Doctors vs. AI): Much recent evidence suggests that even in complex fields like medicine, a top AI can edge out human doctors. Worse, adding the doctor to the AI doesn’t always help. A 2024 study at the University of Virginia tested physician diagnoses on tough cases, with one group of doctors using a chatbot (ChatGPT) and another using traditional tools. The doctors aided by AI did no better at diagnosis than those without (around 76% accuracy in both groups), meaning the AI assistance didn’t significantly improve their hit rate. More striking, the AI on its own (with no human) correctly solved over 92% of the cases, far above either group of doctors. (And this was the old GPT 4 model, which is now handily outscored by GPT 5 Pro.) The researchers were surprised to find that “adding a human physician to the mix actually reduced diagnostic accuracy,” even though it slightly improved efficiency in reaching a conclusion. In other words, the human-AI team was less accurate than the AI acting solo. The human doctors sometimes overrode the AI’s correct answers or led the process astray, highlighting a harsh reality: in these diagnostic trials, the weakest link was the human. (Doctors, don’t panic – we’ll discuss caveats and how AI can assist you shortly. But these results underscore how far AI’s medical expertise has come in certain domains.)



There are already several studies where human doctors made things worse when they interfered with AI’s diagnosis of a case. (Seedream 4)


  • Medical Decision-Making and Management: Another 2025 study from Stanford examined “clinical management” decisions (complex questions like tailoring treatment plans) and found that an AI chatbot alone outperformed doctors given only internet resources. However, when doctors were paired with the chatbot, together they performed as well as the AI alone. In this case the human–AI collaboration managed to keep up with the AI, but not exceed it. Essentially, the doctors’ best move was to accept the chatbot’s advice, mirroring the chess grandmaster whose best move is always the one the computer suggests. If the physicians deviated from the AI’s guidance, their performance dropped. Co-senior author Dr. Jonathan Chen admitted this challenges the long-held assumption that a human plus computer is inevitably better than either alone: “We may need to rethink where we use and combine those skills”, he noted. The synergy here was weak: the AI was doing the heavy lifting, and the human’s main value was to verify or implement the AI’s plan without introducing mistakes.

  • Radiology and Image Analysis: Front-line radiologists are also grappling with this phenomenon. Highly accurate AI models for reading medical images (like X-rays or mammograms) have emerged, and studies show mixed human–AI results. An analysis published in Nature Medicine (2024) revealed that AI assistance improved some radiologists but worsened others. In some instances the AI actually interfered with the doctor’s accuracy, for example by tempting them to see a pattern that wasn’t really there. The aggregate finding was that individual differences are huge, where certain doctors get a boost from AI, while others become less effective. Pranav Rajpurkar, one of the authors, put it plainly: “Different radiologists react differently to AI – some are helped while others are hurt by it.” Notably, less experienced clinicians were not guaranteed to improve with AI’s help; a poorly calibrated user might trust the AI’s suggestion even when it’s wrong or might get distracted by it. So if you pair the wrong human with the AI, the duo underperforms the algorithm alone. This suggests that until training and interfaces catch up, a superhuman AI in imaging might be better off working solo or only with radiologists who know how to correctly complement it.



For many aspects of medicine, it’s already malpractice if the human doctor doesn’t consult AI to help with diagnosis and treatment plans. In a few more years, it will likely be considered malpractice in many cases if humans override the recommendations of medical AI. (GPT Image-1)


  • Other Analytical Tasks: Across numerous decision-type tasks studied (from financial predictions to content moderation and fake news detection), researchers consistently found that when the AI’s stand-alone performance was higher than the human’s, the combined performance suffered. The meta-review by Vaccaro et al. noted a clear pattern: if AI > human, then AI + human < AI in many cases. People either doubted a correct AI answer or blindly followed an incorrect one, with both scenarios dragging down accuracy. As one example, experimental subjects trying to identify fake online content often misjudged when to listen to the AI, leading their joint accuracy to drop below the algorithm’s level. In emerging areas like deepfake detection, this gap is even starker: humans are barely above chance at spotting deepfakes, whereas AI systems (while not perfect) are considerably better. In such cases, the human “intuition” about what’s real or fake is more noise than signal.


The pattern is consistent: whenever the AI’s raw ability surpasses the human’s (as is increasingly the case in domains like medical Q&A and image recognition) the human+AI team tends to perform below the AI’s solo level. Instead of lifting the AI, the human often pulls it down. If the AI is “smarter” or more consistent, the best outcome from combining them is that the person doesn’t interfere; worst case, the person’s input actively reduces accuracy.


The blind assumption that adding a human decision-maker will improve an AI system is often wrong, especially if the human isn’t bringing complementary skills to the table. In fact, the average of 106 experiments compiled by MIT researchers showed that while human–AI teams handily beat humans alone (no surprise there), they did not outperform the best AI alone in general. Thomas Malone, the study’s author, admitted this was a “most surprising finding,” undermining the naive assumption that mixing human judgment with AI would “be quite a bit better.” In plain terms: if you already have an AI that’s better at the task than you are, your involvement might get in its way.


This metacognitive failure, this inability to accurately assess when our judgment exceeds the machine's and when it doesn't, sits at the heart of the problem.


The situation gets worse when stakes are high. Research on algorithm aversion shows that, as the stakes of a decision rise, humans become more cautious about trusting algorithms.  Humans can be overly punitive when an AI makes a mistake, judging its errors more harshly than identical errors made by a human. This fragile trust can lead users to abandon a statistically superior algorithm after a single, salient failure. This creates a perverse dynamic where humans might defer to AI for routine decisions but then override it precisely in the high-stakes moments where the AI’s judgment might be most valuable.

This paradox exists because human trust is not a rational calculation of an AI’s aggregate error rate. It is an emotional, heuristic-driven response. We become complacent when a task is routine and our cognitive load is high. We become distrustful when an error violates our expectation of machine perfection or when the emotional stakes feel high. This emotional volatility makes the human an unreliable and unpredictable partner for a purely logical system.


Beyond accuracy, we must also consider the economics of intervention. Keeping a human in the loop is expensive and slow. In many business applications, the decision isn’t just about whether Human+AI is a little better than AI alone, but whether the marginal improvement (if any) justifies the substantial increase in cost and delay. If an AI alone achieves 95% accuracy instantly and at near-zero marginal cost, while a human-AI team reaches 96% accuracy but requires 30 minutes of expensive human labor, the organization will rightly choose full automation. If the human is adding negative value to accuracy, the economic cost of their involvement is profoundly negative: we are paying for the privilege of degrading the AI’s performance.
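To make that trade-off concrete, here is a toy cost-benefit calculation using the illustrative accuracy numbers from the paragraph above; the labor rate and the cost of an error are assumptions for illustration only, not figures from any study.

```python
# Toy cost-benefit sketch of keeping a human reviewer in the loop.
# The accuracy figures mirror the illustrative example in the text;
# the $100/hour loaded labor rate and the $500 cost per wrong decision are assumptions.

AI_ACCURACY = 0.95          # AI alone: instant, near-zero marginal cost
TEAM_ACCURACY = 0.96        # human + AI team
REVIEW_MINUTES = 30         # human time spent per decision
HOURLY_RATE = 100.0         # assumed loaded cost of the human reviewer
COST_PER_ERROR = 500.0      # assumed business cost of one wrong decision

review_cost = REVIEW_MINUTES / 60 * HOURLY_RATE        # $50 of labor per decision
errors_avoided = TEAM_ACCURACY - AI_ACCURACY           # 0.01 errors avoided per decision
value_of_review = errors_avoided * COST_PER_ERROR      # $5 of value per decision

print(f"Review costs ${review_cost:.0f} per decision but is worth only ${value_of_review:.0f}")
# Break-even: review pays off only if a single error costs more than $5,000.
# And if human involvement lowers accuracy instead, the review is a pure loss.
```

Under these assumptions, the human review destroys value even though it nominally improves accuracy, and the numbers only get worse in the cases documented above where human involvement reduces accuracy.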



Human labor is overwhelmingly more expensive than AI. This makes it cost-prohibitive to add humans to a task if their involvement only improves the outcome slightly. (Seedream 4)


When Synergy Works

Thankfully, it’s not all doom and gloom for us humans. There are plenty of situations where a human–AI partnership can outperform either alone. The key is that in these cases, humans bring something vital that the AI lacks, and vice versa. It’s like a good buddy-cop duo: one partner covers the other’s blind spots. Here are some recent studies highlighting where the synergy worked:


  • Specialized Image Recognition (Experts + AI): Classifying bird species from images is a task requiring expert knowledge. In an experiment, humans were actually better than the AI alone (81% vs 73% accuracy) due to their domain expertise. However, when the expert human and the AI worked together, accuracy jumped to 90%. This is a classic synergy scenario. The AI brings speed and a second pair of “eyes” that might notice patterns the human could miss, while the human brings contextual knowledge (e.g. recognizing if an image angle or lighting might fool the AI, or recalling rare field guide details). The result is a team that beats either one’s solo score. As MIT’s Malone explained, when the human is initially the stronger player, they also tend to be better at using the AI appropriately because they know when to trust the AI and when to trust their own skills. The outcome: no ego clashes, just each doing what they do best.

  • Creative Content Generation (Design, Writing, Ideation): One area where human–AI collaboration truly flourishes is content creation. Generative AI systems (like those that produce text, images, or music) can supply a wealth of ideas and drafts, while humans provide judgment, taste, and refinement. Research indicates that creative tasks enjoy positive synergy more often than analytical decision tasks. In fact, the meta-analysis found that for creation tasks, the average human–AI team effect was positive and significantly greater than for decision-making tasks, which tended to have negative effects. Let’s consider a few concrete examples:

    • Writing and Brainstorming: In one 2023 experiment, people were tasked with writing creative short stories; those given AI-generated story ideas as prompts wrote more creative and interesting stories than those who started from scratch. The AI-generated suggestions often contained offbeat or “whimsical” concepts (one participant was inspired by an AI’s bizarre phrase about “demonic dough” to craft a horror story scene). These surprising ideas helped break writers out of conventional thinking. Crucially, the writers still steered the ship by picking which AI ideas to use and integrating them into a coherent narrative. Another study from Hong Kong observed that in co-writing, humans retained control most of the time, using AI to get past writer’s block or flesh out details. The human authors would let the AI generate a burst of options when stuck and then selectively build on those. This kind of iterative back-and-forth showed that AI can act as a creative partner that expands human imagination rather than replacing it. The synergy is evident in the outcomes: writers found that even the AI’s “failures” (random or off-target outputs) sparked new ideas, and the final stories benefited from the blend of human narrative sense and AI’s outside-the-box suggestions. Notably, less inherently creative people benefited the most: one study found that participants with lower creativity scores saw the biggest improvement when using AI-proposed ideas.

    • Design and Ideation with Visuals: A fascinating experiment looked at how AI-generated images could boost human creativity in design brainstorming. Participants were asked to come up with creative uses for everyday objects (a classic Alternate Uses Test) in three conditions: with no image, with a typical stock photo of the object, or with a weird AI-generated image of the object. The group that saw AI-generated images produced the most ideas (fluency) and equally original ideas compared to others. For example, an AI might generate a surreal image of a fork twisted into a sculpture, prompting a person to think of alternate uses for a fork beyond eating. This shows how AI can contribute diverse stimuli that break humans out of perceptual boxes. The AI doesn’t evaluate the ideas (it just outputs novel stimuli) and the human brain does the rest, combining those visuals with real-world knowledge to generate feasible creative solutions. The synergy here is that AI expands the search space of ideas, while the human filters and elaborates on them.

    • Generative Design & Art: In fields like graphic design, architecture, or game design, we see similar collaboration patterns. Designers use generative AI to produce multiple drafts or variations in seconds, then they pick and refine the best ones. The human + AI together explore far more alternatives than a human alone could, leading to more innovative final designs. Malone’s team pointed out that creation tasks were relatively under-studied up to 2023 (only ~10% of papers in their review were about content creation), but those that existed showed clear positive synergy. Anecdotally, many artists say an AI image generator can inspire ideas they wouldn’t have thought of, but that a human still needs to add sensibility and intentionality to turn it into a polished piece. The explore–refine iterative loop (AI proposes, human adjusts, AI refines, etc.) is a new kind of workflow that seems quite fruitful. Unlike a static decision, creative work benefits from the real-time give-and-take between human and machine.



Content creation is currently a prime example of humans and AI achieving a synergy effect, because the two forms of intelligence contribute differently rather than compete. (Seedream 4)


  • Safety-Critical Systems (Oversight when AI isn’t perfect): When AI still makes significant errors that humans can catch, having a human in the loop is advantageous. For example, if an AI autopilot struggles with unusual weather, a human pilot’s intervention can avert disaster. In early 2020s self-driving car trials, hybrid modes (where a human driver could take over) were considered safer overall because AI wasn’t fully reliable yet. The synergy wasn’t about reaction speed, but about judgment: the human could notice a situation the AI didn’t handle (say, a construction worker directing traffic) and take control. As AI improves, these hand-offs become rarer, but until AI reaches near-perfect reliability in complex open environments, a human fail-safe adds value. So in cases where AI alone is not yet as good as an expert human, the combination tends to shine. Indeed, the MIT meta-review confirmed that whenever the human alone was initially better at a task than the AI, the human–AI team usually achieved performance gains over either alone. The human provides the primary skill and uses the AI to cover blind spots or speed up tedious parts. Think of a veteran doctor using an AI to double-check for rare conditions: the doctor’s expertise leads, but the AI might catch an odd detail the doctor hadn’t considered, yielding a more accurate joint diagnosis. This dynamic is constructive and confidence-boosting for the human: they remain in charge and only lean on the AI where it helps.


To sum up the synergy successes: they occur when each party contributes something distinctive and valuable. If the human has knowledge or abilities the AI lacks (whether it’s creative vision, contextual understanding, ethical reasoning, or simply being better at the task in the current state of AI), and the AI offers speed, breadth, or consistency the human can’t match, then you have the ingredients for a 1 + 1 = 3 scenario. A well-designed collaboration harnesses AI’s strengths without letting its weaknesses trip you up, and vice versa for the human’s role. The human half of that balance is the harder one to get right, but the creative domains give us a taste of what’s possible when synergy clicks.


Why Humans Help in Some Cases and Hurt in Others

After seeing both sides of the coin, the natural question is: what’s different between the synergy cases and the negative-value cases? Several themes emerge from the studies:


Relative Skill Levels (Who’s Better at the Task?)

This is the most straightforward factor. If the AI is already better than the human at the task (more accurate, faster, or more consistent), then the human’s contributions are likely to be suboptimal. In contrast, if the human is better than the AI (or the AI is at least prone to mistakes the human can catch), then the human can add value. The meta-analysis evidence strongly supports this: teams had performance gains when humans outperformed the AI initially, and performance losses when the AI outperformed the humans initially.

Think of it as averaging rather than adding performance: if one party is strictly superior, including the inferior partner drags performance toward that lower level. In chess, the computer is superior, so any human deviation from the engine’s plan is likely a weaker move (hence negative synergy). In a creative brainstorm, a human’s imagination is superior in judging what ideas are interesting, while the AI is superior in brute-forcing many variants. Each covers the other’s gap, leading to positive synergy.
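A toy model makes the two regimes concrete. The numbers below are illustrative only: the first pair roughly echoes the fake-review accuracies cited earlier, while the second set is invented to show how complementary strengths can produce a team that beats either partner alone. The model also assumes, for simplicity, that human overrides are not targeted at the AI’s actual mistakes.

```python
# Regime 1 (averaging): the AI is strictly better, and the human overrides it on a
# random fraction of cases, dragging the team toward the weaker score.
AI_ACC, HUMAN_ACC = 0.73, 0.55   # roughly the fake-review figures cited earlier

def averaged_team(override_rate: float) -> float:
    return (1 - override_rate) * AI_ACC + override_rate * HUMAN_ACC

print(averaged_team(0.0), averaged_team(0.2), averaged_team(1.0))  # 0.73, ~0.69, 0.55

# Regime 2 (complementarity): each partner is stronger on a different half of the
# cases, and the work is routed accordingly. All numbers are invented for illustration.
human_acc = {"familiar": 0.95, "unfamiliar": 0.60}
ai_acc    = {"familiar": 0.70, "unfamiliar": 0.85}

human_alone = 0.5 * human_acc["familiar"] + 0.5 * human_acc["unfamiliar"]  # 0.775
ai_alone    = 0.5 * ai_acc["familiar"]    + 0.5 * ai_acc["unfamiliar"]     # 0.775
routed_team = 0.5 * human_acc["familiar"] + 0.5 * ai_acc["unfamiliar"]     # 0.90

print(human_alone, ai_alone, routed_team)  # the routed team beats either partner alone
```

The same arithmetic explains both halves of the evidence: random interference with a stronger partner can only pull the average down, while a principled division of labor lets the team exceed both individual scores.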


Knowing who (or what) has the edge in which aspect of the task is crucial. The best collaborations split the work such that each only handles the parts they’re best at. Poor collaborations have the human and AI working on the same decisions when one of them is objectively worse at it: a recipe for the AI’s precision to be diluted by human error, or the human’s insight to be drowned out by AI noise.


Human Trust and Judgment (Calibration)

How well the human can judge when to rely on the AI versus their own opinion is a make-or-break factor. Unfortunately, people are notoriously bad at judging their own skill and talents, with a strong bias toward overestimating themselves. This is often exacerbated by the Dunning-Kruger effect, where individuals with lower competence in a domain are the most likely to overestimate their abilities and, consequently, are more likely to interfere with a superior AI. (Anybody who has ever run a company knows that virtually all employees feel that their contribution to the company is greater than it actually is.) Negative outcomes often stem from miscalibration: humans either over-trust the AI (accepting its suggestion even when it’s wrong) or under-trust it (rejecting a correct AI suggestion due to false intuition).



The less skilled people are, the more they tend to overestimate their own skill, ability, and contribution. (Seedream 4)


In the fake review detection example, people couldn’t discern when the algorithm was likely right, so they occasionally second-guessed the AI’s correct flag or accepted a fake that the AI would have caught. Likewise, doctors might override an AI’s correct diagnosis because “it just doesn’t feel right,” or conversely, they might follow an AI’s suggestion for a tricky case where the AI actually isn’t equipped to account for a patient’s unique context (a form of automation bias).


Successful synergy, on the other hand, often features humans who are aware of the AI’s strengths and weaknesses. Expert birders knew when to trust the AI on bird ID (perhaps on clear photos of common species) and when to rely on their expertise (on ambiguous images or rare species). In creative work, users often treat the AI as a partner rather than an oracle: they don’t expect perfect ideas, so they aren’t disappointed or misled when the AI outputs something offbeat; they just mine it for inspiration and move on. This calibrated use, treating the AI’s output as helpful suggestions rather than gospel truth, is a hallmark of positive synergy scenarios.


Design implication: Interfaces that help users understand an AI’s confidence or likely error modes could improve human judgment of when to listen to the AI. But interestingly, Malone’s review found that simply providing confidence scores or explanations didn’t magically fix the problem. So calibration is hard; it likely develops with experience and task-specific training. Teams that flourish probably have humans who’ve learned exactly how to use their AI tool (knowing, for example, that “the AI is great at identifying pneumonia on X-rays, so I’ll trust it there, but it often misses subtle fractures, so I’ll double-check those myself”). Mis-calibrated teams, by contrast, either slavishly follow the AI off a cliff or constantly fight the AI even when it’s right, wrecking any chance of a fruitful partnership.


Nature of the Task: Creative vs. Convergent

The type of task significantly affects whether human–AI combination helps or hurts.

Decision-making tasks with a single correct answer (e.g. classifying a review as fake or real, diagnosing a disease, forecasting a number) often see the AI perform very well, and the human’s role is mostly to decide whether to accept the AI’s answer. If the human is worse at that core decision (due to any of the reasons above), their involvement usually lowers the odds of the correct outcome. These tasks are convergent, meaning success is measured by converging on the one right answer. AI’s advantage in pattern recognition and consistency shines here, and human intuition can be surprisingly unreliable or biased.


On the flip side, open-ended creative tasks or tasks requiring divergent thinking see more complementarity. There isn’t a single “correct” output; success might be measured in quality, novelty, or user satisfaction, which are hard to optimize purely algorithmically. AI can generate options, but human taste ultimately decides what “works.” This means the human isn’t second-guessing a precise AI answer, they’re curating and guiding a range of AI outputs. The studies confirm that content creation tasks yielded significantly greater synergy than decision tasks.


Furthermore, tasks that involve subjective judgment, empathy, or understanding complex context (like handling an upset customer, planning a nuanced treatment considering patient preferences, etc.) leave room for humans to add their unique value. AI might provide data or standardized suggestions, but the human considers the holistic picture. The more a task requires common sense, ethics, and empathy, the more likely a human–AI team can outperform AI alone, because the AI (as of now) isn’t great at those softer aspects. Conversely, the more a task can be boiled down to data and rules, the likelier it is that the AI alone will eventually beat any human or human–AI duo.


Team Process & Design of Collaboration

The workflow through which the human and AI interact influences outcomes. Many failures of synergy can be chalked up to a poor process. If a human only gets to see the AI’s answer and say yes or no, that setup might encourage rubber-stamping or knee-jerk overriding. But if the collaboration is designed so that the human and AI truly combine inputs (e.g., AI analyzes raw data, human considers external context, and then they agree on a decision), you’re more apt to harness both strengths.


I have often recommended redesigning processes rather than just inserting AI into existing steps. For example, instead of having a human create a forecast and then an AI adjust it (or vice versa), a better process might be to have the AI generate a baseline forecast and highlight specific items with high uncertainty where a human analyst should investigate further. That way the human isn’t meddling where the AI is confident and likely correct, but is focusing their expertise where the AI is unsure (perhaps due to a sudden market shift or an out-of-distribution scenario). This division of labor can yield synergy by preventing unnecessary human interference while still capturing human insight on the hard cases.
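Here is a minimal sketch of that division of labor, assuming the AI can report an uncertainty estimate alongside each forecast. The data structure, field names, and the 20% threshold are illustrative assumptions, not a reference implementation.

```python
from dataclasses import dataclass

@dataclass
class ForecastItem:
    sku: str
    ai_forecast: float     # the AI's baseline point forecast
    uncertainty: float     # e.g., relative width of the AI's prediction interval

def triage(items: list[ForecastItem], threshold: float = 0.20):
    """Accept confident AI forecasts as-is; route uncertain ones to a human analyst."""
    auto_accepted = [item for item in items if item.uncertainty <= threshold]
    needs_review = [item for item in items if item.uncertainty > threshold]
    return auto_accepted, needs_review

items = [
    ForecastItem("A-100", 1200.0, 0.05),  # stable product with long history: AI is confident
    ForecastItem("B-200", 430.0, 0.45),   # new product, sparse data: send to the analyst
]
accepted, review = triage(items)
print([i.sku for i in accepted], [i.sku for i in review])  # ['A-100'] ['B-200']
```

The point of the design is that the human never touches the forecasts the AI is confident about, so there is nothing for them to degrade, while their expertise is concentrated on the genuinely uncertain cases.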

Another process factor is timing and iteration: generative tasks did well partly because humans iteratively prompted the AI and could steer it gradually. In contrast, many decision tasks were one-shot, where the AI gives an answer and the human either accepts or changes it. By allowing more back-and-forth or integrating explanation (“why did the AI suggest this?”), humans might make better decisions with the AI. Early evidence suggests that explainable AI can improve human–AI team performance by helping humans understand when an AI might be wrong. So the design of the UI, the level of transparency, and how tasks are allocated between human and AI all govern whether the combination is effective or counterproductive.


Human Training, Experience, and Mindset

Finally, the people themselves matter. As the Harvard radiology study showed, individual differences are huge: some people benefit from AI, some don’t. Experience alone wasn’t a predictor, interestingly, but possibly adaptability and openness to AI are. A doctor who treats the AI as a collaborator and takes time to learn its quirks will likely do better than one who is either overly skeptical or overly trusting. Training users to work with AI is an emerging need. The UVA diagnosis study authors concluded that doctors will need “formal training in how best to use AI” after seeing that untrained doctors with AI didn’t improve accuracy. In other words, throwing AI at people without guidance can lead to negative results, but if people are taught how to effectively integrate AI (when to lean on it, how to verify its suggestions, etc.), the outcomes could improve.


The mindset is also crucial: in creative teams, participants who saw the AI as a partner and were willing to accept its wacky outputs as part of the process had great success. If someone instead approached the AI with either blind faith or scorn, the collaboration likely faltered. So, a growth mindset (“the AI can help me improve”) combined with healthy skepticism (“I’ll double-check because the AI isn’t perfect”) appears to characterize the best human members of a human–AI team.


In summary, synergy thrives when the human provides what the AI lacks (whether superior skill or complementary insight) and knows how to manage the AI’s contributions intelligently. It fails when the human is essentially doing a worse version of what the AI does, or when the human can’t effectively interact with the AI (due to miscalibration, poor process, or lack of training). As one research article cheekily pointed out, many people implicitly assume that combining human and AI will automatically be better “otherwise, they would just use the best of the two.” The reality is that combining forces is only worthwhile under the right conditions; if those conditions aren’t met, you really should just use whichever performer (human or machine) is best and leave the other out.


Better AI: Fewer Humans in 5 Years?

Looking ahead five years, as AI systems become even more powerful, will we see more cases where the optimal solution is to remove the human from the loop? The research trends hint at a nuanced future. Here are some speculative predictions on how things might unfold by 2030:


AI Mastery Expanding, Humans Retreating to Niches

As of 2025, AIs have already bested humans in many narrow domains (chess, Go, certain medical diagnostics, etc.). With each year, the frontier of human advantage recedes. By 2030, AI will have superhuman performance in even more areas, such as legal document review, routine surgical procedures, basic software debugging, translation, you name it.

In any domain where AI crosses the threshold of being clearly better than most humans, the argument for full automation strengthens. For instance, if future AIs can not only detect tumors on scans but also make holistic treatment recommendations more accurately than doctors, hospitals might rely on AI for standard cases and only call in a human for unusual situations. A telling recent milestone: OpenAI’s HealthBench evaluation in 2025 showed that when doctors were given access to responses from the GPT-4.1 model, the doctors could no longer improve on the AI’s answers. Just a few months prior, with a slightly older model (GPT 4o), doctors could enhance the AI’s answers with their expertise. The AI leapfrogged in quality, closing that gap. (And AI is likely even better now after the next generation of AI models has shipped.)


By 2030, many creative ideation tasks that currently require a human for filtering will be handled by a second AI system trained on aesthetic and strategic goals. The human role will shift further up the value chain to defining those goals, not executing the steps to reach them. That is, human judgment (or “taste” as some people like to call it) will decline in importance, while human agency will remain preeminent.


Each successive AI model will leave less room for human improvement on top of its capabilities. In plain language: when the AI gets that good, the best a human can do is agree with it. We may see more benchmarks where initially the “AI + human” was best, but later a new AI alone becomes the top performer, essentially obsoleting the human helper. Humans are like training wheels that AI eventually outgrows.


New Synergy Opportunities with Advanced AI

On the flip side, as AI grows more capable, it might unlock collaboration in domains previously too hard for AI to contribute. Creative AI is one example: early AIs couldn’t really co-write an article or co-design a product with you, but modern ones can, leading to new synergistic workflows.


In the next five years, AI will develop more generalized reasoning and common sense. Paradoxically, that could make it a better collaborator even in areas like strategy, innovation, or scientific research, not just a replacement. If an AI can understand goals and constraints more like a human, working with it could feel more like working with a super-smart colleague. We might see “strong synergy” in creative and complex problem-solving become more routine: imagine human scientists working with an AI that proposes hundreds of hypotheses and experiment designs, some of which a human would never have conceived, accelerating scientific discovery. In that scenario, AI isn’t eliminating humans, but elevating them by doing the heavy lifting of exploration.


The dominant paradigm of AI is shifting from that of a tool or co-pilot (to use Microsoft’s preferred term, chosen to soothe the sensibilities of its enterprise customers) that assists a human to an autonomous agent that executes tasks independently. Humans will move up the food chain and be responsible for managing year-long AI projects, even as the AI agent executes autonomously on a daily basis.


The human still guides the process and ensures relevance. So, while straightforward tasks might get fully automated, complex multifaceted tasks could still benefit from human judgment on top of AI suggestions. The trick will be that those suggestions will be coming from an ever more sophisticated AI, so the human’s role will shift to higher-level decision-making (like setting objectives, providing ethical oversight, or injecting genuinely human perspectives such as understanding other humans’ needs).


Human Out of the Loop (by Choice)

In some cases, removing the human might not only be advantageous for performance but also necessary for scale. Think about content moderation on a platform with billions of posts: humans in the loop are a bottleneck. Or real-time autonomous driving decisions. A human cannot be in that loop at scale (you can’t have a human approve each braking decision at highway speed).



Large-scale infusion of intelligence into big systems to improve billions of decisions may only be feasible when everything is pure AI. Humans don’t scale. (Seedream 4)


As AI reliability increases, organizations will feel more comfortable cutting the cord. Airlines might trust AI for autopilot gate-to-gate when data shows it’s safer than error-prone human pilots for routine flights (with humans only intervening remotely if needed). Social media might rely entirely on AI filters for certain kinds of obvious spam, with no human review unless the AI flags uncertainty.


We’re likely to see new domains tipping from partial human oversight to full autonomy as the technology and trust mature. However, this will come with growing pains. Ensuring the AI’s “values” align with ours will be critical when humans step back completely.


Human in the Loop (for Accountability and Comfort)

Even if AI becomes objectively better, in some sensitive areas, humans might remain in the loop for a while due to ethical, legal, or psychological reasons. For example, patients may simply be uncomfortable with an AI-only medical decision and want a human doctor to explain and confirm it. Similarly, juries or regulators might demand a “human in the loop” for AI-driven decisions like loan approvals or job interviews, to ensure someone can be held accountable or to check for bias.


So in the near future, some human–AI teams will exist not because the human is adding technical value, but because the human adds a sense of oversight and legitimacy. In a sense, the human becomes a moral and PR layer, even if the AI is doing 99% of the work.


One hopes that these humans will still catch the occasional AI mistake (ensuring at least that minimum value), but if AI mistakes become exceedingly rare, the human’s role will be pure theater. In these cases, the human may be the liability sponge: a person designated to absorb blame if the autonomous system fails, even if that person had no realistic way to prevent the failure. Humans will appear in control to make other humans more comfortable. Over time, as society gains confidence in AI (and if legal frameworks adapt to allow AI to carry more responsibility), those roles might fade too.


Expect a transitional period where humans stick around as performative bosses of AI systems largely for our peace of mind, akin to how elevators had human operators long after automation could run them, just because riders initially felt safer seeing a person at the controls.



AI will make mistakes. Fewer in the future, but we’ll never reach zero mistakes for complex tasks. But humans make errors too. It’s easy to overreact after an AI mistake and require human review of every AI decision, but that may sometimes increase the total number of mistakes by introducing human fallibility. We need a sensible cost–benefit analysis, where a limited number of mistakes are tolerated in return for progress and the many new nice things we’ll get from AI, including freedom from most human mistakes. (GPT Image-1)


Training and Interfaces Improve Synergy

As we learn from current failures, the next generation of tools will likely focus on making human–AI interaction smoother and more effective. We’ll see better AI explanations, confidence indicators, and user training to address the miscalibration problem. If in five years every professional gets coursework on how to work with AI (just like many now learn data literacy or UX basics), we might turn some of those negative synergies into positive ones.


The ideal future isn’t one where humans are completely extraneous, but one where whenever humans are involved, they truly add value because they’ve been equipped to do so and the AI is built to cooperate. In other words, the goal is to minimize scenarios where a human is in the loop “just because” or due to inertia, and maximize scenarios where human presence genuinely makes the outcome better.


In essence, the trend line is that AI will continue to gobble up the straightforward tasks where algorithmic prowess rules, and humans will retreat to either very complex, nuanced tasks (where their general intelligence and ethical reasoning still matter) or will remain as a safeguard/coach for the AIs. The balance of synergy vs solo AI will shift domain by domain as AI crosses performance thresholds. If we don’t add value, eventually the AI will be trusted to handle it alone.


A New Mandate for UX: Designing Governance, Not Just Interfaces

The conclusion from this body of evidence is not that humans are becoming obsolete, but that our role is undergoing a profound transformation. For UX professionals, this signals a critical shift in focus. The most important design work of the next five years will not be about creating clever UIs for humans to “collaborate” with an AI on a specific task. It will be about designing the systems of governance (rules of engagement) for a world where AI agents operate with increasing autonomy.


This new mandate has several key components:


  1. Shift from Interaction Design to Relationship Design. The focus must move from the usability of a single tool to defining the roles, responsibilities, and communication protocols within a hybrid human-AI team. The design of conversational agents is already moving from dyadic (one-on-one) to polyadic interactions, where an AI mediates interactions between multiple humans, requiring a focus on social boundaries and ethics.   

  2. Designing Off-Ramps and Circuit Breakers. The most critical UX task for systems where AI is superior is to determine the precise conditions under which the AI should cede control to a human. This means designing systems with built-in guardrails that enforce this handover, rather than relying on a human's flawed, in-the-moment judgment. (A minimal sketch of such a guardrail appears after this list.)

  3. The UX of Trust Calibration. Instead of simply building “explainable” systems that backfire through information overload or reinforcing over-trust, designers must create feedback loops and performance dashboards that help humans build an accurate mental model of an AI’s capabilities and limitations. The goal is to actively fight our inherent cognitive biases and foster correctly calibrated trust.
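As a concrete illustration of the off-ramps in point 2, here is a minimal sketch of a circuit breaker that enforces the handover through explicit, pre-designed conditions rather than the operator’s in-the-moment judgment. The trip conditions, thresholds, and field names are hypothetical placeholders, not a production policy.

```python
from dataclasses import dataclass

@dataclass
class AgentDecision:
    confidence: float       # the AI's calibrated confidence in its own output
    novelty_score: float    # how far the input lies from the training distribution
    high_stakes: bool       # e.g., irreversible or safety-critical action

def must_escalate(d: AgentDecision) -> bool:
    """Return True when the system is required to cede control to a human."""
    return (
        d.confidence < 0.90        # the AI itself is unsure
        or d.novelty_score > 0.80  # out-of-distribution situation
        or d.high_stakes           # a category humans have reserved for themselves
    )

print(must_escalate(AgentDecision(0.97, 0.10, False)))  # False: the AI proceeds on its own
print(must_escalate(AgentDecision(0.97, 0.90, False)))  # True: hand control to a human
```

Because the guardrail is part of the system design, the handover does not depend on a human noticing the right moment to intervene, which is exactly the judgment call the evidence shows we are bad at.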


Quite possibly, the UX field will have to die, as it’s overly focused on design, which is becoming irrelevant. Management and judgment are the two fields that will likely contribute much more to future usability and human well-being. Of course, to the extent that these two fields craft the new user experience, we should deem them the new “UX,” but the skillsets to make humanity thrive in the future will be completely different than what’s currently sheltering under that two-letter umbrella. With superintelligent AI, the umbrella may have the same initials, but new people will carry it. (Except for those few current UX professionals who make the switch in time.)



Product design has long sheltered under the UX umbrella, where interaction designers and usability researchers conjured up wireframes that turned into finished user interfaces. In the future, little value will probably come from these commoditized activities, and attention will turn to higher levels of product and workflow strategic design, which requires different skillsets, originating in different disciplines than those that fed UX in the past. The umbrella may still be called “UX” in 10 years, even as new people will shelter under it. (Seedream 4)


Conclusion: Designing the Future of Human–AI Teams

So, when do we keep the human in the loop, and when do we let the AI fly solo? The answer boils down to understanding who’s bringing what to the table. If the human’s main contribution is second-guessing a superior AI, then get rid of them and let the AI do its thing. If the AI’s outputs are strictly bounded by rules or data and miss the bigger picture, then by all means, keep a human in the mix to inject editorial judgment.


For UX designers and product leaders thinking about integrating AI, the takeaway is clear: don’t assume human+AI is automatically a win-win. Instead, study the task. If AI has reached expert-level accuracy, your design might lean toward full automation with perhaps an “alert a human if unsure” fallback. If AI is just an assistant in a creative or complex task, design the UX to amplify the human’s strengths (give users control, let them iterate with the AI, ensure they can override easily where they know better). In all cases, help users calibrate: show why the AI suggests something and how confident it is, so users can make informed choices on whether to go along with it.


Use AI for what it’s good at (crunching data, generating lots of options). Use humans for what they’re good at (judgment, context, and thinking outside the box). And don’t let either one try to do the other’s job. When we follow this principle, human–AI teams can be spectacularly effective; however, if we violate it, we often get awkward outcomes where the AI would have performed better without our involvement.


As humans we don’t want to be in the loop just to click “OK” on what the AI said. We want to be meaningful contributors. The next few years will challenge us to find those niches where our contribution is positive and to relinquish the ones where it’s not. With ever-smarter AI, that will be a shrinking island, but it’s also an opportunity to elevate our own work. Freed from rote tasks where AIs excel, humans can focus on higher-level problem solving, interpersonal interaction, and creative exploration: areas where, at least for now, we still have the home-field advantage.


In the end, the goal isn’t to pit humans against machines in a zero-sum game; it’s to allocate tasks optimally for the best overall outcomes. Sometimes that means teamwork, sometimes that means knowing when to get out of the way.


The ultimate value of a human in an AI-driven world is not to be a better calculator, a faster analyst, or even a more prolific artist. It is to be the arbiter of purpose, the setter of goals, and the guardian of values. Our job is to tell the AI what is worth doing. And, as the evidence increasingly shows, to have the wisdom to know when to get out of its way.



When AI performs nearly all current jobs, will the result be mass unemployment among humans? I think not, because many new needs will arise when the economy grows as immensely as it will with full AI deployment. Future jobs won’t be the current jobs, that’s for sure, but new jobs will likely arise. (GPT Image-1)

 
