By Jakob Nielsen

Ideation Is Free: AI Exhibits Strong Creativity, But AI-Human Co-Creation Is Better

Summary: The future belongs to the symbiants. 12 research studies confirm that AI’s creativity surpasses that of humans in sheer idea abundance. Yet co-creation performs even better, fusing AI-generated conceptions with human discernment and refinement.

Recent studies found that AI is more creative than almost all humans at generating a profusion of new and eclectic ideas. I described the first round of research in my article on August 9, and due to the astounding speed of AI developments, we already have more research that confirms those findings and adds intriguing details across myriad domains, from sales calls to sublime Japanese poetry:

  • Creative writing of short stories improved by around 8% in novelty when authors used generative AI for story ideas. [Study 4]

  • Less-creative writers (as determined by a separate creativity test) enjoyed a more extensive lift from AI than more-creative writers, so AI narrowed the skill gap. [Study 4]

  • Creative writers wanted to retain the initiative when writing with AI tools but happily used AI to fill in gaps. When they got stuck, the writers kept momentum by allowing the AI to push the project forward. Writers even found beneficial inspiration in the frequent mistakes and supposedly “useless” ideas generated by AI. [Study 5]

  • Co-writing with the AI worked best in an iterative manner, both in prompt refinement and by looping back to earlier stages of the creation process instead of proceeding linearly. For example, writers could loop back to ideation and ask AI for more ideas when stuck in the implementation step. [Study 5]

  • People generated 18% more ideas when getting inspiration from an AI-produced image than those study participants who were shown a traditional photograph for inspiration. The likely explanation is that the “oddness” in AI-generated art sparked broader associations than the conventional images did. [Study 6]

  • AI produced many more raw ideas than humans, and the originality of these ideas was rated higher (3.2 vs. 2.6 on a 5-point scale) when testing ChatGPT 4, whereas ChatGPT 3 demonstrated parity with humans. [Study 7]

  • GPT-4 scored better than 91% of humans on a standard creativity test, the Alternative Uses Test (AUT). Of course, 9% of humans are still more creative than current AI on this metric. [Study 7]

  • The rated beauty of AI-generated haikus was higher than that of poems composed by renowned human poets (4.56 vs. 4.15 on a 7-point scale) when the AI poems had been pre-selected by humans from a larger set generated by the AI. [Study 8]

  • Co-creating paintings with an AI image tool leveled the aesthetic playing field between artists and non-artists, even though the latter group would probably have painted terrible pictures without the AI (not measured in the research, alas). [Study 9]

  • Sales agents with AI assistance were 133% more successful than unaided agents in answering questions they were not previously trained for, which was considered a measure of agent creativity. Of utmost business importance, the AI-human symbiants closed 61% more sales than the unaided sales agents. [Study 11]

  • The gap between top- and bottom-ranked salespeople was narrowed by using an AI tool in the sales process. [Study 11]

  • Business ideas for improving the environment were rated the same quality, whether produced by humans or AI, but the human ideas received higher novelty scores. [Study 12]

In sum, AI is not confined to sterile algorithms and computational prowess. It is an evolutionary leap in creative capacity. It’s not mere machinery; AI is a dynamic enabler, a catalyst to unleash human imagination and discernment.

AI produces creative ideas in such abundance that our idea pool blossoms. Like ripe fruit on an overgrown tree of creativity, humans must judiciously harvest the finest ideas for refinement and implementation. (“Tree of Creativity” by Midjourney.)

Recap of “Old” Studies From 9 Days Ago

The current article should be considered a Part 2 that continues the analysis from my first article about AI Creativity, which I’ll refer to as Part 1 from now on. This also means that I will number the studies discussed below to continue the three studies covered in Part 1. To recap the conclusions from Part 1:

  • ChatGPT outperforms 99% of the human population in tests of how many different ideas it can produce and the originality of these ideas. [Study 1]

  • ChatGPT generated 7 times more top-rated product ideas than elite business school students. [Study 2]

  • AI is 40 times more efficient than humans regarding how quickly it produces ideas. [Study 2]

  • The only area where AI was rated slightly worse than humans was the novelty of the product ideas. [Study 2]

Same Conclusion Across Widely Varied Research

The most crucial point is that all 12 studies (both from Part 1 and the studies presented here) are in rough agreement: AI is more creative than humans, but humans have the edge in novelty. The more independent studies come to the same conclusion, the more robust that conclusion becomes, and the more we can trust that it is accurate rather than a spurious accident of statistical coincidence.

Remember that 1 out of 20 false hypotheses will be confirmed as “statistically significant” if a researcher’s only goal in life is to have a p-value below 5%. I’m not saying that 5% of Ph.D. degrees are necessarily bogus, though the replication crisis in many research fields suggests as much. Given that the base rate of false hypotheses is probably much higher than that of true (and new) hypotheses, more than 5% of published papers are likely false. This is why individual papers are less credible than a collection of papers with very similar conclusions.
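As a back-of-the-envelope illustration of this point (my hypothetical numbers, not from any study), a simple base-rate calculation shows how the share of false "significant" findings can far exceed 5%:

```python
# Hypothetical illustration: why more than 5% of published
# "statistically significant" results can be false when the base
# rate of true hypotheses is low. All numbers are assumptions.
hypotheses = 1000
true_rate = 0.10   # assume only 10% of tested hypotheses are true
power = 0.80       # chance a true effect reaches significance
alpha = 0.05       # chance a false hypothesis reaches significance

true_positives = hypotheses * true_rate * power           # 80
false_positives = hypotheses * (1 - true_rate) * alpha    # 45
false_share = false_positives / (true_positives + false_positives)
print(f"{false_share:.0%} of significant findings are false")  # prints 36%
```

With these made-up but plausible parameters, more than a third of the "confirmed" hypotheses would be false, which is why a collection of converging studies beats any single paper.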

Different scientists, labs, methods, and research protocols, widely varying domains, different specifics being measured, research on three continents; same conclusion. That’s impressive. It matters less that there is some variance in the exact numbers — this is to be expected when study details differ.

The individual research studies are discussed in more detail below. (Hat tip to Ethan Mollick for alerting me to Studies 4, 7, and 12.)

The Future Belongs to Symbiants: Human-AI Co-Creation

The most exciting conclusion across the 12 creativity studies is that many of them confirm two points that were also found in research on the productivity impact of AI use in business:

  • AI narrows the skill gap between the best human workers and less-skilled workers. It does so not by stifling high-talent people but by elevating those who perform poorly without AI assistance. This has now been empirically validated for both productivity and creativity: AI enhances all human creativity, but the less creative receive the most significant boost.

  • Humans and AI working together outshine either working solo. Again, this holds for both productivity and creativity.

Yes, AI generates more creative ideas than humans, and it does so faster and at less cost. AI’s capacity for churning out concepts rapidly and cost-effectively is unmatched. The abundance of ideas from AI has led to a new reality where ideation is free. For any creative effort, the more shots at goal, the more you score. Thus, more raw ideas are always good, because quantity becomes quality in creativity.

The more darts you throw at the dartboard, the more often you’ll hit the bullseye. (Dartboard by Midjourney.)

With greater idea volume comes the need to winnow the idea overload, allowing only the most promising ideas to proceed. In our synergy model, this selection filter is provided by the skilled judgment of experienced human UX professionals.

Augmentation, not replacement, is the winning strategy. AI should not supplant human creators but rather empower them to be more creative and productive than they could ever be alone — and better than AI itself could on its own.

This means that the creative work of the future will be done by symbiants: people intricately intertwined with AI, both sides providing pivotal contributions. AI will likely contribute the most during the divergence stage of the diverge-converge design model. Humans will then drive convergence, selecting the most promising ideas for implementation. But true symbiant creators will recognize the frequent need to alternate between both sides and to iterate, returning to earlier divergent phases rather than proceeding linearly, as might be implied by many of the “double-diamond” model images you see on the Internet.

The future belongs to symbiants who combine the AI idea-generation torrent with human judgment and taste. And no, you need not look like a character from an SF movie to become an augmented human-AI symbiant, but this is the best image your correspondent (also known as the Jakob-Midjourney symbiant) could produce.

Summary by Haiku

In honor of Study 8, I produced a haiku using their winning strategy to summarize this article, picking the best suggestion from ChatGPT and Claude (Claude won this time).

Endless ideas flow
Humans pick the finest fruits
Creation ascends

The rest of this article reports the details of the individual research studies. You can stop reading now if you only care about the conclusions already given above. But the research has many interesting specifics, so I recommend that you read on if you have time.

Study 4: AI-Assisted Humans Are More Creative Writers than Unaided Humans

Anil Doshi and Oliver Hauser from University College London and the University of Exeter in the UK conducted this study of writing literature, which is a prime example of raw creativity where anything goes. There are fewer constraints than when writing business documents (the domain in a study discussed in my review of the productivity lift from using AI in the writing process).

In this study, 293 participants wrote a short story of only 8 sentences. So we’re far from studying authors producing complete novels. But at the same time, ultra-short stories depend almost entirely on pure creativity for their impact since there’s so little room for leaning on other aspects of the writer’s craft, like plot twists and character development.

Three different writing conditions were tested, with about 1/3 of the study participants randomly assigned to each condition:

  • Unaided humans who write the old-school way, with no AI support

  • Humans who could use GPT-4 to generate one 3-sentence idea for their story

  • Humans who could use the same AI tool to generate up to 5 ideas

In the two AI-aided conditions, the human authors wrote the short story as they pleased, drawing as much or as little inspiration as they wanted from the AI-produced idea(s).

All the stories were then evaluated by a separate group of human judges (600 in total) who scored the stories on 3 novelty characteristics (whether the story was novel, original, and rare), 3 usefulness characteristics (whether the story was appropriate, feasible, and publishable), and emotional characteristics. Of course, the judges didn’t know which stories came from which study condition.

On a 1–9 scale, novelty was rated at 3.85 for the unaided humans, whereas the AI-aided writers scored 4.11. The difference was statistically significant (p<0.001).

On the same rating scale, the usefulness score was 5.02 for the unaided humans, compared to 5.34 for the writers who had AI assistance. Again, the difference was significant at p<0.001.

The judges didn’t give out super-high scores in this study, but the authors were members of the general public instead of professional writers. It would be interesting to see a similar study of stories written by published authors.

In any case, the rated quality of the stories was improved for those writers who received creative-writing assistance from AI. Furthermore, the scores increased more for writers who received more ideas from the AI tool.

Writers who could ask for a single AI-generated story idea scored 4.06 for novelty and 5.21 for usefulness. In contrast, writers who could ask for 5 AI ideas scored 4.16 for novelty and 5.48 for usefulness. (Both differences were significant at p<0.05.)

Since ideation is free with AI, this study shows that we should ask our AI tools to generate many ideas when collaborating with AI in a creative process.

The 1–9 rating scales used in this study are not ratio scales, so we can’t truly compute the percentage gain from using AI. (It’s not the case that a story rated 4 is twice as good as a story rated 2.) Even so, we can get a rough estimate of the magnitude of the gain from AI by calculating percentages anyway. Just don’t take these numbers as anything more than a rough indication of the right neighborhood.

With these caveats in mind, the lift in story novelty was 8.1% for using 5 AI-generated ideas and 5.5% for using a single AI-generated idea. These are small gains compared to what we usually see when testing the impact of using AI to aid knowledge workers.
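For transparency, the rough percentage arithmetic behind those lift figures (treating the interval scale as a ratio scale, per the caveat above) looks like this:

```python
# Rough percentage lift in novelty ratings from Study 4, treating
# the 1-9 interval scale as if it were a ratio scale (a caveat
# discussed in the text).
unaided = 3.85      # mean novelty score, no AI ideas
one_idea = 4.06     # mean novelty score, one AI-generated idea
five_ideas = 4.16   # mean novelty score, up to five AI-generated ideas

def lift(aided: float, baseline: float) -> float:
    """Percentage gain of the aided condition over the baseline."""
    return (aided - baseline) / baseline * 100

print(f"{lift(five_ideas, unaided):.1f}%")  # prints 8.1%
print(f"{lift(one_idea, unaided):.1f}%")    # prints 5.5%
```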

Study participants completed a short Divergent Association Task (DAT) before writing their stories. This test is a fast and straightforward creativity test where people have to provide 10 words that are as different from each other as possible. When comparing participants with low and high DAT scores, the study found that using AI-provided story ideas led to the biggest improvements for the participants with low DAT scores. In other words, inherently less creative people gain more from working with AI in their writing. This is consistent with previous research that found that AI narrows skill gaps.

Study 5: In Creative Co-Writing, Humans Retain Most Initiative but Allow AI to Take Over When Stuck

Qian Wan and colleagues from the City University of Hong Kong studied human-AI co-creation in two content-creation domains: story writing (as in Study 4) and slogan writing. The participants were 15 Chinese-speaking students in creative fields (e.g., creative media, art & design, or literature). Thus, in contrast to Study 4, which tested regular folks, these participants could be expected to exhibit above-average creativity. The participants worked with a version of GPT-3.

This was a qualitative study, which can yield deeper insights into user behaviors than can be gleaned from the numbers produced in quant research. I also like that the participants wrote in Chinese instead of English, which provides an additional dimension for judging the validity of the AI creativity research I’m analyzing. As I’ve said before, the more studies differ along various dimensions, the more we can believe in the general applicability of the conclusions if all the results roughly agree despite those differences.

This study paid particular attention to the prewriting stage, where writers manage story ideas and map out what they will write. The participants were asked to think aloud while developing their writing ideas.

The four main conclusions from this study were:

  • AI was perceived as helpful throughout 3 distinct phases of the writing process: initial ideation, illumination (where vague ideas are made more concrete), and implementation (writing the final text). In this study, it was not the case that writers only appreciated AI for its initial profusion of raw ideas. During the illumination stage, participants were impressed with AI’s ability to transform vague prompts with fragmented concepts into concrete and fascinating ideas.

  • The writing process was usually iterative, with writers returning to the AI with revised prompts for more concrete ideas. When encountering a block, participants would also revert to earlier stages of the writing process rather than treating the 3 phases as a linear progression. For example, a writer might return to raw ideation even after having advanced to the illumination stage with the ideas from a first ideation attempt.

  • In addition to explicit ideation rounds, participants found inspiration in the unexpected results and randomness of the AI output. Indeed, even failures, such as useless ideas, were cherished as seeds of inspiration. Since current AI tools are notorious for frequently going off track, this is a happy finding and a reason to caution the providers of AI tools against tightening their guardrails and confining AI within the boundaries of orthodox thinking.

  • The writers preferred to retain the initiative and control the writing process, mostly asking AI for elaborations or filling in their writing gaps. Only when they encountered writer’s block would they let the AI take the initiative and generate ideas from scratch. (On a personal note, this is also my preferred process.) AI was granted the initiative during initial ideation since participants prioritized its ability to generate many different ideas. For the other two stages (illumination and implementation), the writers assigned AI to play more of an assistive role.

For example, one horror story author found inspiration in AI's whimsical phrase “demonic dough,” envisioning a baking scene gone awry.

One participant was writing about health-related issues in heart-rate monitoring and wanted to include fireflies, which is not an obvious connection. She prompted the AI and was rewarded with a scenario that would work. This participant commented, “It was a crazy idea that didn’t seem to make sense, but the AI makes it work.”

An important observation is that the writers did not expect perfect results from a given prompt. Instead, AI was used to avoid writer’s block or fixation, where imperfect ideas, unexpected results, and the sheer randomness of AI output all served as inspiration. This willingness to treat AI as a partner with humans rather than expecting it to do everything on its own seems critical to successful creative applications of the technology.

The study illustrates AI’s role as a valuable collaborator in the creative writing process. Whether augmenting ideas, breaking through writer’s block, or making implausible concepts viable, AI’s contributions are tangible. It is a tool that enriches the creative process without stifling human initiative, a partnership that respects human control while leveraging AI’s boundless ideation.

Study 6: Weirdness of AI-Generated Images Sparks Extra Creativity in Humans

Jennifer Haase and colleagues from the Humboldt University of Berlin, Germany, and the Weizenbaum Institute in Germany are behind Study 6. These researchers tried a different approach to the collaboration between humans and AI by studying the effect of switching modality. Whereas Studies 4 and 5 employed AI to generate text to help writers write, this study used AI to generate images to help humans create ideas. The AI image tool used was DALL-E-2.

The study had 298 participants, with about a third in each of 3 conditions:

  1. Humans produced ideas, with no images to inspire the person.

  2. Humans produced ideas, with a traditional photograph provided for inspiration.

  3. Humans produced ideas, with an AI-generated image for inspiration.

The participants were a middle ground between the general public tested in Study 4 and the creative media specialists tested in Study 5. Study 6 participants had art as a hobby and reported working in Arts, Design, Entertainment, or Recreation.

In all conditions, participants were given the same task: to write down as many alternate uses as possible in 3 minutes for each of 5 objects (ball, fork, pants, tire, and toothbrush). This is known as the Alternative Uses Task (AUT) and has been used in much prior research to measure creativity. The task produces two scores: fluency (the sheer number of alternate uses listed) and originality. Ideally, you want both scores to be high.

It’s known from prior research that visual stimuli can spark creativity by triggering new associations. This study confirmed that effect for the originality score, which was higher in both image conditions than in the no-image condition. For fluency (the sheer number of ideas), participants who saw AI-generated images outperformed those in the no-image condition, while, surprisingly, participants in the traditional-image condition performed worst of all.

The number of ideas generated (the fluency score) was as follows:

  • No images provided: 8.12

  • Traditional image provided: 7.37

  • AI-generated image provided: 8.66

The differences between conditions were significant at the p<0.05 level.

The originality of the ideas was rated on a 1–5 scale, giving the following results:

  • No images provided: 2.67

  • Traditional image provided: 2.78

  • AI-generated image provided: 2.80

Here, the difference between the two image types was not significant, but the difference between participants with and without images was significant at the p<0.05 level.

Anybody who has used an AI image-generation tool knows that the results often have some weirdness. This study shows that this bug can be turned into a feature: people who see somewhat strange images seem to become more creative.

This study underscores the potential of AI in sparking extra creativity through visual stimuli, which adds a new dimension to our understanding of how AI can be harnessed to foster human creativity across multiple modalities.

This last finding resonates with my workflow as a purely subjective observation because I often ask AI to generate images from a very broad prompt before I start writing. I can’t measure how much this may inspire me, but I like the process. Here are the 4 images generated by Midjourney from the prompt “a fork.” (This is my screenshot, not imagery from the study.) In the study, each participant was only shown one image, but since ideation is free when using AI, we could easily show people a broader set of images, which would probably spark even more creativity. In this case, the upper right image of a lake is entirely off the wall, but just like the participants in Study 5, a human might turn this weirdness into something valuable and creative.

Study 7: ChatGPT 4 Much More Creative than Both ChatGPT 3 and Average Humans

This study was also conducted by Jennifer Haase from the Humboldt University in Berlin, this time together with Paul Hanel from the University of Essex in the UK. This study also employed the Alternative Uses Test (AUT) as the research instrument. This time, 100 humans were pitted against various AI models in another ideation competition. As a reminder, the dependent variables are fluency (the number of ideas) and originality (how different the ideas are).

The originality scores were as follows, on a 1–5 scale:

  • Humans: 2.6

  • ChatGPT 3: 2.7

  • ChatGPT 4: 3.2

We can’t compute a percentage lift from this 1–5 scale used in the study, but it’s clear that ChatGPT 4 was much better than both the humans and ChatGPT 3. In one short year, AI creativity advanced from parity with humans to clear superiority.

In terms of fluency, the paper doesn’t provide details but says that AI came up with 2–3 times more ideas than humans.

Bottom line, GPT-4 scored better on the AUT than 91% of the humans, which confirms previous findings from Study 1 of AI’s superior creativity as measured on traditional tests. (In Study 1, AI scored better than 99% of humans on a more extensive creativity test.)

Study 8: AI Haikus Top Human Poets After Curation

Jimpei Hitsuwari and colleagues at Kyoto University in Japan conducted this study. 385 participants judged 80 haiku poems for beauty. Half were composed by Japanese poetry masters, including Kobayashi Issa (1763-1828) and Takahama Kyoshi (1874-1959), and half were written by an AI that specializes in haiku generation called “AI Issa-kun.” Of the 40 AI-generated haikus, half came straight from the machine, whereas three humans selected the other half from a more extensive set.

On a rating scale from 1–7, the beauty of the poems was rated as follows:

  • Human-composed haikus: 4.15

  • AI-generated haikus, straight from the computer: 4.14

  • AI-generated haikus, winnowed by humans: 4.56

Participants were asked if a human or AI created each haiku, and the results showed that they could not distinguish between them.

This study further confirms the benefits of human-AI collaboration. In particular, it confirms the value of winnowing, where we combine the unlimited idea production of the AI with the judgment of experienced humans to select the best option.

This is similar to the function of the editor of a poetry journal, who will also attempt to pick the best poems to publish from the slush pile. The difference is that the human poets will have sweated over the many rejected poems, leading to a substantial waste of human lifespan. In contrast, ideation is free with AI, so one can order up any number of haikus from the computer without spending more than a few cents.

Coincidentally, the winning approach in this study is also how I occasionally get haikus for my articles. (The study rated true haikus in Japanese, whereas I sadly have to make do with English versions.) I ask ChatGPT 4 to compose 5-7 haikus from my description of the topic at hand. And then, I rely on my judgment to pick the best haiku to inflict on readers.

Study 9: Paintings Co-Created by Humans and AI Closed the Skill Gap

Yanru Lyu and colleagues from Beijing Technology and Business University in China and several other affiliations conducted this study. These researchers had six artists and six non-artists co-create images of oil paintings with the AI image tool Midjourney. (The participants’ deliverables were images of oil paintings, not actual paintings, which the non-artists would probably have been unable to paint.)

The resulting 12 paintings were then scored for aesthetic experience on a 1–5 scale by 42 judges with professional experience in painting or art research. The resulting scores were virtually identical, at 3.43 points for the artists and 3.45 points for the non-artists.

Most research finds that AI use narrows the skill gap between the people who would have been high or low performers without AI assistance. But this study didn’t just narrow the gap between artists and non-artists; it eradicated it.

In fact, on a “sweetness” score, the non-artists were the high performers, scoring 3.54 vs. 3.43 for the artists (significant at p<0.001). The researchers don’t explain this result, but my guess is that professional artists tend to like edgy work, whereas non-artists often prefer more comforting artwork. So when AI equalizes the ability to execute a vision, the non-artists end up creating paintings with more immediate sweetness appeal.

Another story is whether edgy or sweet work will stand up better to the test of time. A favorite painting in my art collection is somewhat melancholy, and yet I’ve hung it in the dining room together with a shipwreck (another gloomy subject for a painting) and two winter landscapes — plus a happy summer scene. Of course, my personal preferences are neither here nor there. You ≠ User, as we say in the UX business, and it’s even more true that I ≠ User.

Study 10: People Preferred AI-Generated Images Over Human-Created Images

This study was conducted by Andrei Daniel Niculae from the Bucharest University of Economic Studies in Romania. He analyzed 417 ratings of images that had been created either by AI or by a human. For two images of mountains, the AI-generated image was preferred by 86% of respondents, whereas the human-generated image was preferred by 14%. For two images of idyllic villages, the AI-generated one was preferred by 69%, and the human-generated one was preferred by 31%. For two images of a forested landscape, the AI-generated image was preferred by 78%, whereas the human-generated image was preferred by 22%.

This small case study of tourism-related imagery doesn’t lend itself to firm, general conclusions. But it adds another piece to the puzzle we’re assembling about the value of AI as a creative tool.

Study 11: AI Narrowed the Skill Gap Between Sales Agents, and Sales Increased

Turning from the gentle world of art to the hardnosed world of sales, Nan Jia from the University of Southern California and colleagues tested whether an AI aide would help sales agents sell more when placing telemarketing calls. Indeed it did: agents with AI assistance on sales calls closed 61% more sales than the unaided agents. That’s the bottom line and the only thing a sales manager would need to know to sign on the bottom line to buy the AI tool for his or her sales staff. (The difference was significant at p<0.05.)

This study was conducted under field conditions (i.e., while reps placed actual sales calls) in a “large telemarketing company in Asia.” I always relish data from actual business use: much as lab-collected data is better than no data, there’s always that nagging doubt about how well the lab translates to the real world. The authors analyzed voice recordings of 40 sales agents as they placed 3,144 sales calls.

A second measure in the study was the sales agents’ ability to answer prospects’ questions outside the material provided in training. The authors consider this to be a measure of salespeople’s creativity. Here too, using an AI tool during the sales call helped because agents with AI were able to answer 133% more novel questions than the unaided agents. (The difference was significant at p<0.05.)

The ability to answer novel questions was also analyzed separately for the sales teams’ best and worst agents, as differentiated by sales during the previous month. Top agents without AI scored 5, and top agents with AI scored 11, for a lift of 6. In contrast, bottom agents without AI scored 1, and bottom agents with AI scored 3, for a lift of only 2.

Without AI, top agents performed 5.0 times better than bottom agents at answering questions, but with AI assistance, the top agents were only 3.7 times better. I count this as narrowing the skills gap between good and bad sales staff when using AI. The paper’s authors have the opposite interpretation of their data because the absolute lift in performance between the two groups was the largest for the best agents.

Both are valid ways of presenting the raw data, but relative performance is the way to measure skill gaps. How much better are the top people than the poor performers? Among other things, that’s an indicator of how much you should pay for employees at different skill levels. If good salespeople are 5 times better, you want to pay them a lot. And if they’re only 3.7 times better, they still deserve more money, but not as much more. Thus, adding the AI support tool to the sales process should narrow the compensation gap between the top and bottom agents. Another way of analyzing the results is that bottom agents performed 200% better with AI assistance, whereas top agents only improved their performance by 120%. Again, to me, this says that AI was of more help to the poor performers.
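To make the two readings of the data concrete, here is the arithmetic behind both interpretations, using the question-answering scores reported above:

```python
# Study 11: novel questions answered by top vs. bottom sales agents,
# with and without AI assistance (scores as reported above).
top_no_ai, top_ai = 5, 11
bot_no_ai, bot_ai = 1, 3

# Relative skill gap: how many times better are top agents?
gap_without_ai = top_no_ai / bot_no_ai  # 5.0x
gap_with_ai = top_ai / bot_ai           # ~3.7x, so the gap narrows

# Absolute lift (the paper's preferred reading): top agents gain more.
top_abs_lift = top_ai - top_no_ai  # +6
bot_abs_lift = bot_ai - bot_no_ai  # +2

# Relative improvement within each group: bottom agents gain more.
top_pct_lift = (top_ai - top_no_ai) / top_no_ai * 100  # 120%
bot_pct_lift = (bot_ai - bot_no_ai) / bot_no_ai * 100  # 200%
```

Both calculations use the same four numbers; the disagreement is purely about whether absolute or relative change is the right yardstick for a skill gap.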

Study 12: Humans Generated More Novel Ideas, But Overall Idea Quality Was the Same from AI

This study was conducted by Léonard Boussioux from MIT and three colleagues from Harvard Business School. Their domain was the generation of business ideas for reusing, recycling, or sharing products. Two groups of ideas were collected: first, a crowdsourced set of 125 submissions from an Internet contest to produce good business ideas for the problem. Anybody could volunteer a proposal for this worthy cause, and participants were also incentivized by a prize of $1,000 for the best proposal. A second set of 730 ideas was generated by GPT-4.

Why so many more AI ideas? Because ideation is free with AI, the researchers took what they could produce in 5.5 hours. (This is admittedly not genuinely free, even if they probably tasked a graduate student with feeding the prompts to GPT-4. The authors calculated that the direct cost of the inference compute was around $27.) In contrast, the 125 human-generated ideas were all that a $1K prize could coax out of Internet users for this challenge to improve the environment.

This is admittedly an odd comparison and not a controlled experiment. But the two conditions are probably realistic for what one can get from a medium-scale effort to collect ideas from humans and AI, respectively.

A subset of the available solutions was randomly selected for evaluation: 180 AI-generated and 54 human-generated ideas. The authors employed 145 evaluators to each judge 13 solutions, for a total of 1,885 judgments. The evaluators scored the proposed solutions on 5 criteria:

  • Novelty (how different is it from existing solutions)

  • Feasibility and scalability (how likely is it to succeed and how well does it scale up)

  • Environmental impact (how much does it benefit the planet)

  • Financial impact (what financial value can it create for businesses)

  • Overall quality (based on the 4 criteria, what is the overall quality of the solution)

The human-generated solutions received slightly higher novelty scores (3.54 on a 1-5 scale) than the AI solutions (3.15), with the difference being significant at the p<0.05 level. Feasibility, value, and quality ratings were all about the same for AI and humans. (The environmental impact score is not analyzed in the paper.)

The finding that humans received higher novelty scores is similar to that of Study 2 (discussed in my Part 1 article about AI creativity). However, whereas Study 2 reported a slightly higher value for AI-generated business ideas, Study 12 found that AI and humans were rated the same.


Together, the 12 studies resoundingly demonstrate:

  • AI sets a new high-water mark for imagination, ideation, and creativity across diverse domains.

  • But AI's true power emerges when coupled with human judgment and discernment.

  • This symbiotic synergy outshines either humans or AI working in isolation.

  • AI also democratizes creativity by uplifting amateurs closer to expert levels.

The mandate is clear — augment, don’t replace. The future belongs to human-AI collaboration. Creativity, productivity, and innovation will surge to new heights through this partnership.

But we should keep humans firmly at the wheel. AI now provides plentiful rocket fuel, but we must steer the turbocharged spaceship.


  • [Study 4] Anil R. Doshi and Oliver Hauser (2023): “Generative artificial intelligence enhances creativity,” August 8, 2023. Available at SSRN.

  • [Study 5] Qian Wan, Si-Yuan Hu, Yu Zhang, Pi-Hui Wang, Bo Wen, and Zhicong Lu (2023): “‘It Felt Like Having a Second Mind’: Investigating Human-AI Co-creativity in Prewriting with Large Language Models,” arXiv:2307.10811, DOI: 10.48550/arXiv.2307.10811.

  • [Study 6] Jennifer Haase, Djordje Djurica, and Jan Mendling (2023): “The Art of Inspiring Creativity: Exploring the Unique Impact of AI-generated Images,” AMCIS 2023 Proceedings, 10.

  • [Study 7] Jennifer Haase and Paul H. P. Hanel (2023): “Artificial Muses: Generative Artificial Intelligence Chatbots Have Risen to Human-Level Creativity,” arXiv, March 21, 2023.

  • [Study 8] Jimpei Hitsuwari, Yoshiyuki Ueda, Woojin Yun, and Michio Nomura (2023): “Does human–AI collaboration lead to more creative art? Aesthetic evaluation of human-made and AI-generated haiku poetry,” Computers in Human Behavior vol. 139, February 2023, Article 107502. DOI: 10.1016/j.chb.2022.107502

  • [Study 9] Yanru Lyu, Xinxin Wang, Rungtai Lin, and Jun Wu (2022): “Communication in Human–AI Co-Creation: Perceptual Analysis of Paintings Generated by Text-to-Image System,” Applied Sciences 2022, 12(22), 11312.

  • [Study 10] Andrei Daniel Niculae (2023): “Business Use: Is AI Surpassing Human Creativity?” Cactus Tourism Journal, Vol. 5, No. 1, 2023 New Series, pages 53-63.

  • [Study 11] Nan Jia, Xueming Luo, Zheng Fang, and Chengcheng Lia (2023): “When and How Artificial Intelligence Augments Employee Creativity,” Academy of Management Journal, March 2023.

  • [Study 12] Léonard Boussioux, Jacqueline N. Lane, Miaomiao Zhang, Vladimir Jacimovic, and Karim R. Lakhani (2023). “The Crowdless Future? How Generative AI Is Shaping the Future of Human Crowdsourcing” Harvard Business School Technology & Operations Mgt. Unit Working Paper No. 24-005, August 10, 2023.
