Summary: The new Microsoft Copilot increases the productivity of office tasks by around 40%, with even higher gains seen for search tasks. These numbers are within the range of previous research with other AI products.
Alexia Cambon, Brent Hecht, and a crowded field of 20 additional researchers from Microsoft published a paper this month about their productivity research with Microsoft’s AI Copilot for Office [Cambon et al. 2023]. Sorry, strike “Office.” The product is now officially named “Microsoft 365.”
The authors have impressive degrees and pedigrees, but one should always take it with a grain of salt when a company sponsors and publishes research on its own products.
That said, I don’t suspect the authors of falsifying data, so I believe the reported numbers and findings. The problem is more that if the data had turned out to be unfavorable to Microsoft, it would likely not have been published. There’s a distinct publication bias in favor of positive research findings. But that’s true for traditional academic journals as well, which tend to publish findings that (a) are interesting and (b) align with the editors’ agendas. Any professor or graduate student who conducts a study comparing conditions A and B and concludes that there is no difference will have a hard time getting that paper published in a prestigious journal. A big difference? Bang, it’s in print. Just as much of a publication bias, when you think about it.
Microsoft has a series of research studies on AI and productivity, with more papers worth reading.
Productivity of AI Use in Office Work
Five Microsoft studies are of interest for evaluating the impact of AI on traditional office work in business:
Information Retrieval Study: 163 external participants were given a simulated computer environment with files, email inboxes, and calendars. They then had to answer questions about information in those files, email messages, and calendar appointments. Half used AI assistance, and half did not. The AI users had 36% higher productivity than the non-AI users.
Common Office Tasks: For this study, the researchers recruited 147 external participants and asked them to perform everyday office tasks: email information retrieval, intranet/SharePoint information retrieval, content creation, and meeting summarization. Participants were randomly assigned to perform the tasks with the Copilot or without AI assistance. AI increased productivity by 42%.
Search [Spatharioti et al. 2023]: 90 users from Mechanical Turk searched for information about delivery vehicles using a traditional search engine or GPT 3.5. The AI users had 112% higher productivity than the search users.
Summarize a Missed Meeting: In this study, 57 Microsoft employees were given a recording and the transcript of a 35-minute-long meeting with four participants (recorded on the Teams platform). 33 participants were also given access to Copilot, whereas 24 participants performed the task without Copilot. The task was to write a summary of the meeting, which simulates the scenario that the study participants had missed a meeting and had to catch up on it. This was the study with the biggest performance gains (282%), but also a fairly unrealistic scenario in my view. Employees who miss a group meeting are much more likely to ask a colleague what happened than watch a meeting recording or read the transcript.
In all these cases, I am reporting the productivity increase, measured as the number of tasks a user can perform within a given time, such as an hour or a workday. Many of the Microsoft papers report the saved time on task instead. Still, it’s a trivial matter to convert between the two metrics. I find employee productivity to be the more important way of thinking about workplace performance from a business perspective.
(For example, if a task takes 6 minutes without AI and 5 minutes with AI, the AI users have saved one minute, or 17% of the time spent by non-AI users. The productivity perspective is that the non-AI users can perform 10 tasks per hour, whereas the AI users can perform 12 tasks per hour, for a productivity gain of 20%.)
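The conversion between the two metrics can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this article (they do not come from the Microsoft papers), and the sketch assumes productivity simply means tasks completed per unit of time.

```python
def productivity_gain_from_time_saved(time_saved: float) -> float:
    """Convert a fractional time saving per task (0.25 = 25% less time)
    into a fractional productivity gain (more tasks per unit of time)."""
    if not 0 <= time_saved < 1:
        raise ValueError("time_saved must be in the range [0, 1)")
    return 1 / (1 - time_saved) - 1


def time_saved_from_productivity_gain(gain: float) -> float:
    """Convert a fractional productivity gain (0.20 = 20% more tasks per hour)
    into the fractional time saved per task."""
    if gain < 0:
        raise ValueError("gain must be non-negative")
    return 1 - 1 / (1 + gain)


# The example from the text: 6 minutes without AI, 5 minutes with AI.
saved = (6 - 5) / 6
print(f"time saved: {saved:.0%}")                                             # 17%
print(f"productivity gain: {productivity_gain_from_time_saved(saved):.0%}")  # 20%
```

Note that the two percentages drift further apart as the effect grows: halving the time on task doubles productivity, which is a 50% time saving but a 100% productivity gain.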
AI adds a dash of superpower to office workers, vastly increasing their productivity, according to the new Microsoft research (and according to other studies, as outlined below). Super-Office-Worker by Dall-E.
A fifth study in the Microsoft research didn’t measure productivity but user perceptions. 297 users who had used Copilot as part of an early access program completed a survey about their experience. 73% of respondents estimated that Copilot helped them complete tasks faster, with the average estimated time saving being 1.2 hours per week. We can compare this with independent self-reported estimates from the advertising agency Dentsu, where employees estimated time savings of 7.5 hours per week using Copilot. Self-reported estimates are notoriously unreliable, so we should not read too much into these numbers. If anything, this small amount of data indicates that the Microsoft numbers do not seem inflated compared with independent outside estimates.
How do the estimated productivity gains of 36%–282% from using Microsoft Copilot square with employees only saving 1.2–7.5 hours per week? Even at the lowest productivity percentage, 36% of a 40-hour workweek would be 14 hours. The easy answer is that users only realize the productivity gain from any software tool while using it. Employees perform many other tasks during a work week where they (currently) don’t have AI tools. For example, sitting in a meeting (as opposed to summarizing the recording of a meeting) will consume the full time of that meeting with zero productivity gains even if you have an AI tool installed on your computer.
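For a rough feel of how the two sets of numbers fit together, the sketch below backs out how many hours of a week's work would have to be AI-supported to produce the self-reported savings. It rests on assumptions that the studies do not establish: a uniform 36% productivity gain across all AI-supported tasks, and self-reported savings taken at face value. The function name is hypothetical.

```python
def baseline_hours_supported(hours_saved: float, gain: float) -> float:
    """Hours of work (measured at the no-AI pace) that would need AI support
    to save `hours_saved`, given a fractional productivity gain."""
    # A productivity gain of g shortens each task by the fraction 1 - 1/(1 + g).
    time_saved_fraction = 1 - 1 / (1 + gain)
    return hours_saved / time_saved_fraction


for hours_saved in (1.2, 7.5):
    hours = baseline_hours_supported(hours_saved, gain=0.36)
    print(
        f"{hours_saved} h saved per week -> about {hours:.1f} h of AI-supportable "
        f"work ({hours / 40:.0%} of a 40-hour week)"
    )
```

Under these assumptions, the 1.2-hour estimate corresponds to roughly 4.5 hours of AI-supportable work per week, and the 7.5-hour estimate to roughly 28 hours. Since self-reported estimates are unreliable, treat these as order-of-magnitude figures only.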
Microsoft’s Productivity Numbers Seem Credible
In the beginning, I mentioned that we should take Microsoft’s research on its own AI tool with a grain of salt. However, I think a very tiny pinch of salt will suffice in this case, because the reported numbers are highly credible and fall within the range of results reported in independent studies of non-Microsoft AI tools.
The following chart compares the new Microsoft research with 3 such previous studies:
Elite management consultants at the Boston Consulting Group analyzing business problems and writing client deliverables. The reported productivity gains were 33%. Quite comparable to the 36%–42% gains reported in the Microsoft research for information retrieval and common office tasks.
Business professionals writing common business documents in an MIT study. This study found productivity gains of 59%. Again, comparable to the 36%–42% gains reported in the Microsoft research for information retrieval and common office tasks. In fact, the Microsoft numbers sit squarely in the middle between the BCG and MIT numbers. Certainly not a warning signal of any inflated findings.
Users answering questions with AI or Google search in a Hong Kong study. In this research, the AI users had a productivity gain of 158% compared with the search users. This is higher than the 112% productivity gain in the Microsoft search study, but of roughly the same magnitude.
The only Microsoft study for which I don’t have a comparison is the summarization of a missed meeting, where AI made employees 282% more productive than employees who only had the recording and transcript of the meeting. But as mentioned before, I don’t think this is a realistic scenario anyway, and the 282% productivity gain for meeting summarization will not be a common phenomenon in normal business settings.
Productivity gains from using AI for various business tasks. Data from the new Microsoft research (solid bars) and from previous research reported in the linked articles (striped bars). We see that except for the summarization of a missed meeting, the Microsoft findings are highly comparable to those of independent studies of non-Microsoft AI. This increases the credibility of Microsoft’s data.
Work Quality Unchanged with AI
In the Microsoft studies, the quality of the work was the same whether or not users were helped by the Copilot. Considering that users were much faster to complete their tasks with AI, one might have feared that quality would suffer, but this was not the case.
Many previous studies have actually found a substantial quality increase when using AI. It is not clear to me why Microsoft Copilot simply produced the same quality and not better quality. I guess this is one of those famous cases where more research is needed.
Non-Office Findings from the Microsoft Research
The Microsoft paper presents 8 studies, but three of these are different than the others, which is why I did not analyze them in this article:
A study of the productivity of software developers using the GitHub Copilot. Despite having the same name, this programming tool is very different than the office worker tool used in the remaining studies. I covered the GitHub Copilot study in an earlier article: It found very impressive productivity gains of 126% when developing code.
A study of enterprise security operations, using the M365 Defender Security Copilot [Edelman et al. 2023]. Again, the name is the same, but the security product is a different tool for a different task. Productivity increased by 35% (p<0.01) when users without security expertise answered questions about various security incidents using AI vs. performing the same tasks without AI assistance. The users with AI assistance got 44% more questions correct.
A study of the quality of email messages rewritten by AI [Edelman and Ngwe 2023]. A number of email messages were rewritten by the Outlook Copilot. Across 62 human judges, the AI-written emails were rated as 18% clearer and 19% more concise. Comparing the human-written and AI-rewritten versions of the same email, judges preferred the AI rewrite 64% of the time. (p<0.05 for all three metrics.)
Office Worker Productivity Gains from AI High, But Will Get Higher
These findings show that the productivity gains from using AI to assist with office work are in the range of 33% (BCG) to 59% (MIT), with a reasonable single estimate being around 40%. Specialized tasks for which AI is particularly suited can realize stupendous gains of more than 100% (search) and 250% (meeting summarization).
These are the gains while performing the tasks that current AI is able to help with. Across a full workweek, the productivity gains would be much smaller since employees spend much of their time on other tasks that currently do not have AI support.
Future productivity gains will be much larger, for several reasons:
The current AI tools are only the first generation of employing AI to support business tasks. Future AI will be better than current AI, following Jakob’s First Law of AI (“Today’s AI is the worst we'll ever have”). But maybe more importantly, future AI applications will be optimized for individual business situations and vertical industry needs, which will make them much better than generic tools at supporting employees in those businesses.
More AI tools will become available, extending AI support to a wider range of business tasks and a bigger share of the workweek.
Users will get better at using AI. Everybody is currently a novice user when it comes to AI tools, and we know that the learning curve dictates better performance with more experience.
The tasks themselves will change, as businesses reorganize the way they perform work to account for the effect of AI tools. This is known as the task-artifact cycle, which means that when the artifact (tool) changes, that makes users change the way they perform their tasks so that they can make better use of the new capabilities of the new tool. And once users have changed their tasks, we should redesign the tool to better support the new task. But this new tool will lead to even more changes in the tasks, and so on it goes in a never-ending cycle.
Organizations will change to reflect the new way work is done. Right now, we’re just measuring the productivity gains for individual employees who are doing the same work as before. But a new organizational structure and a new way of doing work can be much more efficient once we redesign corporations to be AI-driven. This last change will be the slowest, probably taking more than a decade, since organizations and upper management are famously resistant to change.
In the task-artifact cycle, we first invent a new tool (the artifact) to support the way users currently work (the task). But the new artifact changes the task. Then we have to redesign the tool for the new way of working, and the cycle continues forever, hopefully leading to ever-higher productivity caused by better tools and optimized workflows. (Dall-E)
References
Alexia Cambon, Brent Hecht, Ben Edelman, Donald Ngwe, Sonia Jaffe, Amy Heger, Mihaela Vorvoreanu, Sida Peng, Jake Hofman, Alex Farach, Margarita Bermejo-Cano, Eric Knudsen, James Bono, Hardik Sanghavi, Sofia Spatharioti, David Rothschild, Daniel G. Goldstein, Eirini Kalliamvakou, Peter Cihon, Mert Demirer, Michael Schwarz, and Jaime Teevan (2023): “Early LLM-based Tools for Enterprise Information Workers Likely Provide Meaningful Boosts to Productivity” Published by Microsoft December 5, 2023: accessed at https://www.microsoft.com/en-us/research/uploads/prod/2023/12/AI-and-Productivity-Report-First-Edition.pdf (warning: PDF file).
Benjamin G. Edelman and Donald Ngwe (2023): “Sound Like Me: Findings from a Randomized Experiment” (November 29, 2023). Available at SSRN: https://ssrn.com/abstract=4648689 or http://dx.doi.org/10.2139/ssrn.4648689
Benjamin G. Edelman, James Bono, Sida Peng, Roberto Rodriguez, and Sandra Ho (2023): “Randomized Controlled Trial for Microsoft Security Copilot” (November 29, 2023). Available at SSRN: https://ssrn.com/abstract=4648700 or http://dx.doi.org/10.2139/ssrn.4648700
Sofia Eleni Spatharioti, David M. Rothschild, Daniel G. Goldstein, and Jake M. Hofman (2023): “Comparing Traditional and LLM-based Search for Consumer Choice: A Randomized Experiment” ArXiv November 8, 2023, https://arxiv.org/abs/2307.03744