Impact of Request Formats on Effort Estimation: Are LLMs Different than Humans?
Abstract
Expert judgment is the dominant strategy for software development effort estimation. Yet expert-based judgment can yield over-optimistic effort estimates, leading to poor budget planning and to cost and time overruns. Large Language Models (LLMs) are good candidates to assist software professionals with effort estimation. However, leveraging them effectively for software development effort estimation requires a thorough investigation of their limitations and of the extent to which those limitations overlap with those of (human) software practitioners.
One primary limitation of LLMs is the sensitivity of their responses to prompt changes. Similarly, empirical studies have shown that changes in the request format (e.g., rephrasing) can affect (human) software professionals’ effort estimates. In this paper, we replicated a series of experiments originally conducted with (human) software professionals to examine how LLMs’ effort estimates change when transitioning from the traditional request format (i.e., “How much effort is required to complete X?”) to the alternative request format (i.e., “How much can be completed in Y work hours?”). Our experiments involved three different LLMs (GPT-4, Gemini 1.5 Pro, Llama 3.1) and 88 software project specifications per treatment in each experiment, resulting in 880 prompts in total, which we prepared using 704 user stories from 3 open-source projects (Hyperledger Fabric, Mulesoft Mule, Spring XD).
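To make the contrast between the two request formats concrete, the sketch below shows one plausible way such prompt variants could be assembled from a user story. The template wording, the user story, the 40-hour anchor, and the query_llm stub are illustrative assumptions, not the study's actual experimental materials.

```python
# Illustrative sketch only: prompt wording, user story, anchor value, and the
# query_llm stub are hypothetical and do not reproduce the study's materials.

TRADITIONAL_TEMPLATE = (
    "Consider the following user story:\n{story}\n"
    "How much effort (in work hours) is required to complete it?"
)

ALTERNATIVE_TEMPLATE = (
    "Consider the following user story:\n{story}\n"
    "How much of it can be completed in {anchor_hours} work hours?"
)


def build_prompts(story: str, anchor_hours: int = 40) -> dict[str, str]:
    """Return both request-format variants for a single user story."""
    return {
        "traditional": TRADITIONAL_TEMPLATE.format(story=story),
        "alternative": ALTERNATIVE_TEMPLATE.format(
            story=story, anchor_hours=anchor_hours
        ),
    }


def query_llm(prompt: str) -> str:
    """Placeholder for a call to GPT-4, Gemini 1.5 Pro, or Llama 3.1."""
    raise NotImplementedError("Wire up your preferred LLM client here.")


if __name__ == "__main__":
    story = (
        "As a network operator, I want to update channel configuration "
        "without restarting peers, so that upgrades cause no downtime."
    )
    for fmt, prompt in build_prompts(story).items():
        print(f"--- {fmt} request format ---\n{prompt}\n")
```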
Our findings align with those of the original experiments conducted with software professionals: the first four experiments showed that LLMs provide lower effort estimates when moving from the traditional to the alternative request format. The first and fifth experiments further showed that LLMs display patterns analogous to anchoring bias, a human cognitive bias defined as the tendency to stick to an anchor (i.e., the “Y work hours” in the alternative request format).
Speaker: Gül Çalıklı is a Lecturer (Assistant Professor) in Software Engineering at the School of Computing Science, University of Glasgow in Scotland, United Kingdom. Her research focuses on program comprehension and empirical software engineering, with an emphasis on human aspects (cognitive and social psychology) combined with data analytics and machine learning. Her broader vision is to facilitate the human-AI symbiosis integral to software development tools, improving software practitioners' productivity while ensuring the quality of software development processes and of the resulting software.
Gül Çalıklı received her Ph.D. degree in computer engineering from Bogaziçi University in Istanbul, Turkey. Before joining the University of Glasgow, she was a senior researcher at the Department of Informatics, University of Zurich (UZH), Switzerland. The papers she co-authored received a Best Industry Paper Award at ESEM 2013, an ACM SIGSOFT Distinguished Artifact Award at ICSE 2020, an ACM SIGSOFT Distinguished Paper Award at ICSE 2021, and an ACM SIGSOFT Distinguished Paper Award at ESEC/FSE 2022.
She served as Programme Co-Chair of the ICPC 2024 ERA Track. She has been a Program Committee member of several leading conferences, including ICSE, ASE, CSCW, ICPC, ICSME, and MSR. Her service has been recognized with two Distinguished Reviewer Awards, at ICPC 2022 and ICSME 2023.
Contact and booking details
- Booking required? No