The study explores how developers use foundation model–powered tools like ChatGPT during open-source collaboration, revealing that shared conversations can enhance collective innovation. Findings highlight gaps in current AI benchmarks, showing that nearly half of code generation prompts contain partial code and many involve multi-turn dialogues. These insights inform better benchmark design, improved prompt-engineering strategies, and the creation of FM tools tailored to diverse developer roles and real-world workflows.The study explores how developers use foundation model–powered tools like ChatGPT during open-source collaboration, revealing that shared conversations can enhance collective innovation. Findings highlight gaps in current AI benchmarks, showing that nearly half of code generation prompts contain partial code and many involve multi-turn dialogues. These insights inform better benchmark design, improved prompt-engineering strategies, and the creation of FM tools tailored to diverse developer roles and real-world workflows.

Foundation Models Are Reshaping How Developers Code Together

2025/11/13 23:00

Abstract

1 Introduction

2 Data Collection

3 RQ1: What types of software engineering inquiries do developers present to ChatGPT in the initial prompt?

4 RQ2: How do developers present their inquiries to ChatGPT in multi-turn conversations?

5 RQ3: What are the characteristics of the sharing behavior?

6 Discussions

7 Threats to Validity

8 Related Work

9 Conclusion and Future Work

References

Discussions

Implications for Designing and Investigating FM-powered SE collaboration tools. The most important finding from our study is that developers do share their conversations with ChatGPT while contributing to open-source projects. This insight opens a new view for researchers and FM practitioners assessing the role and influence of FM-powered software development tools, such as ChatGPT, within the realm of collaborative coding. It underscores the potential of these tools to not only assist individual developers but also to enhance the collective productivity and innovation of open-source communities. Furthermore, our study provides several taxonomies that researchers can further utilize to characterize developers’ interactions with ChatGPT or other FM-powered software development tools. For instance, the taxonomy and annotated prompts in RQ1 can be leveraged to develop a learning-based approach that can automatically identify tasks per interest and analyze the corresponding response quality. Designers can also leverage our reported frequency of software engineering tasks to prioritize improvement for their tools. The answers to RQ3 reveal how developers with different roles use shared conversations with ChatGPT in collaborative coding, which can be used to design FM-powered tools tailored to support developers with other roles.

Implications for Benchmarking FM for SE tasks

Our findings from RQ1 shed light on future benchmark designs for evaluating the impact of FMs in different types of software engineering tasks. In RQ1, we find multiple types of input for code generation and issues resolving inquiries, but those types are not fully captured by existing benchmarks. For instance, the widely recognized code generation benchmark, HumanEval (Chen et al., 2021), relies on textual specifications and method signatures.

\ Yet, our analysis shows that nearly half of the code generation prompts (47%) include initial code drafts alongside textual descriptions. Similarly, our examination of prompts categorized under (C4) Issue resolving indicates that a significant portion (36%) of issue resolution requests involve sharing error messages or execution traces, often without accompanying source code. Therefore, we recommend that researchers designing future benchmarks take these findings into account.

\ Our observation that multi-turn conversations are often utilized also motivates future evaluation of FMs allowing multi-turn interactions. Currently, there are only a few studies allowing multi-turn code generation (Wang et al., 2024; Nijkamp et al., 2022). Last but not least, we observed many other tasks beyond code generation and issue resolution, such as code review, conceptual question, and documentation, which are rarely considered as benchmark tasks for FM-powered software development tools.

\ Implications for Prompt Engineering. The findings from RQ2 highlight the frequent use of multi-turn strategies to improve ChatGPT’s solutions iteratively. The flow chart shown in Figure 5 illustrates the diverse approaches developers employ in these interactions. This finding motivates future investigations into the efficiency of developers’ prompting techniques within these multi-turn conversations. Specifically, whether the best practices in prompt engineering have been applied and whether improved prompts can effectively alter the flow of these interactions is a future direction for enhancing the utility and effectiveness of FM-powered tools in software development.

:::info Authors

  1. Huizi Hao
  2. Kazi Amit Hasan
  3. Hong Qin
  4. Marcos Macedo
  5. Yuan Tian
  6. Steven H. H. Ding
  7. Ahmed E. Hassan

:::

:::info This paper is available on arxiv under CC BY-NC-SA 4.0 license.

:::

\

Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact service@support.mexc.com for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.

You May Also Like

PhotonPay Joins Circle’s Arc Public Testnet to Advance Global Payment Innovation

PhotonPay Joins Circle’s Arc Public Testnet to Advance Global Payment Innovation

BitcoinWorld PhotonPay Joins Circle’s Arc Public Testnet to Advance Global Payment Innovation HONG KONG, Nov. 14, 2025 /PRNewswire/ — PhotonPay, an AI-powered financial infrastructure provider, has officially joined Circle’s Arc public testnet, an open, developer-friendly Layer-1 blockchain network designed to bring real-world economic activity onchain and evolve into the next-generation Economic Operating System (OS) for the internet. Working alongside leading innovators in global payments, technology, and fintech, this initiative represents a major stride toward building open, programmable financial infrastructure. It also highlights a key shift in modernizing global payment systems and empowering enterprises to adopt blockchain-driven financial solutions. Trusted by 200,000+ businesses worldwide to overcome banking and payment challenges, PhotonPay delivers simple, scalable, and customizable solutions – including accounts, card issuing, global payouts, online payment, FX management, and embedded finance. Arc marks a significant milestone in developing open financial networks for the global economy. With predictable dollar-based fees, sub-second transaction finality, optional privacy configurations, and seamless integration into Circle’s full-stack platform, Arc supports diverse use cases across lending, capital markets, FX, and international payments. Through its participation in Arc’s testnet, PhotonPay seeks to bridge traditional finance with blockchain-powered innovation, advancing transparency, security, and efficiency across the global financial ecosystem. This post PhotonPay Joins Circle’s Arc Public Testnet to Advance Global Payment Innovation first appeared on BitcoinWorld.
Share
Coinstats2025/11/15 00:27