Scientific Writing with AI: Insights, Challenges, and Best Practices

The release of ChatGPT has brought Large Language Models (LLMs) into the everyday lives and work of many people. In academia, this includes their potential use for writing scientific papers, which might have a positive or negative impact on the state of science. Anecdotal evidence suggests that, since LLMs were introduced, the average language quality of reviewed manuscripts and student theses has improved. Conversely, LLMs can also be used to generate fictitious manuscripts or even to support plagiarism (e.g. see this case). Funding organisations such as the German Research Foundation (DFG) and scientific societies such as the Association for Computing Machinery (ACM) also see opportunities in generative AI and hence allow its use under certain conditions. It is, therefore, all the more important to learn at an early stage how to use these tools sensibly. However, can the use of ChatGPT really improve scientific writing and, if so, how should it be used? These questions motivated our bidt working group to initiate a two-stage exploratory study.

In this study, 18 scientific abstracts were collected, each in two versions: one written with the help of ChatGPT and the other without. These abstracts were provided voluntarily by early career researchers. Specifically, the study asked PhD students from various research fields to contribute abstracts they had previously written, together with versions modified using ChatGPT. Each student submitted both versions along with additional information via an online survey. In a second survey, these abstract pairs were presented to 23 other researchers, who indicated their preferred version in 79 comparison tasks. The figure below shows this procedure across both studies. The team analysed these preferences in various ways, including in relation to the time spent with ChatGPT and through qualitative interpretation of the modified text parts.

Analysing interaction logs, our team found that the overall number of interactions with ChatGPT did not seem to improve preference for the resulting abstracts in this study. However, the team did notice a specific interaction pattern: students first prompted ChatGPT to improve the quality of their abstracts and then prompted it again to shorten the text. This indicates that ChatGPT’s first response was perceived as too verbose, which is in line with findings from other studies (e.g. Kabir et al., Benharrak et al.).

Researchers also qualitatively analysed the abstract versions side by side. Here, they found that ChatGPT can introduce unintended changes that are misleading (e.g. switching “this”/“recent” study) or that affect the writing standards of scientific communities (e.g. switching “I”/“We”). In two cases, ChatGPT introduced sentences summarising the abstract’s goals, which were not present in the original versions. In one case, a structured abstract became an unstructured abstract, although in this case the user performed just a single interaction with ChatGPT.

Across the analysed abstracts, our overarching impression was that ChatGPT tended to shift the writing towards a less formal and more colloquial style.

Original: The dependence on the slice setting was assessed with Student’s t test for paired samples (normally distributed IVIM parameters) and the Wilcoxon signed-rank test (non-normally distributed parameters).
With ChatGPT: The impact of the slice setting was assessed using statistical tests appropriate for normally distributed and non-normally distributed parameters.
Comment: Detailed information about the statistical test was omitted.

Original: Here I present…
With ChatGPT: In our recent work
Comment: The perspective of the work was changed.

Original: pulsed RF irradiation of 2s duration at 90% RF duty-cycle
With ChatGPT: using 2-second pulsed RF irradiation at 90% duty-cycle
Comment: The phrase “2-second pulsed” is used infrequently.

Original: The authors find that the claim is not based on any original research or statistics
With ChatGPT: The examination reveals that the claim lacks a foundation in authentic research or reliable statistics
Comment: Made it more wordy.

Original: However, individuals’ (i) motivations for and (ii) well-being outcomes of disconnection are not well understood
With ChatGPT: However, there is still limited understanding of both the motivations for disconnection and the impact it has on individuals’ well being
Comment: The revised sentence structure is easier to read but the sentence has become longer.

In summary, these are our takeaways from this study for those who want to use tools such as ChatGPT in scientific (abstract) writing:

  • Start with your own draft in line with the standards of your community, for example with respect to the abstract’s structure, tone and voice.
  • Provide your draft in the prompt. Write a clear instruction. Include details about the context and about what should and should not be changed. For instance, “Improve the following abstract for a scientific paper in the domain of X. Keep a formal tone and do not use I.”
  • Expect verbosity. Instructions to “keep it short” are often ineffective. Instead, use a second prompt to shorten the text, or shorten it manually.
  • Carefully check any result for changes in meaning, which can be quite subtle.
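
For readers who script their interactions, the takeaways above could be sketched roughly as follows. This is only an illustration and not the procedure used in our study: the function names build_prompt and word_changes, and the example sentences, are hypothetical. The sketch assembles a prompt from an instruction, explicit constraints and the draft, and then lists word-level differences between the draft and a revised version so that each change can be checked by hand.

```python
import difflib


def build_prompt(draft: str, domain: str, constraints: list[str]) -> str:
    """Assemble a revision prompt: instruction, constraints, then the draft.

    Hypothetical helper; the instruction wording follows the example tip above.
    """
    lines = [
        "Improve the following abstract for a scientific paper "
        f"in the domain of {domain}."
    ]
    lines.extend(constraints)      # e.g. tone and voice requirements
    lines.append("")               # blank line before the draft itself
    lines.append(draft.strip())
    return "\n".join(lines)


def word_changes(original: str, revised: str) -> list[tuple[str, str]]:
    """Return (before, after) pairs for every word span that differs."""
    a, b = original.split(), revised.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":         # replace, delete or insert
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


if __name__ == "__main__":
    # Made-up sentences, for illustration only.
    draft = "Here I present the results of a survey."
    revised = "Here we present the results of a survey."

    print(build_prompt(draft, "X", ["Keep a formal tone and do not use I."]))
    print(word_changes(draft, revised))
```

A word-level diff like this only points to where the text changed. Whether a change alters the meaning still has to be judged by the author.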

By sharing these practical insights, our researchers aim to help people involved in scientific writing to navigate their use of recent AI writing tools. Furthermore, the results of our exploratory study here can inform future investigations with a more controlled, confirmatory setup. As this technology is further developed in both academia and large tech companies, additional investigations into its impact on scientific writing will remain important for the foreseeable future.

The blog posts published by bidt reflect the views of their authors; they do not reflect the position of the institute as a whole.

Working Group ChatGPT


The post Scientific Writing with AI: Insights, Challenges, and Best Practices first appeared on bidt DE.