Automating Peer Review with Large Language Models

In this episode, Ryan explores the potential of large language models (LLMs) to change the peer review process. He discusses the impact of LLMs on various fields and their ability to improve the quality of peer review.

Evaluating LLM Capabilities

Ryan conducted three studies to assess the capabilities of LLMs in the peer review process. The first study focused on error detection: LLMs were given condensed versions of papers and tested on whether they could identify the errors in them. The results indicated that LLMs did well at error detection, particularly in complex subject matter.

The second study examined whether LLMs could answer checklist questions about papers. The LLMs gave accurate yes/no answers to these questions, which suggests they could assist with this part of the review process.
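To make the checklist idea concrete, here is a minimal sketch of how yes/no answers from a model could be scored against human-provided reference answers. It is an illustration only, not the procedure used in the study: the function names, the `(paper_id, question_id)` keys, and the sample data are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical layout: (paper_id, question_id) -> free-text answer.
Answers = Dict[Tuple[str, str], str]


def normalize(answer: str) -> str:
    """Map a free-text reply to 'yes', 'no', or 'unknown' using its first word."""
    words = answer.strip().lower().split()
    if not words:
        return "unknown"
    first = words[0].strip(".,;:!?")
    return first if first in ("yes", "no") else "unknown"


def checklist_accuracy(model_answers: Answers, reference: Answers) -> float:
    """Fraction of reference questions the model answered the same way.

    Questions the model did not answer are skipped, and an 'unknown'
    model reply counts as wrong.
    """
    scored = [key for key in reference if key in model_answers]
    if not scored:
        return 0.0
    correct = sum(
        normalize(model_answers[key]) == normalize(reference[key])
        for key in scored
    )
    return correct / len(scored)


# Made-up example data, for illustration only.
reference = {
    ("paper1", "q1"): "yes",
    ("paper1", "q2"): "no",
    ("paper2", "q1"): "yes",
    ("paper2", "q2"): "no",
}
model_answers = {
    ("paper1", "q1"): "Yes, the limitations are discussed.",
    ("paper1", "q2"): "No.",
    ("paper2", "q1"): "Not sure.",
    ("paper2", "q2"): "no",
}
accuracy = checklist_accuracy(model_answers, reference)
```

Normalizing on the first word keeps a reply such as "Not sure." from being counted as "no", which a simple prefix check would get wrong.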

The third study tested whether LLMs could compare the scientific contributions of two versions of a paper and select the stronger one. LLMs struggled with this task, which required more subjective judgment.

Overall, LLMs showed promise in aiding specific parts of the peer review process, such as error detection and checklist evaluation. It is still important not to rely too heavily on them, as human reviewers continue to play a vital role in the process.

Future Applications and Ethical Considerations

Ryan also discusses potential future applications of LLMs in peer review, such as using them to assign papers to reviewers based on the reviewers' expertise. He emphasizes the need for further development of LLMs and for careful consideration of the ethical implications of their use.
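As a rough illustration of what matching papers to reviewers by expertise involves, the sketch below picks the reviewer whose keywords overlap most with a paper's keywords. Note that it swaps in plain keyword overlap (Jaccard similarity) in place of an LLM, and it ignores practical concerns such as reviewer workload and conflicts of interest. All names and data are hypothetical.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def best_reviewer(paper_keywords: Set[str], reviewers: Dict[str, Set[str]]) -> str:
    """Return the reviewer with the highest keyword overlap with the paper.

    Ties go to the alphabetically first name. Raises ValueError if
    `reviewers` is empty.
    """
    return max(
        sorted(reviewers),
        key=lambda name: jaccard(paper_keywords, reviewers[name]),
    )


# Made-up example data, for illustration only.
reviewers = {
    "reviewer_a": {"peer review", "language models", "evaluation"},
    "reviewer_b": {"computer vision", "robotics"},
}
paper_keywords = {"language models", "peer review"}
chosen = best_reviewer(paper_keywords, reviewers)
```

An LLM-based version would replace `jaccard` with some model-produced relevance score, but the overall selection step could stay the same.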

For updates on Ryan's work, listeners can follow him on Google Scholar.

Original Show Notes

In this episode, we are joined by Ryan Liu, a Computer Science graduate of Carnegie Mellon University. Ryan will begin his Ph.D. program at Princeton University this fall. His Ph.D. will focus on the intersection of large language models and how humans think. Ryan joins us to discuss his research titled "ReviewerGPT? An Exploratory Study on Using Large Language Models for Paper Reviewing."

