Paper: Using Large Language Models to Assess Student Writing Assignments
Volume: 539, ASP 2024: Astronomy Across the Spectrum
Page: 165
Authors: Impey, C.; Wenger, M.; Golchin, S.; Garuda, N.; Stamer, S.; Buxner, S.
Abstract: Assessing writing in large classes for formal or informal learners presents a significant challenge. Consequently, most large classes, particularly in science, rely on objective assessment tools such as multiple-choice quizzes, which have a single correct answer. The rapid development of AI has introduced the possibility of using large language models (LLMs) to evaluate student writing. An experiment was conducted using two of OpenAI's GPT (Generative Pre-trained Transformer) LLMs, GPT-3.5 and GPT-4, to determine if machine learning methods based on LLMs can match or exceed the reliability and automation of peer grading in evaluating short writing assignments on topics in astronomy. The audience consisted of adult learners in three massive open online courses (MOOCs) offered through Coursera. The LLMs were provided with total grades, model answers, and rubrics from an instructor for all three courses. The LLMs were more reliable than peer grading, both in aggregate and by individual student, and they approximately matched instructor grades for all three online courses. GPT-4 outperformed GPT-3.5. The implication is that LLMs can be used for automated, reliable, and scalable grading of student science writing.
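Note: The abstract does not reproduce the authors' prompts or grading pipeline. The following is a minimal illustrative sketch, assuming the OpenAI Python SDK and an OPENAI_API_KEY in the environment, of how a rubric, a model answer, and a student essay might be combined into a single grading request to GPT-4. The rubric text, model answer, prompt wording, and the grade_essay helper are hypothetical placeholders, not the study's materials.

    # Illustrative sketch only; not the authors' actual grading pipeline.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    RUBRIC = (
        "Score 0-10:\n"
        "- Accuracy of the astronomy content (0-4)\n"
        "- Use of evidence or examples (0-3)\n"
        "- Clarity and organization of the writing (0-3)"
    )  # hypothetical rubric, stand-in for the instructor's rubric
    MODEL_ANSWER = "A concise instructor-written answer would go here."  # placeholder

    def grade_essay(essay: str, model: str = "gpt-4") -> str:
        """Ask the LLM to grade one short writing assignment against the rubric."""
        response = client.chat.completions.create(
            model=model,
            temperature=0,  # favor consistent, repeatable grading
            messages=[
                {"role": "system",
                 "content": "You grade short astronomy essays for an online course."},
                {"role": "user",
                 "content": (f"Rubric:\n{RUBRIC}\n\n"
                             f"Model answer:\n{MODEL_ANSWER}\n\n"
                             f"Student essay:\n{essay}\n\n"
                             "Return a total score out of 10 and a one-sentence justification.")},
            ],
        )
        return response.choices[0].message.content

    print(grade_essay("The expansion of the universe was discovered by Edwin Hubble..."))

Swapping the model argument between "gpt-3.5-turbo" and "gpt-4" would mirror the paper's comparison of the two LLMs on the same assignments.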
|