Synthetic Students: A Comparative Study of Bug Distribution Between Large Language Models and Computing Students (SIGCSE Virtual 2024 - Conference)

Who

Stephen MacNeil, Magdalena Rogalska, Juho Leinonen, Paul Denny, Arto Hellas, Xandria Crosland

Track

SIGCSE Virtual 2024 Conference

Time Zone

The program is currently displayed in (UTC) Coordinated Universal Time.

When

Sat 7 Dec 2024 16:07 - 16:30 at Track 1 - Saturday - Papers 1: AI (1)

Abstract

Large language models (LLMs) present an exciting opportunity for generating synthetic classroom data. Such data could include code containing a typical distribution of errors, simulated student behaviour to address the cold start problem when developing education tools, and synthetic user data when access to authentic data is restricted due to privacy reasons. In this research paper, we conduct a comparative study examining the distribution of bugs generated by LLMs in contrast to those produced by computing students. Leveraging data from two previous large-scale analyses of student-generated bugs, we investigate whether LLMs can be coaxed to exhibit bug patterns that are similar to authentic student bugs when prompted to inject errors into code. The results suggest that unguided, LLMs do not generate plausible error distributions, and many of the generated errors are unlikely to be generated by real students. However, with guidance including descriptions of common errors and typical frequencies, LLMs can be shepherded to generate realistic distributions of errors in synthetic code.
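One way to make the notion of comparing bug distributions concrete is to label each bug with a category and measure how far apart the two category distributions are. The sketch below is illustrative only: the category names, the sample labels, and the choice of total variation distance are all assumptions made for this example, not the data or the metric used by the authors.

```python
from collections import Counter


def to_distribution(labels):
    """Turn a list of bug-category labels into relative frequencies."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def total_variation_distance(p, q):
    """Half the sum of absolute differences over all categories (0 = identical, 1 = disjoint)."""
    categories = set(p) | set(q)
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Hypothetical labels, invented for illustration (not taken from the paper).
student_bugs = ["off_by_one", "off_by_one", "wrong_operator", "missing_return"]
llm_bugs = ["off_by_one", "wrong_operator", "wrong_operator", "wrong_operator"]

student = to_distribution(student_bugs)
llm = to_distribution(llm_bugs)
tvd = total_variation_distance(student, llm)
print(f"Total variation distance: {tvd:.2f}")
```

A distance near 0 would indicate that the synthetic bugs follow roughly the same category mix as the student bugs, while a distance near 1 would indicate almost no overlap. Other measures, such as a chi-squared test, could be substituted in the same place.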

Link to Presentation: https://youtu.be/nPr2osrJTV4

Link to Publication

https://dl.acm.org/doi/10.1145/3649165.3690100

Link to Preprint

https://arxiv.org/abs/2410.09193

DOI

https://doi.org/10.1145/3649165.3690100

Stephen MacNeil, Temple University, United States
Magdalena Rogalska, Temple University, United States
Juho Leinonen, Aalto University, Finland
Paul Denny, The University of Auckland, New Zealand
Arto Hellas, Aalto University, Finland
Xandria Crosland, Western Governors University, United States

Session Program

Sat 7 Dec
Displayed time zone: (UTC) Coordinated Universal Time

15:00 - 16:30	Papers 1: AI (1) (Conference) at Track 1 - Saturday

15:00 22m Other		Watch Videos
15:22 22m Paper		Integrating AI Tutors in a Programming Course. Iris Ma (University of California, Irvine), Alberto Krone-Martins (University of California, Irvine), Crista Lopes (University of California, Irvine)
15:45 22m Paper		Integrating Natural Language Prompting Tasks in Introductory Programming Courses. Chris Kerslake (Simon Fraser University), Paul Denny (The University of Auckland), David H. Smith IV (University of Illinois at Urbana-Champaign), James Prather (Abilene Christian University), Juho Leinonen (Aalto University), Andrew Luxton-Reilly (The University of Auckland), Stephen MacNeil (Temple University)
16:07 22m Paper		Synthetic Students: A Comparative Study of Bug Distribution Between Large Language Models and Computing Students. Stephen MacNeil (Temple University), Magdalena Rogalska (Temple University), Juho Leinonen (Aalto University), Paul Denny (The University of Auckland), Arto Hellas (Aalto University), Xandria Crosland (Western Governors University)

Information for Participants

Sat 7 Dec 2024 15:00 - 16:30 at Track 1 - Saturday - Papers 1: AI (1)

Info for room Track 1 - Saturday:

Track 1 - Saturday December 7th

To access the live meeting for this track, please use the following Zoom link:

https://acm-org.zoom.us/j/99069522006?pwd=Lje2z3fWti91RmkoOlECcShrbOQUPi.1