In response to the recent surge in easily accessible generative AI, Harvey Mudd College has integrated AI-assisted coding into its introductory Computer Science course. In this context, a question arises: how do we measure the quality of students' code when AI-generated code is present?
Allowing generative AI to write coding assignments comes with the expectation of improved efficiency and accuracy. While generative AI is a useful tool, it merely supplements fundamental computing skills. This technological step toward syntax-free programming shifts emphasis onto the already important work of developing problem-solving and critical-thinking skills in more abstract contexts. In past years, metrics were designed to measure quantitative aspects of code, but these metrics alone are insufficient for evaluating how code written with the assistance of AI will perform in broader applications. When students submit code written with the assistance of generative AI, they are still expected to meet the standards set by existing metrics, such as Correctness and Complexity. To establish foundational computing skills, students will also be held to new standards and evaluated by new metrics, such as Individuality and Ambition.
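To make the distinction between the existing and the new metrics concrete, the sketch below records both families of scores for a single submission. This is a minimal illustration only: the 0–4 scale, the field names, and the summary helper are assumptions introduced here for exposition, not the course's actual rubric or scoring model.

```python
# Hypothetical illustration only: the metric names come from the text above,
# but the 0-4 scale and this structure are assumptions, not the course rubric.
from dataclasses import dataclass, asdict


@dataclass
class SubmissionScores:
    """Per-metric scores for one student submission (illustrative scale: 0-4)."""
    correctness: int    # existing metric: does the code produce the expected results?
    complexity: int     # existing metric: is the solution appropriately structured?
    individuality: int  # new metric: does the work reflect the student's own decisions?
    ambition: int       # new metric: does the solution stretch beyond the minimum ask?

    def summary(self) -> dict:
        """Return the scores as a plain dictionary, e.g. for gradebook export."""
        return asdict(self)


if __name__ == "__main__":
    scores = SubmissionScores(correctness=4, complexity=3, individuality=2, ambition=3)
    print(scores.summary())
```

Keeping the existing and new metrics side by side in one record reflects the expectation that AI-assisted submissions are held to both sets of standards at once.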
While the model does provide objective measures for these metrics, the fast-evolving nature of programming means that predefined rules of thumb for interpreting them are not provided. As users of our own system, we recognize that evaluating these measurements will require judgment that evolves over time. This work offers the foundation for that evolution.