There is growing interest in using large language models (LLMs) for a wide range of educational applications. Recent studies have focused on using LLMs to generate educational artifacts for programming education, such as programming exercises, model solutions, and multiple-choice questions (MCQs). The ability to assess the quality of such artifacts efficiently and reliably has therefore become of paramount importance. In this paper, we investigate an example use case: assessing the quality of programming MCQs. To that end, we carefully curated a data set of 192 MCQs annotated with quality scores based on a rubric covering crucial aspects such as clarity, the presence of a single correct answer, the quality of distractors, and alignment with learning objectives (LOs). Our results show that the task presents a considerable challenge even to state-of-the-art LLMs. To support further research in this important area, we release both the data set and the evaluation pipeline to the public.