Students and teaching assistants (TAs) in programming courses spend a large amount of time asking and answering questions that require an understanding of code. Models that understand code can reduce the time and effort needed to answer these questions by identifying relevant code snippets. I introduce CodeQA, a dataset for machine-in-the-loop question answering (QA) in the programming education domain. CodeQA's tasks, question type classification and code line selection, are designed to aid code-based question answering by producing outputs useful for answering a question, and to challenge models to understand both the question and the code, a fundamentally different kind of text from the documents found in other QA datasets. CodeQA contains 9,237 question-answer-code triples gathered from the chat logs of an introductory programming course, provided both in the original language (mostly Korean, with the rest in English) and with English translations of the Korean texts. I provide a detailed analysis of the CodeQA dataset and illustrate the behavior of baseline models through qualitative studies. The relatively low scores of the baseline models on CodeQA's tasks suggest that these tasks are challenging even for models that perform well on natural language QA.