Online education has become more important due to the COVID-19 pandemic. However, there is a gap between lecturers and students in online learning: lecturers need to know students' attentional states, but the online setting limits their ability to observe the attention of the entire class. Moreover, existing attentional state prediction methods rely on specialized sensors such as eye trackers, which are not readily deployable in real-world settings.
To address this gap, we utilize facial recordings from students' webcams to predict online learners' attentional states. Through an in-the-wild experiment with 37~participants, we collected a dataset consisting of 15~hours of facial recordings paired with 1,100~attentional state probes. We present $\textsc{Pafe}$ (Predicting Attention with Facial Expression), a facial-video-based framework for attentional state prediction that focuses on vision-based representations of traditional physiological mind-wandering features related to partial drowsiness, emotion, and gaze. Based on $\textsc{Pafe}$, we present an end-to-end visualization system that provides lecturers with students' attentional states.