Auditory-verbal interactions with in-vehicle information systems have become increasingly popular for improving driver safety because they obviate the need for distracting visual-manual operations. This opens up new possibilities for proactive auditory-verbal services, in which intelligent agents proactively provide contextualized recommendations and support interactive decision-making. However, prior studies have warned that such interactions may consume considerable attentional resources and thus negatively affect driving performance. This work aims to develop a machine learning model that uses vehicle and environment sensor data to find opportune moments for the driver to engage in proactive auditory-verbal tasks. Because there is no established definition of what constitutes interruptibility for auditory-verbal tasks, we first define interruptible moments by considering multiple dimensions and then iteratively develop an experimental framework through an extensive literature review and four pilot studies. We integrate our framework into OsmAnd, an open-source navigation service, and perform a real-road field study with 29 drivers to collect sensor data and user responses. Our machine learning analysis shows that opportune moments for interruption can be conservatively inferred with an accuracy of 0.74. We discuss how our experimental framework and machine learning models can be used to design intelligent auditory-verbal services in practical deployment contexts.