Background
The Insomnia Severity Index (ISI) is a widely used seven-item questionnaire for identifying the risk of insomnia disorder. Although the ISI is already short, even shorter versions are emerging for repeated monitoring in routine clinical settings. In this study, we aimed to develop a data-driven shortened version of the ISI that accurately predicts the severity level of insomnia disorder.

Methods
We collected a sample of 800 responses from the EMBRAIN survey system. The seven items were grouped by the similarity of their responses using exploratory factor analysis (EFA). The most representative item within each group was then selected using eXtreme Gradient Boosting (XGBoost).

Results
Based on the three selected key items (maintenance of sleep, interference with daily function, and concerns about sleep problems), we developed a data-driven shortened version of the ISI, the ISI-3m (m for machine learning). The ISI-3m achieved the highest coefficient of determination ($R^{2} = 0.910$) for the ISI score prediction task, and an accuracy of 0.965, precision of 0.841, and recall of 0.838 for the multiclass classification task, outperforming four previous shortened versions of the ISI.

Conclusion
Because the ISI-3m is a highly accurate shortened version of the ISI, it allows clinicians to screen for insomnia efficiently and to observe variations in the condition throughout the treatment process. Furthermore, the framework developed in this study, which combines EFA and XGBoost, can be used to develop data-driven shortened versions of other questionnaires.
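
The evaluation measures reported in the Results (coefficient of determination for score prediction; accuracy, precision, and recall for severity classification) can be illustrated with a minimal sketch. Several points here are assumptions rather than details taken from the study: the severity cut-offs are the conventional ISI bands (0–7, 8–14, 15–21, 22–28), precision and recall are macro-averaged over the four levels, accuracy is the plain fraction of correct labels, and the function names and toy scores are hypothetical and invented purely for illustration.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def severity_level(score):
    """Map a total ISI score (0-28) to a severity level 0-3.

    Assumption: conventional ISI bands; the abstract does not list them.
    """
    if score <= 7:
        return 0  # no clinically significant insomnia
    if score <= 14:
        return 1  # subthreshold insomnia
    if score <= 21:
        return 2  # moderate clinical insomnia
    return 3      # severe clinical insomnia


def classification_metrics(y_true, y_pred, labels=(0, 1, 2, 3)):
    """Return (accuracy, macro precision, macro recall).

    Assumption: macro averaging; a class with no predicted (or no true)
    samples contributes 0 to the corresponding average.
    """
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, recalls = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        n_pred = sum(p == c for p in y_pred)
        n_true = sum(t == c for t in y_true)
        precisions.append(tp / n_pred if n_pred else 0.0)
        recalls.append(tp / n_true if n_true else 0.0)
    return (accuracy,
            sum(precisions) / len(labels),
            sum(recalls) / len(labels))


# Toy data (hypothetical): full-ISI totals and totals predicted by a short form.
true_scores = [3, 10, 17, 24, 6, 12]
pred_scores = [4, 9, 14, 23, 8, 13]

r2 = r_squared(true_scores, pred_scores)
acc, prec, rec = classification_metrics(
    [severity_level(s) for s in true_scores],
    [severity_level(s) for s in pred_scores],
)
print(f"R^2={r2:.3f} accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f}")
```

Under a different averaging scheme (for example, per-class one-vs-rest accuracy), the same predictions would yield different accuracy values, so the averaging choice should be stated alongside any reported figure.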