To secure the vast number of Internet of Things (IoT) devices deployed worldwide, large-scale vulnerability analysis is essential. However, developing a scalable analysis approach that applies to diverse devices is not straightforward because 1) IoT devices vary widely in hardware configuration, implementation, and execution environment, and 2) their vendors often withhold information about their products. To address the scalability issue, several studies have analyzed device firmware rather than physical devices. However, these approaches are currently limited to a few simple or small devices, resulting in low analysis success rates.
In this thesis, we present a practical approach to scalable vulnerability analysis of IoT devices. We began by conducting an empirical analysis of various IoT devices and found that many of them share a common codebase. We leveraged this similarity to develop several heuristics that enable successful firmware emulation and firmware structure analysis, both of which are essential for vulnerability analysis. Using these heuristics, we discovered 23 0-day vulnerabilities in wireless routers and IP cameras, as well as three 0-day vulnerabilities in smartphone baseband devices.
Following that, we present another approach that extends the vulnerability analysis by utilizing binary code similarity analysis (BCSA). Several BCSA approaches exist, but none is easily applicable because they often 1) do not release their source code or datasets and 2) employ uninterpretable machine learning techniques whose results are difficult to comprehend. To address this, we first conducted a comprehensive study of existing BCSA techniques, which revealed several insights. For instance, a simple model with a few basic features can achieve results comparable to those obtained using deep learning techniques. Based on these findings, we developed a BCSA framework and two heuristic features. We demonstrated our system’s effectiveness by analyzing over 53M functions in 1,142 IoT firmware images and identifying 442 vulnerabilities. We make our source code and datasets publicly available to encourage further research.
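To make the idea of a simple, interpretable similarity model concrete, the following is a minimal sketch and not the framework developed in this thesis. It assumes each function has already been reduced to a few numeric features; the feature set (instruction, basic block, and call counts), the `FuncFeatures` type, and the scoring rule based on the mean relative difference are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuncFeatures:
    """Hypothetical numeric features extracted from one binary function."""
    name: str
    num_instructions: int
    num_basic_blocks: int
    num_calls: int


def relative_difference(a: float, b: float) -> float:
    """Return |a - b| / max(|a|, |b|), or 0.0 when both values are zero."""
    denom = max(abs(a), abs(b))
    return 0.0 if denom == 0 else abs(a - b) / denom


def similarity(f: FuncFeatures, g: FuncFeatures) -> float:
    """Score in [0, 1]; 1.0 means every compared feature is identical."""
    pairs = [
        (f.num_instructions, g.num_instructions),
        (f.num_basic_blocks, g.num_basic_blocks),
        (f.num_calls, g.num_calls),
    ]
    mean_diff = sum(relative_difference(a, b) for a, b in pairs) / len(pairs)
    return 1.0 - mean_diff


def rank_candidates(query: FuncFeatures,
                    candidates: list[FuncFeatures]) -> list[FuncFeatures]:
    """Sort candidates from most to least similar to the query function."""
    return sorted(candidates, key=lambda c: similarity(query, c), reverse=True)
```

Because each feature contributes a separate, bounded term to the score, an analyst can see which feature made two functions look alike or different, which is the kind of interpretability that opaque learned embeddings lack.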