Real-time augmented reality (AR) is actively studied for the future user interface and experience in high-performance head-mounted display (HMD) systems. The small battery size and limited computing power of the current HMD, however, fail to implement the real-time markerless AR in the HMD. In this paper, we propose a real-time and low-power AR processor for advanced 3D-AR HMD applications. For the high throughput, the processor adopts task-level pipelined SIMD-PE clusters and a congestion-aware network-on-chip (NoC). Both of these two features exploit the high data-level parallelism (DLP) and task-level parallelism (TLP) with the pipelined multicore architecture. For the low power consumption, it employs a vocabulary forest accelerator and a mixed-mode support vector machine (SVM)-based DVFS control to reduce unnecessary external memory accesses and core activation. The proposed 4 mm 8 mm HMD AR processor is fabricated using 65 nm CMOS technology for a battery-powered HMD platform with real-time AR operation. It consumes 381 mW average power and 778 mW peak power at 250 MHz operating frequency and 1.2 V supply voltage. It achieves 1.22 TOPS peak performance and 1.57 TOPS/W energy efficiency, which are, respectively, and higher than the state of the art.