Growing demand for 3D applications is driving active development of novel depth-sensing cameras. However, most of these cameras still suffer from high energy consumption and slow depth extraction. This becomes a serious bottleneck in embedded implementations, which must deliver real-time performance under tight power and area constraints. This work proposes the Offset Aperture (OA) camera, a new hardware architecture for fast, low-energy, and low-complexity depth extraction. Optimal implementations of pre-processing, cost-volume generation, and cost aggregation are presented. The whole depth-extraction pipeline has been implemented on a Field Programmable Gate Array (FPGA). The proposed system achieves a bad-classification rate of only 2.8% and processes 37 VGA frames per second while consuming 0.224 μJ/pixel. The high accuracy, high speed, and low energy consumption of the proposed OA architecture make it suitable for embedded applications.