The authors propose a heterogeneous floating-point (FP) computing architecture that maximizes energy efficiency by optimizing exponent processing and mantissa processing separately. The proposed exponent-computing-in-memory (CIM) architecture and mantissa-free exponent-computing algorithm reduce the power consumption of both the memory and the FP multiply-accumulate (MAC) units while resolving the limitations of previous FP CIM processors. A bfloat16 DNN training processor incorporating the proposed features, along with support for sparsity exploitation, is implemented and fabricated in 28-nm CMOS technology. It achieves 13.7-TFLOPS/W energy efficiency while supporting FP operations with a CIM architecture.
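The bfloat16 format referenced above keeps float32's 8-bit exponent while truncating the mantissa to 7 bits, which is what makes a separate, lightweight exponent path attractive. The following is a minimal Python sketch of the sign/exponent/mantissa field split (illustrative only; it simply truncates the float32 bit pattern and is not the paper's hardware pipeline):

```python
import struct

def bfloat16_fields(x: float) -> tuple[int, int, int]:
    """Split a float into the sign, exponent, and mantissa fields of its
    bfloat16 encoding (1 sign bit, 8 exponent bits, 7 mantissa bits).

    Illustrative sketch: keeps the top 16 bits of the float32 encoding
    (round-toward-zero truncation).
    """
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    bf16 = bits32 >> 16            # keep the 16 most-significant bits
    sign = (bf16 >> 15) & 0x1      # 1 sign bit
    exponent = (bf16 >> 7) & 0xFF  # 8 exponent bits (biased by 127)
    mantissa = bf16 & 0x7F         # 7 mantissa bits
    return sign, exponent, mantissa

# Example: 1.0 encodes as sign 0, biased exponent 127, mantissa 0.
print(bfloat16_fields(1.0))   # → (0, 127, 0)
print(bfloat16_fields(-2.0))  # → (1, 128, 0)
```

Because the 8-bit exponent field is identical to float32's, exponent-only operations can run on narrow integer logic, which is the separation the proposed architecture exploits.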