Voice activity detection (VAD) is a key technique in numerous speech-related applications such as speech recognition, speech enhancement, and speech coding. In these applications, VAD separates speech from the incoming signal so that subsequent processing steps can operate on the speech signal rather than on silence or noise. VAD must therefore remain accurate across severe and varied noise environments. It should also have low complexity so that it can be used in real-time applications. The most important factor in building a robust VAD is the feature that the system extracts from the speech signal. Thus, VAD design can be treated as a problem of feature extraction from the speech signal. In this paper, we propose two directions for extracting robust features from the speech signal. The first is an unsupervised-learning-based feature that exploits the intrinsic harmonicity of vowel sounds. Here, we propose a new approach for verifying harmonicity and apply it to a VAD system. Our experiments show that the computational cost is greatly reduced compared to previous harmonicity-based approaches, while the accuracy is slightly improved in severe noise environments. The second is a supervised-learning-based feature that uses discriminative pre-training (DPT). In this approach, we assume that different speech-related features vary in robustness depending on the noise type, so that, if these features are fused well, the fused feature becomes robust regardless of the noise type. To verify this assumption, well-known speech-related features are fused by DPT. Unlike previous approaches, the training step was conducted on signals with various SNRs and noise types. The results show that the accuracy is outstanding compared to other state-of-the-art approaches.