We present a unified training scheme that jointly optimizes a feature extractor and hidden Markov model (HMM) classifiers to improve speech recognition performance. The feature extractor and the classifiers are trained simultaneously to minimize classification error. Multiframe features are extracted from spectro-temporal dynamics, and the feature extractor is implemented as a multilayer network trained by backpropagation (BP) with the aid of an HMM inversion algorithm. The parameters of the feature extractor are initialized so that it computes Mel-frequency cepstral coefficients (MFCCs) together with their delta and acceleration components. Phoneme classification experiments demonstrate the practicality of unified training.
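Delta and acceleration coefficients are linear functions of the cepstra in neighboring frames, so the dynamic part of such an initialization can be expressed as a single linear layer applied to a stacked multiframe window. The sketch below illustrates this under stated assumptions and is not the authors' implementation: it assumes the common regression formula $\Delta c_t = \sum_{n=-N}^{N} n\, c_{t+n} \,/\, (2\sum_{n=1}^{N} n^2)$ with $N = 2$, computes accelerations as deltas of deltas, and omits the MFCC front end (filterbank, log, DCT). The functions `delta_filter`, `init_dynamic_weights`, and `regression_deltas` are hypothetical names introduced for illustration.

```python
import numpy as np

def delta_filter(N=2):
    """Regression delta taps for frame offsets -N..N (assumed standard formula)."""
    n = np.arange(-N, N + 1)
    return n / (2.0 * np.sum(np.arange(1, N + 1) ** 2))

def init_dynamic_weights(D, N=2):
    """Hypothetical initial weights mapping a stacked window of 4N+1 frames of
    D-dim cepstra (frames t-2N..t+2N, flattened frame-major) to
    [static c_t, delta_t, acceleration_t], a vector of length 3D."""
    d = delta_filter(N)            # 2N+1 taps
    a = np.convolve(d, d)          # 4N+1 taps: delta applied to delta
    L = 4 * N + 1                  # context window length in frames
    static = np.zeros(L)
    static[2 * N] = 1.0            # pick out the center frame
    delta = np.zeros(L)
    delta[N:3 * N + 1] = d         # delta taps centered in the window
    rows = np.stack([static, delta, a])   # (3, L) temporal taps
    # Apply the same temporal taps independently to every cepstral dimension.
    return np.kron(rows, np.eye(D))       # (3D, L*D)

def regression_deltas(C, N=2):
    """Reference deltas of a (T, D) sequence; frames within N of an edge stay 0."""
    d = delta_filter(N)
    out = np.zeros_like(C)
    for t in range(N, len(C) - N):
        out[t] = d @ C[t - N:t + N + 1]
    return out
```

With these weights, the first layer reproduces conventional static, delta, and acceleration features before any BP update, so joint training starts from the MFCC-based baseline rather than from random features.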