Affective interaction between humans and robots or machines is a cherished goal for socially intelligent machines. The ability to recognize human emotional states is an essential prerequisite for such affective interactions. This dissertation therefore addresses human emotion recognition by first processing individual modes of human communication and then aggregating them through a classification-and-aggregation framework. Specifically, the proposed framework analyzes speech acoustics, facial expressions, and body language using unimodal emotion classifiers. Speech emotion is classified with a deep neural network (DNN), while the facial and body language emotion classifiers are implemented with supervised fuzzy adaptive resonance theory (fuzzy ARTMAP). The speech emotion classifier uses acoustic features, the facial emotion classifier uses features based on facial animation parameters (FAPs), and the body language emotion classifier formulates its features from head and hand motion-capture data. These unimodal evaluations are then aggregated using a fuzzy integral for interval type-2 fuzzy-valued attributes (FIIFA), proposed in this dissertation as a novel aggregation framework for attribute evaluations with linguistic and numeric uncertainties. FIIFA also incorporates reliability-based preferences for the unimodal evaluations; the dissertation proposes to derive these preferences from the per-emotion accuracies of the unimodal classifiers. The framework was tested against the existing state of the art, and the results show that it significantly outperforms existing techniques. Furthermore, because fusion occurs late, the proposed approach remains functional when all but one mode of communication are unavailable.
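
The late-fusion pipeline described above can be sketched minimally as follows. This is an illustrative stand-in only: the actual FIIFA aggregation uses an interval type-2 fuzzy integral, whereas this sketch substitutes a plain reliability-weighted mean to show the data flow (per-emotion scores from available unimodal classifiers, weighted by per-emotion classifier accuracy). All modality names, emotion labels, and numbers are hypothetical.

```python
EMOTIONS = ["anger", "happiness", "sadness", "neutral"]

# Hypothetical per-emotion validation accuracies of each unimodal
# classifier, used as its reliability-based preference weight.
RELIABILITY = {
    "speech": {"anger": 0.82, "happiness": 0.74, "sadness": 0.79, "neutral": 0.70},
    "face":   {"anger": 0.76, "happiness": 0.88, "sadness": 0.71, "neutral": 0.73},
    "body":   {"anger": 0.69, "happiness": 0.65, "sadness": 0.77, "neutral": 0.62},
}

def fuse(unimodal_scores):
    """Aggregate per-emotion scores from whichever modes are available.

    unimodal_scores: dict mapping mode name -> {emotion: score in [0, 1]}.
    Missing modes are simply omitted, so late fusion degrades
    gracefully down to a single available mode.
    """
    fused = {}
    for emotion in EMOTIONS:
        num = den = 0.0
        for mode, scores in unimodal_scores.items():
            w = RELIABILITY[mode][emotion]   # reliability-based weight
            num += w * scores[emotion]
            den += w
        fused[emotion] = num / den if den else 0.0
    # Predicted emotion is the one with the highest fused score.
    return max(fused, key=fused.get), fused

# Example: speech is unavailable; only face and body scores are fused.
label, scores = fuse({
    "face": {"anger": 0.1, "happiness": 0.7, "sadness": 0.1, "neutral": 0.1},
    "body": {"anger": 0.2, "happiness": 0.5, "sadness": 0.2, "neutral": 0.1},
})
```

Because the fusion step only iterates over the modes actually supplied, the sketch mirrors the robustness property claimed above: the system still produces a decision from a single remaining mode.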