With the omnipresence of computing devices in daily life, interest in supporting seamless interactions between users and these ubiquitous devices has been growing. In response, new and effective interaction mechanisms that use the spatial information of sound, called spatial sound interactions, have been introduced. As an example, we can provide users with an in-situ 3D listening experience by collaboratively using the built-in speakers of their own and surrounding devices. It is also possible to support gesture or touch input methods by tracking the locations of target objects or events with sound. In ubiquitous computing environments, however, enabling such spatial sound interactions is highly challenging, primarily due to high environment complexity. For example, since users can interact with any type of computing device anytime and anywhere, sound can be produced and transferred differently depending on the environment. Such acoustic variations make it difficult to apply existing spatial sound interaction techniques, which are commonly designed under the assumption of pre-defined, static environments.
In this dissertation, we address the environment complexity of supporting ubiquitous spatial sound interactions by 1) virtualizing audio output devices and 2) abstracting audio input data. First, we give spatial sound reproduction applications the illusion that they are running in pre-defined, static environments, even in diverse and dynamically changing ubiquitous environments. To this end, we develop both adaptive multi-audio-device coordination and real-time audio request scheduling techniques. These speaker virtualization techniques allow spatial sound reproduction applications to easily provide users with spatial impressions, without having to account for the complexity of ubiquitous environments. Second, we propose a novel sound-based touch input method that identifies a user's touch locations on any solid surface robustly against environmental changes. The key enabling idea is to abstract audio input data, i.e., to extract environment-independent features from the input recordings, based in particular on an understanding of dispersion phenomena. Our experiments show that, with the proposed features, we can localize touch inputs with sub-centimeter accuracy even in the presence of environmental changes.