Recent advances in experimental methods in biology allow us to handle genome-scale biological data. In many cases, these large-scale data from high-throughput experiments can be represented as graphs. The goal of systems biology, an emerging field in biology, is to analyze these high-throughput experimental data in order to extract knowledge about, and gain insight into, biological systems.
Protein interaction networks are one example of such data: nodes denote proteins and edges denote interactions between them. Because protein interactions are closely related to the biological processes in which the proteins are involved, several biological problems, such as protein function annotation, have been successfully addressed using protein interaction networks. More detailed investigation of these networks is expected to help us understand biological systems better.
Network motif analysis is a recently proposed method for extracting key features from a large-scale network. It identifies a set of subgraphs, called network motifs, that appear significantly more often in the real network than in randomized networks; these subgraphs capture characteristic features of the original network. The method has proven useful in analyzing biological networks such as protein interaction networks and gene regulatory networks. However, it has so far been applied only to unlabeled graphs. In this thesis, we extend the idea of network motifs to labeled graphs, which are a more powerful and informative representation than unlabeled graphs. The key issue in this extension is the randomized network model: if the degree-invariant model is applied directly to labeled graphs, node characteristics may change, which can lead to undesirable results. We overcome this problem by generalizing the degree-invariant model to a label-specific degree-invariant model.
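The idea of a label-specific degree-invariant null model can be illustrated with a small sketch. This is not the thesis's implementation, and several assumptions are made here: graphs are undirected and simple with integer node ids; a node's "label-specific degree" is read as the number of its neighbors carrying each label; randomization is done by edge swapping; and only labeled triangles are scored as motif candidates, using a z-score against the randomized ensemble. All function names (`label_degree_profile`, `randomize_label_specific`, `labeled_triangle_counts`, `motif_z_scores`) are hypothetical.

```python
import random
import statistics
from collections import Counter


def label_degree_profile(edges, labels):
    """For every node, count its neighbors per label (its label-specific degrees)."""
    profile = {v: Counter() for v in labels}
    for u, v in edges:
        profile[u][labels[v]] += 1
        profile[v][labels[u]] += 1
    return profile


def randomize_label_specific(edges, labels, n_swaps, seed=None):
    """Randomize an undirected simple graph while keeping label-specific degrees.

    Assumed rule: the swap (a, b), (c, d) -> (a, d), (c, b) is accepted only if
    labels[a] == labels[c] and labels[b] == labels[d], so every endpoint trades
    one neighbor for another neighbor with the same label. Swaps that would
    create a self-loop or a duplicate edge are rejected.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    if len(edge_list) < 2:
        return edge_list
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:  # edges are undirected: try either orientation
            c, d = d, c
        if labels[a] != labels[c] or labels[b] != labels[d]:
            continue  # would change some node's label-specific degree
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if len(new1) < 2 or len(new2) < 2 or new1 in edge_set or new2 in edge_set:
            continue  # would create a self-loop or a multi-edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


def labeled_triangle_counts(edges, labels):
    """Count triangles, keyed by the sorted labels of their three nodes."""
    adj = {v: set() for v in labels}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    counts = Counter()
    for u, v in edges:
        for w in adj[u] & adj[v]:
            if w > u and w > v:  # count each triangle once, at its largest node
                counts[tuple(sorted((labels[u], labels[v], labels[w])))] += 1
    return counts


def motif_z_scores(edges, labels, n_random=100, seed=0):
    """Z-score of each labeled triangle type against the label-specific null model."""
    real = labeled_triangle_counts(edges, labels)
    samples = [
        labeled_triangle_counts(
            randomize_label_specific(edges, labels, 10 * len(edges), seed + k), labels
        )
        for k in range(n_random)
    ]
    scores = {}
    for key in set(real).union(*samples):
        values = [s[key] for s in samples]
        mu, sd = statistics.mean(values), statistics.pstdev(values)
        scores[key] = (real[key] - mu) / sd if sd > 0 else float("nan")
    return scores


if __name__ == "__main__":
    labels = {0: "A", 1: "A", 2: "B", 3: "B", 4: "A", 5: "B"}
    edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5), (0, 5)]
    print(motif_z_scores(edges, labels, n_random=50))
```

The only difference from a plain degree-invariant swap is the label check: without it, a node labeled "A" could lose a "B" neighbor and gain an "A" neighbor, keeping its total degree but changing its label-specific degrees, which is exactly the kind of drift in node characteristics the label-specific model is meant to prevent.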
The proposed network motif analysis method for labeled g...