Interpretation of natural language queries for effective data exploration over heterogeneous databases: applications to biomedical domain
Korean title: 이질적인 데이터베이스에서의 효과적 데이터 탐색을 위한 자연언어질의 해석: 생물의료 분야에의 적용

Data exploration is an essential process for discovering novel knowledge in scientific research. However, it is difficult for field experts to find their target data by exploration alone, especially when the data are scattered over multiple heterogeneous databases. Since such data are usually associated with one another, there may be appropriate sequences of searches that field experts can use as queries to reach the target data. To support such data exploration, conventional database interfaces provide useful tools for querying with keywords or structured forms. However, we argue that these tools are inadequate for expressing queries that involve sequences of searches across multiple databases, whose data embody diverse relations. To describe such queries in a convenient and expressive manner, we propose using natural language queries (NLQs) to interact with the databases. Such a database interface automatically interprets NLQs into formal language queries (FLQs), which are in turn composed of smaller FLQs for the different databases. This task requires us to address the problem of database heterogeneity, which arises from differences in formal query languages, database structures, and data contents. The dissertation addresses this problem by treating NLQs as terms and syntactic relations, which correspond to data objects and operations on them, respectively. We use SQL-like expressions to coordinate these terms and syntactic relations, so that FLQs result from a straightforward mapping process. In this work, we present a method that derives the SQL-like expressions from NLQs in a Combinatory Categorial Grammar (CCG) framework and then translates the expressions into the locations of data objects accessible from our target databases. The method then constructs FLQs for these locations in possible sequences, taking data associations into account. Our method thus provides a fully automated way to locate and retrieve available data from databases. We also...
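The idea of constructing FLQs "in possible sequences" over associated data can be illustrated with a minimal sketch. This is not the dissertation's implementation: the database names (`GeneDB`, `ProteinDB`, `DiseaseDB`), the location catalogue, the association graph, and both function names are hypothetical, and the CCG parsing step is omitted entirely. The sketch assumes an SQL-like expression has already been reduced to a selected term, a constrained term, and a value, and it uses a plain breadth-first search as a stand-in for whatever sequencing strategy the dissertation actually uses.

```python
from collections import deque

# Hypothetical catalogue: where each kind of data object can be found.
LOCATIONS = {
    "gene": "GeneDB.gene",
    "protein": "ProteinDB.protein",
    "disease": "DiseaseDB.disease",
}

# Hypothetical associations (cross-references) between data locations.
ASSOCIATIONS = {
    "GeneDB.gene": ["ProteinDB.protein"],
    "ProteinDB.protein": ["GeneDB.gene", "DiseaseDB.disease"],
    "DiseaseDB.disease": ["ProteinDB.protein"],
}


def search_sequence(source_term, target_term):
    """Return one shortest chain of locations linking the two terms, or None."""
    start, goal = LOCATIONS[source_term], LOCATIONS[target_term]
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in ASSOCIATIONS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def build_sub_queries(select_term, where_term, where_value):
    """Turn a 'SELECT x WHERE y = v' style request into one sub-query per hop."""
    path = search_sequence(where_term, select_term)
    if path is None:
        return []
    steps = []
    step_input = where_value
    for src, dst in zip(path, path[1:]):
        database, field = dst.split(".")
        steps.append({
            "database": database,              # which database this sub-query targets
            "find": field,                     # data object to retrieve
            "linked_to": src.split(".")[1],    # data object it is associated with
            "input": step_input,               # literal value or earlier results
        })
        step_input = f"<results of step {len(steps)}>"
    return steps


if __name__ == "__main__":
    # Roughly: "Which diseases are associated with the gene BRCA1?"
    for step in build_sub_queries("disease", "gene", "BRCA1"):
        print(step)
```

Each dictionary in the result stands for one small query against a single database; a real interface would render each one in that database's own query language.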
Advisors
Park, Jong-C. (박종철)
Description
Korea Advanced Institute of Science and Technology (KAIST): Computer Science
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2008
Identifier
304915/325007  / 000995309
Language
eng
Description

Doctoral dissertation (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Computer Science, August 2008, [ix, 136 p.]

Keywords

Natural Language Processing; Natural Language Interface; Natural Language Query; Bioinformatics; Text Mining; 자연언어처리; 자연언어인터페이스; 자연언어질의; 생물정보학; 텍스트 마이닝

URI
http://hdl.handle.net/10203/33261
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=304915&flag=dissertation
Appears in Collection
CS-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
