On temporally sensitive word embeddings for news information retrieval

Word embedding has become a prominent topic in recent natural language processing (NLP) and information retrieval (IR) research because it has the potential to represent text at a semantic level. Current word embedding methods exploit term proximity relationships in a large corpus to generate a vector representation of each word in a semantic space. We argue that the semantic relationships among terms change over time, which matters especially for news IR. For example, when unusual and unprecedented events are reported in news articles, the word co-occurrence statistics for the time period covering those events can change non-trivially. This shifts the semantic relationships of some words in the embedding space and, in turn, affects news IR. Under the hypothesis that news IR would benefit from word embeddings that change over time, this paper reports our initial investigation along this line. We constructed a news retrieval collection based on mobile search and conducted a retrieval experiment comparing embeddings built from two sets of news articles covering two disjoint time spans. The collection consists of the 500 most frequent queries and their clicked news articles from July 2017, provided by Naver Corp. The experimental results indicate a need to build word embeddings in a temporally sensitive way for news IR.
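The abstract does not state which embedding method was trained on each time span. As a minimal stand-in, assuming raw window co-occurrence counts in place of learned embeddings, the sketch below shows how the proximity statistics of the same word pair can differ between two disjoint time slices. The toy sentences and the helper names `cooccurrence_vectors` and `cosine` are hypothetical illustrations, not taken from the paper or from the Naver collection.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Hypothetical helper: count how often each word appears within
    `window` tokens of every other word (a count-based stand-in for an
    embedding, NOT the method used in the paper)."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters).
    Returns 0.0 if either vector is empty."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpora for two disjoint time spans.
period_a = [
    "stocks rose in early trading",
    "the bridge opened to traffic",
    "traffic on the bridge was light",
]
period_b = [  # an unprecedented event dominates the news in this span
    "the bridge collapsed after the flood",
    "the flood damaged the bridge",
    "rescue teams reached the flood zone",
]

vec_a = cooccurrence_vectors(period_a)
vec_b = cooccurrence_vectors(period_b)

# The same word pair is unrelated in one span and strongly related in the other.
sim_before = cosine(vec_a["bridge"], vec_a["flood"])
sim_after = cosine(vec_b["bridge"], vec_b["flood"])
print(f"bridge~flood before: {sim_before:.3f}, after: {sim_after:.3f}")
```

A single embedding trained on both spans together would blend these two contexts, which is the kind of effect the temporally sensitive approach described above is meant to avoid.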
Publisher: CEUR-WS
Issue Date: 2018-03-26
Language: English
Citation: 2nd International Workshop on Recent Trends in News Information Retrieval (NewsIR 2018), pp. 51-56
ISSN: 1613-0073
URI: http://hdl.handle.net/10203/310847
Appears in Collection: CS-Conference Papers