In the last few years, the phenomenon of multilingual information overload has received significant attention, owing to the vast amount of information available in many different languages. Recent statistics (http://www.internetlivestats.com/internet-users/#byregion) report that non-English content accounts for more than half of the information available on the Internet. We have also witnessed the growing popularity of tools designed for collaborative editing by contributors around the world, which has increased the demand for methods that can effectively and efficiently search, retrieve, manage, and mine document collections written in different languages. Multilingual information overload therefore introduces new challenges for modern information retrieval systems. By better searching, indexing, and organizing such rich and heterogeneous information, we can discover and exchange knowledge on a larger, worldwide scale.
However, because research on multilingual information is still relatively young, several important issues remain open, including:
- how to define a translation-independent representation of the documents across many languages;
- whether existing solutions for comparable corpora can be extended to multiple languages without depending on bilingual dictionaries or introducing bias when merging language-specific results;
- how to exploit knowledge bases to preserve and uncover the semantics of content independently of translation;
- how to define proper indexing structures and multidimensional data structures to better capture the multi-topic and/or multi-aspect nature of the documents in a multilingual context;
- how to detect duplicate or redundant information among different languages or, conversely, novelty in the produced information;
- how to enrich and update multilingual knowledge bases from documents;
- how to exploit multilingual knowledge bases for question answering;
- how to extend topic modeling to multilingual and cross-lingual documents;
- how to evaluate and visualize retrieval and mining results.