Transparency and explainability
Transparency and explainability are two important aspects of information access and machine learning. When focusing on the transparency of information retrieval systems at design time, axiomatic constraints may be used. Such aspects have been studied for personalization purposes and for image search.
Other directions studied in the team relate to the transparency of classical web search engines “by testing”, i.e., without any knowledge of their internal features. We propose a global framework (the first one to our knowledge) to define and run experiments on “non-cooperative” search engines. It defines appropriate metrics for evaluating dissimilarities between search results, and protocols for evaluating how the returned results depend on various parameters, through constraints on test queries. A first preliminary experiment received the best paper award at the French conference CORIA 2019, and larger experiments are currently being run.
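To illustrate the kind of dissimilarity metric such a framework needs, a minimal sketch is a set-overlap dissimilarity between the top-k results of two engines (or two query variants). This is an assumption for illustration only, not the framework's actual metric, and the function name is ours:

```python
def jaccard_dissimilarity_at_k(results_a, results_b, k=10):
    """Jaccard dissimilarity between the top-k URLs of two ranked result lists.

    Returns 0.0 when the top-k sets are identical, 1.0 when they are disjoint.
    """
    a, b = set(results_a[:k]), set(results_b[:k])
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

# Two engines returning ranked URL lists for the same test query
r1 = ["u1", "u2", "u3", "u4"]
r2 = ["u2", "u3", "u5", "u6"]
print(jaccard_dissimilarity_at_k(r1, r2, k=4))  # 2 shared out of 6 -> ~0.667
```

Rank-sensitive measures (e.g. weighting top positions more heavily) would be a natural refinement, since non-cooperative engines are compared only through their visible rankings.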
A second research direction is dedicated to the explainability of classification results produced by convolutional neural networks. Our current focus is on explanations that rely on data from the training set, keeping in mind that this set is not always available. This work is carried out through several M2R (research Master's) projects in the context of LIG Emergence projects.
Social Networks and Personalization
The research work dedicated to social networks and personalization aims at defining methods and models that integrate information coming from the user (or their interactions) in order to enhance retrieval quality. Our work is based on parsimonious information retrieval models: such probabilistic language models (PLM) seek to build compact and precise term distributions by eliminating stop words and non-essential terms. PLM have been successfully applied to relevance feedback, capturing relevant terms from feedback documents to expand a query. Our approach consists in extending such models to support one important kind of social network, namely tagging systems, which allow users to assign tags to documents (such as web pages), so as to generate a tag-based personalized parsimonious information retrieval model, PTPLM. Because personalization is used to expand queries with terms from documents, one part of our extension integrates the links between a user's tags and document terms into the optimization process of PLM, through the use of word embeddings. The results show that our proposal outperforms the state of the art. This work has been mainly supported by a regional PhD grant (Nawal Ould-Amer, on-going, RESPIR project).
Information retrieval models adapted to social media, and in particular microblogs, have been explored in several directions. The first studies the integration of classical IR techniques with user-user relationships, within the CLEF Social Book Search evaluation campaigns. A second direction is explored with the AMA team (Massih-Reza Amini) within an LIG Emergence project; work carried out within this project has been published at CORIA 2018.
Mining microblogs requires properly handling noisy and duplicated data. Handling noise and detecting online e-activism have been explored as part of a collaboration with researchers from the PACTE laboratory (a social sciences laboratory).
IR in Under-Resourced Languages
Cross-lingual information retrieval (CLIR) consists in querying, in a given language, a system whose documents are in another language. Dealing with several languages requires an adaptation of information retrieval models: either by translating queries or documents to get back to a monolingual setting, or through multilingual matching models. Our objective here is to focus on under-resourced languages, i.e., languages for which little or no linguistic knowledge and resources are available. As an illustration, a selection of use cases has been published.
In this context, solutions to CLIR are constrained by the limited resources available: machine translation tools, dictionaries or aligned corpora cannot be used. Recent work has shown that multilingual word embeddings can be built with almost no training data. We are currently exploring how these multilingual embedding spaces can be integrated into an information retrieval model. This work is part of Seydou Dombia's PhD work, and also a collaboration with GETALP (Laurent Besacier, Didier Schwab) supported by LIG through two Emergence projects.
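One simple way such a shared multilingual space can feed a matching model is to score a document by aligning each query term with its closest document term under cosine similarity. This is a minimal illustrative sketch under our own assumptions (toy 2-dimensional embeddings, max-similarity aggregation), not the model under development:

```python
import numpy as np

def clir_score(query_terms, doc_terms, emb_q, emb_d):
    """Cross-lingual matching score via a shared embedding space.

    emb_q and emb_d map terms of the query and document languages
    into one multilingual space; each query term takes its best
    cosine match among document terms, and matches are averaged.
    """
    def unit(emb, t):
        v = emb.get(t)
        return v / np.linalg.norm(v) if v is not None else None

    d_vecs = [u for t in doc_terms if (u := unit(emb_d, t)) is not None]
    if not d_vecs:
        return 0.0
    D = np.stack(d_vecs)  # shape: (n_doc_terms, dim)
    sims = [float(np.max(D @ qv))
            for t in query_terms if (qv := unit(emb_q, t)) is not None]
    return sum(sims) / len(sims) if sims else 0.0

# Toy aligned embeddings: French query term vs. English document terms
emb_fr = {"chat": np.array([1.0, 0.0])}
emb_en = {"cat": np.array([0.9, 0.1]), "dog": np.array([0.0, 1.0])}
print(clir_score(["chat"], ["cat", "dog"], emb_fr, emb_en))  # close to 1.0
```

For under-resourced languages, the appeal is that only the embedding alignment is needed, with no translation system or bilingual dictionary.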