In this work, we addressed several major categories of IR models: formal logical IR models, and IR models associated with ontologies or word embeddings. These models can be general or personalized, and the work was often contextualized for social networks and for medical data.
The MRIM team also worked on personalized information retrieval models in the context of social bookmarking networks such as Delicious. The proposed probabilistic models, based on sparse language models, make it possible to model users along two axes, by jointly learning their profile from the tags they use and the documents marked with these tags. This work was carried out within a PhD thesis funded by the Rhône-Alpes region and defended in 2020.
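To make the two-axis idea concrete, here is a toy illustration (hypothetical data and estimator, not the thesis's actual probabilistic model): a user profile is described jointly by a distribution over tags and, for each tag, a distribution over the documents bookmarked with it; only observed entries are stored, which keeps the profile sparse.

```python
from collections import Counter

# Toy (user, document, tag) triples, as found on a bookmarking site
# such as Delicious. Illustrative data only.
bookmarks = [
    ("u1", "d1", "python"), ("u1", "d2", "python"), ("u1", "d2", "ir"),
]

def user_profile(user: str):
    """Sparse unigram profiles: P(tag | user) and P(doc | user, tag)."""
    tags = Counter(t for u, _, t in bookmarks if u == user)
    docs = Counter((t, d) for u, d, t in bookmarks if u == user)
    n = sum(tags.values())
    p_tag = {t: c / n for t, c in tags.items()}          # tag axis
    p_doc = {(t, d): c / tags[t]                         # document axis,
             for (t, d), c in docs.items()}              # conditioned on the tag
    return p_tag, p_doc

print(user_profile("u1"))
```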
The MRIM team's work on logical models established a formal framework for comparing existing logical models of information retrieval (propositional logic, modal logic, description logic, etc.); this work was recognized by an article published in ACM Computing Surveys.
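For background (a standard formulation from the literature, not a result of this work), the logical models being compared all instantiate, each in its own logic, van Rijsbergen's logical uncertainty principle:

```latex
% Logical uncertainty principle (van Rijsbergen, 1986): the relevance of a
% document d to a query q is measured by the uncertainty of the implication
% d -> q, e.g. estimated as a probability.
\[
  \mathrm{score}(d, q) \;=\; P(d \rightarrow q)
\]
```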
The MRIM team proposed to integrate ontological knowledge into neural-network-based information retrieval. This research direction aims to reconcile "continuous knowledge" learned automatically (vectors of real numbers) with "discrete knowledge" in the form of concept graphs. Our solution relies on a double representation: one coming from learning (word embeddings), the other deduced from an ontology (concept embeddings). This solution showed results superior to a single-vector embedding representation for medical documents, where a very detailed ontology is available (UMLS).

The MRIM team also worked in another direction: automatically learning the weighting of indexing terms with neural networks. We defined a framework that revisits one of the foundations of information retrieval: computing the importance (i.e., the weighting) of indexing terms. This advance was recognized by a PhD thesis and by a publication at the SIGIR 2020 conference.
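A minimal sketch of the double-representation idea follows (hypothetical dimensions, identifiers, and entity-linking step; the team's actual architecture is not reproduced here): each term receives one vector learned from text and one derived from the ontology, and the two views are concatenated.

```python
import numpy as np

# Illustrative dimensions for the two views of a term.
WORD_DIM, CONCEPT_DIM = 300, 100
rng = np.random.default_rng(0)

word_emb = {"aspirin": rng.normal(size=WORD_DIM)}          # learned from corpora
concept_emb = {"C0004057": rng.normal(size=CONCEPT_DIM)}   # derived from the ontology
word_to_concept = {"aspirin": "C0004057"}                  # entity-linking step (illustrative UMLS CUI)

def dual_representation(token: str) -> np.ndarray:
    """Concatenate the continuous (word) and discrete (concept) views."""
    w = word_emb.get(token, np.zeros(WORD_DIM))
    cui = word_to_concept.get(token)
    c = concept_emb[cui] if cui in concept_emb else np.zeros(CONCEPT_DIM)
    return np.concatenate([w, c])  # one (WORD_DIM + CONCEPT_DIM)-vector per token

print(dual_representation("aspirin").shape)  # (400,)
```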
The work of the MRIM team has also explored the integration of knowledge into information retrieval models based on neural networks. The knowledge explored concerns the logical structure of documents, for example technical documents, as well as the domain concepts used in such documents. Our proposal is based on integrating passage embeddings from attention models such as BERT with graph attention networks. The representations drawn from these attention mechanisms improved the performance of passage retrieval in technical documentation. This work was presented at ECIR 2022 and published in the Information Retrieval Journal in 2023.
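The following sketch shows the general combination under stated assumptions (the model name, toy graph, and layer sizes are illustrative; this is not the published architecture): passages are encoded with BERT, then their embeddings are propagated over a document-structure graph with a graph attention layer (PyTorch Geometric's GATConv).

```python
import torch
from transformers import AutoTokenizer, AutoModel
from torch_geometric.nn import GATConv

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

# Two toy passages from a technical document.
passages = ["Install the pump before priming.", "Priming requires valve V2 open."]
batch = tokenizer(passages, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    out = encoder(**batch)
x = out.last_hidden_state[:, 0]              # [CLS] embedding per passage: (2, 768)

# Edges follow the document's logical structure (here: a bidirectional
# link between passage 0 and passage 1).
edge_index = torch.tensor([[0, 1], [1, 0]])
gat = GATConv(in_channels=768, out_channels=256, heads=2, concat=False)
structure_aware = gat(x, edge_index)         # (2, 256) structure-aware passage vectors
```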
Another part of our work on language models for information retrieval focuses on the ability of machine learning models to assist users. The proposed approaches are based on hybrid models that combine regression models with encoder language models in order to predict different variables. The experiments are carried out with health professionals in general practice, as part of an industry-funded (CIFRE) PhD thesis.
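A hedged sketch of such a hybrid model is given below (the encoder name, feature count, and head are illustrative assumptions; the thesis's actual variables and architecture are not specified here): an encoder language model embeds the text, its [CLS] vector is concatenated with tabular features, and a linear head performs the regression.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class HybridRegressor(nn.Module):
    """Encoder LM + tabular features + linear regression head."""

    def __init__(self, encoder_name: str = "bert-base-uncased", n_tabular: int = 8):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.head = nn.Linear(hidden + n_tabular, 1)   # single regression output

    def forward(self, input_ids, attention_mask, tabular):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]              # text representation
        features = torch.cat([cls, tabular], dim=-1)   # fuse text and tabular views
        return self.head(features).squeeze(-1)         # predicted variable
```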