Explainable Multimedia learning-based models

Deep models based on neural networks obtain remarkable results, but these models are “black boxes” that remain opaque to the people who use them. The MRIM team investigates two major scientific directions in this context: “post-hoc” approaches, which aim to explain the behavior of a pre-existing model, and self-explainable approaches, which integrate the explanation into the very heart of the model. The position defended by the MRIM team is that self-explainable approaches are the most promising, because an explanation built into the model itself is the only one that can be both faithful and precise. Following this approach, innovative work on unsupervised, fine-grained visual classification has been proposed and evaluated. This work gave rise to a PhD thesis, funded by the CEA, and to publications at three international conferences; it received the best scientific paper award at the CBMI 2023 conference.

Another strand of this research axis addressed the indexing and retrieval of video documents, this time focusing on the causal dimension of neural networks, still within self-explainable approaches. The idea defended here is to determine how dual (text/video) representations can, by construction, support better explanations of causal inference. One of the innovations consisted in constraining the model used (a convolutional neural network combined with an LSTM) so as to enforce a certain frugality in the dual representation of videos (visual content on one side, conceptual content on the other), following a study of the dimensionality of the dual spaces. This constraint, expressed as a modification of the concept activation functions, made it possible, without any degradation of retrieval results, to go from a 1% to a 32% representation rate for simple presentations of explanations (clouds of 10 words): presenting the user with the 10 concepts that allowed a video to be retrieved now accounts for 32% of the total matching score. The fidelity of the explanation is thus clearly improved without degrading the quality of the results.
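Purely as an illustration, a minimal sketch of what such a frugality constraint on concept activations could look like is given below, written in PyTorch with hypothetical names; it is not the team's actual architecture (which combines a convolutional network and an LSTM). The idea sketched is that video features are projected onto a concept space and only the few dominant activations are retained, so that the same sparse activations serve both for matching and for the 10-word explanation shown to the user.

```python
import torch
import torch.nn as nn

class SparseConceptActivation(nn.Module):
    """Illustrative layer: project video features onto a concept (word) space
    and keep only the top-k activations, so the concepts used for matching are
    the same ones presented to the user as an explanation (e.g. a 10-word cloud)."""

    def __init__(self, feature_dim: int, num_concepts: int, top_k: int = 10):
        super().__init__()
        self.proj = nn.Linear(feature_dim, num_concepts)  # visual -> conceptual space
        self.top_k = top_k

    def forward(self, video_features: torch.Tensor) -> torch.Tensor:
        scores = torch.sigmoid(self.proj(video_features))            # concept activations in [0, 1]
        topk = torch.topk(scores, self.top_k, dim=-1)                 # dominant concepts only ("frugality")
        mask = torch.zeros_like(scores).scatter_(-1, topk.indices, 1.0)
        return scores * mask                                          # sparse activations used for retrieval and explanation
```

A hard top-k mask is only one possible way to express such a constraint; a soft sparsity penalty on the activations would be another.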

Still within learning-based models, the MRIM team developed a model dedicated to a museum application and a second one for the analysis of log files:
The MRIM team thus worked on access to information in mobile situations through museum applications. A neural model was defined and trained so that visitors can interact, through gestures, with the museum guide software they wear on their chest. A new specialized network was also designed for the fast recognition of instances of museum artifacts captured by the mobile guide's camera, from which the visitor's location can be deduced. This network is characterized by its compactness, which makes real-time operation possible on a mobile device without a network connection. These innovations were integrated into products by the companies Ophrys and GlobeVIP, partners of the Guimuteic project, and the training collections produced by this project have been made available.

As part of our collaboration with Nokia Bell Labs, the MRIM team also worked on the automatic analysis of log files generated by routing tools in communication networks, in order to identify sequences that are characteristic of incidents and thus detect and prevent them before they occur. To this end, we developed original methods for automatically abstracting recurring patterns into regular expressions from a limited number of positive examples only.
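As a sketch of the basic idea only (the log lines and generalization heuristics below are invented, and this is not the method developed with Nokia Bell Labs), the following snippet abstracts a handful of positive log lines into a single regular expression by keeping the tokens shared by all examples and generalizing the fields that vary.

```python
import re

def abstract_pattern(examples):
    """Generalise a few positive log lines into one regular expression:
    tokens shared by all examples are kept literally, varying fields are
    replaced by wildcards. Assumes all lines have the same number of tokens."""
    token_lists = [line.split() for line in examples]
    pattern_tokens = []
    for tokens in zip(*token_lists):
        if all(t == tokens[0] for t in tokens):
            pattern_tokens.append(re.escape(tokens[0]))  # constant token
        elif all(t.isdigit() for t in tokens):
            pattern_tokens.append(r"\d+")                # numeric field (counter, delay, ...)
        else:
            pattern_tokens.append(r"\S+")                # free field (address, interface, ...)
    return re.compile(r"\s+".join(pattern_tokens))

# Invented example lines: two occurrences of the same incident-related event.
logs = [
    "BGP neighbor 10.0.0.1 session reset after 120 s",
    "BGP neighbor 10.0.0.7 session reset after 45 s",
]
pattern = abstract_pattern(logs)
print(pattern.pattern)
# -> BGP\s+neighbor\s+\S+\s+session\s+reset\s+after\s+\d+\s+s
assert pattern.match(logs[0]) and pattern.match(logs[1])
```

The real difficulty, which the heuristics above only stand in for, is to choose generalizations that remain discriminative when nothing but positive examples is available.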