Evaluation and Corpora

Here is the list of corpora built by the MRIM team that are accessible to the community.

– LongEval (Longitudinal evaluation of Web search)

For the 2023 edition of LongEval, we provide access to three epochs (one for training and reference testing, two for testing): link. More details about the collection have been published at SIGIR 2023: P. Galuscakova, R. Deveaud, G. Gonzalez-Saez, P. Mulhem, L. Goeuriot, F. Piroi, M. Popel: LongEval-Retrieval: French-English Dynamic Test Collection for Continuous Web Search Evaluation.

For the 2024 edition, we provide one epoch for training and two for testing: link. If you use this data, please cite the ECIR paper *REF* or the CLEF overview paper *REF*.

– CLEF eHealth

These evaluation campaigns are dedicated to evaluating IR systems in the context of medical data (link).

– TRECVid

These evaluation campaigns focus on multimedia (video and text) retrieval. We provide the TRECVid Semantic Indexing (SIN) and High-Level Feature Extraction (HLF) task training annotations from 2007 to 2015 (link).

If you use the TRECVid SIN or HLF annotations, please cite:

Stéphane Ayache and Georges Quénot, “Video Corpus Annotation using Active Learning”, 30th European Conference on Information Retrieval (ECIR’08), Glasgow, Scotland, 30th March – 3rd April, 2008.

The PDF of the paper is available here.

– Corpora built during the GUIMUTEIC project

If you use the Clicide or GaRoFou collections, please cite this work as:

Maxime Portaz, Johann Poignant, Mateusz Budnik, Philippe Mulhem, Jean-Pierre Chevallet, Lorraine Goeuriot: Construction et évaluation d’un corpus pour la recherche d’instances d’images muséales. CORIA 2017 – Conférence en Recherche d’Informations et Applications – 14th French Information Retrieval Conference, Marseille, France, March 29-31, 2017, pp. 17-34.

The PDF of the paper is available here.