Evaluation and Corpora

Here is the list of corpora built by the MRIM team that are accessible to the community.

– LongEval (Longitudinal evaluation of Web search)

For the 2023 edition of LongEval, we provide access to three epochs (one for training and reference testing, two for testing): link. More details about the collection have been published at SIGIR 2023: P. Galuscakova, R. Deveaud, G. Gonzalez-Saez, P. Mulhem, L. Goeuriot, F. Piroi, M. Popel: LongEval-Retrieval: French-English Dynamic Test Collection for Continuous Web Search Evaluation.

For the 2024 edition, we provide one epoch for training and two for testing: link. If you use this data, please cite the ECIR paper *REF* or the CLEF overview paper *REF*.

– CLEF eHealth

These evaluation campaigns are dedicated to evaluating IR systems in the context of medical data (link).

– TRECVid

These evaluation campaigns focus on multimedia (video and text) retrieval. We provide the TRECVid Semantic Indexing (SIN) and High-Level Feature Extraction (HLF) task training annotations from 2007 to 2015 (link).

If you use the TRECVid SIN or HLF annotations, please cite:

Stéphane Ayache and Georges Quénot, “Video Corpus Annotation using Active Learning”, 30th European Conference on Information Retrieval (ECIR’08), Glasgow, Scotland, 30th March – 3rd April, 2008.

The PDF of the paper is available here.

– Corpora built during the GUIMUTEIC project

If you use the Clicide or GaRoFou collections, please cite this work as:

Maxime Portaz, Johann Poignant, Mateusz Budnik, Philippe Mulhem, Jean-Pierre Chevallet, Lorraine Goeuriot: Construction et évaluation d’un corpus pour la recherche d’instances d’images muséales. CORIA 2017 – Conférence en Recherche d’Informations et Applications – 14th French Information Retrieval Conference, Marseille, France, March 29-31, 2017, pp. 17-34.

The PDF of the paper is available here.