MC2 2018 Lab

Multilingual Cultural Mining and Retrieval

2 - MicroBlog Search

Organizers: University of Avignon, Derby and London Universities


Given a cultural query about festivals in Arabic, English, French or Spanish, search for the 64 most relevant microblogs in a collection covering 18 months of news about festivals in all languages.


Arabic and English queries are extracted from the Arab Spring Microblog corpus:
Features Extraction To Improve Comparable Tweet Corpora Building by Malek Hajjem and Chiraz Latiri (JADT 2016).

French queries are extracted from the VodKaster Micro Film Reviews:
Contextualisation de messages courts : l’importance des métadonnées by Jean-Valère Cossu, Julien Gaillard, Juan-Manuel Torres-Moreno and Marc El Bèze.

Spanish queries are sentences from the Mexican newspaper La Jornada.


A login is required to access the data. Once registered on CLEF, each registered team can obtain up to 4 extra individual logins by writing to


Each individual participant can submit at most three runs, so up to 15 runs per team. Submissions will be uploaded to a MySQL server through a web interface.

The expected format for each language is one table per run with five fields:

  1. topic id
  2. microblog rank between 1 and 64
  3. microblog id
  4. microblog author
  5. microblog content

There is an extra constraint: an author should not appear more than 8 times in the results for a topic.
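
The format and the author limit can be checked locally before uploading. The following Python code is only a minimal sketch, not the organizers' tooling: the function names build_topic_rows and validate_run are hypothetical, rows are assumed to be tuples holding the five fields in the order listed above, and the check that a rank is not used twice within a topic is an added assumption rather than a stated rule.

```python
from collections import Counter

MAX_RANK = 64        # from the task description: ranks run from 1 to 64
MAX_PER_AUTHOR = 8   # from the task description: author limit per topic


def build_topic_rows(topic_id, candidates):
    """Greedily select rows for one topic (hypothetical helper).

    `candidates` is a list of (microblog_id, author, content) tuples,
    assumed to be sorted from most to least relevant.
    """
    rows = []
    per_author = Counter()
    for microblog_id, author, content in candidates:
        if per_author[author] >= MAX_PER_AUTHOR:
            continue  # this author already reached the limit for the topic
        per_author[author] += 1
        rows.append((topic_id, len(rows) + 1, microblog_id, author, content))
        if len(rows) == MAX_RANK:
            break
    return rows


def validate_run(rows):
    """Return a list of problems found in a run; an empty list means none."""
    problems = []
    ranks_seen = set()
    per_author = Counter()
    for topic_id, rank, microblog_id, author, content in rows:
        if not 1 <= rank <= MAX_RANK:
            problems.append(f"topic {topic_id}: rank {rank} outside 1..{MAX_RANK}")
        # Assumption: each rank is used at most once per topic.
        if (topic_id, rank) in ranks_seen:
            problems.append(f"topic {topic_id}: rank {rank} used twice")
        ranks_seen.add((topic_id, rank))
        per_author[(topic_id, author)] += 1
    for (topic_id, author), count in per_author.items():
        if count > MAX_PER_AUTHOR:
            problems.append(f"topic {topic_id}: author {author} appears {count} times")
    return problems
```

Calling build_topic_rows once per topic and concatenating the results gives the rows of one run table. The greedy selection keeps the original relevance order and simply skips microblogs from an author who has already reached the limit, so lower-ranked candidates move up to fill the freed positions.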