The document collection provided by the GAFES project consists of a pool of more than 70M unique microblogs from different sources, with their meta-information and expanded URLs, hosted on a MySQL server. For legal reasons, access to this database is restricted to registered participants under a privacy agreement.
Along with the microblog corpus, a clean, simplified XML dump of Wikipedia, easy to index and to process with state-of-the-art NLP tools, is made available to participants. Ground truth (…)
2 - MicroBlog Search
Organizers: University of Avignon, Derby and London Universities
Synopsis
Given a cultural query about festivals in Arabic, English, French or Spanish, search for the 64 most relevant microblogs in a collection covering 18 months of news about festivals in all languages.
Topics
Arabic and English queries are extracted from the Arab Spring Microblog corpus:
Features Extraction To Improve Comparable Tweet Corpora Building by Malek Hajjem and Chiraz Latiri (JADT 2016).
French queries are extracted from the VodKaster Micro Film Reviews:
Contextualisation de messages courts : l’importance des métadonnées by Jean-Valère Cossu, Julien Gaillard, Juan-Manuel Torres-Moreno and Marc El Bèze.
Spanish queries are sentences from the Mexican newspaper La Jornada.
Data
A login is required to access the data. Once registered on CLEF, each registered team can obtain up to 4 extra individual logins by writing to admin@talne.eu.
- The complete stream of 70 000 000 microblogs is available here for registered participants.
- An Indri index with a web interface is available to query the whole set of microblogs.
Submission
Each individual participant can submit at most three runs, which allows up to 15 runs per team. Submissions are uploaded to a MySQL server through a web interface.
The expected format for each language is one table per run with five fields:
- topic id
- microblog rank between 1 and 64
- microblog id
- microblog author
- microblog content
There is an extra constraint: an author should not appear more than 8 times in a topic.
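The rank range and the per-author limit can be checked locally before uploading a run. The following is only a minimal sketch, not an official validator: the function name `validate_run` and the representation of a run as a list of five-element tuples (in the field order listed above) are assumptions made for illustration.

```python
from collections import Counter

MAX_RANK = 64        # ranks run from 1 to 64, per the task description
MAX_PER_AUTHOR = 8   # an author may appear at most 8 times in a topic


def validate_run(rows):
    """Return a list of problems found in a run (empty if none).

    `rows` is an iterable of (topic_id, rank, microblog_id, author, content)
    tuples. This in-memory layout is a hypothetical stand-in for one run table.
    """
    problems = []
    author_counts = Counter()

    for topic_id, rank, microblog_id, author, content in rows:
        # Check that the rank lies in the allowed interval.
        if not 1 <= rank <= MAX_RANK:
            problems.append(
                f"topic {topic_id}: rank {rank} outside 1..{MAX_RANK}"
            )
        # Count how often each author occurs within each topic.
        author_counts[(topic_id, author)] += 1

    # Report every (topic, author) pair that exceeds the limit.
    for (topic_id, author), count in sorted(author_counts.items()):
        if count > MAX_PER_AUTHOR:
            problems.append(
                f"topic {topic_id}: author {author} appears {count} times"
            )
    return problems
```

An empty list means that these two checks found nothing wrong; the sketch does not verify microblog ids or content against the collection.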