MC2 2018 Lab

Multilingual Cultural Mining and Retrieval


CLEF 2016 CMC workshop Pilot Task

TimeLine illustration of a festival based on Microblogs

Tuesday 3 November 2015, by Lorraine, Philippe


The goal of this task is to link the events of a festival program to related microblog posts. This information is valuable to festival attendees, and it gives organizers feedback on their events.

Microblog posts will be provided with their timestamps, which are essential for the requested linking.
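One simple way to use the timestamps is to keep only posts published during an event's time slot. The sketch below assumes that tweet timestamps and programme times share the same time zone. It also adds a hypothetical 30-minute margin after the slot, because posts about a concert often appear shortly after it ends. The tweet ids and times are invented for illustration.

```python
from datetime import datetime, timedelta


def in_event_window(tweet_time, start, end, margin=timedelta(minutes=30)):
    """Return True if tweet_time lies in [start, end + margin].

    The margin after the end of the slot is a hypothetical tuning choice.
    """
    return start <= tweet_time <= end + margin


# Hypothetical example: the Muse slot on 16 July 2015, 22:00-23:30.
start = datetime(2015, 7, 16, 22, 0)
end = datetime(2015, 7, 16, 23, 30)

tweets = [
    ("t1", datetime(2015, 7, 16, 21, 50)),  # before the slot
    ("t2", datetime(2015, 7, 16, 22, 40)),  # during the slot
    ("t3", datetime(2015, 7, 16, 23, 55)),  # within the margin
    ("t4", datetime(2015, 7, 17, 1, 0)),    # too late
]
candidates = [tid for tid, ts in tweets if in_event_window(ts, start, end)]
print(candidates)  # ['t2', 't3']
```

A filter like this corresponds to a "time-only" signal. Content-based scoring would then rank the surviving candidates.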

Participants will have to provide a timetable in which each event is illustrated by its 10 best tweets, chosen for both relevance and diversity. Diversity is essential in this task: retrieving the same post several times brings no benefit.
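The task does not prescribe how to balance relevance and diversity. One well-known option is Maximal Marginal Relevance (MMR), sketched below as a swapped-in technique rather than the organizers' method. At each step, MMR greedily picks the tweet that maximises lam * relevance - (1 - lam) * (highest similarity to an already selected tweet). Several choices here are assumptions: the relevance scores come from some upstream retrieval model, similarity is Jaccard overlap of lower-cased word sets, and lam = 0.7 is arbitrary.

```python
def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mmr_top_k(candidates, k=10, lam=0.7):
    """Greedy Maximal Marginal Relevance selection.

    candidates: list of (tweet_id, relevance, text) tuples, where relevance
    is assumed to come from an upstream retrieval model.
    Returns the ids of up to k selected tweets, in selection order.
    """
    tokens = {tid: set(text.lower().split()) for tid, _, text in candidates}
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < k:
        def mmr_score(cand):
            tid, rel, _ = cand
            # Redundancy = similarity to the closest tweet already selected.
            redundancy = max(
                (jaccard(tokens[tid], tokens[s]) for s, _, _ in selected),
                default=0.0,
            )
            return lam * rel - (1 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [tid for tid, _, _ in selected]


cands = [
    ("a", 0.90, "Muse live at Vieilles Charrues"),
    ("b", 0.85, "Muse live at Vieilles Charrues"),  # near-duplicate of "a"
    ("c", 0.60, "incredible crowd at Kerampuilh tonight"),
]
print(mmr_top_k(cands, k=2))  # ['a', 'c']
```

Tweet "b" is slightly more relevant than "c", but its identical wording to "a" is penalised, so the second slot goes to a different post.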


  • Microblogs collection:

We collected all public microblog posts from Twitter containing the keyword “festival” from June to September 2015. The collection used a private archive service that operates under an agreement with Twitter and is based on the Twitter Streaming API. On average, there are 2,616,008 unique posts (i.e. excluding retweets) per month. In total, 13,167,910 posts were collected without retweets and 24,228,699 with retweets.
These posts are provided in UTF-8 CSV format with various fields (tweet id, author name, language, …).
For privacy reasons, this data cannot be publicly released. It can, however, be analyzed inside the organization that purchases these archives and among collaborators under a privacy agreement. The CLEF 2016 CMC workshop provides this opportunity to share the data among participants. These archives can be indexed and analyzed, and general results derived from them can be published without restriction.
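The CSV files described above can be read with Python's standard csv module. The sketch below loads posts and skips retweets so that only unique posts remain. The column names "id", "author", "lang" and "text" are hypothetical and should be adapted to the header of the distributed files. Detecting retweets by the conventional "RT @" prefix is also an assumption.

```python
import csv
import io


def load_unique_posts(fileobj, lang=None):
    """Read tweets from a CSV file object, skipping retweets.

    Column names ("id", "author", "lang", "text") are hypothetical.
    Retweets are detected by the "RT @" prefix (an assumption).
    If lang is given, only posts in that language are kept.
    """
    posts = []
    for row in csv.DictReader(fileobj):
        if row["text"].startswith("RT @"):
            continue
        if lang is not None and row["lang"] != lang:
            continue
        posts.append(row)
    return posts


# Invented sample standing in for a real archive file.
sample = io.StringIO(
    "id,author,lang,text\n"
    "1,alice,fr,Muse aux Vieilles Charrues\n"
    "2,bob,fr,RT @Charrues: .@muse retourne le public\n"
    "3,carol,en,Great festival tonight\n"
)
print([p["id"] for p in load_unique_posts(sample, lang="fr")])  # ['1']
```

For a real file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object instead of the in-memory sample.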

Participants in this task will be provided with a subset of the microblog collection matching the months of the targeted festivals (July and December 2015).

  • Festival programme:

Two French music festivals have been selected: the Festival des Vieilles Charrues and the Transmusicales de Rennes.
The timelines provided are subsets of each festival program selected by the organizers, giving for each stage and time slot the list of artists playing.

Participants are free to use any additional data to produce their results, whether social (popularity, …) or not (knowledge bases, …). Any such data should be described in the related paper and specified when submitting the runs.


We have selected 3 events from the Festival des Vieilles Charrues. The list below gives 3 example tweets for each of the Soprano and Muse events.

  • 16-Jul-15 18:45-19:45 Anna Calvi
  • 16-Jul-15 20:10-21:45 Soprano
    • RT @Sopranopsy4: Extraordinaire merci les vieilles es charrues merci la Bretagne!!!!
    • RT @Laura_AnneT: #charrues @soprano dingue surtout avec le maillot psg @MaxLaMendz3 t’es un client @GuillermNicola1 #rienafoutrederien
    • aux vieilles charrues on a tellement bien fait de pas aller voir soprano pour gratter des places pour muse putain
  • 16-Jul-15 22:00-23:30 Muse
    • MUSE Festival des Vieilles Charrues 2015 - Carhaix - Live HD via @YouTube
    • RT @Charrues: .@muse retourne littéralement le public de Kerampuilh ! #charrues15 Crédit photo : @PierreHennequin
    • Aux Vieilles Charrues il y avait 1,7% de chance que Muse jouent The Groove. Et ils l’ont fait PUTAIN
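Programme entries like the ones above can be turned into explicit time windows. The sketch below parses lines in the "day-month-yy hh:mm-hh:mm artist" layout shown in the list. The layout of the actual topic files is an assumption. To be safe, the parser accepts both English and French month abbreviations, and it pushes the end time to the next day when a slot crosses midnight.

```python
import re
from datetime import datetime, timedelta

# Month abbreviations (English and French); this mapping is an assumption
# about how the topic files write dates.
MONTHS = {
    "jun": 6, "juin": 6, "jul": 7, "juil": 7,
    "aug": 8, "août": 8, "sep": 9, "sept": 9,
    "dec": 12, "déc": 12,
}

ENTRY = re.compile(r"(\d{1,2})-(\w+)-(\d{2}) (\d{2}):(\d{2})-(\d{2}):(\d{2}) (.+)")


def parse_entry(line):
    """Parse e.g. '16-Jul-15 18:45-19:45 Anna Calvi' into (start, end, artist)."""
    m = ENTRY.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"unrecognised programme line: {line!r}")
    day, mon, yy, h1, m1, h2, m2, artist = m.groups()
    month = MONTHS[mon.lower()]
    year = 2000 + int(yy)
    start = datetime(year, month, int(day), int(h1), int(m1))
    end = datetime(year, month, int(day), int(h2), int(m2))
    if end <= start:  # slot runs past midnight
        end += timedelta(days=1)
    return start, end, artist


start, end, artist = parse_entry("16-Jul-15 22:00-23:30 Muse")
print(start.isoformat(), end.isoformat(), artist)
# 2015-07-16T22:00:00 2015-07-16T23:30:00 Muse
```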

Format of the results

Results will be submitted in the usual trec_eval top-file format. As in classical trec_eval top files, each event will be associated with one query/topic identifier.
The exact format is still to be specified; it will need to include details on the type of run, the resources used, the system used…
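A classical TREC run file has one line per retrieved document, in the form "topic_id Q0 doc_id rank score run_tag". The sketch below formats up to 10 ranked tweets per event in that layout. The topic identifier "VC-3", the tweet ids and the run tag are hypothetical placeholders. trec_eval orders documents by the score column, so the sketch sorts by score before assigning ranks to keep the two consistent.

```python
def trec_run_lines(topic_id, ranked, run_tag, k=10):
    """Format (tweet_id, score) pairs as TREC run lines.

    Each line is: topic_id Q0 doc_id rank score run_tag
    Only the k highest-scoring tweets are kept (10 in this task).
    """
    ranked = sorted(ranked, key=lambda p: p[1], reverse=True)[:k]
    return [
        f"{topic_id} Q0 {doc_id} {rank} {score:.4f} {run_tag}"
        for rank, (doc_id, score) in enumerate(ranked, start=1)
    ]


# Hypothetical topic id, tweet ids and run tag.
lines = trec_run_lines("VC-3", [("620012345", 0.42), ("620098765", 0.91)], "myteam_base")
print("\n".join(lines))
# VC-3 Q0 620098765 1 0.9100 myteam_base
# VC-3 Q0 620012345 2 0.4200 myteam_base
```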


The evaluation will be carried out on parts of the program chosen by the task organizers according to the number of relevant tweets per event. The planned evaluation measures are based on recall and precision. Several types of runs will be proposed: time-only, content-only, and time-and-content.
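The exact measures are only described as recall/precision based. As a reference point, the sketch below computes textbook precision and recall for one event, given the list of retrieved tweet ids and the set of tweets judged relevant. It is not the official scorer. In this sketch a relevant tweet counts only once, so retrieving the same post twice lowers precision without raising recall, which matches the task's emphasis on diversity. The ids are invented.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved list against a set of relevant ids.

    A relevant tweet counts once even if it is retrieved several times,
    so duplicates only add to the precision denominator.
    """
    hits = len(set(retrieved) & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = ["t1", "t2", "t3", "t4"]
relevant = {"t2", "t4", "t7"}
p, r = precision_recall(retrieved, relevant)
print(p, round(r, 3))  # 0.5 0.667
```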

How to get the data?

To get access to the tweets, email
The topics (corresponding to the programs) can be downloaded here.

Participants should submit up to 3 runs in the TREC format, named as follows:
One of them should be a baseline. Other runs can use any additional information.

A text file should also describe the runs and give their priority order.

The runs should be submitted by the 31st of May. The submission website is TBD.

Contact Information

If you have any questions, email us: and