MC2 2018 Lab - Multilingual Cultural Mining and Retrieval

MC2 CLEF Lab is centered on mining the social media sphere surrounding cultural events such as festivals and movies.
It provides registered participants with access to the microblog collection of the GAFES project, funded by the French National Research Agency and led by the University of Avignon.

Most recent articles

Dialect detection in Informal Arabic Text
24 April 2018, by Malek Hajjem
Content Analysis Results: Language identification 2017
15 March 2018, by Malek Hajjem

Results. Topics are a random selection of original microblogs posted in June 2016, without external links and with more than 80 characters. Submissions and scores for the two best teams can be found here: Syllabs and Lia. The task paper can be found here:
@inproceedings{DBLP:conf/clef/ErmakovaMS17, author = {Liana Ermakova and Josiane Mothe and Eric SanJuan}, title = {CLEF 2017 Microblog Cultural Contextualization Content Analysis task (…)
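The topic selection criteria stated above (no external links, more than 80 characters) can be sketched as a simple filter. This is only an illustration, not the organizers' actual selection script: the function name `is_candidate_topic`, the URL pattern, and the sample texts are assumptions, and the check for "original" microblogs (for instance, excluding retweets) is left out because the source does not say how it was done.

```python
import re

# Assumed definition of an "external link": any http(s) URL in the text.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def is_candidate_topic(text, min_length=80):
    """Return True if the text is longer than min_length and has no URL."""
    return len(text) > min_length and URL_PATTERN.search(text) is None


# Hypothetical sample microblogs, for illustration only.
samples = [
    "Short post about a festival",
    "x" * 81,
    "A long post with a link http://example.org " + "y" * 80,
]
candidates = [s for s in samples if is_candidate_topic(s)]
```

Only the second sample passes the filter: the first is too short and the third contains a link.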
Available resources CLEF 2018: detailed description
14 March 2018, by Malek Hajjem

The festival galleries dataset
A massive collection of microblogs and URLs related to cultural festivals is provided for registered participants here. To deal with such a large dataset, we propose different formats. A CSV format: a tab-separated CSV file that can be useful when managing the dataset via a MySQL database or the Python programming language. An XML format for Indri: this format can be indexed smoothly with Indri if needed. With tweet textual content (…)
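As a rough illustration of the Python route mentioned above, a tab-separated file can be read with the standard `csv` module. This is a minimal sketch under stated assumptions: the column names `id` and `text`, the presence of a header row, and the sample rows are all hypothetical, since the actual layout of the dataset is not described here.

```python
import csv
import io


def read_microblogs(handle):
    """Yield one dict per row of a tab-separated file with a header row."""
    # QUOTE_NONE keeps quotation marks inside tweets as literal characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield row


# Hypothetical two-row sample; the real file would be opened with
# open(path, newline="", encoding="utf-8") instead of io.StringIO.
sample = 'id\ttext\n1\tGreat night at the "festival"\n2\tSee you in Avignon\n'
rows = list(read_microblogs(io.StringIO(sample)))
```

Reading rows lazily through a generator avoids loading the whole collection into memory, which matters for a dataset of this size.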
Milestones and timetable 2018
21 February 2018, by sanjuan

Registration opens: 8 February 2018 (Task2) Registration closes: 30 April 2018 End Evaluation Cycle: 19 May 2018 Submission of Participant Papers [CEUR-WS]: 31 May 2018 Submission of Lab Overviews [LNCS]: 8 June 2018 Notification of Acceptance Participant Papers [CEUR-WS]: 15 June 2018 Notification of Acceptance Lab Overviews [LNCS]: 15 June 2018 Camera Ready Copy of Lab Overviews [LNCS]: 22 June 2018 Camera Ready Copy of Participant Papers and Extended Lab Overviews [CEUR-WS]: 29 June 2018 (…)
Task objectives and Evaluation process
21 February 2018, by Jean-valère, olivier, sanjuan

Objective
Vodkaster ( http://www.vodkaster.com/ ) is a French social network about movies where participants can share comments about movies in the form of microcritics no longer than a tweet. The main differences are the restricted cultural domain and the form. The objective of the task is, for a given movie and microcritic and for each language among French, English, Spanish, Portuguese and Arabic, to provide a summary of the related microblogs. Microblogs included in a summary should (…)
More about use case, data and evaluation process
9 February 2018, by Chiraz Latiri, Julio Gonzalo, Malek Hajjem

Detailed description
Use case
Given a selection of festival names taken from popular festivals on Flickr, in English and French, participants have to search for the most argumentative tweets in a collection covering 18 months of news about festivals in different languages. The result has to be a summary of tweets ranked by their probability of being argumentative. This use case was proposed to help festival organisers handle such tweets as a priority. (…)
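The ranking step of this use case can be sketched as follows. This is a minimal illustration and not the task's evaluation code: it assumes each tweet already carries a probability of being argumentative produced by some classifier, and the function name `rank_by_argumentativeness`, the cutoff `k`, and the sample scores are hypothetical.

```python
def rank_by_argumentativeness(scored_tweets, k=10):
    """Return the k tweets with the highest argumentativeness score.

    scored_tweets is an iterable of (tweet_text, probability) pairs.
    """
    ranked = sorted(scored_tweets, key=lambda pair: pair[1], reverse=True)
    return [tweet for tweet, _ in ranked[:k]]


# Hypothetical scores, for illustration only.
scored = [
    ("Line-up announced!", 0.10),
    ("The festival is worth it because the sound quality is superb", 0.85),
    ("Tickets were overpriced, since half the acts cancelled", 0.70),
]
summary = rank_by_argumentativeness(scored, k=2)
```

Here `summary` contains the two highest-scoring tweets, ordered from most to least likely to be argumentative.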
Towards Argumentative Ranking
8 February 2018, by Chiraz Latiri, Julio Gonzalo, Malek Hajjem

Organizers:
Chiraz Latiri, Julio Gonzalo, Malek Hajjem
Task 2 participation deadline April 30, 2018
Argumentative Ranking of Microblogs
Argumentation mining is a new problem in corpus-based text analysis that addresses the challenging task of automatically identifying the justifications that opinion holders provide for their judgments. Several approaches to argumentation mining have been proposed so far in areas such as legal documents, on-line debates, product reviews, newspaper (…)
TimeLine Illustration based on Microblogs
19 October 2016, by Lorraine, Philippe

This paper by Nayanika DOGRA, Philippe MULHEM, Nawal OULD AMER, and Lorraine GOEURIOT presents the approach used by the LIG-MRIM research group for its participation in the pilot task TimeLine Illustration based on Microblogs of the 2016 CLEF Cultural Microblog Contextualization Workshop, which led to the 2017 lab.
Wikipedia XML corpus for summary generation
18 October 2016, by sanjuan

Wikipedia is under Creative Commons license, and its contents can be used to contextualize tweets or to build complex queries referring to Wikipedia entities.
We have extracted an average of 10 million XML documents from Wikipedia per year since 2012 in the four main Twitter languages: en, es, fr and pt.
These documents reproduce in an easy-to-use XML structure the contents of the main Wikipedia pages: title, abstract, section and subsections as well as Wikipedia internal links. Other (…)
The festival galleries dataset
18 October 2016, by sanjuan

This data set supports experiments in microblog search and stream summarization.
Microblog collection
The document collection is provided to registered participants by the ANR GAFES project. It consists of a pool of more than 50M unique microblogs from different sources with their meta-information, as well as ground truth for the evaluation.
The microblog collection contains a very large pool of public posts on Twitter using the keyword festival since June 2015. These microblogs are (…)