MC2 2018 Lab

Multilingual Cultural Mining and Retrieval



Microblog Data Set

Monday 2 November 2015, by Eric SanJuan

The document collection provided by the GAFES project consists of a pool of more than 70M unique microblogs from different sources. The microblogs are stored on a MySQL server together with their meta-information and expanded URLs. For legal reasons, access to this database is restricted to registered participants who have signed a privacy agreement.

Along with the microblog corpus, participants are given a clean, simplified XML dump of Wikipedia that is easy to index and to process with state-of-the-art NLP tools. The ground truth material is the following: