2016 CMC workshop
Microblog Data Set
Monday 2 November 2015
The document collection provided by the GAFES project consists of a pool of more than 70M unique microblogs from different sources, stored with their meta-information and expanded URLs on a MySQL server. For legal reasons, access to this database is restricted to registered participants who are under a privacy agreement.
Along with the microblog corpus, a clean, simplified XML dump of Wikipedia is made available to participants; it is easy to index and to process with state-of-the-art NLP tools. The ground truth material is the following: