SB-CH: Swiss German Sentiment Corpus
SB-CH is a publicly available corpus that contains 165’916 Swiss German sentences, of which 2799 are labeled by 5 annotators as “positive”, “negative”, “neutral”, “mixed”, or “unknown”. It was created by SpinningBytes in collaboration with the Zurich University of Applied Sciences (ZHAW).
All data is provided under the Creative Commons license CC BY 4.0.
This means that it is free to use and distribute, even commercially, as long as appropriate credit is given by citing the reference below.
Human-readable format: Link
Licence Contract: Link
If you use the corpus, please make sure to reference the following publication:
- Ralf Grubenmann, Don Tuggener, Pius von Däniken, Jan Deriu, Mark Cieliebak. Towards a Corpus of Swiss German Annotated with Sentiment. In “Proceedings of the 11th Language Resources and Evaluation Conference (LREC), 2018 (to appear)”.
A detailed description of the corpus and how it was constructed can be found in the reference above, as well as in the README file contained in the corpus.
To use the corpus, download the annotations below. Since Facebook does not allow the content of posts to be distributed, the dataset contains only comment IDs and the corresponding annotations for Facebook posts. A download script is provided for retrieving the content; simply follow the README on the linked page.
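Once the annotations are downloaded, the five annotator labels per sentence often need to be reduced to a single label. The following is a minimal sketch of one common option, majority voting; it is not necessarily how the corpus authors aggregate labels. The function name `majority_label`, the `min_agreement` threshold, and the in-memory row layout are assumptions made for illustration only; the actual file format is described in the README.

```python
from collections import Counter

# The five label values named in the corpus description.
LABELS = {"positive", "negative", "neutral", "mixed", "unknown"}


def majority_label(votes, min_agreement=3):
    """Return the label chosen by at least `min_agreement` annotators, else None.

    Hypothetical helper: the threshold of 3 out of 5 is an assumption,
    not a rule taken from the corpus documentation.
    """
    unexpected = set(votes) - LABELS
    if unexpected:
        raise ValueError(f"unexpected labels: {sorted(unexpected)}")
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agreement else None


# Hypothetical rows: (sentence ID, labels from five annotators).
rows = [
    ("s1", ["positive", "positive", "neutral", "positive", "mixed"]),
    ("s2", ["negative", "neutral", "positive", "mixed", "unknown"]),
]

# Sentences without sufficient agreement map to None and can be filtered out.
aggregated = {sid: majority_label(votes) for sid, votes in rows}
```

With five annotators, requiring at least three matching votes guarantees that at most one label can qualify, so no tie-breaking rule is needed.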