BRCC and SentiBahasaRojak

  • BRCC: A corpus built by applying the Bahasa Rojak data augmentation algorithm to a Malay Wikipedia corpus.
  • SemEval-2017 Task 5 Subtask 1: A Malay version of SemEval-2017 Task 5 Subtask 1, constructed by human translation.
  • SentiBahasaRojak: A Bahasa Rojak sentiment analysis dataset. For product and movie reviews, we applied the Bahasa Rojak data augmentation algorithm to existing Malay datasets. For stock reviews, we scraped posts from stock forums and hired five experts to label them.

Wikidata Keywords

  • Q3201279
  • Q1172284
  • Q2271421

Basic Information

Data Type: Archived data

Management Information

Creator: Intelligent Information Service Research Lab, National Central University, Taiwan