This group hosts public discussions on Natural Language Processing, including the state-of-the-art technologies & models, benchmark corpora, use cases, industry applications and novel ideas.
It also includes the latest updates on AI Singapore’s various NLP projects, including Southeast Asia CoreNLP, the SG-NLP platform and open-source tools like Golden Retriever and Beagle.
Welcome to the discussion thread for SEACoreNLP. We welcome all discussions related to Natural Language Processing (NLP) for Southeast Asian (SEA) languages.
What is SEACoreNLP?
SEACoreNLP aims to be the central hub for Natural Language Processing (NLP) in Southeast Asia. The raison d’être of SEACoreNLP lies in the fact that many of the languages used in Southeast Asia do not have adequate NLP resources, be it open-source datasets, models or tools. With growing industry demand for such capabilities but no one to supply them, SEACoreNLP hopes to spearhead projects and gather like-minded entities across the region to build a livelier NLP ecosystem for Southeast Asia.
The main languages of Southeast Asia are Thai, Vietnamese, Malay, Indonesian, Lao, Khmer, Burmese, Tagalog and Tetum. Tamil is also used in Singapore and Malaysia, as are English and Chinese. The latter two, however, are considered high-resource languages and are therefore not a strong focus of the SEACoreNLP project.
We are trying to consolidate existing public NLP resources and tools for Southeast Asian languages. Below is what we have found so far; the list is certainly not exhaustive. Please post a reply if you know of any Southeast Asian NLP resources that we have missed. Thank you.
We are currently building open-source datasets not just for the tasks mentioned above but also for other tasks such as Coreference Resolution and Semantic Role Labeling, which currently have no data for Southeast Asian languages. Once we are done, we will be able to train and publish models and add them to our package and demo, so stay tuned for the next release!