Understanding Indonesian: developing a machine-usable grammar, dictionary and corpus

Project Description:

Indonesian is one of the world’s major spoken languages, and is increasingly used on the web. As unstructured digital language data becomes more widely available, language resources are needed to improve machine translation, data mining and other computational linguistics tasks. This project aims to build these resources through research on Indonesian, creating a robust computational grammar, corpus and lexicon (including social variation) within the ParGram framework. ParGram is an international collaborative project that develops computational grammars within a shared framework based on common linguistic assumptions. The outcomes of this project should lead to a better understanding of Indonesian grammar and to reliable machine-usable language resources.