Machine-readable grammatical resources for Indonesian
This project has produced grammatical resources for Indonesian to guide the development of computer-implemented grammars and to establish a standard by which grammar coverage can be measured. The resources consist of a set of 52 machine-readable (plain text) files containing acceptable and unacceptable sentences of Indonesian, their translations, and comments on their grammatical structure. In this respect, the resources differ from standard grammars and textbooks of Indonesian, which assume that the human reader or learner can fill in a full paradigm on the basis of an abstract description or a few representative examples. Unlike corpora assembled from naturally occurring texts, the files contain unacceptable as well as acceptable examples; including unacceptable examples is crucial in ensuring that grammars produce only well-formed analyses and do not accept ungrammatical input.
Our project connects with the project "Understanding Indonesian: developing a machine-usable grammar, dictionary and corpus", based at the Australian National University and funded by the Australian Research Council, with which PI Dalrymple is associated as a partner investigator. The Australian project has produced a broad-coverage grammar, lexicon, and balanced corpus of Indonesian as part of the Parallel Grammar Project (PARGRAM), an international consortium of academic and commercial research institutions that develops computational grammars and lexicons within the shared linguistic framework of Lexical Functional Grammar (LFG). The testsuites have been essential to that project's work: they have guided the development of the grammar, ensured coverage of less common as well as basic constructions, tested the full paradigm of constructions and their interactions, and tested the "tightness" of the grammar in excluding impossible analyses as well as producing well-formed analyses for the constructions under examination. Feedback from the "Understanding Indonesian" project has in turn guided development of the testsuites and ensured that their coverage is comprehensive.
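The two uses of the testsuites described above, measuring coverage on acceptable sentences and checking "tightness" on unacceptable ones, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the file layout (one sentence per line, a leading "*" marking an unacceptable sentence, "#" marking a comment line) is a hypothetical format rather than the documented format of the 52 files, the sample sentences are invented and not taken from the resources, and toy_accepts is a stand-in for a call to a real grammar or parser.

```python
from typing import Callable, Iterable, List, Tuple


def parse_testsuite(lines: Iterable[str]) -> List[Tuple[str, bool]]:
    """Return (sentence, is_acceptable) pairs from the ASSUMED format:
    '#' starts a comment line, a leading '*' marks an unacceptable sentence."""
    items: List[Tuple[str, bool]] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("*"):
            items.append((line[1:].strip(), False))
        else:
            items.append((line, True))
    return items


def evaluate(items: List[Tuple[str, bool]],
             accepts: Callable[[str], bool]) -> Tuple[float, float]:
    """Return (coverage, overgeneration).

    coverage: share of acceptable sentences the grammar accepts.
    overgeneration: share of unacceptable sentences it wrongly accepts.
    """
    good = [s for s, ok in items if ok]
    bad = [s for s, ok in items if not ok]
    coverage = sum(accepts(s) for s in good) / len(good) if good else 0.0
    overgeneration = sum(accepts(s) for s in bad) / len(bad) if bad else 0.0
    return coverage, overgeneration


# Invented sample in the assumed format (not taken from the actual files).
SAMPLE = """\
# basic transitive clauses
Saya makan nasi.
Dia membaca buku itu.
*Saya nasi makan.
"""


def toy_accepts(sentence: str) -> bool:
    """Hypothetical stand-in for a grammar: accepts a fixed set of strings."""
    return sentence in {"Saya makan nasi.", "Saya nasi makan."}


if __name__ == "__main__":
    coverage, overgeneration = evaluate(
        parse_testsuite(SAMPLE.splitlines()), toy_accepts
    )
    print(f"coverage={coverage:.2f} overgeneration={overgeneration:.2f}")
```

Reporting the two figures separately reflects the point made above: a grammar that accepts every input would score perfectly on coverage, and only the unacceptable examples reveal that it is not tight.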