Description
- uid: malindomorph__morphological_dictionary_and_analyser_for_malay_indonesian
- type: processed
- description:
- name: MALINDOMorph: Morphological dictionary and analyser for Malay/Indonesian
- description: Malay/Indonesian lacked an open wide-coverage dictionary that can be used for both NLP tasks and non-NLP purposes. The MALINDO Morph morphological dictionary is the first such dictionary. It provides morphological information (root, prefix, suffix, circumfix, reduplication) for roughly 232K surface forms. The entry forms are those found in the authoritative dictionaries in Malaysia (Kamus Dewan) and Indonesia (Kamus Besar Bahasa Indonesia) (core dictionary) as well as frequent words in the Leipzig Corpora Collection (Goldhahn et al., 2012) (expanded dictionary). The morphological analyses were checked by hand for all surface forms, except for (i) basic and di-forms in the expanded dictionary whose existence is predicted from the corresponding meN-active forms in the core dictionary and (ii) the case variants of the items in the core dictionary. This paper also discusses the morphological analyser that we developed to create our morphological dictionary. Our morphological analyser is more linguistically rigorous than previous morphological analysers and stemmers/lemmatizers such as MorphInd (Larasati et al., 2011) because it takes into account circumfixes, which have previously been neglected, largely due to a misunderstanding among NLP researchers that circumfixes are no more than combinations of a prefix and a suffix.
- homepage: http://lrec-conf.org/workshops/lrec2018/W29/pdf/8_W29.pdf
- validated: True
- languages:
- language_names:
- Indonesian
- language_comments:
- language_locations:
- Asia
- Indonesia
- validated: False
- language_names:
- custodian:
- name: Hiroki Nomoto
- in_catalogue:
- type: A university or research institution
- location: Japan
- contact_name: Hiroki Nomoto
- contact_email: [email protected]
- contact_submitter: False
- additional: http://lrec-conf.org/workshops/lrec2018/W29/pdf/8_W29.pdf
- validated: False
- availability:
- procurement:
- for_download: No - but the current owners/custodians have contact information for data queries
- download_url:
- download_email:
- licensing:
- has_licenses: Yes
- license_text:
- license_properties:
- license_list:
- pii:
- has_pii: Yes
- generic_pii_likely:
- generic_pii_list:
- numeric_pii_likely:
- numeric_pii_list:
- sensitive_pii_likely:
- sensitive_pii_list:
- no_pii_justification_class:
- no_pii_justification_text:
- validated: False
- procurement:
- processed_from_primary:
- from_primary: Taken from primary source
- primary_availability: Yes - their documentation/homepage/description is available
- primary_license: Unclear / I don't know
- primary_types:
- validated: False
- from_primary_entries:
- media:
- category:
- text
- text_format:
- audiovisual_format:
- image_format:
- database_format:
- other
- text_is_transcribed: No
- instance_type:
- instance_count:
- instance_size:
- validated: False
- category:
- fname: malindomorph__morphological_dictionary_and_analyser_for_malay_indonesian.json